| Term or question | Definition or answer |
| units | (rows) the individuals in a study. Also known as the "subjects" or the "who" of a study |
| variables | (columns) aspects of the units that are being measured. Also known as the "what" of a study |
| sample | units that are actually measured in the data |
| sampling frame | consists of all the units in a population (the group an inference applies to) that could possibly be sampled |
| types of variables: categorical | measurement places unit into a category (ex. male or female) |
| types of variables: quantitative (numeric) | measurement is a number (ex. asking a subject how many hours of TV they watch per week) |
| types of samples: simple random sample | taken in such a way that each unit has the same chance of being included in the data |
| types of samples: convenience sample | means you only get units that are easy to reach (ex. people in just one spot on campus) |
| sampling bias | means the sampling method systematically causes some units to have less chance of inclusion, and thus the sample is not truly random |
| non-response bias | kind of sampling bias where units that DON'T respond may be different in some way than those who do |
| voluntary response bias | kind of sampling bias where people who DO respond may be different in some way than those who don't (ex. those who feel they've been treated unfairly are more likely to respond to survey) |
| types of samples: systematic sample | ex. just take every 10th person who walks in a room (less bias than a convenience sample if you're just pulling from a list) |
| types of samples: stratified sample | break the population into strata and take a random sample within each stratum that is proportionate to its size (ex. if there are 3000 kids and 1000 are freshmen, your survey of 100 kids should include 33 freshmen, because 1000/3000 ≈ 33%) |
| types of samples: cluster sample | helps get larger sample more easily. If units naturally occur in clusters, then you can randomly select clusters and include all units in cluster (ex. sample 5 classes at UNC-A) |
| types of samples: multi-stage sample | (cluster is a type of this) you take a sample within a cluster and don't include all units |
| types of samples: census | sample every single unit in the population |
| predictor variable | Variable that is predicted to impact a response variable, but may not always be the cause of the response variable due to lurking variables |
| types of studies: observational | levels of predictor variable occur naturally |
| types of studies: designed experiment | levels of predictor variable are assigned by an experimenter (this is a better way to find out if there's causation) |
| considerations of a designed study: | Placebo (solution: blind study); Ethics (can't give inferior treatment knowingly); Randomization; Control (try to control variables); Replication; Blocking (divide units into blocks of things that may impact response, like gender) |
| matched pairs design | a common way of blocking, involves giving each unit both treatments |
| ways to picture relationships between 2 categorical variables | make a contingency table with the predictor variable in the rows; then build a conditional table: for each predictor level, find the percent of its units that fall in each response level |
| ways to picture 1 categorical variable | Make: a table, a pie chart, or a bar chart (a histogram is for quantitative data) |
| what constitutes "center"? | mean and the median |
| what constitutes "spread"? | IQR and standard deviation |
| what constitutes "shape" | skew or outliers (best shown by graphing) |
| which way is it skewed? | if the data is positively skewed, there is a tail on the right; if the data is negatively skewed, there is a tail on the left (if there is a tail to the left, the median will typically be larger than the mean) |
| how to get the range | the range is the highest value - the lowest value. it is easily influenced by outliers |
| best way to calculate Q1 and Q3? | calculate the median (M) and then use it to split the data in half. Then Q3 will be the median of the upper half of your data and Q1 will be the median of the lower half. If the # of data points (n) is odd, put the median in both the upper and lower halves |
| how to get the upper and lower fences in box plots | the upper fence will be Q3 + (1.5 x IQR); the lower fence will be Q1 - (1.5 x IQR). Extend whiskers to the highest and lowest values that fall within the fence, and things that fall outside of it will be outliers marked with an * |
| to get the z score: | z = (your variable score - sample mean) / standard deviation |
| what happens when we shift all data points by a given amount? | the mean and median will both be shifted by "a" units, but the IQR and standard deviation will stay the same (ie. center changes, spread stays the same) |
| if your z score for a test is positive, are you above or below average? | a positive z score indicates that you are ABOVE average, and negative z score would be below average (mean z = 0; standard deviation of z = 1) |
| empirical rule (normal model) | About 68% of units will fall within 1 standard deviation of the mean. About 95% will fall within 2 standard deviations, and about 99.7% within 3. |
| for correlation between two variables: | both x and y must be numeric (quantitative), and the points need to fit a line approximately |
| what correlation will we get if y tends to decrease as x increases? | we'll get negative correlation because as x increases, y tends to decrease. -1 is the lowest correlation and 1 is the highest correlation (negative goes down from left to right, positive goes up from left to right) |
| if a plot has data points in a parabola shape, is there correlation? | NO. A parabola-shaped plot shows little or no linear correlation; however, there may still be a RELATIONSHIP. It just won't be a linear relationship. |
| What does the shotgun effect look like? | a random cluster of data points on a plot. add one strange outlier and you'll have a lollipop effect. In this case there is no correlation and no relationship between x and y |
| what does the dumbbell effect look like? | there are two clusters of data points that may correlate once a line is drawn through them, but the groups themselves could be causing this due to lurking variables in grouping |
| in a blind study... | the units don't know which treatment they are receiving (usually a placebo is used to account for the placebo effect). In a double blind study, neither the subjects NOR the evaluators know which treatment is being received or observed |
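
The quartile, fence, and z score cards above can be checked with a few lines of Python. This is only a minimal sketch: the function names and the `hours` data are made up for illustration, and the quartile rule is the median-split method from the cards (median placed in both halves when n is odd), which can give slightly different answers than the default quartile methods in some software.

```python
from statistics import mean, median, stdev


def quartiles(values):
    """Return (Q1, median, Q3) using the median-split rule from the cards."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # When n is odd, the median is placed in BOTH halves.
    lower = data[:half + (n % 2)]
    upper = data[half:]
    return median(lower), median(data), median(upper)


def fences(values):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    """Values that fall outside the fences (the * points on a box plot)."""
    low, high = fences(values)
    return [v for v in values if v < low or v > high]


def z_scores(values):
    """z = (score - sample mean) / sample standard deviation, per value."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data: weekly hours of TV reported by 7 subjects.
hours = [2, 4, 5, 7, 8, 10, 30]

print(quartiles(hours))  # Q1, median, Q3
print(fences(hours))     # lower fence, upper fence
print(outliers(hours))   # values beyond the fences
print(z_scores(hours))   # positive = above the mean, negative = below
```

Adding the same constant to every value in `hours` shifts the quartiles and the mean by that constant but leaves the IQR, the standard deviation, and the z scores unchanged, which matches the "shift all data points" card.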