| Term | Definition |
| Individual | an object described by a set of data |
| Variable | any characteristic of an individual |
| Categorical / Qualitative Variable | places an individual into one of several categories |
| Quantitative Variable | takes numerical values for which arithmetic, such as averaging, makes sense |
| To describe distribution is to | describe Shape, Center and Spread |
| Outlier | an observation that falls outside the overall pattern (1.5 × IQR rule: x is an outlier if x > Q3 + 1.5 × IQR or x < Q1 - 1.5 × IQR) |
| Symmetric | the right and left sides are approximately mirror images of each other |
| Skewed to the right | the right tail (larger values) extends much farther out than the left tail |
| Ogive | a relative cumulative frequency graph |
| IQR | Q3 - Q1 |
| Five Number Summary | Minimum, Q1, M, Q3, Maximum |
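| Boxplot | a graph of the five-number summary on a labeled axis: minimum, Q1, M, Q3, maximum |
| Modified Boxplot | a boxplot that plots outliers as individual points, with whiskers extending only to the most extreme values that are not outliers |
| Variance | the square of the standard deviation, s^2 |
| Properties of Standard Deviation | 1) measures spread about the mean, so use it only when the mean is chosen as the center, 2) s = 0 only when there is no spread (all values are equal), 3) s is not resistant |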
| Median and IQR are | resistant |
| mean, SD, variance, range are not | resistant |
| Response Variable | measures an outcome of a study |
| Explanatory Variable | attempts to explain the observed outcome |
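| Scatterplot | shows the relationship between two quantitative variables |
| Examining a scatterplot 3 ways | form (linear or curved), direction (positive or negative), strength (strong, moderate, or weak) |
| Association | positive (y tends to increase as x increases) or negative (y tends to decrease as x increases) |
| Correlation | r, which measures the direction and strength of the linear relationship between two quantitative variables |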
| SST | sum of (y - mean of y)^2 |
| SSE | sum of (y - y hat)^2 |
| r^2 | (SST - SSE)/SST |
| What is the coefficient of determination? | r^2, the fraction of the variation in the values of y that is explained by the least-squares regression of y on x |
| LSRL | the least-squares regression line; it always passes through the point (mean of x, mean of y) |
| Residual | observed y - predicted y |
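| Mean of the least-squares residuals is? | 0 |
| Residual plot | plots residuals against x; a curved pattern means the relationship is not linear, increasing spread means predictions are less accurate for larger x, and no pattern means a linear model fits well |
| Outlier (in regression) | a point that lies far from the line (large residual); removing it usually increases r^2 |
| Influential Observation | a point whose removal would markedly change the regression line, often an outlier in the x direction; removing it can decrease r^2 |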
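| Definition of Density Curve | 1) is always on or above the horizontal axis (y ≥ 0), 2) has a total area of exactly 1 underneath it |
| In a skewed distribution, which is closer to the tail: mean or median? | mean |
| In a normal curve, the mean and median are? | equal |
| Median (of a density curve) | the point that divides the area under the curve in half |
| Inflection Points | on a normal curve, the points one SD on either side of the mean |
| The Empirical Rule | in a normal distribution, about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 SDs of the mean |
| Standardized Value | z = (x - mean)/SD |
| Assessing Normality | 1) construct a frequency histogram or stemplot (to see whether the data are roughly symmetric and bell-shaped about the mean), 2) make a normal probability plot (roughly linear if the data are close to normal) |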
| Monotonic Function | f(t) moves in one direction (always increasing or always decreasing) as t increases |
| Concavity | concave up (slope increasing) or concave down (slope decreasing) |
| log(A*B) | log A + log B |
| log(A/B) | log A - log B |
| log(x^p) | p log x |
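| Exponential model | y = a × b^x, with the variable in the exponent (e.g. 1.06^x) |
| Power model | y = a × x^p, with the variable in the base (e.g. x^1.06) |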
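| Are correlation and the LSRL resistant? | no |
| Extrapolation | using a regression line to predict far outside the range of x values used to obtain the line; such predictions are often inaccurate |
| lurking variable | a variable that is neither explanatory nor response but may influence the interpretation of the relationship between them |
| Causation | changes in x cause changes in y (x need not be the only cause of y) |
| Common Response | changes in both x and y are caused by changes in a lurking variable z |
| Confounded | two variables are confounded when their effects on the response variable cannot be distinguished from each other |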
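| Marginal Distribution | the distribution of one variable alone in a two-way table, read from the row or column totals in the margins |
| roundoff error | when totals do not add up exactly because the entries were rounded |
| Conditional Distribution | the distribution of one variable among only those individuals who satisfy a given condition on the other variable |
| Simpson's paradox | the reversal of the direction of a comparison or association when data from several groups are combined into a single group |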
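| Observational study | observes individuals and measures variables without attempting to influence the responses |
| experiment | deliberately imposes a treatment on individuals in order to observe their responses |
| voluntary response sample | people who choose themselves by responding to a general appeal |
| convenience sampling | choosing the individuals who are easiest to reach |
| biased | a study design is biased if it systematically favors certain outcomes |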
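| SRS | simple random sample of size n: every set of n individuals has an equal chance of being the sample selected |
| stratified random sample | divide the population into groups of similar individuals (strata), choose a separate SRS in each stratum, and combine them |
| Undercoverage | when some groups in the population are left out of the process of choosing the sample |
| Nonresponse | when an individual chosen for the sample cannot be contacted or does not cooperate |
| wording effects | confusing or leading questions that bias the responses |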
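| Statistically significant | an observed effect so large that it would rarely occur by chance |
| Double-blind | neither the subjects nor those who measure the response know which treatment a subject received |
| lack of realism | when the subjects, treatments, or setting of an experiment do not realistically duplicate the conditions we want to study (e.g. the subjects know it is an experiment) |
| matched pairs design | compares two treatments using pairs: either two closely matched subjects each receive one treatment, or one subject receives both treatments in random order |
| 5 SIMULATION STEPS | 1) state the problem, 2) state the assumptions (including independence), 3) assign digits to represent outcomes, 4) simulate many repetitions, 5) state conclusions |
| randomness | individual outcomes are uncertain, but there is a regular distribution of outcomes in a large number of repetitions |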
| sample space S | set of all possible outcomes |
| Event | any outcome or set of outcomes; a subset of the sample space |
| Probability Model | sample space S and a way of assigning probabilities to events |
| empty event | the event with no outcomes in it; if A and B are disjoint, the event "A and B" is empty |
| Define Independent | 1) P(A and B) = P(A)P(B) or 2) P(B|A) = P(B) |
| Disjoint events are NOT | independent (when both have positive probability, one occurring means the other cannot) |
| Conditional Probability | P(A|B) = the probability of A given that B occurred |
| P(B|A) | P(A∩B)/P(A) |
| Discrete random variable X | has a countable number of possible values |
| probability distribution of X | lists the values and their probabilities |
| Continuous random variable X | takes all values in an interval of numbers; any single value has probability 0, so probabilities are assigned to ranges as areas under a density curve |
| Mean of a random variable | μ = sum of (each value x_i × its probability p_i) |
| Variance of a random variable | σ^2 = sum of (x_i - μ)^2 × p_i |
| law of large numbers | as the number of observations increases, the mean of the observed values approaches the mean μ of the population and stays close to it |
| If X and Y are independent, | σ^2(X+Y) = σ^2(X) + σ^2(Y) |
| Variance of a sum of any two random variables | for X and Y with correlation ρ, σ^2(X+Y) = σ^2(X) + σ^2(Y) + 2ρσ(X)σ(Y) |
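
Several of the formulas in the table can be checked numerically. The sketch below is a minimal illustration, not part of the original cards: the function names (`five_number_summary`, `outliers`, `lsrl`, `r_squared`) and the sample data are made up for this example, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (other quartile conventions give slightly different values).

```python
from statistics import mean, median


def five_number_summary(data):
    """Return (min, Q1, M, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + n % 2:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def outliers(data):
    """Values flagged by the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def lsrl(xs, ys):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # forces the line through (mean of x, mean of y)
    return a, b


def r_squared(xs, ys):
    """Coefficient of determination computed as (SST - SSE) / SST."""
    a, b = lsrl(xs, ys)
    y_bar = mean(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return (sst - sse) / sst


# Hypothetical data, chosen only to exercise the functions.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(scores))
print(outliers(scores))  # [100]
```

With these made-up scores, Q1 = 3 and Q3 = 8, so the fences are -4.5 and 15.5 and only 100 is flagged. The same helpers can be used to confirm that the residuals of a least-squares line average to 0.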