Term | Definition |
Individuals | the objects described by a set of data |
Variable | any characteristic of an individual |
Categorical / Qualitative Variable | places an individual into one of several categories |
Quantitative Variable | takes numerical values for which arithmetic such as averaging makes sense |
To describe a distribution is to | describe its shape, center, and spread |
Outlier | an observation that falls outside the overall pattern (to check: x > Q3 + (1.5*IQR) or x < Q1 - (1.5*IQR)) |
Symmetric | if the right and left sides are mirror images of each other |
Skewed to the right | the right tail is much longer than the left tail |
Ogive | a Relative cumulative frequency graph |
IQR | Q3 - Q1 |
Five Number Summary | Minimum, Q1, M, Q3, Maximum |
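Boxplot | a graph of the five-number summary on a labeled axis: minimum, Q1, M, Q3, maximum |
Modified Boxplot | a boxplot that plots outliers as isolated points, with whiskers ending at the most extreme non-outliers |
Variance | s^2, the square of the standard deviation |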
Properties of Standard Deviation | 1) measures spread about the mean when mean is chosen as center, 2) s = 0 when no spread, 3) s is not resistant |
Median and IQR are | resistant |
mean, SD, variance, range are not | resistant |
Response Variable | measures an outcome of a study |
Explanatory Variable | attempts to explain the observed outcome |
Scatterplot | shows the relationship between two quantitative variables |
Examining a scatterplot 3 ways | Form (linear or curved), Direction (+/-), Strength (strong, moderate, weak) |
Association | positive (y tends to increase as x increases) or negative (y tends to decrease as x increases) |
Correlation | r, measures the direction and strength of a linear relationship |
SST | sum of (y - mean of y)^2 |
SSE | sum of (y - y hat)^2 |
r^2 | (SST - SSE)/SST |
What is the coefficient of determination | r^2, the fraction of the variation in the values of y that is explained by the least-squares regression of y on x |
LSRL | always passes through (mean of x, mean of y) |
Residual | observed y - predicted y |
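Mean of the LSRL residuals is? | 0 |
Residual plot | look for a curved pattern (relationship is not linear), increasing or decreasing spread (predictions are less accurate where the spread is larger), or no pattern (a line fits well) |
Outlier | in regression, a point outside the overall pattern; if it is removed, r^2 usually improves |
Influential Observation | if it is removed, the LSRL changes markedly (and r^2 often gets worse) |
Definition of Density Curve | 1) always on or above the horizontal axis (y >= 0), 2) has total area of exactly 1 underneath it |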
In a skewed graph, which is closer to the tail; mean or median | mean |
In a normal curve mean and median are? | equal |
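Median of a density curve | the point that divides the area under the curve in half |
Inflection Points | on a normal curve, located 1 SD on either side of the mean |
the Empirical Rule | in a normal distribution, 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 SD of the mean |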
Standardized Value | z = (x-average)/(SD) |
Assessing Normality | 1) construct a frequency histogram or stemplot (to see if the data are symmetric and bell-shaped about the mean), 2) make a normal probability plot (roughly linear if the data are close to normal) |
Monotonic Function | f(t) moves in one direction when t increases |
Concavity | concave up (slope increasing) or concave down (slope decreasing) |
log(A*B) | log A + log B |
log(A/B) | log A - log B |
log(x^p) | p log x |
Exponential model | y = a*b^x, for example 1.06^x |
Power model | y = a*x^p, for example x^1.06 |
correlation and LSRL are resistant? | NO |
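Extrapolation | using a regression line to predict far outside the range of x values used to fit it; such predictions are often inaccurate |
lurking variable | a variable that is not among the explanatory or response variables but may influence the interpretation of the relationship between them |
Causation | changes in x cause changes in y (though x is rarely the only cause) |
Common Response | changes in both x and y are caused by changes in a lurking variable z |
Confounded | two variables whose effects on a response variable cannot be distinguished from each other |
Marginal Distribution | the distribution of one variable alone in a two-way table, read from the row or column totals in the margins |
roundoff error | when totals do not add up exactly because the entries were rounded |
Conditional Distribution | the distribution of one variable among only the individuals that satisfy a given condition on the other variable |
Simpson's paradox | when the direction of a comparison is reversed by combining several groups into one |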
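Observational study | observes individuals and measures variables without attempting to influence the responses |
experiment | deliberately imposes a treatment on individuals in order to observe their responses |
voluntary response sample | people who choose themselves by responding to a general appeal |
convenience sampling | choosing the individuals who are easiest to reach |
biased | a study design is biased if it systematically favors certain outcomes |
SRS | every set of n individuals has an equal chance to be the sample actually selected |
stratified random sample | divide the population into groups of similar individuals called strata, choose a separate SRS in each stratum, and combine them |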
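Undercoverage | when some groups in the population are left out of the process of choosing the sample |
Nonresponse | when an individual chosen for the sample cannot be contacted or does not cooperate |
wording effects | confusing or leading questions that bias the responses |
Statistically significant | an observed effect so large that it would rarely occur by chance |
Double-blind | neither the subjects nor the people who measure the response know which treatment a subject received |
lack of realism | when the subjects, treatments, or setting do not realistically duplicate the conditions of interest (for example, subjects know it is an experiment) |
matched pair design | compares two treatments on matched pairs: either two similar subjects, one per treatment, or one subject who receives both treatments in random order |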
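5 Simulation Steps | 1) state the problem, 2) state the assumptions (including independence), 3) assign digits to represent outcomes, 4) simulate many repetitions, 5) state conclusions |
randomness | individual outcomes are uncertain, but there is a regular distribution of outcomes in a large number of repetitions |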
sample space S | set of all possible outcomes |
Event | any outcome or set of outcomes; a subset of the sample space |
Probability Model | sample space S and a way of assigning probabilities to events |
empty event | the event with no outcomes in it; A and B are disjoint when (A and B) is the empty event |
Define Independent | 1) P(A and B) = P(A)P(B) or 2) P(B|A) = P(B) |
Disjoint events are NOT | INDEPENDENT |
Conditional Probability | P(A|B) = the probability of A given that B occurred |
P(B|A) | P(A∩B)/P(A) |
Discrete random variable X | has a countable number of possible values |
probability distribution of X | lists the values and their probabilities |
Continuous random variable X | takes all values in an interval; any single value has probability 0, so probabilities are assigned to ranges of values |
Mean of a random variable | sum of (each value of X × its probability) |
Variance of a random variable | σ^2_X = sum of (x - mean)^2 × probability |
law of large numbers | as the number of observations grows, the mean of the observed values approaches the true mean and stays close to it |
if X and Y are independent, | σ^2_(X+Y) = σ^2_X + σ^2_Y |
variance of a sum of random variables | for X and Y with correlation ρ, σ^2_(X+Y) = σ^2_X + σ^2_Y + 2ρ σ_X σ_Y |
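
Several of the formulas in this list can be checked numerically. The following is a minimal Python sketch, not a definitive implementation. The function names are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd), which is one common textbook convention; other software may compute quartiles differently.

```python
import math
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, M, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the overall median when the number of values is odd.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + (n % 2):]  # skip the middle value when n is odd
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def outliers(data):
    """Values flagged by the 1.5 * IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def z_score(x, mean, sd):
    """Standardized value z = (x - mean) / SD."""
    return (x - mean) / sd


def r_squared(xs, ys):
    """Fit the LSRL y hat = a + b*x and return (SST - SSE) / SST."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar  # forces the line through (mean of x, mean of y)
    sst = sum((y - y_bar) ** 2 for y in ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return (sst - sse) / sst


def sd_of_sum(sd_x, sd_y, rho=0.0):
    """SD of X + Y; rho = 0 covers the independent case."""
    return math.sqrt(sd_x ** 2 + sd_y ** 2 + 2 * rho * sd_x * sd_y)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # made-up data for illustration
    print(five_number_summary(sample))
    print(outliers(sample))
    print(z_score(130, 100, 15))
    print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
    print(sd_of_sum(3, 4))
```

Note that `sd_of_sum` adds variances, not standard deviations, which is why the standard deviation of the sum is smaller than the sum of the two standard deviations when X and Y are independent.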