Rucete ✏ AP Statistics In a Nutshell
9. Inference for Quantitative Data: Slopes
This chapter introduces inference procedures for slopes of least squares regression lines, including confidence intervals, hypothesis tests, sampling distributions, and conditions necessary for valid inference.
Sampling Distribution for the Slope
• When certain conditions are met, the sampling distribution of the sample slope b is approximately normal with:
• Mean: β (the true slope).
• Standard deviation: σ / (σx√n), where σ is the standard deviation of y about the population regression line and σx is the standard deviation of the x-values.
• Using sample data, we estimate σ with s (standard deviation of residuals) and σx with sx (standard deviation of x-values), leading to a t-distribution with df = n − 2.
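A minimal simulation sketch (assuming NumPy; the true intercept, slope, σ, and sample size below are made-up illustrative values) showing that repeated sample slopes center at β with spread close to σ / (σx√n):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 5.0, 2.0      # hypothetical true intercept and slope
sigma = 3.0                 # SD of y about the population regression line
n = 40
x = rng.uniform(0, 10, n)   # x-values held fixed across repeated samples

slopes = []
for _ in range(10_000):
    y = alpha + beta * x + rng.normal(0, sigma, n)   # one simulated sample
    slopes.append(np.polyfit(x, y, 1)[0])            # least squares slope b

sigma_x = x.std()                         # SD of the x-values
print(np.mean(slopes))                    # close to beta = 2
print(np.std(slopes))                     # close to the theoretical value
print(sigma / (sigma_x * np.sqrt(n)))     # sigma / (sigma_x * sqrt(n))
```

A histogram of the simulated slopes would also look approximately normal, matching the sampling distribution described above.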
Conditions for Inference on the Slope
• Linearity: The true relationship between x and y is linear (check scatterplot and residual plot).
• Independence: Individual observations are independent; when sampling without replacement, check the 10% condition (n is no more than 10% of the population).
• Normality: For any fixed x, the responses y vary approximately normally about the population regression line (check a dotplot or histogram of the residuals for strong skew or outliers, or rely on a large sample, e.g., n ≥ 30).
• Equal Standard Deviations (Equal SD): The spread of residuals is roughly constant across x-values (check for no fanning in residual plot).
• Random: The data come from a random sample or a randomized experiment.
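A sketch of the usual graphical checks for the Linearity, Equal SD, and Normality conditions (assuming NumPy and Matplotlib; x and y stand for any paired quantitative data):

```python
import numpy as np
import matplotlib.pyplot as plt

def residual_checks(x, y):
    """Plot the diagnostics used to judge Linearity, Equal SD, and Normality."""
    b, a = np.polyfit(x, y, 1)          # sample slope and intercept
    residuals = y - (a + b * x)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.scatter(x, residuals)           # look for no curved pattern (Linearity)
    ax1.axhline(0, color="gray")        # and no fanning (Equal SD)
    ax1.set(xlabel="x", ylabel="residual", title="Residual plot")
    ax2.hist(residuals)                 # look for strong skew/outliers (Normality)
    ax2.set(xlabel="residual", title="Histogram of residuals")
    plt.show()
```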
Understanding Standard Error of the Slope
• Standard error of the slope (SE(b)) estimates how much sample slopes vary around the true slope.
• SE(b) is typically provided in computer regression output.
• Larger spread in residuals (higher s) increases SE(b); larger spread in x-values (higher sx) or larger n decreases SE(b).
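A minimal sketch (assuming NumPy; the data are arbitrary illustrative values) of how software arrives at SE(b) = s / (sx·√(n − 1)), where s is the standard deviation of the residuals computed with df = n − 2:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)          # illustrative data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.7, 14.2, 15.8])

n = len(x)
b, a = np.polyfit(x, y, 1)                    # least squares slope and intercept
residuals = y - (a + b * x)

s = np.sqrt(np.sum(residuals**2) / (n - 2))   # SD of residuals, df = n - 2
s_x = x.std(ddof=1)                           # sample SD of the x-values
se_b = s / (s_x * np.sqrt(n - 1))             # standard error of the slope
print(b, se_b)
```

The formula shows the behavior described above: SE(b) grows with s and shrinks as sx or n increases.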
Generic Computer Output for Regression
• Coef: Contains y-intercept (a) and slope (b).
• SE Coef: Contains standard errors of the intercept and slope.
• T-statistic and P-value are provided for hypothesis tests on coefficients.
• R-Squared (r²) indicates the proportion of variability in y explained by the least-squares regression line on x.
Reading Regression Output
• Identify slope, standard error of slope, degrees of freedom (n − 2), t-statistic, and P-value.
• Understand that reported P-values are for the two-sided test of H₀: β = 0 unless otherwise noted; for a one-sided alternative, halve the P-value when the sample slope falls in the direction specified by Ha.
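A sketch (assuming the statsmodels library and simulated data) that produces output analogous to the generic table described above, with coefficient, standard error, t, P-value, and R-squared entries:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 25)
y = 3 + 1.5 * x + rng.normal(0, 2, 25)     # hypothetical linear data

X = sm.add_constant(x)                     # adds the intercept column
model = sm.OLS(y, X).fit()
print(model.summary())                     # "coef", "std err", "t", "P>|t|", "R-squared"

b = model.params[1]                        # slope
se_b = model.bse[1]                        # SE of the slope
df = int(model.df_resid)                   # n - 2
print(b, se_b, df, model.tvalues[1], model.pvalues[1])
```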
Constructing a Confidence Interval for the Slope
• Formula: b ± t*(SE(b)), where:
• b = sample slope,
• SE(b) = standard error of the slope,
• t* = critical value from the t-distribution with df = n − 2 for the desired confidence level.
• Interpretation must include the confidence level, slope, and context of the variables.
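A minimal sketch (assuming SciPy; b, se_b, and n are values read off regression output, here filled with illustrative numbers):

```python
from scipy import stats

b, se_b, n = 1.487, 0.092, 25              # illustrative values from output
conf = 0.95

df = n - 2
t_star = stats.t.ppf((1 + conf) / 2, df)   # critical value with df = n - 2
lower, upper = b - t_star * se_b, b + t_star * se_b
print(f"{conf:.0%} CI for the true slope: ({lower:.3f}, {upper:.3f})")
```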
Performing a Hypothesis Test for the Slope
• Hypotheses:
• H₀: β = 0 (no linear relationship between x and y).
• Ha: β ≠ 0, β > 0, or β < 0, depending on the research question.
• Test statistic:
• t = (b − 0) / SE(b).
• Use t-distribution with df = n − 2 to find the P-value.
• A small P-value provides convincing evidence against H₀, i.e., evidence of a linear relationship between x and y in the population.
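A sketch of the test computation (assuming SciPy; the slope and standard error are the same illustrative values used above):

```python
from scipy import stats

b, se_b, n = 1.487, 0.092, 25              # illustrative output values
df = n - 2

t = (b - 0) / se_b                         # test statistic under H0: beta = 0
p_two_sided = 2 * stats.t.sf(abs(t), df)   # two-sided P-value
p_greater = stats.t.sf(t, df)              # one-sided P-value for Ha: beta > 0
print(t, p_two_sided, p_greater)
```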
Steps for Inference About the Slope
• Check conditions: linearity, independence, normality, equal standard deviations, and randomness.
• Identify slope and standard error from regression output.
• Compute t-statistic and P-value.
• Make a conclusion based on the P-value and the context of the problem.
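A compact end-to-end sketch of these steps (assuming SciPy and illustrative data), since scipy.stats.linregress reports the slope, its standard error, and the two-sided P-value directly; the condition checks from earlier would come first in practice:

```python
import numpy as np
from scipy import stats

x = np.array([2, 4, 5, 7, 8, 10, 12, 13], dtype=float)       # illustrative data
y = np.array([4.8, 8.9, 10.2, 14.8, 16.1, 20.3, 23.9, 26.4])

res = stats.linregress(x, y)               # slope, intercept, rvalue, pvalue, stderr
df = len(x) - 2
t_star = stats.t.ppf(0.975, df)            # for a 95% confidence interval

print("slope:", res.slope, "SE:", res.stderr)
print("t:", res.slope / res.stderr, "two-sided P:", res.pvalue)
print("95% CI:", (res.slope - t_star * res.stderr,
                  res.slope + t_star * res.stderr))
```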
Important Interpretation Tips
• A statistically significant slope means that changes in x are associated with changes in y, but not necessarily that x causes y.
• Context is crucial when interpreting confidence intervals and significance test results.
• Be careful with extrapolation: Predictions outside the range of observed x-values are unreliable.
Common Pitfalls to Avoid
• Not checking residual plots for linearity and equal standard deviations.
• Confusing correlation with causation.
• Ignoring the possibility of lurking variables affecting the observed relationship.
In a Nutshell
Inference for regression slopes allows estimation and testing of the strength of linear relationships between quantitative variables. The t-distribution accounts for sample variability when σ is unknown. Careful checking of conditions, correct interpretation of confidence intervals and P-values, and attention to study design are essential for valid conclusions about population relationships based on sample data.