Rucete ✏ AP Statistics In a Nutshell
7. Inference for Quantitative Data: Means
This chapter covers inference procedures involving population means, including the use of t-distributions when the population standard deviation is unknown, construction and interpretation of confidence intervals, significance testing, and understanding Type I and II errors and power.
The t-Distribution
• Use the t-distribution when the population standard deviation σ is unknown.
• The t-distribution is symmetric and bell-shaped but has heavier tails than the normal distribution.
• Degrees of freedom (df) = n − 1; as df increases, the t-distribution approaches the normal distribution.
• Technology (tcdf, invT) is typically used for t-distribution calculations.
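As a rough illustration of the heavier tails, the sketch below compares P(T > 2) for several t-distributions with the same tail area under the standard normal. The helpers t_pdf and t_cdf are hypothetical stand-ins for a calculator's tcdf: they integrate the t density numerically with Simpson's rule and are not a calculator or library API.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    # math.gamma overflows for very large arguments, so keep df below about 300.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

normal_tail = 1 - NormalDist().cdf(2)
for df in (2, 10, 30, 100):
    print(f"df={df:>3}: P(T > 2) = {1 - t_cdf(2, df):.4f}")
print(f"normal: P(Z > 2) = {normal_tail:.4f}")
```

The printed tail areas shrink toward the normal value as df grows, which is the sense in which the t-distribution approaches the normal distribution.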
Conditions for Using the t-Distribution
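• The data come from a random sample or a randomized experiment.
• 10% condition: when sampling without replacement, the sample size should be no more than 10% of the population size.
• Normal/large sample condition: either the population is approximately normal, or the sample is large (n ≥ 30), so that by the CLT the sampling distribution of x̄ is approximately normal.
• For small samples (n < 30), graph the sample data (dotplot, stemplot, or histogram) and check that there is no strong skewness and no outliers.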
Confidence Intervals for a Population Mean
• Formula: x̄ ± t*(s/√n).
• Standard error: s/√n.
• The confidence level is the long-run success rate of the method: the percentage of intervals, over many random samples, that would capture the true mean μ.
• Increasing the sample size or lowering the confidence level narrows the interval.
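A minimal sketch of the interval formula, assuming a small made-up sample of 10 measurements. The data and the function name t_interval are hypothetical; the critical value t* = 2.262 is the table value for 95% confidence with df = 9 (what invT(0.975, 9) returns).

```python
import math
import statistics

def t_interval(data, t_star):
    """Return (low, high) for x-bar ± t* · s/√n."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)            # standard error of the mean
    margin = t_star * se
    return x_bar - margin, x_bar + margin

# Hypothetical sample (n = 10, so df = 9)
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
T_STAR_95_DF9 = 2.262                # from a t-table, 95% confidence, df = 9

low, high = t_interval(sample, T_STAR_95_DF9)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

With these made-up numbers the interval is centered at x̄ = 12.1 with a margin of error of about 0.185.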
Factors Affecting the Width of Confidence Intervals
• Larger sample size → narrower interval.
• Higher confidence level → wider interval.
• Smaller sample standard deviation → narrower interval.
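The three factors can be checked numerically with the margin of error t*(s/√n). This is only a sketch with hypothetical values of s and n; the t* values are standard table entries (df = 9: 2.262 at 95% and 3.250 at 99%; df = 29: 2.045 at 95%).

```python
import math

def margin_of_error(s, n, t_star):
    """Half-width of the t-interval: t* · s/√n."""
    return t_star * s / math.sqrt(n)

# Hypothetical baseline: s = 4.0, n = 10, 95% confidence (df = 9)
base = margin_of_error(s=4.0, n=10, t_star=2.262)

bigger_n = margin_of_error(s=4.0, n=30, t_star=2.045)     # larger sample, df = 29
higher_conf = margin_of_error(s=4.0, n=10, t_star=3.250)  # 99% confidence, df = 9
smaller_s = margin_of_error(s=2.0, n=10, t_star=2.262)    # half the spread

print(f"baseline:      {base:.3f}")
print(f"n = 30:        {bigger_n:.3f}  (narrower)")
print(f"99% level:     {higher_conf:.3f}  (wider)")
print(f"s halved:      {smaller_s:.3f}  (narrower)")
```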
Significance Test for a Mean
• Null hypothesis H₀: μ = μ₀; Alternative hypothesis Ha: μ ≠ μ₀, μ > μ₀, or μ < μ₀.
• Test statistic: t = (x̄ − μ₀) / (s/√n).
• P-value: the probability, assuming H₀ is true, of getting a test statistic as extreme as or more extreme than the one observed, in the direction of Ha.
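The test statistic and a two-sided P-value can be sketched as follows, assuming a made-up sample and H₀: μ = 12. The helpers t_pdf and t_cdf are hypothetical substitutes for a calculator's tcdf, built by numerical integration of the t density.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(data, mu0):
    """Return (t, df, two-sided P-value) for H0: mu = mu0."""
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)
    t = (statistics.mean(data) - mu0) / se
    df = n - 1
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p_two_sided

# Hypothetical sample; H0: mu = 12 versus Ha: mu != 12
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
t_stat, df, p_value = one_sample_t(sample, mu0=12.0)
print(f"t = {t_stat:.3f}, df = {df}, P-value = {p_value:.3f}")
```

For a one-sided alternative, the P-value would be the single tail area in the direction of Ha instead of twice the tail.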
Errors in Hypothesis Testing
• Type I error: rejecting a true null hypothesis. Its probability is α, the significance level.
• Type II error: failing to reject a false null hypothesis. Its probability is β.
• Power: Probability of correctly rejecting a false null hypothesis (1 − β).
• Power increases with larger sample size, higher α, larger effect size, and reduced variability.
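These relationships can be explored with a small simulation. The sketch below assumes a normal population with σ = 1, H₀: μ = 0, and a two-sided one-sample t-test; the function names and all settings are hypothetical. The critical values are table entries (df = 9: 2.262 for α = 0.05 and 1.833 for α = 0.10; df = 29: 2.045 for α = 0.05).

```python
import math
import random

def rejects(sample, mu0, t_star):
    """True if the two-sided one-sample t-test rejects H0: mu = mu0."""
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((v - x_bar) ** 2 for v in sample) / (n - 1))
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return abs(t) > t_star

def rejection_rate(true_mean, n, t_star, mu0=0.0, sigma=1.0, reps=2000, seed=1):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        hits += rejects(sample, mu0, t_star)
    return hits / reps

# H0 true: the rejection rate estimates the Type I error rate (about alpha = 0.05)
type1_rate = rejection_rate(true_mean=0.0, n=10, t_star=2.262)

# H0 false: the rejection rate estimates power
power_n10 = rejection_rate(true_mean=0.5, n=10, t_star=2.262)
power_n30 = rejection_rate(true_mean=0.5, n=30, t_star=2.045)         # larger n
power_alpha10 = rejection_rate(true_mean=0.5, n=10, t_star=1.833)     # higher alpha
power_big_effect = rejection_rate(true_mean=1.0, n=10, t_star=2.262)  # larger effect

print(f"Type I rate ~ {type1_rate:.3f}")
print(f"power: n=10 {power_n10:.3f}, n=30 {power_n30:.3f}, "
      f"alpha=0.10 {power_alpha10:.3f}, bigger effect {power_big_effect:.3f}")
```

Because the results come from random draws, the exact rates vary with the seed and the number of repetitions; only the ordering is the point.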
Effect of Sample Size and Confidence Level
• Larger samples reduce the standard error (the variability of x̄), which narrows intervals and increases power.
• Higher confidence levels require wider intervals.
• Higher α (significance level) increases power but also increases risk of Type I error.
Inference for Two Independent Means
• When comparing two population means (μ₁ and μ₂), use two independent random samples.
• Conditions:
• Independent random samples or random assignment.
• 10% condition for each sample if sampling without replacement.
• Both populations approximately normal, or both sample sizes large (n₁ ≥ 30 and n₂ ≥ 30).
Confidence Interval for μ₁ − μ₂
• Formula: (x̄₁ − x̄₂) ± t*(√((s₁²/n₁) + (s₂²/n₂))).
• Degrees of freedom: use the value given by technology, or the conservative choice df = the smaller of n₁ − 1 and n₂ − 1.
• The interpretation must state the confidence level and refer to the difference between the population means (μ₁ − μ₂) in context.
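A sketch of the two-sample interval using the conservative degrees of freedom. The summary statistics and the function name two_sample_interval are hypothetical; t* = 2.201 is the table value for 95% confidence with df = 11.

```python
import math

def two_sample_interval(x1, s1, n1, x2, s2, n2, t_star):
    """Return (low, high, se) for (x1 - x2) ± t* · sqrt(s1²/n1 + s2²/n2)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    margin = t_star * se
    return diff - margin, diff + margin, se

# Hypothetical summary statistics for two independent samples
n1, n2 = 12, 15
conservative_df = min(n1 - 1, n2 - 1)   # smaller of n1 - 1 and n2 - 1, here 11
T_STAR_95_DF11 = 2.201                  # from a t-table, 95% confidence, df = 11

low, high, se = two_sample_interval(53.1, 4.0, n1, 48.3, 5.0, n2, T_STAR_95_DF11)
print(f"df = {conservative_df}, SE = {se:.3f}")
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

With these made-up numbers the interval lies entirely above 0, which would suggest μ₁ > μ₂ at the 95% confidence level.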
Hypothesis Test for μ₁ − μ₂
• Null hypothesis H₀: μ₁ − μ₂ = 0.
• Alternative hypotheses: μ₁ − μ₂ ≠ 0, μ₁ − μ₂ > 0, or μ₁ − μ₂ < 0.
• Test statistic: t = (x̄₁ − x̄₂) / √((s₁²/n₁) + (s₂²/n₂)).
• Calculate P-value based on the t-distribution with appropriate degrees of freedom.
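The two-sample test can be sketched as below. The notes only say that df comes from technology; the Welch-Satterthwaite approximation used here is an assumption about what that technology computes. The summary statistics are made up, and t_pdf and t_cdf are hypothetical numerical stand-ins for tcdf.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sample_t(x1, s1, n1, x2, s2, n2):
    """Return (t, df, two-sided P-value) for H0: mu1 - mu2 = 0."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (assumed; usually not an integer)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    p_two_sided = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p_two_sided

# Hypothetical summary statistics
t_stat, df, p_value = two_sample_t(53.1, 4.0, 12, 48.3, 5.0, 15)
print(f"t = {t_stat:.3f}, df = {df:.2f}, P-value = {p_value:.4f}")
```

Using the conservative df (the smaller of n₁ − 1 and n₂ − 1) instead would give a somewhat larger P-value for the same t.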
Inference for Paired Data (Matched Pairs Design)
• Use matched pairs when data are naturally paired (e.g., before-and-after measurements).
• Analyze the differences for each pair (d = x₁ − x₂) and perform inference on the mean difference μd.
Confidence Interval for Paired Data
• Formula: d̄ ± t*(sd/√n).
• Where d̄ is the mean of differences, sd is the standard deviation of differences, and n is the number of pairs.
Hypothesis Test for Paired Data
• Null hypothesis H₀: μd = 0 (no difference).
• Alternative hypotheses: μd ≠ 0, μd > 0, or μd < 0.
• Test statistic: t = (d̄ − 0) / (sd/√n).
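A sketch of the paired procedure on made-up before-and-after measurements for 8 subjects. The data and the function name paired_summary are hypothetical; t* = 2.365 is the table value for 95% confidence with df = 7.

```python
import math
import statistics

def paired_summary(x1, x2, t_star):
    """Return (d_bar, t, low, high) for the mean of the differences d = x1 - x2."""
    diffs = [a - b for a, b in zip(x1, x2)]   # one difference per pair
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    se = s_d / math.sqrt(n)
    t = (d_bar - 0) / se                      # H0: mu_d = 0
    margin = t_star * se
    return d_bar, t, d_bar - margin, d_bar + margin

# Hypothetical paired data: the same 8 subjects measured twice
before = [140, 135, 150, 145, 160, 155, 138, 148]
after = [135, 132, 147, 140, 152, 151, 137, 142]
T_STAR_95_DF7 = 2.365                         # from a t-table, 95% confidence, df = 7

d_bar, t_stat, low, high = paired_summary(before, after, T_STAR_95_DF7)
print(f"mean difference = {d_bar:.3f}, t = {t_stat:.3f}")
print(f"95% CI for mu_d: ({low:.3f}, {high:.3f})")
```

Here the two lists are reduced to a single list of differences, so the inference is a one-sample t-procedure on those differences with df = n − 1 pairs.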
Simulation Techniques for Inference
• When conditions for using t-procedures are questionable (e.g., small n, non-normal data), simulations can be used to approximate P-values or construct confidence intervals.
• Randomization tests and bootstrapping are common simulation-based methods.
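Both ideas can be sketched with Python's random module. This is one simple version of each method under stated assumptions: a percentile bootstrap interval for a single mean, and a two-sided randomization test for a difference in means. The data, function names, and settings are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def bootstrap_ci(data, reps=5000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean: resample with replacement."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(mean(rng.choices(data, k=n)) for _ in range(reps))
    tail = (1 - level) / 2
    return means[int(tail * reps)], means[int((1 - tail) * reps) - 1]

def randomization_test(group_a, group_b, reps=5000, seed=0):
    """Approximate two-sided P-value by reshuffling the group labels."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)                   # random reassignment to groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float ties
            extreme += 1
    return observed, extreme / reps

# Hypothetical data for two small groups
group_a = [23, 25, 28, 30, 27, 26]
group_b = [19, 21, 20, 22, 18, 24]

boot_low, boot_high = bootstrap_ci(group_a)
observed, p_value = randomization_test(group_a, group_b)
print(f"bootstrap 95% CI for mean of A: ({boot_low:.2f}, {boot_high:.2f})")
print(f"observed difference = {observed:.3f}, approximate P-value = {p_value:.4f}")
```

Results depend on the random seed and the number of repetitions, so they approximate rather than reproduce what a t-procedure or an exact permutation test would give.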
Choosing Between One-Sample and Two-Sample Methods
• One-sample procedures compare the mean of a single group to a hypothesized value μ₀.
• Two-sample procedures are used to compare two independent groups.
• Paired procedures are used when data are matched or dependent.
In a Nutshell
Inference for means uses t-distributions to construct confidence intervals and conduct significance tests when population standard deviations are unknown. One-sample, two-sample, and paired data procedures allow flexible modeling of real-world data. Proper checking of conditions, understanding of errors, and careful interpretation ensure valid conclusions about population means and mean differences.