Inference for Categorical Data: Proportions ✏ AP Statistics

by RUCETE - September 26, 2025

Rucete ✏ AP Statistics In a Nutshell

6. Inference for Categorical Data: Proportions

This chapter introduces methods to estimate population proportions using confidence intervals, test hypotheses about population proportions, and understand errors and power in significance testing.

Basics of Statistical Inference

• We can never know the exact population parameter; we estimate it using sample statistics.

• Sampling error (variability among samples) is unavoidable but can be quantified.

• Confidence intervals allow statements about plausible ranges for parameters, with associated margins of error.

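• Confidence levels (e.g., 90%, 95%, 99%) describe the long-run success rate of the method: the percentage of intervals, across many random samples, that would capture the parameter. They are not the probability that one particular interval contains it.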

Margin of Error vs. Standard Error

• Margin of error = critical value × standard error.

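• Standard error estimates how far the sample statistic typically falls from the parameter; the margin of error scales it by the critical value for the desired confidence level.

• Larger samples decrease the margin of error: it shrinks in proportion to 1/√n, so quadrupling n halves it.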
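As a rough illustration of the relationship between standard error and margin of error, here is a minimal Python sketch. It assumes the usual standard error for a sample proportion, √(p̂(1 − p̂)/n); the function names and example values are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(p_hat, n):
    """Estimated standard deviation of the sample proportion."""
    return sqrt(p_hat * (1 - p_hat) / n)


def margin_of_error(p_hat, n, confidence=0.95):
    """Critical value z* times the standard error."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * standard_error(p_hat, n)


# Each step quadruples the sample size, which halves the margin of error.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))
```

Raising the confidence level has the opposite effect: z* grows, so the margin of error widens for the same data.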

Conditions for Inference on Proportions

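• Independence: The data must come from a random sample or a randomized experiment. When sampling without replacement, check the 10% condition (n ≤ 10% of the population).

• Normality (Large Counts condition): The normal approximation is reasonable when the expected numbers of successes and failures are both at least 10. For a confidence interval, check np̂ and n(1 − p̂), since p is unknown; for a significance test, check np₀ and n(1 − p₀).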
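The two numeric checks above can be sketched in a few lines of Python. This is only an illustrative helper (the name `check_conditions` and its arguments are hypothetical); whether the data came from a random sample or random assignment still has to be judged from the study design, not from code.

```python
def check_conditions(n, p, population_size=None):
    """Report the Large Counts and 10% checks for a sample of size n.

    p is whichever proportion applies: the sample proportion for an
    interval, or the hypothesized proportion for a test.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return {
        "large_counts": expected_successes >= 10 and expected_failures >= 10,
        # Without a known population size, the 10% check is left undecided.
        "ten_percent": None if population_size is None
        else n <= 0.10 * population_size,
    }


print(check_conditions(n=200, p=0.30, population_size=10_000))
print(check_conditions(n=50, p=0.10))  # only 5 expected successes
```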

Constructing Confidence Intervals for a Population Proportion

• Confidence interval: p̂ ± z*(√(p̂(1 − p̂)/n)), where z* is the critical value for the chosen confidence level.

• Interpretations must include the confidence level, population, and parameter.

• Common misconception: A 95% confidence level does not mean there is a 95% probability that a specific computed interval contains the parameter. Once computed, an interval either contains the parameter or it does not.
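A minimal Python sketch of the one-proportion z-interval follows, assuming the conditions for inference have already been checked. The function name and the sample counts (520 successes out of 1000) are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_interval(successes, n, confidence=0.95):
    """Return (lower, upper) for p-hat +/- z* * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = one_proportion_interval(520, 1000)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With these made-up numbers, an interpretation would read: "We are 95% confident that the interval from about 0.489 to 0.551 captures the true proportion of successes in the population."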

Determining Sample Size for Desired Margin of Error

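• To achieve a smaller margin of error at a fixed confidence level, increase the sample size.

• Conservative estimate for unknown p: Use p* = 0.5, which maximizes p*(1 − p*) and so guarantees a large enough sample whatever the true proportion is.

• Required sample size: n ≥ (z*/m)² × p*(1 − p*), where m is the desired margin of error and p* is the guessed proportion. Round up to the next whole number.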
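The sample size formula can be sketched as follows. The function name and the default of p* = 0.5 are illustrative assumptions; the result is rounded up because a fractional respondent is not possible.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p_guess(1 - p_guess)/n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


# Conservative guess p* = 0.5 for a 3-point margin at 95% confidence.
print(required_sample_size(0.03))
# A prior guess farther from 0.5 calls for fewer observations.
print(required_sample_size(0.03, p_guess=0.2))
```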

Significance Testing for a Population Proportion

• Null hypothesis H₀: p = p₀ vs. Alternative hypothesis Ha: p ≠ p₀, p > p₀, or p < p₀.

• Check conditions (randomization, 10% condition, Large Counts) using p₀.

• Test statistic: z = (p̂ − p₀) / √(p₀(1 − p₀)/n).

• P-value: The probability, assuming H₀ is true, of getting a sample statistic at least as extreme as the one observed, in the direction(s) specified by Ha.
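The test statistic and P-value above might be computed like this. It is a sketch under the assumption that the conditions hold; the function name, the `alternative` labels, and the example data (60 successes in 100 trials against p₀ = 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0."""
    p_hat = successes / n
    # The standard error uses p0 because the calculation assumes H0 is true.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std_normal = NormalDist()
    if alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```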

Interpreting P-values and Significance

• Small P-value (less than the significance level α) → convincing evidence against H₀ → reject H₀.

• Large P-value (at least α) → not enough evidence against H₀ → fail to reject H₀.

• Never "accept" H₀; only fail to reject.

Errors in Significance Testing

• Type I Error: Rejecting a true H₀. Its probability is α, the significance level.

• Type II Error: Failing to reject a false H₀. Its probability is β.

• Power of a test: Probability of correctly rejecting a false H₀ (1 − β).

• Power increases with a larger sample size, a larger significance level α (at the cost of more Type I errors), or a true parameter value farther from p₀.
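To see how sample size, significance level, and effect size drive power, here is a small Python sketch for one specific case: a one-sided test of H₀: p = p₀ against Ha: p > p₀, using the normal approximation. The function name and the example values are assumptions made for illustration, not a general-purpose power calculator.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(p0, p_true, n, alpha=0.05):
    """Approximate power of the test of H0: p = p0 vs Ha: p > p0."""
    std_normal = NormalDist()
    # Reject H0 when p-hat exceeds this cutoff (computed assuming H0).
    cutoff = p0 + std_normal.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Chance that p-hat lands above the cutoff when the truth is p_true.
    se_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - std_normal.cdf((cutoff - p_true) / se_true)


print(round(power_one_sided(0.5, 0.6, n=100), 3))              # baseline
print(round(power_one_sided(0.5, 0.6, n=400), 3))              # larger n
print(round(power_one_sided(0.5, 0.6, n=100, alpha=0.10), 3))  # larger alpha
print(round(power_one_sided(0.5, 0.7, n=100), 3))              # larger effect
```

When the true proportion equals p₀, the same calculation returns α, which matches the definition of the Type I error probability.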

Inference for the Difference of Two Proportions

• When comparing two population proportions (p₁ and p₂), use sample proportions (p̂₁ and p̂₂) from two independent random samples.

• Conditions for inference:

• Random samples or random assignment.

• 10% condition for each sample.

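• Large counts: n₁p̂₁, n₁(1 − p̂₁), n₂p̂₂, and n₂(1 − p̂₂) are all at least 10. For a significance test, check these counts with the pooled sample proportion in place of p̂₁ and p̂₂.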

Constructing a Confidence Interval for p₁ − p₂

• Formula: (p̂₁ − p̂₂) ± z*(√((p̂₁(1 − p̂₁)/n₁) + (p̂₂(1 − p̂₂)/n₂))).

• Interpretation must include the confidence level, difference in proportions, and population context.
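The interval formula for p₁ − p₂ can be sketched in Python as below. The function name and the counts (60 of 100 versus 45 of 100) are hypothetical, and the sketch assumes the conditions for inference are met.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for the difference p1 - p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Unpooled standard error: each sample contributes its own variance term.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


low, high = two_proportion_interval(60, 100, 45, 100)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

Because this illustrative interval lies entirely above 0, it suggests that p₁ is larger than p₂ for these made-up data.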

Hypothesis Test for p₁ − p₂

• Null hypothesis H₀: p₁ − p₂ = 0 (or equivalently, p₁ = p₂).

• Alternative hypotheses could be two-sided (p₁ ≠ p₂) or one-sided (p₁ > p₂ or p₁ < p₂).

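• Test statistic: z = (p̂₁ − p̂₂) / √(p̂(1 − p̂)(1/n₁ + 1/n₂)), where p̂ = (x₁ + x₂) / (n₁ + n₂) is the pooled sample proportion and x₁ and x₂ are the numbers of successes in the two samples. Pooling is appropriate because H₀ assumes p₁ = p₂, so both samples estimate the same proportion.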
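A short Python sketch of the pooled two-proportion z-test follows. The function name, the `alternative` labels, and the example counts are illustrative assumptions; the conditions for inference are taken as already verified.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, p_value) for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    std_normal = NormalDist()
    if alternative == "greater":      # Ha: p1 > p2
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: p1 < p2
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":  # Ha: p1 != p2
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, p_value


z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
```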

Steps in Two-Proportion Z-Test

• State hypotheses clearly (H₀ and Ha).

• Check conditions for inference (randomness, 10% condition, Large Counts).

• Calculate test statistic z and P-value.

• State a conclusion in context by comparing the P-value to the significance level α.

Interpreting Confidence Intervals vs. Significance Tests

• Confidence intervals estimate a plausible range of values for the parameter (difference in proportions).

• Significance tests assess the evidence against a specific null hypothesis about the parameter.

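• A two-sided test at significance level α and a confidence interval with confidence level 1 − α usually lead to the same conclusion: the test rejects H₀ when the interval excludes the null value. For proportions the match is not exact, because the two procedures use different standard errors.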

Choosing Between One-Proportion and Two-Proportion Procedures

• Use one-proportion methods when analyzing a single group.

• Use two-proportion methods when comparing two groups based on independent samples.

When Procedures Are Robust

• When conditions are slightly violated (e.g., counts slightly below 10), methods are often still approximately valid, but caution is needed.

• Simulation-based methods can provide better estimates if conditions are severely violated.
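One simple simulation-based approach, sketched below, estimates a one-sided P-value for a single proportion by generating many samples under H₀ and counting how often the simulated count is at least as large as the observed one. The function name, the seed, the number of repetitions, and the example (8 successes in 20 trials with p₀ = 0.2, where np₀ = 4 fails the Large Counts condition) are all assumptions chosen for illustration.

```python
import random


def simulated_p_value(successes, n, p0, reps=20_000, seed=1):
    """Estimate P(count >= successes) under H0: p = p0 by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    at_least_as_extreme = 0
    for _ in range(reps):
        # One simulated sample: n independent trials with success chance p0.
        simulated = sum(rng.random() < p0 for _ in range(n))
        if simulated >= successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / reps


print(simulated_p_value(8, 20, 0.2))
```

For this example the estimate should land near 0.03; the exact value varies slightly with the seed and the number of repetitions.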

In a Nutshell

Inference for proportions allows estimation and testing of population parameters based on sample data. Confidence intervals provide plausible ranges, while significance tests assess evidence against hypotheses. Proper checking of conditions (randomness, independence, large counts) ensures validity. Understanding Type I and II errors, and properly interpreting results, leads to sound statistical conclusions about populations.
