Rucete ✏ AP Statistics In a Nutshell
8. Inference for Categorical Data: Chi-Square
This chapter introduces the chi-square statistic, which measures differences between observed and expected counts, and covers three chi-square tests: goodness-of-fit, independence, and homogeneity.
Chi-Square Test for Goodness-of-Fit
• Used to test whether the distribution of a single categorical variable in a population matches a claimed (hypothesized) distribution.
• Hypotheses:
• H₀: The claimed distribution is correct (each category's population proportion equals its claimed value).
• Ha: The claimed distribution is not correct (at least one category's proportion differs from its claimed value).
• Chi-square statistic: χ² = Σ[(Observed − Expected)² / Expected].
• The larger the χ², the more evidence against H₀.
• Degrees of freedom: df = number of categories − 1.
• P-value: the probability, assuming H₀ is true, of getting a χ² value as large as or larger than the one observed (the area in the right tail of the chi-square distribution).
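The goodness-of-fit calculation above can be sketched in Python with only the standard library. This is a minimal illustration, not a definitive implementation: the die-roll counts are hypothetical, and the helper names `chi2_upper_tail` and `goodness_of_fit` are chosen here for illustration. The P-value uses the standard recurrence for the chi-square right tail, which assumes df is a positive integer; a library routine such as SciPy's `scipy.stats.chi2.sf(x, df)` computes the same area.

```python
import math

def chi2_upper_tail(x: float, df: int) -> float:
    """Hypothetical helper: P(X >= x) for X ~ chi-square(df), df a positive integer."""
    if x <= 0:
        return 1.0
    # Closed forms exist for df = 1 (odd case) and df = 2 (even case).
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        tail, k = math.exp(-x / 2), 2
    # Step up two degrees of freedom at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return tail

def goodness_of_fit(observed, claimed_props):
    """Hypothetical helper: return (chi-square statistic, df, P-value)."""
    n = sum(observed)
    expected = [n * p for p in claimed_props]          # n x (claimed proportion)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                             # number of categories - 1
    return chi2, df, chi2_upper_tail(chi2, df)

# Hypothetical data: 120 rolls of a die claimed to be fair.
observed = [25, 17, 15, 23, 24, 16]
chi2, df, p_value = goodness_of_fit(observed, [1 / 6] * 6)
print(f"chi2 = {chi2:.2f}, df = {df}, P-value = {p_value:.4f}")
```

With these counts every expected count is 20, so χ² = 100/20 = 5 with df = 5, and the P-value is about 0.42: no convincing evidence that the die is unfair.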
Conditions for Chi-Square Goodness-of-Fit Test
• Random: the data come from a random sample.
• 10%: when sampling without replacement, the sample size is less than 10% of the population size (n < 0.10N).
• Large counts: all expected counts are at least 5.
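A small sketch of the numeric condition checks, assuming the expected counts have already been computed. The function name `check_gof_conditions` and its `population_size` parameter are hypothetical. The Random condition cannot be verified from the counts themselves (it depends on how the data were collected), so the sketch leaves it out.

```python
def check_gof_conditions(expected, n, population_size=None):
    """Hypothetical helper: list the numeric conditions that fail (empty list = all pass)."""
    problems = []
    # 10% condition: only relevant when sampling without replacement
    # from a population of known size.
    if population_size is not None and n >= 0.10 * population_size:
        problems.append("sample is not less than 10% of the population")
    # Large counts condition: every expected count should be at least 5.
    small = [e for e in expected if e < 5]
    if small:
        problems.append(f"expected counts below 5: {small}")
    return problems

print(check_gof_conditions([20.0] * 6, n=120))
print(check_gof_conditions([2.5, 7.5, 40.0], n=50, population_size=300))
```

The first call passes both checks; the second fails both, since 50 is more than 10% of 300 and one expected count is 2.5.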
Properties of the Chi-Square Distribution
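• Takes only nonnegative values (χ² ≥ 0).
• Skewed right; the skew lessens and the curve becomes more symmetric (more normal-like) as the degrees of freedom increase. The mean of the distribution equals its df.
• The χ²-distribution is continuous, so it only approximates the sampling distribution of the χ² statistic, which is computed from discrete counts. This is why the expected counts must be large enough.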
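To see these properties, here is a simulation sketch (the function name, seed, and settings are arbitrary choices for illustration). It repeatedly draws samples of 100 from four equally likely categories, so H₀ is true and df = 3, and records the goodness-of-fit statistic each time. The simulated values should never be negative, their mean should land near df (the mean of a χ² distribution equals its df), the median should sit below the mean as expected for a right-skewed distribution, and roughly 5% of values should reach 7.815, the usual 0.05 critical value for df = 3, even though each statistic comes from discrete counts.

```python
import random
import statistics

def simulate_gof_statistic(probs, n, reps, seed=1):
    """Hypothetical helper: simulate the goodness-of-fit statistic when H0 is true."""
    rng = random.Random(seed)              # seeded for reproducibility
    categories = range(len(probs))
    expected = [n * p for p in probs]
    stats = []
    for _ in range(reps):
        # Draw n individuals from the claimed distribution, then tally each category.
        draws = rng.choices(categories, weights=probs, k=n)
        counts = [draws.count(c) for c in categories]
        stats.append(sum((o - e) ** 2 / e for o, e in zip(counts, expected)))
    return stats

# Four equally likely categories, so df = 4 - 1 = 3.
stats = simulate_gof_statistic([0.25] * 4, n=100, reps=10_000)
print("min:", min(stats))
print("mean:", round(statistics.fmean(stats), 2), "median:", round(statistics.median(stats), 2))
print("share >= 7.815:", sum(s >= 7.815 for s in stats) / len(stats))
```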
Calculating Expected Counts
• For goodness-of-fit: Expected count = (sample size) × (claimed proportion).
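Expected counts are a one-line calculation; this hypothetical helper also checks that the claimed proportions sum to 1 (within rounding). The 40%/35%/25% claim and the sample size of 200 are made up for illustration.

```python
def gof_expected_counts(n, claimed_props):
    """Hypothetical helper: expected count for each category = n x (claimed proportion)."""
    if abs(sum(claimed_props) - 1) > 1e-9:
        raise ValueError("claimed proportions must sum to 1")
    return [n * p for p in claimed_props]

# Hypothetical claim: a 40% / 35% / 25% split, with a sample of 200.
print(gof_expected_counts(200, [0.40, 0.35, 0.25]))
```

This gives expected counts of 80, 70, and 50. Expected counts need not be whole numbers: with n = 50 and three equally likely categories, each expected count is about 16.67.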
Example Interpretation (Goodness-of-Fit)
• If the P-value is small (e.g., less than α = 0.05), reject H₀: there is convincing evidence that the population distribution differs from the claimed distribution.
• If the P-value is large, fail to reject H₀: there is not convincing evidence that the population distribution differs from the claimed distribution.
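The decision rule can be written as a tiny helper. The name `gof_conclusion` and the default α = 0.05 are assumptions for illustration, and the wording should be adapted to the context of each problem.

```python
def gof_conclusion(p_value: float, alpha: float = 0.05) -> str:
    """Hypothetical helper: turn a P-value into reject / fail-to-reject wording."""
    if p_value < alpha:
        return (f"P-value = {p_value:.4f} < {alpha}: reject H0. There is convincing "
                "evidence that the distribution differs from the claimed distribution.")
    return (f"P-value = {p_value:.4f} >= {alpha}: fail to reject H0. There is not "
            "convincing evidence that the distribution differs from the claimed distribution.")

print(gof_conclusion(0.01))
print(gof_conclusion(0.42))
```

Failing to reject H₀ does not show that the claimed distribution is correct, only that the data are consistent with it.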
Chi-Square Test for Independence
• Used to test whether two categorical variables are associated in a single population, using one random sample in which both variables are recorded for each individual.
• Hypotheses:
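• H₀: There is no association between the two variables in the population (they are independent).
• Ha: There is an association between the two variables in the population (they are not independent).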
Chi-Square Test for Homogeneity
• Used to test whether the distribution of one categorical variable is the same across two or more populations or treatment groups, using separate random samples or groups formed by random assignment.
• Hypotheses:
• H₀: The distribution of the variable is the same in every population.
• Ha: The distribution of the variable is not the same in all populations (it differs in at least one).
Common Steps for Chi-Square Tests (Independence and Homogeneity)
• Set up hypotheses clearly.
• Check conditions:
• Random sample(s) or random assignment.
• 10% condition if sampling without replacement.
• All expected counts at least 5.
• Calculate expected counts for each cell:
• Expected count = (row total × column total) / overall total.
• Calculate the chi-square statistic:
• χ² = Σ[(Observed − Expected)² / Expected].
• Degrees of freedom:
• df = (number of rows − 1) × (number of columns − 1).
• Find the P-value as the right-tail area under the chi-square distribution with that df.
• Make a conclusion based on the P-value and significance level.
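The steps for a two-way table can be sketched in Python (standard library only). The same calculation serves both the independence and homogeneity tests; only the study design and the wording of the hypotheses differ. The 2 × 3 table is hypothetical, the function names are chosen for illustration, and the P-value helper assumes an integer df ≥ 1. SciPy's `scipy.stats.chi2_contingency` performs a similar calculation, but note that it applies Yates' continuity correction by default when df = 1.

```python
import math

def chi2_upper_tail(x: float, df: int) -> float:
    """Hypothetical helper: P(X >= x) for X ~ chi-square(df), df a positive integer."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        tail, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        tail, k = math.exp(-x / 2), 2
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        tail += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return tail

def two_way_chi_square(table):
    """Hypothetical helper: chi-square test for independence or homogeneity on an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count for each cell = (row total x column total) / overall total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) x (columns - 1)
    return expected, chi2, df, chi2_upper_tail(chi2, df)

# Hypothetical 2 x 3 table of counts: rows = groups, columns = response categories.
table = [[30, 20, 10],
         [20, 30, 40]]
expected, chi2, df, p_value = two_way_chi_square(table)
print("expected:", expected)
print(f"chi2 = {chi2:.2f}, df = {df}, P-value = {p_value:.5f}")
```

Here every expected count (20 in the first row, 30 in the second) is at least 5, χ² = 50/3 ≈ 16.67 with df = (2 − 1)(3 − 1) = 2, and the P-value is e^(−25/3) ≈ 0.0002, convincing evidence of an association (or of a difference in distributions).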
Interpreting the Results
• Small P-value → convincing evidence against H₀ → conclude that an association (independence test) or a difference in distributions (homogeneity test) exists.
• Large P-value → fail to reject H₀ → no convincing evidence of an association or a difference.
Comparing Chi-Square Tests
• Goodness-of-Fit: One categorical variable compared to a specified distribution.
• Independence: Two categorical variables measured on the same individuals.
• Homogeneity: One categorical variable compared across multiple populations or groups.
Important Notes About Chi-Square Tests
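• The chi-square statistic is always nonnegative; it equals 0 only when every observed count equals its expected count.
• The cells with the largest components, (Observed − Expected)² / Expected, contribute most to the chi-square statistic. A large raw residual (Observed − Expected) counts for less when the expected count is large.
• Standardized residuals, (Observed − Expected) / √Expected, put the deviations on a common scale, making it easier to identify the specific cells that deviate most from what H₀ predicts.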
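A sketch of per-cell diagnostics, using a hypothetical 2 × 3 table of observed counts (row totals 60 and 90, column totals 50 each) together with its expected counts. It reports each cell's component (O − E)²/E and the residual (O − E)/√E, whose square equals the component. Texts differ on naming: some call (O − E)/√E the standardized residual, while others reserve that name for a version that also adjusts for the row and column totals. A common rule of thumb treats residuals beyond about ±2 as noteworthy.

```python
import math

def cell_diagnostics(observed, expected):
    """Hypothetical helper: (cell, component, residual) per cell, largest component first."""
    rows = []
    for i, (obs_row, exp_row) in enumerate(zip(observed, expected)):
        for j, (o, e) in enumerate(zip(obs_row, exp_row)):
            component = (o - e) ** 2 / e          # contribution to chi-square
            residual = (o - e) / math.sqrt(e)     # signed; residual**2 == component
            rows.append(((i, j), component, residual))
    # Sort so the cells contributing most to chi-square come first.
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Hypothetical observed counts and their expected counts (row total x column total / total).
observed = [[30, 20, 10], [20, 30, 40]]
expected = [[20.0, 20.0, 20.0], [30.0, 30.0, 30.0]]
for cell, component, residual in cell_diagnostics(observed, expected):
    print(cell, round(component, 2), round(residual, 2))
```

The two cells in the first row with counts 30 and 10 each contribute 5 to χ², with residuals of about +2.24 and −2.24, so they account for most of the evidence.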
In a Nutshell
Chi-square tests assess how observed categorical data compare to the counts expected under a null hypothesis. The goodness-of-fit test evaluates a single distribution, the test for independence evaluates the relationship between two variables, and the test for homogeneity compares distributions across groups. Checking the conditions (random data, the 10% condition, and expected counts of at least 5) is what justifies using the chi-square distribution to draw conclusions about associations or differences in categorical data.