Rucete ✏ AP Statistics In a Nutshell
5. Sampling Distributions
This chapter introduces sampling distributions, the central limit theorem (CLT), and the properties of sample proportions and sample means, showing how statistics vary from sample to sample yet form predictable distributions as sample size grows.
Understanding Sampling Distributions
• A sample statistic (e.g., sample mean, sample proportion) varies from sample to sample.
• The distribution of sample statistics from repeated random sampling is the sampling distribution.
• A population parameter (e.g., μ, p) is fixed, but sample statistics fluctuate.
• Larger samples tend to have more predictable behavior; many sampling distributions are approximately normal for large n.
Normal Distribution Calculations
• Use z-scores to find probabilities: z = (value − mean) / standard deviation.
• Calculator functions:
• normalcdf(lower bound, upper bound, mean, SD) calculates area (probability).
• invNorm(probability, mean, SD) finds the value corresponding to a given cumulative probability.
• Sketching normal curves helps avoid errors in calculations.
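The calculator functions above can be mirrored with Python's standard-library `NormalDist` (the mean of 70 and SD of 3 below are hypothetical values chosen for illustration):

```python
from statistics import NormalDist

# Hypothetical distribution: mean 70, SD 3
X = NormalDist(mu=70, sigma=3)

# z-score: z = (value - mean) / SD
z = (76 - 70) / 3  # 2.0

# Equivalent of normalcdf(lower, upper, mean, SD): area between the bounds
p_between = X.cdf(76) - X.cdf(70)  # P(70 < X < 76)

# Equivalent of invNorm(probability, mean, SD): value at a given cumulative area
q90 = X.inv_cdf(0.90)  # 90th percentile

print(round(z, 2), round(p_between, 4), round(q90, 2))
```

`cdf` accumulates area from the left, so the area between two bounds is a difference of two `cdf` calls, just as normalcdf computes internally.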
Sampling Distribution of a Sample Proportion
• Conditions:
• Random sample.
• Independence: n ≤ 10% of population.
• Large sample: np ≥ 10 and n(1 − p) ≥ 10.
• If conditions met, sampling distribution of sample proportion is approximately normal with:
• Mean: p.
• Standard deviation: √(p(1 − p)/n).
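A short sketch of these formulas, using a hypothetical p = 0.60 and n = 100 (both chosen so the large-counts condition passes):

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.60, 100
# Large-counts condition: np >= 10 and n(1 - p) >= 10
assert n * p >= 10 and n * (1 - p) >= 10

mean_phat = p                    # mean of the sampling distribution of p-hat
sd_phat = sqrt(p * (1 - p) / n)  # standard deviation of p-hat

# Under the approximate normal model: P(p-hat > 0.65)
prob = 1 - NormalDist(mean_phat, sd_phat).cdf(0.65)
print(round(sd_phat, 4), round(prob, 4))
```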
Sampling Distribution of a Sample Mean
• For any population:
• Mean of sampling distribution: μ.
• Standard deviation: σ/√n.
• If the population is normal, the sampling distribution is normal for any n.
• If the population is not normal, the Central Limit Theorem (CLT) states the sampling distribution becomes approximately normal if n ≥ 30.
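The same pattern works for a sample mean; the population parameters below (μ = 50, σ = 12, n = 36) are hypothetical, with n ≥ 30 so the CLT applies:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 12, 36

mean_xbar = mu             # mean of the sampling distribution of x-bar
sd_xbar = sigma / sqrt(n)  # sigma / sqrt(n) = 2.0

# Since n >= 30, use the normal model: P(x-bar > 53)
prob = 1 - NormalDist(mean_xbar, sd_xbar).cdf(53)
print(sd_xbar, round(prob, 4))
```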
Central Limit Theorem (CLT)
• Regardless of the population’s shape, the sampling distribution of sample means is approximately normal for large sample sizes (n ≥ 30).
• Larger samples produce distributions closer to normality.
• Averages vary less than individual observations.
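A small simulation illustrates both points: drawing from a strongly skewed (exponential) population, the sample means cluster around the population mean, and their spread shrinks as n grows. The population and sample sizes here are arbitrary choices for demonstration:

```python
import random
from statistics import mean, stdev

random.seed(1)

# Draw repeated samples from a skewed population (exponential, mean 1)
# and record the sample mean of each sample.
def sample_means(n, reps=5000):
    return [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

small, large = sample_means(5), sample_means(50)

# Larger n -> sample means vary less (SD of x-bar is sigma / sqrt(n))
print(round(stdev(small), 3), round(stdev(large), 3))
```

For this population σ = 1, so the simulated spreads should sit near 1/√5 ≈ 0.45 and 1/√50 ≈ 0.14.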
Bias and Variability in Estimators
• An estimator is unbiased if its sampling distribution is centered at the true parameter.
• Sample proportions, sample means, and sample slopes are unbiased estimators.
• Variability decreases as sample size increases.
Sampling Distribution of the Difference of Sample Proportions
• When comparing two sample proportions (p̂₁ − p̂₂):
• Mean: p₁ − p₂.
• Standard deviation: √((p₁(1 − p₁)/n₁) + (p₂(1 − p₂)/n₂)).
• Approximate normality conditions:
• Random samples from each population.
• Independence within and between samples (10% condition for each).
• Large counts for both groups: n₁p₁, n₁(1 − p₁), n₂p₂, n₂(1 − p₂) ≥ 10.
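The formulas and condition checks above can be sketched as follows, with hypothetical proportions and sample sizes:

```python
from math import sqrt

# Hypothetical: p1 = 0.70 with n1 = 50, p2 = 0.60 with n2 = 80
p1, n1 = 0.70, 50
p2, n2 = 0.60, 80

# Large-counts condition for both groups
for p, n in [(p1, n1), (p2, n2)]:
    assert n * p >= 10 and n * (1 - p) >= 10

mean_diff = p1 - p2                                      # 0.10
sd_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(round(mean_diff, 2), round(sd_diff, 4))
```

Note that the variances add even though the statistic is a difference, because variability from both samples contributes to the spread of p̂₁ − p̂₂.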
Sampling Distribution of the Difference of Sample Means
• When comparing two sample means (x̄₁ − x̄₂):
• Mean: μ₁ − μ₂.
• Standard deviation: √((σ₁²/n₁) + (σ₂²/n₂)).
• If both populations are normal, the sampling distribution is normal for any sample size.
• If either population is not normal, the sampling distribution is approximately normal when both n₁ ≥ 30 and n₂ ≥ 30 (by the CLT).
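A sketch of a difference-of-means calculation, with hypothetical population parameters and both sample sizes at least 30:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical: mu1 = 100, sigma1 = 15, n1 = 40; mu2 = 95, sigma2 = 12, n2 = 36
mu1, s1, n1 = 100, 15, 40
mu2, s2, n2 = 95, 12, 36

mean_diff = mu1 - mu2                    # 5
sd_diff = sqrt(s1**2 / n1 + s2**2 / n2)  # variances add, then take the root

# Both n >= 30, so the normal model applies: P(x-bar1 - x-bar2 < 0)
prob = NormalDist(mean_diff, sd_diff).cdf(0)
print(round(sd_diff, 3), round(prob, 4))
```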
Simulation to Estimate Sampling Distributions
• Simulations help visualize sampling variability when theory is complex.
• Steps for simulation:
• Describe the process and random mechanism clearly.
• Simulate many samples.
• Calculate the sample statistic for each sample.
• Analyze the distribution of the statistics (center, spread, shape).
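The four steps above can be sketched for a sample proportion (the values p = 0.5, n = 25, and 2000 repetitions are arbitrary choices):

```python
import random
from statistics import mean, stdev

random.seed(2)

p, n, reps = 0.5, 25, 2000

# Steps 1-3: describe the random mechanism (each trial succeeds with
# probability p), simulate many samples, and compute p-hat for each.
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

# Step 4: analyze center and spread of the simulated sampling distribution.
print(round(mean(phats), 3), round(stdev(phats), 3))
```

The simulated center and spread should land near the theoretical values p = 0.5 and √(p(1 − p)/n) = 0.1.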
Why Larger Samples Are Better
• Larger samples have smaller variability in estimates.
• Larger n produces narrower sampling distributions, leading to more precise estimates.
• Sample size is more important than population size (as long as the 10% condition is met).
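The precision gain is concrete in the formula: because n sits under a square root, quadrupling the sample size only halves the standard deviation (p = 0.5 here is an illustrative choice):

```python
from math import sqrt

p = 0.5
sd_100 = sqrt(p * (1 - p) / 100)  # SD of p-hat with n = 100
sd_400 = sqrt(p * (1 - p) / 400)  # SD of p-hat with n = 400: half as large
print(sd_100, sd_400)
```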
Sample Size vs Population Size
• The population size has little effect if it is much larger than the sample (at least 10 times larger).
• Focus is on absolute sample size rather than proportion of the population sampled.
In a Nutshell
Sampling distributions describe how sample statistics vary from sample to sample. For large samples, distributions of sample means and proportions tend to be normal due to the Central Limit Theorem. Understanding the behavior of sampling distributions allows for correct estimation of variability, enables valid inference, and helps distinguish between random variation and true differences in populations.