Rucete ✏ AP Statistics In a Nutshell
3. Collecting Data
This chapter introduces the key concepts in designing and conducting studies, including types of studies, bias in data collection, and sampling and experimental design techniques necessary for making valid inferences.
Population vs. Sample
• Population: The entire group of individuals about which we want information.
• Sample: A subset of the population, selected to represent the population.
• Parameter: A number describing a population.
• Statistic: A number describing a sample.
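A minimal sketch in Python of the parameter/statistic distinction, using a hypothetical population of 10,000 heights (the numbers are made up for illustration): the parameter is computed from the whole population, the statistic from a random sample of it.

    import random

    random.seed(1)
    # Hypothetical population: 10,000 adult heights in cm
    population = [random.gauss(170, 8) for _ in range(10_000)]

    # Parameter: describes the population (usually unknown in practice)
    mu = sum(population) / len(population)

    # Statistic: describes a simple random sample drawn from that population
    sample = random.sample(population, 100)
    x_bar = sum(sample) / len(sample)

    print(f"parameter (population mean): {mu:.2f}")
    print(f"statistic (sample mean):     {x_bar:.2f}")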
Observational Studies: Retrospective vs. Prospective
• Retrospective studies look backward at existing data.
• Prospective studies follow subjects into the future to record outcomes.
• Observational studies observe individuals without influencing responses and can only suggest associations, not causation.
Experimental Studies
• Experiments apply a treatment to individuals to observe their responses.
• Only experiments with random assignment can establish cause-and-effect relationships.
Key Principles of Sampling
• Randomization ensures that the sample is representative and avoids bias.
• Larger samples provide more accurate estimates than smaller ones.
• Generalization is valid only to the population from which the sample is drawn.
• Observational studies cannot establish cause-and-effect relationships.
Random Sampling vs. Random Assignment
• Random sampling: How subjects are selected from a population, allowing generalization.
• Random assignment: How subjects are assigned to treatments, allowing cause-and-effect conclusions.
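A short Python sketch of the two uses of chance, with a hypothetical sampling frame of 500 people and made-up group sizes: random sampling determines who enters the study, random assignment determines which treatment each selected subject receives.

    import random

    random.seed(2)
    frame = [f"person_{i}" for i in range(500)]   # hypothetical sampling frame

    # Random sampling: select 40 subjects from the frame (supports generalization)
    subjects = random.sample(frame, 40)

    # Random assignment: split the selected subjects into two treatment groups
    # (supports cause-and-effect conclusions)
    random.shuffle(subjects)
    treatment_group = subjects[:20]
    control_group = subjects[20:]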
Confounding Variables
• Confounding occurs when another variable is associated with the explanatory variable and also affects the response, so it is unclear whether a difference in outcomes is caused by the explanatory variable or by the other variable.
Bias in Data Collection
• Voluntary Response Bias: People with strong opinions (especially negative) are overrepresented.
• Convenience Sampling Bias: Easy-to-reach individuals are overrepresented, often missing important groups.
• Undercoverage Bias: Some groups are left out of the selection process.
• Nonresponse Bias: Selected individuals fail to respond.
• Response Bias: Survey wording or social desirability affects answers.
• Quota Sampling Bias: Lack of randomization allows interviewer selection bias.
Sampling Methods
• Simple Random Sample (SRS): Every individual and every possible sample has an equal chance of being selected.
• Stratified Random Sample: Divide the population into strata (groups of similar individuals), then randomly sample from each stratum.
• Cluster Sample: Divide the population into clusters (often naturally occurring groups) and randomly select entire clusters.
• Systematic Random Sample: Select every nth individual after randomly choosing a starting point.
• Multistage Sampling: Combine several sampling methods in stages (e.g., randomly select clusters, then take an SRS within each chosen cluster) for large or complex populations.
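The following Python sketch illustrates the four basic methods above on a hypothetical frame of 200 students; the grade levels, homeroom clusters, and sample sizes are made up for illustration.

    import random

    random.seed(3)
    grades = ["9th", "10th", "11th", "12th"]
    # Hypothetical frame: 200 students, 50 per grade level
    students = [(f"student_{i}", grades[i % 4]) for i in range(200)]

    # Simple random sample: every group of 20 students is equally likely
    srs = random.sample(students, 20)

    # Stratified random sample: an SRS of 5 from within each grade (stratum)
    stratified = []
    for grade in grades:
        stratum = [s for s in students if s[1] == grade]
        stratified.extend(random.sample(stratum, 5))

    # Cluster sample: split students into 20 "homerooms" and keep 2 whole clusters
    homerooms = [students[i:i + 10] for i in range(0, 200, 10)]
    cluster = [s for room in random.sample(homerooms, 2) for s in room]

    # Systematic random sample: every 10th student after a random starting point
    start = random.randrange(10)
    systematic = students[start::10]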
Choosing Sampling Methods Wisely
• SRS is simple but may miss important subgroups in small samples.
• Stratified sampling reduces variability and ensures representation of all groups.
• Cluster sampling is cost-effective but may increase variability if clusters differ from one another; it works best when each cluster is a heterogeneous mini-version of the population.
• Systematic sampling is efficient but risky if there is hidden periodicity.
Designing Experiments
• Explanatory variable: Factor being manipulated in an experiment.
• Response variable: Outcome measured after applying treatments.
• Treatments: Specific conditions applied to experimental units (subjects).
• Control Group: Baseline group used for comparison; it may receive a placebo or no treatment.
Key Principles of Experimental Design
• Control: Keep other variables constant to reduce confounding.
• Random Assignment: Use chance to assign experimental units to treatments.
• Replication: Use enough subjects to reduce the role of chance variation.
• Comparison: Compare treatment groups to a control or to each other.
Completely Randomized Design
• Randomly assign all subjects to treatments without considering any other variables.
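A minimal Python sketch of a completely randomized design, assuming 24 hypothetical subjects and three made-up treatments: shuffle everyone, then split into equal groups with no regard to any other variable.

    import random

    random.seed(4)
    subjects = [f"subject_{i}" for i in range(24)]     # hypothetical subjects
    treatments = ["drug_A", "drug_B", "placebo"]       # hypothetical treatments

    # Completely randomized design: chance alone decides each subject's treatment
    random.shuffle(subjects)
    group_size = len(subjects) // len(treatments)
    assignment = {t: subjects[i * group_size:(i + 1) * group_size]
                  for i, t in enumerate(treatments)}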
Blocking
• Group similar subjects into blocks based on variables expected to affect response.
• Random assignment is then carried out separately within each block.
• Reduces variability due to the blocking variable, leading to more precise conclusions.
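A sketch of the same idea with blocking, assuming a hypothetical blocking variable (sex) and 12 subjects per block; the random assignment happens separately inside each block.

    import random

    random.seed(5)
    # Hypothetical subjects labeled with the blocking variable
    subjects = [(f"subject_{i}", "F" if i < 12 else "M") for i in range(24)]

    assignment = {"treatment": [], "control": []}
    for block_label in ["F", "M"]:
        # Randomized block design: shuffle and split within each block
        block = [name for name, sex in subjects if sex == block_label]
        random.shuffle(block)
        half = len(block) // 2
        assignment["treatment"].extend(block[:half])
        assignment["control"].extend(block[half:])

Because the shuffle happens inside each block, the blocking variable cannot be confounded with the treatment, and block-to-block variability is removed from the comparison.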
Matched Pairs Design
• Special case of blocking where two very similar units are paired, or one unit receives both treatments in random order.
• Reduces variability from differences between subjects.
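A sketch of randomizing treatment order in a matched pairs design where each subject receives both treatments; the subject and treatment names are placeholders.

    import random

    random.seed(6)
    subjects = [f"subject_{i}" for i in range(10)]     # hypothetical subjects

    # For each subject, a random draw decides which treatment comes first
    orders = {}
    for s in subjects:
        first, second = random.sample(["treatment_A", "treatment_B"], 2)
        orders[s] = (first, second)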
Blinding and Placebos
• Single-blind: Either the subjects or those who evaluate the responses (but not both) do not know which treatment was assigned.
• Double-blind: Neither subjects nor experimenters know who receives which treatment.
• Placebos are used to control for psychological effects.
Common Pitfalls in Experimentation
• Lack of random assignment can lead to confounding and bias.
• Voluntary response or convenience in choosing subjects weakens generalizability.
• Small sample sizes reduce reliability and reproducibility of results.
Scope of Inference
• Random sample from population → Generalize to population.
• Random assignment of treatments → Infer cause-and-effect.
• Without random sampling or assignment, inferences are limited.
In a Nutshell
Collecting data properly is critical for valid conclusions. Random sampling methods produce representative samples, while careful experimental designs allow valid causal inference. Key principles like randomization, control, replication, and blocking minimize bias and variability. A clear understanding of random sampling vs. random assignment and of the correct scope of inference ensures that results are trustworthy and meaningful.