Statistics

Sampling & Data Collection: Populations, Samples and Bias

Year 10 · Year 11

  • By the end of this lesson students will be able to define and distinguish between a population and a sample.
  • By the end of this lesson students will be able to explain the purpose of sampling in data collection.
  • By the end of this lesson students will be able to identify and explain different types of bias in data collection.
  • By the end of this lesson students will be able to suggest improvements to reduce bias and ensure data is representative.

Key concepts

Population

The entire group of individuals or items that we are interested in studying. For example, if we want to know the average height of all students in a particular school, then all students in that school constitute the population.

Sample

A smaller, manageable group selected from the population. It is chosen to represent the characteristics of the larger population. For example, if we select 50 students from a school to measure their heights, these 50 students form a sample.

Census

A survey that collects data from every single member of the entire population. This provides complete information but can be costly, time-consuming, and sometimes impractical or impossible.

Sampling Frame

A list of all members of the population from which a sample can be drawn. For example, a school's register of all students could serve as a sampling frame for selecting a sample of students.
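As a minimal sketch of how a sampling frame is used, the Python snippet below draws a simple random sample from a register. The register contents, the `simple_random_sample` helper, and the seed value are all assumptions made for illustration; they are not part of the lesson.

```python
import random

# Hypothetical sampling frame: a school register of 600 student IDs.
register = [f"student_{i:03d}" for i in range(1, 601)]

def simple_random_sample(frame, size, seed=None):
    """Pick `size` distinct members, each member being equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(frame, size)

sample = simple_random_sample(register, 50, seed=42)
print(len(sample))  # 50
```

Because every name on the register has the same chance of selection, no group of students is systematically favoured, provided the register itself is complete.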

Bias

A systematic error in a sampling method or data collection process that causes the results to be unrepresentative of the true characteristics of the population. Bias can lead to misleading conclusions and invalid inferences.

Representative Sample

A sample that accurately reflects the characteristics (e.g., age, gender, opinions, socio-economic status) of the population from which it was drawn. A representative sample is crucial for making valid inferences about the population.

Types of Bias

Bias can arise from various sources, including:

1. **Sampling Bias:** When the method of selecting a sample systematically excludes or over-represents certain parts of the population (e.g., only surveying people at a specific location or time).
2. **Response Bias:** When respondents provide inaccurate or untruthful answers (e.g., due to social desirability, misunderstanding the question, or leading questions).
3. **Questionnaire Bias (Leading Questions):** When the wording of a question influences the respondent's answer, pushing them towards a particular viewpoint (e.g., 'Don't you agree that...').
4. **Non-response Bias:** When a significant portion of the selected sample does not respond, and those who do respond differ systematically from those who don't.

Key facts to remember

  • A population is the entire group of interest, while a sample is a smaller, selected subset of that group.
  • A census involves collecting data from every member of the population, which is often impractical.
  • Sampling is used when a population is too large or difficult to survey completely.
  • Bias is a systematic error that makes a sample or data collection method unrepresentative of the population.
  • A representative sample accurately reflects the characteristics of the population.
  • Common sources of bias include sampling bias, non-response bias, and questionnaire bias (leading questions).
  • Random sampling methods help to minimise bias and increase the likelihood of a representative sample.
  • Unbiased data collection is essential for drawing valid and reliable conclusions about a population.

Worked examples

Example 1

A local council wants to find out how residents feel about a new recycling scheme. They send a questionnaire to 1000 randomly selected households in the area. 350 households return the questionnaire.

1. **Identify the population:** The population is all residents in the local council area.
2. **Identify the sample:** The initial sample is the 1000 randomly selected households. The actual responding sample is the 350 households that returned the questionnaire.
3. **Identify potential sources of bias:** There is potential for non-response bias. The 350 households who responded might have stronger opinions (either positive or negative) about the recycling scheme than the 650 who did not respond. This means the results might not be representative of all residents.
4. **Suggest an improvement to reduce bias:** To reduce non-response bias, the council could send reminders to non-respondents, offer incentives for participation, or follow up with a smaller, targeted survey of non-respondents to understand their reasons for not participating and their views.

Answer

Population: All residents in the local council area. Sample: The 1000 randomly selected households (the intended sample), of which 350 returned the questionnaire (the responding sample). Bias: Non-response bias, as those who responded may have different views from those who didn't. Improvement: Follow up with non-respondents or offer incentives to increase response rate.

It's important to distinguish between the intended sample and the actual responding sample, as non-response can introduce bias.
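The size of the gap between the intended and responding samples in Example 1 can be checked with a short calculation. This is a sketch only; the `response_rate` helper is a hypothetical name introduced here, and the figures are the ones given in the example.

```python
def response_rate(sent, returned):
    """Fraction of the intended sample that actually responded."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= returned <= sent:
        raise ValueError("returned must be between 0 and sent")
    return returned / sent

sent, returned = 1000, 350  # figures from Example 1
rate = response_rate(sent, returned)
print(f"Response rate: {rate:.0%}")           # Response rate: 35%
print(f"Non-respondents: {sent - returned}")  # Non-respondents: 650
```

A response rate of 35% means nearly two thirds of the selected households are missing from the results, which is why non-response bias is a real concern here.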

Example 2

A school newspaper wants to survey students about their favourite school lunch. They stand outside the canteen entrance at the start of lunch break and ask the first 20 students who enter.

1. **Identify the population:** All students in the school.
2. **Identify the sample:** The first 20 students who enter the canteen at the start of lunch break.
3. **Identify potential sources of bias:** This method uses opportunity sampling. It introduces sampling bias because:
   * Only students who eat in the canteen are included, excluding those who bring packed lunches or leave school for lunch.
   * Only students who arrive early are included, potentially missing those who arrive later or have different schedules.

   In addition, the sample size (20 students) is small, so the results are likely to vary a lot from one sample to the next. This makes them less reliable, although it is a separate problem from bias.
4. **Suggest an improvement to reduce bias:** To get a more representative sample, the newspaper could use a simple random sample (e.g., selecting student names randomly from the school register) or a stratified sample (e.g., ensuring a proportional number of students from each year group are included). They should also aim for a larger sample size.

Answer

Population: All students in the school. Sample: The first 20 students entering the canteen. Bias: Sampling bias (opportunity sampling), as it excludes students who don't use the canteen or who arrive later. The small sample size also makes the results less reliable. Improvement: Use a random sampling method (e.g., simple random or stratified sampling) and increase the sample size.

Opportunity sampling is often convenient but rarely produces a representative sample.
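The stratified sample suggested in Example 2 can be sketched in Python as follows. The year-group sizes, the student labels, and the `stratified_sample` helper are assumptions invented for illustration; a real school would use its own register.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw a proportional stratified sample.

    strata: dict mapping a stratum label (e.g. a year group) to a list of members.
    Returns a dict with the randomly chosen members of each stratum.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for label, members in strata.items():
        # Proportional allocation: the stratum's share of the population
        # multiplied by the overall sample size, rounded to a whole number.
        k = round(sample_size * len(members) / total)
        chosen[label] = rng.sample(members, k)
    return chosen

# Hypothetical school: 200 students in Year 10 and 300 in Year 11.
school = {
    "Year 10": [f"Y10-{i}" for i in range(200)],
    "Year 11": [f"Y11-{i}" for i in range(300)],
}
picked = stratified_sample(school, sample_size=50, seed=1)
print({label: len(names) for label, names in picked.items()})
# {'Year 10': 20, 'Year 11': 30}
```

Year 10 makes up 200 of the 500 students (40%), so it receives 40% of the 50 places. With less convenient numbers, rounding can make the total differ slightly from the requested sample size, so the allocation may need a small manual adjustment.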

Example 3

A political party conducts a survey asking: 'Do you agree that the government's recent tax cuts, which will boost the economy and create jobs, are a good idea?'

1. **Identify the question:** 'Do you agree that the government's recent tax cuts, which will boost the economy and create jobs, are a good idea?'
2. **Identify potential sources of bias:** The question is a leading question. It includes positive statements ('boost the economy and create jobs') that are presented as facts, influencing respondents to agree with the premise of the tax cuts. This creates questionnaire bias.
3. **Suggest an improvement to reduce bias:** The question should be rephrased to be neutral and objective, allowing respondents to express their true opinions without being swayed. A better question might be: 'What is your opinion on the government's recent tax cuts?' or 'Do you support or oppose the government's recent tax cuts?'

Answer

Bias: Questionnaire bias (leading question). The question uses positive phrasing ('boost the economy and create jobs') to influence respondents towards agreement. Improvement: Rephrase the question to be neutral, for example: 'What is your opinion on the government's recent tax cuts?'

Always look for emotive language or statements presented as facts within a survey question.

Common mistakes

  • Confusing the terms 'population' and 'sample' or using them interchangeably.
  • Failing to identify leading questions or loaded language as a source of bias in survey questions.
  • Assuming that any sample, regardless of how it was selected, will be representative of the population.
  • Selecting a sample that is too small or drawn from a very specific subgroup, making it unrepresentative.
  • Not considering the practical difficulties or ethical implications when designing a data collection method.

Exam tips

  • When asked to define population and sample, always do so in the context of the given scenario.
  • When identifying bias, don't just state that it's biased; explain *how* and *why* it makes the data unrepresentative.
  • For questions asking to improve a data collection method, suggest specific, practical changes that would reduce bias and improve representativeness.
  • Remember that a larger sample size reduces sampling error, but it does not eliminate bias if the sampling method itself is flawed.
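The last tip can be illustrated with a small simulation. This is a sketch under invented assumptions: a hypothetical school of 10,000 students in which canteen users mostly prefer hot lunches and other students mostly do not. None of these figures come from the lesson.

```python
import random

# Hypothetical population; each student is a pair (uses_canteen, prefers_hot_lunch).
population = (
    [(True, True)] * 4000 + [(True, False)] * 1000      # canteen users: 80% prefer hot lunch
    + [(False, True)] * 1000 + [(False, False)] * 4000  # other students: 20% prefer hot lunch
)
true_share = sum(hot for _, hot in population) / len(population)  # 0.5

def share_hot(sample):
    """Proportion of a sample that prefers a hot lunch."""
    return sum(hot for _, hot in sample) / len(sample)

rng = random.Random(0)
canteen_only = [s for s in population if s[0]]  # flawed frame: canteen users only

results = {}
for n in (20, 200, 2000):
    biased = share_hot(rng.sample(canteen_only, n))  # biased method
    fair = share_hot(rng.sample(population, n))      # simple random sample
    results[n] = (biased, fair)
    print(f"n={n}: canteen-only {biased:.2f}, random {fair:.2f}, true {true_share:.2f}")
```

As the sample size grows, the random-sample estimate should settle close to the true value of 0.5, while the canteen-only estimate should settle close to 0.8. A bigger sample makes the biased method more consistent, not more correct.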
