Strand 1 — Statistics & Probability

Handling Data: Populations, Samples and Data Types

1st Year · 2nd Year · 3rd Year (Junior Cert)

By the end of this lesson students will be able to:
  • define and distinguish between a population and a sample.
  • identify potential sources of bias in data collection.
  • classify data as categorical or numerical, and classify numerical data further as discrete or continuous.
  • describe the key considerations in designing an effective survey.

Key concepts

Population

The entire group of individuals or items that we are interested in studying. It includes every single member of the group about which we want to draw conclusions.

Sample

A smaller, representative subgroup selected from the population. We study the sample to collect data and make inferences or generalisations about the larger population.

Bias

A systematic error in a study that leads to an incorrect or unrepresentative conclusion. It occurs when a sample is not truly representative of the population, or when the method of data collection unfairly favours certain outcomes. Common types include selection bias (how the sample is chosen), response bias (how people answer), and question bias (how questions are phrased).

Categorical Data

Data that can be sorted into categories or groups based on a characteristic or quality. It describes qualities or attributes rather than quantities. Examples include favourite colour, gender, or type of car.

Numerical Data

Data that consists of numbers and can be measured or counted. It represents quantities and can be used in mathematical calculations. Examples include age, height, or number of siblings.

Discrete Data

Numerical data that can only take on specific, distinct values, often whole numbers. It usually results from counting and there are clear gaps between possible values. Examples include the number of students in a class, the number of goals scored in a match, or shoe size.

Continuous Data

Numerical data that can take any value within a given range. It usually results from measuring and can include decimals or fractions. There are no gaps between possible values: between any two measurements another value is always possible, and how precisely a value is recorded depends only on the measuring instrument. Examples include height, weight, temperature, or time.

Designing Surveys

When designing a survey, it is crucial to:
1. Clearly define the research question and objectives.
2. Choose an appropriate and unbiased sampling method (e.g., random sampling) to ensure the sample is representative of the population.
3. Use clear, simple, and unambiguous language in all questions.
4. Avoid leading or biased questions that might influence the respondent's answer.
5. Ensure response options are comprehensive (cover all possibilities) and mutually exclusive (no overlap).
6. Consider the method of administration (e.g., online, paper, interview) and its potential impact on responses.
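
Point 5 can be checked mechanically when answers are whole numbers. The Python sketch below is an illustration, not a standard tool: the function name `check_bands` and the example age groups are hypothetical. It sorts the response bands and reports any values left uncovered (the options are not comprehensive) or covered twice (the options are not mutually exclusive).

```python
def check_bands(bands, low, high):
    """Check whole-number response bands (e.g. age groups) for gaps and overlaps.

    bands: list of (start, end) pairs, both ends included, e.g. [(13, 14), (15, 16)].
    low, high: smallest and largest answers possible.
    Assumes every band lies within low..high.
    Returns a list of problems found (an empty list means the bands are fine).
    """
    problems = []
    covered_to = low - 1  # highest value covered by the bands checked so far
    for start, end in sorted(bands):
        if start > covered_to + 1:
            # Values between the previous band and this one have no box to tick.
            problems.append(f"gap: {covered_to + 1} to {start - 1} not covered")
        elif start <= covered_to:
            # Values already covered by an earlier band appear again here.
            problems.append(f"overlap: {start} to {min(end, covered_to)} in more than one band")
        covered_to = max(covered_to, end)
    if covered_to < high:
        problems.append(f"gap: {covered_to + 1} to {high} not covered")
    return problems

# Hypothetical age groups for a survey of teenagers aged 13 to 19.
print(check_bands([(13, 15), (15, 17), (18, 18)], 13, 19))
# → ['overlap: 15 to 15 in more than one band', 'gap: 19 to 19 not covered']
```

Here a 15-year-old could tick two boxes and a 19-year-old has no box at all. This version assumes whole-number answers; for continuous answers such as hours, bands need boundaries written as "at least 2 but less than 4" so that no value falls between or inside two options.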

Key facts to remember

  • A population is the entire group being studied, while a sample is a smaller, representative subset of that group.
  • Bias occurs when a sample is not representative or when the data collection method unfairly influences results.
  • Categorical data describes qualities or categories (e.g., hair colour, favourite sport).
  • Numerical data represents quantities (e.g., age, number of siblings).
  • Discrete data is numerical data that can be counted and takes specific, distinct values (e.g., number of cars).
  • Continuous data is numerical data that can be measured and can take any value within a range (e.g., height, weight).
  • Random sampling is a key method to reduce bias and ensure a sample is representative of the population.
  • Well-designed survey questions are clear, neutral, and do not lead or influence the respondent.

Worked examples

Example 1

A local council wants to find out the average number of hours per week teenagers in their town spend on social media. They survey 100 teenagers who attend a local youth club.

1. Identify the population for this study.
2. Identify the sample used in this study.
3. Discuss whether the sample is likely to be biased and explain why.

Answer

1. Population: All teenagers in the town.
2. Sample: The 100 teenagers who attend the local youth club.
3. Bias: Yes, the sample is likely to be biased. Teenagers who attend a youth club might have different social media habits from the general population of teenagers in the town (e.g., they might be more social in person, or have more free time). This is selection bias: because of how it was chosen, the sample is not representative of all teenagers in the town.

A truly random sample from all teenagers in the town would be more representative.
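
To see how the youth-club sample can mislead, here is a small Python simulation. Everything in it is made up for illustration: a hypothetical town of 2,000 teenagers in which we assume the 200 youth-club members spend fewer hours on social media than everyone else. The function names are hypothetical too.

```python
import random
from statistics import mean

# HYPOTHETICAL town of 2,000 teenagers: made-up weekly social-media hours, not real data.
# Assumption for illustration only: the 200 youth-club members use social media less.
population = (
    [{"club": True, "hours": 8 + i % 5} for i in range(200)]       # hours 8 to 12
    + [{"club": False, "hours": 12 + i % 9} for i in range(1800)]  # hours 12 to 20
)

def convenience_sample(pop, size):
    """Survey the first `size` youth-club members: easy to reach, but not random."""
    return [t for t in pop if t["club"]][:size]

def simple_random_sample(pop, size, seed=None):
    """Simple random sample: every teenager has the same chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(pop, size)  # sampling without replacement

def mean_hours(group):
    """Average weekly hours for a group of teenagers."""
    return mean(t["hours"] for t in group)

print("Population mean:       ", mean_hours(population))
print("Youth-club sample mean:", mean_hours(convenience_sample(population, 100)))
print("Random sample mean:    ", mean_hours(simple_random_sample(population, 100, seed=1)))
```

With these made-up numbers the population mean is 15.4 hours, but the youth-club sample gives 10 hours, because it can only ever contain club members. The random sample's mean depends on which teenagers are drawn, but since everyone in the town had the same chance of selection, it is not systematically pulled towards the club members' habits.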

Example 2

Classify each of the following types of data as categorical or numerical. If numerical, further classify it as discrete or continuous:
(a) The brand of smartphone a person owns
(b) The number of text messages sent in a day
(c) The time taken to run 100 metres
(d) A student's grade (e.g., A, B, C)
(e) The temperature inside a classroom in degrees Celsius

1. For each item, determine if it describes a quality/category or a quantity/number.
2. If numerical, decide if it can only take specific, separate values (discrete) or any value within a range (continuous).

Answer

(a) The brand of smartphone a person owns: Categorical
(b) The number of text messages sent in a day: Numerical, Discrete
(c) The time taken to run 100 metres: Numerical, Continuous
(d) A student's grade (e.g., A, B, C): Categorical
(e) The temperature inside a classroom in degrees Celsius: Numerical, Continuous

Remember, discrete data is counted, while continuous data is measured.
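
The two questions in Example 2 form a simple decision procedure. The Python sketch below only encodes the order of those questions; the function `classify` is hypothetical, and a person still has to answer each question for the data in front of them. With the answers filled in for items (a) to (e), it reproduces the classifications above.

```python
from typing import Optional

def classify(is_quantity: bool, is_counted: Optional[bool] = None) -> str:
    """Apply the two questions from Example 2 in order.

    is_quantity: does the data record a number (quantity) rather than a quality/category?
    is_counted: for numerical data, is it counted (True) or measured (False)?
    """
    if not is_quantity:
        return "Categorical"
    if is_counted is None:
        raise ValueError("numerical data needs is_counted set to True or False")
    return "Numerical, Discrete" if is_counted else "Numerical, Continuous"

# Answers to the two questions for each item in Example 2.
items = {
    "(a) Brand of smartphone": (False, None),
    "(b) Number of text messages sent in a day": (True, True),
    "(c) Time taken to run 100 metres": (True, False),
    "(d) Student's grade (A, B, C)": (False, None),
    "(e) Classroom temperature in degrees Celsius": (True, False),
}

for name, answers in items.items():
    print(f"{name}: {classify(*answers)}")
```

The "counted or measured" test is a rule of thumb. Shoe size, for example, is discrete because only specific sizes exist, so when in doubt check whether there are gaps between the possible values.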

Example 3

A student is designing a survey to find out if people in their neighbourhood support a new pedestrianisation plan for the town centre. They propose asking the question: 'Do you agree that the wonderful new pedestrianisation plan will greatly improve our town centre?'

1. Identify any issues with the proposed survey question.
2. Suggest an improved, unbiased question for the survey.

Answer

1. Issues: The question is highly leading and biased. It uses positive, emotive language ('wonderful', 'greatly improve') which encourages respondents to agree with the statement. The opening 'Do you agree that...' invites only agreement, and the question presents the plan's benefits as settled fact, so a respondent who is unsure or opposed has no neutral way to answer.
2. Improved question: 'Do you support or oppose the proposed pedestrianisation plan for the town centre?' (with response options such as 'Support', 'Oppose', 'No opinion' or 'Unsure').

Always strive for neutral language to avoid influencing respondents' answers.
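
A quick first check for emotive wording can be automated. The Python sketch below is hypothetical: the phrase list `LOADED_PHRASES` is an illustrative assumption, not a standard list, and a match only flags a question for a closer look. It cannot judge whether a question is neutral, so reading each question critically is still essential.

```python
import re
from typing import List

# Hypothetical list of emotive or leading phrases; extend it for your own survey.
LOADED_PHRASES = ["wonderful", "greatly", "terrible", "do you agree that", "don't you think"]

def flag_loaded_language(question: str) -> List[str]:
    """Return every phrase from LOADED_PHRASES that appears in the question (case-insensitive)."""
    text = question.lower()
    # \b marks word boundaries, so only whole words or phrases are matched.
    return [p for p in LOADED_PHRASES if re.search(r"\b" + re.escape(p) + r"\b", text)]

original = ("Do you agree that the wonderful new pedestrianisation plan "
            "will greatly improve our town centre?")
improved = ("Do you support or oppose the proposed pedestrianisation plan "
            "for the town centre?")

print(flag_loaded_language(original))  # → ['wonderful', 'greatly', 'do you agree that']
print(flag_loaded_language(improved))  # → []
```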

Common mistakes

  • Confusing the population with the sample, especially when the sample size is large.
  • Failing to recognise sources of bias in a given scenario, such as selection bias or question bias.
  • Incorrectly classifying discrete and continuous data, particularly with numerical data that might appear continuous but is discrete (e.g., shoe size).
  • Using leading or loaded questions in survey design, which can skew the results.
  • Assuming a convenient or easily accessible sample is representative of a larger, diverse population.

Exam tips

  • When asked to identify population and sample, be precise in your definitions for the given context.
  • If a question asks about bias, always explain *why* a particular method or question is biased, not just state that it is.
  • Practise classifying various data types, paying close attention to whether the data is counted (discrete) or measured (continuous).
  • For survey design questions, focus on making questions neutral, clear, and ensuring all possible responses are covered without overlap.
