Statistics
Scatter Graphs and Correlation
Year 10 · Year 11
- ✓By the end of this lesson students will be able to construct and interpret scatter graphs.
- ✓By the end of this lesson students will be able to identify and describe positive, negative, and no correlation, and assess their strength.
- ✓By the end of this lesson students will be able to draw a line of best fit by eye on a scatter graph.
- ✓By the end of this lesson students will be able to use a line of best fit to make predictions through interpolation.
- ✓By the end of this lesson students will be able to understand the limitations and unreliability of extrapolation.
Key concepts
A scatter graph (or scatter plot) is a type of graph used to display the relationship between two numerical variables. Each point on the graph represents a pair of data values, one for each variable. The independent variable is usually plotted on the horizontal axis (x-axis) and the dependent variable on the vertical axis (y-axis). Scatter graphs help us to see whether there is a pattern or trend between the two variables.
Correlation describes the relationship or association between two variables. It indicates how closely the points on a scatter graph follow a trend. There are three main types of correlation:
* **Positive Correlation**: As one variable increases, the other variable also tends to increase. The points generally go upwards from left to right.
* **Negative Correlation**: As one variable increases, the other variable tends to decrease. The points generally go downwards from left to right.
* **No Correlation**: There is no clear relationship or pattern between the two variables. The points appear randomly scattered.
Correlation can also be described by its **strength**: strong, moderate, or weak. The closer the points are to forming a straight line, the stronger the correlation.
A line of best fit (also known as a trend line) is a straight line drawn on a scatter graph to show the general trend of the data. It should be drawn by eye so that it passes through the 'middle' of the points, with roughly an equal number of points above and below the line. It does not have to pass through the origin or any specific data point. The purpose of the line of best fit is to help us make predictions.
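One way to sanity-check a line drawn by eye is to count how many points fall above and below it. The sketch below does this for the revision-time data from Example 1 in this lesson. The candidate line y = 10x + 22 and the function name `count_above_below` are illustrative assumptions, not part of the lesson.

```python
# Revision time (hours) and maths test score, taken from Example 1.
points = [(2, 45), (3, 55), (5, 70), (1, 30), (4, 60), (6, 80), (2, 40), (5, 75)]


def count_above_below(points, slope, intercept):
    """Count points above, below, and exactly on the line y = slope*x + intercept."""
    above = below = on_line = 0
    for x, y in points:
        line_y = slope * x + intercept
        if y > line_y:
            above += 1
        elif y < line_y:
            below += 1
        else:
            on_line += 1
    return above, below, on_line


# Hypothetical candidate line chosen by eye: y = 10x + 22.
above, below, on_line = count_above_below(points, slope=10, intercept=22)
print(f"above: {above}, below: {below}, on the line: {on_line}")
```

For this candidate line the split is 3 above and 5 below, which is roughly balanced for eight points. A line with nearly all of the points on one side would need to be redrawn.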
Interpolation is the process of using the line of best fit to estimate a value for one variable that lies *within* the range of the original data points. For example, if you have data for ages 10 to 20, interpolating would be estimating a value for age 15. Interpolation is generally considered reliable because the estimate lies within the range of values that were actually observed, so the trend is supported by data on both sides of it.
Extrapolation is the process of using the line of best fit to estimate a value for one variable that lies *outside* the range of the original data points. For example, if you have data for ages 10 to 20, extrapolating would be estimating a value for age 5 or age 25. Extrapolation is generally considered unreliable and should be done with caution, as the trend observed within the data range may not continue beyond it.
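The distinction between the two comes down to a range check. As a minimal sketch, assuming the ages 10 to 20 from the example above (the function name `classify_prediction` is made up for illustration):

```python
def classify_prediction(x_value, x_data):
    """Return 'interpolation' if x_value lies within the range of x_data,
    otherwise 'extrapolation'."""
    if min(x_data) <= x_value <= max(x_data):
        return "interpolation"
    return "extrapolation"


# Ages in the data set: 10 to 20 inclusive, as in the example above.
ages = list(range(10, 21))

for age in (15, 5, 25):
    print(age, classify_prediction(age, ages))
```

Age 15 is classified as interpolation, while ages 5 and 25 are classified as extrapolation because they fall outside the observed range.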
It is important to remember that correlation does not imply causation. Just because two variables show a strong correlation, it does not necessarily mean that one causes the other. There might be a third, unobserved variable influencing both, or the correlation could be purely coincidental.
Key facts to remember
- 1. Scatter graphs show the relationship between two numerical variables.
- 2. Correlation describes the type (positive, negative, no) and strength (strong, moderate, weak) of the relationship.
- 3. Positive correlation: as one variable increases, the other tends to increase.
- 4. Negative correlation: as one variable increases, the other tends to decrease.
- 5. No correlation: no clear relationship between variables.
- 6. A line of best fit is drawn by eye to show the general trend of the data, with roughly equal points above and below.
- 7. Interpolation is making predictions within the range of the given data and is generally reliable.
- 8. Extrapolation is making predictions outside the range of the given data and is generally unreliable.
- 9. Correlation does not imply causation.
Worked examples
Example 1
A group of students recorded their revision time (in hours) and their maths test score (out of 100). The data is shown below:

| Revision Time (hours) | Maths Test Score |
|-----------------------|------------------|
| 2 | 45 |
| 3 | 55 |
| 5 | 70 |
| 1 | 30 |
| 4 | 60 |
| 6 | 80 |
| 2 | 40 |
| 5 | 75 |

Plot a scatter graph for this data and describe the type and strength of correlation.
Answer
The scatter graph shows a strong positive correlation between revision time and maths test score. As revision time increases, the maths test score tends to increase.
When plotting points, ensure accuracy. Use a sharp pencil and mark points clearly with a cross (x) or a dot with a circle around it.
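The description "strong positive" can be checked numerically. The sketch below computes Pearson's correlation coefficient r for the data in Example 1. This coefficient is not used elsewhere in this lesson, which judges correlation by eye; it is shown here only as an optional cross-check. Values of r close to +1 indicate strong positive correlation, values close to -1 strong negative correlation, and values near 0 little or no linear correlation.

```python
import math

# Data from Example 1.
revision_hours = [2, 3, 5, 1, 4, 6, 2, 5]
test_scores = [45, 55, 70, 30, 60, 80, 40, 75]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired data xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(revision_hours, test_scores)
print(f"r = {r:.2f}")
```

For this data r comes out at about 0.99, which agrees with the description of a strong positive correlation.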
Example 2
The scatter graph below shows the number of hours spent watching TV and the number of hours spent exercising for 10 students. A line of best fit has been drawn.
[Imagine a scatter graph with 'Hours Watching TV' on the x-axis (0-10) and 'Hours Exercising' on the y-axis (0-10). Points show a general downward trend. A line of best fit is drawn, e.g., passing through (2,8) and (8,2).]
a) Describe the correlation shown.
b) Use the line of best fit to estimate the hours of exercise for a student who watches 4 hours of TV.
c) Use the line of best fit to estimate the hours of TV watched by a student who exercises for 6 hours.
Answer
a) The scatter graph shows a strong negative correlation. As the hours spent watching TV increase, the hours spent exercising tend to decrease.
b) For a student who watches 4 hours of TV, the estimated hours of exercise are approximately 6 hours. (This is an interpolation.)
c) For a student who exercises for 6 hours, the estimated hours of TV watched are approximately 4 hours. (This is an interpolation.)
Your exact answers for parts b and c may vary slightly depending on how precisely the line of best fit is drawn, but they should be close to the values obtained from a reasonable line.
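The readings in parts b and c can be checked against the equation of the line. As a sketch, assuming the line of best fit passes exactly through (2, 8) and (8, 2) as described in the question (the function name `line_through` is made up for illustration):

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Line of best fit through (2, 8) and (8, 2): gradient -1, y-intercept 10.
slope, intercept = line_through((2, 8), (8, 2))

# Part b: read y (hours exercising) when x (hours of TV) is 4.
exercise_hours = slope * 4 + intercept

# Part c: read x (hours of TV) when y (hours exercising) is 6.
tv_hours = (6 - intercept) / slope

print(f"Exercise at 4 hours of TV: {exercise_hours} hours")
print(f"TV at 6 hours of exercise: {tv_hours} hours")
```

The line works out as y = 10 - x, which gives 6 hours of exercise for part b and 4 hours of TV for part c, matching the values read from the graph.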
Example 3
A company records the age of its employees and their annual salary. The scatter graph shows a strong positive correlation for employees aged 20 to 60. A line of best fit is drawn.
a) Use the line of best fit to estimate the salary of an employee aged 45.
b) Explain why it would be unreliable to use the line of best fit to estimate the salary of an employee aged 70.
Answer
a) Read up from age 45 on the horizontal axis to the line of best fit, then across to the salary axis. If, for example, the line gives £35,000 at this point, the estimated annual salary for an employee aged 45 is approximately £35,000. This is interpolation, because 45 lies within the data range (20 to 60).
b) It would be unreliable to estimate the salary of an employee aged 70 because this would be extrapolation. The age of 70 is outside the range of the data used to create the scatter graph (20 to 60). The trend observed within the given age range may not continue for older ages, as factors such as retirement or reduced working hours could significantly alter the relationship between age and salary.
Always clearly state whether you are interpolating or extrapolating, and justify the reliability based on whether the point is within or outside the data range.
Common mistakes
- ✗Confusing positive and negative correlation.
- ✗Drawing the line of best fit incorrectly (e.g., forcing it through the origin, or not balancing points above and below).
- ✗Extrapolating too far beyond the data range and assuming the trend will continue indefinitely.
- ✗Stating that correlation proves causation.
- ✗Not using a ruler to draw the line of best fit or to read values from the graph.
Exam tips
- ★Always use a ruler and a sharp pencil for drawing scatter graphs and lines of best fit to ensure accuracy.
- ★Clearly label both axes with the variable name and units.
- ★When asked to describe correlation, mention both the type (positive/negative/no) and the strength (strong/moderate/weak).
- ★Be precise when reading values from your line of best fit for interpolation or extrapolation.
- ★Remember to explain *why* extrapolation is unreliable, referring to the data range.
