Statistics
Cumulative Frequency and Box Plots (Higher)
Year 10 · Year 11
- ✓ By the end of this lesson students will be able to construct and interpret cumulative frequency tables and diagrams.
- ✓ By the end of this lesson students will be able to estimate the median, lower quartile, upper quartile, and interquartile range from a cumulative frequency diagram.
- ✓ By the end of this lesson students will be able to construct and interpret box plots from cumulative frequency data or raw data.
- ✓ By the end of this lesson students will be able to compare two or more distributions using cumulative frequency diagrams and/or box plots.
Key concepts
Cumulative frequency is a running total of the frequencies in a frequency distribution. It tells you how many data points are less than or equal to a particular value. To calculate it, you add the frequency of each class to the sum of the frequencies of all preceding classes.
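The running total can be sketched in a few lines of Python. The class boundaries and frequencies below are made-up illustration data, not values from this lesson.

```python
from itertools import accumulate

# Hypothetical grouped data (not from the lesson):
# upper class boundary of each class and the frequency of that class.
upper_boundaries = [10, 20, 30, 40, 50]
frequencies = [4, 9, 15, 8, 4]

# Each cumulative frequency is the class frequency plus all earlier ones.
cumulative = list(accumulate(frequencies))

for upper, cf in zip(upper_boundaries, cumulative):
    print(f"up to {upper}: cumulative frequency {cf}")
```

The final cumulative frequency should always equal the total number of data points, which is a quick check on the table.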
A cumulative frequency diagram is a graph that displays the cumulative frequency against the upper class boundary of each class interval. The points are plotted at the upper class boundary and the corresponding cumulative frequency, then joined by a smooth curve (an ogive). This diagram is used to estimate measures of central tendency and spread.
The median is the middle value of a dataset when it is ordered from least to greatest. From a cumulative frequency diagram, the median is estimated by finding the value on the x-axis corresponding to half of the total cumulative frequency (N/2) on the y-axis.
The lower quartile (Q1) is the value below which 25% of the data lies. From a cumulative frequency diagram, Q1 is estimated by finding the value on the x-axis corresponding to one-quarter of the total cumulative frequency (N/4) on the y-axis.
The upper quartile (Q3) is the value below which 75% of the data lies. From a cumulative frequency diagram, Q3 is estimated by finding the value on the x-axis corresponding to three-quarters of the total cumulative frequency (3N/4) on the y-axis.
The interquartile range (IQR) is a measure of statistical dispersion, representing the range of the middle 50% of the data. It is calculated as the difference between the upper quartile (Q3) and the lower quartile (Q1). A smaller IQR indicates less spread in the middle 50% of the data.
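Reading N/4, N/2 and 3N/4 off the diagram can be approximated in code. The sketch below joins the plotted points with straight lines (linear interpolation) rather than a hand-drawn smooth curve, so its answers may differ slightly from values read off a curve. The data and the function name `estimate_value` are assumptions made for illustration.

```python
def estimate_value(target_cf, boundaries, cumulative):
    """Estimate the x-value where the cumulative frequency reaches target_cf.

    boundaries: class boundaries, starting with the lower boundary of the first class.
    cumulative: cumulative frequency at each boundary, starting with 0.
    Straight lines between points are assumed (an approximation to the curve).
    """
    points = list(zip(boundaries, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target_cf <= c1:
            fraction = (target_cf - c0) / (c1 - c0)
            return x0 + fraction * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")

# Hypothetical data (not from the lesson).
boundaries = [0, 10, 20, 30, 40, 50]
cumulative = [0, 4, 13, 28, 36, 40]
n = cumulative[-1]

q1 = estimate_value(n / 4, boundaries, cumulative)
median = estimate_value(n / 2, boundaries, cumulative)
q3 = estimate_value(3 * n / 4, boundaries, cumulative)
iqr = q3 - q1

print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
```

The loop mirrors the working lines drawn on a graph: go across from the cumulative frequency axis to the line, then down to the x-axis.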
A box plot is a standardised way of displaying the distribution of data based on a five-number summary: the minimum value, the lower quartile (Q1), the median (Q2), the upper quartile (Q3), and the maximum value. The 'box' represents the middle 50% of the data (from Q1 to Q3), with a line at the median. The 'whiskers' extend from the box to the minimum and maximum values.
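For raw data, a five-number summary can be sketched with Python's standard library. The data below is invented for illustration. The quartile convention is also an assumption: `method="exclusive"` places the quartiles at the (n+1)/4, (n+1)/2 and 3(n+1)/4 positions of the ordered data, and other conventions can give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical raw data (not from the lesson), sorted into order.
data = sorted([12, 7, 3, 18, 9, 15, 5, 20, 11, 8, 14])

# Quartiles at the (n+1)/4, (n+1)/2 and 3(n+1)/4 positions (assumed convention).
q1, median, q3 = quantiles(data, n=4, method="exclusive")

# Five-number summary: minimum, Q1, median, Q3, maximum.
summary = (min(data), q1, median, q3, max(data))
print(summary)
```

With 11 values the quartile positions are the 3rd, 6th and 9th values, so no interpolation between values is needed in this example.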
When comparing two or more distributions, we typically look at two main aspects: central tendency (e.g., using the median) and spread (e.g., using the interquartile range or range). Box plots are particularly useful for visual comparison. A higher median suggests generally higher values, while a smaller IQR suggests more consistent or less varied data.
Key facts to remember
- 1. Cumulative frequency is a running total of frequencies, plotted against the upper class boundary.
- 2. The median is the middle value, found at N/2 on the cumulative frequency axis.
- 3. The lower quartile (Q1) is at N/4 and the upper quartile (Q3) is at 3N/4 on the cumulative frequency axis.
- 4. The interquartile range (IQR) is a measure of spread, calculated as Q3 - Q1.
- 5. A box plot displays the five-number summary: minimum, Q1, median, Q3, and maximum.
- 6. When comparing distributions, comment on both a measure of central tendency (e.g., median) and a measure of spread (e.g., IQR).
Worked examples
Example 1
The table shows the heights of 80 students. Construct a cumulative frequency table and diagram. Use your diagram to estimate the median, lower quartile, upper quartile, and interquartile range.
Answer
From the cumulative frequency diagram (values may vary slightly because they are read from the graph):
- Median (Q2) ≈ 164 cm
- Lower quartile (Q1) ≈ 157 cm
- Upper quartile (Q3) ≈ 170 cm
- Interquartile range (IQR) = 170 - 157 = 13 cm
Ensure your graph axes are labelled correctly and use a smooth curve, not straight lines between points. Show your working lines on the graph.
Example 2
The five-number summary for the weights (in kg) of a group of students is: Minimum = 45, Lower Quartile = 52, Median = 60, Upper Quartile = 68, Maximum = 75. Draw a box plot for this data.
Answer
A box plot showing a horizontal number line from 40 to 80. A box is drawn from 52 to 68. A vertical line is inside the box at 60. A whisker extends from 52 to 45, and another whisker extends from 68 to 75.
The box represents the middle 50% of the data. Each whisker covers roughly a quarter of the data: the lowest 25% (minimum to Q1) and the highest 25% (Q3 to maximum).
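As a quick check before drawing, the key lengths on the box plot in Example 2 can be worked out from the five-number summary. This is a minimal sketch; the variable names are chosen for illustration.

```python
# Five-number summary from Example 2 (weights in kg).
minimum, q1, median, q3, maximum = 45, 52, 60, 68, 75

# A valid summary must be in non-decreasing order.
assert minimum <= q1 <= median <= q3 <= maximum

iqr = q3 - q1                  # width of the box
lower_whisker = q1 - minimum   # length of the left whisker
upper_whisker = maximum - q3   # length of the right whisker
data_range = maximum - minimum

print(f"Box from {q1} to {q3} (IQR = {iqr} kg), median line at {median}")
print(f"Whiskers: {lower_whisker} kg below the box, {upper_whisker} kg above")
print(f"Range = {data_range} kg")
```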
Example 3
Two classes, Class A and Class B, took the same maths test. Their results are summarised by the following box plots:
- Class A: Min = 30, Q1 = 50, Median = 65, Q3 = 80, Max = 95
- Class B: Min = 25, Q1 = 40, Median = 55, Q3 = 70, Max = 90
Compare the performance of the two classes.
Answer
Class A generally performed better than Class B, as indicated by its higher median score (65 compared to 55). The spread of the middle 50% of scores for both classes is similar, as both have an interquartile range of 30, suggesting comparable consistency in their core performance.
Always comment on both a measure of central tendency (like the median) and a measure of spread (like the IQR or range) when comparing distributions.
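The comparison in Example 3 can be sketched as code that produces one statement about the median and one about the IQR. The function names `iqr` and `compare` are illustrative assumptions, and the wording of the statements is only one possible phrasing.

```python
# Five-number summaries copied from Example 3.
class_a = {"min": 30, "q1": 50, "median": 65, "q3": 80, "max": 95}
class_b = {"min": 25, "q1": 40, "median": 55, "q3": 70, "max": 90}

def iqr(summary):
    """Interquartile range: Q3 - Q1."""
    return summary["q3"] - summary["q1"]

def compare(name_1, s1, name_2, s2):
    """Return one sentence about the medians and one about the IQRs."""
    m1, m2 = s1["median"], s2["median"]
    if m1 > m2:
        centre = f"{name_1} has the higher median ({m1} vs {m2})."
    elif m1 < m2:
        centre = f"{name_2} has the higher median ({m2} vs {m1})."
    else:
        centre = f"Both medians are {m1}."

    i1, i2 = iqr(s1), iqr(s2)
    if i1 < i2:
        spread = f"{name_1} is more consistent (IQR {i1} vs {i2})."
    elif i1 > i2:
        spread = f"{name_2} is more consistent (IQR {i2} vs {i1})."
    else:
        spread = f"Both have an IQR of {i1}, so the spread of the middle 50% is similar."
    return centre, spread

centre, spread = compare("Class A", class_a, "Class B", class_b)
print(centre)
print(spread)
```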
Common mistakes
- ✗ Plotting cumulative frequency against the midpoint or lower class boundary instead of the upper class boundary.
- ✗ Drawing straight lines between points on a cumulative frequency diagram instead of a smooth curve.
- ✗ Incorrectly calculating the position for median, Q1, or Q3 (e.g., using N instead of N/2 for median).
- ✗ Confusing the range (maximum - minimum) with the interquartile range (Q3 - Q1).
- ✗ Only comparing one aspect (e.g., just the median) when asked to compare two distributions, rather than both central tendency and spread.
Exam tips
- ★ Always use a pencil for diagrams, and a ruler for axes, box plots, and working lines. Draw the cumulative frequency curve itself freehand as a smooth curve.
- ★ Clearly show your working on cumulative frequency diagrams by drawing horizontal and vertical lines from the axes to the curve and then to the other axis to indicate how you found your values.
- ★ Read values from your graph as accurately as possible. Be prepared for a small tolerance in answers due to graph reading.
- ★ When comparing distributions, ensure you make at least two comparative statements: one about central tendency (e.g., median) and one about spread (e.g., IQR). Use comparative language like 'higher', 'lower', 'more consistent', 'more varied'.
