Statistics

Cumulative Frequency and Box Plots (Higher)

Year 10 · Year 11

By the end of this lesson, students will be able to:

  • construct and interpret cumulative frequency tables and diagrams;
  • estimate the median, lower quartile, upper quartile and interquartile range from a cumulative frequency diagram;
  • construct and interpret box plots from cumulative frequency data or raw data;
  • compare two or more distributions using cumulative frequency diagrams and/or box plots.

Key concepts

Cumulative Frequency

Cumulative frequency is a running total of the frequencies in a frequency distribution. It tells you how many data values are less than or equal to a particular value; for grouped data, that value is the upper class boundary. To calculate it, add the frequency of each class to the sum of the frequencies of all the preceding classes. The final cumulative frequency equals the total frequency, N.
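The running total described above can be sketched in a few lines of Python. This is a minimal illustration using the class frequencies from Example 1; the function name `cumulative_frequencies` is our own choice, not a standard one.

```python
from itertools import accumulate

def cumulative_frequencies(frequencies):
    """Return the running totals of a list of class frequencies."""
    # accumulate adds each frequency to the sum of all the frequencies before it
    return list(accumulate(frequencies))

# Frequencies from the height table in Example 1 (140-150, 150-160, ..., 180-190 cm)
heights_freq = [8, 20, 32, 16, 4]
print(cumulative_frequencies(heights_freq))  # [8, 28, 60, 76, 80]
```

The last running total (80) equals N, which is a quick check that no class has been missed.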

Cumulative Frequency Diagram (Ogive)

A cumulative frequency diagram plots cumulative frequency (vertical axis) against the upper class boundary of each class interval (horizontal axis). Each point is plotted at an upper class boundary and its cumulative frequency, starting from the lower boundary of the first class at a cumulative frequency of 0, and the points are joined with a smooth curve called an ogive. The diagram is used to estimate the median, the quartiles and the interquartile range.
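The plotting rule above can be turned into a list of coordinates before drawing. This sketch assumes the classes are given as (lower, upper) boundary pairs in increasing order with no gaps; `ogive_points` is an illustrative name, and the smooth curve itself is still drawn by hand.

```python
from itertools import accumulate

def ogive_points(classes, frequencies):
    """Pair each upper class boundary with its cumulative frequency.

    classes is a list of (lower, upper) boundaries in order. The first point is
    (lowest boundary, 0) because no data lies below the first class.
    """
    points = [(classes[0][0], 0)]
    for (_, upper), cf in zip(classes, accumulate(frequencies)):
        points.append((upper, cf))
    return points

classes = [(140, 150), (150, 160), (160, 170), (170, 180), (180, 190)]
print(ogive_points(classes, [8, 20, 32, 16, 4]))
# [(140, 0), (150, 8), (160, 28), (170, 60), (180, 76), (190, 80)]
```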

Median (Q2)

The median is the middle value of a dataset when it is ordered from least to greatest. From a cumulative frequency diagram, the median is estimated by finding the value on the x-axis corresponding to half of the total cumulative frequency (N/2) on the y-axis.

Lower Quartile (Q1)

The lower quartile (Q1) is the value below which 25% of the data lies. From a cumulative frequency diagram, Q1 is estimated by finding the value on the x-axis corresponding to one-quarter of the total cumulative frequency (N/4) on the y-axis.

Upper Quartile (Q3)

The upper quartile (Q3) is the value below which 75% of the data lies. From a cumulative frequency diagram, Q3 is estimated by finding the value on the x-axis corresponding to three-quarters of the total cumulative frequency (3N/4) on the y-axis.
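Reading across from N/4, N/2 and 3N/4 and then down to the horizontal axis can be approximated numerically. This sketch assumes straight-line (linear) interpolation between neighbouring plotted points, which is only a stand-in for reading from a hand-drawn smooth curve, so its answers can differ slightly from graph readings. The helper name `read_ogive` is hypothetical, and the points are the ones plotted in Example 1.

```python
def read_ogive(points, target):
    """Estimate the x-value where the cumulative frequency reaches `target`.

    Uses linear interpolation between neighbouring plotted points, a
    straight-line stand-in for reading from a smooth curve.
    """
    for (x0, cf0), (x1, cf1) in zip(points, points[1:]):
        if cf0 <= target <= cf1 and cf1 > cf0:
            return x0 + (target - cf0) / (cf1 - cf0) * (x1 - x0)
    raise ValueError("target is outside the range of the cumulative frequencies")

points = [(140, 0), (150, 8), (160, 28), (170, 60), (180, 76), (190, 80)]
n = points[-1][1]                      # total frequency N
q1 = read_ogive(points, n / 4)         # 20th value
median = read_ogive(points, n / 2)     # 40th value
q3 = read_ogive(points, 3 * n / 4)     # 60th value
print(q1, median, q3, q3 - q1)  # 156.0 163.75 170.0 14.0
```

Linear interpolation gives Q1 = 156 cm, whereas Example 1 reads about 157 cm from a smooth curve; small differences like this are expected. On the diagram itself, the points should still be joined with a smooth curve.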

Interquartile Range (IQR)

The interquartile range (IQR) is a measure of statistical dispersion, representing the range of the middle 50% of the data. It is calculated as the difference between the upper quartile (Q3) and the lower quartile (Q1). A smaller IQR indicates less spread in the middle 50% of the data.

IQR = Q3 - Q1

Box Plot (Box-and-Whisker Diagram)

A box plot is a standardised way of displaying the distribution of data based on a five-number summary: the minimum value, the lower quartile (Q1), the median (Q2), the upper quartile (Q3), and the maximum value. The 'box' represents the middle 50% of the data (from Q1 to Q3), with a line at the median. The 'whiskers' extend from the box to the minimum and maximum values.
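When a box plot is drawn from raw data, the five-number summary has to be found first. This sketch assumes one common convention for the quartiles of raw data (the median of each half, leaving out the middle value when there is an odd number of values); textbooks, exam boards and software do not all use the same rule, so check the one your course uses. The data values are hypothetical, and `five_number_summary` is an illustrative name. It needs at least two values.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of raw values.

    Assumed quartile convention: Q1 and Q3 are the medians of the lower and
    upper halves of the ordered data, leaving out the middle value when the
    count is odd.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

# Hypothetical raw data, for illustration only
data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
mn, q1, q2, q3, mx = five_number_summary(data)
print(mn, q1, q2, q3, mx)  # 3 6.0 12 16.0 21
print("IQR =", q3 - q1, "Range =", mx - mn)  # IQR = 10.0 Range = 18
```

Printing the IQR next to the range shows that they are different measures of spread: the IQR covers only the middle 50% of the data.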

Comparing Distributions

When comparing two or more distributions, we typically look at two main aspects: central tendency (e.g., using the median) and spread (e.g., using the interquartile range or range). Box plots are particularly useful for visual comparison. A higher median suggests generally higher values, while a smaller IQR suggests more consistent or less varied data.
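The two-statement approach above (one comparison of centre, one of spread) can be made mechanical. The function `compare_distributions` is a hypothetical helper that works from five-number summaries; here it uses the Class A and Class B summaries from Example 3. In an exam, the statements should also be interpreted in context, for example what a higher median means for test scores.

```python
def compare_distributions(name_a, summary_a, name_b, summary_b):
    """Return two comparative statements: one on the median, one on the IQR.

    Each summary is a five-number summary (minimum, Q1, median, Q3, maximum).
    """
    _, q1_a, med_a, q3_a, _ = summary_a
    _, q1_b, med_b, q3_b, _ = summary_b
    iqr_a, iqr_b = q3_a - q1_a, q3_b - q1_b

    # Central tendency: a higher median suggests generally higher values.
    if med_a == med_b:
        centre = f"{name_a} and {name_b} have the same median ({med_a})."
    elif med_a > med_b:
        centre = f"{name_a} has a higher median than {name_b} ({med_a} vs {med_b})."
    else:
        centre = f"{name_b} has a higher median than {name_a} ({med_b} vs {med_a})."

    # Spread: a smaller IQR suggests more consistent values.
    if iqr_a == iqr_b:
        spread = f"Both have the same IQR ({iqr_a}), so the middle 50% is equally spread."
    elif iqr_a < iqr_b:
        spread = f"{name_a} has a smaller IQR ({iqr_a} vs {iqr_b}), so it is more consistent."
    else:
        spread = f"{name_b} has a smaller IQR ({iqr_b} vs {iqr_a}), so it is more consistent."
    return centre, spread

class_a = (30, 50, 65, 80, 95)
class_b = (25, 40, 55, 70, 90)
for statement in compare_distributions("Class A", class_a, "Class B", class_b):
    print(statement)
# Class A has a higher median than Class B (65 vs 55).
# Both have the same IQR (30), so the middle 50% is equally spread.
```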

Key facts to remember

  • Cumulative frequency is a running total of frequencies, plotted against the upper class boundary.
  • The median is the middle value, found at N/2 on the cumulative frequency axis.
  • The lower quartile (Q1) is at N/4 and the upper quartile (Q3) is at 3N/4 on the cumulative frequency axis.
  • The interquartile range (IQR) is a measure of spread, calculated as Q3 - Q1.
  • A box plot displays the five-number summary: minimum, Q1, median, Q3 and maximum.
  • When comparing distributions, comment on both a measure of central tendency (e.g., median) and a measure of spread (e.g., IQR).

Worked examples

Example 1

The table shows the heights of 80 students. Construct a cumulative frequency table and diagram. Use your diagram to estimate the median, lower quartile, upper quartile, and interquartile range.

1. First, create a cumulative frequency table:

   Height (h cm) | Frequency | Cumulative frequency
   --------------|-----------|---------------------
   140 < h ≤ 150 | 8         | 8
   150 < h ≤ 160 | 20        | 8 + 20 = 28
   160 < h ≤ 170 | 32        | 28 + 32 = 60
   170 < h ≤ 180 | 16        | 60 + 16 = 76
   180 < h ≤ 190 | 4         | 76 + 4 = 80

2. Next, plot the cumulative frequency against the upper class boundary. Because the first class starts at 140 cm, begin at (140, 0); the points to plot are (140, 0), (150, 8), (160, 28), (170, 60), (180, 76) and (190, 80).
3. Draw a smooth curve through these points.
4. To find the median (Q2): the total frequency is N = 80, so the median is at N/2 = 80/2 = 40, the 40th value. Draw a horizontal line from 40 on the cumulative frequency axis to the curve, then a vertical line down to the height axis, and read off the value.
5. To find the lower quartile (Q1): Q1 is at N/4 = 80/4 = 20, the 20th value. Draw a horizontal line from 20 to the curve, then down to the height axis, and read off the value.
6. To find the upper quartile (Q3): Q3 is at 3N/4 = 3 × 80/4 = 60, the 60th value. Draw a horizontal line from 60 to the curve, then down to the height axis, and read off the value.
7. Calculate the interquartile range: IQR = Q3 - Q1.

Answer

From the cumulative frequency diagram (values may vary slightly depending on how the graph is read):
  • Median (Q2) ≈ 164 cm
  • Lower quartile (Q1) ≈ 157 cm
  • Upper quartile (Q3) ≈ 170 cm
  • Interquartile range (IQR) = 170 - 157 = 13 cm

Ensure your graph axes are labelled correctly and use a smooth curve, not straight lines between points. Show your working lines on the graph.

Example 2

The five-number summary for the weights (in kg) of a group of students is: Minimum = 45, Lower Quartile = 52, Median = 60, Upper Quartile = 68, Maximum = 75. Draw a box plot for this data.

1. Draw a number line with a suitable scale covering 40 to 80 kg.
2. Mark the five key values above the number line: minimum (45), Q1 (52), median (60), Q3 (68) and maximum (75).
3. Draw a box from Q1 (52) to Q3 (68).
4. Draw a vertical line inside the box at the median (60).
5. Draw the whiskers (horizontal lines) from Q1 to the minimum (52 to 45) and from Q3 to the maximum (68 to 75).

Answer

A box plot showing a horizontal number line from 40 to 80. A box is drawn from 52 to 68. A vertical line is inside the box at 60. A whisker extends from 52 to 45, and another whisker extends from 68 to 75.

The box represents the middle 50% of the data; each whisker covers the remaining 25% at its end (the lowest quarter on the left, the highest quarter on the right).
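The construction steps above can be mimicked in plain text as a quick check of where each part of the plot goes. This is a rough sketch, not a substitute for a ruled diagram: `text_box_plot` is a hypothetical helper, the scale is assumed to be linear, and positions are rounded to the nearest character, so values that are close together can land on the same character.

```python
def text_box_plot(summary, low, high, width=41):
    """Return a one-line text sketch of a box plot for a five-number summary.

    The number line runs from `low` to `high` across `width` characters.
    """
    mn, q1, med, q3, mx = summary

    def col(value):
        # Map a data value to a character position on the number line.
        return round((value - low) / (high - low) * (width - 1))

    chars = [" "] * width
    for i in range(col(mn), col(mx) + 1):
        chars[i] = "-"                         # whiskers: minimum to maximum
    for i in range(col(q1), col(q3) + 1):
        chars[i] = "="                         # box: Q1 to Q3
    chars[col(mn)] = chars[col(mx)] = "|"      # ends of the whiskers
    chars[col(q1)], chars[col(q3)] = "[", "]"  # edges of the box
    chars[col(med)] = "|"                      # median line inside the box
    return "".join(chars).rstrip()

# Five-number summary from Example 2, on a scale from 40 to 80 kg
print(text_box_plot((45, 52, 60, 68, 75), 40, 80))
# prints "     |------[=======|=======]------|"
```

With the default width of 41 characters and a scale from 40 to 80, each character represents 1 kg, so the left whisker end sits 5 characters in, at 45 kg.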

Example 3

Two classes, Class A and Class B, took the same maths test. Their results are summarised by the following box plots:
  • Class A: Min = 30, Q1 = 50, Median = 65, Q3 = 80, Max = 95
  • Class B: Min = 25, Q1 = 40, Median = 55, Q3 = 70, Max = 90
Compare the performance of the two classes.

1. Read off the median for each class: Median A = 65, Median B = 55.
2. Calculate the interquartile range (IQR) for each class: IQR A = Q3 - Q1 = 80 - 50 = 30; IQR B = Q3 - Q1 = 70 - 40 = 30.
3. Compare the medians: Class A has a higher median score (65) than Class B (55). This suggests that, on average, students in Class A performed better on the test.
4. Compare the IQRs: both classes have the same interquartile range (30), so the middle 50% of scores is equally spread in each class and the two classes are similarly consistent.

Answer

Class A generally performed better than Class B, as indicated by its higher median score (65 compared to 55). The spread of the middle 50% of scores for both classes is similar, as both have an interquartile range of 30, suggesting comparable consistency in their core performance.

Always comment on both a measure of central tendency (like the median) and a measure of spread (like the IQR or range) when comparing distributions.

Common mistakes

  • Plotting cumulative frequency against the midpoint or lower class boundary instead of the upper class boundary.
  • Drawing straight lines between points on a cumulative frequency diagram instead of a smooth curve.
  • Incorrectly calculating the position for median, Q1, or Q3 (e.g., using N instead of N/2 for median).
  • Confusing the range (maximum - minimum) with the interquartile range (Q3 - Q1).
  • Only comparing one aspect (e.g., just the median) when asked to compare two distributions, rather than both central tendency and spread.

Exam tips

  • Use a sharp pencil for all diagrams and a ruler for axes, box plots and reading-off lines. Draw cumulative frequency curves freehand as smooth curves.
  • Clearly show your working on cumulative frequency diagrams by drawing horizontal and vertical lines from the axes to the curve and then to the other axis to indicate how you found your values.
  • Read values from your graph as accurately as possible. Answers read from a graph are normally accepted within a small tolerance, but careless readings can fall outside it.
  • When comparing distributions, ensure you make at least two comparative statements: one about central tendency (e.g., median) and one about spread (e.g., IQR). Use comparative language like 'higher', 'lower', 'more consistent', 'more varied'.
