Strand 1 — Statistics & Probability

Descriptive Statistics

5th Year · 6th Year (Leaving Cert)

By the end of this lesson, students will be able to:

  • calculate and interpret measures of central tendency (mean, median, mode) for both ungrouped and grouped data.
  • calculate and interpret measures of spread (range, standard deviation, interquartile range) for both ungrouped and grouped data.
  • determine quartiles (Q1, Q2, Q3) for a given data set.
  • construct and interpret box plots using the five-number summary.
  • choose appropriate descriptive statistics to summarise and compare data sets.

Key concepts

Mean (Arithmetic Mean)

The mean is the sum of all values in a data set divided by the number of values. It is a measure of central tendency. For grouped data, the mid-interval value of each class stands in for every value in that class, so the grouped mean is an estimate rather than an exact value.

Ungrouped: \(\bar{x} = \frac{\sum x}{n}\)
Grouped: \(\bar{x} = \frac{\sum fx}{\sum f}\)
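If you want to check these two formulas away from the calculator, here is a minimal Python sketch. The function names `mean` and `grouped_mean` are our own, and storing grouped data as (mid-interval value, frequency) pairs is an assumption made for the example; the data are taken from Examples 1 and 2.

```python
def mean(values):
    """Ungrouped mean: x̄ = Σx / n."""
    return sum(values) / len(values)


def grouped_mean(classes):
    """Grouped mean: x̄ = Σfx / Σf.

    `classes` is a list of (mid_interval_value, frequency) pairs. The
    answer is an estimate, because every value in a class is treated
    as if it were equal to the mid-interval value.
    """
    sum_f = sum(f for _, f in classes)
    sum_fx = sum(x * f for x, f in classes)
    return sum_fx / sum_f


scores = [65, 72, 88, 60, 75, 92, 68, 80, 75, 70]   # Example 1
heights = [(152.5, 6), (157.5, 10), (162.5, 18),     # Example 2
           (167.5, 12), (172.5, 4)]

print(mean(scores))           # 74.5
print(grouped_mean(heights))  # 162.3
```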
Median

The median is the middle value of a data set when the data is arranged in ascending or descending order. If there is an even number of values, the median is the average of the two middle values. It is a measure of central tendency that is less affected by outliers than the mean. For grouped data, the median class is the class interval that contains the \((\frac{n}{2})^{th}\) value, located using the cumulative frequency.
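A minimal Python sketch of both ideas follows. `median` and `median_class` are hypothetical helper names, and storing the grouped table as (interval label, frequency) pairs is an assumption. For the heights in Example 2, n/2 = 25 and the cumulative frequencies run 6, 16, 34, ..., so the 25th value falls in the 160-165 class.

```python
def median(values):
    """Order the data, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_class(classes):
    """Return the class containing the (n/2)th value.

    `classes` is a list of (interval_label, frequency) pairs in
    ascending order; the running total is the cumulative frequency.
    """
    n = sum(f for _, f in classes)
    cumulative = 0
    for label, f in classes:
        cumulative += f
        if cumulative >= n / 2:
            return label
    raise ValueError("no data")


scores = [65, 72, 88, 60, 75, 92, 68, 80, 75, 70]
height_classes = [("150-155", 6), ("155-160", 10), ("160-165", 18),
                  ("165-170", 12), ("170-175", 4)]

print(median(scores))                # 73.5
print(median_class(height_classes))  # 160-165
```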

Mode

The mode is the value that appears most frequently in a data set. A data set can have one mode (unimodal), more than one mode (multimodal), or no mode. For grouped data, the modal class is the class interval with the highest frequency.
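A short Python sketch of the mode and the modal class, using `collections.Counter` from the standard library to count frequencies. The function names are our own, and treating "every value occurs exactly once" as "no mode" is an assumption about how to apply the definition above.

```python
from collections import Counter


def modes(values):
    """Return every value with the highest frequency, in ascending order.

    Assumption: if every value occurs exactly once, we say there is
    no mode and return an empty list.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


def modal_class(classes):
    """Return the class interval(s) with the highest frequency."""
    top = max(f for _, f in classes)
    return [label for label, f in classes if f == top]


scores = [65, 72, 88, 60, 75, 92, 68, 80, 75, 70]
height_classes = [("150-155", 6), ("155-160", 10), ("160-165", 18),
                  ("165-170", 12), ("170-175", 4)]

print(modes(scores))                # [75]
print(modes([1, 1, 2, 2, 3]))       # [1, 2]  (two modes)
print(modal_class(height_classes))  # ['160-165']
```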

Range

The range is the difference between the maximum and minimum values in a data set. It is a simple measure of spread.

Range = Maximum Value - Minimum Value
Standard Deviation (\(\sigma\))

The standard deviation measures how spread out the data points are around the mean; roughly, it is the typical distance of a data point from the mean. A low standard deviation indicates that data points tend to be close to the mean, while a high standard deviation indicates that they are spread out over a wider range of values. It is the square root of the variance, which is the mean of the squared deviations from the mean.

Ungrouped: \(\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \text{ or } \sigma = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2}\)
Grouped: \(\sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} \text{ or } \sigma = \sqrt{\frac{\sum fx^2}{\sum f} - (\bar{x})^2}\)
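The sketch below, a minimal Python version of these formulas, shows that the two ungrouped forms give the same σ. All three are population standard deviations (dividing by n or \(\sum f\)), matching \(\sigma\) above; the function names and the (mid-interval value, frequency) storage for grouped data are our own assumptions.

```python
import math


def std_dev(values):
    """σ = sqrt(Σ(x - x̄)² / n): the square root of the variance."""
    n = len(values)
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / n
    return math.sqrt(variance)


def std_dev_shortcut(values):
    """The same σ from the second form, sqrt(Σx²/n - x̄²)."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - x_bar ** 2)


def grouped_std_dev(classes):
    """σ = sqrt(Σf(x - x̄)² / Σf) for (mid_interval_value, frequency) pairs."""
    sum_f = sum(f for _, f in classes)
    x_bar = sum(x * f for x, f in classes) / sum_f
    return math.sqrt(sum(f * (x - x_bar) ** 2 for x, f in classes) / sum_f)


scores = [65, 72, 88, 60, 75, 92, 68, 80, 75, 70]
heights = [(152.5, 6), (157.5, 10), (162.5, 18), (167.5, 12), (172.5, 4)]

print(round(std_dev(scores), 3))           # 9.426
print(round(std_dev_shortcut(scores), 3))  # 9.426
print(round(grouped_std_dev(heights), 3))
```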
Quartiles (Q1, Q2, Q3) & Interquartile Range (IQR)

Quartiles divide an ordered data set into four equal parts. Q1 (Lower Quartile) is the median of the lower half of the data (25th percentile). Q2 is the median of the entire data set (50th percentile). Q3 (Upper Quartile) is the median of the upper half of the data (75th percentile). The Interquartile Range (IQR) is the difference between the upper and lower quartiles, representing the spread of the middle 50% of the data.

IQR = Q3 - Q1
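A minimal Python sketch of the median-of-each-half method described above (the names `quartiles` and `_median` are hypothetical). Calculators and software packages do not all use this convention, so they can give slightly different quartiles for the same data.

```python
def _median(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half method.

    When n is odd, the middle value (Q2) is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return _median(lower), _median(ordered), _median(upper)


goals = [1, 3, 0, 2, 4, 1, 1, 3, 2, 0, 5, 1, 2, 3, 1]  # Example 3
q1, q2, q3 = quartiles(goals)
print(q1, q2, q3)        # 1 2 3
print("IQR =", q3 - q1)  # IQR = 2
```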
Box Plots (Box-and-Whisker Plots)

A box plot is a graphical representation of the five-number summary of a data set: minimum value, Q1, median (Q2), Q3, and maximum value. It provides a visual summary of the centre, spread, and skewness of the data.
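In an exam the box plot is drawn on paper, but a rough text version can help you check that a five-number summary looks sensible. This is only a sketch: `text_box_plot` is a hypothetical helper, not a library function, and it assumes the number line runs from `lo` to `hi` with `lo` at or below the minimum and `hi` at or above the maximum.

```python
import math


def text_box_plot(summary, lo, hi, width=41):
    """Draw a rough text box plot of (min, Q1, median, Q3, max).

    `lo` and `hi` are the ends of the number line (lo <= min, max <= hi,
    lo < hi). Each value is placed in the nearest of `width` character
    columns, and whole-number tick labels are written underneath.
    """
    minimum, q1, q2, q3, maximum = summary

    def col(v):
        return round((v - lo) * (width - 1) / (hi - lo))

    row = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        row[c] = "-"                               # whisker line from min to max
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="                               # the box covers Q1 to Q3
    row[col(minimum)] = row[col(maximum)] = "|"    # whisker ends
    row[col(q1)], row[col(q3)] = "[", "]"          # box edges
    row[col(q2)] = "|"                             # median, drawn last so it always shows

    axis = [" "] * width
    for tick in range(math.ceil(lo), math.floor(hi) + 1):
        for i, ch in enumerate(str(tick)):
            if col(tick) + i < width:
                axis[col(tick) + i] = ch
    return "".join(row) + "\n" + "".join(axis).rstrip()


# Five-number summary from Example 3: (min, Q1, median, Q3, max)
print(text_box_plot((0, 1, 2, 3, 5), lo=0, hi=5, width=21))
# |---[===|===]-------|
# 0   1   2   3   4   5
```

The text scale is crude (labels with several digits can overlap), so it is a check, not a replacement for a properly labelled drawing.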

Key facts to remember

  • Always order data in ascending or descending order before finding the median or quartiles.
  • The mean is sensitive to extreme values (outliers), while the median is resistant to them.
  • The standard deviation measures the typical distance of data points from the mean.
  • The Interquartile Range (IQR) measures the spread of the middle 50% of the data, making it robust to outliers.
  • For grouped data, use the mid-interval value to represent each class when calculating mean and standard deviation.
  • A box plot visually summarises the minimum, Q1, median, Q3, and maximum values, providing insight into data distribution and skewness.
  • The sum of deviations from the mean, \(\sum (x - \bar{x})\), is always zero.
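The last fact follows from one line of algebra: since \(n\bar{x} = \sum x\), we get \(\sum (x - \bar{x}) = \sum x - n\bar{x} = 0\). A minimal Python sketch confirms it on the data from Examples 1 and 3. It uses exact fractions from the standard-library `fractions` module, so that a recurring mean such as 29/15 (the goals data) is not rounded; the function name is our own.

```python
from fractions import Fraction


def sum_of_deviations(values):
    """Return Σ(x - x̄) using exact fractions, so no rounding error creeps in."""
    exact = [Fraction(v) for v in values]
    x_bar = sum(exact) / len(exact)
    return sum(x - x_bar for x in exact)


print(sum_of_deviations([65, 72, 88, 60, 75, 92, 68, 80, 75, 70]))         # 0
print(sum_of_deviations([1, 3, 0, 2, 4, 1, 1, 3, 2, 0, 5, 1, 2, 3, 1]))    # 0
```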

Worked examples

Example 1

The following are the scores of 10 students in a maths test: 65, 72, 88, 60, 75, 92, 68, 80, 75, 70. Calculate the mean, median, mode, range, and standard deviation for this data set.

1. **Order the data:** 60, 65, 68, 70, 72, 75, 75, 80, 88, 92.
2. **Calculate the Mean (\(\bar{x}\)):**
   \(\sum x = 60 + 65 + 68 + 70 + 72 + 75 + 75 + 80 + 88 + 92 = 745\)
   \(n = 10\)
   \(\bar{x} = \frac{\sum x}{n} = \frac{745}{10} = 74.5\)
3. **Calculate the Median:** There are 10 data points (even number). The median is the average of the 5th and 6th values.
   5th value = 72, 6th value = 75
   Median = \(\frac{72 + 75}{2} = \frac{147}{2} = 73.5\)
4. **Calculate the Mode:** The value that appears most frequently is 75 (it appears twice).
   Mode = 75
5. **Calculate the Range:**
   Maximum value = 92, Minimum value = 60
   Range = 92 - 60 = 32
6. **Calculate the Standard Deviation (\(\sigma\)):** Using the formula \(\sigma = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2}\)
   \(\sum x^2 = 60^2 + 65^2 + 68^2 + 70^2 + 72^2 + 75^2 + 75^2 + 80^2 + 88^2 + 92^2\)
   \(\sum x^2 = 3600 + 4225 + 4624 + 4900 + 5184 + 5625 + 5625 + 6400 + 7744 + 8464 = 56391\)
   \(\sigma = \sqrt{\frac{56391}{10} - (74.5)^2}\)
   \(\sigma = \sqrt{5639.1 - 5550.25}\)
   \(\sigma = \sqrt{88.85} \approx 9.426\) (to 3 decimal places)

Answer

Mean = 74.5, Median = 73.5, Mode = 75, Range = 32, Standard Deviation \(\approx 9.426\)

Always order the data first for median and quartiles. Use your calculator's statistical functions to check your mean and standard deviation.
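Besides a calculator, Python's standard-library `statistics` module can confirm every answer in Example 1. `pstdev` is the population standard deviation, which matches \(\sigma\) in this lesson, and `multimode` (Python 3.8 or later) returns all modes. A minimal sketch:

```python
import statistics

scores = [65, 72, 88, 60, 75, 92, 68, 80, 75, 70]

print(statistics.mean(scores))              # 74.5
print(statistics.median(scores))            # 73.5
print(statistics.multimode(scores))         # [75]
print(max(scores) - min(scores))            # 32  (range)
print(round(statistics.pstdev(scores), 3))  # 9.426
```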

Example 2

The table below shows the distribution of heights (in cm) of 50 students. Calculate the mean height and the standard deviation of the heights.

| Height (cm) | Frequency (f) | Mid-interval (x) | fx | fx² |
|-------------|---------------|------------------|----|-----|
| 150-155 | 6 | 152.5 | 915 | 139537.5 |
| 155-160 | 10 | 157.5 | 1575 | 248062.5 |
| 160-165 | 18 | 162.5 | 2925 | 475312.5 |
| 165-170 | 12 | 167.5 | 2010 | 336675 |
| 170-175 | 4 | 172.5 | 690 | 119025 |
| **Total** | **\(\sum f = 50\)** | | **\(\sum fx = 8115\)** | **\(\sum fx^2 = 1318612.5\)** |

1. **Calculate the Mid-interval (x) for each class:** (Lower bound + Upper bound) / 2.
   e.g., for 150-155, x = (150 + 155)/2 = 152.5
2. **Calculate fx and fx² for each class:** Multiply the frequency by the mid-interval value (fx) and by the square of the mid-interval value (fx²).
   e.g., for 155-160, fx² = 10 × 157.5² = 10 × 24806.25 = 248062.5
3. **Calculate \(\sum f\), \(\sum fx\), and \(\sum fx^2\):** Sum the respective columns.
4. **Calculate the Mean (\(\bar{x}\)):**
   \(\bar{x} = \frac{\sum fx}{\sum f} = \frac{8115}{50} = 162.3\)
5. **Calculate the Standard Deviation (\(\sigma\)):** Using the formula \(\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - (\bar{x})^2}\)
   \(\sigma = \sqrt{\frac{1318612.5}{50} - (162.3)^2}\)
   \(\sigma = \sqrt{26372.25 - 26341.29}\)
   \(\sigma = \sqrt{30.96} \approx 5.564\) (to 3 decimal places)

Answer

Mean height \(\approx 162.3\) cm, Standard Deviation \(\approx 5.564\) cm

Ensure you use the mid-interval values (x) for calculations involving grouped data, not the class boundaries.
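As an extra check on Example 2, this minimal Python sketch rebuilds the whole table from the class intervals and frequencies alone, so a slip in any fx or fx² entry would show up in the totals. It assumes each class is stored as a (lower, upper) pair and, as in step 1, that the mid-interval value is (lower + upper) / 2.

```python
import math

# (class interval, frequency) from the table in Example 2
table = [((150, 155), 6), ((155, 160), 10), ((160, 165), 18),
         ((165, 170), 12), ((170, 175), 4)]

sum_f = sum_fx = sum_fx2 = 0
for (lower, upper), f in table:
    x = (lower + upper) / 2        # mid-interval value
    sum_f += f
    sum_fx += f * x
    sum_fx2 += f * x * x
    print(f"{lower}-{upper}: x = {x}, fx = {f * x}, fx² = {f * x * x}")

x_bar = sum_fx / sum_f
sigma = math.sqrt(sum_fx2 / sum_f - x_bar ** 2)
print(sum_f, sum_fx, sum_fx2)  # 50 8115.0 1318612.5
print(x_bar, round(sigma, 3))  # 162.3 5.564
```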

Example 3

The numbers of goals scored by a football team in its last 15 matches were: 1, 3, 0, 2, 4, 1, 1, 3, 2, 0, 5, 1, 2, 3, 1. Find the five-number summary and draw a box plot for this data.

1. **Order the data:** 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 5. (n = 15)
2. **Find the Minimum Value:** Min = 0
3. **Find the Maximum Value:** Max = 5
4. **Find the Median (Q2):** The median is in position \(\frac{n+1}{2} = \frac{15+1}{2} = 8\), i.e. the 8th value.
   Q2 = 2 (the 8th value in the ordered list)
5. **Find the Lower Quartile (Q1):** Q1 is the median of the lower half of the data (values before Q2).
   Lower half: 0, 0, 1, 1, 1, 1, 1 (7 values). Its median is in position \(\frac{7+1}{2} = 4\).
   Q1 = 1 (the 4th value in the lower half)
6. **Find the Upper Quartile (Q3):** Q3 is the median of the upper half of the data (values after Q2).
   Upper half: 2, 2, 3, 3, 3, 4, 5 (7 values). Its median is in position \(\frac{7+1}{2} = 4\).
   Q3 = 3 (the 4th value in the upper half)
7. **Five-number summary:** Min = 0, Q1 = 1, Median = 2, Q3 = 3, Max = 5.
8. **Calculate IQR:** IQR = Q3 - Q1 = 3 - 1 = 2.
9. **Draw the Box Plot:**
   * Draw a number line covering the range of the data (0 to 5).
   * Draw a box from Q1 (1) to Q3 (3).
   * Draw a vertical line inside the box at the Median (2).
   * Draw 'whiskers' from Q1 to the Minimum (0) and from Q3 to the Maximum (5).
   (The box plot itself would be drawn on paper, with a clear scale and labels.)

Answer

Five-number summary: Minimum = 0, Q1 = 1, Median = 2, Q3 = 3, Maximum = 5. IQR = 2. (Box plot would be drawn as described in step 9).

When finding quartiles, if 'n' is odd, the median (Q2) is included in neither the lower nor upper half for finding Q1 and Q3. If 'n' is even, the data is simply split into two equal halves.
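The steps of Example 3 can be checked with this short Python sketch (the function names are our own). Taking the last `half` values as the upper half automatically skips the median when n is odd, which matches the method in steps 5 and 6; it assumes at least two data values.

```python
def median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(min, Q1, median, Q3, max); the median is left out of both halves when n is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]   # the first `half` values
    upper = ordered[-half:]  # the last `half` values
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


goals = [1, 3, 0, 2, 4, 1, 1, 3, 2, 0, 5, 1, 2, 3, 1]
summary = five_number_summary(goals)
print(summary)                           # (0, 1, 2, 3, 5)
print("IQR =", summary[3] - summary[1])  # IQR = 2
```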

Common mistakes

  • Not ordering data before calculating the median or quartiles.
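  • Confusing the population standard deviation (\(\sigma\), which divides by n, as in the formulas above) with the sample standard deviation (which divides by n - 1). The NCCA course typically focuses on the population version unless a question specifies otherwise.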
  • Incorrectly calculating mid-interval values for grouped data.
  • Errors in using calculator functions for statistics, or not showing manual steps when required.
  • Drawing box plots without a clear, labelled scale or incorrectly plotting the five-number summary points.
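To see the population/sample mix-up concretely, Python's standard-library `statistics` module provides both: `pstdev` divides by n (the \(\sigma\) used in this lesson) and `stdev` divides by n - 1. A minimal sketch with the Example 1 scores:

```python
import statistics

scores = [65, 72, 88, 60, 75, 92, 68, 80, 75, 70]

sigma = statistics.pstdev(scores)  # population: divide by n
s = statistics.stdev(scores)       # sample: divide by n - 1

print(round(sigma, 3))  # 9.426
print(round(s, 3))      # 9.936
```

The sample value is always a little larger, because dividing by n - 1 instead of n makes the variance bigger by a factor of n / (n - 1).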

Exam tips

  • Read the question carefully to determine if the data is ungrouped or grouped, as different formulas and methods apply.
  • Always show your working step-by-step, even if you use a calculator for the final answer. This is crucial for gaining full marks.
  • Familiarise yourself with your scientific calculator's statistical functions (e.g., 'STAT' mode) to quickly verify your manual calculations for mean and standard deviation.
  • When drawing graphs like box plots, ensure all axes are labelled, a clear scale is used, and all points of the five-number summary are accurately represented.
