Strand 1 — Statistics & Probability

Inferential Statistics

5th Year · 6th Year (Leaving Cert)

  • By the end of this lesson students will be able to understand the characteristics and applications of the Normal Distribution.
  • By the end of this lesson students will be able to apply the Empirical Rule to estimate probabilities within a normal distribution.
  • By the end of this lesson students will be able to calculate and interpret z-scores to standardise data.
  • By the end of this lesson students will be able to calculate the margin of error for a sample proportion.
  • By the end of this lesson students will be able to (HL) conduct hypothesis tests for proportions using the p-value approach and (HL) construct and interpret confidence intervals for population proportions.

Key concepts

Normal Distribution

A continuous probability distribution that is symmetric about its mean, bell-shaped, and asymptotic to the x-axis. Many natural measurements, such as heights, weights, and exam scores, are approximately normally distributed. The distribution is fully defined by its mean (μ) and standard deviation (σ). The total area under the curve is equal to 1.

The Empirical Rule (68-95-99.7 Rule)

For a normal distribution, approximately 68% of the data falls within 1 standard deviation of the mean, 95% falls within 2 standard deviations of the mean, and 99.7% falls within 3 standard deviations of the mean.
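The three percentages can be checked numerically. The following is a minimal sketch that uses Python's standard-library `statistics.NormalDist`; the helper name `area_within` is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k standard deviations."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The rounded values 68%, 95% and 99.7% are therefore approximations, which is why the rule gives estimates rather than exact probabilities.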

z-scores

A z-score (also called a standard score) measures how many standard deviations an individual data point (x) is from the mean (μ) of its distribution. A positive z-score indicates the data point is above the mean, while a negative z-score indicates it is below the mean. Z-scores allow for comparison of data from different normal distributions.

z = (x - μ) / σ
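
As a small illustration of comparing results from different distributions, the sketch below standardises two exam marks. The subjects, means and standard deviations are hypothetical values chosen for this example, and `z_score` is an illustrative helper name.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical data: a maths mark of 78 (mean 70, sd 8)
# and a physics mark of 66 (mean 60, sd 4).
maths_z = z_score(78, mu=70, sigma=8)
physics_z = z_score(66, mu=60, sigma=4)

print(maths_z)    # 1.0
print(physics_z)  # 1.5
```

Although the maths mark is higher, the physics mark is further above its own mean in standard-deviation units, so it is the stronger result relative to the class.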

Margin of Error (for proportions)

The margin of error (E) is the largest difference we expect, at a stated confidence level, between the sample proportion (p̂) and the true population proportion (p). At the 95% confidence level it is often approximated as 1/√n for a sample of size n. This approximation is commonly used at Ordinary Level or for quick estimates. At Higher Level, the margin of error is calculated more precisely as z * √(p̂(1-p̂)/n), where z is the critical value for the chosen confidence level; this is the half-width of the confidence interval.

E ≈ 1/√n (for 95% confidence, as an approximation)
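
One way to see where 1/√n comes from: p̂(1-p̂) is largest when p̂ = 0.5, where the precise 95% margin is 1.96 × 0.5/√n = 0.98/√n, which rounds up to 1/√n. The sketch below compares the two formulas; the function names and the sample size of 1000 are assumptions made for illustration.

```python
from math import sqrt

def approx_margin(n: int) -> float:
    """Quick 95% margin of error: 1/sqrt(n)."""
    return 1 / sqrt(n)

def precise_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """Margin of error z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

n = 1000  # hypothetical sample size
for p_hat in (0.1, 0.3, 0.5):
    print(f"p_hat = {p_hat}: precise = {precise_margin(p_hat, n):.4f}, "
          f"approx = {approx_margin(n):.4f}")
```

For every value of p̂ the precise margin is no larger than 1/√n, so the approximation is a slightly cautious (wider) estimate.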

Hypothesis Testing (HL)

Hypothesis testing is a statistical method used to decide whether the sample data provide enough evidence to support a claim about the entire population. It involves setting up a null hypothesis (H₀) and an alternative hypothesis (H₁), calculating a test statistic (e.g., a z-score for proportions), finding a p-value, and comparing it to a significance level (α) to make a decision. The null hypothesis represents the status quo or no effect, while the alternative hypothesis represents what we are trying to find evidence for.

z = (p̂ - p) / √(p(1-p)/n)

Confidence Intervals for Proportions (HL)

A confidence interval for a population proportion is a range of values, derived from sample statistics, that is likely to contain the true population proportion with a certain level of confidence (e.g., 95% or 99%). It provides an estimated range of plausible values for the unknown population parameter.

p̂ ± z * √(p̂(1-p̂)/n)
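
The critical value z depends on the confidence level: it cuts off the central area of the standard normal curve, leaving half of the remaining area in each tail. A minimal sketch using the standard-library `statistics.NormalDist` is shown here; `critical_z` is an illustrative helper name.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Critical z-value leaving (1 - confidence)/2 in each tail of the standard normal."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```

A higher confidence level needs a larger z, so the interval becomes wider for the same sample.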

Key facts to remember

  • The Normal Distribution is bell-shaped, symmetric, and defined by its mean (μ) and standard deviation (σ).
  • The Empirical Rule (68-95-99.7 Rule) describes the percentage of data within 1, 2, and 3 standard deviations of the mean in a normal distribution.
  • A z-score standardises a data point by indicating how many standard deviations it is from the mean: z = (x - μ) / σ.
  • The approximate margin of error for a 95% confidence level is E ≈ 1/√n.
  • (HL) Hypothesis testing involves setting up H₀ and H₁, calculating a test statistic, finding a p-value, and comparing it to a significance level (α).
  • (HL) The formula for a confidence interval for a proportion is p̂ ± z * √(p̂(1-p̂)/n), where z is the critical z-value for the desired confidence level.
  • (HL) If the p-value is less than the significance level (α), reject H₀. Otherwise, fail to reject H₀.
  • (HL) The critical z-value for a 95% confidence interval is 1.96.

Worked examples

Example 1

The scores on a Leaving Certificate maths exam are normally distributed with a mean of 70 marks and a standard deviation of 8 marks. What percentage of students scored between 62 and 86 marks?

Identify the mean (μ) and standard deviation (σ): μ = 70, σ = 8.
Calculate the z-score for x = 62:
z₁ = (62 - 70) / 8 = -8 / 8 = -1.
Calculate the z-score for x = 86:
z₂ = (86 - 70) / 8 = 16 / 8 = 2.
Interpret the z-scores using the Empirical Rule:
A score of 62 is 1 standard deviation below the mean.
A score of 86 is 2 standard deviations above the mean.
According to the Empirical Rule:
68% of data is within ±1σ of the mean.
95% of data is within ±2σ of the mean.
The percentage of data between -1σ and +2σ can be found by combining parts of the rule:
Percentage between -1σ and μ = 68% / 2 = 34%.
Percentage between μ and +2σ = 95% / 2 = 47.5%.
Total percentage = 34% + 47.5% = 81.5%.

Answer

81.5% of students scored between 62 and 86 marks.

A sketch of the normal curve with the mean and standard deviations marked can help visualise this problem.
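
As a check on Example 1, the Empirical Rule estimate can be compared with the area given by the normal distribution itself. This is a sketch using the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Exam scores from Example 1: mean 70, standard deviation 8.
scores = NormalDist(mu=70, sigma=8)

# Empirical Rule estimate: half of 68% plus half of 95%.
empirical = 0.68 / 2 + 0.95 / 2

# Area under the normal curve between 62 and 86 marks.
exact = scores.cdf(86) - scores.cdf(62)

print(f"Empirical Rule estimate: {empirical:.3f}")  # 0.815
print(f"Normal curve area:       {exact:.3f}")      # 0.819
```

The two values differ slightly because 68% and 95% are rounded figures. In an exam, 81.5% is the expected answer when the question asks for the Empirical Rule.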

Example 2

In a random sample of 400 voters, 220 stated they would vote for Candidate A. a) Calculate the sample proportion, p̂. b) Calculate the margin of error for this sample proportion using the approximation 1/√n. c) (HL) Construct a 95% confidence interval for the true population proportion of voters who would vote for Candidate A.

a) Calculate the sample proportion (p̂):
p̂ = Number of successes / Sample size = 220 / 400 = 0.55.
b) Calculate the margin of error (E) using the approximation 1/√n:
E = 1 / √n = 1 / √400 = 1 / 20 = 0.05.
c) (HL) Construct a 95% confidence interval for the population proportion:
The formula for a 95% confidence interval is p̂ ± z * √(p̂(1-p̂)/n).
For a 95% confidence level, the critical z-value is 1.96.
Substitute the values: p̂ = 0.55, n = 400, z = 1.96.
Standard error = √(p̂(1-p̂)/n) = √(0.55 * (1 - 0.55) / 400) = √(0.55 * 0.45 / 400) = √(0.2475 / 400) = √0.00061875 ≈ 0.02487.
Margin of error for CI = 1.96 * 0.02487 ≈ 0.04874.
Confidence Interval = 0.55 ± 0.04874.
Lower bound = 0.55 - 0.04874 = 0.50126.
Upper bound = 0.55 + 0.04874 = 0.59874.

Answer

a) The sample proportion, p̂, is 0.55. b) The margin of error using 1/√n is 0.05. c) The 95% confidence interval for the true population proportion is [0.501, 0.599] (rounded to 3 decimal places).

Note the difference between the approximate margin of error (1/√n) and the more precise margin of error used in the confidence interval calculation (z * √(p̂(1-p̂)/n)).
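
The arithmetic in Example 2 can be reproduced with a few lines of Python. This is a sketch for checking the working, with variable names chosen for this illustration.

```python
from math import sqrt

n = 400          # sample size
successes = 220  # voters who chose Candidate A

# a) Sample proportion.
p_hat = successes / n

# b) Approximate 95% margin of error.
approx_margin = 1 / sqrt(n)

# c) 95% confidence interval using the critical value z = 1.96.
z = 1.96
standard_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
lower, upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat}")                          # p_hat = 0.55
print(f"approximate margin = {approx_margin}")     # approximate margin = 0.05
print(f"95% CI = [{lower:.3f}, {upper:.3f}]")      # 95% CI = [0.501, 0.599]
```

Because the whole interval lies above 0.5, the sample suggests, at the 95% confidence level, that Candidate A has majority support, though the lower bound is only just above one half.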

Example 3

(HL) A company claims that 70% of its customers are satisfied with its service. A consumer group surveys 150 customers and finds that 95 of them are satisfied. Test the company's claim at the 5% level of significance.

1. State the null and alternative hypotheses:
H₀: p = 0.70 (The true proportion of satisfied customers is 70%).
H₁: p ≠ 0.70 (The true proportion of satisfied customers is not 70%). This is a two-tailed test.
2. Identify the significance level (α): α = 0.05.
3. Calculate the sample proportion (p̂):
p̂ = Number of satisfied customers / Sample size = 95 / 150 ≈ 0.6333.
4. Calculate the test statistic (z-score):
Under H₀, we assume p = 0.70. So, we use p=0.70 for the standard error calculation.
z = (p̂ - p) / √(p(1-p)/n)
z = (0.6333 - 0.70) / √(0.70 * (1 - 0.70) / 150)
z = (-0.0667) / √(0.70 * 0.30 / 150)
z = (-0.0667) / √(0.21 / 150)
z = (-0.0667) / √0.0014
z = (-0.0667) / 0.037416 ≈ -1.782 (using the unrounded value p̂ = 95/150; the rounded figures shown give -1.783).
5. Find the p-value:
Since this is a two-tailed test, we need to find P(Z < -1.782) + P(Z > 1.782).
Using a z-table or calculator, P(Z < -1.782) ≈ 0.0374.
p-value = 2 * 0.0374 = 0.0748.
6. Make a decision:
Compare the p-value to the significance level: p-value (0.0748) > α (0.05).
Since the p-value is greater than α, we fail to reject the null hypothesis.

Answer

Based on the sample data, with a p-value of 0.0748, which is greater than the significance level of 0.05, we fail to reject the null hypothesis. There is not sufficient evidence to conclude that the true proportion of satisfied customers is different from 70%.

Remember to always state your conclusion in the context of the original problem.
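
The steps of Example 3 can be checked with the sketch below, which uses the standard-library `statistics.NormalDist` in place of a z-table. The variable names are illustrative, and the code keeps the unrounded sample proportion throughout.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.70        # proportion claimed under H0
n = 150          # customers surveyed
satisfied = 95   # satisfied customers in the sample
alpha = 0.05     # significance level

p_hat = satisfied / n

# The standard error uses p0 because the test assumes H0 is true.
standard_error = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Two-tailed p-value: area in both tails beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

reject_h0 = p_value < alpha

print(f"z = {z:.3f}")              # z = -1.782
print(f"p-value = {p_value:.4f}")  # approximately 0.075
print(f"reject H0: {reject_h0}")   # reject H0: False
```

The decision matches the worked solution: the p-value is above 0.05, so the sample does not give sufficient evidence against the company's claim.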

Common mistakes

  • Confusing the standard deviation (σ) with the variance (σ²).
  • Misapplying the Empirical Rule, for example treating the 68%, 95% and 99.7% figures as exact rather than approximate, or forgetting that the remaining percentage is split equally between the two tails.
  • Using the approximate margin of error (1/√n) when a question specifically requires a more precise calculation for a confidence interval (HL).
  • (HL) Incorrectly stating the null (H₀) and alternative (H₁) hypotheses, especially for two-tailed vs. one-tailed tests.
  • (HL) Misinterpreting the p-value or making the wrong conclusion in hypothesis testing (e.g., rejecting H₀ when p-value > α).
  • (HL) Using p̂ (sample proportion) in the standard error calculation for hypothesis testing instead of p (population proportion from H₀), or vice-versa for confidence intervals.

Exam tips

  • Always draw a sketch of the normal distribution curve for problems involving z-scores or the Empirical Rule to help visualise the areas.
  • Clearly state your null (H₀) and alternative (H₁) hypotheses at the beginning of any hypothesis testing problem (HL).
  • Show all steps in your calculations, including formulas, substitutions, and intermediate results, especially for z-scores, standard errors, and confidence intervals.
  • Remember to interpret your final answer in the context of the original problem, explaining what your statistical findings mean.
  • Pay close attention to the significance level (α) given in hypothesis testing questions, as it dictates your decision rule (HL).
