Statistics

Hypothesis Testing

Year 12 · Year 13

By the end of this lesson students will be able to:

  • formulate null and alternative hypotheses for a given problem.
  • perform hypothesis tests for binomial distributions, including calculating critical regions and p-values.
  • perform hypothesis tests for a population mean using the Normal distribution.
  • perform hypothesis tests for Pearson's product-moment correlation coefficient.
  • interpret the results of hypothesis tests in context and understand Type I and Type II errors.

Key concepts

Introduction to Hypothesis Testing

Hypothesis testing is a statistical method used to make decisions about a population parameter based on sample data. It involves setting up two competing hypotheses: the null hypothesis (H₀) and the alternative hypothesis (H₁). The null hypothesis represents a statement of no effect or no change, while the alternative hypothesis represents what we are trying to find evidence for.

We calculate a test statistic from our sample data and compare it to a critical value or use it to find a p-value. The significance level (α) is the probability of incorrectly rejecting the null hypothesis when it is true (Type I error). A one-tailed test is used when the alternative hypothesis specifies a direction (e.g., greater than or less than), while a two-tailed test is used when the alternative hypothesis specifies a difference in either direction (e.g., not equal to).

The critical region is the range of values for the test statistic that would lead to the rejection of H₀. The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated, assuming H₀ is true. If p-value < α, we reject H₀.

Type I and Type II Errors

A Type I error occurs when the null hypothesis is rejected even though it is actually true. The probability of a Type I error is the actual significance level of the test: it equals α for a continuous test statistic and is at most α for a discrete one such as the binomial. A Type II error occurs when the null hypothesis is not rejected even though it is actually false. The probability of a Type II error is denoted by β. We aim to keep both probabilities small but, for a fixed sample size, reducing one generally increases the other.
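
The trade-off between the two error types can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it takes a one-tailed binomial test of H₀: p = 0.2 against H₁: p > 0.2 with n = 50, and it assumes a true proportion of 0.3 purely so that a Type II error probability can be calculated. The function names and the candidate critical regions are hypothetical and are not part of the lesson.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c, n + 1))


def error_probabilities(c: int, n: int, p0: float, p_true: float):
    """Error probabilities for the critical region X >= c.

    Type I:  H0 is true (p = p0) but X falls in the critical region.
    Type II: H0 is false (p = p_true) but X falls outside it.
    """
    type_1 = upper_tail(c, n, p0)
    type_2 = 1 - upper_tail(c, n, p_true)
    return type_1, type_2


# Assumed values, chosen only for illustration.
n, p0, p_true = 50, 0.2, 0.3

for c in (15, 16, 17):
    type_1, type_2 = error_probabilities(c, n, p0, p_true)
    print(f"Reject H0 when X >= {c}: P(Type I) = {type_1:.4f}, P(Type II) = {type_2:.4f}")
```

Moving the boundary of the critical region further into the tail makes a Type I error less likely and a Type II error more likely, which is the trade-off described above.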

Hypothesis Testing for Binomial Distribution

This test is used when we have a fixed number of trials (n), each with two possible outcomes (success/failure), and a constant probability of success (p) under the null hypothesis. We test a hypothesis about the population proportion (p). The test statistic is the number of successes (X) observed in the sample. We calculate the probability of observing a result as extreme as, or more extreme than, the sample result, assuming H₀ is true, using the binomial probability distribution. This probability is the p-value. If the p-value is less than the significance level, we reject H₀.
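
The p-value calculation described above can be sketched in a few lines of Python. This is a minimal sketch using only the standard library, not a replacement for calculator or table methods; the function name `binomial_p_value` and its `tail` parameter are assumptions made for this illustration, and only one-tailed tests are handled.

```python
from math import comb


def binomial_p_value(x_obs: int, n: int, p0: float, tail: str = "upper") -> float:
    """One-tailed p-value for X ~ B(n, p0) under H0.

    tail="upper" returns P(X >= x_obs), used when H1 is p > p0.
    tail="lower" returns P(X <= x_obs), used when H1 is p < p0.
    """
    def pmf(k: int) -> float:
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    if tail == "upper":
        return sum(pmf(k) for k in range(x_obs, n + 1))
    if tail == "lower":
        return sum(pmf(k) for k in range(0, x_obs + 1))
    raise ValueError("tail must be 'upper' or 'lower'")


# Usage: 15 successes observed in 50 trials, testing H0: p = 0.2 against H1: p > 0.2.
alpha = 0.05
p_value = binomial_p_value(15, 50, 0.2, tail="upper")
print(f"p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The tail is summed directly from the binomial probabilities, which mirrors the cumulative probabilities read from tables or a calculator.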

X ~ B(n, p)

Hypothesis Testing for a Population Mean (Normal Distribution)

This test is used to determine whether a sample mean (x̄) is significantly different from a hypothesised population mean (μ₀). It applies when the population is Normally distributed with known standard deviation (σ), or when the sample is large enough (typically n > 30) for the Central Limit Theorem to apply, so that the distribution of the sample mean is approximately Normal. The test statistic (z) measures how many standard errors the sample mean is away from the hypothesised population mean. We compare this z-value to critical values from the standard Normal distribution or use it to find a p-value.
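
The test statistic and its p-value can be computed with Python's standard library. This is a sketch under the assumption that σ is known; the function name `z_test` and its `tail` parameter are hypothetical names introduced for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean: float, mu0: float, sigma: float, n: int, tail: str = "two"):
    """Return (z, p_value) for a test of H0: mu = mu0 with known sigma.

    tail="upper" is for H1: mu > mu0, tail="lower" for H1: mu < mu0,
    and tail="two" for H1: mu != mu0.
    """
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    std_normal = NormalDist()  # standard Normal, mean 0 and sd 1

    if tail == "upper":
        p_value = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p_value = std_normal.cdf(z)
    elif tail == "two":
        # Double the area in the tail beyond |z|.
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("tail must be 'upper', 'lower' or 'two'")
    return z, p_value


# Usage: sample of 40 with mean 497, testing H0: mu = 500 with sigma = 8.
z, p_value = z_test(497, 500, 8, 40, tail="two")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these values the two-tailed p-value is roughly 0.018, so H₀ would be rejected at the 5% level but not at the 1% level.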

z = (x̄ - μ₀) / (σ / √n)

Hypothesis Testing for Pearson's Product-Moment Correlation Coefficient (PMCC)

This test is used to determine if there is a statistically significant linear relationship between two variables in a population, based on a sample's PMCC (r). The null hypothesis (H₀) typically states that there is no linear correlation in the population (ρ = 0), where ρ is the population PMCC. The alternative hypothesis (H₁) can be that there is a positive correlation (ρ > 0), a negative correlation (ρ < 0), or simply a non-zero correlation (ρ ≠ 0). We compare the calculated sample PMCC (r) to critical values found in statistical tables, which depend on the sample size (n), the significance level (α), and whether the test is one-tailed or two-tailed. For a two-tailed test we reject H₀ if the absolute value of r exceeds the critical value. For a one-tailed test, r must also lie in the direction specified by H₁: r greater than the critical value for ρ > 0, or r less than the negative of the critical value for ρ < 0.
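
The decision rule for the three possible alternative hypotheses can be written out explicitly. The sketch below assumes the critical value has already been looked up in tables for the correct n, α and number of tails; the function name `pmcc_reject_h0` and the labels used for the alternatives are hypothetical.

```python
def pmcc_reject_h0(r: float, critical_value: float, alternative: str) -> bool:
    """Decide whether to reject H0: rho = 0 given a tabulated critical value.

    alternative="positive"  corresponds to H1: rho > 0
    alternative="negative"  corresponds to H1: rho < 0
    alternative="two-sided" corresponds to H1: rho != 0
    """
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if alternative == "positive":
        return r > critical_value
    if alternative == "negative":
        return r < -critical_value
    if alternative == "two-sided":
        return abs(r) > critical_value
    raise ValueError("alternative must be 'positive', 'negative' or 'two-sided'")


# Usage: r = 0.55 compared with a one-tailed critical value of 0.4973.
print(pmcc_reject_h0(0.55, 0.4973, "positive"))  # True
```

Note that a strongly negative r never leads to rejection when H₁ is ρ > 0, even if its absolute value exceeds the critical value.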

Key facts to remember

  • The null hypothesis (H₀) is a statement of no effect or no difference, while the alternative hypothesis (H₁) is what we are trying to find evidence for.
  • The significance level (α) is the maximum probability of making a Type I error (rejecting H₀ when it is true).
  • A one-tailed test is used when H₁ specifies a direction (e.g., p > 0.5), and a two-tailed test when H₁ specifies a difference (e.g., p ≠ 0.5).
  • The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated, assuming H₀ is true.
  • If p-value < α, reject H₀. If p-value ≥ α, do not reject H₀.
  • For binomial tests, use the Binomial distribution B(n, p) and calculate cumulative probabilities.
  • For Normal mean tests, use the z-test statistic z = (x̄ - μ₀) / (σ / √n) when σ is known or n is large.
  • For correlation tests, compare the sample PMCC (r) to critical values from tables, based on sample size (n) and significance level (α).

Worked examples

Example 1

A manufacturer claims that 20% of their new light bulbs are faulty. A quality control manager tests a random sample of 50 light bulbs and finds 15 faulty ones. Test, at the 5% significance level, whether there is evidence that the proportion of faulty light bulbs is higher than the manufacturer's claim.

1. Define the random variable and state the hypotheses:
Let X be the number of faulty light bulbs in a sample of 50. Under the manufacturer's claim, X ~ B(50, 0.20).
H₀: p = 0.20 (The proportion of faulty light bulbs is 20%)
H₁: p > 0.20 (The proportion of faulty light bulbs is higher than 20%)
Significance level α = 0.05. This is a one-tailed test.
2. Calculate the p-value:
The observed number of faulty light bulbs is 15. We need to find P(X ≥ 15) given X ~ B(50, 0.20).
P(X ≥ 15) = 1 - P(X ≤ 14)
Using a calculator or tables for B(50, 0.20):
P(X ≤ 14) ≈ 0.9393
P(X ≥ 15) = 1 - 0.9393 = 0.0607
3. Compare the p-value with the significance level:
p-value = 0.0607
α = 0.05
Since 0.0607 > 0.05, we do not reject H₀.
4. State the conclusion in context:
There is insufficient evidence at the 5% significance level to suggest that the proportion of faulty light bulbs is higher than the manufacturer's claim.

Answer

Do not reject H₀. There is insufficient evidence that the proportion of faulty light bulbs is higher than the manufacturer's claim.

For binomial tests, always use cumulative probabilities (P(X ≤ x) or P(X ≥ x)) to find the p-value or critical region.
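
A critical region can be found from the same cumulative probabilities. The sketch below assumes an upper-tailed test and searches for the smallest value c such that P(X ≥ c) does not exceed α; the function name `upper_critical_region` is hypothetical, and a lower-tailed version would work from the other end of the distribution.

```python
from math import comb


def upper_critical_region(n: int, p0: float, alpha: float):
    """Find the critical region X >= c for an upper-tailed binomial test.

    Returns (c, actual_level), where c is the smallest value with
    P(X >= c) <= alpha under X ~ B(n, p0), and actual_level is that
    probability. If c == n + 1 the critical region is empty.
    """
    tail = 0.0  # running value of P(X >= k + 1)
    for k in range(n, -1, -1):
        prob_k = comb(n, k) * p0**k * (1 - p0) ** (n - k)
        if tail + prob_k > alpha:
            # Including k would push the tail above alpha, so stop at k + 1.
            return k + 1, tail
        tail += prob_k
    return 0, tail  # only reached if alpha >= 1


# Usage: critical region for H0: p = 0.2 against H1: p > 0.2 with n = 50 at the 5% level.
c, actual_level = upper_critical_region(50, 0.2, 0.05)
print(f"Critical region: X >= {c}, actual significance level = {actual_level:.4f}")
```

Because the binomial distribution is discrete, the actual significance level returned here is usually smaller than the nominal α.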

Example 2

A company claims that the mean weight of their cereal boxes is 500g. A random sample of 40 boxes has a mean weight of 497g. The population standard deviation is known to be 8g. Test, at the 1% significance level, whether there is evidence that the mean weight of the cereal boxes is not 500g.

1. State the hypotheses:
H₀: μ = 500 (The mean weight is 500g)
H₁: μ ≠ 500 (The mean weight is not 500g)
Significance level α = 0.01. This is a two-tailed test.
2. Calculate the test statistic (z-value):
Given: n = 40, x̄ = 497, μ₀ = 500, σ = 8.
The test statistic is z = (x̄ - μ₀) / (σ / √n)
z = (497 - 500) / (8 / √40)
z = -3 / (8 / 6.3245...)
z = -3 / 1.2649...
z ≈ -2.372
3. Determine the critical values:
For a two-tailed test at α = 0.01, we split the significance level into two tails: 0.01 / 2 = 0.005 for each tail.
From standard Normal distribution tables, the critical z-values are ±z(0.005).
P(Z > z) = 0.005 implies z ≈ 2.5758.
So, the critical values are -2.5758 and 2.5758.
4. Compare the test statistic with the critical values:
Test statistic z = -2.372
Critical region: z < -2.5758 or z > 2.5758
Since -2.5758 < -2.372 < 2.5758, the test statistic does not fall into the critical region. We do not reject H₀.
5. State the conclusion in context:
There is insufficient evidence at the 1% significance level to suggest that the mean weight of the cereal boxes is not 500g.

Answer

Do not reject H₀. There is insufficient evidence that the mean weight is not 500g.

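For a two-tailed test, remember to halve the significance level when finding critical values from tables. When using a p-value instead, either double the one-tail probability and compare it with α, or compare the one-tail probability with α/2.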
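
The critical values read from Normal tables can be checked with the inverse cumulative distribution function in Python's standard library. This is a small sketch; the function name `z_critical` is a hypothetical helper introduced here.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_tailed: bool) -> float:
    """Positive critical z-value for significance level alpha.

    For a two-tailed test the significance level is split equally
    between the tails, so each tail has area alpha / 2.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


print(round(z_critical(0.01, two_tailed=True), 4))   # 2.5758
print(round(z_critical(0.05, two_tailed=False), 4))  # 1.6449
```

For a two-tailed test the critical region is z below the negative of this value or above the positive value; for a lower-tailed test only the negative value is used.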

Example 3

A researcher wants to investigate if there is a positive linear correlation between hours of study (x) and exam scores (y) for a particular subject. A random sample of 12 students yields a Pearson's product-moment correlation coefficient (r) of 0.55. Test, at the 5% significance level, whether there is evidence of a positive linear correlation.

1. State the hypotheses:
H₀: ρ = 0 (There is no linear correlation between hours of study and exam scores)
H₁: ρ > 0 (There is a positive linear correlation between hours of study and exam scores)
Significance level α = 0.05. This is a one-tailed test.
2. Identify the sample size and observed PMCC:
Sample size n = 12.
Observed sample PMCC r = 0.55.
3. Find the critical value:
Using statistical tables for Pearson's product-moment correlation coefficient, for n = 12 and a one-tailed test at α = 0.05, the critical value is 0.4973.
4. Compare the observed PMCC with the critical value:
Observed r = 0.55
Critical value = 0.4973
Since 0.55 > 0.4973, the observed PMCC is greater than the critical value. We reject H₀.
5. State the conclusion in context:
There is sufficient evidence at the 5% significance level to suggest a positive linear correlation between hours of study and exam scores.

Answer

Reject H₀. There is evidence of a positive linear correlation.

Ensure you use the correct critical value from the tables, considering whether it's a one-tailed or two-tailed test and the correct significance level and sample size.

Common mistakes

  • Confusing Type I and Type II errors, or not understanding their implications.
  • Incorrectly formulating the null and alternative hypotheses, especially for one-tailed vs two-tailed tests.
  • Using the wrong distribution or formula for the test statistic (e.g., using Normal approximation for Binomial when n is too small, or using incorrect standard error).
  • Failing to adjust the significance level for two-tailed tests when finding critical values or p-values.
  • Not stating the conclusion in context of the original problem, or making definitive statements when only 'evidence' is found.

Exam tips

  • Always clearly state H₀, H₁, and the significance level at the start of your solution.
  • Draw a sketch of the distribution (Normal or Binomial) to visualise the critical region and test statistic, especially for two-tailed tests.
  • Show all steps of your calculations, including the formula used, substitution of values, and the final test statistic or p-value.
  • Ensure your conclusion is directly related to the problem's context and uses appropriate statistical language (e.g., 'sufficient evidence', 'insufficient evidence').
