CAPE PENINSULA UNIVERSITY OF TECHNOLOGY STAT151X

Hypothesis Testing - Two Samples

From the statistics 1B curriculum · Updated May 29, 2026

1. Introduction & Overview

  • The Mental Model: A two-sample hypothesis test works like a forensic comparison. It asks whether the observed difference between two sets of evidence (data) is large enough to be statistically significant, or small enough to be explained by random sampling variability alone, and so whether distinct underlying processes are at play.
  • Significance:
    • Medical Research: Comparing efficacy of two drugs (Drug A vs. Drug B), comparing incidence rates of disease in treated vs. control groups.
    • Engineering Quality Control: Assessing if two production lines yield products with significantly different defect rates or tensile strengths.
    • Social Sciences: Determining if two demographic groups (Gender A vs. Gender B, Age Group X vs. Age Group Y) exhibit statistically different mean scores on a psychological construct.
    • Business Analytics: Evaluating if a new marketing strategy (Strategy A) results in significantly higher sales conversion rates than an old one (Strategy B).
    • Environmental Science: Comparing pollutant levels in two different geographical regions or at two different time points.
mindmap
  root((Hypothesis Testing - Two Samples))
    Objectives
      Compare means ("Quantitative Data")
        "Independent Samples"
          "Known Variance"
          "Unknown Variance (Pooled)"
          "Unknown Variance (Welch's)"
        "Paired Samples"
      Compare proportions ("Categorical Data")
        "Independent Samples"
        "Known N, P"
      Compare variances
        "F-test"
    Assumptions
      "Independence"
      "Normality"
      "Homoscedasticity"
      "Random Sampling"
    Test Statistics
      "t-statistic"
      "z-statistic"
      "F-statistic"
    Decision Rule
      "p-value approach"
      "Critical value approach"
    "Type I Error (α)"
    "Type II Error (β)"
    "Power (1-β)"

2. In-Depth Theory, Equations & Mechanisms

Hypothesis testing for two samples primarily involves comparing parameters (means, proportions, variances) from two distinct populations based on sample data. The fundamental principle remains the construction of a null hypothesis ($H_0$), representing no difference, and an alternative hypothesis ($H_1$ or $H_a$), representing a difference (two-tailed) or a difference in a stated direction (one-tailed).

2.1 Comparison of Two Population Means ($\mu_1 - \mu_2$)

2.1.1 Independent Samples, Population Variances Known ($\sigma_1^2, \sigma_2^2$ known)

This scenario, though rare in practice (as known population variances usually imply known means), serves as a foundational theoretical case.
* Assumptions:
1. Samples are drawn independently from two populations.
2. Both populations are normally distributed, or sample sizes ($n_1, n_2$) are sufficiently large ($n_1 \geq 30, n_2 \geq 30$) for the Central Limit Theorem to apply.
3. Population variances $\sigma_1^2$ and $\sigma_2^2$ are known.
* Hypotheses Formulation:
* Two-tailed: $H_0: \mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$) vs. $H_1: \mu_1 \neq \mu_2$ (or $\mu_1 - \mu_2 \neq 0$)
* One-tailed (Left): $H_0: \mu_1 \geq \mu_2$ (or $\mu_1 - \mu_2 \geq 0$) vs. $H_1: \mu_1 < \mu_2$ (or $\mu_1 - \mu_2 < 0$)
* One-tailed (Right): $H_0: \mu_1 \leq \mu_2$ (or $\mu_1 - \mu_2 \leq 0$) vs. $H_1: \mu_1 > \mu_2$ (or $\mu_1 - \mu_2 > 0$)
* Test Statistic: The $z$-statistic is employed due to known population variances.
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{H_0}}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Under $H_0: \mu_1 = \mu_2$, the term $(\mu_1 - \mu_2)_{H_0}$ becomes 0.
$$Z_{calc} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
* Distribution: Standard Normal Distribution $N(0, 1)$.
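
The $Z_{calc}$ formula above can be sketched in Python using only the standard library. The function name `two_sample_z` and all input values are hypothetical, chosen purely for illustration; the two-tailed p-value is read from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, delta0=0.0):
    """Z statistic and two-tailed p-value for H0: mu1 - mu2 = delta0,
    assuming the population variances var1 and var2 are known."""
    se = math.sqrt(var1 / n1 + var2 / n2)           # standard error of the difference
    z = ((xbar1 - xbar2) - delta0) / se             # test statistic
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical summary data (not from the notes)
z_calc, p_value = two_sample_z(xbar1=52.0, xbar2=50.0,
                               var1=16.0, var2=9.0, n1=40, n2=36)
print(f"Z_calc = {z_calc:.4f}, two-tailed p-value = {p_value:.4f}")
```

For a one-tailed test, only one tail of the standard normal distribution would be used instead of doubling the tail area.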

2.1.2 Independent Samples, Population Variances Unknown But Assumed Equal ($\sigma_1^2 = \sigma_2^2$)

This is a very common scenario, often justified by prior knowledge or an F-test on sample variances.
* Assumptions:
1. Samples are drawn independently from two populations.
2. Both populations are normally distributed.
3. Population variances are unknown but assumed equal ($\sigma_1^2 = \sigma_2^2 = \sigma^2$).
* Pooled Sample Variance ($S_p^2$): Since we assume equal population variances, we pool the sample variances to get a better estimate of the common population variance.
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
where $S_1^2$ and $S_2^2$ are the sample variances.
* Test Statistic: The $t$-statistic is used.
$$t_{calc} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{H_0}}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Under $H_0: \mu_1 = \mu_2$, the term $(\mu_1 - \mu_2)_{H_0}$ becomes 0.
$$t_{calc} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
* Degrees of Freedom (df): $df = n_1 + n_2 - 2$.
* Distribution: Student's $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom.
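
As a minimal sketch of the pooled calculation above, the following Python function computes $S_p^2$, $t_{calc}$ and the degrees of freedom from summary statistics. The name `pooled_t` and the numbers are hypothetical; the critical value or p-value would still come from a $t$-table or statistical software.

```python
import math

def pooled_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = delta0.
    s1_sq and s2_sq are the sample variances S_1^2 and S_2^2."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df   # pooled sample variance
    se = math.sqrt(sp_sq * (1.0 / n1 + 1.0 / n2))        # standard error
    t = ((xbar1 - xbar2) - delta0) / se
    return t, df, sp_sq

# Hypothetical summary data (not from the notes)
t_calc, df, sp_sq = pooled_t(xbar1=20.0, xbar2=18.0,
                             s1_sq=4.0, s2_sq=6.0, n1=10, n2=12)
print(f"S_p^2 = {sp_sq:.3f}, t_calc = {t_calc:.4f}, df = {df}")
```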

2.1.3 Independent Samples, Population Variances Unknown and Unequal ($\sigma_1^2 \neq \sigma_2^2$) - Welch's t-test

This is the most robust and generally recommended approach when population variances are unknown. Its degrees of freedom are approximated with the Welch-Satterthwaite equation.
* Assumptions:
1. Samples are drawn independently from two populations.
2. Both populations are normally distributed.
3. Population variances are unknown and not assumed equal.
* Test Statistic: The $t$-statistic is used, similar to the pooled case but without pooling variances.
$$t_{calc} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)_{H_0}}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
Under $H_0: \mu_1 = \mu_2$, the term $(\mu_1 - \mu_2)_{H_0}$ becomes 0.
$$t_{calc} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$
* Degrees of Freedom (df): This is approximated using the Welch-Satterthwaite equation, which typically results in a non-integer value.
$$df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$$
This value is usually rounded down to the nearest integer for conservative critical value determination.
* Distribution: Student's $t$-distribution with calculated (approximated) degrees of freedom.
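
The Welch statistic and the Welch-Satterthwaite degrees of freedom can be sketched as follows. The function name `welch_t` and the summary values are hypothetical; the rounded-down `df_table` reflects the conservative table lookup described above, while software typically works with the non-integer value directly.

```python
import math

def welch_t(xbar1, xbar2, s1_sq, s2_sq, n1, n2, delta0=0.0):
    """Welch's t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = s1_sq / n1                      # S_1^2 / n_1
    v2 = s2_sq / n2                      # S_2^2 / n_2
    t = ((xbar1 - xbar2) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summary data (not from the notes)
t_calc, df = welch_t(xbar1=20.0, xbar2=18.0,
                     s1_sq=4.0, s2_sq=16.0, n1=10, n2=8)
df_table = math.floor(df)                # round down for a t-table lookup
print(f"t_calc = {t_calc:.4f}, df = {df:.4f}, df for table = {df_table}")
```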

2.1.4 Paired Samples (Dependent Samples)

This scenario occurs when observations in the two samples are naturally linked or matched (e.g., before-and-after measurements on the same subjects, or matched pairs of subjects).
* Assumptions:
1. The pairs are independent.
2. The differences ($D_i = X_{1i} - X_{2i}$) are normally distributed.
* Hypotheses Formulation:
* $H_0: \mu_D = 0$ (Mean difference is zero) vs. $H_1: \mu_D \neq 0$ (Mean difference is not zero).
* Test Statistic: The $t$-statistic is employed for the mean difference.
$$t_{calc} = \frac{\bar{D} - \mu_{D,H_0}}{S_D / \sqrt{n}}$$
where $\bar{D} = \frac{\sum D_i}{n}$ is the mean of the differences, $S_D = \sqrt{\frac{\sum (D_i - \bar{D})^2}{n-1}}$ is the standard deviation of the differences, and $n$ is the number of pairs. Under $H_0: \mu_D = 0$, the term $\mu_{D,H_0}$ becomes 0.
$$t_{calc} = \frac{\bar{D}}{S_D / \sqrt{n}}$$
* Degrees of Freedom (df): $df = n - 1$.
* Distribution: Student's $t$-distribution with $n-1$ degrees of freedom.
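
A minimal sketch of the paired calculation, assuming the two lists hold measurements on the same subjects in the same order. The function name `paired_t` and the data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(sample1, sample2, mu_d0=0.0):
    """Paired t statistic for H0: mu_D = mu_d0, with D_i = X_1i - X_2i."""
    if len(sample1) != len(sample2):
        raise ValueError("paired samples must have the same length")
    diffs = [x1 - x2 for x1, x2 in zip(sample1, sample2)]
    n = len(diffs)
    d_bar = mean(diffs)                  # mean of the differences
    s_d = stdev(diffs)                   # sample standard deviation (n - 1 divisor)
    t = (d_bar - mu_d0) / (s_d / math.sqrt(n))
    return t, n - 1

# Hypothetical before/after measurements on five subjects
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after = [10.0, 14.0, 10.0, 11.0, 12.0]
t_calc, df = paired_t(before, after)
print(f"t_calc = {t_calc:.4f}, df = {df}")
```

Here the differences are 2, 1, 1, 3, 1, so $\bar{D} = 1.6$, $S_D = \sqrt{0.8}$ and $t_{calc} = 1.6 / 0.4 = 4$ with 4 degrees of freedom.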

2.2 Comparison of Two Population Proportions ($p_1 - p_2$)

This test is used when comparing the success rates or prevalence of an event in two independent categorical datasets.
* Assumptions:
1. Samples are drawn independently from two populations.
2. Both samples are large enough such that $n_1p_1 \geq 5, n_1(1-p_1) \geq 5$, $n_2p_2 \geq 5, n_2(1-p_2) \geq 5$. (Sometimes $n_ip_i \geq 10$ and $n_i(1-p_i) \geq 10$). These conditions ensure the sampling distribution of the sample proportion is approximately normal.
* Hypotheses Formulation:
* $H_0: p_1 = p_2$ (or $p_1 - p_2 = 0$) vs. $H_1: p_1 \neq p_2$ (or $p_1 - p_2 \neq 0$).
* Pooled Sample Proportion ($\hat{p}$): Under the null hypothesis that $p_1 = p_2 = p$, we pool the sample data to estimate this common proportion.
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
where $x_1$ and $x_2$ are the number of successes in samples 1 and 2, respectively.
* Test Statistic: The $z$-statistic is employed as the sampling distribution of the difference in proportions is approximately normal for large samples.
$$Z_{calc} = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_{H_0}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Under $H_0: p_1 = p_2$, the term $(p_1 - p_2)_{H_0}$ becomes 0.
$$Z_{calc} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
* Distribution: Standard Normal Distribution $N(0, 1)$.
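
The pooled-proportion test above might be sketched like this. The function name `two_proportion_z` and the counts are hypothetical, and the sketch assumes the large-sample conditions have already been checked.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-tailed Z test for H0: p1 = p2 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                       # pooled estimate under H0
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n1 + 1.0 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 60 successes out of 200 versus 40 out of 200
z_calc, p_value = two_proportion_z(x1=60, n1=200, x2=40, n2=200)
print(f"Z_calc = {z_calc:.4f}, two-tailed p-value = {p_value:.4f}")
```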

2.3 Comparison of Two Population Variances ($\sigma_1^2, \sigma_2^2$)

This test is often conducted as a preliminary step before comparing two means, especially to ascertain whether a pooled $t$-test or Welch's $t$-test is appropriate.
* Assumptions:
1. Samples are drawn independently from two populations.
2. Both populations are normally distributed. (This assumption is critical and the F-test is highly sensitive to violations).
* Hypotheses Formulation:
* Two-tailed: $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$
* One-tailed: $H_0: \sigma_1^2 \leq \sigma_2^2$ vs. $H_1: \sigma_1^2 > \sigma_2^2$ (or vice-versa)
* Test Statistic: The F-statistic. Conventionally, for a two-tailed test, the larger sample variance is placed in the numerator (that is, the samples are labelled so that $S_1^2 \geq S_2^2$) to ensure $F_{calc} \geq 1$.
$$F_{calc} = \frac{S_1^2}{S_2^2}$$
* Degrees of Freedom (df): $df_1 = n_1 - 1$ (numerator degrees of freedom), $df_2 = n_2 - 1$ (denominator degrees of freedom).
* Distribution: F-distribution with $df_1$ and $df_2$ degrees of freedom.
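
A short sketch of the two-tailed F statistic under the larger-variance-in-numerator convention. The function name `f_statistic` and the data are hypothetical; the critical value would still be looked up in an F-table or software, and the sketch assumes neither sample variance is zero.

```python
from statistics import variance

def f_statistic(sample_a, sample_b):
    """F statistic with the larger sample variance in the numerator.
    Returns (F_calc, numerator df, denominator df)."""
    var_a = variance(sample_a)           # sample variance, n - 1 divisor
    var_b = variance(sample_b)
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Hypothetical measurements from two production lines
line_a = [10.0, 12.0, 14.0, 16.0, 18.0]
line_b = [11.0, 12.0, 13.0, 14.0]
f_calc, df1, df2 = f_statistic(line_a, line_b)
print(f"F_calc = {f_calc:.4f}, df1 = {df1}, df2 = {df2}")
```

Note that the degrees of freedom follow whichever sample ends up in the numerator, not the original labelling of the samples.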

radar-beta
  title "Comparative Robustness & Sensitivity"
  series
    name "Sensitivity to Normality"
    data [9, 6, 7, 5, 8]
  series
    name "Robustness to Unequal Variances"
    data [5, 9, 10, 6, 6]
  series
    name "Power (typical scenarios)"
    data [7, 8, 9, 9, 7]
  series
    name "Ease of Calculation"
    data [10, 8, 7, 9, 6]
  labels ["Z-test (known σ)", "Pooled t-test (equal σ)", "Welch's t-test (unequal σ)", "Paired t-test", "F-test (for variances)"]

3. Technical Procedures & Applications

3.1 Procedure for Two-Sample Independent t-test (Welch's approach)

This procedure outlines the steps for performing a Welch's t-test, which is generally preferred due to its robustness against the assumption of equal variances.

sequenceDiagram
    participant Analyst
    participant DataCollection
    participant Statistician
    participant DecisionMaker

    Analyst->>DataCollection: 1. Define Research Question (e.g., "Is mean yield of Process A different from Process B?")
    DataCollection->>Analyst: 2. Collect two independent samples (n1, n2)
    Analyst->>Analyst: 3. Calculate sample statistics: mean (X̄1, X̄2), stDev (S1, S2), size (n1, n2) for each sample.
    Analyst->>Statistician: 4. Formulate Hypotheses:
        Note left of Statistician: H0: µ1 = µ2 (No difference)
        Note left of Statistician: H1: µ1 ≠ µ2 (Difference exists)
    Analyst->>Analyst: 5. Set Significance Level (α), typically 0.05.
    Analyst->>Analyst: 6. Calculate Test Statistic (Welch's t):
        Note left of Analyst: $$t_{calc} = (\bar{X}_1 - \bar{X}_2) / \sqrt{(S_1^2/n_1) + (S_2^2/n_2)}$$
    Analyst->>Analyst: 7. Calculate Degrees of Freedom (Welch-Satterthwaite):
        Note left of Analyst: $$df = ( (S_1^2/n_1) + (S_2^2/n_2) )^2 / ( ( (S_1^2/n_1)^2 / (n_1-1) ) + ( (S_2^2/n_2)^2 / (n_2-1) ) )$$
    Analyst->>Analyst: 8. Determine Critical Value(s) or p-value:
        Note left of Analyst: Using t-distribution table or software with calculated df and α.
    Analyst->>Analyst: 9. Compare Test Statistic to Critical Value OR p-value to α.
        alt If |t_calc| > t_critical (or p-value < α)
            Analyst->>Statistician: 10a. Reject H0
        else If |t_calc| <= t_critical (or p-value >= α)
            Analyst->>Statistician: 10b. Fail to Reject H0
        end
    Statistician->>DecisionMaker: 11. Interpret Results and Draw Conclusion.
        Note right of DecisionMaker: "There is/is no sufficient evidence at α level to conclude a difference in means."
    DecisionMaker->>DecisionMaker: 12. Make Practical Decision Based on Statistical Conclusion.
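
Steps 3 and 6 to 10 of the procedure above can be traced end to end on raw data. This is only a sketch: the function name `welch_from_samples` and the yield values are hypothetical, and the critical value 2.447 is the standard two-tailed $t$-table entry for $\alpha = 0.05$ and 6 degrees of freedom, entered by hand after rounding the computed df down.

```python
import math
from statistics import mean, variance

def welch_from_samples(sample1, sample2):
    """Steps 3, 6 and 7: sample statistics, Welch's t and Welch-Satterthwaite df."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1          # S_1^2 / n_1
    v2 = variance(sample2) / n2          # S_2^2 / n_2
    t_calc = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_calc, df

# Hypothetical yields for Process A and Process B
process_a = [50.0, 52.0, 54.0, 56.0, 58.0]
process_b = [45.0, 46.0, 47.0, 48.0, 49.0, 50.0, 51.0]

t_calc, df = welch_from_samples(process_a, process_b)
df_table = math.floor(df)                # step 8: conservative rounding for the table
t_critical = 2.447                       # t-table value for df = 6, two-tailed alpha = 0.05
reject_h0 = abs(t_calc) > t_critical     # steps 9 and 10: decision rule

decision = "Reject H0" if reject_h0 else "Fail to reject H0"
print(f"t_calc = {t_calc:.4f}, df = {df:.4f} (table df = {df_table}): {decision}")
```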

3.2 Procedure for Two-Sample Z-test for Proportions

This models the technical procedure for comparing two proportions.

sequenceDiagram
    participant Researcher
    participant DataEngineer
    participant StatsModule
    participant ReportGen

    Researcher->>DataEngineer: 1. Define Hypotheses (e.g., H0: p1=p2 vs. H1: p1!=p2)
    DataEngineer->>DataEngineer: 2. Extract counts of successes (x1, x2) and total sample sizes (n1, n2) for two groups.
    DataEngineer->>StatsModule: 3. Calculate sample proportions:
        Note over StatsModule: $$\hat{p}_1 = x_1/n_1$$
        Note over StatsModule: $$\hat{p}_2 = x_2/n_2$$
    StatsModule->>StatsModule: 4. Check large sample conditions:
        Note over StatsModule: $$n_i \hat{p}_i \ge 5, n_i(1-\hat{p}_i) \ge 5$$
        break If conditions not met
            StatsModule->>ReportGen: Flag: "Sample size insufficient for Z-test. Consider Fisher's Exact Test."
        end
    StatsModule->>StatsModule: 5. Calculate pooled proportion under H0:
        Note over StatsModule: $$\hat{p} = (x_1 + x_2) / (n_1 + n_2)$$
    StatsModule->>StatsModule: 6. Calculate Standard Error of the difference:
        Note over StatsModule: $$SE_{\hat{p}_1-\hat{p}_2} = \sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}$$
    StatsModule->>StatsModule: 7. Calculate Test Statistic (Z-score):
        Note over StatsModule: $$Z_{calc} = (\hat{p}_1 - \hat{p}_2) / SE_{\hat{p}_1-\hat{p}_2}$$
    StatsModule->>StatsModule: 8. Determine p-value from Standard Normal Distribution.
    StatsModule->>ReportGen: 9. Output Z-calc, p-value, and confidence interval for (p1-p2).
    ReportGen->>Researcher: 10. Present conclusive report & recommendation.
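
The flow above, including the large-sample check in step 4 and the outputs in step 9, could be sketched as a single function. The name `proportions_report` and the counts are hypothetical. The notes do not give a confidence interval formula, so the interval here is an assumption: the usual large-sample interval built on the unpooled standard error.

```python
import math
from statistics import NormalDist

def proportions_report(x1, n1, x2, n2, alpha=0.05, min_count=5):
    """Two-proportion Z test with a large-sample check and a CI for p1 - p2."""
    # Step 4: n_i * p_hat_i equals x_i, and n_i * (1 - p_hat_i) equals n_i - x_i
    if min(x1, n1 - x1, x2, n2 - x2) < min_count:
        return {"ok": False,
                "flag": "Sample size insufficient for Z-test. "
                        "Consider Fisher's Exact Test."}
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    p_pool = (x1 + x2) / (n1 + n2)                                   # step 5
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # step 6
    z_calc = diff / se_pooled                                        # step 7
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_calc)))            # step 8
    # Assumption: unpooled standard error for the confidence interval
    se_unpooled = math.sqrt(p1_hat * (1 - p1_hat) / n1
                            + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci = (diff - z_star * se_unpooled, diff + z_star * se_unpooled)
    return {"ok": True, "z_calc": z_calc, "p_value": p_value, "ci": ci}

# Hypothetical counts: 45 of 150 versus 30 of 150
report = proportions_report(x1=45, n1=150, x2=30, n2=150)
print(report)

# A small sample that fails the step 4 check
small = proportions_report(x1=3, n1=20, x2=8, n2=25)
print(small["flag"])
```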

4. Examiner's Breakdown

4.1 Comparative Analysis

| Feature | One-Sample Test | Two-Sample Independent Test | Two-Sample Paired Test |
| --- | --- | --- | --- |
| Primary Objective | Compare sample parameter to known population parameter/hypothesized value. | Compare parameters of two independent populations. | Compare parameters from the same or matched subjects under two conditions. |
| Statistical Units | $n$ observations from 1 group | $n_1$ observations from Group 1, $n_2$ from Group 2 | $n$ pairs of observations (e.g., $X_{before}, X_{after}$) |
| Relationship between Samples | Single sample | No direct relationship; random sampling ensures independence. | Direct, one-to-one correspondence or repeated measures. |
| Variability Focus | Sample mean vs. population mean | Difference in sample means/proportions | Variability of the differences within pairs. |
| Degrees of Freedom (mean) | $n-1$ | $n_1+n_2-2$ (pooled), complex (Welch's) | $n-1$ (where $n$ is number of pairs) |
| Primary Benefit | Simplest baseline comparison | Versatile for comparing distinct groups | Controls for inter-subject variability, increasing statistical power. |
| Use Case Example | Test if average product weight is 100g. | Test if Drug A lowers blood pressure more than Drug B. | Test if a new diet reduces weight in the same individuals. |
| Sensitivity to Assumptions | Normality of sample mean (CLT) | Normality of populations, equal variances (pooled t). | Normality of differences. |
| Robustness | Good with large N (CLT) | Welch's t-test: Robust to unequal variances. | Generally robust if differences are symmetric. |
| Test Statistics | Z or t | Z (proportions, known σ), t (unknown σ) | t |

4.2 High-Yield Marking Keywords

  1. "Null Hypothesis ($H_0$) and Alternative Hypothesis ($H_1$)": Explicitly stated, using appropriate symbols ($\mu, p, \sigma^2$) and directionality.
  2. "Appropriate Test Statistic": Correct selection from $Z$, $t$, or $F$ based on known/unknown population parameters and sample structure.
  3. "Degrees of Freedom (df)": Correct calculation, specifically for pooled $t$-test ($n_1 + n_2 - 2$), paired $t$-test ($n-1$), or Welch's approximation.
  4. "Critical Value(s) or p-value comparison": Clear statement of comparison mechanics and decision rule.
  5. "Pooled Sample Variance ($S_p^2$)": If applicable, the correctly formulated equation for estimating common variance.
  6. "Independence of Samples": A stated assumption crucial for all non-paired tests.
  7. "Normality or Large Sample Sizes": Justification for using Z/t distributions.
  8. "Conclusion in Context": Interpreting the statistical decision within the problem's real-world implications, avoiding definitive claims of "proof."

4.3 Trapdoor Mistakes

  1. Incorrectly Using Pooled t-test when Variances are Unequal: Students often default to the pooled $t$-test formula ($df = n_1+n_2-2$) without first checking the assumption of equal variances (e.g., via an F-test or by inspection).
    • Correct way: If variances are known/assumed unequal, use Welch's $t$-test with its specific, complex degrees of freedom formula ($df_{Welch}$). If an F-test leads to rejection of $H_0: \sigma_1^2 = \sigma_2^2$, then Welch's test is mandated.
  2. Applying Independent Sample Test to Paired Data: Treating paired observations (e.g., before/after measurements on the same subject) as independent samples ignores the dependency within each pair. When the paired measurements are positively correlated, as is typical, this inflates the standard error and reduces power.
    • Correct way: Formulate the differences ($D_i = X_{1i} - X_{2i}$) and perform a one-sample $t$-test on these differences with $df = n-1$ (where $n$ is the number of pairs). This effectively controls for inter-subject variability.
  3. Misinterpreting "Fail to Reject $H_0$": Concluding that failing to reject the null hypothesis definitively proves the null hypothesis is true.
    • Correct way: "Fail to reject $H_0$" means there is insufficient evidence at the specified significance level to conclude that $H_1$ is true. It does not imply that $H_0$ is proven. Consider the possibility of Type II error or inadequate statistical power.
  4. Ignoring Sample Size Conditions for Z-test on Proportions: Applying the Z-test for two proportions when $np$ or $n(1-p)$ for either sample is less than 5 (or 10, depending on conservative guidelines).
    • Correct way: If these conditions ($n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1-\hat{p}_2) \geq 5$) are not met, the normal approximation to the binomial distribution is invalid. Fisher's Exact Test or other exact methods based on the hypergeometric distribution should be considered.
  5. Incorrectly Placing sample variances in F-test: Placing the smaller sample variance in the numerator for a two-tailed F-test.
    • Correct way: For a two-tailed F-test for variances, always place the larger sample variance in the numerator ($F_{calc} = S_{larger}^2 / S_{smaller}^2$). This ensures $F_{calc} \ge 1$ and allows direct comparison with a single critical value from the upper tail of the F-distribution (using $\alpha/2$). For one-tailed tests, the hypothesized direction dictates the numerator.
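
Trapdoor 2 can be illustrated numerically. The sketch below uses hypothetical before/after data with large differences between subjects but small, consistent changes within each subject. It compares the standard error from the correct paired analysis with the one obtained by wrongly treating the two columns as independent samples (computed here with the unpooled Welch form).

```python
import math
from statistics import mean, stdev, variance

# Hypothetical before/after measurements on the same five subjects
before = [60.0, 72.0, 85.0, 91.0, 104.0]
after = [58.0, 71.0, 82.0, 90.0, 101.0]
n = len(before)

# Correct: one-sample t-test on the differences D_i = X_1i - X_2i
diffs = [b - a for b, a in zip(before, after)]
se_paired = stdev(diffs) / math.sqrt(n)
t_paired = mean(diffs) / se_paired

# Incorrect for this design: treat the columns as independent samples
se_indep = math.sqrt(variance(before) / n + variance(after) / n)
t_indep = (mean(before) - mean(after)) / se_indep

print(f"paired:      SE = {se_paired:.4f}, t = {t_paired:.4f}, df = {n - 1}")
print(f"independent: SE = {se_indep:.4f}, t = {t_indep:.4f}")
```

Both analyses see the same mean difference of 2, but the independent-samples standard error absorbs the subject-to-subject spread, so its $t$ value is far smaller than the paired one.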
