CAPE PENINSULA UNIVERSITY OF TECHNOLOGY STAT151X

Chi-Square Tests and Non-Parametric Methods

From the statistics 1B curriculum · Updated May 29, 2026

1. Introduction & Overview

  • The Mental Model: Chi-square tests quantify the discrepancy between observed frequencies and those expected under a null hypothesis, while non-parametric methods provide robust inferential conclusions without stringent distributional assumptions, particularly concerning population parameters like means or variances.
  • Significance:
    • Categorical Data Analysis: Essential for analyzing qualitative data, market research, epidemiological studies, and social sciences where observations fall into discrete categories.
    • Distributional Assumptions: Crucial when data deviate markedly from normality, when sample sizes are small, or when working with ordinal scales, because violating parametric assumptions can inflate the Type I error rate or reduce power (raising the Type II error rate).
    • Robustness: Offers valid inference when population distributions are unknown or highly skewed, as is common in fields like psychometrics and medical trials.
    • Hypothesis Testing Versatility: Applicable across a wide range of hypothesis testing scenarios, from association between variables to comparing medians or distributions.
mindmap
    root((Chi-Square Tests & Non-Parametric Methods))
        Chi-Square Tests
            "Goodness-of-Fit Test (GOF)"
                "Univariate Categorical Data"
                "Compares Observed vs. Expected Frequencies"
                "Hypothesis: H0: Data fits specified distribution"
                "Formula: χ² = Σ [(Oi - Ei)² / Ei]"
            "Test of Independence"
                "Bivariate Categorical Data"
                "Compares Observed vs. Expected Frequencies in Contingency Tables"
                "Hypothesis: H0: Variables are independent"
                "Formula: χ² = Σ Σ [(Oij - Eij)² / Eij]"
            "Test of Homogeneity"
                "Compares Distribution across Groups"
                "Same computation as Independence, but group sizes are fixed by design"
                "Hypothesis: H0: Distributions are homogeneous"
            "Assumptions (Chi-Square)"
                "Expected Frequencies ≥ 5 (in at least 80% of cells), none < 1"
                "Independence of Observations"
                "Random Sampling"
                "Nominal or Ordinal Data"
        "Non-Parametric Methods"
            "Alternatives to Parametric Tests"
                "Robustness to Outliers"
                "Fewer Distributional Assumptions (e.g., Normality not required)"
                "Primary Use: Ordinal Data, Small Samples, Skewed Data"
            "One-Sample Tests"
                "Sign Test (Median)"
                "Wilcoxon Signed-Rank Test (Median, Symmetric Distribution)"
            "Two-Sample Tests"
                "Mann-Whitney U Test (Independent Samples, Medians/Distributions)"
                "Wilcoxon Rank-Sum Test (Equivalent to Mann-Whitney)"
                "Kolmogorov-Smirnov Test (Distributional Equivalence)"
            "K-Sample Tests (ANOVA Alternatives)"
                "Kruskal-Wallis H Test (Independent Samples, Medians)"
                "Friedman Test (Related Samples, Medians)"
            "Correlation"
                "Spearman's Rank Correlation (Monotonic Association)"
                "Kendall's Tau (Monotonic Association)"
            "Assumptions (General Non-Parametric)"
                "Independence of Observations (most tests)"
                "Random Sampling"
                "Underlying continuity (for rank-based tests)"
                "Symmetry (for some median tests like Wilcoxon Signed-Rank)"

2. In-Depth Theory, Equations & Mechanisms

2.1 Chi-Square Goodness-of-Fit Test ($\chi^2$-GOF)

The $\chi^2$-GOF test evaluates whether observed frequencies ($O_i$) from a single categorical variable match expected frequencies ($E_i$) derived from a hypothesized population distribution.
* Null Hypothesis ($H_0$): The population follows the hypothesized distribution, i.e., each category proportion equals its hypothesized value $P_i$.
* Alternative Hypothesis ($H_1$): The population does not follow the hypothesized distribution (at least one category proportion differs from its hypothesized value).
* Test Statistic Formula:
$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} $$
Where:
* $O_i$: Observed frequency in category $i$.
* $E_i$: Expected frequency in category $i$.
* $k$: Number of categories.
* Degrees of Freedom (df): $df = k - 1 - p$, where $p$ is the number of parameters estimated from the sample to determine the expected frequencies (e.g., mean and standard deviation for a normal distribution, but often $p=0$ for specified probabilities).
* Expected Frequency Calculation: $E_i = n \times P_i$, where $n$ is the total sample size and $P_i$ is the hypothesized proportion for category $i$.
* Assumptions:
1. Categorical Data: The variable under examination is categorical (nominal or ordinal).
2. Independence: Each observation is independent of others.
3. Random Sampling: The data are obtained from a random sample.
4. Expected Frequencies: No more than 20% of expected frequencies should be less than 5, and no expected frequency should be less than 1. If this condition is violated, categories may need to be combined.
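The GOF steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `chi_square_gof` and the die-roll counts are hypothetical, and converting the statistic to a p-value is left to a $\chi^2$ table or statistical software.

```python
import math


def chi_square_gof(observed, proportions, n_estimated_params=0):
    """Return (chi2, df, expected, assumptions_ok) for a goodness-of-fit test.

    observed: counts O_i per category.
    proportions: hypothesised proportions P_i (must sum to 1).
    n_estimated_params: p, the number of parameters estimated from the sample.
    """
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("hypothesised proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in proportions]            # E_i = n * P_i
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated_params        # df = k - 1 - p
    # Rule of thumb: at most 20% of E_i below 5, and no E_i below 1
    share_below_5 = sum(e < 5 for e in expected) / len(expected)
    assumptions_ok = share_below_5 <= 0.20 and min(expected) >= 1
    return chi2, df, expected, assumptions_ok


# Hypothetical data: 60 rolls of a die, H0: the die is fair (P_i = 1/6)
chi2, df, expected, ok = chi_square_gof([8, 12, 9, 11, 6, 14], [1 / 6] * 6)
print(f"chi2 = {chi2:.2f}, df = {df}, expected counts ok: {ok}")
```

With every expected count equal to 10, these hypothetical data give $\chi^2 = 4.2$ on $df = 5$.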

2.2 Chi-Square Test of Independence

This test assesses whether there is a statistically significant association between two categorical variables in a contingency table.
* Null Hypothesis ($H_0$): The two categorical variables are independent.
* Alternative Hypothesis ($H_1$): The two categorical variables are dependent (associated).
* Test Statistic Formula:
$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} $$
Where:
* $O_{ij}$: Observed frequency in row $i$ and column $j$.
* $E_{ij}$: Expected frequency in row $i$ and column $j$.
* $r$: Number of rows.
* $c$: Number of columns.
* Degrees of Freedom (df): $df = (r - 1)(c - 1)$.
* Expected Frequency Calculation:
$$ E_{ij} = \frac{(\text{Row i Total}) \times (\text{Column j Total})}{\text{Grand Total}} $$
* Assumptions: Same as the $\chi^2$-GOF test, in particular the expected frequency criterion ($E_{ij} \ge 5$ in at least 80% of cells, $E_{ij} \ge 1$ in all cells).
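The expected-frequency rule and the test statistic can be sketched the same way. The function `chi_square_independence` and the $2 \times 3$ table below are hypothetical; because the test of homogeneity uses identical arithmetic, the same sketch would apply to it unchanged.

```python
def chi_square_independence(table):
    """Return (chi2, df, expected, assumptions_ok) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E_ij = (row i total * column j total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)      # (r - 1)(c - 1)
    cells = [e for row in expected for e in row]
    # E_ij >= 5 in at least 80% of cells and E_ij >= 1 in every cell
    assumptions_ok = sum(e < 5 for e in cells) / len(cells) <= 0.20 and min(cells) >= 1
    return chi2, df, expected, assumptions_ok


# Hypothetical 2 x 3 table of counts (rows: groups, columns: categories)
chi2, df, expected, ok = chi_square_independence([[10, 20, 30], [20, 20, 20]])
print(expected)            # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(chi2, 3), df)  # 5.333 2
```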

2.3 Chi-Square Test of Homogeneity

This test is mathematically identical to the test of independence but addresses a different research question: it assesses whether the distribution of a single categorical variable is the same across different populations or groups.
* Null Hypothesis ($H_0$): The proportions of a categorical variable are the same across different populations (i.e., the distributions are homogeneous).
* Alternative Hypothesis ($H_1$): The proportions are not the same across different populations.
* Test Statistic and Degrees of Freedom: Identical to the test of independence.
* Expected Frequency Calculation: Identical to the test of independence.
* Assumptions: Same as the $\chi^2$-GOF test. The key distinction from independence is the sampling scheme: homogeneity samples from each population separately (so the group totals are fixed by design), while independence draws a single sample and cross-classifies each observation on both variables.

2.4 Non-Parametric Methods

2.4.1 Sign Test

A simple non-parametric test used for matched pairs or one-sample location problems, focusing on the direction (sign) of differences rather than their magnitude.
* Hypothesis: Tests hypotheses about the population median ($M$).
* $H_0: M = M_0$
* $H_1: M \ne M_0$ (or one-sided)
* Mechanism: For matched pairs, calculate $D_i = X_{1i} - X_{2i}$ (for one sample, $D_i = X_i - M_0$). Count the positive signs ($N^+$), negative signs ($N^-$), and zeroes ($N^0$), then exclude the zeroes.
* Test Statistic: $S = \min(N^+, N^-)$. Under $H_0$, $N^+$ (and likewise $N^-$) follows a binomial distribution $B(n', 0.5)$, where $n'$ is the number of non-zero differences. For large $n'$ ($n' > 20$), the normal approximation with a continuity correction can be used; because $S$ is the smaller count, the correction adds $0.5$ to move it toward $n'/2$:
$$ Z = \frac{(S + 0.5) - n'/2}{\sqrt{n'/4}} $$
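* Assumptions: The pairs (or single observations) are independent of one another, so the differences $D_i$ are independent; the two observations within a pair are, by design, related.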
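An exact version of the sign test can be sketched with binomial tail sums from the Python standard library, so no normal approximation is needed for small $n'$. The function name and the paired differences are hypothetical, and the two-sided p-value is taken as twice the lower tail (capped at 1), which is one common convention.

```python
from math import comb


def sign_test(differences):
    """Exact two-sided sign test of H0: the median difference is 0.

    For one sample, pass D_i = X_i - M_0 instead of paired differences.
    Returns (S, n_prime, p_value).
    """
    nonzero = [d for d in differences if d != 0]   # zero differences are excluded
    n_prime = len(nonzero)
    n_plus = sum(d > 0 for d in nonzero)
    n_minus = n_prime - n_plus
    s = min(n_plus, n_minus)
    # Under H0, N+ ~ Binomial(n', 0.5); double the lower tail P(X <= S)
    lower_tail = sum(comb(n_prime, i) for i in range(s + 1)) / 2 ** n_prime
    return s, n_prime, min(1.0, 2 * lower_tail)


# Hypothetical paired differences D_i = X_1i - X_2i (one of them is zero)
s, n_prime, p = sign_test([3, -1, 4, 2, 0, 5, 1, 2, -2, 3])
print(s, n_prime, round(p, 4))  # 2 9 0.1797
```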

2.4.2 Wilcoxon Signed-Rank Test

A generally more powerful alternative to the sign test for matched pairs or a single sample, because it uses both the direction and the magnitude of the differences. It assumes the differences are symmetrically distributed about their median.
* Hypothesis: Tests hypotheses about the population median ($M$) of differences.
* $H_0: M_{D} = 0$
* $H_1: M_{D} \ne 0$ (or one-sided)
* Mechanism:
1. Calculate differences $d_i = X_{1i} - X_{2i}$ (or $X_i - M_0$).
2. Exclude $d_i = 0$.
3. Rank the absolute differences $|d_i|$. Assign average ranks for ties.
4. Assign the original sign back to the ranks.
5. Calculate $T^+ = \sum (\text{positive ranks})$ and $T^- = \sum (\text{negative ranks})$.
* Test Statistic: $W = \min(T^+, T^-)$. For large $n' (>20)$, $W$ is approximately normally distributed:
$$ Z = \frac{W - [n'(n'+1)/4]}{\sqrt{n'(n'+1)(2n'+1)/24}} $$
* Assumptions:
1. Matched pairs (or one sample compared to a constant).
2. Differences are independent.
3. The distribution of differences is symmetric about the median.
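Steps 1 to 5 can be sketched as follows. `average_ranks` and `wilcoxon_signed_rank` are illustrative names rather than a library API, and the differences are hypothetical paired data.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1        # 0-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(differences):
    """Return (T+, T-, W, z) following steps 1-5 above."""
    d = [x for x in differences if x != 0]                 # step 2: drop zeros
    ranks = average_ranks([abs(x) for x in d])            # step 3: rank |d_i|
    t_plus = sum(r for r, x in zip(ranks, d) if x > 0)     # steps 4-5
    t_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    n = len(d)
    w = min(t_plus, t_minus)
    # Normal approximation; the text recommends it only for n' > 20
    z = (w - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return t_plus, t_minus, w, z


t_plus, t_minus, w, z = wilcoxon_signed_rank([3, -1, 4, 2, 0, 5, 1, 2, -2, 3])
print(t_plus, t_minus, w)  # 39.5 5.5 5.5
```

With only $n' = 9$ non-zero differences, $W = 5.5$ would be compared with an exact table; the returned $z$ is meaningful only for larger $n'$. Note that $T^+ + T^- = n'(n'+1)/2 = 45$, a quick check on hand calculations.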

2.4.3 Mann-Whitney U Test (or Wilcoxon Rank-Sum Test)

A non-parametric alternative to the independent samples t-test, comparing two independent groups.
* Hypothesis:
* $H_0: P(X_1 > X_2) = 0.5$ (or more commonly, that the two population distributions are identical, or that their medians are equal if distributions are assumed to have similar shapes).
* $H_1: P(X_1 > X_2) \ne 0.5$ (or distributions are not identical, or medians are unequal).
* Mechanism (M-W U):
1. Combine data from both groups and rank all observations from smallest to largest. Average ranks for ties.
2. Calculate the sum of ranks for each group ($R_1, R_2$).
3. Calculate $U_1$ and $U_2$:
$$ U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 $$
$$ U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2 $$
4. The test statistic is $U = \min(U_1, U_2)$.
* Mechanism (WRS): The test statistic is simply the sum of ranks of the smaller sized group (or specific group if $H_1$ is one-sided).
* Test Statistic Distribution: For small samples ($n_1, n_2 < 20$), use exact tables. For larger samples ($n_1, n_2 \ge 20$), $U$ is approximately normally distributed (so is the rank sum $R_1$, with mean $n_1(n_1 + n_2 + 1)/2$ and the same variance):
$$ Z = \frac{U - (n_1 n_2 / 2)}{\sqrt{n_1 n_2 (n_1 + n_2 + 1) / 12}} $$
* Assumptions:
1. Two independent random samples.
2. Measurements are at least ordinal.
3. The underlying distributions are continuous (though often applied to discrete data).
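A minimal sketch of the M-W U mechanism, assuming the $U_1, U_2$ convention of the formulas above. The helper names are illustrative, the data are the pain scores from the worked example in 3.2, and the $z$ value omits any tie correction.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1   # shared average rank
        start = end + 1
    return ranks


def mann_whitney_u(x1, x2):
    """Return (R1, R2, U1, U2, U, z) using the U formulas above."""
    n1, n2 = len(x1), len(x2)
    ranks = average_ranks(list(x1) + list(x2))   # step 1: rank the pooled data
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])     # step 2: rank sum per group
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1         # step 3
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u = min(u1, u2)                               # step 4
    # Large-sample normal approximation (no tie correction)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, r2, u1, u2, u, z


r1, r2, u1, u2, u, z = mann_whitney_u([4, 5, 2, 6, 7], [1, 3, 0, 2, 4])
print(r1, r2, u1, u2, u)  # 37.0 18.0 3.0 22.0 3.0
```

The identity $U_1 + U_2 = n_1 n_2$ is a quick arithmetic check on any hand calculation.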

2.4.4 Kruskal-Wallis H Test

A non-parametric alternative to one-way ANOVA for comparing three or more independent groups.
* Hypothesis:
* $H_0:$ The $k$ population distributions are identical (or population medians are equal assuming identical shapes).
* $H_1:$ At least one population distribution stochastically dominates another (or at least one population median is different).
* Mechanism:
1. Combine all data from $k$ groups and rank them from smallest to largest across all $N$ observations. Average ranks for ties.
2. Calculate the sum of ranks for each group $R_j$.
* Test Statistic:
$$ H = \left[ \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} \right] - 3(N+1) $$
Where:
* $N$: Total number of observations ($N = \sum n_j$).
* $n_j$: Sample size of group $j$.
* $R_j$: Sum of ranks for group $j$.
* Degrees of Freedom: $df = k - 1$.
* Test Statistic Distribution: For sufficient sample sizes ($n_j \ge 5$ for all groups), $H$ approximately follows a $\chi^2$ distribution with $k-1$ degrees of freedom.
* Assumptions:
1. $k$ independent random samples.
2. Measurements are at least ordinal.
3. The underlying distributions are continuous.
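A sketch of the $H$ statistic, using hypothetical scores for three groups of five (so $n_j \ge 5$ holds). No tie correction is applied, and the helper `chi2_sf_even_df` is an assumption of this example: it uses the closed-form $\chi^2$ upper tail, which is exact only for even degrees of freedom (here $df = 2$).

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, df) for k independent groups, without a tie correction."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)                    # rank all N observations together
    n_total = len(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        r_j = sum(ranks[start:start + len(group)])   # R_j for this group
        weighted += r_j ** 2 / len(group)
        start += len(group)
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    return h, len(groups) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is exact only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even df")
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(df // 2))


# Hypothetical scores for three independent groups of five
h, df = kruskal_wallis([[11, 12, 13, 18, 21], [14, 15, 19, 20, 22], [16, 17, 23, 24, 25]])
print(round(h, 3), df, round(chi2_sf_even_df(h, df), 3))  # 4.5 2 0.105
```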

2.4.5 Friedman Test

A non-parametric alternative to repeated measures ANOVA for comparing three or more related samples (e.g., multiple treatments on the same subjects).
* Hypothesis:
* $H_0:$ The $k$ population distributions are identical (or population medians are equal) across the $k$ treatments.
* $H_1:$ At least one population distribution stochastically dominates another.
* Mechanism:
1. For each block/subject, rank the $k$ observations from smallest to largest. Average ranks for ties within each block.
2. Calculate the sum of ranks for each treatment $R_j$.
* Test Statistic:
$$ F_r = \left[ \frac{12}{Nk(k+1)} \sum_{j=1}^{k} R_j^2 \right] - 3N(k+1) $$
Where:
* $N$: Number of blocks/subjects.
* $k$: Number of treatments/conditions.
* $R_j$: Sum of ranks for treatment $j$.
* Degrees of Freedom: $df = k - 1$.
* Test Statistic Distribution: For large $N$ or $k$, $F_r$ approximately follows a $\chi^2$ distribution with $k-1$ degrees of freedom.
* Assumptions:
1. Data consists of $N$ blocks, with $k$ observations per block.
2. Measurements are at least ordinal.
3. Observations within each block are dependent, but blocks are independent.
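The Friedman mechanism ranks within each block rather than across the pooled data, which the sketch below makes explicit. The data (four hypothetical subjects, three treatments) are far too few for the $\chi^2$ approximation to be trustworthy, so the example only illustrates the arithmetic; the function names are illustrative.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def friedman(blocks):
    """Return (rank_sums, F_r, df) for N blocks of k related observations."""
    n_blocks, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:                         # rank within each block only
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r                    # accumulate R_j per treatment
    f_r = 12 / (n_blocks * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n_blocks * (k + 1)
    return rank_sums, f_r, k - 1


# Hypothetical scores: rows are subjects (blocks), columns are treatments
rank_sums, f_r, df = friedman([[7, 5, 6], [8, 6, 7], [9, 4, 5], [6, 7, 5]])
print(rank_sums, f_r, df)  # [11.0, 6.0, 7.0] 3.5 2
```

Each block contributes ranks summing to $k(k+1)/2 = 6$, so the rank sums total $Nk(k+1)/2 = 24$, a useful check.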

stateDiagram-v2
    state "Parametric Test Decision" as ParamDecision
    state "Assumptions Met?" as AssumptionsMet
    state "Parametric Tests (t-test, ANOVA etc.)" as ParametricTests
    state "Non-Parametric Test Decision" as NonParamDecision
    state "Categorical Data?" as CategoricalData
    state "Chi-Square Test Decision" as ChiSqDecision
    state "Chi-Square Tests" as ChiSqTests
    state "Non-Parametric Rank-Based Tests" as RankTests
    state "1 Sample or Paired: Sign/Wilcoxon Signed-Rank" as OneSample
    state "2 Independent Samples: Mann-Whitney U" as TwoIndependent
    state "k Independent Samples: Kruskal-Wallis H" as KIndependent
    state "k Related Samples: Friedman Test" as KRelated
    state "Correlation: Spearman/Kendall" as Correlation

    [*] --> ParamDecision : Start
    ParamDecision --> AssumptionsMet
    AssumptionsMet --> ParametricTests : Yes (Normal Distribution, Interval/Ratio, Homoscedasticity)
    AssumptionsMet --> NonParamDecision : No (Violations, e.g., Skewed, Ordinal, Small N)

    NonParamDecision --> CategoricalData
    CategoricalData --> ChiSqDecision : Yes (Nominal/Ordinal, Frequencies)
    CategoricalData --> RankTests : No (Ordinal/Continuous but non-normal)

    ChiSqDecision --> ChiSqTests : 1 Variable (GOF) or 2 Variables (Independence/Homogeneity)

    RankTests --> OneSample
    RankTests --> TwoIndependent
    RankTests --> KIndependent
    RankTests --> KRelated
    RankTests --> Correlation

    ChiSqTests --> [*]
    ParametricTests --> [*]
    RankTests --> [*]

3. Technical Procedures & Applications

3.1 Procedure for Chi-Square Test of Independence on Pharmaceutical Side Effect Data

Scenario: A clinical trial investigates the association between a new drug (Drug A vs. Placebo) and the incidence of a specific side effect (Headache vs. No Headache).
Data: A contingency table is constructed with observed frequencies.

| Treatment | Headache | No Headache | Total |
|-----------|----------|-------------|-------|
| Drug A | $O_{11}=45$ | $O_{12}=155$ | $N_1=200$ |
| Placebo | $O_{21}=15$ | $O_{22}=185$ | $N_2=200$ |
| Total | $C_1=60$ | $C_2=340$ | $N=400$ |
sequenceDiagram
    participant Investigator as Clinical Investigator
    participant Statistician as Biostatistician
    participant Software as Statistical Software (e.g., R, SPSS)
    participant Journal as Peer-Reviewed Journal

    Investigator->>Statistician: 1. Formulate Research Question
    Note over Statistician: Is there an association between drug type and headache incidence?
    Statistician->>Statistician: 2. State Hypotheses (Duguid, 1989)
    Note over Statistician: H0: Drug type and headache incidence are independent.
    Note over Statistician: H1: Drug type and headache incidence are associated (not independent).
    Investigator->>Statistician: 3. Provide Observed Frequencies (Oij)
    Note over Investigator: Observed: Drug A (Headache=45, No Headache=155), Placebo (Headache=15, No Headache=185)
    Statistician->>Statistician: 4. Calculate Expected Frequencies (Eij)
    loop For each cell (i, j)
        Note over Statistician: Eij = (Row i Total * Col j Total) / Grand Total
        Note over Statistician: E11 = (200 * 60) / 400 = 30
        Note over Statistician: E12 = (200 * 340) / 400 = 170
        Note over Statistician: E21 = (200 * 60) / 400 = 30
        Note over Statistician: E22 = (200 * 340) / 400 = 170
    end
    Statistician->>Statistician: 5. Calculate Chi-Square Test Statistic
    Note over Statistician: χ² = ΣΣ [(Oij - Eij)² / Eij]
    Note over Statistician: χ² = (45-30)²/30 + (155-170)²/170 + (15-30)²/30 + (185-170)²/170
    Note over Statistician: χ² = 7.5 + 1.324 + 7.5 + 1.324 ≈ 17.65
    Statistician->>Statistician: 6. Determine Degrees of Freedom (df)
    Note over Statistician: df = (rows - 1)(cols - 1) = (2-1)(2-1) = 1
    Statistician->>Software: 7. Obtain p-value from Chi-Square Distribution
    Note over Software: Using χ² = 17.65, df = 1
    Software-->>Statistician: p-value < 0.001
    Statistician->>Statistician: 8. Compare p-value to Significance Level (α)
    Note over Statistician: α = 0.05, and p-value (< 0.001) < α (0.05)
    Statistician->>Investigator: 9. Make Decision & Interpretation
    Note over Statistician: Reject H0. Conclude a significant association between drug type and headache incidence (χ² = 17.65, df = 1, p < 0.001).
    Note over Investigator: Headache incidence was higher with Drug A (45/200 = 22.5%) than with placebo (15/200 = 7.5%).
    Investigator->>Journal: 10. Report Findings meticulously.
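Steps 4 to 7 of the procedure can be checked with a short script. It is a sketch that avoids external libraries by using the fact that a $\chi^2$ variable with 1 df is the square of a standard normal, so its upper-tail probability is `math.erfc(math.sqrt(x / 2))`. Without intermediate rounding the statistic is $300/17 \approx 17.65$; summing the rounded terms gives 17.64.

```python
import math

# Observed counts from the Drug A / Placebo contingency table
observed = [[45, 155], [15, 185]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E_ij = (row total * column total) / grand total
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

# For df = 1: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))

# Headache proportion in each treatment row, for the direction of the association
headache_rates = [row[0] / total for row, total in zip(observed, row_totals)]

print(expected)                                          # [[30.0, 170.0], [30.0, 170.0]]
print(round(chi2, 2), p_value < 0.001, headache_rates)   # 17.65 True [0.225, 0.075]
```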

3.2 Procedure for Mann-Whitney U Test on Pain Scores

Scenario: Two different analgesics (A and B) are administered to independent groups of patients, and their pain is scored on an ordinal 0-10 scale (lower scores indicate better pain relief).
Data:
* Group A: [4, 5, 2, 6, 7] ($n_A = 5$)
* Group B: [1, 3, 0, 2, 4] ($n_B = 5$)

  1. State Hypotheses:
    • $H_0$: The two population distributions of pain relief scores are identical.
    • $H_1$: The two population distributions of pain relief scores are not identical (or median pain relief differs).
  2. Combine and Rank Data:
    | Score | Group | Rank | Note |
    |-------|-------|------|------|
    | 0 | B | 1 | |
    | 1 | B | 2 | |
    | 2 | A | 3.5 | Tied: 2 in A, 2 in B; ranks 3, 4 $\rightarrow$ avg 3.5 each |
    | 2 | B | 3.5 | |
    | 3 | B | 5 | |
    | 4 | A | 6.5 | Tied: 4 in A, 4 in B; ranks 6, 7 $\rightarrow$ avg 6.5 each |
    | 4 | B | 6.5 | |
    | 5 | A | 8 | |
    | 6 | A | 9 | |
    | 7 | A | 10 | |
  3. Calculate Sum of Ranks for each Group:
    • $R_A = 3.5 + 6.5 + 8 + 9 + 10 = 37$
    • $R_B = 1 + 2 + 3.5 + 5 + 6.5 = 18$
    • Check: $R_A + R_B = 37 + 18 = 55$. Total possible sum of ranks $= N(N+1)/2 = 10(11)/2 = 55$. Correct.
  4. Calculate Mann-Whitney U Statistics:
    • $U_A = n_A n_B + \frac{n_A(n_A+1)}{2} - R_A = (5 \times 5) + \frac{5(6)}{2} - 37 = 25 + 15 - 37 = 3$
    • $U_B = n_A n_B + \frac{n_B(n_B+1)}{2} - R_B = (5 \times 5) + \frac{5(6)}{2} - 18 = 25 + 15 - 18 = 22$
  5. Determine Test Statistic: $U = \min(U_A, U_B) = 3$.
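  6. Determine Critical Value / p-value: For $n_A=5, n_B=5$ and a two-tailed test at $\alpha=0.05$, a Mann-Whitney U table gives a critical value of 2 (reject $H_0$ if $U \le 2$). Since $3 > 2$, we fail to reject the null hypothesis. (Alternatively, the exact two-tailed $p$-value for $U=3$ from the no-ties null distribution is $14/252 \approx 0.056$, and the normal approximation with tie and continuity corrections gives $p \approx 0.059$; both are $> 0.05$.)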
  7. Conclusion: Fail to reject $H_0$. There is insufficient evidence to conclude a significant difference in pain relief distributions between Analgesic A and Analgesic B (U=3, p > 0.05).
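As a cross-check on the decision, the exact permutation distribution of $U$ can be enumerated over all $\binom{10}{5} = 252$ ways of assigning the observed mid-ranks to Group A. This sketch keeps the tied mid-ranks as they are, so its p-value ($16/252 \approx 0.063$) differs slightly from table values that assume no ties, but it leads to the same decision at $\alpha = 0.05$. The helper name `u_from_rank_sum` is illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Mid-ranks from the ranking table above
ranks_a = [3.5, 6.5, 8, 9, 10]
ranks_b = [1, 2, 3.5, 5, 6.5]
pooled = ranks_a + ranks_b
n_a, n_b = len(ranks_a), len(ranks_b)


def u_from_rank_sum(rank_sum):
    """U for a group of size n_a, using U = n_a*n_b + n_a(n_a+1)/2 - R."""
    return n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum


u_observed = u_from_rank_sum(sum(ranks_a))
centre = n_a * n_b / 2                       # E[U] under H0
extreme = total = 0
for chosen in combinations(range(len(pooled)), n_a):
    u = u_from_rank_sum(sum(pooled[i] for i in chosen))
    total += 1
    # two-sided: at least as far from n_a*n_b/2 as the observed U
    if abs(u - centre) >= abs(u_observed - centre):
        extreme += 1

p_exact = Fraction(extreme, total)
print(u_observed, extreme, total, p_exact)  # 3.0 16 252 4/63
```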

4. Examiner's Breakdown

4.1 Comparative Analysis

| Feature | Chi-Square Tests (GOF, Independence, Homogeneity) | Non-Parametric Rank-Based Tests (e.g., M-W U, Kruskal-Wallis) |
|---------|---------------------------------------------------|----------------------------------------------------------------|
| Data Type | Nominal or ordinal (primarily counts/frequencies) | Ordinal, or interval/ratio data violating parametric assumptions (rank-transformed) |
| Primary Goal | Analyze associations between categorical variables, assess fit to a distribution, compare proportions. | Compare locations (medians) or distributions of groups, assess monotonic correlation. |
| Hypothesis Focus | Frequencies, proportions, independence | Medians, distributions, stochastic dominance |
| Assumptions | 1. Independence of observations. 2. Expected cell frequencies $E_i \ge 5$ in at least 80% of cells, and $E_i \ge 1$ in all cells. | 1. Independence of observations (between groups for independent-samples tests, between blocks for related-samples tests). 2. Data are at least ordinal. 3. Underlying continuity (for rank assignment). 4. Symmetry of differences (for Wilcoxon Signed-Rank). |
| Statistical Power | Can be lower than analyses that use interval-level information, when such data are available. | Generally less powerful than the equivalent parametric tests when parametric assumptions are met, but more robust when they are violated. |
| Interpretation | Presence/absence of association, goodness-of-fit. Effect size via Cramér's V or Phi. | Difference in central tendency (medians) or in overall distribution shape/location. |
| Sensitivity | Sensitive to cell sizes, particularly small expected frequencies. | Robust to outliers and departures from normality. Sensitive to ties in ranking. |

4.2 High-Yield Marking Keywords

  1. "Expected Frequencies $\ge 5$ (in at least 80% of cells, and $\ge 1$ in all cells)" – Crucial assumption for Chi-Square validity.
  2. "Degrees of Freedom (df) = $(r-1)(c-1)$" – Exact formula for Chi-Square Independence/Homogeneity.
  3. "Ranks the absolute differences" – Exact step in Wilcoxon Signed-Rank calculation.
  4. "Sum of ranks for each group" – Core component in Mann-Whitney U and Kruskal-Wallis calculations.
  5. "Non-parametric tests do not assume particular distribution shapes" – Fundamental distinction and justification.
  6. "Robust to outliers" – Key advantage of non-parametric methods.
  7. "Population Medians" – The parameter often compared in non-parametric tests, as opposed to means.
  8. "Monotonic relationship" – Explicitly describes the type of association assessed by Spearman's/Kendall's rho/tau.

4.3 Trapdoor Mistakes

  1. Incorrectly applying Chi-Square tests to raw data instead of frequencies: Students often attempt to put individual data points directly into the $\chi^2$ formula instead of first summarizing them into observed frequency counts.
    • Correct Answer: First, data must be aggregated into a contingency table with observed counts for each category combination. The $\chi^2$ formula operates on these counts, not individual values.
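  2. Violating the Expected Frequency Assumption for Chi-Square: Neglecting to check that $E_i \ge 5$ in at least 80% of cells and $E_i \ge 1$ in all cells. Small expected counts make the $\chi^2$ approximation unreliable, so the reported p-value can be inaccurate (often too small, inflating the Type I error rate).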
    • Correct Answer: Always calculate expected frequencies. If the assumption is violated, consider combining categories (if logically sound) or using Fisher's Exact Test for $2 \times 2$ tables, particularly when cell counts are small.
  3. Using parametric tests (e.g., t-test, ANOVA) when data are clearly ordinal or highly skewed, or when sample sizes are extremely small: This leads to invalid inferences due to assumption violations.
    • Correct Answer: For such data characteristics, specify the appropriate non-parametric alternative (e.g., Mann-Whitney U for two independent groups instead of independent samples t-test). Justify this choice by citing the distributional assumptions violated.
  4. Misinterpreting the null hypothesis of non-parametric rank tests as comparing means: While rank tests compare distributions, if the distributions are assumed to have similar shapes, then a significant result can be interpreted as a difference in medians. However, the default $H_0$ is that the distributions are identical.
    • Correct Answer: State the null hypothesis for rank-based tests precisely as "The distributions are identical" (e.g., for Mann-Whitney U, Kruskal-Wallis, Friedman). If assuming similar shapes, then state "The medians are equal" and acknowledge the shape assumption for this interpretation. Avoid any mention of "means" unless the test explicitly allows for it under specific conditions (which is rare for introductory non-parametric).
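For trapdoor 2, a two-sided Fisher's exact test on a $2 \times 2$ table can be sketched with hypergeometric probabilities. This is a minimal illustration, not a library API: `fisher_exact_2x2` is a hypothetical name, the table is hypothetical (every expected count is 2, so the $\chi^2$ rule fails), and the two-sided p-value sums all tables with the same margins that are no more likely than the observed one, which is one common definition.

```python
from fractions import Fraction
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric P(top-left cell = x) with all margins fixed
        return Fraction(comb(row1, x) * comb(row2, col1 - x), comb(n, col1))

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of every table at least as extreme (no more likely)
    return sum(prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed)


p = fisher_exact_2x2(3, 1, 1, 3)
print(p)  # 17/35
```

Exact fractions are used so that tables with equal probability compare as equal without a floating-point tolerance.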
