CAPE PENINSULA UNIVERSITY OF TECHNOLOGY STAT151X

Probability Distributions

From the statistics 1B curriculum · Updated May 29, 2026

1. Introduction & Overview

  • The Mental Model: A probability distribution is the blueprint of a random variable: it encodes every possible outcome together with its likelihood, and so describes the statistical behaviour of the random quantity being modelled.
  • Significance:
    • Inferential Statistics: Foundation for hypothesis testing and confidence interval estimation in varied disciplines such as biostatistics, econometrics, and quality control.
    • Risk Management: Quantifying financial risk (e.g., Value at Risk via Extreme Value Distributions) and actuarial science.
    • Machine Learning: Bayesian inference, generative models (e.g., Gaussian Mixture Models), and regularization techniques.
    • Engineering & Physics: Modeling noise, system reliability, and quantum phenomena (e.g., Bose-Einstein or Fermi-Dirac distributions).
    • Operations Research: Stochastic optimization and queuing theory (e.g., Erlang distribution).
mindmap
    root((Probability Distributions))
        Discrete Distributions
            Bernoulli(p)
                "P(X=k) = p^k (1-p)^(1-k)"
            Binomial(n, p)
                "P(X=k) = C(n,k) p^k (1-p)^(n-k)"
            Poisson(λ)
                "P(X=k) = (e^(-λ) λ^k) / k!"
            Geometric(p)
                "P(X=k) = (1-p)^(k-1) p"
            Hypergeometric(N, K, n)
                "P(X=k) = (C(K,k) C(N-K, n-k)) / C(N,n)"
        Continuous Distributions
            Uniform(a, b)
                "f(x) = 1/(b-a)"
            Normal(μ, σ^2)
                "f(x) = (1/(σ√(2π))) e^(-(x-μ)^2 / (2σ^2))"
            Exponential(λ)
                "f(x) = λ e^(-λx)"
            Gamma(α, β)
                "f(x) = (β^α / Γ(α)) x^(α-1) e^(-βx)"
            Beta(α, β)
                "f(x) = (x^(α-1) (1-x)^(β-1)) / B(α,β)"
            Chi-squared(k)
                "f(x) = (1 / (2^(k/2) Γ(k/2))) x^((k/2)-1) e^(-x/2)"
            Student's T(ν)
                "f(t) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) (1 + t^2/ν)^(- (ν+1)/2)"
        Key Concepts
            Probability Mass Function (PMF)
            Probability Density Function (PDF)
            Cumulative Distribution Function (CDF)
            Expected Value (Mean)
            Variance
            Moment Generating Function (MGF)
            Characteristic Function (CF)
            Quantile Function
            Parameters
            Support

2. In-Depth Theory, Equations & Mechanisms

Probability distributions are formal mathematical descriptions of the likelihood of different outcomes for a random variable. They are classified into discrete and continuous types based on the nature of their random variable's support.

2.1 Discrete Probability Distributions

A discrete random variable $X$ has a countable number of possible outcomes. Its probability distribution is defined by a Probability Mass Function (PMF), $P(X=x)$, where $x$ is a specific outcome.

2.1.1 Bernoulli Distribution

  • Notation: $X \sim \text{Bernoulli}(p)$
  • Context: Models the outcome of a single trial with two possible results: "success" (value 1) or "failure" (value 0).
  • Parameters: $p$, the probability of success, where $p \in [0, 1]$.
  • Support: $x \in \{0, 1\}$.
  • PMF:
    $P(X=x) = p^x (1-p)^{1-x}$ for $x \in \{0, 1\}$
  • Expected Value (Mean): $E[X] = p$
  • Variance: $\text{Var}(X) = p(1-p)$
  • Moment Generating Function (MGF): $M_X(t) = (1-p) + pe^t$ for all real $t$.

2.1.2 Binomial Distribution

  • Notation: $X \sim \text{Binomial}(n, p)$
  • Context: Models the number of successes in a fixed number of $n$ independent Bernoulli trials, each with probability of success $p$.
  • Parameters: $n$, number of trials ($n \in \mathbb{N}^+$); $p$, probability of success ($p \in [0, 1]$).
  • Support: $x \in \{0, 1, \dots, n\}$.
  • PMF:
    $P(X=x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x \in \{0, 1, \dots, n\}$
    where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the binomial coefficient.
  • Expected Value (Mean): $E[X] = np$
  • Variance: $\text{Var}(X) = np(1-p)$
  • MGF: $M_X(t) = ((1-p) + pe^t)^n$ for all real $t$.
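
As a quick numerical check of the formulas above, the following sketch evaluates the Binomial PMF directly from its definition and confirms that it sums to 1 and reproduces $np$ and $np(1-p)$. The values $n = 10$, $p = 0.3$ and the helper name `binomial_pmf` are illustrative assumptions, not part of the course material.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), straight from the PMF definition."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # illustrative values only
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Moments computed from the PMF, to compare with the closed forms.
total = sum(pmf)
mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(f"sum of PMF = {total:.6f}")
print(f"mean       = {mean:.6f}  (np      = {n * p:.6f})")
print(f"variance   = {variance:.6f}  (np(1-p) = {n * p * (1 - p):.6f})")
```

The same pattern (build the PMF as a list, then sum against it) works for any discrete distribution with finite support.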

2.1.3 Poisson Distribution

  • Notation: $X \sim \text{Poisson}(\lambda)$
  • Context: Models the number of events occurring in a fixed interval of time or space, given these events occur with a known constant mean rate $\lambda$ and independently of the time since the last event. Often used as an approximation to the Binomial distribution when $n$ is large and $p$ is small, such that $np = \lambda$.
  • Parameters: $\lambda$, the average rate of occurrence ($\lambda > 0$).
  • Support: $x \in \{0, 1, 2, \dots\}$.
  • PMF:
    $P(X=x) = \frac{e^{-\lambda} \lambda^x}{x!}$ for $x \in \{0, 1, 2, \dots\}$
  • Expected Value (Mean): $E[X] = \lambda$
  • Variance: $\text{Var}(X) = \lambda$
  • MGF: $M_X(t) = e^{\lambda(e^t - 1)}$ for all real $t$.
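
The Binomial approximation mentioned in the Context bullet can be seen numerically. The sketch below compares Binomial($n$, $p$) with Poisson($\lambda = np$) for a large $n$ and small $p$; the specific values $n = 1000$, $p = 0.003$ and the function names are assumptions chosen for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)


n, p = 1000, 0.003  # large n, small p (illustrative)
lam = n * p         # matching Poisson rate

for x in range(6):
    print(f"x={x}: binomial={binomial_pmf(x, n, p):.5f}  poisson={poisson_pmf(x, lam):.5f}")

# Largest pointwise gap between the two PMFs over x = 0..20.
max_abs_diff = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(21)
)
print(f"max |difference| = {max_abs_diff:.6f}")
```

Increasing $p$ while holding $np$ fixed (for example $n = 10$, $p = 0.3$) makes the gap visibly larger, which is why the approximation is only recommended for small $p$.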

2.1.4 Geometric Distribution

  • Notation: $X \sim \text{Geometric}(p)$
  • Context: Models the number of Bernoulli trials required to achieve the first success.
  • Parameters: $p$, probability of success ($p \in (0, 1]$).
  • Support: $x \in \{1, 2, 3, \dots\}$. (Note: Some definitions start from $x=0$, representing the number of failures before the first success. Here, we use the "number of trials" definition.)
  • PMF:
    $P(X=x) = (1-p)^{x-1} p$ for $x \in \{1, 2, 3, \dots\}$
  • Expected Value (Mean): $E[X] = 1/p$
  • Variance: $\text{Var}(X) = (1-p)/p^2$
  • MGF: $M_X(t) = \frac{pe^t}{1-(1-p)e^t}$ for $t < -\ln(1-p)$.
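
To make the "number of trials" definition concrete, here is a small simulation sketch that repeats Bernoulli trials until the first success and compares the average count with $1/p$. The seed, the sample size, $p = 0.25$ and the helper name `trials_until_first_success` are all illustrative assumptions.

```python
import random
from statistics import fmean


def trials_until_first_success(p: float, rng: random.Random) -> int:
    """Count Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        trials += 1
    return trials


rng = random.Random(151)  # fixed seed so the run is reproducible
p = 0.25
samples = [trials_until_first_success(p, rng) for _ in range(100_000)]

sample_mean = fmean(samples)
print(f"simulated mean = {sample_mean:.3f}, theoretical 1/p = {1 / p:.3f}")
print(f"smallest observed value = {min(samples)}")  # support starts at 1
```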

2.2 Continuous Probability Distributions

A continuous random variable $X$ has an uncountable number of possible outcomes. Its probability distribution is defined by a Probability Density Function (PDF), $f(x)$, satisfying $f(x) \ge 0$ for all $x$ and $\int_{-\infty}^{\infty} f(x) dx = 1$. The probability of $X$ falling within an interval $[a, b]$ is $\int_a^b f(x) dx$. $P(X=x) = 0$ for any specific $x$.

2.2.1 Uniform Distribution

  • Notation: $X \sim \text{Uniform}(a, b)$
  • Context: Models situations where all outcomes between two specified bounds, $a$ and $b$, are equally likely.
  • Parameters: $a$, lower bound; $b$, upper bound ($a < b$).
  • Support: $x \in [a, b]$.
  • PDF:
    $f(x) = \begin{cases} \frac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
  • Expected Value (Mean): $E[X] = \frac{a+b}{2}$
  • Variance: $\text{Var}(X) = \frac{(b-a)^2}{12}$
  • MGF: $M_X(t) = \frac{e^{tb} - e^{ta}}{t(b-a)}$ for $t \ne 0$, and $M_X(0) = 1$.

2.2.2 Normal (Gaussian) Distribution

  • Notation: $X \sim N(\mu, \sigma^2)$
  • Context: The most ubiquitous distribution in natural and social sciences, crucial due to the Central Limit Theorem. Many natural phenomena, measurement errors, and sampling distributions approximate normality.
  • Parameters: $\mu$, mean; $\sigma^2$, variance ($\sigma > 0$).
  • Support: $x \in (-\infty, \infty)$.
  • PDF:
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$ for $x \in (-\infty, \infty)$
  • Expected Value (Mean): $E[X] = \mu$
  • Variance: $\text{Var}(X) = \sigma^2$
  • MGF: $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$ for all real $t$.
  • Standard Normal Distribution: A special case where $\mu=0$ and $\sigma^2=1$, denoted $Z \sim N(0, 1)$. Its PDF is $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ and its CDF is $\Phi(z)$. Any normal variable $X$ can be standardized by $Z = (X-\mu)/\sigma$.
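
Standardization can be checked with Python's built-in `statistics.NormalDist`. The sketch below computes $P(X \le x)$ once directly and once through $Z = (X-\mu)/\sigma$ and $\Phi(z)$; the values $\mu = 100$, $\sigma = 15$, $x = 130$ are made up for illustration.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # illustrative parameters
x = 130.0

# Route 1: evaluate the CDF of N(mu, sigma^2) directly.
p_direct = NormalDist(mu, sigma).cdf(x)

# Route 2: standardize, then use the standard normal CDF Phi.
z = (x - mu) / sigma
p_standard = NormalDist().cdf(z)  # NormalDist() defaults to N(0, 1)

print(f"z = {z:.2f}")
print(f"P(X <= {x}) directly       = {p_direct:.5f}")
print(f"P(Z <= {z:.2f}) via Phi(z) = {p_standard:.5f}")
```

Both routes give the same probability, which is why a single standard normal table is enough for every normal distribution.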

2.2.3 Exponential Distribution

  • Notation: $X \sim \text{Exp}(\lambda)$
  • Context: Models the time until an event occurs in a Poisson process, i.e., the time between two successive events. It exhibits the property of "memorylessness."
  • Parameters: $\lambda$, rate parameter ($\lambda > 0$). Some texts instead use the scale parameter $\beta = 1/\lambda$.
  • Support: $x \in [0, \infty)$.
  • PDF:
    $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $0$ otherwise.
  • Expected Value (Mean): $E[X] = 1/\lambda$
  • Variance: $\text{Var}(X) = 1/\lambda^2$
  • MGF: $M_X(t) = \frac{\lambda}{\lambda-t}$ for $t < \lambda$.
  • Memorylessness Property: $P(X > s+t \mid X > s) = P(X > t)$ for $s, t \ge 0$.
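
The memorylessness property follows from the survival function $P(X > x) = e^{-\lambda x}$, since $e^{-\lambda(s+t)} / e^{-\lambda s} = e^{-\lambda t}$. A minimal numerical sketch, with $\lambda$, $s$ and $t$ chosen arbitrarily for illustration:

```python
from math import exp, isclose


def survival(x: float, lam: float) -> float:
    """P(X > x) for X ~ Exp(lam), i.e. 1 - F(x)."""
    return exp(-lam * x)


lam, s, t = 0.5, 2.0, 3.0  # illustrative values

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = survival(s + t, lam) / survival(s, lam)
unconditional = survival(t, lam)

print(f"P(X > s+t | X > s) = {conditional:.6f}")
print(f"P(X > t)           = {unconditional:.6f}")
print("memoryless:", isclose(conditional, unconditional))
```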

2.2.4 Gamma Distribution

  • Notation: $X \sim \text{Gamma}(\alpha, \beta)$
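  • Context: A flexible distribution modeling waiting times. When $\alpha=1$, it reduces to the Exponential distribution with rate $\beta$. When $\alpha$ is a positive integer, it models the sum of $\alpha$ independent Exponential($\beta$) random variables, i.e., the waiting time until the $\alpha$-th event in a Poisson process with rate $\beta$.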
  • Parameters: $\alpha$, shape parameter ($\alpha > 0$); $\beta$, rate parameter ($\beta > 0$). (Some texts use $\theta = 1/\beta$ as a scale parameter).
  • Support: $x \in [0, \infty)$.
  • PDF:
    $f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$ for $x \ge 0$, and $0$ otherwise.
    where $\Gamma(\alpha) = \int_0^\infty t^{\alpha-1} e^{-t} dt$ is the Gamma function. For integer $\alpha$, $\Gamma(\alpha) = (\alpha-1)!$.
  • Expected Value (Mean): $E[X] = \alpha/\beta$
  • Variance: $\text{Var}(X) = \alpha/\beta^2$
  • MGF: $M_X(t) = \left(\frac{\beta}{\beta-t}\right)^\alpha$ for $t < \beta$.
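
The "sum of exponentials" interpretation can be checked by simulation. The sketch below adds up $\alpha$ independent Exponential($\beta$) draws and compares the sample mean and variance with $\alpha/\beta$ and $\alpha/\beta^2$. The seed, sample size and parameter values are assumptions for illustration; `random.expovariate` takes the rate, matching the rate parameterization used here.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2026)  # fixed seed for reproducibility
alpha, beta = 3, 2.0       # integer shape, rate (illustrative)

# Each sample is a sum of alpha independent Exponential(beta) waiting times.
samples = [
    sum(rng.expovariate(beta) for _ in range(alpha))
    for _ in range(50_000)
]

sample_mean = fmean(samples)
sample_var = pvariance(samples)

print(f"mean: simulated {sample_mean:.4f}, theory alpha/beta   = {alpha / beta:.4f}")
print(f"var:  simulated {sample_var:.4f}, theory alpha/beta^2 = {alpha / beta**2:.4f}")
```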

2.2.5 Beta Distribution

  • Notation: $X \sim \text{Beta}(\alpha, \beta)$
  • Context: Models probabilities, proportions, or values bounded between 0 and 1. Highly flexible shape based on parameters. Often used as a prior distribution in Bayesian statistics.
  • Parameters: $\alpha > 0$, $\beta > 0$.
  • Support: $x \in [0, 1]$.
  • PDF:
    $f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)}$ for $0 \le x \le 1$, and $0$ otherwise.
    where $B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ is the Beta function.
  • Expected Value (Mean): $E[X] = \frac{\alpha}{\alpha+\beta}$
  • Variance: $\text{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$
  • MGF: No simple closed-form MGF exists.
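
Because the Beta PDF is only defined through the Gamma function, a numerical check can be reassuring. The sketch below integrates the PDF with a simple midpoint rule and compares the resulting mean and variance with the closed forms. The parameters $\alpha = 2$, $\beta = 5$, the step count and the helper names are illustrative assumptions; the midpoint rule is just one convenient choice of quadrature.

```python
from math import gamma


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density on [0, 1], using B(a, b) = G(a)G(b)/G(a+b)."""
    norm = gamma(a) * gamma(b) / gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm


def midpoint_integral(g, lo: float, hi: float, steps: int = 10_000) -> float:
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


a, b = 2.0, 5.0  # illustrative shape parameters

total = midpoint_integral(lambda x: beta_pdf(x, a, b), 0.0, 1.0)
mean = midpoint_integral(lambda x: x * beta_pdf(x, a, b), 0.0, 1.0)
var = midpoint_integral(lambda x: (x - mean) ** 2 * beta_pdf(x, a, b), 0.0, 1.0)

print(f"integral of PDF = {total:.6f}")
print(f"mean = {mean:.6f}  (formula: {a / (a + b):.6f})")
print(f"var  = {var:.6f}  (formula: {a * b / ((a + b) ** 2 * (a + b + 1)):.6f})")
```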

2.2.6 Key Distribution Properties & Relationships

radar-beta
    title "Comparative Properties of Distributions"
    series
        "Normal (μ=0, σ=1)"
        "Exponential (λ=1)"
        "Uniform (a=0, b=1)"
        "Poisson (λ=1)"
    data
        Skewness = 0, 2, 0, 1
        Kurtosis = 3, 9, 1.8, 4
        "Memoryless P" = 0, 1, 0, 0
        "Finite Support" = 0, 0, 1, 0
        "Countable Outcomes" = 0, 0, 0, 1
        "Approximates Binom" = 1, 0, 0, 1
  • Central Limit Theorem (CLT): States that the sampling distribution of the mean of a large number of independent, identically distributed random variables with finite variance approaches a normal distribution, regardless of the shape of the original population distribution. This is fundamental for parametric inference.
    If $X_1, X_2, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$, then as $n \to \infty$,
    $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{D} N(0, \sigma^2)$, so for large $n$, $\bar{X}_n \approx N(\mu, \sigma^2/n)$.
  • Relationships:
    • Bernoulli is a special case of Binomial (n=1).
    • Exponential is a special case of Gamma ($\alpha=1$).
    • Chi-squared distribution ($X \sim \chi^2_k$) is a special case of Gamma distribution where $\alpha = k/2$ and $\beta = 1/2$. It describes the sum of squares of $k$ independent standard normal random variables.
    • Student's t-distribution ($X \sim t_\nu$) arises when estimating the mean of a normally distributed population when the sample size is small and the population standard deviation is unknown. It approaches the standard normal distribution as $\nu \to \infty$.
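    • F-distribution is the distribution of the ratio of two independent chi-squared variables, each divided by its degrees of freedom. It arises in ANOVA and when comparing the variances of two normally distributed populations.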
    • The Poisson distribution can be derived as the limit of the Binomial distribution under specific conditions ($n \to \infty, p \to 0, np \to \lambda$).
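
The Central Limit Theorem stated above can be illustrated with a simulation. The sketch below draws many samples of size $n$ from a strongly skewed Exponential(1) population (so $\mu = 1$, $\sigma^2 = 1$), then checks that the sample means have mean close to $\mu$, variance close to $\sigma^2/n$, and roughly 95% of standardized values inside $\pm 1.96$. The seed, $n = 30$ and the number of replications are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

rng = random.Random(1)      # fixed seed for reproducibility
n, reps = 30, 10_000        # sample size and number of repeated samples
mu, sigma2 = 1.0, 1.0       # mean and variance of Exponential(1)

# One sample mean per replication.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

# Standardize each sample mean and count how many fall within +/- 1.96.
z_scores = [(m - mu) / sqrt(sigma2 / n) for m in sample_means]
coverage = sum(abs(z) < 1.96 for z in z_scores) / reps

print(f"mean of sample means     = {mean_of_means:.4f}  (mu = {mu})")
print(f"variance of sample means = {var_of_means:.5f}  (sigma^2/n = {sigma2 / n:.5f})")
print(f"share with |z| < 1.96    = {coverage:.3f}  (normal theory: about 0.95)")
```

Re-running with a smaller $n$ (say 3) shows the skew of the population surviving in the sample means, which is a useful reminder that the approximation depends on $n$.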

2.3 Cumulative Distribution Function (CDF)

  • Definition: For any random variable $X$, the CDF, denoted $F_X(x)$, gives the probability that $X$ will take a value less than or equal to $x$.
  • Formula:
    • Discrete: $F_X(x) = P(X \le x) = \sum_{t \le x} P(X=t)$
    • Continuous: $F_X(x) = P(X \le x) = \int_{-\infty}^{x} f(t) dt$
  • Properties:
    • $0 \le F_X(x) \le 1$ for all $x$.
    • $F_X(x)$ is non-decreasing: if $x_1 < x_2$, then $F_X(x_1) \le F_X(x_2)$.
    • $\lim_{x \to -\infty} F_X(x) = 0$.
    • $\lim_{x \to \infty} F_X(x) = 1$.
    • For continuous distributions, $f(x) = \frac{d}{dx} F_X(x)$.
    • $P(a < X \le b) = F_X(b) - F_X(a)$.
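
For a discrete variable the CDF is just a running sum of the PMF. The following sketch builds the CDF of a Binomial(5, 0.5) variable (an illustrative choice) and uses the last property above to compute $P(1 < X \le 3)$.

```python
from itertools import accumulate
from math import comb

n, p = 5, 0.5  # illustrative distribution: Binomial(5, 0.5)
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# cdf[x] = F(x) = P(X <= x), obtained by cumulative summation of the PMF.
cdf = list(accumulate(pmf))

a, b = 1, 3
prob_interval = cdf[b] - cdf[a]  # P(a < X <= b) = F(b) - F(a)
direct = pmf[2] + pmf[3]         # same probability, summed directly

print("CDF values:", cdf)
print("P(1 < X <= 3) via CDF:", prob_interval)  # 0.625
print("P(1 < X <= 3) direct: ", direct)         # 0.625
```

Note that the left endpoint is excluded: $F(b) - F(a)$ leaves out $P(X = a)$, which matters for discrete variables.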
stateDiagram-v2
    direction LR
    type Discrete_State_Space { "Countable Outcomes" }
    type Continuous_State_Space { "Uncountable Outcomes" }

    "Random Variable" --> "Discrete RV" : Finite/Countable range
    "Random Variable" --> "Continuous RV" : Infinite/Uncountable range

    "Discrete RV" --> "PMF P(X=x)": "Probability Mass Function (P_x)"
    "Continuous RV" --> "PDF f(x)dx": "Probability Density Function (f_x)"

    "PMF P(X=x)" --> "CDF F(x) = ΣP(t)": "Calculated via summation"
    "PDF f(x)dx" --> "CDF F(x) = ∫f(t)dt": "Calculated via integration"

    "CDF F(x) = ΣP(t)" --> "Quantile Function Q(p)"
    "CDF F(x) = ∫f(t)dt" --> "Quantile Function Q(p)"

    subgraph Characteristic Properties
        "PMF P(X=x)" -- "P(X=x)>0"
        "PMF P(X=x)" -- "Σ P(X=x) = 1"
        "PDF f(x)dx" -- "f(x) >= 0"
        "PDF f(x)dx" -- "∫ f(x)dx = 1"
        "CDF F(x) = ΣP(t)" -- "Monotonically Non-Decreasing"
        "CDF F(x) = ∫f(t)dt" -- "Monotonically Non-Decreasing"
        "CDF F(x) = ΣP(t)" -- "Right-Continuous"
        "CDF F(x) = ∫f(t)dt" -- "Continuous"
    end

3. Technical Procedures & Applications

3.1 Parameter Estimation for a Normal Distribution (Maximum Likelihood Estimation)

The procedure to estimate the mean ($\mu$) and variance ($\sigma^2$) of a normal distribution from a sample dataset $X_1, X_2, \dots, X_n$ using Maximum Likelihood Estimation (MLE). This is a standard procedure in statistical modeling.

sequenceDiagram
    participant Data_Acquisition as "Data Set (x1, ..., xn)"
    participant Analyst as "Statistical Analyst"
    participant PDF as "Normal PDF f(x|μ, σ²)"
    participant Likelihood as "Likelihood Function L(μ, σ²|x)"
    participant LogLikelihood as "Log-Likelihood ln(L)"
    participant Optimization as "Optimization Engine"
    participant MLE_Estimates as "MLE Estimates (μ̂, σ̂²)"

    Data_Acquisition->Analyst: "Provide sample data xi"
    Analyst->PDF: "Identify parametric model: Normal Distribution"
    PDF-->Analyst: "Retrieve PDF: (1/(σ√(2π))) e^(-(xi-μ)² / (2σ²))"

    Analyst->Likelihood: "Formulate Likelihood Function"
    Likelihood->Analyst: "L(μ, σ²|x) = Π f(xi|μ, σ²)"
    Analyst->LogLikelihood: "Take natural logarithm for simplification"
    LogLikelihood->Analyst: "ln(L) = Σ [-0.5 ln(2π) - ln(σ) - (xi-μ)² / (2σ²)]"

    Analyst->Optimization: "Compute Partial Derivatives w.r.t. μ and σ²"
    Optimization->Analyst: "d(lnL)/dμ = Σ (xi-μ)/σ²"
    Optimization->Analyst: "d(lnL)/d(σ²) = Σ [(xi-μ)²/(2σ⁴) - 1/(2σ²)]"

    Analyst->Optimization: "Set derivatives to zero and solve for μ̂, σ̂²"
    Optimization->MLE_Estimates: "∂lnL/∂μ = 0  => μ̂ = (1/n) Σ xi"
    Optimization->MLE_Estimates: "∂lnL/∂(σ²) = 0 => σ̂² = (1/n) Σ (xi - μ̂)²"

    MLE_Estimates->Analyst: "Provide Maximum Likelihood Estimators"
    Analyst->Analyst: "Interpret and utilize μ̂, σ̂²"

Detailed Step-by-Step Procedure:

  1. Define the Probability Density Function (PDF): For a normal distribution, the PDF for a single observation $x_i$ is:
    $f(x_i; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$

  2. Formulate the Likelihood Function ($L$): Assuming $n$ independent observations $x_1, x_2, \dots, x_n$, the likelihood function is the product of the individual PDFs:
    $L(\mu, \sigma^2 | x_1, \dots, x_n) = \prod_{i=1}^n f(x_i; \mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$
    $L(\mu, \sigma^2 | \mathbf{x}) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right)$

  3. Transform to Log-Likelihood Function ($\ln L$): To simplify differentiation, we take the natural logarithm of the likelihood function. Because the natural logarithm is strictly increasing, $L$ and $\ln L$ attain their maximum at the same parameter values.
    $\ln L(\mu, \sigma^2 | \mathbf{x}) = \ln\left((2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right)\right)$
    $\ln L(\mu, \sigma^2 | \mathbf{x}) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2$

  4. Compute Partial Derivatives:

    • With respect to $\mu$:
      $\frac{\partial \ln L}{\partial \mu} = \frac{\partial}{\partial \mu} \left( -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right)$
      $\frac{\partial \ln L}{\partial \mu} = -\frac{1}{2\sigma^2} \sum_{i=1}^n 2(x_i - \mu)(-1) = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu)$

    • With respect to $\sigma^2$ (treating it as a single parameter):
      $\frac{\partial \ln L}{\partial (\sigma^2)} = \frac{\partial}{\partial (\sigma^2)} \left( -\frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \right)$
      $\frac{\partial \ln L}{\partial (\sigma^2)} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^n (x_i - \mu)^2$

  5. Set Derivatives to Zero and Solve:

    • For $\mu$:
      $\frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0$
      $\sum_{i=1}^n x_i - \sum_{i=1}^n \mu = 0$
      $\sum_{i=1}^n x_i - n\mu = 0$
      $n\mu = \sum_{i=1}^n x_i$
      $\hat{\mu}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{X}$ (the sample mean)

    • For $\sigma^2$ (substituting $\hat{\mu}$):
      $-\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2} \sum_{i=1}^n (x_i - \hat{\mu})^2 = 0$
      Multiply by $2(\sigma^2)^2$:
      $-n\sigma^2 + \sum_{i=1}^n (x_i - \hat{\mu})^2 = 0$
      $n\sigma^2 = \sum_{i=1}^n (x_i - \hat{\mu})^2$
      $\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu})^2$ (the biased sample variance)

    Note: The MLE of $\sigma^2$ is a biased estimator. The unbiased estimator used in frequentist statistics is $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{X})^2$. However, $\hat{\sigma}^2_{\text{MLE}}$ is asymptotically unbiased and consistent.
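
The closed-form estimators derived above can be computed in a few lines. The sketch below uses a small made-up data set (not from the course), evaluates $\hat{\mu}$, $\hat{\sigma}^2_{\text{MLE}}$ and $s^2$, and confirms that nudging either estimate away from the MLE lowers the log-likelihood from step 3. The function name `log_likelihood` is a hypothetical helper.

```python
from math import log, pi
from statistics import fmean, pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
n = len(data)


def log_likelihood(mu: float, sigma2: float) -> float:
    """ln L(mu, sigma^2 | x) for i.i.d. normal data (step 3 formula)."""
    ss = sum((x - mu) ** 2 for x in data)
    return -n / 2 * log(2 * pi) - n / 2 * log(sigma2) - ss / (2 * sigma2)


mu_hat = fmean(data)                                        # (1/n) * sum(x_i)
sigma2_mle = sum((x - mu_hat) ** 2 for x in data) / n        # divides by n
s2_unbiased = sum((x - mu_hat) ** 2 for x in data) / (n - 1)  # divides by n - 1

print(f"mu_hat      = {mu_hat}")
print(f"sigma2_mle  = {sigma2_mle}")
print(f"s2_unbiased = {s2_unbiased:.4f}")

# Cross-check against the standard library: pvariance uses n, variance uses n - 1.
print("matches pvariance:", abs(sigma2_mle - pvariance(data)) < 1e-12)
print("matches variance: ", abs(s2_unbiased - variance(data)) < 1e-12)

# The MLE should beat nearby parameter values on the log-likelihood.
best = log_likelihood(mu_hat, sigma2_mle)
print("beats shifted mu:    ", best > log_likelihood(mu_hat + 0.1, sigma2_mle))
print("beats s2 as variance:", best > log_likelihood(mu_hat, s2_unbiased))
```

For this sample the two variance estimates differ by the factor $n/(n-1) = 8/7$, which shrinks toward 1 as $n$ grows.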

4. Examiner's Breakdown

4.1 Comparative Analysis

| Feature | Discrete Probability Distributions | Continuous Probability Distributions |
|---|---|---|
| Output Type | Countable outcomes | Uncountable outcomes |
| Quantification | Probability Mass Function (PMF), $P(X=x)$ | Probability Density Function (PDF), $f(x)$ |
| $P(X=x)$ for specific $x$ | $P(X=x) > 0$ for allowed $x$ | $P(X=x) = 0$ for any specific $x$ |
| Sum/Integral | $\sum_x P(X=x) = 1$ | $\int_{-\infty}^{\infty} f(x) dx = 1$ |
| Probability over range | $\sum_{a \le x \le b} P(X=x)$ | $\int_a^b f(x) dx$ |
| CDF behavior | Step function | Continuous function |
| Key examples | Bernoulli, Binomial, Poisson, Geometric | Uniform, Normal, Exponential, Gamma |
| Derivative of CDF | Not directly applicable (step function) | $f(x) = \frac{d}{dx} F(x)$ if CDF is differentiable |
| Common use cases | Counting events, number of successes | Measuring quantities, time, rates |
| Moment Generating Function (MGF) | $M_X(t) = E[e^{tX}] = \sum_x e^{tx} P(X=x)$ | $M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f(x) dx$ |

4.2 High-Yield Marking Keywords

  1. Probability Mass Function (PMF): Used exclusively for discrete random variables, listing probabilities for point outcomes.
  2. Probability Density Function (PDF): Used exclusively for continuous random variables, where probability is determined by integrating over an interval.
  3. Cumulative Distribution Function (CDF): $F_X(x) = P(X \le x)$, universally applicable and monotonically non-decreasing.
  4. Memoryless Property: Characteristic of the Exponential and Geometric distributions: the distribution of the remaining waiting time does not depend on how much time has already elapsed, i.e., $P(X > s+t \mid X > s) = P(X > t)$.
  5. Central Limit Theorem (CLT): The sampling distribution of means or sums of independent, identically distributed random variables approaches a normal distribution as sample size $n \to \infty$.
  6. Moment Generating Function (MGF): $M_X(t) = E[e^{tX}]$. When it exists in an open interval around $t = 0$, it uniquely determines the distribution, and its derivatives at $t = 0$ give the moments.
  7. Parameters (e.g., $\mu, \sigma^2, \lambda, n, p, \alpha, \beta$): Specific constants defining the shape and location of a distribution.
  8. Support (of a distribution): The set of values on which the PMF (discrete case) or the PDF (continuous case) is positive.

4.3 Trapdoor Mistakes

  1. Confusing PMF and PDF: Students often apply $\int f(x) dx = 1$ to discrete distributions or state $P(X=x) > 0$ for continuous distributions.
    • Correct Answer: Discrete: Sum PMF to 1, $P(X=x)$ are point probabilities. Continuous: Integrate PDF to 1, $P(X=x)=0$, probability is over intervals.
  2. Incorrectly Applying Normal Approximation: Assuming normality for binomial or Poisson without adequate conditions ($np \ge 5$ and $n(1-p) \ge 5$ for binomial; $\lambda \ge 10$ for Poisson) or without continuity correction.
    • Correct Answer: State and verify conditions for approximation. Apply continuity correction (e.g., $P(X \le k) \approx P(X_{Normal} \le k+0.5)$ for discrete to continuous conversion).
  3. Misinterpreting MGF properties: Confusing $M_X(t)$ directly with moments or failing to realize its uniqueness property.
    • Correct Answer: State that $E[X^k] = M_X^{(k)}(0)$, the $k$-th derivative of MGF evaluated at $t=0$. Explicitly state MGFs uniquely determine the distribution.
  4. Ignoring Support Conditions: Calculating probabilities or moments outside the defined support of a distribution (e.g., negative values for Exponential, values > 1 for Beta).
    • Correct Answer: Always explicitly state the distribution's support and keep all calculations within it. The probability of any event lying entirely outside the support is 0.
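
The effect of the continuity correction from mistake 2 is easy to see numerically. The sketch below computes $P(X \le 22)$ for $X \sim \text{Binomial}(40, 0.5)$ exactly, then with the normal approximation with and without the $+0.5$ correction. The chosen $n$, $p$ and $k$ are illustrative; they satisfy $np \ge 5$ and $n(1-p) \ge 5$.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 40, 0.5, 22  # illustrative values; np = n(1-p) = 20 >= 5

# Exact binomial probability P(X <= k).
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

# Normal approximation with matching mean np and variance np(1-p).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
without_correction = approx.cdf(k)
with_correction = approx.cdf(k + 0.5)

print(f"exact binomial        = {exact:.4f}")
print(f"normal, no correction = {without_correction:.4f}")
print(f"normal, with +0.5     = {with_correction:.4f}")
```

In this example the corrected approximation lands much closer to the exact value than the uncorrected one.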
