Introduction to Confidence Intervals
From the STATISTIK PERNIAGAAN (Business Statistics) curriculum
TL;DR
Confidence intervals give us a range of values that is likely to contain the true population parameter we're trying to estimate. Instead of reporting a single number, they express the uncertainty around our sample's estimate. A confidence level, such as 95%, quantifies how reliable the method is: in repeated sampling, about 95% of intervals built this way would capture the true value.
1. The Mental Model
Imagine trying to guess the average number of hours all students in your university sleep each night by asking only 50 students. You'll get an average from those 50, but it's probably not the exact average for all students. A confidence interval is like drawing a net around your sample's average, giving you a range where you're pretty sure the true university-wide average actually lies.
2. The Core Material
When we take a sample from a larger population, we calculate things like the sample mean ($\bar{x}$) or sample proportion ($\hat{p}$). These are our point estimates for the true population mean ($\mu$) or population proportion ($p$). However, a single point estimate rarely hits the true population parameter exactly. That's where confidence intervals come in.
A confidence interval (CI) provides an estimated range of values which is likely to include an unknown population parameter. It's usually expressed with a confidence level, for example, a "95% confidence interval."
Understanding Confidence Level
The confidence level (e.g., 90%, 95%, 99%) describes the long-run success rate of the method: if you were to take many, many samples and construct an interval from each, that percentage of the intervals would contain the true population parameter. It does not mean there's a 95% chance that the specific interval you calculated contains the parameter. Once an interval is calculated, the parameter is either in it or it isn't; the probability refers to the method, not to any single interval.
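A small simulation makes "the probability refers to the method" concrete. The sketch below assumes a hypothetical normal population whose true mean (μ = 50) and standard deviation (σ = 10) we pretend to know, draws many samples of size 25, builds a 95% z-interval (the σ-known formula covered in Case 1) from each, and counts how often the interval contains μ. The function name `simulate_coverage` and all parameter values are illustrative assumptions, not part of any standard procedure.

```python
import random
import statistics
from statistics import NormalDist

def simulate_coverage(mu: float = 50.0, sigma: float = 10.0, n: int = 25,
                      confidence: float = 0.95, trials: int = 10_000,
                      seed: int = 1) -> float:
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                                  # reproducible draws
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z_star * sigma / n ** 0.5                        # same width every sample
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)                      # this sample's point estimate
        if x_bar - margin <= mu <= x_bar + margin:            # did this interval catch mu?
            hits += 1
    return hits / trials

print(simulate_coverage())  # close to 0.95, but not exactly
```

Any single interval either contains μ or it doesn't; only the long-run hit rate is near 95%.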
Components of a Confidence Interval
The general formula for a confidence interval is:
$$
\text{Point Estimate} \pm (\text{Critical Value} \times \text{Standard Error})
$$
Let's break down these components:
- Point Estimate: This is the single value calculated from your sample data (e.g., sample mean $\bar{x}$, sample proportion $\hat{p}$).
- Critical Value: This value depends on your chosen confidence level and the sampling distribution. When the population standard deviation is known (and for large-sample intervals for a proportion), we use z-scores from the standard normal distribution. When the population standard deviation is unknown and estimated from the sample, we use t-scores; for large samples the t and z values are nearly identical.
- Standard Error: This is the standard deviation of the sampling distribution of your point estimate. It measures how much your sample statistic is expected to vary from sample to sample. For a mean, it's typically $\frac{\sigma}{\sqrt{n}}$ (if $\sigma$ is known) or $\frac{s}{\sqrt{n}}$ (if $\sigma$ is unknown, using sample standard deviation $s$).
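The general formula translates directly into code. This is a minimal sketch; the helper names `confidence_interval` and `standard_error_of_mean` are hypothetical, and the example inputs (estimate 100, critical value 2.0, standard deviation 10, n = 25) are made up purely to show the arithmetic.

```python
import math

def confidence_interval(point_estimate: float, critical_value: float,
                        standard_error: float) -> tuple[float, float]:
    """Point estimate ± (critical value × standard error)."""
    margin_of_error = critical_value * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error

def standard_error_of_mean(sd: float, n: int) -> float:
    """sd / sqrt(n): pass sigma if it is known, otherwise the sample sd s."""
    return sd / math.sqrt(n)

# Hypothetical inputs: SE = 10 / 5 = 2, margin = 2.0 * 2 = 4
print(confidence_interval(100, 2.0, standard_error_of_mean(10, 25)))  # (96.0, 104.0)
```

Keeping the critical value as a plain argument makes the same function usable with either a z* or a t* value.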
Constructing a Confidence Interval for a Population Mean
The most common scenario is estimating a population mean $\mu$.
Case 1: Population Standard Deviation ($\sigma$) is Known
If you know the population standard deviation, you use the z-distribution.
The formula is:
$$
\bar{x} \pm z^* \left( \frac{\sigma}{\sqrt{n}} \right)
$$
Where:
* $\bar{x}$ is the sample mean.
* $z^*$ is the critical z-value for your chosen confidence level (e.g., for a 95% CI, $z^* = 1.96$).
* $\sigma$ is the population standard deviation.
* $n$ is the sample size.
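For Case 1, $z^*$ can be computed rather than looked up, because Python's standard library exposes the inverse normal CDF through `statistics.NormalDist().inv_cdf`. The sketch below assumes hypothetical values $\bar{x} = 50$, $\sigma = 10$, $n = 25$; the function name `z_interval` is illustrative.

```python
from statistics import NormalDist

def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95):
    """CI for mu when the population sd sigma is known: x_bar ± z* * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point, ~1.96 for 95%
    margin = z_star * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin

# Hypothetical numbers: x_bar = 50, sigma = 10, n = 25
low, high = z_interval(50, 10, 25)
print(f"({low:.2f}, {high:.2f})")  # (46.08, 53.92)
```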
Case 2: Population Standard Deviation ($\sigma$) is Unknown
More often, $\sigma$ is unknown, so we use the sample standard deviation ($s$) and the t-distribution. The t-distribution is similar to the normal distribution but has fatter tails, accounting for the extra uncertainty from estimating $\sigma$.
The formula is:
$$
\bar{x} \pm t^* \left( \frac{s}{\sqrt{n}} \right)
$$
Where:
* $\bar{x}$ is the sample mean.
* $t^*$ is the critical t-value for your chosen confidence level and degrees of freedom ($df = n-1$). You'll find this using a t-table or statistical software.
* $s$ is the sample standard deviation.
* $n$ is the sample size.
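The text says to find $t^*$ with a t-table or statistical software. In Python, `scipy.stats.t.ppf(1 - alpha/2, df)` is a common software route; to keep the sketch below dependency-free, it instead computes $t^*$ numerically, integrating the t density with Simpson's rule and solving for the quantile by bisection. The helper names (`t_cdf`, `t_critical`, `t_interval`) and this numerical approach are our own illustrative choices, accurate enough for classroom checks but not a replacement for a tested statistics library.

```python
import math

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on the t density over [0, x]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u: float) -> float:
        # Student's t density with df degrees of freedom
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps                      # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3         # density is symmetric, so P(T <= 0) = 0.5

def t_critical(confidence: float, df: int) -> float:
    """Two-sided critical value t*: solves P(T <= t*) = 1 - alpha/2 by bisection."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:      # grow the bracket until it contains t*
        lo, hi = hi, hi * 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(x_bar: float, s: float, n: int, confidence: float = 0.95):
    """CI for mu when sigma is unknown: x_bar ± t* * s / sqrt(n), with df = n - 1."""
    margin = t_critical(confidence, n - 1) * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

print(round(t_critical(0.95, 29), 3))  # 2.045, matching a t-table for df = 29
```

Note that $t^*$ exceeds $z^* = 1.96$ for every finite df, which is exactly the heavier-tail effect described above.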
```mermaid
graph TD
    A["Ready to estimate a population mean (μ)?"] --> B{"Population standard deviation (σ) known?"}
    B -- "Yes (rare)" --> C["Use z-distribution"]
    C --> D["Critical value: z*"]
    B -- "No (common)" --> E["Use t-distribution"]
    E --> F["Critical value: t* (df = n-1)"]
    D --> G["Standard error: σ/√n"]
    F --> H["Standard error: s/√n"]
    G --> I["Confidence interval: x̄ ± z*(σ/√n)"]
    H --> K["Confidence interval: x̄ ± t*(s/√n)"]
    I --> J["Interpret the interval (e.g., '95% confident...')"]
    K --> J
```
Interpretation
A 95% confidence interval for the mean sales of a product might be (RM 100, RM 120). This means that, based on our sample, we are 95% confident that the true population mean of sales lies between RM 100 and RM 120, where "95% confident" refers to the reliability of the method used to build the interval.
3. Worked Example
Let's say a business analyst wants to estimate the average spending per customer at a new cafe. They randomly sample 30 customers and find the following:
- Sample Mean ($\bar{x}$) = RM 25.50
- Sample Standard Deviation ($s$) = RM 7.00
- Sample Size ($n$) = 30
The analyst wants to construct a 90% confidence interval for the true average spending per customer.
Since the population standard deviation is unknown, we use the t-distribution. A common rule of thumb prefers t over z only when $n < 30$, but strictly speaking the t-distribution is the appropriate choice whenever $\sigma$ is unknown and estimated by $s$, whatever the sample size; at $n = 30$ the t and z critical values are already close.
- Identify Point Estimate: $\bar{x}$ = RM 25.50
- Determine Degrees of Freedom: $df = n - 1 = 30 - 1 = 29$.
- Find Critical t-value: A 90% confidence level leaves 10% in the tails, split evenly as 5% (0.05) in each tail. Looking up a t-table at $df = 29$ with $\alpha = 0.10$ (two-tailed), equivalently 0.05 in the upper tail, gives $t^* \approx 1.699$.
- Calculate Standard Error: $SE = \frac{s}{\sqrt{n}} = \frac{7.00}{\sqrt{30}} = \frac{7.00}{5.477} \approx 1.278$
- Calculate Margin of Error: $ME = t^* \times SE = 1.699 \times 1.278 \approx 2.171$
- Construct the Confidence Interval:
  - Lower Bound = $\bar{x} - ME = 25.50 - 2.171 = 23.329$
  - Upper Bound = $\bar{x} + ME = 25.50 + 2.171 = 27.671$
So, the 90% confidence interval for the average spending per customer is (RM 23.33, RM 27.67).
Interpretation: We are 90% confident that the true average spending per customer at this cafe lies between RM 23.33 and RM 27.67. This tells the cafe owner how precisely the sample pins down the average spend, which the single figure of RM 25.50 cannot convey; it says nothing about how much an individual customer will spend.
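The arithmetic of the worked example can be reproduced in a few lines. This sketch hard-codes the t-table value $t^* = 1.699$ used in the text rather than computing it; the variable names are illustrative.

```python
import math

# Worked-example inputs from the text
x_bar = 25.50   # sample mean (RM)
s = 7.00        # sample standard deviation (RM)
n = 30          # sample size
t_star = 1.699  # t-table value for 90% confidence, df = n - 1 = 29

se = s / math.sqrt(n)            # standard error, about 1.278
me = t_star * se                 # margin of error, about 2.171
lower, upper = x_bar - me, x_bar + me
print(f"90% CI: (RM {lower:.2f}, RM {upper:.2f})")  # 90% CI: (RM 23.33, RM 27.67)
```

Carrying full precision until the final rounding gives the same bounds as the step-by-step hand calculation.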
4. Key Takeaways
- Confidence intervals provide a range of plausible values for an unknown population parameter.
- The confidence level indicates the long-run proportion of intervals that will contain the true parameter if the sampling process is repeated.
- A confidence interval is built around a point estimate, using a critical value and standard error.
- Use the t-distribution when the population standard deviation is unknown (which is most common).
- A wider interval means more uncertainty, while a narrower interval suggests more precision.
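- Increasing the sample size typically narrows the confidence interval, improving precision: because the standard error shrinks in proportion to $1/\sqrt{n}$, quadrupling the sample size roughly halves the margin of error.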
- Increasing the confidence level (e.g., from 90% to 99%) will widen the interval.
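The last three takeaways can be checked numerically. The sketch below assumes a hypothetical known $\sigma = 10$ so that only the $z^*$ and $\sqrt{n}$ terms change, and prints the margin of error (the interval's half-width) for several sample sizes and confidence levels. `z_margin` is an illustrative helper, not a library function.

```python
from statistics import NormalDist

def z_margin(sigma: float, n: int, confidence: float) -> float:
    """Margin of error z* * sigma / sqrt(n) for a z-interval (sigma known)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / n ** 0.5

sigma = 10.0  # hypothetical population sd, for illustration only

# Larger samples: each 4x increase in n halves the margin of error.
for n in (25, 100, 400):
    print(n, round(z_margin(sigma, n, 0.95), 2))       # 3.92, 1.96, 0.98

# Higher confidence: a larger z* widens the interval at fixed n.
for conf in (0.90, 0.95, 0.99):
    print(conf, round(z_margin(sigma, 100, conf), 2))  # 1.64, 1.96, 2.58
```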
Common Mistakes to Avoid:
- Misinterpreting the confidence level: It's about the method's reliability, not the probability that a specific interval contains the true value.
- Confusing a confidence interval for individual data points: It's about the population parameter (like the mean), not where individual observations fall.
- Forgetting to check assumptions: Confidence intervals often assume random sampling and a roughly normal distribution of the sample mean (which is often true for larger sample sizes due to the Central Limit Theorem).
- Using the z-distribution when the t-distribution is more appropriate (i.e., when population standard deviation is unknown).
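The cost of the last mistake is easiest to see with a small sample. The sketch below assumes a hypothetical standard normal population and samples of size $n = 5$, estimates $\sigma$ with the sample standard deviation $s$, and compares the coverage of intervals built with $z^* = 1.96$ against those built with the t-table value $t^* = 2.776$ ($df = 4$, 95%). The function name and simulation settings are illustrative assumptions.

```python
import random
import statistics

def coverage_with_s(critical_value: float, n: int = 5, mu: float = 0.0,
                    sigma: float = 1.0, trials: int = 20_000, seed: int = 7) -> float:
    """Fraction of intervals x_bar ± critical_value * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5   # sigma unknown: estimate it with s
        if abs(x_bar - mu) <= critical_value * se:
            hits += 1
    return hits / trials

print(coverage_with_s(1.960))  # z* with s and n = 5: noticeably below 0.95
print(coverage_with_s(2.776))  # t* for df = 4: close to the nominal 0.95
```

With $s$ in place of $\sigma$, the z-based "95%" interval is too narrow and misses the true mean more often than advertised.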
5. Now Try It
You're a product manager analyzing the time customers spend on your company's new app feature. You collect data from 40 randomly selected users and find that their average time spent is 7.2 minutes, with a standard deviation of 2.5 minutes. Calculate a 95% confidence interval for the true average time all users spend on the new app feature, and explain what your interval means for the product manager.
What success looks like: You should arrive at a lower and upper bound for the interval, plus a clear, single-sentence interpretation of what those bounds mean for the true population mean.