Statistics and Probability
From the 9D curriculum
TL;DR
Statistics helps us understand data by organizing, summarizing, and interpreting it to find patterns and make predictions. Probability measures the likelihood of events happening, giving us tools to quantify uncertainty. Together, they allow us to make informed decisions and draw reliable conclusions from data.
1. The Mental Model
Imagine you're trying to figure out what someone likes based on their past choices. Statistics is like looking at all their choices, grouping them, and seeing which ones show up most often; probability is like guessing what their next choice will be based on that information.
2. The Core Material
When we talk about statistics and probability, we're essentially dealing with data and the likelihood of things happening.
Understanding Data: Statistics
Statistics is all about collecting, organizing, analyzing, interpreting, and presenting data. It helps us make sense of large amounts of information and draw meaningful conclusions.
- Descriptive Statistics: This is about summarizing and describing the main features of a dataset. Think of things like averages and how spread out the data is.
  - Measures of Central Tendency: These tell you where the center of your data lies.
    - Mean (Average): Add all the numbers and divide by how many numbers there are.
    - Median: The middle number when the data is ordered from smallest to largest. If there are two middle numbers, it's their average.
    - Mode: The value that appears most often in your data. A dataset can have more than one mode.
  - Measures of Spread/Dispersion: These tell you how spread out your data is.
    - Range: The difference between the highest and lowest values.
    - Interquartile Range (IQR): The range of the middle 50% of your data. It's the difference between the upper quartile (75th percentile) and the lower quartile (25th percentile).
- Inferential Statistics: This involves using sample data to make predictions or inferences about a larger population. Probability tells us how confident we can be in these inferences.
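To make the descriptive measures above concrete, here is a minimal Python sketch using the standard-library `statistics` module on a small made-up dataset. The helper name `describe` is our own. Quartile conventions vary between textbooks and tools; `statistics.quantiles` defaults to its "exclusive" method, so another calculator may give a slightly different IQR for some datasets (for this particular dataset, the common "median of each half" method agrees). The last two lines illustrate why skewed data matters: one extreme value moves the mean a lot and the median not at all.

```python
import statistics

def describe(data):
    """Summarize a list of numbers with the central-tendency and spread measures."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (the median), Q3.
    # Python's default is the "exclusive" method; other tools may differ.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),       # returns the first mode on a tie
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }

scores = [1, 3, 3, 5, 6, 8, 9]               # made-up example data
print(describe(scores))
# {'mean': 5, 'median': 5, 'mode': 3, 'range': 8, 'iqr': 5.0}

# Skewed data: replacing 9 with one extreme value drags the mean, not the median.
skewed = [1, 3, 3, 5, 6, 8, 100]
print(statistics.mean(skewed), statistics.median(skewed))  # 18 5
```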
Quantifying Uncertainty: Probability
Probability is the likelihood of an event occurring. It's expressed as a number between 0 and 1 (or 0% and 100%), where 0 means impossible and 1 means certain.
- Basic Probability Formula (when all outcomes are equally likely):
  $$P(\text{Event}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}}$$
- Types of Events:
  - Independent Events: The outcome of one doesn't affect the outcome of the other (e.g., flipping a coin twice).
    - $P(A \text{ and } B) = P(A) \times P(B)$
  - Dependent Events: The outcome of one does affect the outcome of the other (e.g., drawing two cards from a deck without replacement).
    - $P(A \text{ and } B) = P(A) \times P(B \mid A)$, where $P(B \mid A)$ is the probability of $B$ given that $A$ has already happened.
  - Mutually Exclusive Events: Events that can't happen at the same time (e.g., rolling a 1 and rolling a 2 on a single roll of a die).
    - $P(A \text{ or } B) = P(A) + P(B)$
  - Complementary Events: An event and its opposite, the event not happening (e.g., rolling a 6 vs. not rolling a 6).
    - $P(\text{not } A) = 1 - P(A)$
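Each of these rules can be checked by brute force: list every equally likely outcome and count the favorable ones. The Python sketch below assumes fair coins, a fair six-sided die, and a well-shuffled 52-card deck; the helper `probability` is our own illustration, not a library function. For the dependent case, the counted answer matches $\frac{13}{52} \times \frac{12}{51}$: the second factor is $\frac{12}{51}$ because one heart has already left the deck.

```python
from fractions import Fraction
from itertools import permutations, product

def probability(event, outcomes):
    """P(event) = favorable / total, assuming every outcome is equally likely."""
    outcomes = list(outcomes)
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

die = range(1, 7)                           # one roll of a fair six-sided die
two_flips = list(product("HT", repeat=2))   # HH, HT, TH, TT

# Independent events: heads on flip 1 AND heads on flip 2
p_both_heads = probability(lambda o: o == ("H", "H"), two_flips)

# Mutually exclusive events: a 1 OR a 2 on a single roll
p_one_or_two = probability(lambda x: x in (1, 2), die)

# Complementary events: NOT rolling a 6
p_not_six = probability(lambda x: x != 6, die)

# Dependent events: two hearts in a row without replacement.
# Ordered pairs of two *different* cards model "no replacement".
deck = [(rank, suit) for suit in ("hearts", "diamonds", "clubs", "spades")
        for rank in range(1, 14)]
p_two_hearts = probability(
    lambda pair: pair[0][1] == "hearts" and pair[1][1] == "hearts",
    permutations(deck, 2),
)

print(p_both_heads, p_one_or_two, p_not_six, p_two_hearts)  # 1/4 1/3 5/6 1/17
```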
```mermaid
graph TD
A["Data Analysis Journey"] --> B["Collect Data"]
B --> C{"Organize & Summarize?"}
C -- "Yes (Descriptive)" --> D["Calculate Mean, Median, Mode"]
D --> E["Calculate Range, IQR"]
C -- "No (Inferential)" --> F["Formulate Hypothesis"]
F --> G["Use Samples to Infer Population"]
G --> H["Apply Probability to Test Hypothesis"]
H --> I["Draw Conclusions & Make Predictions"]
I --> J{"Event Likelihood?"}
J --> K["Identify Outcomes"]
K --> L["Calculate P(Event)"]
L --> M{"Combined Events?"}
M -- "Independent" --> N["P(A) * P(B)"]
M -- "Mutually Exclusive" --> O["P(A) + P(B)"]
M -- "Complementary" --> P["1 - P(A)"]
N --> Q["Understand Uncertainty"]
O --> Q
P --> Q
```
Probability vs. Odds
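Probability and odds are related, but they express chance differently.
- Probability compares the number of favorable outcomes to the total number of possible outcomes (e.g., 1 out of 6 for rolling a 6 on a die).
- Odds compare the number of ways an event can happen to the number of ways it can't happen (e.g., 1 to 5 for rolling a 6 on a die).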
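The conversion between the two is simple arithmetic: odds in favor are $\frac{p}{1-p}$, and odds of $a:b$ correspond to a probability of $\frac{a}{a+b}$. A small Python sketch, assuming odds are stated "in favor" as for:against; the function names are illustrative, not from any library.

```python
from fractions import Fraction

def odds_in_favor(p):
    """Convert a probability p (0 <= p < 1) to odds in favor, as (for, against)."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 (odds are undefined for p = 1)")
    ratio = p / (1 - p)                 # favorable : unfavorable, in lowest terms
    return ratio.numerator, ratio.denominator

def probability_from_odds(favor, against):
    """Convert odds of favor:against back to a probability."""
    return Fraction(favor, favor + against)

print(odds_in_favor(Fraction(1, 6)))   # (1, 5): rolling a 6 is 1 to 5
print(probability_from_odds(1, 5))     # 1/6
```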
3. Worked Example
Let's say you have a small bag of candies: 5 red, 3 blue, and 2 green.
Statistics Part (Descriptive):
What's the mode color? Red, because there are 5 of them, which is more than any other color.
What's the mean number of candies per color, treating each color as a group? $(5 + 3 + 2) / 3 = 10 / 3 \approx 3.33$ candies. (Color is categorical data, so this mean describes the counts per color rather than the colors themselves; the mode is usually the more meaningful summary here.)
Probability Part:
1. What is the total number of candies? $5 + 3 + 2 = 10$ candies.
2. What's the probability of picking a red candy?
$P(\text{Red}) = \frac{\text{Number of red candies}}{\text{Total candies}} = \frac{5}{10} = 0.5$ or 50%.
3. What's the probability of picking a blue or a green candy? (Mutually exclusive events)
$P(\text{Blue or Green}) = P(\text{Blue}) + P(\text{Green}) = \frac{3}{10} + \frac{2}{10} = \frac{5}{10} = 0.5$ or 50%.
4. What's the probability of NOT picking a red candy? (Complementary event)
$P(\text{Not Red}) = 1 - P(\text{Red}) = 1 - 0.5 = 0.5$ or 50%.
You could also calculate $P(\text{Blue}) + P(\text{Green}) = 0.3 + 0.2 = 0.5$.
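The whole worked example can be reproduced in a few lines of Python. This sketch assumes every candy is equally likely to be drawn; the helper `p` is hypothetical, and exact fractions are used so the results match the hand calculation. The final simulation, with an arbitrary fixed seed, is an extra illustration of the link to inferential statistics: the share of reds in a large random sample should land close to, but usually not exactly on, the true probability of 0.5.

```python
import random
from collections import Counter
from fractions import Fraction

bag = Counter(red=5, blue=3, green=2)        # the bag from the worked example
total = sum(bag.values())                    # 10 candies

# Statistics part
mode_color = bag.most_common(1)[0][0]        # 'red'
mean_per_color = Fraction(total, len(bag))   # 10/3, about 3.33

# Probability part: one random draw, every candy equally likely
def p(*colors):
    """Hypothetical helper: P(the drawn candy is one of `colors`)."""
    return Fraction(sum(bag[c] for c in colors), total)

p_red = p("red")                             # 1/2
p_blue_or_green = p("blue", "green")         # 1/2 (mutually exclusive: counts add)
p_not_red = 1 - p_red                        # 1/2 (complement rule)

# Simulation: draw 10,000 times WITH replacement and record the share of reds.
rng = random.Random(42)                      # fixed seed so the run is repeatable
draws = rng.choices(list(bag.elements()), k=10_000)
share_red = draws.count("red") / len(draws)  # close to 0.5, rarely exact

print(mode_color, mean_per_color, p_red, p_blue_or_green, p_not_red)
# red 10/3 1/2 1/2 1/2
print(round(share_red, 2))
```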
4. Key Takeaways
- Statistics is about understanding existing data, while probability is about predicting future events.
- Mean, median, and mode describe the central tendency of your data.
- Range and interquartile range tell you how spread out your data values are.
- Probability is always a value between 0 (impossible) and 1 (certain).
- Check whether events are independent, dependent, mutually exclusive, or complementary before choosing a formula.
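- An event and its complement together cover all possible outcomes, so their probabilities add up to 1.
- Don't confuse the mean with the median, especially in skewed data: a few extreme values pull the mean toward them, while the median barely moves.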
- Don't forget to consider all possible outcomes when calculating probability; missing one will give you a wrong answer.
- Don't confuse "probability" (chance out of total) with "odds" (favorable vs. unfavorable).
- Don't assume events are independent; always check if one outcome affects another.
5. Now Try It
You have a standard deck of 52 playing cards (4 suits: hearts, diamonds, clubs, spades; 13 cards per suit: A, 2-10, J, Q, K).
1. Calculate the probability of drawing a "face card" (Jack, Queen, or King).
2. Calculate the probability of drawing a "red card" (hearts or diamonds).
3. Are these two events (drawing a face card and drawing a red card) independent or dependent? Why?
4. What's the probability of drawing a red card OR a black card?
5. What's the probability of drawing a 7 and then, without replacement, drawing another 7?
Success looks like you correctly calculate each probability and can explain the relationship between the events in question 3.