# Lecture 4: Probability Theory
Note
This lecture introduces foundational concepts in probability theory — both its philosophical basis and formal mathematical axioms. We explore how probability is used to represent uncertainty and build up to the formal rules it must follow.
## Interpreting Probability
There are multiple interpretations of what probability means:
Classical Interpretation: Probability is the ratio of favorable outcomes to total possible outcomes, assuming all outcomes are equally likely.
Frequency Interpretation: Probability is the long-run relative frequency of an event occurring in repeated trials.
Subjective Interpretation: Probability represents personal belief or degree of certainty about an event.
Propensity Interpretation: Probability is a measure of the tendency of a given type of physical situation to yield an outcome of a certain kind.
All interpretations aim to provide a consistent way of reasoning about uncertainty.
Test Yourself
A fair six-sided die is rolled. The chance of getting a 4 is calculated as 1 favorable outcome out of 6 equally likely possibilities. Which interpretation of probability does this reflect?
A fair six-sided die is rolled 600 times, and the outcome 2 appears 108 times. Based on this, we estimate the probability of rolling a 2 as 0.18. Which interpretation of probability does this reflect?
A player claims there’s a 40% chance of rolling a 3 on a biased (weighted) six-sided die. Which interpretation of probability does this reflect?
A die is weighted in such a way that its physical structure causes the 6 to land more often. Modeling of this physical structure suggests the chance of a 6 is 1/3. Which interpretation of probability does this reflect?
Tip
Before we dive deeper into probability theory, let’s cover some basics of set theory.
Probability theory is deeply rooted in set theory, where outcomes of experiments are represented as sets. Some key set operations include:
Union (\(A \cup B\)): All elements in \(A\), in \(B\), or in both.
Intersection (\(A \cap B\)): All elements common to both \(A\) and \(B\).
Complement (\(A^c\)): All elements not in \(A\).
Difference (\(A - B\)): Elements in \(A\) but not in \(B\).
Subset: \(A \subseteq B\) means every element in \(A\) is also in \(B\).
Disjoint Sets: \(A \cap B = \emptyset\)
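These operations map directly onto Python’s built-in set type; here is a minimal sketch (the sets \(A\), \(B\), and the universal set \(S\) are arbitrary choices for illustration):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
S = {1, 2, 3, 4, 5, 6}  # universal set (e.g., outcomes of a six-sided die)

print(A | B)   # union: {1, 2, 3, 4, 5, 6}
print(A & B)   # intersection: {3, 4}
print(A - B)   # difference: {1, 2}
print(S - A)   # complement of A relative to S: {5, 6}

print({1, 2} <= A)           # subset check: True
print(A.isdisjoint({7, 8}))  # disjoint check: True
```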
Set operations obey the following laws:
Commutativity: The order of the sets does not matter when performing union or intersection: \(A \cup B = B \cup A\) and \(A \cap B = B \cap A\).
Associativity: Sets can be grouped differently without affecting the result: \((A \cup B) \cup C = A \cup (B \cup C)\) and \((A \cap B) \cap C = A \cap (B \cap C)\).
Distributivity: One operation can be distributed over the other: \(A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\) and \(A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\).
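These identities are easy to sanity-check numerically; a quick sketch on three arbitrary example sets:

```python
# Verify the set laws on small example sets (chosen arbitrarily for illustration).
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

assert A | B == B | A                    # commutativity of union
assert A & B == B & A                    # commutativity of intersection
assert (A | B) | C == A | (B | C)        # associativity of union
assert (A & B) & C == A & (B & C)        # associativity of intersection
assert A & (B | C) == (A & B) | (A & C)  # intersection distributes over union
assert A | (B & C) == (A | B) & (A | C)  # union distributes over intersection
```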
## Axioms of Probability
Let \(S\) be a sample space (the set of all possible outcomes), and let \(P\) be a probability function mapping subsets of \(S\) to real numbers. The axioms are:
Axiom #1 - Non-Negativity: \(P(E) \geq 0\) for any event \(E\).
Axiom #2 - Normalization: \(P(S) = 1\); the probability of the entire sample space is 1.
Axiom #3 - Additivity: If \(A\) and \(B\) are disjoint events (i.e., \(A \cap B = \emptyset\)), then \(P(A \cup B) = P(A) + P(B)\).
Note
Two events are disjoint or mutually exclusive if the occurrence of one is incompatible with the occurrence of the other; that is, if they can’t both happen at once (if they have no outcome in common). Equivalently, two events are disjoint if their intersection is the empty set.
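To make the axioms concrete, here is a minimal sketch of a probability function for a fair six-sided die, with each axiom checked explicitly (the events \(A\) and \(B\) are arbitrary illustrative choices):

```python
S = {1, 2, 3, 4, 5, 6}
p = {outcome: 1 / 6 for outcome in S}  # each face is equally likely

def P(event):
    """Probability of an event (a subset of S)."""
    return sum(p[outcome] for outcome in event)

# Axiom 1 (non-negativity): every event has probability >= 0.
assert all(P({outcome}) >= 0 for outcome in S)

# Axiom 2 (normalization): the whole sample space has probability 1.
assert abs(P(S) - 1) < 1e-12

# Axiom 3 (additivity): probabilities of disjoint events add.
A, B = {1, 2}, {5, 6}
assert A.isdisjoint(B)
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12
```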
These axioms give rise to the following laws of probability:
Complement Rule: \(P(A^c) = 1 - P(A)\) follows from Axiom #2 and Axiom #3
Monotonicity: If \(A \subset B\), then \(P(A) \leq P(B)\) follows from Axiom #1 and Axiom #3
Inclusion-Exclusion: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) can be shown as follows. Write \(A \cup B\) as the union of the three disjoint sets \(A - B\), \(A \cap B\), and \(B - A\).
Since the three sets are disjoint, Axiom #3 renders
\[P(A \cup B) = P(A - B) + P(A \cap B) + P(B - A)\]
Further,
\[P(A) = P(A - B) + P(A \cap B) \quad \text{and} \quad P(B) = P(B - A) + P(A \cap B)\]
Hence,
\[P(A - B) = P(A) - P(A \cap B) \quad \text{and} \quad P(B - A) = P(B) - P(A \cap B)\]
Rendering,
\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
Union Bound: Given events \(A_1, A_2, ..., A_n\), then, \(P(A_1 \cup A_2 \cup \dots \cup A_n) \leq \sum_{i=1}^{n} P(A_i)\) follows from Axiom #3
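All four derived laws can be verified numerically under the equally-likely-outcomes model of a fair die; a minimal sketch (the events \(A\) and \(B\) are arbitrary illustrative choices):

```python
# Numerical check of the derived laws for a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event under the equally-likely-outcomes model."""
    return len(event) / len(S)

A, B = {1, 2, 3}, {3, 4}

assert abs(P(S - A) - (1 - P(A))) < 1e-12                 # complement rule
assert P({1, 2}) <= P({1, 2, 3})                          # monotonicity
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12   # inclusion-exclusion
assert P(A | B) <= P(A) + P(B)                            # union bound
```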
## Conditional Probability
Conditional probability quantifies the likelihood of an event occurring given that another event has already occurred. The conditional probability of event \(A\) given event \(B\) is defined as
\[P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0\]
If \(A\) and \(B\) are independent, i.e., the occurrence of event \(B\) does not affect the probability of event \(A\), then \(P(A \mid B) = P(A)\), and hence \(P(A \cap B) = P(A) \times P(B)\).
Note
Independence and mutual exclusivity are different! If two events are mutually exclusive, they cannot both occur in the same trial; however, if two events are independent, both can occur in the same trial.
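A conditional probability can also be estimated by simulation: count how often \(A\) occurs among only those trials where \(B\) occurred. A minimal sketch for two fair dice, where \(A\) = "the sum is 8" and \(B\) = "the first die is even" (events chosen here purely for illustration):

```python
import random

random.seed(0)
trials = 100_000
count_B = count_A_and_B = 0

for _ in range(trials):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    if d1 % 2 == 0:           # event B occurred
        count_B += 1
        if d1 + d2 == 8:      # event A also occurred
            count_A_and_B += 1

# Estimate of P(A | B) = P(A ∩ B) / P(B); exact value is (3/36) / (18/36) = 1/6.
print(count_A_and_B / count_B)
```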
## Bayes’ Rule
Expanding on conditional probability, Bayes’ rule renders
\[P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}\]
Bayes’ Rule is useful for finding the conditional probability of \(A\) given \(B\) in terms of the conditional probability of \(B\) given \(A\), which in many problems is the more natural quantity to measure and the easier one to compute. For example, in screening for a disease, the natural way to calibrate a test is to see how well it does at detecting the disease when the disease is present, and how often it raises false alarms when the disease is not present. These are, respectively, the conditional probability of detecting the disease given that the disease is present, and the conditional probability of incorrectly raising an alarm given that the disease is not present. However, the interesting quantity for an individual is the conditional chance that he or she has the disease, given that the test raised an alarm.
More specifically, suppose a diagnostic test for a disease is 99% accurate, i.e., it gives a positive result 99% of the time if the person has the disease (true positive), and 99% of the time it gives a negative result if the person does not have the disease (true negative). Assuming 1% of the population has the disease, what’s the probability that a person does have the disease given that they tested positive?
Let D = has disease, T = tests positive.
By Bayes’ rule,
\[P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D^c)\,P(D^c)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.5\]
Because only a small fraction of the population actually has the disease, the chance that a positive test result for someone selected at random from the population is a false positive is 50%, even though the test is 99% accurate.
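The same calculation, done explicitly in code:

```python
# Disease-screening example: compute P(D | T) via Bayes' rule.
p_disease = 0.01             # prior: 1% of the population has the disease
p_pos_given_disease = 0.99   # true positive rate
p_pos_given_healthy = 0.01   # false positive rate

# Total probability of testing positive (law of total probability).
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: P(D | T).
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(p_disease_given_pos)   # 0.5
```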
Attention
The Monty Hall Problem
The Monty Hall Problem is a classic example of how human intuition often goes against probabilistic reasoning.
Scenario: You’re on a game show, wherein there are 3 doors. Behind one door is a car (the prize), while behind the other two doors are goats. You pick one door (say, Door #1), then the host (Monty Hall), who knows what’s behind all the doors, opens another door (say, Door #3), which has a goat. Monty then gives you a choice: Stick with your original pick (Door #1), or Switch to the remaining unopened door (Door #2).
Question: Should you switch?
Solution: Most people think the chances are now 50-50 between the two remaining doors, but this intuition is wrong. When you initially picked a door, your chance of picking the car was 1/3, while the chance of picking a goat was 2/3. Since Monty will always open a door with a goat, he gives you information: if your original pick was wrong (2/3 chance), switching wins you the car; if your original pick was right (1/3 chance), switching loses it. Essentially, the probability of winning if you switch is 2/3, whereas sticking keeps you at 1/3. The table below details the possible events.
| Initial Pick | Car Location | Monty Opens | Switch Wins? |
|---|---|---|---|
| Door 1 | Door 1 | Door 2/3 | ❌ |
| Door 1 | Door 2 | Door 3 | ✅ |
| Door 1 | Door 3 | Door 2 | ✅ |
Out of 3 equally likely configurations, switching wins in 2 of them.
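The 2/3 advantage of switching is easy to confirm by simulation; a minimal sketch (when the initial pick is the car, Monty has two goat doors to choose from, and this sketch simply takes the first, which does not affect the win probabilities):

```python
import random

random.seed(0)
trials = 100_000
stick_wins = switch_wins = 0

for _ in range(trials):
    car = random.randint(1, 3)   # door hiding the car
    pick = random.randint(1, 3)  # contestant's initial pick
    # Monty opens a door that is neither the pick nor the car.
    monty = next(d for d in (1, 2, 3) if d != pick and d != car)
    # Switching means taking the door that is neither the pick nor Monty's door.
    switched = next(d for d in (1, 2, 3) if d != pick and d != monty)
    stick_wins += (pick == car)
    switch_wins += (switched == car)

print(stick_wins / trials)   # ≈ 1/3
print(switch_wins / trials)  # ≈ 2/3
```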