# Lecture 4: Probability Theory
Note
This lecture introduces foundational concepts in probability theory — both its philosophical basis and formal mathematical axioms. We explore how probability is used to represent uncertainty and build up to the formal rules it must follow.
## Interpreting Probability
There are multiple interpretations of what probability means:
Classical Interpretation: Probability is the ratio of favorable outcomes to total possible outcomes, assuming all outcomes are equally likely.
Frequency Interpretation: Probability is the long-run relative frequency of an event occurring in repeated trials.
Subjective Interpretation: Probability represents personal belief or degree of certainty about an event.
Propensity Interpretation: Probability is a measure of the tendency of a given type of physical situation to yield an outcome of a certain kind.
All interpretations aim to provide a consistent way of reasoning about uncertainty.
Test Yourself
A fair six-sided die is rolled. The chance of getting a 4 is calculated as 1 favorable outcome out of 6 equally likely possibilities. Which interpretation of probability does this reflect?
A fair six-sided die is rolled 600 times, and the outcome 2 appears 108 times. Based on this, we estimate the probability of rolling a 2 as 0.18. Which interpretation of probability does this reflect?
A player claims there’s a 40% chance of rolling a 3 on a biased (weighted) six-sided die. Which interpretation of probability does this reflect?
A die is weighted in such a way that its physical structure causes the 6 to land more often. Modeling of this physical structure suggests the chance of a 6 is 1/3. Which interpretation of probability does this reflect?
Tip
Before we dive deeper into probability theory, let’s cover some basics of set theory.
Probability theory is deeply rooted in set theory, where outcomes of experiments are represented as sets. Some key set operations include:
Union (\(A \cup B\)): All elements in \(A\), in \(B\), or in both.
Intersection (\(A \cap B\)): All elements common to both \(A\) and \(B\).
Complement (\(A^c\)): All elements not in \(A\).
Difference (\(A - B\)): Elements in \(A\) but not in \(B\).
Subset: \(A \subseteq B\) means every element in \(A\) is also in \(B\).
Disjoint Sets: \(A \cap B = \emptyset\)
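These operations map directly onto Python’s built-in set type; here is a minimal sketch (the sets \(A\), \(B\), and the universal set \(S\) are arbitrary choices for illustration):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
S = {1, 2, 3, 4, 5, 6}  # universal set (e.g., outcomes of a six-sided die)

print(A | B)   # union: {1, 2, 3, 4, 5, 6}
print(A & B)   # intersection: {3, 4}
print(A - B)   # difference: {1, 2}
print(S - A)   # complement of A relative to S: {5, 6}

print({1, 2} <= A)           # subset check: True
print(A.isdisjoint({7, 8}))  # disjoint check: True
```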
Set operations obey the following laws:
Commutativity: The order of the sets does not matter when performing union or intersection: \(A \cup B = B \cup A\) and \(A \cap B = B \cap A\).
Associativity: Sets can be grouped differently without affecting the result: \((A \cup B) \cup C = A \cup (B \cup C)\) and \((A \cap B) \cap C = A \cap (B \cap C)\).
Distributivity: One operation can be distributed over the other: \(A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\) and \(A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\).
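These identities are easy to sanity-check numerically; a quick sketch on three arbitrary example sets:

```python
# Verify the set laws on small example sets (chosen arbitrarily for illustration).
A, B, C = {1, 2, 3}, {2, 3, 4}, {3, 4, 5}

assert A | B == B | A                    # commutativity of union
assert A & B == B & A                    # commutativity of intersection
assert (A | B) | C == A | (B | C)        # associativity of union
assert (A & B) & C == A & (B & C)        # associativity of intersection
assert A & (B | C) == (A & B) | (A & C)  # intersection distributes over union
assert A | (B & C) == (A | B) & (A | C)  # union distributes over intersection
```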
## Axioms of Probability
Let \(S\) be a sample space (the set of all possible outcomes), and let \(P\) be a probability function mapping subsets of \(S\) to real numbers. The axioms are:
Axiom #1 - Non-Negativity: \(P(E) \geq 0\) for any event \(E\).
Axiom #2 - Normalization: \(P(S) = 1\); the probability of the entire sample space is 1.
Axiom #3 - Additivity: If \(A\) and \(B\) are disjoint events (i.e., \(A \cap B = \emptyset\)), then \(P(A \cup B) = P(A) + P(B)\).
Note
Two events are disjoint or mutually exclusive if the occurrence of one is incompatible with the occurrence of the other; that is, if they can’t both happen at once (if they have no outcome in common). Equivalently, two events are disjoint if their intersection is the empty set.
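To make the axioms concrete, here is a minimal sketch of a probability function for a fair six-sided die, with each axiom checked explicitly (the events \(A\) and \(B\) are arbitrary illustrative choices):

```python
S = {1, 2, 3, 4, 5, 6}
p = {outcome: 1 / 6 for outcome in S}  # each face is equally likely

def P(event):
    """Probability of an event (a subset of S)."""
    return sum(p[outcome] for outcome in event)

# Axiom 1 (non-negativity): every event has probability >= 0.
assert all(P({outcome}) >= 0 for outcome in S)

# Axiom 2 (normalization): the whole sample space has probability 1.
assert abs(P(S) - 1) < 1e-12

# Axiom 3 (additivity): probabilities of disjoint events add.
A, B = {1, 2}, {5, 6}
assert A.isdisjoint(B)
assert abs(P(A | B) - (P(A) + P(B))) < 1e-12
```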
These axioms give rise to the following laws of probability:
Complement Rule: \(P(A^c) = 1 - P(A)\) follows from Axiom #2 and Axiom #3
Monotonicity: If \(A \subset B\), then \(P(A) \leq P(B)\) follows from Axiom #1 and Axiom #3
Inclusion-Exclusion: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) can be shown as follows. Write \(A \cup B\) as the union of the three disjoint sets \(A - B\), \(A \cap B\), and \(B - A\).
Since the three sets are disjoint, Axiom #3 renders
\[P(A \cup B) = P(A - B) + P(A \cap B) + P(B - A)\]
Further,
\[P(A) = P(A - B) + P(A \cap B) \quad \text{and} \quad P(B) = P(B - A) + P(A \cap B)\]
Hence,
\[P(A - B) = P(A) - P(A \cap B) \quad \text{and} \quad P(B - A) = P(B) - P(A \cap B)\]
Rendering,
\[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]
Union Bound: Given events \(A_1, A_2, ..., A_n\), then, \(P(A_1 \cup A_2 \cup \dots \cup A_n) \leq \sum_{i=1}^{n} P(A_i)\) follows from Axiom #3
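All four derived laws can be verified numerically under the equally-likely-outcomes model of a fair die; a minimal sketch (the events \(A\) and \(B\) are arbitrary illustrative choices):

```python
# Numerical check of the derived laws for a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event under the equally-likely-outcomes model."""
    return len(event) / len(S)

A, B = {1, 2, 3}, {3, 4}

assert abs(P(S - A) - (1 - P(A))) < 1e-12                 # complement rule
assert P({1, 2}) <= P({1, 2, 3})                          # monotonicity
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12   # inclusion-exclusion
assert P(A | B) <= P(A) + P(B)                            # union bound
```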
## Conditional Probability
Conditional probability quantifies the likelihood of an event occurring given that another event has already occurred. The conditional probability of event \(A\) given event \(B\) is defined as
\[P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0\]
If \(A\) and \(B\) are independent, i.e., the occurrence of event \(B\) does not affect the probability of event \(A\), then \(P(A \mid B) = P(A)\), and hence \(P(A \cap B) = P(A) \times P(B)\).
Note
Independence and mutual exclusivity are different! If two events are mutually exclusive, they cannot both occur in the same trial; however, if two events are independent, both can occur in the same trial.
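A conditional probability can also be estimated by simulation: count how often \(A\) occurs among only those trials where \(B\) occurred. A minimal sketch for two fair dice, where \(A\) = "the sum is 8" and \(B\) = "the first die is even" (events chosen here purely for illustration):

```python
import random

random.seed(0)
trials = 100_000
count_B = count_A_and_B = 0

for _ in range(trials):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    if d1 % 2 == 0:           # event B occurred
        count_B += 1
        if d1 + d2 == 8:      # event A also occurred
            count_A_and_B += 1

# Estimate of P(A | B) = P(A ∩ B) / P(B); exact value is (3/36) / (18/36) = 1/6.
print(count_A_and_B / count_B)
```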
## Bayes’ Rule
Expanding on conditional probability, Bayes’ rule renders
\[P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}\]
Bayes’ Rule is useful for finding the conditional probability of \(A\) given \(B\) in terms of the conditional probability of \(B\) given \(A\), which in many problems is the more natural quantity to measure and the easier one to compute. For example, in screening for a disease, the natural way to calibrate a test is to see how well it does at detecting the disease when the disease is present, and how often it raises false alarms when the disease is not present. These are, respectively, the conditional probability of detecting the disease given that the disease is present, and the conditional probability of incorrectly raising an alarm given that the disease is not present. However, the interesting quantity for an individual is the conditional chance that he or she has the disease, given that the test raised an alarm.
More specifically, suppose a diagnostic test for a disease is 99% accurate, i.e., it gives a positive result 99% of the time if the person has the disease (true positive), and 99% of the time it gives a negative result if the person does not have the disease (true negative). Assuming 1% of the population has the disease, what’s the probability that a person does have the disease given that they tested positive?
Let D = has disease, T = tests positive.
By Bayes’ rule,
\[P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid D^c)\,P(D^c)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.5\]
Because only a small fraction of the population actually has the disease, the chance that a positive test result for someone selected at random from the population is a false positive is 50%, even though the test is 99% accurate.
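The same calculation, done explicitly in code:

```python
# Disease-screening example: compute P(D | T) via Bayes' rule.
p_disease = 0.01             # prior: 1% of the population has the disease
p_pos_given_disease = 0.99   # true positive rate
p_pos_given_healthy = 0.01   # false positive rate

# Total probability of testing positive (law of total probability).
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' rule: P(D | T).
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(p_disease_given_pos)   # 0.5
```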
Attention
The Monty Hall Problem
The Monty Hall Problem is a classic example of how human intuition often goes against probabilistic reasoning.
Scenario: You’re on a game show, wherein there are 3 doors. Behind one door is a car (the prize), while behind the other two doors are goats. You pick one door (say, Door #1), then the host (Monty Hall), who knows what’s behind all the doors, opens another door (say, Door #3), which has a goat. Monty then gives you a choice: Stick with your original pick (Door #1), or Switch to the remaining unopened door (Door #2).
Question: Should you switch?
Solution: Most people think the chances are now 50-50 between the two remaining doors, but this intuition is wrong. When you initially picked a door, your chance of picking the car was 1/3, while the chance of picking a goat was 2/3. Since Monty will always open a door with a goat, he gives you information: if your original pick was wrong (2/3 chance), switching wins you the car; if your original pick was right (1/3 chance), switching loses it. Essentially, the probability of winning if you switch is 2/3, whereas sticking keeps you at 1/3. The table below details the possible events.
| Initial Pick | Car Location | Monty Opens | Switch Wins? |
|---|---|---|---|
| Door 1 | Door 1 | Door 2/3 | ❌ |
| Door 1 | Door 2 | Door 3 | ✅ |
| Door 1 | Door 3 | Door 2 | ✅ |
Out of 3 equally likely configurations, switching wins in 2 of them.
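The 2/3 advantage of switching is easy to confirm by simulation; a minimal sketch (when the initial pick is the car, Monty has two goat doors to choose from, and this sketch simply takes the first, which does not affect the win probabilities):

```python
import random

random.seed(0)
trials = 100_000
stick_wins = switch_wins = 0

for _ in range(trials):
    car = random.randint(1, 3)   # door hiding the car
    pick = random.randint(1, 3)  # contestant's initial pick
    # Monty opens a door that is neither the pick nor the car.
    monty = next(d for d in (1, 2, 3) if d != pick and d != car)
    # Switching means taking the door that is neither the pick nor Monty's door.
    switched = next(d for d in (1, 2, 3) if d != pick and d != monty)
    stick_wins += (pick == car)
    switch_wins += (switched == car)

print(stick_wins / trials)   # ≈ 1/3
print(switch_wins / trials)  # ≈ 2/3
```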