Lecture 10: Hypothesis Testing#
Note
This lecture introduces the rules of decision-making under uncertainty: hypothesis testing.
Foundations#
This lecture introduces hypothesis testing—a framework for making decisions between competing hypotheses under uncertainty. These hypotheses are typically:
The null hypothesis (\(H_0\)): the default or status quo assumption, and
The alternative hypothesis (\(H_1\)): the competing claim we seek to test.
The objective of a hypothesis test is to assess whether the observed data provide sufficient statistical evidence to reject the null hypothesis in favor of the alternative.
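For instance (a hypothetical example added for illustration), to test whether a coin is fair based on a sample of tosses, we would set \(H_0: p = 0.5\) against \(H_1: p \neq 0.5\), where \(p\) denotes the probability of heads, and then ask whether the observed proportion of heads is sufficiently far from 0.5 to reject \(H_0\).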
Warning
In hypothesis testing, we use the language of rejecting or not rejecting the null hypothesis—not accepting it or the alternative hypothesis.
The observed data may fail to provide strong evidence against the null hypothesis even when it is, in fact, false. In such cases, failing to reject the null hypothesis should not be interpreted as evidence in its favor; at best, it indicates that the available data do not contradict it strongly enough. Conversely, rejecting the null does not confirm the alternative hypothesis; it simply means that either the null hypothesis is false or we have observed an outcome that was highly improbable (at most as likely as the chosen significance level) if the null were true.
In practice, the set of plausible explanations for the observed data often extends beyond just the null and alternative hypotheses.
Tip
To understand this better, consider a courtroom trial. The defendant is presumed innocent (null hypothesis). The prosecution’s role is to present evidence strong enough to convince the jury to reject this presumption beyond a reasonable doubt.
If the evidence is inconclusive, the jury fails to reject the assumption of innocence — not because the defendant is necessarily innocent, but because the evidence isn’t strong enough to conclude otherwise. On the other hand, even if the jury does reject the presumption of innocence, that doesn’t prove the defendant is guilty with absolute certainty — it only means the evidence was strong enough to rule out innocence beyond a reasonable doubt.
Errors of Testing#
A key concept in hypothesis testing is the power of a test (denoted by \(1 - \beta\)), which refers to the probability of correctly rejecting a false null hypothesis. In contrast, failing to reject a false null hypothesis results in a Type II error, which occurs with probability \(\beta\). Further, the test might reject a true null hypothesis, resulting in a Type I error. The probability of this error is denoted by \(\alpha\) and is also called the significance level of the test. An ideal test is therefore one that strikes a balance: it keeps the significance level low to minimize Type I errors, while maintaining high power to reduce Type II errors.
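To make these quantities concrete, below is a minimal simulation sketch (an illustrative addition, not part of the lecture) that estimates the Type I error rate and the power of a one-sided z-test of \(H_0: \mu = 0\) against \(H_1: \mu > 0\) with known \(\sigma = 1\); the sample size, alternative mean, and significance level are arbitrary choices for illustration.

```python
# Minimal sketch: estimate Type I error and power by simulation.
# Assumptions (illustrative, not from the lecture): one-sided z-test of
# H0: mu = 0 vs H1: mu > 0, known sigma = 1, n = 25, alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, sigma, alpha, n_sims = 25, 1.0, 0.05, 10_000
z_crit = stats.norm.ppf(1 - alpha)  # rejection region: z > z_crit

def rejection_rate(true_mu):
    """Fraction of simulated samples in which H0: mu = 0 is rejected."""
    rejections = 0
    for _ in range(n_sims):
        sample = rng.normal(true_mu, sigma, size=n)
        z = sample.mean() / (sigma / np.sqrt(n))  # z statistic under H0
        rejections += z > z_crit
    return rejections / n_sims

print("Estimated Type I error (true mu = 0.0):", rejection_rate(0.0))  # ~ alpha
print("Estimated power        (true mu = 0.5):", rejection_rate(0.5))  # ~ 1 - beta
```

The first rejection rate should land near the chosen \(\alpha\), while the second estimates the power \(1 - \beta\) against the illustrative alternative.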
Warning
In practice, there is often a trade-off between the two error types, so the acceptable balance between them must be judged in the context of the problem at hand.
Tip
To understand this better, consider a courtroom trial. Here, the null hypothesis assumes that the defendant is innocent, while the alternative hypothesis asserts that the defendant is guilty. A Type I error corresponds to convicting an innocent person, while a Type II error means acquitting someone who is guilty. Most legal systems prioritize minimizing Type I errors, reflecting the foundational principle that it is better to let a guilty person go free than to wrongly convict an innocent one.
Test Yourself
Context: According to regulatory standards issued by the Ministry of Road Transport and Highways (MoRTH), all new passenger vehicles sold on or after July 1, 2019, must be equipped with a speed warning system. This system emits intermittent beeps when the vehicle exceeds 80 km/h, and a continuous beep when the speed crosses 120 km/h. Formulate a hypothesis test to evaluate the effectiveness of this policy using high-speed crash data for vehicles with and without such a warning system.
What is the null hypothesis of the test?
When does a Type I error occur?
When does a Type II error occur?
Procedure of Testing#
The basic steps in statistical hypothesis testing are as follows (a worked example in code follows the list):
Formulate the hypotheses: Clearly define the null hypothesis (\(H_0\)), which typically represents the status quo or no effect, and the alternative hypothesis (\(H_1\)), which reflects the effect or difference you aim to detect.
Set the significance level: Specify the maximum acceptable probability of making a Type I error—that is, rejecting the null hypothesis when it is actually true. This threshold, known as the significance level (\(\alpha\)), is commonly set at 0.10, 0.05, or 0.01.
Choose the testing procedure: Select a statistical method and corresponding test statistic appropriate for your data and hypotheses. Define a rejection region based on the chosen significance level—this region includes values of the test statistic that would lead you to reject the null hypothesis.
Perform the test: Calculate the test statistic using the observed data.
If the test statistic falls within the rejection region, reject the null hypothesis.
If it does not, fail to reject the null hypothesis.
Alternatively, you may compute and report the p-value. The p-value refers to the smallest significance level at which the observed value of the test statistic would lead to rejection of the null hypothesis. This makes the p-value a flexible tool: rather than fixing the significance level in advance, one can interpret the p-value relative to various thresholds to assess the strength of evidence against the null hypothesis.
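To make these steps concrete, here is a minimal sketch (with made-up numbers rather than data from the lecture) of a two-sided one-sample t-test in Python, showing both the rejection-region decision and the equivalent p-value comparison.

```python
# Minimal sketch of the four steps for a one-sample t-test.
# The hypothesized mean and the sample values are hypothetical.
import numpy as np
from scipy import stats

# Step 1: hypotheses -- H0: mu = 50 vs H1: mu != 50 (two-sided)
mu_0 = 50.0

# Step 2: significance level
alpha = 0.05

# Illustrative sample (hypothetical measurements)
x = np.array([51.2, 49.8, 52.4, 50.9, 48.7, 53.1, 50.3, 51.7, 49.5, 52.0])

# Step 3: procedure -- t statistic with n - 1 degrees of freedom and
# two-sided rejection region |t| > t_crit
n = len(x)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Step 4: perform the test
t_stat, p_value = stats.ttest_1samp(x, popmean=mu_0)

if abs(t_stat) > t_crit:  # equivalently: p_value < alpha
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: reject H0 at alpha = {alpha}")
else:
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: fail to reject H0 at alpha = {alpha}")
```

Comparing \(|t|\) with the critical value and comparing the p-value with \(\alpha\) are two views of the same decision rule; they always lead to the same conclusion.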
Warning
A common misunderstanding in hypothesis testing is to interpret the p-value or significance level as the probability that the null hypothesis is true. This is a misinterpretation: in an absolute sense, the null hypothesis is either true or false; its status is fixed, not probabilistic. What is probabilistic, however, is the chance of observing data as extreme as, or more extreme than, what we actually observed, assuming the null hypothesis is true.
Tip
To understand this better, consider a courtroom trial. The defendant is presumed innocent (null hypothesis). The job of the prosecution is to present evidence strong enough to convince the jury to reject that assumption. The p-value, in this analogy, reflects how unusual or extreme the evidence is if the defendant were actually innocent. If the p-value is high, it means the evidence is quite typical even if the defendant is innocent, compelling the jury to retain the presumption of innocence. A very low p-value means the evidence would be highly improbable under the assumption of innocence, leading the jury to reject the presumption of innocence. Regardless of the verdict, the fact of the defendant’s guilt or innocence remains fixed.
Likewise, in statistics, the truth of the null hypothesis remains fixed. Thus, the p-value doesn’t tell us the probability that the null hypothesis is true; it tells us how surprising the observed data would be if the null hypothesis were true.
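A small simulation sketch (an illustrative addition, not from the lecture) makes this point numerically: when data are generated repeatedly under a true null hypothesis, the resulting p-values are spread roughly uniformly between 0 and 1, so about a fraction \(\alpha\) of experiments produce a p-value below \(\alpha\) purely by chance; the p-value summarizes how extreme the data are under \(H_0\), not the probability that \(H_0\) is true.

```python
# Minimal sketch: distribution of p-values when H0 is actually true.
# Assumptions (illustrative): two-sided one-sample t-test of H0: mu = 0,
# with data generated from a normal distribution with mu = 0 (so H0 holds).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 30, 10_000

p_values = np.array([
    stats.ttest_1samp(rng.normal(0.0, 1.0, size=n), popmean=0.0).pvalue
    for _ in range(n_sims)
])

# Under a true H0, p-values are (approximately) uniform on [0, 1],
# so this fraction should be close to alpha -- not to P(H0 is true).
print("Fraction of p-values below alpha:", (p_values < alpha).mean())
```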