9 Hypothesis Testing - An Introduction

9.2 A First Example

Goals:
Introduce direction of extreme.
Introduce critical value.
Introduce α and β.
Introduce p-values.

Suppose that there are two populations of numbers. Population 1 contains one 1, two 2s, three 3s, four 4s, and five 5s, whereas Population 2 contains five 1s, four 2s, three 3s, two 4s, and one 5. The two populations are represented in Figure 9.1.

Figure 9.1: Populations 1 & 2
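
As a rough sketch, the two populations can be written down as counts and turned into sampling probabilities. The names `population_1`, `population_2`, and `probabilities` are illustrative choices, not notation from the text.

```python
from collections import Counter
from fractions import Fraction

# value -> number of copies of that value in the population (taken from the text)
population_1 = Counter({1: 1, 2: 2, 3: 3, 4: 4, 5: 5})
population_2 = Counter({1: 5, 2: 4, 3: 3, 4: 2, 5: 1})


def probabilities(population):
    """Chance of drawing each value in a random sample of size 1."""
    total = sum(population.values())
    return {value: Fraction(count, total) for value, count in population.items()}


for name, population in [("Population 1", population_1), ("Population 2", population_2)]:
    print(name, probabilities(population))
```

Both populations contain 15 numbers, so every probability is a count divided by 15.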

Suppose that one of the populations is chosen at random, but without us knowing which was chosen. Now from the chosen population, suppose that a random sample of size 1 is selected. Based on the sample, we are to make a decision on the following hypotheses:

H0: The sample came from Population 1.
H1: The sample came from Population 2.

Note that the truth value of H0 is well-defined, i.e., when a population is chosen, that determines whether H0 is true or false. For example, if the chosen population is Population 2, then H0 is false.

Since the sample is of size 1, it is reasonable to define the test statistic as the value observed, so we’ll do just that.

9.2.1 Direction of Extreme - To the Left

Note that the smaller the selected value, the stronger the evidence against H0. That is, if H0 is true, then smaller values are less likely to be observed. This is called the direction of extreme. The direction of extreme is identified by looking at what test statistic values provide stronger evidence against the null hypothesis. In this example, the direction of extreme is said to be to the left, because on a number line, numbers get smaller as you move to the left (at least in the manner we usually draw them).
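
One informal way to see the direction of extreme is to compare, for each possible value, how often it appears in Population 2 relative to Population 1. The ratio below is only an illustrative device assumed for this sketch; the text does not define it.

```python
from fractions import Fraction

counts_h0 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}  # Population 1 (H0 true)
counts_h1 = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}  # Population 2 (H1 true)

# Both populations have 15 members, so a ratio of counts equals a ratio of probabilities.
ratios = {v: Fraction(counts_h1[v], counts_h0[v]) for v in sorted(counts_h0)}

for value, ratio in ratios.items():
    print(f"value {value}: {ratio} times as likely under H1 as under H0")
```

The ratio shrinks as the value grows, which matches the claim that smaller values are stronger evidence against H0.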

9.2.2 Decision Rule and Critical Value

In a hypothesis test, we must decide on what constitutes “significant evidence against H0.” This choice is called a decision rule. A decision rule will be of the form, “if we observe a test statistic of this value or more extreme, then we will reject H0.” For example, a decision rule could be:

  1. We will reject H0 if the number selected is 1 or less.

The chosen value of 1 is called a critical value.

9.2.3 Significance Level; α and β

If H0 is true, then the chance of making a Type I error is called the significance level and is denoted by α. Recall that a Type I error occurs when H0 is true, but the decision is to reject H0.

A decision rule and significance level are equivalent. For example, suppose the decision rule is “reject H0 if the number selected is 1 or less.” If H0 is true, the chance of rejecting H0 is α=1/15. Conversely, suppose that α=1/15. Then this would determine a decision rule of “reject H0 if the number selected is 1 or less,” with the critical value being 1.
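
The computation of α for a given critical value can be sketched as follows. The function name `alpha` is an assumption made for illustration.

```python
from fractions import Fraction

# Population 1 is the population sampled when H0 is true.
population_1 = [1] + [2] * 2 + [3] * 3 + [4] * 4 + [5] * 5


def alpha(critical_value):
    """Chance of rejecting H0 when H0 is true, for the rule
    'reject H0 if the number selected is critical_value or less'."""
    rejected = [x for x in population_1 if x <= critical_value]
    return Fraction(len(rejected), len(population_1))


print(alpha(1))  # 1/15
```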

If H0 is false, then the chance of making a Type II error is denoted by β. Recall that a Type II error is made when H0 is false, but we fail to reject H0. Either a decision rule or a choice for α determines β. For example, suppose the decision rule is “reject H0 if the number selected is 1 or less,” and suppose that H0 is false. In order to fail to reject H0, a number greater than 1 must be selected. If H0 is false, the sample comes from Population 2, where 10 of the 15 numbers are greater than 1, so the chance of that happening is β=10/15=2/3. Figure 9.2 gives a visual representation of how α and β are computed for this decision rule.

Figure 9.2: α and β
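
The same kind of count gives β, this time using Population 2. The sketch below also lists α and β for every possible critical value to show how they trade off; the names `alpha` and `beta` are illustrative.

```python
from fractions import Fraction

population_1 = [1] + [2] * 2 + [3] * 3 + [4] * 4 + [5] * 5  # sampled when H0 is true
population_2 = [1] * 5 + [2] * 4 + [3] * 3 + [4] * 2 + [5]  # sampled when H0 is false


def alpha(critical_value):
    """Chance of rejecting H0 when H0 is true."""
    return Fraction(sum(1 for x in population_1 if x <= critical_value), len(population_1))


def beta(critical_value):
    """Chance of failing to reject H0 when H0 is false."""
    return Fraction(sum(1 for x in population_2 if x > critical_value), len(population_2))


print(beta(1))  # 2/3

for c in range(1, 6):
    print(f"critical value {c}: alpha = {alpha(c)}, beta = {beta(c)}")
```

Moving the critical value to the right makes α larger and β smaller.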

A choice of α or decision rule leads to the following decision-making scheme: The critical value and direction of extreme define a rejection region. If the test statistic falls within the rejection region, then H0 is rejected, i.e., if the test statistic is at the critical value or more extreme, then H0 is rejected.

To illustrate, suppose that we again use the decision rule of “reject H0 if the number selected is 1 or less.” And suppose that, after the population is selected, the number 3 is chosen as the sample. Since 3 does not fall within the rejection region, we fail to reject H0, i.e., there is insufficient evidence to reject H0. Figure 9.3 gives a visualization of the regions.

Figure 9.3: Rejection Region
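
A minimal sketch of the rejection-region decision, assuming the direction of extreme is to the left as in this example. The function name `decide` is hypothetical.

```python
def decide(test_statistic, critical_value):
    """Reject H0 when the test statistic is at the critical value or
    more extreme (here: smaller), otherwise fail to reject."""
    if test_statistic <= critical_value:
        return "reject H0"
    return "fail to reject H0"


print(decide(3, 1))  # fail to reject H0
print(decide(1, 1))  # reject H0
```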

How hypothesis testing is done varies by discipline, and varies by the vintage of the hypothesis tester. The “rejection region” approach is something you may well see, but it is a bit “old fashioned.” Officially, it is called the Classical Approach. A more modern hypothesis testing scheme is as follows: Instead of picking a decision rule, it is more common that the researcher picks a value for α, i.e., the researcher decides on an acceptable level of risk in making a Type I error. (Recall that a Type I error cannot occur if the chosen H0 is false.) Since a Type I error is generally the more serious error, common choices for α are 0.10, 0.05, or 0.01; the more serious a Type I error is, the smaller the chosen value of α. (Even smaller values of α are used, for example in space flight, where an error can lead to catastrophic results.) Instead of computing a critical value from the choice of α, the researcher computes a p-value, which is discussed next.

9.2.4 p-values

After choosing H0 and H1, picking α, designing an experiment and collecting the data, and computing the test statistic from the sample, the researcher then asks:

  • What is the probability of observing this test statistic, or anything more extreme, assuming H0 is true?

This probability is called a p-value. The “more extreme” phrase in the p-value question always references the direction of extreme, just as with the decision rule and computations of α and β. Computing a p-value is similar to computing α, in that the first step is to assume H0 is true.

For example, suppose that the number 3 is chosen. Then the corresponding p-value is computed by answering the following question: Assuming the chosen population is Population 1 (H0 is true), what is the chance of observing a 3 (the test statistic) or anything smaller (more extreme)? Thus, if a 3 is chosen, then the p-value is 6/15, or 0.4, since six of the 15 numbers in Population 1 are 3 or less. Figure 9.4 illustrates.

Figure 9.4: p-value if 1 is Selected
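
The p-value question can be answered by counting, as in this sketch. The function name `p_value` is an illustrative assumption.

```python
from fractions import Fraction

# Computing a p-value starts by assuming H0 is true, i.e., sampling from Population 1.
population_1 = [1] + [2] * 2 + [3] * 3 + [4] * 4 + [5] * 5


def p_value(observed):
    """Chance, assuming H0 is true, of observing `observed` or anything
    more extreme (smaller, since the direction of extreme is to the left)."""
    at_least_as_extreme = [x for x in population_1 if x <= observed]
    return Fraction(len(at_least_as_extreme), len(population_1))


print(p_value(3))         # 2/5, i.e., 6/15
print(float(p_value(3)))  # 0.4
```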

To make a decision, the researcher compares the computed p-value to the choice for α. If the p-value is less than or equal to α (p-value ≤ α), then H0 is rejected; otherwise one fails to reject H0. This procedure is equivalent to the decision rule scheme. Why? The p-value is less than or equal to α if and only if the test statistic falls within the rejection region.
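
For this small example, the equivalence can be checked by brute force: for every critical value and every possible observation, compare "falls in the rejection region" with "p-value is at most α". This is only a check for the Population 1 / Population 2 scenario, not a general proof, and the helper name `left_tail` is hypothetical.

```python
from fractions import Fraction

population_1 = [1] + [2] * 2 + [3] * 3 + [4] * 4 + [5] * 5


def left_tail(value):
    """Chance, under H0, of drawing `value` or anything smaller."""
    return Fraction(sum(1 for x in population_1 if x <= value), len(population_1))


mismatches = []
for critical_value in range(1, 6):
    alpha = left_tail(critical_value)  # significance level of this decision rule
    for observed in range(1, 6):
        p_value = left_tail(observed)
        in_rejection_region = observed <= critical_value
        if in_rejection_region != (p_value <= alpha):
            mismatches.append((critical_value, observed))

print(mismatches)  # []
```

An empty list means the two schemes agree for every combination.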

Some researchers have started to skip using an α, i.e., in reporting results, researchers simply report p-values, and let readers themselves interpret whether the results are significant. We won’t do that in this text, i.e., we will consistently use a choice for α in interpreting results.


Concepts Check

  1. If H0 is false, what is the truth value of H1?
     Answer: True

  2. When a researcher chooses H0, what is the probability that H0 is true?
     Answer: 0 or 1

  3. If one fails to reject H0, what type of error might have been made?
     Answer: Type II, but remember that this error is possible only if H0 is false

  4. True or false: The direction of extreme is identified by looking at what values of the test statistic are less likely if H0 is true.
     Answer: True

  5. True or false: The p-value is the chance of observing the test statistic, or anything more extreme, assuming H0 is true.
     Answer: True

  6. True or false: The significance level, α, is the chance of making a Type I error.
     Answer: True, but remember that a Type I error is possible only if H0 is true

  7. Assuming the Population 1 / Population 2 scenario above, if the number selected is 2, what is the corresponding p-value?
     Answer: 3/15=1/5=0.2

  8. Assuming the Population 1 / Population 2 scenario above, if the decision rule is “reject H0 if the selected number is 2 or less,” what is the value of α?
     Answer: 0.2, the same as the p-value in the previous question

  9. Assuming the Population 1 / Population 2 scenario above, if the decision rule is “reject H0 if the selected number is 2 or less,” what is the value of β?
     Answer: Remember to assume H0 is false, and that the number selected is greater than 2, so β=6/15=0.4

  10. Suppose that in a hypothesis test, the researcher chose α=0.05. Suppose further that after computing the test statistic, the researcher got a p-value of 0.0421. What is the researcher’s decision?
     Answer: Reject H0

  11. Suppose that in a hypothesis test, the researcher chose α=0.05. Suppose further that after computing the test statistic, the researcher got a p-value of 0.0882. What is the researcher’s decision?
     Answer: Fail to reject H0
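
The numerical answers to the last five questions can be double-checked with a short script. This is a sketch under the Population 1 / Population 2 scenario; `left_tail` and `decide` are hypothetical helper names.

```python
from fractions import Fraction

population_1 = [1] + [2] * 2 + [3] * 3 + [4] * 4 + [5] * 5  # H0 true
population_2 = [1] * 5 + [2] * 4 + [3] * 3 + [4] * 2 + [5]  # H0 false


def left_tail(population, value):
    """Chance of drawing `value` or anything smaller from `population`."""
    return Fraction(sum(1 for x in population if x <= value), len(population))


def decide(p_value, alpha):
    """Reject H0 exactly when the p-value is at most alpha."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


p_when_2_selected = left_tail(population_1, 2)  # p-value if a 2 is selected
alpha_rule_2 = left_tail(population_1, 2)       # alpha for "reject if 2 or less"
beta_rule_2 = 1 - left_tail(population_2, 2)    # beta for "reject if 2 or less"

print(p_when_2_selected, alpha_rule_2, beta_rule_2)  # 1/5 1/5 2/5
print(decide(0.0421, 0.05))  # Reject H0
print(decide(0.0882, 0.05))  # Fail to reject H0
```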