12 Testing Two Population Means

12.2 Paired Samples T-test

Goals:
Learn how to identify paired data;
Learn how to do a T-test on paired data;
Learn how to build a confidence interval using paired data.

Referring to the introduction of this chapter, suppose that a physical therapist has developed a new treatment that she believes will improve flexibility for people suffering from rotator cuff injuries. How might she test the claim? A natural approach would be something like the following: Select a sample of people suffering from the same rotator cuff injuries. Measure each subject's flexibility numerically and in a consistent fashion. Then have each subject perform the treatment for the same amount of time. Once the treatment is completed, retest each subject using the same technique to measure flexibility. This generates two samples of numbers, but the samples are paired.

Suppose again that X and Y are populations of numbers with means μX and μY, respectively. The populations are paired if for each member x in X, there is a corresponding member y in Y, and vice versa.

To execute a test on the two means, one needs samples from X and Y. If X and Y are paired, the samples will be naturally paired. If the difference μX-μY makes sense, then a Paired-Samples T-Test may be appropriate in testing the competing hypotheses. (Populations can be paired even when taking a difference is nonsensical. For example, if X is the collection of all heights (in inches) of living Americans, and Y is the population of all weights (in pounds) of living Americans, then these populations are paired, but computing μX-μY does not make sense.)

It is important to be able to identify paired data, because treating paired data as unpaired weakens the analysis: the information about which observations belong together is discarded.

Example 12.2.1.

A researcher wished to test whether filling a football with air or helium would change the average distance of a punt. One set of footballs was filled with air and the other was filled with helium. Each football was punted by the researcher, and the distance was recorded in yards. Are the samples paired or unpaired?

Solution. There is no relationship between the two samples of gas-filled footballs, so the data is unpaired.

Example 12.2.2.

A researcher wished to test whether consuming caffeine shortens the average time it takes to run a mile. To test the claim, she timed participants running a mile. One week later, allowing participants time to recuperate, she gave each participant a 29 mg caffeine tablet, and then ten minutes after consuming the tablet, she timed participants running another mile. Are the samples paired or unpaired?

Solution. For each subject, the same measurement is made twice, once before the treatment (caffeine given) and once after. Thus, the data is paired.

Suppose that X and Y are paired populations of numbers where μX-μY makes sense. For each pair (x,y) of corresponding members x and y in X and Y, respectively, the difference d=x-y makes sense as well. The population of differences D consisting of all differences of pairs d=x-y is well-defined and has mean μD=μX-μY.

Thus, a hypothesis test on the difference of means μX-μY is equivalent to a test on the mean μD. If it is reasonable to assume the population D is normally distributed (for instance, if X and Y are normally distributed, then D=X-Y is also normally distributed), or if the sample size taken from D is sufficiently large, then it is reasonable to use a T-Test on the sample of differences.

Let’s set up notation for the test. The competing hypotheses will be one of the following three cases:

direction of extreme    hypotheses           equivalent hypotheses
to the left             H0: μX-μY ≥ 0        H0: μD ≥ 0
                        H1: μX-μY < 0        H1: μD < 0
to the right            H0: μX-μY ≤ 0        H0: μD ≤ 0
                        H1: μX-μY > 0        H1: μD > 0
two-sided               H0: μX-μY = 0        H0: μD = 0
                        H1: μX-μY ≠ 0        H1: μD ≠ 0

Suppose that from paired populations X and Y, paired samples of size n are generated:

(x1,y1), (x2,y2), …, (xn,yn).

For each pair (xi,yi), compute the difference di=xi-yi. The collection d1, d2, …, dn is then the sample of size n from the population of differences D. The data can be summarized in a table such as this:

xi    yi    di = xi-yi
x1    y1    d1
x2    y2    d2
⋮     ⋮     ⋮
xn    yn    dn

Let d̄ and s denote the sample mean and sample standard deviation of the differences di. The test statistic is then

t = (d̄ - 0) / (s/√n),    (12.1)

with df=n-1.
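
Equation (12.1) can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name paired_t_statistic and the four data pairs are made up for this sketch and do not come from any study.

```python
import math
import statistics

def paired_t_statistic(x, y):
    """Return (t, df) for paired samples x and y, following Equation (12.1)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [xi - yi for xi, yi in zip(x, y)]  # d_i = x_i - y_i
    n = len(diffs)
    d_bar = statistics.mean(diffs)             # sample mean of the differences
    s = statistics.stdev(diffs)                # sample standard deviation (n - 1 divisor)
    t = (d_bar - 0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical paired data, for illustration only.
x = [12, 15, 11, 14]
y = [10, 14, 12, 11]
t, df = paired_t_statistic(x, y)
print(f"t({df}) = {t:.3f}")
```

Note that the order of subtraction must match the order used in the hypotheses: swapping x and y flips the sign of t.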

Example 12.2.3.

A researcher wished to test whether consuming caffeine shortens the average time it takes to run a mile. To test the claim, she timed participants running a mile. One week later, allowing participants time to recuperate, she gave each participant a 29 mg caffeine tablet, and then ten minutes after consuming the tablet, she timed participants running another mile. At a significance level of 10%, test whether the data below (running times are in minutes) provide significant evidence that consuming caffeine does lower the average time to run a mile. Assume that the differences in times are normally distributed.

Participant    Time Before    Time After
1              10.7           9.9
2              11.4           9.7
3              10.5           9.7
4              10.5           12.0
5              10.4           9.5
6              9.8            9.5
7              11.2           9.4
8              10.7           9.0
9              10.7           10.3
10             9.0            11.0

Solution. Let μb and μa denote the mean running time before and after consuming caffeine, respectively, and let μD=μa-μb. The claim is that caffeine lowers the mean running time, so the competing hypotheses are

H0: μa-μb ≥ 0
H1: μa-μb < 0

and the direction of extreme is to the left.

The first step is to compute the differences of the pairs, using the order given in the choice for μD (time after minus time before). This is shown in Figure 12.1.

Figure 12.1: Computing the Sample of Differences

Then compute the summary statistics for the sample differences, as shown in Figure 12.2.

Figure 12.2: Summary Statistics for the Sample Differences

Lastly, compute the test statistic and p-value, as in Figure 12.3.

Figure 12.3: T-Test Calculation for the Sample Differences

With t(9)=-1.19 and p-value = 0.1319, which is larger than the significance level of 10%, there is insufficient evidence that consumption of caffeine lowers the average running time of a mile.
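
The calculation in this example can be sketched in Python as follows. This is one possible approach, not necessarily the one shown in the figures: to stay within the standard library, the p-value is obtained by numerically integrating the density of the T-distribution with Simpson's rule. The helper names t_pdf, t_cdf and p_value are assumptions of this sketch; in practice a library routine such as SciPy's scipy.stats.ttest_rel would more likely be used.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

def p_value(t, df, direction):
    """p-value for the given direction of extreme."""
    if direction == "left":
        return t_cdf(t, df)
    if direction == "right":
        return 1 - t_cdf(t, df)
    if direction == "two-sided":
        return 2 * (1 - t_cdf(abs(t), df))
    raise ValueError("direction must be 'left', 'right' or 'two-sided'")

# Running times in minutes, from the table in this example.
before = [10.7, 11.4, 10.5, 10.5, 10.4, 9.8, 11.2, 10.7, 10.7, 9.0]
after = [9.9, 9.7, 9.7, 12.0, 9.5, 9.5, 9.4, 9.0, 10.3, 11.0]

diffs = [a - b for a, b in zip(after, before)]  # after minus before
n = len(diffs)
d_bar = statistics.mean(diffs)
s = statistics.stdev(diffs)
t = (d_bar - 0) / (s / math.sqrt(n))
p = p_value(t, n - 1, "left")  # H1: mu_a - mu_b < 0

alpha = 0.10
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"t({n - 1}) = {t:.2f}, p-value = {p:.4f}, decision: {decision}")
```

The direction argument mirrors the three cases in the table of competing hypotheses, so the same helper covers left-tailed, right-tailed, and two-sided tests.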

Remark: As in the section on effect size for a single mean, if the test yields a significant result, then it is appropriate to estimate the effect size using Cohen’s d.

12.2.1 Confidence Interval for Difference of Means - Paired Case

If the two samples are paired, and if it is reasonable to assume that the differences come from a normally distributed population, or if the number of pairs is sufficiently large, then a confidence interval on the difference of means can be constructed using a T-Distribution as in Section 8.1.2:

d̄ ± t* · s/√n,    df=n-1.    (12.2)

Example 12.2.4.

A researcher wished to test whether consuming caffeine shortens the average time it takes to run a mile. To test the claim, she timed participants running a mile. One week later, allowing participants time to recuperate, she gave each participant a 29 mg caffeine tablet, and then ten minutes after consuming the tablet, she timed participants running another mile. Assuming the differences in times are normally distributed, use the sample data below to construct a 95% confidence interval for the difference of mean running times.

Participant    Time Before    Time After
1              10.7           9.9
2              11.4           9.7
3              10.5           9.7
4              10.5           12.0
5              10.4           9.5
6              9.8            9.5
7              11.2           9.4
8              10.7           9.0
9              10.7           10.3
10             9.0            11.0

Solution. As with the hypothesis test, first compute the differences of the pairs, followed by the summary statistics for the sample of differences. Then, compute the value of t* using the desired level of confidence, as in Figure 12.4.

Figure 12.4: Computing the Confidence Interval - Part I

Using Equation (12.2), compute the margin of error for the interval, as in Figure 12.5.

Figure 12.5: Computing the Confidence Interval - Part II

Lastly, compute the upper and lower bounds for the confidence interval, as in Figure 12.6.

Figure 12.6: Computing the Confidence Interval - Part III

Thus, we are 95% confident that the true difference of mean running times (time after minus time before) is between -1.42 and 0.44 minutes.
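
The three steps of this solution can be sketched in Python as follows. This is a minimal sketch using only the standard library, and it may differ from the method shown in the figures: the critical value t* is found by bisection on a numerically integrated T-distribution. The helper names t_pdf, t_cdf and t_critical are assumptions of this sketch.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(confidence, df):
    """t* such that the area between -t* and t* equals `confidence`."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0
    for _ in range(60):                # bisection on the increasing CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Running times in minutes, from the table in this example.
before = [10.7, 11.4, 10.5, 10.5, 10.4, 9.8, 11.2, 10.7, 10.7, 9.0]
after = [9.9, 9.7, 9.7, 12.0, 9.5, 9.5, 9.4, 9.0, 10.3, 11.0]

diffs = [a - b for a, b in zip(after, before)]  # after minus before
n = len(diffs)
d_bar = statistics.mean(diffs)
s = statistics.stdev(diffs)

t_star = t_critical(0.95, n - 1)       # Part I: critical value
margin = t_star * s / math.sqrt(n)     # Part II: margin of error, Equation (12.2)
lower, upper = d_bar - margin, d_bar + margin  # Part III: bounds
print(f"t* = {t_star:.3f}, margin = {margin:.2f}, interval = ({lower:.2f}, {upper:.2f})")
```

Because the interval is built from differences taken as after minus before, negative values correspond to faster times after caffeine.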

Concepts Check:

1. A study on iron deficiency among infants compared samples of infants following two different feeding regimens. One group was breastfed while the other received a standard baby formula without iron supplements. Is the average blood hemoglobin level significantly higher in breastfed babies than in formula-fed babies? Are the samples paired or unpaired?
   Answer: Unpaired.

2. A researcher thinks that a summer program for high school students would improve average interest in engineering as a profession. To test the claim, she measures participant interest in engineering before and after the summer program. Measurements are rankings of interest from 0 to 100, with larger values denoting greater interest. Are the samples paired or unpaired?
   Answer: Paired.

12.2.2 Exercises

  1. A researcher thinks that a summer program for high school students would improve average interest in engineering as a profession. To test the claim, she measures participant interest in engineering before and after the summer program. Measurements are rankings of interest from 0 to 100, with larger values denoting greater interest. At a significance level of 5%, does the data below provide significant evidence that the summer program improves interest in engineering? You may assume that the population of differences is normally distributed.

    Participant    Before    After
    1              54        77
    2              64        64
    3              70        71
    4              56        77
    5              62        71
    6              58        72
    7              72        65
    8              66        72

    (a) What is the population of interest?

    (b) State the competing hypotheses.

    (c) What is the direction of extreme?

    (d) What test will you use, and why is it reasonable to use?

    (e) Compute the test statistic and corresponding p-value.

    (f) Sketch the p-value.

    (g) State your conclusion, i.e., do you reject H0, or fail to reject H0?

    (h) State the conclusion in a manner appropriate for a scientific journal.

    (i) What type of error could have been made?

    (j) Compute and interpret the effect size if the result is significant.

  2. Create a Python function called paired_T_test that takes two sets of paired data and a significance level α and performs a paired samples T-test. It should output the test statistic and the p-value. In addition, based on the α given, it should also output the decision.

  3. A researcher thinks that a summer program for high school students would improve average interest in engineering as a profession. To test the claim, she measures participant interest in engineering before and after the summer program. Measurements are rankings of interest from 0 to 100, with larger values denoting greater interest. Assuming that the population of differences is normally distributed, use the paired samples below to construct a 90% confidence interval for the true mean change in interest in engineering as a profession. Interpret your results.

    Participant    Before    After
    1              54        77
    2              64        64
    3              70        71
    4              56        77
    5              62        71
    6              58        72
    7              72        65
    8              66        72