
8.1 Confidence Intervals for the Mean

Goals:

• Learn how to calculate confidence intervals for $\mu$ when $\sigma$ is known or not known;

• Learn how to interpret what a confidence interval is telling us.

When analyzing data, we would like to know what the population mean is. However, most of the time we are faced with only a sample from the population. Thus, we can only calculate a sample mean as an estimate of the actual population mean. If given a sample mean, is it possible to obtain an interval about the sample mean that attempts to show where the population mean may fall? In this section, we will study a way to do just that.

8.1.1 $\sigma$ Is Known

Given a very large population, the sample mean, $\bar{x}$ , will likely never be exactly the population mean, $\mu$ . Our goal here is to obtain a range of values that likely contains the population mean. Such a range of values is called a confidence interval.

Definition (Confidence Interval).

A range of values described by a lower value $L$ and an upper value $U$ that has a $1-\alpha$ confidence in containing the population parameter. We call $1-\alpha$ the confidence level of the associated confidence interval, usually written as a percentage.

Typically, we want $\alpha$ to be small, say 0.1, 0.05, or 0.01. But there are drawbacks to being too small, as we will see.

A confidence interval for $\mu$ involves finding $L$ and $U$ so that

L<\mu<U,

with a confidence of $1-\alpha$ . But how do we go about constructing such a confidence interval? The following theorem gives us the formula used to calculate the confidence interval for the population mean, $\mu$ .

Theorem (Confidence Interval for $\mu$ , with $\sigma$ Known).

Given a random sample $x_{1},x_{2},\ldots,x_{n}$ , and the population standard deviation $\sigma$ , the $1-\alpha$ confidence interval for $\mu$ is given by

\left(\bar{x}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right),

where $\bar{x}$ is the sample mean and $z^{*}$ is the critical value from the standard normal distribution such that $P(Z\geq z^{*})=\alpha/2$ .

Remark: It is common to write the confidence interval as

\bar{x}\pm E,

where $E=z^{*}\frac{\sigma}{\sqrt{n}}$ is called the margin of error of $\bar{X}$ .

The most difficult part of calculating the confidence interval for $\mu$ , with $\sigma$ known, is finding $z^{*}$ . The rest is basic arithmetic. We will first need to recap how to find $z^{*}$ given a confidence level of $1-\alpha$ .

Finding $z^{*}$

Recall that a $Z$ -score is a random variable that is normally distributed with mean 0 and standard deviation 1. That is, $Z\sim N(0,1)$ . The critical value $z^{*}$ is an outcome from a $Z$ -score that has the following relation to $\alpha$ .

Definition (Critical Value: $z^{*}$ ).

We define the critical value, $z^{*}$ , as the value in which the following holds:

P(Z\geq z^{*})=\alpha/2,

where $Z$ is a random variable having a standard normal distribution. That is, the area under the standard normal density curve that is to the right of the critical value $z^{*}$ is $\alpha/2$ .

Figure 8.1 shows a representation of what $z^{*}$ stands for on a standard normal distribution. Since the standard normal distribution is symmetric about 0, there is an equal and opposite value, $-z^{*}$ that stands for $P(Z\leq-z^{*})=\alpha/2$ . If you add these two areas together, you get a combined area of $\alpha$ . Thus, the area between $-z^{*}$ and $z^{*}$ is $1-\alpha$ , the confidence level we intend on having, since the total area under the standard normal density curve is 1. How does $P(-z^{*}\leq Z\leq z^{*})=1-\alpha$ help give us a confidence interval for $\mu$ ? Since we have the relation

Z=\dfrac{\bar{X}-\mu}{\frac{\sigma}{\sqrt{n}}},

with a bit of algebra, we can manipulate the inequality within the probability statement to look like

P\left(\bar{X}-z^{*}\dfrac{\sigma}{\sqrt{n}}<\mu<\bar{X}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right)=1-\alpha.

Extracting the inequality from within the parentheses and replacing the random variable $\bar{X}$ with an outcome $\bar{x}$ , we have the confidence interval desired. For further explanation, see Derivation of Confidence Interval for $\mu$ , $\sigma$ Known at the end of this section.

So obtaining the critical value, $z^{*}$ , is necessary in order to construct a $1-\alpha$ confidence interval for $\mu$ . Let’s look into how to find $z^{*}$ . In Excel and Python, areas under probability density curves are calculated from the left. If we want the area under the standard normal density curve to the right of $z^{*}$ to be $\alpha/2$ , then the area to the left of $z^{*}$ would be $1-\alpha/2$ . The commands in Figure 8.2 reflect this along with reminding you of the commands used to calculate $z^{*}$ in Excel and Python.

Compute $z^{*}$ in Excel	Compute $z^{*}$ in Python
$\tt{NORMINV(1-0.5*\alpha,0,1)}$	norm.ppf( $1-0.5*\alpha$ )

•

To use the Python command, it is required that you load scipy.stats first.

Figure 8.2: Commands used to calculate $z^{*}$

Let’s practice finding $z^{*}$ in both Excel and Python.

Example 8.1.1.

Given $1-\alpha=0.97$ , find the associated critical value $z^{*}$ in Excel.

We first need to calculate $\alpha$ . On a new sheet in Excel and in cell $\tt{A1}$ , type the string Alpha.

Since $1-\alpha=0.97$ , we have $\alpha=1-0.97=0.03$ . In cell $\tt{B1}$ , type

\tt{=1-0.97}

In cell $\tt{A2}$ , type the string Critical Value.

In cell $\tt{B2}$ , use cell-referencing by typing the command

\tt{=NORMINV(1-0.5*B1,0,1)}

Hence, you should obtain that $z^{*}=2.170090378$ . Figure 8.3 represents the layout you should have in the end.

Figure 8.3: Computing $z^{*}$ in Excel

$\clubsuit$

Example 8.1.2.

Given $1-\alpha=0.88$ , find the associated critical value $z^{*}$ in Python.

Make sure you first have scipy.stats loaded. If not, type the following command.

from scipy.stats import *

Again, we need to find $\alpha$ first. Based on the same relationship that $1-(1-\alpha)=\alpha$ , type the following in Python to compute and save $\alpha$ to the variable alpha.

alpha = 1-0.88

Compute and store the critical value, $z^{*}$ , by typing the command

critval = norm.ppf(1-0.5*alpha)

You should arrive at $z^{*}=1.5547735945968535$ . Figure 8.4 represents the layout you should have in the end. Remember to call critval to obtain the value for $z^{*}$ .

Figure 8.4: Computing $z^{*}$ in Python

$\clubsuit$
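The connection between $z^{*}$ and the areas under the standard normal curve can also be checked numerically. The following is a minimal sketch, assuming the confidence level of Example 8.1.2; the variable names `right_tail` and `middle_area` are chosen here only for illustration. It uses `norm.sf` for the area to the right of a value and `norm.cdf` for the area to the left.

```python
from scipy.stats import norm

conf_level = 0.88                      # same confidence level as Example 8.1.2
alpha = 1 - conf_level
z_star = norm.ppf(1 - 0.5 * alpha)     # area to the LEFT of z* is 1 - alpha/2

right_tail = norm.sf(z_star)                         # P(Z >= z*)
middle_area = norm.cdf(z_star) - norm.cdf(-z_star)   # P(-z* <= Z <= z*)

print("z* =", z_star)
print("area to the right of z* =", right_tail, "(alpha/2 =", alpha / 2, ")")
print("area between -z* and z* =", middle_area, "(1 - alpha =", conf_level, ")")
```

The right-tail area should agree with $\alpha/2$ and the middle area with $1-\alpha$ , up to rounding in the last few digits.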

Constructing the Confidence Interval

There are a few things that are required first in order to construct the $1-\alpha$ confidence interval for $\mu$ , knowing $\sigma$ :

\left(\bar{x}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right).

Be sure you have the following:

• Make sure your unbiased random sample is coming from a normally distributed population or the sample size is large enough ( $n\geq 30$ ). (Footnote: It is always a good idea to assess normality first.)

• Make sure you have $\sigma$ . Is it given to you?

• Calculate the sample mean, $\bar{x}$ , unless it is given to you.

• Based on your confidence level, $1-\alpha$ , calculate the critical value $z^{*}$ .

The following table recaps the commands in Excel and Python that are necessary for computing the confidence interval for the mean, with $\sigma$ known. Each row stands for equivalent commands.

Action	Excel Commands	Python Commands
Compute the Mean	$\tt{AVERAGE(\ldots)}$	mean(…)
Compute the Square Root	$\tt{SQRT(\ldots)}$	sqrt(…)
Compute $z^{*}$	$\tt{NORMINV(1-0.5*\alpha,0,1)}$	norm.ppf( $1-0.5*\alpha$ )

•

The mean and sqrt commands require the library numpy . The norm.ppf command requires the library scipy.stats .

Figure 8.5: Commands Needed to Construct Confidence Interval for $\mu$

Let’s see a few examples of constructing confidence intervals for $\mu$ , $\sigma$ known, using Excel and Python.

Example 8.1.3.

Suppose a simple random sample of 35 salaries of college football coaches, with a sample mean of $415,230$ , is taken from a normally distributed population. Assuming that $\sigma=265,000$ , use Excel to find a 90% confidence interval for the mean $\mu$ .

It is wise to make sure that our sample comes from a normally distributed population. In this case, it is mentioned in the problem. So normality is assumed and we can proceed.

From the problem, it is understood that $1-\alpha=90\%=0.9$ . Hence, $\alpha=0.10$ . From our formula for the confidence interval, we need to find $z^{*}$ . On a new sheet in Excel, type in cell $\tt{A1}$ the string $\tt{Z^{*}}$ .

Then in cell $\tt{B1}$ , type the following command that will compute the critical value $z^{*}$ associated with a confidence level of $1-\alpha$ .

\tt{=NORMINV(1-0.5*0.1,0,1)}

Since the formula for the confidence interval is

\left(\bar{x}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right),

let’s first compute the lower value: $\bar{x}-z^{*}\frac{\sigma}{\sqrt{n}}$ . In cells $\tt{D1}$ and $\tt{E1}$ , type the strings $\tt{LOWER}$ and $\tt{UPPER}$ , respectively.

With $\bar{x}=415,230$ , $\sigma=265,000$ , plug into cell $\tt{D2}$ the following:

\tt{=415230-B1*265000/SQRT(35)}

Notice that we are cell-referencing the value for $z^{*}$ .

Repeat the process for the upper value in cell $\tt{E2}$ by typing the following:

\tt{=415230+B1*265000/SQRT(35)}

Figure 8.6: Confidence Interval in Excel

Cells $\tt{D2}$ and $\tt{E2}$ represent the lower and upper values of the confidence interval. Hence the 90% confidence interval for the population mean is $(341551.7828,488908.2172)$ . Figure 8.6 reflects the layout of the problem in Excel. $\clubsuit$

Example 8.1.4.

Colleges frequently provide estimates of student expenses such as housing. Suppose the data are normally distributed with $\sigma=\$122.3$ . A simple random sample of 43 houses was collected and the sample mean for student housing was calculated to be $\$621.63$ . Construct an 87% confidence interval for $\mu$ , the population mean of student housing cost.

The problem states that the data are normally distributed. So we may move forward with constructing the confidence interval.

Since we are using Python, remember to load numpy and scipy.stats. If you have not yet, use the following commands to do so.

from numpy import *
from scipy.stats import *

From the problem, it is understood that $1-\alpha=87\%=0.87$ . Hence $\alpha=0.13$ . Let’s find $z^{*}$ . Type the following command that will assign the critical value, $z^{*}$ , to the variable name zstar.

zstar = norm.ppf(1-0.5*0.13)

Let’s name the lower value of the confidence interval as LOWER. At the prompt, type the following to assign it to LOWER.

LOWER = 621.63 - zstar*122.3/sqrt(43)

Repeat the process for calculating the upper value of the confidence interval. Assign it to the name UPPER.

UPPER = 621.63 + zstar*122.3/sqrt(43)

Let’s have Python print out the confidence interval. Type the following.

print("(%f, %f)" % (LOWER, UPPER))

You should obtain the confidence interval $(593.391129,649.868871)$ . Figure 8.7 represents the layout that you should have in Python.

Figure 8.7: Confidence Interval in Python

$\clubsuit$
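The steps in Examples 8.1.3 and 8.1.4 follow the same pattern, so they can be wrapped in a small helper. The following is a minimal sketch, assuming a hypothetical function name `z_interval` (it is not a command from numpy or scipy.stats); it takes the sample mean, $\sigma$ , the sample size, and the confidence level, and returns the lower and upper values.

```python
from math import sqrt
from scipy.stats import norm

def z_interval(xbar, sigma, n, conf_level):
    """Return (lower, upper) of the confidence interval for mu, sigma known."""
    alpha = 1 - conf_level
    z_star = norm.ppf(1 - 0.5 * alpha)   # critical value z*
    margin = z_star * sigma / sqrt(n)    # margin of error E
    return xbar - margin, xbar + margin

# Numbers from Example 8.1.3 (salaries) and Example 8.1.4 (housing)
salary_ci = z_interval(415230, 265000, 35, 0.90)
housing_ci = z_interval(621.63, 122.3, 43, 0.87)

print("(%f, %f)" % salary_ci)
print("(%f, %f)" % housing_ci)
```

The two printed intervals should agree with the ones found in Excel and Python above, up to rounding.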

Derivation of Confidence Interval for $\mu$ , $\sigma$ Known (Optional)

You may wonder: How does this formula come to be? Recall from the Central Limit Theorem that for a sufficiently large sample size $n$ , the distribution of sample means, $\bar{X}$ , is approximately normal. That is,

\bar{X}\sim N\left(\mu,\frac{\sigma}{\sqrt{n}}\right).

Knowing this tells us the likelihood of obtaining different sample means. For standardization purposes, we can transform $\bar{X}$ into a $Z$ -score by using the following formula:

Z=\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}},

and, as a result, the random variable $Z$ is normally distributed with mean 0 and standard deviation 1, or, rather

Z\sim N(0,1).

We want $1-\alpha$ to be the probability that the $Z$ -score obtained from the sample mean lies between $-z^{*}$ and $z^{*}$ , the values that satisfy $P(Z\leq-z^{*})=\alpha/2$ and $P(Z\geq z^{*})=\alpha/2$ , respectively. This means that

P\left(-z^{*}<Z<z^{*}\right)=1-\alpha.

By replacing $Z$ with $\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}$ and manipulating the inside inequality, we have

	$\displaystyle 1-\alpha$	$\displaystyle=P\left(-z^{*}<Z<z^{*}\right)$
		$\displaystyle=P\left(-z^{*}<\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}<z^{*}\right)$
		$\displaystyle=P\left(-z^{*}\dfrac{\sigma}{\sqrt{n}}<\bar{X}-\mu<z^{*}\dfrac{\sigma}{\sqrt{n}}\right)$
		$\displaystyle=P\left(-z^{*}\dfrac{\sigma}{\sqrt{n}}-\bar{X}<-\mu<z^{*}\dfrac{\sigma}{\sqrt{n}}-\bar{X}\right)$
		$\displaystyle=P\left(\bar{X}+z^{*}\dfrac{\sigma}{\sqrt{n}}>\mu>\bar{X}-z^{*}\dfrac{\sigma}{\sqrt{n}}\right)$

Flipping the inside inequality around, we have

P\left(\bar{X}-z^{*}\dfrac{\sigma}{\sqrt{n}}<\mu<\bar{X}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right)=1-\alpha.\qquad\text{(8.1)}

Remember that $\bar{X}$ is a random variable and $\bar{x}$ is an outcome from $\bar{X}$ . Equation 8.1 states that out of all confidence intervals we can construct of the form

\left(\bar{x}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right),

$100(1-\alpha)\%$ of them will capture the population mean, $\mu$ . Hence, we have obtained our desired confidence interval.

8.1.2 $\sigma$ Is Not Known

In reality, the population standard deviation, $\sigma$ , is almost never known exactly. We tend to use the sample standard deviation, $s$ , as an estimate for $\sigma$ since it’s the next best thing we can obtain. Since we are now estimating another parameter, we must adjust the way we calculate the confidence interval to account for more variability. Recall that, when sampling from a normally distributed population, the random variable

T=\dfrac{\bar{X}-\mu}{s/\sqrt{n}}

has a $t$ -distribution with $n-1$ degrees of freedom. This distribution resembles the standard normal distribution, $N(0,1)$ . In fact, as the degrees of freedom, $n-1$ , increase, the $t$ -distribution approaches the standard normal distribution. Figure 8.8 depicts what happens to the $t$ -distribution as the degrees of freedom, $n-1$ , increase.

Figure 8.8: Increasing Degrees of Freedom

The $t$ -distribution will have fatter tails to allow for more variability due to $s$ now being an estimate for $\sigma$ .
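One way to see both the fatter tails and the approach to the standard normal distribution is to compare critical values. The following is a minimal sketch, assuming $\alpha=0.05$ and an illustrative list of degrees of freedom chosen here (neither comes from the examples in this section).

```python
from scipy.stats import norm, t

p = 0.975                        # area to the left of the critical value (alpha = 0.05)
z_star = norm.ppf(p)

dfs = [2, 5, 10, 30, 100, 1000]  # illustrative degrees of freedom
t_stars = [t.ppf(p, df) for df in dfs]

for df, t_star in zip(dfs, t_stars):
    print("df = %4d   t* = %.4f   t* - z* = %.4f" % (df, t_star, t_star - z_star))
print("standard normal z* = %.4f" % z_star)
```

Every $t^{*}$ printed is larger than $z^{*}$ , which reflects the fatter tails, and the gap shrinks as the degrees of freedom grow.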

The process for constructing a $1-\alpha$ confidence interval for $\mu$ will be similar to the case when we knew $\sigma$ . However, since we do not know $\sigma$ , we must change a couple of things. The following theorem represents the changes needed in the construction of a $1-\alpha$ confidence interval for $\mu$ , with $\sigma$ not known.

Theorem (Confidence Interval for $\mu$ , with $\sigma$ Not Known).

Given a random sample $x_{1},x_{2},\ldots,x_{n}$ , and the associated sample standard deviation $s$ , the $1-\alpha$ confidence interval for $\mu$ is given by

\left(\bar{x}-t^{*}\dfrac{s}{\sqrt{n}},\ \bar{x}+t^{*}\dfrac{s}{\sqrt{n}}\right),

where $\bar{x}$ is the sample mean and $t^{*}$ is the critical value from the Student’s $t$ -distribution with $n-1$ degrees of freedom such that $P(T\geq t^{*})=\alpha/2$ .

Remark: Again, it is common to write the confidence interval as

\bar{x}\pm E,

where $E=t^{*}\frac{s}{\sqrt{n}}$ is called the margin of error of $\bar{X}$ . Note that the symbol $E$ is used for the margin of error in both cases, whether $\sigma$ is known or not known.

Similar to constructing confidence intervals for $\mu$ knowing $\sigma$ , the most difficult part of constructing confidence intervals for $\mu$ not knowing $\sigma$ is finding $t^{*}$ . Let’s recap how to find $t^{*}$ given a confidence level of $1-\alpha$ and degrees of freedom $n-1$ .

Finding $t^{*}$

The random variable $T$ has a $t$ -distribution with degrees of freedom $n-1$ . The critical value, $t^{*}$ is a particular outcome of $T$ that has the following relation to $\alpha$ .

Definition (Critical Value: $t^{*}$ ).

We define the critical value, $t^{*}$ , as the value in which the following holds:

P(T\geq t^{*})=\alpha/2,

where $T$ is a random variable having a $t$ -distribution of $n-1$ degrees of freedom. That is, the area under the $t$ -distribution with $n-1$ degrees of freedom to the right of the critical value $t^{*}$ is $\alpha/2$ .

The $t$ -distribution is also symmetric about 0, similar to the standard normal distribution. So the symmetric value, $-t^{*}$ , satisfies $P(T\leq-t^{*})=\alpha/2$ . Hence $P(-t^{*}\leq T\leq t^{*})=1-\alpha$ , which shows the connection to the $1-\alpha$ confidence level we desire out of the confidence interval.

Obtaining $t^{*}$ is definitely necessary in order to construct a $1-\alpha$ confidence interval for $\mu$ , not knowing $\sigma$ . In order to find $t^{*}$ , we need the degrees of freedom. Degrees of freedom is easy to calculate. Just remember,

\text{Degrees of freedom}=n-1,

where $n$ is the sample size taken. When calculating $t^{*}$ we will need to supply the degrees of freedom into the commands. Below are the commands used to calculate $t^{*}$ in Excel and Python.

Compute $t^{*}$ in Excel	Compute $t^{*}$ in Python
$\tt{T.INV(1-0.5*\alpha,N-1)}$	t.ppf( $1-0.5*\alpha$ , $n-1$ )

•

To use the Python command, it is required that you load scipy.stats first.

Figure 8.9: Commands used to calculate $t^{*}$

Do you notice the similarity between these commands and the commands for finding $z^{*}$ ? Can you reason out why we are still using $1-0.5*\alpha$ ? Also notice that the second input in both commands tells the program what the degrees of freedom are. In both cases, $n$ stands for the sample size.

Let’s practice finding $t^{*}$ in both Excel and Python.

Example 8.1.5.

Given $1-\alpha=0.92$ , find the associated critical value $t^{*}$ in Excel knowing that the sample size taken is $n=42$ .

We first need to calculate $\alpha$ . On a new sheet in Excel and in cell $\tt{A1}$ , type the string Alpha.

Since $1-\alpha=0.92$ , we have $\alpha=1-0.92=0.08$ . In cell $\tt{B1}$ , type

\tt{=1-0.92}

In cell $\tt{A2}$ , type the string Critical Value.

Knowing that the degrees of freedom is $n-1=41$ , use cell-referencing to compute $t^{*}$ by typing the following command in cell $\tt{B2}$ :

\tt{=T.INV(1-0.5*B1,41)}

You should obtain $t^{*}=1.795172845$ . Figure 8.10 represents the layout in Excel you should have in the end.

Figure 8.10: Computing

t^{*}

in Excel

$\clubsuit$

Example 8.1.6.

Given $1-\alpha=0.77$ , find the associated critical value $t^{*}$ in Python knowing that the sample size taken is $n=23$ .

Make sure you first have scipy.stats loaded. If not, type the following command.

from scipy.stats import *

We need to find $\alpha$ first. Type the following in Python to compute and save $\alpha$ to the variable alpha.

alpha = 1-0.77

Note that the degrees of freedom in this case is $n-1=23-1=22$ . Compute and store the critical value, $t^{*}$ , as critval by typing the command

critval = t.ppf(1-0.5*alpha,22)

You should obtain $t^{*}=1.2346116580430575$ . Figure 8.11 represents the layout you should have in the end. Remember to call critval to obtain the value for $t^{*}$ .

Figure 8.11: Computing $t^{*}$ in Python

$\clubsuit$

Constructing the Confidence Interval

As in the case when we were constructing a $1-\alpha$ confidence interval for $\mu$ with $\sigma$ known, we need to make sure we have a few things before proceeding with constructing a $1-\alpha$ confidence interval for $\mu$ , not knowing $\sigma$ :

\left(\bar{x}-t^{*}\dfrac{s}{\sqrt{n}},\ \bar{x}+t^{*}\dfrac{s}{\sqrt{n}}\right)

Be sure you have the following:

• Make sure your unbiased random sample is coming from a normally distributed population.

• The sample size, $n$ . The sample size may be small in this case ( $n<30$ ). (Footnote: Extremely large sample sizes allow one to use $z^{*}$ instead of $t^{*}$ because the $t$ -distribution is nearly standard normal for very large sample sizes.)

• You should not have $\sigma$ in this case. Use $s$ , the sample standard deviation, as an estimate of $\sigma$ . You may need to calculate it.

• Calculate the degrees of freedom. This is $n-1$ , where $n$ is the sample size.

• Based on your confidence level, $1-\alpha$ , and degrees of freedom, calculate the critical value $t^{*}$ using appropriate commands in either Excel or Python.
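When only raw data are available, the checklist above can be followed step by step in Python. The following is a minimal sketch, assuming a small hypothetical sample (the numbers in `data` are invented for illustration) and a 95% confidence level chosen here; it is not one of the worked examples in this section.

```python
import numpy as np
from scipy.stats import t

# Hypothetical sample, invented only for illustration
data = np.array([12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9, 9.5])

n = data.size                 # sample size
xbar = data.mean()            # sample mean
s = data.std(ddof=1)          # sample standard deviation (divides by n - 1)
df = n - 1                    # degrees of freedom

conf_level = 0.95             # assumed confidence level for this sketch
alpha = 1 - conf_level
t_star = t.ppf(1 - 0.5 * alpha, df)

margin = t_star * s / np.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print("(%f, %f)" % (lower, upper))
```

Note the argument `ddof=1`: by default, numpy's `std` divides by $n$ rather than $n-1$ , which would understate $s$ .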

As a recap, here are the commands you will encounter in Excel and Python for constructing a $1-\alpha$ confidence interval for $\mu$ , with $\sigma$ not known.

Action	Excel Commands	Python Commands
Compute the Mean	$\tt{AVERAGE(\ldots)}$	mean(…)
Compute the Square Root	$\tt{SQRT(\ldots)}$	sqrt(…)
Compute $t^{*}$	$\tt{T.INV(1-0.5*\alpha,N-1)}$	t.ppf( $1-0.5*\alpha$ , $n-1$ )

•

The mean and sqrt commands require the library numpy . The t.ppf command requires the library scipy.stats .

Figure 8.12: Commands Needed to Construct Confidence Interval for $\mu$ , $\sigma$ Not Known

Let’s revisit Examples 8.1.3 and 8.1.4 but for the case that we do not know $\sigma$ .

Example 8.1.7.

Suppose a simple random sample of 35 salaries of college football coaches, with a sample mean of $415,230$ , is taken from a normally distributed population. Assuming that the calculated sample standard deviation is $s=265,000$ , use Excel to find a 90% confidence interval for the mean $\mu$ .

Since our population is normally distributed, we can proceed with constructing the confidence interval. Notice that we are not given $\sigma$ . So we will be using $t^{*}$ instead of $z^{*}$ .

Since $1-\alpha=90\%$ , we have $\alpha=0.10$ . On a new sheet in Excel, type in cell $\tt{A1}$ the string $\tt{T^{*}}$ .

Then in cell $\tt{B1}$ , type the following command that will compute the critical value $t^{*}$ associated with the confidence level $1-\alpha$ . Note that the degrees of freedom in this problem is $n-1=35-1=34$ .

\tt{=T.INV(1-0.5*0.1,34)}

Since the formula for the confidence interval is

\left(\bar{x}-t^{*}\dfrac{s}{\sqrt{n}},\ \bar{x}+t^{*}\dfrac{s}{\sqrt{n}}\right),

let’s first compute the lower value: $\bar{x}-t^{*}\frac{s}{\sqrt{n}}$ . In cells $\tt{D1}$ and $\tt{E1}$ , type the strings $\tt{LOWER}$ and $\tt{UPPER}$ , respectively.

With $\bar{x}=415,230$ , $s=265,000$ , plug into cell $\tt{D2}$ the following:

\tt{=415230-B1*265000/SQRT(35)}

Notice that we are cell-referencing the value of $t^{*}$ .

Repeat the process for the upper value in cell $\tt{E2}$ by typing the following:

\tt{=415230+B1*265000/SQRT(35)}

Figure 8.13: Confidence Interval in Excel

$\clubsuit$

Example 8.1.8.

Colleges frequently provide estimates of student expenses such as housing. Suppose the data are normally distributed. A simple random sample of 43 houses was collected and the sample mean for student housing was calculated to be $\$621.63$ along with a sample standard deviation of $s=\$122.3$ . Construct an 87% confidence interval for $\mu$ , the population mean of student housing cost.

Again, the population is assumed to be normally distributed. We can move forward with producing the confidence interval.

Remember to load numpy and scipy.stats since we are using Python. To do so, type:

from numpy import *

from scipy.stats import *

Since $1-\alpha=87\%$ , then $\alpha=0.13$ . To find $t^{*}$ , type the following command that will assign the value of $t^{*}$ to tstar. Notice that the degrees of freedom is $n-1=43-1=42$ .

tstar = t.ppf(1-0.5*0.13,42)

Let’s name the lower value of the confidence interval as Lower. At the prompt, type the following to compute the lower value of the confidence interval and assign it to Lower.

Lower = 621.63 - tstar*122.3/sqrt(43)

Repeat the process for calculating the upper value of the confidence interval. Assign it to the name Upper.

Upper = 621.63 + tstar*122.3/sqrt(43)

Let’s have Python print out the confidence interval. Type the following.

print("(%f, %f)" % (Lower, Upper))

You should obtain the confidence interval $(592.826538,\ 650.433462)$ . Figure 8.14 represents the layout you should have in Python when finished.

Figure 8.14: Confidence Interval In Python

$\clubsuit$

Comparing the confidence intervals in Examples 8.1.3 and 8.1.4 with those in Examples 8.1.7 and 8.1.8, you will notice that the confidence intervals in Examples 8.1.7 and 8.1.8 are wider. As mentioned before, estimating $\sigma$ with $s$ means we need to account for more variability. So, for the same sample size and the same value used for the standard deviation, we expect wider confidence intervals whenever we use $t^{*}$ instead of $z^{*}$ .
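The widening can be checked directly with the housing numbers. The following is a minimal sketch, assuming a hypothetical helper named `t_interval` (not a scipy.stats command); it reproduces Example 8.1.8, cross-checks the result against the `interval` method of `scipy.stats.t`, and compares the width with the $z^{*}$ version from Example 8.1.4.

```python
from math import sqrt
from scipy.stats import norm, t

def t_interval(xbar, s, n, conf_level):
    """Return (lower, upper) of the confidence interval for mu, sigma not known."""
    alpha = 1 - conf_level
    t_star = t.ppf(1 - 0.5 * alpha, n - 1)   # critical value with n - 1 degrees of freedom
    margin = t_star * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 8.1.8: xbar = 621.63, s = 122.3, n = 43, 87% confidence
housing_t = t_interval(621.63, 122.3, 43, 0.87)

# Cross-check with scipy's built-in interval (confidence level, df, center, scale)
check = t.interval(0.87, 42, loc=621.63, scale=122.3 / sqrt(43))

# Width of the z-based interval from Example 8.1.4 for comparison
z_width = 2 * norm.ppf(1 - 0.5 * 0.13) * 122.3 / sqrt(43)
t_width = housing_t[1] - housing_t[0]

print("(%f, %f)" % housing_t)
print("t width = %f, z width = %f" % (t_width, z_width))
```

The $t$ -based width should come out slightly larger than the $z$ -based width.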

8.1.3 Understanding the Confidence Interval

Interpreting a $1-\alpha$ Confidence Interval

It is important to understand what the confidence interval is trying to describe to us. First and foremost, you should remove the following from your belief of what the confidence interval is.

The true mean $\mu$ is always within the range represented by the confidence interval.

This may not be true. Any single confidence interval either contains $\mu$ or it does not. The confidence level $1-\alpha$ describes the method: it is the proportion of intervals, constructed this way from repeated random samples, that we expect to capture $\mu$ .

Let’s get a better grasp of what is going on. Let’s create a $1-\alpha$ confidence interval for $\mu$ , assuming the data we gather come from a normally distributed population. Assume further that $\sigma$ is known and the sample size $n$ is fixed.

We have everything needed to start calculating the $1-\alpha$ confidence interval for $\mu$ except for the sample mean. Well, that’s easy to obtain. We take a simple random sample from the normally distributed population and calculate a sample mean, call it $\bar{x}_{1}$ . We have everything we need. So, we can go ahead and calculate a $1-\alpha$ confidence interval for $\mu$ .

\text{Ours: }\left(\bar{x}_{1}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}_{1}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right)

Meanwhile, a friend of ours from class also does the same thing. They take a simple random sample from the population and calculate a sample mean, call it $\bar{x}_{2}$ . Then they calculate a $1-\alpha$ confidence interval for $\mu$ . (Note that their $\alpha$ , $\sigma$ , $z^{*}$ , and $n$ are all the same as ours.) Their confidence interval looks like

\text{Friend's: }\left(\bar{x}_{2}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}_{2}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right)

What’s the difference? Well, their $\bar{x}_{2}$ will most likely be different than ours. So their $1-\alpha$ confidence interval for $\mu$ will most likely be different. An argument ensues over which interval has $\mu$ in it. As a result, we both decide to repeat the process. We obtain a different sample mean, $\bar{x}_{3}$ , and our obstinate friend gets a different sample mean, $\bar{x}_{4}$ . Here are the different $1-\alpha$ confidence intervals for $\mu$ .

\text{Ours: }\left(\bar{x}_{3}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}_{3}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right)

\text{Friend's: }\left(\bar{x}_{4}-z^{*}\dfrac{\sigma}{\sqrt{n}},\ \bar{x}_{4}+z^{*}\dfrac{\sigma}{\sqrt{n}}\right)

These are different than our originals! Is there something wrong with the way the intervals are calculated? Nope! Remember that $\bar{x}$ changes with each simple random sample collected from a population. If our friend and we each did this 50 times for a combined total of 100 different confidence intervals, then we should notice that most of the intervals contain $\mu$ , but some do not. In fact, we should expect a proportion of about $1-\alpha$ of them to contain $\mu$ . This is what the confidence level is telling us.

For example, suppose that $1-\alpha=95\%$ . Then if we were to construct one hundred $95\%$ confidence intervals for $\mu$ , we would expect about 95 of them to contain the true population mean $\mu$ .
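This repeated-sampling idea can be tried out with a simulation. The following is a minimal sketch, assuming a hypothetical normal population with $\mu=50$ and $\sigma=10$ , a sample size of $n=25$ , and 10,000 repetitions; all of these values, and the random seed, are chosen here only for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)    # fixed seed so the run is repeatable

mu, sigma, n = 50.0, 10.0, 25     # hypothetical population mean, sigma, sample size
conf_level = 0.95
z_star = norm.ppf(1 - 0.5 * (1 - conf_level))
margin = z_star * sigma / np.sqrt(n)

reps = 10000                      # number of samples (and intervals) to generate
samples = rng.normal(mu, sigma, size=(reps, n))
xbars = samples.mean(axis=1)      # one sample mean per row

# An interval captures mu when lower < mu < upper
captured = (xbars - margin < mu) & (mu < xbars + margin)
coverage = captured.mean()
print("proportion of intervals containing mu:", coverage)
```

The printed proportion should land close to 0.95, though not exactly on it, since each run of intervals is itself random.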

So, as a result, we should interpret the $1-\alpha$ confidence interval for $\mu$ by stating the following.

We are $1-\alpha$ confident that the confidence interval we obtained contains the population mean, $\mu$ .

In the statement, we are placing emphasis on the fact that there is a chance the confidence interval we obtained may not contain the population mean, $\mu$ . The above interpretation of a $1-\alpha$ confidence interval for $\mu$ can be extended to the case of when we do not know $\sigma$ . In fact, this same interpretation can be extended further to a $1-\alpha$ confidence interval for any parameter, not just $\mu$ .

Changing $\alpha$ : How a $1-\alpha$ Confidence Interval Changes

Based on this understanding of the confidence interval, your first thought may be, “Why not make $\alpha$ as small as possible, say $\alpha=0.000001$ ?” This is a good question. The smaller $\alpha$ is, the more likely a constructed $1-\alpha$ confidence interval will capture the population mean, $\mu$ . However, there is a drawback to making $\alpha$ extremely small. The smaller $\alpha$ gets, the wider the range described by the $1-\alpha$ confidence interval becomes. The best way to see this is through an example.

Example 8.1.9.

Let $\bar{x}=0$ , $\sigma=1$ , and $n=30$ . In Excel, calculate $1-\alpha$ confidence intervals for $\mu$ given the following confidence levels.

1.

$1-\alpha=0.95$

So $\alpha=0.05$ . Using the command $\tt{NORMINV(1-0.5*0.05,0,1)}$ , we obtain $z^{*}=1.959963985$ . Calculating the confidence interval in the same manner as that in Example 8.1.3, we have

$(-0.357838829,0.357838829).$

Figure 8.15: Excel Confidence Interval $1-\alpha=0.95$
2.

$1-\alpha=0.99$

So $\alpha=0.01$ . Using the command $\tt{NORMINV(1-0.5*0.01,0,1)}$ , we obtain $z^{*}=2.575829304$ . Calculating the confidence interval, we have

$(-0.470279938,0.470279938).$

Figure 8.16: Excel Confidence Interval $1-\alpha=0.99$
3.

$1-\alpha=0.999$

So $\alpha=0.001$ . Using the command $\tt{NORMINV(1-0.5*0.001,0,1)}$ , we obtain $z^{*}=3.290526731$ . Calculating the confidence interval, we have

$(-0.600765239,0.600765239).$

Figure 8.17: Excel Confidence Interval $1-\alpha=0.999$
4.

$1-\alpha=0.9999$

So $\alpha=0.0001$ . Using the command $\tt{NORMINV(1-0.5*0.0001,0,1)}$ , we obtain $z^{*}=3.890591886$ . Calculating the confidence interval, we have

$(-0.710321646,0.710321646).$

Figure 8.18: Excel Confidence Interval $1-\alpha=0.9999$

Notice that as $\alpha$ decreases, the confidence interval widens. $\clubsuit$
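The same comparison can be sketched in Python rather than Excel. The helper name `z_interval` below is an assumption introduced for illustration; the inputs $\bar{x}=0$ , $\sigma=1$ , and $n=30$ are those of Example 8.1.9.

```python
from math import sqrt
from scipy.stats import norm

xbar, sigma, n = 0.0, 1.0, 30  # values from Example 8.1.9

def z_interval(xbar, sigma, n, alpha):
    """Return (L, U) for the 1-alpha interval for mu when sigma is known."""
    zstar = norm.ppf(1 - 0.5 * alpha)
    margin = zstar * sigma / sqrt(n)
    return xbar - margin, xbar + margin

widths = []
for alpha in (0.05, 0.01, 0.001, 0.0001):
    lower, upper = z_interval(xbar, sigma, n, alpha)
    widths.append(upper - lower)
    print(f"alpha={alpha}: ({lower:.9f}, {upper:.9f})")

# Each smaller alpha should produce a wider interval than the one before it.
print(widths)
```

Printing the widths side by side makes the trade-off visible: more confidence is paid for with a less precise interval.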

In most cases, $\alpha=0.05$ or $\alpha=0.01$ will suffice. However, if pinning down the population mean precisely is important, there is another tactic we can use to shrink the confidence interval.

Controlling the Width of Confidence Interval through $n$

While keeping $\alpha=0.05$ is ideal, we can still adjust the confidence interval by adjusting the sample size $n$ . That will allow us to control the width of the confidence interval.

Recall that the confidence interval is sometimes written as

\bar{x}\pm E,

where $E=z^{*}\frac{\sigma}{\sqrt{n}}$ or $E=t^{*}\frac{s}{\sqrt{n}}$ , depending on whether $\sigma$ is known or not. The quantity $E$ is called the margin of error for $\bar{X}$ . The margin of error tells us how far the interval extends to the left and to the right of the sample mean when constructing the $1-\alpha$ confidence interval for $\mu$ .
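To see how the margin of error responds to the sample size, the following sketch evaluates $E=z^{*}\frac{\sigma}{\sqrt{n}}$ for a few values of $n$ . The function name `margin_of_error` and the inputs ( $\sigma=1$ , $\alpha=0.05$ , and the chosen sample sizes) are assumptions picked only for illustration.

```python
from math import sqrt
from scipy.stats import norm

def margin_of_error(sigma, n, alpha):
    """Margin of error E = z* sigma / sqrt(n) for the sigma-known case."""
    zstar = norm.ppf(1 - 0.5 * alpha)
    return zstar * sigma / sqrt(n)

# Hypothetical inputs: sigma = 1, alpha = 0.05, and three sample sizes.
for n in (30, 120, 480):
    print(n, margin_of_error(1.0, n, 0.05))
```

Because $n$ sits under a square root, quadrupling the sample size only halves the margin of error.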

Say we want the margin of error to be not more than $\pm 0.01$ . That is, whenever we construct a $1-\alpha$ confidence interval for $\mu$ we want the interval to look like

(\bar{x}-0.01,\ \bar{x}+0.01),

for some $\bar{x}$ that was taken from a sample. How do we ensure that the margin of error is $\pm 0.01$ , or any other desired value?

Since $\alpha$ is assumed to be fixed and $\sigma$ is given, the only other variable we can adjust is the sample size, $n$ . With a little algebra, we can obtain an expression that will tell us how big $n$ needs to be. In fact, by letting $c=z^{*}$ (or $c=t^{*}$ ), and $v=\sigma$ (or $v=s$ ), and solving the formula

E=c\dfrac{v}{\sqrt{n}}

for $n$ , we have

\begin{aligned}
E &= c\dfrac{v}{\sqrt{n}} \\
E\sqrt{n} &= cv \\
\sqrt{n} &= c\dfrac{v}{E} \\
n &= \left(c\dfrac{v}{E}\right)^{2}.
\end{aligned}

This then gives us what we need.

Theorem (Controlling Margin of Error with $n$ ).

Given $\alpha$ , $\sigma$ and $E$ , in order to ensure that the $1-\alpha$ confidence interval for $\mu$ has a margin of error of at most $E$ , the sample size, $n$ , needs to be at least

n=\left(c\dfrac{v}{E}\right)^{2},

where $c=z^{*}$ (or $t^{*}$ ), and $v=\sigma$ (or $v=s$ ), depending on whether we know $\sigma$ or not.

Let’s see an example in Python for determining the appropriate sample size.

Example 8.1.10.

A quality controller wants to determine a 90% confidence interval for the average size of the bolts manufactured. He knows that the population standard deviation is 1.2 mm and he wants a margin of error of $\pm 0.02$ mm. How big must his sample be in order to achieve the desired confidence interval?

This question is really asking for the minimum sample size needed to ensure that the margin of error is at most $\pm 0.02$ . This is done by calculating the expression

n=\left(z^{*}\dfrac{\sigma}{E}\right)^{2}.

Since we will be using Python, make sure you first load the following libraries before proceeding.

from numpy import *

from scipy.stats import *

First, we need to find $z^{*}$ . Since $1-\alpha=0.9$ , we have $\alpha=0.1$ . In Python, the following command computes $z^{*}$ and stores it as zstar.

zstar = norm.ppf(1-0.5*0.1)

We are given that $\sigma=1.2$ and that we want a margin of error of $E=\pm 0.02$ . Since the expression for $n$ involves squaring, we do not need to worry about the $\pm$ as squaring will always return a non-negative value.

In Python, type the following to evaluate the expression for $n$ . Save it as n.

n = (zstar*1.2/0.02)**2

Recall that squaring in Python uses ** instead of a caret symbol. (If done in Excel, you would use $\tt{\wedge}$ instead.) Also notice that we do not need to worry about integer division since all values are floating-point.

Call the variable n. You should obtain $n=9739.9564347434862$ . Figure 8.19 represents the output you should obtain in Python.

Figure 8.19: Margin of Error Computation

Since $n$ must be an integer, we round up to the next whole number. So, the quality controller should choose a sample size of at least 9740 to ensure the margin of error is at most $\pm 0.02$ .

$\clubsuit$
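The steps of Example 8.1.10 can also be collected into a small reusable function. This is a minimal sketch for the case where $\sigma$ is known; the function name `min_sample_size` is an assumption introduced here, and `math.ceil` is used to perform the rounding up.

```python
from math import ceil
from scipy.stats import norm

def min_sample_size(sigma, E, alpha):
    """Smallest n with z* sigma / sqrt(n) <= |E| (sigma known)."""
    zstar = norm.ppf(1 - 0.5 * alpha)
    # abs() lets the caller pass E with either sign; ceil rounds up to an integer.
    return ceil((zstar * sigma / abs(E)) ** 2)

# Values from Example 8.1.10: sigma = 1.2, E = 0.02, alpha = 0.1.
n_bolts = min_sample_size(sigma=1.2, E=0.02, alpha=0.1)
print(n_bolts)  # 9740
```

Rounding up rather than to the nearest integer matters here: a sample one unit smaller would give a margin of error slightly larger than the target.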

Concepts Check:

1. Assuming normality in the population, compute an 83% confidence interval given $\bar{x}=23$ , $\sigma=2.3$ , and $n=56$ . Answer: $(22.57825257,\ 23.42174743)$

2. Assuming normality in the population, compute a 96% confidence interval given $\bar{x}=524$ , $s=10.1$ , and $n=12$ . Answer: $(517.2120316,\ 530.7879684)$

3. What is the minimum sample size needed to ensure a margin of error of $E=\pm 0.01$ given that $\alpha=0.1$ and $\sigma=2.5$ ? Answer: At least $n=169097$
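For readers who want to check these answers in Python, the following sketch reproduces the three computations. It assumes the usual $n-1$ degrees of freedom for $t^{*}$ in the second problem; the variable names are illustrative only.

```python
from math import ceil, sqrt
from scipy.stats import norm, t

# Problem 1: sigma known, 83% confidence, so alpha = 0.17.
E1 = norm.ppf(1 - 0.5 * 0.17) * 2.3 / sqrt(56)
ci1 = (23 - E1, 23 + E1)

# Problem 2: sigma not known, 96% confidence, so alpha = 0.04.
# Assumes t* uses n - 1 = 11 degrees of freedom.
E2 = t.ppf(1 - 0.5 * 0.04, 12 - 1) * 10.1 / sqrt(12)
ci2 = (524 - E2, 524 + E2)

# Problem 3: minimum sample size for E = 0.01, alpha = 0.1, sigma = 2.5.
n3 = ceil((norm.ppf(1 - 0.5 * 0.1) * 2.5 / 0.01) ** 2)

print(ci1)
print(ci2)
print(n3)
```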

8.1.4 Exercises

1.
Answer each of the following statements as True or False.
(a) The $1-\alpha$ confidence intervals involving $t^{*}$ tend to be wider than the $1-\alpha$ confidence intervals involving $z^{*}$ .
(b) A $1-\alpha$ confidence interval for $\mu$ will always capture $\mu$ .
(c) Making $\alpha$ smaller will make the confidence interval wider.
(d) There is a chance that the confidence interval you construct may not contain the population parameter.
(e) The sample standard deviation can be used as an estimate of $\sigma$ .
2.
Assuming normality in the population is satisfied, compute the $1-\alpha$ confidence interval for $\mu$ in Excel given $\bar{x}$ , $\sigma$ , $n$ , and $\alpha$ .
(a) $\bar{x}=22.2$ , $\sigma=1.5$ , $n=43$ , $\alpha=0.08$ .
(b) $\bar{x}=108.5$ , $\sigma=22.8$ , $n=104$ , $\alpha=0.01$ .
(c) $\bar{x}=1004$ , $\sigma=20.99$ , $n=84$ , $\alpha=0.1$ .
3.
Assuming normality in the population is satisfied, compute the $1-\alpha$ confidence interval for $\mu$ in Excel given $\bar{x}$ , $s$ , $n$ , and $\alpha$ .
(a) $\bar{x}=22.2$ , $s=1.5$ , $n=12$ , $\alpha=0.12$ .
(b) $\bar{x}=108.5$ , $s=22.8$ , $n=25$ , $\alpha=0.2$ .
(c) $\bar{x}=1004$ , $s=20.99$ , $n=30$ , $\alpha=0.04$ .
4.
Assuming normality in the population is satisfied, compute the $1-\alpha$ confidence interval for $\mu$ in Python given $\bar{x}$ , $\sigma$ , $n$ , and $\alpha$ .
(a) $\bar{x}=34.2$ , $\sigma=2.4$ , $n=22$ , $\alpha=0.12$ .
(b) $\bar{x}=101.5$ , $\sigma=12.8$ , $n=32$ , $\alpha=0.2$ .
(c) $\bar{x}=82$ , $\sigma=16.49$ , $n=654$ , $\alpha=0.04$ .
5.
Assuming normality in the population is satisfied, compute the $1-\alpha$ confidence interval for $\mu$ in Excel given $\bar{x}$ , $s$ , $n$ , and $\alpha$ .
(a) $\bar{x}=34.2$ , $s=2.4$ , $n=11$ , $\alpha=0.001$ .
(b) $\bar{x}=101.5$ , $s=12.8$ , $n=14$ , $\alpha=0.03$ .
(c) $\bar{x}=82$ , $s=16.49$ , $n=25$ , $\alpha=0.06$ .
6.
Given $\sigma$ and $\alpha$ , find the minimum sample size needed to obtain the stated margin of error, $E$ .
(a) $E=\pm 0.02$ ; $\sigma=1.2$ , $\alpha=0.01$ .
(b) $E=\pm 0.1$ ; $\sigma=10.2$ , $\alpha=0.08$ .
(c) $E=\pm 0.04$ ; $\sigma=100$ , $\alpha=0.1$ .
7.
Given $s$ and $\alpha$ , find the minimum sample size needed to obtain the stated margin of error, $E$ .
(a) $E=\pm 0.02$ ; $s=1.2$ , $\alpha=0.01$ .
(b) $E=\pm 0.1$ ; $s=10.2$ , $\alpha=0.08$ .
(c) $E=\pm 0.04$ ; $s=100$ , $\alpha=0.1$ .
8.

Create an Excel worksheet that computes the $1-\alpha$ confidence interval for $\mu$ for the case we know $\sigma$ . It should be user friendly and only require the user to input $\bar{x}$ , $\sigma$ , $\alpha$ , and $n$ .
9.

Create an Excel worksheet that computes the $1-\alpha$ confidence interval for $\mu$ for the case we don’t know $\sigma$ . It should be user friendly and only require the user to input $\bar{x}$ , $s$ , $\alpha$ , and $n$ .
10.

Create a script in Python that computes the $1-\alpha$ confidence interval for $\mu$ for the case we know $\sigma$ . The user should only be required to input $\bar{x}$ , $\sigma$ , $\alpha$ , and $n$ .
11.

Create a script in Python that computes the $1-\alpha$ confidence interval for $\mu$ for the case we don’t know $\sigma$ . The user should only be required to input $\bar{x}$ , $s$ , $\alpha$ , and $n$ .

\begin{aligned}
1-\alpha &= P\left(-z^{*}<Z<z^{*}\right) \\
&= P\left(-z^{*}<\dfrac{\bar{X}-\mu}{\sigma/\sqrt{n}}<z^{*}\right) \\
&= P\left(-z^{*}\dfrac{\sigma}{\sqrt{n}}<\bar{X}-\mu<z^{*}\dfrac{\sigma}{\sqrt{n}}\right) \\
&= P\left(-z^{*}\dfrac{\sigma}{\sqrt{n}}-\bar{X}<-\mu<z^{*}\dfrac{\sigma}{\sqrt{n}}-\bar{X}\right) \\
&= P\left(\bar{X}+z^{*}\dfrac{\sigma}{\sqrt{n}}>\mu>\bar{X}-z^{*}\dfrac{\sigma}{\sqrt{n}}\right)
\end{aligned}
