In your daily life, you may have encountered the phrase, “what is the proportion of (fill in the blank)?” Proportions allow us to summarize how often an attribute occurs in relation to the whole.
Given a sample of $n$ data values and a subset of $x$ data values from the sample having a specified attribute, the sample proportion of the specified attribute, denoted as $\hat{p}$, is the ratio of $x$ to $n$. That is,
$$\hat{p} = \frac{x}{n}.$$
If the collection of data values represents the entire population, then the proportion of the specified attribute is referred to as the population proportion of the specified attribute and is denoted as just $p$.
Remark: It is common to express a proportion as a decimal, a fraction, or a percentage. By definition, a proportion written as a decimal is a value ranging from 0 to 1.
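The definition can be sketched in a few lines of Python. This is only an illustration: the function name `sample_proportion` and the list of rolls are hypothetical, made up to show the ratio of the count having the attribute to the sample size.

```python
def sample_proportion(data, has_attribute):
    """Return the share of values in `data` for which has_attribute(value) is true."""
    n = len(data)  # sample size
    if n == 0:
        raise ValueError("a proportion is undefined for an empty sample")
    x = sum(1 for value in data if has_attribute(value))  # frequency of the attribute
    return x / n


# Hypothetical sample of ten die rolls (made-up values, for illustration only).
rolls = [4, 2, 6, 4, 1, 3, 4, 5, 2, 6]
p_hat = sample_proportion(rolls, lambda value: value == 4)

print(p_hat)           # decimal form: 0.3
print(f"{p_hat:.0%}")  # percentage form: 30%
```

Because the count can never be negative or exceed the sample size, the result always lies between 0 and 1.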
Proportions are easy to calculate. By definition, we must first calculate the frequency of a specified attribute. Frequency distributions, as a result, allow us to quickly calculate proportions. Let’s revisit an example.
Let’s find the sample proportion for a bin given in the frequency distribution in Figure 3.28. The frequency of each bin is given, so all we need to do is divide the frequency for the bin by the total number of data values in the sample.
Let’s use a spreadsheet command that returns a random whole number from 1 through 6 to generate a sample of 250 and calculate the sample proportion of the value 4 showing up.
In cell , type RANDOM SAMPLE. Using , generate a random sample of 250 data points in cells through . A snapshot of our random sample is in Figure 3.44.
In cells through , copy the setup as it is in Figure 3.45.
Calculate the frequency of each of the outputs occurring in cells through by typing . In our sample, it shows that we obtained a frequency of fours. Since the sample size is $n = 250$, the sample proportion is .
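For readers who want to check the worksheet outside the spreadsheet, the same steps can be sketched in Python. This is a stand-in, not the spreadsheet method: `random.randint(1, 6)` is assumed to play the role of the random-number command, and the seed value is a hypothetical choice that only makes the run repeatable.

```python
import random

random.seed(1)  # hypothetical seed, fixed only so the run is repeatable

n = 250
# Assumption: random.randint(1, 6) mimics the spreadsheet command,
# giving each of the values 1 through 6 an equal chance.
sample = [random.randint(1, 6) for _ in range(n)]

# Frequency of each output 1 through 6, like the frequency column in the worksheet.
frequencies = {value: sample.count(value) for value in range(1, 7)}

# Sample proportion of 4's: frequency of fours divided by the sample size.
p_hat_four = frequencies[4] / n

print(frequencies)
print(p_hat_four)
```

Removing the `random.seed` line gives a fresh sample on every run, much like recalculating the worksheet.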
Quick question: would the sample proportion be different if another random sample were taken? Of course! In fact, our next example will highlight this.
What happens to the sample proportion of 4's if we take multiple random samples of size 250 or more using the same command?
Keeping the sample size the same, let’s denote the sample proportion calculated from each random sample as $\hat{p}_i$, where $i$ indicates which random sample was generated. Thus, the sample proportion from the previous example is $\hat{p}_1$. To get a new random sample, repeat the previous exercise or, if the worksheet is still open, press F9 on the keyboard to regenerate the sample automatically. The rest of the work is already set up, so we can simply record each sample proportion. The following table shows a collection of the sample proportions we calculated.
| Sample proportion of 4's ($n = 250$) |
|---|
| 0.152 |
| 0.172 |
| 0.224 |
| 0.192 |
| 0.196 |
You should notice that the values vary as hypothesized. From our sample, though, we see that the values are as low as 0.152 and as high as 0.224. Did you obtain any values lower or higher?
It is interesting to notice that, with our large sample size of $n = 250$, it seems difficult to obtain very large or very small sample proportions. (Keep regenerating samples. Do you ever get 0.5 or more, or 0.02 or less?) It may still be possible to obtain such sample proportions, but it is much less likely.
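Pressing F9 many times by hand is tedious, so the experiment can be sketched in Python instead. As before, `random.randint(1, 6)` is assumed to stand in for the spreadsheet command; the helper name `proportion_of_fours` and the choice of 1000 repetitions are hypothetical.

```python
import random


def proportion_of_fours(n, rng):
    """Draw n values from 1 through 6 and return the sample proportion of 4's."""
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return rolls.count(4) / n


rng = random.Random()  # unseeded, so every run produces different samples

# Regenerate the sample of size 250 many times, recording each sample proportion.
proportions = [proportion_of_fours(250, rng) for _ in range(1000)]

print("smallest:", min(proportions))
print("largest: ", max(proportions))

# Count how many samples gave a very large or very small proportion.
extreme = sum(1 for p in proportions if p >= 0.5 or p <= 0.02)
print("extreme samples:", extreme)
```

Comparing the smallest and largest values printed with the table above is a quick way to see how much the sample proportion moves from sample to sample.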
Increase the sample size and calculate another 5 sample proportions of 4's. Below is our collection of sample proportions.
| Sample proportion of 4's (larger sample size) |
|---|
| 0.174 |
| 0.177 |
| 0.183 |
| 0.168 |
| 0.159 |
It seems more likely to obtain sample proportions between 0.15 and 0.18 and more difficult to obtain values smaller than 0.15 or larger than 0.18. We can then hypothesize that as the sample size $n$ increases, the range in which the sample proportions typically fall will shrink.
The sample collected is meant to be a representation of the population. The larger the sample size, the better the sample represents the population (assuming there is no bias in the sample collected). Our intuition then allows us to hypothesize that the sample proportion estimates the population proportion more closely as the sample size increases.
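The hypothesis about shrinking spread can be explored with a short simulation. This is a sketch under assumptions: `random.randint(1, 6)` stands in for the spreadsheet command, and the sample sizes 250, 1000, and 4000, the 100 repetitions, and the seed are illustrative choices rather than values taken from the examples.

```python
import random


def proportion_spread(n, repeats, rng):
    """Return (smallest, largest) sample proportion of 4's over `repeats` samples of size n."""
    proportions = []
    for _ in range(repeats):
        fours = sum(1 for _roll in range(n) if rng.randint(1, 6) == 4)
        proportions.append(fours / n)
    return min(proportions), max(proportions)


rng = random.Random(7)  # hypothetical seed so the comparison is repeatable

# Illustrative sample sizes; for each, report how far apart the extremes are.
for n in (250, 1000, 4000):
    low, high = proportion_spread(n, repeats=100, rng=rng)
    print(f"n = {n:5d}  smallest = {low:.3f}  largest = {high:.3f}  width = {high - low:.3f}")
```

The width column is the quantity to watch as the sample size grows.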
Can we predict the population proportion of a 4 showing up when using the same command?
The command assumes an equally likely chance of obtaining any of the values 1 through 6. That is, if we had a sample size of 6, we would expect to see 4 show up once. This is a proportion! Hence, the population proportion of 4's showing up is $p = \frac{1}{6} \approx 0.167$. Can you see the relationship between the sample proportions in the previous examples and the population proportion?
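One way to look at that relationship is to measure how far each recorded sample proportion lies from the population proportion of one in six. The sketch below uses the ten values from the two tables above; the variable names and the helper `summarize` are hypothetical.

```python
p = 1 / 6  # population proportion of 4's when 1 through 6 are equally likely

# Sample proportions recorded in the two tables above.
size_250 = [0.152, 0.172, 0.224, 0.192, 0.196]
larger_size = [0.174, 0.177, 0.183, 0.168, 0.159]


def summarize(values):
    """Return (average, largest distance from p) for a list of sample proportions."""
    average = sum(values) / len(values)
    worst = max(abs(value - p) for value in values)
    return average, worst


avg_250, worst_250 = summarize(size_250)
avg_larger, worst_larger = summarize(larger_size)

print(round(p, 3))                                   # 0.167
print(round(avg_250, 3), round(worst_250, 3))        # 0.187 0.057
print(round(avg_larger, 3), round(worst_larger, 3))  # 0.172 0.016
```

In these recorded values, the proportions from the larger samples sit closer to 0.167 both on average and in the worst case, which is consistent with the hypothesis that larger samples estimate the population proportion more closely.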
Answer the following as True or False.
stands for sample proportion.
For each sample taken, should never change.
It is possible to obtain a proportion of 0.
It is possible to obtain a proportion of 200%.
Proportions can be written as a fraction, decimal, or percentage.
Generate a sample of 1000 data values using the command and answer the following.
What is the sample proportion of values that lie within ?
What is the sample proportion of values that lie within ?
What is the sample proportion of values that lie within ?
Do you suspect your answers will be different from those of another student in the class? Why or why not?
Repeat the previous exercise but with 8000 data values. Compare your results with those from the previous exercise.
Use the command
to generate a sample of 900 different favorite colors taken over the years. Calculate the sample proportion of Blue showing up. Can you estimate the population proportion of Blue showing up?