In data-driven disciplines, population parameters such as averages and proportions are frequently of interest. Examples are endless: average human body temperature, average human blood pressure, average sea level, average effect of targeted advertising, average returns on an investment, average effect of a new teaching technique, proportion who favor candidate X, proportion of hydrogen in the sun. Each such parameter, if it exists, has an exact value, and generally that value cannot be known. For example, the average weight of human beings is a definite number, but to compute it precisely one would need to weigh every human being on the planet. That can’t be done, and hence the average weight of human beings is unknown. This is how it is in science: we must infer from a few to many, and work with estimates of unknown numbers (parameters).
In this chapter we begin to make connections between the unknown value of a parameter of interest, which describes the population, and the behavior of the corresponding statistic, which is obtained by sampling from the population and is a variable in its own right.
Recall from Chapter 3 that by creating a histogram of a large sample from a population we can get a basic understanding of the behavior of the population, usually called the population distribution. Further recall that when a population distribution is symmetric, such as when it has a ‘bell shaped’ curve, the line of symmetry corresponds to the mean of the population. We were also able to estimate the mean when the distribution was not symmetric, by looking for the ‘balancing’ point. Thus, as we saw, if a distribution is skewed left, the mean is pulled to the left of the mode, and if it is skewed right, the mean is pulled to the right of the mode.
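The relationship between skew, mode, and mean can be checked numerically. The following is a minimal sketch, not taken from Chapter 3: it assumes a hypothetical right-skewed set of values (exponential waiting times with mean 10), builds a crude histogram with an arbitrarily chosen bin width of 2, and compares the midpoint of the tallest bin (a rough stand-in for the mode) with the mean. The names `values`, `bin_width`, and `approx_mode` are illustrative only.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is repeatable

# Hypothetical right-skewed data: exponential waiting times with mean 10.
values = [random.expovariate(1 / 10) for _ in range(100_000)]

# Crude histogram: count how many values fall into each bin of width 2.
bin_width = 2
counts = {}
for x in values:
    b = int(x // bin_width)
    counts[b] = counts.get(b, 0) + 1

# Approximate the mode by the midpoint of the tallest bin.
tallest_bin = max(counts, key=counts.get)
approx_mode = (tallest_bin + 0.5) * bin_width

# The mean is the 'balancing' point of the histogram.
mean_value = statistics.fmean(values)

print(f"approximate mode: {approx_mode:.1f}")
print(f"mean:             {mean_value:.2f}")
```

Under these assumptions the long right tail should pull the mean well to the right of the tallest bin. Negating every value would mirror the histogram and give the skewed-left case.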
We focus, individually, on the behavior of two particular statistics: the sample mean and the sample proportion. In each of the next two sections we look at how these statistics behave when samples are drawn from known populations. This allows us to make connections between the population and the behavior of the sample statistic.
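To make the idea of a statistic as a variable concrete before those sections, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: a made-up population of 10,000 weights (roughly bell shaped, centered near 70 kg), a cutoff of 80 kg for the proportion, samples of size 50, and 1,000 repetitions. Because the population is known here, the parameters can be computed exactly and compared with the statistics from each sample.

```python
import random
import statistics

random.seed(2)  # arbitrary seed so the run is repeatable

# Hypothetical known population: 10,000 weights in kg.
population = [random.gauss(70, 12) for _ in range(10_000)]

# Parameters: fixed numbers that describe the whole population.
mu = statistics.fmean(population)
p = sum(w > 80 for w in population) / len(population)

def one_sample(n):
    """Draw a simple random sample of size n; return its mean and its proportion above 80 kg."""
    sample = random.sample(population, n)
    x_bar = statistics.fmean(sample)
    p_hat = sum(w > 80 for w in sample) / n
    return x_bar, p_hat

# Statistics: they change from one sample to the next.
results = [one_sample(50) for _ in range(1_000)]
sample_means = [x_bar for x_bar, _ in results]
sample_props = [p_hat for _, p_hat in results]

print(f"population mean       {mu:.2f}")
print(f"sample means          {min(sample_means):.2f} to {max(sample_means):.2f}")
print(f"population proportion {p:.3f}")
print(f"sample proportions    {min(sample_props):.2f} to {max(sample_props):.2f}")
```

Each run of `one_sample` gives a different sample mean and sample proportion, while `mu` and `p` stay fixed. In practice the population is not available, so only one such sample, and hence one value of each statistic, is observed.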