Math 365, Elementary Statistics
Lesson 6 : Sampling Distribution
The sample mean x̄ that we computed in the previous chapters is, in fact, the observed value of a random variable X̄. Similarly, the sample variance s² that we computed before is the observed value of a random variable S². Each time you collect a sample, the computed sample mean x̄ is the value of the random variable X̄ for that sample. This is explained in the following example.
x1, x2, …, xn.
They are, respectively, the observed values of n random variables
X1, X2, …, Xn.
We (re)define the sample mean X̄ as the random variable
X̄ = (X1 + X2 + … + Xn)/n.
We also (re)define the sample variance S² as the random variable
S² = [(X1 - X̄)² + (X2 - X̄)² + … + (Xn - X̄)²]/(n - 1).
So, the sample mean x̄ we computed in Lesson 2 is an observed value of X̄.
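The idea that the computed sample mean and sample variance are observed values of random variables can be seen by drawing several samples and recomputing both statistics each time. The following is a minimal Python sketch, not part of the original lesson. The population (heights in cm, normally distributed with mean 170 and standard deviation 10), the sample size 25, and the helper name draw_sample are all assumptions made only for illustration.

```python
import random
import statistics

def draw_sample(rng, n, mu=170.0, sigma=10.0):
    """Draw n heights (cm) from a hypothetical N(mu, sigma) population."""
    return [rng.gauss(mu, sigma) for _ in range(n)]

rng = random.Random(365)  # fixed seed so the run is repeatable

sample_means = []
sample_variances = []
for _ in range(5):
    sample = draw_sample(rng, n=25)
    # Observed value of the sample mean for this particular sample.
    sample_means.append(statistics.mean(sample))
    # Observed value of the sample variance (statistics.variance divides by n - 1).
    sample_variances.append(statistics.variance(sample))

for xbar, s2 in zip(sample_means, sample_variances):
    print(f"sample mean = {xbar:.2f}, sample variance = {s2:.2f}")
```

Each pass through the loop plays the role of collecting a new sample, so the five printed means differ even though the population never changes.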
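We also say that X1, X2, …, Xn is a sample from the population X = height of an American. We assume that our sampling was done with replacement. Such a sample has the following properties: the random variables X1, X2, …, Xn are independent, and each Xi has the same distribution as the population X.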
The type of sampling where we do not place back the item selected before we select the next one is called sampling without replacement. Although many textbooks have a lengthy discussion of this concept, we will not emphasize it. All our samples are drawn with replacement and have the above properties.
Central Limit Theorem
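Suppose X1, X2, …, Xn is a sample from a population X with mean μ and variance σ².
Assume n is large. Then, approximately, the sample mean X̄ has a N(μ, σ/√n) distribution, and the sum X1 + X2 + … + Xn has a N(nμ, σ√n) distribution.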
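To see the theorem at work, one can simulate many samples from a population that is clearly not normal and check that the sample means behave like a normal variable with mean μ and standard deviation σ/√n. The sketch below is only an illustration under assumed settings: an exponential population with μ = 1 and σ = 1, sample size n = 50, and 2000 repetitions. The function name simulate_sample_means is hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(rng, n, repetitions):
    """Return `repetitions` sample means, each from n exponential(1) draws."""
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

mu, sigma = 1.0, 1.0   # mean and standard deviation of the exponential(1) population
n = 50                 # sample size (assumed "large" for this sketch)

rng = random.Random(6)  # fixed seed so the run is repeatable
means = simulate_sample_means(rng, n, repetitions=2000)

observed_mean = statistics.fmean(means)
observed_sd = statistics.stdev(means)
predicted_sd = sigma / math.sqrt(n)

# Fraction of simulated means within one predicted standard deviation of mu;
# for a normal distribution this fraction is about 0.68.
within_one_sd = sum(abs(m - mu) <= predicted_sd for m in means) / len(means)

print(f"mean of sample means: {observed_mean:.3f} (CLT predicts {mu})")
print(f"sd of sample means:   {observed_sd:.3f} (CLT predicts {predicted_sd:.3f})")
print(f"within one sd:        {within_one_sd:.3f}")
```

The exponential population is strongly skewed, yet the simulated means cluster around μ with a spread close to σ/√n, which is what the theorem describes for large n.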
Suppose you are conducting a poll to determine the proportion p (or percentage) of people in favor of a certain presidential candidate. You interview a randomly selected sample of n voters. Then you let X be the number of people among these n voters who are in favor of the candidate. Then X/n is the proportion in this sample that are in favor of the candidate. We use this sample proportion X/n as an estimate for the proportion of the entire voter population that are in favor of the candidate. This is the number X/n that the pollsters report on TV every evening before the election.
Here p is the proportion of voters that are in favor of the candidate. So, X is a B(n, p) random variable. We have already seen (section 5.3 in Lesson 5) that, approximately, X follows a N(μ, σ) distribution, where μ = np and σ = √(np(1-p)). Dividing by n, it follows that the sample proportion X/n has, approximately, a
N(p, √(p(1-p)/n)) distribution.
In fact, the same can be derived from the central limit theorem. Let
Y = 1 if success, and Y = 0 if failure.
Here by "success" we mean that the voter is in favor of the candidate.
Then Y is a Bernoulli(p) random variable with mean p and variance p(1-p). The response of each voter in the sample can thus be represented as a random variable as follows:
Xi = 1 if the ith voter in the sample is a success, and Xi = 0 otherwise.
Then X1, X2, …, Xn is a sample from the Y-population, and the sample proportion
X/n = X̄ = (X1 + X2 + … + Xn)/n
is the sample mean. So, by the CLT, the sample proportion X̄ = X/n has, approximately, a N(p, √(p(1-p)/n)) distribution.
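The final formulas regarding the sample proportion X̄ = X/n are as follows: the mean of X/n is p, the standard deviation of X/n is √(p(1-p)/n), and, for large n, X/n is approximately N(p, √(p(1-p)/n)).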
Remark. The same reasoning applies whenever you are estimating a proportion of successes p. Examples include the proportion of defective items, the proportion of people in favor of capital punishment, and the proportion of immigrants.
Remark. The normal approximation of the sample proportion given above is not really different from the normal approximation of the binomial random variable (section 5.3). The only difference is the way we use them. In section 5.3, we used the continuity correction. For large n, the continuity correction is negligible and has almost no effect on the answer.
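The size of the continuity correction can be checked numerically. The sketch below compares the exact binomial probability with the normal approximation of the sample proportion, with and without the correction. It is an illustration only: the poll size n = 1000, the true proportion p = 0.5, the event "between 470 and 530 voters in favor", and the function names binomial_pmf and compare_approximations are all assumptions introduced here.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ B(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def compare_approximations(n, p, k_lo, k_hi):
    """Return (exact, normal, normal_with_correction) for P(k_lo <= X <= k_hi)."""
    exact = sum(binomial_pmf(k, n, p) for k in range(k_lo, k_hi + 1))
    # The sample proportion X/n is approximately N(p, sqrt(p(1-p)/n)).
    proportion = NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))
    normal = proportion.cdf(k_hi / n) - proportion.cdf(k_lo / n)
    # Continuity correction: widen the interval by half a count on each side.
    corrected = proportion.cdf((k_hi + 0.5) / n) - proportion.cdf((k_lo - 0.5) / n)
    return exact, normal, corrected

# Hypothetical poll: n = 1000 voters, true proportion p = 0.5.
# Event: the sample proportion lands between 0.47 and 0.53 (470 to 530 voters).
exact, normal, corrected = compare_approximations(n=1000, p=0.5, k_lo=470, k_hi=530)
print(f"exact binomial:          {exact:.4f}")
print(f"normal, no correction:   {normal:.4f}")
print(f"normal, with correction: {corrected:.4f}")
```

In this setting the two normal answers differ by well under 0.01, and both are close to the exact value, which is the sense in which the correction matters less as n grows.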
Problems on 6.1: Central Limit Theorem and Sampling Distribution of the Proportion
Problems on Central Limit Theorem:
Exercise 6.1.1. It is known that the tuition paid per semester by students in a university has a distribution with mean $2,050 and standard deviation $310. If 64 students are interviewed, what is the approximate probability that the sample mean tuition paid will be above $2,060?
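One way to set up this kind of calculation is sketched below in Python, using the normal approximation for the sample mean with standard deviation σ/√n. This is offered as a check on hand computation rather than as the official solution. The variable names are chosen here for illustration, and the sketch assumes the 64 students form a sample drawn as described in this lesson.

```python
import math
from statistics import NormalDist

mu = 2050.0      # population mean tuition (dollars)
sigma = 310.0    # population standard deviation (dollars)
n = 64           # number of students interviewed

# Standard deviation of the sample mean: sigma / sqrt(n) = 310 / 8.
standard_error = sigma / math.sqrt(n)
sample_mean = NormalDist(mu=mu, sigma=standard_error)

# P(sample mean > 2060) under the normal approximation.
prob_above = 1.0 - sample_mean.cdf(2060.0)

print(f"standard error = {standard_error:.2f}")
print(f"P(sample mean > 2060) is approximately {prob_above:.3f}")
```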
Exercise 6.1.3. According to some data, the annual Kansas wheat export X has mean 733 million dollars and standard deviation 163 million dollars. What is the probability that over the next 10 years Kansas wheat exports will exceed 8040 million dollars?
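This problem concerns a total rather than an average, so the relevant normal approximation is for the sum X1 + X2 + … + Xn, which has mean nμ and standard deviation σ√n. The Python sketch below is one possible setup, not the official solution. It assumes the ten yearly exports are independent draws from the same population and that the approximation is acceptable even though n = 10 is modest.

```python
import math
from statistics import NormalDist

mu = 733.0       # mean annual export (million dollars)
sigma = 163.0    # standard deviation of annual export (million dollars)
n = 10           # number of years

# Assumption: the n yearly exports are independent draws from the same population,
# so their total is approximately N(n * mu, sigma * sqrt(n)).
total = NormalDist(mu=n * mu, sigma=sigma * math.sqrt(n))
prob_exceeds = 1.0 - total.cdf(8040.0)   # P(total > 8040)

# Equivalent view: the total exceeds 8040 exactly when the yearly average exceeds 804.
average = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))
prob_exceeds_via_mean = 1.0 - average.cdf(8040.0 / n)

print(f"P(total > 8040) is approximately {prob_exceeds:.3f}")
```

Working with the total or with the yearly average gives the same probability, because dividing both sides of the inequality by n does not change the event.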
Problems on Population Proportion:
Exercise 6.1.4. According to a report entitled "Pediatric Nutrition Surveillance" published by the Centers for Disease Control (CDC), 18 percent of children younger than two had anemia in 1997. On a particular day in that year, a pediatrician examined 180 children.
Exercise 6.1.5. On one day during an impeachment hearing, it is claimed that 75 percent of eligible voters think the President should not be impeached. Suppose we interview 700 voters. Assuming the above, what is the probability that the sample proportion of voters who do not think the President should be impeached