MATH 105, Topics in Mathematics

Lesson 7: Distribution of the Sample Mean

Introduction

In this lesson, our final theorem will be that the sample mean

X̄ = (X₁ + X₂ + ... + X_n)/n

has an approximately normal distribution when the sample size n is large.

Given a set of data, the mean (or average) x̄ that we computed in the previous chapters is, in fact, the observed value of a random variable X̄, called the sample mean.

Similarly, the standard deviation s that we computed before is the observed value of a random variable S, called the sample standard deviation.

Each time you collect a sample, the computed sample mean x̄ is the value of the random variable X̄ for that sample.

Our point of view is explained in the following example.

Example 7.1.1. Suppose we want to study the height distribution of the U.S. population. We collect a sample of size n = 713 as follows:

71	62	67	73	61	58	63	58	69	68
55	57	51	57	49	63	63	64	72	59
67	59	57	69	55	56	65	66	53	53
51	66	68	71	61	63	53	76	77	67

And so on

Our point of view is that the heights

x₁, x₂, x₃, ... , x_n
(in our case 71, 62, 67, ... )
are, in fact, the observed values of random variables
X₁, X₂, X₃, ... , X_n,
respectively.

Here X₁ denotes the height of the first member of the sample, which could be the height of anybody from the whole U.S. population; in our sample the value of X₁ is 71. Similarly, X₂ denotes the height of the second member of the sample, which could again be the height of anybody from the whole U.S. population; in our sample the value of X₂ is 62.

Each time we collect a new sample
x₁, x₂, x₃, ... , x_n
the observed values of X₁, X₂, X₃, ... , X_n
will be different.

But whichever sample we collect, its members x₁, x₂, x₃, ... , x_n are observed values of the same set of random variables

X₁, X₂, X₃, ... , X_n.

Definition: We define the sample mean X̄ as the random variable

X̄ = (X₁ + X₂ + ... + X_n)/n.

Each time we collect a sample of size n, we get a value x̄ of X̄, namely the average of the sample x₁, x₂, x₃, ... , x_n.

Remark: The main point here is that when we collect a sample and compute the mean x̄ (or average), the value of x̄ that we get is probabilistic or "chancy." So we must talk about the probability distribution of X̄. If we know the distribution of X̄, then we can answer questions about the probability of the various values of x̄ that we may get.
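
To make the remark concrete, here is a minimal Python sketch that computes the observed sample mean from the heights printed in Example 7.1.1. Only the first 40 of the 713 values are shown in the example, so the list below is that visible part, not the full sample. The function name sample_mean and the split into two halves are illustrative choices, not part of the lesson.

```python
# First 40 of the 713 heights listed in Example 7.1.1 (the rest are not shown).
heights = [
    71, 62, 67, 73, 61, 58, 63, 58, 69, 68,
    55, 57, 51, 57, 49, 63, 63, 64, 72, 59,
    67, 59, 57, 69, 55, 56, 65, 66, 53, 53,
    51, 66, 68, 71, 61, 63, 53, 76, 77, 67,
]


def sample_mean(values):
    """Observed value of the sample mean: (x1 + x2 + ... + xn) / n."""
    return sum(values) / len(values)


# Treat the 40 values, and each half of them, as three different samples.
x_bar_all = sample_mean(heights)
x_bar_first = sample_mean(heights[:20])
x_bar_last = sample_mean(heights[20:])

print(x_bar_all, x_bar_first, x_bar_last)
```

The three averages come out different, which is the "chancy" behavior described in the remark: each sample produces its own observed value of the same random variable.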

We could make similar comments and definitions about the standard deviation. But we may not need them.

If X denotes the height of a randomly chosen American, then we also say that

X₁, X₂, X₃, ... , X_n

is a sample from the X-population. We used the example of height distribution of the U.S. population to explain our point of view. But given any random variable X (like weight, wages, binomial), we can talk about a sample

X₁, X₂, X₃, ... , X_n

from the X-population.

Properties: Suppose X is a random variable and let

X₁, X₂, X₃, ... , X_n

be a sample from the X-population. Then we have the following properties.

1. Suppose X has mean μ and standard deviation σ. Then X is called the parent (or population) random variable, and μ and σ are called the population mean and population standard deviation.
2. Each sample member X_i has the same distribution as X. In particular, the mean of X_i is μ and the standard deviation of X_i is σ.
3. The sample members X₁, X₂, X₃, ... , X_n are all independent.
4. The distribution of the sample mean X̄ is called the sampling distribution of X̄.

Theorem: The mean of the sample mean X̄ is equal to the population mean μ. So,

mean(X̄) = mean(X) = μ.

The standard deviation of the sample mean X̄ is given by

σ_X̄ = σ/√n.
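
The lesson states this theorem without proof. One standard way to see why it holds is sketched below in LaTeX. It uses only the properties listed above: each X_i has mean μ and standard deviation σ, and the X_i are independent. The symbols E (mean) and Var (variance, the square of the standard deviation) are notation assumed here and are not used elsewhere in the lesson.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right)
            = \frac{1}{n}\bigl(E(X_1) + \cdots + E(X_n)\bigr)
            = \frac{n\mu}{n} = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\bigl(\operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n)\bigr)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}, \\
\sigma_{\bar{X}} &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

The second line is where independence is needed: the variance of a sum equals the sum of the variances only when the terms are independent (or at least uncorrelated).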

The Central Limit Theorem: Suppose

X₁, X₂, X₃, ... , X_n

is a sample from a population X with mean μ and standard deviation σ.

Then, when n is large, the sample mean X̄ is approximately normally distributed with mean μ and standard deviation σ_X̄ = σ/√n. That is, approximately,

P(a < X̄ < b) = P((a - μ)/σ_X̄ < Z < (b - μ)/σ_X̄),

where Z is a standard normal random variable.
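
The theorem can be checked informally by simulation. The sketch below is an illustration, not part of the lesson: it assumes a clearly non-normal population (exponential with mean 1 and standard deviation 1), draws many samples of size n = 64, and compares the observed mean and standard deviation of the sample means with μ and σ/√n. The function name, the choice of population, the number of repetitions, and the seed are all arbitrary assumptions.

```python
import random
import statistics


def simulate_sample_means(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n and return their sample means.

    Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(sum(sample) / n)
    return means


n = 64
means = simulate_sample_means(n, repetitions=5000)

observed_mean = statistics.fmean(means)   # should be close to mu = 1
observed_sd = statistics.stdev(means)     # should be close to sigma / sqrt(n)
predicted_sd = 1.0 / n ** 0.5             # sigma / sqrt(n) with sigma = 1

# Fraction of sample means within one predicted standard deviation of mu;
# for a normal distribution this fraction is about 0.68.
within_one_sd = sum(abs(m - 1.0) < predicted_sd for m in means) / len(means)

print(observed_mean, observed_sd, predicted_sd, within_one_sd)
```

Even though individual exponential values are strongly skewed, the sample means cluster around μ with spread close to σ/√n, as the theorem predicts.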

Problems on Sampling

Exercise 7.1.1. It is known that the tuition X paid per semester by students in a university has a distribution with mean μ = $2,050 and standard deviation σ = $310. If 64 students are interviewed, what is the approximate probability that the sample mean tuition X̄ will be above $2,060?

Solution: Here we are asked to compute P(2,060 < X̄).

The mean of X̄ = μ = 2,050 and
standard deviation of X̄ = σ_X̄ = σ/√n = 310/√64 = 310/8 = 38.75.

So
P(2,060 < X̄) =
P((2,060 - μ)/σ_X̄ < (X̄ - μ)/σ_X̄) =
P((2,060 - 2,050)/38.75 < Z) =
P(0.26 < Z) =
1 - P(Z ≤ 0.26) = 1 - 0.6026 = 0.3974.


Exercise 7.1.2. Monthly water consumption X per household, in a subdivision in Kansas City, has a normal distribution with mean 15,000 gallons and standard deviation 3,000 gallons. What is the probability that the mean consumption X̄ of the 44 households in the subdivision will exceed 16,000 gallons?

Solution: Here we are asked to compute P(16,000 < X̄). Sample size n = 44.

The mean of X̄ = μ = 15,000 and standard deviation of X̄ = σ_X̄ = σ/√n
= 3,000/√44 = 452.267.

So, P(16,000 < X̄) =
P((16,000 - μ)/σ_X̄ < (X̄ - μ)/σ_X̄) =
P((16,000 - 15,000)/452.267 < Z) =
P(2.21 < Z) =
1 - P(Z ≤ 2.21) = 1 - 0.9864 = 0.0136.
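
The two solutions above follow the same steps: compute σ/√n, standardize, and look up the normal table. As a rough cross-check, those steps can be sketched in Python using NormalDist from the standard library in place of the table. The function name prob_sample_mean_exceeds is a hypothetical helper introduced for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_exceeds(c, mu, sigma, n):
    """Approximate P(sample mean > c) using the normal approximation."""
    se = sigma / sqrt(n)           # standard deviation of the sample mean
    z = (c - mu) / se              # standardized cutoff
    return 1.0 - NormalDist().cdf(z)


# Exercise 7.1.1: tuition, mu = 2050, sigma = 310, n = 64
p_tuition = prob_sample_mean_exceeds(2060, mu=2050, sigma=310, n=64)

# Exercise 7.1.2: water consumption, mu = 15000, sigma = 3000, n = 44
p_water = prob_sample_mean_exceeds(16000, mu=15000, sigma=3000, n=44)

print(round(p_tuition, 4), round(p_water, 4))
```

The results agree with the table-based answers to about two or three decimal places. Small differences are expected because the code does not round z to two decimals before evaluating the normal distribution.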


Exercise 7.1.3. According to some data, the annual Kansas wheat export X has a mean of 733 million dollars and a standard deviation of 163 million dollars. What is the probability that over the next ten years the total Kansas wheat export will exceed 8,040 million dollars?

Exercise 7.1.4. The lead content in blood, in a county, has mean μ = 2 (microgram/deciliter) and standard deviation σ = 0.02. If this claim is valid, then what is the approximate probability that n = 400 blood samples will have a sample mean lead content of more than 2.002?

Exercise 7.1.5. The annual salary in a local industry has mean μ = $67,000 and standard deviation σ = $16,000. You collect a sample of 256 employees. What is the probability that the sample mean salary will exceed $67,500?
