In this chapter and the next we will consider inferential statistics.
Statistical inference is very useful in real life. Thanks to the availability
of advanced calculators (TI-84), it has become accessible at this level.
The main purpose of statistics as a subject is to develop methods that use
sample statistics T to estimate (or make other kinds of inferences regarding)
the population parameters θ
(e.g. use the sample mean X̄ to estimate the population mean μ,
use the sample standard deviation S to estimate the population
standard deviation σ, use the sample proportion of success
X̄ to estimate the population proportion p).
[Studying probability or mathematics for its own sake was not the goal.
It was needed to do estimation and inferential statistics in a
meaningful way.]
In this lesson,
we will consider two methods to estimate the population mean
μ by the sample mean X.
We will also use the sample proportion of success X
to estimate the population proportion p of an attribute.
Two methods of estimation are considered.
The simpler method is point estimation.
In point estimation, a number t is given
as an estimate for the parameter θ.
Most people are used to the concept of point estimation.
For example, to estimate the mean annual income μ
of the US population, one is used to the idea of
taking a sample of a certain size,
computing the sample mean annual income x̄,
and calling it an estimate for μ.
The second method is called interval estimation.
In interval estimation, an interval (L, U) is given as a range within which the parameter θ is estimated to lie.
Most of the interval estimates we consider consist of
(1) a point estimate t for θ and (2) an estimate e for the
error (precision) of this estimation. Correspondingly, θ is estimated to be within
the interval (t-e, t+e).
For example, when estimating the mean annual income μ of the US
population, a statistician may take a sample,
compute the sample mean x̄
and say that the population mean μ
is estimated to be within the interval (x̄-1000, x̄+1000).
Obviously, a smaller error (i.e. higher precision)
is always more desirable.
Statistical estimation always comes with
information about how often the actual error ε = |θ-T| would
remain within a specified error limit e. This is given in terms of the probability
P(ε ≤ e). This probability P(ε ≤ e) will be called the level of confidence in the later sections.
Ideally, we would like to estimate θ by T with high precision (i.e. low error) and with a high level of confidence (i.e.
with a high probability that the estimate will be within a limit of error).
7.1 Point and Interval Estimation
In statistical point estimation, a
statistic T is used to estimate a parameter θ. Corresponding to a sample, the value t of T would be called
a point estimate of θ. The statistic T would be called an estimator of θ.
For example, the sample mean X̄ would be an estimator of the
population mean μ and the computed value x̄, corresponding to a sample,
would be a point estimate of μ.
Similarly, the sample variance S² is an estimator of the population variance σ²
and the computed value s² of S², corresponding to a sample,
is a point estimate of σ².
It is a common instinct to accept that the sample mean
X̄ is a natural estimator for the population mean μ and the sample variance S²
is a natural estimator for the population variance σ². There are
some established mathematical criteria
to justify why or when an estimator T would be
a good candidate as an estimator for a population parameter θ.
We will not get into an in-depth discussion of this.
Interval Estimation
A point estimate t of a parameter θ
would almost never be exactly equal to the actual value of θ.
This is why it would be
more reasonable to provide an interval (L,U) within which the parameter
θ is estimated to lie.
Here L, U would be statistics. The values L = l, U = u
would depend on the sample.
While θ would be estimated to be within (l, u), for some samples
the true value of θ would be left outside the interval (l,u).
We would be happy as long as the true value of θ falls within the interval
(l,u) most often (or often enough), allowing the possibility of being
"wrong" a few times.
The next question would be,
how often is often enough?
The probability P(L ≤ θ ≤ U) is the long-run proportion of samples for which
θ falls within the computed interval (l,u). The interval estimation of θ,
formally defined below, provides an interval (L,U)
together with the probability P(L ≤ θ ≤ U).
Definition. Let θ be a population parameter. An interval estimate
for θ provides the following:
It gives an interval (L,U) as an estimate for θ.
Here L, U are statistics, or L=-∞, or U=∞.
It also gives the probability P(L ≤ θ ≤ U). This number
P(L ≤ θ ≤ U) is called the level of confidence. It is often written as
P(L ≤ θ ≤ U) = 1 - α.
The interval (L,U) is said to be a (1-α)100
percent confidence interval of θ.
In practice, α will be a small
number, like 0.1, 0.05 or 0.01. That means
90, 95 or 99 percent confidence intervals of θ are commonly considered.
Three different types of confidence intervals are commonly considered.
When both L and U are statistics, the confidence interval (L,U) would
be called a two sided confidence interval of θ.
When L=-∞ and U is a statistic, the confidence interval (L,U)
= (-∞, U) would be called a
one sided left confidence interval of θ.
When L is a statistic and U=∞,
the confidence interval (L,U) = (L, ∞) would be called a
one sided right confidence interval of θ.
In this course, we only consider two sided confidence
intervals. Two sided confidence intervals are analogous to giving
a point estimate with an estimate of error.
Procedure to Construct Confidence Interval
In this subsection, we will construct a two sided confidence
interval of the population mean μ.
As you can see, in Lessons 3-6 we (mostly) computed probabilities, while
the values of the parameters like the mean μ were given.
The situation reverses now.
Now the value of a parameter like the mean μ
has to be estimated and the probability is given (or specified)
in terms of the level of confidence.
That is why some standard inverse probability (or cut-off)
numbers are necessary. In this section, the following definition
of some cut-off numbers for the standard normal random variable Z will be needed.
Definition:
As usual, Z denotes the standard normal random variable.
Given a number 0 < α < 1, the number z_α
is defined by the formula
P(z_α ≤ Z) = α,
which is the same as
P(Z ≤ z_α) = 1 - α.
In fact, the invNormal function of the TI-84 (labelled invNorm on the calculator) computes z_α as follows:
z_α = invNormal(1-α).
As mentioned above, for us α will be a small number such as .1, .05, .01 and so
on.
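For readers without a TI-84, the same cut-off numbers can be obtained in software. The following is only a minimal sketch in Python, using the standard library class statistics.NormalDist; the helper name z_alpha is our own choice and is not part of these notes or of any calculator.

```python
from statistics import NormalDist

def z_alpha(alpha):
    """Return z_alpha, the cut-off with P(Z >= z_alpha) = alpha."""
    # inv_cdf(p) returns the value z with P(Z <= z) = p, so we pass 1 - alpha,
    # exactly as in z_alpha = invNormal(1 - alpha).
    return NormalDist().inv_cdf(1 - alpha)

# The small values of alpha used in this lesson.
for alpha in (0.1, 0.05, 0.01):
    print(alpha, round(z_alpha(alpha), 4))
```

For a two sided interval the value needed is z_alpha(alpha / 2); for example, a 99 percent level uses z_alpha(0.005).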
The following animation demonstrates these numbers visually.
Animation 7.1.1
The procedure followed: A confidence interval for μ when σ is known
To explain the procedure to develop a confidence interval (of any parameter θ),
we will construct a (1-α)100 percent confidence interval
for the mean μ when the standard deviation σ is known.
Suppose X is a random variable with mean μ
and known standard deviation σ.
Let X1, X2, …, Xn be a sample from X and let X̄ be the sample mean. By CLT, approximately,
Z = (X̄-μ)√n/σ
has a standard normal distribution.
Therefore, approximately,
P(-z_{α/2} ≤ (X̄-μ)√n/σ ≤ z_{α/2}) = 1 - α.
If we multiply through by σ/√n and rearrange to isolate μ, we get
P(X̄-E ≤ μ ≤ X̄+E) = 1 - α
where E = z_{α/2} σ/√n.
According to the definition above, this equation is precisely
what is needed to define a
(1-α)100 percent confidence interval for μ.
We state the same as a theorem as follows.
Theorem.
We use the notations as above.
Assume that σ is known. Then an (approximate)
(1-α)100 percent confidence interval for μ is given by
X̄-E ≤ μ ≤ X̄+E
where
E = z_{α/2} σ/√n.
If X is normal, then it is an exact
confidence interval for μ (not just an approximation).
The left side of the interval is called the Left End Point (LEP)
or the lower limit and
the right side of the interval is called the Right End Point (REP)
or the upper limit. Informally, this confidence interval is also called a
Z-interval.
As was defined above, this is a two-sided confidence interval of μ.
A (1-α)100 percent confidence interval is
interpreted as follows:
if one computes (1-α)100
percent confidence intervals on a regular basis, the true value of
μ will be within the confidence interval
(1-α)100 percent of the time and outside the interval
α100 percent of the time.
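This long-run interpretation can be checked by simulation. The sketch below is only an illustration: the population values (μ = 50, σ = 10), the sample size, the number of trials and the function name coverage are all assumptions made for the example. It repeatedly draws normal samples, builds the Z-interval each time, and reports the fraction of intervals that contain μ.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated Z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)  # cut-off z_{alpha/2}
    e = z * sigma / n ** 0.5                 # margin of error E
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - e <= mu <= xbar + e:
            hits += 1
    return hits / trials

# Hypothetical population: mean 50, standard deviation 10, samples of size 25.
print(coverage(mu=50, sigma=10, n=25, alpha=0.05, trials=10000))
```

With α = 0.05 the printed fraction should come out close to 0.95, though not exactly, since it is itself an estimate from finitely many trials.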
Definitions and Formulas:
The length l
of this (1-α)100 percent confidence
interval for μ is given by
l = 2 z_{α/2} σ/√n.
The margin of error (MOE)
E is defined as
E = z_{α/2} σ/√n.
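The formulas for E, LEP and REP translate directly into a few lines of code. The following is a minimal sketch in Python, not the TI-84 procedure; the function name z_interval and its argument names are our own. It is tried on the numbers of Exercise 7.1.1 below (σ = 15, n = 25, sample mean 81, 99 percent).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Return (LEP, REP, MOE) of the Z-interval for the mean, sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    moe = z * sigma / sqrt(n)                # E = z_{alpha/2} * sigma / sqrt(n)
    return xbar - moe, xbar + moe, moe

lep, rep, moe = z_interval(xbar=81, sigma=15, n=25, confidence=0.99)
print(round(lep, 4), round(rep, 4), round(moe, 4))
```

Because the code keeps the unrounded value of z_{α/2} instead of 2.5758, its answers can differ from the Long Hand Method in the last decimal place.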
The Required Sample Size
(Discussion Only)
If we increase the level of confidence,
the margin of error (MOE) E increases; conversely, a smaller MOE comes with a lower level of confidence.
Both can be controlled by increasing the sample size n.
The required sample size n can be determined by solving the formula
for the MOE above for n. Given a specified MOE E and a level of confidence
(1-α), the sample size n needed is given by
n = (z_{α/2} σ/E)².
Always round n upward.
Note that the
above formula for the required sample size is
independent
of the mean μ, but it
depends on the standard deviation σ.
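As a small illustration of the sample size formula, here is a sketch in Python. The function name required_sample_size and the numbers used (σ = 15, a desired MOE of 5, 99 percent confidence) are assumptions chosen for the example, not values from an exercise.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, moe, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= moe."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = (z * sigma / E)^2, always rounded upward.
    return ceil((z * sigma / moe) ** 2)

print(required_sample_size(sigma=15, moe=5, confidence=0.99))  # prints 60
```

Here (2.5758*15/5)² is about 59.7, which is rounded up to 60. Note that the sample mean does not enter the computation at all.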
The following animation demonstrates that there is
a sample size n that guarantees both the specified precision and the level of confidence
(1-α):
Animation 7.1.2
Problem Solving: In this section, we will compute
confidence intervals for μ. The TI-84 has a method that essentially
computes the Z-interval. We will compute Z-intervals using the formulas (by a
"Long Hand Method"),
with the help of the invNormal function of the TI-84.
The method will be clear from the solutions of the problems given below.
Problems on 7.1: On Z-intervals for the mean
Exercise 7.1.1.
Suppose X is a normal
population with mean μ and standard
deviation σ = 15.
A sample of size 25 was collected and the sample mean X
was found to be 81. Compute a 99 percent confidence interval for
μ and give the MOE.
Solution by Long Hand Method:
Here the population standard deviation σ = 15,
the sample size n = 25,
the sample mean X = 81,
Level of confidence = 99 percent. So
1 - α = .99, α = .01 and α/2 =.005.
Therefore, zα/2 = z.005 = invNormal(.995)= 2.5758
MOE = E=zα/2σ/√n
=2.5758*15/√25= 7.7274
LEP = X - E = 81 - 7.7274 = 73.2726
REP = X + E = 81 + 7.7274 = 88.7274
Exercise 7.1.2A. The weight distribution X
of a group of students is normal
with standard
deviation σ = 9.8.
To estimate the mean weight μ, a
sample of size 14 was collected and
the sample mean X
was found to be 151.1.
Find a 99 percent confidence interval for μ and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 9.8,
the sample size n = 14,
the sample mean X = 151.1,
Level of confidence = 99 percent. So
1 - α = .99, α = .01 and α/2 =.005.
Therefore, zα/2 = z.005 = invNormal(.995)= 2.5758
MOE = E=zα/2σ/√n
=2.5758*9.8/√14= 6.7464
LEP = X - E = 151.1 - 6.7464 = 144.3536
REP = X + E = 151.1 + 6.7464 = 157.8464
Exercise 7.1.2B.
(Change the Confidence level.)
As in Exercise 7.1.2A,
let X be the weight distribution of a group of students.
The standard
deviation σ = 9.8.
To estimate the mean weight μ, a
sample of size 14 was collected and
the sample mean X
was found to be 151.1.
Find a 90 percent confidence interval for μ and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 9.8,
the sample size n = 14,
the sample mean X = 151.1,
Level of confidence = 90 percent. So
1 - α = .90, α = .1 and α/2 =.05.
Therefore, zα/2 = z.05 = invNormal(.95)= 1.6449
MOE = E=zα/2σ/√n
=1.6449*9.8/√14= 4.3083
LEP = X - E = 151.1 - 4.3083 = 146.7917
REP = X + E = 151.1 + 4.3083 = 155.4083
Note that the length and/or MOE has decreased,
due to the decline in confidence level.
Exercise 7.1.2C.
(Change the
sample size.) As in Exercise 7.1.2A,
let X be the weight distribution of a group of students.
The standard
deviation σ = 9.8.
To estimate the mean weight μ, a
sample of size 78 was collected and
the sample mean X
was found to be 151.1.
Find a 90 percent confidence interval for μ
and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 9.8,
the sample size n = 78,
the sample mean X = 151.1,
Level of confidence = 90 percent. So
1 - α = .90, α = .1 and α/2 =.05.
Therefore, zα/2 = z.05 = invNormal(.95)= 1.6449
MOE = E=zα/2σ/√n
=1.6449*9.8/√78= 1.8252
LEP = X - E = 151.1 - 1.8252 = 149.2748
REP = X + E = 151.1 + 1.8252 = 152.9252
Note that the length and/or MOE has decreased,
due to the increase in sample size.
Exercise 7.1.3. The time taken by an athlete
to run an event is normally distributed with mean μ
and known standard deviation σ = 3.5
seconds. To estimate the mean μ, he
ran 16 times and the sample mean was found to be X
= 33 seconds.
Find a 95 percent confidence interval for true mean time μ
and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 3.5,
the sample size n = 16,
the sample mean X = 33,
Level of confidence = 95 percent. So
1 - α = .95, α = .05 and α/2 =.025.
Therefore, zα/2 = z.025 = invNormal(1-.025)= 1.9600
MOE = E=zα/2σ/√n
=1.9600*3.5/√16= 1.715
LEP = X - E = 33 - 1.715 = 31.285
REP = X + E = 33 + 1.715 = 34.715
Exercise 7.1.4.
Let X represent the monthly consumption of electricity (in KWH,
in a winter month)
by the households in a county. From past experience, it is known that the standard
deviation is
σ = 150 KWH.
To estimate the mean monthly consumption μ, a
sample of size 115 was collected and
the sample mean X
of the monthly consumption
was found to be 874.
Find a 98 percent confidence interval for μ
and give the margin of error (MOE).
Solution by Long Hand Method: Here the population standard deviation σ = 150, the sample size n = 115,
the sample mean X = 874,
Level of confidence = 98 percent. So 1 - α = .98, α = .02 and α/2 =.01.
Therefore, zα/2 = z.01 = invNormal(1-.01)= 2.3263
MOE = E=zα/2σ/√n
=2.3263*150/√115= 32.5393
LEP = X - E = 874 - 32.5393 = 841.4607
REP = X + E = 874 + 32.5393 = 906.5393
Exercise 7.1.6.
The monthly gas consumption by an individual in a city is normally distributed with mean μ
and known standard deviation
σ = 7.5 gallons.
To estimate the mean μ,
a sample of
33 individuals was collected and their mean consumption
X was found to be 56 gallons.
Find a 96 percent confidence interval for true mean
consumption μ
and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 7.5, the sample size n = 33,
the sample mean X = 56,
Level of confidence = 96 percent.
So 1 - α = .96, α = .04 and
α/2 =.02.
Therefore, zα/2 = z.02 =
invNormal(1-.02)= 2.0537
MOE = E=zα/2σ/√n
=2.0537*7.5/√33= 2.6813
LEP = X - E = 56 - 2.6813 = 53.3187
REP = X + E = 56 + 2.6813 = 58.6813
Exercise 7.1.7.
The monthly cell phone minutes used by an individual
in a city has a mean
μ
and known standard deviation
σ = 350 minutes.
To estimate the mean μ,
a sample of
780 users was collected. The sample mean
X was found to be 2222.
Find a 97 percent confidence interval for true mean
cellphone minutes μ
and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 350, the sample size n =
780,
the sample mean X = 2222 minutes,
Level of confidence = 97 percent. So 1 - α = .97, α = .03
and α/2 =.015.
Therefore, zα/2 = z.015 =
invNormal(1-.015)= 2.1701
MOE = E=zα/2σ/√n
=2.1701*350/√780= 27.1957
LEP = X - E = 2222 - 27.1957 = 2194.8043
REP = X + E = 2222 + 27.1957 = 2249.1957
Exercise 7.1.8.
The diameter of pumpkins in a farm has a mean
μ
and known standard deviation
σ = 13 inches.
To estimate the mean μ,
a sample of
460 pumpkins was collected. The sample mean
X was found to be 29 inches.
Find an 80 percent confidence interval for true mean
diameter μ
and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 13, the sample size
n = 460,
the sample mean X = 29
Level of confidence = 80 percent. So 1 - α = .80, α = .20
and α/2 =.10
Therefore, zα/2 = z.10 =
invNormal(1-.1)= 1.2816
MOE = E=zα/2σ/√n
=1.2816*13/√460= .7768
LEP = X - E = 29 - .7768 = 28.2232
REP = X + E = 29 + .7768 = 29.7768
Exercise 7.1.9.
The amount of time a student spends doing homework for a
three credit math course has a mean
μ
and known standard deviation
σ = 44 minutes.
To estimate the mean μ,
a sample of
the homework times of 178 students was collected. The sample mean
X was found to be 145 minutes.
Find a 92 percent confidence interval for true mean
time μ
and give the margin of error (MOE).
Solution by Long Hand Method:
Here the population standard deviation σ = 44,
the sample size n = 178,
the sample mean X = 145 minutes,
Level of confidence = 92 percent. So
1 - α = .92, α = .08
and α/2 =.04
Therefore, zα/2 = z.04 =
invNormal(1-.04)= 1.7507
MOE = E=zα/2σ/√n
=1.7507*44/√178= 5.7737
LEP = X - E = 145 - 5.7737 = 139.2263
REP = X + E = 145 + 5.7737 = 150.7737
7.2 Confidence interval for mean μ
when σ is Unknown
As above, let X be a random variable with mean μ and standard
deviation σ. As in the last section, we want to estimate the mean
μ by confidence intervals. In the last section,
we assumed that the standard deviation σ was known,
which would not be the case for populations with which we have little familiarity or
experience. In this section, we will deal with such populations,
where the standard deviation σ is not known.
In the last section, the main tool (or fact) that we used was that,
by CLT, the sampling distribution of the statistic
Z = (X̄-μ)√n/σ
is approximately standard normal N(0,1).
In this section, we use the distribution of a similar statistic
T = (X̄-μ)√n/S.
We proceed to determine the sampling
distribution of T.
Student's t distribution
Given a positive integer m, there
is a random variable, denoted by t_m,
that is said to have the t distribution
with m degrees of freedom. The distribution is also known as
Student's t distribution, because it was discovered by a
chemist, W. S. Gosset, who published under the pseudonym "Student".
The important properties of t distributions
are listed below. A t random variable with m degrees of freedom would be
denoted by T.
We also say that the random variable T has a
t_m-distribution.
Animation 7.2.1
A t_m random variable T is a continuous random variable. The graph of the pdf of
a t_m random variable T is shown in Animation 7.2.1.
The graph looks very similar to the graph
of the pdf of the standard normal random variable Z.
Like the normal, the graph also has a bell shape.
For an ordinary reader, it would not be easy to distinguish the
graph from that of the standard normal random variable.
The t_m random variable T has
mean zero, if m is at least 2.
The standard deviation σ of
a t_m random variable (i.e. with degrees of freedom m)
is given by σ = √(m/(m-2)),
if m is at least 3.
The distribution of T is symmetric around the y-axis
(or around the mean zero).
As stated above, for each positive integer m,
there is one
Student's t random variable t_m, with degrees of freedom m.
When the degrees of freedom m is small, the graph is flatter, with heavier tails.
For higher degrees of freedom m,
the graph is steeper (more peaked).
When m is large (say 30 or above), the pdf of a
t_m random variable can be
approximated by the pdf of the
standard normal random variable Z.
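These properties can be seen numerically. The sketch below, in Python, uses the standard formula for the pdf of the t distribution (the notes do not state it; it is brought in here only for illustration, and the function name t_pdf is our own). It compares the height of the pdf at the center (x = 0) and in the tail (x = 3) for a few degrees of freedom with the standard normal, and evaluates the standard deviation formula.

```python
from math import exp, lgamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, m):
    """Density of Student's t distribution with m degrees of freedom."""
    c = exp(lgamma((m + 1) / 2) - lgamma(m / 2)) / sqrt(m * pi)
    return c * (1 + x * x / m) ** (-(m + 1) / 2)

normal = NormalDist()

# Height at the center (x = 0) and in the tail (x = 3).
for m in (2, 5, 30):
    print("m =", m, round(t_pdf(0, m), 4), round(t_pdf(3, m), 4))
print("Z    ", round(normal.pdf(0), 4), round(normal.pdf(3), 4))

# Standard deviation sqrt(m / (m - 2)), valid when m is at least 3.
for m in (3, 10, 30):
    print("m =", m, "sd =", round(sqrt(m / (m - 2)), 4))
```

For small m the center is lower and the tail is higher than for Z, and both the pdf values and the standard deviation move toward those of Z (standard deviation 1) as m grows.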
Inverse Probability for Student's t
Inverse probability for Student's t random variables is defined as was
done for normal random variables. In particular, like z_α,
we define t_{m, α} as follows.
Definition:
Suppose T is a t-random variable with degrees of freedom m.
Given a number 0 < α < 1, the number t_{m, α}
is defined by the formula
P(t_{m, α} ≤ T) = α,
which is the same as
P(T ≤ t_{m, α}) = 1 - α.
The invT function of TI-84 (Silver Edition) computes t_{m, α} as follows:
t_{m, α} = invT(1-α, m).
The following animation demonstrates these numbers visually.
Animation 7.2.2
Sampling Distribution of T
The following theorem describes the sampling distribution of
the statistic T mentioned
at the beginning of this section. This can be viewed as
a counterpart of the CLT for T.
Theorem. Let X be a
normal random variable with mean
μ and standard deviation σ.
Let X1, X2, …, Xn be a sample
of size n from the X population. Then
T = (X̄-μ)√n/S
has a t-distribution with n-1 degrees of freedom.
Therefore,
P(-t_{n-1, α/2} ≤ (X̄-μ)√n/S ≤ t_{n-1, α/2}) = 1 - α.
So,
P(X̄-E ≤ μ ≤ X̄+E) = 1 - α
where E = t_{n-1, α/2} S/√n.
Confidence Interval for μ when σ is unknown
As the Z-interval was constructed for the mean μ in Section 7.1,
the above computations lead to a confidence interval for
the mean μ, which would be useful
when the standard deviation
σ is not known. In this case, we have to assume that the population
random variable X is normal.
Theorem.
We use the notations as above. In particular
X is assumed to be normal.
Then a (1-α)100
percent confidence interval for μ
is given by
X̄-E ≤ μ ≤ X̄+E
where
E = t_{n-1, α/2} S/√n.
Informally, this confidence interval is also called a
T-interval. As in the case of Z-intervals, we use the other terminology like LEP, REP,
length of the interval and Margin of Error (MOE).
(We will not work with determination of the
required sample size.)
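The T-interval can also be computed without a calculator. The Python standard library has no inverse t function, so the sketch below builds one from scratch: it uses the standard formula for the t pdf, integrates it numerically with Simpson's rule, and inverts by bisection. All of the function names (t_pdf, t_cdf, t_quantile, t_interval) are our own, and this is an illustrative substitute for invT, not how the TI-84 computes it. It is tried on the numbers of Exercise 7.2.4 below (n = 9, sample mean 33, s = 3.5, 95 percent).

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, m):
    """Density of Student's t distribution with m degrees of freedom."""
    c = exp(lgamma((m + 1) / 2) - lgamma(m / 2)) / sqrt(m * pi)
    return c * (1 + x * x / m) ** (-(m + 1) / 2)

def t_cdf(x, m, steps=1000):
    """P(T <= x), using symmetry about 0 and Simpson's rule on [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, m):
    """Value t with P(T <= t) = p, for 0 < p < 1; the role of invT(p, m)."""
    if p < 0.5:
        return -t_quantile(1 - p, m)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, m) < p:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(60):      # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, m) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_interval(xbar, s, n, confidence):
    """Return (LEP, REP, MOE) of the T-interval for the mean, sigma unknown."""
    alpha = 1 - confidence
    t = t_quantile(1 - alpha / 2, n - 1)  # t_{n-1, alpha/2}
    moe = t * s / sqrt(n)                 # E = t_{n-1, alpha/2} * s / sqrt(n)
    return xbar - moe, xbar + moe, moe

lep, rep, moe = t_interval(xbar=33, s=3.5, n=9, confidence=0.95)
print(round(lep, 4), round(rep, 4), round(moe, 4))
```

The margin of error agrees with the value 2.6903 used in the solution of Exercise 7.2.4, up to rounding.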
Remarks:
A common question is: to estimate μ,
when do we use the Z-interval and when do we use the T-interval?
Use the T-interval
only when σ is not known.
Not using σ when it is known would amount to disregarding
information,
which would not be a good philosophy in estimation.
Recall that, for large n,
t_{n-1} is
approximately equal to the standard normal Z.
That is why,
for large sample size n,
using z_{α/2} instead of t_{n-1, α/2} in the formula for E would not make any significant difference.
Optional Problems on 7.2: Probability of the T statistic
In this course, we avoid probability problems for T.
The tcdf function of TI-84 (Silver Edition) makes it easy to compute probabilities for T. No such problems will be assigned,
but we work one example.
Example:
The salary X of the engineers in a state has mean μ = $65,000 and standard deviation σ. You collect a sample of 12 engineers. What is the probability that the T statistic of the sample
will be above 1.1?
Solution by TI-84 Silver Edition:
Here T has degrees of freedom df = n-1 = 12-1 = 11.
P(T > 1.1) ≈ P(1.1 < T < 10) = tcdf(1.1, 10, 11) = 0.1474
Problem Solving: In this section, we will compute
confidence intervals (T-intervals) for μ, when σ is unknown.
As for Z-intervals, the
TI-84 has a method that essentially computes the T-interval.
We will use the formulas (by a
"Long Hand Method"),
with the help of the invT function of the TI-84.
The method will be clear from the solutions of the problems given below.
There will be two types of problems: (1) the values of the statistics
(x and s) will be given, or
(2) raw data will be given
instead. In the latter case,
we have to compute x and s before we can use the formulas.
Problems on 7.2: T-intervals for mean: σ Unknown
Exercise 7.2.1.
The mean weight μ of a campus population
is to be estimated. It would be
reasonable to assume that weight X has a
normal distribution.
A sample of size
n = 18 was collected.
The sample mean was found to be
x = 170.5 and the
standard deviation was
s = 13.3. Compute a 99 percent
confidence interval for μ.
State the degrees of freedom df, compute the
margin of error, LEP and REP.
Solution by Long Hand Method:
Since the population standard deviation σ is unknown,
we will compute a T-interval.
Here the sample size n = 18,
the sample mean X = 170.5,
Sample standard deviation s = 13.3
Degrees of freedom df = n - 1 = 17.
Level of confidence = 99 percent. So
1 - α = .99, α = .01 and α/2 = .005.
Therefore, t_{n-1, α/2} = t_{17, .005} = invT(.995, 17) = 2.8982
MOE = E = t_{n-1, α/2} s/√n
= 2.8982*13.3/√18 = 9.0854
LEP = X - E = 170.5 - 9.0854 = 161.4146
REP = X + E = 170.5 + 9.0854 = 179.5854
Exercise 7.2.2. The time taken
to complete a problem in a Math 105 test is normally distributed with
mean μ and standard deviation σ.
A sample of size 23 was taken, and sample mean and standard deviation
were found to be x = 4.7 and s = .47. Estimate
the mean time μ taken to complete a
problem using a 98 percent confidence interval.
Solution by Long Hand Method: Since the population standard deviation σ is unknown, we will compute a T-interval.
Here the sample size n = 23,
the sample mean X = 4.7,
Sample standard deviation s = .47
Degrees of freedom df = n - 1 = 22.
Level of confidence = 98 percent. So
1 - α = .98, α = .02 and α/2 = .01.
Therefore, t_{n-1, α/2} = t_{22, .01} = invT(.99, 22) = 2.5083
MOE = E = t_{n-1, α/2} s/√n
= 2.5083*.47/√23 = .2458
LEP = X - E = 4.7 - .2458 = 4.4542
REP = X + E = 4.7 + .2458 = 4.9458
Exercise 7.2.3.
To estimate the mean length μ of babies
at birth a sample of size n = 16 was taken.
The sample mean was found to be
x
= 18.6 and standard deviation was
s = 9.486. Construct a 96 percent confidence
interval for mean μ.
State the degrees of freedom df, compute the margin of error, LEP and REP.
Solution by Long Hand Method:
It is reasonable to assume that the length X is normal.
Since the population standard deviation σ is unknown,
we will compute a T-interval.
Here the sample size n = 16,
the sample mean X = 18.6,
Sample standard deviation s = 9.486
Degrees of freedom df = n - 1 = 15.
Level of confidence = 96 percent. So
1 - α = .96, α = .04 and α/2 = .02.
Therefore, t_{n-1, α/2} = t_{15, .02} = invT(.98, 15) = 2.2485
MOE = E = t_{n-1, α/2} s/√n
= 2.2485*9.486/√16 = 5.3323
LEP = X - E = 18.6 - 5.3323 = 13.2677
REP = X + E = 18.6 + 5.3323 = 23.9323
Exercise 7.2.4. The time taken by an athlete
to run an event is normally distributed with mean μ and unknown
standard deviation σ. To estimate
the mean μ he ran 9 times and the
sample mean was found to be X = 33 seconds
and the sample standard deviation s = 3.5 seconds.
Construct a 95 percent confidence interval for mean μ.
State the degrees of freedom df, compute the margin of error, LEP and REP.
Solution by Long Hand Method:
It is reasonable to assume that the time X is normal.
Since the population standard deviation σ is unknown,
we will compute a T-interval.
Here the sample size n = 9,
the sample mean X = 33,
Sample standard deviation s =3.5
Degrees of freedom df = n - 1 = 8.
Level of confidence = 95 percent. So
1 - α = .95, α = .05 and α/2 = .025.
Therefore, t_{n-1, α/2} = t_{8, .025} = invT(.975, 8) = 2.3060
MOE = E = t_{n-1, α/2} s/√n
= 2.3060*3.5/√9 = 2.6903
LEP = X - E = 33 - 2.6903 = 30.3097
REP = X + E = 33 + 2.6903 = 35.6903
Exercise 7.2.5.
The mean length μ of telephone calls is to be estimated.
A sample of 14 calls had a sample mean
X = 13 minutes
and the sample standard deviation was s = 5.5 minutes.
Construct a 90 percent confidence interval for mean μ.
State the degrees of freedom df, compute the margin of error, LEP and REP.
Solution by Long Hand Method:
It is reasonable to assume that the time length X is normal.
Since the population standard deviation σ is unknown,
we will compute a T-interval.
Here the sample size n = 14,
the sample mean X = 13,
Sample standard deviation s = 5.5
Degrees of freedom df = n - 1 = 13.
Level of confidence = 90 percent. So
1 - α = .90, α = .1 and α/2 = .05.
Therefore, t_{n-1, α/2} = t_{13, .05} = invT(.95, 13) = 1.7709
MOE = E = t_{n-1, α/2} s/√n
= 1.7709*5.5/√14 = 2.6031
LEP = X - E = 13 - 2.6031 = 10.3969
REP = X + E = 13 + 2.6031 = 15.6031
Exercise 7.2.6.
The mean weight μ of a variety of bananas in a grocery store has to be estimated.
A sample of 20 bananas was collected. The mean weight was found to be
x = 180 grams
and the sample standard deviation was s = 27 grams.
Construct an 85 percent confidence interval for mean μ.
State the degrees of freedom df, compute the margin of error, LEP and REP.
Solution by Long Hand Method:
It is reasonable to assume that the weight X is normal.
Since the population standard deviation σ is unknown,
we will compute a T-interval.
Here the sample size n = 20,
the sample mean X = 180,
Sample standard deviation s = 27
Degrees of freedom df = n - 1 = 19.
Level of confidence = 85 percent. So
1 - α = .85, α = .15 and α/2 = .075.
Therefore, t_{n-1, α/2} = t_{19, .075} = invT(.925, 19) = 1.5002
MOE = E = t_{n-1, α/2} s/√n
= 1.5002*27/√20 = 9.0573
LEP = X - E = 180 - 9.0573 = 170.9427
REP = X + E = 180 + 9.0573 = 189.0573
Exercise 7.2.7. It is assumed that the lifetime
(in hours) of lightbulbs produced in a factory is normally distributed
with mean μ and standard deviation
σ. To estimate μ
the following data was collected on the lifetime of bulbs.
5110
4671
6441
3331
5055
5270
5335
4973
1837
7783
4560
6074
4777
4707
5263
4978
5418
5123
Construct a 92 percent confidence interval for mean μ.
State the degrees of freedom df, compute the margin of error, LEP and REP.
Solution by Long Hand Method:
It is reasonable to assume that the time X is normal.
Since the population standard deviation σ is unknown,
we will compute a T-interval.
As in Lesson 2, enter data in a LIST of your TI
to compute mean X
and sample standard deviation S:
Here the sample size n = 18,
the sample mean X = 5039.2222,
Sample standard deviation s = 1203.8025
Degrees of freedom df = n - 1 = 17.
Level of confidence = 92 percent. So
1 - α = .92, α = .08 and α/2 = .04.
Therefore, t_{n-1, α/2} = t_{17, .04} = invT(.96, 17) = 1.8619
MOE = E = t_{n-1, α/2} s/√n
= 1.8619*1203.8025/√18 = 528.2936
LEP = X - E = 5039.2222 - 528.2936 = 4510.9286
REP = X + E = 5039.2222 + 528.2936 = 5567.5158
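When raw data is given, the first step is to compute the sample size, the sample mean and the sample standard deviation. On the TI-84 this is done with a LIST; the sketch below shows the same step in Python with the standard library statistics module, using the lightbulb data of this exercise. The variable names are our own.

```python
from statistics import mean, stdev

# Lifetimes (in hours) from Exercise 7.2.7.
lifetimes = [5110, 4671, 6441, 3331, 5055, 5270, 5335, 4973, 1837,
             7783, 4560, 6074, 4777, 4707, 5263, 4978, 5418, 5123]

n = len(lifetimes)
xbar = mean(lifetimes)  # sample mean
s = stdev(lifetimes)    # sample standard deviation (divides by n - 1)

print(n, round(xbar, 4), round(s, 4))
```

Note that stdev is the sample standard deviation, with n - 1 in the denominator, which is the S needed for the T-interval; the function pstdev in the same module divides by n and should not be used here.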
Exercise 7.2.8. To estimate the mean weight
(in pounds) of salmon in a river the following sample was collected:
34.7
33.8
38.2
20.3
27.8
45.3
43.1
37.3
32.5
32.3
31.8
41.5
44.5
29.2
25.3
29.6
39.5
29.1
37.3
Construct a 97 percent confidence interval for mean μ.
State the degrees of freedom df, compute the margin of error, LEP and REP.
Solution by Long Hand Method:
It is reasonable to assume that the weight X is normal.
Since the population standard deviation σ is unknown,
we will compute a T-interval.
As in Lesson 2, enter data in a LIST of your TI
to compute mean X
and sample standard deviation S:
Here the sample size n = 19,
the sample mean X = 34.3737,
Sample standard deviation s = 6.7608
Degrees of freedom df = n - 1 = 18.
Level of confidence = 97 percent. So
1 - α = .97, α = .03 and α/2 = .015.
Therefore, t_{n-1, α/2} = t_{18, .015} = invT(.985, 18) = 2.3562
MOE = E = t_{n-1, α/2} s/√n
= 2.3562*6.7608/√19 = 3.6545
LEP = X - E = 34.3737 - 3.6545 = 30.7192
REP = X + E = 34.3737 + 3.6545 = 38.0282
Exercise 7.2.9.
The following data represents the time (in minutes) taken by students to drive to campus.
5.7
22.4
14.7
16.7
10.6
26.9
17.9
32.5
20.9
9.2
3.4
20.4
17.3
31.9
4.4
23.7
The mean driving time μ is to be estimated.
Construct a 98 percent confidence interval for mean μ.
State the degrees of freedom df, compute the margin of error, LEP and REP.
Solution by Long Hand Method:
It is reasonable to assume that the driving time X is normal.
Since the population standard deviation σ is unknown,
we will compute a T-interval.
As in Lesson 2, enter data in a LIST of your TI
to compute mean X
and sample standard deviation S:
Here the sample size n = 16,
the sample mean X = 17.4125,
Sample standard deviation s = 9.0843
Degrees of freedom df = n - 1 = 15.
Level of confidence = 98 percent. So
1 - α = .98, α = .02 and α/2 = .01.
Therefore, t_{n-1, α/2} = t_{15, .01} = invT(.99, 15) = 2.6025
MOE = E = t_{n-1, α/2} s/√n
= 5.9104 (computed with the unrounded value of invT(.99, 15))
LEP = X - E = 17.4125 - 5.9104 = 11.5021
REP = X + E = 17.4125 + 5.9104 = 23.3229
7.3 About the Population Proportion
In this section, we estimate the population proportion p of an attribute
by a confidence interval. Examples would be
the proportion p of defective
lamps produced in a factory,
the proportion p of a variety of seeds that will
germinate in a particular condition,
the proportion p of voters in favor of
a candidate, and others.
To estimate such a proportion p, we use the
sampling distribution of the sample proportion of success
X̄, to be defined below.
We let
Y=1 if success
Y=0 if failure
where "success" means that the sampled item has the attribute.
Y is a Bernoulli(p) random variable.
To estimate p, we take a sample X1, X2, …, Xn of size n from this Bernoulli(p) population, where Xi represents the outcome for the ith sampled item (for example, of testing the ith lamp) as follows:
Xi=1 if the ith sample is a success
Xi=0 if the ith sample is a failure.
Then,
T = X1+X2+…+Xn =
the total number of successes in these n trials
and the sample mean
X̄ = T/n = the sample proportion of success in these n trials.
It follows from CLT (Lesson 6) that,
when the sample size is large and p is not very close to 0 or 1,
the distribution of
X̄ is approximately normal with mean
μ_X̄ = p
and standard deviation
σ_X̄ = √(p(1-p)/n).
Therefore (approximately),
P(-z_{α/2} ≤ (X̄-p)/σ_X̄ ≤ z_{α/2}) = 1-α.
In an attempt to compute a confidence interval for p,
we simplify and get
P(X̄ - z_{α/2} σ_X̄ ≤ p ≤ X̄ + z_{α/2} σ_X̄) = 1-α.
Since p is unknown, we cannot compute σ_X̄
and this fails to produce a confidence interval for p.
We use the sample proportion x̄ of success
as a point estimate of p to get the approximate equalities
σ_X̄ ≈ √(x̄(1-x̄)/n)
and
P(X̄ - z_{α/2} σ_X̄ ≤ p ≤ X̄ + z_{α/2} σ_X̄) ≈ 1-α,
with σ_X̄ replaced by this estimate.
Confidence Interval for Population Proportion p
The following theorem follows from the above computations.
Theorem.
We use the notations as above.
Then an approximate
(1-α)100
percent confidence interval for p is given by
x̄-e ≤ p ≤ x̄+e
where
e = z_{α/2} √(x̄(1-x̄)/n).
Further, the following are some important concepts and comments:
The margin of error (MOE) e
is defined as
e = z_{α/2} √(x̄(1-x̄)/n).
A conservative margin of error
(Conservative MOE) E
is defined as
E = z_{α/2}/√(4n).
Since x(1-x) ≤ 1/4 for every number x between 0 and 1, it can be checked that
e ≤ E and also
z_{α/2} σ_X̄ = z_{α/2} √(p(1-p)/n) ≤ E.
This means E
is an upper bound of the estimated error e and also of the
actual error bound z_{α/2} σ_X̄ of this interval estimation of p. That is why the Conservative
MOE E is taken more seriously than the ordinary MOE e.
Informally, this confidence interval is also called a
1-Proportion Z-interval.
As before, we will use terminology and abbreviations like LEP, REP, MOE and Conservative MOE.
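The two margins of error can be computed side by side. The following is a minimal sketch in Python; the function name proportion_interval and its arguments are our own. It is tried on the numbers of Exercise 7.3.1 below (19 sour apples in a sample of 197, 99 percent).

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Return (LEP, REP, MOE e, Conservative MOE E) for a proportion p."""
    x = successes / n                             # sample proportion of success
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z * sqrt(x * (1 - x) / n)               # e = z_{alpha/2} sqrt(x(1-x)/n)
    conservative = z / sqrt(4 * n)                # E = z_{alpha/2} / sqrt(4n)
    return x - moe, x + moe, moe, conservative

lep, rep, e, E = proportion_interval(successes=19, n=197, confidence=0.99)
print(round(lep, 6), round(rep, 6), round(e, 6), round(E, 6))
```

The code keeps the unrounded values of the sample proportion and of z_{α/2}, so its answers can differ slightly from the Long Hand Method in the later decimal places. In every case e ≤ E.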
An Example to interpret Conservative MOE
President Clinton was impeached in 1998-99 by the House of
Representatives and acquitted by the Senate.
During the impeachment trial, a typical news item would read as follows:
President Clinton has
64 percent approval rating. The poll has a margin of error plus or minus
3.1 percentage points. The poll surveyed 972 people.
This meant the following, with $p$ as the proportion of the population who "approved" of President Clinton:
(A point estimate): the sample proportion $\bar{x}$ of
people who "approved" of President Clinton was 0.64.
So, a point estimate of $p$ is .64.
Such polls usually use 95 percent confidence level.
Assuming that they are
using a 95 percent confidence interval, they mean that
$E = z_{\alpha/2}/\sqrt{4n} = z_{.025}/\sqrt{4n} = 1.96/\sqrt{4 \times 972} = 0.031$.
Interestingly, they give both the Conservative MOE and the sample size, although,
once the confidence level is fixed, either one can be computed from the other.
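To illustrate how the Conservative MOE and the sample size determine each other, here is a small sketch. It assumes a 95 percent confidence level by default, as the example does; the function names `conservative_moe` and `sample_size_for` are hypothetical, and rounding the sample size up to a whole number is a choice made here, not something stated in the lesson.

```python
from math import ceil, sqrt
from statistics import NormalDist


def conservative_moe(n, confidence=0.95):
    """E = z_{alpha/2} / sqrt(4n) for a sample of size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z / sqrt(4 * n)


def sample_size_for(E, confidence=0.95):
    """Solve E = z_{alpha/2} / sqrt(4n) for n, rounded up (assumed convention)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / (2 * E)) ** 2)


# The poll in the example surveyed 972 people.
print(f"E for n = 972: {conservative_moe(972):.4f}")
# Going the other way: sample size needed for a 3 point conservative MOE.
print("n for E = 0.03:", sample_size_for(0.03))
```

For n = 972 the computed value agrees with the reported 3.1 percentage points after rounding.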
Problems on 7.3: On 1-Proportion Z-intervals and Conservative MOE
Exercise 7.3.1 In a sample of 197 apples
from a lot, 19 were found to be sour. Set a 99 percent confidence interval
for the proportion p of sour apples in the lot.
Solution:
Here the sample size is $n = 197$.
The number of successes is $T = 19$.
So, the sample proportion of success is $\bar{x} = T/n = 19/197 = .0964$.
Level of confidence = 99 percent. So
$1 - \alpha = .99$, $\alpha = .01$ and $\alpha/2 = .005$.
Therefore, $z_{\alpha/2} = z_{.005} = \mathrm{invNormal}(.995) = 2.5758$.
MOE $= e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n} = 2.5758\sqrt{.0964(1-.0964)/197} = .054163$
[In this section, for error terms, we retain at least 6 decimal places.]
LEP $= \bar{x} - e = .0964 - .054163 = .042237$
REP $= \bar{x} + e = .0964 + .054163 = .150563$
Conservative MOE $= E = z_{\alpha/2}/\sqrt{4n} = 2.5758/\sqrt{4 \times 197} = .091759$
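For readers without a TI-84, the invNormal step can be reproduced with Python's standard library. This is only a sketch: the helper name `z_critical` is hypothetical, and `NormalDist().inv_cdf` is used here as the counterpart of the calculator's invNormal. The confidence levels listed are the ones that appear with solutions in this problem set.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2}, i.e. invNormal(1 - alpha/2), for a confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


# Confidence levels used in the solved exercises of this section.
for level in (0.99, 0.98, 0.97, 0.96, 0.95, 0.94):
    print(f"{level:.0%} confidence: z = {z_critical(level):.4f}")
```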
Exercise 7.3.2. A new vaccine was tried on
147 randomly selected individuals, and it was determined that 97 of
them developed immunity. Find a 95 percent confidence interval for the
proportion p of individuals in the population for whom the vaccine
would
help. Give the MOE, LEP, REP and the conservative
MOE.
Solution:
Here the sample size is $n = 147$.
The number of successes is $T = 97$.
So, the sample proportion of success is $\bar{x} = T/n = 97/147 = .6599$.
Level of confidence = 95 percent. So
$1 - \alpha = .95$, $\alpha = .05$ and $\alpha/2 = .025$.
Therefore, $z_{\alpha/2} = z_{.025} = \mathrm{invNormal}(.975) = 1.9600$.
MOE $= e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n} = 1.9600\sqrt{.6599(1-.6599)/147} = .076584$
[In this section, for error terms, we retain at least 6 decimal places.]
LEP $= \bar{x} - e = .6599 - .076584 = .583316$
REP $= \bar{x} + e = .6599 + .076584 = .736484$
Conservative MOE $= E = z_{\alpha/2}/\sqrt{4n} = 1.9600/\sqrt{4 \times 147} = .080829$
Exercise 7.3.3. Before a congressional election,
a poll was conducted. Out of 887 randomly selected voters interviewed,
389 said that they would vote for Candidate A, and 359 said that they
would vote for Candidate B.
Construct a 98 percent confidence interval for the proportion p
of voters who would vote for A.
Construct a 98 percent confidence interval for the proportion p
of voters who would vote for B.
What is the conservative margin of error for both?
Solution:
Here the sample size is $n = 887$.
Level of confidence = 98 percent. So
$1 - \alpha = .98$, $\alpha = .02$ and $\alpha/2 = .01$.
Therefore, $z_{\alpha/2} = z_{.01} = \mathrm{invNormal}(.99) = 2.3263$.
For Candidate A:
The number of successes is $T = 389$.
So, the sample proportion of success is $\bar{x} = T/n = 389/887 = .4386$.
MOE $= e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n} = 2.3263\sqrt{.4386(1-.4386)/887} = .038759$
[In this section, for error terms, we retain at least 6 decimal places.]
LEP $= \bar{x} - e = .4386 - .038759 = .399841$
REP $= \bar{x} + e = .4386 + .038759 = .477359$
Conservative MOE $= E = z_{\alpha/2}/\sqrt{4n} = 2.3263/\sqrt{4 \times 887} = .039055$
For Candidate B:
The number of successes is $T = 359$.
So, the sample proportion of success is $\bar{x} = T/n = 359/887 = .4047$.
MOE $= e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n} = 2.3263\sqrt{.4047(1-.4047)/887} = .038339$
[In this section, for error terms, we retain at least 6 decimal places.]
LEP $= \bar{x} - e = .4047 - .038339 = .366361$
REP $= \bar{x} + e = .4047 + .038339 = .443039$
Conservative MOE: the same as for Candidate A, because the formula depends only on the sample size and the level of confidence.
Exercise 7.3.4.
A researcher wants to estimate the proportion $p$
of the children younger than two, in a community, who have anemia.
He/She examined 180
children and found 42 children with anemia.
Compute a 97 percent confidence interval for p.
Give the MOE, LEP, REP and the conservative MOE.
Solution:
Here the sample size is $n = 180$.
The number of successes is $T = 42$.
So, the sample proportion of success is $\bar{x} = T/n = 42/180 = .2333$.
Level of confidence = 97 percent. So
$1 - \alpha = .97$, $\alpha = .03$ and $\alpha/2 = .015$.
Therefore, $z_{\alpha/2} = z_{.015} = \mathrm{invNormal}(.985) = 2.1701$.
MOE $= e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n} = 2.1701\sqrt{.2333(1-.2333)/180} = .068409$
[In this section, for error terms, we retain at least 6 decimal places.]
LEP $= \bar{x} - e = .2333 - .068409 = .164891$
REP $= \bar{x} + e = .2333 + .068409 = .301709$
Conservative MOE $= E = z_{\alpha/2}/\sqrt{4n} = 2.1701/\sqrt{4 \times 180} = .080875$
Exercise 7.3.5.
The proportion p of voters (in a county)
who vote by absentee ballot is to be estimated.
You sampled 683 voters and found that 165 would have voted by absentee ballot.
Compute a 96 percent confidence interval for $p$.
Give the MOE, LEP, REP and the conservative MOE.
Solution:
Here the sample size is $n = 683$.
The number of successes is $T = 165$.
So, the sample proportion of success is $\bar{x} = T/n = 165/683 = .2416$.
Level of confidence = 96 percent. So
$1 - \alpha = .96$, $\alpha = .04$ and $\alpha/2 = .02$.
Therefore, $z_{\alpha/2} = z_{.02} = \mathrm{invNormal}(.98) = 2.0537$.
MOE $= e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n} = 2.0537\sqrt{.2416(1-.2416)/683} = .033638$
[In this section, for error terms, we retain at least 6 decimal places.]
LEP $= \bar{x} - e = .2416 - .033638 = .207962$
REP $= \bar{x} + e = .2416 + .033638 = .275238$
Conservative MOE $= E = z_{\alpha/2}/\sqrt{4n} = 2.0537/\sqrt{4 \times 683} = .039291$
Exercise 7.3.6.
It is known that a vaccine may cause fever as a side effect
after one takes the shot.
The proportion $p$ of those who suffer the side effect is to be estimated.
A sample of 913 individuals was examined and
it was found that 153 suffered the side effect. Compute a 94 percent confidence interval for $p$.
Give the MOE, LEP, REP and the conservative MOE.
Solution:
Here the sample size is $n = 913$.
The number of successes is $T = 153$.
So, the sample proportion of success is $\bar{x} = T/n = 153/913 = .1676$.
Level of confidence = 94 percent. So
$1 - \alpha = .94$, $\alpha = .06$ and $\alpha/2 = .03$.
Therefore, $z_{\alpha/2} = z_{.03} = \mathrm{invNormal}(.97) = 1.8808$.
MOE $= e = z_{\alpha/2}\sqrt{\bar{x}(1-\bar{x})/n} = 1.8808\sqrt{.1676(1-.1676)/913} = .023249$
[In this section, for error terms, we retain at least 6 decimal places.]
LEP $= \bar{x} - e = .1676 - .023249 = .144351$
REP $= \bar{x} + e = .1676 + .023249 = .190849$
Conservative MOE $= E = z_{\alpha/2}/\sqrt{4n} = 1.8808/\sqrt{4 \times 913} = .031123$
Exercise 7.3.7.
The proportion p of those on campus who took a flu shot is to be estimated.
A sample of 733 individuals was examined and
it was found that 220 among them
took the shot. Compute a 93 percent confidence interval for $p$.
Give the MOE, LEP, REP and the conservative MOE.
Optional Problems on 7.3: Interpretation
Exercise 7.3.8. In a poll released on October
28, 1998, it was revealed that 60 percent of the US population
wanted President
Clinton to be rebuked but not impeached. The poll was conducted among 1,013
adults, and it had a margin of error of 3 percentage points.
Can you relate the last two numbers?
What is the level of confidence used here?
Solution:
News media polls use 95 percent confidence intervals. When they say
"margin of error," they mean "conservative margin of error." The
conservative margin of error $E$ and the level of confidence
$1 - \alpha$ are related by the formula
$E = z_{\alpha/2}/\sqrt{4n}$.
For this problem $E = .03$, $1 - \alpha = .95$, and $n = 1{,}013$. We can check that $z_{\alpha/2}/\sqrt{4n} = 1.96/\sqrt{4 \times 1013} = 0.03079$, which rounds to the reported 3 percentage points.
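The relation $E = z_{\alpha/2}/\sqrt{4n}$ can also be solved for the level of confidence. The sketch below does this under the assumption that the reported margin is a conservative MOE; the function name `implied_confidence` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def implied_confidence(E, n):
    """Solve E = z_{alpha/2} / sqrt(4n) for the confidence level 1 - alpha."""
    z = E * sqrt(4 * n)                   # z_{alpha/2} implied by E and n
    return 2 * NormalDist().cdf(z) - 1    # P(-z <= Z <= z) for standard normal Z


# Reported (rounded) margin of 3 percentage points with n = 1,013.
level = implied_confidence(0.03, 1013)
print(f"implied confidence level: {level:.4f}")

# Using the unrounded value 0.03079 computed in the solution instead.
print(f"with E = 0.03079: {implied_confidence(0.03079, 1013):.4f}")
```

Taking the rounded margin of .03 literally gives a level slightly below 95 percent, while the unrounded value .03079 gives essentially 95 percent, which is consistent with the solution's reading that the poll used a 95 percent level and rounded its margin.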