MATH 105, Topics in Mathematics

Lesson 7: Estimation

Satya Mandal

Introduction

7.1 Point and Interval Estimation
(Homework 21, 22)

7.2 Confidence interval for mean μ when σ is Unknown
(Homework 23, 24)

7.3 About the Population Proportion
(Homework 25)

Homework 21 - 25

Due Date: See the Lecture Notes Site.

Introduction

In this chapter and the next we will consider inferential statistics. Statistical inference is very useful in real life. Thanks to the availability of advanced calculators (TI-84), it has become accessible at this level.

The main purpose of the subject statistics is to develop methods that use sample statistics T to estimate (or make other kinds of inferences regarding) the population parameters θ (e.g. use the sample mean X to estimate the population mean μ, use the sample standard deviation S to estimate the population standard deviation σ, use the sample proportion of success X to estimate the population proportion p). [Studying probability or math was not the goal. It was needed to do estimation and inferential statistics in a meaningful way.]

In this lesson, we will consider two methods to estimate the population mean μ by the sample mean X. We will also use the sample proportion of success X to estimate the population proportion p of an attribute.

Two methods of estimation are considered.

A rather simplistic method is point estimation. In point estimation, a number t is given as an estimate for the parameter θ. Most people are used to the concept of point estimation. For example, to estimate the mean annual income μ of the US population, one is used to the idea of taking a sample of a certain size, computing the sample mean annual income x, and calling it an estimate for μ.
The second one is called interval estimation. In interval estimation, an interval (L, U) is given as a range within which the parameter θ is estimated to lie. Most of the interval estimates we consider consist of (1) a point estimate t for θ and (2) an estimate e for the error (precision) of this estimation. Correspondingly, θ would be estimated to be within the interval (t-e, t+e). For example, when estimating the mean annual income μ of the US population, a statistician may take a sample, compute the sample mean x and say that the population mean μ is estimated to be within the interval (x-1000, x+1000). Obviously, a smaller error (i.e. higher precision) would always be more desirable. Statistical estimation always comes with information about how often the actual error ε=|θ-T| would remain within a specified error-limit e. This is given in terms of the probability P(ε ≤ e). This probability P(ε ≤ e) will be called the level of confidence in the later sections. Ideally, we would like to estimate θ by T with high precision (i.e. low error) and with a high level of confidence (i.e. with a high probability that the estimate will be within a limit of error).

7.1 Point and Interval Estimation

In statistical point estimation, a statistic T is used to estimate a parameter θ. Corresponding to a sample, the value t of T would be called a point estimate of θ. The statistic T would be called an estimator of θ.

For example, the sample mean X would be an estimator of the population mean μ and the computed value x, corresponding to a sample, would be a point estimate of μ. Similarly, the sample variance S² is an estimator of the population variance σ² and the computed value s² of S², corresponding to a sample, is a point estimate of σ².

It would be a common instinct to accept that the sample mean X would be a natural estimator for the population mean μ and the sample variance S² would be a natural estimator for the population variance σ². There are some established mathematical criteria to justify why or when an estimator T would be a good candidate as an estimator for a population parameter θ. We will not get into an in-depth discussion of this.

Interval Estimation

A point estimate t of a parameter θ would almost never be exactly equal to the actual value of θ. This is why it would be more reasonable to provide an interval (L,U) within which the parameter θ is estimated to lie. Here L, U would be statistics. The values L = l, U = u would depend on the sample. While θ would be estimated to be within (l, u), for some samples the true value of θ would be left outside the interval (l,u). We would be happy as long as the true value of θ falls within the interval (l,u) most often (or often enough), allowing the possibility of being "wrong" a few times.

The next question would be, how often is often enough? The probability P(L ≤ θ ≤ U) is exactly the proportion of times when θ would fall within the interval (l,u). The interval estimation of θ, formally defined below, provides an interval (L,U) together with the probability P(L ≤ θ ≤ U).

Definition. Let θ be a population parameter. An interval estimate for θ provides the following:

It gives an interval (L,U) as an estimate for θ. Here L,U are statistics or L=-∞ or U=∞.
It also gives the probability P(L ≤ θ ≤ U). This number P(L ≤ θ ≤ U) is called the level of confidence. It is often written as

P(L ≤ θ ≤ U) = 1- α.
The interval, (L,U) is said to be a (1-α)100 percent confidence interval of θ.
In practice, α will be a small number, like, 0.1, 0.01, 0.05. That means, 90, 95 or 99 percent confidence intervals of θ are commonly considered.
Three different types of confidence intervals are commonly considered.
1. When both L and U are statistics, the confidence interval (L,U) would be called a two sided confidence interval of θ.
2. When L=-∞ and U is a statistic, the confidence interval (L,U) =(-∞, U) would be called a one sided left confidence interval of θ.
3. When L is a statistic and U=∞, the confidence interval (L,U) =(L,∞) would be called a one sided right confidence interval of θ.
In this course, we only consider two sided confidence intervals. Two sided confidence intervals are analogous to giving a point estimate with an estimate of error.

Procedure to Construct Confidence Interval

In this subsection, we will construct a two sided confidence interval of the population mean μ. As you can see, in Lessons 3-6 we (mostly) computed probabilities, while the values of the parameters like the mean μ were given. The situation reverses now. Now, the value of a parameter like the mean μ has to be estimated and the probability is given (or specified) in terms of the level of confidence. That is why some standard inverse probability (or cut-off) numbers are necessary. In this section, the following definition of some cut-off numbers for the standard normal random variable Z will be needed.

Definition: As usual Z would denote the standard normal random variable. Given a number 0 < α < 1, the number z_α is defined by the formula

P(z_α ≤ Z) = α.
This is the same as P(Z ≤ z_α) = 1 - α.

In fact, the invNormal function of TI-84 computes z_α as follows:
z_α=invNormal(1-α).
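
For readers working without a TI-84, the same cut-off can be computed in Python. The following is only a minimal sketch, assuming Python 3.8 or later; the helper name z_alpha is our own and is not part of the lesson or of any calculator.

```python
from statistics import NormalDist


def z_alpha(alpha):
    """Return z_alpha, the cut-off with P(z_alpha <= Z) = alpha for standard normal Z."""
    # P(Z <= z_alpha) = 1 - alpha, so z_alpha is the (1 - alpha) quantile of Z.
    return NormalDist().inv_cdf(1 - alpha)


# Cut-offs for the small values of alpha used in this lesson.
for alpha in (0.1, 0.05, 0.025, 0.01, 0.005):
    print(f"alpha = {alpha:<6} z_alpha = {z_alpha(alpha):.4f}")
```

For a two sided interval at level 1-α, the number needed is z_alpha(α/2), for example z_alpha(0.005) for a 99 percent interval.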

As mentioned above, for us α will be a small number .1, .01, .05 and so on. The following animation demonstrates these numbers visually.

Animation 7.1.1

The procedure followed: A confidence interval for μ when σ is known

To explain the procedure to develop a confidence interval (of any parameter θ), we will construct a (1-α)100 percent confidence interval for the mean μ when the standard deviation σ is known.

Suppose X is a random variable with mean μ and known standard deviation σ. Let X₁,X₂, …, X_n be a sample from X. By CLT, approximately,

Z=(X-μ)√n/σ has a standard normal distribution.

Therefore, approximately,

P(-z_α/2 ≤ Z ≤ z_α/2 ) = 1 - α where Z=(X-μ)√n/σ.

Therefore, approximately,

P(-z_α/2 ≤ (X-μ) √n/σ ≤ z_α/2 ) = 1 - α.

If we simplify, we get

P(X-E ≤ μ ≤ X+E) = 1 - α where E = z_α/2 σ/√n.
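
The simplification can be spelled out step by step. In the sketch below we write \bar{X} for the sample mean (it appears as X in the formulas of this section), and E = z_{\alpha/2}\,\sigma/\sqrt{n} as in the text.

```latex
\begin{align*}
-z_{\alpha/2} \le \frac{(\bar{X}-\mu)\sqrt{n}}{\sigma} \le z_{\alpha/2}
  &\iff -z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \\
  &\iff -E \le \bar{X}-\mu \le E \\
  &\iff -E \le \mu-\bar{X} \le E \\
  &\iff \bar{X}-E \le \mu \le \bar{X}+E .
\end{align*}
```

Since the first and last inequalities describe the same event, they have the same probability 1-α, which gives P(X-E ≤ μ ≤ X+E) = 1 - α.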

According to the definition above, this equation is precisely what is needed to define a (1-α)100 percent confidence interval for μ. We state the same as a theorem as follows.

Theorem. We use the notations as above. Assume that σ is known. Then an (approximate) (1-α)100 percent confidence interval for μ is given by

X-E ≤ μ ≤ X+E where E=z_α/2 σ/√n.

If X is normal, then it is an exact confidence interval for μ (not just an approximation).

The left side of the interval is called the Left End Point (LEP) or the lower limit and the right side of the interval is called the Right End Point (REP) or the upper limit. Informally, this confidence interval is also called a Z-interval.

As was defined above, this is a two-sided confidence interval of μ. A (1-α)100 percent confidence interval is interpreted as follows: if one computes (1-α)100 percent confidence intervals on a regular basis, the true value of μ will be within the confidence interval (1-α)100 percent of the time and outside the interval α100 percent of the time.
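
This interpretation can be checked by simulation. The sketch below is an illustration only: the function name coverage and the values μ = 100, σ = 15, n = 25 are made up for the experiment and do not come from the lesson. It draws many normal samples, builds a Z-interval from each, and reports the fraction of intervals that contain the true μ.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of simulated Z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                 # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    e = z * sigma / sqrt(n)                   # margin of error E
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n                # sample mean of this sample
        if xbar - e <= mu <= xbar + e:
            hits += 1
    return hits / trials


# With alpha = 0.05 the reported fraction should be close to 0.95.
print(coverage(mu=100, sigma=15, n=25, alpha=0.05, trials=10000))
```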

Definitions and Formulas:

The length l of this (1-α)100 percent confidence interval for μ is given by

l = 2 z_α/2 σ/√n.
The margin of error (MOE) E is defined as

E = z_α/2 σ/√n.
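
The formulas above translate directly into a few lines of Python. This is a sketch under the assumptions of the theorem (σ known); the function name z_interval is our own, and the inputs are those of Exercise 7.1.1 in the problem set of this section.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Return (LEP, REP, MOE) of the Z-interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    moe = z * sigma / sqrt(n)                 # E = z_{alpha/2} * sigma / sqrt(n)
    return xbar - moe, xbar + moe, moe


lep, rep, moe = z_interval(xbar=81, sigma=15, n=25, confidence=0.99)
print(f"MOE = {moe:.4f}, LEP = {lep:.4f}, REP = {rep:.4f}")
```

The last digit may differ slightly from the long hand solutions, which round z_α/2 to four decimals before multiplying.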

The Required Sample Size

(Discussion Only)

If we want to increase the level of confidence, the margin of error (MOE) E would increase, and conversely. Both can be controlled by increasing the sample size n. The required sample size n can be determined by solving the formula for the MOE above for n. Given a specified MOE E and a level of confidence (1-α), the sample size n needed is given by

n = (z_α/2 σ/E)².

Always round upward for n. Note that the above formula for the required sample size is independent of the mean μ, but it depends on the standard deviation σ.
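
As a rough illustration of the sample size formula, the sketch below computes n for a made-up target: σ = 15, a desired MOE of 5 and 99 percent confidence. These inputs and the function name required_sample_size are assumptions chosen for the example, not values from the lesson.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(sigma, moe, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= moe (always rounded upward)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / moe) ** 2)


n = required_sample_size(sigma=15, moe=5, confidence=0.99)
print("required n =", n)

# Check: with this n the actual MOE is at or below the target of 5.
z = NormalDist().inv_cdf(0.995)
print("achieved MOE =", round(z * 15 / sqrt(n), 4))
```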

The following animation demonstrates that there is a sample size n that guarantees both the specified precision and the level of confidence (1-α):

Animation 7.1.2

Problem Solving: In this section, we will compute confidence interval for μ. TI-84 has a method that essentially computes the Z-interval. We will compute Z-intervals using the formulas (by a "Long Hand Method"), with the help of invNormal function of TI-84. The method would be clear from the solutions of the problems given below.

Problems on 7.1: On Z-intervals for the mean

Exercise 7.1.1. Suppose X is a normal population with mean μ and standard deviation σ = 15. A sample of size 25 was collected and the sample mean X was found to be 81. Compute a 99 percent confidence interval for μ and give the MOE.

Solution by Long Hand Method:
Here the population standard deviation σ = 15,
the sample size n = 25,
the sample mean X = 81,
Level of confidence = 99 percent. So
1 - α = .99, α = .01 and α/2 =.005.
Therefore, z_α/2 = z_.005 = invNormal(.995)= 2.5758

MOE = E=z_α/2 σ/√n =2.5758*15/√25= 7.7274

LEP = X - E = 81 - 7.7274 = 73.2726
REP = X + E = 81 + 7.7274 = 88.7274

Exercise 7.1.2A. The weight distribution X of a group of students is normal with standard deviation σ = 9.8. To estimate the mean weight μ, a sample of size 14 was collected and the sample mean X was found to be 151.1.
Find a 99 percent confidence interval for μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 9.8,
the sample size n = 14,
the sample mean X = 151.1,
Level of confidence = 99 percent. So
1 - α = .99, α = .01 and α/2 =.005.
Therefore, z_α/2 = z_.005 = invNormal(.995)= 2.5758

MOE = E=z_α/2 σ /√ n =2.5758*9.8/√ 14= 6.7464

LEP = X - E = 151.1 - 6.7464 = 144.3536
REP = X + E = 151.1 + 6.7464 = 157.8464

Exercise 7.1.2B. (Change the Confidence level.) As in Exercise 7.1.2A, let X be the weight distribution of a group of students. The standard deviation σ = 9.8. To estimate the mean weight μ, a sample of size 14 was collected and the sample mean X was found to be 151.1.
Find a 90 percent confidence interval for μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 9.8,
the sample size n = 14,
the sample mean X = 151.1,
Level of confidence = 90 percent. So
1 - α = .90, α = .1 and α/2 =.05.
Therefore, z_α/2 = z_.05 = invNormal(.95)= 1.6449

MOE = E=z_α/2 σ /√ n =1.6449*9.8/√ 14= 4.3083

LEP = X - E = 151.1 - 4.3083 = 146.7917
REP = X + E = 151.1 + 4.3083 = 155.4083

Note that the length and/or MOE has decreased, due to the decline in confidence level.

Exercise 7.1.2C. (Change the sample size.) As in Exercise 7.1.2A, let X be the weight distribution of a group of students. The standard deviation σ = 9.8. To estimate the mean weight μ, a sample of size 78 was collected and the sample mean X was found to be 151.1.
Find a 90 percent confidence interval for μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 9.8,
the sample size n = 78,
the sample mean X = 151.1,
Level of confidence = 90 percent. So
1 - α = .90, α = .1 and α/2 =.05.
Therefore, z_α/2 = z_.05 = invNormal(.95)= 1.6449

MOE = E=z_α/2 σ /√ n =1.6449*9.8/√ 78= 1.8252

LEP = X - E = 151.1 - 1.8252 = 149.2748
REP = X + E = 151.1 + 1.8252 = 152.9252

Note that the length and/or MOE has decreased, due to the increase in sample size.
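
The effect of the confidence level and the sample size in Exercises 7.1.2A, 7.1.2B and 7.1.2C can be seen side by side with a short script. This is a sketch only; the helper name margin_of_error is our own.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, confidence):
    """MOE E = z_{alpha/2} * sigma / sqrt(n) for a Z-interval."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


sigma = 9.8
# (exercise label, sample size n, level of confidence)
cases = [("7.1.2A", 14, 0.99), ("7.1.2B", 14, 0.90), ("7.1.2C", 78, 0.90)]

results = {}
for label, n, confidence in cases:
    results[label] = margin_of_error(sigma, n, confidence)
    print(f"{label}: n = {n:2d}, confidence = {confidence:.2f}, MOE = {results[label]:.4f}")
```

Lowering the confidence level from 99 to 90 percent shrinks the MOE, and raising n from 14 to 78 shrinks it further.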

Exercise 7.1.3. The time taken by an athlete to run an event is normally distributed with mean μ and known standard deviation σ = 3.5 seconds. To estimate the mean μ, he ran 16 times and the sample mean was found to be X = 33 seconds. Find a 95 percent confidence interval for the true mean time μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 3.5,
the sample size n = 16,
the sample mean X = 33,
Level of confidence = 95 percent. So
1 - α = .95, α = .05 and α/2 =.025.
Therefore, z_α/2 = z_.025 = invNormal(1-.025)= 1.9600

MOE = E=z_α/2 σ /√ n =1.9600*3.5/√ 16= 1.715

LEP = X - E = 33 - 1.715 = 31.285
REP = X + E = 33 + 1.715 = 34.715

Exercise 7.1.4. Let X represent the monthly consumption of electricity (in KWH , in a winter month) by the households in a county. From past experience, it is known that the standard deviation is σ = 150 KWH. To estimate the mean monthly consumption μ, a sample of size 115 was collected and the sample mean X of the monthly consumption was found to be 874. Find a 98 percent confidence interval for μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 150,
the sample size n = 115,
the sample mean X = 874,
Level of confidence = 98 percent. So
1 - α = .98, α = .02 and α/2 =.01.
Therefore, z_α/2 = z_.01 = invNormal(1-.01)= 2.3263

MOE = E=z_α/2 σ /√ n =2.3263*150/√ 115= 32.5393

LEP = X - E = 874 - 32.5393 = 841.4607
REP = X + E = 874 + 32.5393 = 906.5393

Exercise 7.1.6. The monthly gas consumption by an individual in a city is normally distributed with mean μ and known standard deviation σ = 7.5 gallons. To estimate the mean μ, a sample of 33 individuals was collected and their mean consumption X was found to be 56 gallons. Find a 96 percent confidence interval for the true mean consumption μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 7.5,
the sample size n = 33,
the sample mean X = 56,
Level of confidence = 96 percent. So
1 - α = .96, α = .04 and α/2 =.02.
Therefore, z_α/2 = z_.02 = invNormal(1-.02)= 2.0537

MOE = E=z_α/2 σ /√n =2.0537*7.5/√ 33= 2.6813

LEP = X - E = 56 - 2.6813 = 53.3187
REP = X + E = 56 + 2.6813 = 58.6813

Exercise 7.1.7. The monthly cell phone minutes used by an individual in a city has a mean μ and known standard deviation σ = 350 minutes. To estimate the mean μ, a sample of 780 users was collected. The sample mean X was found to be 2222. Find a 97 percent confidence interval for true mean cellphone minutes μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 350,
the sample size n = 780,
the sample mean X = 2222 minutes,
Level of confidence = 97 percent. So
1 - α = .97, α = .03 and α/2 =.015.
Therefore, z_α/2 = z_.015 = invNormal(1-.015)= 2.1701

MOE = E=z_α/2 σ /√n =2.1701*350/√ 780= 27.1957

LEP = X - E = 2222 - 27.1957 = 2194.8043
REP = X + E = 2222 + 27.1957 = 2249.1957

Exercise 7.1.8. The diameter of pumpkins in a farm has a mean μ and known standard deviation σ = 13 inches. To estimate the mean μ, a sample of 460 pumpkins was collected. The sample mean X was found to be 29 inches. Find an 80 percent confidence interval for the true mean diameter μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 13,
the sample size n = 460,
the sample mean X = 29
Level of confidence = 80 percent. So
1 - α = .80, α = .20 and α/2 =.10
Therefore, z_α/2 = z_.10 = invNormal(1-.1)= 1.2816

MOE = E=z_α/2 σ /√n =1.2816*13/√ 460= .7768

LEP = X - E = 29 - .7768 = 28.2232
REP = X + E = 29 + .7768 = 29.7768

Exercise 7.1.9. The amount of time a student spends doing homework for a three credit math course has a mean μ and known standard deviation σ = 44 minutes. To estimate the mean μ, a sample of the homework times of 178 students was collected. The sample mean X was found to be 145 minutes. Find a 92 percent confidence interval for the true mean time μ and give the margin of error (MOE).

Solution by Long Hand Method:
Here the population standard deviation σ = 44,
the sample size n = 178,
the sample mean X = 145 minutes,
Level of confidence = 92 percent. So
1 - α = .92, α = .08 and α/2 =.04
Therefore, z_α/2 = z_.04 = invNormal(1-.04)= 1.7507

MOE = E=z_α/2 σ /√n =1.7507*44/√ 178= 5.7737

LEP = X - E = 145 - 5.7737 = 139.2263
REP = X + E = 145 + 5.7737 = 150.7737

7.2 Confidence interval for mean μ when σ is Unknown

As above, let X be a random variable with mean μ and standard deviation σ. As in the last section, we want to estimate the mean μ by confidence intervals. In the last section, we assumed that the standard deviation σ was known, which would not be the case for populations with which we have little familiarity or experience. In this section, we will deal with such populations, where the standard deviation σ is not known. In the last section, the main tool (or fact) that we used was that, by CLT, the sampling distribution of the statistic Z=(X-μ) √n/σ was approximately standard normal N(0,1). In this section, we use the distribution of a similar statistic

T=(X-μ) √n/S.

We now proceed to determine the sampling distribution of T.

Student's t distribution

Given a positive integer m, there is a random variable, denoted by t_m, that is said to have t distribution with degrees of freedom m. The distribution is also known as Student's t distribution, because it was discovered by a chemist W. S. Gosset, under the pseudonym "Student". The important properties of t distributions are listed below. A t random variable with degrees of freedom m would be denoted by T. We also say that the random variable T has a t_m-distribution.

Animation 7.2.1

A t_m random variable T is a continuous random variable. The graph of the pdf of a t_m random variable T is shown as follows:

The graph looks very similar to the graph of the pdf of the standard normal random variable Z. Like normal, the graph also has a bell shape. For an ordinary reader, it would not be easy to distinguish the graph from that of the standard normal random variable.
A t_m random variable T has mean zero (when m is at least 2).
The standard deviation σ of a t_m random variable (i.e. with degrees of freedom m) is given by σ = √(m/(m-2)), if m is at least 3.
The distribution T is symmetric around the y-axis (or around the mean zero).
As stated above, for each positive integer m, there is one Student's t random variable t_m, with degrees of freedom m. When the degrees of freedom m is small, the graph is flatter. For higher degrees of freedom m, the graph is more sharply peaked.
When m is large (say 30 or above), the pdf of a t_m random variable can be approximated by the standard normal random variable Z.

Inverse Probability for Student's t

Inverse probability for Student's t random variables is defined as was done for normal random variables. In particular, like z_α, we define t_{m, α} as follows.

Definition: Suppose T is a t-random variable with degrees of freedom m. Given a number 0 < α < 1, the number t_{m, α} is defined by the formula

P(t_{m, α} ≤ T) = α.
This is the same as P(T ≤ t_{m, α}) = 1 - α.

The invT function of TI-84 (Silver Edition) computes t_{m, α} as follows:
t_{m, α}=invT(1-α, m).
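
The Python standard library has no built-in counterpart of invT, so the sketch below computes t_{m, α} numerically: it integrates the t density with Simpson's rule and inverts the result by bisection. All function names (t_pdf, t_cdf, inv_t) are our own, and the numerical method is a stand-in chosen for illustration; it is not how the TI-84 computes invT. It is accurate enough for the degrees of freedom and confidence levels used in this lesson.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, m):
    """Density of Student's t with m degrees of freedom."""
    c = exp(lgamma((m + 1) / 2) - lgamma(m / 2)) / sqrt(m * pi)
    return c * (1 + t * t / m) ** (-(m + 1) / 2)


def t_cdf(t, m, steps=2000):
    """P(T <= t), by Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def inv_t(p, m):
    """Return the number t with P(T <= t) = p, for 0 < p < 1 (bisection)."""
    if p < 0.5:
        return -inv_t(1 - p, m)       # symmetry of the t distribution
    lo, hi = 0.0, 1.0
    while t_cdf(hi, m) < p:           # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, m) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# t_{m, alpha} = inv_t(1 - alpha, m), mirroring invT(1 - alpha, m).
print(round(inv_t(1 - 0.005, 17), 4))
print(round(inv_t(1 - 0.025, 8), 4))
```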

The following animation demonstrates these numbers visually.

Animation 7.2.2

Sampling Distribution of T

The following theorem describes the sampling distribution of the statistic T mentioned at the beginning of this section. This can be viewed as a counterpart of the CLT for T.

Theorem. Let X be a normal random variable with mean μ and standard deviation σ. Let X₁,X₂,…, X_n be a sample of size n from the X population. Then

T=(X-μ) √n/S.

has t-distribution with degrees of freedom n-1.

Therefore,

P(-t_{n-1, α/2} ≤ (X-μ) √n/ S ≤ t_{n-1, α/2} ) = 1 - α.

So,

P(X-E ≤ μ ≤ X+E)=1- α where E=t_{n-1, α/2} S /√n.

Confidence Interval for μ when σ is unknown

Just as the Z-interval was constructed for the mean μ in Section 7.1, the above computations lead to a confidence interval for the mean μ, which is useful when the standard deviation σ is not known. In this case, we have to assume that the population random variable X is normal.

Theorem. We use the notations as above. In particular X is assumed to be normal. Then a (1-α)100 percent confidence interval for μ is given by

X-E ≤ μ ≤ X+E where E=t_{n-1, α/2} S/√ n.

Informally, this confidence interval is also called a T-interval. As in the case of Z-intervals, we use the other terminologies like LEP, REP, length of the interval and Margin of Error (MOE). (We will not work with determination of the required sample size. )
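
The T-interval formula can be sketched in Python as follows. The t cut-off is computed numerically (Simpson's rule plus bisection), which is a stand-in for invT chosen for this illustration; the names t_pdf, t_cdf, inv_t and t_interval are our own. The inputs are those of Exercise 7.2.1 in the problem set of this section.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, m):
    """Density of Student's t with m degrees of freedom."""
    c = exp(lgamma((m + 1) / 2) - lgamma(m / 2)) / sqrt(m * pi)
    return c * (1 + t * t / m) ** (-(m + 1) / 2)


def t_cdf(t, m, steps=2000):
    """P(T <= t) for t >= 0, by Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, m) + t_pdf(t, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    return 0.5 + total * h / 3


def inv_t(p, m):
    """Number t >= 0 with P(T <= t) = p, for 0.5 <= p < 1 (bisection)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, m) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, m) < p else (lo, mid)
    return (lo + hi) / 2


def t_interval(xbar, s, n, confidence):
    """Return (df, LEP, REP, MOE) of the T-interval for the mean, sigma unknown."""
    alpha = 1 - confidence
    df = n - 1
    moe = inv_t(1 - alpha / 2, df) * s / sqrt(n)   # E = t_{n-1, alpha/2} * s / sqrt(n)
    return df, xbar - moe, xbar + moe, moe


df, lep, rep, moe = t_interval(xbar=170.5, s=13.3, n=18, confidence=0.99)
print(f"df = {df}, MOE = {moe:.4f}, LEP = {lep:.4f}, REP = {rep:.4f}")
```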

Remarks:

A common question would be: to estimate μ, when do we use the Z-Interval and when do we use the T-Interval? Use the T-Interval only when σ is not known. Not using σ when it is known would amount to disregarding information, which would not be a good philosophy in estimation.
Recall, for large n, t_{n-1} would be approximately equal to the standard normal Z. That is why, for large sample size n, using z_α/2 instead of t_{n-1, α/2} in the formula for E would not make any significant difference.

Optional Problems on 7.2: Probability of the T statistic

In this course, we avoid probability problems for T. The tcdf function of TI-84 (Silver Edition) makes it easy to compute probability for T. No such problems would be assigned in this case. But we would work one example.

Example: The salary X of the engineers in a state has mean μ = $65,000 and standard deviation σ. You collect a sample of 12 engineers. What is the probability that the T statistic of the sample will be above 1.1?

Solution by TI-84 Silver Edition:
Here T has degrees of freedom df = n-1 = 12-1 = 11

P(T > 1.1) ≈ P(1.1 < T < 10) = tcdf(1.1, 10, 11) = 0.1474
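
The same tail probability can be approximated in Python by integrating the t density numerically. This is a sketch only; t_pdf and t_cdf are our own helper names, and Simpson's rule is a stand-in for the calculator's tcdf function.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, m):
    """Density of Student's t with m degrees of freedom."""
    c = exp(lgamma((m + 1) / 2) - lgamma(m / 2)) / sqrt(m * pi)
    return c * (1 + t * t / m) ** (-(m + 1) / 2)


def t_cdf(t, m, steps=2000):
    """P(T <= t), by Simpson's rule on [0, |t|] plus symmetry about 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, m) + t_pdf(a, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


df = 12 - 1
between = t_cdf(10, df) - t_cdf(1.1, df)   # mirrors tcdf(1.1, 10, 11)
upper_tail = 1 - t_cdf(1.1, df)            # P(T > 1.1) without the cut-off at 10
print(round(between, 4), round(upper_tail, 4))
```

The two numbers agree to four decimals because the probability that T exceeds 10 is negligible for 11 degrees of freedom.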

Problem Solving: In this section, we will compute confidence interval (T-Intervals) for μ, when σ is unknown.
As for Z-intervals, the TI-84 has a method that essentially computes the T-interval. We will use the formulas (by a "Long Hand Method"), with the help of the invT function of the TI-84. The method will be clear from the solutions of the problems given below.

There will be two types of problems: (1) the values of the statistics (x and s) will be given, or (2) raw data will be given instead. In the latter case, we have to compute x and s before we can use the formulas.

Problems on 7.2: T-intervals for mean: σ Unknown

Exercise 7.2.1. The mean weight μ of a campus population is to be estimated. It would be reasonable to assume that weight X has a normal distribution. A sample of size n = 18 was collected. The sample mean was found to be x = 170.5 and the standard deviation was s = 13.3. Compute a 99 percent confidence interval for μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 18,
the sample mean X = 170.5,
Sample standard deviation s = 13.3

The degrees of freedom df = n-1 = 18-1 = 17

Level of confidence = 99 percent.
1 - α = .99, α = .01 and α/2 =.005. Therefore,
t_{n-1, α/2} = t_{17, .005} = invT(1 - .005, 17)= 2.8982

MOE = E= t_{n-1, α/2} S /√n = 2.8982 * 13.3 /√18 = 9.0854

LEP = X - E = 170.5 - 9.0854 = 161.4146
REP = X + E = 170.5 + 9.0854 = 179.5854

Exercise 7.2.2. The time taken to complete a problem in a Math 105 test is normally distributed with mean μ and standard deviation σ. A sample of size 23 was taken, and sample mean and standard deviation were found to be x = 4.7 and s = .47. Estimate the mean time μ taken to complete a problem using a 98 percent confidence interval.

Solution by Long Hand Method:
Since the population standard deviation σ is unknown,we will compute a T-interval.

Here the sample size n = 23,
the sample mean X = 4.7,
Sample standard deviation s = .47

The degrees of freedom df = n-1 = 23-1 = 22

Level of confidence = 98 percent.
1 - α = .98, α = .02 and α/2 =.01. Therefore,
t_{n-1, α/2} = t_{22, .01} = invT(1 - .01, 22) = 2.5083

MOE = E= t_{n-1, α/2} S /√n= 2.5083* .47 /√23 = .2458

LEP = X - E = 4.7 - .2458 = 4.4542
REP = X + E = 4.7 + .2458 = 4.9458

Exercise 7.2.3. To estimate the mean length μ of babies at birth a sample of size n = 16 was taken. The sample mean was found to be x = 18.6 and standard deviation was s = 9.486. Construct a 96 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
It is reasonable to assume that the length X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 16,
the sample mean X = 18.6,
Sample standard deviation s = 9.486

The degrees of freedom df = n-1 = 16 - 1 = 15

Level of confidence = 96 percent.
1 - α = .96, α = .04 and α/2 =.02. Therefore,
t_{n-1, α/2} = t_{15, .02} = invT(1 - .02, 15) = 2.2485

MOE = E= t_{n-1, α/2} S /√n= 2.2485*9.486 /√16 = 5.3323

LEP = X - E = 18.6 - 5.3323 = 13.2677
REP = X + E = 18.6 + 5.3323 = 23.9323

Exercise 7.2.4. The time taken by an athlete to run an event is normally distributed with mean μ and unknown standard deviation σ. To estimate the mean μ he ran 9 times and the sample mean was found to be X = 33 seconds and the sample standard deviation s = 3.5 seconds. Construct a 95 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
It is reasonable to assume that the time X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 9,
the sample mean X = 33,
Sample standard deviation s =3.5

The degrees of freedom df = n-1 = 9 - 1 = 8

Level of confidence = 95 percent.
1 - α = .95, α = .05 and α/2 =.025. Therefore,
t_{n-1, α/2} = t_{8, .025} = invT(1 - .025, 8) = 2.3060

MOE = E= t_{n-1, α/2} S /√n= 2.3060 *3.5 /√9 = 2.6903

LEP = X - E = 33 - 2.6903 = 30.3097
REP = X + E = 33 + 2.6903 = 35.6903

Exercise 7.2.5. The mean length μ of telephone calls has to be estimated. A sample of 14 calls had a sample mean X = 13 minutes and the sample standard deviation was s = 5.5 minutes. Construct a 90 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
It is reasonable to assume that the time length X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 14,
the sample mean X = 13,
Sample standard deviation s = 5.5

The degrees of freedom df = n-1 = 14 - 1 = 13

Level of confidence = 90 percent.
1 - α = .90, α = .10 and α/2 =.05. Therefore,
t_{n-1, α/2} = t_{13, .05} = invT(1 - .05, 13) = 1.7709

MOE = E= t_{n-1, α/2} S /√ n = 1.7709*5.5 /√14 = 2.6031

LEP = X - E = 13 - 2.6031 = 10.3969
REP = X + E = 13 + 2.6031 = 15.6031

Exercise 7.2.6. The mean weight μ of a variety of bananas in a grocery store has to be estimated. A sample of 20 bananas was collected. The mean weight was found to be x = 180 grams and the sample standard deviation was s = 27 grams. Construct an 85 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
It is reasonable to assume that the weight X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 20,
the sample mean X = 180,
Sample standard deviation s = 27

The degrees of freedom df = n-1 = 20 - 1 = 19

Level of confidence = 85 percent.
1 - α = .85, α = .15 and α/2 =.075. Therefore,
t_{n-1, α/2} = t_{19, .075} = invT(1 - .075, 19) = 1.5002

MOE = E= t_{n-1, α/2} S /√ n = 1.5002*27 /√ 20 = 9.0573

LEP = X - E = 180 - 9.0573 = 170.9427
REP = X + E = 180 + 9.0573 = 189.0573

Exercise 7.2.7. It is assumed that the lifetime (in hours) of lightbulbs produced in a factory is normally distributed with mean μ and standard deviation σ. To estimate μ the following data was collected on the lifetime of bulbs.

5110	4671	6441	3331	5055	5270	5335	4973	1837
7783	4560	6074	4777	4707	5263	4978	5418	5123

Construct a 92 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
It is reasonable to assume that the time X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

As in Lesson 2, enter data in a LIST of your TI to compute mean X and sample standard deviation S:
Here the sample size n = 18,
the sample mean X = 5039.2222,
Sample standard deviation s = 1203.8025

The degrees of freedom df = n-1 = 18 - 1 = 17

Level of confidence = 92 percent.
1 - α = .92, α = .08 and α/2 =.04. Therefore,
t_{n-1, α/2} = t_{17, .04} = invT(1 - .04, 17) = 1.8619

MOE = E= t_{n-1, α/2} S /√n= 1.8619 *1203.8025 /√18 = 528.2936

LEP = X - E = 5039.2222 - 528.2936 = 4510.9286
REP = X + E = 5039.2222 + 528.2936 = 5567.5158
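
When raw data are given, the whole computation of Exercise 7.2.7 can be reproduced in Python. This is a sketch: mean and stdev from the standard statistics module play the role of the calculator's LIST statistics, and the t cut-off is found numerically (Simpson's rule plus bisection) as a stand-in for invT. The helper names t_pdf, t_cdf and inv_t are our own.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, stdev

# Lifetimes (in hours) from Exercise 7.2.7.
data = [5110, 4671, 6441, 3331, 5055, 5270, 5335, 4973, 1837,
        7783, 4560, 6074, 4777, 4707, 5263, 4978, 5418, 5123]


def t_pdf(t, m):
    """Density of Student's t with m degrees of freedom."""
    c = exp(lgamma((m + 1) / 2) - lgamma(m / 2)) / sqrt(m * pi)
    return c * (1 + t * t / m) ** (-(m + 1) / 2)


def t_cdf(t, m, steps=2000):
    """P(T <= t) for t >= 0, by Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, m) + t_pdf(t, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    return 0.5 + total * h / 3


def inv_t(p, m):
    """Number t >= 0 with P(T <= t) = p, for 0.5 <= p < 1 (bisection)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, m) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, m) < p else (lo, mid)
    return (lo + hi) / 2


n = len(data)
xbar = mean(data)        # sample mean
s = stdev(data)          # sample standard deviation (divides by n - 1)
alpha = 1 - 0.92
moe = inv_t(1 - alpha / 2, n - 1) * s / sqrt(n)

print(f"n = {n}, mean = {xbar:.4f}, s = {s:.4f}, df = {n - 1}")
print(f"MOE = {moe:.4f}, LEP = {xbar - moe:.4f}, REP = {xbar + moe:.4f}")
```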

Exercise 7.2.8. To estimate the mean weight (in pounds) of salmon in a river the following sample was collected:

34.7	33.8	38.2	20.3	27.8	45.3	43.1	37.3	32.5	32.3
31.8	41.5	44.5	29.2	25.3	29.6	39.5	29.1	37.3

Construct a 97 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
It is reasonable to assume that the weight X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

As in Lesson 2, enter data in a LIST of your TI to compute mean X and sample standard deviation S:
Here the sample size n = 19,
the sample mean X = 34.3737,
Sample standard deviation s = 6.7608

The degrees of freedom df = n-1 = 19 - 1 = 18

Level of confidence = 97 percent.
1 - α = .97, α = .03 and α/2 =.015. Therefore,
t_{n-1, α/2} = t_{18, .015} = invT(1 - .015, 18) = 2.3562

MOE = E= t_{n-1, α/2} S /√n= 2.3562*6.7608 /√19 = 3.6545

LEP = X - E = 34.3737 - 3.6545 = 30.7192
REP = X + E = 34.3737 + 3.6545 = 38.0282

Exercise 7.2.9. The following data represents the time (in minutes) taken by students to drive to campus.

5.7	22.4	14.7	16.7	10.6	26.9	17.9	32.5	20.9	9.2
3.4	20.4	17.3	31.9	4.4	23.7

The mean driving time μ has to be estimated. Construct a 98 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution by Long Hand Method:
It is reasonable to assume that the driving time X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

As in Lesson 2, enter data in a LIST of your TI to compute mean X and sample standard deviation S:
Here the sample size n = 16,
the sample mean X = 17.4125,
Sample standard deviation s = 9.0843

The degrees of freedom df = n-1 = 16 - 1 = 15

Level of confidence = 98 percent.
1 - α = .98, α = .02 and α/2 =.01. Therefore,
t_{n-1, α/2} = t_{15, .01} = invT(1 - .01, 15) = 2.6025

MOE = E= t_{n-1, α/2} S /√n= 2.6025*9.0843 /√16 = 5.9104

LEP = X - E = 17.4125 - 5.9104 = 11.5021
REP = X + E = 17.4125 + 5.9104 = 23.3229
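
The two solutions above rely on the TI-84 function invT. For readers without the calculator, here is one possible way to approximate the same inverse probability with the Python standard library alone. This is a sketch under stated assumptions, not the method the calculator uses: it integrates the Student's t density numerically with Simpson's rule and then searches for the critical value by bisection. The helper names t_pdf, t_cdf and inv_t, the step count and the search bound of 50 are all illustrative choices.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|], then symmetry about 0."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps                     # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3              # approximates P(0 <= T <= |t|)
    return 0.5 + area if t > 0 else 0.5 - area

def inv_t(p, df, upper=50.0):
    """Bisection for t with P(T <= t) = p; assumes 0.5 <= p < 1 and t < upper."""
    lo, hi = 0.0, upper
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(inv_t(1 - 0.015, 18), 4))   # compare with invT(1 - .015, 18)
print(round(inv_t(1 - 0.01, 15), 4))    # compare with invT(1 - .01, 15)
```

The two printed numbers can be compared with the critical values used in Exercises 7.2.8 and 7.2.9.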

7.3 About the Population Proportion

In this section, we estimate the population proportion p of an attribute by a confidence interval. Examples include

the proportion p of defective lamps produced in a factory,
the proportion p of seeds of a given variety that will germinate under a particular condition,
the proportion p of voters in favor of a candidate, and others.

To estimate such a proportion p, we use the sampling distribution of the sample proportion of success X, defined below. Let

Y=1 if success
Y=0 if failure

where "success" means that the sampled item has the attribute. Y is a Bernoulli(p) random variable.

To estimate p, we take a sample X₁, X₂, …, X_n of size n from this Bernoulli(p) population, where X_i records the outcome for the i^th sampled item (for example, the i^th lamp tested) as follows:

X_i=1 if the i^th sample is a success
X_i=0 if the i^th sample is a failure.

Then, T = X₁+X₂+… +X_n = the total Number of Success in these n trials

and the sample mean

X =T/n = the Sample Proportion of Success in these n trials.

It follows from the CLT (Lesson 6) that, when the sample size n is large and p is not very close to 0 or 1, the distribution of X is approximately normal with

mean μ_X = p and standard deviation σ_X = √(p(1-p)/n).

Therefore (approximately),

P(-z_{α/2} ≤ (X-p)/σ_X ≤ z_{α/2}) = 1-α.

To obtain a confidence interval for p, we rearrange the inequalities and get

P(X - z_{α/2} σ_X ≤ p ≤ X + z_{α/2} σ_X) = 1-α.

Since p is unknown, we cannot compute σ_X, so this does not yet give a confidence interval for p.

Instead, we use the observed sample proportion x of success as a point estimate of p to get the approximations

σ_X ≈ √(x(1-x)/n) and P(X - z_{α/2} √(x(1-x)/n) ≤ p ≤ X + z_{α/2} √(x(1-x)/n)) ≈ 1-α.
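
The normal approximation used above can be checked empirically. The short simulation below is only an illustration: the values p = 0.3, n = 200 and 5000 repetitions are hypothetical choices, not taken from the lesson. It draws many samples from a Bernoulli(p) population, records the sample proportion of each, and compares the mean and standard deviation of those proportions with p and √(p(1-p)/n). It also counts how often the sample proportion lands within z_{α/2} σ_X of p.

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(0)                  # fixed seed so the run is repeatable

p, n, reps = 0.3, 200, 5000     # hypothetical values chosen for illustration
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
sigma = math.sqrt(p * (1 - p) / n)        # theoretical sigma_X

# Each repetition draws n Bernoulli(p) outcomes and records the sample proportion.
props = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

mean_props = statistics.fmean(props)      # should be close to p
sd_props = statistics.pstdev(props)       # should be close to sigma
coverage = sum(abs(x - p) <= z * sigma for x in props) / reps

print(f"mean of sample proportions = {mean_props:.4f} (p = {p})")
print(f"sd of sample proportions   = {sd_props:.4f} (theory = {sigma:.4f})")
print(f"fraction within z*sigma    = {coverage:.3f} (target = {1 - alpha})")
```

Because a sample proportion can only take the values 0/n, 1/n, …, n/n, the observed fraction is close to 1-α rather than exactly equal to it.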

Confidence Interval for Population Proportion p

The following theorem follows from the above computations.

Theorem. We use the notation above. Then an approximate (1-α)100 percent confidence interval for p is given by

x-e ≤ p ≤ x+e where e = z_{α/2} √(x(1-x)/n)

Further, the following are some important concepts and comments:

The margin of error (MOE) e is defined as
e = z_{α/2} √(x(1-x)/n).
A conservative margin of error (Conservative MOE) E is defined as
E = z_{α/2}/√(4n).

It can be checked that

e ≤ E and also z_{α/2} σ_X = z_{α/2} √(p(1-p)/n) ≤ E,
because x(1-x) ≤ 1/4 for every x between 0 and 1 (and likewise p(1-p) ≤ 1/4).
This means that E is an upper bound for the estimated error e and also for the quantity z_{α/2} σ_X that e approximates. That is why the Conservative MOE E is taken more seriously than the ordinary MOE e.
Informally, this confidence interval is also called a 1-Proportion Z-interval. As before, we will use terminology and abbreviations like LEP, REP, MOE and Conservative MOE.
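
The formulas in the theorem and the comments above can be collected into one small routine. The following Python sketch is an illustration only; the function name one_proportion_z_interval and the dictionary keys are made up for this example. It uses NormalDist().inv_cdf from the standard library in place of the calculator's inverse normal function.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence):
    """Approximate confidence interval for p from the number of successes in n trials."""
    x = successes / n                               # sample proportion of success
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)         # z_{alpha/2}
    e = z * math.sqrt(x * (1 - x) / n)              # margin of error (MOE)
    big_e = z / math.sqrt(4 * n)                    # conservative MOE
    return {"x": x, "z": z, "moe": e,
            "lep": x - e, "rep": x + e,
            "conservative_moe": big_e}

# Usage with the data of Exercise 7.3.1: 19 sour apples out of 197, 99 percent.
result = one_proportion_z_interval(19, 197, 0.99)
for name, value in result.items():
    print(f"{name} = {value:.6f}")
```

The sketch does not round x before computing e, whereas the worked solutions round x to four decimals first, so the last digits can differ slightly from the long-hand answers.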

An Example to interpret Conservative MOE

President Clinton was impeached in 1998-99 by the House of Representatives and acquitted by the Senate. During the impeachment trial, a typical news item would read as follows:

President Clinton has 64 percent approval rating. The poll has a margin of error plus or minus 3.1 percentage points. The poll surveyed 972 people.

This meant the following, with p as the proportion of the population who "approved" President Clinton:

(A point estimate): the sample proportion x of people who "approve" of President Clinton was 0.64.
So, a point estimate of p is .64.
Such polls usually use a 95 percent confidence level.
Assuming that they are using a 95 percent confidence interval, they mean that

E = z_{α/2}/√(4n) = z_{.025}/√(4n) = 1.96/√(4×972) = 0.031.
Interestingly, they give both the Conservative MOE and the sample size, although (once the confidence level is fixed) each can be computed from the other.
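
The relation between the Conservative MOE and the sample size can be made concrete in both directions. In the Python sketch below, the function names conservative_moe and sample_size_for are hypothetical, and the 95 percent level is the same assumption made in the text. Solving E = z_{α/2}/√(4n) for n gives n = (z_{α/2}/(2E))².

```python
import math
from statistics import NormalDist

def conservative_moe(n, confidence=0.95):
    """Conservative MOE E = z_{alpha/2} / sqrt(4n) for a sample of size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z / math.sqrt(4 * n)

def sample_size_for(big_e, confidence=0.95):
    """Sample size n = (z_{alpha/2} / (2E))^2 implied by a conservative MOE E."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z / (2 * big_e)) ** 2

E = conservative_moe(972)       # the poll surveyed 972 people
n_back = sample_size_for(E)     # recover the sample size from E

print(f"E = {E:.4f}")
print(f"n recovered from E = {n_back:.1f}")
```

In practice a published margin such as 3.1 percentage points is already rounded, so a sample size recovered from it is only approximate.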

Problems on 7.3: On 1-Proportion Z-intervals and Conservative MOE

Exercise 7.3.1. In a sample of 197 apples from a lot, 19 were found to be sour. Construct a 99 percent confidence interval for the proportion p of sour apples in the lot.

Solution:
Here the sample size n = 197,
The number of success T = 19
So, the sample proportion of success X = T/n = 19/197 = .0964

Level of confidence = 99 percent. So
1 - α = .99, α = .01 and α/2 = .005.
Therefore, z_{α/2} = z_{.005} = invNormal(.995) = 2.5758

MOE = e = z_{α/2} √(x(1-x)/n) = 2.5758*√(.0964(1-.0964)/197) = .054163 [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .0964 - .054163 = .042237
REP = X + e = .0964 + .054163 = .150563
Conservative MOE = E = z_{α/2}/√(4n) = 2.5758/√(4*197) = .091759

Exercise 7.3.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 97 of them developed immunity. Find a 95 percent confidence interval for the proportion p of individuals in the population for whom the vaccine would help. Give the MOE, LEP, REP and the conservative MOE.

Solution:
Here the sample size n = 147,
The number of success T = 97
So, the sample proportion of success X = T/n = 97/147 = .6599

Level of confidence = 95 percent. So
1 - α = .95, α = .05 and α/2 =.025.
Therefore, z_α/2 = z_.025 = invNormal(.975) = 1.9600

MOE = e = z_{α/2} √(x(1-x)/n) = 1.9600*√(.6599(1-.6599)/147) = .076584

LEP = X - e = .6599 - .076584 = .583316
REP = X + e = .6599 + .076584 = .736484
Conservative MOE = E = z_{α/2}/√(4n) = 1.9600/√(4*147) = .080829

Exercise 7.3.3. Before a congressional election, a poll was conducted. Out of 887 randomly selected voters interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.

Construct a 98 percent confidence interval for the proportion p of voters who would vote for A.
Construct a 98 percent confidence interval for the proportion p of voters who would vote for B.
What is the conservative margin of error for both?

Solution:
Here the sample size n = 887,
Level of confidence = 98 percent. So
1 - α = .98, α = .02 and α/2 =.01.
Therefore, z_α/2 = z_.01 = invNormal(.99) = 2.3263

For Candidate A:
The number of success T = 389
So, the sample proportion of success X = T/n = 389/887 = .4386
MOE = e = z_{α/2} √(x(1-x)/n) = 2.3263*√(.4386(1-.4386)/887) = .038759

LEP = X - e = .4386 - .038759 = .399841
REP = X + e = .4386 + .038759 = .477359
Conservative MOE = E = z_{α/2}/√(4n) = 2.3263/√(4*887) = .039055

For Candidate B:
The number of success T = 359
So, the sample proportion of success X = T/n = 359/887 = .4047
MOE = e = z_{α/2} √(x(1-x)/n) = 2.3263*√(.4047(1-.4047)/887) = .038339

LEP = X - e = .4047 - .038339 = .366361
REP = X + e = .4047 + .038339 = .443039
Conservative MOE = same, because the formula only depends on the sample size and level of confidence

Exercise 7.3.4. A researcher wants to estimate the proportion p of children younger than two in a community who have anemia. The researcher examined 180 children and found 42 with anemia. Compute a 97 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.

Solution:
Here the sample size n = 180,
The number of success T = 42
So, the sample proportion of success X = T/n = 42/180 = .2333

Level of confidence = 97 percent. So
1 - α = .97, α = .03 and α/2 =.015.
Therefore, z_α/2 = z_.015 = invNormal(.985) = 2.1701

MOE = e = z_{α/2} √(x(1-x)/n) = 2.1701*√(.2333(1-.2333)/180) = .068409

LEP = X - e = .2333 - .068409 = .164891
REP = X + e = .2333 + .068409 = .301709
Conservative MOE = E = z_{α/2}/√(4n) = 2.1701/√(4*180) = .080875

Exercise 7.3.5. The proportion p of voters (in a county) who vote by absentee ballot is to be estimated. You sampled 683 voters and found that 165 would have voted by absentee ballot. Compute a 96 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.

Solution:
Here the sample size n = 683,
The number of success T = 165
So, the sample proportion of success X = T/n = 165/683 = .2416

Level of confidence = 96 percent. So
1 - α = .96, α = .04 and α/2 =.02.
Therefore, z_α/2 = z_.02 = invNormal(.98) = 2.0537

MOE = e = z_{α/2} √(x(1-x)/n) = 2.0537*√(.2416(1-.2416)/683) = .033638

LEP = X - e = .2416 - .033638 = .207962
REP = X + e = .2416 + .033638 = .275238
Conservative MOE = E = z_{α/2}/√(4n) = 2.0537/√(4*683) = .039291

Exercise 7.3.6. It is known that a vaccine may cause fever as a side effect after one takes the shot. The proportion p of those who suffer the side effect is to be estimated. A sample of 913 individuals was examined, and it was found that 153 suffered the side effect. Compute a 94 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.

Solution:
Here the sample size n = 913,
The number of success T = 153
So, the sample proportion of success X = T/n = 153/913 = .1676

Level of confidence = 94 percent. So
1 - α = .94, α = .06 and α/2 =.03.
Therefore, z_α/2 = z_.03 = invNormal(.97) = 1.8808

MOE = e = z_{α/2} √(x(1-x)/n) = 1.8808*√(.1676(1-.1676)/913) = .023249

LEP = X - e = .1676 - .023249 = .144351
REP = X + e = .1676 + .023249 = .190849
Conservative MOE = E = z_{α/2}/√(4n) = 1.8808/√(4*913) = .031123

Exercise 7.3.7. The proportion p of those on campus who took a flu shot is to be estimated. A sample of 733 individuals was examined, and it was found that 220 among them took the shot. Compute a 93 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.

Optional Problems on 7.3: Interpretation

Exercise 7.3.8. In a poll released on October 28, 1998, it was revealed that 60 percent of the US population wanted President Clinton to be rebuked but not impeached. The poll was conducted among 1,013 adults, and it had a margin of error of 3 percentage points.
Can you relate the last two numbers?
What is the level of confidence used here?

Solution: News media polls usually use 95 percent confidence intervals. When they say "margin of error," they mean "conservative margin of error." The conservative margin of error E and level of confidence 1 - α are related by the formula E = z_{α/2}/√(4n).
For this problem E = .03, 1 - α = .95, and n = 1,013. We can check z_{α/2}/√(4n) = 1.96/√(4×1013) = 0.03079, which rounds to the reported 3 percentage points.
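
The second question can also be approached in the reverse direction: instead of assuming 95 percent and checking E, one can solve E = z_{α/2}/√(4n) for z_{α/2} and read off the confidence level it implies. The Python sketch below does this with the standard library. The function name implied_confidence is hypothetical, and the calculation treats the reported 3 percentage points as if it were exact, which is an assumption.

```python
import math
from statistics import NormalDist

def implied_confidence(big_e, n):
    """Confidence level 1 - alpha for which z_{alpha/2} / sqrt(4n) equals big_e."""
    z = big_e * math.sqrt(4 * n)          # solve E = z / sqrt(4n) for z
    return 2 * NormalDist().cdf(z) - 1    # P(-z <= Z <= z) for standard normal Z

conf = implied_confidence(0.03, 1013)
print(f"implied level of confidence = {conf:.4f}")
```

With the rounded margin as input, the implied level comes out slightly below .95, which is consistent with a 95 percent interval whose margin of error was rounded to a whole percentage point.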
