Math 365, Elementary Statistics

 

Lesson 7: Estimation

Satya Mandal

Due Date: Visit the homework site.

Introduction

The objective of this course has been to develop methods that use sample statistics T (e.g. the sample mean X, the sample standard deviation S, the sample proportion of success X) to estimate population parameters θ (e.g. the mean μ, the standard deviation σ, the population proportion p). We are now ready to do so. We will use the sampling distributions of the sample mean and the sample proportion of success discussed in Lesson 6 (and the sampling distributions of other statistics) to develop methods for estimating the corresponding population parameters. As a particular example, we use the sample mean X as an estimate for the population mean μ.

We consider two methods of estimating parameters.

  1. The first one is called point estimation. In point estimation, a number t is given as an estimate for the parameter θ. Most people are used to the concept of point estimation. For example, to estimate the mean annual income μ of the US population, one is used to the idea of taking a sample of a certain size, computing the sample mean annual income x, and calling it an estimate for μ.
  2. The second one is called interval estimation. In interval estimation, an interval (L, U) is given as a range within which the parameter θ is estimated to lie. Most of the interval estimates we consider consist of (1) a point estimate t for θ and (2) an estimate e for the error (precision) of this estimation. Correspondingly, θ is estimated to be within the interval (t-e, t+e). For example, when estimating the mean annual income μ of the US population, a statistician may take a sample, compute the sample mean x, and say that the population mean μ is estimated to be within the interval (x-1000, x+1000). Obviously, a smaller error (i.e. higher precision) is always more desirable. Statistical estimation always comes with information about how often the actual error ε=|θ-T| remains within a specified error limit e. This is given in terms of the probability P(ε ≤ e), which will be called the level of confidence in the later sections. Ideally, we would like to estimate θ by T with high precision (i.e. low error) and a high level of confidence.


7.1 Point and Interval Estimation

In statistical point estimation, a statistic T is used to estimate a parameter θ. Corresponding to a sample, the value t of T would be called a point estimate of θ. The statistic T would be called an estimator of θ.

For example, the sample mean X would be an estimator of the population mean μ and the computed value x, corresponding to a sample, would be a point estimate of μ. Similarly, the sample variance S² is an estimator of the population variance σ² and the computed value s² of S², corresponding to a sample, is a point estimate of σ².

It is a common instinct to accept that the sample mean X is a natural estimator for the population mean μ and that the sample variance S² is a natural estimator for the population variance σ². There are some established mathematical criteria to justify why or when a statistic T is a good candidate as an estimator for a population parameter θ. The most basic such criterion is called unbiasedness, as defined below.

Definition. A statistic T is said to be an unbiased estimator of a population parameter θ if the mean (expected value) of T satisfies E(T) = θ. In fact, it can be proved that

E(X) = μ,       E(S²) = σ².

Therefore, the sample mean X is an unbiased estimator of the population mean μ and the sample variance S² is an unbiased estimator of the population variance σ².
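The unbiasedness statement can be checked informally by simulation. The following is a minimal sketch, not part of the lesson's TI-84 workflow: the function name and the values mu = 10, sigma = 2, n = 5 are illustrative assumptions. It averages the sample mean and the sample variance (with divisor n-1) over many simulated normal samples.

```python
import random
import statistics

def average_of_estimates(mu, sigma, n, reps, seed=0):
    """Average the sample mean and sample variance over many simulated samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))  # sample variance, divides by n - 1
    return statistics.fmean(means), statistics.fmean(variances)

# Hypothetical population: mu = 10, sigma = 2, so sigma^2 = 4.
avg_mean, avg_var = average_of_estimates(mu=10.0, sigma=2.0, n=5, reps=5000)
print(avg_mean, avg_var)  # should land near 10 and near 4
```

The averages approach μ and σ² as the number of repetitions grows, which is what E(X) = μ and E(S²) = σ² predict.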

As was stated in Lesson 6, the standard deviation of X, denoted by σX, is given by

σX = σ/√n.

So the standard deviation σX of X decreases to zero as the sample size n increases. Because of this, as was demonstrated in Animation 6.1.1, the values of X fall close to the mean μ more frequently when the sample size n is larger. This is another justification for using the sample mean X as an estimator of the mean μ. In the next subsection, this property will factor into the so-called confidence level.
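As a rough numerical illustration of this shrinking spread, the sketch below simulates many sample means and compares their standard deviation with σ divided by the square root of n. The function name, the choice σ = 15 and the sample sizes are assumptions made only for this illustration.

```python
import math
import random
import statistics

def sd_of_sample_mean(sigma, n, reps=2000, seed=1):
    """Estimate the standard deviation of the sample mean by simulation."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.gauss(0.0, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

# Compare the simulated value with sigma / sqrt(n) for growing n.
for n in (4, 25, 100):
    print(n, round(sd_of_sample_mean(15.0, n), 3), round(15.0 / math.sqrt(n), 3))
```

The two printed columns should agree closely, and both shrink as n grows.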

Interval Estimation

A point estimate t of a parameter θ is almost never exactly equal to the actual value of θ. This is why it is more reasonable to provide an interval (L,U) within which the parameter θ is estimated to lie. Here L and U are statistics, and their values L = l, U = u depend on the sample. While θ is estimated to be within (l, u), for some samples the true value of θ will be left outside the interval (l,u). We are satisfied as long as the true value of θ falls within the interval (l,u) most often (or often enough), allowing the possibility of being "wrong" a few times.

The next question is: how often is often enough? The probability P(L ≤ θ ≤ U) is exactly the proportion of times that θ falls within the interval (l,u). The interval estimation of θ, formally defined below, provides an interval (L,U) together with the probability P(L ≤ θ ≤ U).

Definition. Let θ be a population parameter. An interval estimate for θ provides the following:

  1. It gives an interval (L,U) as an estimate for θ. Here L,U are statistics or L=-∞ or U=∞.
  2. It also gives the probability P(L   ≤   θ   ≤   U). This number P(L   ≤   θ   ≤   U) is called the level of confidence. It is often written as

    P(L   ≤   θ   ≤   U) = 1- α.

  3. The interval (L,U) is said to be a (1-α)100 percent confidence interval of θ.
  4. In practice, α will be a small number, like 0.1, 0.05 or 0.01. That means 90, 95 or 99 percent confidence intervals of θ are commonly considered.
  5. Three different types of confidence intervals are commonly considered.
    1. When both L and U are statistics, the confidence interval (L,U) would be called a two sided confidence interval of θ.
    2. When L=-∞ and U is a statistic, the confidence interval (L,U) =(-∞, U) would be called a one sided left confidence interval of θ.
    3. When L is a statistic and U=∞, the confidence interval (L,U) =(L,∞) would be called a one sided right confidence interval of θ.
    In this course, we only consider two sided confidence intervals.

Procedure to Construct Confidence Interval

In this subsection, we will construct a two sided confidence interval of the population mean μ. Up to Lesson 6, we (mostly) computed probabilities, while the values of parameters like the mean μ were given. The situation reverses now. In keeping with the goal of this course, the values of parameters like the mean μ now have to be estimated, and the probability is given (or specified) in terms of the level of confidence. That is why some standard inverse probability (or cut-off) numbers are necessary. In this section, the following definition of cut-off numbers for the standard normal random variable Z will be needed.

Definition: As usual Z would denote the standard normal random variable. Given a number 0 < α < 1, the number zα is defined by the formula

P(zα ≤ Z) = α,          which is the same as          P(Z ≤ zα) = 1 - α.

In fact, the invNormal function of TI-84 computes zα as follows:
zα=invNormal(1-α).

As mentioned above, for us α will be a small number .1, .01, .05 and so on. The following animation demonstrates these numbers visually.

Animation 7.1.1
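For readers without a TI-84, the same cut-off numbers can be reproduced with Python's standard library. This is a small sketch, assuming Python 3.8 or later; the helper name z_alpha is ours and is not a TI-84 function.

```python
from statistics import NormalDist

def z_alpha(alpha):
    """Return z_alpha, the number with P(Z <= z_alpha) = 1 - alpha."""
    return NormalDist().inv_cdf(1 - alpha)

# The values of alpha/2 that appear most often in this lesson.
for a in (0.05, 0.025, 0.005):
    print(a, round(z_alpha(a), 4))
```

Here NormalDist() with no arguments is the standard normal distribution, and inv_cdf plays the role of the calculator's inverse normal function.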

The procedure followed: A confidence interval for μ when σ is known

To explain the procedure to develop a confidence interval (of any parameter θ), we will construct a (1-α)100 percent confidence interval for the mean μ when the standard deviation σ is known.

Suppose X is a random variable with mean μ and known standard deviation σ. Let X1, X2, …, Xn be a sample from X. In the formulas below, X denotes the sample mean of X1, X2, …, Xn. By the CLT, approximately,

Z = (X-μ)√n/σ          has a standard normal distribution.

Therefore, approximately,

P(-zα/2 ≤ Z ≤ zα/2) = 1 - α               where           Z = (X-μ)√n/σ.

Therefore, approximately,

P(-zα/2    ≤   (X-μ)√n/σ    ≤    zα/2) = 1 - α.

If we simplify, we get

P(X-E    ≤    μ    ≤    X+E) = 1 - α      where      E = zα/2 σ/√n.
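The simplification step can be spelled out as follows. This is only a restatement of the algebra, written with the sample mean as \bar{X} and with E equal to z_{α/2} times σ over the square root of n; no new assumption is introduced.

```latex
\begin{aligned}
-z_{\alpha/2} \le \frac{(\bar{X}-\mu)\sqrt{n}}{\sigma} \le z_{\alpha/2}
  &\iff -z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \bar{X}-\mu \le z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
  &\iff -E \le \bar{X}-\mu \le E \\
  &\iff -E \le \mu-\bar{X} \le E \\
  &\iff \bar{X}-E \le \mu \le \bar{X}+E .
\end{aligned}
```

The first line multiplies through by the positive quantity σ/√n, the third multiplies by -1 (which leaves a symmetric pair of bounds unchanged), and the last adds the sample mean throughout. Since the events are the same, they have the same probability 1 - α.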

According to the definition above, this equation is precisely what is needed to define a (1-α)100 percent confidence interval for μ. We state the same as a theorem as follows.

Theorem. We use the notations as above. Assume that σ is known. Then an (approximate) (1-α)100 percent confidence interval for μ is given by

X-E   ≤   μ   ≤   X+E      where      E = zα/2 σ/√n.

If X is normal, then it is an exact confidence interval for μ (not just an approximation).

The left side of the interval is called the Left End Point (LEP) or the lower limit and the right side of the interval is called the Right End Point (REP) or the upper limit. Informally, this confidence interval is also called a Z-interval.

As was defined above, this is a two-sided confidence interval of μ. A (1-α)100 percent confidence interval is interpreted as follows: if one computes (1-α)100 percent confidence intervals on a regular basis, the true value of μ will be within the confidence interval (1-α)100 percent of the time and outside the interval α100 percent of the time.
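This long-run interpretation can be illustrated with a simulation. The sketch below is a hypothetical experiment, with μ = 81, σ = 15 and n = 25 chosen only for illustration: it draws many normal samples, builds the Z-interval from each, and reports the fraction of intervals that contain the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, reps=2000, seed=2):
    """Fraction of simulated Z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    e = z * sigma / math.sqrt(n)              # margin of error E
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - e <= mu <= xbar + e:
            hits += 1
    return hits / reps

print(coverage(mu=81.0, sigma=15.0, n=25, alpha=0.05))  # should be close to 0.95
```

With α = 0.05, roughly 95 percent of the simulated intervals should cover μ, and roughly 5 percent should miss it.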

Definitions and Formulas:

  1. The length l of this (1-α)100 percent confidence interval for μ is given by

    l = 2zα/2σ/√n.

  2. The margin of error (MOE) E is defined as

    E = zα/2σ/√n.
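The Z-interval formulas can be collected into one small function. This is a sketch in Python's standard library rather than the TI-84 procedure used in this lesson; the function name z_interval and its argument names are our own. The sample call uses the numbers of Exercise 7.1.1 (sample mean 81, σ = 15, n = 25, 99 percent confidence).

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Return (LEP, REP, MOE) of the Z-interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    e = z * sigma / math.sqrt(n)             # margin of error E
    return xbar - e, xbar + e, e

lep, rep, moe = z_interval(xbar=81, sigma=15, n=25, confidence=0.99)
print(round(lep, 2), round(rep, 2), round(moe, 2))  # 73.27 88.73 7.73
```

The length of the interval is rep - lep, which equals twice the MOE.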

The Required Sample Size

If we increase the level of confidence, the margin of error (MOE) E increases, and conversely, a smaller MOE comes with a lower level of confidence. Both can be controlled by increasing the sample size n. The required sample size n can be determined by solving the formula for the MOE above for n. Given a specified MOE E and a level of confidence (1-α), the sample size n needed is given by

n = (zα/2σ/E)².

Always round upward for n. Note that the above formula for the required sample size is independent of the mean μ, but it depends on the standard deviation σ.
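The sample size formula, including the rounding rule, can be sketched in a few lines. The function name required_sample_size is an assumption of this sketch; the sample call uses σ = 17, E = 3 and 90 percent confidence, the data of Exercise 7.1.10.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, moe, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= moe."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / moe) ** 2)  # always round upward

print(required_sample_size(sigma=17, moe=3, confidence=0.90))
```

Because this sketch uses an unrounded value of zα/2, a borderline case can differ by one from a hand computation that first rounds zα/2 to four decimals.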

The following animation demonstrates that there is a sample size n that guarantees both the specified precision and the level of confidence (1-α):

Animation 7.1.2

Problem Solving: In this section, we will compute confidence intervals for μ and also determine the sample size as above. We will compute Z-intervals using the formulas with the help of the invNormal function of the TI-84. The method will be clear from the solutions of the problems given below.

Problems on 7.1: On Z-intervals for the mean

Exercise 7.1.1. Suppose X is a normal population with mean μ and standard deviation σ = 15. A sample of size 25 was collected and the sample mean X was found to be 81. Compute a 99 percent confidence interval for μ and give the MOE.

Solution:
Here the population standard deviation σ = 15,
the sample size n = 25,
the sample mean X = 81,
Level of confidence = 99 percent. So
1 - α = .99,   α = .01   and   α/2 =.005.
Therefore, zα/2 = z.005 = invNormal(.995)= 2.5758

MOE = E = zα/2 σ/√n = 2.5758*15/√25 = 7.7274

LEP = X - E = 81 - 7.7274 = 73.2726
REP = X + E = 81 + 7.7274 = 88.7274

Exercise 7.1.2A. The weight distribution X of a group of students is normal with standard deviation σ = 9.8. To estimate the mean weight μ, a sample of size 14 was collected and the sample mean X was found to be 151.1.
Find a 99 percent confidence interval for μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 9.8,
the sample size n = 14,
the sample mean X = 151.1,
Level of confidence = 99 percent. So
1 - α = .99,   α = .01   and   α/2 =.005.
Therefore, zα/2 = z.005 = invNormal(.995)= 2.5758

MOE = E = zα/2 σ/√n = 2.5758*9.8/√14 = 6.7464

LEP = X - E = 151.1 - 6.7464 = 144.3536
REP = X + E = 151.1 + 6.7464 = 157.8464

Exercise 7.1.2B. (Change the Confidence level.) As in Exercise 7.1.2A, let X be the weight distribution of a group of students. The standard deviation σ = 9.8. To estimate the mean weight μ, a sample of size 14 was collected and the sample mean X was found to be 151.1.
Find a 90 percent confidence interval for μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 9.8,
the sample size n = 14,
the sample mean X = 151.1,
Level of confidence = 90 percent. So
1 - α = .90,   α = .1   and   α/2 =.05.
Therefore, zα/2 = z.05 = invNormal(.95)= 1.6449

MOE = E = zα/2 σ/√n = 1.6449*9.8/√14 = 4.3083

LEP = X - E = 151.1 - 4.3083 = 146.7917
REP = X + E = 151.1 + 4.3083 = 155.4083

Note that the length and/or MOE has decreased, due to the decline in confidence level.

Exercise 7.1.2C. (Change the sample size.) As in Exercise 7.1.2A, let X be the weight distribution of a group of students. The standard deviation σ = 9.8. To estimate the mean weight μ, a sample of size 78 was collected and the sample mean X was found to be 151.1.
Find a 90 percent confidence interval for μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 9.8,
the sample size n = 78,
the sample mean X = 151.1,
Level of confidence = 90 percent. So
1 - α = .90,   α = .1   and   α/2 =.05.
Therefore, zα/2 = z.05 = invNormal(.95)= 1.6449

MOE = E = zα/2 σ/√n = 1.6449*9.8/√78 = 1.8252

LEP = X - E = 151.1 - 1.8252 = 149.2748
REP = X + E = 151.1 + 1.8252 = 152.9252

Note that the length and/or MOE has decreased, due to the increase in sample size.
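The effect of the confidence level and of the sample size on the MOE, seen in Exercises 7.1.2A to 7.1.2C, can be tabulated side by side. The sketch below is illustrative only and the helper name margin_of_error is ours. It uses σ = 9.8 from those exercises.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """MOE of a Z-interval: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

# (exercise label, sample size, confidence level)
cases = [("7.1.2A", 14, 0.99), ("7.1.2B", 14, 0.90), ("7.1.2C", 78, 0.90)]
for label, n, conf in cases:
    print(label, n, conf, round(margin_of_error(9.8, n, conf), 4))
```

Lowering the confidence level from 99 to 90 percent shrinks the MOE, and raising n from 14 to 78 shrinks it further. The printed values may differ from the worked solutions in the last decimal place, since those round zα/2 to four decimals first.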

Exercise 7.1.3. The time taken by an athlete to run an event is normally distributed with mean μ and known standard deviation σ = 3.5 seconds. To estimate the mean μ, he ran 16 times and the sample mean was found to be X = 33 seconds. Find a 95 percent confidence interval for the true mean time μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 3.5,
the sample size n = 16,
the sample mean X = 33,
Level of confidence = 95 percent. So
1 - α = .95,   α = .05   and   α/2 =.025.
Therefore, zα/2 = z.025 = invNormal(1-.025)= 1.9600

MOE = E = zα/2 σ/√n = 1.9600*3.5/√16 = 1.715

LEP = X - E = 33 - 1.715 = 31.285
REP = X + E = 33 + 1.715 = 34.715

Exercise 7.1.4. Let X represent the monthly consumption of electricity (in KWH , in a winter month) by the households in a county. From past experience, it is known that the standard deviation is σ = 150 KWH. To estimate the mean monthly consumption μ, a sample of size 115 was collected and the sample mean X of the monthly consumption was found to be 874. Find a 98 percent confidence interval for μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 150,
the sample size n = 115,
the sample mean X = 874,
Level of confidence = 98 percent. So
1 - α = .98,   α = .02   and   α/2 =.01.
Therefore, zα/2 = z.01 = invNormal(1-.01)= 2.3263

MOE = E = zα/2 σ/√n = 2.3263*150/√115 = 32.5393

LEP = X - E = 874 - 32.5393 = 841.4607
REP = X + E = 874 + 32.5393 = 906.5393

Exercise 7.1.6. The monthly gas consumption by an individual in a city is normally distributed with mean μ and known standard deviation σ = 7.5 gallons. To estimate the mean μ, a sample of 33 individuals was collected and their mean consumption X was found to be 56 gallons. Find a 96 percent confidence interval for the true mean consumption μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 7.5,
the sample size n = 33,
the sample mean X = 56,
Level of confidence = 96 percent. So
1 - α = .96,   α = .04   and   α/2 =.02.
Therefore, zα/2 = z.02 = invNormal(1-.02)= 2.0537

MOE = E = zα/2 σ/√n = 2.0537*7.5/√33 = 2.6813

LEP = X - E = 56 - 2.6813 = 53.3187
REP = X + E = 56 + 2.6813 = 58.6813

Exercise 7.1.7. The monthly cell phone minutes used by an individual in a city has a mean μ and known standard deviation σ = 350 minutes. To estimate the mean μ, a sample of 780 users was collected. The sample mean X was found to be 2222. Find a 97 percent confidence interval for true mean cellphone minutes μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 350,
the sample size n = 780,
the sample mean X = 2222 minutes,
Level of confidence = 97 percent. So
1 - α = .97,   α = .03   and   α/2 =.015.
Therefore, zα/2 = z.015 = invNormal(1-.015)= 2.1701

MOE = E = zα/2 σ/√n = 2.1701*350/√780 = 27.1957

LEP = X - E = 2222 - 27.1957 = 2194.8043
REP = X + E = 2222 + 27.1957 = 2249.1957

Exercise 7.1.8. The diameter of pumpkins in a farm has a mean μ and known standard deviation σ = 13 inches. To estimate the mean μ, a sample of 460 pumpkins was collected. The sample mean X was found to be 29 inches. Find an 80 percent confidence interval for the true mean diameter μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 13,
the sample size n = 460,
the sample mean X = 29
Level of confidence = 80 percent. So
1 - α = .80,   α = .20   and   α/2 =.10
Therefore, zα/2 = z.10 = invNormal(1-.1)= 1.2816

MOE = E = zα/2 σ/√n = 1.2816*13/√460 = .7768

LEP = X - E = 29 - .7768 = 28.2232
REP = X + E = 29 + .7768 = 29.7768

Exercise 7.1.9. The amount of time a student spends doing homework for a three credit math course has a mean μ and known standard deviation σ = 44 minutes. To estimate the mean μ, a sample of 178 students' homework times was collected. The sample mean X was found to be 145 minutes. Find a 92 percent confidence interval for the true mean time μ and give the margin of error (MOE).

Solution:
Here the population standard deviation σ = 44,
the sample size n = 178,
the sample mean X = 145 minutes,
Level of confidence = 92 percent. So
1 - α = .92,   α = .08   and   α/2 =.04
Therefore, zα/2 = z.04 = invNormal(1-.04)= 1.7507

MOE = E = zα/2 σ/√n = 1.7507*44/√178 = 5.7737

LEP = X - E = 145 - 5.7737 = 139.2263
REP = X + E = 145 + 5.7737 = 150.7737


Problems on 7.1: On Required sample size for Z-intervals

Exercise 7.1.10. The hourly wages in an industry have a standard deviation σ = $17. The mean hourly wage μ has to be estimated within $3 of the true value of μ, with 90 percent confidence. What would be the sample size needed?

Solution:
Here the population standard deviation σ = 17, and E = 3
Level of confidence = 90 percent. So
1 - α = .90,   α = .10   and   α/2 =.05.
Therefore, zα/2 = z.05 = invNormal(1-.05)= 1.6449

The sample size n = (zα/2σ/E)² = (1.6449*17/3)² = 86.8829.
We would round it upward. So,
Answer = 87

Exercise 7.1.11. The tuition X paid by a student per semester in a university has a distribution with mean μ and σ = $416. How large a sample should you draw so that you are 95 percent sure that the true value of μ will be within $10 of the sample mean x?

Solution:
Here the population standard deviation σ = 416, and E = 10
Level of confidence = 95 percent. So
1 - α = .95,   α = .05   and   α/2 =.025.
Therefore, zα/2 = z.025 = invNormal(1-.025)= 1.9600

The sample size n = (zα/2σ/E)² = (1.9600*416/10)² = 6648.1189.
We round it upward. So,
Answer = 6649

Note that the required sample size is really high, because the required margin of error ($10) is very small compared to the standard deviation ($416).

Exercise 7.1.12. Let X represent the monthly gas consumption (in CCF, in a winter month), by the households in a county. From past experience, it is known that the standard deviation is σ = 55 CCF. The county wants to estimate the mean monthly gas consumption μ within 15 CCF, with 98 percent confidence level. What would be the sample size required?

Solution:
Here the population standard deviation σ = 55, and E = 15
Level of confidence = 98 percent. So
1 - α = .98,   α = .02   and   α/2 =.01.
Therefore, zα/2 = z.01 = invNormal(1-.01)= 2.3263

The sample size n = (zα/2σ/E)² = (2.3263*55/15)² = 72.7569.
We round it upward. So,
Answer = 73

Exercise 7.1.13. The telephone company's data shows that length X of their international calls has a standard deviation 14.5 minutes. The company wants to estimate the mean length μ of these calls within 2.5 minutes of the mean, with a 96 percent level of confidence. What would be the sample size required?

Solution:
Here the population standard deviation σ = 14.5, and the MOE E = 2.5
Level of confidence = 96 percent. So
1 - α = .96,   α = .04   and   α/2 =.02.
Therefore, zα/2 = z.02 = invNormal(1-.02) = 2.0537

The sample size n = (zα/2σ/E)² = (2.0537*14.5/2.5)² = 141.8829.
We round it upward. So,
Answer = 142

Exercise 7.1.14. The mean birth weight μ of babies has to be estimated within 5 oz of the actual mean μ, with a 99 percent confidence level. The standard deviation of the birth weight is known to be σ = 25 oz. What would be the sample size required?

Solution:
Here the population standard deviation σ = 25, and the MOE E = 5
Level of confidence = 99 percent. So
1 - α = .99,   α = .01   and   α/2 =.005.
Therefore, zα/2 = z.005 = invNormal(1-.005) = 2.5758

The sample size n = (zα/2σ/E)² = (2.5758*25/5)² = 165.8686.
We round it upward. So,
Answer = 166

Exercise 7.1.15. The mean monthly water consumption μ by the households in a subdivision has to be estimated within 500 gallons, with 96 percent confidence level. The standard deviation of the water consumption is known to be σ = 3000 gallons. What would be the sample size required?

Solution:
Here the population standard deviation σ = 3000, and the MOE E = 500
Level of confidence = 96 percent. So
1 - α = .96,   α = .04   and   α/2 =.02.
Therefore, zα/2 = z.02 = invNormal(1-.02) = 2.0537

The sample size n = (zα/2σ/E)² = (2.0537*3000/500)² = 151.8366.
We round it upward. So,
Answer = 152


7.2 Confidence interval for mean μ when σ is Unknown

As above, let X be a random variable with mean μ and standard deviation σ. As in the last section, we want to estimate the mean μ by confidence intervals. In the last section, we assumed that the standard deviation σ was known, which would not be the case for populations with which we have little familiarity or experience. In this section, we deal with populations where the standard deviation σ is not known. In the last section, the main tool (or fact) that we used was that, by the CLT, the sampling distribution of the statistic Z = (X-μ)√n/σ is approximately standard normal N(0,1). In this section, we use the distribution of a similar statistic

T = (X-μ)√n/S.

We proceed to determine the sampling distribution of T.


Student's t distribution

Given a positive integer m, there is a random variable, denoted by tm, that is said to have t distribution with degrees of freedom m. The distribution is also known as Student's t distribution, because it was discovered by a chemist W. S. Gosset, under the pseudonym "Student". The important properties of t distributions are listed below. A t random variable with degrees of freedom m would be denoted by T. We also say that the random variable T has a tm-distribution.

Animation 7.2.1
A tm random variable T is a continuous random variable. The graph of the pdf of a tm random variable T is shown as follows:
  1. The graph looks very similar to the graph of the pdf of the standard normal random variable Z. Like normal, the graph also has a bell shape. For an ordinary reader, it would not be easy to distinguish the graph from that of the standard normal random variable.
  2. The tm random variable T has mean zero.
  3. The standard deviation σ of a tm random variable (i.e. with degrees of freedom m) is given by σ = √(m/(m-2)), if m is at least 3.
  4. The distribution of T is symmetric around the y-axis (or around the mean zero).
  5. As stated above, for each positive integer m, there is one Student's t random variable tm, with degrees of freedom m. When the degrees of freedom m is small, the graph is flatter. For higher degrees of freedom m, the graph is steeper (more peaked around zero).
  6. When m is large (say 30 or above), the pdf of a tm random variable can be approximated by the standard normal random variable Z.

Inverse Probability for Student's t

Inverse probability for Student's t random variables is defined as was done for normal random variables. In particular, like zα, we define tm, α as follows.

Definition: Suppose T is a t-random variable with degrees of freedom m. Given a number 0 < α < 1, the number tm, α is defined by the formula

P(tm, α ≤ T) = α,          which is the same as          P(T ≤ tm, α) = 1 - α.

The invT function of TI-84 (Silver Edition) computes tm, α as follows:
tm, α=invT(1-α, m).

The following animation demonstrates these numbers visually.

Animation 7.2.2
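Python's standard library has no built-in inverse for Student's t, so the sketch below approximates one numerically. It is an illustration under stated assumptions, not the algorithm the TI-84 uses: the pdf is integrated with Simpson's rule and the cut-off is found by bisection. The names t_pdf, t_cdf and t_crit are ours, and t_crit assumes 0 < α < 0.5.

```python
import math

def t_pdf(x, m):
    """Density of Student's t with m degrees of freedom."""
    c = math.exp(math.lgamma((m + 1) / 2) - math.lgamma(m / 2)) / math.sqrt(m * math.pi)
    return c * (1 + x * x / m) ** (-(m + 1) / 2)

def t_cdf(x, m, steps=2000):
    """P(T <= x), using symmetry about zero and Simpson's rule on [0, |x|]."""
    if x < 0:
        return 1 - t_cdf(-x, m, steps)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, m) + t_pdf(x, m)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, m)
    return 0.5 + total * h / 3

def t_crit(alpha, m):
    """Approximate t_{m, alpha}, the number with P(T >= t) = alpha (0 < alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, m) < 1 - alpha:   # widen the bracket until it contains the cut-off
        hi *= 2
    for _ in range(60):               # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, m) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_crit(0.005, 17), 4))  # compare with invT(1 - .005, 17)
```

The same t_cdf helper gives tail probabilities, for example 1 - t_cdf(1.1, 11) for P(T > 1.1) with 11 degrees of freedom.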


Sampling Distribution of T

The following theorem describes the sampling distribution of the statistic T mentioned at the beginning of this section. This can be viewed as a counterpart of the CLT for T.


Theorem.
Let X be a normal random variable with mean μ and standard deviation σ. Let X1, X2, …, Xn be a sample of size n from the X population. Then

T = (X-μ)√n/S

has a t-distribution with degrees of freedom n-1.



Therefore,

P(-tn-1, α/2    ≤   (X-μ)√n/S    ≤    tn-1, α/2) = 1 - α.

So,

P(X-E    ≤    μ    ≤    X+E) = 1 - α      where      E = tn-1, α/2 S/√n.

Confidence Interval for μ when σ is unknown

As Z-interval was constructed for the mean μ in section 7.1.1, the above computations lead to a confidence interval for the mean μ, which would be useful when the standard deviation σ is not known. In this case, we have to assume that the population random variable X is normal.

Theorem. We use the notations as above. In particular X is assumed to be normal. Then a (1-α)100 percent confidence interval for μ is given by

X-E   ≤   μ   ≤   X+E      where      E = tn-1, α/2 S/√n.

Informally, this confidence interval is also called a T-interval. As in the case of Z-intervals, we use the same terminology: LEP, REP, length of the interval and margin of error (MOE). (We will not work with determination of the required sample size.)
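Once the cut-off tn-1, α/2 is known (for instance from invT on the calculator), the T-interval itself is simple arithmetic. The sketch below takes that cut-off as an input rather than computing it; the function name t_interval and its argument names are assumptions of this sketch. The sample call uses the numbers of Exercise 7.2.1 (sample mean 170.5, s = 13.3, n = 18, cut-off 2.8982).

```python
import math

def t_interval(xbar, s, n, t_critical):
    """Return (LEP, REP, MOE) of the T-interval, given the cut-off t_{n-1, alpha/2}."""
    e = t_critical * s / math.sqrt(n)  # margin of error E
    return xbar - e, xbar + e, e

lep, rep, moe = t_interval(xbar=170.5, s=13.3, n=18, t_critical=2.8982)
print(round(lep, 4), round(rep, 4), round(moe, 4))
```

Passing the cut-off in explicitly keeps the function independent of how the inverse t probability is obtained.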


Remarks:

  1. A common question is: to estimate μ, when do we use the Z-interval and when do we use the T-interval? Use the T-interval only when σ is not known. Not using σ when it is known would amount to disregarding information, which is not a good philosophy in estimation.
  2. Recall that, for large n, tn-1 is approximately equal to the standard normal Z. That is why, for a large sample size n, using zα/2 instead of tn-1, α/2 in the formula for E does not make any significant difference.

Optional Problems on 7.2: Probability of the T statistic

In this course, we avoid probability problems for T. The tcdf function of the TI-84 (Silver Edition) makes it easy to compute probabilities for T. No such problems will be assigned, but we work one example.

Example: The salary X of the engineers in a state has mean μ = $65,000 and standard deviation σ. You collect a sample of 12 engineers. What is the probability that the T statistic of the sample will be above 1.1?

Solution by TI-84 Silver Edition:
Here T has degrees of freedom df = n-1 = 12-1 = 11

P(T > 1.1) ≈ tcdf(1.1, 10, 11) = 0.1474     (the upper limit 10 stands in for ∞)


Problem Solving: In this section, we will compute confidence intervals (T-intervals) for μ when σ is unknown.
We will use the invT function of the TI-84. The method will be clear from the solutions of the problems given below.

There are two types of problems: (1) the values of the statistics (x and s) are given, or (2) raw data is given instead. In the latter case, we have to compute x and s before we can use the formulas.


Problems on 7.2: T-intervals for mean: σ Unknown

Exercise 7.2.1. The mean weight μ of a campus population is to be estimated. It would be reasonable to assume that weight X has a normal distribution. A sample of size n = 18 was collected. The sample mean was found to be x = 170.5 and the standard deviation was s = 13.3. Compute a 99 percent confidence interval for μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 18,
the sample mean X = 170.5,       
Sample standard deviation s = 13.3


The degrees of freedom df = n-1 = 18-1 = 17

Level of confidence = 99 percent.
1 - α = .99,   α = .01   and   α/2 =.005.             Therefore,
tn-1, α/2 = t17, .005 = invT(1 - .005, 17)= 2.8982

MOE = E = tn-1, α/2 S/√n = 2.8982 * 13.3/√18 = 9.0854

LEP = X - E = 170.5 - 9.0854 = 161.4146
REP = X + E = 170.5 + 9.0854 = 179.5854

Exercise 7.2.2. The time taken to complete a problem in a Math 365 test is normally distributed with mean μ and standard deviation σ. A sample of size 23 was taken, and sample mean and standard deviation were found to be x = 4.7 and s = .47. Estimate the mean time μ taken to complete a problem using a 98 percent confidence interval.

Solution:
Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 23,
the sample mean X = 4.7,       
Sample standard deviation s = .47


The degrees of freedom df = n-1 = 23-1 = 22

Level of confidence = 98 percent.
1 - α = .98,   α = .02   and   α/2 =.01.             Therefore,
tn-1, α/2 = t22, .01 = invT(1 - .01, 22) = 2.5083

MOE = E = tn-1, α/2 S/√n = 2.5083 * .47/√23 = .2458

LEP = X - E = 4.7 - .2458 = 4.4542
REP = X + E = 4.7 + .2458 = 4.9458

Exercise 7.2.3. To estimate the mean length μ of babies at birth a sample of size n = 16 was taken. The sample mean was found to be x = 18.6 and standard deviation was s = 9.486. Construct a 96 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
It is reasonable to assume that the length X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 16,
the sample mean X = 18.6,       
Sample standard deviation s = 9.486


The degrees of freedom df = n-1 = 16 - 1 = 15

Level of confidence = 96 percent.
1 - α = .96,   α = .04   and   α/2 =.02.             Therefore,
tn-1, α/2 = t15, .02 = invT(1 - .02, 15) = 2.2485

MOE = E = tn-1, α/2 S/√n = 2.2485 * 9.486/√16 = 5.3323

LEP = X - E = 18.6 - 5.3323 = 13.2677
REP = X + E = 18.6 + 5.3323 = 23.9323

Exercise 7.2.4. The time taken by an athlete to run an event is normally distributed with mean μ and unknown standard deviation σ. To estimate the mean μ he ran 9 times and the sample mean was found to be X = 33 seconds and the sample standard deviation s = 3.5 seconds. Construct a 95 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
It is reasonable to assume that the time X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 9,
the sample mean X = 33,       
Sample standard deviation s =3.5


The degrees of freedom df = n-1 = 9 - 1 = 8

Level of confidence = 95 percent.
1 - α = .95,   α = .05   and   α/2 =.025.             Therefore,
tn-1, α/2 = t8, .025 = invT(1 - .025, 8) = 2.3060

MOE = E = tn-1, α/2 S/√n = 2.3060 * 3.5/√9 = 2.6903

LEP = X - E = 33 - 2.6903 = 30.3097
REP = X + E = 33 + 2.6903 = 35.6903

Exercise 7.2.5. The mean length μ of telephone calls has to be estimated. A sample of 14 calls had a sample mean X = 13 minutes and a sample standard deviation s = 5.5 minutes. Construct a 90 percent confidence interval for the mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
It is reasonable to assume that the time length X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 14,
the sample mean X = 13,       
Sample standard deviation s = 5.5

The degrees of freedom df = n-1 = 14 - 1 = 13

Level of confidence = 90 percent.
1 - α = .90,   α = .10   and   α/2 =.05.             Therefore,
tn-1, α/2 = t13, .05 = invT(1 - .05, 13) = 1.7709

MOE = E = tn-1, α/2 S/√n = 1.7709 * 5.5/√14 = 2.6031

LEP = X - E = 13 - 2.6031 = 10.3969
REP = X + E = 13 + 2.6031 = 15.6031

Exercise 7.2.6. The mean weight μ of a variety of bananas in a grocery store has to be estimated. A sample of 20 bananas was collected. The mean weight was found to be x = 180 grams and the sample standard deviation was s = 27 grams. Construct an 85 percent confidence interval for the mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
It is reasonable to assume that the weight X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

Here the sample size n = 20,
the sample mean X = 180,       
Sample standard deviation s = 27

The degrees of freedom df = n-1 = 20 - 1 = 19

Level of confidence = 85 percent.
1 - α = .85,   α = .15   and   α/2 =.075.             Therefore,
tn-1, α/2 = t19, .075 = invT(1 - .075, 19) = 1.5002

MOE = E = tn-1, α/2 S/√n = 1.5002*27/√20 = 9.0573

LEP = X - E = 180 - 9.0573 = 170.9427
REP = X + E = 180 + 9.0573 = 189.0573

Exercise 7.2.7. It is assumed that the lifetime (in hours) of lightbulbs produced in a factory is normally distributed with mean μ and standard deviation σ. To estimate μ the following data was collected on the lifetime of bulbs.

5110 4671 6441 3331 5055 5270 5335 4973 1837
7783 4560 6074 4777 4707 5263 4978 5418 5123

Construct a 92 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
It is reasonable to assume that the lifetime X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

As in Lesson 2, enter data in a LIST of your TI to compute mean X and sample standard deviation S:
Here the sample size n = 18,
the sample mean X = 5039.2222,       
Sample standard deviation s = 1203.8025

The degrees of freedom df = n-1 = 18 - 1 = 17

Level of confidence = 92 percent.
1 - α = .92,   α = .08   and   α/2 =.04.             Therefore,
tn-1, α/2 = t17, .04 = invT(1 - .04, 17) = 1.8619

MOE = E = tn-1, α/2 S/√n = 1.8619*1203.8025/√18 = 528.2936

LEP = X - E = 5039.2222 - 528.2936 = 4510.9286
REP = X + E = 5039.2222 + 528.2936 = 5567.5158

Exercise 7.2.8. To estimate the mean weight (in pounds) of salmon in a river the following sample was collected:

34.7 33.8 38.2 20.3 27.8 45.3 43.1 37.3 32.5 32.3
31.8 41.5 44.5 29.2 25.3 29.6 39.5 29.1 37.3  

Construct a 97 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
It is reasonable to assume that the weight X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

As in Lesson 2, enter data in a LIST of your TI to compute mean X and sample standard deviation S:
Here the sample size n = 19,
the sample mean X = 34.3737,       
Sample standard deviation s = 6.7608

The degrees of freedom df = n-1 = 19 - 1 = 18

Level of confidence = 97 percent.
1 - α = .97,   α = .03   and   α/2 =.015.             Therefore,
tn-1, α/2 = t18, .015 = invT(1 - .015, 18) = 2.3562

MOE = E = tn-1, α/2 S/√n = 2.3562*6.7608/√19 = 3.6545

LEP = X - E = 34.3737 - 3.6545 = 30.7192
REP = X + E = 34.3737 + 3.6545 = 38.0282

Exercise 7.2.9. The following data represents the time (in minutes) taken by students to drive to campus.

5.7 22.4 14.7 16.7 10.6 26.9 17.9 32.5 20.9 9.2
3.4 20.4 17.3 31.9 4.4 23.7        
Construct a 98 percent confidence interval for mean μ. State the degrees of freedom df, compute the margin of error, LEP and REP.

Solution:
It is reasonable to assume that the driving time X is normal. Since the population standard deviation σ is unknown, we will compute a T-interval.

As in Lesson 2, enter data in a LIST of your TI to compute mean X and sample standard deviation S:
Here the sample size n = 16,
the sample mean X = 17.4125,       
Sample standard deviation s = 9.0843

The degrees of freedom df = n-1 = 16 - 1 = 15

Level of confidence = 98 percent.
1 - α = .98,   α = .02   and   α/2 =.01.             Therefore,
tn-1, α/2 = t15, .01 = invT(1 - .01, 15) = 2.6025

MOE = E = tn-1, α/2 S/√n = 2.6025*9.0843/√16 = 5.9104

LEP = X - E = 17.4125 - 5.9104 = 11.5021
REP = X + E = 17.4125 + 5.9104 = 23.3229
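
When the exercise gives raw data, the whole computation (sample mean, sample standard deviation, margin of error) can be sketched with Python's standard `statistics` module in place of the TI LIST steps. This is an illustrative sketch only: `t_interval_from_data` is a hypothetical helper, and the critical value 2.6025 is the invT value quoted in the solution above, not something the code computes.

```python
import math
import statistics

def t_interval_from_data(data, t_crit):
    """Return (df, xbar, s, margin_of_error, LEP, REP) from raw data."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)        # sample standard deviation (divides by n-1)
    moe = t_crit * s / math.sqrt(n)   # E = t * S / sqrt(n)
    return n - 1, xbar, s, moe, xbar - moe, xbar + moe

# Driving times (minutes) from Exercise 7.2.9
drive_times = [5.7, 22.4, 14.7, 16.7, 10.6, 26.9, 17.9, 32.5, 20.9, 9.2,
               3.4, 20.4, 17.3, 31.9, 4.4, 23.7]

df, xbar, s, moe, lep, rep = t_interval_from_data(drive_times, 2.6025)
print(df, round(xbar, 4), round(s, 4), round(moe, 4))
```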





7.3 About the Population Proportion

In this section, we estimate the population proportion p of an attribute by a confidence interval. Examples include (1) the proportion p of defective lamps produced in a factory, (2) the proportion p of a variety of seeds that will germinate under a particular condition, and (3) the proportion p of voters in favor of a candidate. To estimate such a proportion p, we use the sampling distribution of the sample proportion of success X̄, which was discussed in section 6.3. As in section 6.3, let X represent the corresponding Bernoulli(p) random variable. That means

X=1 if success
X=0 if failure

To estimate p, we take a sample X1, X2, …, Xn of size n from this Bernoulli(p) population, where Xi represents the outcome of the ith trial (for example, testing the ith lamp) as follows:

Xi=1 if ith sample is a success
Xi=0 if ith sample is a failure .

Then,     T = X1+X2+… +Xn = the total Number of Success in these n trials

and the sample mean

X̄ = T/n = the Sample Proportion of Success in these n trials.

Recall from section 6.3 that when the sample size is large and p is not very close to 0 or 1, the distribution of X̄ is approximately normal with

mean     μX̄ = p         and standard deviation     σX̄ = √(p(1-p)/n).


Therefore (approximately),

P(-zα/2  ≤  (X̄-p)/σX̄  ≤  zα/2)   =  1-α.

In an attempt to compute a confidence interval for p, we rearrange the inequalities and get

P(X̄ - zα/2 σX̄  ≤  p  ≤  X̄ + zα/2 σX̄)  =  1-α.

Since p is unknown, we cannot compute σX̄, so this does not yet produce a confidence interval for p.

Instead, we use the observed sample proportion x of success as a point estimate of p to get the approximations

σX̄ ≈ √(x(1-x)/n)    and    P(X̄ - zα/2 σX̄  ≤  p  ≤  X̄ + zα/2 σX̄)  ≈   1-α.

Confidence Interval for Population Proportion p

The following theorem follows from the above computations.

Theorem. We use the notations as above. Then an approximate (1-α)100 percent confidence interval for p is given by

x-e  ≤  p  ≤  x+e         where     e =  zα/2 √(x(1-x)/n)

Further, the following are some important concepts and comments:

  1. The margin of error (MOE) e is defined as

    e =  zα/2 √(x(1-x)/n)

  2. A conservative margin of error (Conservative MOE) E is defined as

    E =  zα/2/√(4n).

    It can be checked (because x(1-x) ≤ 1/4 for every 0 ≤ x ≤ 1) that

    e     ≤     E        and also        zα/2 σX̄ = zα/2 √(p(1-p)/n)     ≤     E

    This means E is an upper bound of the estimated error e and also of the actual error of this interval estimation of p. That is why the Conservative MOE E is taken more seriously than the ordinary MOE e.
  3. Informally, this confidence interval is also called a 1-Proportion Z-interval. As before, we will use terminology and abbreviations like LEP, REP, MOE and Conservative MOE.
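
The quantities in the theorem and the comments above can be sketched in Python. This is a minimal illustration, not the course's TI-84 procedure: `proportion_z_interval` is a hypothetical name, `NormalDist().inv_cdf` from the standard library stands in for invNormal, and the counts (97 successes in 147 trials, 95 percent confidence) are illustrative inputs.

```python
import math
from statistics import NormalDist

def proportion_z_interval(successes, n, confidence):
    """Return (x, e, LEP, REP, E) for a 1-Proportion Z-interval."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    x = successes / n                         # sample proportion of success
    e = z * math.sqrt(x * (1 - x) / n)        # ordinary MOE
    big_e = z / math.sqrt(4 * n)              # conservative MOE
    return x, e, x - e, x + e, big_e

x, e, lep, rep, big_e = proportion_z_interval(97, 147, 0.95)
print(round(x, 4), round(e, 4), round(lep, 4), round(rep, 4), round(big_e, 4))
```

Because the code keeps full precision for x and z, the last digits may differ slightly from a hand computation that rounds x to four decimals first.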

The required sample size for p

The formula for E given above can be solved to determine the required sample size. Given a specified conservative MOE E, for a (1-α)100 percent confidence interval of p, the sample size n required is given by

n  =   (zα/2/(2E))²      rounded up to the next integer.
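
The sample size formula can be sketched in Python as follows. The helper name `required_sample_size` is hypothetical, and `NormalDist().inv_cdf` from the standard library is used here in place of invNormal.

```python
import math
from statistics import NormalDist

def required_sample_size(E, confidence):
    """Smallest n whose conservative MOE is at most E at the given confidence."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return math.ceil((z / (2 * E)) ** 2)      # round up to the next integer

# Conservative MOE of .03 at 95 percent confidence
print(required_sample_size(0.03, 0.95))
```

This sketch uses the unrounded value of zα/2, so in borderline cases its answer can differ by one from a hand computation that uses zα/2 rounded to four decimal places.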

An Example to interpret Conservative MOE

President Clinton was impeached in 1998-99 by the House of Representatives and acquitted by the Senate. During the impeachment trial, a typical news item would read as follows:

President Clinton has 64 percent approval rating. The poll has a margin of error plus or minus 3.1 percentage points. The poll surveyed 972 people.


This meant the following, with p as the proportion of the population who "approved" President Clinton:

  1. (A point estimate): that the sample proportion x of people who "approve" President Clinton was 0.64.
    So, a point estimate of p is .64.
  2. Such polls usually use 95 percent confidence level.
  3. Assuming that they are using a 95 percent confidence interval, they mean that

    E = zα/2/√(4n) = z.025/√(4n) = 1.96/√(4×972) = 0.031.

  4. Interestingly, they give both the Conservative MOE and sample size, while one can be computed from the other.

Problems on 7.3: On 1-Proportion Z-intervals and Conservative MOE

Exercise 7.3.1. In a sample of 197 apples from a lot, 19 were found to be sour. Construct a 99 percent confidence interval for the proportion p of sour apples in the lot.

Solution:
Here the sample size n = 197,
The number of success T = 19
So, the sample proportion of success X = T/n = 19/197 = .0964

Level of confidence = 99 percent. So
1 - α = .99,   α = .01   and   α/2 =.005.
Therefore, zα/2 = z.005 = invNormal(.995)= 2.5758

MOE = e = zα/2 √(x(1-x)/n) = 2.5758*√(.0964(1-.0964)/197) = .054163       [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .0964 - .054163 = .042237
REP = X + e = .0964 + .054163 = .150563
Conservative MOE = E = zα/2/√(4n) = 2.5758/√(4*197) = .091759       [In this section, for error terms, we retain at least 6 decimal places.]

Exercise 7.3.2. A new vaccine was tried on 147 randomly selected individuals, and it was determined that 97 of them developed immunity. Find a 95 percent confidence interval for the proportion p of individuals in the population for whom the vaccine would help. Give the MOE, LEP, REP and the conservative MOE.

Solution:
Here the sample size n = 147,
The number of success T = 97
So, the sample proportion of success X = T/n = 97/147 = .6599

Level of confidence = 95 percent. So
1 - α = .95,   α = .05   and   α/2 =.025.
Therefore, zα/2 = z.025 = invNormal(.975) = 1.9600


MOE = e = zα/2 √(x(1-x)/n) = 1.9600*√(.6599(1-.6599)/147) = .076584       [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .6599 - .076584 = .583316
REP = X + e = .6599 + .076584 = .736484
Conservative MOE = E = zα/2/√(4n) = 1.9600/√(4*147) = .080829       [In this section, for error terms, we retain at least 6 decimal places.]

Exercise 7.3.3. Before a congressional election, a poll was conducted. Out of 887 randomly selected voters interviewed, 389 said that they would vote for Candidate A, and 359 said that they would vote for Candidate B.

  1. Construct a 98 percent confidence interval for the proportion p of voters who would vote for A.
  2. Construct a 98 percent confidence interval for the proportion p of voters who would vote for B.
  3. What is the conservative margin of error for both?

Solution:
Here the sample size n = 887,
Level of confidence = 98 percent. So
1 - α = .98,   α = .02   and   α/2 =.01.
Therefore, zα/2 = z.01 = invNormal(.99) = 2.3263


For Candidate A:
The number of success T = 389
So, the sample proportion of success X = T/n = 389/887 = .4386

MOE = e = zα/2 √(x(1-x)/n) = 2.3263*√(.4386(1-.4386)/887) = .038759       [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .4386 - .038759 = .399841
REP = X + e = .4386 + .038759 = .477359
Conservative MOE = E = zα/2/√(4n) = 2.3263/√(4*887) = .039055       [In this section, for error terms, we retain at least 6 decimal places.]

For Candidate B:
The number of success T = 359
So, the sample proportion of success X = T/n = 359/887 = .4047

MOE = e = zα/2 √(x(1-x)/n) = 2.3263*√(.4047(1-.4047)/887) = .038339       [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .4047 - .038339 = .366361
REP = X + e = .4047 + .038339 = .443039
Conservative MOE = same, because the formula only depends on the sample size and level of confidence

Exercise 7.3.4. A researcher wants to estimate the proportion p of the children younger than two, in a community, who have anemia. The researcher examined 180 children and found 42 children with anemia. Compute a 97 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.

Solution:
Here the sample size n = 180,
The number of success T = 42
So, the sample proportion of success X = T/n = 42/180 = .2333

Level of confidence = 97 percent. So
1 - α = .97,   α = .03   and   α/2 =.015.
Therefore, zα/2 = z.015 = invNormal(.985) = 2.1701


MOE = e = zα/2 √(x(1-x)/n) = 2.1701*√(.2333(1-.2333)/180) = .068409       [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .2333 - .068409 = .164891
REP = X + e = .2333 + .068409 = .301709
Conservative MOE = E = zα/2/√(4n) = 2.1701/√(4*180) = .080875       [In this section, for error terms, we retain at least 6 decimal places.]

Exercise 7.3.5. The proportion p of voters (in a county) who vote by absentee ballot is to be estimated. You sampled 683 voters and found that 165 would have voted by absentee ballot. Compute a 96 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.


Solution:
Here the sample size n = 683,
The number of success T = 165
So, the sample proportion of success X = T/n = 165/683 = .2416

Level of confidence = 96 percent. So
1 - α = .96,   α = .04   and   α/2 =.02.
Therefore, zα/2 = z.02 = invNormal(.98) = 2.0537


MOE = e = zα/2 √(x(1-x)/n) = 2.0537*√(.2416(1-.2416)/683) = .033638       [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .2416 - .033638 = .207962
REP = X + e = .2416 + .033638 = .275238
Conservative MOE = E = zα/2/√(4n) = 2.0537/√(4*683) = .039291       [In this section, for error terms, we retain at least 6 decimal places.]

Exercise 7.3.6. It is known that a vaccine may cause fever as a side effect after one takes the shot. The proportion p of those who suffer the side effect is to be estimated. A sample of 913 individuals was examined and it was found that 153 suffered the side effect. Compute a 94 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.


Solution:
Here the sample size n = 913,
The number of success T = 153
So, the sample proportion of success X = T/n = 153/913 = .1676

Level of confidence = 94 percent. So
1 - α = .94,   α = .06   and   α/2 =.03.
Therefore, zα/2 = z.03 = invNormal(.97) = 1.8808


MOE = e = zα/2 √(x(1-x)/n) = 1.8808*√(.1676(1-.1676)/913) = .023249       [In this section, for error terms, we retain at least 6 decimal places.]

LEP = X - e = .1676 - .023249 = .144351
REP = X + e = .1676 + .023249 = .190849
Conservative MOE = E = zα/2/√(4n) = 1.8808/√(4*913) = .031123       [In this section, for error terms, we retain at least 6 decimal places.]

Exercise 7.3.7. The proportion p of those on campus who took a flu shot is to be estimated. A sample of 733 individuals was examined and it was found that 220 among them took the shot. Compute a 93 percent confidence interval for p. Give the MOE, LEP, REP and the conservative MOE.



Problems on 7.3: Required sample size for 1-Proportion Z-intervals

Exercise 7.3.8. A pollster wants to estimate the proportion p of the US population who would not support Government shutdown due to budget dispute. What would be the sample size required to estimate p within .02 of the actual value of p, with 99 percent confidence?

Solution:
Here given precision E = .02

Level of confidence = 99 percent. So
1 - α = .99,   α = .01   and   α/2 =.005.
Therefore, zα/2 = z.005 = invNormal(1-.005) = 2.5758

The sample size n = (zα/2/(2E))² = [2.5758/(2*.02)]² = 4146.7160.
We would round it upward. So,
Answer = 4147

Exercise 7.3.9. The proportion p of defective lightbulbs produced by a machine needs to be estimated within .01 to determine whether the machine needs to be replaced. How large a sample should we take to do this with 90 percent confidence?

Solution:
Here given precision E = .01

Level of confidence = 90 percent. So
1 - α = .90,   α = .10   and   α/2 =.05.
Therefore, zα/2 = z.05 = invNormal(1-.05)= 1.6449

The sample size n = (zα/2/(2E))² = [1.6449/(2*.01)]² = 6764.2400.
We would round it upward. So,
Answer = 6765

Exercise 7.3.10. The proportion p of those who earn more than $50 K, in an industry, is to be estimated within .07 from the actual value of p, with 90 percent confidence. What would be the sample size needed?

Solution:
Here given precision E = .07

Level of confidence = 90 percent. So
1 - α = .90,   α = .10   and   α/2 =.05.
Therefore, zα/2 = z.05 = invNormal(1-.05)= 1.6449

The sample size n = (zα/2/(2E))² = [1.6449/(2*.07)]² = 138.0457.
We would round it upward. So,
Answer = 139

Exercise 7.3.11. The proportion p of those who paid more than $15 K tuition, in a university, is to be estimated within .03 from the actual value of p, with 95 percent confidence. What would be the sample size needed?

Solution:
Here given precision E = .03

Level of confidence = 95 percent. So
1 - α = .95,   α = .05   and   α/2 =.025.
Therefore, zα/2 = z.025 = invNormal(1-.025)= 1.9600

The sample size n = (zα/2/(2E))² = [1.9600/(2*.03)]² = 1067.1111.
We would round it upward. So,
Answer = 1068

Exercise 7.3.12. The proportion p of those who consumed more than 200 CCF in Gas in January, in a county, would have to be estimated within .05 from the actual value of p, with 98 percent confidence. What would be the sample size needed?

Solution:
Here given precision E = .05

Level of confidence = 98 percent. So
1 - α = .98,   α = .02   and   α/2 =.01.
Therefore, zα/2 = z.01 = invNormal(1-.01)= 2.3263
The sample size n = (zα/2/(2E))² = [2.3263/(2*.05)]² = 541.1672.
We would round it upward. So,
Answer = 542

Exercise 7.3.13. A telephone company wants to estimate the proportion p of calls that are longer than 20 minutes. They want to estimate p within .03 from the actual value of p, with 94 percent confidence. What would be the sample size needed?

Exercise 7.3.14. The proportion p of babies born with weight more than 8 lbs, in a hospital, is to be estimated. It has to be estimated within .08 from the actual value of p, with 99 percent confidence. What would be the sample size required?

Exercise 7.3.14. The proportion p of households, in a county that use more than 5000 gallons of water in June, is to be estimated. It has to be estimated within .025 from the actual value of p, with 96 percent confidence. What would be the sample size required?



Optional Problems on 7.3: Interpretation

Exercise 7.3.15. In a poll released on October 28, 1998, it was revealed that 60 percent of the US population wanted President Clinton to be rebuked but not impeached. The poll was conducted among 1,013 adults, and it had a margin of error of 3 percentage points.
Can you relate the last two numbers?
What is the level of confidence used here?

Solution: News media polls use 95 percent confidence intervals. When they say "margin of error," they mean "conservative margin of error." The conservative margin of error E and level of confidence 1 - α are related by the formula E = zα/2/√(4n).
For this problem E = .03, 1 - α =.95, and n =1,013. We can check zα/2/√(4n) = 1.96/√(4×1013) = 0.03079.
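
The relation E = zα/2/√(4n) can be checked in both directions with a short Python sketch. The function names are hypothetical, and the standard library's `NormalDist` stands in for the calculator's normal functions. Solving the formula for zα/2 gives zα/2 = E√(4n), and the confidence level is then P(-zα/2 ≤ Z ≤ zα/2).

```python
import math
from statistics import NormalDist

def conservative_moe(n, confidence):
    """E = z_{alpha/2} / sqrt(4n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z / math.sqrt(4 * n)

def implied_confidence(E, n):
    """Confidence level 1 - alpha that makes E the conservative MOE."""
    z = E * math.sqrt(4 * n)              # solve E = z / sqrt(4n) for z
    return 2 * NormalDist().cdf(z) - 1    # P(-z <= Z <= z)

print(round(conservative_moe(1013, 0.95), 5))
print(round(implied_confidence(0.03, 1013), 3))
```

The reported margin of 3 percentage points is a rounded figure, so the level implied by exactly .03 comes out a little under 95 percent; this is consistent with a 95 percent interval whose conservative MOE of about .0308 was rounded to .03.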



7.4 Confidence Interval of the Variance σ²

In this section, we estimate the variance σ² of random variables X by confidence intervals. It will be assumed that the population random variable X is normal. Suppose X is such a normal random variable with mean μ and variance σ².

To estimate σ², take a sample X1, X2, …, Xn of size n from the X population. As always, S² denotes the sample variance. To compute confidence intervals of σ², the distribution of

U = (n-1)S²/σ²         will be used.

We would describe the distribution of U.

Chi-Square random variables

For each positive integer n, there is a random variable Y that is said to have a Chi-Square distribution with degrees of freedom n. It is also written as a "χ²-random variable" with degrees of freedom n, or simply as a "χ²n-random variable". The important properties of the Chi-Square distribution follow.
Animation 7.4.1
Chi-Square random variables Y are continuous.
The graph of the pdf of a Chi-Square random variable Y is shown as follows:
  1. Unlike the normal or t-distribution, the graph is not symmetric around any vertical line.
  2. A Chi-Square variable Y is always non-negative.
  3. As stated above, for each positive integer n, there is one Chi-Square random variable, with degrees of freedom df = n.
  4. As the degrees of freedom increase, the graph starts looking more like that of a normal random variable.
  5. The mean (or expected value) of a Chi-Square random variable Y, with degrees of freedom df= n, is given by E(Y) = n. The variance Var(Y) = 2n.

Probability Computation for Chi-Square:

In this course, we avoid probability problems on Chi-Square. However, we can use the χ2cdf function of TI-84 (Silver Edition) to compute probability.
For example, if Y is a Chi-Square random variable with degrees of freedom 22, then

P(0 ≤Y ≤47.5 ) = χ2cdf(0, 47.5, 22) = .9987

Such probability problems would not be assigned in this course.


Inverse Probability for Chi-Square values

We define the following inverse probability Chi-Square numbers, as we defined zα and tm, α.

Definition: Suppose Y is a Chi-Square random variable with degrees of freedom n. Given a number 0 < α < 1, the Chi-Square number χ²n, α is defined by the formula

P(χ²n, α ≤ Y) = α,
which is the same as          P(Y ≤ χ²n, α) = 1 - α.

The TI-84 does not have an inverse Chi-Square function. However, for this course the following animation will suffice to compute these numbers.

Animation 7.4.1
  1. This inverse Chi-Square program will be used for this course and homework.
  2. Input the degrees of freedom, α and click on the arrow.
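
For readers without access to the animation, the Chi-Square numbers defined above can also be approximated with a short Python sketch that uses only the standard library. This is not the program behind the animation: the function names `chi2_cdf` and `chi2_upper` are hypothetical, the cdf is evaluated with a series for the incomplete gamma function, and the inverse is found by bisection.

```python
import math

def chi2_cdf(x, df):
    """P(Y <= x) for a Chi-Square variable Y with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a e^(-z) / Gamma(a+1) * sum_k z^k / ((a+1)(a+2)...(a+k))
    term = total = 1.0
    k = 0
    while term > 1e-16 * total and k < 10000:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total

def chi2_upper(df, alpha):
    """Return the number c with P(c <= Y) = alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < target:   # widen until the answer is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Chi-Square numbers for df = 25 with alpha = .025 and alpha = .975
print(round(chi2_upper(25, 0.025), 3), round(chi2_upper(25, 0.975), 3))
```

The results may differ from the animation in the last displayed digit because of rounding.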



Sampling Distribution of U

The following theorem describes the sampling distribution of the statistic U mentioned above.


Theorem.
Let X be a normal random variable with mean μ and standard deviation σ. Let X1,X2,…, Xn be a sample of size n from the X population. Then

U = (n-1)S²/σ²

has a Chi-Square distribution with degrees of freedom n-1.



Therefore,

P(χ²n-1, 1 - α/2    ≤   U = (n-1)S²/σ²    ≤    χ²n-1, α/2) = 1 - α.

So,

P([(n-1)S²/χ²n-1, α/2]    ≤    σ² ≤    [(n-1)S²/χ²n-1, 1 - α/2]) = 1 - α.


Confidence Interval for σ²

From the above computation, the following theorem follows.

Theorem. We use the notations as above. In particular X is assumed to be normal. Then a (1-α)100 percent confidence interval for σ² is given by

[(n-1)S²/χ²n-1, α/2]    ≤    σ² ≤    [(n-1)S²/χ²n-1, 1 - α/2]

As before, we use the other terminologies LEP and REP. Here
LEP = (n-1)S²/χ²n-1, α/2
REP = (n-1)S²/χ²n-1, 1 - α/2.

Remark. The confidence interval of σ² is not symmetric, as the Z-intervals and T-intervals are. This is because the Chi-Square distribution is not symmetric, while the standard normal and t distributions are symmetric.
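
The interval in the theorem can be sketched in Python as follows. The function name `variance_interval` is hypothetical, and the two Chi-Square numbers are passed in by the caller (read from the inverse Chi-Square animation) rather than computed. The illustrative inputs are n = 16 and s² = 90 at 96 percent confidence, with the Chi-Square numbers 28.26 and 5.986 for 15 degrees of freedom.

```python
def variance_interval(n, s2, chi2_upper, chi2_lower):
    """Return (df, LEP, REP) of the confidence interval for the variance.

    chi2_upper is the Chi-Square number for (n-1, alpha/2);
    chi2_lower is the Chi-Square number for (n-1, 1 - alpha/2).
    """
    df = n - 1
    lep = df * s2 / chi2_upper   # the larger Chi-Square number gives the left end
    rep = df * s2 / chi2_lower   # the smaller Chi-Square number gives the right end
    return df, lep, rep

df, lep, rep = variance_interval(16, 90, 28.26, 5.986)
print(df, round(lep, 4), round(rep, 4))
# The interval is not centered at s2: compare the two distances.
print(round(90 - lep, 4), round(rep - 90, 4))
```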


Problems 7.4: On Confidence Interval for σ²

Exercise 7.4.1. The birth weight of babies has a normal distribution, with variance σ². Because of the economic and social diversity of the community, there are concerns about variability of the birth weight. A sample of 26 birth weights was collected and the sample variance was found to be s² = 26.7. Compute a 95 percent confidence interval for the variance σ² of weight.

Solution:
Here sample size n = 26 and
the sample variance s² = 26.7

Degrees of freedom = n - 1 = 26 - 1 = 25

Level of confidence = 95 percent. So
1 - α = .95,   α = .05   and   α/2 =.025.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²25, .025 = 40.647
χ²n-1, 1 - α/2 = χ²25, .975 = 13.121

Therefore, LEP = (n-1)S²/χ²n-1, α/2 = (26-1)*26.7/40.647 = 16.4219
REP = (n-1)S²/χ²n-1, 1 - α/2 = (26-1)*26.7/13.121 = 50.8726

Exercise 7.4.2. To estimate the variance σ² of the length of babies at birth, a sample of size n = 16 was taken. The sample variance was found to be s² = 90 square inches. Construct a 96 percent confidence interval for σ². State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the length X is normal.

Here the sample size n = 16,
Sample variance s² = 90

The degrees of freedom df = n-1 = 16 - 1 = 15

Level of confidence = 96 percent.
1 - α = .96,   α = .04   and   α/2 =.02.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²15, .02 = 28.26
χ²n-1, 1 - α/2 = χ²15, .98 = 5.986

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (16-1)*90/28.26 = 47.7707
REP = (n-1)S²/χ²n-1, 1 - α/2 = (16-1)*90/5.986 = 225.5262

Exercise 7.4.3. The variability of the length of telephone calls has to be estimated. A sample of 14 calls had a sample variance s² = 30 square minutes. Construct a 90 percent confidence interval for σ². State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the length X is normal.

Here the sample size n = 14,
Sample variance s² = 30

The degrees of freedom df = n-1 = 14 - 1 = 13

Level of confidence = 90 percent.
1 - α = .90,   α = .10   and   α/2 =.05.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²13, .05 = 22.363
χ²n-1, 1 - α/2 = χ²13, .95 = 5.893

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (14-1)*30/22.363 = 17.4395
REP = (n-1)S²/χ²n-1, 1 - α/2 = (14-1)*30/5.893 = 66.1802

Exercise 7.4.4. The variability of weight of a variety of bananas in a grocery store has to be estimated. A sample of 20 bananas was collected. The sample variance was s² = 729 square grams. Construct a 96 percent confidence interval for σ². State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the weight X is normal.

Here the sample size n = 20,
Sample variance s² = 729

The degrees of freedom df = n-1 = 20 - 1 = 19.

Level of confidence = 96 percent.
1 - α = .96,   α = .04   and   α/2 =.02.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²19, .02 = 33.688
χ²n-1, 1 - α/2 = χ²19, .98 = 8.568

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (20 - 1)*729/33.688 = 411.1553
REP = (n-1)S²/χ²n-1, 1 - α/2 = (20 - 1)*729/8.568 = 1616.5966

Exercise 7.4.5. The variability of monthly consumption of electricity in a subdivision has to be estimated. A sample of 9 households had a sample variance s² = 22500 KWH-square. Construct a 94 percent confidence interval for σ². State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the monthly consumption X is normal.

Here the sample size n = 9,
Sample variance s² = 22500

The degrees of freedom df = n-1 = 9 - 1 = 8

Level of confidence = 94 percent.
1 - α = .94,   α = .06   and   α/2 =.03.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²8, .03 = 17.011
χ²n-1, 1 - α/2 = χ²8, .97 = 2.311

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (9-1)*22500/17.011 = 10581.38851
REP = (n-1)S²/χ²n-1, 1 - α/2 = (9 - 1)*22500/2.311 = 77888.3600

Exercise 7.4.6. To estimate the variability of weight (in pounds) of salmon in a river the following sample was collected:

34.7 33.8 38.2 20.3 27.8 45.3 43.1 37.3 32.5 32.3
31.8 41.5 44.5 29.2 25.3 29.6 39.5 29.1 37.3  
Construct a 97 percent confidence interval for σ². State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the weight X is normal. As in Lesson 2, enter data in the TI-84 and compute the variance.

Here the sample size n = 19,
Sample variance s² = (6.7608)² = 45.7084

The degrees of freedom df = n-1 = 19 - 1 = 18

Level of confidence = 97 percent.
1 - α = .97,   α = .03   and   α/2 =.015.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²18, .015 = 34.743
χ²n-1, 1 - α/2 = χ²18, .985 = 8.16

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (19-1)*45.7084/34.743 = 23.6811
REP = (n-1)S²/χ²n-1, 1 - α/2 = (19 - 1)*45.7084/8.16 = 100.8274

Exercise 7.4.7. The lifetime (in hours) of lightbulbs produced in a factory is normally distributed. To estimate the variability, the following data was collected on the lifetime of bulbs.

5110 4671 6441 3331 5055 5270 5335 4973 1837
7783 4560 6074 4777 4707 5263 4978 5418 5123

Construct a 92 percent confidence interval for σ². State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the lifetime X is normal. As in Lesson 2, enter data in the TI-84 and compute the variance.

Here the sample size n = 18,
Sample variance s² = (1203.8025)² = 1449140.459

The degrees of freedom df = n-1 = 18 - 1 = 17

Level of confidence = 92 percent.
1 - α = .92,   α = .08   and   α/2 =.04.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²17, .04 = 28.446
χ²n-1, 1 - α/2 = χ²17, .96 = 8.289

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (18-1)*1449140.459/28.446 = 866040.4909
REP = (n-1)S²/χ²n-1, 1 - α/2 = (18 - 1)*1449140.459/8.289 = 2972057.884

Exercise 7.4.8. The following data represents the time (in minutes) taken by students to drive to campus. The variance of time σ² is to be estimated.

5.7 22.4 14.7 16.7 10.6 26.9 17.9 32.5 20.9 9.2
3.4 20.4 17.3 31.9 4.4 23.7        
Construct a 99 percent confidence interval for σ². State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the time X is normal. As in Lesson 2, enter data in the TI-84 and compute the variance.

Here the sample size n = 16,
Sample variance s² = (9.0843)² = 82.5245

The degrees of freedom df = n-1 = 16 - 1 = 15

Level of confidence = 99 percent.
1 - α = .99,   α = .01   and   α/2 =.005.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²15, .005 = 32.802
χ²n-1, 1 - α/2 = χ²15, .995 = 4.602

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (16-1)*82.5245/32.802 = 37.7376
REP = (n-1)S²/χ²n-1, 1 - α/2 = (16 - 1)*82.5245/4.602 = 268.9847
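
Starting from the raw data, the same steps can be sketched in Python with the standard `statistics` module in place of the TI-84 list computation. The helper name `variance_interval_from_data` is hypothetical, and the two Chi-Square numbers are the ones read from the animation in the solution above.

```python
import statistics

def variance_interval_from_data(data, chi2_upper, chi2_lower):
    """Return (df, s2, LEP, REP) for the variance interval from raw data."""
    n = len(data)
    s2 = statistics.variance(data)   # sample variance (divides by n-1)
    df = n - 1
    return df, s2, df * s2 / chi2_upper, df * s2 / chi2_lower

# Driving times (minutes) from Exercise 7.4.8
drive_times = [5.7, 22.4, 14.7, 16.7, 10.6, 26.9, 17.9, 32.5, 20.9, 9.2,
               3.4, 20.4, 17.3, 31.9, 4.4, 23.7]

df, s2, lep, rep = variance_interval_from_data(drive_times, 32.802, 4.602)
print(df, round(s2, 4), round(lep, 4), round(rep, 4))
```

The code keeps the unrounded sample variance, so its output can differ in the third or fourth decimal place from the hand computation, which squares the rounded value s = 9.0843.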

Exercise 7.4.9. The following is sample data on the amount (in 1000 bushels) of wheat harvested by Kansas farmers in 2002.

206 300 200 385 280
600 225 933 320 260
Compute a 99 percent confidence interval for the variance σ² of harvest. State the degrees of freedom df, LEP and REP.

Solution:
It is reasonable to assume that the wheat production X is normal. As in Lesson 2, enter data in the TI-84 and compute the variance.

Here the sample size n = 10,
Sample variance s² = (229.6149)² = 52723.0023

The degrees of freedom df = n-1 = 10 - 1 = 9

Level of confidence = 99 percent.
1 - α = .99,   α = .01   and   α/2 =.005.
Using the Inverse Chi-Square animation 7.4.1 we have
χ²n-1, α/2 = χ²9, .005 = 23.59
χ²n-1, 1 - α/2 = χ²9, .995 = 1.736

Therefore,
LEP = (n-1)S²/χ²n-1, α/2 = (10-1)*52723.0023/23.59 = 20114.75289
REP = (n-1)S²/χ²n-1, 1 - α/2 = (10 - 1)*52723.0023/1.736 = 273333.5373

Exercise 7.4.10. The following is data on monthly gas consumption (in ccf) during the winter months by a household.

154 222 264 257 127
228 240 393 278 140

Compute an 80 percent confidence interval for the variance σ². State the degrees of freedom df, LEP and REP.
