Satyagopal Mandal
Department of Mathematics
University of Kansas
Office: 624 Snow Hall  Phone: 785-864-5180
  • e-mail: mandal@math.ukans.edu
  • © Copyright laws apply. My students have permission to copy.

     

    Estimation

    As you know, in statistics we try to understand a large population on the basis of information available in a small sample. Part of what we mean by "understand" is knowing the values of the population parameters. The game here is to use suitable sample statistics to estimate population parameters. For example, we may use the sample mean x as an estimate for the population mean m.

     

    There are two types of estimation of parameters we consider.

    1) The first one is called point estimation. In point estimation, we give a number as an estimate for the parameter.

    For example, if we are trying to estimate the mean height m of the American population, we take a sample, compute the sample mean height x and call it an estimate for m.

     

    2) The second type of estimation is called interval estimation. In interval estimation, we give an interval (L, U) and say that the parameter lies within this interval (with a certain degree of confidence). For example, while estimating the mean height m of the American population, we may take a sample, compute the sample mean x and say that the population mean m is within the interval (x - 1, x + 1). Obviously, in interval estimation, the smaller the length U - L of the interval, the better (and the higher the degree of confidence, the better).

     

    Point and Interval Estimation

     

    As we have already mentioned, we use a statistic to estimate a parameter. For example, we say that the sample mean X is an estimator of the population mean m, and the computed value x of X is an estimate of m. Similarly, the sample standard deviation S is an estimator of the population standard deviation s, and the value of S computed from a sample is an estimate of s. Try to see the difference between an estimator and an estimate: an estimator is a random variable (it varies from sample to sample), and an estimate is a number (the computed value of the estimator for one particular sample).
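
    The distinction can be seen in a short simulation. The sketch below is only an illustration: the population (normal, with mean 170 and standard deviation 10), the sample size and the function name are assumptions made for the example and are not part of these notes. Each sample produces a different estimate, while the estimator (the rule "take the sample mean") stays the same.

```python
import random
import statistics

def sample_mean_estimates(pop_mean, pop_sd, n, num_samples, seed=0):
    """Draw num_samples samples of size n; return the sample mean of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    estimates = []
    for _ in range(num_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    return estimates

# Hypothetical population: mean 170, standard deviation 10.
estimates = sample_mean_estimates(170.0, 10.0, 25, 5)
for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: estimate x = {est:.2f}")
```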

     

    We will not go into the theory that deals with the characteristics of, and criteria for, a good estimator of a given parameter. We will also consider only the estimation of the population mean m and the population proportion p (of success).

     

    It may be intuitively clear to you that the sample mean X is a natural point estimator of m, and that the sample proportion of success is a natural point estimator of the population proportion p of success.

    So, if we are asked to give a point estimate of m, we just compute x and say that x is a point estimate for m. Point estimation of p is handled in the same way. Likewise, the sample median is a natural point estimator for the population median.

     

    Interval Estimation

    We would almost never expect a point estimate of a parameter t to be exactly equal to the actual value of t. For example, we would never expect the sample mean x to be exactly equal to m.

     

    That is why it is more reasonable to give an interval (L, U) and say that the parameter t lies within this interval. Here L and U are statistics. Since the computed values

    L = l, U = u

    depend on the sample, we do not expect the value of t to always lie within the computed interval (l, u). We are happy as long as the true value of t falls within (l, u) often enough, allowing the possibility that a few times we will be "wrong". The probabilities

    P(L < t < U) and P(t not in (L, U))

    tell us how often we would be correct or wrong, respectively. This is what we do in interval estimation, and (l, u) is called a confidence interval for t. We have the following formal definition.

     

    Definition: Let t be a population parameter. An interval estimate of t provides the following:

    1. It gives an interval (L, U) as an estimate for t. Here L, U are statistics.
    2. It also gives the probability P(L < t < U). This number, P(L < t < U) = 1 - a, is called the level of confidence, and (L, U) is said to be a (1 - a)100 percent confidence interval of t.
    3. In practice, a will be a small number, like a = 0.1, 0.01 or 0.05. So, we will be talking about 90 percent confidence intervals, 99 percent confidence intervals and 95 percent confidence intervals.

     

    Definition: Suppose Z is the standard normal random variable. For a number a between 0 and 1, we define a number $z_a$ by the following formula:

    $P(Z < z_a) = 1 - a.$

    In practice, a = 0.1, 0.05, 0.025, 0.01, 0.005 and using the z-table we have

    a        0.005    0.01    0.025    0.05    0.1
    $z_a$    2.58     2.33    1.96     1.65    1.28
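
    The values above can also be computed rather than read from a table. The following is a minimal sketch using Python's standard library; the function name z_value is an assumption made for this example. Note that the computed values carry more decimals than the table: for instance, the computed value for a = 0.05 is about 1.645, which the table lists as 1.65.

```python
from statistics import NormalDist

def z_value(a):
    """Return z_a, the number with P(Z < z_a) = 1 - a for standard normal Z."""
    if not 0 < a < 1:
        raise ValueError("a must be strictly between 0 and 1")
    # inv_cdf(p) returns the point below which a fraction p of the area lies.
    return NormalDist().inv_cdf(1 - a)

for a in (0.005, 0.01, 0.025, 0.05, 0.1):
    print(f"a = {a}: z_a = {z_value(a):.3f}")
```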

     

     

    A (1- a)100 percent confidence interval for m:

     

    Suppose we have a population with mean m and standard deviation s (that is, variance $s^2$). We will compute a confidence interval for m.

     

    First, we assume that s is known. Let $X_1, X_2, \ldots, X_n$ be a sample from this population, with sample mean X. From the Central Limit Theorem (CLT) we have, approximately,

    $P(-z_{a/2} < (X - m)/(s/\sqrt{n}) < z_{a/2}) = 1 - a.$

     

    If we simplify, we get

    $P(X - z_{a/2}(s/\sqrt{n}) < m < X + z_{a/2}(s/\sqrt{n})) = 1 - a.$
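
    The simplification is routine algebra on the inequality inside the probability. Written out step by step, with X the sample mean: multiply through by $s/\sqrt{n}$, subtract X, then multiply by -1 (which reverses the inequalities).

```latex
\begin{align*}
-z_{a/2} < \frac{X - m}{s/\sqrt{n}} < z_{a/2}
  &\iff -z_{a/2}\,\frac{s}{\sqrt{n}} < X - m < z_{a/2}\,\frac{s}{\sqrt{n}} \\
  &\iff -X - z_{a/2}\,\frac{s}{\sqrt{n}} < -m < -X + z_{a/2}\,\frac{s}{\sqrt{n}} \\
  &\iff X - z_{a/2}\,\frac{s}{\sqrt{n}} < m < X + z_{a/2}\,\frac{s}{\sqrt{n}}
\end{align*}
```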

    So we have the following theorem.

    Theorem: Assume that the population standard deviation s is known. Then an approximate (1 - a)100 percent confidence interval for m is given by

    $x - z_{a/2}(s/\sqrt{n}) < m < x + z_{a/2}(s/\sqrt{n})$
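
    The formula in the theorem translates directly into a few lines of code. The sketch below is one possible way to write it; the function name z_interval, its parameter names and the input values in the usage line are all assumptions made for illustration. It computes the z-value exactly instead of using the rounded table value, so results may differ from hand computations in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence=0.95):
    """Approximate two-sided confidence interval for m when s is known.

    x_bar: computed sample mean; s: known population standard deviation;
    n: sample size; confidence: the level 1 - a.
    """
    if n <= 0 or s <= 0 or not 0 < confidence < 1:
        raise ValueError("need n > 0, s > 0 and 0 < confidence < 1")
    a = 1 - confidence
    z = NormalDist().inv_cdf(1 - a / 2)  # z_{a/2}
    margin = z * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs, chosen only to show the call.
lower, upper = z_interval(x_bar=50.0, s=10.0, n=100, confidence=0.95)
print(f"{lower:.2f} < m < {upper:.2f}")
```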

    Remark: This confidence interval is also called a two-sided confidence interval for m. There are other kinds of confidence intervals for m. For example, if

    P(L < m) = 1 - a, then (L, infinity) is called a (1 - a)100 percent one-sided (upper) confidence interval for m. Similarly, we can talk about one-sided (lower) confidence intervals.

     

    Use Your Calculator: If you know x and s, then you can compute the confidence interval using the above formula. You can also use the calculator:

    1. Press STAT and select TESTS (if you have a TI-83).
    2. Select ZInterval and press ENTER.
    3. For the input type, select Stats (not Data).
    4. Enter the values of s, x, n and C-Level.
    5. Select Calculate and press ENTER. It will give you the answer.

     

    Two More Formulas/Definition:

    1. The length of the confidence interval is $l = 2 z_{a/2}(s/\sqrt{n})$.
    2. The Margin of Error is defined as $E = z_{a/2}(s/\sqrt{n})$. A TV news reader may say that the value of m is x with a margin of error of plus or minus E.

    Remark: If you go on computing 95 percent confidence intervals for m from new samples, the actual value of m will fall within the confidence interval about 95 percent of the time, and outside it the other 5 percent of the time. Note that the confidence interval is a moving interval that varies from sample to sample, while the actual value of m is fixed, although unknown.
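
    This "95 percent of the time" reading can be checked by simulation. The sketch below is an illustration under assumed values: a normal population with m = 100 and s = 15, samples of size 25, and 4000 repetitions are all choices made for the example, as is the function name coverage_rate. It draws many samples, builds the interval from each, and counts how often the fixed value m lands inside.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage_rate(m, s, n, confidence, trials, seed=1):
    """Fraction of simulated confidence intervals that contain the true mean m."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{a/2}
    margin = z * s / sqrt(n)  # same for every sample because s is known
    hits = 0
    for _ in range(trials):
        x_bar = fmean([rng.gauss(m, s) for _ in range(n)])
        if x_bar - margin < m < x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(m=100.0, s=15.0, n=25, confidence=0.95, trials=4000)
print(f"fraction of intervals containing m: {rate:.3f}")
```

    The printed fraction should be close to 0.95 but will not equal it exactly, since the simulation itself is subject to sampling variation.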

     

    Problems: Z-interval for m

     

    Ex.1: Assume that you have a population with mean m and standard deviation s = 15. Suppose you have collected a sample of size 25 and the sample mean x was found to be 81.

    1. Find an approximate 99 percent confidence interval for m.
    2. Find the margin of error at 99 percent level of confidence.

     

    Solution: 1) Here 1 - a = 0.99. So, a = 1 - 0.99 = 0.01 and $z_{a/2} = z_{0.005} = 2.58$. Besides, n = 25, x = 81 and s = 15. An approximate 99 percent confidence interval for m is given by

    $x - z_{a/2}(s/\sqrt{n}) < m < x + z_{a/2}(s/\sqrt{n})$

    which is

    $81 - 2.58(15/\sqrt{25}) < m < 81 + 2.58(15/\sqrt{25})$

    which is

    73.26 < m < 88.74

    Alternatively, you could get the answer from the calculator.

    2) The margin of error is

    $E = z_{a/2}(s/\sqrt{n}) = 2.58(15/\sqrt{25}) = 7.74.$
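
    The arithmetic in this solution can be reproduced in a few lines. This is only a check of the numbers above, using the rounded table value 2.58 as the notes do; the variable names are chosen for the example. With the unrounded z-value (about 2.576) the margin would come out slightly smaller.

```python
from math import sqrt

# Values from Ex.1; 2.58 is the table value of z_{0.005} used in the notes.
x_bar, s, n, z = 81.0, 15.0, 25, 2.58

margin = z * s / sqrt(n)  # margin of error E
lower, upper = x_bar - margin, x_bar + margin
print(f"E = {margin:.2f}")
print(f"{lower:.2f} < m < {upper:.2f}")
```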

     

    Ex.2: Assume that you have a normal population with mean m and standard deviation s = 9.8. Suppose you have collected a sample of size 14 and the sample mean x was found to be 151.1.

    1. Find an approximate 99 percent confidence interval for m.
    2. Find the margin of error at 99 percent level of confidence.

     

    Solution: 1) Here 1 - a = 0.99. So, a = 1 - 0.99 = 0.01 and $z_{a/2} = z_{0.005} = 2.58$. Besides, n = 14, x = 151.1 and s = 9.8. An approximate 99 percent confidence interval for m is given by

    $x - z_{a/2}(s/\sqrt{n}) < m < x + z_{a/2}(s/\sqrt{n})$

    which is

    $151.1 - 2.58(9.8/\sqrt{14}) < m < 151.1 + 2.58(9.8/\sqrt{14})$

    which is

    144.34 < m < 157.86

    Alternatively, you could get the answer from the calculator.

    2) The margin of error is

    $E = z_{a/2}(s/\sqrt{n}) = 2.58(9.8/\sqrt{14}) = 6.76.$

     

    Ex.3: The time taken by an athlete to run an event has a distribution with mean m and known standard deviation s = 3.5 seconds. To estimate the mean m, he ran 16 times and the sample mean was found to be x = 33 seconds.

    1. Find the margin of error in estimating the true mean m with 99 percent level of confidence.
    2. Find an approximate 99 percent confidence interval for m.

     

    Ex.4: A population has a distribution with standard deviation s = 17. A sample of size n = 65 was taken and the sample mean was x = 11,50. Give an approximate 90 percent confidence interval for m.

     

    Ex.5: It is suspected that an industrial plant is polluting the water stream. To determine the extent of the damage, a water sample of size n = 13 was collected and the dissolved oxygen concentration was measured. The mean concentration was found to be x = 2.3. It is known from past experience that s = 0.45. Compute a 95 percent confidence interval for the mean concentration m of dissolved oxygen.