Math 365, Elementary Statistics

 

Chapter 5 : Continuous Random Variables

Satya Mandal

5.1 Probability Density Function (pdf)

In Lesson 4, a continuous random variable was defined as a random variable X that can assume any value on an interval. The probability distribution of a continuous random variable is defined, and behaves, in a different manner than that of a discrete random variable.

The probability distribution of a continuous random variable is described as follows.

Definition. Let X be a continuous random variable on a sample space S. The probability distribution of X is defined as follows:

  1. There is a function f(x) of real numbers x associated with X. This function f(x) is called the probability density function (abbreviated as pdf) of X. The pdf f(x) of X must satisfy the following two properties:
    1. f(x) is always non-negative (i.e., f(x) ≥ 0 for all x). Geometrically, the graph of the function y = f(x) always lies on or above the x-axis.
    2. The total area under the graph of y = f(x) and above the x-axis is one.
  2. For any two real numbers a ≤ b (also for a = -∞ and b = ∞) the probability that X will be between a and b is given by the area under the graph of y = f(x), above the x-axis and between the vertical lines x = a and x = b. Notationally, we write

    P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) =

    the area of the region under the graph of y = f(x), above the x-axis, between the vertical lines x = a and x = b.


The following are graphs and diagrams:


Animation 5.1.1
Four diagrams of the graph of the pdf of four different random variables and probability computations by area.
Important Comment: The total area under the graph of any pdf and above the x-axis must be equal to one.


Remark. Given a continuous random variable X, to get a model for the pdf f(x), we look at large samples and their relative frequency histogram of the X-values.
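The idea in the remark can be tried out in Python. The following is only a minimal sketch: simulated standard normal data stands in for a real sample, and the function name density_histogram, the seed, and the bin settings are choices made here for illustration. Bar heights are scaled as relative frequency divided by bin width, so that bar areas are relative frequencies and the heights can be compared with the pdf.

```python
import random
from statistics import NormalDist


def density_histogram(samples, low, high, bins):
    """Return (midpoint, height) pairs of a relative frequency histogram.

    Heights are (relative frequency) / (bin width), so the total area of
    the bars is the fraction of samples that fall in [low, high).
    """
    width = (high - low) / bins
    counts = [0] * bins
    for x in samples:
        if low <= x < high:
            index = min(int((x - low) / width), bins - 1)
            counts[index] += 1
    n = len(samples)
    return [(low + (i + 0.5) * width, counts[i] / (n * width)) for i in range(bins)]


random.seed(365)  # fixed seed so the run is repeatable
model = NormalDist(mu=0, sigma=1)  # assumed model that generates the sample
samples = [random.gauss(0, 1) for _ in range(100_000)]
histogram = density_histogram(samples, -4.0, 4.0, 16)

for midpoint, height in histogram:
    print(f"x = {midpoint:5.2f}  bar height = {height:.4f}  pdf = {model.pdf(midpoint):.4f}")
```

With a large sample the bar heights should track the pdf closely; with a small sample the agreement is rougher.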



A Property of Continuous random variables


Suppose X is a continuous random variable. Then, in contrast to a discrete random variable, for any real number a, the probability P(X = a) = 0. We see this as follows:


P(X = a) = P(a ≤ X ≤ a) = (area of the segment of the vertical line x = a, from the x-axis up to the graph of the pdf) = 0.
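A numerical way to see this property is to shrink an interval around a and watch the probability shrink with it. The sketch below assumes a standard normal random variable purely as an example (any continuous random variable would do) and uses NormalDist from Python's standard statistics module; the point a = 1 and the interval half-widths are arbitrary choices.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # example distribution, chosen for illustration
a = 1.0

interval_probs = []
for eps in (0.1, 0.01, 0.001, 0.0001):
    # P(a - eps <= Z <= a + eps): area of a thinner and thinner strip
    prob = Z.cdf(a + eps) - Z.cdf(a - eps)
    interval_probs.append(prob)
    print(f"P({a - eps:.4f} <= Z <= {a + eps:.4f}) = {prob:.6f}")

# An interval of zero width has zero area under the pdf.
point_probability = Z.cdf(a) - Z.cdf(a)
print("P(Z = a) =", point_probability)
```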



The Mean, Variance and Standard Deviation

The mean μ, variance σ² and standard deviation σ of a continuous random variable X represent the same things as in the case of discrete random variables. The mean μ of X represents the average value of X. The mean μ is also called the expected value E(X) of X. The standard deviation σ of X is a measure of the variability of X.

A formal definition involves some knowledge of Calculus (integration), which is not a prerequisite for this course. For this course, a formal definition will not be necessary.


Examples of Continuous random variables

This subsection is devoted to showcasing the graphs of the pdfs of some well-known continuous random variables, and probability computations, using some animations.


Example. The Normal random variable will be discussed most extensively. We will postpone any further discussion of it until the next section. For now, you can play with the following animation on the pdf and probability. You will notice that two unknown parameters, the mean μ and the standard deviation σ, determine the pdf of a normal random variable. A goal of this course is to estimate these parameters.

Animation 5.1.2


Example. The T-random variable will be used later in Lessons 7, 8, 9. We will postpone any further discussion of it until Lesson 7. For now, you can play with the following animation on the pdf and probability. You will notice that T-random variables are determined by a parameter called the degrees of freedom. The degrees of freedom would be known.

Animation 5.1.3


Example. The Chi Square (χ²) random variable will be used later in Lessons 7, 8, 9. Discussion is postponed until Lesson 7. For now, you can play with the following animation on the pdf and probability. You will notice that Chi Square random variables are determined by a parameter called the degrees of freedom. The degrees of freedom would be known.

Animation 5.1.4


Example. Uniform random variables will not be used in this course, but they are easy and interesting. One can say that a uniform random variable is a continuous counterpart of the Equally Likely Probability.

A Uniform random variable X has a lower limit L and an upper limit U. Between L and U, X is uniformly distributed. Its pdf is given by


$$ f(x) = \begin{cases} \dfrac{1}{U-L} & \text{if } L \le x \le U \\ 0 & \text{otherwise} \end{cases} $$

In this case, we say that X is a Uniform(L,U)-random variable.


In particular, X is a Uniform(0,1)-random variable, if the pdf of X is given by:

$$ f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} $$

Similarly, Y is a Uniform(-1,3)-random variable, if the pdf of Y is given by

$$ g(x) = \begin{cases} \dfrac{1}{4} & \text{if } -1 \le x \le 3 \\ 0 & \text{otherwise} \end{cases} $$

Animation 5.1.5
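Because the pdf of a Uniform(L, U) random variable is flat, every probability is the area of a rectangle: (width of the overlap with [L, U]) times the height 1/(U - L). The short Python sketch below illustrates this; the function names uniform_pdf and uniform_prob are made up here for illustration and are not part of the course material.

```python
def uniform_pdf(x, low, high):
    """pdf of a Uniform(low, high) random variable."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


def uniform_prob(a, b, low, high):
    """P(a <= X <= b) for X ~ Uniform(low, high), computed as a rectangle area."""
    left = max(a, low)     # clip the interval [a, b] to [low, high]
    right = min(b, high)
    if right <= left:
        return 0.0         # no overlap, so no area
    return (right - left) / (high - low)


print(uniform_prob(0.25, 0.75, 0, 1))   # Uniform(0,1): half of the unit interval
print(uniform_prob(0, 2, -1, 3))        # Uniform(-1,3): width 2 times height 1/4
print(uniform_prob(-5, 5, -1, 3))       # covers the whole range, so total area
```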


Example
. The Exponential random variable will not be used in this course. You can play with the following animation on the pdf and probability. You will notice that one unknown parameter λ (lambda) determines the pdf. A goal of some other course would be to estimate this parameter.

Animation 5.1.6

Avoiding Calculus and Substitute Explanation

Knowledge of calculus, in particular of integration, is necessary to formally define various concepts for continuous random variables. Since Calculus is not a prerequisite for this course, let me comment that integration takes the place of summation in the case of continuous random variables. The summation sign was used to define the mean and variance of discrete random variables. If you replace the summation sign in the definitions for discrete random variables by the integration sign ∫, you get the corresponding definitions for continuous random variables.


You can ignore the following, if you did not take calculus.

Suppose X is a continuous random variable and f(x) is the pdf of X. Then,


$$ P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b) = \int_a^b f(x)\,dx $$


The mean is defined as

$$ \mu = E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx $$

and the variance of X is defined as

$$ \sigma^2 = \mathrm{Variance}(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx. $$
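For readers who want to see these integrals as numbers, the sketch below approximates them with a simple midpoint rule for the Uniform(-1, 3) pdf. This is an illustrative assumption, not a method used in the course: the helper integrate, the number of steps, and the choice of pdf are all made up here. Since this pdf is zero outside [-1, 3], integrating over that interval is the same as integrating over the whole real line.

```python
def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


low, high = -1.0, 3.0  # limits of the example Uniform(-1, 3) random variable


def f(x):
    """pdf of Uniform(low, high)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


total_area = integrate(f, low, high)                             # should be close to 1
mean = integrate(lambda x: x * f(x), low, high)                  # integral of x f(x)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), low, high)

print("total area:", round(total_area, 6))
print("mean:", round(mean, 6))
print("variance:", round(variance, 6))
print("standard deviation:", round(variance ** 0.5, 6))
```

For a Uniform(L, U) random variable the known closed forms are mean (L + U)/2 and variance (U - L)²/12, which gives a way to check the numerical output.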

5.2 The Normal Random Variable

The most commonly encountered random variable in nature and real life is the normal random variable.

Definition and Properties of Normal Random Variables

The graph y = f(x) of the probability density function (pdf) f(x) of a normal random variable X has a perfect bell shape, symmetric around the vertical line x = μ through the mean μ.

The pdf f(x) is given by the formula:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \quad \text{for } -\infty < x < \infty $$

where μ is the mean of X and σ is the standard deviation of X (which can be proved using the integration formulas given in the last section).
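The normal pdf, with its factor 1/(σ√(2π)) in front of the exponential, can be coded directly. The sketch below is an illustration only: the function name normal_pdf and the sample values μ = 2, σ = 0.5 are choices made here. It compares the hand-coded formula with NormalDist.pdf from Python's standard statistics module and checks numerically that the total area under the curve is about one.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """pdf of a N(mu, sigma) random variable, written out from the formula."""
    coefficient = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)


mu, sigma = 2.0, 0.5  # sample parameter values, chosen for illustration
library = NormalDist(mu, sigma)

for x in (1.0, 2.0, 2.5):
    print(f"x = {x}: formula = {normal_pdf(x, mu, sigma):.6f}, library = {library.pdf(x):.6f}")

# Midpoint-rule area over mu +/- 8 sigma; the tails beyond that are negligible.
steps = 20_000
a, b = mu - 8 * sigma, mu + 8 * sigma
h = (b - a) / steps
area = h * sum(normal_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(steps))
print("total area:", round(area, 6))
```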


The following demonstrates the graph of the pdf y=f(x):

Animation 5.2.1
  1. As the standard deviation σ decreases, the whole probability-area concentrates near the mean. To see this phenomenon, minimize the standard deviation.
  2. The graph is symmetric around the mean. As the mean increases or decreases, the graph of the pdf only translates to the right or left. To see this, change the mean.
  3. The standard deviation σ is the distance from the line of symmetry to the point where the concavity of the graph changes from cupped down to cupped up.
  4. Revisit the animation on the probability of a normal random variable in the example above (Animation 5.1.2).
  5. A normal random variable X with mean μ and standard deviation σ is also called a N(μ, σ) random variable.

Definition. A normal random variable Z with mean μ = 0 and standard deviation σ = 1 (i.e. a N(0,1)-random variable) is called a Standard Normal Random Variable. The notation Z is used fairly consistently to denote a standard normal random variable. Following are some comments about the importance, properties and usage of N(0, 1) random variables.

  1. Reinterpreting the above properties of normal random variables, the graph of the pdf y = f(x) of the standard normal random variable Z is symmetric around the y-axis. Since the total area under the graph and above the x-axis is one, on each side of the y-axis the corresponding area is .5.
  2. Following is another version of the above animations, for the standard normal random variable, to compute probability.

    Animation 5.2.2

  3. Probability tables for the standard normal variable Z are commonly used to compute probability in college-level statistics courses. Such tables for Z are available in any standard textbook or on the Internet. Because of the availability of simple and sophisticated tools (like TI-84, Excel, SAS, Minitab), the use of tables has become outdated. In this course, TI-84 will be used to compute probability.
  4. The distribution of the standard normal random variable Z is the most basic ingredient used to deal with problems on any normal random variable X. (In turn, normal random variables are most basic in nature and real life.) Any problem on a normal random variable X is reduced to a problem on the standard normal random variable Z. This process of reduction is called standardization. The following theorem makes this more precise.

Theorem (Standardization). Let X be a N(μ, σ)-random variable. Then Z = (X - μ)/σ is a standard normal random variable. So,

P(a < X < b) = P([a - μ]/σ < Z < [b - μ]/σ)

OR

P(a < X < b) = P(A < Z < B)

where A = (a - μ)/σ and B = (b - μ)/σ.

Now use TI-84 "Distr"-menu and the answer would be normalcdf(A,B).
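For readers working away from the calculator, the theorem can be checked numerically. The sketch below is a Python stand-in, not the TI-84 itself: the helper name normalcdf is borrowed from the calculator for readability and is built on NormalDist from Python's standard statistics module. The values μ = 2, σ = 0.5, a = 1 and b = 2.5 are sample inputs chosen for illustration.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """P(lower < X < upper) for X ~ N(mu, sigma); defaults give the standard normal."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


mu, sigma = 2.0, 0.5   # sample parameters
a, b = 1.0, 2.5        # sample interval

# Standardize the end points: A = (a - mu)/sigma, B = (b - mu)/sigma
A = (a - mu) / sigma
B = (b - mu) / sigma

direct = normalcdf(a, b, mu, sigma)   # probability computed on X directly
standardized = normalcdf(A, B)        # probability computed on Z after standardization

print("A =", A, " B =", B)
print("P(a < X < b) =", round(direct, 4))
print("P(A < Z < B) =", round(standardized, 4))
```

Both computations give the same probability, which is the content of the theorem.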

Inverse probability will be discussed subsequently.
Some problems on probability computations will be discussed after the following comment.

Ubiquity of Normal Random Variables: Any random variable that we encounter in nature is, almost certainly, either normal or approximately normal. If there is one concept that you take from this course it is this: nature's random variables are normal or approximately normal. You will hear about normal random variables and the bell curve in your workplace or anywhere you may have to use statistics.


Problem Solving: There are two steps involved in solving problems in this Lesson:

  1. Standardize the problem to a Z-problem.
  2. Use TI-84 to compute the Z-probability. (You can use other tools elsewhere, including the animations above.)

Example: Suppose X is a N(2, .5) random variable. What is the probability P(1 < X < 2.5)?
    Solution: Here μ = 2 and σ = .5
    P(1 < X < 2.5)
    = P([1 - μ]/σ < [X - μ]/σ < [2.5 - μ]/σ)
    = P([1 - μ]/σ < Z < [2.5 - μ]/σ)
    = P([1 - 2]/.5 < Z < [2.5 - 2]/.5)
    = P(-2 < Z < 1) = .8186         [by TI-84 function normalcdf(-2,1)]

Use of Calculators (TI-84):
Computing Standard Normal (Z) Probability:
  1. To compute P(-2 ≤ Z ≤ 1.5) do the following.
    a) Press 2nd and then Distr (VARS) b) Scroll down to normalcdf and ENTER c) Type in "-2, 1.5)" and ENTER. TI will give the answer.
  2. To compute P(Z ≤ 1.5) do the following.
    a) Press 2nd and then Distr (VARS) b) Scroll down to normalcdf and ENTER c) Type in "-5, 1.5)" and ENTER. TI will give the answer.
    (We used -5 instead of minus infinity. There is no infinity key in TI and this will be precise enough. You can check that P(-5 < Z < 5) = normalcdf(-5,5) ≈ 1.)
  3. To compute P(1.53 ≤ Z) do the following.
    a) Press 2nd and then Distr (VARS) b) Scroll down to normalcdf and ENTER c) Type in "1.53, 5)" and ENTER. TI will give the answer.
    (We used 5 instead of infinity. This will be precise enough.)
  4. It will not make any difference if "≤" is replaced by "<".
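How much is lost by using -5 in place of minus infinity can be measured directly. The sketch below uses NormalDist from Python's standard statistics module as a stand-in for the calculator; the variable names are illustrative. The quantity thrown away is exactly the area to the left of -5.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

exact = Z.cdf(1.5)                    # P(Z <= 1.5), lower limit truly minus infinity
truncated = Z.cdf(1.5) - Z.cdf(-5)    # what normalcdf(-5, 1.5) computes
error = exact - truncated             # area to the left of -5

central = Z.cdf(5) - Z.cdf(-5)        # P(-5 < Z < 5)

print("exact     :", round(exact, 6))
print("truncated :", round(truncated, 6))
print("error     :", error)
print("P(-5 < Z < 5):", central)
```

The error is below one millionth, far smaller than the four decimal places reported in this lesson.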



Problems on 5.2: the Normal Random Variable

Exercise 5.2.1. Let Z be the standard normal random variable.

  1. Find the probability P(-1.1 < Z < 2.5).
  2. Find the probability P(Z < -2.1).
  3. Find the probability P(-2.1 < Z < -1.5).
  4. Find the probability P(1.5 < Z).

Solution by TI-84:
This is already standard normal. Therefore, no standardization is needed. We use TI-84 :

  1. P(-1.1 < Z < 2.5) = normalcdf(-1.1, 2.5) = .8581
  2. P(Z < -2.1)= normalcdf(-5, -2.1) = .01786
  3. P(-2.1 < Z < -1.5) = normalcdf(-2.1, -1.5) = .0489
  4. P(1.5 < Z) = normalcdf(1.5, 5) = .0668

Experiment with the normal animation.

Exercise 5.2.2. Let X be a normal random variable with mean μ = 3 and standard deviation σ = 1.5 .

  1. Find the probability P(-1.1 < X < 2.5).
  2. Find the probability P(X < -2.1).
  3. Find the probability P(-1.2 < X < -0.5).
  4. Find the probability P(1.5 < X).

Solution by TI-84:
Here the mean μ = 3 and standard deviation σ = 1.5
X is N(3, 1.5) -random variable. We use TI-84 :

  1. P(-1.1 < X < 2.5)
    = P([-1.1 - μ]/σ < [X - μ]/σ < [2.5 - μ]/σ)
    = P([-1.1 - μ]/σ < Z < [2.5 - μ]/σ)
    = P([-1.1 - 3]/1.5 < Z < [2.5 - 3]/1.5)
    = P (- 2.7333 < Z < - .3333) = normalcdf(-2.7333, -.3333) = 0.3663.
  2. P(X < -2.1)
    = P([X - μ]/σ < [-2.1 - μ]/σ)
    = P( Z < [-2.1 - μ]/σ)
    = P( Z < [-2.1 - 3]/1.5)
    = P (Z < - 3.4) = normalcdf(-5, -3.4) = 3.3669*10^(-4)
  3. P(-1.2 < X < -0.5)
    = P([-1.2 - μ]/σ < [X - μ]/σ < [-0.5 - μ]/σ)
    = P([-1.2 - μ]/σ < Z < [-0.5 - μ]/σ)
    = P([-1.2 - 3]/1.5 < Z < [-0.5 - 3]/1.5)
    = P (- 2.8 < Z < -2.3333) = normalcdf(-2.8, -2.3333) = 0.0073
  4. P(1.5 < X)
    = P([1.5 - μ]/σ < [X - μ]/σ)
    = P([1.5 - μ]/σ < Z)
    = P([1.5 - 3]/1.5 < Z) = P(-1 < Z) = normalcdf(-1, 5) = .8413

Experiment with the normal animation.

Exercise 5.2.3. The length of life of some light bulbs produced in a factory is normally distributed with mean 8640 hours and standard deviation 1440 hours. Find the probability that a bulb will last

  1. less than 5040 hours;
  2. between 5040 hours and 8640 hours.

Solution by TI-84:
Here the mean μ = 8640 and standard deviation σ = 1440.
Let X = length of life of light bulbs produced in the factory. Then X is N(8640, 1440) -random variable. We use TI-84 :

  1. P(X < 5040)
    = P([X - μ]/σ < [5040 - μ]/σ)
    = P( Z < [5040 - μ]/σ)
    = P( Z < [5040 - 8640]/1440)
    = P (Z < - 2.5) = normalcdf(-5, - 2.5) = .0062
  2. P(5040 < X < 8640)
    = P([5040 - μ]/σ < [X - μ]/σ < [8640 - μ]/σ)
    = P([5040 - μ]/σ < Z < [8640 - μ]/σ)
    = P([5040 - 8640]/1440 < Z < [8640 - 8640]/1440)
    = P (-2.5 < Z < 0) = normalcdf(-2.5, 0) = 0.4938

Exercise 5.2.4. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. What proportion (i.e, probability) of fish are between 44 cm and 110 cm long?

Solution by TI-84:
Here the mean μ = 67 and standard deviation σ = 21.
Let X = length of fish in the lake. Then X is N(67, 21) -random variable. We use TI-84 :

P(44 < X < 110)
= P([44 - μ]/σ < [X - μ]/σ < [110 - μ]/σ)
= P([44 - μ]/σ < Z < [110 - μ]/σ)
= P([44 - 67]/21 < Z < [110 - 67]/21)
= P (-1.0952 < Z < 2.0476) = normalcdf(-1.0952, 2.0476) = 0.8430
If the answer is asked in percent, it would be 84.30 percent.

Exercise 5.2.5. The diameter of the pumpkins in my patch has normal distribution with mean 13 inches and standard deviation 4.5 inches. What proportion (i.e., probability) of pumpkins is above 22 inches?

Solution by TI-84:
Here the mean μ = 13 and standard deviation σ = 4.5
Let X = diameter of the pumpkins. Then X is N(13, 4.5) -random variable. We use TI-84 :

P(22 < X )
= P([22 - μ]/σ < [X - μ]/σ )
= P([22 - μ]/σ < Z )
= P([22 - 13]/4.5 < Z )
= P (2 < Z ) = normalcdf(2, 5)= .0227
If the answer is asked in percent, it would be 2.27 percent.

Exercise 5.2.6. The annual expenditure X of a student is approximately normally distributed with mean μ = 11,000 dollars and standard deviation σ = 1500 dollars. What percent of students spend less than 10,000 dollars?

Solution by TI-84:
Here the mean μ = 11,000 and standard deviation σ = 1500
Let X = annual expenditure of students. Then X is N(11000, 1500) -random variable. We use TI-84 :

P(X < 10000)
= P([X - μ]/σ < [10000 - μ]/σ)
= P(Z < [10000 - μ]/σ)
= P(Z < [10000 - 11000]/1500)
= P ( Z < - .6667) = normalcdf(-5, -.6667) = .2525
Since the answer is asked in percent, it would be 25.25 percent.

Exercise 5.2.7. Suppose the annual production X of milk by cows in a farm is normally distributed with μ = 5500 liters and standard deviation σ = 150 liters. What percent of cows have annual yield less than 5155 liters?

Solution by TI-84:
Here the mean μ = 5500 and standard deviation σ = 150
Let X = annual production by cows in the farm. Then X is N(5500, 150) -random variable. We use TI-84 :

P(X < 5155)
= P([X - μ]/σ < [5155 - μ]/σ)
= P(Z < [5155 - μ]/σ)
= P(Z < [5155 - 5500]/150)
= P ( Z < - 2.3) = normalcdf(-5, -2.3) = .0107
Since the answer is asked in percent, it would be 1.07 percent.

Exercise 5.2.8. The amount of vegetable oil X produced by a machine in a day is normally distributed with μ = 130 liters and standard deviation σ = 25 liters. What is the probability that a machine will produce between 120 liters and 150 liters on a day?

Solution by TI-84:
Here the mean μ = 130 and standard deviation σ = 25
Let X = vegetable oil produced by the machines. Then X is N(130, 25) -random variable. We use TI-84 :

P(120 < X < 150)
= P([120 - μ]/σ < [X - μ]/σ < [150 - μ]/σ)
= P([120 - μ]/σ < Z < [150 - μ]/σ)
= P([120 - 130]/25 < Z < [150 - 130]/25)
= P (-.4 < Z < .8) = normalcdf(-.4, .8) = 0.4436
If the answer is asked in percent, it would be 44.36 percent.

Exercise 5.2.9. The weight X at birth of babies is normally distributed with mean μ = 114 oz and standard deviation σ = 18 oz. What percent of babies will have birth weight below 141 oz?

Solution by TI-84:
Here the mean μ = 114 and standard deviation σ = 18
Let X = birth weight of babies. Then X is a N(114, 18) -random variable. We use TI-84 :

P(X < 141)
= P([X - μ]/σ < [141 - μ]/σ)
= P(Z < [141 - μ]/σ)
= P(Z < [141 - 114]/18)
= P ( Z < 1.5) = normalcdf(-5, 1.5) = .9332
Since the answer is asked in percent, it would be 93.32 percent.


5.2.1 Inverse Probability and Cut-Off Values

In certain situations, the probability is known and the variable value has to be determined. This concept is called inverse probability. Such problems are also called problems of Cut-Off Values. Following is an example of such a situation.

Example. The most common examples of inverse probability or cut-off values are percentiles. Suppose X is the birth weight of babies in a county. When you want to determine the 75th percentile of the birth weight (or of weight at any age), it means that the probability P(X < u) = .75 is given and the 75th percentile u is to be determined.

Pediatricians have charts of percentile weights. These charts have age on the horizontal axis and weight on the vertical axis. Several graphs are given in the chart, representing various percentiles (I guess, one for each 5 percent). So, there may be one graph for each of the 5th, 10th and up to the 95th or 100th percentile.

Another such example would be income percentiles of the US population.

Example: The annual income X in a county is normally distributed with mean μ = $37,000 and standard deviation σ = $15,000. Your annual income is $75,000. Do you think that your annual income is above the 90th percentile?

Solution by TI-84:
Here the mean μ = 37000 and standard deviation σ = 15000
Let X = annual income of the members of the county. Then X is N(37000, 15000) -random variable. We use TI-84 :

Let u = 90th percentile of the annual income.
P(X < u) = .90
P([X - μ]/σ < [u - μ]/σ) = .90
P(Z < [u - μ]/σ) = .90 Also
P(Z < 1.2816) = .90        [Because From TI-84, invNorm(.90) = 1.2816]

[u - μ]/σ = 1.2816 [By comparing the above two equations]
So,
u = μ + 1.2816*σ = 37000 + 1.2816*15000 = 56224.
That means that the 90th percentile of the annual income is $56224.
We conclude that your annual income of $75,000 is above the 90th percentile.


Use of Calculators (TI-84):
Inverse Probability for Standard Normal (Z):

TI-84 has an "invNorm" function.
invNorm(.90)= 1.2816 means P(Z < 1.2816) =.90.

To compute the variable value (or Z-value) u such that P(Z < u) = .75 do the following.
a) Press 2nd and then Distr (VARS) b) Scroll down to invNorm and ENTER c) Type in ".75)" and ENTER. TI will give the answer.
For problems with an unknown variable value c and a given probability (for a normal random variable X), do the following:
  1. Reduce the problem to P(X < c) = a known probability.
  2. Standardize and reduce to P(Z < U)= a known probability.
  3. Use the invNorm-function from TI-84.
  4. Compare two equations and U will be known. Now solve for c.
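The four steps can be sketched in Python. This is an illustration under stated assumptions, not the calculator's own routine: the helper names inv_norm and cutoff are made up here, and inv_norm is built on NormalDist.inv_cdf from Python's standard statistics module. The sample numbers (90th percentile, μ = 37000, σ = 15000) are those of the annual income example in this subsection.

```python
from statistics import NormalDist


def inv_norm(p):
    """z-value with P(Z < z) = p for the standard normal Z (requires 0 < p < 1)."""
    return NormalDist().inv_cdf(p)


def cutoff(p, mu, sigma):
    """Value c with P(X < c) = p for X ~ N(mu, sigma)."""
    z = inv_norm(p)          # step 3: P(Z < z) = p
    return mu + sigma * z    # step 4: solve (c - mu)/sigma = z for c


z90 = inv_norm(0.90)
c = cutoff(0.90, 37000, 15000)

print("z-value:", round(z90, 4))
print("cut-off:", round(c, 2))
```

The cut-off printed here can differ from a hand computation by a dollar or so, because the hand computation rounds the z-value to four decimal places before multiplying by σ.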

Problems on Cut-off values

Exercise 5.2.10. Let Z be the standard normal random variable.

  1. Given that P(-1.1 < Z < c)=.6881, find c.
  2. Given that P(Z < c)=0.0222, find c.
  3. Given that P(c < Z < 1.5) = 0.0919, find c.
  4. Given that P(c < Z) = 0.102, find c.

Experiment with the normal animation.

Solution by TI-84:
This is a problem in standard normal Z. Standardization would not be needed for this problem, because Z is already standard normal. We use TI-84 :

  1. P(-1.1 < Z < c)=.6881.
    P(Z < c) - P(Z < -1.1) = .6881.
    P(Z < c) - normalcdf(-5, -1.1) = .6881.
    P(Z < c) - .1357 = .6881.
    P(Z < c) = .8238.  Also,
    P(Z < .9299) = .8238.        [Because invNorm(.8238) = .9299].   By comparing these two equations
    c = .9299
  2. P(Z < c)=0.0222 Also,
    P(Z < -2.0103) = 0.0222        [Because invNorm(.0222) = -2.0103].   by comparing these two equations
    c = -2.0103
  3. P(c < Z < 1.5) = 0.0919
    P(Z < 1.5) - P(Z < c) = .0919.
    normalcdf(-5, 1.5) - P(Z < c) = .0919.
    .9332 - P(Z < c) = .0919.
    P(Z < c) = .8413.  Also,
    P(Z < .9998) = .8413.        [Because invNorm(.8413) = .9998].   By comparing these two equations
    c = .9998
  4. P(c < Z) = 0.102
    1 - P(Z < c) = 0.102
    P(Z < c) = .898.  Also,
    P(Z < 1.2702) = .898.        [Because invNorm(.898) = 1.2702].   By comparing these two equations
    c = 1.2702
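Parts 1 and 3 share one pattern: move the known end point's probability to the other side, then apply the inverse. A minimal Python sketch of that pattern follows; the helper names upper_endpoint and lower_endpoint are invented for illustration, and NormalDist from Python's standard statistics module stands in for normalcdf and invNorm.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def upper_endpoint(a, p):
    """c with P(a < Z < c) = p, from P(Z < c) = P(Z < a) + p."""
    return Z.inv_cdf(Z.cdf(a) + p)


def lower_endpoint(b, p):
    """c with P(c < Z < b) = p, from P(Z < c) = P(Z < b) - p."""
    return Z.inv_cdf(Z.cdf(b) - p)


c1 = upper_endpoint(-1.1, 0.6881)   # part 1 of the exercise
c3 = lower_endpoint(1.5, 0.0919)    # part 3 of the exercise

print("part 1: c =", round(c1, 3))
print("part 3: c =", round(c3, 3))

# Round-trip check: plugging c back in should reproduce the given probability.
print("check 1:", round(Z.cdf(c1) - Z.cdf(-1.1), 4))
print("check 3:", round(Z.cdf(1.5) - Z.cdf(c3), 4))
```

Small differences in the last decimal place from the hand solution come from rounding intermediate probabilities to four places. Note that inv_cdf raises an error if the adjusted probability falls outside the open interval (0, 1), which signals that no such c exists.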

Exercise 5.2.11. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. On a fishing trip to the lake, you are instructed to keep only those in the upper 33 percent in length. What is the cut-off length, above which you are permitted to keep a fish?

Solution by TI-84:
Here the mean μ = 67 and standard deviation σ = 21
Let X = the length of fish in the lake. Then X is a N(67, 21) -random variable. We use TI-84 :

Let L = the cut-off length.
P(L < X)= .33
1 - P(X < L) =.33
P(X < L) = .67
P([X - μ]/σ < [L - μ]/σ) = .67
P(Z < [L - μ]/σ) = .67 Also
P(Z < .4399) =.67        [Because From TI-84, invNorm(.67) = .4399]

[L - μ]/σ = .4399 [By comparing the above two equations]
So,
L = μ + .4399*σ = 67 + .4399*21= 76.2379 cm.

Exercise 5.2.12. The telephone company's data shows that length X of their international calls has normal distribution with mean 11.5 minutes and standard deviation 4.3 minutes. The company decided to give a special rate for the longest 20 percent calls. What is the cut-off time length?

Solution by TI-84:
Here the mean μ = 11.5 and standard deviation σ = 4.3
Let X = the length of the international calls in minutes. Then X is N(11.5, 4.3) -random variable. We use TI-84 :

Let L = the cut-off length above which the special rate applies.
P(L < X)= .20
1 - P(X < L) =.20
P(X < L) = .80
P([X - μ]/σ < [L - μ]/σ) = .80
P(Z < [L - μ]/σ) = .80 Also
P(Z < .8416) =.80        [Because From TI-84, invNorm(.80) = .8416]

[L - μ]/σ = .8416 [By comparing the above two equations]
So,
L = μ + .8416*σ = 11.5 + .8416*4.3= 15.1189 minutes.

Exercise 5.2.13. The weight X of babies (of a fixed age) is normally distributed with mean μ = 212 oz and standard deviation σ = 25 oz. Doctors would be concerned (not necessarily alarmed) if a baby is among the lower 5 percent in weight. Find the cut-off weight L below which the doctors will be concerned.

Solution by TI-84:
Here the mean μ = 212 and standard deviation σ = 25
Let X = the weight of the babies of this age group. Then X is N(212, 25) -random variable. We use TI-84 :

Let L = the cut-off weight below which doctors will be concerned.
P(X < L) = .05
P([X - μ]/σ < [L - μ]/σ) = .05
P(Z < [L - μ]/σ) = .05 Also
P(Z < - 1.6449) = .05        [Because From TI-84, invNorm(.05) = - 1.6449]

[L - μ]/σ = - 1.6449 [By comparing the above two equations]
So,
L = μ + (- 1.6449)*σ = 212 + (- 1.6449)*25 = 170.8775 oz.

Exercise 5.2.14. Let X be a normal random variable with mean μ = 4 and standard deviation σ = 2.5. Find the unknown values in the following equations.

  1. Suppose P(L ≤ X ≤ 6.5) = .7787. What is L?
  2. Suppose P(2.5 ≤ X ≤ U) = .6000. What is U?
  3. Suppose P(X ≤ u) = .0548. What is u?
  4. Suppose P(l ≤ X) = .7775. What is l?

Solution by TI-84:
Here the mean μ = 4 and standard deviation σ = 2.5.
We use TI-84 :

  1. P(L ≤   X   ≤ 6.5) = .7787
    P([L - μ]/σ   ≤   [X - μ]/σ   ≤   [6.5 - μ]/σ) = .7787
    P([L - μ]/σ ≤ Z   ≤   [6.5 - μ]/σ) = .7787
    P(Z ≤ [6.5 - μ]/σ) - P(Z ≤ [L - μ]/σ) = .7787
    P(Z ≤ [6.5 - 4]/2.5) - P(Z ≤ [L - μ]/σ) = .7787
    P(Z ≤ 1) - P(Z ≤ [L - μ]/σ) = .7787
    normalcdf(-5, 1) - P(Z ≤ [L - μ]/σ) = .7787
    .8413 - P(Z ≤ [L - μ]/σ) = .7787
    P(Z ≤ [L - μ]/σ) = .0626           Also
    P(Z ≤ -1.5333) = .0626         because invNorm(.0626) = -1.5333
    Therefore, comparing
    [L - μ]/σ = -1.5333
    L = μ +σ*(-1.5333) = 4 + 2.5*(-1.5333) = 0.1668
  2. P(2.5  ≤   X   ≤   U) = .6000.
    P([2.5 - μ]/σ   ≤   [X - μ]/σ   ≤   [U - μ]/σ) = .6000
    P([2.5 - μ]/σ ≤ Z   ≤   [U - μ]/σ) = .6000
    P(Z ≤ [U - μ]/σ) - P(Z ≤ [2.5 - μ]/σ) = .6000
    P(Z ≤ [U - μ]/σ) - P(Z ≤ [2.5 - 4]/2.5) = .6000
    P(Z ≤ [U - μ]/σ) - P(Z ≤ -.6) = .6000
    P(Z ≤ [U - μ]/σ) - normalcdf(-5, -.6) = .6000
    P(Z ≤ [U - μ]/σ) - .2743 = .6000
    P(Z ≤ [U - μ]/σ) = .8743           Also
    P(Z ≤ 1.1470) = .8743         because invNorm(.8743) = 1.1470

    Therefore, comparing
    [U - μ]/σ = 1.1470
    U = μ +σ*(1.1470) = 4 + 2.5*(1.1470) = 6.8675
  3. P(X   ≤   u) = .0548.
    P([X - μ]/σ   ≤   [u - μ]/σ) = .0548
    P(Z ≤ [u - μ]/σ) = .0548           Also
    P(Z ≤ -1.6000) = .0548         because invNorm(.0548) = -1.6000

    Therefore, comparing
    [u - μ]/σ = -1.6000
    u = μ +σ*(-1.6000) = 4 + 2.5*(-1.6000) = 0
  4. P(l  ≤   X) = .7775
    1 - P(X  ≤   l) = .7775
    P(X  ≤   l) =.2225
    P([X - μ]/σ   ≤   [l - μ]/σ) = .2225
    P(Z ≤ [l - μ]/σ) = .2225           Also
    P(Z ≤ -.7638) = .2225         because invNorm(.2225) = -.7638

    Therefore, comparing
    [l - μ]/σ = -.7638
    l = μ +σ*(-.7638) = 4 + 2.5*(-.7638) = 2.0905

Exercise 5.2.15. Monthly water consumption X per household, in a subdivision in Kansas City, has normal distribution with mean 15000 gallons and standard deviation 3000 gallons. It has been decided that a surcharge will be imposed for those in the top 25 percent. Find the cut-off consumption U in gallons.

Solution:
Here the mean μ = 15000 and standard deviation σ = 3000
U = cut-off consumption
P(U ≤ X) = .25
1 - P(X ≤ U) = .25
P(X ≤ U) = .75
P([X - μ]/σ ≤ [U - μ]/σ) = .75
P(Z ≤ [U - μ]/σ) = .75           Also
P(Z ≤ .6745) = .75         because invNorm(.75) = .6745

Therefore, comparing
[U - μ]/σ = .6745
U = μ + σ*(.6745) = 15000 + 3000*(.6745) = 17023.5 gallons


5.3 Normal Approximation to Binomial

Like many random variables in nature, Binomial(n,p)-random variables also behave approximately like Normal random variables. A reexamination of the graph of the probability function of the Binomial(n,p)-random variables demonstrates that the graph has properties similar to that of Normal:

  1. The graph maintains a bell-shape.
  2. The graph also peaks at (or near) the mean μ = np.
Animation 5.3.1

The difficulty of dealing with Binomial random variables with a large number of trials n was demonstrated by Exercise 4.3.10, at the end of Lesson 4. To approximate Binomial probability, the Normal probability distribution will be used, as in the next theorem.


Theorem. Suppose X is a B(n,p) random variable. Assume n is large and p is not very close to 0 or 1. Then X behaves, approximately, like a N(μ, σ) random variable, with mean

μ = np and standard deviation σ = √(np(1-p)).

For r=0,1,…,n   probability is approximated as follows:

P(X = r) = P(r-0.5 < X < r + .5) ≈ P(L < Z < R)

where L=(r-0.5-μ)/σ and R=(r+0.5-μ)/σ. ( As before, Z denotes the standard normal random variable. )

More generally, for r,s=0,1,…,n

P(r ≤ X ≤ s) = P(r-0.5 < X < s + .5) ≈ P(L < Z < R)

where L=(r-0.5-μ)/σ and R=(s+0.5-μ)/σ.


Remark. This adjustment by .5 on two sides is called continuity correction. Recall P(Y=r) =0 for any continuous random variable Y. Because of this, we could not have treated, naively, X as Normal, without continuity "correction". On the other hand, when n is very large, it would not make any significant difference.
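The effect of the continuity correction can be seen by comparing the approximation with the exact binomial probability. The sketch below is illustrative only: the function names binom_prob and normal_approx and the sample values n = 100, p = 0.3, r = 25, s = 35 are assumptions made here, not taken from the exercises. It uses σ = √(np(1-p)), with math.comb and NormalDist from Python's standard library.

```python
import math
from statistics import NormalDist


def binom_prob(n, p, r, s):
    """Exact P(r <= X <= s) for X ~ Binomial(n, p), by summing the probability function."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, s + 1))


def normal_approx(n, p, r, s, correction=True):
    """Normal approximation of P(r <= X <= s), with or without the .5 adjustment."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    shift = 0.5 if correction else 0.0
    Z = NormalDist()
    left = (r - shift - mu) / sigma
    right = (s + shift - mu) / sigma
    return Z.cdf(right) - Z.cdf(left)


n, p, r, s = 100, 0.3, 25, 35  # illustrative values

exact = binom_prob(n, p, r, s)
with_correction = normal_approx(n, p, r, s, correction=True)
without_correction = normal_approx(n, p, r, s, correction=False)

print("exact binomial        :", round(exact, 4))
print("normal, with .5 shift :", round(with_correction, 4))
print("normal, no .5 shift   :", round(without_correction, 4))
```

For moderate n such as this one, the corrected approximation should land noticeably closer to the exact value; as n grows, the gap between the two approximations shrinks.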


Problems on 5.3: Normal Approximation to Binomial

Exercise 5.3.1. A Lawrence bank knows that 35 percent of its customers will visit the drive-through window. If 400 customers visit the bank, what is the approximate probability that more than 120 will visit the drive-through window?

Solution:
Here p=.35 and n=400.
First step is to compute the mean μ and the standard deviation σ :
μ = np = 400*.35 = 140 and σ = √(np(1-p)) = √(400*.35*(1-.35)) = 9.5394
Let X = number of customers who will visit the drive-through window. Then X is B(400, .35) random variable and we will use N(140, 9.5394) to approximate.

Now "X is more than 120" means "X > 120", which is "X ≥121".
P(121 ≤ X) = P(120.5 ≤ X)         [This is continuity correction.]
= P([120.5 - μ]/σ < [X - μ]/σ )
≈ P([120.5 - μ]/σ < Z )
= P([120.5 - 140]/9.5394 < Z )
= P (-2.0442 < Z ) = normalcdf(-2.0442, 5)= .9795

Exercise 5.3.2. It is known that the probability that a household owns a food processor is 0.1. If 190 households are interviewed, find the approximate probability that

  1. more than 26 households own a food processor;
  2. less than 30 households own a food processor.

Solution:
Here p=.1 and n=190.
First step is to compute the mean μ and the standard deviation σ :
μ = np = 190*.1 = 19 and σ = √(np(1-p)) = √(190*.1*(1-.1)) = 4.1352
Let X = number of households who own a food processor. Then X is B(190, .1) random variable and we will use N(19, 4.1352) to approximate.

  1. Now "X is more than 26" means "X > 26", which is "X ≥27".
    P(27 ≤ X) = P(26.5 ≤ X)         [This is continuity correction.]
    = P([26.5 - μ]/σ < [X - μ]/σ )
    ≈ P([26.5 - μ]/σ < Z )
    = P([26.5 - 19]/4.1352 < Z )
    = P (1.8137 < Z ) = normalcdf(1.8137, 5) = .0349
  2. Again, "X is less than 30" means "X < 30", which is "X ≤ 29".
    P(X ≤ 29)= P(X ≤ 29.5)        [This is continuity correction.]
    = P([X - μ]/σ < [29.5 - μ]/σ)
    ≈ P(Z < [29.5 - μ]/σ)
    = P(Z < [29.5 -19 ]/4.1352)
    = P ( Z < 2.5392) = normalcdf(-5, 2.5392) = .9944

Exercise 5.3.3. The campaign committee of a candidate claims that sixty percent of the voters are in favor of the candidate. You interview 150 voters. Assuming that the campaign committee's claim is accurate, what is the approximate probability that less than 77 will favor the candidate?

Solution:
Here p=.6 and n=150.
First step is to compute the mean μ and the standard deviation σ :
μ = np = 150*.6 = 90 and σ = √(np(1-p)) = √(150*.6*(1-.6)) = 6
Let X = number of voters in favor of the candidate. Then X is B(150, .6 ) random variable and we will use N(90, 6) to approximate.
Now, "X is less than 77" means "X < 77", which is "X ≤ 76".
P(X ≤ 76)= P(X ≤ 76.5)        [This is continuity correction.]
= P([X - μ]/σ < [76.5 - μ]/σ)
≈ P(Z < [76.5 - μ]/σ)
= P(Z < [76.5 -90 ]/6)
= P ( Z < -2.25) = normalcdf(-5, - 2.25) = .0122

Exercise 5.3.4. A technique is used to fertilize eggs in a fertility clinic laboratory. It is known that the probability that an egg will be fertilized by this technique is 0.1. If 500 eggs are treated, what is the probability that at least 60 eggs will be fertilized?

Solution:
Here p=.1 and n=500.
First step is to compute the mean μ and the standard deviation σ :
μ = np = 500*.1 = 50 and σ = √(np(1-p)) = √(500*.1*(1-.1)) = 6.7082
Let X = number of eggs that will be fertilized. Then X is a B(500, .1) random variable and we will use N(50, 6.7082) to approximate.

Now "X is at least 60" means "X ≥ 60".
P(60 ≤ X) = P(59.5 ≤ X)         [This is continuity correction.]
= P([59.5 - μ]/σ < [X - μ]/σ )
≈ P([59.5 - μ]/σ < Z )
= P([59.5 - 50]/6.7082 < Z )
= P (1.4162 < Z ) = normalcdf(1.4162, 5) = .0784

Exercise 5.3.5. The probability that a computer chip produced in a factory is defective is .2. If you have a sample of 60 chips, what is the probability that the number of defective chips will be less than 20?

Solution:
Here p=.2 and n=60.
First step is to compute the mean μ and the standard deviation σ :
μ = np = 60*.2 = 12 and σ = √(np(1-p)) = √(60*.2*(1-.2)) = 3.0984
Let X = number of defective chips. Then X is a B(60, .2) random variable and we will use N(12, 3.0984) to approximate.
Now, "X is less than 20" means "X < 20", which is "X ≤ 19".
P(X ≤ 19)= P(X ≤ 19.5)        [This is continuity correction.]
= P([X - μ]/σ < [19.5 - μ]/σ)
≈ P(Z < [19.5 - μ]/σ)
= P(Z < [19.5 -12 ]/3.0984)
= P ( Z < 2.4206) = normalcdf(-5, 2.4206) = .9923

Exercise 5.3.6. The probability that a light bulb produced by a machine is defective is p = 0.2. Suppose a quality control inspector takes a sample of 120 bulbs. What is the probability that more than 30 bulbs will be defective?

Solution:
Here p=.2 and n=120.
First step is to compute the mean μ and the standard deviation σ :
μ = np = 120*.2 = 24 and σ = √(np(1-p)) = √(120*.2*(1-.2)) = 4.3818
Let X = number of defective bulbs. Then X is B(120, .2) random variable and we will use N(24, 4.3818) to approximate.

Now "X is more than 30" means "X > 30", which is "X ≥ 31".
P(31 ≤ X) = P(30.5 ≤ X)         [This is continuity correction.]
= P([30.5 - μ]/σ < [X - μ]/σ )
≈ P([30.5 - μ]/σ < Z )
= P([30.5 - 24]/4.3818 < Z )
= P (1.4834 < Z ) = normalcdf(1.4834, 5) = .0690

Exercise 5.3.7. Suppose the probability that a student has access to the Internet is p = 0.8. Suppose you interview 160 students. What is the probability that less than 120 students will have access to the Internet?

Exercise 5.3.8. Suppose that the probability that a person favors medical use of marijuana is p = 0.6. If 780 individuals are interviewed, what is the probability that less than 450 will be in favor?

Exercise 5.3.9. Suppose that the probability that a middle-income family invests in the stock market is p = 0.8. If we interview 880 middle-income families, what is the probability that more than 700 have invested in the stock market?

Exercise 5.3.10. Suppose that an insurance company knows from experience that the probability that a life-insurance policyholder will survive another 10 years is p = 0.9. The company has 2280 policyholders. What is the probability that more than 2025 will survive another 10 years?
