Math 365, Elementary Statistics
Chapter 5 : Continuous Random Variables
Satya Mandal
5.1 Probability Density Function (pdf)
Animation 5.1.1
Four diagrams of the graph of the pdf of four different random variables and probability computations by area.
Important Comment: The total area under the graph of any pdf and above the x-axis must be equal to one.
Remark. Given a continuous random variable X, to get a model for the pdf f(x), we look at large samples and their relative frequency histogram of the X-values.
Suppose X is a continuous random variable. Then, in contrast to a discrete random variable, for any real number a, the probability P(X = a) = 0.
We see this as follows:
P(X = a) = P(a ≤ X ≤ a) = (area of the segment of the vertical line x = a, from the x-axis up to the pdf) = 0.
The mean μ, variance σ² and standard deviation σ of a continuous random variable X represent the same things as in the case of discrete random variables. The mean μ of X represents the average value of X. The mean μ is also called the Expected value E(X) of X. The standard deviation σ of X is a measure of variability of X.
A formal definition involves some knowledge of Calculus (integration), which is not a prerequisite for this course. For this course, a formal definition will not be necessary.
This subsection is devoted to showcasing the graphs of the pdfs of some well-known continuous random variables, and probability computations, using some animations.
Example.
The Normal random variable
will be discussed most extensively.
We will postpone any further discussion on it until the next section.
For now, you can play with the following animation on the pdf and probability.
You will notice that two unknown parameters, the mean μ and the standard deviation σ, determine the pdf of a normal random variable.
A goal of this course is to estimate these parameters.
Animation 5.1.2
Example.
The T-random variable
will be used later in Lessons 7, 8, 9.
We will postpone any further discussion on it until Lesson 7.
For now, you can play with the following animation on the pdf and probability.
You will notice that T-random variables are determined by a parameter called the degrees of freedom.
The degrees of freedom would be known.
Animation 5.1.3
Example.
The Chi Square (χ2 ) random variable
will be used later in Lessons 7, 8, 9.
Discussion is postponed until Lesson 7.
For now, you can play with the following animation on the pdf and probability.
You will notice that Chi Square random variables are determined by a parameter called the degrees of freedom.
The degrees of freedom would be known.
Animation 5.1.4
Example.
The Uniform random variable
will not be used in this course.
But it is easy and interesting. One can say that it is the continuous counterpart of the Equally Likely Probability. For two numbers L < U, suppose the pdf of X is given by:
f(x) = 1/(U-L) if L ≤ x ≤ U
f(x) = 0 otherwise
In this case, we say that X is a Uniform(L,U)-random variable.
In particular, X is a Uniform(0,1)-random variable, if the pdf of X is given by:
f(x) = 1 if 0 ≤ x ≤ 1
f(x) = 0 otherwise
Similarly, Y is a Uniform(-1,3)-random variable, if the pdf of Y is given by:
g(x) = 1/4 if -1 ≤ x ≤ 3
g(x) = 0 otherwise
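The "probability is area" idea is easiest to see for a uniform pdf, because the region under the graph is a rectangle. The following is a minimal Python sketch, not part of the course materials; the function names uniform_pdf and uniform_prob are made up for this illustration. It uses the density 1/4 on -1 ≤ x ≤ 3 from above.

```python
def uniform_pdf(x, low, high):
    """pdf of a Uniform(low, high) random variable: 1/(high - low) on [low, high], else 0."""
    if low <= x <= high:
        return 1.0 / (high - low)
    return 0.0


def uniform_prob(a, b, low, high):
    """P(a <= X <= b), computed as the area of a rectangle under the pdf."""
    left = max(a, low)    # clip the interval to where the pdf is nonzero
    right = min(b, high)
    if right <= left:
        return 0.0
    return (right - left) / (high - low)


print(uniform_pdf(0.5, -1, 3))     # height of the pdf: 1/4
print(uniform_prob(0, 2, -1, 3))   # rectangle of width 2 and height 1/4
print(uniform_prob(-1, 3, -1, 3))  # total area under the pdf
```

The last line illustrates the Important Comment at the start of this section: the total area under the pdf is one.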
Animation 5.1.5
Example.
The Exponential random variable
will not be used in this course.
You can play with the following animation on the pdf and probability.
You will notice that one unknown parameter, lambda (λ), determines the pdf.
A goal of some other course would be to estimate this parameter.
Animation 5.1.6
Knowledge of calculus, in particular of integration, is necessary to formally define various concepts for continuous random variables. Since Calculus is not a prerequisite for this course, let me comment that integration takes the place of summation in the case of continuous random variables. The summation sign was used to define the mean and variance of discrete random variables. If you replace the summation sign ∑ in the definitions for discrete random variables by the integration sign ∫, you get the corresponding definitions for continuous random variables.
You can ignore the following, if you did not take calculus.
Suppose X is a continuous random variable and f(x) is the pdf of X. Then,
P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a < X < b) = ∫_a^b f(x) dx
The mean is defined as
μ = E(X) = ∫_{-∞}^{∞} x f(x) dx
and the variance of X is defined as
σ² = Variance(X) = ∫_{-∞}^{∞} (x - μ)² f(x) dx.
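For readers who took calculus, the integrals above can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a simple midpoint rule (the helper integrate is a made-up name, not a library function) and a pdf equal to 1/4 for -1 ≤ x ≤ 3 and 0 otherwise. Because that pdf vanishes outside the interval, integrating from -1 to 3 is the same as integrating from -∞ to ∞.

```python
def integrate(g, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width


def f(x):
    """Illustrative pdf: 1/4 on [-1, 3], 0 elsewhere."""
    return 0.25 if -1 <= x <= 3 else 0.0


total_area = integrate(f, -1, 3)                              # should be close to 1
mean = integrate(lambda x: x * f(x), -1, 3)                   # mu = E(X)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), -1, 3)  # sigma squared

print(total_area, mean, variance)
```

For this pdf the exact values are mean 1 and variance 4/3, so the printed numbers should agree with those up to a small numerical error.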
The most commonly encountered random variable in nature and real life is the normal random variable.
The graph y = f(x) of the probability density function (pdf) f(x) of a normal random variable X has a perfect bell shape, symmetric around the vertical line x = μ through the mean μ.
The pdf f(x) is given by the formula:
f(x) = 1/[σ√(2π)] exp[-(x - μ)²/(2σ²)] for -∞ < x < ∞
where μ is the mean of X and σ is the standard deviation of X (which can be proved using the integration formulas given in the last section).
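The formula can be evaluated directly. The following Python sketch is an illustration only (normal_pdf is a made-up helper name). It computes f(x), checks the symmetry around x = μ, and approximates the total area with a midpoint rule over μ ± 8σ, an interval chosen here on the assumption that the area outside it is negligible.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal pdf: 1/[sigma*sqrt(2*pi)] * exp(-(x - mu)^2 / (2*sigma^2))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


mu, sigma = 3.0, 1.5  # illustrative parameter values

# Symmetry: equal heights at equal distances on either side of the mean.
left_height = normal_pdf(mu - 2.0, mu, sigma)
right_height = normal_pdf(mu + 2.0, mu, sigma)

# Total area under the bell curve, approximated by a midpoint rule.
steps = 10_000
lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
width = (hi - lo) / steps
area = sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

print(left_height, right_height, area)
```

The two heights agree, and the area comes out very close to one, as it must for any pdf.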
The following demonstrates the graph of the pdf y=f(x):
Animation 5.2.1
Definition. A normal random variable Z with mean μ = 0 and standard deviation σ = 1 (i.e. a N(0,1)-random variable) is called a Standard Normal Random Variable. The notation Z is used fairly consistently to denote a standard normal random variable. Following are some comments about the importance, properties and usage of N(0, 1) random variables.
Animation 5.2.2
Theorem (Standardization). Let X be a N(μ, σ)-random variable. Then Z = (X - μ)/σ is a standard normal random variable. So,
P(a < X < b) = P((a - μ)/σ < Z < (b - μ)/σ),
that is,
P(a < X < b) = P(A < Z < B)
where A = (a - μ)/σ and B = (b - μ)/σ.
Now use the TI-84 "Distr" menu; the answer is normalcdf(A,B).
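If a TI-84 is not at hand, the same computation can be sketched in Python. The functions phi and normalcdf below are hypothetical helpers written for this illustration (they are not the calculator's code); phi uses the standard identity P(Z < z) = [1 + erf(z/√2)]/2, and normalcdf standardizes the end points exactly as in the theorem.

```python
import math


def phi(z):
    """P(Z < z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """P(lower < X < upper) for X ~ N(mu, sigma), by standardization."""
    A = (lower - mu) / sigma  # standardized lower end point
    B = (upper - mu) / sigma  # standardized upper end point
    return phi(B) - phi(A)


# A N(67, 21) random variable between 44 and 110:
print(round(normalcdf(44, 110, 67, 21), 4))
# The same probability after standardizing by hand:
print(round(normalcdf(-1.0952, 2.0476), 4))
```

Both calls describe the same area under the bell curve, so they agree up to rounding of the standardized end points.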
Inverse probability will be discussed subsequently.
Ubiquity of Normal Random Variables: Any random variable that we encounter in nature is, almost certainly, either normal or approximately normal. If there is one concept that you take from this course it is this: nature's random variables are normal or approximately normal. You will hear about normal random variables and the bell curve in your workplace or anywhere you may have to use statistics.
Problem Solving: There are two steps involved in solving problems in this Lesson:
Use of Calculators (TI-84):
Computing Standard Normal (Z) Probability:
Problems on 5.2: the Normal Random Variable
Exercise 5.2.1. Let Z be the standard normal random variable.
Solution by TI-84:
This is already standard normal. Therefore, no standardization is needed.
We use TI-84 :
Experiment with the normal animation.
Exercise 5.2.2. Let X be a normal random variable with mean μ = 3 and standard deviation σ = 1.5 .
Solution by TI-84:
Here the mean μ = 3 and standard deviation σ = 1.5
X is N(3, 1.5) -random variable. We use TI-84 :
Experiment with the normal animation.
Exercise 5.2.3. The length of life of some light bulbs produced in a factory is normally distributed with mean 8640 hours and standard deviation 1440 hours. Find the probability that a bulb will last
Solution by TI-84:
Here the mean μ = 8640 and standard deviation σ = 1440.
Let X = length of life of light bulbs produced in the factory.
Then X is N(8640, 1440) -random variable. We use TI-84 :
Exercise 5.2.4. The length X of a fish in
a lake has normal distribution with mean 67 cm and standard deviation
21 cm. What proportion (i.e, probability) of fish are between 44 cm
and 110 cm long?
Solution by TI-84:
Here the mean μ = 67 and standard deviation σ = 21.
Let X = length of fish in the lake.
Then X is N(67, 21) -random variable. We use TI-84 :
P(44 < X < 110)
= P([44 - μ]/σ < [X - μ]/σ < [110 - μ]/σ)
= P([44 - μ]/σ < Z < [110 - μ]/σ)
= P([44 - 67]/21 < Z < [110 - 67]/21)
= P (-1.0952 < Z < 2.0476) = normalcdf(-1.0952, 2.0476) = 0.8430
If the answer is asked in percent, it would be 84.30 percent.
Exercise 5.2.5. The diameter of the pumpkins
in my patch has normal distribution with mean 13 inches and standard
deviation 4.5 inches. What proportion (i.e., probability) of pumpkins
is above 22 inches?
Solution by TI-84:
Here the mean μ = 13 and standard deviation σ = 4.5
Let X = diameter of the pumpkins.
Then X is N(13, 4.5) -random variable. We use TI-84 :
P(22 < X )
= P([22 - μ]/σ < [X - μ]/σ )
= P([22 - μ]/σ < Z )
= P([22 - 13]/4.5 < Z )
= P (2 < Z ) = normalcdf(2, 5)= .0227
If the answer is asked in percent, it would be 2.27 percent.
Exercise 5.2.6. The annual expenditure X of a student is approximately normally distributed with mean μ = 11,000 dollars and standard deviation σ = 1500 dollars. What percent of students spend less than 10,000 dollars?
Solution by TI-84:
Here the mean μ = 11,000 and standard deviation σ = 1500
Let X = annual expenditure of students.
Then X is N(11000, 1500) -random variable. We use TI-84 :
P(X < 10000)
= P([X - μ]/σ < [10000 - μ]/σ)
= P(Z < [10000 - μ]/σ)
= P(Z < [10000 - 11000]/1500)
= P ( Z < - .6667) = normalcdf(-5, -.6667) = .2525
Since the answer is asked in percent, it would be 25.25 percent.
Exercise 5.2.7. Suppose the annual production X of milk by cows in a farm is normally distributed with μ = 5500 liters and standard deviation σ = 150 liters. What percent of cows have annual yield less than 5155 liters?
Solution by TI-84:
Here the mean μ = 5500 and standard deviation σ = 150
Let X = annual production by cows in the farm.
Then X is N(5500, 150) -random variable. We use TI-84 :
P(X < 5155)
= P([X - μ]/σ < [5155 - μ]/σ)
= P(Z < [5155 - μ]/σ)
= P(Z < [5155 - 5500]/150)
= P ( Z < - 2.3) = normalcdf(-5, -2.3) = .0107
Since the answer is asked in percent, it would be 1.07 percent.
Exercise 5.2.8. The amount of vegetable oil
X produced by a machine in a day is normally distributed with μ
= 130 liters and standard deviation σ
= 25 liters. What is the probability that a machine will produce between
120 liters and 150 liters on a day?
Solution by TI-84:
Here the mean μ = 130 and standard deviation σ = 25
Let X = vegetable oil produced by the machines.
Then X is N(130, 25) -random variable. We use TI-84 :
P(120 < X < 150)
= P([120 - μ]/σ < [X - μ]/σ < [150 - μ]/σ)
= P([120 - μ]/σ < Z < [150 - μ]/σ)
= P([120 - 130]/25 < Z < [150 - 130]/25)
= P (-.4 < Z < .8) = 0.4436
If the answer is asked in percent, it would be 44.36 percent.
Exercise 5.2.9. The weight X at birth of
babies is normally distributed with mean μ
= 114 oz and standard deviation σ =
18 oz. What percent of babies will have birth weight below 141 oz?
Solution by TI-84:
Here the mean μ = 114 and standard deviation σ = 18
Let X = weight at birth of babies.
Then X is N(114, 18) -random variable. We use TI-84 :
P(X < 141)
= P([X - μ]/σ < [141 - μ]/σ)
= P(Z < [141 - μ]/σ)
= P(Z < [141 - 114]/18)
= P ( Z < 1.5) = normalcdf(-5, 1.5) = .9332
Since the answer is asked in percent, it would be 93.32 percent.
In certain situations, the probability is known and the variable value has to be determined. This concept is called inverse probability. These are also called problems of Cut-Off Values. Following is an example of such a situation.
Example. The most common examples of inverse probability or cut-off values are percentiles. Suppose X is the birth weight of babies in a county. When you want to determine the 75th percentile of the birth weight (or of weight at any age), it means that the probability P(X < u) = .75 is given and the 75th percentile u is to be determined.
Pediatricians have charts of percentile weights. These charts have age on the horizontal axis and weight on the vertical axis. Several graphs are given in the chart, representing various percentiles (I guess, one for each 5 percent). So, there may be one graph for each of 5th, 10th and up to 95th or 100th percentile.
Another such example would be income percentiles of the US population.
Example: The annual income X in a county is normally distributed with mean μ = $37,000 and standard deviation σ = $15,000. Your annual income is $75,000. Do you think that your annual income is above the 90th percentile?
Solution by TI-84:
Here the mean μ = 37000 and standard deviation σ = 15000
Let X = annual income of the members of the county.
Then X is N(37000, 15000) -random variable. We use TI-84 :
Let u = 90th percentile of the annual income.
P(X < u)= .90
P([X - μ]/σ < [u - μ]/σ) = .90
P(Z < [u - μ]/σ) = .90 Also
P(Z < 1.2816) =.90
[Because From TI-84, invNorm(.90) = 1.2816]
[u - μ]/σ = 1.2816 [By comparing the above two equations]
So,
u = μ + 1.2816*σ = 37000+1.2816*15000= 56224.
That means that the 90th percentile of the annual income is $56224.
We conclude that your annual income of $75,000 is above the 90th percentile.
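The percentile computation can be reproduced in Python with the standard library's statistics.NormalDist class, whose inv_cdf method plays the role of invNorm here. This is a sketch for checking the arithmetic, not the course's prescribed method. The result differs from $56224 by less than a dollar only because the solution above rounds the Z-value to 1.2816.

```python
from statistics import NormalDist

mu, sigma = 37000, 15000

# Z-value with 90 percent of the standard normal area to its left.
z90 = NormalDist().inv_cdf(0.90)

# 90th percentile of income, by "un-standardizing": u = mu + z * sigma.
u = mu + z90 * sigma

# Fraction of incomes below $75,000, i.e. P(X < 75000).
rank = NormalDist(mu, sigma).cdf(75000)

print(round(z90, 4), round(u, 2), round(rank, 4))
```

Since rank is larger than .90, the income of $75,000 lies above the 90th percentile, in agreement with the conclusion above.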
Use of Calculators (TI-84):
Inverse Probability for Standard Normal (Z):
TI-84 has an "invNorm" function. invNorm(.90) = 1.2816 means P(Z < 1.2816) = .90. To compute the variable value (or Z-value) u such that P(Z < u) = .75, do the following:
a) Press 2nd and then Distr (VARS).
b) Scroll down to invNorm and press ENTER.
c) Type in ".75)" and press ENTER. TI will give the answer.
For problems with an unknown variable value c and a given probability (for a normal random variable X), do the following:
Problems on Cut-off values
Exercise 5.2.10. Let Z be the standard normal random variable.
Experiment with the normal animation.
Solution by TI-84:
This is a problem in standard normal Z.
Standardization would not be needed for this problem,
because Z is already standard normal. We use TI-84 :
Exercise 5.2.11. The length X of a fish in
a lake has normal distribution with mean 67 cm and standard deviation
21 cm. On a fishing trip to the lake, you are instructed to keep only
those in the upper 33 percent in length. What is the cut-off length,
above which you are permitted to keep?
Solution by TI-84:
Here the mean μ = 67 and standard deviation σ = 21
Let X = the length of a fish in the lake.
Then X is N(67, 21) -random variable. We use TI-84 :
Let L = the cut-off length.
P(L < X)= .33
1 - P(X < L) =.33
P(X < L) = .67
P([X - μ]/σ < [L - μ]/σ) = .67
P(Z < [L - μ]/σ) = .67 Also
P(Z < .4399) =.67
[Because From TI-84, invNorm(.67) = .4399]
[L - μ]/σ = .4399 [By comparing the above two equations]
So,
L = μ + .4399*σ = 67 + .4399*21= 76.2379 cm.
Exercise 5.2.12. The telephone company's
data shows that length X of their international calls has normal distribution
with mean 11.5 minutes and standard deviation 4.3 minutes. The company
decided to give a special rate for the longest 20 percent calls. What
is the cut-off time length?
Solution by TI-84:
Here the mean μ = 11.5 and standard deviation σ = 4.3
Let X = the length of the international calls in minutes.
Then X is N(11.5, 4.3) -random variable. We use TI-84 :
Let L = the cut-off length above which the special rate applies.
P(L < X)= .20
1 - P(X < L) =.20
P(X < L) = .80
P([X - μ]/σ < [L - μ]/σ) = .80
P(Z < [L - μ]/σ) = .80 Also
P(Z < .8416) =.80
[Because From TI-84, invNorm(.80) = .8416]
[L - μ]/σ = .8416 [By comparing the above two equations]
So,
L = μ + .8416*σ = 11.5 + .8416*4.3= 15.1189 minutes.
Exercise 5.2.13. The weight X of babies (of a fixed age) is normally distributed with mean μ = 212 oz and standard deviation σ = 25 oz. Doctors would be concerned (not necessarily alarmed) if a baby is among the lower 5 percent in weight. Find the cut-off weight U below which the doctors will be concerned.
Solution by TI-84:
Here the mean μ = 212 and standard deviation σ = 25
Let X = the weight of the babies of this age group.
Then X is N(212, 25) -random variable. We use TI-84 :
Let U = the cut-off weight below which doctors will be concerned.
P(X < U)= .05
P([X - μ]/σ < [U - μ]/σ ) = .05
P( Z < [U- μ]/σ) = .05 Also
P(Z < - 1.6449) =.05
[Because From TI-84, invNorm(.05) = - 1.6449]
[U - μ]/σ = - 1.6449 [By comparing the above two equations]
So,
U = μ + (- 1.6449)*σ = 212 + (- 1.6449)*25= 170.8775 oz.
Exercise 5.2.14. Let X be a normal random variable with mean μ = 4 and standard deviation σ = 2.5. Find the values of c from the following equations.
Solution by TI-84:
Here the mean μ = 4 and standard deviation σ = 2.5.
We use TI-84 :
Exercise 5.2.15. Monthly water consumption
X per household, in a subdivision in Kansas City, has normal distribution
with mean 15000 gallons and standard deviation 3000 gallons. It has
been decided that a surcharge will be imposed for those in the top 25
percent. Find the cut-off consumption U in gallons.
Solution:
Here the mean μ = 15000 and standard deviation σ = 3000
Let U = the cut-off consumption.
P(U ≤ X) = .25
1 - P(X ≤ U) = .25
P(X ≤ U) = .75
P([X - μ]/σ ≤ [U - μ]/σ) = .75
P(Z ≤ [U - μ]/σ) = .75
Also
P(Z ≤ .6745) = .75
because invNorm(.75) = .6745
Therefore, comparing the above two equations,
[U - μ]/σ = .6745
So,
U = μ + σ*(.6745) = 15000 + 3000*(.6745) = 17023.5 gallons.
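All the cut-off problems above follow one pattern: turn the statement into a "less than" probability, then invert. The Python sketch below captures that pattern. The helper names upper_cutoff and lower_cutoff are made up for this illustration, and statistics.NormalDist.inv_cdf stands in for invNorm. Small differences in the last decimal places from the hand solutions come from rounding the Z-value to four places there.

```python
from statistics import NormalDist


def lower_cutoff(bottom_fraction, mu, sigma):
    """Value c with P(X < c) = bottom_fraction, for X ~ N(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(bottom_fraction)


def upper_cutoff(top_fraction, mu, sigma):
    """Value c with P(c < X) = top_fraction, i.e. P(X < c) = 1 - top_fraction."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - top_fraction)


print(upper_cutoff(0.33, 67, 21))       # fish length, upper 33 percent
print(upper_cutoff(0.20, 11.5, 4.3))    # call length, longest 20 percent
print(lower_cutoff(0.05, 212, 25))      # baby weight, lower 5 percent
print(upper_cutoff(0.25, 15000, 3000))  # water consumption, top 25 percent
```

Note that for an "upper" or "top" percentage the probability handed to the inverse function is one minus that percentage, which is the step 1 - P(X < L) in the solutions.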
Like many random variables in nature, Binomial(n,p)-random variables also behave approximately like Normal random variables. A reexamination of the graph of the probability function of the Binomial(n,p)-random variables demonstrates that the graph has properties similar to that of Normal:
Animation 5.3.1
The difficulty of dealing with Binomial random variables with a large number of trials n was demonstrated by Exercise 4.3.10, at the end of Lesson 4. To approximate Binomial probability, the Normal probability distribution will be used as in the next theorem.
Theorem. Let X be a Binomial(n,p)-random variable. Then X has mean μ = np and standard deviation σ = √(np(1-p)).
For r = 0,1,…,n the probability is approximated as follows:
P(X = r) = P(r - 0.5 < X < r + 0.5) ≈ P(L < Z < R)
where L = (r - 0.5 - μ)/σ and R = (r + 0.5 - μ)/σ.
(As before, Z denotes the standard normal random variable.)
More generally, for r, s = 0,1,…,n with r ≤ s,
P(r ≤ X ≤ s) = P(r - 0.5 < X < s + 0.5) ≈ P(L < Z < R)
where L = (r - 0.5 - μ)/σ and R = (s + 0.5 - μ)/σ.
Remark. This adjustment by .5 on two sides is called continuity correction. Recall P(Y=r) =0 for any continuous random variable Y. Because of this, we could not have treated, naively, X as Normal, without continuity "correction". On the other hand, when n is very large, it would not make any significant difference.
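To see what the continuity correction buys, one can compare the exact binomial probability with the normal approximation, with and without the adjustment by .5. The sketch below is an illustration only: the function names are made up, and the numbers n = 20, p = 0.5, r = 8, s = 12 are chosen here for the demonstration and are not taken from the exercises.

```python
import math


def phi(z):
    """P(Z < z) for a standard normal Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_exact(n, p, r, s):
    """Exact P(r <= X <= s) for X ~ Binomial(n, p), by summing the probability function."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, s + 1))


def binom_normal_approx(n, p, r, s, correction=0.5):
    """Normal approximation to P(r <= X <= s); correction=0.5 is the continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    L = (r - correction - mu) / sigma
    R = (s + correction - mu) / sigma
    return phi(R) - phi(L)


n, p, r, s = 20, 0.5, 8, 12  # illustrative values, not from the text
exact = binom_exact(n, p, r, s)
corrected = binom_normal_approx(n, p, r, s)
naive = binom_normal_approx(n, p, r, s, correction=0.0)

print(round(exact, 4), round(corrected, 4), round(naive, 4))
```

For this small n the corrected approximation is close to the exact value, while the uncorrected one is noticeably too small. The exact sum is practical only for moderate n; for very large n the powers and binomial coefficients become unwieldy, which is the difficulty the approximation is meant to avoid.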
Problems on 5.3: Normal Approximation to Binomial
Exercise 5.3.1. A Lawrence bank knows that
35 percent of its customers will visit the drive-through window. If
400 customers visit the bank, what is the approximate probability that
more than 120 will visit the drive-through window?
Solution:
Here p=.35 and n=400.
First step is to compute the mean μ and the standard deviation σ :
μ = np= 400*.35 = 140 and
σ = √(np(1-p)) = √(400*.35*(1-.35)) = 9.5394
Let X = number of customers who will visit the drive-through window.
Then X is B(400, .35) random variable and we will use N(140, 9.5394) to approximate.
Now "X is more than 120" means "X > 120", which is "X ≥121".
P(121 ≤ X) = P(120.5 ≤ X)
[This is continuity correction.]
= P([120.5 - μ]/σ < [X - μ]/σ )
≈ P([120.5 - μ]/σ < Z )
= P([120.5 - 140]/9.5394 < Z )
= P (-2.0442 < Z ) = normalcdf(-2.0442, 5)= .9795
Exercise 5.3.2. It is known that the probability that a household owns a food processor is 0.1. If 190 households are interviewed, find the approximate probability that
Solution:
Here p=.1 and n=190.
First step is to compute the mean μ and the standard deviation σ :
μ = np= 190*.1=19 and
σ = √(np(1-p)) = √(190*.1*(1-.1)) = 4.1352
Let X = number of households who own a food processor.
Then X is B(190, .1) random variable and
we will use N(19, 4.1352) to approximate.
Exercise 5.3.3. The campaign committee of
a candidate claims that sixty percent of the voters are in favor of
the candidate. You interview 150 voters. Assuming that the campaign
committee's claim is accurate, what is the approximate probability that
less than 77 will favor the candidate?
Solution:
Here p=.6 and n=150.
First step is to compute the mean μ and the standard deviation σ :
μ = np= 150*.6= 90 and
σ = √(np(1-p)) = √(150*.6*(1-.6)) = 6
Let X = number of voters in favor of the candidate.
Then X is B(150, .6 ) random variable and
we will use N(90, 6) to approximate.
Now, "X is less than 77" means "X < 77", which is "X ≤ 76".
P(X ≤ 76)= P(X ≤ 76.5)
[This is continuity correction.]
= P([X - μ]/σ < [76.5 - μ]/σ)
≈ P(Z < [76.5 - μ]/σ)
= P(Z < [76.5 -90 ]/6)
= P ( Z < -2.25) = normalcdf(-5, - 2.25) = .0122
Exercise 5.3.4. A technique is used to fertilize eggs in a fertility clinic laboratory. It is known that the probability that an egg will be fertilized by this technique is 0.1. If 500 eggs are treated, what is the probability that at least 60 eggs will be fertilized?
Solution:
Here p=.1 and n=500.
First step is to compute the mean μ and the standard deviation σ :
μ = np= 500*.1=50 and
σ = √(np(1-p)) = √(500*.1*(1-.1)) = 6.7082
Let X = number of eggs that will be fertilized.
Then X is B(500, .1) random variable and
we will use N(50, 6.7082) to approximate.
Now "X is at least 60" means "X ≥ 60".
P(60 ≤ X) = P(59.5 ≤ X)
[This is continuity correction.]
= P([59.5 - μ]/σ < [X - μ]/σ )
≈ P([59.5 - μ]/σ < Z )
= P([59.5 - 50]/6.7082 < Z )
= P (1.4162 < Z ) = normalcdf(1.4162, 5)= .0784
Exercise 5.3.5. The probability that a computer chip produced in a factory is defective is .2. If you have a sample of 60 chips, what is the probability that the number of defective chips will be less than 20?
Solution:
Here p=.2 and n=60.
First step is to compute the mean μ and the standard deviation σ :
μ = np= 60*.2= 12 and
σ = √(np(1-p)) = √(60*.2*(1-.2)) = 3.0984
Let X = number of defective chips.
Then X is B(60, .2 ) random variable and
we will use N(12, 3.0984) to approximate.
Now, "X is less than 20" means "X < 20", which is "X ≤ 19".
P(X ≤ 19)= P(X ≤ 19.5)
[This is continuity correction.]
= P([X - μ]/σ < [19.5 - μ]/σ)
≈ P(Z < [19.5 - μ]/σ)
= P(Z < [19.5 -12 ]/3.0984)
= P ( Z < 2.4206) = normalcdf(-5, 2.4206) = .9923
Exercise 5.3.6. The probability that a light bulb produced by a machine is defective is p = 0.2. Suppose a quality control inspector takes a sample of 120 bulbs. What is the probability that more than 30 bulbs will be defective?
Solution:
Here p=.2 and n=120.
First step is to compute the mean μ and the standard deviation σ :
μ = np= 120*.2 = 24 and
σ = √(np(1-p)) = √(120*.2*(1-.2)) = 4.3818
Let X = number of defective bulbs.
Then X is B(120, .2) random variable and
we will use N(24, 4.3818) to approximate.
Now "X is more than 30" means "X > 30", which is "X ≥ 31".
P(31 ≤ X) = P(30.5 ≤ X)
[This is continuity correction.]
= P([30.5 - μ]/σ < [X - μ]/σ )
≈ P([30.5 - μ]/σ < Z )
= P([30.5 - 24]/4.3818 < Z )
= P (1.4834 < Z ) = normalcdf(1.4834, 5)= .0690
Exercise 5.3.7. Suppose the probability that a student has access to the Internet is p = 0.8. Suppose you interview 160 students. What is the probability that less than 120 students will have access to the Internet?
Exercise 5.3.8. Suppose that the probability that a person favors medical use of marijuana is p = 0.6. If 780 individuals are interviewed, what is the probability that less than 450 will be in favor?
Exercise 5.3.9. Suppose that the probability that a middle-income family invests in the stock market is p = 0.8. If we interview 880 middle-income families, what is the probability that more than 700 have invested in the stock market?
Exercise 5.3.10. Suppose that an insurance company knows from experience that the probability that a life-insurance policyholder will survive another 10 years is p = 0.9. The company has 2280 policyholders. What is the probability that more than 2025 will survive another 10 years?