In Lesson 1, a variable was defined as something that varies or changes value. Most often, we consider numerical variables.
Definition. A random variable X associates a numerical value X(w) with each member w of the sample space (i.e., with each outcome w).
Normally, X(w) represents a characteristic of w.
For example, if the sample space S represents the fish population of Clinton Lake, then the weight X of a fish in that population is
a random variable.
In that case, X(w) is the weight of the fish w. Similarly, Y = length of a fish is also a random variable.
Animation 4.0.1
A random variable X represents population data
(not sample data).
We discussed bar graphs and histograms in Lesson 2.
In nature, the bar graphs and histograms of many kinds of
numerical data resemble a bell-shaped pattern.
A perfect bell shape of this kind is shown in the following diagram:
Animation 4.0.2: A Bell Shape
Example.
Animation 4.0.3
You can imagine that the histogram roughly
fits under a bell shape curve.
It may not be a great fit, depending on your expectations.
Example.
Animation 4.0.4
You can imagine that the histogram roughly
fits under a bell shape curve.
Example.
Animation 4.0.5
You can imagine that the histogram roughly
fits under a bell shape curve.
This one fits better.
Typically, the fit is better when the data set is large.
Example.
Animation 4.0.6
You can imagine that the histogram roughly fits under a bell shape curve.
This one fits nicely.
Typically, the fit is better when the data set is large.
4.1 Normal Random Variable
As the above examples of histograms show, all of them fit roughly under a bell-shaped curve, and some of them fit better than others. A random variable X whose histogram fits PERFECTLY under a bell-shaped curve is called a Normal Random Variable. Here we mean that the histogram of the population data (not sample data) must fit a perfect bell-shaped curve. The word "Normal" is used because many of the random variables we commonly encounter in nature fit, at least approximately, under a bell-shaped curve. If the histogram fits only approximately, then the random variable is called approximately normal.
We formally provide these definitions as follows:
Definition. A data set or a
random variable
X is said to be normally distributed if
the bar graph or the histogram fits into a perfect bell-shaped curve.
We also say that X has normal distribution or X is a normal random
variable. Also, the corresponding bell-shaped curve is called the
normal curve of X (or of the data set).
The corresponding bell curve is called the
probability density function (pdf) of X.
If the bar graph or the histogram fits approximately into a perfect
bell-shaped curve, then we say that the data set (or the variable)
is approximately normally distributed.
Definition. Such a perfect bell-shaped
curve as above is called a normal curve.
Such a curve is shown in Animation 4.0.2 above.
Remark. As noted above,
many data sets in nature
fit such a normal curve,
at least approximately, even if not perfectly.
Properties of Normal Random Variables
A random variable X represents the population data (for the corresponding characteristic). Like any other data set (see Lesson 2),
X has a mean μ and a standard deviation σ.
Now suppose X is a normal random variable with mean μ
and standard deviation σ. Then,
The bell curve of X is symmetric around the vertical line x= μ.
This line is also called the line of symmetry.
The two halves of the normal curve on two sides of
this line of symmetry are the mirror images of each other.
Moreover, the standard deviation σ is the horizontal distance from the line of symmetry to each of the two points where the concavity of the curve changes.
The bell curve has a formula that we will not give here. However, the formula is fully determined by the mean μ
and the standard deviation σ.
Shape of the Bell curve: If
the standard deviation σ is large, then the normal curve is flat and wide.
If σ is small, then the normal curve is tall and narrow.
All of the above properties are demonstrated by the following animation.
Animation 4.1.1
Move the pink sliders to change the standard deviation and the mean.
Observe that the line of symmetry coincides
with the vertical line x = μ.
Observe that the purple line segment connects the line of symmetry to the point on the curve where the concavity changes. The length of this segment is the standard deviation σ.
Observe that the curve becomes flatter as the standard deviation σ increases, and taller as σ decreases.
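The same three properties can also be checked numerically. The following is a minimal sketch in Python (our choice of tool; the course itself uses the TI-84), using the standard-library `statistics.NormalDist` class (Python 3.8 or later). The mean μ = 10 and the values σ = 1 and σ = 2 are arbitrary, and `second_diff` is a hypothetical helper of ours that estimates the second derivative; its sign tells us the concavity.

```python
from statistics import NormalDist


def second_diff(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x); its sign gives the concavity."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


mu = 10.0  # arbitrary illustrative mean

for sigma in (1.0, 2.0):
    X = NormalDist(mu, sigma)
    # Height of the curve on the line of symmetry x = mu:
    # doubling sigma halves the peak, so the curve gets flatter.
    peak = X.pdf(mu)
    # Symmetry: equal heights at equal distances on either side of mu.
    d = 1.3
    symmetric = abs(X.pdf(mu - d) - X.pdf(mu + d)) < 1e-12
    # Concavity changes at x = mu + sigma: concave down just inside,
    # concave up just outside.
    inside = second_diff(X.pdf, mu + sigma - 0.1)
    outside = second_diff(X.pdf, mu + sigma + 0.1)
    print(f"sigma={sigma}: peak={peak:.4f}, symmetric={symmetric}, "
          f"f'' inside={inside:.4f}, f'' outside={outside:.4f}")
```

For σ = 2 the printed peak is half of the peak for σ = 1, and in both cases the estimated second derivative changes sign at x = μ + σ.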
When the mean μ = 0 and the standard deviation σ = 1, X
is called a standard normal variable.
The standard normal random variable is usually denoted by Z.
Theorem. If X is a normal random variable with mean μ and standard deviation σ, then Z = (X-μ)/σ
is a standard normal random variable.
The process of converting X to Z is called the standardization of
X.
If x is a value of X,
then
z = (x-μ)/σ
is called the standardized value of x.
We also call it the z-value of x.
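For readers who want to experiment on a computer, the formula z = (x-μ)/σ can be written as a small Python function. This is only a sketch: the name `z_value` is our own (hypothetical), and the numbers come from the salmon data of Exercise 4.1.2 (μ = 23 lb, σ = 8 lb).

```python
def z_value(x, mu, sigma):
    """Standardized value (z-value) of x for a variable with mean mu and SD sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Salmon weights, mean 23 lb and standard deviation 8 lb:
print(z_value(31, 23, 8))   # 1.0  (one standard deviation above the mean)
print(z_value(7, 23, 8))    # -2.0 (two standard deviations below the mean)
```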
Definition. If x and y are two values
of X (with x < y), then we say that x and y are
(y-x)/σ standard deviations apart
from each other. We also say that the
standard deviation distance
between x and y is (y-x)/σ.
Remark: In statistics, the standard deviation is used as a unit.
Information like "the birth weights of two babies differ by a pound"
is of little value to a statistician. A statistician would like to know how many standard deviations apart the birth weights of these two babies are.
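Here is a sketch of the standard deviation distance as a Python function (the name `sd_distance` is hypothetical). It takes the absolute value, so the order of the two values does not matter. To make the remark concrete, we borrow σ = 18 oz from Exercise 4.1.3: two birth weights that differ by one pound (16 oz) are 16/18 ≈ 0.89 standard deviations apart.

```python
def sd_distance(x, y, sigma):
    """Number of standard deviations between two values x and y."""
    return abs(y - x) / sigma


# Two birth weights differing by one pound (16 oz), with sigma = 18 oz:
print(round(sd_distance(100, 116, 18), 4))   # 0.8889
```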
The 68-95-99.7 Rule
Later in this lesson, we will use the TI-84 to compute probabilities for normal random variables. Before we do that, we work with the 68-95-99.7 Rule, which gives probabilities (in percent) in some basic situations.
Suppose that a data set or a random variable X has a normal distribution.
Then we have the following facts:
The data whose standardized values are between
-1 and 1 represent 68 percent of the data.
In other words, P(-1 <
Z < 1) = 0.68.
The same is expressed by saying that 68 percent of the data
values are within one standard deviation σ
from the mean μ or between μ - σ and
μ + σ.
It also follows that the probability that the z-value of a data value is not
between -1 and 1 is 1 - 0.68 = 0.32.
The data whose standardized values are between
-2 and 2 represent 95 percent of the data.
In other words,
P(-2 < Z < 2) = 0.95.
So the probability that the z-value of a data value is not
between -2 and 2 is 1 - 0.95 = 0.05.
The data whose standardized values are between
-3 and 3 represent 99.7 percent of the data.
In other words,
P(-3 < Z < 3) = 0.997.
So the probability that the z-value of a data value is not
between -3 and 3 is 1 - 0.997 = 0.003.
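The percentages 68, 95 and 99.7 are rounded values. A short Python sketch (standard-library `statistics.NormalDist`, Python 3.8 or later; not a tool used in this course) computes P(-k < Z < k) for k = 1, 2, 3 and gives approximately 0.6827, 0.9545 and 0.9973.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    p = Z.cdf(k) - Z.cdf(-k)  # P(-k < Z < k)
    print(f"P(-{k} < Z < {k}) = {p:.4f}")
```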
Examine the animation of the 68-95-99.7 Rule:
Animation 4.1.2
In addition to the 68-95-99.7 Rule, you must use symmetry
and the following facts to compute
probabilities.
To compute a probability, we must first standardize, reducing
it to a z-problem.
The total area under the normal curve above the x-axis is
1.
Half the data are larger than μ and
the other half are smaller than μ. So we write
P(X > μ) = 0.5 (50 percent) and P(X < μ) = 0.5 (50 percent).
Therefore, for the standard normal variable Z, we have
P(Z > 0) = 0.5 (50 percent) and
P(Z < 0) = 0.5 (50 percent).
We can use symmetry in many other ways.
For example, P(0 < Z < 1) = P(-1 < Z < 1)/2
= 0.68/2 = 0.34.
Problems on 4.1: On standard deviation distance
Exercise 4.1.1.
The annual income X of a population has
mean μ = $42,000.00 and standard deviation σ = $12,000.00.
It is assumed that X is normally distributed.
The annual salary of a math GTA is $18,000, the annual salary of a math professor
is $88,000, the annual salary of a math distinguished professor (DP) is $150,000,
and the annual salary of the dean is $250,000.
How many standard deviations
apart are the annual salaries of the professor and the GTA?
How many standard deviations apart are the annual salaries
of the professor and the DP?
How many standard deviations apart are the annual salaries of the GTA and the DP?
How many standard deviations apart are the annual salaries of the dean and the DP?
How many standard deviations higher than the mean is the annual salary of the dean?
Solution:
The standard deviation distance between the annual salaries of the professor and the GTA is
(y-x)/σ = (88,000 - 18,000)/12,000 = 5.8333 standard deviations.
The standard deviation distance between the annual salaries of the professor and the DP is (y-x)/σ = (150,000 - 88,000)/12,000 = 5.1667 standard deviations.
The standard deviation distance between the annual salaries of the DP
and the GTA is (y-x)/σ = (150,000 - 18,000)/12,000 = 11 standard deviations.
The standard deviation distance between the annual salaries of the DP and
the dean is (y-x)/σ = (250,000 - 150,000)/12,000 = 8.3333 standard deviations.
The standard deviation distance of the dean's annual salary from the mean μ is (y-x)/σ = (250,000 - 42,000)/12,000 = 17.3333 standard deviations.
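The five answers of Exercise 4.1.1 can be reproduced with a few lines of Python. This is only a sketch; the dictionary keys are labels we chose.

```python
sigma = 12_000   # standard deviation of annual income (dollars)
mean = 42_000    # mean annual income (dollars)
salary = {"GTA": 18_000, "professor": 88_000, "DP": 150_000, "dean": 250_000}

pairs = [("GTA", "professor"), ("professor", "DP"), ("GTA", "DP"), ("DP", "dean")]
for a, b in pairs:
    d = (salary[b] - salary[a]) / sigma   # (y - x) / sigma with x < y
    print(f"{a} vs {b}: {d:.4f} standard deviations")

print(f"dean vs mean: {(salary['dean'] - mean) / sigma:.4f} standard deviations")
```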
Exercise 4.1.2.
The weight X of salmon in a river has a normal
distribution with mean μ = 23 pounds and
standard deviation σ = 8 pounds.
The weight of a salmon you caught recently is 18 pounds.
The weight of a salmon that your friend caught is 30 pounds.
How many standard deviations apart are these two fish?
How many standard deviations above the mean is the fish your friend caught?
How many standard deviations below the mean is your fish?
Solution:
The standard deviation distance between the two fish is
(y-x)/σ = (30-18)/8 = 1.5 standard deviations.
The standard deviation distance of your friend's fish from the mean is
(y-x)/σ = (30-23)/8 = .875 standard deviations.
The standard deviation distance of your fish from the mean is
(y-x)/σ = (23-18)/8 = .625 standard deviations.
Exercise 4.1.3.
The birth weight X of babies in a county has a normal
distribution with mean μ = 114 oz and
standard deviation σ = 18 oz.
The birth weights of a pair of twins (Anita and April) are 130 oz and 99 oz.
Due to racial and economic differences, the mean birth weights of babies on the north and south sides of the county differ.
On the south side the mean birth weight is 106 oz, and on the north side of the county it is 124 oz.
How many standard deviations apart are the two twins?
How many standard deviations apart are the mean birth weights of babies on the two sides of the county?
How many standard deviations above the south side mean is Anita?
How many standard deviations below the north side mean is April?
Solution:
The standard deviation distance between the twins is
(y-x)/σ = (130-99)/18 = 1.7222 standard deviations.
The standard deviation distance between the mean birth weights in the north and south is
(y-x)/σ = (124 - 106)/18 = 1 standard deviation.
The standard deviation distance between Anita and the south side mean is
(y-x)/σ = (130 - 106)/18 = 1.3333 standard deviations.
The standard deviation distance between April and the north side mean is
(y-x)/σ = (124-99)/18 = 1.3889 standard deviations.
Exercise 4.1.4.
The time X taken by a student in Math 115 to
complete a homework set has a normal
distribution with mean μ = 63 minutes and
standard deviation σ = 17 minutes.
To finish the last homework, Jeff took 55 minutes and his friend Albert took 76 minutes.
Also, the mean times taken by the sections of Jeff and Albert differ.
For Jeff's section, the mean time taken was 60 minutes, and for Albert's section it was 73 minutes.
How many standard deviations apart are Jeff and Albert, for the last homework?
How many standard deviations apart are the mean time for the two sections?
How many standard deviations apart is Jeff from the mean of Albert's section?
How many standard deviations apart is Albert from the mean of Jeff's section?
Solution:
The standard deviation distance between Jeff and Albert is
(y-x)/σ = (76 - 55)/17 = 1.2353 standard deviations.
The standard deviation distance between the mean times of the two sections is
(y-x)/σ = (73 - 60)/17 = .7647 standard deviations.
The standard deviation distance between Jeff and the mean of Albert's section is
(y-x)/σ = (73 - 55)/17 = 1.0588 standard deviations.
The standard deviation distance between Albert and the mean of Jeff's section is
(y-x)/σ = (76 - 60)/17 = .9412 standard deviations.
Problems on 4.1: On 68-95-99.7 rule
Exercise 4.1.5.
The annual income X of a population has
mean μ = $42,000.00 and standard deviation σ = $12,000.00.
It is assumed that X is normally distributed.
What proportion of the population earn between $30,000 and $42,000 annually?
What proportion of the population earn between $30,000 and $54,000 annually?
What proportion of the population earn more than $30,000 annually?
What proportion of the population earn less than $18,000 annually?
Solution:
P(30,000 < X < 42,000)
= P((30,000 - μ)/σ < (X - μ)/σ < (42,000 - μ)/σ)
= P((30,000 - μ)/σ < Z < (42,000 - μ)/σ)
= P((30,000 - 42,000)/12,000 < Z < (42,000 - 42,000)/12,000)
= P(-1 < Z < 0)
= P(-1 < Z < 1)/2 = .68/2 = .34
or 34 percent.
P(30,000 < X < 54,000)
= P((30,000 - μ)/σ < (X - μ)/σ < (54,000 - μ)/σ)
= P((30,000 - μ)/σ < Z < (54,000 - μ)/σ)
= P((30,000 - 42,000)/12,000 < Z < (54,000 - 42,000)/12,000)
= P(-1 < Z < 1) = .68
or 68 percent.
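As a cross-check of Exercise 4.1.5, the sketch below computes the exact normal probabilities with Python's standard-library `statistics.NormalDist` (Python 3.8 or later). Since (54,000 - 42,000)/12,000 = 1, the second event is P(-1 < Z < 1), about .68 by the rule. The last two questions are not worked above; by the same reasoning, P(X > 30,000) = P(Z > -1) = 0.5 + 0.34 = 0.84 and P(X < 18,000) = P(Z < -2) = (1 - 0.95)/2 = 0.025. The exact values differ from these rule-based values only slightly.

```python
from statistics import NormalDist

X = NormalDist(mu=42_000, sigma=12_000)   # annual income in dollars

exact = {
    "30,000 < X < 42,000": X.cdf(42_000) - X.cdf(30_000),
    "30,000 < X < 54,000": X.cdf(54_000) - X.cdf(30_000),
    "X > 30,000": 1 - X.cdf(30_000),
    "X < 18,000": X.cdf(18_000),
}
rule = {   # values from the 68-95-99.7 Rule and symmetry
    "30,000 < X < 42,000": 0.34,
    "30,000 < X < 54,000": 0.68,
    "X > 30,000": 0.84,
    "X < 18,000": 0.025,
}

for event in exact:
    print(f"P({event}): rule {rule[event]:.3f}, exact {exact[event]:.4f}")
```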
Exercise 4.1.6.
The weight X of salmon in a river has a normal
distribution with mean μ = 23 pounds and
standard deviation σ = 8 pounds. You hooked a fish.
What is the probability that the fish will be between 23 lbs
and 31 lbs?
What is the probability that the fish will be between 7 lbs
and 31 lbs?
What is the probability that the fish will be less than 31 lbs?
What is the probability that the fish will be more than 39 lbs?
Solution:
P(23 < X < 31)
= P((23 - μ)/σ < (X - μ)/σ < (31 - μ)/σ)
= P((23 - μ)/σ < Z < (31 - μ)/σ)
= P((23 - 23)/8 < Z < (31 - 23)/8)
= P(0 < Z < 1)
= P(-1 < Z < 1)/2
= .68/2 = .34
or 34 percent.
P(7 < X < 31)
= P((7 - μ)/σ < (X - μ)/σ < (31 - μ)/σ)
= P((7 - μ)/σ < Z < (31 - μ)/σ)
= P((7 - 23)/8 < Z < (31 - 23)/8)
= P(-2 < Z < 1)
= P(-2 < Z < 0) + P(0 < Z < 1)
= P(-2 < Z < 2)/2 + P(-1 < Z < 1)/2
= .95/2 + .68/2 = .815
or 81.5 percent.
Exercise 4.1.8.
The time X taken by a student in Math 115 to
complete a homework set has a normal
distribution with mean μ = 63 minutes and
standard deviation σ = 17 minutes.
What proportion of students will take between 80 minutes and 97 minutes?
What proportion of students will take between 46 minutes and 97 minutes?
What proportion of students will take less than 46 minutes?
What proportion of students will take less than 29 minutes?
Solution:
P(80 < X < 97)
= P((80 - μ)/σ < (X - μ)/σ < (97 - μ)/σ)
= P((80 - μ)/σ < Z < (97 - μ)/σ)
= P((80 - 63)/17 < Z < (97 - 63)/17)
= P(1 < Z < 2)
= P(0 < Z < 2) - P(0 < Z < 1)
= .95/2 - .68/2 = .135
or 13.5 percent.
P(46 < X < 97)
= P((46 - μ)/σ < (X - μ)/σ < (97 - μ)/σ)
= P((46 - μ)/σ < Z < (97 - μ)/σ)
= P((46 - 63)/17 < Z < (97 - 63)/17)
= P(-1 < Z < 2)
= P(-1 < Z < 0) + P(0 < Z < 2)
= .68/2 + .95/2 = .815
or 81.5 percent.
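A quick numerical check of Exercise 4.1.8, again a sketch using Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The last two questions are not worked above; by the rule and symmetry, P(X < 46) = P(Z < -1) = (1 - 0.68)/2 = 0.16 and P(X < 29) = P(Z < -2) = (1 - 0.95)/2 = 0.025. The code prints these rule-based values next to the exact ones.

```python
from statistics import NormalDist

X = NormalDist(mu=63, sigma=17)   # homework time in minutes

exact = {
    "80 < X < 97": X.cdf(97) - X.cdf(80),
    "46 < X < 97": X.cdf(97) - X.cdf(46),
    "X < 46": X.cdf(46),
    "X < 29": X.cdf(29),
}
rule = {"80 < X < 97": 0.135, "46 < X < 97": 0.815, "X < 46": 0.16, "X < 29": 0.025}

for event in exact:
    print(f"P({event}): rule {rule[event]:.3f}, exact {exact[event]:.4f}")
```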
The most commonly encountered random variable in nature
and in real life is the
normal random variable. Throughout this
section we will assume that X is a normal random variable with mean μ and standard deviation σ. As stated above, the mean μ and
the standard deviation σ
determine the bell curve of X completely.
The probability P(a ≤ X ≤ b) = P(a < X < b) is given by the area under the bell curve, above the x-axis, and between the vertical lines x = a and x = b.
The following animation demonstrates the same:
Animation 4.2.1
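To see that a probability really is an area, one can add up the areas of many thin rectangles under the bell curve. The sketch below (Python, standard-library `statistics.NormalDist`, Python 3.8 or later) uses the midpoint rule through a hypothetical helper `area_under_pdf`, and compares the result with the difference of cumulative probabilities. The salmon parameters μ = 23 and σ = 8 are taken from Exercise 4.1.2.

```python
from statistics import NormalDist


def area_under_pdf(dist, a, b, n=10_000):
    """Approximate the area under dist's bell curve between a and b (midpoint rule)."""
    width = (b - a) / n
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(n)) * width


X = NormalDist(mu=23, sigma=8)      # salmon weights in pounds
area = area_under_pdf(X, 7, 31)     # area between x = 7 and x = 31
exact = X.cdf(31) - X.cdf(7)        # P(7 < X < 31)
print(f"area = {area:.6f}, P(7 < X < 31) = {exact:.6f}")
```

The two printed numbers agree to six decimal places; with more rectangles (larger `n`) the agreement improves further.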
A variety of software packages (e.g., SAS, MINITAB) is available to compute probabilities for normal random variables. However, reducing a probability problem for X to a problem about the standard normal variable is a very convenient approach and is almost universally used. This method of reduction is called standardization.
Recall that a normal random variable Z
with mean μ = 0 and standard
deviation σ = 1
(i.e. a N(0,1)-random variable)
is called a Standard Normal Random Variable.
The notation Z is used fairly consistently
to denote a
standard normal random variable. Following are some comments about the
importance, properties and usage of N(0, 1) random variables.
Reinterpreting the above properties of normal random variables,
the
graph of the pdf y = f(x) of the standard normal random
variable Z is symmetric
around the y-axis.
Since the total area under the graph and
above the x-axis is one,
the area on each side of the y-axis
is .5.
The following is another version of the above animations,
for the standard normal random variable, to compute probabilities.
Animation 4.2.2
Probability tables for the
standard normal variable Z are commonly used
to compute probabilities in college-level statistics courses.
Such tables for Z are available in any standard textbook or on the Internet.
Because of the availability of simple and sophisticated tools
(like the TI-84, Excel, SAS, Minitab),
the use of tables has become outdated.
In this course, the TI-84 will
be used to
compute probabilities.
The distribution of the standard normal random variable Z is the most
basic ingredient used
to deal with problems on any normal random variable X.
(In turn,
normal random variables are the most basic ones in nature and real life.)
Any problem on a normal random variable X is reduced to a problem on the standard normal random variable Z, using the theorem in Section 4.1.
This process of reduction is called
standardization.
The following theorem makes this more precise.
Method of Standardization:
As above, X is a normal random variable with mean μ and standard deviation σ.
Write Z = (X-μ)/σ. Then
P(a < X < b) = P(A < Z < B) = normalcdf(A, B),
where A = (a-μ)/σ
and B = (b-μ)/σ,
and the TI-84 "Distr" menu provides the function
normalcdf(A, B).
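The method of standardization can be sketched in Python as two small functions. The names `normalcdf` and `prob_between` are our own hypothetical stand-ins (they are not a TI-84 or library API); they rely on the standard-library `statistics.NormalDist` (Python 3.8 or later). Unlike the TI-84, Python has an infinity value, so no cutoff such as -5 is needed.

```python
from statistics import NormalDist


def normalcdf(A, B):
    """Hypothetical stand-in for the TI-84 normalcdf(A, B): returns P(A < Z < B)."""
    Z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return Z.cdf(B) - Z.cdf(A)


def prob_between(a, b, mu, sigma):
    """P(a < X < b) for a normal X with mean mu and SD sigma, by standardization."""
    A = (a - mu) / sigma
    B = (b - mu) / sigma
    return normalcdf(A, B)


# Exercise 4.2.3 below: P(X < 5040) for bulb life, mean 8640 h, SD 1440 h.
# float("-inf") stands in for minus infinity; the cdf there is 0.
print(round(prob_between(float("-inf"), 5040, 8640, 1440), 4))  # 0.0062
```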
Ubiquity of Normal Random Variables:
A great many
random variables that we encounter in nature are
either
normal or approximately normal.
Problem Solving:
There are two steps involved in solving problems in this lesson:
Standardize the problem to a Z-problem.
Use the TI-84 to compute the Z-probability. (You can use other tools elsewhere, including the animations above.)
Example: Suppose X is an
N(2, .5) random variable (mean 2, standard deviation .5).
What is P(1 < X < 2.5)?
Solution:
Here μ = 2 and σ = .5.
P(1 < X < 2.5)
= P([1 - μ]/σ < [X - μ]/σ < [2.5 - μ]/σ)
= P([1 - μ]/σ < Z < [2.5 - μ]/σ)
= P([1 - 2]/.5 < Z < [2.5 - 2]/.5)
= P(-2 < Z < 1) = normalcdf(-2, 1) = .8186
[by the TI-84 function normalcdf].
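Standardization is a convenience, not a requirement: a tool that accepts μ and σ directly gives the same answer. The sketch below (Python standard-library `statistics.NormalDist`, Python 3.8 or later) computes the example above both ways.

```python
from statistics import NormalDist

mu, sigma = 2, 0.5
A, B = (1 - mu) / sigma, (2.5 - mu) / sigma   # standardized limits: -2.0 and 1.0

Z = NormalDist()                              # standard normal
via_z = Z.cdf(B) - Z.cdf(A)                   # plays the role of normalcdf(-2, 1)

X = NormalDist(mu, sigma)                     # work with X directly
direct = X.cdf(2.5) - X.cdf(1)

print(round(via_z, 4), round(direct, 4))      # 0.8186 0.8186
```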
Use of Calculators (TI-84):
Computing Standard Normal (Z) Probability:
To compute P(-2 ≤ Z ≤ 1.5),
do the following.
a) Press 2nd and then Distr (VARS).
b) Scroll down to normalcdf and press ENTER.
c) Type in "-2, 1.5)" and press ENTER.
The TI-84 will give the answer.
To compute P(Z ≤ 1.5),
do the following.
a) Press 2nd and then Distr (VARS).
b) Scroll down to normalcdf and press ENTER.
c) Type in "-5, 1.5)" and press ENTER.
The TI-84 will give the answer.
(We used -5 instead of minus infinity. There is no infinity key on the TI-84, and -5 is precise enough. You can check
that P(-5 < Z < 5) = normalcdf(-5, 5) ≈ 1.)
To compute P(1.53 ≤ Z),
do the following.
a) Press 2nd and then Distr (VARS).
b) Scroll down to normalcdf and press ENTER.
c) Type in "1.53, 5)" and press ENTER.
The TI-84 will give the answer.
(We used 5 instead of infinity. This is precise enough.)
It makes no difference if "≤" is replaced by "<".
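Why is -5 "precise enough"? The area under the standard normal curve to the left of -5 is only about 2.9 × 10^-7, far below the four decimal places used in this lesson. The sketch below (Python standard-library `statistics.NormalDist`, Python 3.8 or later) shows this.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Probability below -5, i.e. what is "lost" by using -5 instead of minus infinity.
print(Z.cdf(-5))

# P(Z <= 1.5) with no lower limit vs. with the lower limit -5 used on the TI-84.
print(round(Z.cdf(1.5), 4), round(Z.cdf(1.5) - Z.cdf(-5), 4))
```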
Problems on 4.2: the Normal Random Variable
Exercise 4.2.1. Let Z be the standard normal
random variable.
Find the probability P(-1.1 < Z < 2.5).
Find the probability P(Z < -2.1).
Find the probability P(-2.1 < Z < -1.5).
Find the probability P(1.5 < Z).
Solution by TI-84:
This is already standard normal. Therefore, no standardization is needed.
We use the TI-84:
P(-1.1 < Z < 2.5) = normalcdf(-1.1, 2.5) = .8581
P(Z < -2.1)= normalcdf(-5, -2.1) = .01786
P(-2.1 < Z < -1.5) = normalcdf(-2.1, -1.5) = .0489
P(1.5 < Z) = normalcdf(1.5, 5) = .0668
You can also experiment with the Animation 4.2.2.
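The four answers of Exercise 4.2.1 can also be reproduced in Python (a sketch using the standard-library `statistics.NormalDist`, Python 3.8 or later). Here 1 - cdf is used instead of the upper limit 5, which makes no difference at four decimal places.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

answers = {
    "P(-1.1 < Z < 2.5)": Z.cdf(2.5) - Z.cdf(-1.1),
    "P(Z < -2.1)": Z.cdf(-2.1),
    "P(-2.1 < Z < -1.5)": Z.cdf(-1.5) - Z.cdf(-2.1),
    "P(1.5 < Z)": 1 - Z.cdf(1.5),
}
for event, p in answers.items():
    print(f"{event} = {p:.4f}")
```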
Exercise 4.2.2. Let X be a normal random
variable with mean μ = 3 and standard
deviation σ = 1.5 .
Find the probability P(-1.1 < X < 2.5).
Find the probability P(X < -2.1).
Find the probability P(-1.2 < X < -0.5).
Find the probability P(1.5 < X).
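Solution by TI-84:
Here the mean μ = 3 and standard deviation σ = 1.5, so
X is an N(3, 1.5) random variable. We standardize and use the TI-84:
P(-1.1 < X < 2.5) = P((-1.1 - 3)/1.5 < Z < (2.5 - 3)/1.5) = P(-2.7333 < Z < -0.3333) = normalcdf(-2.7333, -0.3333)
P(X < -2.1) = P(Z < (-2.1 - 3)/1.5) = P(Z < -3.4) = normalcdf(-5, -3.4)
P(-1.2 < X < -0.5) = P((-1.2 - 3)/1.5 < Z < (-0.5 - 3)/1.5) = P(-2.8 < Z < -2.3333) = normalcdf(-2.8, -2.3333)
P(1.5 < X) = P((1.5 - 3)/1.5 < Z) = P(-1 < Z) = normalcdf(-1, 5) = .8413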
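The probabilities of Exercise 4.2.2 can be computed with the short Python sketch below (standard-library `statistics.NormalDist`, Python 3.8 or later). It standardizes each limit with a hypothetical helper `z`, exactly as in the method of standardization, and prints the resulting probabilities; computing with `X` directly gives the same numbers.

```python
from statistics import NormalDist

mu, sigma = 3, 1.5
X = NormalDist(mu, sigma)   # the N(3, 1.5) variable itself
Z = NormalDist()            # standard normal


def z(x):
    """Standardized value of x for this exercise."""
    return (x - mu) / sigma


answers = {
    "P(-1.1 < X < 2.5)": Z.cdf(z(2.5)) - Z.cdf(z(-1.1)),
    "P(X < -2.1)": Z.cdf(z(-2.1)),
    "P(-1.2 < X < -0.5)": Z.cdf(z(-0.5)) - Z.cdf(z(-1.2)),
    "P(1.5 < X)": 1 - Z.cdf(z(1.5)),
}
for event, p in answers.items():
    print(f"{event} = {p:.4f}")
```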
Exercise 4.2.3. The length of life of some
light bulbs produced in a factory is normally distributed with mean
8640 hours and standard deviation 1440 hours. Find the probability that
a bulb will last
less than 5040 hours;
between 5040 hours and 8640 hours.
Solution by TI-84:
Here the mean μ = 8640 and standard deviation σ = 1440.
Let X = length of life of light bulbs produced in the factory.
Then X is N(8640, 1440) -random variable. We use TI-84 :
P(X < 5040)
= P([X - μ]/σ < [5040 - μ]/σ)
= P( Z < [5040 - μ]/σ)
= P( Z < [5040 - 8640]/1440)
= P (Z < - 2.5) = normalcdf(-5, - 2.5) = .0062
P(5040 < X < 8640)
= P([5040 - μ]/σ < [X - μ]/σ < [8640 - μ]/σ)
= P([5040 - μ]/σ < Z < [8640 - μ]/σ)
= P([5040 - 8640]/1440 < Z < [8640 - 8640]/1440)
= P (-2.5 < Z < 0) = normalcdf(-2.5, 0) = 0.4938
Exercise 4.2.4. The length X of a fish in
a lake has normal distribution with mean 67 cm and standard deviation
21 cm. What proportion (i.e., probability) of fish are between 44 cm
and 110 cm long?
Solution by TI-84:
Here the mean μ = 67 and standard deviation σ = 21.
Let X = length of fish in the lake.
Then X is N(67, 21) -random variable. We use TI-84 :
P(44 < X < 110)
= P([44 - μ]/σ < [X - μ]/σ < [110 - μ]/σ)
= P([44 - μ]/σ < Z < [110 - μ]/σ)
= P([44 - 67]/21 < Z < [110 - 67]/21)
= P (-1.0952 < Z < 2.0476) = normalcdf(-1.0952, 2.0476) = 0.8430
If the answer is asked in percent, it would be 84.30 percent.
Exercise 4.2.5. The diameter of the pumpkins
in my patch has normal distribution with mean 13 inches and standard
deviation 4.5 inches. What proportion (i.e., probability) of pumpkins
is above 22 inches?
Solution by TI-84:
Here the mean μ = 13 and standard deviation σ = 4.5
Let X = diameter of the pumpkins.
Then X is N(13, 4.5) -random variable. We use TI-84 :
P(22 < X )
= P([22 - μ]/σ < [X - μ]/σ )
= P([22 - μ]/σ < Z )
= P([22 - 13]/4.5 < Z )
= P (2 < Z ) = normalcdf(2, 5)= .0227
In percent, this is 2.27 percent.
Exercise 4.2.6. The annual expenditure X
of a student is approximately normally distributed with mean μ
= 11,000 dollars and standard deviation σ
= 1500 dollars. What percent of students spend less than 10,000 dollars?
Solution by TI-84:
Here the mean μ = 11,000 and standard deviation σ = 1500
Let X = annual expenditure of students.
Then X is N(11000, 1500) -random variable. We use TI-84 :
P(X < 10000)
= P([X - μ]/σ < [10000 - μ]/σ)
= P(Z < [10000 - μ]/σ)
= P(Z < [10000 - 11000]/1500)
= P ( Z < - .6667) = normalcdf(-5, -.6667) = .2525
Since the answer is asked in percent, it would be 25.25 percent.
Exercise 4.2.7. Suppose the annual production
X of milk by cows in a farm is normally distributed with
μ
= 5500 liters and standard deviation σ
= 150 liters. What percent of cows have annual yield less than 5155
liters?
Solution by TI-84:
Here the mean μ = 5500 and standard deviation σ = 150
Let X = annual production by cows in the farm.
Then X is N(5500, 150) -random variable. We use TI-84 :
P(X < 5155)
= P([X - μ]/σ < [5155 - μ]/σ)
= P(Z < [5155 - μ]/σ)
= P(Z < [5155 - 5500]/150)
= P ( Z < - 2.3) = normalcdf(-5, -2.3) = .0107
Since the answer is asked in percent, it would be 1.07 percent.
Exercise 4.2.8. The amount of vegetable oil
X produced by a machine in a day is normally distributed with μ
= 130 liters and standard deviation σ
= 25 liters. What is the probability that a machine will produce between
120 liters and 150 liters on a day?
Solution by TI-84:
Here the mean μ = 130 and standard deviation σ = 25
Let X = vegetable oil produced by the machines.
Then X is N(130, 25) -random variable. We use TI-84 :
P(120 < X < 150)
= P([120 - μ]/σ < [X - μ]/σ < [150 - μ]/σ)
= P([120 - μ]/σ < Z < [150 - μ]/σ)
= P([120 - 130]/25 < Z < [150 - 130]/25)
= P (-.4 < Z < .8) = normalcdf(-.4, .8) = 0.4436
In percent, this is 44.36 percent.
Exercise 4.2.9. The weight X at birth of
babies is normally distributed with mean μ
= 114 oz and standard deviation σ =
18 oz. What percent of babies will have birth weight below 141 oz?
Solution by TI-84:
Here the mean μ = 114 and standard deviation σ = 18
Let X = birth weight of babies.
Then X is N(114, 18) -random variable. We use TI-84 :
P(X < 141)
= P([X - μ]/σ < [141 - μ]/σ)
= P(Z < [141 - μ]/σ)
= P(Z < [141 - 114]/18)
= P ( Z < 1.5) = normalcdf(-5, 1.5) = .9332
Since the answer is asked in percent, it would be 93.32 percent.
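Exercises 4.2.3 through 4.2.9 can be checked together with the Python sketch below (standard-library `statistics.NormalDist`, Python 3.8 or later). Each row lists the mean, the standard deviation and the limits, with `None` meaning "no limit on that side". The exact values agree with the TI-84 answers in the text up to rounding in the fourth decimal place; for example, Exercise 4.2.5 gives 0.0228 here versus .0227 with the upper cutoff 5.

```python
from statistics import NormalDist

# (exercise, mean, sd, lower limit, upper limit); None means no limit on that side.
problems = [
    ("4.2.3a", 8640, 1440, None, 5040),
    ("4.2.3b", 8640, 1440, 5040, 8640),
    ("4.2.4", 67, 21, 44, 110),
    ("4.2.5", 13, 4.5, 22, None),
    ("4.2.6", 11_000, 1500, None, 10_000),
    ("4.2.7", 5500, 150, None, 5155),
    ("4.2.8", 130, 25, 120, 150),
    ("4.2.9", 114, 18, None, 141),
]

results = {}
for name, mu, sigma, lo, hi in problems:
    X = NormalDist(mu, sigma)
    p_hi = X.cdf(hi) if hi is not None else 1.0
    p_lo = X.cdf(lo) if lo is not None else 0.0
    results[name] = p_hi - p_lo
    print(f"Exercise {name}: {results[name]:.4f}")
```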
4.3 Percentiles and Quartiles
Percentiles and Quartiles of a data set or
of a random variable X provide a great deal of information about
its distribution. They also have a wide range of uses. In particular, results of various tests (e.g., SAT, ACT) are reported as percentiles.
Definition.
Given a data set and a number p between 0 and 100, the pth
percentile is a number xp
such that at least p percent of the data are at or below xp and
at least 100 - p percent of the data are at or
above xp.
In this section, we assume that the data set is represented by a
normal random variable X with mean μ and standard deviation σ.
In this case, the pth percentile xp of X is given by the equation
P(X < xp) = p/100.
The twenty-fifth percentile x25 is
called the First Quartile,
denoted by Q1.
The fiftieth percentile x50 is
called the Second Quartile,
denoted by Q2.
The fiftieth percentile is the median of X. For a normal random variable,
the mean μ divides the data into two equal halves, so
Q2 = μ.
The seventy-fifth percentile x75 is
called the Third Quartile,
denoted by Q3.
We remark that the three quartiles divide the whole data set into
four equal quarters.
Remark.
Pediatricians have charts of percentile weight for babies.
These charts have age on the horizontal axis and weight on the vertical axis.
Several graphs are given in the chart, representing various percentiles (perhaps one for every 5 percent). So there may be one graph for each of the 5th, 10th, and so on, up to the 95th percentile.
Computations with TI-84
We can use the invNorm function of the TI-84 to compute percentiles. As before, Z denotes the standard normal random variable. Then,
for 0 < q < 1, the invNorm function of the TI-84 is defined by the equation
P(Z < invNorm(q)) = q.
Animation 4.3.1
This is a demonstration of the invNorm function of the TI-84.
Suppose X is a normal random variable with mean μ and standard deviation
σ. For 0 < p < 100, we have the following:
The p-th percentile
xp = μ + invNorm(p/100) *σ.
The first quartile
Q1 = μ - .6744*σ.
The second quartile
Q2 = μ.
The third quartile
Q3 = μ + .6744*σ.
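The percentile formula xp = μ + invNorm(p/100)·σ can be sketched in Python, with the standard-library `NormalDist().inv_cdf` (Python 3.8 or later) playing the role of invNorm. The function names `percentile` and `quartiles` are hypothetical. Note that invNorm(.75) = 0.67449..., which the text writes as .6744 (truncated); either value gives quartiles that agree to well within the precision of the exercises.

```python
from statistics import NormalDist


def percentile(p, mu, sigma):
    """p-th percentile (0 < p < 100) of a normal variable: mu + invNorm(p/100) * sigma."""
    return mu + NormalDist().inv_cdf(p / 100) * sigma


def quartiles(mu, sigma):
    """(Q1, Q2, Q3) of a normal variable with mean mu and standard deviation sigma."""
    return tuple(percentile(p, mu, sigma) for p in (25, 50, 75))


print([round(q, 4) for q in quartiles(0, 1)])   # [-0.6745, 0.0, 0.6745]
```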
Use of Calculators (TI-84):
Inverse Probability for Standard Normal (Z):
TI-84 has an "invNorm" function.
invNorm(.90)= 1.2816 means P(Z < 1.2816) =.90.
To compute the variable value (or Z-value) U such that
P(Z < U) = .75,
do the following.
a) Press 2nd and then Distr (VARS).
b) Scroll down to invNorm and press ENTER.
c) Type in ".75)" and press ENTER.
The TI-84 will give the answer.
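Outside the TI-84, the same inverse computation is available in Python's standard library as `NormalDist().inv_cdf` (Python 3.8 or later). This sketch reproduces invNorm(.90) = 1.2816 and the value U with P(Z < U) = .75, and checks that the cumulative probability at each result returns the original q.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal distribution
U90 = Z.inv_cdf(0.90)            # plays the role of invNorm(.90)
U75 = Z.inv_cdf(0.75)            # the value U with P(Z < U) = .75

print(round(U90, 4))             # 1.2816
print(round(U75, 4))             # 0.6745
print(round(Z.cdf(U90), 4))      # 0.9, i.e. P(Z < U90) = .90
```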
Problems on 4.3: Percentiles and Quartiles
Exercise 4.3.1.
The annual income X in a county is normally distributed with mean μ= $37,000 and standard deviation $15,000.
Compute the 90th percentile income.
Compute the 5th percentile income.
Compute the first quartile Q1.
Compute the second quartile Q2.
Compute the third quartile Q3.
Solution by TI-84:
We have
x90 = μ + invNorm(.90) *σ
=μ + 1.2816 *σ
=37,000 + 1.2816 *15,000
= $56224
We have
x5 = μ + invNorm(.05) *σ
=μ -1.6449 *σ
=37,000 -1.6449 *15,000
= $12,326.50
The first quartile
Q1 = μ - .6744*σ
= 37000 -.6744*15000 = $26,884
The second quartile
Q2 = μ = $37,000.
The third quartile
Q3 = μ + .6744*σ
=
37000 + .6744*15000 = $47116
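A Python sketch of Exercise 4.3.1 (standard-library `statistics.NormalDist`, Python 3.8 or later). Because it uses the unrounded invNorm values, its answers differ from the hand computations above by at most a couple of dollars.

```python
from statistics import NormalDist

mu, sigma = 37_000, 15_000        # annual income in dollars
Z = NormalDist()                  # standard normal distribution

results = {}
for p in (90, 5, 25, 50, 75):
    results[p] = mu + Z.inv_cdf(p / 100) * sigma   # x_p = mu + invNorm(p/100) * sigma
    print(f"x{p} = ${results[p]:,.2f}")
```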
Exercise 4.3.2. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm.
Compute the 67th percentile length.
Compute the 10th percentile length.
Compute the first quartile Q1.
Compute the second quartile Q2.
Compute the third quartile Q3.
Solution by TI-84:
We have
x67 = μ + invNorm(.67) *σ
=μ + .4399 *σ
=67 + .4399*21
= 76.2379 cm
We have
x10 = μ + invNorm(.10) *σ
=μ -1.2816 *σ
=67 -1.2816*21
= 40.0864 cm
The first quartile
Q1 = μ - .6744*σ
= 67 -.6744*21 = 52.8376 cm
The second quartile
Q2 = μ = 67 cm.
The third quartile
Q3 = μ + .6744*σ
=
67 + .6744*21 = 81.1624 cm
Exercise 4.3.3. The telephone company's
data shows that length X of their international calls has normal distribution
with mean 11.5 minutes and standard deviation 4.3 minutes.
Compute the 80th percentile length of calls.
Compute the 33rd percentile length of calls.
Compute the first quartile Q1.
Compute the second quartile Q2.
Compute the third quartile Q3.
Solution by TI-84:
Here μ = 11.5 and σ = 4.3 minutes.
We have
x80 = μ + invNorm(.80) *σ
=μ + .8416 *σ
= 11.5 + .8416*4.3
= 15.1189 minutes.
We have
x33 = μ + invNorm(.33) *σ
=μ - .4399 *σ
= 11.5 - .4399*4.3
= 9.6084 minutes.
The first quartile
Q1 = μ - .6744*σ
= 11.5 -.6744*4.3 = 8.6001 minutes
The second quartile
Q2 = μ = 11.5 minutes.
The third quartile
Q3 = μ + .6744*σ
=
11.5 + .6744*4.3 = 14.3999 minutes.
Exercise 4.3.4. The weight X of babies (of
a fixed age) is normally distributed with mean μ
= 212 oz and standard deviation σ =
25 oz.
Compute the 95th percentile weight.
Compute the 5th percentile weight.
Compute the first quartile Q1.
Compute the second quartile Q2.
Compute the third quartile Q3.
Solution by TI-84:
Here the mean μ = 212 and standard deviation σ = 25
We have
x95 = μ + invNorm(.95) *σ
=μ + 1.6449 *σ
= 212 + 1.6449*25
= 253.1225 oz.
We have
x5 = μ + invNorm(.05) *σ
=μ - 1.6449 *σ
= 212 - 1.6449*25
= 170.8775 oz.
The first quartile
Q1 = μ - .6744*σ
= 212 -.6744*25 = 195.14 oz.
The second quartile
Q2 = μ = 212 oz.
The third quartile
Q3 = μ + .6744*σ
= 212 + .6744*25 = 228.86 oz.
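A useful sanity check for quartiles of a normal variable: by symmetry, Q3 must lie as far above the mean as Q1 lies below it. For Exercise 4.3.4 that distance is about 16.86 oz, so Q3 ≈ 212 + 16.86 = 228.86 oz. The Python sketch below (standard-library `statistics.NormalDist`, Python 3.8 or later) computes all four requested values.

```python
from statistics import NormalDist

mu, sigma = 212, 25               # baby weight in oz
Z = NormalDist()                  # standard normal distribution

x95 = mu + Z.inv_cdf(0.95) * sigma
x5 = mu + Z.inv_cdf(0.05) * sigma
q1, q3 = (mu + Z.inv_cdf(p) * sigma for p in (0.25, 0.75))
print(f"x95 = {x95:.4f}, x5 = {x5:.4f}, Q1 = {q1:.4f}, Q3 = {q3:.4f}")
```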
Exercise 4.3.5. Total score X in Math 105 is assumed to be normal with mean 300 points and standard deviation 60 points.
Compute the 70th percentile score.
Compute the 15th percentile score.
Compute the first quartile Q1.
Compute the second quartile Q2.
Compute the third quartile Q3.
Solution by TI-84:
Here the mean μ = 300 and standard deviation σ = 60
We have
x70 = μ + invNorm(.70) *σ
=μ + .5244 *σ
= 300 + .5244*60
=331.464.
We have
x15 = μ + invNorm(.15) *σ
=μ - 1.0364 *σ
= 300 - 1.0364*60= 237.816.
The first quartile
Q1 = μ - .6744*σ
= 300 -.6744*60 = 259.536
The second quartile
Q2 = μ =300.
The third quartile
Q3 = μ + .6744*σ
= 300 +.6744*60 = 340.464
Exercise 4.3.6. Monthly water consumption
X per household, in a subdivision in Kansas City, has normal distribution
with mean 15000 gallons and standard deviation 3000 gallons.
Compute the 85th percentile consumption.
Compute the 45th percentile consumption.
Compute the first quartile Q1.
Compute the second quartile Q2.
Compute the third quartile Q3.
Solution by TI-84:
Here μ = 15000 and σ = 3000 gallons.
We have
x85 = μ + invNorm(.85) *σ
=μ + 1.0364 *σ
= 15000 + 1.0364 *3000 = 18109.2 gallons.
We have
x45 = μ + invNorm(.45) *σ
=μ - .1257 *σ
= 15000 - .1257*3000 = 14622.9 gallons.
The first quartile
Q1 = μ - .6744*σ
= 15000 -.6744*3000 = 12976.8 gallons.
The second quartile
Q2 = μ = 15000 gallons.
The third quartile
Q3 = μ + .6744*σ
= 15000 +.6744*3000 = 17023.2 gallons.