Math 105, Topics in Mathematics

Lesson 5: Normal Distribution

Introduction

The shape of the bar graphs and histograms of almost any kind of numerical data that we encounter in nature resembles a bell-shaped pattern. A perfect such bell shape is shown in the following diagram:

A bell-shaped curve.

Now let us compare some bar charts with this bell-shaped figure.

Example 5.0.1. The following is the frequency table of the scores obtained by a group of students:

Scores 6 7 8 9 10 11 12 13 14 15 16
Frequency 1 2 6 10 16 13 9 8 5 2 1

The following is the bar graph of this data:

A bar graph showing the same score and frequency data as above.
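
Since the bar graph itself is an image, the short Python sketch below redraws it as text so the bell shape can be seen directly from the frequency table. The function name `text_bar_graph` is made up for this illustration and is not part of the lesson.

```python
# Frequency table from Example 5.0.1 (score -> number of students).
scores = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
frequencies = [1, 2, 6, 10, 16, 13, 9, 8, 5, 2, 1]


def text_bar_graph(values, counts):
    """Return one line per value, drawing one '#' per observation."""
    return [f"{v:>3} | {'#' * c}" for v, c in zip(values, counts)]


total = sum(frequencies)                                   # number of students
peak_score = scores[frequencies.index(max(frequencies))]   # tallest bar

for line in text_bar_graph(scores, frequencies):
    print(line)
print("students:", total, " tallest bar at score:", peak_score)
```

The bars rise to a single peak near the middle and fall away on both sides, which is the pattern the lesson calls bell-shaped.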

Example 5.0.2. Following is the class frequency table of weight (in pounds) of some babies:

Class Intervals 48-60 60-72 72-84 84-96 96-108 108-120 120-132 132-144 144-156 156-168
Frequency 15 24 41 67 119 184 142 26 5 2

The following is the bar graph of this data:

A bar graph showing the class interval and frequency data above.

Example 5.0.3. The following is the frequency table of hourly wages of a group of workers:

Wages 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Frequencies 3 5 7 4 12 10 13 11 15 13 11 4 3 0 0 0 1

The following is the bar graph of this data:

A bar graph showing the wages and frequencies data cited above.

Example 5.0.4. The following is the percentage frequency table of the weight of a large number of salmon:

Weight (in pounds) 20-24 25-29 30-34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70-74 75-80
Percentage Frequencies 1.1 1.9 4.3 8.7 13.2 17.8 17.3 14.6 10.6 6.1 2.8 1.6

The following is the bar graph:

Bar graph showing the weight and percentage frequencies listed above.


You can see that all these bar graphs fit into a nice bell-shaped curve. Some fit the bell shape better than others.

Perhaps the most amazing fact in statistics is that bar graphs of almost any kind of numerical data in nature fit nicely into a bell-shaped curve. How well a data set fits into a perfect bell-shaped curve largely depends on the size of the data set. If the data set is large, then the bar graph or the histogram almost always fits into a perfect bell-shaped curve.

In this chapter we consider data sets whose bar graph or the histogram fits into a perfect bell-shaped curve. In fact, we talk about population data whose bar graph or histogram fits into a perfect bell-shaped curve. At this point it is helpful to recall the discussion on variables.

5.1 Random Variables

Recall the following definition of variables.

Definition: A variable is a characteristic of the population members that varies with each member of the population.

We do not distinguish between "variables" and "random variables" (as many textbooks do). I only want to mention that a variable, in the context of random experiments and sample spaces, is called a random variable. A random variable X assigns a value to each outcome (and a variable assigns a value to each population member). In other words, here the sample space S takes the place of the population.

Example 5.1.1. Suppose we want to study the KU student population. Our random experiment is to select a KU student randomly. So, the sample space S, in this case, is the KU student population, and each student is an outcome. Now all the variables that we considered in the example of KU student population (in lesson 2) are also random variables. GPA, weight, height, and annual income of KU students are random variables.

5.2. Normal Distribution and Normal Random Variables

  1. Definition. A data set (or a variable X) is said to be normally distributed if the bar graph or the histogram fits into a perfect bell-shaped curve. We also say that X has normal distribution or X is a normal random variable. Also, the corresponding bell-shaped curve is called the normal curve of X (or of the data set).
  2. If the bar graph or the histogram fits approximately into a perfect bell-shaped curve, then we say that the data set (or the variable) is approximately normally distributed.
  3. Definition. Such a perfect bell-shaped curve as above is called a normal curve. Examine the animated normal curve.
  4. Remark. As we have seen in the introduction, it is often the case in nature that your data set will fit into such a normal curve, approximately, even if not perfectly.

Properties of Normal Curves:

Suppose X is a random variable that has normal distribution. Let us denote the mean (average) value of X by μ and the standard deviation by σ. Following are some properties of the normal curve of X.

  1. Symmetry: The normal curve of X is symmetric about a vertical line through the mean μ on the horizontal axis. This line is also called the line of symmetry. The two halves of the normal curve on two sides of this line of symmetry are the mirror images of each other.
  2. Center: The point where this line of symmetry of the normal curve cuts the horizontal axis is called the center of the normal curve. So, the x-value (or the data value) of the line of symmetry is the center. In fact,

    μ = mean of X = median of X = center.

    We say that the normal curve is symmetric around the mean.

  3. On the Shape of the Normal curve: If the standard deviation σ is large, then the normal curve is flat and wide. If σ is small, then the normal curve is tall and narrow. Once the values of μ and σ are known, the shape is completely determined. Examine the animation.
  4. Theorem. If X is a normal random variable with mean μ and standard deviation σ, then Z = (X-μ)/σ is also a normal random variable with mean ZERO and standard deviation ONE.
  5. This Z is called the standard normal variable. The process of converting X to Z is called the standardization of X.
  6. Definition. If X = x is the value of X, then

    z = (x-μ)/σ

    is called the standardized value of X. We also call it the z-value of x.

  7. Definition. If x and y are two values of X (with x < y), then we say that x and y are

    (y-x)/σ standard deviations apart

    from each other.
    Remark: In statistics, the standard deviation is used as a unit. A statement like "the birth weights of two babies differ by a pound" is of little value to a statistician. A statistician would like to know how many standard deviations apart the birth weights of these two babies are.
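
The theorem on standardization can be checked informally by simulation. The sketch below is only an illustration under assumed values (μ = 37 and σ = 8 are chosen arbitrarily, and the sample is generated by Python's `random.gauss`). It standardizes simulated values and looks at the mean and standard deviation of the resulting z-values.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 37.0, 8.0  # illustrative values, assumed for this sketch
xs = [random.gauss(mu, sigma) for _ in range(20_000)]

# Standardize every simulated value: z = (x - mu) / sigma.
zs = [(x - mu) / sigma for x in xs]

z_mean = statistics.fmean(zs)   # should be close to 0
z_sd = statistics.pstdev(zs)    # should be close to 1
print(round(z_mean, 2), round(z_sd, 2))
```

A simulation does not prove the theorem, but it shows what the statement means: after standardization the data are centered at zero and measured in units of standard deviations.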

Example 5.2.1. Suppose it is known that the annual income X of the U.S. population is normally distributed with mean μ = $37,000.00 and standard deviation σ = $13,000.00. (My data is not real data.)

  1. The annual income of a professor is $64,000, and his GTA's annual income is $12,000.00. How many standard deviations apart are the income of the professor and the GTA?
    The Solution
    (y-x)/σ = (64,000-12,000)/13,000 = 4 standard deviations.

  2. The chairman's annual income is $77,000.00. How many standard deviations apart are the income of the professor and the chair?
    The Solution
    (y-x)/σ = (77,000-64,000)/13,000 = 1 standard deviation.

  3. How many standard deviations apart are the income of the GTA and the chair?
    The Solution
    (y-x)/σ = (77,000 - 12,000)/13,000 = 5 standard deviations.
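
The three computations in Example 5.2.1 can be reproduced with a small helper. The name `sd_apart` is hypothetical, and the use of the absolute value (so that the order of the two values does not matter) is a convenience added here, not part of the definition above.

```python
def sd_apart(x, y, sigma):
    """Number of standard deviations between two values x and y."""
    return abs(y - x) / sigma


sigma = 13_000                      # standard deviation from Example 5.2.1
gta, professor, chair = 12_000, 64_000, 77_000

print(sd_apart(gta, professor, sigma))    # 4.0
print(sd_apart(professor, chair, sigma))  # 1.0
print(sd_apart(gta, chair, sigma))        # 5.0
```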


Example 5.2.1-B. The weight X of salmon in a river has a normal distribution with mean μ = 37 pounds and standard deviation σ = 8 pounds. The weight of a salmon you caught recently is 18 pounds. The weight of a salmon that your friend caught is 30 pounds. How many standard deviations apart are these two fish?
Answer:

The 68-95-99.7 Rule

Suppose that a data set (or a random variable) has normal distribution. Then we have the following facts:

  1. The data whose standardized values are between -1 and 1 represent 68 percent of the data.
    1. In other words, the probability that the z-value of a data value x is between -1 and 1 is 0.68. So P(-1 < Z < 1) = 0.68.
    2. The same is expressed by saying that 68 percent of the data values are within one standard deviation σ from the mean μ.
    3. It follows that the probability that z-value of a data is not between -1 and 1 is 1-0.68= 0.32.
  2. The data whose standardized values are between -2 and 2 represent 95 percent of the data. In other words, the probability that the z-value of a data value x is between -2 and 2, is 0.95. So P(-2 < Z < 2) = 0.95. So the probability that z-value of a data is not between -2 and 2 is 1 -0.95 = 0.05.
  3. The data whose standardized values are between -3 and 3 represent 99.7 percent of the data. In other words, the probability that the z-value of a data value x is between -3 and 3 is 0.997. So P(-3 < Z < 3) = 0.997. So the probability that z-value of a data is not between -3 and 3 is equal to 1-0.997= 0.003.
  4. Examine the flash animation on 68-95-99.7-Rule.
  5. There are many tables and software programs that give you the probability. I give you a table that gives you probability that the z-value is within two given numbers.
  6. In addition to the z-tables, you must use the symmetry and other facts to compute probability. For example
    1. To compute probability, first we must standardize and reduce it to a z-problem.
    2. The total area under the normal curve above x-axis is 1.
    3. The area under the normal curve on either side of the y-axis is one half. That means

      P( Z > 0) = 0.5.
      P( Z < 0) = 0.5.

    4. For example : P(0 < Z <1) = P(-1< Z < 1)/2 = 0.68/2 = 0.34.
    5. Half the data are bigger than μ. In other words

      P( X > μ) = 0.5.
      Similarly, P( X < μ) = 0.5.
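
The percentages in the 68-95-99.7 rule are rounded values. As a rough check, the sketch below computes P(-k < Z < k) for k = 1, 2, 3 using the error function `math.erf` from Python's standard library. The helper name `prob_within` is made up for this illustration.

```python
import math


def prob_within(k):
    """P(-k < Z < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))

# By symmetry, half of the central area lies on each side of 0.
p_zero_to_one = prob_within(1) / 2
print(round(p_zero_to_one, 2))
```

The more precise values are about 0.6827, 0.9545 and 0.9973, which the rule rounds to 68, 95 and 99.7 percent.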


Quartiles:

  1. From the tables one can compute that the z-value of the third quartile Q3 is

    (Q3-μ)/σ = 0.675.

  2. So the third Quartile is given by :

    Q3= μ + 0.675 σ.

  3. Similarly the first quartile is:

    Q1= μ - 0.675 σ.
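
The constant 0.675 is a rounded table value. One way to see where it comes from, sketched here with `statistics.NormalDist` from Python's standard library, is to ask for the z-value below which 75 percent of the standard normal distribution lies. The helper `quartiles` is a hypothetical name used only in this sketch.

```python
from statistics import NormalDist

# z-value with 75 percent of the standard normal distribution below it.
z_q3 = NormalDist().inv_cdf(0.75)


def quartiles(mu, sigma):
    """Return (Q1, Q3) of a normal distribution with the given mean and sd."""
    return mu - z_q3 * sigma, mu + z_q3 * sigma


print(round(z_q3, 4))
print(quartiles(70, 10))
```

For instance, with mean 70 and standard deviation 10 the result is close to 63.25 and 76.75, the values the rounded formulas give.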

Example 5.2.2. Suppose scores X of students on a certain test (like the SAT) are approximately normally distributed with mean μ = 70 and standard deviation σ = 10 points. Suppose 1500 students take the test.

  1. Estimate the average score on the test.
  2. Estimate what percent of students scored 70 or more?
  3. Estimate what percent of the students scored between 60 and 80?
  4. Estimate what percent of students scored more than 80?

The Solutions
  1. An estimate of the average is population average μ = 70. So the answer is 70.

  2. Here they are asking us to find P(X > 70).
    1. We have the mean μ = 70.
    2. So P(X > 70) = P(70 < X) = P((70 - μ)/σ < (X - μ)/σ)
      = P((70 - 70)/10 < Z) = P(0 < Z) = 0.5
    3. So the answer is 50 percent.

  3. Here we are asked to find P(60 < X <80). We have
    P(60 < X < 80) =
    P((60-μ)/ σ < (X-μ)/ σ < (80 - μ)/σ) =
    P((60-70)/10 < Z < (80-70)/10) =
    P(-1 < Z < 1) = 0.68.

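  4. Here they are asking us to find P(X > 80). We must use the symmetry. Since 60 and 80 are equally far from the mean 70, the scores outside this interval split equally between those above 80 and those below 60. We have
    P(X > 80) = P(X not between 60 and 80)/2 =
    (1 - P(60 < X < 80))/2 =
    (1 - 0.68)/2 = 0.32/2 = 0.16.
    So the answer is 16 percent.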

Example 5.2.3. Let us have the same set up as in the above problem. Suppose that 1500 students took the test.

  1. Estimate how many students scored between 60 and 80?
  2. Estimate how many students scored more than 80?

The Solutions
  1. The answer = P(60 < X < 80) x 1500 = 0.68 x 1500 = 1020. This is an estimate only.

  2. The answer = P(X > 80) x 1500 = 0.16 x 1500 = 240.
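
The estimates in Examples 5.2.2 and 5.2.3 can be compared with values computed without rounding. The following sketch uses `statistics.NormalDist` from Python's standard library. The variable names are chosen for this illustration only.

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)   # set-up of Examples 5.2.2 and 5.2.3
n_students = 1500

p_above_70 = 1 - scores.cdf(70)
p_60_to_80 = scores.cdf(80) - scores.cdf(60)
p_above_80 = 1 - scores.cdf(80)

# Probabilities rounded to two decimals, as in the 68-95-99.7 rule.
print(round(p_above_70, 2), round(p_60_to_80, 2), round(p_above_80, 2))

# Expected head counts among the 1500 students.
print(round(p_60_to_80 * n_students), round(p_above_80 * n_students))
```

The head counts come out near 1024 and 238 rather than 1020 and 240, because the rule rounds the probabilities to 0.68 and 0.16. Either way, the numbers are estimates only.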


Example 5.2.4. Let us have the same set up as above. Estimate the first and the third quartile.

The Solutions
  1. We can use the formulas given above.
  2. Q3= μ + 0.675 σ = 70 + 0.675 x 10 = 76.75
  3. Q1= μ- 0.675 σ= 70 - 0.675 x 10 = 63.25.

Percentiles

For a number p between 0 and 100, your score on a test is at the pth percentile if p percent of the students who took the test did not do as well as you did. More formally:

Definition. For a variable X, the pth percentile is a number xp so that P(X < xp) = p/100. In other words, p percent of the data values are less than (or also equal to) xp. So

  1. The first quartile Q1 is the 25th percentile.
  2. The median (the mean for normal distribution) is the 50th percentile.
  3. The third quartile Q3 is the 75th percentile.

Remark. For the following problems you are expected to use the formulas for quartiles.

Example 5.2.5. The distribution of weight of six-month-old baby boys is approximately normal with mean μ = 17.25 pounds and standard deviation σ = 2 pounds.

  1. Suppose that a six-month-old boy weighs 15.9 pounds. Approximately what weight percentile is he in?

  2. Suppose that a six-month-old boy weighs 18.6 pounds. Approximately what weight percentile is he in?

  3. Suppose that a six-month-old boy is in the 75th percentile weight. Estimate his weight.

The Solutions
We will first directly compute the quartiles:
    Q3 = μ + 0.675 σ = 17.25 + 0.675 x 2 = 18.6
    Q1 = μ - 0.675 σ = 17.25 - 0.675 x 2 = 15.9.
  1. A weight of 15.9 pounds equals Q1, so the boy is in the 25th percentile.
  2. A weight of 18.6 pounds equals Q3, so the boy is in the 75th percentile.
  3. The 75th percentile is Q3, so an estimate of the boy's weight is 18.6 pounds.
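
The answers to Example 5.2.5 can be cross-checked without the quartile formulas. The sketch below uses `statistics.NormalDist` from Python's standard library. The helper `percentile_of` is a hypothetical name introduced for this illustration.

```python
from statistics import NormalDist

weights = NormalDist(mu=17.25, sigma=2)   # set-up of Example 5.2.5


def percentile_of(x, dist):
    """Percent of the population with a value below x."""
    return 100 * dist.cdf(x)


print(round(percentile_of(15.9, weights)))   # 25
print(round(percentile_of(18.6, weights)))   # 75
print(round(weights.inv_cdf(0.75), 1))       # 18.6
```

Here `cdf` turns a weight into a percentile, and `inv_cdf` goes the other way, from a percentile back to a weight.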

 

Problems on 5.2: Probability (using The 68-95-99.7 Rule)

Exercise 5.2.1. Suppose that the weight X at birth of babies has a normal distribution with mean μ = 115 ounces and standard deviation σ = 17 ounces. Use the 68-95-99.7 rule and do the following:

  1. What proportion of babies have weight, at birth, between 115 and 132 ounces? Solution
  2. What proportion of babies have weight, at birth, between 115 and 149 ounces? Solution
  3. What proportion of babies have weight, at birth, between 98 and 149 ounces? Solution
  4. What proportion of babies have weight, at birth, between 81 and 132 ounces? Solution
  5. What proportion of babies have weight, at birth, between 132 and 149 ounces? Solution

 

Exercise 5.2.2. Suppose that the length X of salmon in a river has a normal distribution with mean μ = 17 inches and standard deviation σ = 4 inches. Use the 68-95-99.7 rule and do the following:

  1. What proportion of salmon in the river are between 17 inches and 21 inches? Solution
  2. What proportion of salmon in the river are between 13 inches and 21 inches? Solution
  3. What proportion of salmon in the river are between 9 inches and 17 inches? Solution
  4. What proportion of salmon in the river are between 9 inches and 21 inches? Solution

Problems on 5.2: Percentiles

Exercise 5.2.3. The distribution of weight for twelve-month-old baby girls is approximately normal with mean μ = 21 pounds and standard deviation σ = 2.2 pounds.

  1. Suppose that a twelve-month-old girl weighs 22.49 pounds. Approximately what weight percentile is she in?
  2. Suppose that a twelve-month-old girl weighs 19.52 pounds. Approximately what weight percentile is she in?
  3. Suppose that a twelve-month-old girl is in the 75th percentile weight. Estimate her weight.

Solution

 

Exercise 5.2.4. The distribution of weights for one-month-old baby girls is approximately normal with mean μ = 8.75 pounds and standard deviation σ = 1.1 pounds.

  1. Suppose that a one-month-old girl weighs 8.01 pounds. Approximately what weight percentile is she in?
  2. Suppose that a one-month-old girl weighs 9.49 pounds. Approximately what weight percentile is she in?
  3. Suppose that a one-month-old girl is in the 75th percentile weight. Estimate her weight.
Solution

 

Exercise 5.2.5. The distribution of weights for twelve-month-old baby boys is approximately normal with mean μ = 22.5 pounds and standard deviation σ = 2.2 pounds.

  1. Suppose that a twelve-month-old boy weighs 23.99 pounds. Approximately what weight percentile is he in?
  2. Suppose that a twelve-month-old boy weighs 21.02 pounds. Approximately what weight percentile is he in?
  3. Suppose that a twelve-month-old boy is in the 75th percentile weight. Estimate his weight.
Solution

5.3 Computing Probability and Use Tables

We have discussed both discrete and continuous random variables. We note here that a normal random variable X is a continuous random variable.

For a normal random variable X, we have so far only used the 68-95-99.7 rule to compute the probability that X is within 1, 2, or 3 standard deviations σ from the mean μ. We will now learn to use probability tables to compute the probability that X is between any two given numbers.

First, we can rephrase the definition of the normal distribution as follows:

Suppose that a random variable X (or a data set) has normal distribution with mean μ and standard deviation σ. Then the relative frequency histogram of the data fits into a perfect bell-shaped curve as follows.

A bell shape curve.

This bell-shaped curve is called a normal curve.

Now we have the following:

  1. A normal curve is symmetric around the vertical line through the mean μ.
  2. The total area under the normal curve and above the horizontal axis (x-axis) is one. So on two sides of the line of symmetry the area under the graph is one-half.
  3. The probability rule: The probability that X is between two given numbers a and b (with a ≤ b) is equal to the area under the normal curve, above the x-axis and between the vertical lines x = a and x = b. So

    P(a < X < b) = P(a ≤ X < b ) =
    P(a < X ≤ b) = P(a ≤ X ≤ b) =
    the area under the normal curve, above the x-axis (of X)
    and between the vertical lines x = a and x = b.
    Bell curve showing the area described above.
    Also look at the animation for Normal Probability.

  4. The formula also works if a = -∞ (negative infinity) or b = ∞ (infinity). That means,

    P( X ≤ b) = P( X <b) =
    the area under the normal curve (of X), above the x-axis
    and on the left side of the vertical line x = b.
    Bell or normal curve showing the area described above.

    and

    P(a < X) = P(a ≤ X) =
    the area under the normal curve (of X),
    above the x-axis and on the right side of x = a.
    Bell or normal curve showing the area described above.

  5. It follows that for any number a, we have P (X=a) = 0. This is very much in contrast with the case of discrete variables.
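
To make the probability rule concrete, the sketch below estimates the area under a normal curve numerically with the trapezoid rule. This is only an illustration: the function names are made up, the lesson itself does not give the formula for the curve's height, and the standard formula for the normal density is assumed here.

```python
import math


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x (standard density formula, assumed)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under_curve(a, b, mu, sigma, steps=10_000):
    """Trapezoid-rule estimate of the area between x = a and x = b."""
    width = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * width, mu, sigma)
    return total * width


mu, sigma = 0.0, 1.0
print(round(area_under_curve(-1, 1, mu, sigma), 4))   # P(-1 < Z < 1)
print(round(area_under_curve(-8, 8, mu, sigma), 4))   # nearly the whole curve
print(area_under_curve(1.5, 1.5, mu, sigma))          # a single point: no area
```

The last line illustrates item 5: the strip from a to a has zero width, so its area, and hence P(X = a), is zero.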

The Z-Table

Recall that the normal random variable with mean μ = 0 and standard deviation σ = 1 is called standard normal random variable (denoted by Z). Tables are available that you can use to compute the probability for a standard normal random variable. I have compiled a probability table for Z random variable. This table was compiled to specifically meet the needs of this class.

Bell curve showing the shaded area P(Z<z).


For various values of z, the table gives P(Z < z), which is the above shaded area as in this diagram.

Computing Probability for a Normal Random Variable

Suppose X is a normal random variable with mean μ and standard deviation σ.

  1. Then

    Z = (X - μ) / σ

    is a standard normal random variable.

  2. The Final Formula:

    P(a < X < b)
    = P((a-μ)/σ <(X-μ)/σ < (b - μ)/σ)
    = P((a-μ)/σ < Z < (b - μ)/σ)
    = The difference of the two numbers to be read from the z-table.
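
The final formula translates almost line by line into code. In the sketch below, `phi` plays the role of the z-table and is computed from the error function `math.erf` in Python's standard library. The names `phi` and `prob_between`, and the values μ = 100 and σ = 15, are assumptions made for this illustration.

```python
import math


def phi(z):
    """P(Z < z) for the standard normal variable Z (what a z-table lists)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_between(a, b, mu, sigma):
    """P(a < X < b) for a normal X: standardize both ends, then subtract."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return phi(z_b) - phi(z_a)


# Illustrative values (assumed, not from the text): mu = 100, sigma = 15.
print(round(prob_between(85, 115, 100, 15), 4))

# An infinite endpoint gives a one-sided probability.
print(prob_between(-math.inf, 100, 100, 15))
```

Passing `-math.inf` or `math.inf` as an endpoint corresponds to the one-sided probabilities P(X < b) and P(a < X).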

Example 5.3.1. The length X of life of some light bulbs produced in a factory is normally distributed with mean μ = 8640 hours and standard deviation σ = 1440 hours.

  1. Find the probability of the event E that a bulb will last less than 5040 hours.
  2. Find the probability of the event F that a bulb will last between 5040 hours and 8640 hours.
The Solutions
  1. We have P(E) = P( X < 5040) = P((X-μ)/σ < (5040-μ)/ σ) = P(Z < (5040-8640)/1440) =
    P(Z < -2.5) = 0.0062 (This is from the table).
  2. We have P(F) = P( 5040 < X < 8640) = P((5040 -μ)/σ <(X-μ)/ σ < (8640-μ)/σ ) = P((5040 -8640)/ 1440 < Z < (8640-8640)/1440 ) = P(-2.5 < Z < 0) = P(Z < 0) - P(Z < -2.5) = 0.5 - 0.0062 = 0.4938.
  3. Use the animation Z-Probability for better understanding.

Example 5.3.2. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. What proportion (i.e., probability) of fish have length between 44 cm and 110 cm?


The Solutions
Here we are asked to find P(44 < X < 110). We have
  1. P(44 < X < 110) =
    P((44 - μ)/σ < (X - μ)/σ < (110 - μ)/σ) =
    P((44 - 67)/21 < Z < (110 - 67)/21) =
    P(-1.10 < Z < 2.05) =
    P(Z < 2.05) - P(Z < -1.10) = 0.9798 - 0.1357 = 0.8441.
  2. The proportion is 0.8441, which is 84.41 percent.
  3. Have a look at the flash animated Solution

Example 5.3.3. The diameters of the pumpkins in my patch have normal distribution with mean 13 inches and standard deviation 4.5 inches. What proportion (i.e., probability) of pumpkins is above 22 inches?

The Solutions
Here we are asked to find P(22 < X).
  1. We have
    P(22 < X) =
    P((22 - μ)/σ < (X - μ)/σ) =
    P((22 -13)/4.5 < Z ) =
    P(2 < Z) = 1 - P(Z < 2) =
    1- 0.9772 =
    0.0228.
  2. So probability is 0.0228, which is 2.28 percent.
  3. Have a look at the flash animated Solution

Example 5.3.4. The amount of vegetable oil X produced by a machine in a day is normally distributed with μ = 130 liters and standard deviation σ = 25 liters. What is the probability that a machine produces between 120 liters and 150 liters each day?


The Solutions
Here we are asked to find P(120 < X < 150).
  1. We have
    P(120 < X < 150) =
    P((120 - μ)/σ < (X - μ)/σ < (150 - μ)/σ) =
    P((120 - 130)/25 < Z < (150 - 130)/25) =
    P(-0.4 < Z < 0.8) =
    P(Z < 0.8) - P(Z < -0.4) =
    0.7881 - 0.3446 = 0.4435.
  2. The probability is 0.4435, which is 44.35 percent.
  3. Also have a look at the animated Solution
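
The table lookups in Examples 5.3.1 through 5.3.4 can be imitated in code. The sketch below uses `statistics.NormalDist` from Python's standard library and, as an assumption about how the printed table behaves, rounds each z-value to two decimals and each probability to four. The helper name `below` is made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def below(x, mu, sigma):
    """P(X < x), imitating a table: z to 2 decimals, probability to 4."""
    z = round((x - mu) / sigma, 2)
    return round(Z.cdf(z), 4)


p_531_1 = below(5040, 8640, 1440)                    # P(Z < -2.5)
p_531_2 = below(8640, 8640, 1440) - p_531_1          # P(-2.5 < Z < 0)
p_532 = below(110, 67, 21) - below(44, 67, 21)       # P(-1.10 < Z < 2.05)
p_533 = 1 - below(22, 13, 4.5)                       # P(2 < Z)
p_534 = below(150, 130, 25) - below(120, 130, 25)    # P(-0.4 < Z < 0.8)

for p in (p_531_1, p_531_2, p_532, p_533, p_534):
    print(round(p, 4))
```

Without the rounding steps the last digit can differ slightly from the table-based answers, which is expected when working from a printed table.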



Problems on 5.3: Probability (using the Probability Table )

Exercise 5.3.1. Let Z be the standard normal random variable.

  1. Find the probability P(-1.1 < Z < 2.5).
  2. Find the probability P(Z < -2.1).
  3. Find the probability P(-2.1 < Z < -1.5).
  4. Find the probability P(1.5 < Z).

Experiment with the normal-animation.
Solution

Exercise 5.3.2. Let X be a normal random variable with mean μ = 3 and standard deviation σ = 1.5 .

  1. Find the probability P(-1.1 < X < 2.5).
  2. Find the probability P(X < -2.1).
  3. Find the probability P(-1.2 < X < -0.5).
  4. Find the probability P(1.5 < X).

Experiment with the normal-animation.
Experiment with the Solution


Exercise 5.3.3.
The telephone company's data shows that the length X of their international calls has a normal distribution with mean 11.5 minutes and standard deviation 4.3 minutes. The company decided to give a special rate for the longest 20 percent of calls. What percentage of international calls last between 5 minutes and 15 minutes?


Exercise 5.3.4.
The annual expenditure X of a student is approximately normally distributed with mean μ = 11,000 dollars and standard deviation σ = 1500 dollars. What percentage of students spend less than 10,000 dollars? Solution


Exercise 5.3.5.
Suppose the annual production X of milk per cow is normally distributed with μ = 5500 liters and standard deviation σ = 150 liters. What percentage of cows have annual yield less than 5155 liters? Solution


Exercise 5.3.6.
The weight X at birth of babies is normally distributed with mean μ = 114 oz. and standard deviation σ = 18 oz. What percentage of babies have birth weight below 141 oz.?
Solution


Problems added from 365

Exercise 5.2.4. The length X of a fish in a lake has normal distribution with mean 67 cm and standard deviation 21 cm. What proportion (i.e, probability) of fish are between 44 cm and 110 cm long?
Solution

Exercise 5.2.5. The diameter of the pumpkins in my patch has normal distribution with mean 13 inches and standard deviation 4.5 inches. What proportion (i.e., probability) of pumpkins is above 22 inches?
Solution

Exercise 5.2.7. Suppose the annual production X of milk per cow is normally distributed with μ = 5500 liters and standard deviation σ = 150 liters. What percent of cows have annual yield less than 5155 liters?
Solution

Exercise 5.2.8. The amount of vegetable oil X produced by a machine in a day is normally distributed with μ = 130 liters and standard deviation σ = 25 liters. What is the probability that a machine will produce between 120 liters and 150 liters on a day?
Solution
