Satyagopal Mandal
Department of Mathematics
University of Kansas
Office: 624 Snow Hall  Phone: 785-864-5180
  • e-mail: mandal@math.ukans.edu
  • © Copyright laws apply. My students have permission to copy.

    Normal Distribution

     

    The bar graphs and histograms of almost any kind of numerical data resemble a bell-shaped pattern. A perfect bell shape is shown in the following diagram:


  • Picture 14

     

    Now let us compare some bar charts with this bell-shaped figure.

     

    Example A: (see page 455) The following is the frequency table of the scores obtained by a group of students:

    Scores    |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16
    Frequency |  1 |  2 |  6 | 10 | 16 | 13 |  9 |  8 |  5 |  2 |  1

     

    The following is the bar graph of this data:


    Picture 15
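
    As a rough stand-in for the picture, the following is a minimal Python sketch that draws the bar graph of Example A as text, one row per score, so the bars run sideways rather than upward. The function name text_bar_graph is made up for this illustration. The sketch also computes the number of students and the mean score directly from the frequency table.

```python
# Frequency table from Example A: score -> number of students.
scores = [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
frequencies = [1, 2, 6, 10, 16, 13, 9, 8, 5, 2, 1]


def text_bar_graph(values, counts):
    """Return one line per value, with one '*' per observation."""
    return [f"{value:>3} | {'*' * count}" for value, count in zip(values, counts)]


bar_lines = text_bar_graph(scores, frequencies)
for line in bar_lines:
    print(line)

# Total number of students and the mean score, computed from the table.
total_students = sum(frequencies)
mean_score = sum(v * c for v, c in zip(scores, frequencies)) / total_students
print("students:", total_students, "mean score:", round(mean_score, 2))
```

    The longest row is the one for the score 10, and the rows get shorter on both sides of it, which is the bell-shaped pattern turned on its side.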

     

     

     

    Example B: (see page 478) The following is the class frequency table of the weights (in pounds) of some babies:

    Class     | 48-60 | 60-72 | 72-84 | 84-96 | 96-108 | 108-120 | 120-132 | 132-144 | 144-156 | 156-168
    Frequency |    15 |    24 |    41 |    67 |    119 |     184 |     142 |      26 |       5 |       2

                         

     

    The following is the bar graph of this data:


    Picture 16

     

     

    Example C: (see Ex. 29, p. 479) The following is the frequency table of some data:

     

    Value     |  9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25
    Frequency |  3 |  5 |  7 |  4 | 12 | 10 | 13 | 11 | 15 | 13 | 11 |  4 |  3 |  0 |  0 |  0 |  1

     

    The following is the bar graph of this data:


    Picture 17

     

     

    Example D: (see page 513) The following is the percentage frequency table of the SAT scores of a large number of students:

     

     

    Scores            | 200-240 | 250-290 | 300-340 | 350-390 | 400-440 | 450-490 | 500-540 | 550-590 | 600-640 | 650-690 | 700-740 | 750-800
    Percent frequency |     1.1 |     1.9 |     4.3 |     8.7 |    13.2 |    17.8 |    17.3 |    14.6 |    10.6 |     6.1 |     2.8 |     1.6

     

    The following is the bar graph:


    Picture 18

     

     

    You can see that each of these bar graphs roughly fits into a bell-shaped curve, although some fit better than others.

     

    Perhaps the most amazing fact in statistics is that the bar graphs of almost all kinds of numerical data in nature fit into a bell-shaped curve. How well a data set fits into a perfect bell-shaped curve depends largely on the size of the data set. If the data set is large, the bar graph or the histogram will almost always fit closely into a perfect bell-shaped curve.
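
    The effect of the size of the data set can be tried out with a small simulation. The following is only a sketch under stated assumptions: the data are not real, they are generated by Python's random.gauss so that they are bell-shaped by construction, and the mean 70, the standard deviation 10, the seed and the sample sizes are arbitrary choices. For a perfect bell shape, about 68 percent of the values lie within one standard deviation of the mean (see the 68-95-99.7 rule later in these notes), so the sketch reports that fraction for samples of different sizes.

```python
import random


def fraction_within_one_sd(sample):
    """Fraction of the sample lying within one standard deviation of its mean."""
    n = len(sample)
    mean = sum(sample) / n
    sd = (sum((x - mean) ** 2 for x in sample) / n) ** 0.5
    return sum(1 for x in sample if abs(x - mean) < sd) / n


rng = random.Random(0)  # fixed seed so that the run is repeatable (arbitrary choice)
results = {}
for size in (50, 500, 50_000):
    # Simulated (not real) data with mean 70 and standard deviation 10.
    sample = [rng.gauss(70, 10) for _ in range(size)]
    results[size] = fraction_within_one_sd(sample)
    print(size, round(results[size], 3))
```

    For the largest sample the fraction should come out close to 0.68. For small samples it tends to wander further from 0.68, although any single run may happen to land close.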

     

    In this chapter we consider data sets whose bar graph or histogram fits into a perfect bell-shaped curve. In fact, we are talking about population data whose bar graph or histogram fits into such a curve. At this point it will be helpful to recall the discussion of variables (Chapter 14).

     

     

    Random Variables:

     

    Recall the following definition of variables.

    Definition: A variable is a characteristic of the population members that varies with each member of the population.

    We will not distinguish between "variables" and "random variables" (as your textbook does). I only want to mention that a variable, in the context of random experiments and sample spaces, is called a random variable. So, a random variable X assigns a value to each outcome (and a variable assigns a value to each population member). In other words, here the sample space S takes the place of the population.

    Example: Suppose we want to study the KU student population. Our random experiment is to select a KU student at random. The sample space S, in this case, is the KU student population, and each student is an outcome. Now, all the variables that we considered in the example of the "KU student population" will also be random variables. So, the GPA, weight, height and annual income of KU students will be random variables.

     

     

    Normal Distribution and Normal Random Variables:

     

    Definition 1: A data set (or a variable X) is said to be normally distributed if its bar graph or histogram fits into a perfect bell-shaped curve. We also say that X has a normal distribution, or that X is a normal random variable. The corresponding bell-shaped curve is called the normal curve of X (or of the data set).

    If the bar graph or the histogram fits only approximately into a perfect bell shape, then we say that the data set (or the variable) is approximately normally distributed.

     

    Definition 2: A perfect bell-shaped curve as above is called a normal curve.

     

    Remark: In statistics it is "normal" for the bar graphs to fit into a perfect bell shaped curve.

     

     

    Properties of Normal Curves:

     

    Suppose X is a variable that has a normal distribution. Let us denote the mean (average) value of X by m and the standard deviation by s. The following are some of the properties of the normal curve of X.

     

    1) Symmetry: The normal curve of X is symmetric about a vertical line (through the mean m on the horizontal axis). This line is also called the line of symmetry. The two halves of the normal curve on two sides of this line of symmetry are the mirror images of each other.

     

    2) Center: The point where this line of symmetry of the normal curve cuts the horizontal axis is called the center of the normal curve. So, the x-value (or the data value) of the line of symmetry is the center. In fact,

    m = mean of X = median of X = center.

     

    So, we say that the normal curve is symmetric around the mean.

     

    3) More on the Shape of the Normal Curve: If the standard deviation s is large, then the normal curve will be low and wide. If s is small, then the normal curve will be tall and narrow. But once the values of m and s are known, the shape is completely determined.

     

     

    Theorem: If X is a normal random variable with mean m and standard deviation s, then

    Z = (X - m)/s

    is also a normal random variable, with mean ZERO and standard deviation ONE.

     

    This Z is called the standard normal variable. The process of converting X to Z is called the standardization of X.

     

    Definition 3: If X = x is the value of X, then

    z = (x- m)/ s

    will be called the standardized value of X. We also call it the z-value of x.
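
    The formula in Definition 3 can be written as a small Python function. This is only a sketch: the function name z_value is made up for this illustration, the mean 70 and standard deviation 10 are the ones used in the test score example later in these notes, and the scores 85 and 60 are arbitrary.

```python
def z_value(x, m, s):
    """Standardized value (z-value) of x for mean m and standard deviation s."""
    if s <= 0:
        raise ValueError("the standard deviation s must be positive")
    return (x - m) / s


# Mean 70 and standard deviation 10; the scores 85 and 60 are arbitrary.
print(z_value(85, 70, 10))  # 1.5, so 85 is one and a half standard deviations above the mean
print(z_value(60, 70, 10))  # -1.0, so 60 is one standard deviation below the mean
print(z_value(70, 70, 10))  # 0.0, because the mean always has z-value zero
```

    A positive z-value means that x is above the mean and a negative z-value means that x is below the mean.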

     

    Definition 4: If x and y are two values of X (with x < y), then we say that x and y are

    (y - x)/s standard deviations apart

    from each other.

     

    Suggested Problems: Ex. 1-4 from Page 527.

     

    Example 1: Suppose it is known that the annual income X of the US population is normally distributed with mean m = $37,000.00 and standard deviation s = $13,000.00. (My data is not real data.)

    A) The annual income of a professor is $64,000.00 and his GTA's annual income is $12,000.00. How many standard deviations apart are the incomes of the professor and the GTA?

    ANSWER: (64,000 - 12,000)/13,000 = 4 standard deviations.

    B) The Chairman's annual income is $77,000.00. How many standard deviations apart are the incomes of the professor and the Chair?

    ANSWER: (77,000 - 64,000)/13,000 = 1 standard deviation.

    C) How many standard deviations apart are the incomes of the GTA and the Chair?

    ANSWER: (77,000 - 12,000)/13,000 = 5 standard deviations.
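
    The three answers of Example 1 can be checked with a few lines of Python. This is a minimal sketch: the function name sd_apart is made up for this illustration, and it takes the absolute value of the difference so that the order of the two incomes does not matter, which is a small convenience beyond Definition 4 (where x < y is assumed).

```python
def sd_apart(x, y, s):
    """Number of standard deviations between two values x and y."""
    return abs(y - x) / s


s = 13_000  # standard deviation of annual income in Example 1
incomes = {"GTA": 12_000, "professor": 64_000, "chair": 77_000}

print(sd_apart(incomes["GTA"], incomes["professor"], s))    # 4.0
print(sd_apart(incomes["professor"], incomes["chair"], s))  # 1.0
print(sd_apart(incomes["GTA"], incomes["chair"], s))        # 5.0
```

    Note that the mean m = $37,000.00 is not needed here, because it cancels when two z-values are subtracted.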

     

     

    The 68-95-99.7 Rule:

     

    Suppose that a data set (or a random variable) has a normal distribution. Then we have the following facts:

    1. The data whose standardized values are between -1 and 1 represent 68 percent of the data.

       a. In other words, the probability that the z-value of a data value x will be between -1 and 1 is 0.68. So, P(-1 < Z < 1) = 0.68.
       b. The same is expressed by saying that 68 percent of the data values are within one standard deviation s of the mean m.
       c. It follows that the probability that the z-value of a data value is not between -1 and 1 is 1 - 0.68 = 0.32.

    2. The data whose standardized values are between -2 and 2 represent 95 percent of the data. In other words, the probability that the z-value of a data value x will be between -2 and 2 is 0.95. So, P(-2 < Z < 2) = 0.95. So, the probability that the z-value of a data value is not between -2 and 2 is 1 - 0.95 = 0.05.
    3. The data whose standardized values are between -3 and 3 represent 99.7 percent of the data. In other words, the probability that the z-value of a data value x will be between -3 and 3 is 0.997. So, P(-3 < Z < 3) = 0.997. So, the probability that the z-value of a data value is not between -3 and 3 is 1 - 0.997 = 0.003.
    4. There are many tables and software programs that will give you these probabilities. I plan to give you a table that will give you the probability that the z-value is between two given numbers.
    5. Besides, you can use the symmetry to compute probabilities. For example:

       a. P(0 < Z < 1) = P(-1 < Z < 1)/2 = 0.68/2 = 0.34.
       b. P(0 < Z) = 0.5 = P(Z < 0).
       c. Half the data are bigger than m. In other words, P(X > m) = 0.5. Similarly, P(X < m) = 0.5.
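
    As one example of the software mentioned above, Python's standard library has a class statistics.NormalDist (Python 3.8 or later) whose cdf method gives P(Z < z). The sketch below uses it to recompute the three probabilities of the rule and the symmetry example. The helper name prob_within is made up for this illustration.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal variable


def prob_within(k):
    """P(-k < Z < k) for the standard normal variable Z."""
    return Z.cdf(k) - Z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))

# Symmetry: P(0 < Z < 1) is half of P(-1 < Z < 1).
p_zero_to_one = Z.cdf(1) - Z.cdf(0)
print(round(p_zero_to_one, 4), round(prob_within(1) / 2, 4))
```

    The computed values are slightly more precise than the rule. Rounded, they agree with 0.68, 0.95 and 0.997.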

     

    Quartiles: From the tables one can find that the z-value of the third quartile Q3 is (Q3 - m)/s = 0.675. So,

    Q3= m + 0.675 s.

    Similarly, the first quartile is

    Q1= m - 0.675 s.
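
    The value 0.675 can also be obtained from software instead of a table. The following is a minimal sketch using the inv_cdf method of Python's statistics.NormalDist. The function name quartiles, and the mean 100 and standard deviation 15 in the usage lines, are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# z-value below which 75 percent of a standard normal variable lies.
z_q3 = NormalDist().inv_cdf(0.75)
print(round(z_q3, 4))


def quartiles(m, s, z=0.675):
    """Return (Q1, Q3), using the rounded z-value 0.675 by default."""
    return m - z * s, m + z * s


# Hypothetical numbers for illustration only: mean 100, standard deviation 15.
q1, q3 = quartiles(100, 15)
print(q1, q3)
```

    The computed z-value is about 0.6745, which the tables round to 0.675. By symmetry, the z-value of the first quartile is the negative of this number.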

     

    Suggested Problems: Ex.5-7, 9-10, 13-16, 21-24 (Page 527-).

     

     

    Example 2.1: (similar to Ex. 5, p. 527) Suppose the scores X of students on a certain test (like the SAT) are approximately normally distributed with mean m = 70 and standard deviation s = 10 points. Suppose 1500 students take the test.

    1. Estimate the average score on the test.
    2. Estimate what percent of students scored 70 or more?
    3. Estimate what percent of the students scored between 60 and 80?
    4. Estimate what percent of students scored more than 80?

     

    Solutions:

    1. An estimate of the average score is the population mean m = 70. So, the answer is 70.
    2. Here we are asked to find P(X > 70). We have P(X > 70) = 0.5 because m = 70. So, the answer is 50 percent.
    3. Here we are asked to find P(60 < X < 80). We have

       P(60 < X < 80) =

       P((60 - m)/s < (X - m)/s < (80 - m)/s) =

       P((60 - 70)/10 < Z < (80 - 70)/10) =

       P(-1 < Z < 1) = 0.68.

       So, the answer is 68 percent.

    4. Here we are asked to find P(X > 80). We will have to use the symmetry. We have

       P(X > 80) =

       P(X not between 60 and 80)/2 =

       (1 - P(60 < X < 80))/2 = (1 - 0.68)/2 = 0.32/2 = 0.16.

       So, the answer is 16 percent.
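
    These answers can be cross-checked with software. The sketch below uses Python's statistics.NormalDist with the mean 70 and standard deviation 10 of this example. The variable names are made up for this illustration, and the computed values are a little more precise than those obtained from the 68-95-99.7 rule.

```python
from statistics import NormalDist

X = NormalDist(mu=70, sigma=10)  # test scores in this example

p_above_70 = 1 - X.cdf(70)              # P(X > 70)
p_60_to_80 = X.cdf(80) - X.cdf(60)      # P(60 < X < 80)
p_above_80 = 1 - X.cdf(80)              # P(X > 80)

print("mean:", X.mean)
print("P(X > 70):", round(p_above_70, 2))
print("P(60 < X < 80):", round(p_60_to_80, 2))
print("P(X > 80):", round(p_above_80, 2))

# The symmetry argument: P(X > 80) is half of what lies outside 60 to 80.
print(round((1 - p_60_to_80) / 2, 2))
```

    Rounded to two decimal places, the values should agree with 0.5, 0.68 and 0.16 found above.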

    Example 2.2: (similar to Ex. 6) Let us have the same set up as in the above problem. We also have that 1500 students wrote the test.

    1. Estimate how many students got between 60 and 80?
    2. Estimate how many students got more than 80?

     

    Solutions:

    1. The answer = P(60 < X < 80) x 1500 = 0.68 x 1500 = 1020. This is an estimate only.
    2. The answer = P(X > 80) x 1500 = 0.16 x 1500 = 240.
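
    The same arithmetic as a short Python sketch, using the probabilities 0.68 and 0.16 from the 68-95-99.7 rule. The variable names are made up for this illustration. The results are rounded to whole students because they are only estimates.

```python
n_students = 1500        # number of students who wrote the test
p_60_to_80 = 0.68        # P(60 < X < 80), from the 68-95-99.7 rule
p_above_80 = 0.16        # P(X > 80), from the symmetry argument

est_60_to_80 = round(n_students * p_60_to_80)
est_above_80 = round(n_students * p_above_80)
print(est_60_to_80, est_above_80)  # 1020 240
```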

     

    Example 2.3: (Similar to Ex. 7.) Let us have the same set up as in example 2.1. Estimate the first and the third quartile.

     

    Solutions: We can use the formulas given above.

    Q3= m + 0.675 s = 70 + 0.675 x 10 = 76.75

     

    Q1= m - 0.675 s = 70 - 0.675 x 10 = 63.25.

     

     

    Remark: As far as I could see, your textbook does not explain what a percentile is, yet it gives problems on percentiles. I briefly mentioned in class that, for a number p between 0 and 100, your score on a test is at the pth percentile if p percent of the students who wrote the test did not do as well as you did. More formally, for a variable X, the pth percentile is a number xp such that

    P(X < xp) = p/100.

    In other words, p percent of the data values are less than (or equal to) xp. So,

    a. The first quartile Q1 is the 25th percentile.
    b. The median (which is also the mean for a normal distribution) is the 50th percentile.
    c. The third quartile Q3 is the 75th percentile.

    For the problems on percentiles in the textbook, you are expected to use the 68-95-99.7 rule. But I would like to keep it simple, and it will be enough if you can use a-c above. So, I will have to reformulate some of the problems in the textbook.
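
    For readers who want to go beyond the three special percentiles, the definition P(X < xp) = p/100 can be turned into two small Python functions. This is only a sketch: the function names percentile_of and value_at_percentile are made up for this illustration, and the usage lines reuse the mean 70 and standard deviation 10 of the test score example.

```python
from statistics import NormalDist


def percentile_of(x, m, s):
    """Percent of a normal population (mean m, standard deviation s) below x."""
    return 100 * NormalDist(m, s).cdf(x)


def value_at_percentile(p, m, s):
    """The value xp with P(X < xp) = p/100, for p strictly between 0 and 100."""
    return NormalDist(m, s).inv_cdf(p / 100)


# Test scores with mean 70 and standard deviation 10.
print(round(percentile_of(70, 70, 10), 1))        # the mean is the 50th percentile
print(round(percentile_of(76.75, 70, 10), 1))     # Q3 = 76.75 is about the 75th percentile
print(round(value_at_percentile(25, 70, 10), 2))  # about Q1 = 63.25
```

    The two functions undo each other: value_at_percentile(percentile_of(x, m, s), m, s) gives back x, up to rounding.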

    Example 2.3: (see Ex. 21.) The distribution of weights for 6-month-old baby boys is approximately normal with mean m = 17.25 pounds and standard deviation s = 2 pounds.

    1. Suppose that a 6-month-old boy weighs 15.9 pounds. Approximately what weight percentile is he in?
    2. Suppose that a 6-month-old boy weighs 18.6 pounds. Approximately what weight percentile is he in?
    3. Suppose that a 6-month-old boy is in the 75th percentile weight. Estimate his weight.

     

    Solutions: We will first directly compute the quartiles.

    Q3= m + 0.675 s = 17.25 + 0.675 x 2 = 18.6

     

    Q1= m - 0.675 s = 17.25 - 0.675 x 2 = 15.9.

    1. Since 15.9 = Q1, the boy is in the 25th percentile.
    2. Since 18.6 = Q3, the boy is in the 75th percentile.
    3. The 75th percentile is Q3, so an estimate of the boy's weight is 18.6 pounds.
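
    The three answers can be cross-checked with Python's statistics.NormalDist, using the mean 17.25 pounds and standard deviation 2 pounds of this example. This is a minimal sketch and the variable names are made up for this illustration.

```python
from statistics import NormalDist

weights = NormalDist(mu=17.25, sigma=2)  # weights of 6-month-old boys, in pounds

pct_15_9 = 100 * weights.cdf(15.9)    # percentile of a boy weighing 15.9 pounds
pct_18_6 = 100 * weights.cdf(18.6)    # percentile of a boy weighing 18.6 pounds
weight_75th = weights.inv_cdf(0.75)   # weight at the 75th percentile

print(round(pct_15_9), round(pct_18_6), round(weight_75th, 1))
```

    After rounding, the output should agree with the answers above: the 25th percentile, the 75th percentile and 18.6 pounds.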

     

     

    Example 2.4: (see Ex. 22.) The distribution of weights for 12-month-old baby girls is approximately normal with mean m = 21 pounds and standard deviation s = 2.2 pounds.

     

    1. Suppose that a 12-month-old girl weighs 22.49 pounds. Approximately what weight percentile is she in?
    2. Suppose that a 12-month-old girl weighs 19.52 pounds. Approximately what weight percentile is she in?
    3. Suppose that a 12-month-old girl is in the 75th percentile weight. Estimate her weight.

     

    Example 2.5: (see Ex. 23.) The distribution of weights for 1-month-old baby girls is approximately normal with mean m = 8.75 pounds and standard deviation s = 1.1 pounds.

    1. Suppose that a 1-month-old girl weighs 8.01 pounds. Approximately what weight percentile is she in?
    2. Suppose that a 1-month-old girl weighs 9.49 pounds. Approximately what weight percentile is she in?
    3. Suppose that a 1-month-old girl is in the 75th percentile weight. Estimate her weight.

     

    Example 2.6: (see Ex. 24.) The distribution of weights for 12-month-old baby boys is approximately normal with mean m = 22.5 pounds and standard deviation s = 2.2 pounds.

    1. Suppose that a 12-month-old boy weighs 23.99 pounds. Approximately what weight percentile is he in?
    2. Suppose that a 12-month-old boy weighs 21.02 pounds. Approximately what weight percentile is he in?
    3. Suppose that a 12-month-old boy is in the 75th percentile weight. Estimate his weight.