Satyagopal Mandal
Department of Mathematics
Office: 624 Snow Hall, Phone: 785-864-5180
Normal Distribution
The bar graphs and histograms of many kinds of numerical data have a shape that resembles a bell. A perfect bell shape is shown in the following diagram:
[Figure: a perfect bell-shaped curve]
Now let us compare some bar charts with this bell-shaped figure.
Example A: (see page 455) The following is the frequency table of the scores obtained by a group of students:
Score | Frequency
6 | 1
7 | 2
8 | 6
9 | 10
10 | 16
11 | 13
12 | 9
13 | 8
14 | 5
15 | 2
16 | 1
The following is the bar graph of this data:
[Figure: bar graph of the Example A score frequencies]
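Since the pictures are not reproduced here, a rough text version of the bar graph can be drawn directly from the frequency table. The following is a minimal Python sketch; the helper name `text_bar_graph` is hypothetical, not something from the notes.

```python
# Example A frequency table: scores 6..16 and how many students got each score.
scores = list(range(6, 17))
freqs = [1, 2, 6, 10, 16, 13, 9, 8, 5, 2, 1]

def text_bar_graph(values, counts, mark="*"):
    """Return one line per value: the value, a bar of `count` marks, and the count."""
    return [f"{v:>3} | {mark * c} ({c})" for v, c in zip(values, counts)]

for line in text_bar_graph(scores, freqs):
    print(line)
```

Each bar's length equals the frequency, so the bell shape appears sideways, with its peak at the score 10.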
Example B: (see page 478) The following is the class frequency table of the weights (in ounces) of some babies:
Class | Frequency
48-60 | 15
60-72 | 24
72-84 | 41
84-96 | 67
96-108 | 119
108-120 | 184
120-132 | 142
132-144 | 26
144-156 | 5
156-168 | 2
The following is the bar graph of this data:
[Figure: bar graph of the Example B weight classes]
Example C: (see Ex. 29, p. 479) The following is the frequency table of some data:
Value | Frequency
9 | 3
10 | 5
11 | 7
12 | 4
13 | 12
14 | 10
15 | 13
16 | 11
17 | 15
18 | 13
19 | 11
20 | 4
21 | 3
22 | 0
23 | 0
24 | 0
25 | 1
The following is the bar graph of this data:
[Figure: bar graph of the Example C frequencies]
Example D: (See page 513) The following is the percentage frequency table of the SAT scores of a large number of students:
Score range | Percentage frequency
200-240 | 1.1
250-290 | 1.9
300-340 | 4.3
350-390 | 8.7
400-440 | 13.2
450-490 | 17.8
500-540 | 17.3
550-590 | 14.6
600-640 | 10.6
650-690 | 6.1
700-740 | 2.8
750-800 | 1.6
The following is the bar graph:
[Figure: bar graph of the Example D SAT score percentages]
You can see that all of these bar graphs fit reasonably well under a bell-shaped curve, although some fit better than others.
Perhaps the most amazing fact in statistics is that the bar graphs of many kinds of numerical data found in nature fit a bell-shaped curve quite well. How closely such a data set follows a perfect bell-shaped curve depends largely on its size: the larger the data set, the closer its bar graph or histogram comes to a perfect bell shape.
In this chapter we consider data sets, in fact population data, whose bar graph or histogram fits a perfect bell-shaped curve. At this point it will help to recall the discussion of variables in Chapter 14.
Random Variables:
Recall the following definition of variables.
Definition: A variable is a characteristic of the members of a population that varies from member to member.
We will not distinguish between "variables" and "random variables" (as your textbook does). I only want to mention that a variable, in the context of random experiments and sample spaces, is called a random variable. So a random variable X assigns a value to each outcome (just as a variable assigns a value to each population member). In other words, here the sample space S takes the place of the population.
Example: Suppose we want to study the KU student population. Our random experiment is to select a KU student at random, so the sample space S is the KU student population and each student is an outcome. All the variables that we considered in the example of the "KU student population" are then also random variables: GPA, weight, height, and annual income of KU students are all random variables.
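To make the idea "a random variable assigns a value to each outcome" concrete, here is a small Python sketch. The students and GPA values are invented for illustration, and the function name `X` simply mirrors the notation in the text.

```python
import random

# Hypothetical sample space: a handful of made-up students (the outcomes).
# The names and GPA values are invented for illustration only.
gpa = {"Ann": 3.4, "Ben": 2.9, "Cal": 3.8, "Dee": 3.1}

def X(outcome):
    """The random variable X: assigns a GPA to each outcome (student)."""
    return gpa[outcome]

sample_space = list(gpa)                # S = the (tiny) student population
student = random.choice(sample_space)   # the random experiment: pick a student
print(student, X(student))
```

The randomness lives entirely in the experiment (`random.choice`); X itself is an ordinary function on the sample space.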
Normal Distribution and Normal Random Variables:
Definition 1: A data set (or a variable X) is said to be normally distributed if its bar graph or histogram fits a perfect bell-shaped curve. We also say that X has a normal distribution, or that X is a normal random variable, and the corresponding bell-shaped curve is called the normal curve of X (or of the data set). If the bar graph or histogram fits a perfect bell shape only approximately, then we say that the data set (or the variable) is approximately normally distributed.
Definition 2: Such a perfect bell-shaped curve as above is called a normal curve.
Remark: In statistics it is "normal" for the bar graphs to fit into a perfect bell shaped curve.
Properties of Normal Curves:
Suppose X is a variable that has a normal distribution. Let us denote the mean (average) value of X by m (often written μ) and its standard deviation by s (often written σ). The following are some of the properties of the normal curve of X.
1) Symmetry: The normal curve of X is symmetric about a vertical line through the mean m on the horizontal axis. This line is called the line of symmetry. The two halves of the normal curve on either side of this line are mirror images of each other.
2) Center: The point where the line of symmetry cuts the horizontal axis is called the center of the normal curve. So the x-value (data value) of the line of symmetry is the center. In fact,
m = mean of X = median of X = center.
So we say that the normal curve is symmetric about the mean.
3) More on the shape of the normal curve: If the standard deviation s is large, then the normal curve is low and spread out (flat). If s is small, then the normal curve is tall and narrow. Once the values of m and s are known, the shape of the curve is completely determined.
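The notes describe the normal curve in words only. Assuming the standard formula for the normal density, f(x) = exp(-(x - m)²/(2s²)) / (s√(2π)), the following Python sketch checks the symmetry property 1) and illustrates property 3): the peak height 1/(s√(2π)) shrinks as s grows, so the curve gets flatter. The function name `normal_pdf` is our own.

```python
import math

def normal_pdf(x, m, s):
    """Height of the normal curve with mean m and standard deviation s at x."""
    return math.exp(-((x - m) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

m = 0.0
# Symmetry: the curve has the same height at m - d and m + d.
for d in (0.5, 1.0, 2.0):
    assert math.isclose(normal_pdf(m - d, m, 1.0), normal_pdf(m + d, m, 1.0))

# Shape: the peak (at x = m) is 1/(s*sqrt(2*pi)), so a larger s gives a flatter curve.
for s in (0.5, 1.0, 2.0):
    print(f"s = {s}: peak height = {normal_pdf(m, m, s):.4f}")
```

Because the formula involves only m and s, it also shows why the curve is completely determined once m and s are known.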
Theorem: If X is a normal random variable with mean m and standard deviation s, then Z = (X - m)/s is also a normal random variable, with mean ZERO and standard deviation ONE.
This Z is called the standard normal variable. The process of converting X to Z is called the standardization of X.
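A minimal Python sketch of standardization, using made-up data values. It only shows that subtracting the mean and dividing by the standard deviation produces values with mean 0 and standard deviation 1 (which holds for any data set); it does not check the part of the theorem saying that Z is again normal. It uses the population standard deviation `statistics.pstdev`, matching the population viewpoint of this chapter.

```python
from statistics import mean, pstdev

def standardize(data, m, s):
    """Convert each value x to its z-value (x - m) / s."""
    return [(x - m) / s for x in data]

# Illustrative, made-up values (not from the notes).
data = [52, 61, 70, 70, 79, 88]
m, s = mean(data), pstdev(data)
z = standardize(data, m, s)
print(round(mean(z), 10), round(pstdev(z), 10))
```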
Definition 3: If x is a value of X, then z = (x - m)/s is called the standardized value of x. We also call it the z-value of x.
Definition 4: If x and y are two values of X with x < y, then we say that x and y are (y - x)/s standard deviations apart from each other.
Suggested Problems: Ex. 1-4 from page 527.
Example 1: Suppose it is known that the annual income X of the US population is normally distributed with mean m = $37,000.00 and standard deviation s = $13,000.00. (These figures are made up, not real data.)
A) The annual income of a professor is $64,000.00 and his GTA's annual income is $12,000.00. How many standard deviations apart are the incomes of the professor and the GTA?
ANSWER: (64,000 - 12,000)/13,000 = 4 standard deviations.
B) The Chair's annual income is $77,000.00. How many standard deviations apart are the incomes of the professor and the Chair?
ANSWER: (77,000 - 64,000)/13,000 = 1 standard deviation.
C) How many standard deviations apart are the incomes of the GTA and the Chair?
ANSWER: (77,000 - 12,000)/13,000 = 5 standard deviations.
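Definitions 3 and 4 translate directly into two one-line functions. The Python sketch below (the names `z_value` and `sd_apart` are our own) reproduces the answers to Example 1.

```python
def z_value(x, m, s):
    """Definition 3: the standardized value (z-value) of x."""
    return (x - m) / s

def sd_apart(x, y, s):
    """Definition 4: how many standard deviations apart x and y are (x < y)."""
    return (y - x) / s

m, s = 37_000, 13_000                  # Example 1 (made-up figures)
gta, professor, chair = 12_000, 64_000, 77_000

print(sd_apart(gta, professor, s))     # A) 4.0
print(sd_apart(professor, chair, s))   # B) 1.0
print(sd_apart(gta, chair, s))         # C) 5.0

# The distance in standard deviations is also the difference of the z-values.
print(z_value(chair, m, s) - z_value(gta, m, s))
```

The last line illustrates why this works: (y - m)/s - (x - m)/s = (y - x)/s, so the mean m cancels out and only s matters.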
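The 68-95-99.7 Rule:
Suppose that a data set (or a random variable) has a normal distribution. Then we have the following facts:
1) About 68% of the data values lie within one standard deviation of the mean, that is, between m - s and m + s.
2) About 95% of the data values lie within two standard deviations of the mean, between m - 2s and m + 2s.
3) About 99.7% of the data values lie within three standard deviations of the mean, between m - 3s and m + 3s.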
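As a check on the rule, Python's standard library class `statistics.NormalDist` (Python 3.8 or later) can compute the exact probabilities P(-k < Z < k) for the standard normal variable Z. This is a sketch for verification only, not the method the notes expect on problems.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal variable

def prob_within(k):
    """P(m - k*s < X < m + k*s) for a normal X, i.e. P(-k < Z < k)."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {prob_within(k):.4f}")
```

Because of standardization, the same three numbers apply to every normal variable, whatever its m and s.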
Quartiles: From the normal tables one can compute that the z-value of the third quartile Q3 is (Q3 - m)/s ≈ 0.675. So Q3 = m + 0.675s. Similarly, the first quartile is Q1 = m - 0.675s.
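The value 0.675 comes from normal tables. In Python, `NormalDist().inv_cdf` gives the z-value directly; the sketch below shows that the exact value is about 0.6745, which the notes round to 0.675 (the difference is negligible for these problems).

```python
from statistics import NormalDist

Z = NormalDist()              # standard normal: mean 0, sd 1
z_q3 = Z.inv_cdf(0.75)        # z-value of the third quartile: P(Z < z_q3) = 0.75
z_q1 = Z.inv_cdf(0.25)        # z-value of the first quartile: P(Z < z_q1) = 0.25
print(round(z_q3, 4), round(z_q1, 4))
```

By symmetry of the normal curve, z_q1 is exactly the negative of z_q3, which is why Q1 and Q3 sit at equal distances from m.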
Suggested Problems: Ex. 5-7, 9-10, 13-16, 21-24 (starting on page 527).
Example 2.1: (similar to Ex. 5, p. 527) Suppose the scores X of students on a certain test (like the SAT) are approximately normally distributed with mean m = 70 and standard deviation s = 10 points. Suppose 1500 students take the test.
Solutions:
P(60 < X < 80) = P((60 - m)/s < (X - m)/s < (80 - m)/s) = P((60 - 70)/10 < Z < (80 - 70)/10) = P(-1 < Z < 1) ≈ 0.68.
By symmetry, the remaining probability 1 - 0.68 = 0.32 (that X is not between 60 and 80) is split equally between the two tails, so
P(X > 80) = (1 - P(60 < X < 80))/2 = (1 - 0.68)/2 = 0.32/2 = 0.16.
Example 2.2: (similar to Ex. 6) Let us have the same setup as in the above problem. Again, 1500 students wrote the test.
Solutions:
Example 2.3: (similar to Ex. 7) Let us have the same setup as in Example 2.1. Estimate the first and the third quartiles.
Solutions: We can use the formulas given above.
Q3 = m + 0.675s = 70 + 0.675 × 10 = 76.75
Q1 = m - 0.675s = 70 - 0.675 × 10 = 63.25.
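For comparison with the rule-of-thumb answers in Examples 2.1 and 2.3, the following Python sketch computes the exact normal values with `statistics.NormalDist` (Python 3.8 or later). It is only a check; the intended method in these notes is the 68-95-99.7 rule and the 0.675 shortcut.

```python
from statistics import NormalDist

X = NormalDist(mu=70, sigma=10)   # Example 2.1: m = 70, s = 10

# Example 2.1, computed exactly instead of with the 68-95-99.7 rule.
p_60_80 = X.cdf(80) - X.cdf(60)   # the rule gives 0.68
p_above_80 = 1 - X.cdf(80)        # the rule gives 0.16
print(round(p_60_80, 4), round(p_above_80, 4))

# Example 2.3: exact quartiles vs. the shortcut m -/+ 0.675 s.
q1_exact, q3_exact = X.inv_cdf(0.25), X.inv_cdf(0.75)
q1_short, q3_short = 70 - 0.675 * 10, 70 + 0.675 * 10
print(round(q1_exact, 2), round(q3_exact, 2), round(q1_short, 2), round(q3_short, 2))
```

The exact and shortcut answers agree to within a few hundredths, which is why the rounded rule is good enough for these problems.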
Remark: As far as I could see, your textbook does not explain what a percentile is, but it gives problems on percentiles. I briefly mentioned in class that, for a number p between 0 and 100, your score on a test is the pth percentile if p percent of the students who took the test did not do as well as you did. More formally, for a variable X, the pth percentile is a number x_p such that P(X < x_p) = p/100.
In other words, p percent of the data values are less than (or equal to) x_p. So,
For the problems on percentiles in the textbook, you are expected to use the 68-95-99.7 rule. But I would like to keep it simple, and it will be enough if you can use a-c above. So I will reformulate some of the problems in the textbook.
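With `statistics.NormalDist`, the pth percentile x_p, defined by P(X < x_p) = p/100, is exactly `inv_cdf(p / 100)`. The following Python sketch (the helper name `percentile` is hypothetical) uses the setup of Example 2.1 (m = 70, s = 10). As a sanity check, the 50th percentile is the mean, and the 25th and 75th percentiles are the quartiles.

```python
from statistics import NormalDist

def percentile(p, m, s):
    """The pth percentile x_p of a normal variable: P(X < x_p) = p/100 (0 < p < 100)."""
    return NormalDist(mu=m, sigma=s).inv_cdf(p / 100)

m, s = 70, 10                      # Example 2.1 setup
for p in (25, 50, 75, 90):
    print(p, round(percentile(p, m, s), 2))
```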
Example 2.3: (see Ex. 21) The distribution of weights of 6-month-old baby boys is approximately normal with mean m = 17.25 pounds and standard deviation s = 2 pounds.
Solutions: We will first directly compute the quartiles.
Q3 = m + 0.675s = 17.25 + 0.675 × 2 = 18.6
Q1 = m - 0.675s = 17.25 - 0.675 × 2 = 15.9.
Example 2.4: (see Ex. 22) The distribution of weights of 12-month-old baby girls is approximately normal with mean m = 21 pounds and standard deviation s = 2.2 pounds.
Example 2.5: (see Ex. 23) The distribution of weights of 1-month-old baby girls is approximately normal with mean m = 8.75 pounds and standard deviation s = 1.1 pounds.
Example 2.6: (see Ex. 24) The distribution of weights of 12-month-old baby boys is approximately normal with mean m = 22.5 pounds and standard deviation s = 2.2 pounds.
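The notes give only the setups of Examples 2.4 to 2.6, not the questions, so the sketch below does just the first step used in the Ex. 21 example: it applies the quartile formulas Q1 = m - 0.675s and Q3 = m + 0.675s to each setup. The dictionary layout and the function name `quartiles` are our own.

```python
# (m, s) in pounds for each baby-weight example, as given in the notes.
examples = {
    "Ex. 21 (6-month-old boys)": (17.25, 2.0),
    "Ex. 22 (12-month-old girls)": (21.0, 2.2),
    "Ex. 23 (1-month-old girls)": (8.75, 1.1),
    "Ex. 24 (12-month-old boys)": (22.5, 2.2),
}

def quartiles(m, s, z=0.675):
    """Q1 = m - z*s and Q3 = m + z*s, using the notes' rounded z-value 0.675."""
    return m - z * s, m + z * s

for name, (m, s) in examples.items():
    q1, q3 = quartiles(m, s)
    print(f"{name}: Q1 = {q1:.2f} lb, Q3 = {q3:.2f} lb")
```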