Math 365, Elementary Statistics |
Lesson 2: Measures of Central Tendency and Measures of Dispersion
Introduction
In this lesson we talk about two types of constants that we compute from data:
A measure of central tendency represents an "average value." Mean, median, and mode (if you already know these) are measures of central tendency.
A measure of dispersion is a measure of how widely the data is scattered around.
2.1 Measure of Central Tendency: Mean
The most common measure of central tendency is the mean or arithmetic mean.
Definition. The mean or the arithmetic mean of a set of data is given by
$$\text{mean} = \frac{\text{sum of all the data values}}{\text{size of the data}}$$
If we denote a data value (i.e., the variable) by $x$ and if $n$ is the size of the data, then the above formula is written as
$$\text{mean} = \bar{x} = \frac{\sum x}{n}$$
where $\sum$ denotes summation. If the data is a sample, then the mean is called the sample mean. Again, if $x$ denotes the variable, the data is sometimes denoted by $x_1, x_2, \ldots, x_n$ and then
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
OR
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
If you have not seen the notation $\sum$ before, it simply means summation. For example,
$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4.$$
Sometimes, different values in data carry different weight. Let us consider the following data and the corresponding frequency distribution that we computed earlier:
Example 2.1.1. To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several time trials. The following are sample times taken (in seconds) to complete the laps:
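Following is the frequency distribution of this data:
Time (in seconds) | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 1 | 1 | 3 | 3 | 4 | 6 | 4 | 5 | 5 | 2 | 1 |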
Now we want to compute the mean time. So, we add all the data values and divide by the data size 35. We already have computed the frequency distribution, which tells us that, in the data, 46 was present 1 time, 47 was present 1 time, 48 was present 3 times, and so on. So, using the frequency distribution, we compute the mean as follows:
$$\text{mean} = \frac{46 \times 1 + 47 \times 1 + 48 \times 3 + \cdots + 56 \times 1}{35} = \frac{1799}{35} = 51.4$$
The mean of the original data is the weighted
mean of the data values 46, 47, 48, 49, 50, 51, 52, 53, 54, 55
and 56 with the corresponding frequency as the weight.
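The equivalence between the plain mean of the raw data and the frequency-weighted mean can be checked numerically. The following is a minimal Python sketch; the frequencies are the ones tabulated for this data in Example 2.3.2, and the variable names are illustrative only.

```python
# Frequency distribution of the lap times (values as tabulated in Example 2.3.2).
times = [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]
freqs = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]

# Expand the table back into individual observations
# (the original order is unknown, but order does not affect the mean).
raw = [t for t, f in zip(times, freqs) for _ in range(f)]
plain_mean = sum(raw) / len(raw)

# Weighted form: each distinct value is weighted by its frequency.
weighted = sum(t * f for t, f in zip(times, freqs)) / sum(freqs)

print(len(raw), plain_mean, weighted)
```

Both computations divide the same total, 1799, by the same data size, 35, so they should agree.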
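So, a new formula for the mean would be
$$\text{mean} = \frac{f_1 x_1 + f_2 x_2 + \cdots}{f_1 + f_2 + \cdots}$$
OR
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
where $f_i$ is the frequency of $x_i$.
The weighted mean is defined in a more general context as follows:
Definition. If $x_1, x_2, \ldots, x_n$ in a data set have different weights and the value $x_i$ has weight $w_i$, then the weighted mean is defined as
$$\text{weighted mean} = \bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$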
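As an illustration of this definition, here is a small Python sketch. The function name `weighted_mean` is hypothetical, and the sample numbers are the grade points and credit hours that appear in the GPA computation of Example 2.1.2.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Grade points and credit hours as used in the GPA computation of Example 2.1.2.
grade_points = [3, 4, 3, 2, 3]
credit_hours = [4, 3, 5, 3, 3]
print(weighted_mean(grade_points, credit_hours))  # 54 / 18
```

When every weight is equal, the weighted mean reduces to the ordinary mean, which is one quick way to sanity-check such a function.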
Properties of Mean
Remark (effect of translation): Your teacher tells you that the mean score for the midterm in your class is 73. After you complained and requested a change, he agreed that everyone can add 7 points to their score. The new mean score is (old mean + 7) = 73 + 7 = 80. This is what we mean by "effect of translation."
Example (effect of multiplication by c): Suppose you have some data x1, x2, ..., xn on salaries in an industry in the United States and the mean is $37000. On a certain day, 1 U.S. dollar = 1.4729 Canadian dollars (say c = 1.4729). So, in Canadian dollars the mean is 37000 × c = 37000 × 1.4729 = 54497.3. Similarly, a change of units (inches to feet or cm) is a "multiplication by a constant c."
Example 2.1.2. A student took PHSX 115 (College Physics), PSYC 120 (Personality), FREN 110 (Elementary French), BUS 241 (Managerial Accounting), and MATH 365 (Elementary Statistics). The number of credit hours and the student's grade are given in the following table:
What is the student's GPA?
Solution. The GPA is the weighted average of the points (corresponding to the grades), the weights being the course credit hours. So, the GPA = (3×4 + 4×3 + 3×5 + 2×3 + 3×3)/(4+3+5+3+3) = 54/18 = 3.
2.2 Measure of Central Tendency: Median, and Mode
The Median
Use of Calculators (TI-83):
Entering your data
Sorting data and computing the median
Computing the mean if only raw data is given
Computing the mean if the frequency table is given
Problems on 2.2: Mean and Median
Exercise 2.2.1. The following is the price
(in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several
times on a particular day.
138 | 142 | 127 | 137 | 148 | 130 | 142 | 133 |
Find the median price and mean price observed by the trader.
Solution
Exercise 2.2.2. The following figures refer to the GPA of six
students.
3.0 | 3.3 | 3.1 | 3.0 | 3.1 | 3.1 |
Find the median and mean GPA.
Exercise 2.2.3. The following data give the lifetime (in days) of light bulbs.
138 | 952 | 980 | 967 | 992 | 197 | 215 | 157 |
Find the mean and median lifetime of these bulbs.
Solution
Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the event.
Time (in seconds) | Frequency |
---|---|
26 | 3 |
27 | 6 |
28 | 5 |
29 | 6 |
30 | 9 |
31 | 3 |
Total | 32 |
Compute the mean and median time taken by the athlete.
Solution
Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.
94 | 105 | 124 | 110 | 119 | 137 | 96 | 110 | 120 | 115 | 119 |
104 | 135 | 123 | 129 | 72 | 121 | 117 | 96 | 107 | 80 | 80 |
96 | 123 | 124 | 124 | 134 | 78 | 138 | 106 | 130 | 97 | 134 |
111 | 133 | 128 | 96 | 126 | 124 | 125 | 127 | 62 | 127 | 96 |
116 | 118 | 126 | 94 | 127 | 121 | 117 | 124 | 93 | 135 | 112 |
120 | 125 | 120 | 147 | 138 | 72 | 119 | 89 | 81 | 113 | 100 |
109 | 127 | 138 | 122 | 110 | 113 | 100 | 115 | 110 | 135 | 120 |
97 | 127 | 120 | 110 | 107 | 111 | 126 | 132 | 120 | 108 | 148 |
133 | 103 | 92 | 124 | 150 | 86 | 121 | 98 |
Compute the mean and median weight, at birth, of the babies.
Solution
Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.
7 | 11 | 7 | 11 | 10 | 9 | 10 | 10 | 12 | 13 |
7 | 8 | 11 | 11 | 14 | 9 | 7 | 9 | 11 | 7 |
9 | 13 | 12 | 14 | 7 | 8 | 7 | 14 | 15 | 9 |
9 | 7 | 11 | 9 | 12 | 9 | 12 | 11 | 14 | 9 |
12 | 13 | 7 | 9 | 10 | 14 | 11 | 12 | 13 | 7 |
15 | 15 | 16 | 16 | 15 | 16 | 11 | 7 | 18 | 19 |
15 | 16 | 15 | 15 | 16 | 16 | 17 | 16 | 16 | 13 |
15 | 15 | 16 | 15 | 16 | 15 | 15 | 17 | 16 | 12 |
16 | 15 | 15 | 16 | 15 | 15 | 19 | 8 | 16 | 17 |
16 | 16 | 15 | 16 | 16 | 16 | 13 | 12 | 8 |
Compute the mean and median hourly wage.
Solution
Exercise 2.2.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.
No. of Typos | 156 | 158 | 159 | 160 | 162 |
---|---|---|---|---|---|
Frequency | 6 | 4 | 5 | 6 | 9 |
Find the mean and median number of typos in a book.
Solution
Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.
18 | 18.5 | 19 | 18.5 | 19 | 21 | 18 | 19 | 20 | 20.5 |
19 | 19 | 21.5 | 19.5 | 20 | 17 | 20 | 20 | 19 | 20.5 |
18 | 18.5 | 20 | 19.5 | 20.75 | 20 | 21 | 18 | 20.5 | 20 |
21 | 19 | 20.5 | 19 | 20 | 19.5 | 17.75 | 20 | 19.5 | 20 |
20.5 | 17 | 21 | 18.5 | 20 | 20 | 20 | 18.5 | 19.5 | 19 |
18 | 20.5 | 18 | 20 | 19 | 19 | 19.5 | 20 | 20.75 | 21 |
17.75 | 19 | 18 | 19 | 20 | 18.5 | 20 | 19 | 21 | 19 |
19.5 | 20 | 20 | 19 | 19.5 | 20 | 19.5 | 18.5 | 20.5 | 19.5 |
20.25 | 20 | 19.5 | 19.5 | 20 | 20 | 20 | 21 | 20 | 19 |
18.5 | 20.5 | 21.5 | 18 | 19.5 | 18 |
Compute the mean and median length, at birth, of these babies.
Solution
Range
Clearly, the measures of central tendency—mean, median, mode—cannot
tell us the "whole story" about the data.
Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of the semester:
Section A | 81 | 84 | 83 | 80 | 82 |
---|---|---|---|---|---|
Section B | 72 | 93 | 92 | 82 | 71 |
Both these sections have the same mean—82. But in Section A, everybody will get a B grade. In section B, we will have two C's, one B and two A's.
The measure of dispersion is a measure of how widely the data is scattered around. In section A, the data has a very small dispersion or variability, whereas section B has a large dispersion.
A very simple measure of dispersion is the range of the data as we have defined before:
range = largest value - smallest value.
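As a quick check of this definition on Example 2.3.1, the range of each section can be computed as below. This is a minimal sketch; the helper name `data_range` is illustrative.

```python
# Percentage scores from Example 2.3.1.
section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]

def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

print(data_range(section_a))  # 84 - 80
print(data_range(section_b))  # 93 - 71
```

The two sections share the same mean, yet their ranges differ widely, which is the point the example makes about dispersion.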
Mean Deviation, Sample Variance, and Standard Deviation
We will discuss three more measures of dispersion.
Suppose we have a data set $x_1, x_2, \ldots, x_n$ of size $n$. We will denote the mean of the data by $\bar{x}$. Three definitions follow:
Definition. The mean deviation of the data is defined as follows.
$$\text{mean deviation} = \frac{|x_1 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n}$$
So, the mean deviation is the mean of the absolute deviations $|x_i - \bar{x}|$ from the mean.
Definition. The sample variance $s^2$ of the data is defined as follows:
$$s^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
Remark.
Definition. The sample standard deviation $s$ is defined as the square root of the sample variance $s^2$. So, to compute the sample standard deviation, we have to compute the sample variance first.
If we simplify the definition of sample variance we get the following formula:
$$s^2 = \frac{(x_1^2 + x_2^2 + \cdots + x_n^2) - n\bar{x}^2}{n - 1}$$
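The simplification is a short expansion of the squares. The steps below use only the fact that the sum of the data values equals $n\bar{x}$:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\,(n\bar{x}) + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{aligned}
```

Dividing both sides by $n - 1$ gives the simplified formula for $s^2$.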
Let us quickly do some computation with the above example 2.3.1.
The mean deviation for section A = (1+2+1+2+0)/5= 6/5 and the mean deviation for section B = (10+11+10+0+11)/5= 42/5. Since the variability of section B was much higher, the mean deviation was very high.
Let us compute the sample variances:
For section A the sample variance is
$$\frac{(81-82)^2 + (84-82)^2 + (83-82)^2 + (80-82)^2 + (82-82)^2}{5-1} = \frac{1+4+1+4+0}{4} = \frac{10}{4} = 2.5.$$
For section B the sample variance is
$$\frac{(72-82)^2 + (93-82)^2 + (92-82)^2 + (82-82)^2 + (71-82)^2}{5-1} = \frac{100+121+100+0+121}{4} = \frac{442}{4} = 110.5.$$
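These hand computations can be reproduced with a few lines of Python. The sketch below implements the definitions directly (the function names are illustrative) and cross-checks the variance against `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

def mean_deviation(data):
    """Mean of the absolute deviations from the mean."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

# Percentage scores from Example 2.3.1.
section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]

for name, scores in (("A", section_a), ("B", section_b)):
    var = sample_variance(scores)
    # The sample standard deviation is the square root of the sample variance.
    print(name, mean_deviation(scores), var, math.sqrt(var))

# The standard library's sample variance should agree with the hand-written one.
print(statistics.variance(section_a), statistics.variance(section_b))
```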
Application of Standard deviation
The mean and the standard deviation tell us a lot about how the data is distributed.
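Chebyshev's Rule. This rule applies to all kinds of data. Suppose $\bar{x}$ is the mean and $s$ is the standard deviation of the data. Then we have the following:
At least 75% of the data lie within two standard deviations of the mean, that is, between $\bar{x} - 2s$ and $\bar{x} + 2s$.
At least 88.9% of the data lie within three standard deviations of the mean, that is, between $\bar{x} - 3s$ and $\bar{x} + 3s$.
In general, for any $k > 1$, at least the fraction $1 - 1/k^2$ of the data lie within $k$ standard deviations of the mean.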
Chebyshev's Rule makes no assumption about the data or the variable. If we make some assumptions about the data, then we can improve the above rule as follows.
The Empirical Rule: Suppose the histogram of the data is symmetric around the vertical line $x = \bar{x}$. In other words, the histogram should fit into a bell-shaped curve.
[Figure: Bell-shaped Curve]
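Then we have the following:
Approximately 68% of the data lie within one standard deviation of the mean.
Approximately 95% of the data lie within two standard deviations of the mean.
Approximately 99.7% of the data lie within three standard deviations of the mean.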
Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data members are EQUAL!
Practice Problem. Consider the exercises 2.2.1 through 2.2.8. For each problem, compute the mean and standard deviation of the data and find what percentage of the data are within one, two, or three standard deviations from the mean.
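One way to carry out this kind of check is sketched below in Python, applied to the Section B scores of Example 2.3.1 rather than to the exercise data. The helper name `fraction_within` is hypothetical, and the standard deviation used is the sample standard deviation (divisor n - 1).

```python
import math

def fraction_within(data, k):
    """Fraction of the data lying within k sample standard deviations of the mean."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    inside = [x for x in data if abs(x - mean) <= k * s]
    return len(inside) / n

section_b = [72, 93, 92, 82, 71]  # Section B scores from Example 2.3.1
for k in (1, 2, 3):
    print(k, fraction_within(section_b, k))
```

Multiplying the returned fraction by 100 gives the percentage asked for in the practice problem.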
Use of the Frequency Table
When a frequency table is given, we can use new formulas to compute the mean and variance of the data.
Formulas. Suppose the data consisting of $n$ observations are given in a frequency table (ungrouped). Let $x_i$ denote the values and $f_i$ be the frequency of $x_i$. Then
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n},$$
$$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1},$$
$$s^2 = \frac{1}{n - 1}\left[\sum (f_i x_i^2) - n\bar{x}^2\right].$$
Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car (example 2.1.1) to compute mean and variance using the above formulas.
Time x | Frequency f | fx | fx² |
---|---|---|---|
46 | 1 | 46 | 2116 |
47 | 1 | 47 | 2209 |
48 | 3 | 144 | 6912 |
49 | 3 | 147 | 7203 |
50 | 4 | 200 | 10000 |
51 | 6 | 306 | 15606 |
52 | 4 | 208 | 10816 |
53 | 5 | 265 | 14045 |
54 | 5 | 270 | 14580 |
55 | 2 | 110 | 6050 |
56 | 1 | 56 | 3136 |
Total | 35 | 1799 | 92673 |
So, the mean $\bar{x}$ = 1799/35 = 51.4 and the variance
$s^2 = (92673 - 35 \times 51.4^2)/(35 - 1) = 6.0118$.
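The column totals and the two variance formulas can be verified with a short Python sketch. The function name `freq_mean_variance` is illustrative; the values and frequencies are those of the table in this example.

```python
def freq_mean_variance(values, freqs):
    """Mean and sample variance of data given as an ungrouped frequency table."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    # Definition: frequency-weighted squared deviations, divided by n - 1.
    var_def = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / (n - 1)
    # Shortcut: (sum of f*x^2 - n * mean^2) / (n - 1).
    var_short = (sum_fx2 - n * mean ** 2) / (n - 1)
    return n, sum_fx, sum_fx2, mean, var_def, var_short

times = [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]
freqs = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]
n, sum_fx, sum_fx2, mean, var_def, var_short = freq_mean_variance(times, freqs)
print(n, sum_fx, sum_fx2, mean, round(var_def, 4), round(var_short, 4))
```

The two variance values should agree up to floating-point rounding, since the shortcut is only an algebraic rearrangement of the definition.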
Example 2.3.3. Following is the class frequency distribution of the data on birth weight of some babies (exercise 1.2, Lesson 1):
Classes | Frequency f | Class Mark x | fx | fx² |
---|---|---|---|---|
60.5-80.5 | 9 | 70.5 | 634.5 | 44732.25 |
80.5-100.5 | 20 | 90.5 | 1810 | 163805 |
100.5-120.5 | 25 | 110.5 | 2762.5 | 305256.25 |
120.5-140.5 | 37 | 130.5 | 4828.5 | 630119.25 |
140.5-160.5 | 8 | 150.5 | 1204 | 181202 |
Total | 99 | | 11239.5 | 1325114.75 |
We can use the above formula to compute (approximate) variance and the standard deviation of the birth weight.
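So, the mean $\bar{x}$ = 11239.5/99 = 113.53 and the variance
$s^2 = (1325114.75 - 99 \times 113.53^2)/(99 - 1) = 500.997$. This value uses the mean rounded to two decimals; carrying the unrounded mean 11239.5/99 through the computation gives approximately 500.93. In either case the standard deviation is $s \approx 22.38$.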
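For grouped data, each class is represented by its class mark (the midpoint of the class), and the frequency-table formulas are then applied to the class marks. The following Python sketch recomputes this example under that assumption; the variable names are illustrative, and the result is only an approximation of the variance of the underlying raw data.

```python
import math

# Class boundaries and frequencies from the table in Example 2.3.3.
classes = [(60.5, 80.5), (80.5, 100.5), (100.5, 120.5), (120.5, 140.5), (140.5, 160.5)]
freqs = [9, 20, 25, 37, 8]

# The class mark is the midpoint of each class.
marks = [(lo + hi) / 2 for lo, hi in classes]

n = sum(freqs)
sum_fx = sum(f * x for x, f in zip(marks, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))

mean = sum_fx / n
# Shortcut formula with the unrounded mean.
variance = (sum_fx2 - n * mean ** 2) / (n - 1)
# Same formula with the mean rounded to two decimals first, as in a hand computation.
variance_rounded = (sum_fx2 - n * round(mean, 2) ** 2) / (n - 1)
std_dev = math.sqrt(variance)

print(marks, n, sum_fx, sum_fx2)
print(round(mean, 2), round(variance, 3), round(variance_rounded, 3), round(std_dev, 2))
```

Because the shortcut formula subtracts two large, nearly equal numbers, rounding the mean early shifts the result noticeably; keeping full precision until the last step avoids this.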
Remarks.
Comment: We have had detailed discussions of various formulas for defining the mean, variance, and other constants. It is important to understand these concepts and formulas.
It is equally important to appreciate the value and necessity of using
calculators or other available software (like Excel). It is almost impossible
(and unnecessary) to compute these constants manually and correctly, unless
one is specially gifted with numerical computations.
Use of Calculators (TI-83):
Computing the variance and standard deviation
Problems on 2.3: Variance, Standard Deviation, and Use of the Frequency Table
Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.
138 | 142 | 127 | 137 | 148 | 130 | 142 | 133 |
Find the variance and standard deviation of the price.
Solution
Exercise 2.3.2. The following figures refer to the GPA of six students.
3.0 | 3.3 | 3.1 | 3.0 | 3.1 | 3.1 |
Find the variance and standard deviation of GPA.
Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs.
138 | 952 | 980 | 967 | 992 | 197 | 215 | 157 |
Find the variance and standard deviation of the lifetime of these
bulbs.
Solution
Exercise 2.3.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the event.
Time (in seconds) | Frequency |
---|---|
15.6 | 3 |
15.7 | 6 |
15.8 | 5 |
15.9 | 6 |
16.0 | 9 |
16.1 | 3 |
Total | 32 |
Compute the variance and standard deviation of time taken by the athlete.
Solution
Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.
94 | 105 | 124 | 110 | 119 | 137 | 96 | 110 | 120 | 115 | 119 |
104 | 135 | 123 | 129 | 72 | 121 | 117 | 96 | 107 | 80 | 80 |
96 | 123 | 124 | 124 | 134 | 78 | 138 | 106 | 130 | 97 | 134 |
111 | 133 | 128 | 96 | 126 | 124 | 125 | 127 | 62 | 127 | 96 |
116 | 118 | 126 | 94 | 127 | 121 | 117 | 124 | 93 | 135 | 112 |
120 | 125 | 120 | 147 | 138 | 72 | 119 | 89 | 81 | 113 | 100 |
109 | 127 | 138 | 122 | 110 | 113 | 100 | 115 | 110 | 135 | 120 |
97 | 127 | 120 | 110 | 107 | 111 | 126 | 132 | 120 | 108 | 148 |
133 | 103 | 92 | 124 | 150 | 86 | 121 | 98 |
Compute the variance and standard deviation of the weight, at birth,
of these babies.
Solution
Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.
7 | 11 | 7 | 11 | 10 | 9 | 10 | 10 | 12 | 13 |
7 | 8 | 11 | 11 | 14 | 9 | 7 | 9 | 11 | 7 |
9 | 13 | 12 | 14 | 7 | 8 | 7 | 14 | 15 | 9 |
9 | 7 | 11 | 9 | 12 | 9 | 12 | 11 | 14 | 9 |
12 | 13 | 7 | 9 | 10 | 14 | 11 | 12 | 13 | 7 |
15 | 15 | 16 | 16 | 15 | 16 | 11 | 7 | 18 | 19 |
15 | 16 | 15 | 15 | 16 | 16 | 17 | 16 | 16 | 13 |
15 | 15 | 16 | 15 | 16 | 15 | 15 | 17 | 16 | 12 |
16 | 15 | 15 | 16 | 15 | 15 | 19 | 8 | 16 | 17 |
16 | 16 | 15 | 16 | 16 | 16 | 13 | 12 | 8 |
Compute the variance and standard deviation of the hourly wages.
Solution
Exercise 2.3.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.
No. of Typos | 156 | 158 | 159 | 160 | 162 |
---|---|---|---|---|---|
Frequency | 6 | 4 | 5 | 6 | 9 |
Find the mean number, variance, and standard deviation of typos in
a book.
Solution
Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.
18 | 18.5 | 19 | 18.5 | 19 | 21 | 18 | 19 | 20 | 20.5 |
19 | 19 | 21.5 | 19.5 | 20 | 17 | 20 | 20 | 19 | 20.5 |
18 | 18.5 | 20 | 19.5 | 20.75 | 20 | 21 | 18 | 20.5 | 20 |
21 | 19 | 20.5 | 19 | 20 | 19.5 | 17.75 | 20 | 19.5 | 20 |
20.5 | 17 | 21 | 18.5 | 20 | 20 | 20 | 18.5 | 19.5 | 19 |
18 | 20.5 | 18 | 20 | 19 | 19 | 19.5 | 20 | 20.75 | 21 |
17.75 | 19 | 18 | 19 | 20 | 18.5 | 20 | 19 | 21 | 19 |
19.5 | 20 | 20 | 19 | 19.5 | 20 | 19.5 | 18.5 | 20.5 | 19.5 |
20.25 | 20 | 19.5 | 19.5 | 20 | 20 | 20 | 21 | 20 | 19 |
18.5 | 20.5 | 21.5 | 18 | 19.5 | 18 |
Compute the variance and standard deviation of the length, at birth,
of these babies.
Solution
Exercise 2.3.9. The following is the frequency table of weight (in pounds) of some salmon in a river.
Weight x | 31 | 32 | 33 | 34 | 35 | 36 | 37 |
---|---|---|---|---|---|---|---|
Frequency f | 3 | 2 | 4 | 5 | 6 | 5 | 9 |
Find the variance and the standard deviation.
Solution
Exercise 2.3.10. The following data represent the time (in minutes) taken by students to drive to campus.
23 | 17 | 19 | 24 | 42 | 33 | 20 | 22 | 15 | 9 |
26 | 37 | 29 | 19 | 35 | 18 | 30 | 21 | 11 | 23 |
13 | 27 | 32 | 32 | 23 | 35 | 25 | 33 | 24 | 23 |
Find the mean, variance, and the standard deviation of the data.
Solution