Math 365, Elementary Statistics

Lesson 2 : Measures of Central Tendency and Measures of Dispersion

Introduction
2.1 Measures of Central Tendency: Mean
2.2 Measures of Central Tendency: Median and Mode
2.3 Measures of Dispersion
Homework 4 - 7

Introduction

In this lesson we talk about two types of constants that we compute from data:

measures of central tendency and
measures of dispersion.

A measure of central tendency represents an "average value." The mean, median, and mode (which you may already know) are measures of central tendency. A measure of dispersion measures how widely the data is scattered.

2.1 Measure of Central Tendency: Mean

The most common measure of central tendency is the mean, or arithmetic mean.

Definition. The mean or the arithmetic mean of a set of data is given by

mean = (sum of all the data values) / (size of the data).

If we denote a data value (i.e., the variable) by x and if n is the size of the data, then the above formula is written as

$$\text{mean} = \bar{x} = \frac{\sum x}{n},$$

where ∑ denotes summation.

If the data is a sample, then the mean is called the sample mean. Again, if x denotes the variable, the data is sometimes denoted by x₁, x₂, ..., x_n, and then

$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

If you have not seen the notation ∑ before, it simply means summation. For example,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n.$$
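The definition translates directly into code. Below is a minimal Python sketch; the function name `mean` is our own choice, and the data in the usage line is made up for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of all the data values divided by the size of the data."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)   # (x_1 + x_2 + ... + x_n) / n

print(mean([2, 4, 9]))   # 5.0
```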

Weighted Mean

Sometimes, different values in data carry different weight. Let us consider the following data and the corresponding frequency distribution that we computed earlier:

Example 2.1.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several time trials. The following are sample times taken (in seconds) to complete the laps:

50	48	49	46	54	53	52	51	47	56	52	51
51	53	50	49	48	54	53	51	52	54	54	53
55	48	51	50	52	49	51	53	55	54	50

Following is the frequency distribution of this data:

Time (in seconds)	46	47	48	49	50	51	52	53	54	55	56
Frequency	1	1	3	3	4	6	4	5	5	2	1

Now we want to compute the mean time, so we add all the data values and divide by the data size, 35. We have already computed the frequency distribution. It tells us that, in the data, 46 appears 1 time, 47 appears 1 time, 48 appears 3 times, and so on. So, using the frequency distribution, we compute the mean as follows:

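$$\text{mean} = \bar{x} = \frac{46 \times 1 + 47 \times 1 + 48 \times 3 + 49 \times 3 + 50 \times 4 + 51 \times 6 + 52 \times 4 + 53 \times 5 + 54 \times 5 + 55 \times 2 + 56 \times 1}{1 + 1 + 3 + 3 + 4 + 6 + 4 + 5 + 5 + 2 + 1} = \frac{1799}{35} = 51.4$$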

The mean of the original data is the weighted mean of the data values 46, 47, 48, 49, 50, 51, 52, 53, 54, 55 and 56 with the corresponding frequency as the weight. So, a new formula for the mean would be

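$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i},$$

where f_i is the frequency of x_i and k is the number of distinct values in the table (here k = 11, and ∑ f_i = 35 is the size of the data). The weighted mean is defined in a more general context as follows: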

Definition. If the values x₁, x₂, ..., x_n in a data set have different weights, and the value x_i has weight w_i, then the weighted mean is defined as

$$\text{weighted mean} = \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$$
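A minimal Python sketch of the weighted mean. The function name `weighted_mean` is hypothetical. The first usage line applies it to the race-car frequency table of Example 2.1.1, with the frequencies as weights. The second line uses the grade points and credit hours of Example 2.1.2.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Race-car lap times (Example 2.1.1), weighted by their frequencies.
times = [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]
freqs = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]
print(weighted_mean(times, freqs))                          # 51.4

# GPA (Example 2.1.2): grade points weighted by credit hours.
print(weighted_mean([3, 4, 3, 2, 3], [4, 3, 5, 3, 3]))      # 3.0
```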

Properties of the Mean

Combining two means. Suppose we have two sets of data. The first set has mean x̄ and size m. The second set has mean ȳ and size n. The mean of the combined data is

$$\text{combined mean} = \frac{m\bar{x} + n\bar{y}}{m + n}.$$

This is the weighted mean of x̄ and ȳ with weights m and n, respectively.
Effect of translation. Let x̄ be the mean of x₁, x₂, ..., x_n. Then the mean of y₁ = x₁ + d, y₂ = x₂ + d, ..., y_n = x_n + d is given by

$$\bar{y} = \bar{x} + d.$$
Effect of multiplication by a constant. Let x̄ be the mean of x₁, ..., x_n. Then the mean of
z₁ = cx₁, z₂ = cx₂, ..., z_n = cx_n

is given by

$$\bar{z} = c\,\bar{x}.$$


Remark (effect of translation): Your teacher tells you that the mean score on the midterm in your class is 73. After you complain and request a change, he agrees to add 7 points to everyone's score. The new mean score is (old mean + 7) = 73 + 7 = 80. This is what we meant by "effect of translation."

Example (effect of multiplication by c): Suppose you have some data x₁, x₂, ..., x_n on salaries in an industry in the United States, and the mean is $37,000. On a certain day, 1 U.S. dollar = 1.4729 Canadian dollars (so c = 1.4729). Then, in Canadian dollars, the mean is 37000 × c = 37000 × 1.4729 = 54,497.3. Similarly, a change of units (for example, from inches to feet or to centimeters) is a multiplication by a constant c.
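The three properties can be checked numerically. The sketch below uses two small data sets that are made up for illustration. It also uses the constants d = 7 and c = 1.4729 from the remarks above.

```python
import math

def mean(values):
    return sum(values) / len(values)

# Hypothetical data sets, made up for illustration.
first = [70, 75, 80]    # size m = 3, mean 75
second = [60, 80]       # size n = 2, mean 70
m, n = len(first), len(second)
x_bar, y_bar = mean(first), mean(second)

# Combining two means: the weighted mean of x_bar and y_bar with weights m and n.
combined = (m * x_bar + n * y_bar) / (m + n)
print(combined, mean(first + second))           # 73.0 73.0

# Effect of translation: adding d to every value adds d to the mean.
d = 7
shifted = [x + d for x in first]
print(mean(shifted), x_bar + d)                 # 82.0 82.0

# Effect of multiplication: multiplying every value by c multiplies the mean by c.
c = 1.4729
scaled = [c * x for x in first]
print(math.isclose(mean(scaled), c * x_bar))    # True
```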

Example 2.1.2. A student took PHSX 115 (College Physics), PSYC 120 (Personality), FREN 110 (Elementary French), BUS 241 (Managerial Accounting), and MATH 365 (Elementary Statistics). The number of credit hours and the student's grade in each course are given in the following table:

Course	PHSX 115	PSYC 120	FREN 110	BUS 241	MATH 365
Grade (Points)	B (3 points)	A (4 points)	B (3 points)	C (2 points)	B (3 points)
Credit Hours	4	3	5	3	3

What is the student's GPA?

Solution. The GPA is the weighted average of the points (corresponding to the grades), weight being the course-credit hours. So, the GPA = (3x4+4x3+3x5+2x3+3x3)/(4+3+5+3+3) = 54/18 = 3.

2.2 Measure of Central Tendency: Median and Mode

The Median

The median represents the middle value of the data. At least half of the data values are less than or equal to the median, and at least half are greater than or equal to it. For example, your income is above the median American income if more than half of the American population makes less than you do.

Definition. Suppose the data is arranged in an increasing order (i.e., in an array). If the size of data is ODD then the median is the middle value. If it is EVEN, then the median is the mean of the middle two values.
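A minimal Python sketch of this definition follows. The function name is our own. Python's standard-library `statistics.median` follows the same odd/even rule.

```python
def median(values):
    """Median: the middle value of the sorted data (odd size), or the
    mean of the two middle values (even size)."""
    if len(values) == 0:
        raise ValueError("the median of an empty data set is undefined")
    data = sorted(values)            # arrange the data in increasing order
    n = len(data)
    mid = n // 2
    if n % 2 == 1:                   # ODD size: the middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # EVEN size: mean of the middle two

print(median([5, 1, 3]))      # 3
print(median([4, 1, 3, 2]))   # 2.5
```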

The Percentiles

Definition. For a number p between 0 and 100, the pth percentile x_p of the data is a number such that at least p percent of the data values are at or below x_p and at least (100 - p) percent of the data values are at or above x_p.

The 25th percentile is called the first quartile Q₁.
The median is the 50th percentile, also called the second quartile Q₂.
The 75th percentile is called the third quartile Q₃.
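The definition does not always determine a single number: sometimes a whole interval of values qualifies. The sketch below adopts one convention, which is our assumption: when an interval qualifies, it takes the midpoint. This reproduces the median rule at p = 50. Calculators, spreadsheets, and other software use several different conventions, so their quartiles may differ slightly.

```python
import math

def percentile(values, p):
    """Return a p-th percentile x_p: at least p% of the data are <= x_p and at
    least (100 - p)% are >= x_p.  When a whole interval qualifies, return its
    midpoint (the same convention the median uses)."""
    if len(values) == 0:
        raise ValueError("the percentile of an empty data set is undefined")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)
    n = len(data)
    k = p * n / 100                      # how many values must lie at or below x_p
    if k == 0:
        return data[0]
    if k == n:
        return data[-1]
    if k == int(k):                      # whole k: anything in [data[k-1], data[k]] works
        k = int(k)
        return (data[k - 1] + data[k]) / 2
    return data[math.ceil(k) - 1]        # otherwise: the ceil(k)-th smallest value

scores = [1, 2, 3, 4, 5, 6, 7, 8]        # hypothetical data
q1, q2, q3 = (percentile(scores, p) for p in (25, 50, 75))
print(q1, q2, q3)                        # 2.5 4.5 6.5
```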

The Mode

There is one other measure of central tendency that should be mentioned.

Definition. The MODE of the data is the value or values that have the highest frequency. For example, the mode of the set {1, 3, 5, 5, 7} is {5} because it has the highest frequency. The mode of {1, 1, 3, 5, 5, 7} is {1, 5} because 1 and 5 both have the highest frequency. Such a set is said to be bimodal.
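Here is a minimal sketch that finds all modes using `collections.Counter` from Python's standard library (the function name `modes` is hypothetical). In Python 3.8 and later, `statistics.multimode` returns the same values, listed in order of first appearance.

```python
from collections import Counter

def modes(values):
    """All values that have the highest frequency (more than one if the data is multimodal)."""
    counts = Counter(values)                 # value -> frequency
    highest = max(counts.values())           # raises ValueError on empty data
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 3, 5, 5, 7]))      # [5]
print(modes([1, 1, 3, 5, 5, 7]))   # [1, 5]  (bimodal)
```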

Use of Calculators (TI-83):
Entering your data: Press the STAT button, select Edit in the EDIT menu, and press ENTER. You will find six lists named L1, L2, L3, L4, L5, L6. Suppose you want to enter your data in L1. If L1 already contains data, clear it first: press STAT, select ClrList in the EDIT menu, type L1, and press ENTER. Once L1 is cleared, press STAT, select Edit in the EDIT menu, and press ENTER. Now type in your data values one by one, pressing ENTER after each.
Sorting data and computing the median: Enter your data in a list, say L1. Press STAT, select SortA( in the EDIT menu, and press ENTER. The calculator will ask for the list. Type in the list (L1), close the parentheses, and press ENTER. The calculator will display Done. Press STAT, select Edit in the EDIT menu, and press ENTER. You will see that the data in L1 has been sorted in increasing order. If the data size is odd, the median is the middle value. If the data size is even, the median is the average of the middle two values.
Computing the mean if only raw data is given: Enter your data in a list, say L1. Press STAT, select 1-Var Stats in the CALC menu, and press ENTER. The calculator will ask for the list. Type in the list L1 and press ENTER. The calculator will display a list of numbers; x̄ is the mean.
Computing the mean if the frequency table is given: Enter the frequency table in the calculator, say the x-values in L1 and the frequencies in L2. Press STAT, select 1-Var Stats in the CALC menu, and press ENTER. The calculator will ask for the lists. Type in L1, L2 and press ENTER. The calculator will display a list of numbers; x̄ is the mean.

Problems on 2.2: Mean and Median

Exercise 2.2.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

138	142	127	137	148	130	142	133

Find the median price and mean price observed by the trader.

Exercise 2.2.2. The following figures refer to the GPA of six students.

3.0	3.3	3.1	3.0	3.1

Find the median and mean GPA.

Exercise 2.2.3. The following data give the lifetime (in days) of light bulbs.

138	952	980	967	992	197	215	157

Find the mean and median lifetime of these bulbs.

Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds)	Frequency
26	3
27	6
28	5
29	6
30	9
31	3
Total	32

Compute the mean and median time taken by the athlete.

Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94	105	124	110	119	137	96	110	120	115	119
104	135	123	129	72	121	117	96	107	80	80
96	123	124	124	134	78	138	106	130	97	134
111	133	128	96	126	124	125	127	62	127	96
116	118	126	94	127	121	117	124	93	135	112
120	125	120	147	138	72	119	89	81	113	100
109	127	138	122	110	113	100	115	110	135	120
97	127	120	110	107	111	126	132	120	108	148
133	103	92	124	150	86	121	98

Compute the mean and median weight, at birth, of the babies.

Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7	11	7	11	10	9	10	10	12	13
7	8	11	11	14	9	7	9	11	7
9	13	12	14	7	8	7	14	15	9
9	7	11	9	12	9	12	11	14	9
12	13	7	9	10	14	11	12	13	7
15	15	16	16	15	16	11	7	18	19
15	16	15	15	16	16	17	16	16	13
15	15	16	15	16	15	15	17	16	12
16	15	15	16	15	15	19	8	16	17
16	16	15	16	16	16	13	12	8

Compute the mean and median hourly wage.

Exercise 2.2.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos	156	158	159	160	162
Frequency	6	4	5	6	9

Find the mean and median number of typos in a book.

Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18	18.5	19	18.5	19	21	18	19	20	20.5
19	19	21.5	19.5	20	17	20	20	19	20.5
18	18.5	20	19.5	20.75	20	21	18	20.5	20
21	19	20.5	19	20	19.5	17.75	20	19.5	20
20.5	17	21	18.5	20	20	20	18.5	19.5	19
18	20.5	18	20	19	19	19.5	20	20.75	21
17.75	19	18	19	20	18.5	20	19	21	19
19.5	20	20	19	19.5	20	19.5	18.5	20.5	19.5
20.25	20	19.5	19.5	20	20	20	21	20	19
18.5	20.5	21.5	18	19.5	18

Compute the mean and median length, at birth, of these babies.

2.3 Measures of Dispersion

Range
Clearly, the measures of central tendency (mean, median, mode) cannot tell us the "whole story" about the data.

Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of the semester:

Section A	81	84	83	80	82
Section B	72	93	92	82	71

Both sections have the same mean, 82. But in Section A, everybody will get a B grade, while in Section B there will be two C's, one B, and two A's.

The measure of dispersion is a measure of how widely the data is scattered around. In Section A, the data has a very small dispersion or variability, whereas Section B has a large dispersion.

A very simple measure of dispersion is the range of the data as we have defined before:

range = largest value - smallest value.

Mean Deviation, Sample Variance, and Standard Deviation

We will discuss three more measures of dispersion.

Suppose we have a data set x₁, x₂, ..., x_n of size n. We will denote the mean of the data by x̄. Three definitions follow:

Definition. The mean deviation of the data is defined as follows:

$$\text{mean deviation} = \frac{|x_1 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n}$$

So, the mean deviation is the mean of the absolute deviations |x_i - x̄| from the mean.

Definition. The sample variance s² of the data is defined as follows:

$$s^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

Remark.

Note that we denote the sample variance as the square of a number s.
Also note that we divide by n - 1, not by n. Dividing by n - 1 makes s² an unbiased estimate of the population variance. Dividing by n would, on average, underestimate it.
We would like our measure of dispersion to have the same units as our data. However, the formula involves the squares (x_i - x̄)², so the unit of the variance s² is the unit of the data squared: if the data is in feet, the variance is in square feet. To solve this problem we define another measure of dispersion, the standard deviation, denoted s.

Definition. The sample standard deviation s is defined as the square root of the sample variance s². So, to compute the sample standard deviation, we have to compute the sample variance first.

If we simplify the definition of sample variance we get the following formula:

$$s^2 = \frac{\left(x_1^2 + x_2^2 + \cdots + x_n^2\right) - n\bar{x}^2}{n - 1}$$
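The simplification works because ∑ x_i = n x̄. Expanding each square and summing gives the numerator of the formula above:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\,(n\bar{x}) + n\bar{x}^2
     \qquad \text{since } \sum_{i=1}^{n} x_i = n\bar{x} \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{aligned}
```

Dividing both sides by n - 1 gives the simplified formula for s².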

Let us do some quick computations with Example 2.3.1 above.

The mean deviation for Section A is (1+2+1+2+0)/5 = 6/5 = 1.2, and the mean deviation for Section B is (10+11+10+0+11)/5 = 42/5 = 8.4. Because Section B has much more variability, its mean deviation is much larger.

Let us compute the sample variances.

For Section A the sample variance is

((81-82)² + (84-82)² + (83-82)² + (80-82)² + (82-82)²)/(5-1) =
(1+4+1+4+0)/4 = 10/4 = 2.5,

so the standard deviation is s = √2.5 ≈ 1.58.

For Section B the sample variance is

((72-82)² + (93-82)² + (92-82)² + (82-82)² + (71-82)²)/(5-1) =
(100+121+100+0+121)/4 = 442/4 = 110.5,

so the standard deviation is s = √110.5 ≈ 10.51.
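The computations above, together with the range, can be checked with a short Python sketch. The function names are our own. `sample_variance` follows the definition, and `sample_variance_shortcut` implements the simplified formula.

```python
import math

def mean(values):
    return sum(values) / len(values)

def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Mean of the absolute deviations |x_i - x_bar|."""
    x_bar = mean(values)
    return sum(abs(x - x_bar) for x in values) / len(values)

def sample_variance(values):
    """Definition: s^2 = sum of (x_i - x_bar)^2 divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = mean(values)
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Simplified formula: (sum of x_i^2 - n * x_bar^2) / (n - 1)."""
    n = len(values)
    x_bar = mean(values)
    return (sum(x * x for x in values) - n * x_bar ** 2) / (n - 1)

def sample_std(values):
    """Sample standard deviation s = square root of s^2."""
    return math.sqrt(sample_variance(values))

section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]
for name, data in (("A", section_a), ("B", section_b)):
    print(name, data_range(data), mean_deviation(data),
          sample_variance(data), round(sample_std(data), 2))
# A 4 1.2 2.5 1.58
# B 22 8.4 110.5 10.51
```

For these small integer data sets the two variance formulas agree. With large values and few significant digits, the shortcut can lose accuracy to rounding, which is why software usually prefers the definition.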

Application of Standard deviation

The mean and the standard deviation tell us a lot about how the data is distributed.

Chebyshev's Rule. This rule applies to all kinds of data. Suppose x̄ is the mean and s is the standard deviation of the data. Then we have the following:

At least 0 percent of the observations will fall within 1 standard deviation of the mean, i.e., within (x̄ - s, x̄ + s). This statement is trivially true and gives no information.
At least 75 percent of the observations will fall within 2 standard deviations of the mean, i.e., within (x̄ - 2s, x̄ + 2s).
At least 88.9 percent (that is, 8/9) of the observations will fall within 3 standard deviations of the mean, i.e., within (x̄ - 3s, x̄ + 3s).
More generally, for any k ≥ 1, at least 100(1 - 1/k²) percent of the data will be within k standard deviations of the mean, i.e., within (x̄ - ks, x̄ + ks).
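To see the rule in action, the sketch below counts how many of the race-car lap times from Example 2.1.1 fall strictly within k sample standard deviations of the mean. It then compares that fraction with Chebyshev's lower bound 1 - 1/k². The sketch uses Python's `statistics.mean` and `statistics.stdev`; `stdev` divides by n - 1, matching s above. The helper names are hypothetical.

```python
import statistics

def fraction_within(values, k):
    """Fraction of observations strictly inside (x_bar - k*s, x_bar + k*s)."""
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    inside = sum(1 for x in values if abs(x - x_bar) < k * s)
    return inside / len(values)

def chebyshev_bound(k):
    """Chebyshev's guaranteed minimum fraction within k standard deviations (k >= 1)."""
    return 1 - 1 / k ** 2

# Race-car lap times from Example 2.1.1.
laps = [50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
        51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
        55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50]
for k in (1, 2, 3):
    print(k, round(fraction_within(laps, k), 3), round(chebyshev_bound(k), 3))
```

For these lap times the observed fractions are 22/35 (about 62.9%), 34/35 (about 97.1%), and 35/35. Each is at least as large as the corresponding Chebyshev bound.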

Chebyshev's Rule makes no assumption about the data or the variable. If we make some assumptions about the data, then we can improve the above rule as follows.

The Empirical Rule: Suppose the histogram of the data is symmetric around the vertical line x = x̄, as follows:
[Figure: histogram of the data showing its symmetric distribution]

In other words, the histogram should fit into a bell-shaped curve.

[Figure: bell-shaped curve]
Then we have the following:

Approximately 68.3 percent of the observations will fall within the interval (x̄ - s, x̄ + s).
Approximately 95.4 percent of the observations will fall within the interval (x̄ - 2s, x̄ + 2s).
Approximately 99.7 percent of the observations will fall within the interval (x̄ - 3s, x̄ + 3s).
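The percentages in the Empirical Rule are the probabilities that a normal (bell-shaped) random variable lies within 1, 2, or 3 standard deviations of its mean. If we assume the bell-shaped curve is exactly the normal curve, these percentages can be reproduced with the error function `math.erf` from Python's standard library. The helper name is hypothetical.

```python
import math

def normal_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```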

Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data members are EQUAL!

Practice Problem. Consider the exercises 2.2.1 through 2.2.8. For each problem, compute the mean and standard deviation of the data and find what percentage of the data are within one, two, or three standard deviations from the mean.

Use of the Frequency Table

When a frequency table is given, we can use new formulas to compute the mean and variance of the data.

Formulas. Suppose the data, consisting of n observations, are given in an (ungrouped) frequency table. Let x_i denote the values and f_i the frequency of x_i, so that ∑ f_i = n. Then

$$\text{the mean} = \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n},$$

$$\text{the variance} = s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1}.$$

A simplified formula for the variance is

$$s^2 = \frac{1}{n - 1}\left[\sum f_i x_i^2 - n\bar{x}^2\right].$$

If the data is given in a frequency table of the grouped data, we use the same formula, with x_i as the class mark, which is the average of the class limits.

Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car (example 2.1.1) to compute mean and variance using the above formulas.

Time x	Frequency f	fx	fx²
46	1	46	2116
47	1	47	2209
48	3	144	6912
49	3	147	7203
50	4	200	10000
51	6	306	15606
52	4	208	10816
53	5	265	14045
54	5	270	14580
55	2	110	6050
56	1	56	3136
Total	35	1799	92673

So, the mean is x̄ = 1799/35 = 51.4 and the variance is
s² = (92673 - 35 × 51.4²)/(35 - 1) ≈ 6.0118, so the standard deviation is s ≈ 2.452.
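The frequency-table formulas can be sketched in Python as follows (the function name is hypothetical). The sketch computes the variance both from the definition and from the simplified formula, and checks the result against Example 2.3.2.

```python
import math

def freq_mean_variance(values, freqs):
    """Mean and sample variance from an (ungrouped) frequency table."""
    n = sum(freqs)                                            # n = sum of f_i
    x_bar = sum(f * x for x, f in zip(values, freqs)) / n     # sum f_i x_i / n
    # Definition: s^2 = sum f_i (x_i - x_bar)^2 / (n - 1)
    s2 = sum(f * (x - x_bar) ** 2 for x, f in zip(values, freqs)) / (n - 1)
    # Simplified formula: s^2 = (sum f_i x_i^2 - n x_bar^2) / (n - 1)
    s2_short = (sum(f * x * x for x, f in zip(values, freqs)) - n * x_bar ** 2) / (n - 1)
    return x_bar, s2, s2_short

times = [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]
freqs = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]
x_bar, s2, s2_short = freq_mean_variance(times, freqs)
print(round(x_bar, 4), round(s2, 4), round(s2_short, 4), round(math.sqrt(s2), 3))
# 51.4 6.0118 6.0118 2.452
```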

Example 2.3.3. Following is the class frequency distribution of the data on birth weight of some babies (exercise 1.2, Lesson 1):

Classes	Frequency f	Class Mark x	fx	fx²
60.5-80.5	9	70.5	634.5	44732.25
80.5-100.5	20	90.5	1810	163805
100.5-120.5	25	110.5	2762.5	305256.25
120.5-140.5	37	130.5	4828.5	630119.25
140.5-160.5	8	150.5	1204	181202
Total	99		11239.5	1325114.75

We can use the above formula to compute (approximate) variance and the standard deviation of the birth weight.

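So, the mean is x̄ = 11239.5/99 ≈ 113.53 and the variance is

s² = (1325114.75 - 99 × x̄²)/(99 - 1) ≈ 500.93,

so the standard deviation is s ≈ 22.38. (If the mean is first rounded to 113.53 and that rounded value is squared, the formula gives 500.997 instead. The simplified formula is sensitive to rounding, so keep full precision for x̄.)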
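The grouped computation can be sketched the same way. Each class mark is computed as the average of the class limits, and the simplified formula is then applied. The last value shows how rounding the mean to 113.53 before squaring changes the result.

```python
import math

# Classes (lower and upper limits) and frequencies from Example 2.3.3.
classes = [(60.5, 80.5), (80.5, 100.5), (100.5, 120.5), (120.5, 140.5), (140.5, 160.5)]
freqs = [9, 20, 25, 37, 8]

marks = [(lo + hi) / 2 for lo, hi in classes]   # class mark = average of the class limits
n = sum(freqs)
sum_fx = sum(f * x for x, f in zip(marks, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))

x_bar = sum_fx / n
s2 = (sum_fx2 - n * x_bar ** 2) / (n - 1)                        # full-precision mean
s2_rounded = (sum_fx2 - n * round(x_bar, 2) ** 2) / (n - 1)      # mean rounded to 113.53 first
print(round(x_bar, 2), round(s2, 2), round(math.sqrt(s2), 2), round(s2_rounded, 3))
# 113.53 500.93 22.38 500.997
```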

Remarks.

Note that using the class marks in the formulas above gives only an approximate mean and variance. If you compute them from the original data instead, you may notice a difference.
Because of the availability of computers, the importance of such approximations has declined.

Comment: We have had detailed discussions of various formulas for defining the mean, variance, and other constants. It is important to understand these concepts and formulas.

It is equally important to appreciate the value and necessity of using calculators or other available software (like Excel). It is almost impossible (and unnecessary) to compute these constants manually and correctly, unless one is specially gifted with numerical computations.

Use of Calculators (TI-83):
Computing the variance and standard deviation: Follow the same steps used for computing the mean (using either the raw data or the frequency table). The calculator will display a list of numbers; Sx is the sample standard deviation. The variance is the square of the standard deviation.
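If a TI-83 is not at hand, a similar summary can be produced with Python's standard `statistics` module. The helper below is a hypothetical analogue of 1-Var Stats, not a description of the calculator's internals. When a frequency table is given, it first expands the table into raw data. The TI-83 also reports σx, which divides by n instead of n - 1; `statistics.pstdev` plays that role here.

```python
import statistics

def one_var_stats(values, freqs=None):
    """Hypothetical analogue of the TI-83 "1-Var Stats" summary.

    If freqs is given, values[i] is treated as occurring freqs[i] times.
    """
    if freqs is not None:
        # Expand the frequency table into raw data: x_i repeated f_i times.
        values = [x for x, f in zip(values, freqs) for _ in range(f)]
    return {
        "n": len(values),
        "mean": statistics.mean(values),          # x-bar
        "Sx": statistics.stdev(values),           # sample standard deviation (n - 1)
        "variance": statistics.variance(values),  # Sx squared
        "sigma_x": statistics.pstdev(values),     # divides by n instead of n - 1
    }

times = [46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56]
freqs = [1, 1, 3, 3, 4, 6, 4, 5, 5, 2, 1]
stats = one_var_stats(times, freqs)
print(stats["n"], stats["mean"], round(stats["Sx"], 3))   # 35 51.4 2.452
```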

Problems on 2.3: Variance, Standard Deviation, and Use of the Frequency Table

Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

138	142	127	137	148	130	142	133

Find the variance and standard deviation of the price.

Exercise 2.3.2. The following figures refer to the GPA of six students.

3.0	3.3	3.1	3.0	3.1

Find the variance and standard deviation of GPA.

Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs.

138	952	980	967	992	197	215	157

Find the variance and standard deviation of the lifetime of these bulbs.

Exercise 2.3.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds)	Frequency
15.6	3
15.7	6
15.8	5
15.9	6
16.0	9
16.1	3
Total	32

Compute the variance and standard deviation of time taken by the athlete.

Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94	105	124	110	119	137	96	110	120	115	119
104	135	123	129	72	121	117	96	107	80	80
96	123	124	124	134	78	138	106	130	97	134
111	133	128	96	126	124	125	127	62	127	96
116	118	126	94	127	121	117	124	93	135	112
120	125	120	147	138	72	119	89	81	113	100
109	127	138	122	110	113	100	115	110	135	120
97	127	120	110	107	111	126	132	120	108	148
133	103	92	124	150	86	121	98

Compute the variance and standard deviation of the weight, at birth, of these babies.

Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7	11	7	11	10	9	10	10	12	13
7	8	11	11	14	9	7	9	11	7
9	13	12	14	7	8	7	14	15	9
9	7	11	9	12	9	12	11	14	9
12	13	7	9	10	14	11	12	13	7
15	15	16	16	15	16	11	7	18	19
15	16	15	15	16	16	17	16	16	13
15	15	16	15	16	15	15	17	16	12
16	15	15	16	15	15	19	8	16	17
16	16	15	16	16	16	13	12	8

Compute the variance and standard deviation of the hourly wages.

Exercise 2.3.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos	156	158	159	160	162
Frequency	6	4	5	6	9

Find the mean, variance, and standard deviation of the number of typos in a book.

Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18	18.5	19	18.5	19	21	18	19	20	20.5
19	19	21.5	19.5	20	17	20	20	19	20.5
18	18.5	20	19.5	20.75	20	21	18	20.5	20
21	19	20.5	19	20	19.5	17.75	20	19.5	20
20.5	17	21	18.5	20	20	20	18.5	19.5	19
18	20.5	18	20	19	19	19.5	20	20.75	21
17.75	19	18	19	20	18.5	20	19	21	19
19.5	20	20	19	19.5	20	19.5	18.5	20.5	19.5
20.25	20	19.5	19.5	20	20	20	21	20	19
18.5	20.5	21.5	18	19.5	18

Compute the variance and standard deviation of the length, at birth, of these babies.

Exercise 2.3.9. The following is the frequency table of the weight (in pounds) of some salmon in a river.

Weight x	31	32	33	34	35	36	37
Frequency f	3	2	4	5	6	5	9

Find the variance and the standard deviation.

Exercise 2.3.10. The following data represents the time (in minutes) taken by students to drive to campus.

23	17	19	24	42	33	20	22	15	9
26	37	29	19	35	18	30	21	11	23
13	27	32	32	23	35	25	33	24	23

Find the mean, variance, and the standard deviation of the data.