Math 105: Topics in Mathematics

Lesson 2 : Measures of Central Tendency and Dispersion

Due Date: See the Lecture Notes Site.

Introduction

In this lesson we define various numerical measures (constants) for data sets. These numerical measures summarize and describe the data. The average value of the data is a common example. There are two broad classes of such numerical measures that are computed from the data:

  1. measures of central tendency and
  2. measures of dispersion.

A measure of central tendency represents an "average value." The mean, median, and mode (which you may already know) are measures of central tendency. A measure of dispersion is a measure of how widely the data is scattered.

2.1 Measure of Central Tendency: Mean

The most common measure of central tendency is the mean, or arithmetic mean.

Definition. The mean or the arithmetic mean of a set of data is given by

$$\text{mean} = \frac{\text{sum of all the data values}}{\text{size of the data}}.$$

If we denote a data value (i.e., the variable) by x and if n is the size of the data, then the above formula is written as


$$\text{mean} = \bar{x} = \frac{\sum x}{n}$$

where $\Sigma$ denotes summation. If the data represents a sample, then the mean is called the sample mean. Again, if x denotes the variable, the data is sometimes denoted by $x_1, x_2, \ldots, x_n$, and with this notation the formula for the mean is written as

$$\text{mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

If you have not seen the notation $\Sigma$ before, it simply means summation. For example,

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$


The Frequency Table and the Mean

When the frequency table of a data set is given, we can use it to compute the mean of the original data. Let us consider the following example:

Example 2.1.1. To estimate the mean time taken by a race car to complete a three-mile drive, the car ran several time trials. The following are the sample times (in seconds) taken to complete the laps:

50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50  

Following is the frequency distribution of this data:

Time (in seconds) 46 47 48 49 50 51 52 53 54 55 56
Frequency 1 1 3 3 4 6 4 5 5 2 1

To compute the mean time of the original data, we add all the data values and divide by the data size, 35. The frequency distribution tells us that, in the data, 46 was present 1 time, 47 was present 1 time, 48 was present 3 times, and so on. So, using the frequency distribution, we compute the mean as follows:

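$$\text{mean} = \bar{x} = \frac{46\cdot 1 + 47\cdot 1 + 48\cdot 3 + 49\cdot 3 + 50\cdot 4 + 51\cdot 6 + 52\cdot 4 + 53\cdot 5 + 54\cdot 5 + 55\cdot 2 + 56\cdot 1}{1+1+3+3+4+6+4+5+5+2+1} = \frac{1799}{35} = 51.4$$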

More generally, when we compute the mean using the frequency table, the formula for the mean would be

$$\text{mean} = \bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots}{f_1 + f_2 + \cdots} = \frac{\text{sum of (x-value times its frequency)}}{\text{sum of frequencies}}$$

where $f_i$ is the frequency of $x_i$.
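The two ways of computing the mean can be checked against each other with a short Python sketch. It uses the lap times and the frequency table from Example 2.1.1; the variable names are illustrative only.

```python
# Lap times (in seconds) from Example 2.1.1.
times = [
    50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
    51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
    55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50,
]

# Frequency table from the same example: value -> frequency.
freq = {46: 1, 47: 1, 48: 3, 49: 3, 50: 4, 51: 6,
        52: 4, 53: 5, 54: 5, 55: 2, 56: 1}

# Mean from the raw data: sum of the values divided by the data size.
mean_raw = sum(times) / len(times)

# Mean from the frequency table: sum of x*f divided by the sum of f.
mean_freq = sum(x * f for x, f in freq.items()) / sum(freq.values())

print(mean_raw, mean_freq)
```

Both computations divide the same total, 1799, by the same count, 35, so they agree.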


Properties of the Mean

  1. Effect of translation. Let $\bar{x}$ be the mean of $x_1, x_2, \ldots, x_n$. Then the mean of $y_1 = x_1 + d,\ y_2 = x_2 + d,\ \ldots,\ y_n = x_n + d$ is given by

    $$\bar{y} = \bar{x} + d$$


  2. Effect of multiplication by a constant. Let $\bar{x}$ be the mean of $x_1, \ldots, x_n$. Then the mean of

    $$z_1 = c x_1,\ z_2 = c x_2,\ \ldots,\ z_n = c x_n$$

    is given by

    $$\bar{z} = c\bar{x}$$

    Examples of this are changes of units: (1) Inches to centimeters, where the conversion multiplier is c = 2.54. (2) Feet to inches, where c = 12. (3) U.S. dollars to euros, where c is the exchange rate.

Example (effect of translation): Your teacher tells you that the mean score for the midterm in your class is 73. After you complained and requested a change, he agreed to add 7 points to everyone's score. The new mean score is (old mean + 7) = 73 + 7 = 80. This is what we mean by "effect of translation."

Example (effect of multiplication by c): Suppose you have some data x1, x2, ..., xn on salaries in an industry in the United States and the mean is 37,000 U.S. dollars. On a certain day (March 21, 2011), 1 U.S. dollar = 0.976469 Canadian dollars, so c = 0.976469. In Canadian dollars the mean is therefore 37000 × c = 37000 × 0.976469 ≈ 36129.35 Canadian dollars.

Similarly, any change of units (inches to feet or centimeters, minutes to seconds) is a "multiplication by a constant c."
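Both properties can be verified numerically. The following Python sketch uses hypothetical scores and lengths (they are not data from this lesson) chosen so that the score mean is 73, as in the translation example.

```python
import math


def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical midterm scores whose mean is 73.
scores = [61, 70, 73, 78, 83]
d = 7                                  # points added to every score
shifted = [x + d for x in scores]      # translation: y_i = x_i + d

# Hypothetical lengths in feet, converted to inches.
lengths_ft = [1.5, 2.0, 4.0]
c = 12                                 # feet -> inches
lengths_in = [c * x for x in lengths_ft]   # scaling: z_i = c * x_i

print(mean(scores), mean(shifted))         # second mean is the first plus d
print(mean(lengths_ft), mean(lengths_in))  # second mean is c times the first
```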

2.2 Other Measures of Central Tendency: Median and Mode

The Median

The median represents the middle value of the data. Half the data will be less than or equal to the median, and half the data will be greater than or equal to the median. You are above the median American income if at least half the American population is making less than you make.

Definition. Suppose the data is arranged in an increasing order (i.e., in an array). If the size of data is ODD then the median is the middle value. If it is EVEN, then the median is the mean of the middle two values.
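The definition translates directly into a few lines of Python. This is a minimal sketch; the function name and the two small data sets are hypothetical.

```python
def median(values):
    """Median as defined above: sort, then take the middle value
    (odd size) or the mean of the two middle values (even size)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))      # odd size: the middle value of 1, 2, 3
print(median([4, 1, 3, 2]))   # even size: the mean of 2 and 3
```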


The Percentiles

Definition. For a number p between 0 and 100, the pth percentile $x_p$ of the data is a number such that at least p percent of the data values are less than or equal to $x_p$ and at least (100 - p) percent of the data values are greater than or equal to $x_p$.

  1. The 25th percentile is called the first quartile Q1.
  2. The median is the 50th percentile, also called the second quartile Q2.
  3. The 75th percentile is called the third quartile Q3.
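As an illustration of the definition, the following Python sketch tests whether a candidate number qualifies as a pth percentile. It assumes that "below" and "above" are read as "less than or equal to" and "greater than or equal to"; the function name and the sample data are hypothetical.

```python
def satisfies_percentile(values, p, candidate):
    """Return True if at least p percent of the values are <= candidate
    and at least (100 - p) percent of the values are >= candidate."""
    n = len(values)
    at_or_below = sum(1 for v in values if v <= candidate)
    at_or_above = sum(1 for v in values if v >= candidate)
    # Compare counts scaled by 100 to avoid fractional percentages.
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n


data = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical data
print(satisfies_percentile(data, 50, 4.5))   # the median qualifies
print(satisfies_percentile(data, 25, 2))     # a valid first quartile
print(satisfies_percentile(data, 25, 5))     # too large to be Q1
```

Under this definition more than one number can qualify as a given percentile, which is one reason calculators and software may report slightly different quartiles for the same data.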

The Mode

There is one other measure of central tendency that should be mentioned.

Definition. The MODE of the data is the value or values that have the highest frequency. For example, the mode of the set {1, 3, 5, 5, 7} is {5} because it has the highest frequency. The mode of {1, 1, 3, 5, 5, 7} is {1, 5} because 1 and 5 both have the highest frequency. Such a set is said to be bimodal.
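A short Python sketch of this definition follows, applied to the two sets above. The function name is hypothetical; it returns every value that attains the highest frequency, so a bimodal set yields two values.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 3, 5, 5, 7]))      # one mode
print(modes([1, 1, 3, 5, 5, 7]))   # bimodal
```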


Use of Calculators (TI-84):
Entering your data
  1. Press the button stat.
  2. Select "Edit" in the Edit menu and enter.
  3. You will find six lists named L1, L2, L3, L4, L5, L6.
  4. Let's say you want to enter your data in L1.
  5. If L1 has some data, clear it by pressing the stat button and selecting ClrList in the Edit menu.
  6. Once L1 is cleared, select Edit in the Edit menu and enter.
  7. Now type in your data and enter one by one.
Sorting data and computing the median
  1. Enter your data in a list, say L1.
  2. Select SortA in the Edit menu and enter.
  3. The calculator will ask for the list. Type in the list (L1), close the parentheses, and enter.
  4. The calculator will say Done.
  5. Press stat, select edit in the Edit menu, and enter.
  6. You will see that your data in L1 has been sorted in an increasing order.
  7. If the data size is odd, the median is the middle value.
    If the data size is even, the median is the average of the middle two values.
Computing the mean if only raw data is given
  1. Enter your data in a list, say L1.
  2. Select "1-Var Stats" in the CALC menu and enter.
  3. The calculator will ask for the list. Type in the list L1 and enter.
  4. The calculator will give a list of numbers; x-bar is the mean x.
Computing the mean if the frequency table is given
  1. Enter the frequency table in the calculator, say, x-values in L1 and frequencies in L2.
  2. Select "1-Var Stats" in the CALC menu and enter.
  3. The calculator will ask for the lists. Type in the list L1, L2 and enter.
  4. The calculator will give a list of numbers; x-bar is the mean x.
Computing the median
Follow the same steps as for computing the mean and scroll down the list of numbers; Med is the median.

Problems on 2.2: Mean and Median

Exercise 2.2.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

138 142 127 137 148 130 142 133

Find the median price and mean price observed by the trader.
Solution: Use TI-84.



Exercise 2.2.2.
The following figures refer to the GPA of six students.

3.0 3.3 3.1 3.0 3.1 3.1

Find the median and mean GPA.


Exercise 2.2.3. The following data give the lifetime (in days) of light bulbs.

138 952 980 967 992 197 215 157

Find the mean and median lifetime of these bulbs.
Solution: Use TI-84.


Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds) Frequency
26 3
27 6
28 5
29 6
30 9
31 3
Total 32

Compute the mean and median time taken by the athlete.
Solution: Use TI-84.


Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119
104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98      

Compute the mean and median weight, at birth, of the babies.
Solution: Use TI-84.


Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8  

Compute the mean and median hourly wage.
Solution: Use TI-84.


Exercise 2.2.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos 156 158 159 160 162
Frequency 6 4 5 6 9

Find the mean and median number of typos in a book.
Solution: Use TI-84.


Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5
19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18        

Compute the mean and median length, at birth, of these babies.
Solution: Use TI-84.


2.3 Measures of Dispersion

The measures of central tendency (mean, median, mode) represent the middle values of the data set. To understand the distribution of the data better, we are also interested in its variability. Two data sets may have the same mean and median yet be spread out differently. The following is an example.

Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of the semester:

Section A 81 84 83 80 82
Section B 72 93 92 82 71

Both sections have the same mean, 82, and the same median, 82.
But the data sets are dispersed differently. In Section A, everybody will get a B grade. In Section B, we will have two C's, one B, and two A's.

The measure of dispersion is a measure of how widely the data is scattered around. In section A, the data has a very small dispersion or variability, whereas section B has a large dispersion.

A very simple measure of dispersion is the range of the data, defined as:

range = largest value - smallest value.


Mean Deviation, Sample Variance, and Standard Deviation

We will discuss three more measures of dispersion.

Suppose we have a data set $x_1, x_2, \ldots, x_n$ of size n. We will denote the mean of the data by $\bar{x}$. Three definitions follow:

Definition. The mean deviation of the data is defined as follows.

$$\text{mean deviation} = \frac{|x_1 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n}$$

So, the mean deviation is the mean of the absolute deviations $|x_i - \bar{x}|$ from the mean.


Definition. The sample variance $s^2$ of the data is defined as follows:

$$s^2 = \frac{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

Remark.

  1. Note that we denote the sample variance as the square of a number s.
  2. Also note that we divide by n-1, not by n. Dividing by n-1 makes the sample variance an unbiased estimate of the variance of the population from which the sample was drawn.
  3. We would like our measure of dispersion to have the same units as our data, but our formula involves the squares $(x_i - \bar{x})^2$. Therefore, the unit of the variance, $s^2$, is the square of the unit of the data. If the data is in feet, the variance is in square feet. To solve this problem we define another measure of dispersion, the standard deviation, denoted s.


Definition. The sample standard deviation s is defined as the square root of the sample variance $s^2$. So, to compute the sample standard deviation, we have to compute the sample variance first.


If we simplify the definition of sample variance we get the following formula:

$$s^2 = \frac{(x_1^2 + x_2^2 + \cdots + x_n^2) - n\bar{x}^2}{n - 1}$$
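The simplification step is not written out here. One way to see it, with $\bar{x}$ denoting the mean, is to expand the square and use $\sum_{i=1}^{n} x_i = n\bar{x}$:

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\,(n\bar{x}) + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{aligned}
```

Dividing both sides by n - 1 gives the simplified formula for the sample variance.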

We do some computations with the above example 2.3.1.

The mean deviation for Section A is (1+2+1+2+0)/5 = 6/5 = 1.2, and the mean deviation for Section B is (10+11+10+0+11)/5 = 42/5 = 8.4. The scores in Section B are clearly more variable, and the larger mean deviation reflects this.

Let us compute the sample variances:

For section A the sample variance is

$$\frac{(81-82)^2 + (84-82)^2 + (83-82)^2 + (80-82)^2 + (82-82)^2}{5-1} = \frac{1+4+1+4+0}{4} = \frac{10}{4} = 2.5.$$

For section B the sample variance is

$$\frac{(72-82)^2 + (93-82)^2 + (92-82)^2 + (82-82)^2 + (71-82)^2}{5-1} = \frac{100+121+100+0+121}{4} = \frac{442}{4} = 110.5.$$
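The computations for Example 2.3.1 can be reproduced with a short Python sketch. The function names are illustrative; the formulas are the definitions of the range, mean deviation, sample variance, and sample standard deviation given in this section.

```python
import math


def mean(values):
    return sum(values) / len(values)


def mean_deviation(values):
    """Mean of the absolute deviations from the mean (divide by n)."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]

for name, scores in (("A", section_a), ("B", section_b)):
    var = sample_variance(scores)
    print(name,
          max(scores) - min(scores),   # range
          mean_deviation(scores),      # mean deviation
          var,                         # sample variance
          math.sqrt(var))              # sample standard deviation
```

Every measure is larger for Section B, in line with the wider spread of its scores.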


Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data members are EQUAL!


Use of the Frequency Table

When a frequency table is given, we can use new formulas to compute the mean and variance of the data.


Formulas. Suppose the data consisting of n observations is given in a frequency table (ungrouped). Let xi denote the values and fi be the frequency of xi. Then

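  1. the mean =

    $$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n},$$

  2. the variance =

    $$s^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1},$$

  3. A simplified formula for variance is

    $$s^2 = \frac{1}{n - 1}\left[\sum \left(f_i x_i^2\right) - n\bar{x}^2\right].$$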

  4. If the data is given in a class frequency table of the grouped data, we use the same formula, with xi as the class mark, which is the average of the class limits. In this case, we only get an estimate of the variance of the original data.

Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car (example 2.1.1) to compute mean and variance using the above formulas.

Time (x)   Frequency (f)   fx   fx^2
46 1 46 2116
47 1 47 2209
48 3 144 6912
49 3 147 7203
50 4 200 10000
51 6 306 15606
52 4 208 10816
53 5 265 14045
54 5 270 14580
55 2 110 6050
56 1 56 3136
Total 35 1799 92673


So, the mean is $\bar{x} = 1799/35 = 51.4$ and the variance is
$$s^2 = \frac{92673 - 35 \times 51.4^2}{35 - 1} = 6.0118.$$
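The table totals and the variance in Example 2.3.2 can be recomputed with a few lines of Python. This sketch evaluates the variance both from the definition and from the simplified formula, to show that the two agree; the variable names are illustrative.

```python
# Frequency table of lap times from Example 2.1.1: value -> frequency.
freq = {46: 1, 47: 1, 48: 3, 49: 3, 50: 4, 51: 6,
        52: 4, 53: 5, 54: 5, 55: 2, 56: 1}

n = sum(freq.values())                              # total frequency
sum_fx = sum(f * x for x, f in freq.items())        # column total of fx
sum_fx2 = sum(f * x * x for x, f in freq.items())   # column total of fx^2

mean = sum_fx / n

# Variance from the definition: sum of f * (x - mean)^2 over n - 1.
var_definition = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)

# Variance from the simplified formula.
var_shortcut = (sum_fx2 - n * mean ** 2) / (n - 1)

print(n, sum_fx, sum_fx2)
print(round(mean, 1), round(var_definition, 4), round(var_shortcut, 4))
```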

Remark: When computing power was not as abundant as it is now (only 20 or 30 years ago), we used such tables to compute the mean and variance. Such methods are now out of date; we use a TI-84 or other tools instead.

Example 2.3.3. Following is the class frequency distribution of the data on birth weight of some babies (exercise 1.2, Lesson 1):

Classes   Frequency (f)   Class Mark (x)   fx   fx^2
60.5-80.5 9 70.5 634.5 44732.25
80.5-100.5 20 90.5 1810 163805
100.5-120.5 25 110.5 2762.5 305256.25
120.5-140.5 37 130.5 4828.5 630119.25
140.5-160.5 8 150.5 1204 181202
Total 99   11239.5 1325114.75

We can use the above formula to compute (approximate) variance and the standard deviation of the birth weight.

So, the mean is $\bar{x} = 11239.5/99 = 113.53$ and the variance is

$$s^2 = \frac{1325114.75 - 99 \times 113.53^2}{99 - 1} = 500.997.$$
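The grouped-data estimate in Example 2.3.3 can be sketched in Python as follows. Each class mark is computed as the average of the two class limits shown in the table; the variable names are illustrative.

```python
# Classes from Example 2.3.3: (lower limit, upper limit, frequency).
classes = [
    (60.5, 80.5, 9),
    (80.5, 100.5, 20),
    (100.5, 120.5, 25),
    (120.5, 140.5, 37),
    (140.5, 160.5, 8),
]

n = sum(f for _, _, f in classes)

# Class mark = average of the class limits, paired with its frequency.
marks = [((lo + hi) / 2, f) for lo, hi, f in classes]

sum_fx = sum(f * x for x, f in marks)
sum_fx2 = sum(f * x * x for x, f in marks)

mean = sum_fx / n
variance = (sum_fx2 - n * mean ** 2) / (n - 1)   # simplified formula
std_dev = variance ** 0.5

print(n, sum_fx, sum_fx2)
print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

Because this sketch keeps the unrounded mean, it gives a variance of about 500.93 rather than 500.997; the difference comes from rounding the mean to 113.53 before squaring. Either value is only an estimate of the variance of the original data.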

Remarks.

  1. Note that we get only an approximate mean and variance when we use the class marks with the above formulas. If you use the original data, you will notice a difference.
  2. As mentioned above, because of the availability of computers, the importance of such approximations has declined.

Comment: We have had detailed discussions of various formulas for defining the mean, variance, and other constants. It is important to understand these concepts and formulas.

It is equally important to appreciate the value and necessity of using calculators or other available software (like Excel). It is almost impossible (and unnecessary) to compute these constants manually and correctly, unless one is specially gifted with numerical computations.

Use of Calculators (TI-84):
Computing the variance and standard deviation
  1. Follow the same steps used for computing the mean (using either raw data or the frequency table).
  2. The calculator will give a list of numbers; Sx is the sample standard deviation.
  3. The variance is the square of the standard deviation.


Problems on 2.3: Variance, Standard Deviation, and Use of the Frequency Table

Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

138 142 127 137 148 130 142 133

Find the variance and standard deviation of the price.
Solution: Use TI-84.


Exercise 2.3.2. The following figures refer to the GPA of six students.

3.0 3.3 3.1 3.0 3.1 3.1

Find the variance and standard deviation of GPA.


Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs.

138 952 980 967 992 197 215 157

Find the variance and standard deviation of the lifetime of these bulbs.
Solution: Use TI-84.


Exercise 2.3.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds) Frequency
26 3
27 6
28 5
29 6
30 9
31 3
Total 32

Compute the variance and standard deviation of time taken by the athlete.
Solution: Use TI-84.
Also see Solution to understand the use of formula.


Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119
104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98      

Compute the variance and standard deviation of the weight, at birth, of these babies.
Solution: Use TI-84.
Also see Solution to understand the use of formula.


Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8  

Compute the variance and standard deviation of the hourly wages.
Solution: Use TI-84.


Exercise 2.3.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos 156 158 159 160 162
Frequency 6 4 5 6 9

Find the mean number, variance, and standard deviation of typos in a book.
Solution: Use TI-84.


Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5
19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18        

Compute the variance and standard deviation of the length, at birth, of these babies.
Solution: Use TI-84.


Exercise 2.3.9. The following is the frequency table of weight (in pounds) of some salmon in a river.

Weight x 31 32 33 34 35 36 37
Frequency f 3 2 4 5 6 5 9

Find the variance and the standard deviation.
Solution: Use TI-84.
Also see Solution


Exercise 2.3.10. The following data represents the time (in minutes) taken by students to drive to campus.

23 17 19 24 42 33 20 22 15 9
26 37 29 19 35 18 30 21 11 23
13 27 32 32 23 35 25 33 24 23

Find the mean, variance, and the standard deviation of the data.
Solution: Use TI-84.