Math 105, Topics in Mathematics

Lesson 3: Measure of Locations of Data

Introduction

The last lesson was mostly about graphical representations of data.
This lesson is about numerical representations of data.
We will compute some constants from the data.
We discuss two types of constants that we compute from the data:

  1. measure of central tendency and
  2. measure of dispersion.

A measure of central tendency of a data set represents an "average value." The mean, median, and mode (if you already know them) are measures of central tendency. A measure of dispersion is a measure of how widely the data are scattered.

3.1 Measure of Central tendency or Location

The most important measure of central tendency is the mean, or arithmetic mean.

Definition. The mean, or arithmetic mean, of a data set is given by

$$\text{mean} = \frac{\text{sum of all the data values}}{\text{size of the data}}.$$

If the data values are $x_1, x_2, \ldots, x_n$, where $n$ is the size of the data, then the above formula is written as

$$\text{mean} = \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$
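The formula translates directly into code. The following is a minimal Python sketch that is not part of the original lesson. The function name `mean` is our own, and the sample values are the Section A scores from Example 3.2.1 later in this lesson.

```python
def mean(values):
    """Arithmetic mean: the sum of all data values divided by the data size n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


scores = [81, 84, 83, 80, 82]  # Section A scores from Example 3.2.1
print(mean(scores))  # 410 / 5 = 82.0
```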

If the data form a sample, then the mean is also called the sample mean.
Similarly, if the data are population data, then the mean is also called the population mean.

The Frequency Table and the Mean

When the frequency table of a data set is given, we can use the frequency table to compute the mean of the original data. Consider the following example:

Example 3.1.1 To estimate the mean time taken to complete a three-mile drive by a racecar, the racecar did several time trials, and the following sample of times taken (in seconds) to complete the laps was collected:

50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50  

Now we want to compute the mean time. To do so, we add all the data values and divide by the data size 35.
We computed the frequency distribution of this data, as follows:

Time (in seconds):  46  47  48  49  50  51  52  53  54  55  56
Frequency:           1   1   3   3   4   6   4   5   5   2   1

The frequency distribution tells us that, in the data, 46 was present one time, 47 was present one time, 48 was present three times, and so on. So, if we use the frequency distribution, we compute the mean as follows:

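$$\bar{x} = \frac{46 \times 1 + 47 \times 1 + 48 \times 3 + 49 \times 3 + 50 \times 4 + 51 \times 6 + 52 \times 4 + 53 \times 5 + 54 \times 5 + 55 \times 2 + 56 \times 1}{1 + 1 + 3 + 3 + 4 + 6 + 4 + 5 + 5 + 2 + 1} = \frac{1799}{35} = 51.4$$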

This example shows that, to compute the mean of the original data, we can use the frequency table. A new formula for the mean is

$$\text{mean} = \bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots}{f_1 + f_2 + \cdots} = \frac{\text{sum of } (x\text{-value} \times \text{frequency})}{\text{sum of frequencies}}$$

where $f_i$ is the frequency of $x_i$.
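To check Example 3.1.1, the sketch below builds the frequency table from the 35 lap times with `collections.Counter` and then applies the formula above. It is our own Python illustration, and the helper name `mean_from_frequency_table` is hypothetical. It gives the same value as averaging the raw data directly.

```python
from collections import Counter

# Lap times (in seconds) from Example 3.1.1
times = [50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
         51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
         55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50]


def mean_from_frequency_table(table):
    """Mean = (sum of x-value times frequency) / (sum of frequencies).

    `table` maps each distinct value x_i to its frequency f_i."""
    weighted_sum = sum(x * f for x, f in table.items())
    total_frequency = sum(table.values())
    return weighted_sum / total_frequency


table = Counter(times)                   # value -> frequency, e.g. table[51] == 6
print(sorted(table.items()))             # the frequency distribution shown above
print(mean_from_frequency_table(table))  # 1799 / 35 = 51.4
print(sum(times) / len(times))           # raw-data mean, same value
```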

Properties of the Mean:

  1. Effect of translation: Let $\bar{x}$ be the mean of $x_1, x_2, \ldots, x_n$. Then the mean of $y_1 = x_1 + d, y_2 = x_2 + d, \ldots, y_n = x_n + d$ is given by

    $$\bar{y} = \bar{x} + d.$$

  2. Effect of multiplication by a constant: Let $\bar{x}$ be the mean of $x_1, \ldots, x_n$. Then the mean of $z_1 = cx_1, z_2 = cx_2, \ldots, z_n = cx_n$ is given by

    $$\bar{z} = c\bar{x}.$$
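Both properties are easy to check numerically. In this Python sketch the data list is arbitrary. The shift d = 7 and the factor c = 0.95 are borrowed from Examples 3.1.2 and 3.1.3 that follow. Floating-point results are compared with `math.isclose` instead of `==`, because multiplying by 0.95 introduces small rounding errors.

```python
import math


def mean(values):
    return sum(values) / len(values)


x = [81, 84, 83, 80, 82]   # any data set works; these values are illustrative
d, c = 7, 0.95             # shift (as in Example 3.1.2) and scale factor (as in Example 3.1.3)

y = [xi + d for xi in x]   # translated data: y_i = x_i + d
z = [c * xi for xi in x]   # rescaled data:   z_i = c * x_i

print(mean(y), mean(x) + d)   # both 89.0
print(mean(z), c * mean(x))   # equal up to floating-point rounding

assert math.isclose(mean(y), mean(x) + d)   # property 1
assert math.isclose(mean(z), c * mean(x))   # property 2
```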


Example 3.1.2. Suppose the mean score in the class on a test is 78 points. Then the teacher announces that everyone can add 7 points to their score. What is the new mean score?
Answer: By the first property of the mean (translation by d = 7), the new mean score is 78 + 7 = 85.

Example 3.1.3. Suppose it is known that the average income in Canada is 35,000 Canadian dollars. What is the average Canadian income in U.S. dollars?
Answer: My online search reveals that, as of today (Feb 25, 2010), one Canadian dollar = c = $0.95 (U.S.). So, by property 2, the average Canadian income in U.S. dollars is 35,000 × 0.95 = 33,250 U.S. dollars.

The Median

The median represents the middle value of the data. At least half of the data values are less than or equal to the median, and at least half are greater than or equal to the median. For example, your income is above the median American income if more than half of the American population makes less than you do.

Definition. Suppose the data are arranged in increasing order (i.e., in an array). If the data size is odd, then the median is the middle value. If the data size is even, then the median is the mean of the middle two values.
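The definition is a short procedure: sort the data, then pick the middle value or average the middle two values. Here is a minimal Python sketch of it. The function name `median` is our own, and the small data sets are illustrative.

```python
def median(values):
    """Median as defined above: sort, then take the middle value (odd n)
    or the mean of the two middle values (even n)."""
    data = sorted(values)          # arrange in increasing order
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                 # odd size: the single middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2   # even size: mean of the middle two


print(median([81, 84, 83, 80, 82]))  # odd n = 5: sorted middle value is 82
print(median([3, 1, 4, 2]))          # even n = 4: (2 + 3) / 2 = 2.5
```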

The Percentiles:

Definition. For a number p between 0 and 100, the pth percentile $x_p$ of the data is a number such that at least p percent of the data values are less than or equal to $x_p$, and at least 100 - p percent of the data values are greater than or equal to $x_p$.

  1. The 25th percentile is called the first quartile Q1. So at least one-quarter of the data values are less than or equal to Q1, and at least three-quarters of the data values are greater than or equal to Q1.
  2. The median is the 50th percentile, also called the second quartile Q2. So at least two-quarters (i.e., half) of the data values are less than or equal to Q2, and at least two-quarters (i.e., half) of the data values are greater than or equal to Q2.
  3. The 75th percentile is called the third quartile Q3. So at least three-quarters of the data values are less than or equal to Q3, and at least one-quarter of the data values are greater than or equal to Q3.
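The definition does not single out a unique value, and different textbooks and software compute percentiles by different rules. The sketch below does two things with that caveat in mind. The function `satisfies_percentile` checks whether a candidate value meets the definition above. The function `quartiles` computes Q1, Q2, and Q3 by one common convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when n is odd. All function names are our own, and the convention is an assumption, not the only correct choice. The data are the lap times from Example 3.1.1.

```python
# Lap times (in seconds) from Example 3.1.1
times = [50, 48, 49, 46, 54, 53, 52, 51, 47, 56, 52, 51,
         51, 53, 50, 49, 48, 54, 53, 51, 52, 54, 54, 53,
         55, 48, 51, 50, 52, 49, 51, 53, 55, 54, 50]


def median_of_sorted(data):
    """Median of data that is already in increasing order."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def quartiles(values):
    """(Q1, Q2, Q3) by the 'median of each half' convention (an assumption).

    The middle value is left out of both halves when n is odd."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]            # values before the middle position
    upper = data[half + n % 2:]    # values after the middle position
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)


def satisfies_percentile(values, p, xp):
    """True if at least p% of values are <= xp and at least (100 - p)% are >= xp."""
    n = len(values)
    at_or_below = sum(1 for v in values if v <= xp)
    at_or_above = sum(1 for v in values if v >= xp)
    # Compare 100 * count with p * n to avoid converting counts to percentages.
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n


q1, q2, q3 = quartiles(times)
print(q1, q2, q3)  # 50 51 53
for p, q in [(25, q1), (50, q2), (75, q3)]:
    print(p, q, satisfies_percentile(times, p, q))  # True for each
```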

There is one more measure of central tendency.

Definition. The MODE of the data is the value that appears the maximum number of times in the data.
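To find the mode, count how often each value occurs and keep the most frequent one. The definition speaks of "the value," but several values can tie for the highest count. In this minimal Python sketch we assume ties should all be reported, so it returns a sorted list. The function name `modes` is our own. Python's standard library also offers `statistics.multimode` for the same purpose.

```python
from collections import Counter


def modes(values):
    """Return every value that appears the maximum number of times (sorted)."""
    counts = Counter(values)          # value -> number of occurrences
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    top = max(counts.values())        # the maximum frequency
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))        # [2]
print(modes([2, 3, 3, 5, 5, 7]))  # [3, 5]  (two values tie)
```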

Use of Calculators (TI-83)
Enter Your Data:
  1. Press the STAT button.
  2. Select "Edit" in the EDIT menu and press ENTER.
  3. You will find six lists named L1, L2, L3, L4, L5, L6.
  4. Say you want to enter your data in L1.
  5. If L1 already contains data, clear it: press STAT, select ClrList in the EDIT menu, type L1, and press ENTER.
  6. Once L1 is cleared, press STAT, select Edit in the EDIT menu, and press ENTER.
  7. Now type in your data values, pressing ENTER after each one.
Use of Calculators (TI-83)
Sort Data and Compute the Median:
  1. Enter your data in a list, say L1.
  2. Press STAT, select SortA in the EDIT menu, and press ENTER.
  3. The calculator asks for the list. Type in the list (L1), close the parenthesis, and press ENTER.
  4. The calculator will display Done.
  5. Press STAT, select Edit in the EDIT menu, and press ENTER.
  6. You will see that your data in L1 have been sorted in increasing order.
  7. Now,
    • if the data size is odd, the median is the middle value;
    • if the data size is even, the median is the average of the middle two values.
Use of Calculators (TI-83)
Computing the Mean: If Only Raw Data Are Given
  1. Enter your data in a list, say L1.
  2. Press STAT, select "1-Var Stats" in the CALC menu, and press ENTER.
  3. The calculator asks for the list. Type in the list L1 and press ENTER.
  4. The calculator displays a list of numbers; x̄ (x-bar) is the mean.
Computing the Mean: If the Frequency Table Is Given
  1. Enter the frequency table in the calculator, say, x-values in L1 and frequencies in L2.
  2. Press STAT, select "1-Var Stats" in the CALC menu, and press ENTER.
  3. The calculator asks for the lists. Type in L1, L2 and press ENTER.
  4. The calculator displays a list of numbers; x̄ (x-bar) is the mean.

Problems on 3.1: Mean and Median

Exercise 3.1.1. The following is the price (in dollars) of a stock (say CISCO SYSTEMS, if you like) checked by a trader several times on a particular day.

138 142 127 137 148 130 142 133

Find the median price and mean price observed by the trader.

Exercise 3.1.2. The following figures are the GPAs of six students:

3.0 3.3 3.1 3.0 3.1 3.1

Find the median and mean GPA.

Exercise 3.1.3. The following data give the lifetime (in days) of certain light bulbs.

138 952 980 967 992 197 215 157

Find the mean and median lifetime of these bulbs.

Exercise 3.1.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds) Frequency
26 3
27 6
28 5
29 6
30 9
31 3
Total 32

Compute the mean and median time taken by the athlete.

Exercise 3.1.5. The following is the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119
104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98      

Compute the mean and median weight, at birth, of the babies.

Exercise 3.1.6. Following is some data on the hourly wages (paid only in whole dollars) of 99 employees in some industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8  

Compute the mean and median hourly wage.

Exercise 3.1.7. The following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos 156 158 159 160 162
Frequency 6 4 5 6 9

Find the mean and median number of typos in a book.

Exercise 3.1.8. The following are the lengths (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5
19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18

Compute the mean and median length, at birth, of these babies.

3.2 Measure of Dispersion or Variability

Clearly, the measures of central tendency (mean, median, mode) cannot tell us the "whole story" about the data. They only give us an idea of where the "central values" of the data lie.

Example 3.2.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of the semester:

Section A 81 84 83 80 82
Section B 72 93 92 82 71

Both these sections have the same mean 82. But in Section A, everybody will get a B grade. In section B, we will have two Cs, one B, and two As. In which section would you like to be? The measure of dispersion will give you some idea.

The measure of dispersion (or variability) is a measure of how widely the data are scattered around. In section A, the data have a very small dispersion or variability, while in section B they have a large dispersion.

A simple (or the simplest) measure of dispersion is the range of the data as we have defined before:

range = (largest   value) - (smallest   value).

We discuss three more measures of dispersion.
Suppose we have a data set $x_1, x_2, \ldots, x_n$ of size $n$. We denote the mean of the data by $\bar{x}$. Three definitions follow:

  1. Definition. The mean deviation of the data is defined as follows:

    $$\text{mean deviation} = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}| + \cdots + |x_n - \bar{x}|}{n}$$

    The main deficiency of the mean deviation is that it uses the absolute value function y = f(x) = |x|, which is a split (piecewise-defined) function whose graph is not smooth at x = 0.
  2. Definition. The variance $s^2$ of the data is defined as follows:

    $$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

    Note that we denote the sample variance as the square of $s$, and that we divide by $n - 1$, not $n$. Dividing by $n - 1$ makes the sample variance an unbiased estimate of the population variance; dividing by $n$ would tend to underestimate it. A deficiency of the variance is that it squares the deviations, so it is measured in squared units of the data (for example, squared seconds).

  3. Definition. The standard deviation $s$ is defined as the square root of the sample variance: $s = \sqrt{s^2}$. To compute the sample standard deviation, we must compute the sample variance first.

    We will be working with the variance $s^2$ and the standard deviation $s$.
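The three definitions above, together with the range, can be written as short Python functions. This is a minimal sketch, and the function names are our own. The Section A and Section B scores come from Example 3.2.1, so the printed values can be compared with the hand computations below.

```python
import math


def mean(values):
    return sum(values) / len(values)


def data_range(values):
    """Range = (largest value) - (smallest value)."""
    return max(values) - min(values)


def mean_deviation(values):
    """Average absolute distance from the mean (divides by n)."""
    xbar = mean(values)
    return sum(abs(x - xbar) for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    xbar = mean(values)
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Standard deviation s: the square root of the sample variance s^2."""
    return math.sqrt(sample_variance(values))


section_a = [81, 84, 83, 80, 82]
section_b = [72, 93, 92, 82, 71]
for name, scores in [("A", section_a), ("B", section_b)]:
    print(name, data_range(scores), mean_deviation(scores),
          sample_variance(scores), round(sample_std(scores), 3))
# A 4 1.2 2.5 1.581
# B 22 8.4 110.5 10.512
```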

Let us quickly do some computations with Example 3.2.1 above.

The mean deviation for Section A is (1 + 2 + 1 + 2 + 0)/5 = 6/5 = 1.2, and the mean deviation for Section B is (10 + 11 + 10 + 0 + 11)/5 = 42/5 = 8.4. Because the variability of Section B is much higher, its mean deviation is much larger.

Let us compute the sample variances.

For Section A, the sample variance is

$$s^2 = \frac{(81-82)^2 + (84-82)^2 + (83-82)^2 + (80-82)^2 + (82-82)^2}{5-1} = \frac{1 + 4 + 1 + 4 + 0}{4} = \frac{10}{4} = 2.5.$$

For Section B, the sample variance is

$$s^2 = \frac{(72-82)^2 + (93-82)^2 + (92-82)^2 + (82-82)^2 + (71-82)^2}{5-1} = \frac{100 + 121 + 100 + 0 + 121}{4} = \frac{442}{4} = 110.5.$$

Use of Calculators (TI-83)
Computing the Variance and Standard Deviation:
  1. Follow the same steps as for computing the mean (with either raw data or a frequency table).
  2. The calculator displays a list of numbers. Sx is the sample standard deviation s; σx, which is also displayed, divides by n instead of n - 1.
  3. The sample variance s² is the square of Sx.
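You can also check calculator work on a computer. Python's standard-library `statistics` module provides `variance` and `stdev`, which divide by n - 1 like the sample formulas above (the analogue of Sx). It also provides `pvariance`, which divides by n. The sketch below applies these functions to the Section B scores from Example 3.2.1, and the variable names are our own.

```python
import statistics

section_b = [72, 93, 92, 82, 71]   # Section B scores from Example 3.2.1

xbar = statistics.mean(section_b)         # the mean
s2 = statistics.variance(section_b)       # sample variance, divides by n - 1
s = statistics.stdev(section_b)           # sample standard deviation, sqrt of s2
sigma2 = statistics.pvariance(section_b)  # divides by n instead of n - 1

print(xbar, s2, sigma2)  # 82 110.5 88.4
print(s)                 # about 10.51
```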

Problems on 3.2: Variance and Standard Deviation

Exercise 3.2.1. The following is the price (in dollars) of a stock (say CISCO SYSTEMS, if you like) checked by a trader several times on a particular day.

138 142 127 137 148 130 142 133

Find the variance and standard deviation of the price.

Exercise 3.2.2. The following figures are the GPAs of six students:

3.0 3.3 3.1 3.0 3.1 3.1

Find the variance and standard deviation of GPA.

Exercise 3.2.3. The following data give the lifetime (in days) of certain light bulbs.

138 952 980 967 992 197 215 157

Find the variance and standard deviation of the lifetimes of these bulbs.

Exercise 3.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Time (in seconds) Frequency
26 3
27 6
28 5
29 6
30 9
31 3
Total 32

Compute the variance and standard deviation of time taken by the athlete.

Exercise 3.2.5. The following is the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119
104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98      

Compute the variance and standard deviation of the weight, at birth, of these babies.

Exercise 3.2.6. Following is some data on the hourly wages (paid only in whole dollars) of 99 employees in some industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8  

Compute the variance and standard deviation of the hourly wages.

Exercise 3.2.7. The following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

No. of Typos 156 158 159 160 162
Frequency 6 4 5 6 9

Find the mean, variance, and standard deviation of the number of typos in a book.

Exercise 3.2.8. The following are the lengths (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5
19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18        

Compute the variance and standard deviation of the length, at birth, of these babies.

Exercise 3.2.9. What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data members are EQUAL!

Exercise 3.2.10. The following represents the frequency table of weight (in pounds) of some salmon in a river.

Weight x (in pounds)   Frequency f
31 3
32 2
33 4
34 5
35 6
36 5
37 9

Find the mean, variance, and the standard deviation.

Exercise 3.2.11. The following data represent time (in minutes) taken by students to drive to the campus.

23 17 19 24 42 33 20 22 15 9
26 37 29 19 35 18 30 21 11 23
13 27 32 32 23 35 25 33 24 23

Find the mean, variance, and the standard deviation of the data.

3.3 Use of the Frequency Table

When the data are given in a frequency table, we can compute the mean and variance with the following formulas.

Formulas. Suppose the data, consisting of $n$ observations, are given in an (ungrouped) frequency table. Let $x_i$ denote the values and $f_i$ the frequency of $x_i$, so that $n = f_1 + f_2 + \cdots$. Then the mean is

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots}{n}$$

and the variance is

$$s^2 = \frac{f_1(x_1 - \bar{x})^2 + f_2(x_2 - \bar{x})^2 + \cdots}{n - 1}.$$
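These formulas translate into a short function. The following is a minimal Python sketch with a hypothetical helper name, `frequency_table_stats`, that takes the table as a dictionary mapping each value $x_i$ to its frequency $f_i$. It uses the lap-time table from Example 3.1.1 and cross-checks the result by expanding the table back into raw data.

```python
import math
import statistics


def frequency_table_stats(table):
    """Mean, sample variance and standard deviation from an ungrouped
    frequency table given as {value x_i: frequency f_i}."""
    n = sum(table.values())                                  # n = f1 + f2 + ...
    xbar = sum(x * f for x, f in table.items()) / n          # (x1 f1 + x2 f2 + ...) / n
    ss = sum(f * (x - xbar) ** 2 for x, f in table.items())  # sum of f_i (x_i - xbar)^2
    s2 = ss / (n - 1)
    return xbar, s2, math.sqrt(s2)


# Frequency table of lap times from Example 3.1.1
lap_table = {46: 1, 47: 1, 48: 3, 49: 3, 50: 4, 51: 6,
             52: 4, 53: 5, 54: 5, 55: 2, 56: 1}
xbar, s2, s = frequency_table_stats(lap_table)
print(xbar)  # 51.4

# Cross-check: expand the table into raw data and apply the n - 1 formula directly.
raw = [x for x, f in lap_table.items() for _ in range(f)]
print(math.isclose(s2, statistics.variance(raw)))  # True
```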

We have already told you how to use the calculator to compute the variance and standard deviation when we have a frequency table.
