MATH 105, Topics in Math: Lesson 2

Lesson 2: Measures of Central Tendency and Dispersion

2.2 Measures of Central Tendency: Mean, Median, Mode

Due Date: See the Lecture Notes Site.

Introduction

In this lesson we define various numerical measures (constants) for data sets. These numerical measures summarize and describe the data; the average value of the data is a common example. There are two broad classes of such numerical measures computed from the data:

measures of central tendency and
measures of dispersion.

A measure of central tendency represents an "average value." Mean, median, mode (if you already know these) are measures of central tendency. A measure of dispersion is a measure of how widely the data is scattered around.

2.1 Measure of Central Tendency: Mean

The most common measure of central tendency is the mean, or arithmetic mean.

Definition. The mean or the arithmetic mean of a set of data is given by

$$\text{mean} = \frac{\text{sum of all the data values}}{\text{size of the data}}$$

If we denote a data value (i.e., the variable) by x and if n is the size of the data, then the above formula is written as

$$\text{mean} = \frac{\sum x}{n}$$

where ∑ denotes summation. If the data represents a sample, then the mean is called the sample mean. Again, if x denotes the variable, the data is sometimes denoted by x₁, x₂, ..., x_n, and with this notation the formula for the mean is written as

$$\text{mean} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If you have not seen the notation ∑ before, it simply means summation. For example, $\sum_{i=1}^{3} x_i = x_1 + x_2 + x_3$.
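The definition of the mean can be sketched in a few lines of Python. This is only a minimal illustration: the function name `arithmetic_mean` and the data values are made up for the example and do not come from the lesson.

```python
def arithmetic_mean(values):
    """Return the sum of the data values divided by the size of the data."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical data, chosen only for illustration.
sample = [2, 4, 4, 5, 10]
print(arithmetic_mean(sample))  # 25 / 5 = 5.0
```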

When the frequency table of a data set is given, we can use the frequency table to compute the mean of the original data. Let us consider the following example:

Example 2.1.1 To estimate the mean time taken to complete a three-mile drive by a race car, the race car did several time trials. The following are sample times taken (in seconds) to complete the laps:

To compute the mean time of the original data, we add all the data values and divide by the data size, 35. The frequency distribution tells us that, in the data, 46 was present 1 time, 47 was present 1 time, 48 was present 3 times, and so on. So, using the frequency distribution, we compute the mean as follows:

$$\text{mean} = \frac{46(1) + 47(1) + 48(3) + \cdots}{35} = \frac{1799}{35} = 51.4$$

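More generally, when we compute the mean using the frequency table, the formula for the mean is

$$\text{mean} = \frac{\sum f x}{n}$$

where f is the frequency of the value x and n = ∑ f is the size of the data.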
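As a rough sketch, the frequency-table computation of the mean might look like this in Python. The function name `mean_from_frequency_table` is hypothetical, and the small table reuses only the first three rows mentioned in the text; it is not the full race-car table.

```python
def mean_from_frequency_table(table):
    """table maps each distinct value x to its frequency f."""
    n = sum(table.values())                       # data size = sum of the frequencies
    total = sum(x * f for x, f in table.items())  # sum of f * x
    return total / n


# Partial, illustrative table: 46 once, 47 once, 48 three times.
times = {46: 1, 47: 1, 48: 3}
print(mean_from_frequency_table(times))  # (46 + 47 + 3*48) / 5 = 47.4
```

The result is the same as listing every value (46, 47, 48, 48, 48) and averaging directly; the frequency table only saves the repeated additions.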

Properties of the Mean

Example (effect of translation): Your teacher tells you that the mean score for the midterm in your class is 73. After you complained and requested a change, he agreed to add 7 points to everyone's score. The new mean score is (old mean + 7) = 73 + 7 = 80. This is what we mean by "effect of translation."

Example (effect of multiplication by c): Suppose you have some data x₁, x₂, ..., x_n on salaries in an industry in the United States and the mean is $37,000. On a certain day (March 21, 2011), 1 U.S. dollar = 0.976469 Canadian dollars (say c = 0.976469). So, in Canadian dollars the mean is 37000 × c = 37000 × 0.976469 = 36129.35 Canadian dollars.

Similarly, any change of units (inches to feet or cm, minutes to seconds) is a "multiplication by a constant c."
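Both effects can be checked numerically. The following is a minimal sketch with made-up scores whose mean happens to be 73; the constant c = 0.5 is also chosen only for illustration.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical midterm scores with mean 73.
scores = [60, 70, 75, 80, 80]
shifted = [x + 7 for x in scores]      # translation: add 7 to every score
scaled = [0.5 * x for x in scores]     # multiplication by c = 0.5

print(mean(scores))   # 73.0
print(mean(shifted))  # 80.0 = 73 + 7
print(mean(scaled))   # 36.5 = 0.5 * 73
```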

2.2 Other Measures of Central Tendency: Median and Mode

The median represents the middle value of the data. Half the data will be less than or equal to the median, and half the data will be greater than or equal to the median. You are above the median American income if at least half the American population is making less than you make.

Definition. Suppose the data is arranged in an increasing order (i.e., in an array). If the size of data is ODD then the median is the middle value. If it is EVEN, then the median is the mean of the middle two values.
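The two cases of this definition can be sketched as follows. The function name `median` and the sample data are chosen for the example only.

```python
def median(values):
    ordered = sorted(values)          # arrange the data in increasing order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd size: the middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even size: mean of the middle two


print(median([7, 1, 5, 3, 5]))       # 5
print(median([1, 1, 3, 5, 5, 7]))    # 4.0
```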

Definition. For a number p between 0 and 100, the pth percentile x_p of the data is a number such that at least p percent of the data members are less than or equal to x_p and at least (100 - p) percent of the data members are greater than or equal to x_p.
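One way to get a feel for this definition is to test whether a candidate number qualifies as a pth percentile. The sketch below is only an illustration: the function name is hypothetical, and it assumes the two conditions are read inclusively ("less than or equal to" and "greater than or equal to"), which matches the description of the median given above.

```python
def satisfies_percentile_definition(data, p, x):
    """Check whether x qualifies as a p-th percentile of data.

    Assumed convention: both conditions are inclusive (<= and >=).
    """
    n = len(data)
    at_or_below = sum(1 for v in data if v <= x)
    at_or_above = sum(1 for v in data if v >= x)
    # Compare counts to percentages without dividing, to stay in whole numbers.
    return (100 * at_or_below >= p * n) and (100 * at_or_above >= (100 - p) * n)


data = [1, 3, 5, 5, 7]   # hypothetical data
print(satisfies_percentile_definition(data, 50, 5))   # True: 5 is the median
print(satisfies_percentile_definition(data, 50, 7))   # False
```

Note that calculators and software packages use several slightly different percentile rules, so their answers may differ from one another on small data sets.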

Definition. The MODE of the data is the value or values that have the highest frequency. For example, the mode of the set {1, 3, 5, 5, 7} is {5} because it has the highest frequency. The mode of {1, 1, 3, 5, 5, 7} is {1, 5} because 1 and 5 both have the highest frequency. Such a set is said to be bimodal.
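The two sets in this definition can be checked with a short sketch. The function name `modes` is made up for the example; it returns every value that attains the highest frequency, so a bimodal set yields two values.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of values that share the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 3, 5, 5, 7]))      # [5]
print(modes([1, 1, 3, 5, 5, 7]))   # [1, 5]
```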

Exercise 2.2.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

Find the median price and mean price observed by the trader.
Solution: Use TI-84.

Exercise 2.2.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Exercise 2.2.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

Compute the mean and median weight, at birth, of the babies.
Solution: Use TI-84.

Exercise 2.2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

Exercise 2.2.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

Exercise 2.2.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

Compute the mean and median length, at birth, of these babies.
Solution: Use TI-84.

2.3 Measures of Dispersion

Example 2.3.1. Suppose two sections of the statistics class have the following percentage score distribution at the end of the semester:

Both sections have the same mean, 82, and the same median, 82.
But the data sets are dispersed differently. In section A, everybody will get a B grade. In section B, we will have two C's, one B, and two A's.

The measure of dispersion is a measure of how widely the data is scattered around. In section A, the data has a very small dispersion or variability, whereas section B has a large dispersion.

Suppose we have a data set x₁, x₂, ..., x_n of size n. We will denote the mean of the data by x̄. Three definitions follow:

So, the mean deviation is the mean of the absolute deviations |x_i - x̄| from the mean:

$$\text{mean deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$$

Definition. The sample standard deviation is defined as the square root of the sample variance s². So, to compute the sample standard deviation, we have to compute the sample variance first.
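The two-step computation (variance first, then its square root) can be sketched as below. This assumes the usual definition of the sample variance, the sum of squared deviations from the mean divided by n - 1; the function names and the data are hypothetical.

```python
import math


def sample_variance(values):
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two data values")
    xbar = sum(values) / n
    # Assumed definition: squared deviations from the mean, divided by n - 1.
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


def sample_std(values):
    return math.sqrt(sample_variance(values))


data = [2, 4, 4, 5, 10]          # hypothetical data with mean 5
print(sample_variance(data))     # 36 / 4 = 9.0
print(sample_std(data))          # 3.0
```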

If we simplify the definition of sample variance we get the following formula:

$$s^2 = \frac{(x_1^2 + x_2^2 + \cdots + x_n^2) - n\bar{x}^2}{n - 1}$$
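For readers who want to see where the simplified formula comes from, here is a short derivation. It assumes the sample variance is defined in the usual way, as the sum of the squared deviations from the mean x̄ divided by n - 1, and it uses the fact that the sum of the data values equals n times x̄.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\,(n\bar{x}) + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{aligned}
```

Dividing both sides by n - 1 gives the simplified formula for s².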

The mean deviation for section A = (1+2+1+2+0)/5 = 6/5, and the mean deviation for section B = (10+11+10+0+11)/5 = 42/5. The variability of section B is clearly higher, and its mean deviation is correspondingly larger.
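The arithmetic above can be reproduced with a short sketch. The two lists of absolute deviations are the ones quoted in the text; the list `scores` is a hypothetical set of five scores, chosen only because it has mean 82 and produces the section B deviations, and the function name `mean_deviation` is made up for the example.

```python
def mean_deviation(values):
    """Mean of the absolute deviations |x - mean| from the mean."""
    n = len(values)
    xbar = sum(values) / n
    return sum(abs(x - xbar) for x in values) / n


# Absolute deviations quoted in the text for each section.
abs_dev_a = [1, 2, 1, 2, 0]
abs_dev_b = [10, 11, 10, 0, 11]
print(sum(abs_dev_a) / len(abs_dev_a))   # 1.2, i.e. 6/5
print(sum(abs_dev_b) / len(abs_dev_b))   # 8.4, i.e. 42/5

# Hypothetical scores with mean 82 that give the section B deviations.
scores = [72, 71, 92, 82, 93]
print(mean_deviation(scores))            # 8.4
```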

Question: What does it mean when the variance or mean deviation of some data is zero? The answer is that all the data members are EQUAL!

When a frequency table is given, we can use new formulas to compute the mean and variance of the data.

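Formulas. Suppose the data consisting of n observations is given in a frequency table (ungrouped). Let x_i denote the values and f_i be the frequency of x_i. Then

$$n = \sum f_i, \qquad \bar{x} = \frac{\sum f_i x_i}{n}, \qquad s^2 = \frac{\sum f_i x_i^2 - n\bar{x}^2}{n - 1}$$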
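A minimal sketch of the frequency-table computation follows. It assumes the variance is computed as (∑ f_i x_i² - n x̄²)/(n - 1), the same form used in the worked example of this section; the function name `frequency_table_stats` and the small table are hypothetical.

```python
def frequency_table_stats(table):
    """table maps each value x_i to its frequency f_i.

    Returns (mean, sample variance).
    """
    n = sum(table.values())                               # n = sum of f_i
    sum_fx = sum(f * x for x, f in table.items())         # sum of f_i * x_i
    sum_fx2 = sum(f * x * x for x, f in table.items())    # sum of f_i * x_i^2
    xbar = sum_fx / n
    s2 = (sum_fx2 - n * xbar ** 2) / (n - 1)
    return xbar, s2


# Hypothetical table: the value 1 once, 2 twice, 3 once.
xbar, s2 = frequency_table_stats({1: 1, 2: 2, 3: 1})
print(xbar)           # 2.0
print(round(s2, 4))   # 0.6667
```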

Example 2.3.2. The following table extends the frequency table of the time taken to complete a lap by a race car (example 2.1.1) to compute mean and variance using the above formulas.

So, the mean x̄ = 1799/35 = 51.4 and the variance
s² = (92673 - 35 × 51.4²)/(35 - 1) = 6.0118.
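The arithmetic in this example is easy to re-check. The sketch below uses only the three totals that appear in the computation (n = 35, ∑ f x = 1799, ∑ f x² = 92673); the variable names are chosen for the example.

```python
import math

n = 35
sum_fx = 1799       # sum of f_i * x_i from the table
sum_fx2 = 92673     # sum of f_i * x_i**2 from the table

xbar = sum_fx / n
s2 = (sum_fx2 - n * xbar ** 2) / (n - 1)
s = math.sqrt(s2)   # sample standard deviation

print(round(xbar, 1))   # 51.4
print(round(s2, 4))     # 6.0118
print(round(s, 2))      # 2.45
```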

Remark: Only 20 or 30 years ago, when computing power was not as abundant as it is now, we used such tables to compute the mean and variance. Such methods are now out of date; we use a TI-84 or other tools instead.

Example 2.3.3. Following is the class frequency distribution of the data on birth weight of some babies (exercise 1.2, Lesson 1):

We can use the above formula to compute (approximate) variance and the standard deviation of the birth weight.
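A common way to apply the frequency-table formula to grouped data is to let the midpoint of each class stand in for every value in that class; this is why the result is only approximate. The sketch below assumes that midpoint convention. The function name `grouped_stats` and the class boundaries are hypothetical and are not the birth-weight table from Lesson 1.

```python
def grouped_stats(classes):
    """classes is a list of (lower, upper, frequency) triples.

    Returns (approximate mean, approximate variance, approximate std).
    """
    n = sum(f for _, _, f in classes)
    # Assumption: each class is represented by its midpoint.
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]
    xbar = sum(f * m for m, f in mids) / n
    s2 = (sum(f * m * m for m, f in mids) - n * xbar ** 2) / (n - 1)
    return xbar, s2, s2 ** 0.5


# Hypothetical classes (in ounces) with their frequencies.
classes = [(60, 80, 1), (80, 100, 2), (100, 120, 1)]
xbar, s2, s = grouped_stats(classes)
print(round(xbar, 2))   # 90.0
print(round(s2, 2))     # 266.67
print(round(s, 2))      # 16.33
```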

Comment: We have had detailed discussions of various formulas for defining the mean, variance, and other constants. It is important to understand these concepts and formulas.

It is equally important to appreciate the value and necessity of using calculators or other available software (like Excel). It is almost impossible (and unnecessary) to compute these constants manually and correctly, unless one is specially gifted with numerical computations.
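As one example of such software, Python's standard `statistics` module computes these constants directly. This is offered only as an illustration alongside the TI-84 and Excel; `multimode` requires Python 3.8 or later. The data is the bimodal set used in the definition of the mode.

```python
import statistics

data = [1, 1, 3, 5, 5, 7]

print(statistics.mean(data))        # arithmetic mean, 22/6
print(statistics.median(data))      # 4.0
print(statistics.multimode(data))   # [1, 5]
print(statistics.variance(data))    # sample variance (divides by n - 1)
print(statistics.stdev(data))       # square root of the sample variance
```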

Exercise 2.3.1. The following is the price (in dollars) of a stock (say, CISCO SYSTEMS) checked by a trader several times on a particular day.

Exercise 2.3.3. The following data give the lifetime (in days) of certain light bulbs.

Find the variance and standard deviation of the lifetime of these bulbs.
Solution: Use TI-84.

Exercise 2.3.4. An athlete ran an event 32 times. The following frequency table gives the time taken (in seconds) by the athlete to complete the events.

Compute the variance and standard deviation of time taken by the athlete.
Solution: Use TI-84.
Also see Solution to understand the use of the formula.

Exercise 2.3.5. Following is data on the weight (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

Compute the variance and standard deviation of the weight, at birth, of these babies.
Solution: Use TI-84.
Also see Solution to understand the use of the formula.

Exercise 2.3.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in an industry.

Compute the variance and standard deviation of the hourly wages.
Solution: Use TI-84.

Exercise 2.3.7. Following is the frequency table on the number of typos in a sample of 30 books published by a publisher.

Find the mean number, variance, and standard deviation of typos in a book.
Solution: Use TI-84.

Exercise 2.3.8. Following is data on the length (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

Compute the variance and standard deviation of the length, at birth, of these babies.
Solution: Use TI-84.

Exercise 2.3.9. The following is the frequency table of weight (in pounds) of some salmon in a river.

Find the variance and the standard deviation.
Solution: Use TI-84.
Also see Solution

Exercise 2.3.10. The following data represents the time (in minutes) taken by students to drive to campus.