Satyagopal Mandal |
Department of Mathematics |
Office: 624 Snow Hall Phone: 785-864-5180 |
Chapter 14: Numerical Summaries of Data
After pictorial representations of data, we now turn to numerical representations of data. In a numerical representation we compute some numerical constants (or summaries, as in your textbook) from the data that give us some idea about the data. Examples of such constants that you might have heard of already are the average (also called the mean), median, percentiles, quartiles, and range.
There are two types of numerical summaries of data. Constants (computed from the data) that tell us where the data values fall (that is, the location of the data) are called measures of location. The mean (average) and median are measures of location. Constants (computed from the data) that give us an idea about how widely the data is spread are called measures of spread.
Measure of Location
The average is the most familiar measure of location. The average is also known as mean.
Definition 1: The average (or mean) of a set of N numbers is the sum of all these numbers divided by N.
Average =(Data Sum)/(Data size) = (Data Sum)/N
Example: Suppose that the hourly wages of 8 employees in a small restaurant are 10.5, 7.5, 8, 9, 11, 12, 13.5, 8.5 dollars. Then the average hourly wage of these 8 employees is
Average =(Data Sum)/(Data size)
= (10.5+7.5+8+9+11+12+13.5+8.5)/8 =10.
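The same computation can be checked in a few lines of Python. This is only a sketch; the variable names (wages, data_sum, n, average) are illustrative and are not part of the textbook or the calculator instructions.

```python
# Hourly wages (in dollars) of the 8 employees in the example.
wages = [10.5, 7.5, 8, 9, 11, 12, 13.5, 8.5]

data_sum = sum(wages)    # Data Sum
n = len(wages)           # Data size N
average = data_sum / n   # Average = (Data Sum)/N

print("Data sum:", data_sum)
print("N:", n)
print("Average:", average)
```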
Use your Calculator: You already know how to enter data. Now select calc from the stat menu, then select 1-var stat from the menu and press enter. The cursor will be blinking; type the list name (say L1) and press enter. The calculator will display a list of numbers, and "x-bar" is the average of the data.
Computing Average from frequency tables:
When data has already been summarized in a frequency table then it is easier to use the table to compute the average of the original data. Let us consider the following example.
Example: Following is the frequency table of some data on the hourly wages (paid only in whole dollars).
Hrly. Wages | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
Frequency | 4 | 4 | 7 | 11 | 10 | 6 | 4 | 3 | 1 |
From this frequency table we know that 4 employees have an hourly wage of $7, 4 have an hourly wage of $8, and so on. So, to find the average hourly wage of this group we have to add 7 four times, 8 four times, 9 seven times, and so on, and add all of them. Then we have to divide the total of all these numbers by the total number of workers, which is
N=4+4+7+11+10+6+4+3+1=50.
So, the
Average = Total/N = (7x4 + 8x4 + 9x7 + 10x11 + 11x10 + 12x6 + 13x4 + 14x3 + 15x1)/50 = 524/50 = 10.48.
More generally, suppose that the following is a frequency table of some data:
Variable | s1 | s2 | s3 | … | sk |
Frequency | f1 | f2 | f3 | … | fk |
Then the average of the original data is computed as follows:
1) Step 1. Compute the total:
Total = (s1 x f1)+(s2 x f2)+ … +(sk x fk)
2) Step 2. Compute the data size N:
N = f1+f2+ … +fk
3) Step 3. Compute the average:
Average = Total/N.
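As a rough illustration, the procedure can be written out in Python for the hourly-wage frequency table given earlier. The list names values and freqs are made up for this sketch; any two lists of equal length would work the same way.

```python
# Hourly wages (the variable values s1, ..., sk) and their
# frequencies (f1, ..., fk), taken from the hourly-wage frequency table.
values = [7, 8, 9, 10, 11, 12, 13, 14, 15]
freqs = [4, 4, 7, 11, 10, 6, 4, 3, 1]

# Total = (s1 x f1) + (s2 x f2) + ... + (sk x fk)
total = sum(s * f for s, f in zip(values, freqs))

# Data size N = f1 + f2 + ... + fk
n = sum(freqs)

# Average = Total/N
average = total / n

print("Total:", total)
print("N:", n)
print("Average:", average)
```

Multiplying each value by its frequency gives the same total as adding every individual wage one at a time, which is why the table alone is enough to find the average.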
Use Your Calculator: To compute the average of data given in a frequency table, enter the variable values in one list (say L1) and the frequencies in a different list (say L2). Now select 1-var stat and press enter. The cursor will be blinking; type L1, L2 and press enter.
Suggested Problems: Examples 8-11 (pp. 463-); Ex. 25a, 26a, 27a. Also compute the average of the data I gave you in the last class (see Chapter 14B in my web-site).
The Median
Definition: The median of a set of data represents the middle value of the data. So the median is a number "me" such that half the data values are less than or equal to "me" and half the data values are bigger than or equal to "me".
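To compute the median, first sort your data. Now, there will be two cases:
1) If the data size N is odd, the median is the middle value of the sorted data.
2) If the data size N is even, the median is the average of the two middle values of the sorted data.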
The median is a very meaningful measure of location. For example, if you knew last year's median US annual income, then you would know whether or not you did better than half of all Americans.
Example: The scores of 8 students in a test are 17, 13, 12, 18, 15, 16, 22, 22. Find the median.
The sorted data here is 12, 13, 15, 16, 17, 18, 22, 22.
Here we have an even number, N=8, of scores. The middle two values are 16 and 17. So the median score is (16+17)/2=16.5.
Example: The hourly wages of 7 employees are 10, 7, 12, 11, 12, 13, 8 (in dollars). Find the median wage.
The sorted data here is 7, 8, 10, 11, 12, 12, 13.
Here we have an odd number, N=7, of wages. The middle value is the 4th value, 11. So the median wage is $11.00.
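The two median examples can be reproduced with a short Python function. This is a minimal sketch that takes the middle value when the data size is odd and the average of the two middle values when it is even; the function name median is chosen only for this illustration.

```python
def median(data):
    """Return the median of a non-empty list of numbers."""
    values = sorted(data)          # first sort the data
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd data size: the single middle value.
        return values[mid]
    # Even data size: average of the two middle values.
    return (values[mid - 1] + values[mid]) / 2


scores = [17, 13, 12, 18, 15, 16, 22, 22]   # 8 test scores
wages = [10, 7, 12, 11, 12, 13, 8]          # 7 hourly wages

print("Median score:", median(scores))
print("Median wage:", median(wages))
```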
Suggested Problems: Examples 12, 13, 14 (pp. 465-); Ex. 25b, 26b, 27b. Also compute the median of the data I gave you on Friday (see Chapter 14B in my web-site).
The Quartiles
Just as the median divides the whole data set into two parts, the three quartiles divide the whole data set into 4 parts.
Definitions:
1) The first quartile (Q1) is a number such that one-quarter of the data values are smaller than or equal to it and three-quarters of the data values are bigger than or equal to it.
2) The second quartile (Q2) is, in fact, the median. One-half of the data values are less than or equal to it and one-half of the data values are bigger than or equal to it.
3) The third quartile (Q3) is a number such that three-quarters of the data values are smaller than or equal to it and one-quarter of the data values are bigger than or equal to it.
Use your Calculator: To find the three quartiles, first sort the data. Now find the median, which is the second quartile (Q2). Now divide the data into the lower half and the upper half (include the median in both the upper and the lower half if the data size is odd). The median of the lower half is the first quartile (Q1) and the median of the upper half is the third quartile (Q3).
The five-number summary:
Definition: The lowest value (min), the highest value (max) and the three quartiles form the five-number summary of the data.
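The following Python sketch puts the quartile rule and the five-number summary together. It follows the rule stated in these notes (when the data size is odd, the median is included in both halves); other textbooks and software packages use slightly different quartile conventions and may give different answers. The function names are illustrative only.

```python
def median_sorted(values):
    """Median of an already sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a non-empty list."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")
    half = n // 2
    if n % 2 == 1:
        # Odd data size: the median belongs to both halves.
        lower = values[:half + 1]
        upper = values[half:]
    else:
        lower = values[:half]
        upper = values[half:]
    q1 = median_sorted(lower)    # median of the lower half
    q2 = median_sorted(values)   # the median itself
    q3 = median_sorted(upper)    # median of the upper half
    return (values[0], q1, q2, q3, values[-1])


scores = [17, 13, 12, 18, 15, 16, 22, 22]
wages = [10, 7, 12, 11, 12, 13, 8]

print(five_number_summary(scores))
print(five_number_summary(wages))
```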
Suggested Problems: Examples 15-17 (pp. 469-); Ex. 25c-d, 26c-d, 28a-b. Also find the 5-number summary of the data I gave you on Friday (see Chapter 14B in my web-site).
Measure of Spread
A measure of spread gives an idea of how widely the data is spread out. We will talk about three such measures of spread: the range, the interquartile range and the standard deviation.
Definition 1: The range of the data is the difference between the highest value and the lowest value of the data.
Range = max - min.
Definition 2: The interquartile range (IQR) is the difference between the third and the first quartile.
Interquartile range = IQR= Q3-Q1.
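Once a five-number summary is known, both of these measures are simple subtractions. The sketch below uses the 8 test scores from the median example (12, 13, 15, 16, 17, 18, 22, 22). The quartile values Q1 = 14 and Q3 = 20 are not stated in these notes; they were worked out here with the lower-half and upper-half rule and should be treated as an assumption of this illustration.

```python
# Five-number summary of the 8 test scores: min, Q1, median, Q3, max.
# Q1 and Q3 are worked out for this sketch, not quoted from the notes.
minimum, q1, med, q3, maximum = 12, 14, 16.5, 20, 22

data_range = maximum - minimum   # Range = max - min
iqr = q3 - q1                    # IQR = Q3 - Q1

print("Range:", data_range)
print("IQR:", iqr)
```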
The standard deviation is a little more involved. It is the square root of the variance.
Definition 3: The variance is the average of the squares of the differences between the data values and the average of the data. We compute it in the following steps:
Step 1: Compute the average, call it A.
Step 2: For each data value x, compute the difference x-A.
Step 3: Find the squares (x-A)^2.
Step 4: Find the average of the squares in Step 3. This is the variance.
Step 5: The standard deviation is the square root of the variance.
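To see the five steps in action, here is a small Python sketch applied to the 8 hourly wages from the first average example. It divides by the data size N, as in the definition above; the variable names are illustrative only.

```python
import math

# Hourly wages (in dollars) of the 8 restaurant employees.
data = [10.5, 7.5, 8, 9, 11, 12, 13.5, 8.5]
n = len(data)

# Step 1: compute the average A.
a = sum(data) / n

# Step 2: for each data value x, compute the difference x - A.
differences = [x - a for x in data]

# Step 3: square each difference.
squares = [d ** 2 for d in differences]

# Step 4: the variance is the average of the squares (divide by N).
variance = sum(squares) / n

# Step 5: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("Average:", a)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 4))
```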
We can do it in the following tabular form:
Data Value | frequency | product | Step 2 | Step 3 | PRODUCT |
x | f | xf | x-A | (x-A)^2 | f(x-A)^2 |
Don’t add | Sum=N | Data sum | don’t | don’t | Sum of squares |
Average = A= (Data sum)/N
Variance =(Sum of Squares)/N
Standard deviation = square root of variance.
Use Your Calculator: Look at how you computed the average value using the calculator. In the same list of numbers, σx is the standard deviation.
Suggested Problems: Examples 20-21 (pp. 472-), Ex. 29 (only standard deviation). Also compute the standard deviation of the data I gave on Friday (see Chapter 14B on web-site).
Example 1: (Without using Calculator) Sixty-eight samples of milk were examined for the percentage of milk fat. The first two columns give the frequency table.
Percentage of milk fat | frequency | product | Step 2 | Step 3 | PRODUCT |
x | f | xf | x-A | (x-A)^2 | f(x-A)^2 |
3.50 | 2 | 7.0 | -.154 | .0237 | .0474 |
3.55 | 7 | 24.85 | -.104 | .0108 | .0756 |
3.60 | 14 | 50.4 | -.054 | .0029 | .0406 |
3.65 | 16 | 58.4 | -.004 | .0000 | .0000 |
3.70 | 19 | 70.3 | .046 | .0021 | .0399 |
3.75 | 10 | 37.5 | .096 | .0092 | .092 |
Don’t add | Sum=N=68 | Data sum=248.45 | don’t | don’t | Sum of squares=.2955 |
Average = A= (Data sum)/N =248.45/68=3.654
Variance =(Sum of Squares)/N =.2955/68 = .00434
Standard deviation = square root of variance =.0659
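The tabular computation can also be checked in Python. The sketch below works directly from the first two columns of the milk-fat table and keeps full precision at every step, so its results may differ from the hand computation in the last digit, because the table rounds the average and the squares along the way. The list names values and freqs are illustrative.

```python
import math

# Percentage of milk fat (x) and frequency (f) from the table.
values = [3.50, 3.55, 3.60, 3.65, 3.70, 3.75]
freqs = [2, 7, 14, 16, 19, 10]

n = sum(freqs)                                        # Sum = N
data_sum = sum(x * f for x, f in zip(values, freqs))  # sum of the xf column
a = data_sum / n                                      # Average A

# Sum of the f(x-A)^2 column, without intermediate rounding.
sum_of_squares = sum(f * (x - a) ** 2 for x, f in zip(values, freqs))

variance = sum_of_squares / n
std_dev = math.sqrt(variance)

print("N:", n)
print("Data sum:", round(data_sum, 2))
print("Average:", round(a, 3))
print("Variance:", round(variance, 5))
print("Standard deviation:", round(std_dev, 4))
```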
Exercise: (Do not use Calculator) A quality control engineer is interested in determining whether a machine is properly adjusted to dispense 1 pound of sugar. The engineer collected a sample of 30 packets and the following is the frequency table (in the first two columns) of the weight. Complete the table and compute the average, variance and the standard deviation.
Weight | frequency | product | Step 2 | Step 3 | PRODUCT |
x | f | xf | x-A | (x-A)^2 | f(x-A)^2 |
15.6 | 3 | | | | |
15.7 | 5 | | | | |
15.9 | 7 | | | | |
16.0 | 6 | | | | |
16.2 | 5 | | | | |
16.3 | 4 | | | | |
Don’t add | Sum=N= | Data sum= | don’t | don’t | Sum of squares= |
Average = A= (Data sum)/N =
Variance =(Sum of Squares)/N =
Standard deviation = square root of variance =
Exercise: (Do not use Calculator) The following is the data on hourly wages of 30 workers in an industry. Complete the table and compute the average hourly wage and the disparity (i.e., the standard deviation) in hourly wages.
Wages (dollar) | frequency | product | Step 2 | Step 3 | PRODUCT |
x | f | xf | x-A | (x-A)^2 | f(x-A)^2 |
7 | 2 | | | | |
8 | 4 | | | | |
9 | 7 | | | | |
10 | 8 | | | | |
11 | 5 | | | | |
12 | 3 | | | | |
13 | 1 | | | | |
Don’t add | Sum=N= | Data sum= | don’t | don’t | Sum of squares= |
Average = A= (Data sum)/N =
Variance =(Sum of Squares)/N =
Standard deviation = square root of variance =