Satyagopal Mandal
Department of Mathematics
Office: 624 Snow Hall, Phone: 785-864-5180
We will have to deal with two types of data, namely, the sample data and the population data. Most often, for us, the data will be a long list of numbers. Our job here is to make inferences or predictions about the population data on the basis of the sample data.
The branch of statistics that deals with organizing, representing and summarizing (huge sets of) data in understandable ways is called descriptive statistics. The other branch, which deals with making inferences about the population, is called inferential statistics. In this chapter we study descriptive statistics.
Descriptive Statistics
There are two ways of summarizing or representing data, namely, the graphical (or pictorial) representation and the numerical representation. Before we proceed, let us fix some definitions. Each individual piece of data is called a data value, a data point or a data member. A list of data values is called a data set (which could be the whole sample data or the whole population data).
Definition: A variable is a characteristic of the population members that varies with each member of the population.
Example: Suppose we are studying the KU student population.
X(Donald Smith) = 3.25 | X(Sam Donaldson) = 3.11 |
X(Karen Currie) = 3.89 | X(King Who) = 2.13 |
Y(Donald Smith) = Male | Y(Sam Donaldson) = Male |
Y(Karen Currie) = Female | Y(King Who) = Male |
Z(Donald Smith) = 190 | Z(Sam Donaldson) = 166 |
Z(Karen Currie) = 130 | Z(King Who) = 187 |
T(Donald Smith) = 33 | T(Sam Donaldson) = 102 |
T(Karen Currie) = 72 | T(King Who) = 89 |
Similarly, the height, annual income and annual expenditure of KU students are variables.
Types of Variables:
Graphical Representation
Pictorial representations of data appear every day in newspapers and on TV news. The most common among them are pie charts, bar graphs and histograms. We are going to discuss these three ways of representing data pictorially.
Frequency Table and Bar Graph of discrete data
We will explain this with the help of the following example. (You may like to have a look at example 1, page 454, in the reference textbook. I will pick a different problem.)
Example (see page 475 of the reference textbook): The following are the
scores on a test that consisted of 10 questions worth 10 points each.
Data Set 1: Test Scores of 24 students:
50 | 80 | 70 | 50 | 70 | 60 | 70 | 70 |
80 | 70 | 60 | 100 | 60 | 80 | 60 | 80 |
70 | 80 | 100 | 60 | 10 | 60 | 50 | 60 |
The frequency of a data member is the number of times it is present in the data set. For example, the data member 10 is present only once in the data, so the frequency of 10 is one. The data member 60 is present 7 times, so the frequency of 60 is seven. On the other hand, 90 is not present in the data, so the frequency of 90 is zero.
Definition 2: The frequency table of a data set is a table that gives values of the variable and the corresponding frequencies. (If a value has zero frequency then we may or may not mention it in the table.) The following is the frequency table of the above data set.
The Frequency Table of Data Set 1:
Scores | 10 | 50 | 60 | 70 | 80 | 90 | 100 |
Frequency | 1 | 3 | 7 | 6 | 5 | 0 | 2 |
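The frequencies in the table above can also be counted by a short program. The following is only a minimal sketch in Python (a language these notes do not otherwise use; the names `scores` and `frequency` are chosen for illustration), relying on `collections.Counter` from the standard library.

```python
from collections import Counter

# Data Set 1: test scores of 24 students, copied from the table above.
scores = [
    50, 80, 70, 50, 70, 60, 70, 70,
    80, 70, 60, 100, 60, 80, 60, 80,
    70, 80, 100, 60, 10, 60, 50, 60,
]

# Counter maps each data value to the number of times it occurs.
frequency = Counter(scores)

# Print the frequency table in increasing order of score.
# A value that never occurs (such as 90) is reported with frequency 0.
for value in [10, 50, 60, 70, 80, 90, 100]:
    print(value, frequency[value])
```

Adding up all the frequencies should give back the number of data members (here 24), which is a useful check on any frequency table.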
The following is the bar graph of the above data:
Definition 3: The bar graph is a pictorial representation of the data set constructed as follows:
Remarks:
Use of Calculators:
It is not easy to construct the frequency table of a data set
unless you are systematic. Traditionally, we used tally marks
to count the frequencies. Now you can use a software program
(e.g., Excel). Let me give you a method using a calculator (TI-83).
Suggested Problems: Example 1 (p. 456), Ex. 1 (p. 475), Ex. 11 (p. 477)
Class Intervals and Bar Graphs
Sometimes it is impractical to erect one bar above each data value. This is particularly so if the number of distinct data values present in the data is large (say, more than 20). In this case, we divide the whole range of the data into class intervals. The class frequency of a class interval is the number of data values that fall in that class interval. The table that gives the class intervals and the corresponding class frequencies is called the class frequency table. We can construct a bar graph by erecting a bar above each class interval, with height equal to the corresponding class frequency. Normally we take 5 to 20 class intervals.
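The counting of class frequencies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are whole numbers, all class intervals have the same width, the function name `class_frequencies` is hypothetical, and the wage list is made up for the demonstration (it is not data from these notes).

```python
def class_frequencies(values, start, width, n_classes):
    """Count how many values fall in each class interval.

    Class i covers the whole numbers start + i*width through
    start + (i + 1)*width - 1, e.g. 8-9, 10-11, ... for start=8, width=2.
    """
    counts = [0] * n_classes
    for v in values:
        i = (v - start) // width  # index of the class interval containing v
        if 0 <= i < n_classes:
            counts[i] += 1
        else:
            raise ValueError(f"{v} is outside the range of the class intervals")
    return counts


# Hypothetical hourly wages (to the nearest dollar), made up for illustration.
wages = [8, 9, 9, 10, 11, 12, 12, 13, 13, 13, 14, 15, 16, 18, 19]

counts = class_frequencies(wages, start=8, width=2, n_classes=6)
for i, c in enumerate(counts):
    low = 8 + 2 * i
    print(f"{low}-{low + 1}: {c}")
```

Every data value lands in exactly one class interval, so the class frequencies must add up to the number of data values.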
Example: The following is the class frequency table of the
hourly wages (to the nearest dollar) of 30 workers. Construct
a bar graph.
Class | 8-9 | 10-11 | 12-13 | 14-15 | 16-17 | 18-19 |
Frequency | 5 | 5 | 8 | 5 | 2 | 5 |
The following is the bar graph of the above data:
Suggested Problems: Example 5 (p. 480).
Qualitative Data and Bar Graphs
If we have a set of qualitative data (say, the blood groups of a group of people), then the frequency of each variable value is the number of times the value is present. (So the frequency of blood group "A" is the number of people in the group with blood group "A".) The frequency table of such a data set is given by listing the variable values and the corresponding frequencies. The following is an example.
Example: The following are the data on the blood groups of
36 patients in a hospital:
O | A | B | O | A | A | A | O | O |
O | A | O | A | B | O | O | O | AB |
B | A | A | O | O | A | A | O | AB |
O | A | A | B | A | O | A | O | O |
The variable blood group can assume 4 values O, A, B, AB.
The frequency table of this data is given as follows:
Blood Group | O | A | B | AB |
Frequency | 16 | 14 | 4 | 2 |
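As a rough illustration, the same frequency table and a simple text version of the bar graph can be produced in Python. This is a sketch, not the method used in these notes: the bars are drawn horizontally with `#` characters, and the variable names are chosen for illustration.

```python
from collections import Counter

# Blood groups of the 36 patients, copied row by row from the data above.
blood_groups = (
    "O A B O A A A O O "
    "O A O A B O O O AB "
    "B A A O O A A O AB "
    "O A A B A O A O O"
).split()

# Frequency of each blood group.
frequency = Counter(blood_groups)

# One row per blood group; the length of the bar equals the frequency.
for group in ["O", "A", "B", "AB"]:
    print(f"{group:>2} | {'#' * frequency[group]} {frequency[group]}")
```

Because the values are qualitative, the order of the bars carries no meaning; here they simply follow the order used in the frequency table.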
The following is the bar graph of this qualitative data:
I used Excel to sketch these bar graphs. You could experiment with any software program or calculator.
If the data are continuous, then the "bar graph like" representation of the data is called a histogram. Since the variable is continuous, there is no "gap" between consecutive class intervals, so there will not be any gap between the bars. Given a set of data, histograms are constructed as follows:
Class Interval above-below (in ounces) | Frequency |
48 - 60 | 15 |
60 - 72 | 24 |
72 - 84 | 41 |
84 - 96 | 67 |
96 - 108 | 119 |
108 - 120 | 184 |
120 - 132 | 142 |
132 - 144 | 26 |
144 - 156 | 5 |
156 - 168 | 2 |
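The class frequency table above can be turned into a rough text histogram with a few lines of Python. This is only a sketch: the bars are drawn horizontally and scaled so that the tallest is 50 characters long (an arbitrary choice), and the variable names are illustrative.

```python
# Class intervals (in ounces) and frequencies, copied from the table above.
classes = [
    ((48, 60), 15), ((60, 72), 24), ((72, 84), 41), ((84, 96), 67),
    ((96, 108), 119), ((108, 120), 184), ((120, 132), 142),
    ((132, 144), 26), ((144, 156), 5), ((156, 168), 2),
]

# Total number of data values, and the class with the highest frequency.
total = sum(freq for _, freq in classes)
modal_class, modal_freq = max(classes, key=lambda item: item[1])

# Consecutive classes share an endpoint, so the bars touch with no gaps.
no_gaps = all(
    classes[i][0][1] == classes[i + 1][0][0] for i in range(len(classes) - 1)
)

# Scale the bars so that the tallest one is 50 characters long.
scale = 50 / modal_freq
for (low, high), freq in classes:
    bar = "#" * round(freq * scale)
    print(f"{low:>3}-{high:<3} | {bar} {freq}")

print("total number of data values:", total)
print("class with the highest frequency:", modal_class)
```

The check stored in `no_gaps` reflects the point made above: for continuous data, each class interval begins where the previous one ends.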
Suggested Problems: Example 7 (p. 461), Ex. 19-22 (p. 478)
Pie Charts
We see pie charts almost every day. Most recently, I saw them last
March-April, in the instruction booklet for IRS Form 1040.
Pie charts are self-explanatory, so we will not go further into them.
As a matter of fact there are many ways to pictorially represent
data and they are so natural that we need not go through
all of them. Because of the availability of many software
programs, people are using all kinds of innovative and
colorful pictorial representations of data.
You could also experiment with them (e.g., use Excel).
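For readers who want to experiment, the arithmetic behind a pie chart is short: each category gets a sector whose central angle is its share of the total frequency times 360 degrees. The following minimal Python sketch applies this to the blood group frequency table given earlier in these notes (O: 16, A: 14, B: 4, AB: 2); the variable names are illustrative.

```python
# Blood group frequencies from the frequency table earlier in these notes.
frequency = {"O": 16, "A": 14, "B": 4, "AB": 2}
total = sum(frequency.values())

# Central angle of each sector, in degrees: (frequency / total) * 360.
angles = {group: 360 * count / total for group, count in frequency.items()}

for group, angle in angles.items():
    print(f"{group:>2}: {angle:.0f} degrees")
```

The angles of all the sectors add up to 360 degrees, since the frequencies add up to the total.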
Two pie charts reproduced from IRS booklets (Picture 10 and Picture 11).