Math 105, Topics in Mathematics

Lesson 2: Representation of Data

In this lesson I deal with the representation of data in both graphic and tabular form.

We discuss two types of data, namely, the sample data and the population data. Most often, for us, data are a long list of numbers. Our job here is to make inferences or predictions about the population data on the basis of the sample data.

There are two branches of statistics. The branch that deals with organizing, representing, and summarizing a large set of data in understandable ways is called descriptive statistics. The branch that deals with making inferences about the population is called inferential statistics. In this lesson we study descriptive statistics.

2.1 Descriptive Statistics: Definitions

There are two ways of summarizing or representing data, namely, the graphic (or pictorial) representation and the numerical representation. Before we proceed, let us introduce some definitions. Each individual piece of data is called a data value or data point or data member. A list of data values is called a data set, which could be either just the sample data or the whole population data. Population data is generally unknown, and discussion of it is mostly theoretical or mathematical. On the other hand, sample data is actual known data, mostly a collection of numbers.

Definition. A variable is a rule or a formula or a mechanism that associates a value to each member of the population. Given a member ω, a variable X assigns a value X(ω) to ω. For us X(ω) is a characteristic (like height, weight, time, salary) of the population.


Example 2.1. Suppose we are studying the KU student population.

  1. If GPA is the "characteristic" that we are studying, then X= the GPA of students is a variable. So, given a student, X has a value. For example,
    X(Donald Smith) = 3.25 X(Sam Donaldson)=3.11
    X(Karen Currie)=3.89 X(King Who)=2.13

  2. However, if gender is the "characteristic" that we are studying, then Y= gender of students is a variable. For example,
    Y(Donald Smith) = Male Y(Sam Donaldson) =Male
    Y(Karen Currie) =Female Y(King Who)=Male

  3. If weight (in pounds) is the "characteristic" that we are studying, then Z= weight of students is a variable. For example,
    Z(Donald Smith) = 190 Z(Sam Donaldson) =166
    Z(Karen Currie) =130 Z(King Who)=187

  4. If the number of credit hours completed is the "characteristic" that we are studying, then T= the number of credit hours completed by students is a variable. For example,
    T(Donald Smith) = 33 T(Sam Donaldson) = 102
    T(Karen Currie) = 72 T(King Who)=89

    Similarly, height, annual income, and annual expenditure of KU students are variables.
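A variable can also be pictured in code. The following is a minimal Python sketch, not part of the course material, that stores each variable of Example 2.1 as a dictionary from a student to a value. The helper name value_of is hypothetical and chosen only for illustration.

```python
# Each variable is modeled as a dict that maps a population member
# (a student's name) to the value the variable assigns to that member.
gpa = {"Donald Smith": 3.25, "Sam Donaldson": 3.11,
       "Karen Currie": 3.89, "King Who": 2.13}
gender = {"Donald Smith": "Male", "Sam Donaldson": "Male",
          "Karen Currie": "Female", "King Who": "Male"}
weight = {"Donald Smith": 190, "Sam Donaldson": 166,
          "Karen Currie": 130, "King Who": 187}
credit_hours = {"Donald Smith": 33, "Sam Donaldson": 102,
                "Karen Currie": 72, "King Who": 89}


def value_of(variable, member):
    """Return X(member) for a variable X stored as a dict (hypothetical helper)."""
    return variable[member]


print(value_of(gpa, "Karen Currie"))     # X(Karen Currie)
print(value_of(gender, "Donald Smith"))  # Y(Donald Smith)
```

Here the dictionary plays the role of the rule X, and looking up a name plays the role of evaluating X(ω).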

Types of Variables:

  1. A variable that assumes numerical values is called a quantitative (or numerical) variable. There are two types of quantitative variables:
    1. A quantitative variable is called a discrete variable if there is a "gap" between the values that the variable can assume. In particular, if a variable can assume only a finite number of values, then it is a discrete variable. T= the number of credit hours completed is a discrete variable.
    2. If the difference between the values that a quantitative variable can assume can be arbitrarily small, then the variable is called a continuous variable. For example, the weight and height characteristics above are continuous variables.
    3. As a rule, weight, length, volume, and time are continuous variables. A discrete variable is often a number of something.
  2. A variable that assumes nonnumerical values is called a qualitative variable. For example, gender, blood group, and hair color are qualitative variables.

Remark. We also define discrete, continuous, and qualitative data in a similar way. That means, the data are called discrete, continuous, or qualitative if the corresponding variable is so.

2.2 Graphical Representation

The pictorial representation of data is found every day in newspapers and TV news. The most common are pie charts, bar graphs, and histograms. We are going to discuss these three methods of pictorial representation of data.

Frequency Table and Bar Graph of Discrete Data

We explain the frequency table and bar graph with the help of the following example.

Example 2.2. Following are scores in a test that consisted of 10 questions worth 10 points each.

Data Set 1: Test Scores of 24 students:

50 80 70 50 70 60 70 70
80 70 60 100 60 80 60 80
70 80 100 60 10 60 50 60

Definition 1: The frequency of a data member is the number of times the data member is present in the data.

For example, the data member 10 was present only once in the data, so the frequency of 10 is one. The data member 60 is present seven times, so the frequency of 60 is seven. However, 90 was not present in the data, so the frequency of 90 is zero.

Definition 2: The frequency table of a data set is a table that gives values of the variable and the corresponding frequencies. (If a value has zero frequency, then we may or may not mention it in the table.) The following is the frequency table of the above data set.

The Frequency Table of Data Set 1:

Scores      10   50   60   70   80   90   100
Frequency    1    3    7    6    5    0     2
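If you prefer software to hand counting, the frequency table of Data Set 1 can be reproduced with a few lines of Python. This is only a sketch using the standard library; the function name frequency_table is my own choice. Values with zero frequency (such as 90) are simply not listed, which the definition above allows.

```python
from collections import Counter

# Data Set 1: test scores of 24 students, copied from Example 2.2.
scores = [50, 80, 70, 50, 70, 60, 70, 70,
          80, 70, 60, 100, 60, 80, 60, 80,
          70, 80, 100, 60, 10, 60, 50, 60]


def frequency_table(data):
    """Return a list of (value, frequency) pairs in increasing order of value."""
    counts = Counter(data)          # counts how many times each value occurs
    return sorted(counts.items())   # sort the pairs by data value


table = frequency_table(scores)
for value, frequency in table:
    print(value, frequency)
```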

The following is the bar graph of the above data:

Bar chart showing the scores and frequency listed above.

Definition 3: The bar graph is a pictorial representation of the data set constructed as follows:

  1. The variable values are listed in an increasing order on the horizontal axis.
  2. The frequency of each value is displayed by erecting a column, above the value, of height equal to the frequency of the value.
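As a rough illustration of this construction, the sketch below prints a text version of the bar graph for the frequency table of Data Set 1. It is an assumption of mine, not the method used for the figure: the bars are drawn sideways, one row per value, with one "#" per unit of frequency.

```python
# Frequency table of Data Set 1 (score -> frequency), taken from the table above.
frequencies = {10: 1, 50: 3, 60: 7, 70: 6, 80: 5, 90: 0, 100: 2}


def text_bar_graph(freqs):
    """Return one text line per value, in increasing order of value.

    Each bar is a run of '#' characters whose length equals the frequency.
    """
    lines = []
    for value in sorted(freqs):
        lines.append(f"{value:>3} | " + "#" * freqs[value])
    return lines


bars = text_bar_graph(frequencies)
for line in bars:
    print(line)
```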

Definition 4: The relative frequency bar graph is similar to standard bar graphs. It is constructed by erecting columns, above the data values, of height equal to the relative frequency of the value. Here relative frequency is given by

Relative frequency = (frequency)/(Data Size).

Also

Percentage frequency = 100(frequency)/(Data Size).
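The two formulas can be checked on the frequency table of Data Set 1. The following Python sketch is only illustrative; the function names are hypothetical. It keeps relative frequencies as exact fractions and takes the data size to be the sum of the frequencies.

```python
from fractions import Fraction

# Frequency table of Data Set 1 (score -> frequency).
frequencies = {10: 1, 50: 3, 60: 7, 70: 6, 80: 5, 90: 0, 100: 2}


def relative_frequencies(freqs):
    """Relative frequency = frequency / data size, kept as an exact fraction."""
    data_size = sum(freqs.values())
    return {value: Fraction(f, data_size) for value, f in freqs.items()}


def percentage_frequencies(freqs):
    """Percentage frequency = 100 * frequency / data size."""
    data_size = sum(freqs.values())
    return {value: 100 * f / data_size for value, f in freqs.items()}


rel = relative_frequencies(frequencies)
pct = percentage_frequencies(frequencies)
print(rel[60], round(pct[60], 2))  # 7/24 29.17
```

The relative frequencies always add up to 1, and the percentage frequencies add up to 100, which is a quick way to check a table.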

Remarks:

  1. The bar graphs are also known as bar charts. Many software programs (e.g., Excel) can be used to construct bar graphs.
  2. There should be a gap between the bars in the graph.

 

Remark: Problems are at the end of the lesson.

2.3 Class Intervals and Bar Graphs

Sometimes it is impossible or unreasonable to erect one bar above each data value. This is particularly so if the number of different data values present in the data is too large (say, more than 20). In this case, we divide the whole range of data into class intervals. The class frequency of a class interval is the number of data values that fall into the class interval. The table that gives the class intervals and the corresponding class frequencies is called the class frequency table. We can construct a bar graph by erecting bars above the class intervals, of height equal to the corresponding class frequencies. Normally we take 5 to 20 class intervals.

Example 2.3. Following is the class frequency table of hourly wages (in nearest dollar) of 30 workers. Construct a bar graph.

Class       8-9   10-11   12-13   14-15   16-17   18-19
Frequency    5      5       8       5       2       5

The following is the bar graph of the above data:

Bar graph showing the class and frequency data listed above.

2.4 Qualitative Data and Bar Graphs

If we have a set of qualitative data (say blood groups of a group of people), then for each variable value the class frequency is the number of times the value is present. (The frequency of blood group "A" is the number of people in the group with blood group "A.") The frequency table of such a data set is given by listing the variable values and the corresponding frequencies. See the following example.

Example 2.4. (Khazanie, p. 18) The following is the data on the blood group of 36 patients in a hospital:

O A B O A A A O O
O A O A B O O O AB
B A A O O A A O AB
O A A B A O A O O

The variable blood group can assume 4 values--O, A, B, AB. The frequency table of this data is:

Blood Group    O    A    B    AB
Frequency     16   14    4     2
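Counting works the same way when the values are labels instead of numbers. Here is a small Python sketch for the blood group data of Example 2.4. Since qualitative values have no natural increasing order, I list the groups in a fixed order of my own choosing (O, A, B, AB).

```python
from collections import Counter

# Blood groups of the 36 patients, one string per row of Example 2.4.
rows = [
    "O A B O A A A O O",
    "O A O A B O O O AB",
    "B A A O O A A O AB",
    "O A A B A O A O O",
]
blood_groups = " ".join(rows).split()

# The order in which the groups are listed is a choice made for this sketch.
group_order = ["O", "A", "B", "AB"]

counts = Counter(blood_groups)
blood_table = [(group, counts[group]) for group in group_order]

for group, frequency in blood_table:
    print(group, frequency)
```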

This is the bar graph of this qualitative data:

Bar graph showing blood group and frequency data listed above.

I used Excel to sketch these bar graphs. You could experiment with any software program or calculator.

2.5 Continuous Variables and Histograms

If the data type is continuous, then the "bar graph-like" representation of data is called a histogram. Because the variable is continuous and there is no "gap" between the values it can assume, there will not be any gap between the bars. Given a set of data, histograms are constructed as follows:

  1. Decide the number n of class intervals (or bars) you want to have. Again this will be between 5 and 20.
  2. Pick a number L slightly smaller than the lowest data value, and pick a number U slightly bigger than the largest data value.
  3. Now divide the range from L to U into the number n of class intervals that you decided on in step 1. (In this class, we will only have class intervals of equal length.)
  4. Make a reasonable decision about which class (right or left) you want to contain the data values that fall on the boundary between two class intervals.
  5. Construct the frequency table, as before. (You can use your calculator or any software.) Now you complete the histogram by erecting a bar of appropriate height above the class intervals.
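The steps above can be sketched in Python as follows. This is only one possible way to do the counting, under two assumptions of mine: a value that falls on a boundary is counted in the right (upper) class, and the largest possible value U is counted in the last class. The function name and the short sample list are hypothetical and serve only as an illustration.

```python
def class_frequencies(data, lower, upper, n):
    """Count data values in n equal-length class intervals from lower to upper.

    A value on the boundary between two intervals is counted in the
    right (upper) interval; a value equal to `upper` goes in the last one.
    """
    width = (upper - lower) / n
    counts = [0] * n
    for x in data:
        if not lower <= x <= upper:
            raise ValueError(f"{x} lies outside the range [{lower}, {upper}]")
        index = min(int((x - lower) / width), n - 1)
        counts[index] += 1
    intervals = [(lower + i * width, lower + (i + 1) * width) for i in range(n)]
    return list(zip(intervals, counts))


# Hypothetical sample of ten lengths (in inches), chosen only for illustration.
sample = [17, 17.75, 18, 18.5, 19, 19, 19.5, 20, 20.5, 21]
result = class_frequencies(sample, lower=16, upper=22, n=6)
for (left, right), frequency in result:
    print(f"{left:g} - {right:g}: {frequency}")
```

The heights of the bars of the histogram are the frequencies printed on the right. If you decide to put boundary values in the left class instead, the counting rule has to be changed accordingly.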

Example 2.5. The following table gives the frequency of the weight (in ounces) of a certain number of babies in some city.

Class Interval, above-below (in ounces)    Frequency
 48 -  60                                     15
 60 -  72                                     24
 72 -  84                                     41
 84 -  96                                     67
 96 - 108                                    119
108 - 120                                    184
120 - 132                                    142
132 - 144                                     26
144 - 156                                      5
156 - 168                                      2

The following is the histogram of this data:

A histogram of the class interval and frequency data listed above.

Following is the bar graph of the same data:

Bar chart showing the class interval and frequency data listed above.

Pie Charts

We see pie charts almost every day. Last spring I saw them in the instruction booklet for IRS Form 1040. Pie charts are self-explanatory; we will not discuss them further. In fact, there are many ways to pictorially represent data, and they are so natural that we need not consider all of them. Because of the availability of many software programs, people are using all kinds of innovative and colorful pictorial representations of data. You could experiment with any number of types of representations (e.g., use Excel).

Two Pie Charts reproduced from IRS Booklets
An example of a pie chart showing outlays.   An example of a pie chart showing income.

Use of Calculators:

We use calculators (TI-83) extensively in this course. First, I will explain how you enter data into the TI-83.

Use of Calculators (TI-83):
Enter Your Data:
  1. Press the stat button.
  2. Select "Edit" in the Edit menu and press enter.
  3. You will find six lists named L1, L2, L3, L4, L5, L6.
  4. Say you want to enter your data in L1.
  5. If L1 has some data, then you clear it by pressing the stat button and selecting ClrList in the Edit menu.
  6. Once L1 is cleared, select Edit in the Edit menu and press enter.
  7. Now type in your data values one by one, pressing enter after each.

It is not easy to construct a frequency table of a data set; you must be very systematic. Traditionally, we used "tally marks" to count the frequency. Now you can use some software programs (e.g., Excel). Let me give you a method that uses the calculator (TI-83).

  1. Press "stat."
  2. To input data, enter "edit."
  3. Enter your data (say in L1).
  4. Press "stat."
  5. Enter "sortA" L1.
  6. Press "stat" and then enter "edit." Now on L1 you see the data is sorted in an increasing order.
  7. Now you can count the frequencies.
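The same "sort first, then count" idea can be imitated in Python. This is a minimal sketch, not a description of what the calculator does internally: sorting puts equal values next to each other, and then each run of equal values is counted. The function name sort_and_count is hypothetical, and the short list is the first row of Data Set 1.

```python
from itertools import groupby


def sort_and_count(data):
    """Sort the data in increasing order, then count each run of equal values."""
    ordered = sorted(data)  # plays the role of sorting L1 in increasing order
    return [(value, len(list(run))) for value, run in groupby(ordered)]


first_row = [50, 80, 70, 50, 70, 60, 70, 70]
for value, frequency in sort_and_count(first_row):
    print(value, frequency)
```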

Problems on Lesson 2

Exercise 2.1. To estimate the mean time taken to complete a three-mile drive by a racecar, the racecar did several time trials and the following sample of times taken (in seconds) to complete the laps was collected.

50 48 49 46 54 53 52 51 47 56 52 51
51 53 50 49 48 54 53 51 52 54 54 53
55 48 51 50 52 49 51 53 55 54 50  

The following is the frequency distribution of this ungrouped data:

Time (in seconds)   Frequency   Relative Frequency   Percentage Frequency
46                   1           1/35                  2.86
47                   1           1/35                  2.86
48                   3           3/35                  8.57
49                   3           3/35                  8.57
50                   4           4/35                 11.43
51                   6           6/35                 17.14
52                   4           4/35                 11.43
53                   5           5/35                 14.29
54                   5           5/35                 14.29
55                   2           2/35                  5.71
56                   1           1/35                  2.86
Total               35           1                   100.00

Construct a bar chart.

Exercise 2.2. The following are the weights (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

94 105 124 110 119 137 96 110 120 115 119
104 135 123 129 72 121 117 96 107 80 80
96 123 124 124 134 78 138 106 130 97 134
111 133 128 96 126 124 125 127 62 127 96
116 118 126 94 127 121 117 124 93 135 112
120 125 120 147 138 72 119 89 81 113 100
109 127 138 122 110 113 100 115 110 135 120
97 127 120 110 107 111 126 132 120 108 148
133 103 92 124 150 86 121 98      

Construct a class frequency table of this data by dividing the whole range of data into class intervals:

[60.5-70.5], [70.5-80.5], [80.5-90.5], [90.5-100.5], [100.5-110.5],
[110.5 - 120.5], [120.5-130.5], [130.5-140.5], [140.5-150.5], [150.5-160.5]


Exercise 2.3. The following are the lengths (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.

18 18.5 19 18.5 19 21 18 19 20 20.5
19 19 21.5 19.5 20 17 20 20 19 20.5
18 18.5 20 19.5 20.75 20 21 18 20.5 20
21 19 20.5 19 20 19.5 17.75 20 19.5 20
20.5 17 21 18.5 20 20 20 18.5 19.5 19
18 20.5 18 20 19 19 19.5 20 20.75 21
17.75 19 18 19 20 18.5 20 19 21 19
19.5 20 20 19 19.5 20 19.5 18.5 20.5 19.5
20.25 20 19.5 19.5 20 20 20 21 20 19
18.5 20.5 21.5 18 19.5 18        

Construct a frequency table for this data by dividing the whole range into class intervals:

[16-17], [17-18], [18-19], [19-20], [20-21], [21-22].

Important decision: If a data member falls on the boundary, we count it in the right/upper class interval.

Exercise 2.4. The following data represents the number of typos in a sample of 30 books published by some publisher.

156 159 162 160 156 162
159 160 156 156 160 162
156 159 162 156 162 158
160 158 159 162 158 158
162 160 159 162 162 160

Construct a frequency table (by sorting in your calculator). Also construct a bar chart.

Exercise 2.5. Following is data on the hourly wages (paid only in whole dollars) in some industry.

9 11 8 9 10 11 7 10 12 13
7 11 8 11 14 9 10 9 11 7
13 13 14 12 9 8 12 14 15 9
9 7 12 7 12 7 7 11 13 9
11 9 9 9 10 14 11 12 14 7

Construct a frequency table (by sorting in your calculator). Also construct a bar chart.

Exercise 2.6. Following is data on the hourly wages (paid only in whole dollars) of 99 employees in some industry.

7 11 7 11 10 9 10 10 12 13
7 8 11 11 14 9 7 9 11 7
9 13 12 14 7 8 7 14 15 9
9 7 11 9 12 9 12 11 14 9
12 13 7 9 10 14 11 12 13 7
15 15 16 16 15 16 11 7 18 19
15 16 15 15 16 16 17 16 16 13
15 15 16 15 16 15 15 17 16 12
16 15 15 16 15 15 19 8 16 17
16 16 15 16 16 16 13 12 8  

Construct a frequency table (by sorting in your calculator).
