Math 105, Topics in Mathematics
Lesson 2: Representation of Data

In this lesson I deal with the representation of data in both graphic and tabular form. We discuss two types of data, namely, the sample data and the population data. For us, data are most often a long list of numbers. Our job here is to make inferences or predictions about the population data on the basis of the sample data. There are two branches of statistics. The branch of statistics that deals with the organization, representation, and summarizing of a large set of data in understandable ways is called descriptive statistics. The part of statistics that deals with making inferences about the population is called inferential statistics. In this chapter we study descriptive statistics.

2.1 Descriptive Statistics: Definitions

There are two ways of summarizing or representing data, namely, the graphic (or pictorial) representation and the numerical representation. Before we proceed, let us introduce some definitions. Each individual piece of data is called a data value or data point or data member. A list of data values is called a data set, which could be either just the sample data or the whole population data. Population data is always unknown, and its discussion is mostly theoretical or mathematical. On the other hand, sample data is actual known data, mostly a collection of numbers.

Definition. A variable is a rule or a formula or a mechanism that associates a value to each member of the population. Given a member ω, a variable X assigns a value X(ω) to ω. For us, X(ω) is a characteristic (like height, weight, time, salary) of the members of the population.

Example 2.1. Suppose we are studying the KU student population.
Types of Variables:
Remark. We also define discrete, continuous, and qualitative data in a similar way. That is, the data are called discrete, continuous, or qualitative if the corresponding variable is discrete, continuous, or qualitative.

2.2 Graphical Representation

The pictorial representation of data is found every day in newspapers and TV news. The most common are pie charts, bar graphs, and histograms. We are going to discuss these three methods of pictorial representation of data.

Frequency Table and Bar Graph of Discrete Data

We explain the frequency table and bar graph with the help of the following example.

Example 2.2. The following are scores in a test that consisted of 10 questions worth 10 points each.

Data Set 1: Test Scores of 24 students:
Definition 1: The frequency of a data member is the number of times the data member is present in the data. For example, the data member 10 was present only once in the data, so the frequency of 10 is one. The data member 60 is present seven times, so the frequency of 60 is seven. However, 90 was not present in the data, so the frequency of 90 is zero.

Definition 2: The frequency table of a data set is a table that gives values of the variable and the corresponding frequencies. (If a value has zero frequency, then we may or may not mention it in the table.)

The following is the frequency table of the above data set.

The Frequency Table of Data Set 1:
The following is the bar graph of the above data:
Definition 3: The bar graph is a pictorial representation of the data set constructed as follows:
Definition 4: The relative frequency bar graph is similar to the standard bar graph. It is constructed by erecting columns, above the data values, of height equal to the relative frequency of the value. Here the relative frequency is given by

Relative frequency = (frequency)/(data size).

Also,

Percentage frequency = 100 × (frequency)/(data size).

Remarks:
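As a worked instance of Definition 4, take the counts quoted earlier for Data Set 1 (24 test scores, with 60 present seven times and 10 present once). The figures below are rounded to three decimal places, and the rounding is my choice rather than part of the definition:

```latex
\[
\text{relative frequency of } 60 = \frac{7}{24} \approx 0.292,
\qquad
\text{percentage frequency of } 60 = 100 \cdot \frac{7}{24} \approx 29.2\%,
\]
\[
\text{relative frequency of } 10 = \frac{1}{24} \approx 0.042,
\qquad
\text{percentage frequency of } 10 = 100 \cdot \frac{1}{24} \approx 4.2\%.
\]
```

A value that is absent from the data, such as 90, has relative frequency 0/24 = 0.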
Remark: Problems are at the end of the lesson.

2.3 Class Intervals and Bar Graphs

Sometimes it is impossible or unreasonable to try to erect one bar above each data value. This is particularly so if the number of distinct data values present in the data is too large (say, more than 20). In this case, we divide the whole range of data into class intervals. The class frequency of a class interval is the number of data values that fall into the class interval. The table that gives the class intervals and the corresponding class frequencies is called the class frequency table. We can construct a bar graph by erecting bars above the class intervals, of height equal to the corresponding class frequencies. Normally we take 5 to 20 class intervals.

Example 2.3. The following is the class frequency table of hourly wages (to the nearest dollar) of 30 workers. Construct a bar graph.
The following is the bar graph of the above data:
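The class-interval construction of this section can also be sketched in software. The following Python sketch is only an illustration, not the method used to produce the tables and graphs in this lesson. The function name `class_frequencies`, the wage values, and the interval boundaries are all hypothetical (they are not the data of Example 2.3). It also assumes that a value lying on a boundary shared by two intervals is counted in the upper interval, which is the convention adopted in the exercises at the end of the lesson.

```python
from bisect import bisect_right


def class_frequencies(values, boundaries):
    """Count the data values falling into each class interval.

    `boundaries` is an increasing list b0 < b1 < ... < bk that defines the
    class intervals [b0, b1], [b1, b2], ..., [b(k-1), bk].
    Assumption: a value lying on a boundary shared by two intervals is
    counted in the upper interval.
    """
    counts = [0] * (len(boundaries) - 1)
    for v in values:
        if v < boundaries[0] or v > boundaries[-1]:
            raise ValueError(f"{v} lies outside the range of the class intervals")
        # Index of the interval whose lower boundary is the largest one <= v.
        i = bisect_right(boundaries, v) - 1
        # A value equal to the top boundary belongs to the last interval.
        i = min(i, len(counts) - 1)
        counts[i] += 1
    return counts


# Hypothetical hourly wages in whole dollars (not the data of Example 2.3).
wages = [7, 8, 8, 9, 10, 10, 11, 12, 12, 12, 14, 15]
boundaries = [6.5, 8.5, 10.5, 12.5, 14.5, 16.5]
counts = class_frequencies(wages, boundaries)

# Print the class frequency table with a text bar of height = class frequency.
for lower, upper, count in zip(boundaries, boundaries[1:], counts):
    print(f"[{lower}-{upper}]  {count:2d}  {'#' * count}")
```

Choosing boundaries that end in .5 for whole-dollar wages means no data value can land on a boundary, so the boundary assumption only matters for data such as lengths measured on a continuous scale.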
2.4 Qualitative Data and Bar Graphs

If we have a set of qualitative data (say blood groups of a group of people), then for each variable value the class frequency is the number of times the value is present. (The frequency of blood group "A" is the number of people in the group with blood group "A.") The frequency table of such a data set is given by listing the variable values and the corresponding frequencies. See the following example.

Example 2.4. (Khazanie, p. 18) The following is the data on the blood group of 36 patients in a hospital:
The variable blood group can assume 4 values--O, A, B, AB. The frequency table of this data is:
This is the bar graph of this qualitative data:
I used Excel to sketch these bar graphs. You could experiment with any software program or calculator.

2.5 Continuous Variables and Histograms

If the data type is continuous, then the "bar graph-like" representation of data is called a histogram. Because the variable is continuous and there is no "gap" between the values of the variable, there will not be any gap between the bars. Given a set of data, histograms are constructed as follows:
Example 2.5. The following table gives the frequency of the weight (in ounces) of a certain number of babies in some city.
The following is the histogram of this data:

The following is the bar graph of the same data:

Pie Charts

We see pie charts almost every day. Last spring I saw them in the instruction booklet for IRS Form 1040. Pie charts are self-explanatory; we will not discuss them further. In fact, there are many ways to pictorially represent data, and they are so natural that we need not consider all of them. Because of the availability of many software programs, people are using all kinds of innovative and colorful pictorial representations of data. You could experiment with any number of types of representations (e.g., use Excel).
Use of Calculators: We use calculators (TI-83) extensively in this course. First, I will explain how to enter data in the TI-83.
It is not easy to construct a frequency table of a data set; you must be very systematic. Traditionally, we used "tally marks" to count the frequency. Now you can use some software programs (e.g., Excel). Let me give you a method that uses the calculator (TI-83).
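For readers who prefer software to the calculator, the same sort-then-count idea can be sketched in Python. This is a minimal illustration and not the TI-83 procedure itself. The function names `frequency_table` and `relative_frequencies` and the list of scores are hypothetical (the scores are not Data Set 1).

```python
from itertools import groupby


def frequency_table(data):
    """Return (value, frequency) pairs by sorting, then counting equal runs."""
    ordered = sorted(data)
    # After sorting, equal values sit next to each other, so each run of
    # equal values is one row of the frequency table.
    return [(value, len(list(run))) for value, run in groupby(ordered)]


def relative_frequencies(table):
    """Return (value, frequency / data size) pairs for a frequency table."""
    size = sum(freq for _, freq in table)
    return [(value, freq / size) for value, freq in table]


# Hypothetical test scores (not Data Set 1 from this lesson).
scores = [60, 40, 60, 70, 50, 60, 40, 80]
table = frequency_table(scores)
size = len(scores)

print("value  frequency  relative")
for value, freq in table:
    print(f"{value:5d}  {freq:9d}  {freq / size:8.3f}")
```

Sorting first plays the role of the tally marks: once equal values are adjacent, counting them is a single pass through the list.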
Problems on Lesson 2

Exercise 2.1. To estimate the mean time a racecar takes to complete a three-mile drive, the racecar did several time trials, and the following sample of times taken (in seconds) to complete the laps was collected.
The following is the frequency distribution of this ungrouped data:
Construct a bar chart.

Exercise 2.2. The following are the weights (in ounces), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.
Construct a class frequency table of this data by dividing the whole range of data into class intervals: [60.5-70.5], [70.5-80.5], [80.5-90.5], [90.5-100.5], [100.5-110.5],

Exercise 2.3. The following are the lengths (in inches), at birth, of 96 babies born in Lawrence Memorial Hospital in May 2000.
Construct a frequency table for this data by dividing the whole range into class intervals: [16-17], [17-18], [18-19], [19-20], [20-21], [21-22]. Important decision: If a data member falls on the boundary, we count it in the right/upper class interval.

Exercise 2.4. The following data represent the number of typos in a sample of 30 books published by some publisher.
Construct a frequency table (by sorting in your calculator). Also construct a bar chart.

Exercise 2.5. The following is data on the hourly wages (paid only in whole dollars) in some industry.
Construct a frequency table (by sorting in your calculator). Also construct a bar chart.

Exercise 2.6. The following is data on the hourly wages (paid only in whole dollars) of 99 employees in some industry.

Construct a frequency table (by sorting in your calculator).