## Introduction

One simple graph, the stem-and-leaf graph or stemplot, comes from the field of exploratory data analysis. It is a good choice when the data sets are small. To create the plot, divide each observation of data into a stem and a leaf. The stem consists of the leading digit(s), while the leaf consists of a final significant digit. For example, 23 has stem two and leaf three. The number 432 has stem 43 and leaf two. Likewise, the number 5,432 has stem 543 and leaf two. The decimal 9.3 has stem nine and leaf three. Write the stems in a vertical line from smallest to largest. Draw a vertical line to the right of the stems. Then write the leaves in increasing order next to their corresponding stem. Make sure the leaves show a space between values, so that the exact data values may be easily determined. The frequency of data values for each stem provides information about the shape of the distribution.
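The stem and leaf split described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `split_stem_leaf` is hypothetical, and it assumes nonnegative values whose last written digit is the leaf.

```python
def split_stem_leaf(value):
    """Split a number into (stem, leaf), where the leaf is the final digit.

    Assumption: the value is nonnegative and written with the precision
    you want to plot (for example 9.3, not 9.30).
    """
    # Drop thousands separators and the decimal point, keeping only digits.
    digits = str(value).replace(",", "").replace(".", "")
    stem = int(digits[:-1] or "0")  # leading digit(s); 0 if there are none
    leaf = int(digits[-1])          # final digit
    return stem, leaf


# The four values used in the paragraph above.
for x in [23, 432, "5,432", 9.3]:
    print(x, "->", split_stem_leaf(x))
```

Passing 5,432 as a string is only a convenience so the comma can be kept as written; the integer 5432 gives the same split.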

### Example 2.1

For Susan Dean’s spring precalculus class, scores for the first exam were as follows (smallest to largest):

33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100

| Stem | Leaf |
|---|---|
| 3 | 3 |
| 4 | 2 9 9 |
| 5 | 3 5 5 |
| 6 | 1 3 7 8 8 9 9 |
| 7 | 2 3 4 8 |
| 8 | 0 3 8 8 8 |
| 9 | 0 2 4 4 4 4 6 |
| 10 | 0 |

Table 2.1 Stem-and-Leaf Graph

The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight out of the 31 scores, or approximately 26 percent $\left(\frac{8}{31}\right)$, were in the 90s or 100, a fairly high number of As.
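As a rough check, the stemplot and the share of high scores in Example 2.1 can be reproduced with a short Python sketch. The helper name `stemplot` is an assumption made for illustration, and the code assumes whole-number data where the leaf is the ones digit.

```python
from collections import defaultdict

# Exam scores from Example 2.1.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]


def stemplot(data):
    """Return {stem: leaves in increasing order} for whole-number data."""
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # tens (and above) are the stem, ones the leaf
        rows[stem].append(leaf)
    return dict(rows)


rows = stemplot(scores)
for stem in sorted(rows):
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in rows[stem])}")

# Scores in the 90s or equal to 100.
high = sum(1 for x in scores if x >= 90)
print(f"{high} of {len(scores)} scores = {high / len(scores):.0%}")
```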

Try It 2.1

For the Park City basketball team, scores for the last 30 games were as follows (smallest to largest):

32, 32, 33, 34, 38, 40, 42, 42, 43, 44, 46, 47, 47, 48, 48, 48, 49, 50, 50, 51, 52, 52, 52, 53, 54, 56, 57, 57, 60, 61

Construct a stemplot for the data.

The stemplot is a quick way to graph data and gives an exact picture of the data. You want to look for an overall pattern and any outliers. An outlier is an observation of data that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes, for example, writing 50 instead of 500, while others may indicate that something unusual is happening. It takes some background information to explain outliers, so we will cover them in more detail later.

### Example 2.2

The data are the distances (in kilometers) from a home to local supermarkets. Create a stemplot using the data.

1.1, 1.5, 2.3, 2.5, 2.7, 3.2, 3.3, 3.3, 3.5, 3.8, 4.0, 4.2, 4.5, 4.5, 4.7, 4.8, 5.5, 5.6, 6.5, 6.7, 12.3

Do the data seem to have any concentration of values?

The leaves are to the right of the decimal.

Solution 2.2

The value 12.3 may be an outlier. Values appear to concentrate at 3 and 4 kilometers.

| Stem | Leaf |
|---|---|
| 1 | 1 5 |
| 2 | 3 5 7 |
| 3 | 2 3 3 5 8 |
| 4 | 0 2 5 5 7 8 |
| 5 | 5 6 |
| 6 | 5 7 |
| 7 | |
| 8 | |
| 9 | |
| 10 | |
| 11 | |
| 12 | 3 |

Table 2.2
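For data with one decimal place, the same idea applies with the whole-number part as the stem and the tenths digit as the leaf. The sketch below is one possible way to do it; the function name `decimal_stemplot` is hypothetical. It keeps empty stems, so the gap between 6 and 12 stays visible and the possible outlier stands out.

```python
# Distances (in kilometers) from Example 2.2.
distances = [1.1, 1.5, 2.3, 2.5, 2.7, 3.2, 3.3, 3.3, 3.5, 3.8, 4.0,
             4.2, 4.5, 4.5, 4.7, 4.8, 5.5, 5.6, 6.5, 6.7, 12.3]


def decimal_stemplot(data):
    """Return {stem: leaves} for values recorded to one decimal place."""
    rows = {}
    for x in sorted(data):
        # Work in tenths so floating-point noise cannot shift a digit.
        stem, leaf = divmod(round(x * 10), 10)
        rows.setdefault(stem, []).append(leaf)
    # Include stems with no leaves so gaps in the data remain visible.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}


plot = decimal_stemplot(distances)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```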

Try It 2.2

The following data show the distances (in miles) from the homes of high school students to the school. Create a stemplot using the data and identify any outliers.

0.5, 0.7, 1.1, 1.2, 1.2, 1.3, 1.3, 1.5, 1.5, 1.7, 1.7, 1.8, 1.9, 2.0, 2.2, 2.5, 2.6, 2.8, 2.8, 2.8, 3.5, 3.8, 4.4, 4.8, 4.9, 5.2, 5.5, 5.7, 5.8, 8.0

### Example 2.3

A side-by-side stem-and-leaf plot allows a comparison of the two data sets in two columns. In a side-by-side stem-and-leaf plot, two sets of leaves share the same stem. The leaves are to the left and the right of the stems. Table 2.4 and Table 2.5 show the ages of presidents at their inauguration and at their death. Construct a side-by-side stem-and-leaf plot using these data.

| President | Age | President | Age | President | Age |
|---|---|---|---|---|---|
| Washington | 57 | Lincoln | 52 | Hoover | 54 |
| J. Adams | 61 | A. Johnson | 56 | F. Roosevelt | 51 |
| Jefferson | 57 | Grant | 46 | Truman | 60 |
| Madison | 57 | Hayes | 54 | Eisenhower | 62 |
| Monroe | 58 | Garfield | 49 | Kennedy | 43 |
| J. Q. Adams | 57 | Arthur | 51 | L. Johnson | 55 |
| Jackson | 61 | Cleveland | 47 | Nixon | 56 |
| Van Buren | 54 | B. Harrison | 55 | Ford | 61 |
| W. H. Harrison | 68 | Cleveland | 55 | Carter | 52 |
| Tyler | 51 | McKinley | 54 | Reagan | 69 |
| Polk | 49 | T. Roosevelt | 42 | G. H. W. Bush | 64 |
| Taylor | 64 | Taft | 51 | Clinton | 47 |
| Fillmore | 50 | Wilson | 56 | G. W. Bush | 54 |
| Pierce | 48 | Harding | 55 | Obama | 47 |
| Buchanan | 65 | Coolidge | 51 | | |

Table 2.4 Presidential Ages at Inauguration

| President | Age | President | Age | President | Age |
|---|---|---|---|---|---|
| Washington | 67 | Lincoln | 56 | Hoover | 90 |
| J. Adams | 90 | A. Johnson | 66 | F. Roosevelt | 63 |
| Jefferson | 83 | Grant | 63 | Truman | 88 |
| Madison | 85 | Hayes | 70 | Eisenhower | 78 |
| Monroe | 73 | Garfield | 49 | Kennedy | 46 |
| J. Q. Adams | 80 | Arthur | 56 | L. Johnson | 64 |
| Jackson | 78 | Cleveland | 71 | Nixon | 81 |
| Van Buren | 79 | B. Harrison | 67 | Ford | 93 |
| W. H. Harrison | 68 | Cleveland | 71 | Reagan | 93 |
| Tyler | 71 | McKinley | 58 | | |
| Polk | 53 | T. Roosevelt | 60 | | |
| Taylor | 65 | Taft | 72 | | |
| Fillmore | 74 | Wilson | 67 | | |
| Pierce | 64 | Harding | 57 | | |
| Buchanan | 77 | Coolidge | 60 | | |

Table 2.5 Presidential Age at Death

Solution 2.3

| Ages at Inauguration | | Ages at Death |
|---:|:---:|:---|
| 9 9 8 7 7 7 6 3 2 | 4 | 6 9 |
| 8 7 7 7 7 6 6 6 5 5 5 5 4 4 4 4 4 2 1 1 1 1 1 0 | 5 | 3 6 6 7 7 8 |
| 9 5 4 4 2 1 1 1 0 | 6 | 0 0 3 3 4 4 5 6 7 7 7 8 |
| | 7 | 0 0 1 1 1 4 7 8 8 9 |
| | 8 | 0 1 3 5 8 |
| | 9 | 0 0 3 3 |

Table 2.3

Notice that the leaf values increase in order from right to left for leaves shown to the left of the stem, while the leaf values increase in order from left to right for leaves shown to the right of the stem.

Try It 2.3

The table shows the number of wins and losses a sports team has had in 42 seasons. Create a side-by-side stem-and-leaf plot of these wins and losses.

| Losses | Wins | Year | Losses | Wins | Year |
|---|---|---|---|---|---|
| 34 | 48 | 1968–1969 | 41 | 41 | 1989–1990 |
| 34 | 48 | 1969–1970 | 39 | 43 | 1990–1991 |
| 46 | 36 | 1970–1971 | 44 | 38 | 1991–1992 |
| 46 | 36 | 1971–1972 | 39 | 43 | 1992–1993 |
| 36 | 46 | 1972–1973 | 25 | 57 | 1993–1994 |
| 47 | 35 | 1973–1974 | 40 | 42 | 1994–1995 |
| 51 | 31 | 1974–1975 | 36 | 46 | 1995–1996 |
| 53 | 29 | 1975–1976 | 26 | 56 | 1996–1997 |
| 51 | 31 | 1976–1977 | 32 | 50 | 1997–1998 |
| 41 | 41 | 1977–1978 | 19 | 31 | 1998–1999 |
| 36 | 46 | 1978–1979 | 54 | 28 | 1999–2000 |
| 32 | 50 | 1979–1980 | 57 | 25 | 2000–2001 |
| 51 | 31 | 1980–1981 | 49 | 33 | 2001–2002 |
| 40 | 42 | 1981–1982 | 47 | 35 | 2002–2003 |
| 39 | 43 | 1982–1983 | 54 | 28 | 2003–2004 |
| 42 | 40 | 1983–1984 | 69 | 13 | 2004–2005 |
| 48 | 34 | 1984–1985 | 56 | 26 | 2005–2006 |
| 32 | 50 | 1985–1986 | 52 | 30 | 2006–2007 |
| 25 | 57 | 1986–1987 | 45 | 37 | 2007–2008 |
| 32 | 50 | 1987–1988 | 35 | 47 | 2008–2009 |
| 30 | 52 | 1988–1989 | 29 | 53 | 2009–2010 |

Table 2.6

Another type of graph that is useful for specific data values is a line graph. In the particular line graph shown in Example 2.4, the x-axis (horizontal axis) consists of data values and the y-axis (vertical axis) consists of frequency points. The frequency points are connected using line segments.

### Example 2.4

In a survey, 40 mothers were asked how many times per week a teenager must be reminded to do his or her chores. The results are shown in Table 2.7 and in Figure 2.2.

| Number of Times Teenager Is Reminded | Frequency |
|---|---|
| 0 | 2 |
| 1 | 5 |
| 2 | 8 |
| 3 | 14 |
| 4 | 7 |
| 5 | 4 |

Table 2.7

Figure 2.2
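A line graph of this kind is just the frequency points joined in order of their x-values. The sketch below builds those points and line segments from Table 2.7 using only the standard library; the variable names are illustrative, and any plotting tool could then draw the segments.

```python
# Frequencies from Table 2.7: number of reminders -> number of mothers.
frequency = {0: 2, 1: 5, 2: 8, 3: 14, 4: 7, 5: 4}

# Each (x, y) pair is one frequency point on the graph.
points = sorted(frequency.items())

# Consecutive points are joined by line segments.
segments = list(zip(points, points[1:]))

total = sum(frequency.values())          # should match the 40 mothers surveyed
peak = max(points, key=lambda p: p[1])   # highest point on the graph

print("Points:", points)
for start, end in segments:
    print(f"segment from {start} to {end}")
print("Total responses:", total)
print("Highest point:", peak)
```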

Try It 2.4

In a survey, 40 people were asked how many times per year they had their car in the shop for repairs. The results are shown in Table 2.8. Construct a line graph.

| Number of Times in Shop | Frequency |
|---|---|
| 0 | 7 |
| 1 | 10 |
| 2 | 14 |
| 3 | 9 |

Table 2.8

Bar graphs consist of bars that are separated from each other. The bars can be rectangles, or they can be rectangular boxes, used in three-dimensional plots, and they can be vertical or horizontal. The bar graph shown in Example 2.5 has age-groups represented on the x-axis and proportions on the y-axis.

### Example 2.5

By the end of 2011, a social media site had more than 146 million users in the United States. Table 2.9 shows three age-groups, the number of users in each age-group, and the proportion (percentage) of users in each age-group. Construct a bar graph using this data.

| Age-Groups | Number of Site Users | Proportion (%) of Site Users |
|---|---|---|
| 13–25 | 65,082,280 | 45% |
| 26–44 | 53,300,200 | 36% |
| 45–64 | 27,885,100 | 19% |

Table 2.9

Solution 2.5

Figure 2.3
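Without a plotting library, a bar graph of the proportions in Table 2.9 can be approximated in plain text. This is only a sketch: the function name `text_bar_chart` is hypothetical, the bars are horizontal, and each `#` stands for one percentage point taken directly from the table.

```python
# Percentages copied from Table 2.9 (age-group -> percent of site users).
proportions = {"13–25": 45, "26–44": 36, "45–64": 19}


def text_bar_chart(data):
    """Return one horizontal bar per category, one '#' per percentage point."""
    width = max(len(label) for label in data)  # align the category labels
    return [f"{label:<{width}} | {'#' * pct} {pct}%"
            for label, pct in data.items()]


bars = text_bar_chart(proportions)
for line in bars:
    print(line)
```

The bars are kept separate, one per age-group, which mirrors the gaps between bars in a drawn bar graph.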

Try It 2.5

The population in Park City is made up of children, working-age adults, and retirees. Table 2.10 shows the three age-groups, the number of people in the town from each age-group, and the proportion (%) of people in each age-group. Construct a bar graph showing the proportions.

| Age-Groups | Number of People | Proportion of Population |
|---|---|---|
| Children | 67,059 | 19% |
| Retirees | 131,662 | 38% |

Table 2.10

### Example 2.6

The columns in Table 2.11 contain the race or ethnicity of students in U.S. public schools for the class of 2011, percentages for the Advanced Placement (AP) examinee population for that class, and percentages for the overall student population. Create a bar graph with the student race or ethnicity (qualitative data) on the x-axis and the AP examinee population percentages on the y-axis.

| Race/Ethnicity | AP Examinee Population | Overall Student Population |
|---|---|---|
| 1 = Asian, Asian American, or Pacific Islander | 10.3% | 5.7% |
| 2 = Black or African American | 9.0% | 14.7% |
| 3 = Hispanic or Latino | 17.0% | 17.6% |
| 4 = American Indian or Alaska Native | 0.6% | 1.1% |
| 5 = White | 57.1% | 59.2% |
| 6 = Not reported/other | 6.0% | 1.7% |

Table 2.11

Solution 2.6
Figure 2.4

Try It 2.6

Park City is broken down into six voting districts. The table shows the percentage of the total registered voter population that lives in each district as well as the percentage of the entire population that lives in each district. Construct a bar graph that shows the registered voter population by district.

| District | Registered Voter Population | Overall City Population |
|---|---|---|
| 1 | 15.5% | 19.4% |
| 2 | 12.2% | 15.6% |
| 3 | 9.8% | 9.0% |
| 4 | 17.4% | 18.5% |
| 5 | 22.8% | 20.7% |
| 6 | 22.3% | 16.8% |

Table 2.12

### Example 2.7

Below is a two-way table showing the types of pets owned by men and women.

| | Dogs | Cats | Fish | Total |
|---|---|---|---|---|
| Men | 4 | 2 | 2 | 8 |
| Women | 4 | 6 | 2 | 12 |
| Total | 8 | 8 | 4 | 20 |

Table 2.13

Given these data, calculate the marginal distributions of pets for the people surveyed.

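Solution 2.7

Divide each pet's column total by the grand total of 20: Dogs $\frac{8}{20} = 0.4$, Cats $\frac{8}{20} = 0.4$, Fish $\frac{4}{20} = 0.2$.

Note: The marginal proportions must sum to one. Here $0.4 + 0.4 + 0.2 = 1$, so the solution checks.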
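The marginal distribution of pets in Table 2.13 can be verified with a short Python sketch. The dictionary layout and variable names are assumptions chosen for illustration; `Fraction` keeps the arithmetic exact so the sum-to-one check is not affected by rounding.

```python
from fractions import Fraction

# Cell counts from Table 2.13 (totals are recomputed rather than typed in).
counts = {
    "Men":   {"Dogs": 4, "Cats": 2, "Fish": 2},
    "Women": {"Dogs": 4, "Cats": 6, "Fish": 2},
}
pets = ["Dogs", "Cats", "Fish"]

grand_total = sum(counts[group][pet] for group in counts for pet in pets)

# Marginal distribution of pets: column total divided by grand total.
marginal = {
    pet: Fraction(sum(counts[group][pet] for group in counts), grand_total)
    for pet in pets
}

for pet, share in marginal.items():
    print(f"{pet}: {share} = {float(share)}")
print("Sum of marginal proportions:", sum(marginal.values()))
```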

### Example 2.8

Table 2.14 is a two-way table showing the types of pets owned by men and women.

| | Dogs | Cats | Fish | Total |
|---|---|---|---|---|
| Men | 4 | 2 | 2 | 8 |
| Women | 4 | 6 | 2 | 12 |
| Total | 8 | 8 | 4 | 20 |

Table 2.14

Given these data, calculate the conditional distributions for the subpopulation of men who own each pet type.

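Solution 2.8

Restrict attention to the Men row and divide each count by the row total of 8: Dogs $\frac{4}{8} = 0.5$, Cats $\frac{2}{8} = 0.25$, Fish $\frac{2}{8} = 0.25$.

Note: The conditional proportions must sum to one. Here $0.5 + 0.25 + 0.25 = 1$, so the solution checks.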
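A conditional distribution uses the same division, but within a single row of the table instead of the whole table. The sketch below applies it to the Men row of Table 2.14; the function name `conditional_distribution` is hypothetical.

```python
def conditional_distribution(row):
    """Divide each count in one row by that row's total."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}


# Counts for men from Table 2.14.
men = {"Dogs": 4, "Cats": 2, "Fish": 2}

men_dist = conditional_distribution(men)
for pet, share in men_dist.items():
    print(f"{pet}: {share}")
print("Sum of conditional proportions:", sum(men_dist.values()))
```

Passing the Women row instead would give the conditional distribution for women, using 12 as the divisor.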