# Introduction


One simple graph, the **stem-and-leaf graph** or **stemplot**, comes from the field of exploratory data analysis. It is a good choice when the data sets are small. To create the plot, divide each observation of data into a stem and a leaf. The stem consists of the leading digit(s), while the leaf consists of a **final significant digit**. For example, 23 has stem two and leaf three. The number 432 has stem 43 and leaf two. Likewise, the number 5,432 has stem 543 and leaf two. The decimal 9.3 has stem nine and leaf three. Write the stems in a vertical line from smallest to largest. Draw a vertical line to the right of the stems. Then write the leaves in increasing order next to their corresponding stem. Make sure the leaves show a space between values, so that the exact data values may be easily determined. The frequency of data values for each stem provides information about the shape of the distribution.
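The split into stem and leaf can be sketched in a few lines of Python. This is only an illustration: the function name `split_stem_leaf` and its `leaf_unit` parameter (the place value of the leaf digit, 1 for whole numbers and 0.1 for values recorded to one decimal place) are assumptions introduced here, not part of any standard library.

```python
def split_stem_leaf(value, leaf_unit=1):
    """Return (stem, leaf) for a nonnegative value.

    leaf_unit (hypothetical parameter) is the place value of the leaf digit:
    1 for whole numbers, 0.1 for values recorded to one decimal place.
    """
    # Rescale so the leaf becomes the ones digit, then peel it off.
    scaled = round(value / leaf_unit)
    return divmod(scaled, 10)


# The four examples from the paragraph above.
print(split_stem_leaf(23))         # (2, 3)
print(split_stem_leaf(432))        # (43, 2)
print(split_stem_leaf(5432))       # (543, 2)
print(split_stem_leaf(9.3, 0.1))   # (9, 3)
```

Rounding after rescaling guards against floating-point noise in decimal values such as 9.3.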

### Example 2.1

For Susan Dean’s spring precalculus class, scores for the first exam were as follows (smallest to largest):

33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100

Stem | Leaf |
---|---|

3 | 3 |

4 | 2 9 9 |

5 | 3 5 5 |

6 | 1 3 7 8 8 9 9 |

7 | 2 3 4 8 |

8 | 0 3 8 8 8 |

9 | 0 2 4 4 4 4 6 |

10 | 0 |

The stemplot shows that most scores fell in the 60s, 70s, 80s, and 90s. Eight out of the 31 scores or approximately 26 percent $\left(\frac{8}{31}\right)$ were in the 90s or 100, a fairly high number of As.
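As a rough check, the stemplot and the 26 percent figure can be reproduced from the listed exam scores. The sketch below assumes whole-number scores, so the leaf is the ones digit and the stem is everything to its left; the variable names are illustrative.

```python
from collections import defaultdict

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73,
          74, 78, 80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

# Group the leaves (ones digit) under their stems (remaining leading digits).
stemplot = defaultdict(list)
for score in sorted(scores):
    stem, leaf = divmod(score, 10)
    stemplot[stem].append(leaf)

# Print stems from smallest to largest, with leaves separated by spaces.
for stem in sorted(stemplot):
    leaves = " ".join(str(leaf) for leaf in stemplot[stem])
    print(f"{stem:>2} | {leaves}")

# Share of scores in the 90s or at 100.
high = sum(1 for score in scores if score >= 90)
print(f"{high} of {len(scores)} = {high / len(scores):.0%}")   # 8 of 31 = 26%
```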

For the Park City basketball team, scores for the last 30 games were as follows (smallest to largest):

32, 32, 33, 34, 38, 40, 42, 42, 43, 44, 46, 47, 47, 48, 48, 48, 49, 50, 50, 51, 52, 52, 52, 53, 54, 56, 57, 57, 60, 61

Construct a stemplot for the data.

The stemplot is a quick way to graph data and gives an exact picture of the data. You want to look for an overall pattern and any outliers. An outlier is an observation of data that does not fit the rest of the data. It is sometimes called an **extreme value.** When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes, for example, writing 50 instead of 500, while others may indicate that something unusual is happening. It takes some background information to explain outliers, so we will cover them in more detail later.

### Example 2.2

The data are the distances (in kilometers) from a home to local supermarkets. Create a stemplot using the data.

1.1, 1.5, 2.3, 2.5, 2.7, 3.2, 3.3, 3.3, 3.5, 3.8, 4.0, 4.2, 4.5, 4.5, 4.7, 4.8, 5.5, 5.6, 6.5, 6.7, 12.3

Do the data seem to have any concentration of values?

The leaves are to the right of the decimal.

The value 12.3 may be an outlier. Values appear to concentrate at 3 and 4 kilometers.

Stem | Leaf |
---|---|

1 | 1 5 |

2 | 3 5 7 |

3 | 2 3 3 5 8 |

4 | 0 2 5 5 7 8 |

5 | 5 6 |

6 | 5 7 |

7 | |

8 | |

9 | |

10 | |

11 | |

12 | 3 |
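The same construction works for decimal data if the values are first converted to tenths. The sketch below is an illustration with assumed variable names. It keeps the empty stems so that the gap before 12.3 stays visible, and it reports which stems hold the most values. It does not apply a formal outlier test.

```python
distances = [1.1, 1.5, 2.3, 2.5, 2.7, 3.2, 3.3, 3.3, 3.5, 3.8, 4.0, 4.2,
             4.5, 4.5, 4.7, 4.8, 5.5, 5.6, 6.5, 6.7, 12.3]

# Work in tenths: the stem is the whole-kilometer part, the leaf is the tenths digit.
pairs = [divmod(round(d * 10), 10) for d in distances]

# Create a row for every stem in the range, including stems with no leaves.
stems = [stem for stem, _ in pairs]
rows = {stem: [] for stem in range(min(stems), max(stems) + 1)}
for stem, leaf in sorted(pairs):
    rows[stem].append(leaf)

for stem, leaves in rows.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")

# Stems with no leaves mark the gap between the bulk of the data and 12.3.
empty_stems = [stem for stem, leaves in rows.items() if not leaves]
print("Empty stems:", empty_stems)                 # [7, 8, 9, 10, 11]

# The two stems holding the most values show where the data concentrate.
busiest = sorted(rows, key=lambda s: len(rows[s]), reverse=True)[:2]
print("Most populated stems:", sorted(busiest))    # [3, 4]
```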

The following data show the distances (in miles) from the homes of high school students to the school. Create a stemplot using the data and identify any outliers.

0.5, 0.7, 1.1, 1.2, 1.2, 1.3, 1.3, 1.5, 1.5, 1.7, 1.7, 1.8, 1.9, 2.0, 2.2, 2.5, 2.6, 2.8, 2.8, 2.8, 3.5, 3.8, 4.4, 4.8, 4.9, 5.2, 5.5, 5.7, 5.8, 8.0

### Example 2.3

A **side-by-side stem-and-leaf plot** allows a comparison of the two data sets in two columns. In a side-by-side stem-and-leaf plot, two sets of leaves share the same stem. The leaves are to the left and the right of the stems. Table 2.4 and Table 2.5 show the ages of presidents at their inauguration and at their death. Construct a side-by-side stem-and-leaf plot using these data.

Notice that the leaf values increase in order, from right to left, for leaves shown to the left of the stem, while the leaf values increase in order from left to right, for leaves shown to the right of the stem.

President | Age | President | Age | President | Age |
---|---|---|---|---|---|

Washington | 57 | Lincoln | 52 | Hoover | 54 |

J. Adams | 61 | A. Johnson | 56 | F. Roosevelt | 51 |

Jefferson | 57 | Grant | 46 | Truman | 60 |

Madison | 57 | Hayes | 54 | Eisenhower | 62 |

Monroe | 58 | Garfield | 49 | Kennedy | 43 |

J. Q. Adams | 57 | Arthur | 51 | L. Johnson | 55 |

Jackson | 61 | Cleveland | 47 | Nixon | 56 |

Van Buren | 54 | B. Harrison | 55 | Ford | 61 |

W. H. Harrison | 68 | Cleveland | 55 | Carter | 52 |

Tyler | 51 | McKinley | 54 | Reagan | 69 |

Polk | 49 | T. Roosevelt | 42 | G.H.W. Bush | 64 |

Taylor | 64 | Taft | 51 | Clinton | 47 |

Fillmore | 50 | Wilson | 56 | G. W. Bush | 54 |

Pierce | 48 | Harding | 55 | Obama | 47 |

Buchanan | 65 | Coolidge | 51 |

President | Age | President | Age | President | Age |
---|---|---|---|---|---|

Washington | 67 | Lincoln | 56 | Hoover | 90 |

J. Adams | 90 | A. Johnson | 66 | F. Roosevelt | 63 |

Jefferson | 83 | Grant | 63 | Truman | 88 |

Madison | 85 | Hayes | 70 | Eisenhower | 78 |

Monroe | 73 | Garfield | 49 | Kennedy | 46 |

J. Q. Adams | 80 | Arthur | 56 | L. Johnson | 64 |

Jackson | 78 | Cleveland | 71 | Nixon | 81 |

Van Buren | 79 | B. Harrison | 67 | Ford | 93 |

W. H. Harrison | 68 | Cleveland | 71 | Reagan | 93 |

Tyler | 71 | McKinley | 58 | ||

Polk | 53 | T. Roosevelt | 60 | ||

Taylor | 65 | Taft | 72 | ||

Fillmore | 74 | Wilson | 67 | ||

Pierce | 64 | Harding | 57 | ||

Buchanan | 77 | Coolidge | 60 |

Ages at Inauguration | | Ages at Death |
---|---|---|

9 9 8 7 7 7 6 3 2 | 4 | 6 9 |

8 7 7 7 7 6 6 6 5 5 5 5 4 4 4 4 4 2 2 1 1 1 1 1 0 | 5 | 3 6 6 7 8 |

9 8 5 4 4 2 1 1 1 0 | 6 | 0 0 3 3 4 4 5 6 7 7 7 8 |

| | 7 | 0 1 1 1 2 3 4 7 8 8 9 |

| | 8 | 0 1 3 5 8 |

| | 9 | 0 0 3 3 |
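A side-by-side plot can also be built directly from the two tables of ages. The following is a minimal sketch: the ages are copied from Table 2.4 and Table 2.5 column by column, and the helper name `leaves_by_stem` is an assumption made for illustration.

```python
from collections import defaultdict

# Ages at inauguration (Table 2.4) and at death (Table 2.5), read column by column.
inauguration = [57, 61, 57, 57, 58, 57, 61, 54, 68, 51, 49, 64, 50, 48, 65,
                52, 56, 46, 54, 49, 51, 47, 55, 55, 54, 42, 51, 56, 55, 51,
                54, 51, 60, 62, 43, 55, 56, 61, 52, 69, 64, 47, 54, 47]
death = [67, 90, 83, 85, 73, 80, 78, 79, 68, 71, 53, 65, 74, 64, 77,
         56, 66, 63, 70, 49, 56, 71, 67, 71, 58, 60, 72, 67, 57, 60,
         90, 63, 88, 78, 46, 64, 81, 93, 93]


def leaves_by_stem(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digit)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return groups


left = leaves_by_stem(inauguration)
right = leaves_by_stem(death)

for stem in range(4, 10):
    # Left-hand leaves are reversed so they increase from right to left,
    # moving away from the shared stem.
    left_leaves = " ".join(str(leaf) for leaf in reversed(left[stem]))
    right_leaves = " ".join(str(leaf) for leaf in right[stem])
    print(f"{left_leaves:>50} | {stem} | {right_leaves}")
```

Because both sets of leaves share one column of stems, the shift toward higher ages at death is visible at a glance.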

The table shows the number of wins and losses a sports team has had in 42 seasons. Create a side-by-side stem-and-leaf plot of these wins and losses.

Losses | Wins | Year | Losses | Wins | Year |
---|---|---|---|---|---|

34 | 48 | 1968–1969 | 41 | 41 | 1989–1990 |

34 | 48 | 1969–1970 | 39 | 43 | 1990–1991 |

46 | 36 | 1970–1971 | 44 | 38 | 1991–1992 |

46 | 36 | 1971–1972 | 39 | 43 | 1992–1993 |

36 | 46 | 1972–1973 | 25 | 57 | 1993–1994 |

47 | 35 | 1973–1974 | 40 | 42 | 1994–1995 |

51 | 31 | 1974–1975 | 36 | 46 | 1995–1996 |

53 | 29 | 1975–1976 | 26 | 56 | 1996–1997 |

51 | 31 | 1976–1977 | 32 | 50 | 1997–1998 |

41 | 41 | 1977–1978 | 19 | 31 | 1998–1999 |

36 | 46 | 1978–1979 | 54 | 28 | 1999–2000 |

32 | 50 | 1979–1980 | 57 | 25 | 2000–2001 |

51 | 31 | 1980–1981 | 49 | 33 | 2001–2002 |

40 | 42 | 1981–1982 | 47 | 35 | 2002–2003 |

39 | 43 | 1982–1983 | 54 | 28 | 2003–2004 |

42 | 40 | 1983–1984 | 69 | 13 | 2004–2005 |

48 | 34 | 1984–1985 | 56 | 26 | 2005–2006 |

32 | 50 | 1985–1986 | 52 | 30 | 2006–2007 |

25 | 57 | 1986–1987 | 45 | 37 | 2007–2008 |

32 | 50 | 1987–1988 | 35 | 47 | 2008–2009 |

30 | 52 | 1988–1989 | 29 | 53 | 2009–2010 |

Another type of graph that is useful for specific data values is a **line graph**. In the particular line graph shown in Example 2.4, the ***x*-axis** (horizontal axis) consists of **data values** and the ***y*-axis** (vertical axis) consists of **frequency points**. The frequency points are connected using line segments.

### Example 2.4

In a survey, 40 mothers were asked how many times per week a teenager must be reminded to do his or her chores. The results are shown in Table 2.7 and in Figure 2.2.

Number of Times Teenager Is Reminded | Frequency |
---|---|

0 | 2 |

1 | 5 |

2 | 8 |

3 | 14 |

4 | 7 |

5 | 4 |
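The structure of this line graph can be sketched without a plotting library. Each row of the frequency table becomes one point, and consecutive points are joined by a segment. The variable names below are illustrative assumptions.

```python
reminders = [0, 1, 2, 3, 4, 5]      # x-axis: data values
frequency = [2, 5, 8, 14, 7, 4]     # y-axis: frequency of each data value

# One frequency point per row of the table.
points = list(zip(reminders, frequency))

# A line graph joins each frequency point to the next with a line segment.
segments = list(zip(points, points[1:]))
for (x0, y0), (x1, y1) in segments:
    print(f"({x0}, {y0}) -> ({x1}, {y1})  change in frequency: {y1 - y0:+d}")

print("Total responses:", sum(frequency))   # 40
peak_x, peak_y = max(points, key=lambda p: p[1])
print("Peak at", peak_x, "reminders with frequency", peak_y)
```

If matplotlib is available, passing the same two lists to `plt.plot(reminders, frequency, marker="o")` should draw the points and the segments that connect them.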

In a survey, 40 people were asked how many times per year they had their car in the shop for repairs. The results are shown in Table 2.8. Construct a line graph.

Number of Times in Shop | Frequency |
---|---|

0 | 7 |

1 | 10 |

2 | 14 |

3 | 9 |

**Bar graphs** consist of bars that are separated from each other. The bars can be rectangles, or they can be rectangular boxes, used in three-dimensional plots, and they can be vertical or horizontal. The **bar graph** shown in Example 2.5 has age-groups represented on the ***x*-axis** and proportions on the ***y*-axis**.

### Example 2.5

By the end of 2011, a social media site had more than 146 million users in the United States. Table 2.9 shows three age-groups, the number of users in each age-group, and the proportion (percentage) of users in each age-group. Construct a bar graph using these data.

Age-Groups | Number of Site Users | Proportion (%) of Site Users |
---|---|---|

13–25 | 65,082,280 | 45% |

26–44 | 53,300,200 | 36% |

45–64 | 27,885,100 | 19% |
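A quick text rendering gives a feel for the bar graph before drawing it. This sketch uses the proportions exactly as stated in Table 2.9 and draws horizontal bars, one `#` per percentage point. The layout is an illustrative choice, not a prescribed format.

```python
# Proportions (%) as stated in Table 2.9.
proportions = {"13–25": 45, "26–44": 36, "45–64": 19}

# One '#' per percentage point; the bar length encodes the proportion.
bars = {group: "#" * pct for group, pct in proportions.items()}

for group, bar in bars.items():
    print(f"{group:>5} | {bar} {proportions[group]}%")
    print()   # blank line keeps the bars separated from each other

print("Sum of proportions:", sum(proportions.values()))   # 100
```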

The population in Park City is made up of children, working-age adults, and retirees. Table 2.10 shows the three age-groups, the number of people in the town from each age-group, and the proportion (%) of people in each age-group. Construct a bar graph showing the proportions.

Age-Groups | Number of People | Proportion of Population |
---|---|---|

Children | 67,059 | 19% |

Working-age adults | 152,198 | 43% |

Retirees | 131,662 | 38% |

### Example 2.6

The columns in Table 2.11 contain the race or ethnicity of students in U.S. public schools for the class of 2011, percentages for the Advanced Placement (AP) examinee population for that class, and percentages for the overall student population. Create a bar graph with the student race or ethnicity (qualitative data) on the *x*-axis and the AP examinee population percentages on the *y*-axis.

Race/Ethnicity | AP Examinee Population | Overall Student Population |
---|---|---|

1 = Asian, Asian American, or Pacific Islander | 10.3% | 5.7% |

2 = Black or African American | 9.0% | 14.7% |

3 = Hispanic or Latino | 17.0% | 17.6% |

4 = American Indian or Alaska Native | 0.6% | 1.1% |

5 = White | 57.1% | 59.2% |

6 = Not reported/other | 6.0% | 1.7% |

Park City is broken down into six voting districts. The table shows the percentage of the total registered voter population that lives in each district as well as the percentage of the entire population that lives in each district. Construct a bar graph that shows the registered voter population by district.

District | Registered Voter Population | Overall City Population |
---|---|---|

1 | 15.5% | 19.4% |

2 | 12.2% | 15.6% |

3 | 9.8% | 9.0% |

4 | 17.4% | 18.5% |

5 | 22.8% | 20.7% |

6 | 22.3% | 16.8% |

### Example 2.7

Below is a two-way table showing the types of pets owned by men and women.

| | Dogs | Cats | Fish | Total |
---|---|---|---|---|

Men | 4 | 2 | 2 | 8 |

Women | 4 | 6 | 2 | 12 |

Total | 8 | 8 | 4 | 20 |

Given these data, calculate the marginal distributions of pets for the people surveyed.

$$\text{Dogs} = \frac{8}{20} = 0.4$$

$$\text{Cats} = \frac{8}{20} = 0.4$$

$$\text{Fish} = \frac{4}{20} = 0.2$$

Note: The proportions in a marginal distribution must sum to one. In this case, $0.4 + 0.4 + 0.2 = 1$; therefore, the solution *checks*.
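The marginal calculation divides each column total by the grand total. A minimal Python sketch, with the two-way table stored as a nested dictionary (an assumed representation chosen for illustration):

```python
# Two-way table of pet ownership, without the Total row and column.
counts = {
    "Men":   {"Dogs": 4, "Cats": 2, "Fish": 2},
    "Women": {"Dogs": 4, "Cats": 6, "Fish": 2},
}

grand_total = sum(sum(row.values()) for row in counts.values())   # 20

# Marginal distribution of pet type: column total divided by grand total.
pets = ["Dogs", "Cats", "Fish"]
marginal = {
    pet: sum(row[pet] for row in counts.values()) / grand_total
    for pet in pets
}

for pet, share in marginal.items():
    print(f"{pet}: {share:.2f}")

# The proportions should sum to one.
print("Sum:", round(sum(marginal.values()), 10))
```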

### Example 2.8

Table 2.14 is a two-way table showing the types of pets owned by men and women.

| | Dogs | Cats | Fish | Total |
---|---|---|---|---|

Men | 4 | 2 | 2 | 8 |

Women | 4 | 6 | 2 | 12 |

Total | 8 | 8 | 4 | 20 |

Given these data, calculate the conditional distributions for the subpopulation of men who own each pet type.

$$\text{Men who own dogs} = \frac{4}{8} = 0.5$$

$$\text{Men who own cats} = \frac{2}{8} = 0.25$$

$$\text{Men who own fish} = \frac{2}{8} = 0.25$$

Note: The proportions in a conditional distribution must sum to 1. In this case, $0.5 + 0.25 + 0.25 = 1$; therefore, the solution *checks*.
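A conditional distribution restricts attention to one row of the table and divides each count by that row's total. The sketch below uses a hypothetical helper, `conditional_distribution`, applied to the row for men.

```python
def conditional_distribution(row):
    """Divide each count in one row of a two-way table by the row total."""
    row_total = sum(row.values())
    return {category: count / row_total for category, count in row.items()}


# Counts for the subpopulation of men, taken from the two-way table.
men = {"Dogs": 4, "Cats": 2, "Fish": 2}
conditional_men = conditional_distribution(men)

for pet, share in conditional_men.items():
    print(f"Men who own {pet.lower()}: {share}")

# The proportions should sum to 1.
print("Sum:", sum(conditional_men.values()))
```

The same helper could be applied to the row for women, whose counts are divided by 12 instead of 8.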