### 2.1 Stem-and-Leaf Graphs (Stemplots), Line Graphs, and Bar Graphs

Student grades on a chemistry exam were 77, 78, 76, 81, 86, 51, 79, 82, 84, and 99.

- Construct a stem-and-leaf plot of the data.
- Are there any potential outliers? If so, which scores are they? Why do you consider them outliers?
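
One way to tabulate the stems and leaves for these grades is sketched below in Python. It is a minimal illustration rather than a prescribed method: the function name `stem_and_leaf` is hypothetical, and it assumes two-digit whole-number scores, so the tens digit serves as the stem and the ones digit as the leaf.

```python
from collections import defaultdict

grades = [77, 78, 76, 81, 86, 51, 79, 82, 84, 99]


def stem_and_leaf(values):
    """Group sorted values by tens digit (stem) and list the ones digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lo, hi = min(stems), max(stems)
    # Keep empty stems so that gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(lo, hi + 1)}


plot = stem_and_leaf(grades)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Stems with no leaves are printed as empty rows, which makes isolated values easier to spot when judging potential outliers.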

Table 2.64 contains the 2010 rates for a specific disease in U.S. states and Washington, DC.

State | Percent (%) | State | Percent (%) | State | Percent (%) |
---|---|---|---|---|---|
Alabama | 32.2 | Kentucky | 31.3 | North Dakota | 27.2 |
Alaska | 24.5 | Louisiana | 31.0 | Ohio | 29.2 |
Arizona | 24.3 | Maine | 26.8 | Oklahoma | 30.4 |
Arkansas | 30.1 | Maryland | 27.1 | Oregon | 26.8 |
California | 24.0 | Massachusetts | 23.0 | Pennsylvania | 28.6 |
Colorado | 21.0 | Michigan | 30.9 | Rhode Island | 25.5 |
Connecticut | 22.5 | Minnesota | 24.8 | South Carolina | 31.5 |
Delaware | 28.0 | Mississippi | 34.0 | South Dakota | 27.3 |
Washington, DC | 22.2 | Missouri | 30.5 | Tennessee | 30.8 |
Florida | 26.6 | Montana | 23.0 | Texas | 31.0 |
Georgia | 29.6 | Nebraska | 26.9 | Utah | 22.5 |
Hawaii | 22.7 | Nevada | 22.4 | Vermont | 23.2 |
Idaho | 26.5 | New Hampshire | 25.0 | Virginia | 26.0 |
Illinois | 28.2 | New Jersey | 23.8 | Washington | 25.5 |
Indiana | 29.6 | New Mexico | 25.1 | West Virginia | 32.5 |
Iowa | 28.4 | New York | 23.9 | Wisconsin | 26.3 |
Kansas | 29.4 | North Carolina | 27.8 | Wyoming | 25.1 |

- Use a random number generator to randomly pick eight states. Construct a bar graph of the disease rates for those eight states.
- Construct a bar graph for all the states beginning with the letter *A*.
- Construct a bar graph for all the states beginning with the letter *M*.

### 2.2 Histograms, Frequency Polygons, and Time Series Graphs

Suppose that three book publishers were interested in the number of fiction paperbacks adult consumers purchase per month. Each publisher conducted a survey. In the survey, adult consumers were asked the number of fiction paperbacks they had purchased the previous month. The results are as follows:

Publisher A

Number of Books | Frequency | Relative Frequency |
---|---|---|
0 | 10 | |
1 | 12 | |
2 | 16 | |
3 | 12 | |
4 | 8 | |
5 | 6 | |
6 | 2 | |
8 | 2 | |

Publisher B

Number of Books | Frequency | Relative Frequency |
---|---|---|
0 | 18 | |
1 | 24 | |
2 | 24 | |
3 | 22 | |
4 | 15 | |
5 | 10 | |
7 | 5 | |
9 | 1 | |

Publisher C

Number of Books | Frequency | Relative Frequency |
---|---|---|
0–1 | 20 | |
2–3 | 35 | |
4–5 | 12 | |
6–7 | 2 | |
8–9 | 1 | |

- Find the relative frequencies for each survey. Write them in the charts.
- Using either a graphing calculator or computer or by hand, use the frequency column to construct a histogram for each publisher's survey. For Publishers A and B, make bar widths of 1. For Publisher C, make bar widths of 2.
- In complete sentences, give two reasons why the graphs for Publishers A and B are not identical.
- Would you have expected the graph for Publisher C to look like the other two graphs? Why or why not?
- Make new histograms for Publisher A and Publisher B. This time, make bar widths of 2.
- Now, compare the graph for Publisher C to the new graphs for Publishers A and B. Are the graphs more similar or more different? Explain your answer.
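
The arithmetic behind these exercises can be sketched in a few lines of Python. A relative frequency is a frequency divided by the total number of responses, and widening the bars to 2 amounts to adding the counts of adjacent values. The sketch below uses the frequencies from the first survey table (taken here to be Publisher A's survey); the helper names `relative_frequencies` and `regroup` are hypothetical.

```python
# Frequencies from the first survey table (assumed to be Publisher A).
publisher_a = {0: 10, 1: 12, 2: 16, 3: 12, 4: 8, 5: 6, 6: 2, 8: 2}


def relative_frequencies(freq):
    """Divide each frequency by the total number of responses."""
    total = sum(freq.values())
    return {value: count / total for value, count in freq.items()}


def regroup(freq, width):
    """Combine counts into bins of the given width, keyed by the bin's start."""
    grouped = {}
    for value, count in freq.items():
        start = (value // width) * width
        grouped[start] = grouped.get(start, 0) + count
    return grouped


rel_a = relative_frequencies(publisher_a)
wide_a = regroup(publisher_a, 2)

for books, share in rel_a.items():
    print(f"{books} books: {share:.3f}")
print(wide_a)
```

With a width of 2, the bins start at 0, 2, 4, 6, and 8, which lines up with the intervals Publisher C reported and makes the three surveys directly comparable.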

Often, cruise ships conduct all onboard transactions, with the exception of souvenirs, on a cashless basis. At the end of the cruise, guests pay one bill that covers all onboard transactions. Suppose that 60 single travelers and 70 couples were surveyed as to their onboard bills for a seven-day cruise from Los Angeles to the Mexican Riviera. Following is a summary of the bills for each group.

Singles

Amount ($) | Frequency | Relative Frequency |
---|---|---|
51–100 | 5 | |
101–150 | 10 | |
151–200 | 15 | |
201–250 | 15 | |
251–300 | 10 | |
301–350 | 5 | |

Couples

Amount ($) | Frequency | Relative Frequency |
---|---|---|
100–150 | 5 | |
201–250 | 5 | |
251–300 | 5 | |
301–350 | 5 | |
351–400 | 10 | |
401–450 | 10 | |
451–500 | 10 | |
501–550 | 10 | |
551–600 | 5 | |
601–650 | 5 | |

- Fill in the relative frequency for each group.
- Construct a histogram for the singles group. Scale the *x*-axis by $50 widths. Use relative frequency on the *y*-axis.
- Construct a histogram for the couples group. Scale the *x*-axis by $50 widths. Use relative frequency on the *y*-axis.
- Compare the two graphs:
  - List two similarities between the graphs.
  - List two differences between the graphs.
  - Overall, are the graphs more similar or different?
- Construct a new graph for the couples by hand. Since each couple is paying for two individuals, instead of scaling the *x*-axis by $50, scale it by $100. Use relative frequency on the *y*-axis.
- Compare the graph for the singles with the new graph for the couples:
  - List two similarities between the graphs.
  - Overall, are the graphs more similar or different?

- How did scaling the couples graph differently change the way you compared it to the singles graph?
- Based on the graphs, do you think that individuals spend the same amount, more, or less when traveling as singles than they do, person by person, as part of a couple? Explain why in one or two complete sentences.

Twenty-five randomly selected students were asked the number of movies they watched the previous week. The results are as follows:

Number of Movies | Frequency | Relative Frequency | Cumulative Relative Frequency |
---|---|---|---|
0 | 5 | | |
1 | 9 | | |
2 | 6 | | |
3 | 4 | | |
4 | 1 | | |

- Construct a histogram of the data.
- Complete the columns of the chart.
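
As a rough sketch of how the two empty columns can be filled in, the Python below divides each frequency by the total for the relative frequency and divides a running total of the counts by the same total for the cumulative relative frequency. The variable names are illustrative only.

```python
from itertools import accumulate

movies = [0, 1, 2, 3, 4]
frequency = [5, 9, 6, 4, 1]

total = sum(frequency)

# Relative frequency: each count as a share of all students surveyed.
relative = [f / total for f in frequency]

# Cumulative relative frequency: running count divided by the total.
cumulative = [c / total for c in accumulate(frequency)]

for m, f, r, c in zip(movies, frequency, relative, cumulative):
    print(f"{m} | {f} | {r:.2f} | {c:.2f} |")
```

Accumulating the integer counts before dividing keeps the last cumulative value at exactly 1, which is a quick check that no row was missed.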

*Use the following information to answer the next two exercises:* Suppose 111 people who shopped in a special T-shirt store were asked the number of T-shirts they own costing more than $19 each.

The percentage of people who own at most three T-shirts costing more than $19 each is approximately ________.

- 21
- 59
- 41
- cannot be determined

If the data were collected by asking the first 111 people who entered the store, then the type of sampling is ________.

- cluster
- simple random
- stratified
- convenience

Following are the 2010 obesity rates by U.S. states and Washington, DC.

State | Percent (%) | State | Percent (%) | State | Percent (%) |
---|---|---|---|---|---|
Alabama | 32.2 | Kentucky | 31.3 | North Dakota | 27.2 |
Alaska | 24.5 | Louisiana | 31.0 | Ohio | 29.2 |
Arizona | 24.3 | Maine | 26.8 | Oklahoma | 30.4 |
Arkansas | 30.1 | Maryland | 27.1 | Oregon | 26.8 |
California | 24.0 | Massachusetts | 23.0 | Pennsylvania | 28.6 |
Colorado | 21.0 | Michigan | 30.9 | Rhode Island | 25.5 |
Connecticut | 22.5 | Minnesota | 24.8 | South Carolina | 31.5 |
Delaware | 28.0 | Mississippi | 34.0 | South Dakota | 27.3 |
Washington, DC | 22.2 | Missouri | 30.5 | Tennessee | 30.8 |
Florida | 26.6 | Montana | 23.0 | Texas | 31.0 |
Georgia | 29.6 | Nebraska | 26.9 | Utah | 22.5 |
Hawaii | 22.7 | Nevada | 22.4 | Vermont | 23.2 |
Idaho | 26.5 | New Hampshire | 25.0 | Virginia | 26.0 |
Illinois | 28.2 | New Jersey | 23.8 | Washington | 25.5 |
Indiana | 29.6 | New Mexico | 25.1 | West Virginia | 32.5 |
Iowa | 28.4 | New York | 23.9 | Wisconsin | 26.3 |
Kansas | 29.4 | North Carolina | 27.8 | Wyoming | 25.1 |

Construct a bar graph of the obesity rates of your state and the four states closest to your state. Hint: label the *x*-axis with the states.

### 2.3 Measures of the Location of the Data

The median age for U.S. ethnicity A currently is 30.9 years; for U.S. ethnicity B, it is 42.3 years.

- Based on this information, give two reasons why the median age for ethnicity A could be lower than the median age for ethnicity B.
- Does the lower median age for ethnicity A necessarily mean that members of ethnicity A die younger than members of ethnicity B? Why or why not?
- How might it be possible for ethnicity A and ethnicity B to die at approximately the same age but for the median age for ethnicity B to be higher?

Six hundred adult Americans were asked by telephone poll, "What do you think constitutes a middle-class income?" The results are in Table 2.72. Each interval includes its left endpoint but not its right endpoint.

Salary ($) | Relative Frequency |
---|---|
< 20,000 | 0.02 |
20,000–25,000 | 0.09 |
25,000–30,000 | 0.19 |
30,000–40,000 | 0.26 |
40,000–50,000 | 0.18 |
50,000–75,000 | 0.17 |
75,000–99,999 | 0.02 |
100,000+ | 0.01 |

- What percentage of the survey answered “not sure”?
- What percentage think that middle class is from $25,000 to $50,000?
- Construct a histogram of the data.
- Should all bars have the same width, based on the data? Why or why not?
- How should the < 20,000 and the 100,000+ intervals be handled? Why?

- Find the 40th and 80th percentiles.
- Construct a bar graph of the data.
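
The first two questions come down to adding relative frequencies from the table. The Python sketch below shows that arithmetic under one assumption: any share of responses not listed in the table is treated as the "not sure" group, as the first question implies. The variable names are illustrative.

```python
relative_frequency = {
    "< 20,000": 0.02,
    "20,000–25,000": 0.09,
    "25,000–30,000": 0.19,
    "30,000–40,000": 0.26,
    "40,000–50,000": 0.18,
    "50,000–75,000": 0.17,
    "75,000–99,999": 0.02,
    "100,000+": 0.01,
}

# Share of respondents who named an income band.
listed = sum(relative_frequency.values())

# Assumption: the remainder did not name a band ("not sure").
unlisted = round(1 - listed, 2)

# Share whose answer fell between $25,000 and $50,000.
middle_bands = ["25,000–30,000", "30,000–40,000", "40,000–50,000"]
share_25_to_50 = round(sum(relative_frequency[b] for b in middle_bands), 2)

print(f"not listed: {unlisted:.0%}, $25,000 to $50,000: {share_25_to_50:.0%}")
```

Rounding to two decimals removes the small floating-point error that can appear when decimal fractions are summed.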

Given the following box plot, answer the questions.

- Which quarter has the smallest spread of data? What is that spread?
- Which quarter has the largest spread of data? What is that spread?
- Find the interquartile range (*IQR*).
- Are there more data in the interval 5–10 or in the interval 10–13? How do you know this?
- Which interval has the fewest data in it? How do you know this?
  - 0–2
  - 2–4
  - 10–12
  - 12–13
  - need more information

The following box plot shows the ages of the U.S. population for 1990, the latest available year.

- Are there fewer or more children (age 17 and under) than senior citizens (age 65 and over)? How do you know?
- 12.6 percent are age 65 and over. Approximately what percentage of the population are working-age adults (above age 17 to age 65)?

### 2.4 Box Plots

In a survey of 20-year-olds in China, Germany, and the United States, people were asked the number of foreign countries they had visited in their lifetime. The following box plots display the results.

- In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected.
- Have more Americans or more Germans surveyed been to more than eight foreign countries?
- Compare the three box plots. What do they imply about the foreign travel of 20-year-old residents of the three countries when compared to each other?

Given the following box plot, answer the questions.

- Think of an example (in words) where the data might fit into the above box plot. In two to five sentences, write down the example.
- What does it mean to have the first and second quartiles so close together, while the second to third quartiles are far apart?

Given the following box plots, answer the questions.

- In complete sentences, explain why each statement is false.
  - **Data 1** has more data values above two than **Data 2** has above two.
  - The data sets cannot have the same mode.
  - For **Data 1**, there are more data values below four than there are above four.

- For which group, Data 1 or Data 2, is the value of 7 more likely to be an outlier? Explain why in complete sentences.

A survey was conducted of 130 purchasers of new black sports cars, 130 purchasers of new red sports cars, and 130 purchasers of new white sports cars. In it, people were asked the age they were when they purchased their car. The following box plots display the results.

- In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected for that car series.
- Which group is most likely to have an outlier? Explain how you determined that.
- Compare the three box plots. What do they imply about the age of purchasing a sports car from the series when compared to each other?
- Look at the red sports cars. Which quarter has the smallest spread of data? What is the spread?
- Look at the red sports cars. Which quarter has the largest spread of data? What is the spread?
- Look at the red sports cars. Estimate the interquartile range (*IQR*).
- Look at the red sports cars. Are there more data in the interval 31–38 or in the interval 45–55? How do you know this?
- Look at the red sports cars. Which interval has the fewest data in it? How do you know this?
  - 31–35
  - 38–41
  - 41–64

Twenty-five randomly selected students were asked the number of movies they watched the previous week. The results are as follows:

Number of Movies | Frequency |
---|---|
0 | 5 |
1 | 9 |
2 | 6 |
3 | 4 |
4 | 1 |

Construct a box plot of the data.
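
A box plot is drawn from the five-number summary: minimum, first quartile, median, third quartile, and maximum. The Python sketch below expands the frequency table into raw values and computes that summary. It assumes the median-of-halves convention for quartiles (the overall median is excluded from each half when the count is odd); other quartile conventions can give slightly different values. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

frequency = {0: 5, 1: 9, 2: 6, 3: 4, 4: 1}

# Expand the table into one value per student.
data = sorted(value for value, count in frequency.items() for _ in range(count))


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(values)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value when n is odd so both halves have equal size.
    upper = values[half + 1:] if n % 2 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


summary = five_number_summary(data)
print(summary)
```

The box spans the first to third quartiles, the line inside the box marks the median, and the whiskers extend to the minimum and maximum.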

### 2.5 Measures of the Center of the Data

Scientists are studying a particular disease. They found that countries that have the highest rates of people who have ever been diagnosed with this disease range from 11.4 percent to 74.6 percent.

Percentage of Population Diagnosed | Number of Countries |
---|---|
11.4–20.45 | 29 |
20.45–29.45 | 13 |
29.45–38.45 | 4 |
38.45–47.45 | 0 |
47.45–56.45 | 2 |
56.45–65.45 | 1 |
65.45–74.45 | 0 |
74.45–83.45 | 1 |

- What is the best estimate of the average percentage affected by the disease for these countries?
- The United States has an average disease rate of 33.9 percent. Is this rate above average or below?
- How does the United States compare to other countries?

Table 2.75 gives the percentage of children under age five who have been diagnosed with a medical condition. What is the best estimate for the mean percentage of children with the condition?

Percentage of Children with the Condition | Number of Countries |
---|---|
16–21.45 | 23 |
21.45–26.9 | 4 |
26.9–32.35 | 9 |
32.35–37.8 | 7 |
37.8–43.25 | 6 |
43.25–48.7 | 1 |
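
When only grouped data are available, a common estimate of the mean weights each interval's midpoint by its frequency: mean ≈ Σ(midpoint × frequency) / Σ(frequency). The Python sketch below applies that estimate to the table above. It is an approximation that assumes the values in each interval are centered on the midpoint, and the function name `grouped_mean` is hypothetical.

```python
# (lower bound, upper bound, number of countries)
intervals = [
    (16.0, 21.45, 23),
    (21.45, 26.9, 4),
    (26.9, 32.35, 9),
    (32.35, 37.8, 7),
    (37.8, 43.25, 6),
    (43.25, 48.7, 1),
]


def grouped_mean(rows):
    """Estimate the mean from interval midpoints weighted by frequency."""
    total = sum(count for _, _, count in rows)
    weighted = sum((lo + hi) / 2 * count for lo, hi, count in rows)
    return weighted / total


estimate = grouped_mean(intervals)
print(round(estimate, 2))
```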

### 2.6 Skewness and the Mean, Median, and Mode

The median age of the U.S. population in 1980 was 30.0 years. In 1991, the median age was 33.1 years.

- What does it mean for the median age to rise?
- Give two reasons why the median age could rise.
- For the median age to rise, is the actual number of children less in 1991 than it was in 1980? Why or why not?

### 2.7 Measures of the Spread of the Data

*Use the following information to answer the next nine exercises:* The population parameters below describe the full-time equivalent number of students (FTES) each year at Lake Tahoe Community College from 1976–1977 through 2004–2005.

- *μ* = 1,000 FTES
- median = 1,014 FTES
- *σ* = 474 FTES
- first quartile = 528.5 FTES
- third quartile = 1,447.5 FTES
- *n* = 29 years

A sample of 11 years is taken. About how many are expected to have an FTES of 1,014 or above? Explain how you determined your answer.

Seventy-five percent of all years have an FTES:

- at or below ______.
- at or above ______.

The population standard deviation = ______.

What percentage of the FTES were from 528.5 to 1,447.5? How do you know?

What is the *IQR*? What does the *IQR* represent?

How many standard deviations away from the mean is the median?

*Additional Information:* The population FTES for 2005–2006 through 2010–2011 was given in an updated report. The data are reported here.

Year | 2005–2006 | 2006–2007 | 2007–2008 | 2008–2009 | 2009–2010 | 2010–2011 |
---|---|---|---|---|---|---|
Total FTES | 1,585 | 1,690 | 1,735 | 1,935 | 2,021 | 1,890 |

Calculate the mean, median, standard deviation, the first quartile, the third quartile, and the *IQR*. Round to one decimal place.
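
These summary statistics can be checked with Python's standard `statistics` module, as in the sketch below. Two assumptions are made: the six years are treated as a full population (so `pstdev` is used rather than the sample standard deviation), and quartiles follow the median-of-halves convention. A different quartile convention may give different quartile values.

```python
from statistics import mean, median, pstdev

# Total FTES for 2005-2006 through 2010-2011.
ftes = [1585, 1690, 1735, 1935, 2021, 1890]

ordered = sorted(ftes)
half = len(ordered) // 2
lower = ordered[:half]
# With an odd count the middle value would be left out of both halves.
upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]

q1 = median(lower)
q3 = median(upper)

summary = {
    "mean": round(mean(ftes), 1),
    "median": round(median(ftes), 1),
    "population_sd": round(pstdev(ftes), 1),
    "q1": q1,
    "q3": q3,
    "iqr": q3 - q1,
}

for name, value in summary.items():
    print(f"{name}: {value}")
```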

Construct a box plot for the FTES for 2005–2006 through 2010–2011 and a box plot for the FTES for 1976–1977 through 2004–2005.

Compare the *IQR* for the FTES for 1976–1977 through 2004–2005 with the *IQR* for the FTES for 2005–2006 through 2010–2011. Why do you suppose the *IQR*s are so different?

Three students were applying to the same graduate school. They came from schools with different grading systems. Which student had the best GPA when compared to other students at his school? Explain how you determined your answer.

Student | GPA | School Average GPA | School Standard Deviation |
---|---|---|---|
Thuy | 2.7 | 3.2 | 0.8 |
Vichet | 87 | 75 | 20 |
Kamala | 8.6 | 8 | 0.4 |
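
Because the three schools use different grading scales, one way to compare the students is to convert each GPA to a *z*-score, the number of standard deviations a value lies from its own school's mean: *z* = (value − mean) / standard deviation. The Python sketch below shows that calculation; the function name `z_score` is illustrative.

```python
# name: (GPA, school average GPA, school standard deviation)
students = {
    "Thuy": (2.7, 3.2, 0.8),
    "Vichet": (87, 75, 20),
    "Kamala": (8.6, 8, 0.4),
}


def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


z = {name: z_score(*row) for name, row in students.items()}

# For GPAs, a larger z-score means a stronger result relative to the school.
best = max(z, key=z.get)

for name, score in z.items():
    print(f"{name}: z = {score:.3f}")
print("Highest relative GPA:", best)
```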

A music school has budgeted to purchase three musical instruments. The school plans to purchase a piano costing $3,000, a guitar costing $550, and a drum set costing $600. The mean cost for a piano is $4,000 with a standard deviation of $2,500. The mean cost for a guitar is $500 with a standard deviation of $200. The mean cost for drums is $700 with a standard deviation of $100. Which cost is the lowest when compared to other instruments of the same type? Which cost is the highest when compared to other instruments of the same type? Justify your answer.

An elementary school class ran one mile with a mean of 11 minutes and a standard deviation of three minutes. Rachel, a student in the class, ran one mile in eight minutes. A junior high school class ran one mile with a mean of nine minutes and a standard deviation of two minutes. Kenji, a student in the class, ran one mile in 8.5 minutes. A high school class ran one mile with a mean of seven minutes and a standard deviation of four minutes. Nedda, a student in the class, ran one mile in eight minutes.

- Why is Kenji considered a better runner than Nedda even though Nedda ran faster than he?
- Who is the fastest runner with respect to his or her class? Explain why.

Scientists are studying a particular disease. They found that countries that have the highest rates of people who have ever been diagnosed with this disease range from 11.4 percent to 74.6 percent.

Percentage of Population with Disease | Number of Countries |
---|---|
11.4–20.45 | 29 |
20.45–29.45 | 13 |
29.45–38.45 | 4 |
38.45–47.45 | 0 |
47.45–56.45 | 2 |
56.45–65.45 | 1 |
65.45–74.45 | 0 |
74.45–83.45 | 1 |

What is the best estimate of the average percentage of people with the disease for these countries? What is the standard deviation for the listed rates? The United States has an average disease rate of 33.9 percent. Is this rate above average or below? How *unusual* is the U.S. disease rate compared to the average rate? Explain.
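
For grouped data, both the mean and the standard deviation can be estimated from interval midpoints weighted by frequency. The Python sketch below does this for the disease-rate table and then expresses the U.S. rate as a *z*-score. It assumes the listed countries are a sample, so the sum of squared deviations is divided by *n* − 1; dividing by *n* instead gives the population version. The function name `grouped_mean_and_sd` is hypothetical.

```python
from math import sqrt

# (lower bound, upper bound, number of countries)
rows = [
    (11.4, 20.45, 29),
    (20.45, 29.45, 13),
    (29.45, 38.45, 4),
    (38.45, 47.45, 0),
    (47.45, 56.45, 2),
    (56.45, 65.45, 1),
    (65.45, 74.45, 0),
    (74.45, 83.45, 1),
]


def grouped_mean_and_sd(rows):
    """Estimate mean and sample standard deviation from interval midpoints."""
    n = sum(count for _, _, count in rows)
    mids = [((lo + hi) / 2, count) for lo, hi, count in rows]
    mean = sum(m * c for m, c in mids) / n
    squared = sum(c * (m - mean) ** 2 for m, c in mids)
    return mean, sqrt(squared / (n - 1))  # n - 1: sample assumption


mean, sd = grouped_mean_and_sd(rows)

# How many standard deviations the U.S. rate lies from the estimated mean.
us_z = (33.9 - mean) / sd

print(f"mean = {mean:.2f}, sd = {sd:.2f}, U.S. z-score = {us_z:.2f}")
```

A *z*-score whose magnitude is well under 2 is usually not regarded as unusual, which gives one way to frame the explanation.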

Table 2.79 gives the percentage of children under age five diagnosed with a specific medical condition.

Percentage of Children with the Condition | Number of Countries |
---|---|
16–21.45 | 23 |
21.45–26.9 | 4 |
26.9–32.35 | 9 |
32.35–37.8 | 7 |
37.8–43.25 | 6 |
43.25–48.7 | 1 |

What is the best estimate for the mean percentage of children with the condition? What is the standard deviation? Which interval(s) could be considered unusual? Explain.