### Introduction

The common measures of location are quartiles and percentiles.

Quartiles are special percentiles. The first quartile, *Q*_{1}, is the same as the 25^{th} percentile, and the third quartile, *Q*_{3}, is the same as the 75^{th} percentile. The median, *M*, is called both the second quartile and the 50^{th} percentile.

To calculate quartiles and percentiles, you must order the data from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths. Recall that a percent means one-hundredth. So, percentiles mean the data is divided into 100 sections. To score in the 90^{th} percentile of an exam does not mean, necessarily, that you received 90 percent on a test. It means that 90 percent of test scores are the same as or less than your score and that 10 percent of the test scores are the same as or greater than your test score.
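The distinction between a percentile and a percent-correct score can be checked with a small sketch. The scores below are invented purely for illustration, and `percent_at_or_below` is a hypothetical helper name, not a standard library function.

```python
def percent_at_or_below(scores, your_score):
    """Return the percentage of scores that are the same as or less than your_score."""
    count = sum(1 for s in scores if s <= your_score)
    return 100 * count / len(scores)


# Hypothetical exam scores (percent correct), invented for this illustration.
scores = [52, 58, 61, 64, 67, 70, 72, 75, 78, 84]

# A raw score of 78 is far from 90 percent correct,
# yet nine of the ten scores are the same as or less than 78.
print(percent_at_or_below(scores, 78))  # 90.0
```

With only ten scores the result is coarse; the interpretation is more meaningful for the large populations percentiles are usually applied to.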

Percentiles are useful for comparing values. For this reason, universities and colleges use percentiles extensively. One instance in which colleges and universities use percentiles is when SAT results are used to determine a minimum testing score that will be used as an acceptance factor. For example, suppose Duke accepts SAT scores at or above the 75^{th} percentile. That translates into a score of at least 1220.

Percentiles are mostly used with very large populations. Therefore, if you were to say that 90 percent of the test scores are less—and not the same or less—than your score, it would be acceptable because removing one particular data value is not significant.

The median is a number that measures the *center* of the data. You can think of the median as the *middle value*, but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger. For example, consider the following data, ordered from smallest to largest:

1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5

Since there are 14 observations (an even number of data values), the median is between the seventh value, 6.8, and the eighth value, 7.2. To find the median, add the two values together and divide by two:

$\frac{6.8+7.2}{2}=7$

The median is seven. Half of the values are smaller than seven and half of the values are larger than seven.

Quartiles are numbers that separate the data into quarters. Quartiles may or may not be part of the data. To find the quartiles, first find the median, or second quartile. The **first quartile**, *Q*_{1}, is the middle value, or median, of the lower half of the data, and the **third quartile**, *Q*_{3}, is the middle value, or median, of the upper half of the data. To get the idea, consider the same data set:

1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5

The data set has an even number of values (14 data values), so the median will be the average of the two middle values (the average of 6.8 and 7.2), which is calculated as $\frac{6.8+7.2}{2}$ and equals 7.

So, the median, or second quartile (${Q}_{2}$), is 7.

The first quartile is the median of the lower half of the data, so if we divide the data into seven values in the lower half and seven values in the upper half, we can see that we have an odd number of values in the lower half. Thus, the median of the lower half, or the first quartile (${Q}_{1}$) will be the middle value, or 2. Using the same procedure, we can see that the median of the upper half, or the third quartile (${Q}_{3}$) will be the middle value of the upper half, or 9.
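The median-of-halves procedure just described can be sketched in a few lines of Python. The function names `median` and `quartiles` are illustrative choices, and the sketch assumes at least two data values. When the number of values is odd, it leaves the median out of both halves, which is the convention used in Example 2.15; statistical software often uses other quartile conventions and may return slightly different values.

```python
def median(ordered):
    """Median of an already ordered list: the middle value,
    or the mean of the two middle values when the count is even."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method.

    Assumption: with an odd number of values, the median itself
    is excluded from both the lower and the upper half.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # the lower half of the data
    upper = ordered[len(ordered) - half:]   # the upper half of the data
    return median(lower), median(ordered), median(upper)


data = [1, 1, 2, 2, 4, 6, 6.8, 7.2, 8, 8.3, 9, 10, 10, 11.5]
q1, m, q3 = quartiles(data)
print(q1, m, q3)  # 2 7.0 9
```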

The quartiles are illustrated below:

The interquartile range is a number that indicates the spread of the middle half, or the middle 50 percent, of the data. It is the difference between the third quartile (*Q*_{3}) and the first quartile (*Q*_{1}):

*IQR* = *Q*_{3} – *Q*_{1}. The *IQR* for this data set is calculated as 9 minus 2, or 7.

The *IQR* can help to determine potential **outliers**. **A value is suspected to be a potential outlier if it lies more than 1.5 × IQR below the first quartile or more than 1.5 × IQR above the third quartile**, that is, if it is less than *Q*_{1} – (1.5)(*IQR*) or greater than *Q*_{3} + (1.5)(*IQR*). Potential outliers always require further investigation.

### NOTE

A potential outlier is a data point that is significantly different from the other data points. These special data points may be errors or some kind of abnormality, or they may be a key to understanding the data.

### Example 2.15

For the following 13 real estate prices, calculate the *IQR* and determine if any prices are potential outliers. Prices are in dollars.

389,950; 230,500; 158,000; 479,000; 639,000; 114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,800; 1,095,000

Order the data from smallest to largest:

114,950, 158,000, 230,500, 387,000, 389,950, 479,000, 488,800, 529,000, 575,000, 639,000, 659,000, 1,095,000, 5,500,000.

*M* = 488,800

*Q*_{1} = $\frac{230{,}500+387{,}000}{2}$ = 308,750

*Q*_{3} = $\frac{639{,}000+659{,}000}{2}$ = 649,000

*IQR* = 649,000 – 308,750 = 340,250

(1.5)(*IQR*) = (1.5)(340,250) = 510,375

*Q*_{1} – (1.5)(*IQR*) = 308,750 – 510,375 = –201,625

*Q*_{3} + (1.5)(*IQR*) = 649,000 + 510,375 = 1,159,375

No house price is less than –201,625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
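As a check on the arithmetic in this example, the outlier test can be written as a short Python sketch. The helper name `outlier_fences` is an illustrative choice, and the quartile values are taken directly from the calculation above rather than recomputed.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# The 13 real estate prices from the example, in dollars.
prices = [389_950, 230_500, 158_000, 479_000, 639_000, 114_950, 5_500_000,
          387_000, 659_000, 529_000, 575_000, 488_800, 1_095_000]

# Quartiles as computed in the example above.
q1, q3 = 308_750, 649_000

lower, upper = outlier_fences(q1, q3)
potential_outliers = [p for p in prices if p < lower or p > upper]

print(lower, upper)         # -201625.0 1159375.0
print(potential_outliers)   # [5500000]
```

The negative lower cutoff simply means that no price in this data set can be flagged as unusually low.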

For the 11 salaries, calculate the *IQR* and determine if any salaries are outliers. The following salaries are in dollars:

$33,000, $64,500, $28,000, $54,000, $72,000, $68,500, $69,000, $42,000, $54,000, $120,000, $40,500

In the example above, you just saw the calculation of the median, first quartile, and third quartile. These three values are part of the five number summary. The other two values are the minimum value (or min) and the maximum value (or max). The five number summary is used to create a box plot.

Find the interquartile range for the following two data sets and compare them:

Test Scores for Class *A*

69, 96, 81, 79, 65, 76, 83, 99, 89, 67, 90, 77, 85, 98, 66, 91, 77, 69, 80, 94

Test Scores for Class *B*

### Example 2.16

Fifty statistics students were asked how much sleep they get per school night (rounded to the nearest hour). The results were as follows:

Amount of Sleep per School Night (Hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
---|---|---|---|
4 | 2 | 0.04 | 0.04 |
5 | 5 | 0.10 | 0.14 |
6 | 7 | 0.14 | 0.28 |
7 | 12 | 0.24 | 0.52 |
8 | 14 | 0.28 | 0.80 |
9 | 7 | 0.14 | 0.94 |
10 | 3 | 0.06 | 1.00 |

**Find the 28^{th} percentile**. Notice the 0.28 in the Cumulative Relative Frequency column. Twenty-eight percent of 50 data values is 14 values. There are 14 values less than the 28^{th} percentile. They include the two 4s, the five 5s, and the seven 6s. The 28^{th} percentile is between the last six and the first seven. **The 28^{th} percentile is 6.5.**

**Find the median**. Look again at the Cumulative Relative Frequency column and find 0.52. The median is the 50^{th} percentile or the second quartile. Fifty percent of 50 is 25. There are 25 values less than the median. They include the two 4s, the five 5s, the seven 6s, and 11 of the 7s. The median or 50^{th} percentile is between the 25^{th}, or seven, and 26^{th}, or seven, values. **The median is seven.**

**Find the third quartile**. The third quartile is the same as the 75^{th} percentile. You can *eyeball* this answer. If you look at the Cumulative Relative Frequency column, you find 0.52 and 0.80. When you have all the fours, fives, sixes, and sevens, you have 52 percent of the data. When you include all the 8s, you have 80 percent of the data. **The 75^{th} percentile, then, must be an eight**. Another way to look at the problem is to find 75 percent of 50, which is 37.5, and round up to 38. The third quartile, *Q*_{3}, is the 38^{th} value, which is an eight. You can check this answer by counting the values. There are 37 values below the third quartile and 12 values above.
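The location rule used in this example can be sketched in Python. The rule coded here is inferred from the worked steps above and is an assumption rather than a universal definition: compute the location as the percent of *n*; if it is a whole number, average that value and the next one; otherwise round the location up. The helper name `percentile_from_frequencies` is hypothetical, and other textbooks and software packages use different percentile conventions.

```python
def percentile_from_frequencies(freq_table, p):
    """Find the p-th percentile from (value, frequency) pairs.

    Assumptions: p is a whole number with 0 < p < 100, and the rule is
    the one inferred from the worked example (average two neighbors when
    the location is a whole number, otherwise round the location up).
    """
    # Expand the table into the full ordered list of data values.
    ordered = []
    for value, frequency in sorted(freq_table):
        ordered.extend([value] * frequency)

    n = len(ordered)
    # Integer arithmetic avoids floating-point error in p% of n.
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        # Whole-number location: average that value and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Fractional location: round up (1-based position whole + 1).
    return ordered[whole]


# Hours of sleep and frequencies from the table in this example.
sleep = [(4, 2), (5, 5), (6, 7), (7, 12), (8, 14), (9, 7), (10, 3)]

print(percentile_from_frequencies(sleep, 28))  # 6.5
print(percentile_from_frequencies(sleep, 50))  # 7.0
print(percentile_from_frequencies(sleep, 75))  # 8
```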

Forty bus drivers were asked how many hours they spend each day running their routes (rounded to the nearest hour). Find the 65^{th} percentile.

Amount of Time Spent on Route (Hours) | Frequency | Relative Frequency | Cumulative Relative Frequency |
---|---|---|---|
2 | 12 | 0.30 | 0.30 |
3 | 14 | 0.35 | 0.65 |
4 | 10 | 0.25 | 0.90 |
5 | 4 | 0.10 | 1.00 |

### Example 2.17

Using Table 2.24:

- Find the 80^{th} percentile.
- Find the 90^{th} percentile.
- Find the first quartile. What is another name for the first quartile?

Using the data from the frequency table, we have the following:

- The 80^{th} percentile is between the last eight and the first nine in the table (between the 40^{th} and 41^{st} values). Therefore, we need to take the mean of the 40^{th} and 41^{st} values. The 80^{th} percentile $=\frac{8+9}{2}=8.5$.
- The 90^{th} percentile will be the 45^{th} data value (location is 0.90(50) = 45), and the 45^{th} data value is nine.
- *Q*_{1} is also the 25^{th} percentile. The 25^{th} percentile location calculation: *P*_{25} = 0.25(50) = 12.5 ≈ 13, the 13^{th} data value. Thus, the 25^{th} percentile is six.

Refer to Table 2.25. Find the third quartile. What is another name for the third quartile?

### Collaborative Exercise

Your instructor or a member of the class will ask everyone in class how many sweaters he or she owns. Answer the following questions:

- How many students were surveyed?
- What kind of sampling did you do?
- Construct two different histograms. For each, starting value = ________ and ending value = ________.
- Find the median, first quartile, and third quartile.
- Construct a table of the data to find the following:
  - The 10^{th} percentile
  - The 70^{th} percentile
  - The percentage of students who own fewer than four sweaters
