Introduction
Tests of independence involve using a contingency table of observed (data) values.
The test statistic for a test of independence is similar to that of a goodness-of-fit test:
$$\sum_{(i\cdot j)}\frac{(O-E)^{2}}{E}$$
where
 O = observed values,
 E = expected values,
 i = the number of rows in the table, and
 j = the number of columns in the table.
There are $i\cdot j$ terms of the form $\frac{(O-E)^{2}}{E}$.
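As a minimal sketch of the statistic just described, the function below adds up (O − E)²/E over every cell of a table. The function name `chi_square_statistic` and the 2 × 2 counts are hypothetical, chosen only to show the arithmetic; they do not come from any study in this section.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


# Hypothetical 2 x 2 counts, made up for illustration only.
observed = [[10, 20], [30, 40]]
expected = [[12, 18], [28, 42]]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 4))
```

With i = 2 rows and j = 2 columns, the sum has i · j = 4 terms, one per cell.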
A test of independence determines whether two factors are independent or not. You first encountered the term independence in Probability Topics. As a review, consider the following example:
Note
The expected value for each cell needs to be at least five for you to use this test.
Example 11.5
Suppose A = a speeding violation in the last year and B = a cell phone user while driving. If A and B are independent, then P(A AND B) = P(A)P(B). A AND B is the event that a driver received a speeding violation last year and also used a cell phone while driving. Suppose that, in a study of speeding violations in the last year and cell phone use while driving, 755 people were surveyed. Out of the 755, 70 had a speeding violation and 685 did not; 305 used cell phones while driving and 450 did not.
Let y = expected number of drivers who used a cell phone while driving and received speeding violations.
If A and B are independent, then P(A AND B) = P(A)P(B). By substitution,
$$\frac{y}{755}=\left(\frac{70}{755}\right)\left(\frac{305}{755}\right)\text{.}$$
Solve for y: y = $\frac{(70)(305)}{755}=28.3\text{.}$
About 28 people from the sample are expected to use cell phones while driving and to receive speeding violations.
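The calculation in this example can be sketched in a few lines of Python. The variable names are illustrative assumptions; the counts are the ones given above. The sketch also fills in the other three cells of the 2 × 2 table, which the example does not compute, to show that the expected counts under independence add back up to the sample size.

```python
n = 755  # drivers surveyed

# Marginal counts from the example.
violation = {"yes": 70, "no": 685}     # speeding violation in the last year
cell_phone = {"yes": 305, "no": 450}   # used a cell phone while driving

# Under independence, expected count = n * P(A) * P(B)
#                                    = (row total)(column total) / n.
expected = {
    (v, c): violation[v] * cell_phone[c] / n
    for v in violation
    for c in cell_phone
}

y = expected[("yes", "yes")]
print(round(y, 1))  # 28.3
```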
In a test of independence, we state the null and alternative hypotheses in words. Since the contingency table consists of two factors, the null hypothesis states that the factors are independent and the alternative hypothesis states that they are not independent (that is, they are dependent). If we do a test of independence using the example, then the null hypothesis is the following:
H_{0}: Being a cell phone user while driving and receiving a speeding violation are independent events.
If the null hypothesis were true, we would expect about 28 people to use cell phones while driving and to receive a speeding violation.
The test of independence is always right-tailed because of the calculation of the test statistic. If the expected and observed values are not close together, then the test statistic is very large and way out in the right tail of the chi-square curve, as it is in a goodness-of-fit test.
The number of degrees of freedom for the test of independence is
df = (number of columns – 1)(number of rows – 1).
The following formula calculates the expected number (E):
$$E=\frac{\text{(row total)(column total)}}{\text{total number}}$$
A sample of 300 students is taken. Of the students surveyed, 50 were music students, while 250 were not. Ninety-seven were on the honor roll, while 203 were not. If we assume being a music student and being on the honor roll are independent events, what is the expected number of music students who are also on the honor roll?
Example 11.6
In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend time with a disabled senior citizen. The program recruits among community college students, four-year college students, and nonstudents. Table 11.15 shows a sample of the adult volunteers and the number of hours they volunteer per week.
Type of Volunteer  1–3 Hours  4–6 Hours  7–9 Hours  Row Total 

Community college students  111  96  48  255 
Four-year college students  96  133  61  290 
Nonstudents  91  150  53  294 
Column total  298  379  162  839 
Is the number of hours volunteered independent of the type of volunteer?
The observed values and the question at the end of the problem, “Is the number of hours volunteered independent of the type of volunteer?” tell you this is a test of independence. The two factors are number of hours volunteered and type of volunteer. This test is always right-tailed.
H_{0}: The number of hours volunteered is independent of the type of volunteer.
H_{a}: The number of hours volunteered is dependent on the type of volunteer.
The expected results are in Table 11.16.
Type of Volunteer  1–3 Hours  4–6 Hours  7–9 Hours 

Community college students  90.57  115.19  49.24 
Four-year college students  103  131  56 
Nonstudents  104.42  132.81  56.77 
For example, the calculation for the expected frequency for the top-left cell is
$$E=\frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}=\frac{\left(255\right)\left(298\right)}{839}=90.57\text{.}$$
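The same formula can be applied to every cell at once. The following is a minimal sketch that rebuilds the expected values from the observed counts in Table 11.15; the helper name `expected_counts` is an assumption made for illustration.

```python
def expected_counts(observed):
    """Expected count per cell: (row total)(column total) / (total surveyed)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from Table 11.15 (rows: type of volunteer; columns: hours).
observed = [
    [111, 96, 48],   # community college students
    [96, 133, 61],   # four-year college students
    [91, 150, 53],   # nonstudents
]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

Each row and each column of the expected table has the same total as the corresponding row or column of the observed table, which is a quick way to check the arithmetic.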
Calculate the test statistic: χ^{2} = 12.99 (calculator or computer)
Distribution for the test: $\chi_{4}^{2}$
df = (3 columns – 1)(3 rows – 1) = (2)(2) = 4
Graph
Probability statement: p-value = P(χ^{2} > 12.99) = 0.0113
Compare α and the p-value: Since no α is given, assume α = 0.05. p-value = 0.0113. α > p-value.
Make a decision: Since α > p-value, reject H_{0}. This means that the factors are not independent.
Conclusion: At a 5 percent level of significance, from the data, there is sufficient evidence to conclude that the number of hours volunteered and the type of volunteer are dependent on each other.
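The steps of this example (expected values, test statistic, degrees of freedom, p-value, decision) can be sketched with the Python standard library alone. The function names are hypothetical. The right-tail probability uses a closed-form sum that is valid only when the degrees of freedom are even, which happens to hold here (df = 4); it is an assumption of this sketch, not a general method. For other cases a statistics library would be the more usual choice.

```python
import math


def independence_test(observed):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count
            statistic += (o - e) ** 2 / e
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return statistic, df


def chi_square_right_tail(x, df):
    """P(chi-square > x). Sketch only: handles positive EVEN df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# Observed counts from Table 11.15.
observed = [
    [111, 96, 48],
    [96, 133, 61],
    [91, 150, 53],
]

statistic, df = independence_test(observed)
p_value = chi_square_right_tail(statistic, df)
alpha = 0.05  # assumed, as in the example, because no alpha is given

print(round(statistic, 2), df, round(p_value, 4))
print("reject H0" if p_value < alpha else "do not reject H0")
```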
For the example in Table 11.16, if there had been another type of volunteer, teenagers, what would the degrees of freedom be?
Using the TI-83, 83+, 84, 84+ Calculator
Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3 ENTER 3 ENTER. Enter the observed table values by row from Table 11.15. Press ENTER after each. Press 2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ2-Test. Press ENTER. You should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is 12.9909 and the p-value = .0113. Do the procedure a second time, but arrow down to Draw instead of Calculate.
The Bureau of Labor Statistics gathers data about employment in the United States. A sample is taken to calculate the number of U.S. citizens working in one of several industry sectors over time. Table 11.17 shows the results:
Industry Sector  2000  2010  2020  Total 

Nonagriculture wage and salary  13,243  13,044  15,018  41,305 
Goods-producing, excluding agriculture  2,457  1,771  1,950  6,178 
Services-providing  10,786  11,273  13,068  35,127 
Agriculture, forestry, fishing, and hunting  240  214  201  655 
Nonagriculture self-employed and unpaid family worker  931  894  972  2,797 
Secondary wage and salary jobs in agriculture and private household industries  14  11  11  36 
Secondary jobs as a self-employed or unpaid family worker  196  144  152  492 
Total  27,867  27,351  31,372  86,590 
We want to know if the change in the number of jobs is independent of the change in years. State the null and alternative hypotheses and the degrees of freedom.
Example 11.7
De Anza College is interested in the relationship between anxiety level and the need to succeed in school. A random sample of 400 students took a test that measured anxiety level and need to succeed in school. Table 11.18 shows the results. De Anza College wants to know if anxiety level and need to succeed in school are independent events.
Need to succeed in school  High anxiety  Med-high anxiety  Medium anxiety  Med-low anxiety  Low anxiety  Row total 

High need  35  42  53  15  10  155 
Medium need  18  48  63  33  31  193 
Low need  4  5  11  15  17  52 
Column total  57  95  127  63  58  400 
a. How many high anxiety level students are expected to have a high need to succeed in school?
a. The column total for a high anxiety level is 57. The row total for high need to succeed in school is 155. The sample size or total surveyed is 400.
$$E=\frac{\text{(row total)(column total)}}{\text{total surveyed}}=\frac{155\cdot 57}{400}=22.09$$
The expected number of students who have a high anxiety level and a high need to succeed in school is about 22.
b. If the two variables are independent, how many students do you expect to have a low need to succeed in school and a med-low level of anxiety?
b. The column total for a med-low anxiety level is 63. The row total for a low need to succeed in school is 52. The sample size or total surveyed is 400.
c. $E=\frac{\text{(row total)(column total)}}{\text{total surveyed}}=$ ________
c. $E=\frac{\text{(row total)(column total)}}{\text{total surveyed}}=\frac{52\cdot 63}{400}=8.19$
d. The expected number of students who have a med-low anxiety level and a low need to succeed in school is about ________.
d. 8
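Both expected counts in this example, along with the requirement noted earlier that every expected value be at least five, can be checked with a short sketch. The variable names are illustrative assumptions; the counts are those of Table 11.18.

```python
# Observed counts from Table 11.18.
# Columns: high, med-high, medium, med-low, low anxiety.
observed = [
    [35, 42, 53, 15, 10],   # high need to succeed
    [18, 48, 63, 33, 31],   # medium need to succeed
    [4, 5, 11, 15, 17],     # low need to succeed
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total_surveyed = sum(row_totals)

expected = [
    [r * c / total_surveyed for c in col_totals]
    for r in row_totals
]

high_need_high_anxiety = expected[0][0]    # part a
low_need_med_low_anxiety = expected[2][3]  # parts b through d

print(f"{high_need_high_anxiety:.4f}")     # 22.0875, about 22
print(f"{low_need_med_low_anxiety:.2f}")   # 8.19, about 8

# The test requires every expected count to be at least five.
smallest_expected = min(min(row) for row in expected)
print(smallest_expected >= 5)
```

The smallest expected count comes from the smallest row total (52) paired with the smallest column total (57), so checking that one cell is enough to confirm the condition for the whole table.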
Refer back to the information in Try It 11.6. How many services-providing jobs are there expected to be in 2020? How many nonagriculture wage and salary jobs are there expected to be in 2020?