Hypothesis Testing - Chi Squared Test

Lisa Sullivan, PhD

Professor of Biostatistics

Boston University School of Public Health


Introduction

This module will continue the discussion of hypothesis testing, where a specific statement or hypothesis is generated about a population parameter, and sample statistics are used to assess the likelihood that the hypothesis is true. The hypothesis is based on available information and the investigator's belief about the population parameters. The specific tests considered here are called chi-square tests and are appropriate when the outcome is discrete (dichotomous, ordinal or categorical). For example, in some clinical trials the outcome is a classification such as hypertensive, pre-hypertensive or normotensive. We could use the same classification in an observational study such as the Framingham Heart Study to compare men and women in terms of their blood pressure status - again using the classification of hypertensive, pre-hypertensive or normotensive status.  

The technique to analyze a discrete outcome uses what is called a chi-square test. Specifically, the test statistic follows a chi-square probability distribution. We will consider chi-square tests here with one, two and more than two independent comparison groups.

Learning Objectives

After completing this module, the student will be able to:

  • Perform chi-square tests by hand
  • Appropriately interpret results of chi-square tests
  • Identify the appropriate hypothesis testing procedure based on type of outcome variable and number of samples

Tests with One Sample, Discrete Outcome

Here we consider hypothesis testing with a discrete outcome variable in a single population. Discrete variables are variables that take on more than two distinct responses or categories and the responses can be ordered or unordered (i.e., the outcome can be ordinal or categorical). The procedure we describe here can be used for dichotomous (exactly 2 response options), ordinal or categorical discrete outcomes and the objective is to compare the distribution of responses, or the proportions of participants in each response category, to a known distribution. The known distribution is derived from another study or report and it is again important in setting up the hypotheses that the comparator distribution specified in the null hypothesis is a fair comparison. The comparator is sometimes called an external or a historical control.   

In one sample tests for a discrete outcome, we set up our hypotheses against an appropriate comparator. We select a sample and compute descriptive statistics on the sample data. Specifically, we compute the sample size (n) and the proportions of participants in each response category.

Test Statistic for Testing H₀: p₁ = p₁₀, p₂ = p₂₀, ..., pₖ = pₖ₀

χ² = Σ (O − E)² / E

We find the critical value in a table of probabilities for the chi-square distribution with degrees of freedom (df) = k-1. In the test statistic, O = observed frequency and E = expected frequency in each of the response categories, and the summation is over the k response categories. The observed frequencies are those observed in the sample and the expected frequencies are computed as described below. χ² (chi-square) is another probability distribution and ranges from 0 to ∞. The test statistic formula above is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories.

When we conduct a χ² test, we compare the observed frequencies in each response category to the frequencies we would expect if the null hypothesis were true. These expected frequencies are determined by allocating the sample to the response categories according to the distribution specified in H₀. This is done by multiplying the observed sample size (n) by the proportions specified in the null hypothesis (p₁₀, p₂₀, ..., pₖ₀). To ensure that the sample size is appropriate for the use of the test statistic above, we need to check the following: min(np₁₀, np₂₀, ..., npₖ₀) ≥ 5.
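As a rough sketch, this check is a one-liner in base R (the sample size and null proportions here are taken from the exercise example that follows):

```r
# Expected frequencies under H0 and the sample-size adequacy check
n  <- 470                     # observed sample size
p0 <- c(0.60, 0.25, 0.15)     # proportions specified in H0
expected <- n * p0            # expected frequencies: 282.0 117.5 70.5
min(expected) >= 5            # TRUE -> the large-sample formula is appropriate
```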

The test of hypothesis with a discrete outcome measured in a single sample, where the goal is to assess whether the distribution of responses follows a known distribution, is called the χ² goodness-of-fit test. As the name indicates, the idea is to assess whether the pattern or distribution of responses in the sample "fits" a specified population (external or historical) distribution. In the next example we illustrate the test. As we work through the example, we provide additional details related to the use of this new test statistic.

A University conducted a survey of its recent graduates to collect demographic and health information for future planning purposes as well as to assess students' satisfaction with their undergraduate experiences. The survey revealed that a substantial proportion of students were not engaging in regular exercise, many felt their nutrition was poor and a substantial number were smoking. In response to a question on regular exercise, 60% of all graduates reported getting no regular exercise, 25% reported exercising sporadically and 15% reported exercising regularly as undergraduates. The next year the University launched a health promotion campaign on campus in an attempt to increase health behaviors among undergraduates. The program included modules on exercise, nutrition and smoking cessation. To evaluate the impact of the program, the University again surveyed graduates and asked the same questions. The survey was completed by 470 graduates and the following data were collected on the exercise question:

                      No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Number of Graduates   255                   125                 90                 470

Based on the data, is there evidence of a shift in the distribution of responses to the exercise question following the implementation of the health promotion campaign on campus? Run the test at a 5% level of significance.

In this example, we have one sample and a discrete (ordinal) outcome variable (with three response options). We specifically want to compare the distribution of responses in the sample to the distribution reported the previous year (i.e., 60%, 25%, 15% reporting no, sporadic and regular exercise, respectively). We now run the test using the five-step approach.  

  • Step 1. Set up hypotheses and determine level of significance.

The null hypothesis again represents the "no change" or "no difference" situation. If the health promotion campaign has no impact then we expect the distribution of responses to the exercise question to be the same as that measured prior to the implementation of the program.

H₀: p₁ = 0.60, p₂ = 0.25, p₃ = 0.15, or equivalently H₀: Distribution of responses is 0.60, 0.25, 0.15

H₁: H₀ is false.          α = 0.05

Notice that the research hypothesis is written in words rather than in symbols. The research hypothesis as stated captures any difference in the distribution of responses from that specified in the null hypothesis. We do not specify an alternative distribution; instead, we are testing whether the sample data "fit" the distribution in H₀ or not. With the χ² goodness-of-fit test there is no upper- or lower-tailed version of the test.

  • Step 2. Select the appropriate test statistic.  

The test statistic is:

χ² = Σ (O − E)² / E

where the summation is over the k response categories.

We must first assess whether the sample size is adequate. Specifically, we need to check min(np₁₀, np₂₀, ..., npₖ₀) ≥ 5. The sample size here is n = 470 and the proportions specified in the null hypothesis are 0.60, 0.25 and 0.15. Thus, min(470(0.60), 470(0.25), 470(0.15)) = min(282, 117.5, 70.5) = 70.5. The sample size is more than adequate so the formula can be used.

  • Step 3. Set up decision rule.  

The decision rule for the χ² test depends on the level of significance and the degrees of freedom, defined as degrees of freedom (df) = k-1 (where k is the number of response categories). If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ² statistic will be close to zero. If the null hypothesis is false, then the χ² statistic will be large. Critical values can be found in a table of probabilities for the χ² distribution. Here we have df = k-1 = 3-1 = 2 and a 5% level of significance. The appropriate critical value is 5.99, and the decision rule is as follows: Reject H₀ if χ² > 5.99.

  • Step 4. Compute the test statistic.  

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) and the expected frequencies into the formula for the test statistic identified in Step 2. The computations can be organized as follows.

                 No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Observed (O)     255                   125                 90                 470
Expected (E)     282.0                 117.5               70.5               470.0

Notice that the expected frequencies are taken to one decimal place and that the sum of the observed frequencies is equal to the sum of the expected frequencies. The test statistic is computed as follows:

χ² = (255 − 282)²/282 + (125 − 117.5)²/117.5 + (90 − 70.5)²/70.5 = 2.59 + 0.48 + 5.39 = 8.46

  • Step 5. Conclusion.  

We reject H₀ because 8.46 > 5.99. We have statistically significant evidence at α = 0.05 to show that H₀ is false, or that the distribution of responses is not 0.60, 0.25, 0.15. The p-value is p < 0.025 (from a table with df = 2, since 8.46 falls between the critical values for 0.025 and 0.01).
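For readers following along in R, a minimal sketch of the same test using base R's chisq.test() and the observed counts from the table above:

```r
# Goodness-of-fit test for the exercise example
observed <- c(255, 125, 90)   # no, sporadic, regular exercise
chisq.test(observed, p = c(0.60, 0.25, 0.15))
# X-squared = 8.46, df = 2, p-value = 0.0146 -> reject H0 at alpha = 0.05
```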

In the χ² goodness-of-fit test, we conclude that either the distribution specified in H₀ is false (when we reject H₀) or that we do not have sufficient evidence to show that the distribution specified in H₀ is false (when we fail to reject H₀). Here, we rejected H₀ and concluded that the distribution of responses to the exercise question following the implementation of the health promotion campaign was not the same as the distribution prior to the campaign. The test itself does not provide details of how the distribution has shifted. A comparison of the observed and expected frequencies will provide some insight into the shift (when the null hypothesis is rejected). Does it appear that the health promotion campaign was effective?

Consider the following: 

If the null hypothesis were true (i.e., no change from the prior year) we would have expected more students to fall in the "No Regular Exercise" category and fewer in the "Regular Exercise" category. In the sample, 255/470 = 54% reported no regular exercise and 90/470 = 19% reported regular exercise. Thus, there is a shift toward more regular exercise following the implementation of the health promotion campaign. There is evidence of a statistical difference; is this a meaningful difference? Is there room for improvement?

The National Center for Health Statistics (NCHS) provided data on the distribution of weight (in categories) among Americans in 2002. The distribution was based on specific values of body mass index (BMI) computed as weight in kilograms over height in meters squared. Underweight was defined as BMI < 18.5, normal weight as BMI between 18.5 and 24.9, overweight as BMI between 25 and 29.9, and obese as BMI of 30 or greater. Americans in 2002 were distributed as follows: 2% Underweight, 39% Normal Weight, 36% Overweight, and 23% Obese. Suppose we want to assess whether the distribution of BMI is different in the Framingham Offspring sample. Using data from the n = 3,326 participants who attended the seventh examination of the Offspring in the Framingham Heart Study we created the BMI categories as defined and observed the following:

                 Underweight   Normal Weight   Overweight   Obese     Total
Observed (O)     20            932             1374         1000      3326

  • Step 1.  Set up hypotheses and determine level of significance.

H₀: p₁ = 0.02, p₂ = 0.39, p₃ = 0.36, p₄ = 0.23, or equivalently

H₀: Distribution of responses is 0.02, 0.39, 0.36, 0.23

H₁: H₀ is false.        α = 0.05

The formula for the test statistic is:

χ² = Σ (O − E)² / E

We must assess whether the sample size is adequate. Specifically, we need to check min(np₁₀, np₂₀, ..., npₖ₀) ≥ 5. The sample size here is n = 3,326 and the proportions specified in the null hypothesis are 0.02, 0.39, 0.36 and 0.23. Thus, min(3326(0.02), 3326(0.39), 3326(0.36), 3326(0.23)) = min(66.5, 1297.1, 1197.4, 765.0) = 66.5. The sample size is more than adequate, so the formula can be used.

Here we have df = k-1 = 4-1 = 3 and a 5% level of significance. The appropriate critical value is 7.81 and the decision rule is as follows: Reject H₀ if χ² > 7.81.

We now compute the expected frequencies using the sample size and the proportions specified in the null hypothesis. We then substitute the sample data (observed frequencies) into the formula for the test statistic identified in Step 2. We organize the computations in the following table.

                 Underweight   Normal Weight   Overweight   Obese     Total
Observed (O)     20            932             1374         1000      3326
Expected (E)     66.5          1297.1          1197.4       765.0     3326.0

The test statistic is computed as follows:

χ² = (20 − 66.5)²/66.5 + (932 − 1297.1)²/1297.1 + (1374 − 1197.4)²/1197.4 + (1000 − 765.0)²/765.0 = 32.52 + 102.77 + 26.05 + 72.19 = 233.53

We reject H₀ because 233.53 > 7.81. We have statistically significant evidence at α = 0.05 to show that H₀ is false, or that the distribution of BMI in Framingham is different from the national data reported in 2002, p < 0.005.

Again, the χ² goodness-of-fit test allows us to assess whether the distribution of responses "fits" a specified distribution. Here we show that the distribution of BMI in the Framingham Offspring Study is different from the national distribution. To understand the nature of the difference we can compare observed and expected frequencies or observed and expected proportions (or percentages). The frequencies are large because of the large sample size; the observed percentages of patients in the Framingham sample are as follows: 0.6% underweight, 28% normal weight, 41% overweight and 30% obese. In the Framingham Offspring sample there are higher percentages of overweight and obese persons (41% and 30% in Framingham as compared to 36% and 23% in the national data), and lower percentages of underweight and normal weight persons (0.6% and 28% in Framingham as compared to 2% and 39% in the national data). Are these meaningful differences?
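A minimal sketch of this test in R; note that the observed counts below are reconstructed from the reported percentages of n = 3,326, so treat them as approximate:

```r
# Framingham BMI goodness-of-fit test against the 2002 national distribution
observed <- c(20, 932, 1374, 1000)  # under, normal, over, obese (reconstructed)
chisq.test(observed, p = c(0.02, 0.39, 0.36, 0.23))
# X-squared = 233.5, df = 3, p-value < 0.001
```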

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable in a single population. We presented a test using a test statistic Z to test whether an observed (sample) proportion differed significantly from a historical or external comparator. The chi-square goodness-of-fit test can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square goodness-of-fit test.

The NCHS report indicated that in 2002, 75% of children aged 2 to 17 saw a dentist in the past year. An investigator wants to assess whether use of dental services is similar in children living in the city of Boston. A sample of 125 children aged 2 to 17 living in Boston was surveyed, and 64 reported seeing a dentist over the past 12 months. Is there a significant difference in use of dental services between children living in Boston and the national data?

We presented the following approach to the test using a Z statistic. 

  • Step 1. Set up hypotheses and determine level of significance

H₀: p = 0.75

H₁: p ≠ 0.75                               α = 0.05

We must first check that the sample size is adequate. Specifically, we need to check min(np₀, n(1 − p₀)) = min(125(0.75), 125(0.25)) = min(93.75, 31.25) = 31.25. The sample size is more than adequate so the following formula can be used:

Z = (p̂ − p₀) / √( p₀(1 − p₀)/n )

This is a two-tailed test, using a Z statistic and a 5% level of significance. Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. The sample proportion is:

p̂ = 64/125 = 0.512

The test statistic is:

Z = (0.512 − 0.75) / √(0.75(0.25)/125) = −0.238/0.0387 = −6.15

We reject H₀ because −6.15 < −1.960. We have statistically significant evidence at α = 0.05 to show that there is a significant difference in the use of dental services by children living in Boston as compared to the national data (p < 0.0001).

We now conduct the same test using the chi-square goodness-of-fit test. First, we summarize our sample data as follows:

                               Saw a Dentist in Past 12 Months   Did Not See a Dentist   Total
Number of Children (Observed)  64                                61                      125

H₀: p₁ = 0.75, p₂ = 0.25, or equivalently H₀: Distribution of responses is 0.75, 0.25

We must assess whether the sample size is adequate. Specifically, we need to check min(np₁₀, np₂₀, ..., npₖ₀) ≥ 5. The sample size here is n = 125 and the proportions specified in the null hypothesis are 0.75 and 0.25. Thus, min(125(0.75), 125(0.25)) = min(93.75, 31.25) = 31.25. The sample size is more than adequate so the formula can be used.

Here we have df = k-1 = 2-1 = 1 and a 5% level of significance. The appropriate critical value is 3.84, and the decision rule is as follows: Reject H₀ if χ² > 3.84. (Note that 1.96² = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

The test statistic is computed as follows:

χ² = (64 − 93.75)²/93.75 + (61 − 31.25)²/31.25 = 9.44 + 28.32 = 37.8

(Note that (−6.15)² = 37.8, where −6.15 was the value of the Z statistic in the test for proportions shown above.)

We reject H₀ because 37.8 > 3.84. We have statistically significant evidence at α = 0.05 to show that there is a significant difference in the use of dental services by children living in Boston as compared to the national data (p < 0.0001). This is the same conclusion we reached when we conducted the test using the Z test above. With a dichotomous outcome, Z² = χ²! In statistics, there are often several approaches that can be used to test hypotheses.
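A short sketch verifying this equivalence numerically in base R:

```r
# Z^2 equals the chi-square statistic for a dichotomous outcome
n <- 125; x <- 64; p0 <- 0.75
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
z^2                                         # ~ 37.8
chisq.test(c(x, n - x), p = c(p0, 1 - p0))  # X-squared ~ 37.8, df = 1
```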

Tests for Two or More Independent Samples, Discrete Outcome

Here we extend the application of the chi-square test to the case with two or more independent comparison groups. Specifically, the outcome of interest is discrete with two or more responses, and the responses can be ordered or unordered (i.e., the outcome can be dichotomous, ordinal or categorical). We now consider the situation where there are two or more independent comparison groups and the goal of the analysis is to compare the distribution of responses to the discrete outcome variable among several independent comparison groups.

The test is called the χ² test of independence and the null hypothesis is that there is no difference in the distribution of responses to the outcome across comparison groups. This is often stated as follows: the outcome variable and the grouping variable (e.g., the comparison treatments or comparison groups) are independent (hence the name of the test). Independence here implies homogeneity in the distribution of the outcome among comparison groups.

The null hypothesis in the χ² test of independence is often stated in words as: H₀: The distribution of the outcome is independent of the groups. The alternative or research hypothesis is that there is a difference in the distribution of responses to the outcome variable among the comparison groups (i.e., that the distribution of responses "depends" on the group). In order to test the hypothesis, we measure the discrete outcome variable in each participant in each comparison group. The data of interest are the observed frequencies (or number of participants in each response category in each group). The formula for the test statistic for the χ² test of independence is given below.

Test Statistic for Testing H₀: Distribution of outcome is independent of groups

χ² = Σ (O − E)² / E

and we find the critical value in a table of probabilities for the chi-square distribution with df = (r-1)(c-1).

Here O = observed frequency and E = expected frequency in each of the response categories in each group, and the summation is over all cells of the two-way table; r = the number of rows and c = the number of columns in the two-way table. r and c correspond to the number of comparison groups and the number of response options in the outcome (see below for more details). The observed frequencies are the sample data and the expected frequencies are computed as described below. The test statistic is appropriate for large samples, defined as expected frequencies of at least 5 in each of the response categories in each group.

The data for the χ² test of independence are organized in a two-way table. The outcome and grouping variable are shown in the rows and columns of the table. The sample table below illustrates the data layout. The table entries (blank below) are the numbers of participants in each group responding to each response category of the outcome variable.

Table - Possible outcomes are listed in the columns; the groups being compared are listed in the rows.

In the table above, the grouping variable is shown in the rows of the table; r denotes the number of independent groups. The outcome variable is shown in the columns of the table; c denotes the number of response options in the outcome variable. Each combination of a row (group) and column (response) is called a cell of the table. The table has r*c cells and is sometimes called an r x c ("r by c") table. For example, if there are 4 groups and 5 categories in the outcome variable, the data are organized in a 4 X 5 table. The row and column totals are shown along the right-hand margin and the bottom of the table, respectively. The total sample size, N, can be computed by summing the row totals or the column totals. Similar to ANOVA, N does not refer to a population size here but rather to the total sample size in the analysis. The sample data can be organized into a table like the above. The numbers of participants within each group who select each response option are shown in the cells of the table and these are the observed frequencies used in the test statistic.

The test statistic for the χ² test of independence involves comparing observed (sample data) and expected frequencies in each cell of the table. The expected frequencies are computed assuming that the null hypothesis is true. The null hypothesis states that the two variables (the grouping variable and the outcome) are independent. The definition of independence is as follows:

Two events, A and B, are independent if P(A|B) = P(A), or equivalently, if P(A and B) = P(A) P(B).

The second statement indicates that if two events, A and B, are independent then the probability of their intersection can be computed by multiplying the probability of each individual event. To conduct the χ² test of independence, we need to compute expected frequencies in each cell of the table. Expected frequencies are computed by assuming that the grouping variable and outcome are independent (i.e., under the null hypothesis). Thus, if the null hypothesis is true, using the definition of independence:

P(Group 1 and Response Option 1) = P(Group 1) P(Response Option 1).

The above states that the probability that an individual is in Group 1 and their outcome is Response Option 1 is computed by multiplying the probability that the person is in Group 1 by the probability that a person is in Response Option 1. To conduct the χ² test of independence, we need expected frequencies and not expected probabilities. To convert the above probability to a frequency, we multiply by N. Consider the following small example.

Suppose data are measured in a sample of size N = 150, with 25 participants in Group 1, 50 participants in Group 2, and 62 of the 150 participants selecting Response category 1; the cell counts are the observed frequencies. If Group and Response are independent, then we can compute the probability that a person in the sample is in Group 1 and Response category 1 using:

P(Group 1 and Response 1) = P(Group 1) P(Response 1),

P(Group 1 and Response 1) = (25/150) (62/150) = 0.069.

Thus if Group and Response are independent we would expect 6.9% of the sample to be in the top left cell of the table (Group 1 and Response 1). The expected frequency is 150(0.069) = 10.4.   We could do the same for Group 2 and Response 1:

P(Group 2 and Response 1) = P(Group 2) P(Response 1),

P(Group 2 and Response 1) = (50/150) (62/150) = 0.138.

The expected frequency in Group 2 and Response 1 is 150(0.138) = 20.7.

Thus, the formula for determining the expected cell frequencies in the χ² test of independence is as follows:

Expected Cell Frequency = (Row Total * Column Total)/N.

The above computes the expected frequency in one step rather than computing the expected probability first and then converting to a frequency.  
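A minimal sketch of this one-step computation in R; the 2 × 3 table of counts below is hypothetical (it is not the small example above):

```r
# Expected cell frequencies for an r x c table: (row total * column total) / N
observed <- matrix(c(12, 20, 18,
                     28, 40, 32), nrow = 2, byrow = TRUE)
N <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / N
expected  # each cell equals (row total * column total) / N
```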

In a prior example we evaluated data from a survey of university graduates which assessed, among other things, how frequently they exercised. The survey was completed by 470 graduates. In the prior example we used the χ² goodness-of-fit test to assess whether there was a shift in the distribution of responses to the exercise question following the implementation of a health promotion campaign on campus. We specifically considered one sample (all students) and compared the observed distribution to the distribution of responses the prior year (a historical control). Suppose we now wish to assess whether there is a relationship between exercise on campus and students' living arrangements. As part of the same survey, graduates were asked where they lived their senior year. The response options were dormitory, on-campus apartment, off-campus apartment, and at home (i.e., commuted to and from the university). The data are shown below.

                        No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Dormitory               32                    30                  28                 90
On-Campus Apartment     74                    64                  42                 180
Off-Campus Apartment    110                   25                  15                 150
At Home                 39                    6                   5                  50
Total                   255                   125                 90                 470

Based on the data, is there a relationship between exercise and students' living arrangement? Do you think where a person lives affects their exercise status? Here we have four independent comparison groups (living arrangement) and a discrete (ordinal) outcome variable with three response options. We specifically want to test whether living arrangement and exercise are independent. We will run the test using the five-step approach.

  • Step 1. Set up hypotheses and determine level of significance.

H₀: Living arrangement and exercise are independent

H₁: H₀ is false.                α = 0.05

The null and research hypotheses are written in words rather than in symbols. The research hypothesis is that the grouping variable (living arrangement) and the outcome variable (exercise) are dependent or related.

  • Step 2.  Select the appropriate test statistic.  

The test statistic is:

χ² = Σ (O − E)² / E

where the summation is over all r × c cells of the table. The condition for appropriate use of the above test statistic is that each expected frequency is at least 5. In Step 4 we will compute the expected frequencies, and we will ensure that the condition is met.

  • Step 3. Set up decision rule.

The decision rule depends on the level of significance and the degrees of freedom, defined as df = (r-1)(c-1), where r and c are the numbers of rows and columns in the two-way data table. The row variable is the living arrangement and there are 4 arrangements considered, thus r = 4. The column variable is exercise and 3 responses are considered, thus c = 3. For this test, df = (4-1)(3-1) = 3(2) = 6. Again, with χ² tests there are no upper-, lower- or two-tailed versions of the test. If the null hypothesis is true, the observed and expected frequencies will be close in value and the χ² statistic will be close to zero. If the null hypothesis is false, then the χ² statistic will be large. The rejection region for the χ² test of independence is always in the upper (right-hand) tail of the distribution. For df = 6 and a 5% level of significance, the appropriate critical value is 12.59 and the decision rule is as follows: Reject H₀ if χ² > 12.59.

  • Step 4. Compute the test statistic.

We now compute the expected frequencies using the formula,

Expected Frequency = (Row Total * Column Total)/N.

The computations can be organized in a two-way table. In each cell of the table, the observed frequency is shown first and the expected frequency is shown in parentheses.

                        No Regular Exercise   Sporadic Exercise   Regular Exercise   Total
Dormitory               32 (48.8)             30 (23.9)           28 (17.2)          90
On-Campus Apartment     74 (97.7)             64 (47.9)           42 (34.5)          180
Off-Campus Apartment    110 (81.4)            25 (39.9)           15 (28.7)          150
At Home                 39 (27.1)             6 (13.3)            5 (9.6)            50
Total                   255                   125                 90                 470

Notice that the expected frequencies are taken to one decimal place and that the sums of the observed frequencies are equal to the sums of the expected frequencies in each row and column of the table.

Recall in Step 2 a condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 9.6) and therefore it is appropriate to use the test statistic. The test statistic is computed as follows:

χ² = (32 − 48.8)²/48.8 + (30 − 23.9)²/23.9 + (28 − 17.2)²/17.2 + (74 − 97.7)²/97.7 + (64 − 47.9)²/47.9 + (42 − 34.5)²/34.5 + (110 − 81.4)²/81.4 + (25 − 39.9)²/39.9 + (15 − 28.7)²/28.7 + (39 − 27.1)²/27.1 + (6 − 13.3)²/13.3 + (5 − 9.6)²/9.6 = 60.5

  • Step 5. Conclusion.

We reject H₀ because 60.5 > 12.59. We have statistically significant evidence at α = 0.05 to show that H₀ is false, or that living arrangement and exercise are not independent (i.e., they are dependent or related), p < 0.005.

Again, the χ² test of independence is used to test whether the distribution of the outcome variable is similar across the comparison groups. Here we rejected H₀ and concluded that the distribution of exercise is not independent of living arrangement, or that there is a relationship between living arrangement and exercise. The test provides an overall assessment of statistical significance. When the null hypothesis is rejected, it is important to review the sample data to understand the nature of the relationship. Consider again the sample data.

Because there are different numbers of students in each living situation, comparing exercise patterns on the basis of the frequencies alone is difficult. The following table displays the percentages of students in each exercise category by living arrangement. The percentages sum to 100% in each row of the table. For comparison purposes, percentages are also shown for the total sample along the bottom row of the table.

                        No Regular Exercise   Sporadic Exercise   Regular Exercise
Dormitory               36%                   33%                 31%
On-Campus Apartment     41%                   36%                 23%
Off-Campus Apartment    73%                   17%                 10%
At Home                 78%                   12%                 10%
Total                   54%                   27%                 19%

From the above, it is clear that higher percentages of students living in dormitories and in on-campus apartments reported regular exercise (31% and 23%) as compared to students living in off-campus apartments and at home (10% each).  
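For reference, a sketch of this entire test in R, using the counts as reconstructed above; chisq.test() on a matrix reproduces the hand calculation:

```r
# Chi-square test of independence: living arrangement (rows) x exercise (columns)
tbl <- matrix(c( 32, 30, 28,
                 74, 64, 42,
                110, 25, 15,
                 39,  6,  5), nrow = 4, byrow = TRUE)
chisq.test(tbl)               # X-squared ~ 60.5, df = 6, p-value < 0.001
round(prop.table(tbl, 1), 2)  # row proportions used in the percentage table above
```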

Test Yourself

 Pancreaticoduodenectomy (PD) is a procedure that is associated with considerable morbidity. A study was recently conducted on 553 patients who had a successful PD between January 2000 and December 2010 to determine whether their Surgical Apgar Score (SAS) is related to 30-day perioperative morbidity and mortality. The table below gives the number of patients experiencing no, minor, or major morbidity by SAS category.  

Question: What would be an appropriate statistical test to examine whether there is an association between Surgical Apgar Score and patient outcome? Using 14.13 as the value of the test statistic for these data, carry out the appropriate test at a 5% level of significance. Show all parts of your test.

In the module on hypothesis testing for means and proportions, we discussed hypothesis testing applications with a dichotomous outcome variable and two independent comparison groups. We presented a test using a test statistic Z to test for equality of independent proportions. The chi-square test of independence can also be used with a dichotomous outcome and the results are mathematically equivalent.  

In the prior module, we considered the following example. Here we show the equivalence to the chi-square test of independence.

A randomized trial is designed to evaluate the effectiveness of a newly developed pain reliever designed to reduce pain in patients following joint replacement surgery. The trial compares the new pain reliever to the pain reliever currently in use (called the standard of care). A total of 100 patients undergoing joint replacement surgery agreed to participate in the trial. Patients were randomly assigned to receive either the new pain reliever or the standard pain reliever following surgery and were blind to the treatment assignment. Before receiving the assigned treatment, patients were asked to rate their pain on a scale of 0-10 with higher scores indicative of more pain. Each patient was then given the assigned treatment and after 30 minutes was again asked to rate their pain on the same scale. The primary outcome was a reduction in pain of 3 or more scale points (defined by clinicians as a clinically meaningful reduction). The following data were observed in the trial.

We tested whether there was a significant difference in the proportions of patients reporting a meaningful reduction (i.e., a reduction of 3 or more scale points) using a Z statistic, as follows. 

H₀: p₁ = p₂

H₁: p₁ ≠ p₂                             α = 0.05

Here the new or experimental pain reliever is group 1 and the standard pain reliever is group 2.

We must first check that the sample size is adequate. Specifically, we need to ensure that we have at least 5 successes and 5 failures in each comparison group, or that:

min(n₁p̂₁, n₁(1 − p̂₁), n₂p̂₂, n₂(1 − p̂₂)) ≥ 5

In this example, the numbers of successes and failures in each comparison group exceed 5. Therefore, the sample size is adequate, so the following formula can be used:

Z = (p̂₁ − p̂₂) / √( p̂(1 − p̂)(1/n₁ + 1/n₂) )

Reject H 0 if Z < -1.960 or if Z > 1.960.

We now substitute the sample data into the formula for the test statistic identified in Step 2. We first compute the overall proportion of successes:

p̂ = (x₁ + x₂) / (n₁ + n₂)

We now substitute to compute the test statistic, which works out to Z = 2.53.

  • Step 5.  Conclusion.

We reject H₀ because 2.53 > 1.960. We have statistically significant evidence at α = 0.05 to show that the proportions of patients reporting a meaningful reduction in pain differ between the new pain reliever and the standard pain reliever.

We now conduct the same test using the chi-square test of independence.  

H₀: Treatment and outcome (meaningful reduction in pain) are independent

H₁: H₀ is false.         α = 0.05

The formula for the test statistic is:

χ² = Σ (O − E)² / E

For this test, df = (2-1)(2-1) = 1. At a 5% level of significance, the appropriate critical value is 3.84 and the decision rule is as follows: Reject H₀ if χ² > 3.84. (Note that 1.96² = 3.84, where 1.96 was the critical value used in the Z test for proportions shown above.)

We now compute the expected frequencies using:

Expected Frequency = (Row Total * Column Total)/N.

The computations can be organized in a two-way table. The top number in each cell of the table is the observed frequency and the bottom number is the expected frequency. The expected frequencies are shown in parentheses.

A condition for the appropriate use of the test statistic was that each expected frequency is at least 5. This is true for this sample (the smallest expected frequency is 22.0) and therefore it is appropriate to use the test statistic.

The test statistic works out to χ² = 6.4, and we reject H₀ because 6.4 > 3.84. (Note that (2.53)² = 6.4, where 2.53 was the value of the Z statistic in the test for proportions shown above.) We reach the same conclusion as with the Z test: treatment and outcome are not independent. Once again, with a dichotomous outcome and two independent groups, Z² = χ².

Chi-Squared Tests in R

The video below by Mike Marin demonstrates how to perform chi-squared tests in the R programming language.
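Presumably the video builds on base R's chisq.test(), illustrated in the sketches above; the chi-square distribution itself is also available directly in base R, for example:

```r
# Critical values and p-values without a printed chi-square table
qchisq(0.95, df = 2)                       # critical value, alpha = 0.05, df = 2: 5.99
pchisq(8.46, df = 2, lower.tail = FALSE)   # p-value for the exercise example: 0.0146
```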

Answer to Problem on Pancreaticoduodenectomy and Surgical Apgar Scores

We have 3 independent comparison groups (Surgical Apgar Score) and a categorical outcome variable (morbidity/mortality). We can run a Chi-Squared test of independence.

H₀: Apgar scores and patient outcome are independent of one another.

Hₐ: Apgar scores and patient outcome are not independent.

For this test, df = (3 − 1)(3 − 1) = 4, and at a 5% level of significance the critical value is 9.49; reject H₀ if χ² > 9.49.

Chi-square = 14.13

Since 14.13 is greater than 9.49, we reject H₀.

There is an association between Apgar scores and patient outcome. The lowest Apgar score group (0 to 4) experienced the highest percentage of major morbidity or mortality (16 out of 57=28%) compared to the other Apgar score groups.


The Chi-Square Test

What is a chi-square test?

A Chi-square test is a hypothesis testing method. Two common Chi-square tests involve checking if observed frequencies in one or more categories match expected frequencies.

Is a Chi-square test the same as a χ² test?

Yes, χ is the Greek symbol Chi.

What are my choices?

If you have a single categorical variable, you use a Chi-square goodness of fit test. If you have two categorical variables, you use a Chi-square test of independence. There are other Chi-square tests, but these two are the most common.

Types of Chi-square tests

You use a Chi-square test for hypothesis tests about whether your data is as expected. The basic idea behind the test is to compare the observed values in your data to the expected values that you would see if the null hypothesis is true.

There are two commonly used Chi-square tests: the Chi-square goodness of fit test and the Chi-square test of independence. Both tests involve variables that divide your data into categories. As a result, people can be confused about which test to use. The table below compares the two tests.

Visit the individual pages for each type of Chi-square test to see examples along with details on assumptions and calculations.

Table 1: Choosing a Chi-square test

  • Goodness of fit test: one categorical variable; decides whether that variable is likely to follow a hypothesized distribution.
  • Test of independence: two categorical variables; decides whether the two variables might be related.

How to perform a chi-square test

For both the Chi-square goodness of fit test and the Chi-square test of independence, you perform the same analysis steps, listed below. Visit the pages for each type of test to see these steps in action.

  • Define your null and alternative hypotheses before collecting your data.
  • Decide on the alpha value. This involves deciding the risk you are willing to take of drawing the wrong conclusion. For example, suppose you set α=0.05 when testing for independence. Here, you have decided on a 5% risk of concluding the two variables are independent when in reality they are not.
  • Check the data for errors.
  • Check the assumptions for the test. (Visit the pages for each test type for more detail on assumptions.)
  • Perform the test and draw your conclusion.

Both Chi-square tests in the table above involve calculating a test statistic. The basic idea behind the tests is that you compare the actual data values with what would be expected if the null hypothesis is true. The test statistic involves finding the squared difference between actual and expected data values and dividing that squared difference by the expected data value. You do this for each data point and add up the values.

Then, you compare the test statistic to a theoretical value from the Chi-square distribution . The theoretical value depends on both the alpha value and the degrees of freedom for your data. Visit the pages for each test type for detailed examples.
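A minimal sketch of that calculation in R, with hypothetical observed and expected counts (not tied to any example above):

```r
# Chi-square statistic from scratch, then compared to the theoretical value
O <- c(30, 50, 20)                 # hypothetical observed counts
E <- c(25, 50, 25)                 # expected counts under the null hypothesis
statistic <- sum((O - E)^2 / E)    # 2.0
statistic > qchisq(0.95, df = 2)   # FALSE -> do not reject the null
```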

Chi-Square (Χ²) Test & How To Calculate Formula Equation

Benjamin Frimodig


Saul Mcleod, PhD


Chi-square (χ2) is used to test hypotheses about the distribution of observations into categories with no inherent ranking.

What Is a Chi-Square Statistic?

The Chi-square test (pronounced Kai) looks at the pattern of observations and will tell us if certain combinations of the categories occur more frequently than we would expect by chance, given the total number of times each category occurred.

It looks for an association between the variables. We cannot use a correlation coefficient to look for the patterns in this data because the categories often do not form a continuum.

There are three main types of Chi-square tests: the test of goodness of fit, the test of independence, and the test for homogeneity. All three tests rely on the same formula to compute a test statistic.

These tests function by deciphering relationships between observed sets of data and theoretical or “expected” sets of data that align with the null hypothesis.

What is a Contingency Table?

Contingency tables (also known as two-way tables) are grids in which Chi-square data is organized and displayed. They provide a basic picture of the interrelation between two variables and can help find interactions between them.

In contingency tables, one variable and each of its categories are listed vertically, and the other variable and each of its categories are listed horizontally.

Additionally, including column and row totals, also known as “marginal frequencies,” will help facilitate the Chi-square testing process.

In order for the Chi-square test to be considered trustworthy, each cell of your expected contingency table must have a value of at least five.

Each Chi-square test will have one contingency table representing observed counts (see Fig. 1) and one contingency table representing expected counts (see Fig. 2).


Figure 1. Observed table (which contains the observed counts).

To obtain the expected frequencies for any cell in any cross-tabulation in which the two variables are assumed independent, multiply the row and column totals for that cell and divide the product by the total number of cases in the table.


Figure 2. Expected table (what we expect the two-way table to look like if the two categorical variables are independent).

To decide if our calculated value for χ2 is significant, we also need to work out the degrees of freedom for our contingency table using the following formula: df= (rows – 1) x (columns – 1).

Formula Calculation

χ² = Σ (O − E)² / E

Calculate the chi-square statistic (χ2) by completing the following steps:

  • Calculate the expected frequencies and the observed frequencies.
  • For each observed number in the table, subtract the corresponding expected number (O − E).
  • Square the difference: (O − E)².
  • Divide the squares obtained for each cell in the table by the expected number for that cell: (O − E)² / E.
  • Sum all the values for (O − E)² / E. This is the chi-square statistic.
  • Calculate the degrees of freedom for the contingency table using the following formula: df = (rows − 1) x (columns − 1).

Once we have calculated the degrees of freedom (df) and the chi-squared value (χ2), we can use the χ2 table (often at the back of a statistics book) to check if our value for χ2 is higher than the critical value given in the table. If it is, then our result is significant at the level given.
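If you prefer software to a printed table, the critical value can be looked up in R, for example:

```r
# Critical value for the chi-square test from the quantile function
df <- 4; alpha <- 0.05
qchisq(1 - alpha, df)   # 9.49; the result is significant if chi-square exceeds this
```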

Interpretation

The chi-square statistic tells you how much difference exists between the observed count in each table cell and the counts you would expect if there were no relationship at all in the population.

Small Chi-Square Statistic: If the chi-square statistic is small and the p-value is large (usually greater than 0.05), this often indicates that the observed frequencies in the sample are close to what would be expected under the null hypothesis.

The null hypothesis usually states no association between the variables being studied or that the observed distribution fits the expected distribution.

In theory, if the observed and expected values were equal (no difference), then the chi-square statistic would be zero — but this is unlikely to happen in real life.

Large Chi-Square Statistic: If the chi-square statistic is large and the p-value is small (usually less than 0.05), then the conclusion is often that the data does not fit the model well, i.e., the observed and expected values are significantly different. This often leads to the rejection of the null hypothesis.

How to Report

To report a chi-square output in an APA-style results section, always rely on the following template:

χ²(degrees of freedom, N = sample size) = chi-square statistic value, p = p value.

Figure: SPSS chi-square output for the example below.

In the case of the above example, the results would be written as follows:

A chi-square test of independence showed that there was a significant association between gender and post-graduation education plans, χ2 (4, N = 101) = 54.50, p < .001.

APA Style Rules

  • Do not use a zero before a decimal when the statistic cannot be greater than 1 (proportion, correlation, level of statistical significance).
  • Report exact p values to two or three decimals (e.g., p = .006, p = .03).
  • However, report p values less than .001 as “ p < .001.”
  • Put a space before and after a mathematical operator (e.g., minus, plus, greater than, less than, equals sign).
  • Do not repeat statistics in both the text and a table or figure.

p-value Interpretation

You test whether a given χ² is statistically significant by testing it against a table of chi-square distributions, according to the number of degrees of freedom for your sample, which is the number of categories minus 1. The chi-square test assumes that you have at least 5 observations per category.

If you are using SPSS, you will be given an exact p-value.

For a chi-square test, a p-value that is less than or equal to the .05 significance level indicates that the observed values are different to the expected values.

Thus, low p-values (p< .05) indicate a likely difference between the theoretical population and the collected sample. You can conclude that a relationship exists between the categorical variables.

Remember that p -values do not indicate the odds that the null hypothesis is true but rather provide the probability that one would obtain the sample distribution observed (or a more extreme distribution) if the null hypothesis was true.

A level of confidence sufficient to accept the null hypothesis can never be reached. Therefore, conclusions must either fail to reject the null hypothesis or reject it in favor of the alternative hypothesis, depending on the calculated p-value.

The steps below show you how to analyze your data using a chi-square goodness-of-fit test in SPSS. If you have hypothesized that all categories are equally likely, Steps 1-3 suffice; otherwise, specify the expected counts as in Steps 4-6.

Step 1 : Analyze > Nonparametric Tests > Legacy Dialogs > Chi-square… on the top menu as shown below:

Step 2 : Move the variable indicating categories into the “Test Variable List:” box.

Step 3 : If you want to test the hypothesis that all categories are equally likely, click "OK." Otherwise, continue with the steps below.

Step 4 : Specify the expected count for each category by first clicking the “Values” button under “Expected Values.”

Step 5 : Then, in the box to the right of “Values,” enter the expected count for category one and click the “Add” button. Now enter the expected count for category two and click “Add.” Continue in this way until all expected counts have been entered.

Step 6 : Then click “OK.”

The steps below show you how to analyze your data using a chi-square test of independence in SPSS Statistics.

Step 1 : Open the Crosstabs dialog (Analyze > Descriptive Statistics > Crosstabs).

Step 2 : Select the variables you want to compare using the chi-square test. Click one variable in the left window and then click the arrow at the top to move the variable. Select the row variable and the column variable.

Step 3 : Click Statistics (a new pop-up window will appear). Check Chi-square, then click Continue.

Step 4 : (Optional) Check the box for Display clustered bar charts.

Step 5 : Click OK.

Goodness-of-Fit Test

The Chi-square goodness of fit test is used to compare a randomly collected sample containing a single, categorical variable to a larger population.

This test is most commonly used to compare a random sample to the population from which it was potentially collected.

The test begins with the creation of a null and alternative hypothesis. In this case, the hypotheses are as follows:

Null Hypothesis (Ho) : The null hypothesis (Ho) is that the observed frequencies are the same (except for chance variation) as the expected frequencies. The collected data is consistent with the population distribution.

Alternative Hypothesis (Ha) : The collected data is not consistent with the population distribution.

The next step is to create a contingency table that represents how the data would be distributed if the null hypothesis were exactly correct.

The sample’s overall deviation from this theoretical/expected data will allow us to draw a conclusion, with a more severe deviation resulting in smaller p-values.

Test for Independence

The Chi-square test for independence looks for an association between two categorical variables within the same population.

Unlike the goodness of fit test, the test for independence does not compare a single observed variable to a theoretical population but rather two variables within a sample set to one another.

The hypotheses for a Chi-square test of independence are as follows:

Null Hypothesis (Ho) : There is no association between the two categorical variables in the population of interest.

Alternative Hypothesis (Ha) : There is an association between the two categorical variables in the population of interest.

The next step is to create a contingency table of expected values that reflects how a data set that perfectly aligns the null hypothesis would appear.

The simplest way to do this is to calculate the marginal frequencies of each row and column; the expected frequency of each cell is then the product of the corresponding row and column marginal frequencies, divided by the total sample size.

Test for Homogeneity

The Chi-square test for homogeneity is organized and executed exactly the same as the test for independence.

The main difference to remember between the two is that the test for independence looks for an association between two categorical variables within the same population, while the test for homogeneity determines if the distribution of a variable is the same in each of several populations (thus allocating population itself as the second categorical variable).

Null Hypothesis (Ho) : There is no difference in the distribution of a categorical variable for several populations or treatments.

Alternative Hypothesis (Ha) : There is a difference in the distribution of a categorical variable for several populations or treatments.

The difference between these two tests can be a bit tricky to determine, especially in the practical applications of a Chi-square test. A reliable rule of thumb is to determine how the data was collected.

If the data consists of only one random sample with the observations classified according to two categorical variables, it is a test for independence. If the data consists of more than one independent random sample, it is a test for homogeneity.

What is the chi-square test?

The Chi-square test is a non-parametric statistical test used to determine if there’s a significant association between two or more categorical variables in a sample.

It works by comparing the observed frequencies in each category of a cross-tabulation with the frequencies expected under the null hypothesis, which assumes there is no relationship between the variables.

This test is often used in fields like biology, marketing, sociology, and psychology for hypothesis testing.

What does chi-square tell you?

The Chi-square test informs whether there is a significant association between two categorical variables. Suppose the calculated Chi-square value is above the critical value from the Chi-square distribution.

In that case, it suggests a significant relationship between the variables, rejecting the null hypothesis of no association.

How to calculate chi-square?

To calculate the Chi-square statistic, follow these steps:

1. Create a contingency table of observed frequencies for each category.

2. Calculate expected frequencies for each category under the null hypothesis.

3. Compute the Chi-square statistic using the formula: Χ² = Σ [ (O_i – E_i)² / E_i ], where O_i is the observed frequency and E_i is the expected frequency.

4. Compare the calculated statistic with the critical value from the Chi-square distribution to draw a conclusion.
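A compact sketch of these four steps in R, using a hypothetical 2 × 2 contingency table:

```r
# Steps 1-4 with a hypothetical contingency table
observed <- matrix(c(20, 30,
                     40, 10), nrow = 2, byrow = TRUE)                    # step 1
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)  # step 2
statistic <- sum((observed - expected)^2 / expected)                     # step 3: 16.67
statistic > qchisq(0.95, df = 1)                                         # step 4: TRUE -> significant
```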



Unit 14: Inference for categorical data (chi-square tests)

About this unit

Chi-square tests are a family of significance tests that give us ways to test hypotheses about distributions of categorical data. This topic covers goodness-of-fit tests to see if sample data fits a hypothesized distribution, and tests for independence between two categorical variables.

Chi-square goodness-of-fit tests

  • Chi-square distribution introduction
  • Pearson's chi square test (goodness of fit)
  • Chi-square statistic for hypothesis testing
  • Chi-square goodness-of-fit example
  • Expected counts in a goodness-of-fit test
  • Conditions for a goodness-of-fit test
  • Test statistic and P-value in a goodness-of-fit test
  • Conclusions in a goodness-of-fit test

Chi-square tests for relationships

  • Filling out frequency table for independent events
  • Contingency table chi-square test
  • Introduction to the chi-square test for homogeneity
  • Chi-square test for association (independence)
  • Expected counts in chi-squared tests with two-way tables
  • Test statistic and P-value in chi-square tests with two-way tables
  • Making conclusions in chi-square tests for two-way tables


8.1 - The Chi-Square Test of Independence

How do we test the independence of two categorical variables? It will be done using the Chi-Square Test of Independence.

As with all prior statistical tests we need to define null and alternative hypotheses. Also, as we have learned, the null hypothesis is what is assumed to be true until we have evidence to go against it. In this lesson, we are interested in researching if two categorical variables are related or associated (i.e., dependent). Therefore, until we have evidence to suggest that they are, we must assume that they are not. This is the motivation behind the hypothesis for the Chi-Square Test of Independence:

  • \(H_0\): In the population, the two categorical variables are independent.
  • \(H_a\): In the population, the two categorical variables are dependent.

Note! There are several ways to phrase these hypotheses. Instead of using the words "independent" and "dependent" one could say "there is no relationship between the two categorical variables" versus "there is a relationship between the two categorical variables." Or "there is no association between the two categorical variables" versus "there is an association between the two variables." The important part is that the null hypothesis refers to the two categorical variables not being related while the alternative is trying to show that they are related.

Once we have gathered our data, we summarize the data in the two-way contingency table. This table represents the observed counts and is called the Observed Counts Table or simply the Observed Table. The contingency table on the introduction page to this lesson represented the observed counts of the party affiliation and opinion for those surveyed.

The question becomes, "How would this table look if the two variables were not related?" That is, under the null hypothesis that the two variables are independent, what would we expect our data to look like?

Consider the following table:

           Success   Failure   Total
Group 1    A         B         A+B
Group 2    C         D         C+D
Total      A+C       B+D       A+B+C+D

The total count is \(A+B+C+D\). Let's focus on one cell, say Group 1 and Success with observed count A. If we go back to our probability lesson, let \(G_1\) denote the event 'Group 1' and \(S\) denote the event 'Success.' Then,

\(P(G_1)=\dfrac{A+B}{A+B+C+D}\) and \(P(S)=\dfrac{A+C}{A+B+C+D}\).

Recall that if two events are independent, then their intersection is the product of their respective probabilities. In other words, if \(G_1\) and \(S\) are independent, then...

\begin{align} P(G_1\cap S)&=P(G_1)P(S)\\&=\left(\dfrac{A+B}{A+B+C+D}\right)\left(\dfrac{A+C}{A+B+C+D}\right)\\[10pt] &=\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\end{align}

If we considered counts instead of probabilities, then we get the count by multiplying the probability by the total count. In other words...

\begin{align} \text{Expected count for cell with A} &=P(G_1)P(S)\times(\text{total count}) \\ &= \left(\dfrac{(A+B)(A+C)}{(A+B+C+D)^2}\right)(A+B+C+D)\\[10pt]&=\mathbf{\dfrac{(A+B)(A+C)}{A+B+C+D}} \end{align}

This is the count we would expect to see if the two variables were independent (i.e. assuming the null hypothesis is true).

The expected count for each cell under the null hypothesis is:

\(E=\dfrac{\text{(row total)}(\text{column total})}{\text{total sample size}}\)

Example 8-1: Political Affiliation and Opinion

To demonstrate, we will use the Party Affiliation and Opinion on Tax Reform example.

Observed Table:

              Favor   Indifferent   Opposed   Total
Democrat      138     83            64        285
Republican    64      67            84        215
Total         202     150           148       500

Find the expected counts for all of the cells.

We need to find what is called the Expected Counts Table or simply the Expected Table. This table displays what the counts would be for our sample data if there were no association between the variables.

Calculating Expected Counts from Observed Counts

Chi-Square Test Statistic

To better understand what these expected counts represent, first recall that the expected counts table is designed to reflect what the sample data counts would be if the two variables were independent. Taking what we know of independent events, we would be saying that the sample counts should show similarity in opinions of tax reform between democrats and republicans. If you find the proportion of each cell by taking a cell's expected count divided by its row total, you will discover that in the expected table each opinion proportion is the same for democrats and republicans. That is, from the expected counts, 0.404 of the democrats and 0.404 of the republicans favor the bill; 0.3 of the democrats and 0.3 of the republicans are indifferent; and 0.296 of the democrats and 0.296 of the republicans are opposed.

The statistical question becomes, "Are the observed counts so different from the expected counts that we can conclude a relationship exists between the two variables?" To conduct this test we compute a Chi-Square test statistic where we compare each cell's observed count to its respective expected count.

In a summary table, we have \(r\times c=rc\) cells. Let \(O_1, O_2, …, O_{rc}\) denote the observed counts for each cell and \(E_1, E_2, …, E_{rc}\) denote the respective expected counts for each cell.

The Chi-Square test statistic is calculated as follows:

\(\chi^{2*}=\frac{(O_1-E_1)^2}{E_1}+\frac{(O_2-E_2)^2}{E_2}+...+\frac{(O_{rc}-E_{rc})^2}{E_{rc}}=\overset{rc}{ \underset{i=1}{\sum}}\frac{(O_i-E_i)^2}{E_i}\)

Under the null hypothesis and certain conditions (discussed below), the test statistic follows a Chi-Square distribution with degrees of freedom equal to \((r-1)(c-1)\), where \(r\) is the number of rows and \(c\) is the number of columns. We omit the mathematical details of why this test statistic is used and why it follows a Chi-Square distribution.

As we have done with other statistical tests, we make our decision by either comparing the value of the test statistic to a critical value (rejection region approach) or by finding the probability of getting this test statistic value or one more extreme (p-value approach).

The critical value for our Chi-Square test is \(\chi^2_{\alpha}\) with degrees of freedom =\((r - 1)(c - 1)\), while the p-value is found by \(P(\chi^2>\chi^{2*})\) with degrees of freedom =\((r - 1)(c - 1)\).

Example 8-1 Cont'd: Chi-Square

Let's apply the Chi-Square Test of Independence to our example where we have a random sample of 500 U.S. adults who are questioned regarding their political affiliation and opinion on a tax reform bill. We will test whether political affiliation and opinion on the tax reform bill are dependent at a 5% level of significance. Calculate the test statistic.


The contingency table ( political_affiliation.csv ) is given below. Each cell contains the observed count and the expected count in parentheses. For example, there were 138 democrats who favored the tax bill. The expected count under the null hypothesis is 115.14. Therefore, the cell is displayed as 138 (115.14).

              Favor          Indifferent   Opposed       Total
  Democrat    138 (115.14)   83 (85.50)    64 (84.36)    285
  Republican   64 (86.86)    67 (64.50)    84 (63.64)    215
  Total       202            150           148           500

Calculating the test statistic by hand:

\begin{multline} \chi^{2*}=\dfrac{(138-115.14)^2}{115.14}+\dfrac{(83-85.50)^2}{85.50}+\dfrac{(64-84.36)^2}{84.36}+\\ \dfrac{(64-86.86)^2}{86.86}+\dfrac{(67-64.50)^2}{64.50}+\dfrac{(84-63.64)^2}{63.64}=22.152\end{multline}

...with degrees of freedom equal to \((2 - 1)(3 - 1) = 2\).

  Minitab: Chi-Square Test of Independence

To perform the Chi-Square test in Minitab...

  • Choose Stat  >  Tables  >  Chi-Square Test for Association
  • If you have summarized data (i.e., observed counts), select 'Summarized data in a two-way table' from the drop-down box and enter the columns that contain the observed counts; otherwise, if you have raw data, use 'Raw data' (categorical variables). Note that if using the raw data, your data will need to consist of two columns: one with the explanatory variable data (goes in the 'row' field) and one with the response variable data (goes in the 'column' field).
  • Labeling (Optional) When using the summarized data you can label the rows and columns if you have the variable labels in columns of the worksheet. For example, if we have a column with the two political party affiliations and a column with the three opinion choices we could use these columns to label the output.
  • Click the Statistics  tab. Keep checked the four boxes already checked, but also check the box for 'Each cell's contribution to the chi-square.' Click OK .

Note! If you have the observed counts in a table, you can copy/paste them into Minitab. For instance, you can copy the entire observed counts table (excluding the totals!) for our example and paste these into Minitab starting with the first empty cell of a column.

The following is the Minitab output for this example.

Cell Contents: Count, Expected count, Contribution to Chi-Square

Pearson Chi-Sq = 4.5386 + 0.073 + 4.914 + 6.016 + 0.097 + 6.5137 = 22.152 DF = 2, P-Value = 0.000

Likelihood Ratio Chi-Square

The Chi-Square test statistic is 22.152 and calculated by summing all the individual cell's Chi-Square contributions:

\(4.5386 + 0.073 + 4.914 + 6.016 + 0.097 + 6.5137 = 22.152\)

The p-value is found by \(P(\chi^2>22.152)\) with degrees of freedom =\((2-1)(3-1) = 2\).

Minitab calculates this p-value to be less than 0.001 and reports it as 0.000. Given this p-value of 0.000 is less than the alpha of 0.05, we reject the null hypothesis that political affiliation and opinion on the tax reform bill are independent. We conclude that there is evidence that the two variables are dependent (i.e., that there is an association between the two variables).
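For readers working without Minitab, the same analysis can be reproduced in Python with SciPy; this is a sketch, not part of the original lesson:

```python
from scipy.stats import chi2_contingency

# Observed counts: rows = Democrat, Republican
# columns = Favor, Indifferent, Opposed
observed = [[138, 83, 64],
            [64, 67, 84]]

stat, pvalue, df, expected = chi2_contingency(observed)
print(stat, df, pvalue)  # ~22.152, df = 2, p ~ 1.5e-05 (reported as < 0.001)
```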

Conditions for Using the Chi-Square Test

Exercise caution when there are small expected counts. Minitab will give a count of the number of cells that have expected frequencies less than five. Some statisticians hesitate to use the Chi-Square test if more than 20% of the cells have expected frequencies below five, especially if the p-value is small and these cells give a large contribution to the total Chi-Square value.

Example 8-2: Tire Quality

The operations manager of a company that manufactures tires wants to determine whether there are any differences in the quality of work among the three daily shifts. She randomly selects 496 tires and carefully inspects them. Each tire is classified as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The two categorical variables of interest are the shift and the condition of the tire produced. The data ( shift_quality.txt ) can be summarized by the accompanying two-way table. Do the data provide sufficient evidence at the 5% significance level to infer that there are differences in quality among the three shifts?

Chi-Square Test

Chi-Sq = 8.647 DF = 4, P-Value = 0.071 

Note that there are 3 cells with expected counts less than 5.0.

In the above example, we don't have a significant result at the 5% significance level since the p-value (0.071) is greater than 0.05. Even if we did have a significant result, we still could not trust it, because 3 cells (33.3%) have expected counts below 5.0.

Sometimes researchers will categorize quantitative data (e.g., take height measurements and categorize as 'below average,' 'average,' and 'above average.') Doing so results in a loss of information - one cannot do the reverse of taking the categories and reproducing the raw quantitative measurements. Instead of categorizing, the data should be analyzed using quantitative methods.

Try it!

A food services manager for a baseball park wants to know if there is a relationship between gender (male or female) and the preferred condiment on a hot dog. The following table summarizes the results. Test the hypothesis with a significance level of 10%.

The hypotheses are:

  • \(H_0\): Gender and condiments are independent
  • \(H_a\): Gender and condiments are not independent

We need the expected counts table:

None of the expected counts in the table are less than 5. Therefore, we can proceed with the Chi-Square test.

The test statistic is:

\(\chi^{2*}=\frac{(15-19.2)^2}{19.2}+\frac{(23-20.16)^2}{20.16}+...+\frac{(8-9.36)^2}{9.36}=2.95\)

The p-value is found by \(P(\chi^2>\chi^{2*})=P(\chi^2>2.95)\) with (3-1)(2-1)=2 degrees of freedom. Using a table or software, we find the p-value to be 0.2288.

With a p-value greater than 10%, we can conclude that there is not enough evidence in the data to suggest that gender and preferred condiment are related.
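As a quick check, the p-value above can be reproduced with SciPy's chi-square survival function (a sketch, assuming SciPy is installed):

```python
from scipy.stats import chi2

p_value = chi2.sf(2.95, df=2)  # P(chi-square > 2.95) with 2 degrees of freedom
print(round(p_value, 4))       # 0.2288
```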


9.8: Chi-Square Test


The Chi-Square Distribution

To analyze patterns between distinct categories, such as genders, political candidates, locations, or preferences, we use the chi-square goodness-of-fit test.

This test is used when estimating how closely a sample matches the expected distribution (also known as the goodness-of-fit test) and when estimating if two random variables are independent of one another (also known as the test of independence).

In this lesson, we will learn more about the goodness-of-fit test and how to create and evaluate hypotheses using this test.

The chi-square distribution can be used to perform the goodness-of-fit test, which compares the observed values of a categorical variable with the expected values of that same variable.

Constructing a Contingency Table

We would use the chi-square goodness-of-fit test to evaluate if there was a preference in the type of lunch that 11th grade students bought in the cafeteria. For this type of comparison, it helps to make a table to visualize the problem. We could construct the following table, known as a contingency table, to compare the observed and expected values.

Research Question: Do 11th grade students prefer a certain type of lunch?

Using a sample of 100 11th grade students, we recorded the following information:

If there is no difference in which type of lunch is preferred, we would expect the students to prefer each type of lunch equally. To calculate the expected frequency of each category when assuming school lunch preferences are distributed equally, we divide the number of observations by the number of categories. Since there are 100 observations and 4 categories, the expected frequency of each category is 100/4, or 25.

The Chi-Square Statistic

The value that indicates the comparison between the observed and expected frequency is called the chi-square statistic . The idea is that if the observed frequency is close to the expected frequency, then the chi-square statistic will be small. On the other hand, if there is a substantial difference between the two frequencies, then we would expect the chi-square statistic to be large.

To calculate the chi-square statistic, χ2, we use the following formula:

χ² = Σ (O − E)² / E

χ2 is the chi-square test statistic.

O is the observed frequency value for each event.

E is the expected frequency value for each event.

We compare the value of the test statistic to a tabled chi-square value to determine the probability that a sample fits an expected pattern.

Features of the Goodness-of-Fit Test

As mentioned, the goodness-of-fit test is used to determine patterns of distinct categorical variables. The test requires that the data are obtained through a random sample. The number of degrees of freedom associated with a particular chi-square test is equal to the number of categories minus one. That is, df=c−1.

Using our example about the preferences for types of school lunches, we calculate the degrees of freedom as follows:

df = number of categories − 1 = 4 − 1 = 3

There are many situations that use the goodness-of-fit test, including surveys, taste tests, and analysis of behaviors. Interestingly, goodness-of-fit tests are also used in casinos to determine if there is cheating in games of chance, such as cards or dice. For example, if a certain card or number on a die shows up more than expected (a high observed frequency compared to the expected frequency), officials use the goodness-of-fit test to determine the likelihood that the player may be cheating or that the game may not be fair.

Testing Hypotheses

Let’s use our original example to create and test a hypothesis using the goodness-of-fit chi-square test. First, we will need to state the null and alternative hypotheses for our research question. Since our research question asks, “Do 11th grade students prefer a certain type of lunch?” our null hypothesis for the chi-square test would state that there is no difference between the observed and the expected frequencies. Therefore, our alternative hypothesis would state that there is a significant difference between the observed and expected frequencies.

Null Hypothesis

H0: O = E (There is no statistically significant difference between observed and expected frequencies.)

Alternative Hypothesis

Ha: O ≠ E (There is a statistically significant difference between observed and expected frequencies.)

Also, the number of degrees of freedom for this test is 3.

Using an alpha level of 0.05, we look under the column for 0.05 and the row for degrees of freedom, which, again, is 3. According to the standard chi-square distribution table, we see that the critical value for chi-square is 7.815. Therefore, we would reject the null hypothesis if the chi-square statistic is greater than 7.815.
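The critical value quoted here can be verified with SciPy's inverse survival function (a sketch, assuming SciPy is available):

```python
from scipy.stats import chi2

critical = chi2.isf(0.05, df=3)  # upper 5% point of chi-square with 3 df
print(round(critical, 3))        # 7.815
```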

Note that we can calculate the chi-square statistic with relative ease.

Since our chi-square statistic of 10.96 is greater than 7.815, we reject the null hypothesis and accept the alternative hypothesis. Therefore, we can conclude that there is a significant difference between the types of lunches that 11th grade students prefer.

Using the Chi-Square Goodness of Fit Test

A game involves rolling 3 dice. The winnings are directly proportional to the number of fives rolled. Suppose someone plays the game 100 times with the following observed counts:

Someone becomes suspicious and wants to determine whether the dice are fair.

If the dice are fair, the probability of rolling a 5 is 1/6. If we roll 3 dice independently, then the number of fives in three rolls is distributed as a Binomial(3, 1/6).

a. Determine the probability of 0, 1, 2 and 3 fives under this distribution.

Since we have a binomial distribution with 3 independent trials and probability of success 1 / 6 on each trial, we can compute the probabilities using either the TI Calculator binompdf(3,1/6, k) where k represents the particular value in which we are interested or we can use the formula

P(X = k) = C(3, k) (1/6)^k (5/6)^(3−k)
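Part (a) can also be checked with a short Python sketch using SciPy's binomial pmf (an illustration, not part of the original lesson):

```python
from scipy.stats import binom

for k in range(4):  # number of fives in 3 rolls
    print(k, round(binom.pmf(k, 3, 1/6), 4))
# 0 0.5787, 1 0.3472, 2 0.0694, 3 0.0046
```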

b. Determine if the dice are fair (Use a chi-square goodness of fit test).

For each category, the expected count is 100 times the binomial probability from part (a), and List 3 holds each category's (O − E)²/E contribution.

You then sum the values in List 3. This will be the value of your chi-square statistic:

χ² = 1.72 + 0.007 + 9.14 + 4.5 = 15.367

In the previous example, we saw that the critical value for a chi-squared statistic at the 0.05 level of significance is 7.815. Since χ2=15.367>7.815, at the .05 level of significance, we can reject the null hypothesis and conclude that the dice are not fair.

The marital status distribution of the U.S. Female population, age 18 and older, is as shown below.

(Source: US Census Bureau, “America’s Families and Living Arrangements,” 2008)

Suppose a random sample of 400 US young adult females, 18-24 years old, yielded the following frequency distribution. Does this age group of females fit the distribution of the US adult population?

In this problem, you determine the expected number for each category by multiplying the proportion by 400, the total number of people in the study.

The chi-square statistic is 238.63 + 30.77 + 33.43 + 16.51 = 319.34 with 3 degrees of freedom. The p-value is 0.00. The decision, at the 0.05 and 0.01 levels of significance, is to reject the null hypothesis. With a goodness-of-fit test, the null hypothesis is always that the two data sets have the same distribution. Since we are rejecting the null hypothesis, this means that this age group of young adult females does not fit the distribution of the US adult population.

  • Student’s t-test
  • the ANOVA test
  • the chi-square test
  • the z-score
  • the goodness-of-fit test
  • the test for independence
  • What is the formula for calculating the chi-square statistic?
  • A principal is planning a field trip. She samples a group of 100 students to see if they prefer a sporting event, a play at the local college, or a science museum. She records the following results:

(a) What is the observed frequency value for the Science Museum category?

(b) What is the expected frequency value for the Sporting Event category?

(c) What would be the null hypothesis for the situation above?

(i) There is no preference between the types of field trips that students prefer.

(ii) There is a preference between the types of field trips that students prefer.

(d) What would be the chi-square statistic for the research question above?

(e) If the estimated chi-square level of significance was 5.99, would you reject or fail to reject the null hypothesis?

  • In 1982 in Western Australia, 1317 males and 854 females died of heart disease, 1119 males and 828 females died of cancer, 371 males and 460 females died of cerebral vascular disease and 346 males and 147 females died of accidents. (source: www.statsci.org/data/z/deathwa.html) Put this information into a contingency table.
  • χ2=3.84,df=1
  • χ2=6.7 for a table with 3 rows and 3 columns
  • χ2=26.23 for a table with 2 rows and 3 columns
  • Level of significance is 0.05, degrees of freedom = 1
  • Level of significance is 0.01; table has 3 rows and 4 columns
  • Level of significance is 0.05, degrees of freedom = 8
  • χ2=2.89,df=1
  • χ2=23.60,df=4
  • Are the situations in problem 10 statistically significant at the .01 level?
  • k = 3, H0: p1 = p2 = p3 = 1/3, n = 300
  • k = 3, H0: p1 = 1/4, p2 = 1/4, p3 = 1/2, n = 1000
  • The chi-square statistic is negative.
  • The chi-square statistic is 0.
  • A 6-sided die is rolled 120 times. Conduct a hypothesis test to determine if the die is fair. The data below are the result of the 120 rolls.
  • True or False: (if false rewrite so it is true). As the degrees of freedom increase, the graph of the chi-square distribution looks more and more symmetrical.
  • True or False: (if false rewrite so it is true). In a goodness of fit test the expected values are the values we would expect if the null hypothesis were true.
  • True or False: (if false rewrite so it is true). Use a goodness of fit test to determine if high school principals believe that students are absent equally during the week or not.
  • True or False: (if false rewrite so it is true). For a chi-square distribution with 17 degrees of freedom, the probability that a value is greater than 20 is 0.7248.
  • Suppose the p-value of the study is not small enough to reject the null hypothesis. Write this conclusion in the context of the situation.
  • Now suppose the p-value of the study is small enough to reject the null hypothesis. In the context of the situation, express the conclusion in two different ways.
  • Suppose a car dealer offers cars in three different colors: silver, black and white. In a sample of 111 buyers, 59 chose black, 25 chose silver and the remainder chose white. Is there sufficient evidence to conclude that the colors are not equally preferred? Carry out a significance test and be sure to state the null hypothesis and the population to which your conclusion applies.
  • The manufacturer of M&Ms states, on the website, the color distribution of M&Ms. Access the website to discover the claim of the manufacturer. Purchase and combine a number of 1-lb bags of M&Ms. Are the observed results significantly different from the claim of the manufacturer?

Review (Answers)

To view the Review answers, open this PDF file and look for section 10.1.

Additional Resources

Video: Example of a Goodness-of-Fit Test

Practice: Chi-Square Test


18 Chi-square

Learning outcomes

In this chapter, you will learn how to:

  • Identify when it is appropriate to run a chi-square test of goodness-of-fit or independence.
  • Describe the concept of a contingency table for categorical data.
  • Complete hypothesis tests for the chi-square tests of goodness-of-fit and independence.
  • Compute and interpret effect size for chi-square.
  • Describe Simpson’s paradox and why it is important for categorical data analysis.

We come at last to our final statistic: chi-square (χ 2 ). This test is a special form of analysis called a nonparametric test, so the structure of it will look a little bit different from what we have done so far. However, the logic of hypothesis testing remains unchanged. The purpose of chi-square is to understand the frequency distribution of a single categorical variable or find a relationship between two categorical variables, which is frequently a very useful way to look at our data.

Categories and Frequency Tables

Our data for the χ 2 test are categorical, specifically nominal, variables. Recall from Unit 1 that nominal variables have no specified order and can only be described by their names and the frequencies with which they occur in the dataset. Thus, unlike our other variables that we have tested, we cannot describe our data for the χ 2 test using means and standard deviations. Instead, we will use frequency tables.

Table 1. Pet Preferences

              Cats   Dogs   Other   Total
  Observed     14     17      5      36
  Expected     12     12     12      36

Table 1 gives an example of a frequency table used for a χ 2 test. The columns represent the different categories within our single variable, which in this example is pet preference. The χ 2 test can assess as few as two categories, and there is no technical upper limit on how many categories can be included in our variable, although, as with ANOVA, having too many categories makes our computations long and our interpretation difficult. The final column in the table is the total number of observations, or N. The χ 2 test assumes that each observation comes from only one person and that each person will provide only one observation, so our total observations will always equal our sample size.

There are two rows in this table. The first row gives the observed frequencies of each category from our dataset; in this example, 14 people reported preferring cats as pets, 17 people reported preferring dogs, and 5 people reported preferring a different animal. The second row gives expected values; expected values are what would be found if each category had equal representation. Note: Chi-square tests should not be used when an expected value for any cell will be less than 5.

Calculation for Expected Value

[latex]E=\frac{N}{C}[/latex]

[latex]E=[/latex] the expected value or what we would expect if there was no preference or the preferences were equal

[latex]N=[/latex] the total number of people in our sample

[latex]C=[/latex] the number of categories in our variable (also the number of columns in our table, not including the Total column)
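For instance, with the pet preference data from Table 1 (N = 36 observations across C = 3 categories), the formula gives the expected count of 12 used below:

[latex]E=\frac{36}{3}=12[/latex]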

The expected values correspond with the null hypothesis for χ 2 tests: equal representation of categories. Our first of two χ 2 tests, the Goodness-of-Fit test, will assess how well our data lines up with, or deviates from, this assumption.

Goodness-of-Fit Test

The first of our two χ 2 tests assesses one categorical variable against a null hypothesis of equally sized frequencies. Equal frequency distributions are what we would expect to get if placement into a particular category was completely random. We could, in theory, also test against a specific distribution of category sizes if we have a good reason to (e.g., we have a solid foundation of how the regular population is distributed), but this is less common, so we will not deal with it in this text. For example, if you know that for every 2 cats available for adoption there are 5 dogs available, you might expect more dogs than cats in your expected values row.

Step 1 – State the Hypotheses

All χ 2 tests, including the goodness-of-fit test, are nonparametric . This means that there is no population parameter we are estimating or testing against; we are working only with our sample data. Because of this, there are no mathematical statements for χ 2 hypotheses. This should make sense because the mathematical hypothesis statements were always about population parameters (e.g., μ), so if our test is nonparametric, we have no parameters and therefore no statistical notation hypothesis statements.

We do, however, still state our hypotheses in written form. For goodness-of-fit χ 2 tests, our null hypothesis is that there is an equal number of observations in each category. That is, there is no difference between the categories in how prevalent they are. Our alternative hypothesis says that the categories do differ in their frequency. We do not have specific directions or one-tailed tests for χ 2 , matching our lack of mathematical statement. That is:

H0: There is no difference in the number of observations between categories

HA: There is a difference in the number of observations between categories

Step 2 – Find the Critical Value

Our degrees of freedom for the χ 2 test are based on the number of categories we have in our variable, not on the number of people or observations like it was for our other tests. Luckily, they are still as simple to calculate.

Degrees of Freedom for χ 2 Goodness of Fit Test

[latex]df=C-1[/latex]

So for our pet preference example, we have 3 categories, giving us 3 − 1 = 2 degrees of freedom. Our degrees of freedom, along with our significance level (still defaulted to α = 0.05), are used to find our critical values in a χ 2 table, which is shown in Figure 1. Because we do not have directional hypotheses for χ 2 tests (notice the statistic is a sum of squared values, so it can never be negative), we do not need to differentiate between critical values for 1- or 2-tailed tests. Just like our F tests for regression and ANOVA, all χ 2 tests are 1-tailed tests. According to Figure 1, our critical value would be χ 2 crit = 5.99.


Figure 1. First 10 rows of a χ 2 table

Step 3 – Calculate the Test Statistic

The calculations for our test statistic in χ 2 tests combine our information from our observed frequencies (O) and our expected frequencies (E) for each level of our categorical variable. For each cell (category) we find the difference between the observed and expected values, square them, and divide by the expected values. We then sum this value across cells for our test statistic.

χ 2 Goodness of Fit Test Statistic

[latex]\chi^2=\sum\frac{(O-E)^2}{E}[/latex]

[latex]O=[/latex] the observed frequency for a category

[latex]E=[/latex] the expected frequency for a category

If we do this for our pet preference data, we would have the following.

Table 2. Pet Preferences

                 Cats   Dogs   Other   Total
  Observed        14     17      5      36
  Expected        12     12     12      36
  (O − E)²/E     0.33   2.08   4.08

[latex]\chi^2=\frac{(14-12)^2}{12}+\frac{(17-12)^2}{12}+\frac{(5-12)^2}{12}[/latex]

[latex]\chi^2=0.33+2.08+4.08=6.49[/latex]

Step 4 – Make a Decision and Interpret the Results

Now we can compare the test statistic of 6.49 to the critical value of 5.99. Because 6.49 is greater than 5.99, we can reject the null hypothesis and state that pet preference is different from random chance. So, let’s interpret this in APA style.

The sample of 36 people showed a significant preference for type of pet, χ 2 (2) = 6.49, p < .05.
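For reference, the same goodness-of-fit test can be reproduced in Python with SciPy (a sketch; note that `chisquare` defaults to equal expected frequencies, and the unrounded statistic is 6.5 rather than the 6.49 obtained from the rounded terms above):

```python
from scipy.stats import chisquare

observed = [14, 17, 5]        # cats, dogs, other
result = chisquare(observed)  # expected defaults to equal counts: 12 each
print(result.statistic, result.pvalue)  # 6.5, p ~ 0.039 (< .05, so reject H0)
```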

Finding a Relationship between Two Categorical Variables

An example contingency table is shown in Table 3 which displays whether or not 168 college students watched college sports growing up (Yes/No) and whether the students’ final choice of which college to attend was influenced by the college’s sports teams (Yes – Primary, Yes – Somewhat, No).

Table 3. Contingency table of college sports and decision making

                    Primary   Somewhat   No    Total
  Watched - Yes        47        26      14      87
  Watched - No         21        23      37      81
  Total                68        49      51     168

In contrast to the frequency table for our goodness-of-fit test, our contingency table does not contain expected values, only observed data. Within our table, wherever our rows and columns cross, we have a “cell”. A cell contains the frequency of observing its corresponding specific levels of each variable at the same time. The top left cell in table 3 shows us that 47 people in our study watched college sports as a child AND had college sports as their primary deciding factor in which college to attend.

Cells are numbered based on which row they are in (rows are numbered top to bottom) and which column they are in (columns are numbered left to right). We always name the cell using (R, C), with the row first and the column second. Based on this convention, the top left cell, containing our 47 participants who watched college sports as a child and had sports as a primary criterion, is cell (1, 1). Next to it, with 26 people who watched college sports as a child but had sports only somewhat affect their decision, is cell (1, 2), and so on. We only number the cells where our categories cross. We do not number our total cells, which have their own special name: marginal values . Marginal values are the total values for a single category of one variable, added up across levels of the other variable. In Table 3, these marginal values have been italicized for ease of explanation, though this is not normally the case. We can see that, in total, 87 of our participants (47+26+14) watched college sports growing up and 81 (21+23+37) did not. The total of these two marginal values is 168, the total number of people in our study. Likewise, 68 people used sports as a primary criterion for deciding which college to attend, 49 considered it somewhat, and 51 did not use it as a criterion at all. The total of these marginal values is also 168, our total number of people. The marginal values for rows and columns will always both add up to the total number of participants, N , in the study. If they do not, then a calculation error was made and you must go back and check your work.

Expected Values of Contingency Tables

Our expected values for contingency tables are based on the same logic as they were for frequency tables, but now we must incorporate information about how frequently each row and column was observed (the marginal values) and how many people were in the sample overall (N) to find what random chance would have made the frequencies out to be.

Calculating Expected Values for Contingency Tables

[latex]E=\frac{n_c*n_r}{N}[/latex]

[latex]n_c=[/latex] the number of people in the column

[latex]n_r=[/latex] the number of people in the row

[latex]N=[/latex] the total number of people in the entire sample

So, for our data we would calculate expected values for each cell of observed values as follows.

Table 4. Contingency table of expected value calculations (with observed totals included for calculations) of college sports and decision making

                    Primary               Somewhat              No                    Total
  Watched - Yes     (87)(68)/168 = 35.21  (87)(49)/168 = 25.38  (87)(51)/168 = 26.41    87
  Watched - No      (81)(68)/168 = 32.79  (81)(49)/168 = 23.62  (81)(51)/168 = 24.59    81
  Total                 68                    49                    51                 168

Notice that the marginal values still add up to the same totals as before. This is because the expected frequencies are just row and column averages simultaneously. Our total N will also add up to the same value.

χ 2 Test for Independence

The χ 2 test performed on contingency tables is known as the test for independence . In this analysis, we are looking to see if the values of each categorical variable (that is, the frequency of their levels) is related to or independent of the values of the other categorical variable. Let’s do the four-step test for this example.

Because we are still doing a χ 2 test which is nonparametric, we still do not have statistical versions of our hypotheses. The actual interpretations of the hypotheses are quite simple: the null hypothesis says that the variables are independent or not related, and alternative says that they are not independent or that they are related.

H 0 : Watching college sports as a child is not related to choosing college.

H 0 : Watching college sports as a child is independent of college choice.

H A : Watching college sports as a child is related to choosing college.

H A : Watching college sports as a child is not independent of college choice.

For step 2, the only change is the degrees of freedom formula. Our critical value will come from the same table that we used for the goodness-of-fit test, but our degrees of freedom will change. Because we now have rows and columns (instead of just columns) our new degrees of freedom is found by multiplying two numbers together.

Degrees of Freedom for Test for Independence

[latex]df=(R-1)(C-1)[/latex]

[latex]R=[/latex] the number of rows in the contingency table (or the number of categories in the first categorical variable)

[latex]C=[/latex] the number of columns in the contingency table (or the number of categories in the second categorical variable)

So, for our example, we have

[latex]df=(2-1)(3-1)=1*2=2[/latex]

If we go back to Figure 1 and look at df = 2 for α = .05, we get a critical value of 5.99 again.

As you can see below, we calculate our test statistic for the test for independence the same way we did for the goodness of fit test. Our equation just gets a bit longer because we now have to do it for every cell in the contingency table. So, for our 2×3 contingency table, we will have 6 components in our equation.

Test for Independence Test Statistic

Table 5. Contingency table with observed (and expected) values of college sports and decision making

                    Primary       Somewhat      No            Total
  Watched - Yes     47 (35.21)    26 (25.38)    14 (26.41)      87
  Watched - No      21 (32.79)    23 (23.62)    37 (24.59)      81
  Total                 68            49            51         168

With the observed and expected values found in Table 5, we can calculate our χ 2 test statistic for this example.

[latex]\chi^2=\frac{(47-35.21)^2}{35.21}+\frac{(26-25.38)^2}{25.38}+\frac{(14-26.41)^2}{26.41}+\frac{(21-32.79)^2}{32.79}+\frac{(23-23.62)^2}{23.62}+\frac{(37-24.59)^2}{24.59}[/latex]

[latex]\chi^2=3.94+0.02+5.83+4.24+0.02+6.26=20.31[/latex]

Reject H 0 . Based on our data from 168 people, we can say that there is a statistically significant relationship between whether or not someone watches college sports growing up and how much a college’s sports team factors into that person’s decision on which college to attend, χ 2 (2) = 20.31, p < 0.05.
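As a sketch (assuming SciPy is available), the whole test for independence can be reproduced from the observed counts in Table 3:

```python
from scipy.stats import chi2_contingency

# Rows: watched college sports growing up (Yes, No)
# Columns: influence on college choice (Primary, Somewhat, No)
observed = [[47, 26, 14],
            [21, 23, 37]]

stat, pvalue, df, expected = chi2_contingency(observed)
print(round(stat, 2), df, pvalue)  # 20.31, df = 2, p ~ 3.9e-05
```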

Effect Size for χ 2

Like all other significance tests, χ 2 tests – both goodness-of-fit and tests for independence – have effect sizes that can and should be calculated for statistically significant results. There are many options for which effect size to use, and the ultimate decision is based on the type of data, the structure of your frequency or contingency table, and the types of conclusions you would like to draw. For the purpose of our introductory course, we will focus only on a single effect size that is simple and flexible: Cramer’s V . This is appropriate to use when the χ 2 test involves a matrix larger than 2 × 2. Cramer’s V is a type of correlation coefficient that can be computed on categorical data.

Cramer’s V

[latex]V=\sqrt{\frac{\chi^2}{N(k-1)}}[/latex]

[latex]N=[/latex] the total number of people in the sample

[latex]k=[/latex] the smaller of R (the number of rows) and C (the number of columns); that is, the number of categories for the variable with the fewer categories

[latex]\chi^2=[/latex] the test statistic calculated in Step 4

So, for our example above, we can calculate an effect size given that we found a significant relationship and that we have a 2×3 contingency table that we are working with.

We know that N = 168 and k would either be 2 or 3, but we want the smaller of the two, so k = 2. And, finally, χ 2 = 20.31.

[latex]V=\sqrt{\frac{20.31}{168(2-1)}}=\sqrt{\frac{20.31}{168}}=\sqrt{0.121}=0.35[/latex]
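The same computation as a tiny Python sketch:

```python
import math

chi_sq, N, k = 20.31, 168, 2           # k = min(rows, columns)
V = math.sqrt(chi_sq / (N * (k - 1)))  # Cramer's V
print(round(V, 2))                     # 0.35
```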

Like other effect sizes, there are cutoff ranges for small, medium, and large. Examining the effect size table below, the statistically significant relationship between our variables was moderately strong.

Table 6. The effect size ranges of Cramer’s V .

Additional Thoughts

Beyond pearson’s chi-square test: standardized residuals.

For a more applicable example, let’s take the question of whether a Black driver is more likely to be searched when they are pulled over by a police officer, compared to a white driver. The Stanford Open Policing Project ( https://openpolicing.stanford.edu/ ) has studied this, and provides data that we can use to analyze the question. We will use the data from the State of Connecticut since they are fairly small and thus easier to analyze.

The standard way to represent data from a categorical analysis is through a contingency table , which presents the number or proportion of observations falling into each possible combination of values for each of the variables. Table 7 below shows the contingency table for the police search data. It can also be useful to look at the contingency table using proportions rather than raw numbers, since they are easier to compare visually, so we include both absolute and relative numbers here.

Table 7. Contingency Table for Police Search Data

The Pearson chi-squared test (discussed above) allows us to test whether observed frequencies are different from expected frequencies, so we need to determine what frequencies we would expect in each cell if searches and race were unrelated – which we can define as being independent. Performing this test with our statistical software gives X 2 (1) = 828, p < .001. This shows that the observed data would be highly unlikely if there was truly no relationship between race and police searches, and thus we should reject the null hypothesis of independence.

When we find a significant effect with the chi-squared test, this tells us that the data are unlikely under the null hypothesis, but it doesn’t tell us how the data differ. To get a deeper insight into how the data differ from what we would expect under the null hypothesis, we can examine the residuals from the model, which reflects the deviation of the data (i.e., the observed frequencies) from the model (i.e., the expected frequencies) in each cell. Rather than looking at the raw residuals (which will vary simply depending on the number of observations in the data), it’s more common to look at the standardized residuals (sometimes called Pearson residuals ).

Table 8 shows these for the police stop data from X 2 above. Remember that we examined the question of whether a Black driver is more likely to be searched when they are pulled over by a police officer, compared to a white driver. These standardized residuals can be interpreted as Z scores – in this case, we see that the number of searches for Black individuals are substantially higher than expected based on independence, and the number of searches for white individuals are substantially lower than expected. This provides us with the context that we need to interpret the significant chi-squared result.

Table 8. Summary of standardized residuals for police stop data

Beware of Simpson’s paradox

Contingency tables represent summaries of large numbers of observations, but summaries can sometimes be misleading. Let’s take an example from baseball. The table below shows the batting data (hits/at bats and batting average) for Derek Jeter and David Justice over the years 1995-1997:

Table 9. Player Batting data for 2 baseball players

                   1995            1996            1997            Combined
  Derek Jeter      12/48 (.250)    183/582 (.314)  190/654 (.291)  385/1284 (.300)
  David Justice    104/411 (.253)  45/140 (.321)   163/495 (.329)  312/1046 (.298)

If you look closely, you will see that something odd is going on: In each individual year Justice had a higher batting average than Jeter, but when we combine the data across all three years, Jeter’s average is actually higher than Justice’s! This is an example of a phenomenon known as Simpson’s paradox , in which a pattern that is present in a combined dataset may not be present in any of the subsets of the data. This occurs when there is another variable that may be changing across the different subsets – in this case, the number of at-bats varies across years, with Justice batting many more times in 1995 (when batting averages were low). We refer to this as a lurking variable , and it’s always important to be attentive to such variables whenever one examines categorical data.

Glossary

Nonparametric: describing any analytic method that does not involve making assumptions about the data of interest.

Contingency table: a two-dimensional table in which frequency values for categories of one variable are presented in the rows and values for categories of a second variable are presented in the columns. Values that appear in the various cells then represent the number or percentage of cases that fall into the two categories that intersect at this point.

Marginal values: in a contingency table, the total values for a single category of one variable, added up across levels of the other variable.

Test for independence: a procedure used to test the hypothesis of a relationship between two categorical variables. The observed frequencies of a variable are compared with the frequencies that would be expected if the null hypothesis of no association (i.e., statistical independence) were true.

Introduction to Statistics for the Social Sciences Copyright © 2021 by Jennifer Ivie; Alicia MacKay is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.



A Complete Guide to Chi-Square Test


The world is constantly curious about the Chi-Square test's application in machine learning and how it makes a difference. Feature selection is a critical topic in machine learning , as you will have multiple features in line and must choose the best ones to build the model. By examining the relationship between the elements, the chi-square test aids in the solution of feature selection problems. In this tutorial, you will learn about the chi-square test and its application.

What Is a Chi-Square Test?

The Chi-Square test is a statistical procedure for determining the difference between observed and expected data. It can also be used to determine whether two categorical variables in our data are related. It helps to find out whether a difference between two categorical variables is due to chance or to a relationship between them.

Chi-Square Test Definition

A chi-square test is a statistical test that is used to compare observed and expected results. The goal of this test is to identify whether a disparity between actual and predicted data is due to chance or to a link between the variables under consideration. As a result, the chi-square test is an ideal choice for aiding in our understanding and interpretation of the connection between our two categorical variables.

A chi-square test or comparable nonparametric test is required to test a hypothesis regarding the distribution of a categorical variable. Categorical variables, which indicate categories such as animals or countries, can be nominal or ordinal. They cannot have a normal distribution since they can only have a few particular values.

For example, a meal delivery firm in India wants to investigate the link between gender, geography, and people's food preferences.

It is used to determine whether a difference between two categorical variables is:

  • a result of chance, or
  • a result of a relationship between them


Formula For Chi-Square Test

χ²_c = Σ (O − E)² / E

c = Degrees of freedom

O = Observed Value

E = Expected Value

The degrees of freedom in a statistical calculation represent the number of variables that can vary in a calculation. The degrees of freedom can be calculated to ensure that chi-square tests are statistically valid. These tests are frequently used to compare observed data with data that would be expected to be obtained if a particular hypothesis were true.

The Observed values are those you gather yourselves.

The expected values are the frequencies expected, based on the null hypothesis. 

Fundamentals of Hypothesis Testing

Hypothesis testing is a technique for interpreting and drawing inferences about a population based on sample data. It aids in determining which sample data best support mutually exclusive population claims.

Null Hypothesis (H0) - The Null Hypothesis is the assumption that the event will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected.

H0 is the symbol for it, and it is pronounced H-naught.

Alternate Hypothesis(H1 or Ha) - The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.


What Are Categorical Variables?

Categorical variables belong to a subset of variables that can be divided into discrete categories. Names or labels are the most common categories. These variables are also known as qualitative variables because they depict the variable's quality or characteristics.

Categorical variables can be divided into two categories:

  • Nominal Variable: A nominal variable's categories have no natural ordering. Example: Gender, Blood groups
  • Ordinal Variable: An ordinal variable is one whose categories can be ordered. Customer satisfaction (Excellent, Very Good, Good, Average, Bad, and so on) is an example.

Why Do You Use the Chi-Square Test?

Chi-square is a statistical test that examines the differences between categorical variables from a random sample in order to determine whether the expected and observed results are well-fitting.

Here are some of the uses of the Chi-Squared test:

  • The Chi-squared test can be used to see if your data follows a well-known theoretical probability distribution like the Normal or Poisson distribution.
  • The Chi-squared test allows you to assess your trained regression model's goodness of fit on the training, validation, and test data sets.


What Does A Chi-Square Statistic Test Tell You?

A Chi-Square test (symbolically represented as χ2) is fundamentally a data analysis based on the observations of a random set of variables. It computes how a model equates to actual observed data. A Chi-Square statistic test is calculated based on data that must be raw, random, drawn from independent variables, drawn from a wide-ranging sample and mutually exclusive. In simple terms, two sets of statistical data are compared - for instance, the results of tossing a fair coin. Karl Pearson introduced this test in 1900 for categorical data analysis and distribution. This test is also known as ‘Pearson’s Chi-Squared Test’.

Chi-Squared Tests are most commonly used in hypothesis testing. A hypothesis is an assumption that any given condition might be true, which can be tested afterwards. The Chi-Square test estimates the size of inconsistency between the expected results and the actual results when the size of the sample and the number of variables in the relationship is mentioned. 

These tests use degrees of freedom to determine if a particular null hypothesis can be rejected based on the total number of observations made in the experiments. The larger the sample size, the more reliable the result.

There are two main types of Chi-Square tests, namely:

  • Independence
  • Goodness-of-Fit

The Chi-Square Test of Independence is an inferential statistical test which examines whether two sets of variables are likely to be related to each other or not. This test is used when we have counts of values for two nominal or categorical variables and is considered a non-parametric test. A relatively large sample size and independence of observations are the required criteria for conducting this test.

For Example- 

In a movie theatre, suppose we made a list of movie genres. Let us consider this as the first variable. The second variable is whether or not the people who came to watch those genres of movies have bought snacks at the theatre. Here the null hypothesis is that the genre of the film and whether people bought snacks or not are unrelated. If this is true, the movie genres don’t impact snack sales.


Goodness-Of-Fit

In statistical hypothesis testing, the Chi-Square Goodness-of-Fit test determines whether a variable is likely to come from a given distribution or not. We must have a set of data values and an idea of the distribution of this data. We can use this test when we have value counts for categorical variables. This test demonstrates a way of deciding if the data values have a “good enough” fit to our idea, or if the sample is representative of the entire population.

Suppose we have bags of balls with five different colours in each bag. The given condition is that the bag should contain an equal number of balls of each colour. The idea we would like to test here is that the proportions of the five colours of balls in each bag are equal.

Who Uses Chi-Square Analysis?

Chi-square is most commonly used by researchers who are studying survey response data because it applies to categorical variables. Demography, consumer and marketing research, political science, and economics are all examples of this type of research.

Let's say you want to know if gender has anything to do with political party preference. You poll 440 voters in a simple random sample to find out which political party they prefer. The results of the survey are shown in the table below:


To see if gender is linked to political party preference, perform a Chi-Square test of independence using the steps below.

Step 1: Define the Hypothesis

H0: There is no link between gender and political party preference.

H1: There is a link between gender and political party preference.

Step 2: Calculate the Expected Values

Now you will calculate the expected frequency.

Expected Value = (Row Total × Column Total) / Total Sample Size

For example, the expected value for Male Republicans is the Male row total multiplied by the Republican column total, divided by the overall total of 440.

Similarly, you can calculate the expected value for each of the cells.


Step 3: Calculate (O-E)2 / E for Each Cell in the Table

Now you will calculate the (O - E)2 / E for each cell in the table.


Step 4: Calculate the Test Statistic X2

X2 is the sum of all the values in the last table:

X2 = 0.743 + 2.05 + 2.33 + 3.33 + 0.384 + 1 = 9.837

Before you can conclude, you must first determine the critical statistic, which requires determining our degrees of freedom. The degrees of freedom in this case are equal to the table's number of columns minus one multiplied by the table's number of rows minus one, or (r-1) (c-1). We have (3-1)(2-1) = 2.

Finally, you compare the obtained statistic to the critical statistic found in the chi-square table. As you can see, for an alpha level of 0.05 and two degrees of freedom, the critical statistic is 5.991, which is less than our obtained statistic of 9.83. You can reject the null hypothesis because the obtained statistic is higher than the critical statistic.

This means you have sufficient evidence to say that there is an association between gender and political party preference.
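A small Python sketch (assuming SciPy) of this decision step, using the statistic computed above; both the critical-value and p-value approaches lead to rejection:

```python
from scipy.stats import chi2

stat, df, alpha = 9.837, 2, 0.05
critical = chi2.isf(alpha, df)           # 5.991
p_value = chi2.sf(stat, df)              # ~0.0073
print(stat > critical, p_value < alpha)  # True True -> reject H0
```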


When to Use a Chi-Square Test?

A Chi-Square Test is used to examine whether the observed results are in order with the expected values. When the data to be analysed is from a random sample, and when the variable in question is a categorical variable, the Chi-Square test proves the most appropriate test. A categorical variable consists of selections such as breeds of dogs, types of cars, genres of movies, educational attainment, male vs. female, etc. Survey responses and questionnaires are the primary sources of these types of data. The Chi-square test is most commonly used for analysing this kind of data. This type of analysis is helpful for researchers who are studying survey response data. The research can range from customer and marketing research to political science and economics.


Chi-Square Distribution 

Chi-square distributions (X2) are a type of continuous probability distribution. They're commonly utilized in hypothesis testing, such as the chi-square goodness of fit and independence tests. The parameter k, which represents the degrees of freedom, determines the shape of a chi-square distribution.

A chi-square distribution is followed by very few real-world observations. The objective of chi-square distributions is to test hypotheses, not to describe real-world distributions. In contrast, most other commonly used distributions, such as normal and Poisson distributions, may explain important things like baby birth weights or illness cases per year.

Because of its close resemblance to the conventional normal distribution, chi-square distributions are excellent for hypothesis testing. Many essential statistical tests rely on the conventional normal distribution.

In statistical analysis, the Chi-Square distribution is used in many hypothesis tests and is determined by the parameter k, the degrees of freedom. It belongs to the family of continuous probability distributions. The sum of the squares of k independent standard normal random variables follows a Chi-Squared distribution. Pearson’s Chi-Square Test formula is -

X^2 = Σ (O − E)^2 / E

Where X^2 is the Chi-Square test symbol

Σ is the summation of observations

O is the observed results

E is the expected results 

The shape of the distribution graph changes as the value of k, i.e. the degrees of freedom, increases.

When k is 1 or 2, the Chi-square distribution curve is shaped like a backwards ‘J’. It means there is a high chance that X^2 becomes close to zero. 


When k is greater than 2, the shape of the distribution curve looks like a hump and has a low probability that X^2 is very near to 0 or very far from 0. The distribution extends much longer on the right-hand side than on the left-hand side. The most probable value of X^2 is k − 2.

When k is greater than about ninety, the Chi-square distribution is well approximated by a normal distribution.


Chi-Square P-Values

Here P denotes the probability; hence for the calculation of p-values, the Chi-Square test comes into the picture. The different p-values indicate different types of hypothesis interpretations. 

  • P ≤ 0.05 (the null hypothesis is rejected)
  • P > 0.05 (the null hypothesis is not rejected)

The concepts of probability and statistics are entangled with Chi-Square Test. Probability is the estimation of something that is most likely to happen. Simply put, it is the possibility of an event or outcome of the sample. Probability can understandably represent bulky or complicated data. And statistics involves collecting and organising, analysing, interpreting and presenting the data. 

Finding P-Value

When you run all of the Chi-square tests, you'll get a test statistic called X2. You have two options for determining whether this test statistic is statistically significant at some alpha level:

  • Compare the test statistic X2 to a critical value from the Chi-square distribution table.
  • Compare the p-value of the test statistic X2 to a chosen alpha level.

Test statistics are calculated by taking into account the sampling distribution of the test statistic under the null hypothesis, the sample data, and the approach which is chosen for performing the test. 

The p-value will be as mentioned in the following cases.

  • A lower-tailed test: p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
  • An upper-tailed test: p-value = P(TS ≥ ts | H0 is true) = 1 − cdf(ts)
  • A two-sided test, if we assume that the test statistic's distribution under H0 is symmetric about 0: p-value = 2 · P(TS ≥ |ts| | H0 is true) = 2 · (1 − cdf(|ts|))

where:

  • P: probability of an event
  • TS: the test statistic
  • ts: the observed value of the test statistic computed from your sample
  • cdf(): the cumulative distribution function of TS under the null hypothesis
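Both decision routes can be sketched in Python with scipy.stats (the statistic, degrees of freedom, and alpha below are hypothetical):

```python
from scipy.stats import chi2

ts, df, alpha = 6.25, 3, 0.10   # hypothetical test statistic, df, alpha

# Option 1: compare the test statistic to a critical value.
# For an upper-tailed test the critical value cuts off the right tail.
critical = chi2.ppf(1 - alpha, df)
print(ts >= critical)

# Option 2: compare the p-value to alpha.
p_upper = 1 - chi2.cdf(ts, df)   # upper-tailed: 1 - cdf(ts)
p_lower = chi2.cdf(ts, df)       # lower-tailed: cdf(ts)
print(p_upper <= alpha)
```

For a one-tailed test the two routes always agree: the test statistic exceeds the critical value exactly when the p-value falls below alpha.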

Types of Chi-square Tests

Pearson's chi-square tests are classified into two types:

  • Chi-square goodness-of-fit test
  • Chi-square test of independence

Mathematically, these are the same test. However, because they are used for distinct purposes, we generally think of them as separate tests.

Properties of the Chi-Square Distribution

  • The mean of the distribution is equal to the number of degrees of freedom, k.
  • The variance is equal to twice the number of degrees of freedom, 2k.
  • As the degrees of freedom increase, the chi-square distribution curve approaches a normal distribution.
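These properties are easy to confirm numerically; here is a small sketch with scipy.stats (k = 10 chosen arbitrarily):

```python
from scipy.stats import chi2

k = 10  # arbitrary degrees of freedom
mean, var = chi2.stats(df=k, moments="mv")
print(mean, var)  # 10.0 20.0 -- mean = k, variance = 2k
```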

Limitations of Chi-Square Test

There are two limitations to using the chi-square test that you should be aware of. 

  • The chi-square test, for starters, is extremely sensitive to sample size. Even trivially small relationships can appear statistically significant when a large enough sample is used. Keep in mind that "statistically significant" does not always imply "meaningful" when using the chi-square test.
  • Be mindful that the chi-square can only determine whether two variables are related. It does not necessarily follow that one variable has a causal relationship with the other. It would require a more detailed analysis to establish causality.


Chi-Square Goodness of Fit Test

The chi-square goodness of fit test is used when there is a single categorical variable. It evaluates whether the frequency distribution of that variable differs significantly from an expected distribution. A common default expectation is that all categories have equal proportions, but any expected distribution can be specified.

When you want to see whether there is a link between two categorical variables, you perform the chi-square test of independence. To obtain the test statistic and its associated p-value in SPSS, use the chisq option on the statistics subcommand of the crosstabs command. Remember that the chi-square test assumes each cell's expected count is five or greater.
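Outside SPSS, both tests take only a few lines in Python with scipy.stats; the counts below are hypothetical:

```python
from scipy.stats import chisquare, chi2_contingency

# Goodness of fit: do observed counts match the expected distribution?
stat, p = chisquare(f_obs=[48, 35, 17], f_exp=[40, 40, 20])
print(stat, p)

# Test of independence: are two categorical variables related?
table = [[20, 30],   # hypothetical 2x2 contingency table
         [40, 10]]
stat, p, df, expected = chi2_contingency(table)
print(stat, p, df)
```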

In this tutorial titled ‘The Complete Guide to Chi-square test’, you explored the concept of the chi-square distribution and how to find the related values. You also looked at how the critical value and the chi-square value relate to each other.


1) What is the chi-square test used for? 

The chi-square test is a statistical method used to determine if there is a significant association between two categorical variables. It helps researchers understand whether the observed distribution of data differs from the expected distribution, allowing them to assess whether any relationship exists between the variables being studied.

2) What is the chi-square test and its types? 

The chi-square test is a statistical test used to analyze categorical data and assess the independence or association between variables. There are two main types of chi-square tests: a) Chi-square test of independence: This test determines whether there is a significant association between two categorical variables. b) Chi-square goodness-of-fit test: This test compares the observed data to the expected data to assess how well the observed data fit the expected distribution.

3) What is the chi-square test easily explained? 

The chi-square test is a statistical tool used to check if two categorical variables are related or independent. It helps us understand if the observed data differs significantly from the expected data. By comparing the two datasets, we can draw conclusions about whether the variables have a meaningful association.

4) What is the difference between t-test and chi-square? 

The t-test and the chi-square test are two different statistical tests used for different types of data. The t-test is used to compare the means of two groups and is suitable for continuous numerical data. On the other hand, the chi-square test is used to examine the association between two categorical variables. It is applicable to discrete, categorical data. So, the choice between the t-test and chi-square test depends on the nature of the data being analyzed.

5) What are the characteristics of chi-square? 

The chi-square test has several key characteristics:

1) It is non-parametric, meaning it does not assume a specific probability distribution for the data.

2) It is sensitive to sample size; larger samples can result in more significant outcomes.

3) It works with categorical data and is used for hypothesis testing and analyzing associations.

4) The test output provides a p-value, which indicates the level of significance for the observed relationship between variables.

5) It can be used with different levels of significance (e.g., 0.05 or 0.01) to determine statistical significance.



9.6: Chi Squared Test for Variance or Standard Deviation


The possible hypothesis pairs are, for variance :

\[ H_{0}: \sigma^{2} = k, \; H_{1}: \sigma^{2} \neq k \hspace{.25in} H_{0}: \sigma^{2} \leq k, \; H_{1}: \sigma^{2} > k \hspace{.25in} H_{0}: \sigma^{2} \geq k, \; H_{1}: \sigma^{2} < k \]

For standard deviation we use the square roots of everything :

\[ H_{0}: \sigma = k, \; H_{1}: \sigma \neq k \hspace{.25in} H_{0}: \sigma \leq k, \; H_{1}: \sigma > k \hspace{.25in} H_{0}: \sigma \geq k, \; H_{1}: \sigma < k \]

Note that we did not square root \(k\). This is because we are using \(k\) to stand in for whatever number. That number from \(H_{0}\) will appear in our formulae as either \(\sigma^{2}\) or \(\sigma\) depending on the set up. Generally we will work with variance as we work through the problem and convert to standard deviation only in the last interpretation step if required by the wording of the question.

The new test statistic is :

\[\chi^{2}_{\rm test} = \frac{(n-1)s^2}{\sigma^2}\]

where \(s\) comes from the sample and \(\sigma^{2}\) comes from the number \(k\) in \(H_{0}\). The degrees of freedom associated with the test statistic (for finding the critical statistic) is \(\nu = n-1\). There is no mystery where this test statistic came from — this is just how \(\chi^{2}\) as a probability distribution is defined. So, for this test to be valid, the population must be normally distributed . The \(\chi^{2}\) test here is not very robust to violations of that assumption because there is no normalizing intermediate central limit theorem here.
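A small Python helper makes the mechanics explicit (a sketch, not part of the original text; it assumes scipy.stats and a normally distributed population):

```python
from scipy.stats import chi2

def variance_test(s2, n, sigma2_0, tail="two-sided"):
    """Chi-square test for a population variance (sketch).

    s2: sample variance; n: sample size;
    sigma2_0: the variance claimed in H0.
    Valid only if the population is normally distributed.
    """
    ts = (n - 1) * s2 / sigma2_0
    df = n - 1
    if tail == "lower":        # H1: sigma^2 < sigma2_0
        p = chi2.cdf(ts, df)
    elif tail == "upper":      # H1: sigma^2 > sigma2_0
        p = 1 - chi2.cdf(ts, df)
    else:                      # H1: sigma^2 != sigma2_0
        p = 2 * min(chi2.cdf(ts, df), 1 - chi2.cdf(ts, df))
    return ts, p
```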

The critical regions on the \(\chi^{2}\) distribution will appear as shown in Figure 9.5.

[Figure 9.5: critical regions on the \(\chi^{2}\) distribution for lower-tailed, upper-tailed, and two-tailed tests.]

Let’s work through an example of each hypothesis pair case. In all of the examples we assume that the population is normally distributed.

Example 9.6 : Test the claim, at \(\alpha = 0.05\), that the variance of the test scores of a class is less than 225, given a sample of \(n = 23\) scores with sample variance \(s^{2} = 198\).

1. Hypotheses.

\[ H_{0}: \sigma^2 \geq 225 \hspace{.25in} H_{1}: \sigma^{2} < 225 \mbox{ (claim)} \]

2. Critical statistic.

Refer to Figure 9.6 as we get the critical statistic from the Chi-squared Distribution Table . As we see in that figure, we must look in the column that corresponds to a right tail area of 0.95. The row we need is for \(\nu = n-1=23-1=22\). With that information we find \(\chi^2_{\rm crit} = 12.338\).

[Figure 9.6: finding the critical statistic in the Chi-squared Distribution Table for \(\nu = 22\) and a right tail area of 0.95.]

3. Test statistic.

The values we need for the test statistic are \(\sigma^{2} = 225\) (from \(H_{0}\)), \(s^{2} = 198\) and \(n-1=22\) from the information in the problem. So :

\begin{eqnarray*} \chi^{2} & = & \frac{(n-1)s^{2}}{\sigma^{2}} \\ \chi^{2} & = & \frac{(22)(198)}{225} = 19.36 \end{eqnarray*}

At this point we can also estimate the \(p\) value from the Chi-squared Distribution Table . The \(p\) value is the area under the \(\chi^{2}\) distribution with \(\nu = 22\) to the left of \(\chi^{2}_{\rm test}\). In the \(\nu = 22\) row of the Chi-squared Distribution Table (in general use the closest \(\nu\) if your particular value is not in the Chi-squared Distribution Table ) hunt down the test statistic value of 19.36. You won’t find it but you can bracket it with values higher and lower than 19.36. Those numbers are 14.042, which has a right tail area of 0.90 (and so a left tail area of 0.10), and 30.813, which has a right tail area of 0.10 (and so a left tail area of 0.90). Recall that the \(\alpha\) in the column headings of the Chi-squared Distribution Table refers to right tail areas. So, considering the left tail areas, we know that \(0.10 < p < 0.90\) since \(30.813 > 19.36 > 14.042\) for the relevant \(\chi^{2}\) values.

4. Decision.

[Figure: the test statistic falls outside the lower-tail rejection region.]

Since \(\chi^{2}_{\rm test}\) doesn’t fall in the rejection region, do not reject \(H_{0}\). We come to the same conclusion with our \(p\)-value estimate:

\[ (0.10 < p < 0.90) > (\alpha = 0.05) \]

5. Interpretation.

There is not enough evidence, at \(\alpha = 0.05\) with a \(\chi^{2}\) test, to support the claim that the variation in test scores of the class is less than 225.
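For readers who prefer software to tables, the numbers in Example 9.6 can be checked with scipy.stats (a sketch, not part of the original text):

```python
from scipy.stats import chi2

# Example 9.6 (lower-tailed): n = 23, s^2 = 198, sigma^2 = 225.
ts = 22 * 198 / 225
print(ts)                     # 19.36
print(chi2.ppf(0.05, df=22))  # critical value, 12.338
print(chi2.cdf(ts, df=22))    # exact p, roughly 0.38 -- inside (0.10, 0.90)
```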

Example 9.7 : A hospital administrator believes that the standard deviation of the number of people using out-patient surgery per day is greater than eight. A random sample of 15 days is selected. The data are shown below. At \(\alpha = 0.10\) is there enough evidence to support the administrator’s claim?

\begin{equation*} 25 \hspace{.2cm} 30 \hspace{.2cm} 5 \hspace{.2cm} 15 \hspace{.2cm} 18 \\ 42 \hspace{.2cm} 16 \hspace{.2cm} 9 \hspace{.2cm} 10 \hspace{.2cm} 12 \\ 12 \hspace{.2cm} 38 \hspace{.2cm} 8 \hspace{.2cm} 14 \hspace{.2cm} 27 \end{equation*}

0. Data reduction.

We’ll introduce a step 0 when it looks like we should do some preliminary calculations with our data. In this case we should enter the dataset into our calculators and determine \(s\). We find \(s=11.2\).

1. Hypotheses.

\[ H_0: \sigma^2 \leq 64 \hspace{.25in} H_1: \sigma^2 > 64 \mbox{ (claim)} \]

Note conversion to \(\sigma^2\) right away.

2. Critical statistic.

[Figure: upper-tail critical region for the test at \(\alpha = 0.10\).]

In the \(\nu = 15-1=14\) line and \(\alpha_{T}=0.10\) column of the Chi-squared Distribution Table , look up \(\chi^2_{\rm crit} = 21.064\).

3. Test statistic.

\begin{eqnarray*} \chi^{2}_{\rm test} & = & \frac{(n-1)s^2}{\sigma^2} \\ \chi^{2}_{\rm test} & = & \frac{(14)(11.2)^2}{64} = 27.44 \end{eqnarray*}

To estimate the \(p\) value, find the bracketing values of \(\chi^{2}_{\rm test} = 27.44\) in the \(\nu = 14\) line of the Chi-squared Distribution Table . They are : 26.119 (\(\alpha = 0.025\)) and 29.141 (\(\alpha = 0.010\)), so \(0.010 < p < 0.025\).

4. Decision.

[Figure: the test statistic falls inside the upper-tail rejection region.]

Reject \(H_{0}\) since \(\chi^{2}_{\rm test}\) is in the rejection region. Our estimate of \(p\) leads to the same conclusion :

\[ (0.010 < p < 0.025) < (\alpha = 0.10) \]

5. Interpretation.

There is enough evidence, at \(\alpha = 0.10\) with a \(\chi^{2}\) test, to support the claim that the standard deviation is greater than 8. (Note how we convert to a statement about standard deviation after working through the problem using variances.)
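The same software check for Example 9.7, computing \(s^{2}\) directly from the data (a sketch; scipy.stats assumed):

```python
from statistics import variance
from scipy.stats import chi2

# Example 9.7 (upper-tailed): claim sigma^2 > 64, alpha = 0.10.
data = [25, 30, 5, 15, 18, 42, 16, 9, 10, 12, 12, 38, 8, 14, 27]
s2 = variance(data)                  # sample variance, about 125.5 (s = 11.2)
ts = (len(data) - 1) * s2 / 64       # about 27.45
p = 1 - chi2.cdf(ts, df=len(data) - 1)
print(ts, p)                         # p is roughly 0.017, inside (0.010, 0.025)
```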

Example 9.8 : A manufacturer claims that the variance of the nicotine content of its cigarettes is 0.644. A sample of \(n = 20\) cigarettes has a standard deviation of \(s = 1\). Test the claim at \(\alpha = 0.05\).

1. Hypotheses.

\[ H_{0}: \sigma^{2} = 0.644 \mbox{ (claim)} \hspace{.25in} H_{1}: \sigma^{2} \neq 0.644 \]

2. Critical statistic.

[Figure 9.7: two-tailed critical regions with tail area 0.025 in each tail.]

Referring to Figure 9.7, we see that we need two \(\chi^2_{\rm crit}\) values, one with a tail area of 0.025 and the other with a tail area of 1 – 0.025 = 0.975. From the Chi-squared Distribution Table in the \(\nu = n - 1 = 19\) line find \(\chi^2_{\rm crit} = 8.907\) from the \(\alpha_{T} = 0.975\) column and \(\chi^2_{\rm crit} = 32.852\) from the \(\alpha_{T} = 0.025\) column.

3. Test statistic.

\begin{eqnarray*} \chi^{2}_{\rm test} & = & \frac{(n-1)s^{2}}{\sigma^{2}} \\ \chi^{2}_{\rm test} & = & \frac{(19)(1^{2})}{(0.644)} = 29.50 \end{eqnarray*}

To estimate the \(p\) value find the bracketing values of \(\chi^{2}_{\rm test} = 29.50\) in the \(\nu = 19\) row. They are 27.204 (\(\alpha_{T} = 0.10\)) and 30.144 (\(\alpha_{T} = 0.05\)). The \(\alpha_{T}\) are right tail areas, which is ok, but we need to multiply them by 2 because those right tail areas represent \(p/2\) as shown in Figure 9.8. So \(0.10 < p < 0.20\).

[Figure 9.8: in the two-tailed test, the right tail area beyond the test statistic represents \(p/2\).]

4. Decision.

Do not reject \(H_{0}\). The estimated \(p\) value leads to the same conclusion :

\[ (0.10 < p < 0.20) > (\alpha = 0.05) \]

5. Interpretation.

There is not enough evidence, at \(\alpha = 0.05\) with a \(\chi^{2}\) test, to reject the manufacturer’s claim that the variance of the nicotine content of the cigarettes is equal to 0.644.

Notice, with the claim on \(H_{0}\), that failing to reject \(H_{0}\) does not provide any evidence that \(H_{0}\) is true. We just have the weaker conclusion that we couldn’t disprove it. Such is the double negative nature of the logic behind hypothesis testing, which arises because we don’t assign probabilities to hypotheses.
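Finally, the two-sided \(p\) value of Example 9.8 can be verified the same way (a sketch; scipy.stats assumed):

```python
from scipy.stats import chi2

# Example 9.8 (two-sided): n = 20, s = 1, sigma^2 = 0.644.
ts = 19 * 1**2 / 0.644
p = 2 * min(chi2.cdf(ts, df=19), 1 - chi2.cdf(ts, df=19))
print(ts, p)  # ts is about 29.50, p roughly 0.12 -- inside (0.10, 0.20)
```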
