## One-Way Goodness of Fit Test

Typical Research Question

“Are the proportions of people in income groups on Medicare in Florida significantly different from those proportions in the US overall?”

More examples of one-way Chi-square research questions:

“Is there a significant difference between the observed counts of male and female cardiovascular deaths and the counts expected from the proportions of males and females in Indiana?”

“Is there a significant difference in frequency of modifier 51 assignments across surgical specialties during the second quarter of 2019?” [This is a test against a uniform distribution.]

“Currently the question is if there is a significant difference in the number of patients checked in by our three front desk representatives. The three representatives should average about the same number of patients as they are all scheduled the normal business hours of 8a-5p Monday through Friday.” [This is a test against a uniform distribution.]

“If a homebuyer believes the crime in a particular area is high, they will likely not purchase a home in that neighborhood. Is there a difference in the frequency of crimes reported in three areas of the city?” [This is a test against a uniform distribution.]

A One-way Chi-square test is a goodness of fit test. It determines if the distribution of frequencies (counts) in a sample “fit” a hypothesized (assumed) specific distribution.

• Null Hypothesis Ho: The data are consistent with a specified distribution.
• Alternative Hypothesis Ha: The data are not consistent with a specified distribution.

The Null specifies the proportion of observations at each level of the categorical variable. The Alternative says that at least one of the specified proportions is not true. Note that because the proportions must sum to 100%, if one specified proportion is not true, at least one other proportion is not true either.

When is the One-way Chi-square Goodness of Fit appropriate?

• The observed frequency (count) data is from a random sample.
• The one variable of concern is categorical.
• Each level of the categorical variable has an expected frequency of at least 5.
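
The expected-frequency condition can be checked before running the test. The sketch below is one possible way to do it; the function names, the Null proportions, and the sample sizes are hypothetical and chosen only for illustration.

```python
def expected_counts(proportions, n):
    """Expected frequency for each category under the Null proportions."""
    return [p * n for p in proportions]


def meets_minimum(proportions, n, minimum=5):
    """True when every category has an expected frequency of at least `minimum`."""
    return all(e >= minimum for e in expected_counts(proportions, n))


# Hypothetical Null proportions and sample sizes, for illustration only.
null_props = [0.15, 0.30, 0.35, 0.20]
print(meets_minimum(null_props, 30))  # smallest expected count is 0.15 * 30 = 4.5
print(meets_minimum(null_props, 40))  # smallest expected count is 0.15 * 40 = 6.0
```

With these illustrative numbers, a sample of 30 would not satisfy the condition, while a sample of 40 would.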

The specified distribution can

• have different proportions for the levels (the expected counts sum to the total count), such as
    • a historical distribution,
    • a reference distribution, or
    • proportions based on an actual “denominator,”
• be a uniform distribution (all levels have an equal proportion of the total count).

An important limitation of the Chi-square Goodness of Fit test is that it does not specify which categories are significantly different. If you have only two categories and a statistically significant outcome (p-value < significance level), you can say the two categories are different. But with more than two categories, additional tests (post hoc tests) are required to confirm whether an individual category’s observed count is significantly different from its expected value.
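
The text does not say which post hoc test to use. One common option, sketched here as an assumption rather than as the required method, tests each category against all the others with a normal approximation and applies a Bonferroni correction to the significance level. The function name and the counts are hypothetical.

```python
import math


def posthoc_category_tests(observed, null_props, alpha=0.05):
    """Test each category against its Null proportion (category vs. all others).

    Uses the normal approximation z = (O - n*p) / sqrt(n*p*(1-p)) and a
    Bonferroni-adjusted threshold of alpha / number_of_categories.
    """
    n = sum(observed)
    threshold = alpha / len(observed)
    results = []
    for obs, p in zip(observed, null_props):
        z = (obs - n * p) / math.sqrt(n * p * (1 - p))
        p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        results.append((z, p_value, p_value < threshold))
    return results


# Hypothetical counts for four categories tested against a uniform Null.
observed = [50, 30, 15, 25]
null_props = [0.25, 0.25, 0.25, 0.25]
for z, p_value, flagged in posthoc_category_tests(observed, null_props):
    print(f"z = {z:6.2f}, p = {p_value:.4f}, differs from expected: {flagged}")
```

In this made-up data set, only the first and third categories would be flagged as different from their expected counts after the correction.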

Example 1: Expected Frequency Distribution is Specified

A manager suspects that the pattern of patient admissions has changed since the last staffing analysis six months ago. At that time, patient admissions followed the distribution labeled “Previous Proportions” in the table below. From the hospital records for the month, the manager finds the current frequencies in each time period:

For this hypothesis test, the Null and Alternative are:

• The Null hypothesis is that the proportions of patient admissions across the time periods are Morning 15%, Afternoon 30%, Evening 35%, and Night 20%.
• The Alternative is that the proportions are not Morning 15%, Afternoon 30%, Evening 35%, and Night 20%.

Below is a bar chart visualization of the observed data and the expected distribution.

You can use the Excel Chi-square calculator, an online calculator such as GraphPad, or another technology (such as R or Python). See Technology Tools at the bottom of the page.

Below is the GraphPad “QuickCalcs” One-way Chi-square output for this test:

Here is one way to report these results in Section D1 of the Task 2 report:

The Chi-square statistic is 7.143 with 3 degrees of freedom. The p-value is 0.067, which is greater than the significance level of 0.05, so the Null hypothesis that the distribution of patient admissions matches the historical distribution should not be rejected. The conclusion is that the distribution of patient admissions is not significantly different from the historical distribution.
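
If you use Python as your technology, the reported p-value can be reproduced from the Chi-square statistic and the degrees of freedom. This is a minimal sketch using only the standard library; `chi2_sf` is a hypothetical helper written for this illustration (SciPy users would typically call `scipy.stats.chi2.sf` instead).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    # Closed forms for df = 1 and df = 2, then step up two degrees at a time:
    # sf(x; k + 2) = sf(x; k) + (x/2)**(k/2) * exp(-x/2) / gamma(k/2 + 1)
    k = 1 if df % 2 else 2
    sf = math.erfc(math.sqrt(x / 2)) if k == 1 else math.exp(-x / 2)
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


statistic = 7.143   # Chi-square statistic reported above
df = 4 - 1          # four time periods, so 3 degrees of freedom
alpha = 0.05        # significance level

p_value = chi2_sf(statistic, df)
print(f"p-value = {p_value:.3f}")
print("Reject the Null" if p_value < alpha else "Do not reject the Null")
```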

Below is another chart appropriate for this example. This chart indicates the observed values and the expected values are very similar for each category. It confirms visually what the p-value tells us – you cannot conclude the observed patient frequencies are different from the historical proportions.

Example 2: Expected Distribution is a Uniform Distribution

A manager wants to know if the number of patient hospital admissions varies across four time periods (morning, afternoon, evening, and night; each representing a 6-hour period). This has important staffing implications. The manager collects one month of data and observes the following admission frequencies (i.e., Observed frequencies).

The research question: Is there a significant difference in the distribution of the frequencies (counts) of admissions across the four time periods?

To answer the manager’s question, a one-way Chi-square test of hypotheses is conducted. The Null and Alternative hypotheses are:

• Ho: The proportions of patient admissions are Morning 25%, Afternoon 25%, Evening 25%, and Night 25%.
• Ha: At least one proportion is not equal to 25%.

Given that there is a total of 1,200 admissions over the month, the null hypothesis predicts that there should be 300 admissions for each of the 4 time periods (i.e., 25% of 1,200, or 1,200/4 = 300). These represent the Expected frequencies assuming the null hypothesis is true.
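
The same arithmetic can be written as a short Python sketch. The variable names are illustrative; the total and the four time periods come from the example.

```python
periods = ["Morning", "Afternoon", "Evening", "Night"]
total_admissions = 1200

# Under a uniform Null, every period gets the same share of the total.
null_proportion = 1 / len(periods)
expected = {period: total_admissions * null_proportion for period in periods}

# Degrees of freedom for a one-way test: number of categories minus one.
degrees_of_freedom = len(periods) - 1

print(expected)
print(degrees_of_freedom)
```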

To determine whether the Observed frequencies are a significant departure from the Expected frequencies (i.e., whether or not the Null hypothesis should be rejected), a Chi-square test is performed. This involves calculating the Chi-square statistic (i.e., χ²). Instead of doing the manual calculations, we will use the Excel Chi-square Calculators workbook.

Here is the Excel Calculator output:

Here is one way to report these results in section D1 of the report:

The Chi-square statistic is 78.000 with 3 degrees of freedom. The p-value, which is in scientific notation, is less than 0.0001, and is smaller than the significance level of 0.05. This indicates the Null hypothesis that the distribution of patient admissions is uniform should be rejected. The conclusion is that the distribution of patient admissions differs across the time periods.
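
A p-value this small is usually displayed in scientific notation, which can be easy to misread. The sketch below, which assumes Python as the technology and uses the closed-form upper tail for 3 degrees of freedom, shows the same value in scientific and in plain decimal form.

```python
import math

statistic = 78.0  # Chi-square statistic reported above

# For 3 degrees of freedom the upper-tail probability has a closed form:
# P(X >= x) = erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
p_value = (
    math.erfc(math.sqrt(statistic / 2))
    + math.sqrt(2 * statistic / math.pi) * math.exp(-statistic / 2)
)

print(p_value)            # very small floats are printed in scientific notation
print(f"{p_value:.10f}")  # the same value written out as a plain decimal
print(p_value < 0.0001)   # comparison used in the report wording
```

A negative exponent means the decimal point moves to the left, so the value is far below 0.0001.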

A limitation of the Chi-Square analysis is that it doesn’t specify which categories are significantly different from the expected frequencies. We just know that the pattern of Observed frequencies is significantly different from what is expected under the null hypothesis.

However, by inspecting a chart, we can often make useful inferences. This is a reasonable chart to include. From the chart, it is reasonable to conclude that admissions are greater during the evening than during the morning because those two categories have the largest and smallest frequencies. But due to the limitation of the Chi-square test previously discussed, drawing any further conclusions about differences will require conducting post hoc tests.

Example 3: Consider the “denominator.”

Did the way that states managed the COVID-19 crisis make a difference in the outcomes?

There are several ways of classifying states, but one currently popular is “red” and “blue,” meaning generally Republican-controlled states and generally Democratic-controlled states.

One student phrased the question this way: “Is the total number of positive COVID-19 cases significantly different between states that are controlled by the Republican party and those controlled by the Democratic party?”

The total counts of COVID-19 cases as of March 11, 2021, were

• 27 Red states – 14,371,859
• 23 Blue states – 14,736,550

Assuming there are no reasons why the counts should be different, an equal proportions (uniform) distribution of 50%:50% was used as the Expected Null Distribution. Note the assumption highlighted in green in the one-way Chi-square test.

The results of the One-way Chi-square test are below:

But especially with health issues, we need to recognize that the underlying population size could impact the expected values. That is why we generally use prevalence instead of focusing just on raw counts. Prevalence is a count per 100,000 or per 10,000 population.

The Chi-square test was run again, this time using the proportions of the total population in the red and blue states: approximately 54% in the blue states and approximately 46% in the red states.

The results are still statistically significant: there is a difference between red and blue states. But by comparing the observed and expected counts, we see the blue states have about a million fewer cases than expected, while the red states have about a million more cases than expected.
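
Both runs can be approximated in Python from the case counts given above. This is a sketch, not the calculator output: the function names are hypothetical, and the 46%/54% population shares are the rounded values from the text, so the numbers will differ slightly from a run that uses exact population figures.

```python
import math


def goodness_of_fit(observed, null_props):
    """Return (expected counts, chi-square statistic) for a one-way test."""
    n = sum(observed)
    expected = [n * p for p in null_props]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic


def p_value_df1(statistic):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Case counts as of March 11, 2021, from the text above: [red, blue].
observed = [14_371_859, 14_736_550]

for label, null_props in [
    ("equal split", [0.50, 0.50]),
    ("population share", [0.46, 0.54]),  # approximate shares from the text
]:
    expected, statistic = goodness_of_fit(observed, null_props)
    differences = [round(o - e) for o, e in zip(observed, expected)]
    print(label, f"chi-square = {statistic:,.0f}", f"p = {p_value_df1(statistic):.3g}")
    print("  observed minus expected [red, blue]:", differences)
```

Changing only the Null proportions changes which group appears to have more cases than expected, which is the point of this example.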

The denominator is important.

The Technology Tool

You can use any Chi-square technology to run the test. But you must use some technology and not just “manually” go step-by-step through the process. It is recommended that you use the Excel Chi-square calculator or the online one-way QuickCalcs Chi-square calculator.