The test statistic adds squared gaps between observed and expected counts, each divided by its expected count.
The chi-square goodness-of-fit test formula is built for one job: checking whether the counts you observed match a claimed pattern closely enough. It works with categories, not measurements on a smooth scale. Think dice rolls, survey choices, color counts, genetic ratios, defect types, or weekday sales counts.
The formula is:
χ² = Σ ((O − E)² / E)
Here, O means the observed count in a category, and E means the expected count for that same category. You calculate one term for each category, add the terms, then compare the total to a chi-square distribution using degrees of freedom.
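As a rough illustration of the formula, here is a minimal Python sketch. The function name `chi_square_statistic` and the die-roll counts are made up for this example; they are not from any particular library or data set.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over matching categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    # One term per category, then add the terms.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Made-up counts for illustration: 60 die rolls against a fair-die claim.
observed = [8, 12, 9, 11, 13, 7]
expected = [10, 10, 10, 10, 10, 10]

statistic = chi_square_statistic(observed, expected)
print(f"chi-square = {statistic:.2f}")
```

The two lists are paired by position, so each observed count must sit at the same index as its expected count.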
When This Test Fits The Job
Use this test when one categorical variable is being checked against a claimed distribution. The claim can be equal shares across groups, a fixed ratio, or a model that gives expected counts for each bin.
Good uses include:
- Checking whether a die lands on each face at the same rate.
- Testing whether customer choices match planned product shares.
- Comparing observed offspring counts with a Mendelian ratio.
- Checking whether calls arrive across weekdays in a claimed pattern.
Don’t use it when you’re testing a relationship between two categorical variables. That is a chi-square test of independence. Don’t use it for means, medians, or paired before-and-after measurements.
Using The Goodness Of Fit Formula With Observed Counts
The cleanest way to use the formula is to build a small work table. Put categories in the first column, observed counts in the second, and expected counts in the third. If you’re doing the arithmetic by hand, add a fourth working column for each category’s term: (O − E)² / E.
How To Find Expected Counts
Expected counts come from the claim being tested. If the claim says four categories should be equal and the total sample is 200, each category gets an expected count of 50. If the claim says 50%, 30%, and 20%, multiply those shares by the total sample.
For a sample of 200, that gives:
- 50% share: 200 × 0.50 = 100
- 30% share: 200 × 0.30 = 60
- 20% share: 200 × 0.20 = 40
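The share-to-count step above can be sketched in a few lines of Python. The helper name `expected_counts` is hypothetical, and the check that shares add to 1 is an added safeguard rather than part of the formula.

```python
import math


def expected_counts(shares, total):
    """Turn claimed shares (as proportions) into expected counts."""
    if not math.isclose(sum(shares), 1.0):
        raise ValueError("claimed shares should add to 1")
    return [share * total for share in shares]


# The 50% / 30% / 20% claim applied to a sample of 200.
counts = expected_counts([0.50, 0.30, 0.20], 200)
print(counts)

# Expected counts should add back to the sample total.
print(math.isclose(sum(counts), 200))
```

Expected counts do not have to be whole numbers. A claim of equal thirds on a sample of 200 gives about 66.67 per category, and that value goes into the formula as it is.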
The formula rewards close matches with small values. It punishes bigger gaps, since the difference is squared. Dividing by the expected count stops large categories from swamping the whole result just because their raw counts are bigger.
Calculation Steps That Keep The Math Clean
Start with the claim, not the sample. The claim sets the expected counts, and the sample supplies the observed counts. If those two pieces get mixed, the result can look neat while being wrong.
The NIST chi-square goodness-of-fit page defines the test around observed bin counts compared with counts expected under a specified distribution. That wording matches the practical rule: each row needs one observed count and one expected count tied to the same category.
A Simple Worked Layout
Say a snack brand expects buyers to choose four flavors equally. A sample of 120 purchases gives these counts: 36, 28, 31, and 25. Equal choice means each flavor has an expected count of 30.
The terms are:
- Flavor A: (36 − 30)² / 30 = 1.20
- Flavor B: (28 − 30)² / 30 = 0.13
- Flavor C: (31 − 30)² / 30 = 0.03
- Flavor D: (25 − 30)² / 30 = 0.83
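Add the terms without rounding them first: χ² = (36 + 4 + 1 + 25) / 30 = 66 / 30 = 2.20. Adding the two-decimal terms listed above gives 2.19, a small early-rounding artifact. With four categories and no estimated model parameter, degrees of freedom are 4 − 1 = 3. You then find the right-tail p-value for χ² = 2.20 with 3 degrees of freedom.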
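As a check on the arithmetic, the flavor example can be recomputed with exact fractions. This is only a sketch: the variable names are illustrative, and `fractions.Fraction` is used so the effect of rounding each term first is easy to see.

```python
from fractions import Fraction

flavors = ["A", "B", "C", "D"]
observed = [36, 28, 31, 25]
total = sum(observed)                       # 120 purchases
expected = Fraction(total, len(observed))   # equal shares: 30 each

# One (O - E)^2 / E term per flavor, kept as exact fractions.
terms = [(o - expected) ** 2 / expected for o in observed]
for name, term in zip(flavors, terms):
    print(f"Flavor {name}: {float(term):.2f}")

exact_total = sum(terms)                                 # exact sum of terms
rounded_total = sum(round(float(t), 2) for t in terms)   # rounds each term first

print(f"unrounded sum:        {float(exact_total):.2f}")
print(f"sum of rounded terms: {rounded_total:.2f}")
```

The gap between the two totals is small here, but it is the reason to carry extra decimals until the last step.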
| Piece Of The Test | What It Means | Clean Check |
|---|---|---|
| Observed count | The real count found in one category. | Use whole counts, not percentages. |
| Expected count | The count predicted by the claimed pattern. | Expected counts should add to the sample total. |
| Category match | Observed and expected counts must refer to the same bin. | Never compare one group’s observed count with another group’s expected count. |
| Formula term | Each row contributes (O − E)² / E. | Square the gap before dividing. |
| Total statistic | The sum of all row terms gives χ². | Small totals mean closer fit. |
| Degrees of freedom | Usually categories minus one. | Subtract estimated parameters if the data were used to fit them. |
| P-value | The right-tail area beyond the test statistic. | Smaller p-values give more reason to reject the claimed pattern. |
| Decision | The result tells whether the gaps are too large for the claim. | State the claim, p-value, and plain meaning together. |
Degrees Of Freedom And The P-Value
For a basic goodness-of-fit test, degrees of freedom are the number of categories minus one. If there are six categories, the starting value is five. When parameters are estimated from the same data, subtract the number of estimated parameters too.
If A Parameter Was Fitted
When a parameter is fitted from the same sample, the degrees of freedom drop by one more. A normal curve checked after fitting the mean and standard deviation loses two degrees of freedom. A fixed distribution named before sampling does not lose those extra degrees of freedom.
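The counting rule can be written as a one-line helper. The function name `degrees_of_freedom` and the ten-bin case are hypothetical, chosen only to illustrate the rule.

```python
def degrees_of_freedom(n_categories, n_fitted_params=0):
    """Categories minus one, minus any parameters fitted from the same data."""
    df = n_categories - 1 - n_fitted_params
    if df < 1:
        raise ValueError("not enough categories for this many fitted parameters")
    return df


# Six categories, distribution fixed before sampling.
print(degrees_of_freedom(6))

# Hypothetical case: 10 bins, with a mean and standard deviation fitted
# from the same sample.
print(degrees_of_freedom(10, n_fitted_params=2))
```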
Penn State’s lesson on the goodness-of-fit test gives the same count rule for degrees of freedom and also states the usual expected-count condition; a common rule of thumb is an expected count of at least 5 in every category. This matters because the chi-square curve is an approximation. Sparse expected counts can make the p-value shaky.
The test is right-tailed. A larger statistic means the observed counts sit farther from the expected counts. You do not use a left-tail result, because a tiny statistic means the counts fit the claim closely.
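To make the right-tail area concrete, here is a sketch that computes it with only the Python standard library. It assumes whole-number degrees of freedom, and the function name `chi_square_right_tail` is made up for this example. In practice a library routine such as SciPy's `scipy.stats.chi2.sf` would normally do this job.

```python
import math


def chi_square_right_tail(statistic, df):
    """Right-tail area of a chi-square distribution with whole-number df."""
    if statistic < 0 or df < 1 or int(df) != df:
        raise ValueError("need statistic >= 0 and a positive whole df")
    half = statistic / 2.0
    # Closed-form starting points: df = 1 uses erfc, df = 2 uses exp.
    if df % 2 == 1:
        a, tail = 0.5, math.erfc(math.sqrt(half))
    else:
        a, tail = 1.0, math.exp(-half)
    # Recurrence Q(a + 1, x) = Q(a, x) + x^a * e^(-x) / Gamma(a + 1).
    # Each pass raises the shape a by one, which adds two degrees of freedom.
    while a < df / 2.0:
        tail += half ** a * math.exp(-half) / math.gamma(a + 1)
        a += 1.0
    return tail


# Flavor example: statistic 66/30 = 2.2 with 3 degrees of freedom.
p_value = chi_square_right_tail(2.2, 3)
print(f"p = {p_value:.2f}")
```

A statistic of 0 returns a right-tail area of 1, which matches the idea that a perfect fit gives no evidence against the claim. The sketch is meant for modest df; `math.gamma` overflows for very large arguments.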
What The Result Does And Doesn’t Say
A small p-value says the observed pattern would be odd if the claimed distribution were true. It does not prove why the pattern differs. A large p-value says the data do not give strong reason to reject the claim. It does not prove the claim is perfect.
Write the result in plain language. Say what was tested, name the sample size, give the statistic, give degrees of freedom, give the p-value, then state the decision.
Common Mistakes That Change The Answer
Most errors come from using percentages in the formula, copying expected counts from the sample by accident, or forgetting the degrees of freedom adjustment. The fix is slow, plain arithmetic.
| Mistake | Why It Breaks The Test | Better Move |
|---|---|---|
| Using percentages as counts | The formula expects counts in each category. | Convert claimed shares into expected counts. |
| Rounding too early | Small rounding changes can shift the final statistic. | Carry extra decimals until the last step. |
| Low expected counts | The chi-square approximation may not behave well. | Combine sensible categories or use another method. |
| Wrong degrees of freedom | The p-value comes from the wrong curve. | Start with categories minus one, then adjust for fitted parameters. |
| Calling the result proof | The test weighs evidence; it doesn’t prove a distribution. | Use careful wording tied to the p-value. |
Reporting The Formula Result Clearly
OpenStax describes the goodness-of-fit test as a comparison between observed and expected values, with the statistic moving into the right tail when the counts are far apart. That is the wording readers need when you report a result.
A clean report can read like this:
A chi-square goodness-of-fit test compared the observed flavor counts from 120 purchases with an equal-choice claim. The result was χ²(3) = 2.20, p = 0.53. The sample does not give strong reason to reject the equal-choice claim.
That report gives the method, sample size, degrees of freedom, statistic, p-value, and plain meaning. It avoids overclaiming. It also tells the reader what decision can be made from the calculation.
Final Check Before You Calculate
- One categorical variable is being tested.
- Observed counts are raw counts.
- Expected counts come from the claim.
- Expected counts add to the total sample size.
- Each expected count is large enough for the chi-square approximation.
- The p-value comes from the right tail.
Once those checks pass, the formula is straight arithmetic. The hard part is not the square or the sum. The hard part is making sure the expected counts truly match the claim being tested.
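Some of these checks can be automated before any arithmetic is done. The sketch below is one possible version: the function name `precheck` is hypothetical, and the default threshold of 5 for expected counts is an assumption based on a common rule of thumb, not a fixed requirement.

```python
import math


def precheck(observed, expected, min_expected=5):
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    if len(observed) != len(expected):
        problems.append("observed and expected have different category counts")
    if any(o < 0 or o != int(o) for o in observed):
        problems.append("observed values are not whole, non-negative counts")
    if not math.isclose(sum(observed), sum(expected)):
        problems.append("expected counts do not add to the sample total")
    # min_expected = 5 is an assumed rule-of-thumb threshold.
    if any(e < min_expected for e in expected):
        problems.append(f"an expected count is below {min_expected}")
    return problems


# Flavor example: raw counts against equal expected counts of 30.
print(precheck([36, 28, 31, 25], [30, 30, 30, 30]))

# Same data entered as shares by mistake: two checks should flag it.
print(precheck([0.30, 0.23, 0.26, 0.21], [30, 30, 30, 30]))
```

The function cannot confirm that the expected counts came from the claim and not from the sample. That check still has to be made by the person setting up the test.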
References & Sources
- National Institute of Standards and Technology (NIST). “Chi-Square Goodness-of-Fit Test.” Defines the test through observed bin counts compared with counts expected under a specified distribution.
- Penn State STAT 200. “Goodness of Fit Test.” Explains observed counts, expected counts, expected-count conditions, and degrees of freedom.
- OpenStax. “Goodness-of-Fit Test.” Shows how the chi-square statistic and right-tail p-value are read in an introductory statistics setting.