Chi Square Goodness Of Fit Test Formula | Clean Count Method

The test statistic adds squared gaps between observed and expected counts, each divided by its expected count.

The chi-square goodness-of-fit test is built for one job: checking whether the counts you observed match a claimed pattern closely enough. It works with categories, not measurements on a smooth scale. Think dice rolls, survey choices, color counts, genetic ratios, defect types, or weekday sales counts.

The formula is:

χ² = Σ ((O − E)² / E)

Here, O means the observed count in a category, and E means the expected count for that same category. You calculate one term for each category, add the terms, then compare the total to a chi-square distribution using degrees of freedom.
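As a minimal sketch of the formula in Python, the sum can be written as one term per category. The function name chi_square_statistic is illustrative and does not come from any library.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over matching categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per category")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    # One term per category, paired by position, then summed.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# A perfect match gives 0; gaps push the total upward.
print(chi_square_statistic([50, 50], [50, 50]))  # 0.0
print(chi_square_statistic([60, 40], [50, 50]))  # 4.0
```

The function only produces the statistic. Turning it into a p-value still needs the degrees of freedom and the chi-square distribution.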

When This Test Fits The Job

Use this test when one categorical variable is being checked against a claimed distribution. The claim can be equal shares across groups, a fixed ratio, or a model that gives expected counts for each bin.

Good uses include:

Checking whether a die lands on each face at the same rate.
Testing whether customer choices match planned product shares.
Comparing observed offspring counts with a Mendelian ratio.
Checking whether calls arrive across weekdays in a claimed pattern.

Don’t use it when you’re testing a relationship between two categorical variables. That is a chi-square test of independence. Don’t use it for means, medians, or paired before-and-after measurements.

Using The Goodness Of Fit Formula With Observed Counts

The cleanest way to use the formula is to build a small work table. Put categories in the first column, observed counts in the second, and expected counts in the third. Then add a fourth column for the working value of each row: (O − E)² / E. That column can stay in your working notes if the published table only needs the counts.

How To Find Expected Counts

Expected counts come from the claim being tested. If the claim says four categories should be equal and the total sample is 200, each category gets an expected count of 50. If the claim says 50%, 30%, and 20%, multiply those shares by the total sample.

For a sample of 200, that gives:

50% share: 200 × 0.50 = 100
30% share: 200 × 0.30 = 60
20% share: 200 × 0.20 = 40
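The same conversion can be sketched in Python. The helper name expected_counts is hypothetical, and the check that the shares sum to 1 is an added safeguard rather than part of the test itself.

```python
import math


def expected_counts(shares, total):
    """Scale claimed shares (as proportions summing to 1) to expected counts."""
    if not math.isclose(sum(shares), 1.0, abs_tol=1e-9):
        raise ValueError("claimed shares must sum to 1")
    return [share * total for share in shares]


# The 50% / 30% / 20% claim applied to a sample of 200.
print(expected_counts([0.50, 0.30, 0.20], 200))

# Four equal categories in the same sample.
print(expected_counts([0.25] * 4, 200))
```

Expected counts do not need to be whole numbers, so the results are left unrounded.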

The formula rewards close matches with small values. It punishes bigger gaps, since the difference is squared. Dividing by the expected count stops large categories from swamping the whole result just because their raw counts are bigger.
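A short sketch with made-up counts shows both effects. The numbers are illustrative only and are not taken from any dataset.

```python
def term(observed, expected):
    """Contribution of one category: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


# Same raw gap of 10, different expected counts:
print(term(110, 100))  # 1.0  (a gap of 10 is small next to 100)
print(term(30, 20))    # 5.0  (a gap of 10 is large next to 20)

# Doubling the gap quadruples the term, because the gap is squared:
print(term(120, 100))  # 4.0
```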

Calculation Steps That Keep The Math Clean

Start with the claim, not the sample. The claim sets the expected counts, and the sample supplies the observed counts. If those two pieces get mixed, the result can look neat while being wrong.

The NIST chi-square goodness-of-fit page defines the test around observed bin counts compared with counts expected under a specified distribution. That wording matches the practical rule: each row needs one observed count and one expected count tied to the same category.

A Simple Worked Layout

Say a snack brand expects buyers to choose four flavors equally. A sample of 120 purchases gives these counts: 36, 28, 31, and 25. Equal choice means each flavor has an expected count of 30.

The terms are:

Flavor A: (36 − 30)² / 30 = 1.20
Flavor B: (28 − 30)² / 30 = 0.13
Flavor C: (31 − 30)² / 30 = 0.03
Flavor D: (25 − 30)² / 30 = 0.83

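Add the unrounded terms: χ² = (36 + 4 + 1 + 25) / 30 = 66 / 30 = 2.20. Summing the rounded terms above would give 2.19, which is why rounding should wait until the last step. With four categories and no estimated model parameter, degrees of freedom are 4 − 1 = 3. You then find the right-tail p-value for χ² = 2.20 with 3 degrees of freedom, which comes to about 0.53.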
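The flavor example can be reproduced with a few lines of Python. This sketch carries full precision, so the total prints as 2.20 rather than the 2.19 that the two-decimal terms add up to. The variable names are illustrative.

```python
observed = {"A": 36, "B": 28, "C": 31, "D": 25}
total = sum(observed.values())        # 120 purchases
expected = total / len(observed)      # equal choice: 30 per flavor

# One (O - E)^2 / E term per flavor, kept unrounded.
terms = {flavor: (count - expected) ** 2 / expected
         for flavor, count in observed.items()}
chi_square = sum(terms.values())
df = len(observed) - 1                # no parameters were fitted

for flavor, value in terms.items():
    print(f"Flavor {flavor}: {value:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```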

Piece Of The Test	What It Means	Clean Check
Observed count	The real count found in one category.	Use whole counts, not percentages.
Expected count	The count predicted by the claimed pattern.	Expected counts should add to the sample total.
Category match	Observed and expected counts must refer to the same bin.	Never compare one group’s observed count with another group’s expected count.
Formula term	Each row contributes (O − E)² / E.	Square the gap before dividing.
Total statistic	The sum of all row terms gives χ².	Small totals mean closer fit.
Degrees of freedom	Usually categories minus one.	Subtract estimated parameters if the data were used to fit them.
P-value	The right-tail area beyond the test statistic.	Smaller p-values give more reason to reject the claimed pattern.
Decision	The result tells whether the gaps are too large for the claim.	State the claim, p-value, and plain meaning together.

Degrees Of Freedom And The P-Value

For a basic goodness-of-fit test, degrees of freedom are the number of categories minus one. If there are six categories, the starting value is five. When parameters are estimated from the same data, subtract the number of estimated parameters too.

If A Parameter Was Fitted

Each parameter fitted from the same sample costs one more degree of freedom. A normal curve checked after fitting the mean and standard deviation loses two. A fixed distribution named before sampling does not lose any extra degrees of freedom.
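The counting rule is small enough to sketch as a helper. The function name and the eight-bin example are hypothetical, chosen only to illustrate the subtraction.

```python
def degrees_of_freedom(n_categories, n_fitted_params=0):
    """Categories minus one, minus parameters estimated from the same data."""
    df = n_categories - 1 - n_fitted_params
    if df < 1:
        raise ValueError("not enough categories for this many fitted parameters")
    return df


print(degrees_of_freedom(6))                     # 5: fixed claim, six categories
print(degrees_of_freedom(8, n_fitted_params=2))  # 5: eight bins, fitted mean and SD
```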

Penn State’s lesson on the goodness-of-fit test gives the same count rule for degrees of freedom and also states the usual expected-count condition, commonly given as an expected count of at least five in every category. This matters because the chi-square curve is an approximation. Sparse expected counts can make the p-value unreliable.

The test is right-tailed. A larger statistic means the observed counts sit farther from the expected counts. You do not use a left-tail result, because a tiny statistic means the counts fit the claim closely.
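For whole-number degrees of freedom, the right-tail area has a closed form that needs only the Python standard library. The sketch below is one way to compute it, assuming an integer df. The function name chi2_sf is illustrative, and in practice a statistics library (for example scipy.stats.chi2.sf in SciPy) would normally be used instead.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable, integer df."""
    if df < 1 or df != int(df):
        raise ValueError("df must be a positive whole number")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Starting value of the regularized upper incomplete gamma Q(a, y):
    # Q(1, y) = exp(-y) for even df, Q(1/2, y) = erfc(sqrt(y)) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


print(round(chi2_sf(2.2, 3), 2))  # 0.53
```

A statistic of 0 returns 1.0, and the value shrinks toward 0 as the statistic grows, which is the right-tail behavior described above.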

What The Result Does And Doesn’t Say

A small p-value says the observed pattern would be odd if the claimed distribution were true. It does not prove why the pattern differs. A large p-value says the data do not give strong reason to reject the claim. It does not prove the claim is perfect.

Write the result in plain language. Say what was tested, name the sample size, give the statistic, give degrees of freedom, give the p-value, then state the decision.

Common Mistakes That Change The Answer

Most errors come from using percentages in the formula, copying expected counts from the sample by accident, or forgetting the degrees of freedom adjustment. The fix is slow, plain arithmetic.

Mistake	Why It Breaks The Test	Better Move
Using percentages as counts	The formula expects counts in each category.	Convert claimed shares into expected counts.
Rounding too early	Small rounding changes can shift the final statistic.	Carry extra decimals until the last step.
Low expected counts	The chi-square approximation may not behave well.	Combine sensible categories or use another method.
Wrong degrees of freedom	The p-value comes from the wrong curve.	Start with categories minus one, then adjust for fitted parameters.
Calling the result proof	The test weighs evidence; it doesn’t prove a distribution.	Use careful wording tied to the p-value.

Reporting The Formula Result Clearly

OpenStax describes the goodness-of-fit test as a comparison between observed and expected values, with the statistic moving into the right tail when the counts are far apart. That is the wording readers need when you report a result.

A clean report can read like this:

A chi-square goodness-of-fit test compared the observed flavor counts from 120 purchases with an equal-choice claim. The result was χ²(3) = 2.20, p = 0.53. The sample does not give strong reason to reject the equal-choice claim.

That report gives the method, sample size, degrees of freedom, statistic, p-value, and plain meaning. It avoids overclaiming. It also tells the reader what decision can be made from the calculation.
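If many results need the same wording, a small formatter keeps the reports consistent. This is a sketch only: the function name format_report, the 0.05 cutoff, and the "p < 0.01" fallback are assumptions made for illustration, not rules from the sources cited here.

```python
def format_report(claim, n, df, chi_square, p_value, alpha=0.05):
    """Build a plain-language report sentence for a goodness-of-fit result."""
    # Very small p-values are reported as a bound instead of "p = 0.00".
    p_text = "p < 0.01" if p_value < 0.01 else f"p = {p_value:.2f}"
    if p_value < alpha:
        decision = "gives reason to reject"
    else:
        decision = "does not give strong reason to reject"
    return (
        f"A chi-square goodness-of-fit test compared the observed counts "
        f"from a sample of {n} with {claim}. "
        f"The result was χ²({df}) = {chi_square:.2f}, {p_text}. "
        f"The sample {decision} {claim}."
    )


print(format_report("the equal-choice claim", 120, 3, 2.2, 0.532))
```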

Final Check Before You Calculate

One categorical variable is being tested.
Observed counts are raw counts.
Expected counts come from the claim.
Expected counts add to the total sample size.
Each expected count is large enough for the chi-square approximation.
The p-value comes from the right tail.

Once those checks pass, the formula is straight arithmetic. The hard part is not the square or the sum. The hard part is making sure the expected counts truly match the claim being tested.
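Several of the checks above can be automated before any arithmetic starts. In this sketch the function name precheck is hypothetical, and the default threshold of 5 for expected counts is a common rule of thumb used here as an assumption. Whether the expected counts truly match the claim still has to be checked by hand.

```python
import math


def precheck(observed, expected, min_expected=5):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(observed) != len(expected):
        problems.append("observed and expected have different numbers of categories")
    if any(o < 0 or o != int(o) for o in observed):
        problems.append("observed values should be whole, non-negative counts")
    if not math.isclose(sum(observed), sum(expected), rel_tol=1e-9):
        problems.append("expected counts do not add to the sample total")
    if any(e < min_expected for e in expected):
        problems.append(f"some expected counts are below {min_expected}")
    return problems


print(precheck([36, 28, 31, 25], [30, 30, 30, 30]))  # []
print(precheck([0.30, 0.70], [50, 50]))              # percentages used as counts
```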

References & Sources

National Institute of Standards and Technology (NIST). “Chi-Square Goodness-of-Fit Test.” Defines the test through observed bin counts compared with counts expected under a specified distribution.
Penn State STAT 200. “Goodness of Fit Test.” Explains observed counts, expected counts, expected-count conditions, and degrees of freedom.
OpenStax. “Goodness-of-Fit Test.” Shows how the chi-square statistic and right-tail p-value are read in an introductory statistics setting.