A chi-square test checks whether observed category counts differ from expected counts by more than random chance would suggest.
The chi-square test is a statistical test for categorical data. It asks a plain question: if the counts you saw were only the product of random variation, would they look this far away from the counts you expected? If the gap is too large, the data push back against that idea.
That makes the test handy when your data come as counts in groups. Think survey answers, product choices, blood types, defect categories, or voting preferences. If the data are averages, times, weights, or scores, this is usually not the right tool.
Definition Of Chi-Square Test In Plain Language
In plain language, the test compares two sets of counts:
- Observed counts: what you actually recorded
- Expected counts: what you would expect if the null idea were true
The null idea depends on the job. In one setting, it says a sample follows a known pattern, such as 25% in each of four groups. In another, it says two categorical variables are unrelated, such as drink choice and age group.
The test then turns all those gaps into one number, written as χ². Larger values mean the observed counts sit farther from the expected counts. That number is paired with degrees of freedom and a p-value to judge whether a gap that large could plausibly come from random variation alone.
Chi-Square Test Meaning And When It Fits The Job
You will usually meet two versions of the chi-square test.
Goodness-Of-Fit Test
This version checks one categorical variable against a claimed pattern. Say a six-sided die is supposed to be fair. After many rolls, you compare the face counts you got with the equal counts you would expect from a fair die.
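As a rough sketch of that comparison, the snippet below lines up face counts against the equal counts a fair die would give. The roll counts are hypothetical, made up only for illustration.

```python
# Hypothetical face counts from 120 rolls (illustrative data, not real results).
observed = {1: 18, 2: 22, 3: 16, 4: 25, 5: 19, 6: 20}

total_rolls = sum(observed.values())

# Under the null idea of a fair die, every face is expected equally often.
expected_per_face = total_rolls / len(observed)

# Gap for each face: positive means more rolls than expected, negative means fewer.
gaps = {face: count - expected_per_face for face, count in observed.items()}

for face, gap in gaps.items():
    print(f"face {face}: observed {observed[face]}, "
          f"expected {expected_per_face:.1f}, gap {gap:+.1f}")
```

The gaps always sum to zero, because the expected counts are built from the same total as the observed counts.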
Test Of Independence
This version checks whether two categorical variables are linked. Say a retailer records payment method and age band. The test asks whether payment method changes across age bands or whether the split looks about the same in each band.
Both versions rest on the same idea: compare what happened with what would be expected under the null idea. The NIST Engineering Statistics Handbook page on the chi-square goodness-of-fit test gives the classic setup, and Penn State’s STAT 200 lesson on chi-square tests shows how the same logic carries into classroom-style data tables.
How The Formula Works Without The Fog
The formula is built from one repeated step: subtract expected from observed, square the gap, then divide by the expected count. After you do that for every cell, you add the pieces together.
χ² = ∑ (Observed − Expected)² / Expected
That structure does two neat things. Squaring turns all gaps positive, so overcounts and undercounts do not cancel each other out. Dividing by the expected count keeps a gap of 10 from meaning the same thing in a cell that expected 12 and a cell that expected 300.
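The repeated step can be sketched in a few lines of Python. The function name `chi_square_statistic` and the cell values are assumptions chosen for illustration; they mirror the gap-of-10 comparison described above.

```python
def chi_square_statistic(observed, expected):
    """Sum (observed - expected)^2 / expected over every cell."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# The same raw gap of 10 contributes very differently depending on
# how large the expected count is (hypothetical single-cell examples).
small_cell = chi_square_statistic([22], [12])    # 10^2 / 12
large_cell = chi_square_statistic([310], [300])  # 10^2 / 300

print(f"gap of 10 against an expected 12:  {small_cell:.3f}")
print(f"gap of 10 against an expected 300: {large_cell:.3f}")
```

The cell that expected 12 contributes 25 times as much to the total as the cell that expected 300, even though both are off by the same 10 counts.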
Degrees of freedom tell you how much room the table had to vary once totals were fixed. In a goodness-of-fit test, degrees of freedom are usually the number of categories minus 1. In a contingency table, they are usually (rows – 1) × (columns – 1).
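Those two rules are short enough to write as helper functions. The names below are hypothetical, and the sketch covers only the usual cases described above, where no extra parameters are estimated from the data.

```python
def df_goodness_of_fit(num_categories):
    """Usual degrees of freedom for a goodness-of-fit test."""
    return num_categories - 1


def df_independence(num_rows, num_columns):
    """Usual degrees of freedom for a contingency table."""
    return (num_rows - 1) * (num_columns - 1)


print(df_goodness_of_fit(6))   # six die faces
print(df_independence(2, 3))   # a table with 2 rows and 3 columns
```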
| Part | What It Means | Why It Matters |
|---|---|---|
| Observed count | The number you recorded in a category | This is the raw data the test starts with |
| Expected count | The count predicted by the null idea | It gives the baseline for comparison |
| Null hypothesis | The claim of no mismatch or no link | The test checks whether the data fit this claim |
| χ² statistic | The summed gap between observed and expected counts | Larger values point to a poorer fit |
| Degrees of freedom | The number of free-moving count slots | Needed to read the test statistic correctly |
| P-value | The chance of seeing a gap at least this large if the null idea were true | A small value signals the data are hard to square with the null idea |
| Contingency table | A count table for two categorical variables | It is the usual input for a test of independence |
| Goodness of fit | A check of one variable against a claimed pattern | Useful when you want to test a stated distribution |
What The Test Can And Cannot Tell You
A chi-square test can tell you whether the gap between observed and expected counts is larger than you would expect from random noise alone. It does not tell you that one variable caused the other. It also does not tell you how large or meaningful the link is in a practical sense.
That last point trips people up. A huge sample can make tiny count gaps look persuasive. A small sample can hide a real pattern. So the test is best read alongside the table itself, the sample size, and, when relevant, an effect-size measure such as Cramér’s V.
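To see the sample-size effect, the sketch below runs the same row proportions through two hypothetical 2 × 2 tables, one 100 times larger than the other. It uses the standard formula for Cramér’s V, the square root of χ² / (n × min(rows − 1, columns − 1)). The function names and the counts are assumptions for illustration.

```python
import math


def chi_square_from_table(table):
    """Chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def cramers_v(table):
    """Cramér's V effect size for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    smaller_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square_from_table(table) / (n * smaller_dim))


# Hypothetical tables with identical proportions but different sample sizes.
small_sample = [[55, 45], [45, 55]]
large_sample = [[5500, 4500], [4500, 5500]]

for name, table in [("small", small_sample), ("large", large_sample)]:
    print(f"{name}: chi-square = {chi_square_from_table(table):.1f}, "
          f"Cramér's V = {cramers_v(table):.2f}")
```

The statistic grows with the sample size while Cramér’s V stays the same, which is why the effect size is worth reporting next to the p-value.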
If you are working with a two-way table, Penn State’s chi-square test of independence page also walks through expected counts and the usual minimum-count check.
Assumptions That Need To Hold
This test is simple to run. It is also easy to misuse. Before you trust the result, make sure the setup fits the test.
- The data are counts. Percentages alone are not enough unless you can recover the underlying counts.
- The categories do not overlap. Each case belongs in one cell only.
- The observations are independent. One person, item, or event should not quietly appear multiple times in a way that breaks the design.
- Expected counts are not too small. Tiny expected counts can make the approximation shaky. A common rule of thumb asks for an expected count of at least 5 in every cell.
When expected counts are sparse, researchers often merge categories, collect more data, or switch to another test such as Fisher’s exact test for a 2 × 2 table. This is one reason the chi-square test is often taught with a “check the table before the p-value” mindset.
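A table check of that kind might look like the sketch below. The threshold of 5 is a common rule of thumb rather than a fixed law, and the function name and the expected counts are hypothetical.

```python
def sparse_cells(expected_table, minimum=5):
    """Return (row, column, expected) for every cell below the minimum.

    The default minimum of 5 is a widely used rule of thumb, not a hard rule.
    """
    return [
        (i, j, expected)
        for i, row in enumerate(expected_table)
        for j, expected in enumerate(row)
        if expected < minimum
    ]


# Hypothetical expected counts for a 2 x 2 table.
expected_table = [[12.0, 3.5],
                  [8.0, 6.5]]

flagged = sparse_cells(expected_table)
if flagged:
    print("Sparse cells (row, column, expected):", flagged)
else:
    print("All expected counts meet the minimum.")
```

Note that the check runs on expected counts, not observed counts. An observed count of zero is fine as long as the expected count for that cell is large enough.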
| Situation | Better Choice | Why |
|---|---|---|
| One categorical variable versus a claimed split | Chi-square goodness-of-fit | Checks whether one set of counts matches a target pattern |
| Two categorical variables in one table | Chi-square test of independence | Checks whether the two variables are associated |
| Small expected counts in a 2 × 2 table | Fisher’s exact test | Works better when the chi-square approximation is weak |
| Measured values such as height or time | Another test, not chi-square | The chi-square test is built for categorical counts, not continuous measurements |
A Small Worked Example
Say a school cafeteria wants to know whether drink choice is linked to grade band. It records counts for water, juice, and milk across middle school and high school students. The observed table shows that high school students pick water more often than expected, while middle school students pick juice more often than expected.
The test builds expected counts from the row totals and column totals, not from a guess pulled out of thin air. Once those expected counts are in place, the table may show a large enough gap to produce a small p-value. That would suggest drink choice and grade band are not acting independently in this sample.
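Here is one way the cafeteria example could play out in code. The counts are hypothetical, invented to match the pattern described above. Each expected count is the row total times the column total divided by the grand total. Because this table has 2 degrees of freedom, the p-value has a simple closed form, exp(−χ² / 2). That shortcut holds only for 2 degrees of freedom, so other table sizes would need a statistics library.

```python
import math

# Hypothetical observed counts (illustrative only).
# Columns: water, juice, milk
observed = {
    "middle school": [30, 50, 20],
    "high school":   [60, 25, 15],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
column_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Expected count = row total * column total / grand total.
expected = [
    [r * c / grand_total for c in column_totals]
    for r in row_totals
]

# Sum (observed - expected)^2 / expected over every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(rows) - 1) * (len(column_totals) - 1)

# Closed-form upper-tail probability, valid only when degrees_of_freedom == 2.
assert degrees_of_freedom == 2
p_value = math.exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, "
      f"p-value = {p_value:.5f}")
```

With these made-up counts the p-value falls well below 0.001, so the table would be hard to square with independence.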
Notice what the test still does not say. It does not prove that grade band caused the drink choice. It does not tell you whether price, cafeteria layout, or menu timing shaped the pattern. It only says the split in the table is too uneven to shrug off as random noise under the null idea.
Common Misreads That Cause Trouble
Three mistakes show up again and again:
- Using percentages with no counts. The test runs on frequencies.
- Treating a small p-value as proof of cause. Association is not cause.
- Ignoring sparse cells. A shaky table can hand you a shaky answer.
There is also a naming snag. People often say “chi-square test” as if it were one single procedure. In practice, the phrase is a family label. The two most common members are goodness-of-fit and independence, and they answer different questions.
Why This Definition Matters In Practice
Once you grasp the definition, software output starts to feel less cryptic. You stop seeing χ², degrees of freedom, and p-value as random symbols and start reading them as a story about count data: what you saw, what you expected, and how far apart those two pictures are.
That clarity helps with classwork, research papers, dashboards, A/B tables, and market summaries. When the data are categorical and the cells are healthy, the chi-square test gives a tidy way to check whether a pattern is merely noisy or strong enough to deserve a closer look.
References & Sources
- NIST. “Chi-Square Goodness-of-Fit Test.” Defines the goodness-of-fit version of the test and shows how observed counts are compared with expected counts.
- Penn State STAT 200. “Lesson 11: Chi-Square Tests.” Explains when to use chi-square methods and how to read expected counts in categorical data tables.
- Penn State STAT 200. “11.3 – Chi-Square Test of Independence.” Shows the independence version of the test and the expected-count check used before reading the result.