A chi-square value comes from summing (Observed − Expected)² ÷ Expected across all cells, then pairing it with degrees of freedom to read a p-value.
Chi-square shows up any time you have counts and want to test whether a pattern looks like chance or like a real relationship. It’s used for contingency tables (two categorical variables), goodness-of-fit (one categorical variable vs a target split), and quick checks in QA logs.
This article walks through the math by hand, the setup that keeps results trustworthy, and the shortcuts that make sense once you know what the software is doing.
What Chi-Square Measures In Plain Terms
Chi-square starts with a simple idea: compare what you saw to what you would expect if “nothing special” was happening. If the gaps are small across the table, the chi-square value stays small. If the gaps stack up, the value grows.
Two pieces travel with the chi-square statistic:
- Degrees of freedom (df): how many cell counts can vary freely once the row and column totals are fixed. For an r×c table, df = (r−1)×(c−1).
- P-value: how often a chi-square value at least this large would appear if the null idea were true.
Chi-square does not tell you direction or practical impact. It answers a narrower question: are the deviations from “expected under the null” larger than what random sampling tends to create?
When Chi-Square Is The Right Tool
Use chi-square when your data are counts in categories, with each observation landing in one category per variable. Common setups:
- Test of independence: “Is variable A related to variable B?” using an r×c contingency table.
- Goodness-of-fit: “Do observed counts match a target split?” using one row of counts against expected proportions.
Avoid chi-square when entries are not counts (like averages), when the same person can land in multiple categories for the same variable, or when you are working with paired data that need a matched test.
Calculating Chi-Square For Real Tables
Here’s the standard workflow for an independence test. You can follow the same structure for goodness-of-fit with small edits.
Step 1: Build The Observed Table
Make a table of raw counts. Each cell is how many times that row category and column category occurred together.
Step 2: Compute Row Totals, Column Totals, And The Grand Total
These totals are the inputs for expected counts. Keep them visible; you will reuse them.
Step 3: Compute Expected Counts
For each cell, expected = (row total × column total) ÷ grand total. This is the “independence” prediction: the row share times the column share.
Step 4: Compute Each Cell’s Contribution
For each cell, compute (O − E)² ÷ E, where O is observed and E is expected. This value is always ≥ 0.
Step 5: Sum Contributions To Get χ²
Add every cell contribution. That sum is the chi-square statistic, written χ².
Step 6: Compute Degrees Of Freedom
For an r×c table, df = (r−1)×(c−1). For goodness-of-fit with k categories, df = k−1 (then subtract any parameters you estimated from the data).
Step 7: Get The P-Value
Use χ² and df with a chi-square distribution table or a stats function to compute the p-value. NIST’s overview of the chi-square distribution gives the core definition and context.
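Steps 1 through 7 can be sketched in Python as follows. This is a minimal illustration, assuming NumPy and SciPy are installed; the helper name `chi_square_independence` and the 2×2 counts are made up for the example and are not data from this article.

```python
import numpy as np
from scipy.stats import chi2


def chi_square_independence(observed):
    """Hand-style chi-square test of independence (hypothetical helper)."""
    obs = np.asarray(observed, dtype=float)           # Step 1: observed counts
    row_totals = obs.sum(axis=1, keepdims=True)       # Step 2: totals
    col_totals = obs.sum(axis=0, keepdims=True)
    grand_total = obs.sum()
    expected = row_totals * col_totals / grand_total  # Step 3: expected counts
    contributions = (obs - expected) ** 2 / expected  # Step 4: per-cell terms
    stat = contributions.sum()                        # Step 5: chi-square
    df = (obs.shape[0] - 1) * (obs.shape[1] - 1)      # Step 6: degrees of freedom
    p_value = chi2.sf(stat, df)                       # Step 7: upper-tail area
    return stat, df, p_value, expected


# Illustrative counts only (assumed for this sketch).
stat, df, p_value, expected = chi_square_independence([[20, 30], [30, 20]])
print(f"chi2={stat:.3f}, df={df}, p={p_value:.4f}")
```

With these counts every expected cell is 25, each contribution is 1, and the statistic is 4 on 1 degree of freedom. `chi2.sf` returns the area to the right of the statistic, which is the p-value the test uses.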
Data Checks That Prevent Bad Results
Most chi-square mistakes come from setup, not arithmetic. Run these checks before you trust the output.
Counts Only, One Slot Per Observation
Each person, item, or event should contribute to one cell per table. If your survey lets someone pick multiple answers, either restructure the question or treat each option as its own yes/no table.
Independence Of Observations
Repeated measures break the test. If the same customer shows up in the table many times, the chi-square value can look larger just because entries are tied together.
Expected Counts Should Not Be Tiny
A widely used rule of thumb is that expected counts in cells should be at least 5. SciPy’s documentation for chi2_contingency notes this common guideline and also shows options like a continuity correction for 2×2 tables.
If you have small expected counts, you can merge sparse categories, collect more data, or switch to an exact test suited to your table size.
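One way to run this check before testing is sketched below. It assumes SciPy is available and uses its `expected_freq` helper; the function name `small_expected_cells`, the threshold argument, and the counts are illustrative assumptions.

```python
import numpy as np
from scipy.stats.contingency import expected_freq


def small_expected_cells(observed, threshold=5.0):
    """Return expected counts and a boolean mask of cells below threshold."""
    expected = expected_freq(np.asarray(observed))
    return expected, expected < threshold


# Illustrative counts (assumed): the third row is a sparse category.
observed = [[30, 20], [25, 25], [3, 2]]
expected, too_small = small_expected_cells(observed)
print(expected.round(2))
print("cells below 5:", int(too_small.sum()))

# One remedy from the text: merge the sparse row into a neighbor and re-check.
merged = [[30, 20], [25 + 3, 25 + 2]]
_, merged_small = small_expected_cells(merged)
print("cells below 5 after merging:", int(merged_small.sum()))
```

Here the sparse row produces two expected counts under 5, and merging it into the second row clears both. Whether merging makes sense depends on whether the combined category still means something in your data.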
Missing Data Needs A Clear Rule
Don’t “half count” missing values. Either drop rows with missing categories for the variables in the table, or add an explicit “Unknown” category and treat it as a real level.
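The two rules can be compared side by side with a small sketch. It assumes pandas is installed; the column names and the six rows are hypothetical.

```python
import pandas as pd

# Hypothetical survey rows; None marks a missing answer.
data = pd.DataFrame({
    "source": ["Search", "Social", None, "Email", "Search", "Social"],
    "outcome": ["Signup", None, "Signup", "No signup", "No signup", "Signup"],
})

# Rule A: drop any row missing either variable, then tabulate.
complete = data.dropna(subset=["source", "outcome"])
table_dropped = pd.crosstab(complete["source"], complete["outcome"])

# Rule B: keep every row by making "Unknown" an explicit level.
filled = data.fillna("Unknown")
table_unknown = pd.crosstab(filled["source"], filled["outcome"])

print(table_dropped.to_numpy().sum())  # rows kept under Rule A
print(table_unknown.to_numpy().sum())  # rows kept under Rule B
```

Rule A keeps four of the six rows and Rule B keeps all six. Whichever rule you pick, state it in the write-up so the table's grand total can be reconciled with the raw data.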
Worked Example With A Full Hand Calculation
Example: you tracked three traffic sources (Search, Social, Email) and two outcomes (Signup, No signup) for a week. Your observed table is:
- Search: 40 signup, 60 no signup
- Social: 30 signup, 70 no signup
- Email: 50 signup, 50 no signup
Totals: each row total is 100, grand total is 300. Column totals are 120 signup and 180 no signup.
Expected Counts
For Search×Signup: E = (100×120)÷300 = 40. For Search×No signup: E = (100×180)÷300 = 60. Because every row total is 100, each source has the same expected row: 40 signup and 60 no signup.
Cell Contributions
Search matches expected exactly, so its two contributions are 0. Social has O=30 vs E=40 for signup: (30−40)²÷40 = 100÷40 = 2.5. Social no signup is O=70 vs E=60: (70−60)²÷60 = 100÷60 ≈ 1.6667. Email signup is O=50 vs E=40: 100÷40 = 2.5. Email no signup is O=50 vs E=60: 100÷60 ≈ 1.6667.
Sum: χ² = 2.5 + 1.6667 + 2.5 + 1.6667 ≈ 8.333. The exact value is 25÷3; carry full precision through the steps and round only at the end.
Degrees Of Freedom And P-Value
This is a 3×2 table, so df = (3−1)×(2−1) = 2. With df=2, χ²≈8.33 gives a p-value of about 0.016, which falls between the 0.01 and 0.025 columns of most printed tables.
That result says the observed split between signup and no signup differs across sources more than random sampling usually produces under the “no relationship” idea.
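The hand calculation above can be replayed in plain Python as a check. This sketch uses only the standard library. The closed-form p-value on the last step is a shortcut that holds only when df = 2, where the chi-square upper-tail probability equals exp(−χ²÷2); for other df, use a stats function or table.

```python
import math

# Observed counts from the worked example: (signup, no signup) per source.
observed = {"Search": (40, 60), "Social": (30, 70), "Email": (50, 50)}

row_totals = {src: sum(cells) for src, cells in observed.items()}
col_totals = [sum(cells[j] for cells in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

chi_square = 0.0
for src, cells in observed.items():
    for j, o in enumerate(cells):
        e = row_totals[src] * col_totals[j] / grand_total  # expected count
        chi_square += (o - e) ** 2 / e                      # cell contribution

df = (len(observed) - 1) * (2 - 1)

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"chi2={chi_square:.4f}, df={df}, p={p_value:.4f}")
```

The printed values should agree with the hand result: χ² near 8.3333, df of 2, and a p-value near 0.0155.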
Table Of Common Choices And What They Change
Use this table as a checklist when you’re setting up a chi-square test. It flags decisions that change results, plus the clean default for each one.
| Decision Point | Good Default | What To Watch |
|---|---|---|
| Test type | Independence for two variables | Use goodness-of-fit only for one variable vs target proportions |
| Table entries | Raw counts | No percentages, averages, or weighted scores |
| Expected count size | Most cells ≥ 5 | Merge sparse levels or use an exact method when many cells are small |
| Degrees of freedom | (r−1)×(c−1) | Adjust df in goodness-of-fit when parameters are estimated from data |
| 2×2 continuity correction | Off unless required | Yates correction lowers χ² a bit; many tools enable it by default |
| P-value source | Software or a trusted table | The p-value is the upper-tail area of the chi-square distribution; confirm your tool or table reports that tail |
| Effect size report | Cramér’s V | Report with sample size; a tiny V can still produce a small p-value in big samples |
| Post-hoc follow-up | Standardized residuals | Control for multiple comparisons if you test many cells |
Reading The Output Without Getting Tricked
Software prints χ², df, and a p-value. That’s the easy part. The harder part is telling a useful story that stays honest.
Start With The Table, Not The P-Value
Look at row percentages. Ask where the biggest gaps are. If the table barely changes from row to row, even a small p-value may not matter in practice, especially with a large sample.
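A quick way to see where the gaps sit is to print row percentages next to Pearson residuals, (O − E) ÷ √E. The sketch below assumes NumPy and reuses the counts from the worked example. Pearson residuals are only one kind of residual; the adjusted standardized version that some packages report divides by a further term and is not shown here.

```python
import numpy as np

# Counts from the worked example: rows are Search, Social, Email;
# columns are signup, no signup.
observed = np.array([[40, 60], [30, 70], [50, 50]], dtype=float)

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

row_percent = 100 * observed / row_totals            # share of each row's total
pearson_resid = (observed - expected) / np.sqrt(expected)

print(row_percent)
print(pearson_resid.round(2))
```

The signup rates come out at 40%, 30%, and 50%. The Search row has residuals of zero, while Social and Email pull in opposite directions. Squaring the Pearson residuals and summing them gives back χ², so each residual shows how much its cell contributes.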
Use An Effect Size When You Can
Cramér’s V is a common pick for independence tests. It scales from 0 to 1 and helps you compare results across datasets with different sample sizes. You can compute it from χ², the total n, and the smaller of (r−1) or (c−1).
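Written out, the formula is V = √(χ² ÷ (n × min(r−1, c−1))). A minimal sketch with the standard library only follows; the function name `cramers_v` is an illustrative choice, and the inputs are the worked example's values.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V from chi-square, sample size, and table shape."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * k))


# Values from the worked example: chi-square of 25/3 on a 3x2 table, n = 300.
v = cramers_v(25 / 3, 300, 3, 2)
print(round(v, 3))
```

For the worked example this gives V of about 0.167, a modest association even though the p-value is small.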
Know What A Small Expected Count Does
Cells with tiny expected counts can inflate χ². If you can’t merge categories, switch to an exact test instead. OpenStax notes the idea of combining categories when expected values are below five in classroom data collection.
Shortcuts In Excel, Google Sheets, And Python
Once you’ve done one hand calculation, a spreadsheet feels less like a black box. Here are clean ways to compute χ² and the p-value.
Spreadsheet Setup
- Create your observed table.
- Add row totals and column totals.
- Compute expected with (row total * column total) / grand total.
- Compute contributions with (O−E)^2 / E and sum them.
Many spreadsheets also offer a direct chi-square test function. Still, building the expected table once helps you spot category problems early.
Python With SciPy
In Python, you can pass your observed array to SciPy’s chi2_contingency function and get χ², p-value, df, and expected counts in one call. The SciPy docs list outputs and options such as Yates correction and resampling-based p-values.
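A short sketch of that call, assuming NumPy and SciPy are installed. The first table is the worked example; the 2×2 counts are illustrative and are there only to show the `correction` switch.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts from the worked example (rows: Search, Social, Email).
observed = np.array([[40, 60], [30, 70], [50, 50]])
stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2={stat:.4f}, p={p_value:.4f}, df={dof}")

# Illustrative 2x2 counts (assumed) to show the Yates correction switch.
# SciPy applies the correction by default when df is 1.
two_by_two = np.array([[20, 30], [30, 20]])
stat_yates, _, _, _ = chi2_contingency(two_by_two)  # correction=True by default
stat_plain, _, _, _ = chi2_contingency(two_by_two, correction=False)
print(f"with Yates: {stat_yates:.2f}, without: {stat_plain:.2f}")
```

The 3×2 result should match the hand calculation. On the 2×2 table the default call gives a smaller statistic (3.24) than the uncorrected call (4.00), so report which setting you used.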
Table Of Common Reporting Lines
If you write up results for a report, you usually need one sentence with the test, degrees of freedom, sample size, and p-value, plus a quick description of the pattern. Here are templates you can adapt.
| Use Case | Template Sentence | Add-On |
|---|---|---|
| Independence test | χ²(df, n)=value, p=value; outcome rates differ across groups. | Add Cramér’s V for scale. |
| Goodness-of-fit | χ²(df, n)=value, p=value; observed counts differ from the target split. | Name the target proportions. |
| 2×2 table | χ²(df, n)=value, p=value with correction on/off. | State if Yates correction was used. |
| Sparse categories | χ²(df, n)=value, p=value after merging levels. | List the merged levels. |
| Large tables | χ²(df, n)=value, p=value; biggest gaps were in cells X and Y. | Show residuals or a heatmap in the appendix. |
Goodness-Of-Fit: The One-Variable Version
Goodness-of-fit tests one categorical variable against a target split. The math is the same sum of (O−E)²÷E, with expected counts built from target proportions. Degrees of freedom is the number of categories minus one, then minus any parameters estimated from the sample.
If you’re matching a published standard, link the standard in your own write-up, then copy the proportions into your expected counts. NIST also provides a chi-square test reference that shows the test framed around contingency tables and reporting.
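A goodness-of-fit test can be sketched with SciPy’s `chisquare` function. The counts and the 50/30/20 target split below are invented for illustration; replace them with your own data and published proportions.

```python
from scipy.stats import chisquare

# Illustrative data (assumed): 100 observations against a 50/30/20 target split.
observed = [45, 35, 20]
target_proportions = [0.5, 0.3, 0.2]
n = sum(observed)
expected = [p * n for p in target_proportions]

# ddof=0 gives df = k - 1; raise ddof by one per parameter estimated from data.
result = chisquare(f_obs=observed, f_exp=expected, ddof=0)
print(f"chi2={result.statistic:.4f}, p={result.pvalue:.4f}")
```

The expected counts are 50, 30, and 20, so χ² = 25÷50 + 25÷30 + 0 ≈ 1.333 on 2 degrees of freedom, with a p-value near 0.51. Note that `chisquare` expects the observed and expected counts to have the same total.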
Checklist You Can Use Before You Hit Run
- My table uses counts, not rates.
- Each observation lands in one cell per variable.
- Most expected counts are at least 5, or I have merged sparse levels.
- I wrote df correctly for the table shape.
- I will report the pattern in the table, not just a p-value.
References & Sources
- NIST/SEMATECH e-Handbook. “Chi-Square Distribution.” Defines the chi-square distribution used to map χ² and df to a p-value.
- SciPy. “chi2_contingency.” Documents χ² computation, df, p-value, and expected frequencies for contingency tables.
- OpenStax. “Test of Independence.” Explains contingency tables, expected counts, and the cell-size rule of thumb.
- NIST DataPlot Reference Manual. “CHI-SQUARE INDEPENDENCE TEST (LET).” Reference description of a chi-square independence test and its inputs.