Calculating Chi-Square | Get The Numbers Right

A chi-square value comes from summing (Observed − Expected)² ÷ Expected across all cells, then pairing it with degrees of freedom to read a p-value.

Chi-square shows up any time you have counts and want to test whether a pattern looks like chance or like a real relationship. It’s used for contingency tables (two categorical variables), goodness-of-fit (one categorical variable vs a target split), and quick checks in QA logs.

This article walks through the math by hand, the setup that keeps results trustworthy, and the shortcuts that make sense once you know what the software is doing.

What Chi-Square Measures In Plain Terms

Chi-square starts with a simple idea: compare what you saw to what you would expect if “nothing special” was happening. If the gaps are small across the table, the chi-square value stays small. If the gaps stack up, the value grows.

Two pieces travel with the chi-square statistic:

Degrees of freedom (df): the table’s flexibility. For an r×c table, df = (r−1)×(c−1).
P-value: how often a chi-square value at least this large would appear if the null hypothesis (the “nothing special” assumption) were true.

Chi-square does not tell you direction or practical impact. It answers a narrower question: are the deviations from “expected under the null” larger than what random sampling tends to create?

When Chi-Square Is The Right Tool

Use chi-square when your data are counts in categories, with each observation landing in one category per variable. Common setups:

Test of independence: “Is variable A related to variable B?” using an r×c contingency table.
Goodness-of-fit: “Do observed counts match a target split?” using one row of counts against expected proportions.

Avoid chi-square when entries are not counts (like averages), when the same person can land in multiple categories for the same variable, or when you are working with paired data that need a matched test.

Calculating Chi-Square For Real Tables

Here’s the standard workflow for an independence test. You can follow the same structure for goodness-of-fit with small edits.

Step 1: Build The Observed Table

Make a table of raw counts. Each cell is how many times that row category and column category occurred together.

Step 2: Compute Row Totals, Column Totals, And The Grand Total

These totals are the inputs for expected counts. Keep them visible; you will reuse them.

Step 3: Compute Expected Counts

For each cell, expected = (row total × column total) ÷ grand total. This is the “independence” prediction: the row share times the column share.
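As a minimal sketch of this step, the snippet below builds the expected table from the margins using plain Python lists. The counts and the helper name expected_counts are made up for illustration; they are not from any dataset in this article.

```python
# Hypothetical 2x2 table of raw counts: rows are groups, columns are outcomes.
observed = [
    [20, 30],
    [30, 20],
]

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [
        [r * c / grand_total for c in col_totals]
        for r in row_totals
    ]

expected = expected_counts(observed)
print(expected)  # [[25.0, 25.0], [25.0, 25.0]]
```

Each expected row keeps the same row total as the observed row, which is a quick sanity check on the arithmetic.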

Step 4: Compute Each Cell’s Contribution

For each cell, compute (O − E)² ÷ E, where O is observed and E is expected. This value is always ≥ 0.

Step 5: Sum Contributions To Get χ²

Add every cell contribution. That sum is the chi-square statistic, written χ².
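Steps 4 and 5 can be sketched in a few lines. The observed and expected values here are illustrative numbers chosen so the arithmetic is easy to follow by eye.

```python
# Hypothetical observed counts and their expected counts under independence.
observed = [[20, 30], [30, 20]]
expected = [[25.0, 25.0], [25.0, 25.0]]

# Per-cell contribution: (O - E)^2 / E. Every value is >= 0.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The chi-square statistic is the sum over all cells.
chi_square = sum(sum(row) for row in contributions)

print(contributions)  # [[1.0, 1.0], [1.0, 1.0]]
print(chi_square)     # 4.0
```

Keeping the per-cell contributions around, rather than only the total, makes it easy to see which cells drive the result.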

Step 6: Compute Degrees Of Freedom

For an r×c table, df = (r−1)×(c−1). For goodness-of-fit with k categories, df = k−1 (then subtract any parameters you estimated from the data).
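The two df rules are short enough to write as helpers. The function names below are hypothetical, picked for this sketch.

```python
def df_independence(n_rows, n_cols):
    """Degrees of freedom for an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)

def df_goodness_of_fit(n_categories, n_estimated_params=0):
    """Degrees of freedom for goodness-of-fit: k - 1, minus parameters estimated from the data."""
    return n_categories - 1 - n_estimated_params

print(df_independence(3, 2))     # 2
print(df_goodness_of_fit(6))     # 5
print(df_goodness_of_fit(6, 1))  # 4
```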

Step 7: Get The P-Value

Use χ² and df with a chi-square distribution table or a stats function to compute the p-value. NIST’s overview of the chi-square distribution gives the core definition and context.
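If SciPy is available, the upper-tail probability comes from the survival function of the chi-square distribution. This sketch assumes a statistic of 4.0 with 1 degree of freedom, chosen only as an example.

```python
from scipy.stats import chi2

chi_square = 4.0  # example statistic
df = 1            # example degrees of freedom

# sf is the survival function: P(X >= chi_square) for a chi-square variable with df degrees of freedom.
p_value = chi2.sf(chi_square, df)
print(round(p_value, 4))  # 0.0455
```

Using sf rather than 1 - cdf avoids losing precision when the p-value is very small.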

Data Checks That Prevent Bad Results

Most chi-square mistakes come from setup, not arithmetic. Run these checks before you trust the output.

Counts Only, One Slot Per Observation

Each person, item, or event should contribute to one cell per table. If your survey lets someone pick multiple answers, either restructure the question or treat each option as its own yes/no table.

Independence Of Observations

Repeated measures break the test. If the same customer shows up in the table many times, the chi-square value can look larger just because entries are tied together.

Expected Counts Should Not Be Tiny

A widely used rule of thumb is that expected counts in cells should be at least 5. SciPy’s chi2_contingency documentation notes this common guideline and also shows options like a continuity correction for 2×2 tables.

If you have small expected counts, you can merge sparse categories, collect more data, or switch to an exact test suited to your table size.
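One way to make this check routine is to list every cell whose expected count falls under the threshold. The helper name, the threshold default, and the expected table below are assumptions for this sketch.

```python
def small_expected_cells(expected, threshold=5.0):
    """Return (row index, column index, expected count) for each cell below the threshold."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]

# Hypothetical expected table (row totals 15 and 35, column totals 40 and 10).
expected = [
    [12.0, 3.0],
    [28.0, 7.0],
]

flagged = small_expected_cells(expected)
print(flagged)  # [(0, 1, 3.0)]
```

A non-empty list is the cue to merge levels, gather more data, or move to an exact test.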

Missing Data Needs A Clear Rule

Don’t “half count” missing values. Either drop rows with missing categories for the variables in the table, or add an explicit “Unknown” category and treat it as a real level.
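Both rules are easy to apply before you count anything. The sketch below uses made-up records where None marks a missing answer; the variable names are illustrative.

```python
from collections import Counter

# Hypothetical raw records: (traffic source, outcome). None means the value is missing.
records = [
    ("Search", "Signup"),
    ("Social", None),
    ("Email", "No signup"),
    (None, "Signup"),
]

# Rule A: drop any record that is missing either variable.
complete = [r for r in records if None not in r]

# Rule B: keep every record and give missing values an explicit "Unknown" level.
labeled = [tuple("Unknown" if v is None else v for v in r) for r in records]

counts_dropped = Counter(complete)
counts_unknown = Counter(labeled)

print(len(complete))                          # 2
print(counts_unknown[("Social", "Unknown")])  # 1
```

Whichever rule you pick, apply it to the whole dataset and state it in the write-up.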

Worked Example With A Full Hand Calculation

Example: you tracked three traffic sources (Search, Social, Email) and two outcomes (Signup, No signup) for a week. Your observed table is:

Search: 40 signup, 60 no signup
Social: 30 signup, 70 no signup
Email: 50 signup, 50 no signup

Totals: each row total is 100, grand total is 300. Column totals are 120 signup and 180 no signup.

Expected Counts

For Search×Signup: E = (100×120)÷300 = 40. For Search×No signup: E = (100×180)÷300 = 60. Because every row total is 100, the expected row is the same for each source: 40 signup and 60 no signup.

Cell Contributions

Search matches expected exactly, so its two contributions are 0. Social has O=30 vs E=40 for signup: (30−40)²÷40 = 100÷40 = 2.5. Social no signup is O=70 vs E=60: (70−60)²÷60 = 100÷60 ≈ 1.6667. Email signup is O=50 vs E=40: 100÷40 = 2.5. Email no signup is O=50 vs E=60: 100÷60 ≈ 1.6667.

Sum: χ² = 2.5 + 1.6667 + 2.5 + 1.6667 ≈ 8.3333 (the exact value is 25÷3; round only at the end).

Degrees Of Freedom And P-Value

This is a 3×2 table, so df = (3−1)×(2−1) = 2. With df=2, χ²≈8.33 gives a p-value of about 0.016 in most tables or calculators.
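You can double-check the hand calculation with a short script that needs only the standard library. One assumption to flag: the p-value line uses the closed form exp(−χ²÷2), which is valid only when df = 2; for any other df, use a chi-square table or a stats function instead.

```python
import math

# Observed counts from the worked example: [signup, no signup] per source.
observed = {
    "Search": [40, 60],
    "Social": [30, 70],
    "Email": [50, 50],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over every cell.
chi_square = 0.0
for row, row_total in zip(rows, row_totals):
    for o, col_total in zip(row, col_totals):
        e = row_total * col_total / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(rows) - 1) * (len(col_totals) - 1)

# Closed form that holds only for df == 2: P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 4), df, round(p_value, 4))  # 8.3333 2 0.0155
```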

That result says the observed split between signup and no signup differs across sources more than random sampling usually produces under the “no relationship” idea.

Table Of Common Choices And What They Change

Use this table as a checklist when you’re setting up a chi-square test. It flags decisions that change results, plus the clean default for each one.

Decision Point	Good Default	What To Watch
Test type	Independence for two variables	Use goodness-of-fit only for one variable vs target proportions
Table entries	Raw counts	No percentages, averages, or weighted scores
Expected count size	Most cells ≥ 5	Merge sparse levels or use an exact method when many cells are small
Degrees of freedom	(r−1)×(c−1)	Adjust df in goodness-of-fit when parameters are estimated from data
2×2 continuity correction	Off unless required	Yates correction lowers χ² a bit; many tools enable it by default
P-value source	Software or a trusted table	Read the upper-tail probability; χ² is a sum of squares, so the test has no separate one-sided version
Effect size report	Cramér’s V	Report with sample size; a tiny V can still produce a small p-value in big samples
Post-hoc follow-up	Standardized residuals	Control for multiple comparisons if you test many cells
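For the post-hoc row in the table above, a simple starting point is the Pearson residual, (O − E) ÷ √E, whose square is that cell's contribution to χ². This is a sketch under that definition; some tools report adjusted residuals instead, which also account for the row and column shares, so check which one your software prints. The counts are the ones from the worked example.

```python
import math

# Observed counts and expected counts from the worked example.
observed = [[40, 60], [30, 70], [50, 50]]
expected = [[40.0, 60.0], [40.0, 60.0], [40.0, 60.0]]

# Pearson residual per cell: sign shows direction, size shows how far from expected.
residuals = [
    [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

for row in residuals:
    print([round(r, 2) for r in row])
# [0.0, 0.0]
# [-1.58, 1.29]
# [1.58, -1.29]
```

Unlike the squared contributions, residuals keep their sign, so you can see that Social is below expected on signups and Email is above.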

Reading The Output Without Getting Tricked

Software prints χ², df, and a p-value. That’s the easy part. The harder part is telling a useful story that stays honest.

Start With The Table, Not The P-Value

Look at row percentages. Ask where the biggest gaps are. If the table barely changes from row to row, even a small p-value may not matter in practice, especially with a large sample.
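Row percentages take one line per row. This sketch reuses the counts from the worked example; the dictionary layout is just one convenient way to hold them.

```python
# Observed counts from the worked example: [signup, no signup] per source.
observed = {
    "Search": [40, 60],
    "Social": [30, 70],
    "Email": [50, 50],
}

# Convert each row to percentages of its own row total.
row_pct = {
    source: [100 * c / sum(counts) for c in counts]
    for source, counts in observed.items()
}

for source, pcts in row_pct.items():
    print(source, pcts)
# Search [40.0, 60.0]
# Social [30.0, 70.0]
# Email [50.0, 50.0]
```

Here the signup rate runs from 30% to 50% across sources, which is the gap the test is reacting to.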

Use An Effect Size When You Can

Cramér’s V is a common pick for independence tests. It scales from 0 to 1 and helps you compare results across datasets with different sample sizes. You can compute it from χ², the total n, and the smaller of (r−1) or (c−1): V = √(χ² ÷ (n × min(r−1, c−1))).
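As a quick sketch, here is Cramér’s V for the worked example (χ² = 25÷3, n = 300, a 3×2 table). The function name is hypothetical.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V: sqrt(chi_square / (n * min(r - 1, c - 1)))."""
    k = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * k))

v = cramers_v(25 / 3, 300, 3, 2)
print(round(v, 4))  # 0.1667
```

A V near 0.17 alongside p ≈ 0.016 is a reminder that a small p-value and a large effect are different things.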

Know What A Small Expected Count Does

Cells with tiny expected counts can inflate χ², because dividing by a small E magnifies even a modest gap. If you can’t merge categories, switch to an exact test. OpenStax’s test of independence section notes the idea of combining categories when expected values are below five in classroom data collection.

Shortcuts In Excel, Google Sheets, And Python

Once you’ve done one hand calculation, a spreadsheet feels less like a black box. Here are clean ways to compute χ² and the p-value.

Spreadsheet Setup

Create your observed table.
Add row totals and column totals.
Compute expected with (row total * column total) / grand total.
Compute contributions with (O−E)^2 / E and sum them.

Many spreadsheets also offer a direct chi-square test function. Still, building the expected table once helps you spot category problems early.

Python With SciPy

In Python, you can pass your observed array to SciPy’s chi2_contingency function and get χ², p-value, df, and expected counts in one call. The SciPy docs list outputs and options such as Yates correction and resampling-based p-values.
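A minimal sketch of that call, assuming SciPy is installed. The 2×2 counts are made up; a 2×2 table is used because that is where the continuity correction applies, and chi2_contingency turns it on by default.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of raw counts.
observed = [[20, 30], [30, 20]]

# Without the Yates continuity correction.
stat_plain, p_plain, dof, expected = chi2_contingency(observed, correction=False)

# With SciPy's default (correction=True), which only changes tables with df = 1.
stat_yates, p_yates, _, _ = chi2_contingency(observed)

print(round(stat_plain, 4), dof)  # 4.0 1
print(round(stat_yates, 4))       # 3.24
print(expected.tolist())          # [[25.0, 25.0], [25.0, 25.0]]
```

The corrected statistic is smaller, so its p-value is larger. Say which setting you used when you report a 2×2 result.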

Table Of Common Reporting Lines

If you write up results for a report, you usually need one sentence with the test, degrees of freedom, sample size, and p-value, plus a quick description of the pattern. Here are templates you can adapt.

Use Case	Template Sentence	Add-On
Independence test	χ²(df, n)=value, p=value; outcome rates differ across groups.	Add Cramér’s V for scale.
Goodness-of-fit	χ²(df, n)=value, p=value; observed counts differ from the target split.	Name the target proportions.
2×2 table	χ²(df, n)=value, p=value with correction on/off.	State if Yates correction was used.
Sparse categories	χ²(df, n)=value, p=value after merging levels.	List the merged levels.
Large tables	χ²(df, n)=value, p=value; biggest gaps were in cells X and Y.	Show residuals or a heatmap in the appendix.

Goodness-Of-Fit: The One-Variable Version

Goodness-of-fit tests one categorical variable against a target split. The math is the same sum of (O−E)²÷E, with expected counts built from target proportions (E = n × target proportion for each category). The degrees of freedom equal the number of categories minus one, minus any parameters estimated from the sample.
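Here is a short goodness-of-fit sketch using SciPy’s chisquare function. The observed counts and the 50/30/20 target split are invented for illustration.

```python
from scipy.stats import chisquare

# Hypothetical observed counts for three categories.
observed = [33, 15, 12]

# Hypothetical target split; expected counts are n times each target proportion.
target = [0.5, 0.3, 0.2]
n = sum(observed)
expected = [n * p for p in target]

# df defaults to k - 1; pass ddof=<number of estimated parameters> to reduce it further.
result = chisquare(f_obs=observed, f_exp=expected)

print(round(result.statistic, 4))  # 0.8
print(round(result.pvalue, 4))     # 0.6703
```

The expected counts must add up to the same total as the observed counts, which building them from n guarantees.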

If you’re matching a published standard, link the standard in your own write-up, then copy the proportions into your expected counts. NIST also provides a chi-square test reference that shows the test framed around contingency tables and reporting.

Checklist You Can Use Before You Hit Run

My table uses counts, not rates.
Each observation lands in one cell per variable.
Most expected counts are at least 5, or I have merged sparse levels.
I wrote df correctly for the table shape.
I will report the pattern in the table, not just a p-value.

References & Sources

NIST/SEMATECH e-Handbook. “Chi-Square Distribution.” Defines the chi-square distribution used to map χ² and df to a p-value.
SciPy. “chi2_contingency.” Documents χ² computation, df, p-value, and expected frequencies for contingency tables.
OpenStax. “Test of Independence.” Explains contingency tables, expected counts, and the cell-size rule of thumb.
NIST DataPlot Reference Manual. “CHI-SQUARE INDEPENDENCE TEST (LET).” Reference description of a chi-square independence test and its inputs.