Section author: Danielle J. Navarro and David R. Foxcroft


The key ideas discussed in this chapter are:

  • The χ² (chi-square) goodness-of-fit test is used when you have a table of observed frequencies of different categories, and the null hypothesis gives you a set of “known” probabilities to compare them to.
  • The χ² (chi-square) test of independence is used when you have a contingency table (cross-tabulation) of two categorical variables. The null hypothesis is that there is no relationship or association between the variables.
  • Effect size for a contingency table can be measured in several ways. In particular, we discussed the Cramér’s V statistic.
  • Both versions of the Pearson test rely on two assumptions: that the expected frequencies are sufficiently large, and that the observations are independent. The Fisher exact test can be used when the expected frequencies are small. The McNemar test can be used for some kinds of violations of independence, such as paired (repeated-measures) 2 × 2 data.
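The two Pearson tests and Cramér’s V summarised above can be sketched in plain Python. This is a minimal illustration, not the jamovi implementation the chapter relies on. The function names (`chi2_sf`, `goodness_of_fit`, `independence`) and the counts in the usage example are hypothetical and were invented for illustration. The p-value helper assumes integer degrees of freedom and uses the standard recurrence for the regularised upper incomplete gamma function. No continuity correction is applied.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) for the
    regularised upper incomplete gamma function Q, with y = x / 2 and a = df / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def goodness_of_fit(observed, probs):
    """Pearson chi-square goodness-of-fit test against known null probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)


def independence(table):
    """Pearson chi-square test of independence for an r x c contingency table.

    Also returns Cramer's V and the smallest expected count, because the test
    assumes the expected frequencies are sufficiently large.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # Expected count under independence: row total * column total / N
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    r, c = len(table), len(table[0])
    df = (r - 1) * (c - 1)
    v = math.sqrt(stat / (n * (min(r, c) - 1)))  # Cramer's V
    min_expected = min(min(row) for row in expected)
    return stat, df, chi2_sf(stat, df), v, min_expected


if __name__ == "__main__":
    # Hypothetical counts, invented for illustration only.
    stat, df, p = goodness_of_fit([35, 51, 64, 50], [0.25] * 4)
    print(f"goodness of fit: X2 = {stat:.2f}, df = {df}, p = {p:.3f}")

    stat, df, p, v, min_e = independence([[10, 20, 30], [20, 20, 20]])
    print(f"independence: X2 = {stat:.2f}, df = {df}, p = {p:.3f}, "
          f"V = {v:.3f}, smallest expected count = {min_e:.1f}")
```

In practice a library routine such as `scipy.stats.chisquare` or `scipy.stats.chi2_contingency` does the same job. Note that `chi2_contingency` applies a continuity correction by default when the table has one degree of freedom, so its results can differ from this sketch for 2 × 2 tables.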

If you’re interested in learning more about categorical data analysis, a good first choice would be Agresti (2018), which, as the title suggests, provides an Introduction to Categorical Data Analysis. If the introductory book isn’t enough for you (or can’t solve the problem you’re working on), you could consider Agresti (2012), Categorical Data Analysis. The latter is a more advanced text, so it’s probably not wise to jump straight from this book to that one.