*Section author: Danielle J. Navarro and David R. Foxcroft*

# Effect size¶

The most commonly used measure of effect size for a *t*-test is
**Cohen’s d** (Cohen, 1988).
It’s a very simple measure in principle, with quite a few wrinkles when
you start digging into the details. Cohen himself defined it primarily
in the context of an independent samples *t*-test, specifically the
Student test. In that context, a natural way of defining the effect size
is to divide the difference between the means by an estimate of the
standard deviation. In other words, we’re looking to calculate
*something* along the lines of this:

and he suggested a rough guide for interpreting d in Table 12.
You’d think that this would
be pretty unambiguous, but it’s not. This is largely because Cohen
wasn’t too specific on what he thought should be used as the measure of
the standard deviation (in his defence he was trying to make a broader
point in his book, not nitpick about tiny details). As discussed by
McGrath and Meyer (2006), there are
several different versions in common usage, and each author tends to
adopt slightly different notation. For the sake of simplicity (as opposed
to accuracy), I’ll use d to refer to any statistic that you calculate from
the sample, and use δ to refer to a theoretical population effect. Obviously,
that does mean that there are several different things all called *d*.

My suspicion is that the only time that you would want Cohen’s *d*
is when you’re running a *t*-test, and jamovi has an option to
calculate the effect size for all the different flavours of
*t*-test it provides.

d-value | rough interpretation |
---|---|

about 0.2 | “small” effect |

about 0.5 | “moderate” effect |

about 0.8 | “large” effect |

## Cohen’s *d* from one sample¶

The simplest situation to consider is the one corresponding to a one-sample
*t*-test. In this case, this is the one sample mean *X̄* and one (hypothesised)
population mean *µ*_{o} to compare it to. Not only that, there’s really
only one sensible way to estimate the population standard deviation. We just
use our usual estimate \(\hat{\sigma}\). Therefore, we end up with the
following as the only way to calculate d:

When we look back at the results in Fig. 87, the effect size
value is Cohen’s *d* = 0.50. Overall, then, the psychology students in Dr
Zeppo’s class are achieving grades (mean = 72.3%) that are about 0.5 standard
deviations higher than the level that you’d expect (67.5%) if they were
performing at the same level as other students. Judged against Cohen’s rough
guide, this is a moderate effect size.

## Cohen’s *d* from a Student’s *t*-test¶

The majority of discussions of Cohen’s *d* focus on a situation that is
analogous to Student’s independent samples t test, and it’s in this context
that the story becomes messier, since there are several different versions of
*d* that you might want to use in this situation. To understand why there are
multiple versions of *d*, it helps to take the time to write down a formula
that corresponds to the true population effect size δ. It’s pretty
straightforward, δ = (*µ*_sub:1 - *µ*_sub:2) /

where, as usual, *µ*_sub:1 and *µ*_sub:2 are the population
means corresponding to group 1 and group 2 respectively, and
*σ* is the standard deviation (the same for both
populations). The obvious way to estimate δ is to do
exactly the same thing that we did in the *t*-test itself, i.e.,
use the sample means as the top line and a pooled standard deviation
estimate for the bottom line

where \(\hat\sigma_p\) is the exact same pooled standard deviation
measure that appears in the *t*-test. This is the most commonly
used version of Cohen’s *d* when applied to the outcome of a
Student *t*-test, and is the one provided in jamovi. It is
sometimes referred to as Hedges’ *g* statistic (Hedges, 1981).

However, there are other possibilities which I’ll briefly describe.
Firstly, you may have reason to want to use only one of the two groups
as the basis for calculating the standard deviation. This approach
(often called Glass’ *Δ*, pronounced *delta*) only makes most
sense when you have good reason to treat one of the two groups as a
purer reflection of “natural variation” than the other. This can happen
if, for instance, one of the two groups is a control group. Secondly,
recall that in the usual calculation of the pooled standard deviation we
divide by *N* - 2 to correct for the bias in the sample variance. In
one version of Cohen’s *d* this correction is omitted, and instead
we divide by *N*. This version makes sense primarily when you’re
trying to calculate the effect size in the sample rather than estimating
an effect size in the population. Finally, there is a version based on
Hedges and Olkin (1985), who point out
there is a small bias in the usual (pooled) estimation for Cohen’s *d*.
Thus they introduce a small correction by multiplying the usual value of
*d* by (*N* - 3) / (*N* - 2.25).

In any case, ignoring all those variations that you could make use of if you
wanted, let’s have a look at the default version in jamovi. In
Fig. 91 Cohen’s *d* = 0.74, indicating that the grade scores
for students in Anastasia’s class are, on average, 0.74 standard deviations
higher than the grade scores for students in Bernadette’s class. For a
Welch-test, the estimated effect size is the same (Fig. 93).

## Cohen’s *d* from a paired-samples test¶

Finally, what should we do for a paired samples *t*-test? In this
case, the answer depends on what it is you’re trying to do. jamovi
assumes that you want to measure your effect sizes relative to the
distribution of difference scores, and the measure of d that you
calculate is:

where \(\hat{\sigma}_D\) is the estimate of the standard deviation
of the differences. In Fig. 97 Cohen’s *d* = 1.45,
indicating that the time 2 grade scores are, on average, 1.45 standard
deviations higher than the time 1 grade scores.

This is the version of Cohen’s *d* that gets reported by the
jamovi `Paired Samples T-Test`

analysis. The only wrinkle is figuring
out whether this is the measure you want or not. To the extent that you
care about the practical consequences of your research, you often want
to measure the effect size relative to the *original* variables, not the
*difference* scores (e.g., the 1% improvement in Dr Chico’s class over
time is pretty small when measured against the amount of between-student
variation in grades), in which case you use the same versions of Cohen’s
d that you would use for a Student or Welch test. It’s not so
straightforward to do this in jamovi; essentially you have to change the
structure of the data in the spreadsheet view so I won’t go into that
here,[1] but the Cohen’s *d* for this perspective is quite different:
it is 0.22 which is quite small when assessed on the scale of the
original variables.

[1] | If you are interested, you can look at how this was done in the `chico2`
dataset |