Section authors: Danielle J. Navarro and David R. Foxcroft

Post-hoc tests

Time to switch to a different topic. Rather than pre-planned comparisons that you have tested using contrasts, let us suppose you have done your ANOVA and it turns out that you obtained some significant effects. Because the F-tests are “omnibus” tests that only really test the null hypothesis that there are no differences among groups, obtaining a significant effect does not tell you which groups differ from which others. We discussed this issue back in the chapter Comparing several means (one-way ANOVA), and in that chapter our solution was to run t-tests for all possible pairs of groups, making corrections for multiple comparisons (e.g., Bonferroni, Holm) to control the Type I error rate across all comparisons. The methods we used there have the advantage of being relatively simple, and they are the kind of tools you can use in many different situations where you are testing multiple hypotheses, but they are not necessarily the best choices if you want to do efficient post-hoc testing in an ANOVA context. There are actually quite a lot of different methods for performing multiple comparisons in the statistics literature (Hsu, 1996), and it would be beyond the scope of an introductory text like this one to discuss all of them in any detail.

That being said, there is one tool that I do want to draw your attention to, namely Tukey’s “Honestly Significant Difference”, or Tukey’s HSD for short. For once, I will spare you the formulas and just stick to the qualitative ideas. The basic idea in Tukey’s HSD is to examine all relevant pairwise comparisons between groups, and it is only really appropriate to use Tukey’s HSD if it is pairwise differences that you are interested in.[1] For instance, earlier we conducted a factorial ANOVA using the clinicaltrial data set. If we specify a main effect of drug and a main effect of therapy, we would be interested in the following four comparisons:

Mood gain (mood.gain) for people who received anxifree versus people who received the placebo.
Mood gain (mood.gain) for people who received joyzepam versus people who received the placebo.
Mood gain (mood.gain) for people who received anxifree versus people who received joyzepam.
Mood gain (mood.gain) for people who were treated with CBT versus people who received no therapy (no.therapy).

For each of these comparisons, we are interested in the true difference between the (population) group means. Tukey’s HSD constructs simultaneous confidence intervals for all four comparisons. By a 95% “simultaneous” confidence interval we mean that, if we were to repeat this study many times, then in 95% of those studies all of the confidence intervals would contain their respective true values at once. Moreover, we can use these confidence intervals to calculate an adjusted p-value for any specific comparison.
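To make the word “simultaneous” concrete, here is a minimal simulation sketch. It is not Tukey’s HSD itself: to stay within the Python standard library it assumes a known standard deviation (so plain z-intervals apply), uses made-up population means and group sizes, looks only at the three drug comparisons, and uses a Bonferroni adjustment as a stand-in for a simultaneous method. It estimates how often all pairwise intervals cover their true differences in the same study.

```python
import random
from itertools import combinations
from statistics import NormalDist

def joint_coverage(alpha_per_interval, n_studies=4000, n_per_group=10, seed=1):
    """Fraction of simulated studies in which ALL pairwise intervals cover the truth."""
    rng = random.Random(seed)
    # Hypothetical population means, chosen only for illustration.
    true_means = {"placebo": 0.0, "anxifree": 0.3, "joyzepam": 1.0}
    sigma = 1.0  # assumed known, so z-intervals are appropriate
    z = NormalDist().inv_cdf(1 - alpha_per_interval / 2)
    half_width = z * sigma * (2 / n_per_group) ** 0.5
    pairs = list(combinations(true_means, 2))
    hits = 0
    for _ in range(n_studies):
        # Sample mean of each group in this simulated study.
        m = {
            g: sum(rng.gauss(mu, sigma) for _ in range(n_per_group)) / n_per_group
            for g, mu in true_means.items()
        }
        # The study counts as a hit only if every interval covers its true difference.
        hits += all(
            abs((m[a] - m[b]) - (true_means[a] - true_means[b])) <= half_width
            for a, b in pairs
        )
    return hits / n_studies

# Each interval has 95% coverage on its own, with no adjustment.
unadjusted = joint_coverage(alpha_per_interval=0.05)
# Bonferroni: split the 5% error budget across the three comparisons.
bonferroni = joint_coverage(alpha_per_interval=0.05 / 3)

print(f"all three intervals cover, unadjusted: {unadjusted:.3f}")
print(f"all three intervals cover, Bonferroni: {bonferroni:.3f}")
```

Under these assumptions the unadjusted intervals jointly cover the truth noticeably less than 95% of the time, while the adjusted ones reach at least 95%. Tukey’s HSD pursues the same joint guarantee for pairwise comparisons, but bases its intervals on the studentized range distribution and so is typically less conservative than Bonferroni.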

Tukey’s HSD is pretty easy to use in jamovi. You simply specify the ANOVA model term that you want to run the post-hoc tests for. For example, if we were looking to run post-hoc tests for the main effects but not the interaction, we would open up the drop-down menu Post Hoc Tests in the ANOVA option panel, move the drug and therapy variables across to the box on the right, and then select the Tukey checkbox in the list of possible post-hoc corrections that could be applied. This, along with the corresponding results table, is shown in Fig. 189.

Fig. 189 Analysis panel for setting up post-hoc tests for a factorial ANOVA in jamovi (the current settings request a Tukey HSD statistic): unsaturated model with the factors `drug` and `therapy`, but without an interaction term (using the `clinicaltrial` data set)

The results shown in the Post Hoc Tests results table are (I hope) fairly straightforward. The first comparison, for example, is the difference between anxifree and placebo, and the first part of the output shows that the observed difference in group means is 0.27. The next number is the standard error of the difference. Then come a column with the degrees of freedom, a column with the t-value, and finally a column with the p-value. For the first comparison, the adjusted p-value is 0.21. In contrast, if you look at the next line, we see that the observed difference between joyzepam and placebo is 1.03, and this result is significant (p < 0.001).

So far, so good. What about the situation where your model includes interaction terms? For instance, the default option in jamovi is to allow for the possibility that there is an interaction between drug and therapy. If that is the case, the number of pairwise comparisons that we need to consider starts to increase. As before, we need to consider the three comparisons that are relevant to the main effect of drug and the one comparison that is relevant to the main effect of therapy. But, if we want to consider the possibility of a significant interaction (and try to find the group differences that underpin that significant interaction), we need to include comparisons such as the following:

The difference in mood.gain for people who received anxifree and were treated with CBT, compared with people who received the placebo and were treated with CBT.
The difference in mood.gain for people who received anxifree and no.therapy, compared with people who received the placebo and no.therapy.
etc.

There are quite a lot of these comparisons that you need to consider. So, when we run the Tukey post-hoc analysis for this ANOVA model, we see that it has made a lot of pairwise comparisons (19 in total), as shown in Fig. 190. You can see that it looks pretty similar to before, but with a lot more comparisons made.
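One way to see where a total of 19 can come from is simply to enumerate the pairs. The sketch below assumes that the post-hoc table lists every pair of levels within each main effect plus every pair of drug-by-therapy cells. The level names are those of the clinicaltrial data set, and the variable names are chosen here for illustration.

```python
from itertools import combinations

drug_levels = ["placebo", "anxifree", "joyzepam"]
therapy_levels = ["no.therapy", "CBT"]

# Pairwise comparisons within each main effect.
drug_pairs = list(combinations(drug_levels, 2))
therapy_pairs = list(combinations(therapy_levels, 2))

# Every drug-by-therapy cell, then every pair of cells for the interaction term.
cells = [(d, t) for d in drug_levels for t in therapy_levels]
cell_pairs = list(combinations(cells, 2))

total = len(drug_pairs) + len(therapy_pairs) + len(cell_pairs)
print(len(drug_pairs), len(therapy_pairs), len(cell_pairs), total)  # 3 1 15 19
```

With three drugs and two therapy conditions there are six cells and therefore 15 cell-by-cell pairs. Adding the three drug comparisons and the single therapy comparison gives 19.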