Section authors: Danielle J. Navarro and David R. Foxcroft

Statistical theory: A prelude

The part on statistical theory is by far the most theoretical, focusing as it does on the theory of statistical inference. Over the next three chapters my goal is to give you an introduction to probability theory (chapter Introduction to probability), sampling and estimation (chapter Estimating unknown quantities from a sample) and statistical hypothesis testing (chapter Hypothesis testing). Before we get started though, I want to say something about the big picture. Statistical inference is primarily about learning from data. The goal is no longer merely to describe our data but to use the data to draw conclusions about the world. To motivate the discussion I want to spend a bit of time talking about a philosophical puzzle known as the riddle of induction, because it speaks to an issue that will pop up over and over again throughout the book: statistical inference relies on assumptions. This sounds like a bad thing. In everyday life people say things like “you should never make assumptions”, and psychology classes often talk about assumptions and biases as bad things that we should try to avoid. From personal experience I have learned never to say such things around philosophers!

On the limits of logical reasoning

The whole art of war consists in getting at what is on the other side of the hill, or, in other words, in learning what we do not know from what we do.

- Arthur Wellesley, 1st Duke of Wellington

This quote came about as a consequence of a carriage ride across the countryside. Wellesley and his companion, J. W. Croker, were playing a guessing game, each trying to predict what would be on the other side of each hill. In every case it turned out that Wellesley was right and Croker was wrong. Many years later when Wellesley was asked about the game he explained that “the whole art of war consists in getting at what is on the other side of the hill”. Indeed, war is not special in this respect. All of life is a guessing game of one form or another, and getting by on a day-to-day basis requires us to make good guesses. So let us play a guessing game of our own.

Suppose you and I are observing the Wellesley-Croker competition and after every three hills you and I have to predict who will win the next one, Wellesley or Croker. Let us say that W refers to a Wellesley victory and C refers to a Croker victory. After three hills, our data set looks like this:

WWW

Our conversation goes like this:

You: Three in a row does not mean much. I suppose Wellesley might be better at this than Croker, but it might just be luck. Still, I am a bit of a gambler. I will bet on Wellesley.

Me: I agree that three in a row is not informative and I see no reason to prefer Wellesley’s guesses over Croker’s. I cannot justify betting at this stage. Sorry. No bet for me.

Your gamble paid off: three more hills go by and Wellesley wins all three. Going into the next round of our game the score is 1 - 0 in favour of you and our data set looks like this:

WWW WWW

I have organised the data into blocks of three so that you can see which batch corresponds to the observations that we had available at each step in our little side game. After seeing this new batch, our conversation continues:

You: Six wins in a row for Duke Wellesley. This is starting to feel a bit suspicious. I am still not certain, but I reckon that he is going to win the next one too.

Me: I guess I do not see that. Sure, I agree that Wellesley has won six in a row, but I do not see any logical reason why that means he will win the seventh one. No bet.

You: Do you really think so? Fair enough, but my bet worked out last time and I am okay with my choice.

For a second time you were right, and for a second time I was wrong. Wellesley wins the next three hills, extending his winning record against Croker to 9 - 0. The data set available to us is now this:

WWW WWW WWW

And our conversation goes like this:

You: Okay, this is pretty obvious. Wellesley is way better at this game. We both agree he is going to win the next hill, right?

Me: Is there really any logical evidence for that? Before we started this game, there were lots of possibilities for the first 10 outcomes, and I had no idea which one to expect. WWW WWW WWW W was one possibility, but so was WCC CWC WWC C and WWW WWW WWW C or even CCC CCC CCC C. Because I had no idea what would happen, I would have said they were all equally likely. I assume you would have too, right? I mean, that is what it means to say you have “no idea”, is it not?

You: I suppose so.

Me: Well then, the observations we have made logically rule out all possibilities except two: WWW WWW WWW C or WWW WWW WWW W. Both of these are perfectly consistent with the evidence we have encountered so far, are they not?

You: Yes, of course they are. Where are you going with this?

Me: So what has changed then? At the start of our game, you would have agreed with me that these are equally plausible and none of the evidence that we have encountered has discriminated between these two possibilities. Therefore, both of these possibilities remain equally plausible and I see no logical reason to prefer one over the other. So yes, while I agree with you that Wellesley’s run of nine wins in a row is remarkable, I cannot think of a good reason to think he will win the 10th hill. No bet.

You: I see your point, but I am still willing to chance it. I am betting on Wellesley.

Wellesley’s winning streak continues for the next three hills. The score in the Wellesley-Croker game is now 12 - 0, and the score in our game is now 3 - 0. As we approach the fourth round of our game, our data set is this:

WWW WWW WWW WWW

And the conversation continues:

You: Oh yeah! Three more wins for Wellesley and another victory for me. Admit it, I was right about him! I guess we are both betting on Wellesley this time around, right?

Me: I do not know what to think. I feel like we are in the same situation we were in last round, and nothing much has changed. There are only two legitimate possibilities for a sequence of 13 hills that have not already been ruled out, WWW WWW WWW WWW C and WWW WWW WWW WWW W. It is just like I said last time. If all possible outcomes were equally sensible before the game started, should these two not be equally sensible now given that our observations do not rule out either one? I agree that it feels like Wellesley is on an amazing winning streak, but where is the logical evidence that the streak will continue?

You: I think you are being unreasonable. Why not take a look at our scorecard, if you need evidence? You are the expert on statistics and you have been using this fancy logical analysis, but the fact is you are losing. I am just relying on common sense and I am winning. Maybe you should switch strategies.

Me: Hmm, that is a good point and I do not want to lose the game, but I am afraid I do not see any logical evidence that your strategy is better than mine. It seems to me that if there were someone else watching our game, what they would have observed is a run of three wins to you. Their data would look like this: YYY. Logically, I do not see that this is any different to our first round of watching Wellesley and Croker. Three wins to you does not seem like a lot of evidence, and I see no reason to think that your strategy is working out any better than mine. If I did not think then that WWW was good evidence for Wellesley being better than Croker at their game, surely I have no reason now to think that YYY is good evidence that you are better at ours?

You: Okay, now I think you are being a jerk.

Me: I do not see the logical evidence for that.

Learning without making assumptions is a myth

There are lots of different ways in which we could dissect this dialogue, but since this is a statistics book pitched at psychologists and not an introduction to the philosophy and psychology of reasoning, I will keep it brief. What I have described above is sometimes referred to as the riddle of induction. It seems entirely reasonable to think that a 12 - 0 winning record by Wellesley is pretty strong evidence that he will win the 13th game, but it is not easy to provide a proper logical justification for this belief. On the contrary, despite the obviousness of the answer, it is not actually possible to justify betting on Wellesley without relying on some assumption that you do not have any logical justification for.

The riddle of induction is most closely associated with the philosophical work of David Hume and, more recently, Nelson Goodman, but you can find examples of the problem in fields as diverse as literature (Lewis Carroll) and machine learning (the “no free lunch” theorem). There really is something strange about trying to “learn what we do not know from what we do know”. The critical point is that assumptions and biases are unavoidable if you want to learn anything at all about the world. There is no escape from this, and it is just as true for statistical inference as it is for human reasoning. In the dialogue I was taking aim at your perfectly sensible inferences as a human being, but the common sense you relied on is no different from what a statistician would have done. Your “common sense” in the dialogue relied on an implicit assumption that there exists some difference in skill between Wellesley and Croker, and what you were doing was trying to work out what that difference in skill level would be. My “logical analysis” rejects that assumption entirely. All I was willing to accept was that there are sequences of wins and losses, and that I did not know which sequences would be observed. Throughout the dialogue I kept insisting that all logically possible data sets were equally plausible at the start of the Wellesley-Croker game, and the only way in which I ever revised my beliefs was to eliminate those possibilities that were inconsistent with the observations.
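To see why the “logical analysis” never moves, it can help to carry it out mechanically. The following is a minimal sketch in Python, not something taken from the dialogue itself: the function name `no_assumption_prediction` and the choice to represent outcomes as strings of `W` and `C` are illustrative assumptions. It treats every complete sequence of outcomes as equally plausible, throws away the sequences that the observations have ruled out, and asks what fraction of the survivors have Wellesley winning the next hill.

```python
from fractions import Fraction
from itertools import product


def no_assumption_prediction(observed: str, total_hills: int) -> Fraction:
    """Probability that Wellesley wins the next hill when every complete
    sequence of outcomes is treated as equally plausible at the start.

    `observed` is a string of 'W' and 'C'; `total_hills` is the length of
    the complete sequences under consideration (hypothetical parameters,
    chosen for illustration).
    """
    n = len(observed)
    if n >= total_hills:
        raise ValueError("total_hills must exceed the number of observed hills")

    # Every logically possible data set of length total_hills.
    all_sequences = ("".join(s) for s in product("WC", repeat=total_hills))

    # Belief revision by elimination only: keep what has not been ruled out.
    consistent = [s for s in all_sequences if s.startswith(observed)]

    # Among the survivors, count those in which Wellesley wins the next hill.
    wins_next = sum(1 for s in consistent if s[n] == "W")
    return Fraction(wins_next, len(consistent))


# The four betting rounds from the dialogue: 3, 6, 9 and 12 observed wins.
for observed in ("WWW", "WWWWWW", "WWWWWWWWW", "WWWWWWWWWWWW"):
    print(len(observed), no_assumption_prediction(observed, len(observed) + 1))
```

Under these assumptions the answer is 1/2 in every round, however long the winning streak gets, because elimination always leaves equally many surviving sequences with a `W` next as with a `C` next.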

That sounds perfectly sensible on its own terms. In fact, it even sounds like the hallmark of good deductive reasoning. Like Sherlock Holmes, my approach was to rule out the impossible in the hope that what remained would be the truth. Yet as we saw, ruling out the impossible never led me to make a prediction. On its own terms, everything I said in my half of the dialogue was entirely correct. An inability to make any predictions is the logical consequence of making “no assumptions”. In the end I lost our game because you did make some assumptions, and those assumptions turned out to be correct. Skill is a real thing, and because you believed in the existence of skill, you were able to learn that Wellesley had more of it than Croker. Had you relied on a less sensible assumption to drive your learning, you might not have won the game.
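One way to make the “skill” assumption concrete is sketched below in Python. This is only one possible formalisation and is not specified anywhere in the dialogue: it assumes that Wellesley wins each hill independently with some fixed but unknown probability, and that before seeing any data every value of that probability between 0 and 1 is considered equally plausible. Under those assumptions the predictive probability of a win on the next hill is given by Laplace's rule of succession, (wins + 1) / (hills + 2). The function name `skill_assumption_prediction` is hypothetical.

```python
from fractions import Fraction


def skill_assumption_prediction(wins: int, hills: int) -> Fraction:
    """Predictive probability that Wellesley wins the next hill.

    Assumes (for illustration only) a fixed, unknown win probability with a
    uniform prior, which yields Laplace's rule of succession.
    """
    if not 0 <= wins <= hills:
        raise ValueError("wins must be between 0 and hills")
    return Fraction(wins + 1, hills + 2)


# The four betting rounds from the dialogue: Wellesley has won every hill so far.
for hills in (3, 6, 9, 12):
    print(hills, skill_assumption_prediction(hills, hills))
# prints:
# 3 4/5
# 6 7/8
# 9 10/11
# 12 13/14
```

Unlike a learner who only eliminates impossible sequences, this learner's prediction rises with every additional win. The price is the assumption itself: if the hills were not comparable, or if skill changed over time, the same formula could be confidently wrong.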

Ultimately there are two things you should take away from this. First, as I have said, you cannot avoid making assumptions if you want to learn anything from your data. But second, once you realise that assumptions are necessary it becomes important to make sure you make the right ones! A data analysis that relies on few assumptions is not necessarily better than one that makes many assumptions. It all depends on whether those assumptions are good ones for your data. As we go through the rest of this book I will often point out the assumptions that underpin a particular statistical technique, and how you can check whether those assumptions are sensible.