Section authors: Danielle J. Navarro and David R. Foxcroft

How are probability and statistics different?

Before we start talking about probability theory, it is helpful to spend a moment thinking about the relationship between probability and statistics. The two disciplines are closely related but they are not identical. Probability theory is “the doctrine of chances”. It is a branch of mathematics that tells you how often different kinds of events will happen. For example, all of these questions are things you can answer using probability theory:

What are the chances of a fair coin coming up heads 10 times in a row?
If I roll a six-sided die twice, how likely is it that I will roll two sixes?
How likely is it that five cards drawn from a perfectly shuffled deck will all be hearts?
What are the chances that I will win the lottery?
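
The first three questions above come down to short calculations. The sketch below assumes a fair coin, a fair six-sided die, and a standard 52-card deck containing 13 hearts (the deck size is an assumption, since the text does not spell it out). The lottery question is left out because it depends on rules that are not given here.

```python
from fractions import Fraction
from math import comb

# Question 1: a fair coin comes up heads 10 times in a row.
p_ten_heads = Fraction(1, 2) ** 10

# Question 2: two sixes from two rolls of a fair six-sided die.
p_two_sixes = Fraction(1, 6) ** 2

# Question 3: all five cards drawn are hearts.
# Assumes a standard 52-card deck with 13 hearts.
p_five_hearts = Fraction(comb(13, 5), comb(52, 5))

print(p_ten_heads)    # 1/1024
print(p_two_sixes)    # 1/36
print(p_five_hearts)  # 33/66640
```

Exact fractions are used rather than floating point numbers so that the results can be compared directly with a calculation done by hand.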

Notice that all of these questions have something in common. In each case the “truth of the world” is known and my question relates to “what kind of events” will happen. In the first question I know that the coin is fair so there is a 50% chance that any individual coin flip will come up heads. In the second question I know that the chance of rolling a 6 on a single die is 1 in 6. In the third question I know that the deck is shuffled properly. And in the fourth question I know that the lottery follows specific rules. You get the idea. The critical point is that probabilistic questions start with a known model of the world, and we use that model to do some calculations. The underlying model can be quite simple. For instance, in the coin flipping example we can write down the model like this:

P(heads) = 0.5

which you can read as “the probability of heads is 0.5”. As we will see later, in the same way that percentages are numbers that range from 0% to 100%, probabilities are just numbers that range from 0 to 1. When using this probability model to answer the first question I do not actually know exactly what is going to happen. Maybe I will get 10 heads, like the question says. But maybe I will get three heads. That is the key thing. In probability theory the model is known but the data are not.
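
One way to see "the model is known but the data are not" is to simulate it. The following is a minimal sketch rather than a prescribed method: `count_heads` is a hypothetical helper name, and the seed is arbitrary. The model P(heads) = 0.5 is fixed in advance, yet each simulated run of 10 flips can produce a different number of heads.

```python
import random

P_HEADS = 0.5  # the known model: P(heads) = 0.5


def count_heads(n_flips, p_heads, rng):
    """Simulate n_flips coin flips and return how many came up heads."""
    return sum(rng.random() < p_heads for _ in range(n_flips))


rng = random.Random(2024)  # arbitrary seed, used only for reproducibility

# Five separate runs of 10 flips each, all generated from the same model.
results = [count_heads(10, P_HEADS, rng) for _ in range(5)]
print(results)
```

The printed counts will typically differ from run to run even though the model never changes.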

So that is probability. What about statistics? Statistical questions work the other way around. In statistics we do not know the truth about the world. All we have is the data and it is from the data that we want to learn the truth about the world. Statistical questions tend to look more like these:

If my friend flips a coin 10 times and gets 10 heads, is he playing a trick on me?
If five cards off the top of the deck are all hearts, how likely is it that the deck was shuffled?
If the lottery commissioner’s spouse wins the lottery, how likely is it that the lottery was rigged?

This time around the only thing we have are data. What I know is that I saw my friend flip the coin 10 times and it came up heads every time. And what I want to infer is whether I should conclude that what I just saw was actually a fair coin being flipped 10 times in a row, or whether I should suspect that my friend is playing a trick on me. The data I have look like this:

H H H H H H H H H H

and what I am trying to do is work out which “model of the world” I should put my trust in. If the coin is fair then the model I should adopt is one that says that the probability of heads is 0.5, that is P(heads) = 0.5. If the coin is not fair then I should conclude that the probability of heads is not 0.5, which we would write as P(heads) ≠ 0.5. In other words, the statistical inference problem is to figure out which of these probability models is right. Clearly, the statistical question is not the same as the probability question, but they are deeply connected to one another. Because of this, a good introduction to statistical theory will start with a discussion of what probability is and how it works.
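
To make the link between the two kinds of question concrete, the sketch below computes how probable the observed data (10 heads in 10 flips) would be under several candidate models. The helper name `prob_all_heads` and the candidate values 0.7, 0.9 and 1.0 are illustrative assumptions, not taken from the text. This is only an informal comparison and not a formal statistical test.

```python
def prob_all_heads(p_heads, n_flips=10):
    """Probability that n_flips independent flips all come up heads."""
    return p_heads ** n_flips


# Candidate models of the world: the fair coin plus a few
# illustrative biased coins (the biased values are assumptions).
for p in (0.5, 0.7, 0.9, 1.0):
    print(f"P(heads) = {p}: P(10 heads in a row) = {prob_all_heads(p):.6f}")
```

Under the fair-coin model the observed data have probability 1/1024, which is small but not zero. Each line of output is a probability calculation for one assumed model; deciding which model to trust given the data is the statistical question.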