Section author: Danielle J. Navarro and David R. Foxcroft

Assessing the reliability of a measurement

At this point we have thought a little bit about how to operationalise a theoretical construct and thereby create a psychological measure. And we have seen that by applying psychological measures we end up with variables, which can come in many different types. We should now start discussing the obvious question: is the measurement any good? We will do this in terms of two related ideas: reliability and validity. Put simply, the reliability of a measure tells you how precisely you are measuring something, whereas the validity of a measure tells you how accurate the measure is. In this section, we will talk about reliability; we will talk about validity in the section Experimental and non-experimental research.

Reliability is actually a very simple concept. It refers to the repeatability or consistency of your measurement. The measurement of my weight by means of a “bathroom scale” is very reliable. If I step on and off the scales over and over again, it will keep giving me the same answer. Measuring my intelligence by means of “asking my mum” is very unreliable. Some days she tells me I am a bit thick, and other days she tells me I am a complete idiot. Notice that this concept of reliability is different to the question of whether the measurements are correct (the correctness of a measurement relates to its validity). If I am holding a sack of potatoes when I step on and off the bathroom scales, the measurement will still be reliable: it will always give me the same answer. However, this highly reliable answer does not match up to my true weight at all, therefore it is wrong. In technical terms, this is a reliable but invalid measurement. Similarly, whilst my mum’s estimate of my intelligence is a bit unreliable, she might be right. Maybe I am just not too bright, and so while her estimate of my intelligence fluctuates pretty wildly from day to day, it is basically right. That would be an unreliable but valid measure. Of course, if my mum’s estimates are too unreliable it is going to be very hard to figure out which one of her many claims about my intelligence is actually the right one. To some extent, then, a very unreliable measure tends to end up being invalid for practical purposes; so much so that many people would say that reliability is necessary (but not sufficient) to ensure validity.
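To make the bathroom scale example a little more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the "true" weight, the two lists of readings, and the helper names `spread` and `bias` are all assumptions, not data or terminology from this book. It treats the spread of repeated readings as a rough stand-in for (un)reliability and the distance of the average reading from the true value as a rough stand-in for (in)validity.

```python
from statistics import mean, pstdev

# Hypothetical "true" weight in kg (an assumption for this illustration).
TRUE_WEIGHT = 70.0

# Made-up repeated readings.
# Scale used while holding a sack of potatoes: very consistent, but too high.
scale_with_potatoes = [80.1, 79.9, 80.0, 80.1, 79.9]
# A noisy measure whose readings scatter around the true value.
noisy_but_centred = [62.0, 78.0, 66.0, 75.0, 69.0]


def spread(readings):
    """Standard deviation of repeated readings (small = consistent)."""
    return pstdev(readings)


def bias(readings):
    """How far the average reading is from the true value."""
    return mean(readings) - TRUE_WEIGHT


for name, readings in [
    ("scale with potatoes", scale_with_potatoes),
    ("noisy but centred", noisy_but_centred),
]:
    print(f"{name}: spread = {spread(readings):.2f}, bias = {bias(readings):.2f}")
```

In this toy data the first measure has a small spread but a large bias (reliable but invalid), while the second has a large spread but almost no bias (unreliable but, on average, valid).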

Okay, now that we are clear on the distinction between reliability and validity, let us have a think about the different ways in which we might measure reliability:

Test-retest reliability. This relates to consistency over time. If we repeat the measurement at a later date, do we get the same answer?
Inter-rater reliability. This relates to consistency across people. If someone else repeats the measurement (e.g., someone else rates my intelligence), will they produce the same answer?
Parallel forms reliability. This relates to consistency across theoretically equivalent measurements. If I use a different set of bathroom scales to measure my weight, does it give the same answer?
Internal consistency reliability. If a measurement is constructed from lots of different parts that perform similar functions (e.g., a personality questionnaire result is added up across several questions), do the individual parts tend to give similar answers? We will look at this particular form of reliability later in the book, in the section Internal consistency reliability analysis.
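As a rough sketch of how two of these ideas are often turned into numbers, the Python snippet below computes a correlation between two measurement occasions (one common way to summarise test-retest reliability) and Cronbach's alpha (one common statistic for internal consistency). The scores are invented for illustration, the function names `pearson_r` and `cronbach_alpha` are my own, and the choice of these particular statistics is an assumption rather than something prescribed in this section.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item,
    with respondents in the same order in every list."""
    k = len(items)
    # Total score per respondent, summed across items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))


# Made-up scores for five people measured on two occasions.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 20]
test_retest = pearson_r(time1, time2)

# Made-up answers of the same five people to three similar questions.
items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]
alpha = cronbach_alpha(items)

print(f"test-retest correlation: {test_retest:.2f}")
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values close to 1 indicate that the repeated measurements (or the individual questions) agree closely with one another. The same correlation function could be applied to scores from two raters or from two parallel forms.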

Not all measurements need to possess all forms of reliability. For instance, educational assessment can be thought of as a form of measurement. One of the subjects that I teach, Computational Cognitive Science, has an assessment structure that has a research component and an exam component (plus other things). The exam component is intended to measure something different from the research component, so the assessment as a whole has low internal consistency. However, within the exam there are several questions that are intended to measure (approximately) the same thing, and those tend to produce similar outcomes. So the exam on its own has a fairly high internal consistency. Which is as it should be. You should only demand reliability in those situations where you want to be measuring the same thing!