Section authors: Danielle J. Navarro and David R. Foxcroft

Assessing the reliability of a measurement

So far we have thought a little about how to operationalise a theoretical construct and thereby create a psychological measure. We have also seen that applying psychological measures gives us variables, which can come in many different types. Now we should turn to the obvious question: is the measurement any good? We will do this in terms of two related ideas: reliability and validity. Put simply, the reliability of a measure tells you how precisely you are measuring something, whereas the validity of a measure tells you how accurate the measure is. In this section we will talk about reliability; we will talk about validity in the section Experimental and non-experimental research.

Reliability is actually a very simple concept. It refers to the repeatability or consistency of your measurement. Measuring my weight by means of a “bathroom scale” is very reliable. If I step on and off the scales over and over again, they will keep giving me the same answer. Measuring my intelligence by means of “asking my mum” is very unreliable. Some days she tells me I am a bit thick, and other days she tells me I am a complete idiot. Notice that this concept of reliability is different from the question of whether the measurements are correct (the correctness of a measurement relates to its validity). If I am holding a sack of potatoes when I step on and off the bathroom scales, the measurement will still be reliable: the scales will always give me the same answer. However, this highly reliable answer does not match my true weight at all, so it is wrong. In technical terms, this is a reliable but invalid measurement. Similarly, whilst my mum’s estimate of my intelligence is a bit unreliable, she might be right. Maybe I am just not too bright, and so while her estimate of my intelligence fluctuates pretty wildly from day to day, it is basically right. That would be an unreliable but valid measure. Of course, if my mum’s estimates are too unreliable, it is going to be very hard to figure out which one of her many claims about my intelligence is actually the right one. To some extent, then, a very unreliable measure tends to end up being invalid for practical purposes; so much so that many people would say that reliability is necessary (but not sufficient) to ensure validity.
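
To make the distinction concrete, here is a minimal sketch with made-up numbers. It rests on a deliberately simplified reading: the spread (sample standard deviation) of repeated readings stands in for how unreliable a measure is, and the average error relative to a known true value stands in for how invalid it is. The `summarise` helper, the 70 kg true weight and all of the readings are hypothetical, chosen only for illustration.

```python
import statistics

def summarise(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean measurement minus the true value
             (a simplified stand-in for validity: 0 means centred on the truth)
    spread = sample standard deviation of the measurements
             (a simplified stand-in for unreliability: smaller is more consistent)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

TRUE_WEIGHT = 70.0  # hypothetical true body weight in kg

# Reliable but invalid: the readings barely vary, but a (hypothetical)
# 5 kg sack of potatoes shifts every single one of them.
scale_with_potatoes = [75.1, 74.9, 75.0, 75.0, 75.1, 74.9]

# Unreliable but valid on average: readings scatter widely around the truth.
noisy_but_centred = [62.0, 78.0, 66.0, 74.0, 70.0, 70.0]

for label, data in [("scale + potatoes", scale_with_potatoes),
                    ("noisy but centred", noisy_but_centred)]:
    bias, spread = summarise(data, TRUE_WEIGHT)
    print(f"{label}: bias = {bias:+.2f} kg, spread = {spread:.2f} kg")
```

The first device is extremely consistent but always about 5 kg too heavy; the second is centred on the true weight but jumps around by several kilograms from one reading to the next.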

Okay, now that we are clear on the distinction between reliability and validity, let us have a think about the different ways in which we might measure reliability:

Test-retest reliability. This relates to the consistency of a measurement over time. If we repeat the measurement at a later date, do we get the same answer?
Inter-rater reliability. This relates to the consistency of a measurement across people. If someone else repeats the measurement (e.g., someone else rates my intelligence), will they produce the same answer?
Parallel forms reliability. This relates to the consistency across theoretically equivalent measuring instruments. If I measure my weight with a different set of bathroom scales, do they show the same answer?
Internal consistency reliability. If a measurement is constructed from lots of different parts that perform similar functions (e.g., a personality questionnaire result is added up across several questions), do the individual parts tend to give similar answers? We will look at this particular form of reliability later in the book, in the section Reliability analysis.

Not all measurements need to possess all forms of reliability. For instance, educational assessment can be thought of as a form of measurement. One of the subjects that I teach, Computational Cognitive Science, has an assessment structure that includes a research component and an exam component (plus other things). The exam component is intended to measure something different from the research component, so the assessment as a whole has low internal consistency. Within the exam, however, there are several questions that are intended to measure (approximately) the same thing, and those tend to produce similar outcomes. So the exam on its own has fairly high internal consistency. Which is as it should be. You should only demand reliability from a measurement in situations where you actually want to be measuring the same thing!