Section author: Danielle J. Navarro and David R. Foxcroft

Assessing the validity of a study

More than any other thing, a scientist wants their research to be “valid”. The conceptual idea behind validity is very simple. Can you trust the results of your study? If not, the study is invalid. However, whilst it is easy to state, in practice it is much harder to check validity than it is to check reliability. And in all honesty, there is no precise, clearly agreed upon notion of what validity actually is. In fact, there are lots of different kinds of validity, each of which raises its own issues. And not all forms of validity are relevant to all studies. I am going to talk about five different types of validity:

Internal validity
External validity
Construct validity
Face validity
Ecological validity

First, a quick guide as to what matters here. (1) Internal and external validity are the most important, since they tie directly to the fundamental question of whether your study really works. (2) Construct validity asks whether you are measuring what you think you are. (3) Face validity is not terribly important except insofar as you care about “appearances”. (4) Ecological validity is a special case of face validity that corresponds to a kind of appearance that you might care about a lot.

Internal validity

Internal validity refers to the extent to which you are able to draw the correct conclusions about the causal relationships between variables. It is called “internal” because it refers to the relationships between things “inside” the study. Let us illustrate the concept with a simple example. Suppose you are interested in finding out whether a university education makes you write better. To do so, you get a group of first-year students, ask them to write a 1000-word essay, and count the number of spelling and grammatical errors they make. Then you find some third-year students, who obviously have had more of a university education than the first-years, and repeat the exercise. And let us suppose it turns out that the third-year students produce fewer errors. And so you conclude that a university education improves writing skills. Right? Except that the big problem with this experiment is that the third-year students are older and they have had more experience with writing things. So it is hard to know for sure what the causal relationship is. Do older people write better? Or people who have had more writing experience? Or people who have had more education? Which of the above is the true cause of the superior performance of the third-years? Age? Experience? Education? You cannot tell. This is an example of a failure of internal validity, because your study does not properly tease apart the causal relationships between the different variables.
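To make the confound concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the linear model of error counts, the baseline of 20 errors, the effect sizes, and the function names `mean_errors` and `cohort_gap` are all hypothetical. It compares a world in which only writing experience reduces errors with a world in which only university education does. Because third-years have two more years of both, the observed gap between cohorts comes out the same in either world.

```python
import random
import statistics


def mean_errors(n, years_experience, years_university,
                experience_effect, education_effect, rng):
    """Mean essay error count for one cohort under a hypothetical linear model."""
    errors = []
    for _ in range(n):
        # Assumed model: 20 errors at baseline, reduced by each causal factor.
        expected = (20
                    - experience_effect * years_experience
                    - education_effect * years_university)
        errors.append(expected + rng.gauss(0, 2))  # individual variation
    return statistics.mean(errors)


def cohort_gap(experience_effect, education_effect, seed=1):
    """First-year mean minus third-year mean (positive = third-years do better)."""
    rng = random.Random(seed)
    # Third-years have two extra years of BOTH experience and university,
    # so the two factors are perfectly confounded in this design.
    first = mean_errors(200, 0, 0, experience_effect, education_effect, rng)
    third = mean_errors(200, 2, 2, experience_effect, education_effect, rng)
    return first - third


# World A: only writing experience matters. World B: only education matters.
gap_if_experience = cohort_gap(experience_effect=3, education_effect=0)
gap_if_education = cohort_gap(experience_effect=0, education_effect=3)

print("Gap if only experience matters:", round(gap_if_experience, 2))
print("Gap if only education matters: ", round(gap_if_education, 2))
```

The two printed gaps match, which is the point: the data from this design cannot tell you which causal story is true.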

External validity

External validity relates to the generalisability or applicability of your findings. That is, to what extent do you expect to see the same pattern of results in “real life” as you saw in your study? To put it a bit more precisely, any study that you do in psychology will involve a fairly specific set of questions or tasks, will occur in a specific environment, and will involve participants that are drawn from a particular subgroup (disappointingly often it is college students!). So, if it turns out that the results do not actually generalise or apply to people and situations beyond the ones that you studied, then what you have got is a lack of external validity.

The classic example of this issue is the fact that a very large proportion of studies in psychology will use undergraduate psychology students as the participants. Obviously, however, the researchers do not care only about psychology students. They care about people in general. Given that, a study that uses only psychology students as participants always carries a risk of lacking external validity. That is, if there is something “special” about psychology students that makes them different to the general population in some relevant respect, then we may start worrying about a lack of external validity.

That said, it is absolutely critical to realise that a study that uses only psychology students does not necessarily have a problem with external validity. I will talk about this again later, but it is such a common mistake that I am going to mention it here. The external validity of a study is threatened by the choice of population if (a) the population from which you sample your participants is very narrow (e.g., psychology students), and (b) the narrow population that you sampled from is systematically different from the general population in some respect that is relevant to the psychological phenomenon that you intend to study. The relevance requirement in (b) is the bit that lots of people forget. It is true that psychology undergraduates differ from the general population in lots of ways, and so a study that uses only psychology students may have problems with external validity. However, if those differences are not very relevant to the phenomenon that you are studying, then there is nothing to worry about. To make this a bit more concrete, here are two extreme examples:

You want to measure “attitudes of the general public towards psychotherapy”, but all of your participants are psychology students. This study would almost certainly have a problem with external validity.
You want to measure the effectiveness of a visual illusion, and your participants are all psychology students. This study is unlikely to have a problem with external validity.
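The two examples above can be sketched as a small simulation. This is only an illustration under stated assumptions: the population size, the 10% share of psychology students, the attitude means (7 versus 5), and the illusion measure are all made-up numbers, and `bias` is a hypothetical helper. The sketch assumes psychology students differ from everyone else in their attitude towards psychotherapy but not in how strongly they experience a visual illusion.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 20,000 people; about 10% are psychology students.
population = []
for _ in range(20000):
    is_psych = rng.random() < 0.10
    # Assumed: psychology students rate psychotherapy more favourably.
    attitude = rng.gauss(7.0 if is_psych else 5.0, 1.0)
    # Assumed: illusion strength is unrelated to what a person studies.
    illusion = rng.gauss(10.0, 2.0)
    population.append((is_psych, attitude, illusion))

# The narrow sample: psychology students only.
psych_only = [person for person in population if person[0]]


def bias(index):
    """Psychology-student mean minus whole-population mean for one measure."""
    sample_mean = statistics.mean(person[index] for person in psych_only)
    population_mean = statistics.mean(person[index] for person in population)
    return sample_mean - population_mean


attitude_bias = bias(1)  # the subgroup differs on a relevant variable
illusion_bias = bias(2)  # the subgroup does not differ on this variable

print("Bias in attitude estimate:", round(attitude_bias, 2))
print("Bias in illusion estimate:", round(illusion_bias, 2))
```

Under these assumptions the attitude estimate is badly off while the illusion estimate is close to the population value, even though the sample is equally narrow in both cases.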

Having just spent the last couple of paragraphs focusing on the choice of participants, since that is a big issue that everyone tends to worry most about, it is worth remembering that external validity is a broader concept. The following are also examples of things that might pose a threat to external validity, depending on what kind of study you are doing:

People might answer a “psychology questionnaire” in a manner that does not reflect what they would do in real life.
Your lab experiment on (say) “human learning” has a different structure to the learning problems people face in real life.

Construct validity

Construct validity is basically a question of whether you are measuring what you want to be measuring. A measurement has good construct validity if it is actually measuring the correct theoretical construct, and bad construct validity if it is not. To give a very simple (if ridiculous) example, suppose I am trying to investigate the rates with which university students cheat on their exams. And the way I attempt to measure it is by asking the cheating students to stand up in the lecture theatre so that I can count them. When I do this with a class of 300 students, zero people claim to be cheaters. So I therefore conclude that the proportion of cheaters in my class is 0%. Clearly this is a bit ridiculous. But the point here is not that this is a very deep methodological example, but rather to explain what construct validity is. The problem with my measure is that while I am trying to measure “the proportion of people who cheat”, what I am actually measuring is “the proportion of people stupid enough to own up to cheating, or bloody minded enough to pretend that they do”. Obviously, these are not the same thing! So my study has gone wrong, because my measurement has very poor construct validity.
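The gap between the construct and the measure can be shown in a few lines. This is a rough sketch, not a model of real cheating behaviour: the 15% true cheating rate, the `own_up_prob` parameter, and the function name `run_class` are all hypothetical, and for simplicity it ignores non-cheaters who stand up just to be difficult.

```python
import random


def run_class(n_students=300, true_cheat_rate=0.15, own_up_prob=0.0, seed=7):
    """Return (true proportion of cheaters, proportion who stand up)."""
    rng = random.Random(seed)
    # The construct we care about: who actually cheated (assumed rate).
    cheated = [rng.random() < true_cheat_rate for _ in range(n_students)]
    # The measure we actually take: which cheaters are willing to stand up.
    stood_up = [c and rng.random() < own_up_prob for c in cheated]
    return sum(cheated) / n_students, sum(stood_up) / n_students


true_rate, measured_rate = run_class()
print("True proportion of cheaters:", true_rate)
print("Measured proportion:        ", measured_rate)
```

With `own_up_prob` set to zero, the measured proportion is zero no matter what the true rate is, so the measure tells you about willingness to confess rather than about cheating.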

Face validity

Face validity simply refers to whether or not a measure “looks like” it is doing what it is supposed to, nothing more. If I design a test of intelligence, and people look at it and they say “no, that test does not measure intelligence”, then the measure lacks face validity. It is as simple as that. Obviously, face validity is not very important from a pure scientific perspective. After all, what we care about is whether or not the measure actually does what it is supposed to do, not whether it looks like it does what it is supposed to do. As a consequence, we generally do not care very much about face validity. That said, the concept of face validity serves three useful pragmatic purposes:

Sometimes, an experienced scientist will have a “hunch” that a particular measure will not work. While these sorts of hunches have no strict evidentiary value, it is often worth paying attention to them, because people often have knowledge that they cannot quite verbalise, so there might be something to worry about even if you cannot quite say why. In other words, when someone you trust criticises the face validity of your study, it is worth taking the time to think more carefully about your design to see if you can think of reasons why it might go awry. Mind you, if you do not find any reason for concern, then you should probably not worry. After all, face validity really does not matter very much.
Often (very often), completely uninformed people will also have a “hunch” that your research is crap. And they will criticise it on the internet or something. On close inspection you may notice that these criticisms are actually focused entirely on how the study “looks”, but not on anything deeper. The concept of face validity is useful for gently explaining to people that they need to substantiate their arguments further.
Expanding on the last point, if the beliefs of untrained people are critical (e.g., this is often the case for applied research where you actually want to convince policy makers of something or other) then you have to care about face validity, simply because, whether you like it or not, a lot of people will use face validity as a proxy for real validity. If you want the government to change a law on scientific psychological grounds, then it will not matter how good your studies “really” are. If they lack face validity you will find that politicians ignore you. Of course, it is somewhat unfair that policy often depends more on appearance than fact, but that is how things go.

Ecological validity

Ecological validity is a different notion of validity, which is similar to external validity, but less important. The idea is that, in order to be ecologically valid, the entire setup of the study should closely approximate the real-world scenario that is being investigated. In a sense, ecological validity is a kind of face validity. It relates mostly to whether the study “looks” right, but with a bit more rigour to it. To be ecologically valid the study has to look right in a fairly specific way. The idea behind it is the intuition that a study that is ecologically valid is more likely to be externally valid. It is no guarantee, of course. But the nice thing about ecological validity is that it is much easier to check whether a study is ecologically valid than it is to check whether a study is externally valid. A simple example would be eyewitness identification studies. Most of these studies tend to be done in a university setting, often with a fairly simple array of faces to look at, rather than a line-up. The length of time between seeing the “criminal” and being asked to identify the suspect in the “line-up” is usually shorter than it would be in a real case. The “crime” is not real so there is no chance of the witness being scared, and there are no police officers present so there is not as much chance of feeling pressured. These things all mean that the study definitely lacks ecological validity. They might (but might not) mean that it also lacks external validity.