Section author: Danielle J. Navarro and David R. Foxcroft
Transforming and recoding a variable
It’s not uncommon in real-world data analysis to find that one of your variables isn’t quite equivalent to the variable that you really want. For instance, it’s often convenient to take a continuous variable (e.g., age) and break it up into a smallish number of categories (e.g., younger, middle, older). At other times, you may need to convert a numeric variable into a different numeric variable (e.g., you may want to analyse the absolute value of the original variable). In this section I’ll describe a few key ways you can do these things in jamovi.
Creating a transformed variable
The first trick to discuss is the idea of transforming a variable. Taken literally, anything you do to a variable is a transformation, but in practice what it usually means is that you apply a relatively simple mathematical function to the original variable in order to create a new variable that either (a) provides a better way of describing the thing you’re actually interested in, or (b) is more closely in agreement with the assumptions of the statistical tests you want to do. Since, at this stage, I haven’t talked about statistical tests or their assumptions, I’ll show you an example based on the first case.
Suppose I’ve run a short study in which I ask 10 people a single question:
On a scale of 1 (strongly disagree) to 7 (strongly agree), to what extent do you agree with the proposition that “Dinosaurs are awesome”?
Now let’s load and look at the data. The likert data set contains a single variable, likert.raw, that holds the raw Likert-scale responses for these 10 people. However, if you think about it, this isn’t the best way to represent these responses. Because of the fairly symmetric way that we set up the response scale, there’s a sense in which the midpoint of the scale should have been coded as 0 (no opinion), and the two endpoints should be +3 (strongly agree) and -3 (strongly disagree). Recoding the data in this way makes it a bit more reflective of how we really think about the responses. The recoding here is pretty straightforward: we just subtract 4 from the raw scores. In jamovi you can do this by computing a new variable: click on the Compute button in the Data tab and you will see that a new variable has been added to the spreadsheet. Let’s call this new variable likert.centred (go ahead and type that in) and then add the following in the formula box, as in Figure 36:
likert.raw - 4
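For readers who like to see the arithmetic spelled out, the same centring step can be sketched outside jamovi in a few lines of Python. This is only an illustration: the response values below are made up (they are not the likert data set), and the names centre, likert_raw and likert_centred are Python-friendly stand-ins rather than anything jamovi provides.

```python
SCALE_MIDPOINT = 4  # midpoint of a 1-7 response scale


def centre(responses, midpoint=SCALE_MIDPOINT):
    """Subtract the scale midpoint from every response."""
    return [r - midpoint for r in responses]


# Hypothetical responses on the 1-7 scale (NOT the likert data set).
likert_raw = [2, 5, 4, 7, 1, 4, 6, 3]
likert_centred = centre(likert_raw)

print(likert_centred)  # [-2, 1, 0, 3, -3, 0, 2, -1]
```

After centring, a raw 4 becomes 0, a raw 7 becomes +3 and a raw 1 becomes -3, which is exactly the recoding described above.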
One reason why it might be useful to have the data in this format is that there are a lot of situations where you might prefer to analyse the strength of the opinion separately from the direction of the opinion. We can do two different transformations on this likert.centred variable in order to distinguish between these two different concepts. First, to compute an opinion.strength variable, we want to take the absolute value of the centred data (using the ABS function).[1] In jamovi, create another new variable using the Compute button. Name the variable opinion.strength and this time click on the fx button next to the Formula box. This shows the different Functions and Variables that you can add to the Formula box, so double click on ABS and then double click on likert.centred and you will see that the Formula box is populated with ABS(likert.centred) and a new variable has been created in the spreadsheet view, as in Figure 37.
Second, to compute a variable that contains only the direction of the opinion and ignores the strength, we want to calculate the “sign” of the variable. In jamovi we can use the IF function to do this. Create another new variable using the Compute button, name this one opinion.sign, and then type the following into the formula box:
IF(likert.centred == 0, 0, likert.centred / opinion.strength)
When done, you’ll see that all negative numbers from the likert.centred variable are converted to -1, all positive numbers are converted to 1 and zero stays as 0, like so:
-1 1 -1 0 0 0 -1 1 1 1
Let’s break down what this IF command is doing. In jamovi there are three parts to an IF statement, written as IF(expression, value, else). The first part, expression, can be a logical or mathematical statement. In our example, we have specified likert.centred == 0, which is TRUE for values where likert.centred is zero. The next part, value, is the new value to use where the expression in part one is TRUE. In our example, we have said that for all those values where likert.centred is zero, keep them zero. In the next part, else, we can enter another logical or mathematical statement to be used if part one evaluates to FALSE, i.e. where likert.centred is not zero. In our example we have divided likert.centred by opinion.strength to give -1 or +1, depending on the sign of the original value in likert.centred.[2]
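The three-part IF logic can also be written out as an ordinary if/else, which may make the order of evaluation easier to see. The Python sketch below mirrors IF(likert.centred == 0, 0, likert.centred / opinion.strength); the function names and the centred values are hypothetical and chosen only for illustration. In Python the zero check is what keeps us from dividing zero by zero, which would raise a ZeroDivisionError.

```python
def opinion_strength(centred):
    """Analogue of ABS(likert.centred): distance from the midpoint."""
    return abs(centred)


def opinion_sign(centred):
    """Analogue of IF(likert.centred == 0, 0, likert.centred / opinion.strength)."""
    if centred == 0:
        # "value" branch: the expression is TRUE, so zero stays zero.
        return 0
    # "else" branch: a non-zero value divided by its own absolute value is -1 or +1.
    return int(centred / opinion_strength(centred))


# Hypothetical centred scores (NOT the likert data set).
likert_centred = [-2, 1, 0, 3, -3, 0, 2, -1]

strengths = [opinion_strength(c) for c in likert_centred]
signs = [opinion_sign(c) for c in likert_centred]

print(strengths)  # [2, 1, 0, 3, 3, 0, 2, 1]
print(signs)      # [-1, 1, 0, 1, -1, 0, 1, -1]
```

Multiplying each sign by the matching strength recovers the centred score, which is a handy check that the two new variables really do split the opinion into direction and strength.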
And we’re done. We now have three shiny new variables, all of which are useful transformations of the original likert.raw variable.
Collapsing a variable into a smaller number of discrete levels or categories
One pragmatic task that comes up quite often is the problem of collapsing a variable into a smaller number of discrete levels or categories. For instance, suppose I’m interested in looking at the age distribution of people at a social gathering:
60, 58, 24, 26, 34, 42, 31, 30, 33, 2, 9
In some situations it can be quite helpful to group these into a smallish number of categories. For example, we could group the data into three broad categories: young (0-20), adult (21-40) and older (41-60). This is quite a coarse-grained classification, and the labels that I’ve attached only make sense in the context of this data set (e.g., viewed more generally, a 42 year old wouldn’t consider themselves as “older”). We can slice this variable up quite easily using the jamovi IF function that we have already used. This time we have to specify nested IF statements, meaning simply that IF the first logical expression is TRUE, insert a first value; otherwise, IF a second logical expression is TRUE, insert a second value; otherwise, IF a third logical expression is TRUE, insert a third value. This can be written as:
IF(Age >= 0 and Age <= 20, 1, IF(Age >= 21 and Age <= 40, 2, IF(Age >= 41 and Age <= 60, 3 )))
Note that there are three left parentheses used during the nesting, so the whole statement has to end with three right parentheses, otherwise you will get an error message. The jamovi screenshot for this data manipulation, along with an accompanying frequency table, is shown in Figure 38.
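To see what the nested IF produces, here is a rough Python equivalent applied to the eleven ages listed above. The function name age_category is made up for this sketch. One assumption to flag: the jamovi formula has no final else part, and this sketch returns None for an age that matches none of the three conditions; how jamovi itself displays such a case is not covered here.

```python
from collections import Counter


def age_category(age):
    """Return 1 (young, 0-20), 2 (adult, 21-40) or 3 (older, 41-60)."""
    if 0 <= age <= 20:
        return 1
    elif 21 <= age <= 40:
        return 2
    elif 41 <= age <= 60:
        return 3
    # Assumption: no condition matched, so treat the result as missing.
    return None


ages = [60, 58, 24, 26, 34, 42, 31, 30, 33, 2, 9]
categories = [age_category(a) for a in ages]
counts = Counter(categories)

print(categories)              # [3, 3, 2, 2, 2, 3, 2, 2, 2, 1, 1]
print(sorted(counts.items()))  # [(1, 2), (2, 6), (3, 3)]
```

The counts form a simple frequency table: two people in the young group, six adults and three in the older group.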
It’s important to take the time to figure out whether or not the resulting categories make any sense at all in terms of your research project. If they don’t make any sense to you as meaningful categories, then any data analysis that uses those categories is likely to be just as meaningless. More generally, in practice I’ve noticed that people have a very strong desire to carve their (continuous and messy) data into a few (discrete and simple) categories, and then run analyses using the categorised data instead of the original data.[3] I wouldn’t go so far as to say that this is an inherently bad idea, but it does have some fairly serious drawbacks at times, so I would advise some caution if you are thinking about doing it.
Creating a transformation that can be applied to multiple variables
Sometimes you want to apply the same transformation to more than one variable, for example when you have multiple questionnaire items that all need to be recalculated or recoded in the same way. One of the neat features in jamovi is that you can create a transformation, using the Transform button in the Data tab, that can then be saved and applied to multiple variables. Let’s go back to the first example above, using the likert data set that contains a single variable with raw Likert-scale responses for 10 people. To create a transformation that you can save and then apply across multiple variables (assuming you had more variables like this in your data file), first select (i.e., click) in the spreadsheet editor the variable you want to use to initially create the transformation. In our example this is likert.raw. Next click the Transform button in the jamovi Data tab, and you’ll see something like Figure 39.
Give your new variable a name, let’s call it opinion.strength, and then click on the Using transform selection box and select the Create New Transform… option. This is where you will create, and name, the transformation that can be re-applied to as many variables as you like. The transformation is automatically named for us as Transform 1 (imaginative, huh? You can change this if you like). Then type the expression ABS($source - 4) into the function text box, as in Figure 40, press Enter or Return on your keyboard and, hey presto, you have created a new transformation and applied it to the likert.raw variable! Good, eh. Note that instead of using the variable name in the expression, we have used $source. This is so that we can then use the same transformation with as many different variables as we like: jamovi requires you to use $source to refer to the source variable you are transforming. Your transformation has also been saved and can be re-used any time you like (provided you save the dataset as an .omv file, otherwise you’ll lose it!).
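The idea behind $source is the same as writing a function with one parameter: the transformation is defined once, without naming any particular variable, and is then applied to whichever variable you hand it. A minimal Python sketch of that idea follows. The function name transform_1 echoes the default name above, and the two columns item_a and item_b are hypothetical data invented for the example.

```python
def transform_1(source):
    """Analogue of ABS($source - 4): 'source' stands in for any input value."""
    return abs(source - 4)


# Hypothetical columns of 1-7 responses (illustrative only).
data = {
    "item_a": [1, 4, 7, 3],
    "item_b": [2, 6, 5, 4],
}

# Apply the one saved transformation to every column in turn.
transformed = {
    name: [transform_1(value) for value in values]
    for name, values in data.items()
}

print(transformed["item_a"])  # [3, 0, 3, 1]
print(transformed["item_b"])  # [2, 2, 1, 0]
```

Because transform_1 never mentions item_a or item_b by name, adding a third column requires no change to the transformation itself.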
You can also create a transformation with the second example we looked at, the age distribution of people at a social gathering. Go on, you know you want to! Remember that we collapsed this variable into three groups: younger, adult and older. This time we will achieve the same thing, but using the jamovi Transform → Add condition button. With this data set (go back to it or create it again if you didn’t save it) set up a new variable transformation. Call the transformed variable AgeCats and the transformation you will create Agegroupings. Then click on the big + sign next to the function box. This is the Add condition button and I’ve stuck a big red arrow onto Figure 41 so you can see exactly where this is. Re-create the transformation shown in Figure 41 and, when you are done, you will see the new values appear in the spreadsheet window. What’s more, the Agegroupings transformation has been saved and can be re-applied any time you like. OK, so I know that it’s unlikely you will have more than one Age variable, but you get the idea now of how to set up transformations in jamovi, so you can follow this idea with other sorts of variables. A typical scenario for this is when you have a questionnaire scale with, say, 20 items (variables) and each item was originally scored from 1 to 6 but, for some reason or quirk of the data, you decide to recode all the items as 1 to 3. You can easily do this in jamovi by creating and then re-applying your transformation for each variable that you want to recode.
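As a rough illustration of the questionnaire scenario, the Python sketch below recodes several items with one shared lookup table. Everything specific here is an assumption made for the example: the text does not say which of the original scores should map to which new score, so the pairing used (1-2 to 1, 3-4 to 2, 5-6 to 3) is just one plausible choice, and the item names and scores are invented.

```python
# Assumed mapping from the original 1-6 scores to the new 1-3 scores.
RECODE_MAP = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3}


def recode(score):
    """Look up the new score; reject anything outside the 1-6 range."""
    if score not in RECODE_MAP:
        raise ValueError(f"score {score!r} is outside the 1-6 scale")
    return RECODE_MAP[score]


# Hypothetical questionnaire items (three shown instead of twenty).
items = {
    "q1": [1, 6, 3, 4],
    "q2": [2, 5, 5, 1],
    "q3": [6, 6, 2, 3],
}

# One recoding rule, re-applied to every item.
recoded = {name: [recode(s) for s in scores] for name, scores in items.items()}

print(recoded["q1"])  # [1, 3, 2, 2]
print(recoded["q2"])  # [1, 3, 3, 1]
print(recoded["q3"])  # [3, 3, 1, 2]
```

Keeping the mapping in a single table means that a change of mind about the cut-offs only has to be made in one place, which is the same benefit a saved transformation gives you in jamovi.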