Section author: Danielle J. Navarro and David R. Foxcroft

Multiple linear regression

The simple linear regression model that we have discussed up to this point assumes that there is a single predictor variable that you are interested in, in this case dani.sleep. In fact, up to this point every statistical tool that we have talked about has assumed that your analysis uses one predictor variable and one outcome variable. However, in many (perhaps most) research projects you actually have multiple predictors that you want to examine. If so, it would be nice to extend the linear regression framework so that it can include multiple predictors. Perhaps some kind of multiple regression model would be in order?

Multiple regression is conceptually very simple. All we do is add more terms to our regression equation. Let us suppose that we have two variables that we are interested in; perhaps we want to use both dani.sleep and baby.sleep to predict the dani.grump variable. As before, we let \(Y_i\) refer to my grumpiness on the i-th day. But now we have two \(X\) variables: the first corresponding to the amount of sleep I got and the second corresponding to the amount of sleep my son got. So we will let \(X_{i1}\) refer to the hours I slept on the i-th day and \(X_{i2}\) refer to the hours that the baby slept on that day. If so, then we can write our regression model like this:

\[Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + \epsilon_i\]

As before, \(\epsilon_i\) is the residual associated with the i-th observation, \({\epsilon}_i = {Y}_i - \hat{Y}_i\). In this model, we now have three coefficients that need to be estimated: \(b_0\) is the intercept, \(b_1\) is the coefficient associated with my sleep, and \(b_2\) is the coefficient associated with my son's sleep. Although the number of coefficients that need to be estimated has changed, the basic idea of how the estimation works is unchanged: our estimated coefficients \(\hat{b}_0\), \(\hat{b}_1\) and \(\hat{b}_2\) are those that minimise the sum of the squared residuals.
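To make the estimation idea concrete, here is a minimal sketch in Python of a least-squares fit with two predictors. It is not how jamovi is implemented, and the data are made up for illustration (they are not the parenthood data set). The function names `fit_two_predictors` and `sum_squared_residuals` are hypothetical; the only library call relied on is NumPy's `numpy.linalg.lstsq`.

```python
import numpy as np


def fit_two_predictors(x1, x2, y):
    """Return estimates (b0, b1, b2) that minimise the sum of squared residuals."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the two predictors.
    design = np.column_stack([np.ones_like(x1), x1, x2])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def sum_squared_residuals(coefs, x1, x2, y):
    """Sum over i of (Y_i - Y_hat_i)^2 for a given set of coefficients."""
    b0, b1, b2 = coefs
    y_hat = b0 + b1 * np.asarray(x1) + b2 * np.asarray(x2)
    return float(np.sum((np.asarray(y) - y_hat) ** 2))


# Made-up illustrative data (assumption: NOT the parenthood data set).
rng = np.random.default_rng(0)
x1 = rng.uniform(4, 9, size=50)    # hypothetical hours of sleep, person 1
x2 = rng.uniform(3, 12, size=50)   # hypothetical hours of sleep, person 2
y = 120.0 - 9.0 * x1 + 0.5 * x2 + rng.normal(0, 4, size=50)

b_hat = fit_two_predictors(x1, x2, y)
print("estimated coefficients:", b_hat)
print("sum of squared residuals:", sum_squared_residuals(b_hat, x1, x2, y))
```

Nudging any of the estimated coefficients away from `b_hat` and recomputing `sum_squared_residuals` should never give a smaller value, which is exactly what "minimise the sum of the squared residuals" means.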

Doing it in jamovi

Multiple regression in jamovi is no different to simple regression. All we have to do is add additional variables to the Covariates box in jamovi. For example, if we want to use both dani.sleep and baby.sleep as predictors in our attempt to explain why I am so grumpy, then move baby.sleep across into the Covariates box alongside dani.sleep. By default, jamovi assumes that the model should include an intercept. The coefficients we get this time are:

Table 15 Model coefficients for the linear model predicting dani.grump from baby.sleep and dani.sleep (from the parenthood data set).

Predictor     Estimate
Intercept      125.966
dani.sleep      -8.950
baby.sleep       0.011

The coefficient associated with dani.sleep is quite large, suggesting that every hour of sleep I lose makes me a lot grumpier. However, the coefficient for baby.sleep is very small, suggesting that it does not really matter how much sleep my son gets. What matters as far as my grumpiness goes is how much sleep I get. To get a sense of what this multiple regression model looks like, Fig. 137 shows a 3D plot of all three variables, along with the regression model itself.


Fig. 137 3D visualisation of a multiple regression model: There are two predictors in the model, dani.sleep and baby.sleep, and the outcome variable is dani.grump. Together, these three variables form a 3D space. Each observation (dot) is a point in this space. In much the same way that a simple linear regression model forms a line in 2D space, this multiple regression model forms a plane in 3D space. When we estimate the regression coefficients, what we are trying to do is find a plane that is as close to all the blue dots as possible.
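The fitted plane can also be read as a prediction rule: plug in values for the two predictors and read off the predicted grumpiness. The short Python sketch below uses the coefficient estimates reported in the table above. The function name `predicted_grumpiness` and the input values (6 and 8 hours of sleep) are hypothetical and chosen only for illustration.

```python
# Coefficient estimates copied from the jamovi output reported above.
B0 = 125.966      # intercept
B_DANI = -8.950   # coefficient for dani.sleep
B_BABY = 0.011    # coefficient for baby.sleep


def predicted_grumpiness(dani_sleep, baby_sleep):
    """Height of the regression plane at the given hours of sleep."""
    return B0 + B_DANI * dani_sleep + B_BABY * baby_sleep


# Hypothetical inputs, chosen only for illustration.
base = predicted_grumpiness(6.0, 8.0)
less_dani = predicted_grumpiness(5.0, 8.0)   # I sleep one hour less
less_baby = predicted_grumpiness(6.0, 7.0)   # the baby sleeps one hour less

print(f"{base:.3f}")              # 72.354
print(f"{less_dani - base:.3f}")  # 8.950
print(f"{less_baby - base:.3f}")  # -0.011
```

Holding the other predictor fixed, losing an hour of my own sleep shifts the prediction by 8.950 points, whereas an hour less sleep for the baby shifts it by only 0.011 points. This is the sense in which one coefficient is large and the other is negligible.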

Formula for the general case

The equation given above shows you what a multiple regression model looks like when you include two predictors. So if you want more than two predictors, all you have to do is add more \(X\) terms and more \(b\) coefficients. In other words, if you have \(K\) predictor variables in the model, then the regression equation looks like this:

\[Y_i = b_0 + \left( \sum_{k = 1} ^ K b_{k} X_{ik} \right) + \epsilon_i\]
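In code, the sum over the \(K\) predictors is a matrix-vector product. The sketch below is one possible way to write the general case in Python with NumPy. The function names `predict` and `fit` are hypothetical, `X` is assumed to be an array with one row per observation and one column per predictor, and the example data are made up.

```python
import numpy as np


def predict(X, b0, b):
    """Y_hat_i = b0 + sum_k b[k] * X[i, k], computed for every observation i."""
    X = np.asarray(X, dtype=float)
    b = np.asarray(b, dtype=float)
    if X.ndim != 2 or b.ndim != 1 or X.shape[1] != b.shape[0]:
        raise ValueError("X needs one column per coefficient in b")
    return b0 + X @ b


def fit(X, y):
    """Least-squares estimates (b0, b) for any number K of predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(X.shape[0]), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[0], coefs[1:]


# Made-up example with K = 3 predictors (assumption: purely illustrative).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = 2.0 + X @ np.array([1.5, -0.5, 0.25]) + rng.normal(0, 0.1, size=40)

b0_hat, b_hat = fit(X, y)
residuals = y - predict(X, b0_hat, b_hat)   # the epsilon_i of the fitted model
print(b0_hat, b_hat)
```

With \(K = 2\) this reduces to the two-predictor model discussed earlier. Nothing in `fit` or `predict` depends on the particular number of columns in `X`.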