Section authors: Danielle J. Navarro and David R. Foxcroft

Multiple linear regression

The simple linear regression model that we’ve discussed up to this point assumes that there’s a single predictor variable that you’re interested in, in this case dani.sleep. In fact, up to this point every statistical tool that we’ve talked about has assumed that your analysis uses one predictor variable and one outcome variable. However, in many (perhaps most) research projects you actually have multiple predictors that you want to examine. If so, it would be nice to be able to extend the linear regression framework to be able to include multiple predictors. Perhaps some kind of multiple regression model would be in order?

Multiple regression is conceptually very simple. All we do is add more terms to our regression equation. Let's suppose that we've got two variables that we're interested in; perhaps we want to use both dani.sleep and baby.sleep to predict the dani.grump variable. As before, we let \(Y_i\) refer to my grumpiness on the i-th day. But now we have two X variables: the first corresponding to the amount of sleep I got and the second corresponding to the amount of sleep my son got. So we'll let \(X_{i1}\) refer to the hours I slept on the i-th day and \(X_{i2}\) refer to the hours that the baby slept on that day. If so, then we can write our regression model like this:

\[Y_i = b_0 + b_1 X_{i1} + b_2 X_{i2} + \epsilon_i\]

As before, \({\epsilon}_i\) is the residual associated with the i-th observation, \({\epsilon}_i = {Y}_i - \hat{Y}_i\). In this model, we now have three coefficients that need to be estimated: b0 is the intercept, b1 is the coefficient associated with my sleep, and b2 is the coefficient associated with my son's sleep. However, although the number of coefficients that need to be estimated has changed, the basic idea of how the estimation works is unchanged: our estimated coefficients \(\hat{b}_0\), \(\hat{b}_1\) and \(\hat{b}_2\) are those that minimise the sum of the squared residuals.
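To make that estimation idea concrete, here is a minimal sketch in Python (not part of the jamovi workflow the book uses) showing how least-squares estimates for a two-predictor model can be computed with numpy. The small data arrays are made-up illustrative values, not the parenthood data.

```python
import numpy as np

# Made-up illustrative data: my hours of sleep, the baby's hours of sleep,
# and my grumpiness on each of five days
dani_sleep = np.array([7.0, 5.0, 8.0, 6.0, 4.5])
baby_sleep = np.array([10.0, 6.0, 11.0, 8.0, 5.0])
dani_grump = np.array([56.0, 80.0, 50.0, 66.0, 85.0])

# Design matrix: a column of 1s for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(dani_sleep), dani_sleep, baby_sleep])

# Least-squares solution: the coefficients that minimise the sum of
# squared residuals
b_hat, *_ = np.linalg.lstsq(X, dani_grump, rcond=None)
print("b0 (intercept), b1 (dani.sleep), b2 (baby.sleep):", b_hat)

# Residuals are the differences between observed and predicted grumpiness
residuals = dani_grump - X @ b_hat
print("sum of squared residuals:", np.sum(residuals**2))
```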

Doing it in jamovi

Multiple regression in jamovi is no different to simple regression. All we have to do is add additional variables to the Covariates box in jamovi. For example, if we want to use both dani.sleep and baby.sleep as predictors in our attempt to explain why I’m so grumpy, then move baby.sleep across into the Covariates box alongside dani.sleep. By default, jamovi assumes that the model should include an intercept. The coefficients we get this time are:

Table 15: Model coefficients for the linear model predicting dani.grump using baby.sleep and dani.sleep (from the parenthood data set).

| Predictor  | Estimate |
|------------|----------|
| Intercept  | 125.966  |
| dani.sleep | -8.950   |
| baby.sleep | 0.011    |

The coefficient associated with dani.sleep is quite large, suggesting that every hour of sleep I lose makes me a lot grumpier. However, the coefficient for baby.sleep is very small, suggesting that it doesn't really matter how much sleep my son gets. What matters as far as my grumpiness goes is how much sleep I get. To get a sense of what this multiple regression model looks like, Figure 119 shows a 3D plot that plots all three variables, along with the regression model itself.
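If you want to check these numbers outside jamovi, the following Python sketch fits the same model with numpy. It assumes you have exported the parenthood data to a CSV file named parenthood.csv with columns dani.sleep, baby.sleep and dani.grump; the file name and layout are assumptions for the sake of illustration.

```python
import numpy as np
import pandas as pd

# Assumed file name and layout: parenthood.csv with columns
# dani.sleep, baby.sleep and dani.grump
df = pd.read_csv("parenthood.csv")

X = np.column_stack([
    np.ones(len(df)),   # intercept column
    df["dani.sleep"],   # my hours of sleep
    df["baby.sleep"],   # the baby's hours of sleep
])
y = df["dani.grump"].to_numpy()

b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# Should agree with the jamovi output: roughly 125.966, -8.950 and 0.011
print(dict(zip(["Intercept", "dani.sleep", "baby.sleep"], b_hat.round(3))))
```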

Figure 119: A 3D visualisation of the multiple regression model. There are two predictors in the model, dani.sleep and baby.sleep, and the outcome variable is dani.grump. Together these three variables form a 3D space, and each observation (dot) is a point in this space. In the same way that a simple linear regression model forms a line in 2D space, this multiple regression model forms a plane in 3D space. When we estimate the regression coefficients, what we are trying to do is find a plane that is as close as possible to all the blue dots.

Formula for the general case

The equation that I gave above shows you what a multiple regression model looks like when you include two predictors. Not surprisingly, then, if you want more than two predictors all you have to do is add more X terms and more b coefficients. In other words, if you have K predictor variables in the model then the regression equation looks like this:

\[Y_i = b_0 + \left( \sum_{k=1}^K b_{k} X_{ik} \right) + \epsilon_i\]
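As a sketch of this general case, the Python function below fits a model with any number of predictors by stacking the K predictor columns into a single design matrix alongside the intercept column. The function name and the example data are illustrative assumptions, not something from the text.

```python
import numpy as np

def fit_multiple_regression(predictors, outcome):
    """Least-squares estimates b0, b1, ..., bK for K predictor columns.

    predictors: array of shape (n, K); outcome: array of shape (n,).
    Returns the K+1 coefficients that minimise the sum of squared residuals.
    """
    predictors = np.asarray(predictors, dtype=float)
    n = predictors.shape[0]
    X = np.column_stack([np.ones(n), predictors])  # prepend intercept column
    b_hat, *_ = np.linalg.lstsq(X, np.asarray(outcome, float), rcond=None)
    return b_hat

# Example with K = 3 made-up predictors and known true coefficients
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = 2.0 + X @ np.array([1.5, -0.5, 0.25]) + rng.normal(scale=0.1, size=50)
print(fit_multiple_regression(X, y).round(2))  # approximately [2.0, 1.5, -0.5, 0.25]
```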