Section author: Danielle J. Navarro and David R. Foxcroft

Estimating a linear regression model

Fig. 117 Depiction of the residuals associated with the best fitting regression line (left panel), and the residuals associated with a poor regression line (right panel). The residuals are much smaller for the good regression line. Again, this is no surprise given that the good line is the one that goes right through the middle of the data.

Okay, now let’s redraw our pictures, but this time I’ll add some lines to show the size of the residual for each observation. When the regression line is good, our residuals (the lengths of the solid black lines) all look pretty small, as shown in Fig. 117 (left panel), but when the regression line is a bad one the residuals are a lot larger, as you can see from looking at Fig. 117 (right panel).

Hmm. Maybe what we “want” in a regression model is small residuals. Yes, that does seem to make sense. In fact, I think I’ll go so far as to say that the “best fitting” regression line is the one that has the smallest residuals. Or, better yet, since statisticians seem to like to take squares of everything, why not say that:

The estimated regression coefficients, \(\hat{b}_0\) and \(\hat{b}_1\), are those that minimise the sum of the squared residuals, which we could either write as

\[\sum_i (Y_i - \hat{Y}_i)^2\]

or as

\[\sum_i \epsilon_{i}^2\]

Yes, yes that sounds even better. And since I’ve indented it like that, it probably means that this is the right answer. And since this is the right answer, it’s probably worth making a note of the fact that our regression coefficients are estimates (we’re trying to guess the parameters that describe a population!), which is why I’ve added the little hats, so that we get \(\hat{b}_0\) and \(\hat{b}_1\) rather than \(b_0\) and \(b_1\). Finally, I should also note that, since there’s actually more than one way to estimate a regression model, the more technical name for this estimation process is ordinary least squares (OLS) regression.
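If you’d like to see the “smallest sum of squared residuals” idea in action outside of jamovi, here is a minimal sketch in Python. It assumes the parenthood data have been exported to a CSV file called parenthood.csv (the file name is just a placeholder) with columns dani.sleep and dani.grump, and it simply compares the sum of squared residuals for a good line against a deliberately poor one.

```python
# A minimal sketch, not part of the jamovi workflow described in the text.
# Assumes the parenthood data are available as "parenthood.csv" (placeholder name)
# with columns "dani.sleep" and "dani.grump".
import pandas as pd

parenthood = pd.read_csv("parenthood.csv")
X = parenthood["dani.sleep"]
Y = parenthood["dani.grump"]

def sum_squared_residuals(b0, b1):
    """Sum of squared residuals for the candidate line Y-hat = b0 + b1 * X."""
    Y_hat = b0 + b1 * X
    return ((Y - Y_hat) ** 2).sum()

# A line close to the OLS estimates should give a much smaller value
# than a deliberately poor one.
print(sum_squared_residuals(125.96, -8.94))  # near the minimum
print(sum_squared_residuals(80.0, 3.0))      # a poor line: much larger
```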

At this point, we now have a concrete definition for what counts as our “best” choice of regression coefficients, \(\hat{b}_0\) and \(\hat{b}_1\). The natural question to ask next is, if our optimal regression coefficients are those that minimise the sum of the squared residuals, how do we find these wonderful numbers? The actual answer to this question is complicated and doesn’t help you understand the logic of regression.[1] This time I’m going to let you off the hook. Instead of showing you the long and tedious way first and then “revealing” the wonderful shortcut that jamovi provides, let’s cut straight to the chase and just use jamovi to do all the heavy lifting.

Linear regression in jamovi

Fig. 118 jamovi screenshot showing a simple linear regression analysis

To run my linear regression, open up the Regression - Linear Regression analysis in jamovi, using the parenthood data set. Then specify dani.grump as the Dependent Variable and dani.sleep as the variable entered in the Covariates box. This gives the results shown in Fig. 118, showing an intercept \(\hat{b}_0\) = 125.96 and the slope \(\hat{b}_1\) = -8.94. In other words, the best-fitting regression line that I plotted in Fig. 116 has this formula:

\[\hat{Y}_i = 125.96 + (-8.94 \ X_i)\]
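If you want to double-check those numbers outside of jamovi (entirely optional, and not something the text requires), a one-line least squares fit in Python gives essentially the same estimates. As before, this assumes the data are available as a CSV file; the file name is just a placeholder.

```python
# An optional cross-check of the jamovi output, assuming the same placeholder CSV.
import numpy as np
import pandas as pd

parenthood = pd.read_csv("parenthood.csv")
slope, intercept = np.polyfit(parenthood["dani.sleep"], parenthood["dani.grump"], deg=1)
print(intercept, slope)  # should be close to 125.96 and -8.94
```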

Interpreting the estimated model

The most important thing to be able to understand is how to interpret these coefficients. Let’s start with \(\hat{b}_1\), the slope. If we remember the definition of the slope, a regression coefficient of \(\hat{b}_1\) = -8.94 means that if I increase \(X_i\) by 1, then I’m decreasing \(Y_i\) by 8.94. That is, each additional hour of sleep that I gain will improve my mood, reducing my grumpiness by 8.94 grumpiness points. What about the intercept? Well, since \(\hat{b}_0\) corresponds to “the expected value of \(Y_i\) when \(X_i\) equals 0”, it’s pretty straightforward. It implies that if I get zero hours of sleep (\(X_i\) = 0) then my grumpiness will go off the scale, to an insane value of \(Y_i\) = 125.96. Best to be avoided, I think.
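To make that interpretation concrete, here’s a tiny worked example using the estimated equation above; the specific sleep values are just for illustration.

```python
# Predicted grumpiness from the estimated model, Y-hat = 125.96 - 8.94 * X.
def predicted_grumpiness(hours_of_sleep):
    return 125.96 + (-8.94 * hours_of_sleep)

print(predicted_grumpiness(0))  # 125.96: the intercept, grumpiness on zero sleep
print(predicted_grumpiness(1))  # 117.02: one extra hour of sleep, 8.94 points less grumpy
```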


[1]Or at least, I’m assuming that it doesn’t help most people. But on the off-chance that someone reading this is a proper kung fu master of linear algebra (and to be fair, I always have a few of these people in my introductory statistics class), it will help you to know that the solution to the estimation problem turns out to be \(\hat{b} = (\mathbf{X}^\prime\mathbf{X})^{-1} \mathbf{X}^\prime y\), where \(\hat{b}\) is a vector containing the estimated regression coefficients, \(\mathbf{X}\) is the “design matrix” that contains the predictor variables (plus an additional column containing all ones; strictly \(\mathbf{X}\) is a matrix of the regressors, but I haven’t discussed the distinction yet), and \(y\) is a vector containing the outcome variable. For everyone else, this isn’t exactly helpful and can be downright scary. However, since quite a few things in linear regression can be written in linear algebra terms, you’ll see a bunch of footnotes like this one in this chapter. If you can follow the maths in them, great. If not, ignore them.
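For readers who do want to see that matrix formula in action, here is a short sketch of it in Python using numpy. It assumes the same placeholder CSV file as the earlier sketches, and it only illustrates the footnote’s formula; it is not part of the jamovi workflow.

```python
# The closed-form OLS solution b-hat = (X'X)^{-1} X'y, sketched with numpy.
# Assumes the placeholder "parenthood.csv" with columns "dani.sleep" and "dani.grump".
import numpy as np
import pandas as pd

parenthood = pd.read_csv("parenthood.csv")
X = np.column_stack([np.ones(len(parenthood)), parenthood["dani.sleep"]])  # design matrix
y = parenthood["dani.grump"].to_numpy()

b_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solve the normal equations (safer than an explicit inverse)
print(b_hat)  # [intercept, slope], close to 125.96 and -8.94
```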