In Section 7.7, it was mentioned that GAMs are generally fit using a backfitting approach. The idea behind backfitting is actually quite simple. We will now explore backfitting in the context of multiple linear regression.

Suppose that we would like to perform multiple linear regression, but we do not have software to do so. Instead, we only have software to perform simple linear regression. Therefore, we take the following iterative approach: we repeatedly hold all but one coefficient estimate fixed at its current value, and update only that coefficient estimate using a simple linear regression. The process is continued until convergence, that is, until the coefficient estimates stop changing.
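The update described above can be written out explicitly. The following is a sketch for a general number of predictors; the partial-residual notation $r^{(j)}$ and the lowercase $x_{ij}$ for the $i$th observation of $X_j$ are assumptions introduced here for illustration and are not used elsewhere in the text.

```latex
r^{(j)} = Y - \sum_{k \neq j} \hat{\beta}_k X_k ,
\qquad
(\hat{\beta}_0, \hat{\beta}_j) \leftarrow
\arg\min_{b_0,\, b_j} \sum_{i=1}^{n} \bigl( r^{(j)}_i - b_0 - b_j x_{ij} \bigr)^2 .
```

Each update is a simple linear regression of the partial residual on a single predictor, so only simple-regression software is needed. Cycling through $j$ and repeating gives the iterative scheme.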

We now try this out on a toy example.

  1. Generate a response Y and two predictors X1 and X2, with n = 100.

  2. Initialize $\hat{\beta}_1$ to take on a value of your choice. It does not matter what value you choose.

  3. Keeping $\hat{\beta}_1$ fixed, fit the model

$$Y - \hat{\beta}_1 X_1 = \beta_0 + \beta_2 X_2 + \epsilon.$$

You can do this as follows:

> a=y-beta1*x1
> beta2=lm(a~x2)$coef[2]

  4. Keeping $\hat{\beta}_2$ fixed, fit the model

$$Y - \hat{\beta}_2 X_2 = \beta_0 + \beta_1 X_1 + \epsilon.$$

You can do this as follows:

> a=y-beta2*x2
> beta1=lm(a~x1)$coef[2]

  5. Write a for loop that repeats steps 3 and 4 a total of 1,000 times. Report the estimates of $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ at each iteration of the for loop. Create a plot in which each of these values is displayed, with $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ each shown in a different color.

  6. Compare your answer in step 5 to the results of simply performing multiple linear regression to predict Y using X1 and X2. Use the abline() function to overlay those multiple linear regression coefficient estimates on the plot obtained in step 5.

  7. On this data set, how many backfitting iterations were required in order to obtain a “good” approximation to the multiple regression coefficient estimates?
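The steps above can be sketched in R roughly as follows. This is one possible setup, not a prescribed solution: the seed, the true coefficients used to simulate the data, the starting value for the first slope, the plot colors, and the tolerance used to call an approximation "good" are all arbitrary assumptions made for illustration.

```r
# Simulate a toy data set (seed and true coefficients are assumptions)
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n)

n_iter <- 1000
beta0 <- numeric(n_iter)
beta1 <- numeric(n_iter)
beta2 <- numeric(n_iter)

b1 <- 10  # arbitrary initial value for the first slope estimate

for (i in seq_len(n_iter)) {
  # Hold the first slope fixed; regress the partial residual on x2
  a  <- y - b1 * x1
  b2 <- unname(coef(lm(a ~ x2))[2])

  # Hold the second slope fixed; regress the partial residual on x1
  a   <- y - b2 * x2
  fit <- lm(a ~ x1)
  b1  <- unname(coef(fit)[2])

  # Record the estimates after this pass
  beta0[i] <- unname(coef(fit)[1])
  beta1[i] <- b1
  beta2[i] <- b2
}

# Reference estimates from ordinary multiple linear regression
full <- unname(coef(lm(y ~ x1 + x2)))

# Trace of each estimate, with the multiple regression values overlaid
cols <- c("red", "blue", "darkgreen")
plot(seq_len(n_iter), beta0, type = "l", col = cols[1],
     ylim = range(c(beta0, beta1, beta2, full)),
     xlab = "Iteration", ylab = "Coefficient estimate")
lines(seq_len(n_iter), beta1, col = cols[2])
lines(seq_len(n_iter), beta2, col = cols[3])
abline(h = full, lty = 2, col = cols)
legend("right", legend = c("beta0", "beta1", "beta2"), col = cols, lty = 1)

# First iteration at which every estimate is within 1e-6 of the
# multiple regression estimate (the tolerance is an arbitrary choice)
err <- pmax(abs(beta0 - full[1]), abs(beta1 - full[2]), abs(beta2 - full[3]))
first_good <- which(err < 1e-6)[1]
print(first_good)
```

How quickly the traces flatten depends on the sample correlation between the two predictors: when it is small, as is likely with independently simulated predictors, only a handful of passes should be needed, while strongly correlated predictors would slow convergence.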