Question

p299

  1. This question uses the variables dis (the weighted mean of distances to five Boston employment centers) and nox (nitrogen oxides concentration in parts per 10 million) from the Boston data. We will treat dis as the predictor and nox as the response.

     1. Use the poly() function to fit a cubic polynomial regression to predict nox using dis. Report the regression output, and plot the resulting data and polynomial fits.

     2. Plot the polynomial fits for a range of different polynomial degrees (say, from 1 to 10), and report the associated residual sum of squares.

     3. Perform cross-validation or another approach to select the optimal degree for the polynomial, and explain your results.

     4. Use the bs() function to fit a regression spline to predict nox using dis. Report the output for the fit using four degrees of freedom. How did you choose the knots? Plot the resulting fit.

     5. Now fit a regression spline for a range of degrees of freedom, and plot the resulting fits and report the resulting RSS. Describe the results obtained.

     6. Perform cross-validation or another approach in order to select the best degrees of freedom for a regression spline on this data. Describe your results.


library(MASS)       # Boston data set
library(tidyverse)  # ggplot2
library(gridExtra)  # grid.arrange()

# dis is the predictor (x-axis) and nox is the response (y-axis)

# Linear fit
g1 <- ggplot(Boston, aes(x = dis, y = nox)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x)

# Loess smoother
g2 <- ggplot(Boston, aes(x = dis, y = nox)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "loess", formula = y ~ x)

# Degree-4 polynomial fit, no confidence band
g3 <- ggplot(Boston, aes(x = dis, y = nox)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 4), se = FALSE)

# Degree-4 polynomial fit with the 95% confidence band
g4 <- ggplot(Boston, aes(x = dis, y = nox)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm",
              formula = y ~ poly(x, 4),
              level = 0.95, # Default
              se = TRUE)    # Default

grid.arrange(g1, g2, g3, g4, ncol = 2)

Fit model: nox ~ poly(dis, 4) (degree-4 orthogonal polynomial in dis)

fit <- lm(nox ~ poly(dis, 4), data = Boston)
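The question asks for a cubic, while the model above uses degree 4. A minimal sketch of the cubic fit and a plot of its fitted curve follows, using base graphics. The names fit3, dis_grid and pred are introduced here for illustration and are not part of the original analysis, and the band is an approximate pointwise one (fit plus or minus two standard errors).

```r
library(MASS)  # provides the Boston data set

# Cubic polynomial fit, as requested in part 1 of the question
fit3 <- lm(nox ~ poly(dis, 3), data = Boston)

# Evenly spaced grid over the observed range of dis
dis_grid <- seq(min(Boston$dis), max(Boston$dis), length.out = 100)
pred <- predict(fit3, newdata = data.frame(dis = dis_grid), se.fit = TRUE)

# Data, fitted curve, and an approximate 95% pointwise band
plot(nox ~ dis, data = Boston, col = "grey60", pch = 16, cex = 0.6)
lines(dis_grid, pred$fit, lwd = 2)
matlines(dis_grid,
         cbind(pred$fit - 2 * pred$se.fit, pred$fit + 2 * pred$se.fit),
         lty = 2, col = "black")
```

Calling coef(summary(fit3)) on this object would give the regression output for the cubic, in the same layout as the degree-4 table.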

Diagnostic plots

par(mfrow=c(2,2))
plot(fit)

Coefficients

coef(summary(fit))
##                  Estimate  Std. Error     t value      Pr(>|t|)
## (Intercept)    0.55469506 0.002761339 200.8790240  0.000000e+00
## poly(dis, 4)1 -2.00309590 0.062114782 -32.2482963 2.540459e-124
## poly(dis, 4)2  0.85632995 0.062114782  13.7862506  6.924872e-37
## poly(dis, 4)3 -0.31804899 0.062114782  -5.1203430  4.356581e-07
## poly(dis, 4)4  0.03354668 0.062114782   0.5400757  5.893848e-01
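The fourth-degree term has a p-value of about 0.59, so the table gives little evidence that degree 4 improves on the cubic the question asks for. The same comparison can be sketched as a nested-model F-test. The names fit3, fit4, cmp, p_f and p_t are introduced here for illustration only.

```r
library(MASS)  # provides the Boston data set

# Nested polynomial fits; fit3 and fit4 are illustrative names
fit3 <- lm(nox ~ poly(dis, 3), data = Boston)
fit4 <- lm(nox ~ poly(dis, 4), data = Boston)

# F-test of the cubic model against the degree-4 model
cmp <- anova(fit3, fit4)
print(cmp)

# The models differ by one term, so the F-test p-value should agree with
# the t-test p-value of the highest-order coefficient in the larger model
p_f <- cmp[["Pr(>F)"]][2]
p_t <- coef(summary(fit4))["poly(dis, 4)4", "Pr(>|t|)"]
print(all.equal(p_f, p_t))
```

Because the two models differ by a single coefficient, the F statistic is the square of that coefficient's t statistic, which is why the two p-values coincide.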