
Question

p299

In this exercise, you will further analyze the Wage data set considered throughout this chapter.

  1. Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial. What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA? Make a plot of the resulting polynomial fit to the data.

  2. Fit a step function to predict wage using age, and perform cross-validation to choose the optimal number of cuts. Make a plot of the fit obtained.


library(ISLR)
library(boot)

6a

# Placeholder for the CV error of each polynomial degree
cv.error = rep(0, 5)

# Loop over polynomial degrees 1 to 5
for (i in 1:5) {
  # Fit a degree-i polynomial in age (glm with the default gaussian family is a linear model)
  glm.fit = glm(wage ~ poly(age, i), data=Wage)
  
  # Save the CV estimate of test error; cv.glm defaults to leave-one-out CV
  cv.error[i] = cv.glm(Wage, glm.fit)$delta[1]
}

Plot CV Error

# Plot the CV error against the polynomial degree
plot(1:5, cv.error, type="b", xlab="Degree", ylab="CV error")

Comments:

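  • Based on the one-SE rule, cross-validation selects the 3rd-degree polynomial
  • Hypothesis testing with ANOVA selects the 4th-degree polynomial, so the two approaches differ by one degree and both point to a low-degree fit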
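
The ANOVA comparison and the plot of the polynomial fit asked for in the question are not shown above. A minimal sketch of both follows, modelled on the nested-model comparison in the chapter lab. The names fit.1 to fit.5, anova.tab, age.grid, preds and se.bands are illustrative, and degree 3 is assumed for the plot because that is the degree cross-validation selects in the comments above.

```r
library(ISLR)

# Nested polynomial fits of degree 1 to 5
fit.1 = lm(wage ~ age, data=Wage)
fit.2 = lm(wage ~ poly(age, 2), data=Wage)
fit.3 = lm(wage ~ poly(age, 3), data=Wage)
fit.4 = lm(wage ~ poly(age, 4), data=Wage)
fit.5 = lm(wage ~ poly(age, 5), data=Wage)

# Sequential F-tests: each row compares a model with the one before it
anova.tab = anova(fit.1, fit.2, fit.3, fit.4, fit.5)
print(anova.tab)

# Assumption: plot the degree-3 fit (the degree chosen by CV above)
age.grid = seq(min(Wage$age), max(Wage$age))
preds = predict(fit.3, newdata=data.frame(age=age.grid), se.fit=TRUE)

# Approximate 95% pointwise bands: fit +/- 2 standard errors
se.bands = cbind(preds$fit + 2 * preds$se.fit, preds$fit - 2 * preds$se.fit)

plot(Wage$age, Wage$wage, col="darkgrey", cex=0.5, xlab="age", ylab="wage")
lines(age.grid, preds$fit, col="blue", lwd=2)
matlines(age.grid, se.bands, col="blue", lty=3)
```

Each row of the ANOVA table tests whether adding one more degree significantly improves on the previous model, so the preferred degree is the last one whose p-value is small. To plot a different degree, swap fit.3 for the corresponding fit.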

6b

# Placeholder for the CV errors; entry 1 stays NA because a step function needs at least 2 intervals
cv.error = rep(NA, 5)
# Loop over the number of intervals used to cut age
for (i in 2:5) {
  # Cut age into i equal-width intervals and fit a step function
  print(i)
  Wage$age.cut = cut(Wage$age, i)
  glm.fit = glm(wage ~ age.cut, data=Wage)
  
  # Save the raw CV estimate (delta[1]), consistent with 6a
  cv.error[i] = cv.glm(Wage, glm.fit)$delta[1]
}
## [1] 2
## [1] 3
## [1] 4
## [1] 5

Plot CV Error

plot(c(2:5), cv.error[2:5], pch=20, cex=0.5, lwd=2)
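
The question also asks for a plot of the fitted step function, which is not shown above. A minimal sketch follows. The value of n.cuts is a placeholder, not a result of this analysis: replace it with the number of intervals that gives the lowest CV error in the plot above. The names wage.df, step.fit and ord are illustrative.

```r
library(ISLR)

# Placeholder, NOT a result from the analysis above:
# replace with the number of intervals that minimizes the CV error
n.cuts = 4

# Work on a copy so the original Wage data frame is left untouched
wage.df = Wage
wage.df$age.cut = cut(wage.df$age, n.cuts)

# Step function: one fitted mean wage per age interval
step.fit = lm(wage ~ age.cut, data=wage.df)

# Plot the data, then overlay the fitted values ordered by age
ord = order(wage.df$age)
plot(wage.df$age, wage.df$wage, col="darkgrey", cex=0.5, xlab="age", ylab="wage")
lines(wage.df$age[ord], fitted(step.fit)[ord], col="red", lwd=2)
```

Plotting the fitted values in age order avoids re-cutting a prediction grid, which would otherwise have to reuse exactly the same breakpoints as the training data.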