
# Question

In Sections 5.3.2 and 5.3.3, we saw that the cv.glm() function can be used in order to compute the LOOCV test error estimate. Alternatively, one could compute those quantities using just the glm() and predict.glm() functions, and a for loop. You will now take this approach in order to compute the LOOCV error for a simple logistic regression model on the Weekly data set. Recall that in the context of classification problems, the LOOCV error is given in (5.4).

(a) Fit a logistic regression model that predicts Direction using Lag1 and Lag2.

(b) Fit a logistic regression model that predicts Direction using Lag1 and Lag2 using all but the first observation.

(c) Use the model from (b) to predict the direction of the first observation. You can do this by predicting that the first observation will go up if P(Direction = "Up" | Lag1, Lag2) > 0.5. Was this observation correctly classified?

(d) Write a for loop from i = 1 to i = n, where n is the number of observations in the data set, that performs each of the following steps:

i. Fit a logistic regression model using all but the ith observation to predict Direction using Lag1 and Lag2.

ii. Compute the posterior probability of the market moving up for the ith observation.

iii. Use the posterior probability for the ith observation in order to predict whether or not the market moves up.

iv. Determine whether or not an error was made in predicting the direction for the ith observation. If an error was made, then indicate this as a 1, and otherwise indicate it as a 0.

(e) Take the average of the n numbers obtained in (d)iv in order to obtain the LOOCV estimate for the test error. Comment on the results.
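For reference, equation (5.4) in ISLR defines the LOOCV error for a classification problem as the average of the per-observation error indicators produced in step (d)iv:

$$
\mathrm{CV}_{(n)} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Err}_i,
\qquad \mathrm{Err}_i = I(y_i \neq \hat{y}_i)
$$

Here $\hat{y}_i$ is the class predicted for observation $i$ by the model fitted without it, so $\mathrm{CV}_{(n)}$ is simply the fraction of held-out observations that were misclassified.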

library(ISLR)
set.seed(1)

# 7a

lr.fit = glm(Direction ~ Lag1 + Lag2, data=Weekly, family=binomial)
summary(lr.fit)
##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2, family = binomial, data = Weekly)
##
## Deviance Residuals:
##    Min      1Q  Median      3Q     Max
## -1.623  -1.261   1.001   1.083   1.506
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.22122    0.06147   3.599 0.000319 ***
## Lag1        -0.03872    0.02622  -1.477 0.139672
## Lag2         0.06025    0.02655   2.270 0.023232 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 1496.2  on 1088  degrees of freedom
## Residual deviance: 1488.2  on 1086  degrees of freedom
## AIC: 1494.2
##
## Number of Fisher Scoring iterations: 4

# 7b

train = Weekly[-1,]
test = Weekly[1,]

lr.fit.b = glm(Direction ~ Lag1 + Lag2, data=train, family=binomial)
summary(lr.fit.b)
##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2, family = binomial, data = train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.6258  -1.2617   0.9999   1.0819   1.5071
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.22324    0.06150   3.630 0.000283 ***
## Lag1        -0.03843    0.02622  -1.466 0.142683
## Lag2         0.06085    0.02656   2.291 0.021971 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 1494.6  on 1087  degrees of freedom
## Residual deviance: 1486.5  on 1085  degrees of freedom
## AIC: 1492.5
##
## Number of Fisher Scoring iterations: 4
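Step (b) can also be written without building `train` by hand: `glm()` accepts a `subset` argument, and a negative index drops rows exactly as `Weekly[-1, ]` does. The sketch below is a minimal illustration on a simulated stand-in data frame `sim` (hypothetical data with the same column names as `Weekly`; substitute `Weekly` from ISLR in practice) and checks that both fits give the same coefficients.

```r
# Simulated stand-in for Weekly (hypothetical data, same column names)
set.seed(1)
n <- 200
sim <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
p_up <- plogis(0.2 - 0.04 * sim$Lag1 + 0.06 * sim$Lag2)
sim$Direction <- factor(ifelse(runif(n) < p_up, "Up", "Down"),
                        levels = c("Down", "Up"))

# Fit on all but the first row, two equivalent ways
fit_drop   <- glm(Direction ~ Lag1 + Lag2, data = sim[-1, ], family = binomial)
fit_subset <- glm(Direction ~ Lag1 + Lag2, data = sim, family = binomial,
                  subset = -1)

# Same rows in the same order, so the fitted coefficients agree
all.equal(coef(fit_drop), coef(fit_subset))  # TRUE
```

The `subset` form keeps the original data frame intact, which is convenient inside a loop.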

# 7c: Predict

lr.prob = predict(lr.fit.b, test, type = "response") # P(Direction = "Up") for observation 1

lr.prob
##         1
## 0.5713923
lr.prob > 0.5 # TRUE means the model predicts Up
##    1
## TRUE
Weekly[1,]$Direction # actual direction was Down
## [1] Down
## Levels: Down Up

Answer: The model predicts Up (P(Up) ≈ 0.571 > 0.5), but the market actually went Down in the first week, so this observation was misclassified.
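Why is the value returned by `predict(..., type = "response")` the probability of "Up"? For a factor response, `glm()` with `family = binomial` models the probability of the second factor level, and the levels of `Direction` are `Down`, `Up`. The sketch below, again on a simulated stand-in `sim` (hypothetical data, not `Weekly`), recomputes that probability by hand with the logistic function and then applies the 0.5 threshold from (c).

```r
# Simulated stand-in for Weekly (hypothetical data, same column names)
set.seed(1)
n <- 200
sim <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
p_up <- plogis(0.2 - 0.04 * sim$Lag1 + 0.06 * sim$Lag2)
sim$Direction <- factor(ifelse(runif(n) < p_up, "Up", "Down"),
                        levels = c("Down", "Up"))  # second level = "Up"

fit  <- glm(Direction ~ Lag1 + Lag2, data = sim[-1, ], family = binomial)
test <- sim[1, ]

# Linear predictor b0 + b1*Lag1 + b2*Lag2, then the logistic function
b   <- coef(fit)
eta <- b[["(Intercept)"]] + b[["Lag1"]] * test$Lag1 + b[["Lag2"]] * test$Lag2
p_manual  <- plogis(eta)                            # 1 / (1 + exp(-eta))
p_predict <- predict(fit, test, type = "response")  # same value, named "1"

# Classify with the 0.5 threshold and compare with the true label
predicted <- if (p_manual > 0.5) "Up" else "Down"
correct   <- predicted == as.character(test$Direction)
```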

# 7d: LOOCV loop

num_incorrect = 0

for (i in 1:nrow(Weekly)) {
  # (i) fit on every observation except the i-th
  train = Weekly[-i, ]
  test = Weekly[i, ]
  lr.fit.b = glm(Direction ~ Lag1 + Lag2, data=train, family=binomial)

  # (ii) posterior probability that the market goes Up for observation i
  lr.prob = predict(lr.fit.b, test, type = "response")

  # (iii) classify with a 0.5 threshold
  if (lr.prob > 0.5) {
    predicted_direction = "Up"
  } else {
    predicted_direction = "Down"
  }

  # (iv) count an error when the prediction differs from the actual direction
  if (Weekly[i, "Direction"] != predicted_direction) {
    num_incorrect = num_incorrect + 1
  }
}
print(num_incorrect)
## [1] 490
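The loop above keeps a running count, whereas (d)iv asks for a 0/1 indicator per observation. A minimal sketch that stores those indicators explicitly and averages them is below; `loocv_error` is a hypothetical helper name, and `sim` is simulated stand-in data rather than `Weekly`.

```r
# Simulated stand-in for Weekly (hypothetical data, same column names)
set.seed(1)
n <- 200
sim <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
p_up <- plogis(0.2 - 0.04 * sim$Lag1 + 0.06 * sim$Lag2)
sim$Direction <- factor(ifelse(runif(n) < p_up, "Up", "Down"),
                        levels = c("Down", "Up"))

# Hypothetical helper: per-observation 0/1 errors and their mean (LOOCV)
loocv_error <- function(formula, data, threshold = 0.5) {
  response <- all.vars(formula)[1]
  lev <- levels(data[[response]])  # glm models P(second level)
  errors <- vapply(seq_len(nrow(data)), function(i) {
    fit  <- glm(formula, data = data[-i, ], family = binomial)
    prob <- predict(fit, data[i, ], type = "response")
    pred <- if (prob > threshold) lev[2] else lev[1]
    as.numeric(pred != as.character(data[[response]][i]))  # 1 = error, 0 = correct
  }, numeric(1))
  list(errors = errors, cv = mean(errors))
}

res <- loocv_error(Direction ~ Lag1 + Lag2, sim)
sum(res$errors)  # number of misclassified held-out observations
res$cv           # LOOCV misclassification rate
```

Since the classification rule is the same, calling `loocv_error(Direction ~ Lag1 + Lag2, Weekly)` should reproduce the 490 errors counted above.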

# 7e: LOOCV error estimate

num_incorrect
## [1] 490
num_incorrect/nrow(Weekly)  # LOOCV error ~45%: only slightly better than random guessing (50%)
## [1] 0.4499541
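As a cross-check, `cv.glm()` from the boot package (the function mentioned in the question) computes the same LOOCV estimate when given a misclassification cost; with the default `K = n` it performs leave-one-out. The cost function below follows the example on the `cv.glm` help page. This is a sketch on a simulated stand-in `sim` (hypothetical data, not `Weekly`); it also computes a naive "always predict the majority class" error as a reference point for commenting on the result.

```r
library(boot)  # provides cv.glm()

# Simulated stand-in for Weekly (hypothetical data, same column names)
set.seed(1)
n <- 200
sim <- data.frame(Lag1 = rnorm(n), Lag2 = rnorm(n))
p_up <- plogis(0.2 - 0.04 * sim$Lag1 + 0.06 * sim$Lag2)
sim$Direction <- factor(ifelse(runif(n) < p_up, "Up", "Down"),
                        levels = c("Down", "Up"))

fit <- glm(Direction ~ Lag1 + Lag2, data = sim, family = binomial)

# r is the observed 0/1 response, pi the predicted probability of "Up";
# an observation counts as misclassified when they differ by more than 0.5
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

cv_boot <- cv.glm(sim, fit, cost = cost)  # K defaults to n, i.e. LOOCV
cv_boot$delta[1]

# The explicit loop from (d), for comparison
errors <- numeric(n)
for (i in seq_len(n)) {
  fit_i <- glm(Direction ~ Lag1 + Lag2, data = sim[-i, ], family = binomial)
  prob  <- predict(fit_i, sim[i, ], type = "response")
  errors[i] <- as.numeric((prob > 0.5) != (sim$Direction[i] == "Up"))
}
mean(errors)

# Naive baseline: always predict the majority class
mean(sim$Direction != names(which.max(table(sim$Direction))))
```

The two LOOCV numbers agree because both count an error exactly when the held-out probability falls on the wrong side of 0.5 (barring a probability of exactly 0.5). Comparing the LOOCV error with the majority-class baseline is a quick way to judge whether `Lag1` and `Lag2` carry any real predictive signal.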