
Question

p371

This problem involves the OJ data set, which is part of the ISLR package. (The OJ data was also used in Question 9 of Chapter 8.)

  1. Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

  2. Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

  3. What are the training and test error rates?

  4. Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

  5. Compute the training and test error rates using this new value for cost.

  6. Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

  7. Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

  8. Overall, which approach seems to give the best results on this data?


8a

Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations. (see Q8.9)
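This write-up assumes the ISLR and e1071 packages are attached (ISLR provides the OJ data set; e1071 provides svm() and tune()); a minimal setup sketch:

# Assumed setup (not shown in the original output)
library(ISLR)    # OJ data set
library(e1071)   # svm(), tune()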

# From Q8.9
dim(OJ)
## [1] 1070   18
set.seed(1)
train = sample(1:nrow(OJ), 800)

# Training/test split and response vectors, used throughout this answer
oj.train = OJ[train,]
oj.test = OJ[-train,]
oj.train.y = OJ[train,"Purchase"]
oj.test.y = OJ[-train,"Purchase"]

8b

Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors.

Parameters: Cost = 0.01

svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "linear", cost = 0.01)
summary(svm.fit)
## 
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  435
## 
##  ( 219 216 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

8c

What are the training and test error rates?

Training Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.train)
result = table(true=oj.train.y, pred=svm.predict)
result
##     pred
## true  CH  MM
##   CH 420  65
##   MM  75 240

Training Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.825

Test Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.test)
result = table(true=oj.test.y, pred=svm.predict)
result
##     pred
## true  CH  MM
##   CH 153  15
##   MM  33  69

Test Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.8222222
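The exercise asks for error rates, which are simply one minus the accuracies above. They can also be computed directly as misclassification fractions; a minimal sketch using the fit already created:

# Error rate = fraction of misclassified observations (1 - accuracy)
mean(predict(svm.fit, newdata = oj.train) != oj.train.y)  # should give 1 - 0.825  = 0.175
mean(predict(svm.fit, newdata = oj.test)  != oj.test.y)   # should give 1 - 0.8222 = 0.1778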

8d Cross-Validation

Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

tune.out = tune(svm, as.factor(Purchase) ~ ., 
                data = oj.train,
                kernel = "linear", 
                ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.17125 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.17375 0.03884174
## 2  0.10 0.17875 0.03064696
## 3  1.00 0.17500 0.03061862
## 4  5.00 0.17250 0.03322900
## 5 10.00 0.17125 0.03488573

Best Model

best.model <- tune.out$best.model
best.model
## 
## Call:
## best.tune(method = svm, train.x = as.factor(Purchase) ~ ., data = oj.train, 
##     ranges = list(cost = c(0.01, 0.1, 1, 5, 10)), kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  10 
## 
## Number of Support Vectors:  326
best.model$cost
## [1] 10
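The grid above is fairly coarse. A finer sweep over the same 0.01 to 10 range could be tried; a sketch with a hypothetical log-scale grid (tune.fine is a name introduced here, not part of the original analysis):

# Hypothetical finer grid over [0.01, 10] on a log scale
tune.fine = tune(svm, as.factor(Purchase) ~ .,
                 data = oj.train,
                 kernel = "linear",
                 ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
#summary(tune.fine)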

8e

Compute the training and test error rates using this new value for cost.

Parameters: Cost = 10 (the value selected by cross-validation)

Best Model Training Data Confusion Matrix

best.model.predict = predict(best.model, newdata=oj.train)
result = table(true=oj.train.y, pred=best.model.predict)
result
##     pred
## true  CH  MM
##   CH 423  62
##   MM  69 246

Best Model Training Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.83625


Best Model Test Data Confusion Matrix

best.model.predict = predict(best.model, newdata=oj.test)
result = table(true=oj.test.y, pred=best.model.predict)
result
##     pred
## true  CH  MM
##   CH 156  12
##   MM  28  74

Best Model Test Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.8518519

8f

Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

8f-b

Parameters: Cost = 0.01

svm.fit.radial = svm(Purchase ~ ., data = oj.train, kernel = "radial", cost = 0.01)
#summary(svm.fit.radial)

8f-c

Training Data Confusion Matrix

svm.predict = predict(svm.fit.radial, newdata = oj.train)
table(true=oj.train.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 485   0
##   MM 315   0
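With cost = 0.01, the radial fit predicts CH for every training observation, so the training error rate is simply the fraction of MM cases:

315 / 800   # training error rate = 0.394, matching the CV error for cost = 0.01 below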

8f-d

tune.out = tune(svm, as.factor(Purchase) ~ ., 
                data = oj.train,
                kernel = "radial", 
                ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.17625 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.39375 0.06568284
## 2  0.10 0.18250 0.05470883
## 3  1.00 0.17625 0.03793727
## 4  5.00 0.18125 0.04299952
## 5 10.00 0.18125 0.04340139

8f-e

Parameters: Cost = 0.1 (note: the cross-validation above selected cost = 1 as the best value)

svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "radial", cost = 0.1)
summary(svm.fit)
## 
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "radial", cost = 0.1)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.1 
## 
## Number of Support Vectors:  541
## 
##  ( 272 269 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Test Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.test)
table(true=oj.test.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 150  18
##   MM  37  65
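For comparison with the linear fits, the test accuracy implied by this table:

(150 + 65) / 270   # test accuracy ≈ 0.796, i.e. test error ≈ 0.204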

8g

Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

8g-b

Parameters: Cost = 0.01

degree <- 2
svm.fit.poly = svm(Purchase ~ ., data = oj.train, kernel = "polynomial", cost = 0.01, degree = degree)
#summary(svm.fit.poly)

8g-c

Training Data Confusion Matrix

svm.predict = predict(svm.fit.poly, newdata = oj.train)
table(true=oj.train.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 484   1
##   MM 297  18
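As with the radial kernel, cost = 0.01 leaves the polynomial fit close to a majority-class predictor. The training accuracy implied by this table:

(484 + 18) / 800   # training accuracy ≈ 0.628, i.e. training error ≈ 0.372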

8g-d

tune.out = tune(svm, as.factor(Purchase) ~ ., 
                data = oj.train,
                kernel = "polynomial", 
                ranges = list(cost = c(0.01, 0.1, 1, 5, 10),
                              degree = degree))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##     5      2
## 
## - best performance: 0.18375 
## 
## - Detailed performance results:
##    cost degree   error dispersion
## 1  0.01      2 0.39000 0.08287373
## 2  0.10      2 0.32375 0.06730166
## 3  1.00      2 0.20000 0.05137012
## 4  5.00      2 0.18375 0.05104804
## 5 10.00      2 0.18625 0.05185785

8g-e

Parameters: Cost = 0.1 (note: the cross-validation above selected cost = 5 as the best value)

svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "polynomial", cost = 0.1, degree = degree)
summary(svm.fit)
## 
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "polynomial", 
##     cost = 0.1, degree = degree)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.1 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  589
## 
##  ( 298 291 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Test Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.test)
table(true=oj.test.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 161   7
##   MM  73  29
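The test accuracy implied by this table:

(161 + 29) / 270   # test accuracy ≈ 0.704, i.e. test error ≈ 0.296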

8h

Overall, which approach seems to give the best results on this data?

Based on the test confusion matrices above, the tuned support vector classifier with a linear kernel (cost = 10) performs best, with a test accuracy of about 0.852. The radial kernel fit (cost = 0.1) reaches about 0.796 and the polynomial kernel fit (cost = 0.1) about 0.704, so the linear kernel appears to give the best results on this data.
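Collecting the test error rates computed from the confusion matrices reported above:

# Test error rates from the confusion matrices above
1 - (156 + 74) / 270   # linear kernel, cost = 10:      ≈ 0.148
1 - (150 + 65) / 270   # radial kernel, cost = 0.1:     ≈ 0.204
1 - (161 + 29) / 270   # polynomial kernel, cost = 0.1: ≈ 0.296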