
# Question

This problem involves the OJ data set, which is part of the ISLR package (ISLR p. 371). The OJ data also appeared as Question 9 in Chapter 8, and the train/test split below is reused from that answer.

(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

(c) What are the training and test error rates?

(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

(e) Compute the training and test error rates using this new value for cost.

(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

(h) Overall, which approach seems to give the best results on this data?

# 8a

Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations. (see Q8.9)

# From Q8.9
library(ISLR)   # provides the OJ data set
library(e1071)  # provides svm() and tune()
dim(OJ)
## [1] 1070   18
set.seed(1)
train = sample(1:nrow(OJ), 800)

# Training and test splits, plus response vectors, used throughout
oj.train = OJ[train,]
oj.test = OJ[-train,]
oj.train.y = OJ[train, "Purchase"]
oj.test.y = OJ[-train, "Purchase"]
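
As a quick sanity check on the split (not part of the original answer), the dimensions and class balance of the two pieces can be inspected; a minimal sketch:

```r
# Confirm the 800/270 split and compare class proportions across the pieces.
dim(oj.train)                  # expect 800 x 18
dim(oj.test)                   # expect 270 x 18
prop.table(table(oj.train.y))  # share of CH vs. MM in the training set
prop.table(table(oj.test.y))   # share of CH vs. MM in the test set
```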

# 8b

Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors.

• Use the summary() function to produce summary statistics,
• describe the results obtained.

Parameters: Cost = 0.01

svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "linear", cost = 0.01)
summary(svm.fit)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "linear", cost = 0.01)
##
##
## Parameters:
##    SVM-Type:  C-classification
##  SVM-Kernel:  linear
##        cost:  0.01
##
## Number of Support Vectors:  435
##
##  ( 219 216 )
##
##
## Number of Classes:  2
##
## Levels:
##  CH MM

The low cost (0.01) produces a wide margin, so a large share of the training data ends up as support vectors: 435 of the 800 observations, split almost evenly between the classes (219 on the CH side, 216 on the MM side). The two levels of Purchase are CH (Citrus Hill) and MM (Minute Maid).
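
The support-vector counts reported by summary() can also be read directly off the fitted object, since svm() stores the support vectors' row indices in $index. A small sketch:

```r
length(svm.fit$index)             # 435 support vectors in total
table(oj.train.y[svm.fit$index])  # per-class split: 219 CH, 216 MM
```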

# 8c

What are the training and test error rates?

## Training Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.train)
result = table(true=oj.train.y, pred=svm.predict)
result
##     pred
## true  CH  MM
##   CH 420  65
##   MM  75 240

## Training Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.825

The training error rate is therefore 1 - 0.825 = 0.175.

## Test Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.test)
result = table(true=oj.test.y, pred=svm.predict)
result
##     pred
## true  CH  MM
##   CH 153  15
##   MM  33  69

## Test Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.8222222

The test error rate is therefore 1 - 0.8222 ≈ 0.178.
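
Equivalently, the error rates can be computed by comparing predictions against the true labels. A minimal sketch; the err_rate helper is an addition here, not part of the original code:

```r
# Fraction of misclassified observations for a fitted classifier.
err_rate = function(fit, data, truth) {
  mean(predict(fit, newdata = data) != truth)
}

err_rate(svm.fit, oj.train, oj.train.y)  # training error: 1 - 0.825  = 0.175
err_rate(svm.fit, oj.test,  oj.test.y)   # test error:     1 - 0.8222 ~ 0.178
```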

# 8d Cross-Validation

Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

tune.out = tune(svm, as.factor(Purchase) ~ .,
                data = oj.train,
                kernel = "linear",
                ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
##  cost
##    10
##
## - best performance: 0.17125
##
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.17375 0.03884174
## 2  0.10 0.17875 0.03064696
## 3  1.00 0.17500 0.03061862
## 4  5.00 0.17250 0.03322900
## 5 10.00 0.17125 0.03488573
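
The exercise allows any values in the range 0.01 to 10, so a finer, log-spaced grid is a natural refinement. A sketch (the grid is a choice made here, results will vary with the random CV folds, and the output is not shown since this was not run in the original):

```r
# Thirteen log-spaced cost values from 0.01 to 10.
tune.fine = tune(svm, as.factor(Purchase) ~ .,
                 data = oj.train,
                 kernel = "linear",
                 ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.fine)
```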

## Best Model

best.model <- tune.out$best.model
best.model
##
## Call:
## best.tune(method = svm, train.x = as.factor(Purchase) ~ ., data = oj.train,
##     ranges = list(cost = c(0.01, 0.1, 1, 5, 10)), kernel = "linear")
##
##
## Parameters:
##    SVM-Type:  C-classification
##  SVM-Kernel:  linear
##        cost:  10
##
## Number of Support Vectors:  326
best.model$cost
## [1] 10
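
The winning parameters can also be pulled straight from the tuning object, which restates what best.model shows:

```r
tune.out$best.parameters  # a one-row data frame: cost = 10
```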

# 8e

Compute the training and test error rates using this new value for cost.

Parameters: Cost = 10 (the value selected by cross-validation in part (d))

## Best Model Training Data Confusion Matrix

best.model.predict = predict(best.model, newdata = oj.train)
result = table(true = oj.train.y, pred = best.model.predict)
result
##     pred
## true  CH  MM
##   CH 423  62
##   MM  69 246

## Best Model Training Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.83625

## Best Model Test Data Confusion Matrix

best.model.predict = predict(best.model, newdata=oj.test)
result = table(true=oj.test.y, pred=best.model.predict)
result
##     pred
## true  CH  MM
##   CH 156  12
##   MM  28  74

## Best Model Test Data Accuracy

(result[1] + result[4]) / sum(result)
## [1] 0.8518519

With the tuned cost, the training error rate drops to 1 - 0.83625 ≈ 0.164 and the test error rate to 1 - 0.8519 ≈ 0.148, both better than the cost = 0.01 fit from part (c).
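
If the err_rate helper sketched in part (c) is defined, the same numbers fall out in one line each:

```r
err_rate(best.model, oj.train, oj.train.y)  # training error, ~0.164
err_rate(best.model, oj.test,  oj.test.y)   # test error, ~0.148
```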

# 8f

Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

## 8f-b

Parameters: Cost = 0.01

svm.fit.radial = svm(Purchase ~ ., data = oj.train, kernel = "radial", cost = 0.01)
#summary(svm.fit.radial)

## 8f-c

### Training Data Confusion Matrix

svm.predict = predict(svm.fit.radial, newdata = oj.train)
table(true=oj.train.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 485   0
##   MM 315   0

At cost = 0.01 the radial SVM classifies every training observation as CH, the majority class, so its training error rate is simply the MM share: 315/800 = 0.394. This matches the cross-validated error for cost = 0.01 in part (f-d) below exactly.

## 8f-d

tune.out = tune(svm, as.factor(Purchase) ~ .,
                data = oj.train,
                kernel = "radial",  # radial is svm()'s default; stated explicitly
                ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
##  cost
##     1
##
## - best performance: 0.17625
##
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.39375 0.06568284
## 2  0.10 0.18250 0.05470883
## 3  1.00 0.17625 0.03793727
## 4  5.00 0.18125 0.04299952
## 5 10.00 0.18125 0.04340139

## 8f-e

Parameters: Cost = 0.1 (note: cross-validation above selected cost = 1; a sketch using the tuned model follows the test confusion matrix)

svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "radial", cost = 0.1)
summary(svm.fit)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "radial", cost = 0.1)
##
##
## Parameters:
##    SVM-Type:  C-classification
##  SVM-Kernel:  radial
##        cost:  0.1
##
## Number of Support Vectors:  541
##
##  ( 272 269 )
##
##
## Number of Classes:  2
##
## Levels:
##  CH MM

### Test Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.test)
table(true=oj.test.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 150  18
##   MM  37  65

Test accuracy is (150 + 65)/270 ≈ 0.796, i.e. a test error rate of about 0.204, worse than the tuned linear classifier.
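
Since cross-validation selected cost = 1 rather than the 0.1 used above, the more faithful answer would evaluate the tuned radial model itself. A sketch (not run in the original, so no output is shown; tune.out here is still the radial tuning object from part (f-d)):

```r
best.radial = tune.out$best.model
table(true = oj.test.y, pred = predict(best.radial, newdata = oj.test))
```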

# 8g

Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2.

## 8g-b

Parameters: Cost = 0.01

degree <- 2
svm.fit.poly = svm(Purchase ~ ., data = oj.train, kernel = "polynomial", cost = 0.01, degree = degree)
#summary(svm.fit.poly)

## 8g-c

### Training Data Confusion Matrix

svm.predict = predict(svm.fit.poly, newdata = oj.train)
table(true=oj.train.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 484   1
##   MM 297  18

Like the radial fit, the polynomial SVM at cost = 0.01 predicts CH almost everywhere (only 19 MM predictions), giving a training error rate of (1 + 297)/800 ≈ 0.373.

## 8g-d

tune.out = tune(svm, as.factor(Purchase) ~ .,
                data = oj.train,
                kernel = "polynomial",
                ranges = list(cost = c(0.01, 0.1, 1, 5, 10),
                              degree = degree))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
##  cost degree
##     5      2
##
## - best performance: 0.18375
##
## - Detailed performance results:
##    cost degree   error dispersion
## 1  0.01      2 0.39000 0.08287373
## 2  0.10      2 0.32375 0.06730166
## 3  1.00      2 0.20000 0.05137012
## 4  5.00      2 0.18375 0.05104804
## 5 10.00      2 0.18625 0.05185785

## 8g-e

Parameters: Cost = 0.1 (note: cross-validation above selected cost = 5; a sketch using the tuned model follows the test confusion matrix)

svm.fit = svm(Purchase ~ ., data = oj.train, kernel = "polynomial", cost = 0.1, degree = degree)
summary(svm.fit)
##
## Call:
## svm(formula = Purchase ~ ., data = oj.train, kernel = "polynomial",
##     cost = 0.1, degree = degree)
##
##
## Parameters:
##    SVM-Type:  C-classification
##  SVM-Kernel:  polynomial
##        cost:  0.1
##      degree:  2
##      coef.0:  0
##
## Number of Support Vectors:  589
##
##  ( 298 291 )
##
##
## Number of Classes:  2
##
## Levels:
##  CH MM

### Test Data Confusion Matrix

svm.predict = predict(svm.fit, newdata = oj.test)
table(true=oj.test.y, pred=svm.predict)
##     pred
## true  CH  MM
##   CH 161   7
##   MM  73  29

Test accuracy is (161 + 29)/270 ≈ 0.704, a test error rate of about 0.296, the worst of the three kernels as fit here.
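
As in part (f), cross-validation selected a different cost (5) than the 0.1 used above, so a tuned comparison would evaluate the best polynomial model directly. A sketch (not run in the original, so no output is shown; tune.out here is the polynomial tuning object from part (g-d)):

```r
best.poly = tune.out$best.model
table(true = oj.test.y, pred = predict(best.poly, newdata = oj.test))
```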

# 8h

Overall, which approach seems to give the best results on this data?

Comparing test error rates: the tuned linear support vector classifier (cost = 10) achieves roughly 0.148, the radial kernel at cost = 0.1 roughly 0.204, and the degree-2 polynomial at cost = 0.1 roughly 0.296. On this data the linear classifier gives the best results. Note, though, that the radial and polynomial fits in parts (f) and (g) were not refit at their cross-validated costs (1 and 5 respectively), so their numbers likely understate what properly tuned versions would achieve.
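
For reference, a small summary of the three test error rates, computed from the confusion matrices above (an added convenience, not in the original):

```r
# Test error = 1 - correct predictions / 270 test observations.
data.frame(
  model      = c("linear, cost = 10", "radial, cost = 0.1", "polynomial, cost = 0.1"),
  test.error = c(1 - (156 + 74)/270, 1 - (150 + 65)/270, 1 - (161 + 29)/270)
)
```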