Question

p371

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

  (a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

  (b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

  (c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

  (d) Make some plots to back up your assertions in (b) and (c).

Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing

> plot(svmfit, dat)

where svmfit contains your fitted model and dat is a data frame containing your data, you can type

> plot(svmfit, dat, x1 ~ x4)

in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.


7a

library(ISLR)    # Auto data set
library(e1071)   # svm() and tune(), used in the parts below
mpg.median = median(Auto$mpg)
mpg.median
## [1] 22.75
indicator = ifelse(Auto$mpg > mpg.median, 1, 0)
#indicator
Auto$mpg_above_median = indicator
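
A quick sanity check on the new variable can be sketched as follows. Splitting at the median should give two classes of roughly equal size, and building the indicator as a factor up front is one way to avoid repeating as.factor() in every model formula. The name `high` is a hypothetical stand-in used only for this check; it is not part of the solution above.

```r
library(ISLR)  # provides the Auto data set

# Hypothetical helper variable: same rule as above, stored as a factor
# so that svm() would treat it as a class label.
high <- factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))

# Count the observations in each class; a median split should be
# close to balanced.
class.counts <- table(high)
print(class.counts)
```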

7b Linear Kernel

Hyperparameters: Cost

tune.out = tune(svm, as.factor(mpg_above_median) ~ weight + displacement, 
                data = Auto,
                kernel = "linear", 
                ranges = list(cost = c(0.01,0.1, 1, 5, 10, 100)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.1
## 
## - best performance: 0.09442308 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.10455128 0.04580077
## 2 1e-01 0.09442308 0.04995254
## 3 1e+00 0.09955128 0.04275859
## 4 5e+00 0.09698718 0.04657340
## 5 1e+01 0.09698718 0.04657340
## 6 1e+02 0.09698718 0.04657340
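
The table is easier to read as a plot of cross-validation error against cost. The following is a minimal sketch that re-runs the same tuning and draws the error with one-dispersion bars. The seed value and the object names `tune.lin` and `perf` are assumptions made for this illustration; the code above sets no seed, so the numbers will not match the table exactly.

```r
library(ISLR)
library(e1071)

set.seed(1)  # assumption: any fixed seed, only to make the folds reproducible
Auto$mpg_above_median <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)

tune.lin <- tune(svm, as.factor(mpg_above_median) ~ weight + displacement,
                 data = Auto, kernel = "linear",
                 ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100)))

# performances is a data frame with columns cost, error, dispersion
perf <- tune.lin$performances
lo <- perf$error - perf$dispersion
hi <- perf$error + perf$dispersion

# CV error against cost on a log axis, with +/- one dispersion as bars
plot(perf$cost, perf$error, log = "x", type = "b", ylim = range(c(lo, hi)),
     xlab = "cost (log scale)", ylab = "10-fold CV error")
arrows(perf$cost, lo, perf$cost, hi, angle = 90, code = 3, length = 0.05)
```

If the bars overlap across all cost values, as the dispersion column above suggests they will, the choice of cost matters little for this pair of predictors.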

7c

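Warning: if you forget as.factor(), svm() treats the 0/1 response as numeric and fits a regression (eps-regression) instead of a classifier, and tune() can then appear to hang.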
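
The effect of as.factor() on the model type can be checked on a small synthetic data set. This is a sketch, not part of the exercise: the data frame `toy` and both fit names are made up for the illustration. It relies on the `type` field that e1071 stores on the fitted object as an integer code.

```r
library(e1071)

set.seed(1)
# Hypothetical toy data: two numeric predictors and a 0/1 response
toy <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
toy$y <- ifelse(toy$x1 + toy$x2 > 0, 1, 0)

# Numeric response: svm() defaults to regression
fit.num <- svm(y ~ x1 + x2, data = toy, kernel = "linear")
# Factor response: svm() defaults to classification
fit.fac <- svm(as.factor(y) ~ x1 + x2, data = toy, kernel = "linear")

# e1071's internal type code: 0 = C-classification, 3 = eps-regression
print(c(numeric = fit.num$type, factor = fit.fac$type))

# Predictions differ in kind too: numeric values versus class labels
print(head(predict(fit.num, toy)))
print(head(predict(fit.fac, toy)))
```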

Radial

Parameters: Cost, Gamma

kernel <-  "radial"
tune.out.radial = tune(svm, as.factor(mpg_above_median) ~ weight + displacement, 
                data = Auto,
                kernel = kernel, 
                ranges = list(
                  cost = c(0.01,0.1, 1, 5, 10, 100),
                  gamma = c(0.1, 5, 10)
                  ))
summary(tune.out.radial)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##   100    10
## 
## - best performance: 0.09185897 
## 
## - Detailed performance results:
##     cost gamma      error dispersion
## 1  1e-02   0.1 0.14012821 0.07105027
## 2  1e-01   0.1 0.10217949 0.06062294
## 3  1e+00   0.1 0.09698718 0.05649603
## 4  5e+00   0.1 0.09698718 0.05649603
## 5  1e+01   0.1 0.09442308 0.05679586
## 6  1e+02   0.1 0.09442308 0.05679586
## 7  1e-02   5.0 0.26301282 0.05764009
## 8  1e-01   5.0 0.09955128 0.05474595
## 9  1e+00   5.0 0.09942308 0.06200297
## 10 5e+00   5.0 0.09679487 0.05861092
## 11 1e+01   5.0 0.09935897 0.06066269
## 12 1e+02   5.0 0.10961538 0.05246238
## 13 1e-02  10.0 0.54846154 0.02934158
## 14 1e-01  10.0 0.09692308 0.06477553
## 15 1e+00  10.0 0.09429487 0.05780177
## 16 5e+00  10.0 0.09942308 0.06081337
## 17 1e+01  10.0 0.09935897 0.05814244
## 18 1e+02  10.0 0.09185897 0.07039231
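# Note: the tuning run above selected cost = 100, gamma = 10 (CV error 0.0919);
# the fit below uses cost = 5, gamma = 0.1 (CV error 0.0970 in the same run).
svm.fit.radial = svm(as.factor(mpg_above_median) ~ weight + displacement, data = Auto, kernel = "radial", cost = 5, gamma = 0.1)
#summary(svm.fit.radial)
# Linear fit for the plots in 7d (cost = 1; the 7b tuning run selected cost = 0.1).
svm.fit = svm(as.factor(mpg_above_median) ~ weight + displacement, data = Auto, kernel = "linear", cost = 1)
#summary(svm.fit)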
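
Instead of copying cost and gamma by hand, the tuned model can be taken directly from the tune object. The sketch below re-runs the radial tuning and uses the `best.parameters` and `best.model` components that tune() returns by default. The seed and the names `tune.rad`, `best.rad` and `train.err` are assumptions for this illustration.

```r
library(ISLR)
library(e1071)

set.seed(1)  # assumption: fixed seed so the selected parameters are repeatable
Auto$mpg_above_median <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)

tune.rad <- tune(svm, as.factor(mpg_above_median) ~ weight + displacement,
                 data = Auto, kernel = "radial",
                 ranges = list(cost = c(0.01, 0.1, 1, 5, 10, 100),
                               gamma = c(0.1, 5, 10)))

# One-row data frame holding the selected cost and gamma
print(tune.rad$best.parameters)

# Model refit on all of the data with those parameters
best.rad <- tune.rad$best.model

# Training error of the selected model (optimistic compared with CV error)
pred <- as.character(predict(best.rad, Auto))
train.err <- mean(pred != as.character(Auto$mpg_above_median))
print(train.err)
```

Because the training error is measured on the data used for fitting, it will generally be lower than the cross-validation error and should not be used to compare kernels.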

Polynomial

Parameters: Cost, Degree

kernel <-  "polynomial"
# Named tune.out.poly because this run tunes the polynomial kernel
tune.out.poly = tune(svm, as.factor(mpg_above_median) ~ weight + displacement, 
                data = Auto,
                kernel = kernel, 
                ranges = list(
                  cost = c(0.01,0.1, 1, 5, 10, 100),
                  degree = c(2,3,4,5)
                  ))
summary(tune.out.poly)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##   0.1      3
## 
## - best performance: 0.1735897 
## 
## - Detailed performance results:
##     cost degree     error dispersion
## 1  1e-02      2 0.4158333 0.09968083
## 2  1e-01      2 0.3700000 0.09335323
## 3  1e+00      2 0.3597436 0.08763800
## 4  5e+00      2 0.3367949 0.08001114
## 5  1e+01      2 0.3264103 0.09014045
## 6  1e+02      2 0.3214103 0.08359219
## 7  1e-02      3 0.2628846 0.10218075
## 8  1e-01      3 0.1735897 0.09046000
## 9  1e+00      3 0.1810256 0.07419436
## 10 5e+00      3 0.1810256 0.07419436
## 11 1e+01      3 0.1810256 0.07014551
## 12 1e+02      3 0.1962821 0.06654383
## 13 1e-02      4 0.3572436 0.10764992
## 14 1e-01      4 0.3623718 0.09225428
## 15 1e+00      4 0.3546154 0.09702163
## 16 5e+00      4 0.3164103 0.10363543
## 17 1e+01      4 0.3087821 0.10526971
## 18 1e+02      4 0.3112179 0.10714795
## 19 1e-02      5 0.2935256 0.11385118
## 20 1e-01      5 0.2118590 0.11908871
## 21 1e+00      5 0.2398718 0.10396207
## 22 5e+00      5 0.2346795 0.09157196
## 23 1e+01      5 0.2397436 0.09185748
## 24 1e+02      5 0.2474359 0.09586424
svm.fit.poly = svm(as.factor(mpg_above_median) ~ weight + displacement,
                   data = Auto, 
                   kernel = "polynomial",
                   cost = .1, degree = 3)
#summary(svm.fit.poly)
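
In the table above, the even degrees (2 and 4) do markedly worse than degree 3. One possible explanation, offered as an assumption and not verified against this run, is that e1071's polynomial kernel is (gamma * u'v + coef0)^degree with coef0 = 0 by default. For an even degree the kernel value is then unchanged when an observation is mirrored through the origin of the scaled predictors, so the classifier cannot tell heavy cars from light ones. The sketch below re-tunes a smaller grid with coef0 = 1 to test that idea; the seed, the reduced grid and the names `tune.poly.c1` and `perf.c1` are choices made for this illustration.

```r
library(ISLR)
library(e1071)

set.seed(1)  # assumption: fixed seed for reproducible folds
Auto$mpg_above_median <- ifelse(Auto$mpg > median(Auto$mpg), 1, 0)

# Same formula as above, but with a constant term in the kernel (coef0 = 1)
tune.poly.c1 <- tune(svm, as.factor(mpg_above_median) ~ weight + displacement,
                     data = Auto, kernel = "polynomial", coef0 = 1,
                     ranges = list(cost = c(0.1, 1, 10),
                                   degree = c(2, 3, 4)))

# CV error per (degree, cost) pair, sorted for easier comparison
perf.c1 <- tune.poly.c1$performances
print(perf.c1[order(perf.c1$degree, perf.c1$cost), ])
```

If the even degrees improve substantially with coef0 = 1, that supports the symmetry explanation; if not, the cause lies elsewhere.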

7d

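The radial kernel has the lowest cross-validation error (0.0919), narrowly ahead of the linear kernel (0.0944); the polynomial kernel is clearly worse (0.1736 at best). The gap between radial and linear is far smaller than the fold-to-fold dispersion (about 0.05 to 0.07), so it should not be read as a real difference. See ?plot.svm for the plotting options used below.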

Linear

plot(svm.fit, Auto, weight~displacement)

Radial

plot(svm.fit.radial, Auto, weight~displacement)

Poly

plot(svm.fit.poly, Auto, weight~displacement)