p369
We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:
Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y- axis.
Fit a logistic regression model to the data, using X1 and X2 as predictors.
Apply this model to the training data in order to obtain a pre- dicted class label for each training observation. Plot the ob- servations, colored according to the predicted class labels. The decision boundary should be linear.
Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X12, X1 ×X2, log(X2), and so forth).
Apply this model to the training data in order to obtain a pre- dicted class label for each training observation. Plot the ob- servations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observa- tion. Plot the observations, colored according to the predicted class labels.
Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
Comment on your results.
set.seed(421)
x1=runif(500)-0.5
x2=runif(500)-0.5
y=1*(x1^2 - x2^2 > 0) # Multiply by 1 so we don't get a boolean
#dat = data.frame(x1=x1, x2= x2, y=as.factor(y))
dat = data.frame(x1=x1, x2= x2, y=y)
plot(dat$x1, dat$x2, col=(3-y), pch=4)