
Chapter 8. Improving decision trees with random forests and boosting

8.1. Ensemble techniques: Bagging, boosting, and stacking

The idea is that predictions informed by a majority vote will have less variance than predictions made by a lone model.

There are three different ensemble methods:

- Bootstrap aggregating (bagging)
- Boosting
- Stacking

8.1.1. Training models on sampled data: Bootstrap aggregating

The premise of bagging is quite simple:

1. Decide how many sub-models you're going to train.
2. For each sub-model, randomly sample cases from the training set, with replacement, until you have a sample the same size as the original training set.
3. Train a sub-model on each sample of cases.
4. Pass new data through each sub-model, and let them vote on the prediction.

The modal prediction (the most frequent prediction) from all the sub-models is used as the predicted output.
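This procedure is simple enough to sketch by hand. The snippet below is my own illustration (not the book's code): it bags 50 rpart decision trees and takes a majority vote. Here train is a hypothetical data frame with a factor outcome type, and newCase is a single new case to classify.

library(rpart)

nModels <- 50
bagged <- lapply(seq_len(nModels), function(i) {
  # Sample cases with replacement; the bootstrap sample is the same size as the training set
  bootRows <- sample(nrow(train), replace = TRUE)
  rpart(type ~ ., data = train[bootRows, ])
})

# Each sub-model votes on the new case; the modal prediction wins
votes <- sapply(bagged, function(m) as.character(predict(m, newCase, type = "class")))
names(which.max(table(votes)))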

“The most critical part of bagging is the random sampling of the cases.”

“Bagging (and, as you’ll learn, boosting and stacking) is a technique that can be applied to any supervised machine learning algorithm. Having said this, it works best on algorithms that tend to create low-bias, high-variance models, such as decision trees. In fact, there is a famous and very popular implementation of bagging for decision trees called random forest.”

8.1.2. Learning from the previous models’ mistakes: Boosting

“boosting is an ensemble technique that, again, trains many individual models, but builds them sequentially. Each additional model seeks to correct the mistakes of the previous ensemble of models.”

“boosting is most beneficial when using weak learners as the sub-models.”

“The function of boosting is to combine many weak learners together to form one strong ensemble learner. The reason we use weak learners is that there is no improvement in model performance when boosting with strong learners versus weak learners. So why waste computational resources training hundreds of strong, probably more complex learners, when we can get the same performance by training weak, less complex ones?”

“There are two methods of boosting, which differ in the way they correct the mistakes of the previous set of models:”

- Adaptive boosting
- Gradient boosting

Weighting incorrectly predicted cases: Adaptive boosting

\[\text{model weight} = \frac{1}{2}\,\ln\!\left(\frac{1 - \text{model error}}{\text{model error}}\right)\]

where model error is the (weighted) proportion of cases the sub-model misclassified.
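As a quick worked example of this formula (my own illustration): a sub-model that is better than chance gets a positive weight, one that performs at chance gets zero weight, and one worse than chance gets a negative weight.

modelWeight <- function(error) 0.5 * log((1 - error) / error)

modelWeight(0.2)  # ~0.693: better than chance, positive weight
modelWeight(0.5)  # 0: no better than guessing, zero weight
modelWeight(0.8)  # ~-0.693: worse than chance, negative weight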

Learning from the previous models’ residuals: Gradient boosting

Calculating log loss

\[\text{log loss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\,\ln(p_{ic})\]

where N is the number of cases, C is the number of classes, y_ic is 1 if case i truly belongs to class c (and 0 otherwise), and p_ic is the predicted probability that case i belongs to class c.
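A minimal R sketch of this calculation (illustrative only; the probability matrix and class labels here are made up):

# Multiclass log loss for 3 cases and 3 classes (hypothetical toy values)
probs <- matrix(c(0.7, 0.2, 0.1,
                  0.1, 0.8, 0.1,
                  0.3, 0.3, 0.4),
                nrow = 3, byrow = TRUE)   # predicted class probabilities, one row per case
trueClass <- c(1, 2, 3)                   # the true class of each case

# Mean negative log of the probability assigned to each case's true class
logLoss <- -mean(log(probs[cbind(seq_along(trueClass), trueClass)]))
logLoss  # ~0.499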

XGBoost wrapped by the mlr package

8.1.3. Learning from predictions made by other models: Stacking

“Stacking explicitly uses different algorithms to learn the sub-models”

“For example, we may choose to use the kNN algorithm (from chapter 3), logistic regression algorithm (from chapter 4), and the SVM algorithm (from chapter 6) to build three independent base models.”
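mlr provides a stacked learner for exactly this arrangement (makeStackedLearner()). The sketch below is my own rough illustration of the idea, not the book's code, and it assumes a hypothetical two-class ClassifTask named task.

library(mlr)

# Three base models built with different algorithms, plus a decision tree super learner
# trained on the base models' cross-validated predictions
baseLearners <- list(makeLearner("classif.knn"),
                     makeLearner("classif.logreg"),
                     makeLearner("classif.svm"))

stacked <- makeStackedLearner(base.learners = baseLearners,
                              super.learner = "classif.rpart",
                              method = "stack.cv")

stackedModel <- train(stacked, task)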

8.2. Building a random forest model
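The listings below tune a learner named forest on a task named zooTask, both created earlier in the chapter from the zooTib data. A minimal sketch of that setup (my reconstruction, not the verbatim book code):

library(mlr)
library(tidyverse)     # supplies mutate_at(), vars(), and ggplot() used later
library(parallelMap)   # supplies parallelStartSocket()/parallelStop()
library(parallel)      # supplies detectCores()

# The zoo classification task and random forest learner assumed by listings 8.1-8.3
zooTask <- makeClassifTask(data = zooTib, target = "type")
forest <- makeLearner("classif.randomForest")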

Listing 8.1. Tuning the random forest hyperparameters

forestParamSpace <- makeParamSet(                        # defines the hyperparameter search space
  makeIntegerParam("ntree", lower = 300, upper = 300),   # number of trees, held fixed at 300
  makeIntegerParam("mtry", lower = 6, upper = 12),       # features sampled as candidates at each split
  makeIntegerParam("nodesize", lower = 1, upper = 5),    # minimum number of cases in a leaf
  makeIntegerParam("maxnodes", lower = 5, upper = 20))   # maximum number of leaves per tree

randSearch <- makeTuneControlRandom(maxit = 100)         # random search with 100 iterations

cvForTuning <- makeResampleDesc("CV", iters = 5)         # 5-fold cross-validation for tuning

parallelStartSocket(cpus = detectCores())

tunedForestPars <- tuneParams(forest, task = zooTask,    # runs the hyperparameter tuning
                              resampling = cvForTuning,
                              par.set = forestParamSpace,
                              control = randSearch)

parallelStop()

tunedForestPars

tunedForest <- setHyperPars(forest, par.vals = tunedForestPars$x)

tunedForestModel <- train(tunedForest, zooTask)
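Once trained, the tuned model can be used like any other mlr model. As a hedged sketch of making predictions (newAnimals is a hypothetical data frame with the same predictor columns as the training data):

# Predict the type of previously unseen animals (newAnimals is hypothetical)
newPreds <- predict(tunedForestModel, newdata = newAnimals)
getPredictionResponse(newPreds)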

Listing 8.2. Plotting the out-of-bag error

forestModelData <- getLearnerModel(tunedForestModel)   # extracts the underlying randomForest model

species <- colnames(forestModelData$err.rate)          # series names: overall OOB error plus one column per class

plot(forestModelData, col = 1:length(species), lty = 1:length(species))

legend("topright", species, col = 1:length(species), lty = 1:length(species))
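If you want the final out-of-bag error rates as numbers rather than a plot, something along these lines should work (my addition; err.rate is the randomForest error matrix with one row per tree):

# Error rates after the last tree: the first column is the overall OOB error,
# the remaining columns are the per-class error rates
forestModelData$err.rate[nrow(forestModelData$err.rate), ]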

Listing 8.3. Cross-validating the model-building process

outer <- makeResampleDesc("CV", iters = 5)             # outer cross-validation loop

forestWrapper <- makeTuneWrapper("classif.randomForest",   # wraps the tuning inside the learner
                                 resampling = cvForTuning,
                                 par.set = forestParamSpace,
                                 control = randSearch)

parallelStartSocket(cpus = detectCores())

cvWithTuning <- resample(forestWrapper, zooTask, resampling = outer)

parallelStop()

cvWithTuning

8.3. Building an XGBoost model

Listing 8.4. Converting factors into numerics

zooXgb <- mutate_at(zooTib, .vars = vars(-type), .funs = as.numeric)   # XGBoost needs numeric predictors; mutate_at() and vars() come from dplyr

xgbTask <- makeClassifTask(data = zooXgb, target = "type")
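Listing 8.5 tunes a learner named xgb that isn't shown here; it's assumed to have been created earlier in the chapter, along these lines:

# The XGBoost learner assumed by listings 8.5 and 8.6
xgb <- makeLearner("classif.xgboost")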

Listing 8.5. Tuning XGBoost hyperparameters

xgbParamSpace <- makeParamSet(
  makeNumericParam("eta", lower = 0, upper = 1),                 # learning rate
  makeNumericParam("gamma", lower = 0, upper = 5),               # minimum loss reduction needed to make a split
  makeIntegerParam("max_depth", lower = 1, upper = 5),           # maximum tree depth
  makeNumericParam("min_child_weight", lower = 1, upper = 10),   # minimum sum of case weights needed in a leaf
  makeNumericParam("subsample", lower = 0.5, upper = 1),         # proportion of cases sampled per tree
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),  # proportion of features sampled per tree
  makeIntegerParam("nrounds", lower = 20, upper = 20),           # number of boosting rounds, held fixed at 20
  makeDiscreteParam("eval_metric", values = c("merror", "mlogloss")))  # evaluation metric

randSearch <- makeTuneControlRandom(maxit = 1000)   # random search with 1,000 iterations

cvForTuning <- makeResampleDesc("CV", iters = 5)

tunedXgbPars <- tuneParams(xgb, task = xgbTask,
                           resampling = cvForTuning,
                           par.set = xgbParamSpace,
                           control = randSearch)

tunedXgbPars

Listing 8.6. Training the final tuned model

tunedXgb <- setHyperPars(xgb, par.vals = tunedXgbPars$x)

tunedXgbModel <- train(tunedXgb, xgbTask)

Listing 8.7. Plotting iteration number against log loss

xgbModelData <- getLearnerModel(tunedXgbModel)   # extracts the underlying xgb.Booster model

ggplot(xgbModelData$evaluation_log, aes(iter, train_mlogloss)) +
  geom_line() +
  geom_point()

Listing 8.8. Plotting individual decision trees

install.packages("DiagrammeR")   # needed by xgb.plot.tree()

xgboost::xgb.plot.tree(model = xgbModelData, trees = 1:5)


“This takes nearly 15 minutes on my four-core machine!”

Listing 8.9. Cross-validating the model-building process

outer <- makeResampleDesc("CV", iters = 3)   # 3-fold outer cross-validation loop

xgbWrapper <- makeTuneWrapper("classif.xgboost",   # wraps the tuning inside the learner
                              resampling = cvForTuning,
                              par.set = xgbParamSpace,
                              control = randSearch)

cvWithTuning <- resample(xgbWrapper, xgbTask, resampling = outer)

cvWithTuning