(updated on Oct 14)
Background
It has been a while since my last update. I have been working on lots of interesting projects since I joined Mount Sinai in August. We have a great team here, and I can learn a lot from everyone around me. Most of my work so far focuses on applying machine learning techniques, mainly extreme gradient boosting, and on visualizing the results. Parameter tuning can be challenging in XGBoost. I recently tried autoxgboost, which is easy to use and runs much faster than the naive grid or random search illustrated in my earlier post on XGBoost. The results are also as good as the best we could obtain from the time-consuming random search.
I use the same dataset as in that post to demonstrate autoxgboost. To install the package, run devtools::install_github("ja-thomas/autoxgboost").
Using autoxgboost
- A paper on Bayesian Optimization
- A presentation: Introduction to Bayesian Optimization
- By default, the optimizer runs for 160 iterations or 1 hour; results from 80 iterations are already good enough (a sketch of overriding these defaults follows the printed parameter set below)
- By default, par.set, the parameter set to tune over, is autoxgbparset:
autoxgbparset
##                      Type len Def      Constr Req Tunable Trafo
## eta               numeric   -   - 0.01 to 0.2   -    TRUE     -
## gamma             numeric   -   -     -7 to 6   -    TRUE     Y
## max_depth         integer   -   -     3 to 20   -    TRUE     -
## colsample_bytree  numeric   -   -    0.5 to 1   -    TRUE     -
## colsample_bylevel numeric   -   -    0.5 to 1   -    TRUE     -
## lambda            numeric   -   -   -10 to 10   -    TRUE     Y
## alpha             numeric   -   -   -10 to 10   -    TRUE     Y
## subsample         numeric   -   -    0.5 to 1   -    TRUE     -
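- If you do not want to wait for the full default budget, the relevant knobs can be passed directly to autoxgboost(). This is only a minimal sketch, assuming the iterations, time.budget, and measure arguments as documented in the package README; the values here are arbitrary:

library(autoxgboost)
# Assumed arguments from the autoxgboost documentation: cap the MBO search at
# 80 iterations or 30 minutes, and tune against mse (the default regression measure)
reg_task <- makeRegrTask(data = data_train, target = "Share_Temporary")
reg_auto_short <- autoxgboost(reg_task,
                              measure = mse,
                              iterations = 80L,
                              time.budget = 1800L)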
- This dataset poses a regression problem; for classification, create the task with makeClassifTask instead of makeRegrTask (a minimal classification sketch follows the regression example below). There are more options for other kinds of tasks
- Use all the defaults, pass in a data.frame, and that’s it…
library(autoxgboost)
reg_task <- makeRegrTask(data = data_train, target = "Share_Temporary")
set.seed(1234)
system.time(reg_auto <- autoxgboost(reg_task))
# saveRDS(reg_auto, file = "D:/SDIautoxgboost_80.rds")
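- For completeness, here is what the classification variant mentioned above might look like. It is only a sketch: data_train_clf and its factor target "label" are made-up placeholders, not objects from this post:

library(autoxgboost)
# Hypothetical classification setup: the only changes are makeClassifTask and a
# factor target column; data_train_clf and "label" are placeholders
clf_task <- makeClassifTask(data = data_train_clf, target = "label")
set.seed(1234)
clf_auto <- autoxgboost(clf_task)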
New Result
## Autoxgboost tuning result
##
## Recommended parameters:
## eta: 0.118
## gamma: 0.035
## max_depth: 7
## colsample_bytree: 0.860
## colsample_bylevel: 0.671
## lambda: 7.731
## alpha: 0.236
## subsample: 0.642
## nrounds: 57
##
##
## Preprocessing pipeline:
## dropconst(rel.tol = 1e-08, abs.tol = 1e-08, ignore.na = FALSE)
##
## With tuning result: mse = 0.044
- The testing mse of 0.047 (rmse: 0.2168) is quite close to the 0.043 from the previous post, but it is much faster, using only 57 rounds. Notice that the cross-validation tuning mse is almost the same: 0.044.
## [09:24:04] WARNING: amalgamation/../src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
## [1] 0.022044
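- For reference, the test-set error can be computed along these lines. This is a sketch under two assumptions: that a held-out data_test with the same columns exists, and that the fitted mlr model is exposed as reg_auto$final.model (as the build.final.model argument suggests):

library(mlr)
# Assumed: data_test holds the held-out rows and reg_auto$final.model is the
# trained mlr WrappedModel built by autoxgboost
pred <- predict(reg_auto$final.model, newdata = data_test)
performance(pred, measures = mse)
# equivalent by hand
mean((pred$data$response - data_test$Share_Temporary)^2)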
Compared to the Old Result
- Parameters
##   nrounds max_depth   eta gamma colsample_bytree min_child_weight
## 1     228         8 0.034     0           0.7208                7
##   subsample
## 1    0.7017
- rmse
# 0.0433
Tuning over Different Boosters
- autoxgboost also allows us to tune over the three types of boosters: gbtree, gblinear, and dart
- The parameter set autoxgbparset.mixed was predefined by the author, but it seems I still need to define it manually
- Here is the question I consulted on GitHub
reg_task <- makeRegrTask(data = data_train, target = "Share_Temporary")
autoxgbparset.mixed = makeParamSet(
makeDiscreteParam("booster", values = c("gbtree", "gblinear", "dart")),
makeDiscreteParam("sample_type", values = c("uniform", "weighted"), requires = quote(booster == "dart")),
makeDiscreteParam("normalize_type", values = c("tree", "forest"), requires = quote(booster == "dart")),
makeNumericParam("rate_drop", lower = 0, upper = 1, requires = quote(booster == "dart")),
makeNumericParam("skip_drop", lower = 0, upper = 1, requires = quote(booster == "dart")),
makeLogicalParam("one_drop", requires = quote(booster == "dart")),
makeDiscreteParam("grow_policy", values = c("depthwise", "lossguide")),
makeIntegerParam("max_leaves", lower = 0, upper = 8, trafo = function(x) 2^x, requires = quote(grow_policy == "lossguide")),
makeIntegerParam("max_bin", lower = 2L, upper = 9, trafo = function(x) 2^x),
makeNumericParam("eta", lower = 0.01, upper = 0.2),
makeNumericParam("gamma", lower = -7, upper = 6, trafo = function(x) 2^x),
makeIntegerParam("max_depth", lower = 3, upper = 20),
makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
makeNumericParam("colsample_bylevel", lower = 0.5, upper = 1),
makeNumericParam("lambda", lower = -10, upper = 10, trafo = function(x) 2^x),
makeNumericParam("alpha", lower = -10, upper = 10, trafo = function(x) 2^x),
makeNumericParam("subsample", lower = 0.5, upper = 1)
)
system.time(reg_auto_dart <- autoxgboost(reg_task, par.set = autoxgbparset.mixed))
- Interestingly enough, a model with the dart booster was chosen, but the results are pretty much the same: tuning mse = 0.044 and testing mse = 0.048. This suggests that a result around 0.044 is about the best we can achieve through xgboost on this dataset.
## Autoxgboost tuning result
##
## Recommended parameters:
## booster: dart
## sample_type: weighted
## normalize_type: forest
## rate_drop: 0.635
## skip_drop: 0.765
## one_drop: TRUE
## grow_policy: lossguide
## max_leaves: 1
## max_bin: 16
## eta: 0.051
## gamma: 0.086
## max_depth: 10
## colsample_bytree: 0.838
## colsample_bylevel: 0.544
## lambda: 0.009
## alpha: 0.003
## subsample: 0.726
## nrounds: 54
##
##
## Preprocessing pipeline:
## dropconst(rel.tol = 1e-08, abs.tol = 1e-08, ignore.na = FALSE)
##
## With tuning result: mse = 0.044
## [09:24:04] WARNING: amalgamation/../src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
## testing mse:
## [1] 0.0203543
Return the recommended/chosen parameters
- By the way, since autoxgboost is built on the mlr package, it might seem difficult to further work with the output, for example to extract the parameters for later use. To extract the tuned parameters, use getHyperPars from the mlr package:
Param_chosen <- mlr::getHyperPars(reg_auto_dart$final.learner)
print(unlist(Param_chosen))
## nrounds verbose objective
## "54" "0" "reg:linear"
## booster sample_type normalize_type
## "dart" "weighted" "forest"
## rate_drop skip_drop one_drop
## "0.634519323528358" "0.765370830696091" "TRUE"
## grow_policy max_leaves max_bin
## "lossguide" "1" "16"
## eta gamma max_depth
## "0.0507358974376265" "0.0863826961866801" "10"
## colsample_bytree colsample_bylevel lambda
## "0.837704133330852" "0.543820898304747" "0.0090610428548379"
## alpha subsample
## "0.00319982004364013" "0.726191523257774"