(updated on Oct 14)
Background
It has been a while since my last update. I have been working on lots of interesting projects since I joined Mount Sinai in August. We have a great team here, and I can learn a lot from everyone around me. Most of my work so far focuses on applying machine learning techniques, mainly extreme gradient boosting, and on visualizing the results. Parameter tuning can be challenging in XGBoost. I recently tried autoxgboost, which is easy to use and runs much faster than the naive grid or random search illustrated in my earlier post on XGBoost. The results are also as good as the best we could obtain from the time-consuming random search.
I use the same dataset as in that post to demonstrate autoxgboost. To install the package, run devtools::install_github("ja-thomas/autoxgboost").
Using autoxgboost
- A paper on Bayesian Optimization
- A presentation: Introduction to Bayesian Optimization
- By default, the optimizer runs for 160 iterations or 1 hour; results from 80 iterations are already good enough (a sketch of overriding these defaults follows the printed parameter set below)
- By default, par.set, the parameter set to tune over, is autoxgbparset:
autoxgbparset
##                      Type len Def      Constr Req Tunable Trafo
## eta               numeric   -   - 0.01 to 0.2   -    TRUE     -
## gamma             numeric   -   -     -7 to 6   -    TRUE     Y
## max_depth         integer   -   -     3 to 20   -    TRUE     -
## colsample_bytree  numeric   -   -    0.5 to 1   -    TRUE     -
## colsample_bylevel numeric   -   -    0.5 to 1   -    TRUE     -
## lambda            numeric   -   -   -10 to 10   -    TRUE     Y
## alpha             numeric   -   -   -10 to 10   -    TRUE     Y
## subsample         numeric   -   -    0.5 to 1   -    TRUE     -
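- If you do not want to wait for the full default budget, the relevant knobs can be passed directly to autoxgboost(). This is only a minimal sketch, assuming the iterations, time.budget, and measure arguments as documented in the package README; the values here are arbitrary:

library(autoxgboost)
# Assumed arguments from the autoxgboost documentation: cap the MBO search at
# 80 iterations or 30 minutes, and tune against mse (the default regression measure)
reg_task <- makeRegrTask(data = data_train, target = "Share_Temporary")
reg_auto_short <- autoxgboost(reg_task,
                              measure = mse,
                              iterations = 80L,
                              time.budget = 1800L)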
- This dataset poses a regression problem; for classification, create the task with makeClassifTask instead of makeRegrTask (a minimal classification sketch follows the regression example below). There are more options for other kinds of tasks
- Use all the defaults, pass in a data.frame, and that’s it…
library(autoxgboost)
reg_task <- makeRegrTask(data = data_train, target = "Share_Temporary")
set.seed(1234)
system.time(reg_auto <- autoxgboost(reg_task))
# saveRDS(reg_auto, file = "D:/SDIautoxgboost_80.rds")
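- For completeness, here is what the classification variant mentioned above might look like. It is only a sketch: data_train_clf and its factor target "label" are made-up placeholders, not objects from this post:

library(autoxgboost)
# Hypothetical classification setup: the only changes are makeClassifTask and a
# factor target column; data_train_clf and "label" are placeholders
clf_task <- makeClassifTask(data = data_train_clf, target = "label")
set.seed(1234)
clf_auto <- autoxgboost(clf_task)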
New Result
## Autoxgboost tuning result
##
## Recommended parameters:
## eta: 0.118
## gamma: 0.035
## max_depth: 7
## colsample_bytree: 0.860
## colsample_bylevel: 0.671
## lambda: 7.731
## alpha: 0.236
## subsample: 0.642
## nrounds: 57
##
##
## Preprocessing pipeline:
## dropconst(rel.tol = 1e-08, abs.tol = 1e-08, ignore.na = FALSE)
##
## With tuning result: mse = 0.044
- The testing mse of 0.047 (rmse: 0.2168) is quite close to the 0.043 from the previous post, but it is much faster, using only 57 rounds. Notice that the cross-validation tuning mse is almost the same: 0.044.
## [09:24:04] WARNING: amalgamation/../src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
## [1] 0.022044
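- For reference, the test-set error can be computed along these lines. This is a sketch under two assumptions: that a held-out data_test with the same columns exists, and that the fitted mlr model is exposed as reg_auto$final.model (as the build.final.model argument suggests):

library(mlr)
# Assumed: data_test holds the held-out rows and reg_auto$final.model is the
# trained mlr WrappedModel built by autoxgboost
pred <- predict(reg_auto$final.model, newdata = data_test)
performance(pred, measures = mse)
# equivalent by hand
mean((pred$data$response - data_test$Share_Temporary)^2)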
Compared to the Old Result
- Parameters
##   nrounds max_depth   eta gamma colsample_bytree min_child_weight
## 1     228         8 0.034     0           0.7208                7
##   subsample
## 1    0.7017
- rmse
# 0.0433
Tuning over Different Boosters
- autoxgboost also allows us to tune over the three types of boosters: gbtree, gblinear, and dart
- The parameter set autoxgbparset.mixed was predefined by the author, but it seems I still need to define it manually
- Here is the question I consulted on GitHub
reg_task <- makeRegrTask(data = data_train, target = "Share_Temporary")
autoxgbparset.mixed = makeParamSet(
makeDiscreteParam("booster", values = c("gbtree", "gblinear", "dart")),
makeDiscreteParam("sample_type", values = c("uniform", "weighted"), requires = quote(booster == "dart")),
makeDiscreteParam("normalize_type", values = c("tree", "forest"), requires = quote(booster == "dart")),
makeNumericParam("rate_drop", lower = 0, upper = 1, requires = quote(booster == "dart")),
makeNumericParam("skip_drop", lower = 0, upper = 1, requires = quote(booster == "dart")),
makeLogicalParam("one_drop", requires = quote(booster == "dart")),
makeDiscreteParam("grow_policy", values = c("depthwise", "lossguide")),
makeIntegerParam("max_leaves", lower = 0, upper = 8, trafo = function(x) 2^x, requires = quote(grow_policy == "lossguide")),
makeIntegerParam("max_bin", lower = 2L, upper = 9, trafo = function(x) 2^x),
makeNumericParam("eta", lower = 0.01, upper = 0.2),
makeNumericParam("gamma", lower = -7, upper = 6, trafo = function(x) 2^x),
makeIntegerParam("max_depth", lower = 3, upper = 20),
makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
makeNumericParam("colsample_bylevel", lower = 0.5, upper = 1),
makeNumericParam("lambda", lower = -10, upper = 10, trafo = function(x) 2^x),
makeNumericParam("alpha", lower = -10, upper = 10, trafo = function(x) 2^x),
makeNumericParam("subsample", lower = 0.5, upper = 1)
)
system.time(reg_auto_dart <- autoxgboost(reg_task, par.set = autoxgbparset.mixed))
- Interestingly enough, a model with the dart booster was chosen, but the results are pretty much the same: tuning mse = 0.044 and testing mse = 0.048. This suggests that a result around 0.044 is about the best we can achieve through xgboost on this dataset.
## Autoxgboost tuning result
##
## Recommended parameters:
## booster: dart
## sample_type: weighted
## normalize_type: forest
## rate_drop: 0.635
## skip_drop: 0.765
## one_drop: TRUE
## grow_policy: lossguide
## max_leaves: 1
## max_bin: 16
## eta: 0.051
## gamma: 0.086
## max_depth: 10
## colsample_bytree: 0.838
## colsample_bylevel: 0.544
## lambda: 0.009
## alpha: 0.003
## subsample: 0.726
## nrounds: 54
##
##
## Preprocessing pipeline:
## dropconst(rel.tol = 1e-08, abs.tol = 1e-08, ignore.na = FALSE)
##
## With tuning result: mse = 0.044
## [09:24:04] WARNING: amalgamation/../src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
## testing mse:
## [1] 0.0203543
Return the recommended/chosen parameters
- By the way, since autoxgboost is built on the mlr package, it might seem difficult to further work with the output, for example to extract the parameters for later use. To extract the tuned parameters, use getHyperPars from the mlr package:
Param_chosen <- mlr::getHyperPars(reg_auto_dart$final.learner)
print(unlist(Param_chosen))
## nrounds verbose objective
## "54" "0" "reg:linear"
## booster sample_type normalize_type
## "dart" "weighted" "forest"
## rate_drop skip_drop one_drop
## "0.634519323528358" "0.765370830696091" "TRUE"
## grow_policy max_leaves max_bin
## "lossguide" "1" "16"
## eta gamma max_depth
## "0.0507358974376265" "0.0863826961866801" "10"
## colsample_bytree colsample_bylevel lambda
## "0.837704133330852" "0.543820898304747" "0.0090610428548379"
## alpha subsample
## "0.00319982004364013" "0.726191523257774"