shap.values
Returns a list of three objects from an XGBoost or LightGBM model: 1.
a dataset (data.table) of SHAP scores, with the same dimensions as
X_train; 2. the vector of variables ranked by mean absolute SHAP value,
which orders the predictors by their importance in the model; and 3. the
BIAS, which is like an intercept. For each row, the sum of the SHAP values
plus the BIAS equals the predicted value (y_hat).
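The additivity property described above can be checked numerically. The following is a minimal sketch, not part of the original documentation: it reuses the iris model from the Examples section and assumes that the bias is stored in a list element named `BIAS0` (the element name is not stated on this page) and that `predict()` returns the raw model output for this regression-style fit.

```r
library(SHAPforxgboost)

# Same toy model as in the Examples section
X1 <- as.matrix(iris[, -5])
mod1 <- xgboost::xgboost(
  data = X1, label = iris$Species,
  gamma = 0, eta = 1, lambda = 0, nrounds = 1, verbose = FALSE
)

shap_values <- shap.values(xgb_model = mod1, X_train = X1)

# Assumption: the bias term is returned as the element `BIAS0`
bias <- unlist(shap_values$BIAS0)[[1]]

# Per-row sum of SHAP scores plus the bias
reconstructed <- rowSums(shap_values$shap_score) + bias

# Model predictions for the same rows
y_hat <- predict(mod1, X1)

# Compare the two, allowing for floating-point rounding
all.equal(unname(reconstructed), as.numeric(y_hat), tolerance = 1e-5)
```

If the comparison does not hold, check the names of the returned list with `names(shap_values)` and whether the model's objective applies a link function to the raw score.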
shap.values(xgb_model, X_train)
Argument | Description |
---|---|
xgb_model | an XGBoost or LightGBM model object |
X_train | the dataset of predictors (independent variables) used for calculating SHAP values; it should be a matrix |
A list of three elements: the SHAP values as a data.table, the ranked mean|SHAP| values, and the BIAS.
data("iris")
X1 = as.matrix(iris[,-5])
mod1 = xgboost::xgboost(
  data = X1, label = iris$Species, gamma = 0, eta = 1,
  lambda = 0, nrounds = 1, verbose = FALSE)

# shap.values(model, X_dataset) returns the SHAP
# data matrix and ranked features by mean|SHAP|
shap_values <- shap.values(xgb_model = mod1, X_train = X1)
shap_values$mean_shap_score
#> Petal.Length  Petal.Width Sepal.Length  Sepal.Width
#>   0.62935975   0.21664035   0.02910357   0.00000000
shap_values_iris <- shap_values$shap_score

# shap.prep() returns the long-format SHAP data, either from the model:
shap_long_iris <- shap.prep(xgb_model = mod1, X_train = X1)
# or, equivalently, from a given shap_contrib:
shap_long_iris <- shap.prep(shap_contrib = shap_values_iris, X_train = X1)

# **SHAP summary plot**
shap.plot.summary(shap_long_iris, scientific = TRUE)

# Alternative options to make the same plot:
# option 1: from the xgboost model
shap.plot.summary.wrap1(mod1, X = as.matrix(iris[,-5]), top_n = 3)

# option 2: supply a self-made SHAP values dataset
# (e.g. sometimes as output from cross-validation)
shap.plot.summary.wrap2(shap_score = shap_values_iris, X = X1, top_n = 3)