R/SHAP_funcs.R
shap.plot.dependence.Rd
This function by default makes a simple dependence plot with feature values
on the x-axis and SHAP values on the y-axis, optional to color by another
feature. It is optional to use a different variable for SHAP values on the
y-axis, and color the points by the feature value of a designated variable.
Not colored if color_feature
is not supplied. If data_int
(the
SHAP interaction values dataset) is supplied, it will plot the interaction
effect between y
and x
on the y-axis. Dependence plot is easy
to make if you have the SHAP values dataset from predict.xgb.Booster
or predict.lgb.Booster
.
It is not necessary to start with the long format data, but since that is
used for the summary plot, we just continue to use it here.
shap.plot.dependence( data_long, x, y = NULL, color_feature = NULL, data_int = NULL, dilute = FALSE, smooth = TRUE, size0 = NULL, add_hist = FALSE, add_stat_cor = FALSE, alpha = NULL, jitter_height = 0, jitter_width = 0, ... )
data_long | the long format SHAP values from |
---|---|
x | which feature to show on x-axis, it will plot the feature value |
y | which shap values to show on y-axis, it will plot the SHAP value of that feature. y is default to x, if y is not provided, just plot the SHAP values of x on the y-axis |
color_feature | which feature value to use for coloring, color by the feature value. If "auto", will select the feature "c" minimizing the variance of the shap value given x and c, which can be viewed as a heuristic for the strongest interaction. |
data_int | the 3-dimention SHAP interaction values array. if |
dilute | a number or logical, dafault to TRUE, will plot
|
smooth | optional to add a loess smooth line, default to TRUE. |
size0 | point size, default to 1 if nobs<1000, 0.4 if nobs>1000 |
add_hist | whether to add histogram using |
add_stat_cor | add correlation and p-value from |
alpha | point transparancy, default to 1 if nobs<1000 else 0.6 |
jitter_height | amount of vertical jitter (see hight in |
jitter_width | amount of horizontal jitter (see width in |
... | additional parameters passed to |
be default a ggplot2
object, based on which you could add more geom
layers.
# **SHAP dependence plot** # 1. simple dependence plot with SHAP values of x on the y axis shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", add_hist = TRUE, add_stat_cor = TRUE)#>#># 2. can choose a different SHAP values on the y axis shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", y = "Petal.Width")#># 3. color by another feature's feature values shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", color_feature = "Petal.Width")#># 4. choose 3 different variables for x, y, and color shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", y = "Petal.Width", color_feature = "Petal.Width")#># Optional to add hist or remove smooth line, optional to plot fewer data (make plot quicker) shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", y = "Petal.Width", color_feature = "Petal.Width", add_hist = TRUE, smooth = FALSE, dilute = 3)# to make a list of plot plot_list <- lapply(names(iris)[2:3], shap.plot.dependence, data_long = shap_long_iris) # **SHAP interaction effect plot ** # To get the interaction SHAP dataset for plotting, need to get `shap_int` first: mod1 = xgboost::xgboost( data = as.matrix(iris[,-5]), label = iris$Species, gamma = 0, eta = 1, lambda = 0,nrounds = 1, verbose = FALSE) # Use either: data_int <- shap.prep.interaction(xgb_mod = mod1, X_train = as.matrix(iris[,-5])) # or: shap_int <- predict(mod1, as.matrix(iris[,-5]), predinteraction = TRUE) # if data_int is supplied, y axis will plot the interaction values of y (vs. x) shap.plot.dependence(data_long = shap_long_iris, data_int = shap_int_iris, x="Petal.Length", y = "Petal.Width", color_feature = "Petal.Width")#>