SHAP dependence plot and interaction plot, optional to be colored by a selected feature

This function by default makes a simple dependence plot with feature values on the x-axis and SHAP values on the y-axis, optional to color by another feature. It is optional to use a different variable for SHAP values on the y-axis, and color the points by the feature value of a designated variable. Not colored if color_feature is not supplied. If data_int (the SHAP interaction values dataset) is supplied, it will plot the interaction effect between y and x on the y-axis. Dependence plot is easy to make if you have the SHAP values dataset from predict.xgb.Booster or predict.lgb.Booster. It is not necessary to start with the long format data, but since that is used for the summary plot, we just continue to use it here.

shap.plot.dependence(
  data_long,
  x,
  y = NULL,
  color_feature = NULL,
  data_int = NULL,
  dilute = FALSE,
  smooth = TRUE,
  size0 = NULL,
  add_hist = FALSE,
  add_stat_cor = FALSE,
  alpha = NULL,
  jitter_height = 0,
  jitter_width = 0,
  ...
)

Arguments

data_long	the long format SHAP values from `shap.prep`
x	which feature to show on x-axis, it will plot the feature value
y	which shap values to show on y-axis, it will plot the SHAP value of that feature. y is default to x, if y is not provided, just plot the SHAP values of x on the y-axis
color_feature	which feature value to use for coloring, color by the feature value. If "auto", will select the feature "c" minimizing the variance of the shap value given x and c, which can be viewed as a heuristic for the strongest interaction.
data_int	the 3-dimention SHAP interaction values array. if `data_int` is supplied, y-axis will plot the interaction values of y (vs. x). `data_int` is obtained from either `predict.xgb.Booster` or `shap.prep.interaction`
dilute	a number or logical, dafault to TRUE, will plot `nrow(data_long)/dilute` data. For example, if dilute = 5 will plot 20% of the data. As long as dilute != FALSE, will plot at most half the data
smooth	optional to add a loess smooth line, default to TRUE.
size0	point size, default to 1 if nobs<1000, 0.4 if nobs>1000
add_hist	whether to add histogram using `ggMarginal`, default to TRUE. But notice the plot after adding histogram is a `ggExtraPlot` object instead of `ggplot2` so cannot add `geom` to that anymore. Turn the histogram off if you wish to add more `ggplot2` geoms
add_stat_cor	add correlation and p-value from `ggpubr::stat_cor`
alpha	point transparancy, default to 1 if nobs<1000 else 0.6
jitter_height	amount of vertical jitter (see hight in `geom_jitter`)
jitter_width	amount of horizontal jitter (see width in `geom_jitter`). Use values close to 0, e.g. 0.02
...	additional parameters passed to `geom_jitter`

Value

be default a ggplot2 object, based on which you could add more geom layers.

Examples

# **SHAP dependence plot**

# 1. simple dependence plot with SHAP values of x on the y axis
shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length",
                     add_hist = TRUE, add_stat_cor = TRUE)
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'

# 2. can choose a different SHAP values on the y axis
shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length",
                           y = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'

# 3. color by another feature's feature values
shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length",
                           color_feature = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'

# 4. choose 3 different variables for x, y, and color
shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length",
                           y = "Petal.Width", color_feature = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'

# Optional to add hist or remove smooth line, optional to plot fewer data (make plot quicker)
shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length",
                     y = "Petal.Width", color_feature = "Petal.Width",
                     add_hist = TRUE, smooth = FALSE, dilute = 3)

# to make a list of plot
plot_list <- lapply(names(iris)[2:3], shap.plot.dependence, data_long = shap_long_iris)

# **SHAP interaction effect plot **

# To get the interaction SHAP dataset for plotting, need to get `shap_int` first:
mod1 = xgboost::xgboost(
  data = as.matrix(iris[,-5]), label = iris$Species,
  gamma = 0, eta = 1, lambda = 0,nrounds = 1, verbose = FALSE)
# Use either:
data_int <- shap.prep.interaction(xgb_mod = mod1,
                                  X_train = as.matrix(iris[,-5]))
# or:
shap_int <- predict(mod1, as.matrix(iris[,-5]),
                    predinteraction = TRUE)

# if data_int is supplied, y axis will plot the interaction values of y (vs. x)
shap.plot.dependence(data_long = shap_long_iris,
                           data_int = shap_int_iris,
                           x="Petal.Length",
                           y = "Petal.Width",
                           color_feature = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'