This function by default makes a simple dependence plot with feature values on the x-axis and SHAP values on the y-axis, optional to color by another feature. It is optional to use a different variable for SHAP values on the y-axis, and color the points by the feature value of a designated variable. Not colored if color_feature is not supplied. If data_int (the SHAP interaction values dataset) is supplied, it will plot the interaction effect between y and x on the y-axis. Dependence plot is easy to make if you have the SHAP values dataset from predict.xgb.Booster or predict.lgb.Booster. It is not necessary to start with the long format data, but since that is used for the summary plot, we just continue to use it here.

shap.plot.dependence(
  data_long,
  x,
  y = NULL,
  color_feature = NULL,
  data_int = NULL,
  dilute = FALSE,
  smooth = TRUE,
  size0 = NULL,
  add_hist = FALSE,
  add_stat_cor = FALSE,
  alpha = NULL,
  jitter_height = 0,
  jitter_width = 0,
  ...
)

Arguments

data_long

the long format SHAP values from shap.prep

x

which feature to show on x-axis, it will plot the feature value

y

which shap values to show on y-axis, it will plot the SHAP value of that feature. y is default to x, if y is not provided, just plot the SHAP values of x on the y-axis

color_feature

which feature value to use for coloring, color by the feature value. If "auto", will select the feature "c" minimizing the variance of the shap value given x and c, which can be viewed as a heuristic for the strongest interaction.

data_int

the 3-dimention SHAP interaction values array. if data_int is supplied, y-axis will plot the interaction values of y (vs. x). data_int is obtained from either predict.xgb.Booster or shap.prep.interaction

dilute

a number or logical, dafault to TRUE, will plot nrow(data_long)/dilute data. For example, if dilute = 5 will plot 20% of the data. As long as dilute != FALSE, will plot at most half the data

smooth

optional to add a loess smooth line, default to TRUE.

size0

point size, default to 1 if nobs<1000, 0.4 if nobs>1000

add_hist

whether to add histogram using ggMarginal, default to TRUE. But notice the plot after adding histogram is a ggExtraPlot object instead of ggplot2 so cannot add geom to that anymore. Turn the histogram off if you wish to add more ggplot2 geoms

add_stat_cor

add correlation and p-value from ggpubr::stat_cor

alpha

point transparancy, default to 1 if nobs<1000 else 0.6

jitter_height

amount of vertical jitter (see hight in geom_jitter)

jitter_width

amount of horizontal jitter (see width in geom_jitter). Use values close to 0, e.g. 0.02

...

additional parameters passed to geom_jitter

Value

be default a ggplot2 object, based on which you could add more geom layers.

Examples

# **SHAP dependence plot** # 1. simple dependence plot with SHAP values of x on the y axis shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", add_hist = TRUE, add_stat_cor = TRUE)
#> `geom_smooth()` using formula 'y ~ x'
#> `geom_smooth()` using formula 'y ~ x'
# 2. can choose a different SHAP values on the y axis shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", y = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'
# 3. color by another feature's feature values shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", color_feature = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'
# 4. choose 3 different variables for x, y, and color shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", y = "Petal.Width", color_feature = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'
# Optional to add hist or remove smooth line, optional to plot fewer data (make plot quicker) shap.plot.dependence(data_long = shap_long_iris, x="Petal.Length", y = "Petal.Width", color_feature = "Petal.Width", add_hist = TRUE, smooth = FALSE, dilute = 3)
# to make a list of plot plot_list <- lapply(names(iris)[2:3], shap.plot.dependence, data_long = shap_long_iris) # **SHAP interaction effect plot ** # To get the interaction SHAP dataset for plotting, need to get `shap_int` first: mod1 = xgboost::xgboost( data = as.matrix(iris[,-5]), label = iris$Species, gamma = 0, eta = 1, lambda = 0,nrounds = 1, verbose = FALSE) # Use either: data_int <- shap.prep.interaction(xgb_mod = mod1, X_train = as.matrix(iris[,-5])) # or: shap_int <- predict(mod1, as.matrix(iris[,-5]), predinteraction = TRUE) # if data_int is supplied, y axis will plot the interaction values of y (vs. x) shap.plot.dependence(data_long = shap_long_iris, data_int = shap_int_iris, x="Petal.Length", y = "Petal.Width", color_feature = "Petal.Width")
#> `geom_smooth()` using formula 'y ~ x'