Prepare the data for a force plot of the top_n features, with the option of randomly plotting only a portion of the data when the dataset is large.

shap.prep.stack.data(
  shap_contrib,
  top_n = NULL,
  data_percent = 1,
  cluster_method = "ward.D",
  n_groups = 10L
)

Arguments

shap_contrib

shap_contrib is the SHAP value data returned from predict. An ID variable is added for each observation in the shap_contrib dataset for better tracking; it is created at the beginning as 1:nrow(shap_contrib). The ID matches the output from shap.prep

top_n

integer, optional. Show only the top_n features and combine the rest

data_percent

the percentage of the data to plot (to speed up test plots). The accepted input range is (0, 1]. If too few observations are left, the clustering function will raise an error

cluster_method

defaults to "ward.D"; see stats::hclust for details

n_groups

an integer, the number of groups to plot in shap.plot.force_plot_bygroup
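The arguments above suggest a three-step preparation: reduce to the top_n features, sample a share of the rows, and cluster the remaining observations into n_groups. The following is a minimal base R sketch of that idea, not the package's actual implementation. The mock data, the mean absolute value ranking, and the column names `rest_variables` and `group` are all assumptions made for illustration.

```r
set.seed(42)

# Hypothetical stand-in for SHAP contributions: 150 observations, 4 features.
shap_mock <- as.data.frame(matrix(rnorm(150 * 4), ncol = 4))
feature_cols <- paste0("feature_", 1:4)
names(shap_mock) <- feature_cols
shap_mock$ID <- seq_len(nrow(shap_mock))  # ID created as 1:nrow(...)

# top_n: rank features by mean absolute contribution (an assumed ranking),
# keep the top_n, and sum the remaining ones into a single column.
top_n <- 2L
importance <- sort(colMeans(abs(shap_mock[, feature_cols])), decreasing = TRUE)
top_features <- names(importance)[seq_len(top_n)]
rest_features <- setdiff(feature_cols, top_features)
shap_top <- shap_mock[, c("ID", top_features)]
shap_top$rest_variables <- rowSums(shap_mock[, rest_features, drop = FALSE])

# data_percent: keep a random share of the rows; the value must lie in (0, 1].
data_percent <- 0.5
stopifnot(data_percent > 0, data_percent <= 1)
keep <- sample(nrow(shap_top), size = floor(nrow(shap_top) * data_percent))
shap_sub <- shap_top[keep, ]

# cluster_method and n_groups: hierarchical clustering on the value columns,
# then cut the tree into n_groups clusters.
n_groups <- 10L
value_cols <- setdiff(names(shap_sub), "ID")
tree <- stats::hclust(stats::dist(shap_sub[, value_cols]), method = "ward.D")
shap_sub$group <- stats::cutree(tree, k = n_groups)

# Number of observations per cluster.
print(table(shap_sub$group))
```

Summing the remaining features into one column keeps each row's total contribution unchanged. In this sketch, lowering data_percent until fewer than n_groups rows remain makes stats::cutree fail, which is one plausible source of the clustering error mentioned under data_percent.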

Value

a dataset for the stack plot

Examples

# **SHAP force plot**
plot_data <- shap.prep.stack.data(shap_contrib = shap_values_iris, n_groups = 4)
#> All the features will be used.
shap.plot.force_plot(plot_data)
#> Data has N = 150 | zoom in length is 50 at location 90.
shap.plot.force_plot(plot_data, zoom_in_group = 2)
#> Data has N = 150 | zoom in at cluster 2 with N = 28.
# plot all the clusters:
shap.plot.force_plot_bygroup(plot_data)