Although ‘SHAPforxgboost’ is not a package too complicated, it took me some time to get the package pass all the cran check. Now (Aug.03,2019) it is available on cran. Install by either

install.packages("SHAPforxgboost")

or

devtools::install_github("liuyanguu/SHAPforxgboost")

The most recent version is 0.0.2.

I used ‘usethis’ package https://usethis.r-lib.org/ to set up the structure of the package.

Requirements from cran editors

Lots of thanks to Dr. Ligges and Dr. Herbrandt for their kind advice.

  • In the description, write package names, software names and API names in single quotes (e.g. ‘Python’). The Title field should be in title case. The description should not start with package name or “This package”.

  • Please ensure that you do not use more than 2 cores in your examples. Is there any reason why the number of core to use is not an argument of e.g. xgboost.fit()? >>> (# The function was for my own convenience, actually it is not necessary to be included in the package. I removed that function.)

  • Please replace cat() by message() or warning() in your functions (except for print() and summary() functions). Messages and warnings can be suppressed if needed.

Some of my own experience

Description

Single quote packages in both Title and Description fields. For example: “This package uses the SHAP values output from ‘xgboost’”.

Namespace:

  • It would be unnecessary to add any Depends in the DESCRIPTION besides R (>= 3.3.0).

  • The Imports part in the DESCRIPTION won’t impact what you import into NAMESPACE in the code (although they are both named import). The NAMESPACE is defined by using @import to import packages and functions in the R code. What you import into NAMESPACE are what really matters. But you should also keep the record aligned in the DESCRIPTION as cran will check it.

How the DESCRIPTION looks like on cran:

  • If you have @import ggplot2 anywhere in the R code. Your function can use ggplot2 as long as your package is loaded. Your function knows to search in the namespace of “ggplot2::”, but the ggplot2 package is not attached. Although someone loads your package, he needs library(ggplot2) in order to use ggplot2 for himself. This is the difference between attach and load as discussed in Hadley’s “R” package book.

  • By the way, if you put ggplot2 in Depends in the DESCRIPTION, library(yourPackage) would really attach ggplot2. But generally speaking, you don’t have to do that for others.

  • If you name a function like plot.shap.summary it would be documented as S3 method even if you @export plot.shap.summary (Yes, you have to export the full function name, otherwise, it would be an S3 method in the namespace.). Don’t name function this way to avoid confusion with S3method. For example, I changed the name to shap.plot.summary.

  • Each dataset that you attached using use_data need a document as well. Document it in a way similar to the function.

  • All the function parameters should be documented using @param.

  • When devtools check the program, there will be notes that “no visible binding for global variable”. This is how to please the cran:

if(getRversion() >= "2.15.1")  {
  utils::globalVariables(c(".", "rfvalue", "value", ... )) # all the variables complained
  }
  • You cannot have things like line break "\n" in the documentation. It will give a warning of “unknown macro”, and will actually cause error when you try to download and build the package from github.

Files locations:

  • All the codes go into the R/ folder

  • @example R/example/sample1.R will attach sample1.R in the folder “R/example/” to the documentation of the function. If you write out the samples directly in the function code, you use @examples instead of @example.