Although ‘SHAPforxgboost’ is not a very complicated package, it took me some time to get it to pass all the CRAN checks. It is now (Aug. 3, 2019) available on CRAN. Install it with either

install.packages("SHAPforxgboost")

or

devtools::install_github("liuyanguu/SHAPforxgboost")

Use the ‘usethis’ package (https://usethis.r-lib.org/) to set up the structure of the package.
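As a rough sketch of that setup step, the snippet below creates a package skeleton in a temporary directory. The package name examplepkg and the temporary path are placeholders of mine, not part of ‘SHAPforxgboost’, and the call is skipped if ‘usethis’ is not installed.

```r
# Hypothetical package name and location, chosen for illustration only.
pkg_path <- file.path(tempdir(), "examplepkg")

if (requireNamespace("usethis", quietly = TRUE)) {
  # Creates DESCRIPTION, NAMESPACE and the R/ folder;
  # open = FALSE avoids switching the session to the new project.
  usethis::create_package(pkg_path, open = FALSE)
}

# Files created so far (empty if 'usethis' is not installed)
pkg_files <- list.files(pkg_path)
print(pkg_files)
```

For a real package you would point pkg_path at a permanent folder instead of tempdir().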

Some of my own experience

On the DESCRIPTION file

Put package names in single quotes in both the Title and Description fields. For example: “This package uses the SHAP values output from ‘xgboost’”.

  • There is usually no need to list anything under Depends in the DESCRIPTION besides R (>= 3.3.0).

  • The Imports field in the DESCRIPTION does not determine what gets imported into the NAMESPACE (although both are called “import”). With roxygen2, the NAMESPACE is generated from the @import and @importFrom tags in the R code, and what ends up in the NAMESPACE is what really matters for your functions. You should still keep the DESCRIPTION in sync, because CRAN checks it.
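To make the DESCRIPTION/NAMESPACE split concrete, here is a minimal sketch, assuming the package uses roxygen2. The ‘data.table’ line is purely illustrative, and the DESCRIPTION fields are shown as comments because they live in a separate file.

```r
# DESCRIPTION (a separate file; shown as comments, values illustrative):
#   Depends: R (>= 3.3.0)
#   Imports: ggplot2, data.table
#
# R code: roxygen2 tags that generate the NAMESPACE directives.

#' @import ggplot2
#' @importFrom data.table setDT
NULL

# After running devtools::document(), NAMESPACE should contain lines like:
#   import(ggplot2)
#   importFrom(data.table,setDT)
```

The two places are maintained separately, so adding a tag in the R code without updating Imports (or the reverse) is what the check complains about.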

[Figure: the DESCRIPTION as it appears on CRAN]

Namespace

load vs attach

  • If you have @import ggplot2 anywhere in the R code, the ‘ggplot2’ package will be loaded (but not attached) along with your package, and your functions can use ggplot2. Your package knows to search the ggplot2 namespace, but users cannot call ggplot2 functions by name without first running library(ggplot2). This is the difference between loading and attaching discussed in Hadley Wickham’s “R Packages” book: loading a package does not put its functions on the search path, but attaching does.

    • That is, when others load and attach your package, they can use your functions that depend on ggplot2, but they cannot call the ggplot2 functions themselves unless they attach ggplot2 with library(ggplot2).

    • If you put ggplot2 under Depends in the DESCRIPTION, library(yourPackage) also loads and attaches ggplot2, just as library(ggplot2) would. Generally speaking, you don’t need to do this.

  • If you name a function something like plot.shap.summary, it is automatically documented as an S3 method of the generic plot function. So I renamed it to shap.plot.summary. In the future I will avoid dots in function names.
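The load versus attach distinction above can be checked in plain R. This is only a sketch: it uses ‘splines’, which ships with R but is not attached by default, as a stand-in for ‘ggplot2’ so that nothing extra needs to be installed.

```r
# 'splines' stands in for 'ggplot2' here (an assumption made to keep the
# example dependency-free).
loadNamespace("splines")

loaded_before   <- isNamespaceLoaded("splines")     # is the namespace loaded?
attached_before <- "package:splines" %in% search()  # is it on the search path?
found_before    <- exists("bs")                     # can bs() be found by name?

# The namespace-qualified form works without attaching
basis <- splines::bs(1:10, df = 3)

library(splines)  # attach: exported functions go on the search path
attached_after <- "package:splines" %in% search()
found_after    <- exists("bs")

print(c(loaded_before, attached_before, found_before,
        attached_after, found_after))
```

In a fresh R session the package should be loaded but not attached until library() is called, which mirrors what users of your package see for anything you only import.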

Documentation

  • @example R/example/sample1.R attaches the file sample1.R in the folder “R/example/” to the function’s documentation as its example. If you write the examples inline in the roxygen comments, use @examples instead of @example.

  • All the function parameters should be documented using @param.
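A minimal roxygen2 sketch of these tags follows. The function rescale01() is a made-up example of mine, not part of ‘SHAPforxgboost’.

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @param na.rm Logical; ignore missing values when computing the range?
#' @return A numeric vector of the same length as x.
#' @examples
#' rescale01(c(1, 2, 3))
#' @export
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

# To keep the example in a separate file, replace the @examples lines with:
#   #' @example R/example/sample1.R
```

Every argument in the function signature has a matching @param line, which is what the check looks for.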

Potential problems when checking the package

  • You cannot have an unescaped backslash sequence such as a line break "\n" in the documentation. It triggers an “unknown macro” warning, and it causes an error when you try to download and build the package from GitHub.

  • When checking the package, there will be notes saying “no visible binding for global variable”. To remove such notes, add the following anywhere in the package’s R code:

if (getRversion() >= "2.15.1") {
  # List every variable that the check complained about
  utils::globalVariables(c(".", "rfvalue", "value"))
}

  • All R scripts go into the R/ folder. It seems OK to keep undocumented functions in a separate script as long as they are only called internally.

Some nice suggestions from the CRAN team when submitting the package

  • In the DESCRIPTION, write package names, software names and API names in single quotes (e.g. ‘Python’). The Title field should be in title case. The Description should not start with the package name or “This package”.

  • Please ensure that you do not use more than 2 cores in your examples. Is there any reason why the number of cores to use is not an argument of e.g. xgboost.fit()?

  • Please replace cat() by message() or warning() in your functions (except for print() and summary() functions). Messages and warnings can be suppressed if needed.
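The last two suggestions can be sketched together. fit_demo() below is a hypothetical stand-in, not a function from ‘SHAPforxgboost’: it takes the number of threads as an argument and reports progress with message() rather than cat().

```r
# fit_demo() is a made-up function used only for illustration.
fit_demo <- function(x, nthread = 1, verbose = TRUE) {
  # message() writes to stderr and can be silenced by the caller
  if (verbose) message("Fitting with ", nthread, " thread(s)")
  mean(x)  # placeholder for the real model fit
}

# Callers choose the number of threads and can suppress the output
res <- suppressMessages(fit_demo(1:10, nthread = 2))

# A message is a condition, so it can also be captured
msg <- tryCatch(fit_demo(1:10), message = function(m) conditionMessage(m))
```

With cat() the text would always be printed, and suppressMessages() would have no effect on it.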