Although ‘SHAPforxgboost’ is not a complicated package, it took me some time to get it to pass all the CRAN checks. It is now (Aug. 03, 2019) available on CRAN. Install it with either
install.packages("SHAPforxgboost")
or
devtools::install_github("liuyanguu/SHAPforxgboost")
Use the ‘usethis’ package (https://usethis.r-lib.org/) to set up the package structure.
Some of my own experience
On the DESCRIPTION file
Put package names in single quotes in both the Title and Description fields. For example: “This package uses the SHAP values output from ‘xgboost’”.
It is unnecessary to add anything to Depends in the DESCRIPTION besides R (>= 3.3.0).
The Imports field in the DESCRIPTION does not determine what is imported into the NAMESPACE (although both are called “imports”). The NAMESPACE is defined by the @import (whole packages) and @importFrom (single functions) roxygen tags in the R code. What you import into the NAMESPACE is what really matters, but you should keep the DESCRIPTION aligned with it, since CRAN checks that the two agree.
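A minimal sketch of how the two files relate, assuming a hypothetical exported helper robust_center() that uses median() from the base package ‘stats’ (packages such as ‘stats’ still have to be declared in Imports when you import from them). The comments show the DESCRIPTION entry and the NAMESPACE lines roxygen2 would generate from the tags; run as a plain script, the roxygen comments are ignored and the function simply works.

```r
# DESCRIPTION (declares the dependency so that it is installed):
#   Imports: stats
#
# NAMESPACE lines that roxygen2 generates from the tags below:
#   export(robust_center)
#   importFrom(stats,median)

#' Median that ignores missing values (hypothetical helper)
#'
#' @param x A numeric vector.
#' @return The median of the non-missing values of `x`.
#' @importFrom stats median
#' @export
robust_center <- function(x) {
  median(x, na.rm = TRUE)
}

robust_center(c(1, NA, 3, 10))
```

Importing single functions with @importFrom, rather than a whole package with @import, keeps the namespace small and avoids clashes when two packages export functions with the same name.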
[Figure: the package’s DESCRIPTION as displayed on CRAN]
Namespace
load vs attach
If you have @import ggplot2 anywhere in the R code, the ‘ggplot2’ package will be loaded (but not attached) when your package is loaded, and your functions can use ggplot2. Your package knows to search the namespace of ggplot2, but you cannot call ggplot2 functions directly (without the ggplot2:: prefix) unless you type library(ggplot2). This is the difference between attaching and loading, as discussed in Hadley’s “R Packages” book: loading a package does not put its functions on the search path, but attaching does. I.e., when others load and attach your package, they can use your functions that depend on ggplot2, but they cannot use the functions from the ggplot2 package directly unless they load and attach ggplot2 with library(ggplot2).
If you put ggplot2 in Depends in the DESCRIPTION, library(yourPackage) will load and attach ggplot2, the same as running library(ggplot2). Generally speaking, you don’t need to do so.
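The difference can be checked in a plain R session. This sketch uses ‘splines’ as a stand-in for ‘ggplot2’, an assumption made only so the example runs without extra installs: ‘splines’ ships with R but is not attached by default. loadNamespace() loads it roughly the way an import in your package would, and library() attaches it.

```r
# Loading: the namespace is in memory, so code can reach its functions
# with the `splines::` prefix, but nothing is added to the search path.
invisible(loadNamespace("splines"))
is_loaded <- "splines" %in% loadedNamespaces()
attached_after_load <- "package:splines" %in% search()
basis <- splines::bs(1:10, df = 3)  # works without library()

# Attaching: library() also puts the package on the search path,
# so bs() can now be called without the prefix.
library(splines)
attached_after_library <- "package:splines" %in% search()
basis2 <- bs(1:10, df = 3)

print(c(loaded = is_loaded,
        attached_after_load = attached_after_load,
        attached_after_library = attached_after_library))
```

In a fresh session the printed vector should read TRUE, FALSE, TRUE.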
If you name a function like plot.shap.summary, it will be documented automatically as an S3 method of the plot generic function. So I changed the name to shap.plot.summary. In the future, I will avoid using dots in function names.
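The automatic S3 treatment is not just a documentation quirk: R’s dispatch would really send plot() calls to such a function. A minimal sketch with a made-up object of class "shap.summary" (an assumption for illustration, not the package’s actual object):

```r
# A function whose name has the form <generic>.<class> ...
plot.shap.summary <- function(x, ...) {
  "plot.shap.summary() was called via S3 dispatch"
}

# ... is picked up by the generic plot() for objects of class "shap.summary".
obj <- structure(list(), class = "shap.summary")
res <- plot(obj)
print(res)
```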
Documentation
@example R/example/sample1.R adds the code in sample1.R (in the folder “R/example/”) to the function’s documentation as its example. If you write the examples directly in the function’s roxygen comments, use @examples instead of @example.
All function parameters should be documented using @param.
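A minimal roxygen block showing both rules, for a hypothetical function scale_unit() that is not part of ‘SHAPforxgboost’: every argument gets an @param line, and the example code is written inline under @examples. The trailing comment shows the @example alternative, reusing the file path from the text above.

```r
#' Scale a numeric vector to the range [0, 1] (hypothetical example function)
#'
#' @param x A numeric vector.
#' @param na.rm Logical; should missing values be ignored when computing the range?
#' @return A numeric vector the same length as `x`.
#' @examples
#' scale_unit(c(2, 4, 6))
#' @export
scale_unit <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)
  (x - rng[1]) / (rng[2] - rng[1])
}

# To keep the example code in a separate file instead, replace the
# @examples block with one line (path relative to the package root):
#   #' @example R/example/sample1.R

scale_unit(c(2, 4, 6))
```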
Potential problems when checking the package
You cannot have things like a line break "\n" in the documentation: Rd treats the backslash as the start of a macro, so this gives an “unknown macro” warning and actually causes an error when you try to download and build the package from GitHub.
When checking the package, there may be notes saying “no visible binding for global variable”. To remove such notes, add the following anywhere in the package’s R code:
if (getRversion() >= "2.15.1") {
  utils::globalVariables(c(".", "rfvalue", "value"))  # list every variable the check complains about
}
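These notes usually come from non-standard evaluation: column names that functions such as ggplot2’s aes() or base R’s subset() look up inside a data frame appear to the static code check as undefined global variables. A minimal base-R sketch with a hypothetical function keep_large():

```r
# `value` below is a column of `df`, found by subset()'s non-standard
# evaluation at run time. R CMD check analyses the code statically, sees no
# object called `value`, and reports:
#   no visible binding for global variable 'value'
keep_large <- function(df) {
  subset(df, value > 1)
}

# Declaring the name once at the top level of any file in R/ silences the note.
if (getRversion() >= "2.15.1") utils::globalVariables("value")

df <- data.frame(id = 1:3, value = c(0.5, 2, 3))
print(keep_large(df))
```

In your own code, standard evaluation such as df[df$value > 1, ] avoids the note altogether; globalVariables() is mainly useful when an API like aes() requires bare column names.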
- All code scripts go into the R/ folder. It seems fine to leave functions undocumented in a separate script as long as they are only called internally (i.e., not exported).
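A minimal sketch of that layout, with hypothetical file and function names: the helper is not exported, so R CMD check does not ask for an Rd page, and roxygen2’s @noRd tag lets you keep a short comment without generating one. The exported function that calls it is documented as usual.

```r
# R/internal-helpers.R (hypothetical file name)

#' Drop missing values and count how many were removed
#' @noRd
drop_na_count <- function(x) {
  keep <- !is.na(x)
  list(values = x[keep], n_dropped = sum(!keep))
}

# R/mean_complete.R (hypothetical exported function that calls the helper)

#' Mean of the non-missing values
#'
#' @param x A numeric vector.
#' @return The mean of the non-missing values of `x`.
#' @export
mean_complete <- function(x) {
  cleaned <- drop_na_count(x)
  mean(cleaned$values)
}

mean_complete(c(1, NA, 3))
```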
Some nice suggestions from the CRAN team when submitting the package
In the Description field, write package names, software names and API names in single quotes (e.g. ‘Python’). The Title field should be in title case. The Description should not start with the package name or “This package”.
Please ensure that you do not use more than 2 cores in your examples. Is there any reason why the number of cores to use is not an argument of e.g. xgboost.fit()?
Please replace cat() by message() or warning() in your functions (except for print() and summary() functions). Messages and warnings can be suppressed if needed.
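Both CRAN requests can be sketched in one hypothetical helper, square_all(), which is not from the package: the number of cores is an argument rather than being hard-coded (for an ‘xgboost’ model the analogous setting would be its nthread parameter), and status output goes through message() and warning() so callers can silence it. parallel::mclapply() is used only as a convenient example of a multi-core call.

```r
# Hypothetical helper: cores are an argument, never hard-coded to
# parallel::detectCores(), and output uses message()/warning(), not cat().
square_all <- function(x, n_cores = 1L) {
  if (.Platform$OS.type == "windows") {
    n_cores <- 1L  # mclapply() relies on forking, which Windows lacks
  }
  if (anyNA(x)) {
    warning("`x` contains NA values; they are returned as NA.")
  }
  message("Squaring ", length(x), " values on ", n_cores, " core(s).")
  unlist(parallel::mclapply(x, function(v) v^2, mc.cores = n_cores))
}

# In @examples, pass an explicit, CRAN-friendly number of cores:
square_all(1:4, n_cores = 2L)

# Users who do not want the status message can suppress it:
res <- suppressMessages(square_all(c(1, 2, 3), n_cores = 1L))
print(res)
```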