January 18, 2017

Overview

  • Part A
    • R-packages – an introduction (10 minutes)
    • Structure and contents of R-packages (30 minutes)
    • From scripts to functions (20 minutes)
  • Part B
    • Building a template package the prospective way (10 minutes)
    • Maintaining a package (20 minutes)

1.1 R-packages – What?

R-packages – What?

  • If you are looking for an introduction to R… bad luck!
  • This course does cover
    • a brief idea of the concept of R-packages
    • a discussion/justification of package contents
    • hands-on work to build a package
    • work beyond heaving a package onto the shelf and vanishing
    • additional material and efforts related to packages

R-packages – What?

What you should already be familiar with

  • R and RStudio
  • Installing and using R-packages
  • Writing scripts (and functions)
  • Structuring code and following good practice rules

R-packages

Further resources

R-packages

Prerequisites and computational minions

  • RStudio: The skin for R, a perfect developer (and user) environment
    • Support for creating scripts, packages, reports, books, slides, …
    • Plots, data environment, history, file browser, help viewer
    • Auto completion, syntax highlighting, context help
    • Version control and project management
  • The R-package devtools by Hadley Wickham
  • The R-package roxygen2 by Hadley Wickham
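
A minimal sketch for checking whether the two helper packages are available, assuming they would be installed from CRAN if missing. The object names are illustrative only.

```r
## packages used throughout this course
pkgs <- c("devtools", "roxygen2")

## check which of them can be loaded from the current library paths
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!available]

## report what would need to be installed (the installation itself needs
## a network connection and is therefore left as a comment)
if (length(missing_pkgs) > 0) {
  message("Missing: ", paste(missing_pkgs, collapse = ", "))
  ## install.packages(missing_pkgs)
}
```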

R-packages

Concepts

  • R lives from sharing code, and packages are the vehicles for this idea
  • Packages are the pillars for open, transparent, reproducible work
  • Packages comprise the full set of self-contained components

  • Libraries are not packages! Libraries host packages. You pull a package from the library.
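
The distinction can be sketched in a few lines of base R. The package stats is used here only because it ships with every R installation.

```r
## the library: one or more directories that host installed packages
lib_dirs <- .libPaths()
print(lib_dirs)

## the packages sitting in the library
installed <- rownames(installed.packages())
print("stats" %in% installed)

## pull one package from the library and attach it to the search path
library(stats)
print("package:stats" %in% search())
```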

R-packages

Some illusions

  • R-packages have to go to the Comprehensive R Archive Network (CRAN)
  • Writing R-packages is a lot of boring effort
  • You can start easy and add further details later
  • People will use your package in the way you designed it

R-packages

Why write packages

  • You want to share your code with others
  • The code is organised in a coherent way
  • You want to handle/distribute only a single file
  • Bundling code makes keeping track much easier than collecting scripts
  • The code is automatically tested (by your examples and other routines)

  • Working with packages simply saves time and brain cells

1.2 R-packages – Contents!

R-package contents

An overview

  • What do you think? What belongs in an R-package?

R-package contents

An overview of obvious material

  • A name (yes, a package name)
  • A description (the meta information)
  • A function (yes, there are packages with only one function)
  • Documentation (you want to understand what the function does)
  • A working example (you had better not trust the documentation alone)

  • Further material that will be covered later

R-package contents

The package name

  • This is (or should be) the hardest part of building a package
  • Requirements
    • Only letters, numbers and periods are allowed
    • Start with a letter, do not end with a period
  • Advice
    • Pick a unique name that is easy to google and reflects what the package does
    • Do not mix upper and lower case letters
    • Check if the name already exists beforehand
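
The formal requirements can be sketched as a regular expression. The helper is_valid_pkg_name() is hypothetical and not part of any package. The pattern also enforces a minimum of two characters, which R requires as well.

```r
## hypothetical helper to test the formal naming requirements
is_valid_pkg_name <- function(x) {
  ## a letter first, then letters/numbers/periods, no period at the end
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", x)
}

## two valid and three invalid candidates
candidates <- c("eseis", "my.pkg", "2fast", "pkg.", "my_pkg")
valid <- is_valid_pkg_name(candidates)
print(valid)

## whether a name is already taken on CRAN could be checked with
## "NAME" %in% rownames(available.packages()), which needs a network
## connection
```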

R-package contents

The DESCRIPTION file

  • The mandatory file that defines all the metadata of the package: name & short title, version & date, author & contact, license & dependencies:
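
A minimal sketch of such a file, written and read back from within R. All field values are placeholders, and the file is written to a temporary directory rather than into a real package.

```r
## all field values below are placeholders
meta <- data.frame(
  Package = "mypackage",
  Title = "What the Package Does in One Line",
  Version = "0.0.1",
  Date = "2017-01-18",
  Author = "First Last",
  Maintainer = "First Last <first.last@example.com>",
  Description = "One paragraph describing what the package does.",
  License = "GPL-3",
  stringsAsFactors = FALSE
)

## write the file to a temporary directory and read it back
desc_file <- file.path(tempdir(), "DESCRIPTION")
write.dcf(meta, file = desc_file)
desc <- read.dcf(desc_file)
print(desc[1, c("Package", "Version", "License")])
```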

R-package contents

The DESCRIPTION file - Title and description

  • Title must be in title case, only one line, not ending with a period
  • Description can be several sentences, but only one paragraph. Lines should not exceed 80 characters, and continuation lines must be indented by 4 spaces.
Title: Environmental Seismology Toolbox
Description: A collection of functions to handle seismic data for the
    purpose of investigating the seismic signals emitted by Earth surface
    processes. The package supports importing standard formats, data
    preparation and analysis techniques and data visualisation.
  • Both elements are important. They are indexed, and search engines such as Google have become good at spotting R packages.

R-package contents

The DESCRIPTION file – Dependencies

  • Dependency options in short (read more):
    • Depends: all packages your package needs to run
    • Imports: will be covered by the namespace section
    • Suggests: packages used to make, e.g., vignettes
    • Enhances: packages that are enhanced by your package (rarely needed)
    • LinkingTo: needed to reference C++/C libraries
  • CRAN has become strict about the number of Depends entries. Use Imports in DESCRIPTION together with importFrom() in NAMESPACE instead.
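
How the different dependency types show up in function code can be sketched as follows. The function robust_center() is hypothetical, and MASS merely stands in for an arbitrary package listed under Suggests.

```r
## hypothetical function that works with or without an optional package
robust_center <- function(x) {

  if (requireNamespace("MASS", quietly = TRUE)) {

    ## optional package (Suggests): use it only when it is installed
    MASS::huber(x)$mu
  } else {

    ## imported package (Imports): call with explicit namespace
    stats::median(x)
  }
}

center <- robust_center(c(1, 2, 3, 100))
print(center)
```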

R-package contents

The DESCRIPTION file – Author information and roles

  • Author information can also be defined more comprehensively (read more):
Authors@R: person("First", "Last", email = "first.last@example.com",
                  role = c("aut", "cre"))
  • Essential for correct citation of packages! Nota bene: always cite the package and R version used for analysis: citation("PACKAGENAME", auto = TRUE)
  • Don't use fake e-mail addresses. Otherwise CRAN and users cannot communicate with you.
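
Both points can be tried out in the console. The sketch below uses the placeholder author from above and the stats package, which ships with R, as a stand-in for PACKAGENAME.

```r
## build an author entry as used in the Authors@R field
aut <- person("First", "Last", email = "first.last@example.com",
              role = c("aut", "cre"))
print(aut)

## citation information of a package and the R version in use
cit <- citation("stats")
print(cit)
print(R.version.string)
```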

R-package contents

The DESCRIPTION file – License issues

  • The key element that states who may use the package and for which purpose!

  • Either a link to a license file (License: file LICENSE) or a keyword of standard licenses (read more):
    • GPL-2 or GPL-3: copyleft licenses, other users must license derived code GPL-compatibly. A common choice for CRAN submissions.
    • CC0: give away all rights, anything can be done with the code
    • BSD or MIT: permissive licenses, require additional file LICENSE.

R-package contents

The DESCRIPTION file – Version patterns

  • Version numbers must be numeric and separated by a period.
  • They are more than just counters, they define dependency satisfactions

  • Format: MAJOR.MINOR.PATCH (start a released package with 0.0.1)
    • MAJOR releases should be rare
    • MINOR releases should keep the package up to date
    • PATCH releases may be frequent (but think of CRAN team time budget)
  • Make use of a NEWS file to announce version history and changes.
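
Why version numbers are more than counters can be sketched with base R tools. Versions are compared component by component, not as text or decimal numbers. The version strings are made up for illustration.

```r
## two hypothetical versions of the same package
v_old <- package_version("0.9.10")
v_new <- package_version("0.10.0")

## component-wise comparison: minor version 10 is newer than 9
print(v_new > v_old)

## the same comparison on plain strings, -1 means the first is older
print(compareVersion("0.9.10", "0.10.0"))

## a dependency check as it could appear for an installed package
print(packageVersion("stats") >= "3.0.0")
```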

R-package contents

Further contents

R-package contents

R code

  • The actual function definitions
  • Will be covered in the next section

R-package contents

Code documentation

  • The (second) most important part of a reasonable package
  • Omitting it means nobody will be able to use your package

  • Documentation in R is reference documentation, similar to dictionaries
  • Additional documentation is covered by vignettes (not covered here)

  • In R, documentation in *.Rd-files is highly formalised and follows a LaTeX-like scheme (short version, long version)

R-package contents

Code documentation

  • Why should you not attempt to build documentation files manually?
    • Tedious, clumsy, not intuitive
    • Easy to forget to update after changing the function
  • Alternative: write the documentation next to the function definitions
    • roxygen2 and Rd2roxygen
    • inlinedocs (no longer updated)
  • For the application, see the next section

R-package contents

Examples and example data

  • Working (and worked) examples are mandatory documentation items
  • They serve two purposes
    • Explain usage of the function
    • Test in a real case that the function works
  • Typically useful to include example data sets
    • Must be provided as *.rda files (generated with save())
    • Must be stored in directory data
    • Must be documented individually
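
Creating such a data file can be sketched as follows. The package directory is a hypothetical one inside a temporary folder, and the data set x is a toy example.

```r
## hypothetical package directory inside a temporary folder
pkg_dir <- file.path(tempdir(), "mypackage")
data_dir <- file.path(pkg_dir, "data")
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

## create the example data set and save it as *.rda file
x <- 1:10
save(x, file = file.path(data_dir, "x.rda"))

## check that the file restores the object under its original name
env <- new.env()
restored <- load(file.path(data_dir, "x.rda"), envir = env)
print(restored)
print(identical(env$x, 1:10))
```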

R-package contents

Further contents

  • Namespace will be dealt with automatically (read more)
    • Defines function name assignments to packages
  • Compiled code (not covered here)
    • C++ code present in src will be compiled during installation
  • Shiny and other installed software (not covered here)
    • Further files/software in inst will be copied to the top-level directory of the installed package
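
What the namespace does can be inspected for any installed package. The sketch uses stats, which ships with R. The NAMESPACE lines in the comments are generic examples, not taken from a real package.

```r
## typical entries of a NAMESPACE file look like this:
##   export(f)
##   importFrom(utils, read.table)

## functions a package makes available to users
exports <- getNamespaceExports("stats")
print("median" %in% exports)

## the namespace a function belongs to
ns_name <- environmentName(environment(stats::median))
print(ns_name)
```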

From scripts to functions

From scripts to functions

A horrible start

a <- 10:50
print(a)
plot(a)
b <- 210:250
plot(a, b)
A <- a * b
c <- 5
V <- A * c
print(A)
print(V)
plot(a,V)

From scripts to functions

Separating code sections

a <- 10:50
b <- 210:250
c <- 5

A <- a * b
V <- A * c

plot(a)
plot(a, b)
plot(a,V)

print(a)
print(A)
print(V)

From scripts to functions

Documenting code

## define object geometry
a <- 10:50
b <- 210:250
c <- 5

## calculate area and volume
A <- a * b
V <- A * c

## plot object dimensions
plot(a)
plot(a, b)
plot(a, V)

## print values
print(a)
print(A)
print(V)

From scripts to functions

Wrapping code to functions

f <- function(a, b, c) {
  
  ## calculate area and volume
  A <- a * b
  V <- A * c

  ## plot object dimensions
  plot(a)
  plot(a, b)
  plot(a, V)

  ## return values
  return(list(A = A,
              V = V))
}

From scripts to functions

Wrapping code to functions

f <- function(a, b, c, plot = TRUE) {
  
  ## calculate area and volume
  A <- a * b
  V <- A * c

  ## optionally plot object dimensions
  if(plot == TRUE) {
    
    plot(a)
    plot(a, b)
    plot(a, V)
  }

  ## return values
  return(list(A = A,
              V = V))
}

From scripts to functions

Wrapping code to functions

f(a = 10, b = 100, c = 5, plot = FALSE)
## $A
## [1] 1000
## 
## $V
## [1] 5000

From scripts to functions

In words

  • Structure your script
    • Variable/object/argument definitions
    • Data/variable checks and automatic assignments
    • Data manipulation/evaluation
    • Optional further outputs
    • Return object creation
  • Wrap it into a function definition
    • FUNCTION_NAME <- function(ARGUMENT_1, ARGUMENT_2) {FUNCTION BODY}
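
The structure listed above can be sketched with the cuboid example. The function name cuboid() and the default height of 1 are assumptions made for illustration only.

```r
cuboid <- function(a, b, c, plot = FALSE) {

  ## check data and assign a default height if none is given
  stopifnot(is.numeric(a), is.numeric(b))
  if (missing(c)) {
    c <- 1
  }
  stopifnot(is.numeric(c))

  ## evaluate data
  A <- a * b
  V <- A * c

  ## optional further output
  if (isTRUE(plot)) {
    plot(a, V)
  }

  ## create return object
  return(list(A = A,
              V = V))
}

res <- cuboid(a = 10, b = 100, c = 5)
print(res)
```

Passing a non-numeric argument, e.g. cuboid(a = "10", b = 100), stops with an error before any calculation is attempted.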

From scripts to functions

In words

  • Ooops, what did we forget?

  • Function documentation (in a separate file)

\name{f}
\alias{f}
\title{Calculate and plot cuboid areas and volumes.}
\usage{f(a, b, c, plot = TRUE)}
\arguments{
\item{a}{\code{Numeric} vector, length of the cuboid.}
\item{b}{\code{Numeric} vector, width of the cuboid.}
\item{c}{\code{Numeric} vector, height of the cuboid.}
}
\value{A list with cuboid area and volume.}
\description{The function takes numeric vectors of the cardinal dimensions of a cuboid object and calculates area and volume. The results can optionally be plotted.}
\examples{f(a = 10, b = 100, c = 5, plot = FALSE)}
\author{Michael Dietze}

From scripts to functions

Documentation using roxygen2

  • So why not write the documentation into the function definition file?

Another brief function example

f <- function(x, p = 2) {
  
  ## calculate the power of x
  y <- x^p

  ## return value
  return(y)
}

From scripts to functions

Documentation using roxygen2

Can be rewritten like this:

#' Calculate the power of a vector                     # TITLE
#' 
#' The function calculates the power p of a vector x.  # DESCRIPTION
#' 
#' The function simply combines the arguments.         # DETAILS 
#'
#' @param x input vector                               # ARGUMENTS
#' @param p power exponent                             # ARGUMENTS
#' @return vector of the power p of x.                 # VALUE
#' @author Michael Dietze                              # AUTHOR(S)
#' @examples
#' f(x = 10, p = 3)                                    # EXAMPLES
#' @export f                                           # NAMESPACE ENTRY
f <- function(x, p = 2) {                              # USAGE
  return(x^p)
}

From scripts to functions

Documentation using roxygen2

And becomes something like:

From scripts to functions

Documentation using roxygen2

  • roxygen2 is a package that parses function source files for tags and converts them to the LaTeX-like structure of a *.Rd-file.

  • First line always becomes the title (thus, keep it to one line)
  • Second set of lines becomes the description
  • Third and further sets of lines become the details (optional)

  • Further down follow tagged items
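
The parsing idea can be illustrated with a toy example in base R. This is not how roxygen2 is implemented. It only shows how the title and the tags can be recognised in the comment lines of the power function from above.

```r
## source lines of a documented function, as they would sit in R/f.R
src <- c(
  "#' Calculate the power of a vector",
  "#'",
  "#' The function calculates the power p of a vector x.",
  "#'",
  "#' @param x input vector",
  "#' @param p power exponent",
  "#' @return vector of the power p of x.",
  "#' @export",
  "f <- function(x, p = 2) {",
  "  return(x^p)",
  "}"
)

## keep only the roxygen comment lines and strip the comment prefix
rox <- sub("^#' ?", "", src[grepl("^#'", src)])

## the first line is the title, lines starting with @ are tagged items
title <- rox[1]
tags <- rox[grepl("^@", rox)]
tag_names <- sub("^@([A-Za-z]+).*$", "\\1", tags)

print(title)
print(tag_names)
```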

From scripts to functions

Documentation using roxygen2

  • @param - Function arguments, argument name first and then its description
  • @return - Function value
  • @examples - Examples section
  • @export - Namespace export, usually the function name
  • @seealso - Related functions to link to
  • @keywords - Well, keywords
  • @section - Arbitrary sections to further structure the documentation

From scripts to functions

Documentation using roxygen2

  • Further LaTeX-like tags to structure the text include
    • Text formatting (\emph{}, \strong{}, \code{})
    • Links (\code{\link{}}, \href{}{})
    • Lists (\enumerate{}, \itemize{})
    • Equations (\eqn{}, \deqn{})
    • Tables (\tabular{}{\tab \cr})
  • For details see here; for an example see the source of Luminescence::analyse_baSAR.R

From scripts to functions

Documentation of data sets using roxygen2

  • Further tags
    • @format - Overview of data structure (copy-paste output of str())
    • @source - Source of the data set, e.g., the internet link
  • Documentation goes in a file named after the data set and saved as R/DATA_SET.R
  • Alternatively, all documentation in package documentation (see next slide)
#' Ten numbers from 1 to 10
#'
#' A dataset containing ten ordered natural numbers
#'
#' @format A vector with 10 elements:
#' int [1:10] 1 2 3 4 5 6 7 8 9 10
"x"

From scripts to functions

Documentation of packages using roxygen2

  • Similar to data sets, but the package is defined as a NULL-object
  • Definition of imports (entire packages or external functions)
  • Save as file called PACKAGE_NAME-package.R in R/
#' A package of diverse functions
#'
#' The package is used to store all my functions, safe from my brain.
#'
#' @docType package
#' @name PACKAGE_NAME
#' @import stats
#' @importFrom utils read.table write.table
NULL