EGU 2019 | Thu, 11 Apr, 08:30–10:15, Room -2.62

Before we start

Who we are

  • Sebastian Kreutzer is a Post-doc at Université Bordeaux Montaigne, working on landscape evolution, luminescence dating and data science.

  • Michael Dietze is a Post-doc at the GFZ Potsdam, working on the seismic signals emitted by Earth surface processes.

And who are you?

  • Your first task: Introduce yourself to the others
    • What is your educational background and your career stage?
    • For what purpose do you want to build R packages?
    • What do you expect from this course?

Today's schedule

  • R packages – an introduction (15 min)
  • Structure and contents of R packages (20 min)
  • Git and how it works together with R (10 min)
  • Initiating an R package with RStudio (20 min)
  • Writing a function with 'roxygen2' (20 min)
  • Unit tests of package functionality (15 min)
  • Wrap up, feedback and resources (5 min)

R packages – What?

  • This course covers
    • a brief idea of the concept of R packages
    • a discussion/justification of package contents
    • hands-on work to build a package
  • You are looking for an introduction to R … sorry, bad luck!

What you should be already familiar with

  • R and RStudio
  • Installing and using R packages
  • Writing scripts (and functions)
  • Structuring code and following good practice rules

  • You are looking for an introduction to R … sorry, bad luck!*

(*) Don't repeat yourself.

R packages

Further resources

Prerequisites and computational minions

  • Git, a versioning tool for all kinds of software source code (and beyond)
  • RStudio: The skin for R, a proper developer (and user) environment
    • Auto completion, syntax highlighting, context help
    • Version control and project management
  • The R package 'devtools' by H. Wickham, J. Hester & W. Chang
  • The R package 'roxygen2' by H. Wickham, P. Danenberg & M. Eugster
  • The R package 'testthat' by H. Wickham, R Core Team
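
Whether these tools are already available can be checked from the R console. The following is a minimal sketch using base R only; the variable names are illustrative and not part of the course material.

```r
# Packages named on this slide
tools_needed <- c("devtools", "roxygen2", "testthat")

# TRUE for each package that can be loaded, FALSE otherwise
installed <- vapply(tools_needed, requireNamespace, logical(1),
                    quietly = TRUE)
print(installed)

# Missing packages could then be installed from CRAN (needs internet access):
# install.packages(tools_needed[!installed])

# Sys.which() returns an empty string if Git is not on the search path
git_path <- Sys.which("git")
print(nzchar(git_path))
```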

Concepts

  • R thrives on shared code; packages are the vehicles that carry this idea
  • Packages can be the pillars for open and reproducible science
  • Packages comprise the full set of self-contained components

Some illusions

  • R packages have to go to the Comprehensive R Archive Network (CRAN)
  • Writing R packages is a lot of boring effort
  • You can start easy and add further details later
  • People will use your package in the way you designed it

Why write packages?

  • You want to share your code with others
  • The code is organised in a coherent way
  • You want to distribute only a single file
  • Bundling code makes keeping track much easier than collecting scripts
  • The code is automatically tested (by your examples and other routines)

Working with packages simply saves time and brain cells

R packages – Contents

R package contents

An overview


What do you think? What belongs to an R package?

An overview of obvious material

  • A name (yes, a package name)
  • A description (the meta information)
  • A function (yes, there are packages with only one function)
  • Documentation (you want to understand what the function does)
  • A working example (you don't want to rely on the documentation alone)

  • A set of further stuff that will be covered later

The package name

  • This is (or should be) the hardest job when creating a package
  • Requirements
    • Only letters, numbers and periods are allowed
    • Start with a letter, do not end with a period
  • Advice
    • Pick a unique name that is easy to google and describes your package well
    • Check beforehand whether the name already exists
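
The naming requirements can be expressed as a regular expression. This is a sketch, not an official checker: the helper `is_valid_pkg_name()` and the candidate names are hypothetical, and the pattern additionally enforces R's rule that a package name has at least two characters.

```r
# Naming rules from the slide, plus R's minimum length of two characters:
# letters, numbers and periods only; start with a letter; no trailing period
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

candidates <- c("mytoolbox", "my.pkg2", "2fast", "my_pkg", "pkg.")
valid <- is_valid_pkg_name(candidates)
print(setNames(valid, candidates))

# Whether a name is already taken on CRAN could be checked like this
# (needs internet access, therefore not run here):
# "mytoolbox" %in% rownames(available.packages())
```

Only the first two candidates pass: "2fast" starts with a number, "my_pkg" contains an underscore and "pkg." ends with a period.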

The DESCRIPTION file

  • The mandatory file that defines all the metadata of the package: name & short title, version & date, author & contact, license & dependencies.
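
A minimal sketch of what such a file might contain, written and parsed again with base R. All field values (package name, title, author, date) are hypothetical placeholders; `read.dcf()` is the base R reader for this file format.

```r
# Hypothetical metadata for illustration; no value refers to a real package
desc_lines <- c(
  "Package: mytoolbox",
  "Title: A Hypothetical Example Toolbox",
  "Version: 0.1.0",
  "Date: 2019-04-11",
  'Authors@R: person("First", "Last", email = "first.last@example.com",',
  '    role = c("aut", "cre"))',
  "Description: A placeholder description that spans more than one line,",
  "    with continuation lines indented by four spaces.",
  "License: GPL-3",
  "Imports: stats"
)

# Write the file to a temporary folder and parse it again
desc_file <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_file)
desc <- read.dcf(desc_file)

print(colnames(desc))       # the field names found in the file
print(desc[1, "Version"])   # a single field value
```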

The DESCRIPTION file - Title and description

  • Title must be in title case, fit on one line and not end with a period
  • Description can be several sentences, but only one paragraph. Lines should not exceed 80 characters, and continuation lines must be indented by 4 spaces.
Title: Environmental Seismology Toolbox
Description: A collection of functions to handle seismic data for the
    purpose of investigating the seismic signals emitted by Earth surface
    processes. The package supports importing standard formats, data
    preparation and analysis techniques and data visualisation.
  • Both elements are important. They are indexed by search engines, and Google has become good at spotting R packages.

The DESCRIPTION file – Dependencies

  • Dependency options in short:
    • Depends: all packages your package essentially needs to run
    • Imports: will be covered by the namespace section
    • Suggests: optionally needed packages
    • LinkingTo: needed to reference C++/C libraries
  • CRAN has become strict about the number of entries in 'Depends'. Use importFrom() in the NAMESPACE file instead.
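
How the dependency types differ in practice can be sketched with a small function. The function `summarise_signal()` and the choice of 'stats' as imported and 'knitr' as suggested package are assumptions made for illustration only.

```r
# Hypothetical function inside a package whose DESCRIPTION lists
#   Imports: stats
#   Suggests: knitr
summarise_signal <- function(x, pretty = FALSE) {
  # Imported package: guaranteed to be installed, called with its prefix.
  # With 'importFrom(stats, median)' in NAMESPACE, plain median() would work.
  out <- c(median = stats::median(x), sd = stats::sd(x))

  # Suggested package: may be missing, so check at run time and fall back
  if (pretty && requireNamespace("knitr", quietly = TRUE)) {
    return(knitr::kable(t(out)))
  }
  out
}

res <- summarise_signal(c(1, 2, 3, 10))
print(res)
```

The function still returns a result when the suggested package is not installed, which is what distinguishes 'Suggests' from 'Imports'.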

The DESCRIPTION file – Author information and roles

  • Essential for correct citation of packages! Nota bene: Type citation("PACKAGENAME") to see how a package should be cited
  • Don't use fake email addresses; otherwise CRAN and users cannot contact you.
  • Author information can be defined more comprehensively:
Authors@R: person("First", "Last", email = "first.last@example.com",
                  role = c("aut", "cre"))

CRAN supports ORCID

Authors@R: person(...,
                  comment = c(ORCID = "0000-0002-9079-593X"))
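
The Authors@R field is evaluated as ordinary R code, so an entry can be tried out in the console first. A minimal sketch with placeholder values; the ORCID iD used here is the fictitious test identity published by ORCID, not the one shown above.

```r
# Placeholder author; name, email address and ORCID iD are not real
aut <- person("First", "Last",
              email = "first.last@example.com",
              role = c("aut", "cre"),
              comment = c(ORCID = "0000-0002-1825-0097"))

print(aut)        # formatted author entry
print(aut$role)   # roles: author ("aut") and maintainer ("cre")
```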

The DESCRIPTION file – License issues

  • The key element to inform who can use the package for which purpose!
  • Either a link to a license file (License: file LICENSE) or a keyword of standard licenses:
    • GPL-2 or GPL-3: copyleft licenses, other users must license their code GPL-compatibly. Common for CRAN submissions.
    • CC0: give away all rights, anything can be done with the code
    • BSD or MIT: permissive licenses, require additional file LICENSE.

The DESCRIPTION file – Version patterns

  • Version numbers must consist of integers separated by periods.
  • They are more than just counters: they determine whether dependency requirements are satisfied

  • Format: MAJOR.MINOR.PATCH (start a released package with 0.1.0)
    • MAJOR releases should be rare
    • MINOR releases should keep the package up to date
    • PATCH releases may be frequent (but keep the CRAN team's time budget in mind)
  • Make use of a NEWS file (e.g., NEWS.md) to announce history & changes.
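
Why version numbers are more than counters can be sketched with base R's `package_version()`. The version numbers are made up for illustration.

```r
v_old <- package_version("0.9.2")
v_new <- package_version("0.10.0")

# Compared component by component: minor version 10 is newer than 9
print(v_new > v_old)

# Plain strings are compared character by character, giving the wrong order
print("0.10.0" > "0.9.2")

# A requirement such as 'Depends: R (>= 3.5.0)' boils down to a check like this
print(getRversion() >= "3.5.0")
```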

Further contents