---
title: "Using valaddin"
author: "Eugene Ha"
date: "2017-08-10"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using valaddin}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(valaddin)
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

_valaddin_ is a lightweight R package that enables you to transform an existing 
function into a function with input validation checks. It does so without 
requiring you to modify the body of the function, in contrast to doing input 
validation using `stop` or `stopifnot`, and is therefore suitable for both 
programmatic and interactive use.

This document illustrates the use of valaddin by example. For usage details,
see the main documentation page, `?firmly`.

## Use cases

The workhorse of valaddin is the function `firmly`, which applies input 
validation to a function, _in situ_. It can be used to:

### Enforce types for arguments

For example, to require that all arguments of the function
    
```{r}
f <- function(x, h) (sin(x + h) - sin(x)) / h
```

are numerical, apply `firmly` with the check formula `~is.numeric`[^1]:

```{r}
ff <- firmly(f, ~is.numeric)
```

`ff` behaves just like `f`, but with a constraint on the type of its arguments:

```{r, error = TRUE, purl = FALSE}
ff(0.0, 0.1)

ff("0.0", 0.1)
```
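For contrast, the conventional approach places the check inside the function body. The following is a minimal base-R sketch; the name `f_checked` is hypothetical and used only for illustration.

```r
# Conventional in-body validation with stopifnot (base R only).
# `f_checked` is a hypothetical name used for illustration.
f_checked <- function(x, h) {
  stopifnot(is.numeric(x), is.numeric(h))
  (sin(x + h) - sin(x)) / h
}

f_checked(0.0, 0.1)

# A non-numeric argument fails the check
res <- try(f_checked("0.0", 0.1), silent = TRUE)
inherits(res, "try-error")
```

Here the validation is entangled with the computation, so changing or removing a check means editing the function itself.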

[^1]: The inspiration to use `~` as a quoting operator came from the vignette 
[Non-standard 
evaluation](https://cran.r-project.org/package=lazyeval/vignettes/lazyeval.html),
by Hadley Wickham.

### Enforce constraints on argument values

For example, use `firmly` to put a cap on potentially long-running computations:

```{r, error = TRUE, purl = FALSE}
fib <- function(n) {
  if (n <= 1L) return(1L)
  Recall(n - 1L) + Recall(n - 2L)
}

capped_fib <- firmly(fib, list("n capped at 30" ~ ceiling(n)) ~ {. <= 30L})

capped_fib(10)
capped_fib(50)
```

The role of each part of the value-constraining formula is evident:

-   The right-hand side `{. <= 30L}` is the constraint itself, expressed as a
    condition on `.`, a placeholder argument.

-   The left-hand side `list("n capped at 30" ~ ceiling(n))` specifies the 
    expression for the placeholder, namely `ceiling(n)`, along with a message to
    produce if the constraint is violated.
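Written out by hand in base R, the check amounts to roughly the following. This is only a sketch of the check's effect, not valaddin's implementation; `predicate` and `capped_fib_manual` are hypothetical names.

```r
fib <- function(n) {
  if (n <= 1L) return(1L)
  Recall(n - 1L) + Recall(n - 2L)
}

# Right-hand side of the formula: a condition on the placeholder `.`
predicate <- function(.) . <= 30L

capped_fib_manual <- function(n) {
  # Left-hand side: the expression substituted for `.`, plus the failure message
  if (!isTRUE(predicate(ceiling(n)))) {
    stop("n capped at 30", call. = FALSE)
  }
  fib(n)
}

capped_fib_manual(10)

# A value above the cap is rejected before any computation starts
res <- try(capped_fib_manual(50), silent = TRUE)
inherits(res, "try-error")
```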

### Warn about pitfalls

If the default behavior of a function is problematic, or unexpected, you can use
`firmly` to warn you. Consider the function `as.POSIXct`, which creates a
date-time object:

```{r, include = FALSE}
tz_original <- Sys.getenv("TZ", unset = NA)
```

```{r}
Sys.setenv(TZ = "CET")
(d <- as.POSIXct("2017-01-01 09:30:00"))
```

The problem is that `d` is a potentially _ambiguous_ object (with hidden state),
because it's not explicitly assigned a time zone. If you compute the local hour
of `d` using `as.POSIXlt`, you get an answer that interprets `d` according to 
your current time zone; another user (or you, in another country, in the
future) may get a different result.

-   If you're in the CET time zone:

    ```{r}
    as.POSIXlt(d, tz = "EST")$hour
    ```

-   If you were to change to the EST time zone and rerun the code:

    ```{r}
    Sys.setenv(TZ = "EST")
    d <- as.POSIXct("2017-01-01 09:30:00")
    as.POSIXlt(d, tz = "EST")$hour
    ```

```{r, include = FALSE}
if (isTRUE(is.na(tz_original))) {
  Sys.unsetenv("TZ")
} else {
  Sys.setenv(TZ = tz_original)
}
```
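The ambiguity disappears once the time zone is stated explicitly. The following base-R sketch assumes that the system's time zone database recognizes the names "CET" and "EST"; the helper name `hour_in_est` is hypothetical.

```r
# Local hour in EST of a date-time that is explicitly created in CET
hour_in_est <- function() {
  d <- as.POSIXct("2017-01-01 09:30:00", tz = "CET")
  as.POSIXlt(d, tz = "EST")$hour
}

tz_saved <- Sys.getenv("TZ", unset = NA)

Sys.setenv(TZ = "CET")
h_cet <- hour_in_est()

Sys.setenv(TZ = "EST")
h_est <- hour_in_est()

# Restore the previous setting
if (is.na(tz_saved)) Sys.unsetenv("TZ") else Sys.setenv(TZ = tz_saved)

# The result no longer depends on the session's time zone
identical(h_cet, h_est)
```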

To warn yourself about this pitfall, you can modify `as.POSIXct` to complain 
when you've forgotten to specify a time zone:

```{r}
as.POSIXct <- firmly(as.POSIXct, .warn_missing = "tz")
```

Now when you call `as.POSIXct`, you get a cautionary reminder:

```{r}
as.POSIXct("2017-01-01 09:30:00")

as.POSIXct("2017-01-01 09:30:00", tz = "CET")
```

**NB**: The missing-argument warning is implemented by wrapping functions. The
underlying function `base::as.POSIXct` is called _unmodified_.
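The idea of wrapping can be sketched in base R as follows. This is an illustration only, not how valaddin is implemented; `warn_if_missing_tz` and `as_posixct_w` are hypothetical names.

```r
# Returns a wrapper that warns when `tz` is not supplied, then delegates
# to `f`, which is itself left untouched
warn_if_missing_tz <- function(f) {
  function(x, tz, ...) {
    if (missing(tz)) {
      warning("Argument `tz` is missing", call. = FALSE)
      f(x, ...)
    } else {
      f(x, tz = tz, ...)
    }
  }
}

as_posixct_w <- warn_if_missing_tz(base::as.POSIXct)

# No warning when the time zone is given
d <- as_posixct_w("2017-01-01 09:30:00", tz = "UTC")

# A warning is signaled when it is omitted
warned <- FALSE
d2 <- withCallingHandlers(
  as_posixct_w("2017-01-01 09:30:00"),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
warned
```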

#### Use `loosely` to access the original function

Though reassigning `as.POSIXct` may seem risky, it is not, for the behavior is 
unchanged (aside from the extra precaution), and the original `as.POSIXct`
remains accessible:

-   With a namespace prefix: `base::as.POSIXct`
-   By applying `loosely` to strip input validation: `loosely(as.POSIXct)`

```{r}
loosely(as.POSIXct)("2017-01-01 09:30:00")

identical(loosely(as.POSIXct), base::as.POSIXct)
```

### Decline handouts

R tries to help you express your ideas as concisely as possible. Suppose you 
want to truncate negative values of a vector `w`:

```{r}
w <- {set.seed(1); rnorm(5)}

ifelse(w > 0, w, 0)
```

`ifelse` assumes (correctly) that you intend the `0` to be repeated `r length(w)`
times, and does that for you, automatically.

Nonetheless, R's good intentions have a darker side:

```{r}
z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1

ifelse(z > 0, pos, neg)
```

This smells like a coding error. Instead of complaining that `pos` is too short,
`ifelse` recycles it to line it up with `z`. The result is probably not what you 
wanted.
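To make the recycling explicit, the result matches what you would get by stretching `pos` to the length of `z` yourself, as this base-R check suggests (the name `recycled` is introduced here for illustration):

```r
z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1

# `pos` is silently recycled to length 6, so its first element
# reappears in the sixth position
recycled <- rep_len(pos, length(z))
recycled

identical(ifelse(z > 0, pos, neg), recycled)
```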

In this case, you don't need a helping hand, but rather a firm one:

```{r}
chk_length_type <- list(
  "'yes', 'no' differ in length" ~ length(yes) == length(no),
  "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
) ~ isTRUE

ifelse_f <- firmly(ifelse, chk_length_type)
```

`ifelse_f` is more pedantic than `ifelse`. But it also spares you the
consequences of invalid inputs:

```{r, error = TRUE, purl = FALSE}
ifelse_f(w > 0, w, 0)
ifelse_f(w > 0, w, rep(0, length(w)))

ifelse(z > 0, pos, neg)
ifelse_f(z > 0, pos, neg)

ifelse(z > 0, as.character(pos), neg)
ifelse_f(z > 0, as.character(pos), neg)
```

### Reduce the risks of a lazy-evaluation style

When R makes a function call, say, `f(a)`, the _value_ of the argument `a` is not
materialized in the body of `f` until it is actually needed. Usually, you can 
safely ignore this as a technicality of R's evaluation model; but in some 
situations, it can be problematic if you're not mindful of it.

Consider a bank that waives fees for students. A function to make deposits 
might look like this[^2]:

```{r}
deposit <- function(account, value) {
  if (is_student(account)) {
    account$fees <- 0
  }
  account$balance <- account$balance + value
  account
}

is_student <- function(account) {
  if (isTRUE(account$is_student)) TRUE else FALSE
}
```

Suppose Bob is an account holder, currently not in school:

```{r}
bobs_acct <- list(balance = 10, fees = 3, is_student = FALSE)
```

If Bob were to deposit an amount to cover a future fee payment, his account
balance would be updated to:

```{r}
deposit(bobs_acct, bobs_acct$fees)$balance
```

Bob goes back to school and informs the bank, so that his fees will be waived:

```{r}
bobs_acct$is_student <- TRUE
```

But now suppose that, somewhere in the bowels of the bank's software, the type
of Bob's account object is converted from a list to an environment:

```{r}
bobs_acct <- list2env(bobs_acct)
```

If Bob were to deposit an amount to cover a future fee payment, his account
balance would now be updated to:

```{r}
deposit(bobs_acct, bobs_acct$fees)$balance
```

Becoming a student has cost Bob money. What happened to the amount deposited?

The culprit is lazy evaluation and the modify-in-place semantics of 
environments. In the call `deposit(account = bobs_acct, value = bobs_acct$fees)`,
the value of the argument `value` is only set when it's used, which comes after
the object `fees` in the environment `bobs_acct` has already been zeroed out.
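The mechanism can be isolated in a few lines of base R. In this sketch, `zero_then_read` and `force_then_zero` are hypothetical functions that mimic the order of operations in `deposit`:

```r
# Zeroes the fees first, and only then touches `value`
zero_then_read <- function(account, value) {
  account$fees <- 0
  value
}

# Same, but forces the promise for `value` before anything is modified
force_then_zero <- function(account, value) {
  force(value)
  account$fees <- 0
  value
}

# A list is copied on modification, so the caller's fees are unaffected
acct_list <- list(fees = 3)
zero_then_read(acct_list, acct_list$fees)

# An environment is modified in place, so `value` sees the zeroed fees
acct_env <- list2env(list(fees = 3))
zero_then_read(acct_env, acct_env$fees)

# Forcing `value` first captures the fees before they are zeroed
acct_env <- list2env(list(fees = 3))
force_then_zero(acct_env, acct_env$fees)
```

Forcing arguments up front is another way to avoid the problem, though it requires editing the function body.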

To minimize such risks, forbid `account` from being an environment:

```{r}
err_msg <- "`account` should not be an environment"
deposit <- firmly(deposit, list(err_msg ~ account) ~ Negate(is.environment))
```

This makes Bob a happy customer, and reduces the bank's liability:

```{r, error = TRUE, purl = FALSE}
bobs_acct <- list2env(list(balance = 10, fees = 3, is_student = TRUE))

deposit(bobs_acct, bobs_acct$fees)$balance

deposit(as.list(bobs_acct), bobs_acct$fees)$balance
```

[^2]: Adapted from an example in Section 6.3 of Chambers, _Extending R_, CRC 
Press, 2016. For the sake of the example, ignore the fact that logic to handle
fees does not belong in a function for deposits!

### Prevent self-inflicted wounds

You don't mean to shoot yourself, but sometimes it happens, nonetheless:

```{r, eval = FALSE}
x <- "An expensive object"
save(x, file = "my-precious.rda")

x <- "Oops! A bug or lapse has tarnished your expensive object"

# Many computations later, you again save x, oblivious to the accident ...
save(x, file = "my-precious.rda")
```

`firmly` can safeguard you from such mishaps: implement a safety procedure

```{r}
# Argument `gear` is a list with components:
# fun: Function name
# ns : Namespace of `fun`
# chk: Formula that specifies input checks

hardhat <- function(gear, env = .GlobalEnv) {
  for (. in gear) {
    safe_fun <- firmly(getFromNamespace(.$fun, .$ns), .$chk)
    assign(.$fun, safe_fun, envir = env)
  }
}
```

gather your safety gear

```{r}
protection <- list(
  list(
    fun = "save",
    ns  = "base",
    chk = list("Won't overwrite `file`" ~ file) ~ Negate(file.exists)
  ),
  list(
    fun = "load",
    ns  = "base",
    chk = list("Won't load objects into current environment" ~ envir) ~
      {!identical(., parent.frame(2))}
  )
)
```

then put it on

```{r}
hardhat(protection)
```

Now `save` and `load` engage safety features that prevent you from inadvertently
destroying your data:

```{r, eval = FALSE}
x <- "An expensive object"
save(x, file = "my-precious.rda")

x <- "Oops! A bug or lapse has tarnished your expensive object"

save(x, file = "my-precious.rda")
#> Error: save(x, file = "my-precious.rda")
#> Won't overwrite `file`

# Inspecting x, you notice it's changed, so you try to retrieve the original ...
x
#> [1] "Oops! A bug or lapse has tarnished your expensive object"
load("my-precious.rda")
#> Error: load(file = "my-precious.rda")
#> Won't load objects into current environment

# Keep calm and carry on
loosely(load)("my-precious.rda")

x
#> [1] "An expensive object"
```

**NB**: Input validation is implemented by wrapping functions; thus, if the 
arguments are valid, the underlying functions `base::save`, `base::load` are 
called _unmodified_.

## Toolbox of input checkers

_valaddin_ provides a collection of over 50 pre-made input checkers to 
facilitate typical kinds of argument checks. These checkers are prefixed by 
`vld_`, for convenient browsing and look-up in editors and IDEs that support
name completion.

For example, to create a type-checked version of the function `upper.tri`, which
returns an upper-triangular logical matrix, apply the checkers `vld_matrix`,
`vld_boolean` (here "boolean" is shorthand for "logical vector of length 1"):

```{r, error = TRUE, purl = FALSE}
upper_tri <- firmly(upper.tri, vld_matrix(~x), vld_boolean(~diag))

# upper.tri assumes you mean a vector to be a column matrix
upper.tri(1:2)

upper_tri(1:2)

# But say you actually meant (1, 2) to be a diagonal matrix
upper_tri(diag(1:2))

upper_tri(diag(1:2), diag = "true")

upper_tri(diag(1:2), TRUE)
```

### Check anything with `vld_true`

Any input validation can be expressed as an assertion that "such and such must
be true"; to apply it as such, use `vld_true` (or its complement, `vld_false`).

For example, the above hardening of `ifelse` can be redone as:

```{r, error = TRUE, purl = FALSE}
chk_length_type <- vld_true(
  "'yes', 'no' differ in length" ~ length(yes) == length(no),
  "'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
)
ifelse_f <- firmly(ifelse, chk_length_type)

z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1

ifelse_f(z > 0, as.character(pos), neg)
ifelse_f(z > 0, c(pos, 6), neg)
ifelse_f(z > 0, c(pos, 6L), neg)
```

### Make your own input checker with `localize`

A check formula such as `~ is.numeric` (or `"Not number" ~ is.numeric`, if you
want a custom error message) imposes its condition "globally":

```{r, error = TRUE, purl = FALSE}
difference <- firmly(function(x, y) x - y, ~ is.numeric)

difference(3, 1)
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
```

With `localize`, you can concentrate a globally applied check formula to 
specific expressions. The result is a _reusable_ custom checker:

```{r, error = TRUE, purl = FALSE}
chk_numeric <- localize("Not numeric" ~ is.numeric)
secant <- firmly(function(f, x, h) (f(x + h) - f(x)) / h, chk_numeric(~x, ~h))

secant(sin, 0, .1)
secant(sin, "0", .1)
```

(In fact, `chk_numeric` is equivalent to the pre-built checker `vld_numeric`.)

Conversely, apply `globalize` to impose your localized checker globally:

```{r, error = TRUE, purl = FALSE}
difference <- firmly(function(x, y) x - y, globalize(chk_numeric))

difference(3, 1)
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
```

## Related packages

### Packages that enhance valaddin

-   [assertive](https://bitbucket.org/richierocks/assertive),
    [assertthat](https://github.com/hadley/assertthat), and 
    [checkmate](https://github.com/mllg/checkmate) provide extensive collections
    of predicate functions that you can use in conjunction with `firmly` and 
    `localize`.
    
-   [ensurer](https://github.com/smbache/ensurer) and 
    [assertr](https://github.com/ropensci/assertr) provide ways to validate
    function _values_.

### Other approaches to input validation

-   [argufy](https://github.com/gaborcsardi/argufy) takes a different approach
    to input validation, using [roxygen](https://github.com/r-lib/roxygen2) 
    comments to specify checks.
    
-   [ensurer](https://github.com/smbache/ensurer) provides an experimental
    replacement for `function` that builds functions with type-validated
    arguments.

-   [typeCheck](https://github.com/jimhester/typeCheck), together with
    [Types for R](https://github.com/jimhester/types), enables the creation of 
    functions with type-validated arguments by means of special type 
    annotations. This approach is orthogonal to that of valaddin: whereas 
    valaddin specifies input checks as _predicate functions with scope_
    (predicates are primary), typeCheck specifies input checks as _arguments
    with type_ (arguments are primary).
