---
title: "Multiple Assignment"
author: "John Mount"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Multiple Assignment}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


[`wrapr`](https://github.com/WinVector/wrapr) now supplies a name based multiple assignment notation for [`R`](https://www.r-project.org).

In `R` there are many functions that return named lists or other structures keyed by names. Let's start with a simple example: `base::split()`.

First some example data.

```{r}
d <- data.frame(
  x = 1:9,
  group = c('train', 'calibrate', 'test'),
  stringsAsFactors = FALSE)

knitr::kable(d)
```

One way to use `base::split()` is to call it on a `data.frame` and then unpack the desired portions from the returned value.

```{r}
parts <- split(d, d$group)
train_data <- parts$train
calibrate_data <- parts$calibrate
test_data <- parts$test
```

```{r}
knitr::kable(train_data)

knitr::kable(calibrate_data)

knitr::kable(test_data)
```

If we use a multiple assignment notation we can collect some steps together, and avoid possibly leaving a possibly large temporary variable such as `parts` in our environment.

Let's clear out our earlier results.

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data', 'parts'))
```

And now let's apply `split()` and unpack the results in one step.

```{r}
library(wrapr)

to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ] <- split(d, d$group)
```

```{r}
knitr::kable(train_data)

knitr::kable(calibrate_data)

knitr::kable(test_data)
```

The semantics of `[]<-` imply that an object named "`to`" is left in our workspace as a side effect.  However, this object is small and if there is already an object name `to` in the workspace that is not of class `Unpacker` the unpacking is aborted prior to overwriting anything.  The unpacker two modes: `unpack` (a function that needs a dot in pipes) and `to` (an eager function factory that does not require a dot in pipes).  The side-effect can be avoided by using `:=` for assigment.

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data', 'to'))

to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ] := split(d, d$group)

ls()
```




Also the side-effect can be avoided by using alternate non-array update notations.

We will demonstrate a few of these.  First is pipe to array notation.

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data'))
```

```{r}
split(d, d$group) %.>% to[
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
  ]

ls()
```

Note the above is the [`wrapr` dot arrow pipe](https://journal.r-project.org/archive/2018/RJ-2018-042/index.html) (which requires explicit dots to denote pipe targets).  In this case it is dispatching on the class of the right-hand side argument to get the effect.  This is a common feature of the wrapr dot arrow pipe.  We could get a similar effect by using right-assigment "`->`" instead of the pipe.

We can also use a pipe function notation.

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data'))
```

```{r}
split(d, d$group) %.>% to(
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()
```

Notice piping to `to()` is like piping to `to[]`, no dot is needed.

We can not currently use the `magrittr` pipe in the above as in that case the unpacked results are lost in a temporary intermediate environment `magrittr` uses during execution.

A more conventional functional form is given in `unpack()`.  `unpack()` requires a dot in `wrapr` pipelines.


```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data'))
```

```{r}
split(d, d$group) %.>% unpack(
  .,
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()
```

Unpack also support the pipe to array and assign to array notations.  In addition, with `unpack()` we could also use the conventional function notation.

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data'))
```

```{r}
unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate,
  test_data <- test
)

ls()
```

`to()` can not be directly used as a function.  It is *strongly* suggested that the objects returned by `to[]`, `to()`, and `unpack[]` *not ever* be stored in variables, but instead only produced, used, and discarded.  The issue these are objects of class `"UnpackTarget"` and have the upack destination names already bound in. This means if one of these is used in code: a user reading the code can not tell where the side-effects are going without examining the contents of the object.

The assignments in the unpacking block can be any of `<-`, `=`, `:=`, or even `->` (though the last one assigns left to right).

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data'))
```

```{r}
unpack(
  split(d, d$group),
  train_data = train,
  calibrate_data = calibrate,
  test_data = test
)

ls()
```

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data'))
```


```{r}
unpack(
  split(d, d$group),
  train -> train_data,
  calibrate -> calibrate_data,
  test -> test_data
)

ls()
```

It is a caught and signaled error to attempt to unpack an item that is not there.

```{r}
rm(list = c('train_data', 'calibrate_data', 'test_data'))
```

```{r, error=TRUE}
unpack(
  split(d, d$group),
  train_data <- train,
  calibrate_data <- calibrate_misspelled,
  test_data <- test
)
```

```{r}
ls()
```

The unpack attempts to be atomic: preferring to unpack all values or no values.

Also, one does not have to unpack all slots.

```{r}
unpack(
  split(d, d$group),
  train_data <- train,
  test_data <- test
)

ls()
```


We can use a name alone as shorthand for `name <- name` (i.e. unpacking to the same name as in the incoming object).


```{r}
rm(list = c('train_data', 'test_data'))
```

```{r}
split(d, d$group) %.>%
  to[
     train,
     test
     ]

ls()
```

We can also use `bquote` `.()` notation to use variables to specify where data is coming from.


```{r}
rm(list = c('train', 'test'))
```

```{r}
train_source <- 'train'

split(d, d$group) %.>%
  to[
     train_result = .(train_source),
     test
     ]

ls()
```


In all cases the user explicitly documents the intended data sources and data destinations at the place of assignment.  This meas a later reader of the source code can see what the operation does, without having to know values of additional variables.



Related work includes:

<ul>
<li>
The <a href="https://CRAN.R-project.org/package=zeallot"><code>zeallot::%&lt;-%</code></a> arrow already supplies excellent positional or ordered unpacking. But we feel that style may be more appropriate in the Python world where many functions return un-named tuples of results.  Python functions tend to have positional tuple return values <em>because</em> the Python language has had positional tuple unpacking as a core language feature for a very long time (thus positional structures have become "Pythonic").  R has not emphasized positional unpacking, so R functions tend to return named lists or named structures.  For named lists or named structures it may not be safe to rely on value positions.  So I feel it is more "R-like" to use named unpacking.</li>
<li>
<a href="https://github.com/crowding/vadr/blob/master/R/bind.R"><code>vadr::bind</code></a> supplies named unpacking, but appears to use a "<code>SOURCE = DESTINATION</code>" notation. That is the reverse of a "<code>DESTINATION = SOURCE</code>" which is how both R assignments and argument binding are already written.</li>
<li><code>base::attach</code>.  <code>base::attach</code> adds items to the search path with names controlled by the object being attached (instead of by the user).</li>
<li><code>base::with()</code>.  <code>unpack(list(a = 1, b = 2), x &lt;- a, y &lt;- b)
</code> works a lot like <code>
with(list(a = 1, b = 2), { x &lt;&lt;- a; y &lt;&lt;-b })</code>.
</li>
<li>
<a href="https://CRAN.R-project.org/package=tidytidbits"><code>tidytidbits</code></a> supplies positional unpacking with a <code>%=%</code> notation.
</li>
<li><a href="https://winvector.github.io/wrapr/articles/let.html"><code>wrapr::let()</code></a>.  <code>wrapr::let()</code> re-maps names during code execution using a "<code>TARGET = NEWNAME</code>" target replacement scheme, where <code>TARGET</code> acts as if it had the name stored in <code>NEWNAME</code> for the duration of the let-block.
</li>
</ul>
