---
title: "Introduction to `ttt`: Formatted Tables the Easy Way"
author: "Benjamin Rich"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    css: vignette.css
    toc: true
vignette: >
  %\VignetteIndexEntry{Introduction to `ttt`: Formatted Tables the Easy Way}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteDepends{table1}
  %\VignetteEncoding{UTF-8}
---

```{r echo=FALSE, message=FALSE, warning=FALSE, results='hide'}
set.seed(123)
library(ttt, quietly=TRUE)
```

## Introduction

`ttt` stands for "The Table Tool" (or, if you prefer "Tables! Tables!
Tables!"). It allows you to creates formatted HTML tables of in a flexible and
convenient way.

Creating nice formatted tables has traditionally been a pain points with R.
Over the years, however, things have gotten a lot better, with the emergence of
packages that can produce nice looking tables in HTML (or LaTex or even
Microsoft<sup>&#174;</sup> Word), or which there are now a large number.^[
A non-exhaustive list includes:
[flextable](https://CRAN.R-project.org/package=flextable),
[kableExtra](https://CRAN.R-project.org/package=kableExtra),
[huxtable](https://CRAN.R-project.org/package=huxtable),
[htmlTable](https://CRAN.R-project.org/package=htmlTable),
[tableHTML](https://CRAN.R-project.org/package=tableHTML),
[ztable](https://CRAN.R-project.org/package=ztable), 
[formattable](https://CRAN.R-project.org/package=formattable),
[pixiedust](https://CRAN.R-project.org/package=pixiedust),
[basictabler](https://CRAN.R-project.org/package=basictabler),
[mmtable2](https://github.com/ianmoran11/mmtable2),
[gt](https://github.com/rstudio/gt),
[DT](https://CRAN.R-project.org/package=DT),
[tables](https://CRAN.R-project.org/package=tables),
[xtable](https://CRAN.R-project.org/package=xtable).
]
But most of those packages treat tables just like `data.frames`, i.e. a grid of
rows and columns, with very little structure. While some packages do have
constructs that allow you to group columns or rows with additional headers and
labels, that structure is basically superficial, tacked onto the `data.frame`
after the fact. And while it may achieve the desired result it tends to require
more code and by consequence, effort.  Another thing that some of these
packages do is they give you the flexibility to control the visual appearance
or styling of the tables (fonts, colors, grid lines, spacing, etc.) directly
from the R code. This is nice, but typically achieving that level of
flexibility requires a pretty complex interface of functions and arguments
dedicated to styling, and achieving the desired result can take a considerable
amount of code and hence again, effort.

This package takes a different approach. It focuses on the table structure and
content, leaving the formatting duties to CSS, a dedicated language that
was designed specifically for this purpose (the downside is that it only works
for HTML, but we accept this inconvenience). Also, the package follows the
philosophy of not trying to solve all problems, but solving some problems well.
Design decisions have been made to make some things easy, at the expense of
limiting the package's generality (while still keeping in it some sense quite
general, as this vignette will demonstrate).  That it is not possible with
this package to produce all conceivable tables is a given; that was never the
intention.

## Basic Examples

Before we start, let's load a couple of packages that we will be using:

```{r}
library(table1, quietly=TRUE)
library(magrittr, quietly=TRUE)
```

It is worth taking a minute to comment on these packages. The first, `table1`,
is like a "sister" package of `ttt` (they are both written by the same author).
While not a strict requirement, `table1` contains some utility functions that
can also be quite useful in conjunction  with `ttt`, so most of the time it is
a good idea to load `table1` along with `ttt`.  The `magrittr` package
contains the well-known "pipe" operator that we will make use of at some point
in this vignette, so we load that, too.

With that out of the way, let's start looking at what `ttt` can do, and how to
use it. The types of tables that can be produced are many and varied. At its
simplest, the ``ttt()`` function can turn a `data.frame` into an HTML table:

```{r}
ttt(mtcars)
```

But this is far from the typical use case. More typically, there is some
*structure* to the data, in the sense that some columns contain *values*, while
others contain *keys* that are used to group values according to some common
characteristic. We will refer to these latter columns as *facets*, borrowing a
term from the `ggplot2` package.  We will assume that the data are in a "tidy"
format, by which we mean that all the *values* have been placed in a single
column (if this is not the case, there are many functions that will allow you
to "reshape" the data accordingly).

For the second example, continuing with the `mtcars` data, we would like to
tabulate the average `mpg` (miles per gallon) by number of gears (rows) and
cylinders (columns). Here is the code:

```{r}
ttt(mpg ~ gear | cyl, data=mtcars, lab="Number of Cylinders", render=mean)
```

Let's break this down. The first argument is a `formula` with 3 parts:
`<values> ~ <row facets> | <column facets>`. The first part, to the left of the
symbol `~`, is the name of the column that contains the values (recall that in
the "tidy" format there is only one such column). The second part, between the
symbols `~` and `|` contains the *row facets*, one or more variables that
define how the values should be split into rows. The third part, to the right
of the symbol `|` is the *column facets*, one or more variables that define how
the values should be split into columns.

Following the `formula` comes the `data` argument, which is the `data.frame`
that contains the data that the `formula` refers to.  The next argument `lab`
is an optional label placed over all the columns. The last argument, `render`,
is a `function`. This function is called for each grouping of data defined by
unique combinations of the row and column facets, and produces the value that
appears in the corresponding cell of the table. Hopefully this is fairly
intuitive.

Now, the above table is nice, but still not quite what we want. Here are
the issues that need to be addressed:

  1. The label "gear" should be changed to "Number of Gears". 

  2. One table cell contains the cryptic value "NaN" because there aren't any
     cars with 8 cylinders and 4 gears in our dataset; we would like this cell
     to remain empty instead.

  3. The number of decimal digits is different in each cell; we would
     like it to be the same throughout the table (1 decimal digit).

Let us now address these issues.

```{r}
label(mtcars$gear) <- "Number of<br/>Gears"

rndr <- function(x, ...) {
    if (length(x) == 0) return("")
    round_pad(mean(x), 1)
}

ttt(mpg ~ gear | cyl, data=mtcars, lab="Number of Cylinders", render=rndr)
```

The way we addressed the first issue was to add a label to the variable `gear`
using the `label()` function (one of the useful utility functions from the
`table1` package). The two other issues were fixed by defining a function
`rndr()` to do the rendering.

The `ttt()` function allows the order of the `formula` data `data` arguments to
be switched, so that an alternative syntax using the `magrittr` "pipe" operator
may be used:

```{r}
mtcars %>% ttt(mpg ~ gear | cyl, lab="Number of Cylinders", render=rndr)
```

## Facets

The facets allow you to "slide-&-dice" the data however you want. The column
facet is optional; it can be omitted:

```{r}
ttt(mpg ~ gear, data=mtcars, render=rndr)
```

The row facet is required by the formula syntax, but the "magic" value `1` may
be used to indicate no splitting by rows:

```{r}
ttt(mpg ~ 1 | cyl, data=mtcars, lab="Number of Cylinders", render=rndr)
```

Both row and column facets may consist of more than one variable, joined
together by the symbol `+`.  Here is an example with 2 row facets:

```{r}
label(mtcars$cyl) <- "Number of<br/>Cylinders"
ttt(mpg ~ gear + cyl, data=mtcars, render=rndr)
```

And, similarly for 2 column facets:

```{r}
ttt(mpg ~ 1 | gear + cyl, data=mtcars, lab="Number of Cylinders/Gears", render=rndr)
```

The order of the variables is obviously important, as they define a nesting
structure. If we think of the `|` that separates the row and column facets as
the "values" (i.e. the central part of the table), then the order makes sense:
variables closer to the center (`|`) are grouped within variables that are
farther away.

Just to demonstrate, here is a synthetic example of a large table where both row
and column facets are nested 3 levels deep:

```{r}
bigtable <- expand.grid(
    R1=LETTERS[1:3],
    R2=LETTERS[4:6],
    R3=LETTERS[7:9],
    C1=LETTERS[10:12],
    C2=LETTERS[13:15],
    C3=LETTERS[16:18])

bigtable$x <- 1:nrow(bigtable)
ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable)
```

## Render functions

The `render` function gives a lot of flexibility. For example, instead of the
mean `mpg`, we can list the cars according to their gear/cylinder combination:

```{r}
ttt(rownames(mtcars) ~ gear | cyl, data=mtcars, lab="Number of Cylinders",
  render=paste, collapse="<br/>")
```

Note that additional arguments can be  passed to the `render` function through `...`
(as in this case, `collapse`).

Furthermore, a `render` function can return more than one value in a named
vector.  In this case, the argument `expand.along` also comes into play. It
determines how the multiple values are laid out, either vertically (along
rows), or the horizontally (along columns).

For example, here we define a function that computes both the mean and
standard deviation (SD), each to 3 significant digits, and apply it to the response
variable in the `OrchardSprays` dataset (i.e., `decrease`) according to treatment:

```{r}
rndr.meansd <- function(x) signif_pad(c(Mean=mean(x), SD=sd(x)), digits=3)

ttt(decrease ~ treatment, data=OrchardSprays, render=rndr.meansd)
```

The default is `expand.along="rows"`, which produces the result above.  As we
can see, the table contains an additional column with the values "Mean" and
"SD", and each value is displayed in its own row. By default, the extra column
is labeled "Statistic", but to change the label to "Blah" we can specify a
named vector as follows:

```{r}
ttt(decrease ~ treatment, data=OrchardSprays, render=rndr.meansd, expand.along=c(Blah="rows"))
```

The other option is `expand.along="columns"`, which produces this result:

```{r}
ttt(decrease ~ treatment, data=OrchardSprays, render=rndr.meansd, expand.along="columns")
```

Now "Mean" and "SD" each have their own column.

## Captions and footnotes

A caption and one or more footnotes can be added to the table by specifying string values
to the `caption` and `footnote` arguments, respectively:

```{r}
ttt(decrease ~ 1 | treatment, data=OrchardSprays, render=rndr.meansd, lab="Treatment",
    caption="Mean and SD of Decrease by Treatment",
    footnote=c("Data: OrchardSprays", "Comment: <code>ttt</code> is cool!"))
```

There's not much more to say about this.

## Styling

The appearance of the tables produced by `ttt` can be changed in 2 ways: using
themes, or custom styling. Themes are easier, but don't give much flexibility.
For fine-level control, custom styling must be used.

### Themes

The `ttt` package comes with 2 themes at this time: the `default` theme that
has been used throughout this vignette so far, and the `booktabs` theme. (More
themes may be added later.) Selecting the theme can be done using the
`ttt.theme` global option:

```{r, eval=FALSE}
options(ttt.theme="booktabs")  # Select the "booktabs" theme
```

If we select the `booktabs` theme, our large table looks like this:

```{r, eval=FALSE}
ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable)
```

```{r, echo=FALSE}
css <- readLines(system.file(package="ttt", "ttt_booktabs_1.0/ttt_booktabs.css"))
css <- gsub(".Rttt ", ".Rttt-booktabs-demo ", css, fixed=TRUE)
ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable, topclass="Rttt-booktabs-demo", css=css)
```

Note that a theme can only apply to a whole document; it is not possible to
selectively style different tables within the same document differently using
different themes, as we appear to have done here (but it can be done with
custom styling, which is how it was done).

```{r, eval=FALSE}
options(ttt.theme="default")  # Change back to the "default" theme
```

### Custom styling

As mentioned in the introduction, changing the table's appearance is
accomplished using CSS. In order to make this possible, `ttt` places "hooks" in
the table in the form of `class` attributes on various HTML elements.

The first thing to know is that the whole table is enclosed in a
`<div class="Rttt">` element. The allows specific formatting to be applied to tables
output by `ttt` without interfering with other tables in the same document.

The next thing is that all row labels have the class `Rttt-rl`, and all column
labels have the class `Rttt-cl`. Furthermore, there are different classes for
each *level* or nesting: `Rttt-rl-lvl1` for the first (i.e. innermost) level or
row labels, `Rttt-rl-lvl2` for the second level, and so on, and similarly for
the column labels with `cl` instead of `rl`.

Finally, it is possible to add a `class` attribute or `id` attribute to the
whole table, so that it may be targetted with specific CSS selectors. We can
also pass CSS code directly to the `ttt()` function to be included with the
table.

For example, can can specify that our table has the ID `bigtable`, and then
give it a particular (and particularly weird) style:

```{r}
css <- '
#bigtable {
  font-family: "Lucida Console", Monaco, monospace;
}
#bigtable td {
  background-color: #eee;
}
#bigtable th {
  color: blue;
  background-color: lightblue;
}
#bigtable th, #bigtable td {
  border: 2px dashed orange;
}
#bigtable .Rttt-rl {
  background-color: #fff;
  font-style: italic;
  font-weight: bold;
}
#bigtable .Rttt-rl-lvl1 {
  font-size: 12pt;
  color: pink;
  background-color: yellow;
}
#bigtable .Rttt-rl-lvl2 {
  font-size: 14pt;
  color: green;
}
#bigtable .Rttt-rl-lvl3 {
  font-size: 18pt;
  color: red;
}
'

ttt(x ~ R3 + R2 + R1 | C1 + C2 + C3, data=bigtable, id="bigtable", css=css)
```

### Example: conditional formatting

A `render` function can actually add a `html.class` attribute to its return
value. The value of this attribute will be assigned to the resulting HTML
element's `class` attribute, allowing you to target that element with specific
formatting.

For example, suppose we have a `data.frame` that contains some numeric values,
and we want to put them in a table:

```{r}
dat <- expand.grid(row=LETTERS[1:5], column=LETTERS[1:5])
dat$value <- rnorm(nrow(dat))

ttt(value ~ row | column, data=dat, render=round_pad, digits=2)
```

Furthermore, suppose we want the cells that contain negative values to be red,
and those that contain positive values to be green.  (Note: this can actually
be easily accomplished using JavaScript, but that's not the point of this
example).  Let's define a `render` function for this:

```{r}
rndr <- function(x, ...) {
    y <- round_pad(x, 2)
    attr(y, "html.class") <- ifelse(x < 0, "neg", "pos")
    y
}
```

(since there are no zeros in this example, the code here cheats and treats zero
as positive; if you don't like this, think of it as non-negative).

The `render` function sets the desired class on the elements. Now, we can add
some CSS code to obtain the desired colors:

```{css, echo=TRUE}
.neg {
  color: #990000;
  background-color: #ff000030;
}
.pos {
  color: #007700;
  background-color: #00ff0030;
}
```

Finally, we generate the table:

```{r}
ttt(value ~ row | column, data=dat, render=rndr)
```

