---
title: "sdcLog options"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 5
    number_sections: false
vignette: >
  %\VignetteIndexEntry{sdcLog options}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)

user_options <- options()

options(datatable.print.class = FALSE)
options(datatable.print.keys = FALSE)
options(datatable.print.trunc.cols = FALSE)

options(sdc.n_ids = 5L)
options(sdc.n_ids_dominance = 2L)
options(sdc.share_dominance = 0.85)
```

You can set several options to

1. adapt sdcLog to the policies at your research data center
2. make sdcLog a little more convenient.

First, we create a tiny `data.frame` to demonstrate the effects of the options:

```{r example_data}
library(sdcLog)
df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))
df
```

# Options to adapt sdcLog to the policies at your research data center
## sdc.n_ids
By default, sdcLog expects at least five different entities behind each
calculated number. The functions in sdcLog derive this number from
`getOption("sdc.n_ids", default = 5)`. That is, if the option `sdc.n_ids` is not
set, it defaults to `5`. Consider the following example:

```{r example1_sdc.n_ids}
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
```
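
The fallback described above is plain base R behaviour. A minimal sketch, using only `options()` and `getOption()` and no sdcLog functions (the variable names are illustrative):

```r
# Base R only: illustrates the fallback behaviour of getOption().
old <- options(sdc.n_ids = NULL)  # unset the option and remember its previous value

# Option is unset, so the default is returned
threshold_unset <- getOption("sdc.n_ids", default = 5)

# Option is set, so the default is ignored
options(sdc.n_ids = 3)
threshold_set <- getOption("sdc.n_ids", default = 5)

options(old)  # restore the previous state

c(unset = threshold_unset, set = threshold_set)
```

Saving the return value of `options()` and passing it back afterwards leaves the session as it was before the sketch ran.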

The threshold can be adapted to the policy of your research data center by
setting the option `sdc.n_ids` to the desired value. For example, if your policy
allows results to be released if there are at least three different entities
behind each number, set

```{r set_sdc.n_ids}
options(sdc.n_ids = 3)
```

Now, `getOption("sdc.n_ids", default = 5)` evaluates to `3` and warnings are
thrown only if there are fewer than three entities behind a result. Note that
this is reflected in the first line of output from every function of sdcLog:

```{r example2_sdc.n_ids}
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
```
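
To see why the same data triggers a warning at a threshold of `5` but not at `3`, the count of distinct entities can be reproduced by hand. This is a rough sketch in base R; `count_distinct_ids()` is a hypothetical helper, not an sdcLog function, and it assumes that only entities with a non-missing value are counted:

```r
df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))

# Hypothetical helper, not part of sdcLog: number of distinct ids
# with a non-missing value in val_var (assumed counting rule).
count_distinct_ids <- function(data, id_var, val_var) {
  has_value <- !is.na(data[[val_var]])
  length(unique(data[[id_var]][has_value]))
}

n_distinct <- count_distinct_ids(df, id_var = "id", val_var = "v1")

too_few_at_5 <- n_distinct < 5  # three entities, threshold 5: too few
too_few_at_3 <- n_distinct < 3  # three entities, threshold 3: enough
```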

## sdc.n_ids_dominance

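`sdc.n_ids_dominance` sets how many of the largest entities are considered
together in the dominance check. The default value is `2`: combined with the
default `sdc.share_dominance` of `0.85`, the two largest entities together must
account for a share of less than `0.85`. In our example, this leads to a
warning: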

```{r example1_sdc.n_ids_dominance}
sdc_descriptives(data = df, id_var = "id", val_var = "v2")
```

If your policy only requires that the largest entity alone accounts for a share
of less than `0.85`, set

```{r set_sdc.n_ids_dominance}
options(sdc.n_ids_dominance = 1)
```

Then, there is no problem in the example:

```{r example2_sdc.n_ids_dominance}
sdc_descriptives(data = df, id_var = "id", val_var = "v2")
```
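
The arithmetic behind the two outcomes can be checked by hand. The sketch below uses base R only; `top_n_share()` is a hypothetical helper and assumes one value per entity, as in `df`, whereas sdcLog's internal calculation may differ in detail:

```r
v2 <- c(1L, 2L, 4L)

# Hypothetical helper, not part of sdcLog: share of the n largest
# values in the total (assumes one value per entity).
top_n_share <- function(x, n) {
  x <- sort(x, decreasing = TRUE)
  sum(head(x, n)) / sum(x)
}

share_top2 <- top_n_share(v2, n = 2)  # (4 + 2) / 7
share_top1 <- top_n_share(v2, n = 1)  # 4 / 7

share_top2 > 0.85  # the two largest entities exceed the share
share_top1 > 0.85  # the largest entity alone does not
```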

## sdc.share_dominance

The last option of sdcLog which affects internal calculations is
`sdc.share_dominance`. To demonstrate, we first reset `sdc.n_ids_dominance` to
its default value of `2`.

```{r reset_options1, include=FALSE}
options(sdc.n_ids_dominance = 2L)
```

Let's consider a policy which requires the two largest entities together to
account for a share of less than `0.8`. To reflect this, set

```{r set_sdc.share_dominance}
options(sdc.share_dominance = 0.8)
```

Now, the initial example from the section on `sdc.n_ids` throws a warning:

```{r example1_sdc.share_dominance}
sdc_descriptives(data = df, id_var = "id", val_var = "v1")
```
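
The reason is that the two largest values of `v1` account for a share that lies between the two thresholds. A quick check in base R, again assuming one value per entity (the variable names are illustrative):

```r
v1 <- 1L:3L

largest_two <- sort(v1, decreasing = TRUE)[1:2]
share_v1 <- sum(largest_two) / sum(v1)  # (3 + 2) / 6

exceeds_085 <- share_v1 > 0.85  # not above the default share
exceeds_080 <- share_v1 > 0.80  # above the stricter share
```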

## sdc.info_level

This option differs from the previous ones in that it does not affect actual
calculations. Instead, it determines the verbosity of the output of sdcLog
functions. Possible values are `0`, `1` (default), and `2`. Before
demonstrating the effects of `sdc.info_level`, we reset `sdc.share_dominance` to
its default value of `0.85`.

```{r reset_options2, include=FALSE}
options(sdc.share_dominance = 0.85)
```

The example below shows the different levels of information printed to the
console based on the different levels of `sdc.info_level`:

```{r example_sdc.info_level}
for (i in 0:2) {
  options(sdc.info_level = i)
  cat("\nsdc.info_level: ", getOption("sdc.info_level"), "\n")
  print(sdc_descriptives(data = df, id_var = "id", val_var = "v1"))
}
```

At level `0`, only options and settings are printed. Level `1` also prints a
short message about the overall outcome of the checks. Level `2` additionally
prints the results of the separate checks on distinct entities and dominance.


# Option to make sdcLog more convenient

Usually, the ID variable does not change during the course of your analysis.
Therefore, it is convenient to set

```{r sdc.id_var}
options(sdc.id_var = "id")
```

Then you do not have to specify `id_var` every time you use one of the `sdc_*`
functions:

```{r example_sdc.id_var, echo=-1}
options(user_options)
sdc_descriptives(data = df, val_var = "v1")
```
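
One common way to get this kind of behaviour in R is to read the option in the default value of a function argument. The sketch below illustrates that general pattern; `describe_ids()` is a hypothetical function, and whether sdcLog implements its defaults exactly like this is an assumption:

```r
df <- data.frame(id = LETTERS[1:3], v1 = 1L:3L, v2 = c(1L, 2L, 4L))

# Hypothetical function, not part of sdcLog: id_var falls back to the
# option sdc.id_var when the caller does not supply it.
describe_ids <- function(data, id_var = getOption("sdc.id_var")) {
  if (is.null(id_var)) {
    stop("Specify `id_var` or set options(sdc.id_var = ...).")
  }
  length(unique(data[[id_var]]))
}

old <- options(sdc.id_var = "id")  # set the option, remember the old value

n_from_option <- describe_ids(df)                # id_var taken from the option
n_explicit    <- describe_ids(df, id_var = "id") # id_var given explicitly

options(old)  # restore the previous state
```

An explicitly supplied argument still takes precedence over the option, so individual calls can deviate from the session-wide setting.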

# General remarks
Please note that these options affect all functions of sdcLog, not just
`sdc_descriptives()`.
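
Because the options apply session-wide, it can be handy to set them in one place and restore the previous state afterwards. A minimal sketch in base R; the values in `policy` are examples only, not recommendations:

```r
# Example values only; use the values required by your research data center.
policy <- list(
  sdc.n_ids = 3L,
  sdc.n_ids_dominance = 2L,
  sdc.share_dominance = 0.8
)

old <- options(policy)              # set all options at once, keep old values
active <- options()[names(policy)]  # the values now in effect

options(old)                        # restore the previous state
```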
