---
title: "Advanced features"
author: "George G. Vega Yon"
date: "July 19th, 2021"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced features}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(fig.dim=c(9, 9), out.width=600, out.height=600)
```

# Life expectancy in the US

Before we jump into coding, let's start by loading the package and
the data that we will be using; the `lifeexpect` database.
This data set was simulated using official statistics and research on life expectancy
in the US (see `?lifeexpect` for more details). Each row corresponds to the
observed age of an individual at the time of disease.

```{r loading}
library(fmcmc)
data(lifeexpect)
head(lifeexpect) # Taking a look at the data
```

The data is characterized by the following model:

\begin{equation*}
y_i \sim \mbox{N}\left(\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i, \sigma^2\right)
\end{equation*}

So the logposterior can be written as:

\begin{equation*}
\sum_i \log\phi\left(\frac{y_i - (\theta_0 + \theta_{smk}smoke_i + \theta_{fem}female_i)}{\sigma}\right)
\end{equation*}

Where $y_i$ is the age of the i-th individual. Using R, we could write the
logposterior as follows:

```{r logpost}
logpost <- function(p, D) {
  sum(dnorm(
    D$age - (p[1] + p[2] * D$smoke + p[3] * D$female),
    sd = p[4],
    log = TRUE
  ))
}
```


# Operating within the loop

In some cases, the user would like to go beyond what `MCMC()` does. In those cases,
we can directly access the environment in which the main loop of the `MCMC`-call
is being executed, using the function `ith_step()`.

With `ith_step()`, we can access the environment containing the existing elements
while the MCMC loop occurs. Among these, we have: `i` (the step number),
`logpost` (a vector storing the trace of the unnormalized logposterior), `draws`,
(a matrix storing the kernel's proposed states), etc. The complete list of available
objects is available either in the manual or when printing the function:

```{r print-help}
# This will show the available objects
ith_step
```

For example, sometimes accidents happen, and your computing environment could
crash (R, your PC, your server, etc.). It could be a good idea to keep track
of the current state of the chain. A way to do this is printing out the state
of the process every n-th step. 

Using the `lifeexpect` data, let's rewrite the `logpost()` function using 
`ith_step()`. We will print the latest accepted state every 1,000 steps:

```{r logpost2}
logpost2 <- function(p, D) {

  # Getting the number of step
  i <- ith_step("i")

  # After the first iteration, every 1000 steps:
  if (i > 1L && !(i %% 1000)) {

    # The last accepted state. Accepted states are
    # stored in -ans-.
    s <- ith_step()$ans[i - 1L,]

    cat("Step: ", i, " state: c(\n  ", paste(s, collapse = ", "), "\n)\n", sep = "")

  }

  # Just returning the previous value
  logpost(p, D)

}
```

Note that the posterior distribution, i.e., accepted states, is stored in the matrix
`ans` within the MCMC loop. Let's use the Robust Adaptive Metropolis Kernel to fit this model.
Since we need to estimate the standard error, we can set a lower-bound for the variables.
For the starting point, let's use the vector `[70, 0, 0, sd(age)]` (more than a good
guess!):

```{r haario-with-ith-step, echo = TRUE}
# Generating kernel
kern <- kernel_ram(warmup = 1000, lb = c(-100,-100,-100,.001))

# Running MCMC
ans0 <- MCMC(
  initial  = c(70, 0, 0, sd(lifeexpect$age)),
  fun      = logpost2, 
  nsteps   = 10000,    
  kernel   = kern, 
  seed     = 555,
  D        = lifeexpect,
  progress = FALSE
  )

```

The `ith_step()` makes `MCMC` very easy to tailor. Now what happens when we deal
with multiple chains?

# Using ith_step() with multiple chains

Using the function `ith_step()` could be of real help when dealing with multiple
chains in a single run. In such a case, we can use the variable `chain_id` that
can be found with `ith_step()`. From the previous example:

```{r lupost3, eval = TRUE}
logpost3 <- function(p, D) {

  # Getting the number of step
  i <- ith_step("i")

  # After the first iteration, every 1000 steps:
  if (i > 1L && !(i %% 1000)) {

    # The last accepted state. Accepted states are
    # stored in -ans-.
    s <- ith_step()$ans[i - 1L,]
    
    chain <- ith_step("chain_id")

    cat("Step: ", i, " chain: ", chain, " state: c(\n  ",
        paste(s, collapse = ",\n  "), "\n)\n", sep = ""
        )

  }
  
  # Just returning the previous value
  logpost(p, D)

}

# Rerunning using parallel chains
ans1 <- MCMC(
  initial   = ans0,
  fun       = logpost3, # The new version of logpost includes chain
  nsteps    = 1000,    
  kernel    = kern,     # Reusing the kernel
  thin      = 1,       
  nchains   = 2L,       # Two chains, two different prints     
  multicore = FALSE,
  seed      = 555,
  progress  = FALSE,
  D         = lifeexpect
  )
```

Using `ith_state()` increases the computational burden of the process. Yet,
since most of the load lies on the objective function itself, the additional
time can be neglected.

# Saving information as it happens

Another thing the user may need to do is storing data as the MCMC algorithm runs.
In such cases, you can use the `set_userdata()` function, which, as the name suggests,
will store the required data.

For a simple example, suppose we wanted to store the proposed state, we could do it in the
following way:

```{r lupost4}
logpost4 <- function(p, D) {

  # Timestamp
  set_userdata(
    p1 = p[1],
    p2 = p[2],
    p3 = p[3]
  )

  # Just returning the previous value
  logpost(p, D)

}

# Rerunning using parallel chains
ans1 <- MCMC(
  initial   = ans0,
  fun       = logpost4, # The new version of logpost includes chain
  nsteps    = 1000,    
  kernel    = kern,     # Reusing the kernel
  thin      = 10,       # We are adding thinning
  nchains   = 2L,       # Two chains, two different prints     
  multicore = FALSE,
  seed      = 555,
  progress  = FALSE,
  D         = lifeexpect
  )
```

In this case, since `nchains == 2`, `MCMC` will store a list of length two
with the user data. To retrieve the generated data frame, we can call the function
`get_userdata()`. We can also inspect the `MCMC_OUTPUT` as follows:

```{r inspect-and-plot}
print(MCMC_OUTPUT)
str(get_userdata())
```


