---
title: "RStanTVA Parameter Recovery Study"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{RStanTVA Parameter Recovery Study}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE
)

```


In this vignette, we report the results of a parameter recovery study for the `tva_recovery` data set. We need the packages `tibble`, `tidyr`, and `dplyr` for data wrangling, `ggplot2` for plotting, `RStanTVA` for the actual analyses, and `brms` for specifying priors.

```{r setup}
library(tibble)
library(dplyr)
library(tidyr)
library(ggplot2)
library(RStanTVA)
library(brms)
```

We want to make use of the parallelization built into Stan by activating the corresponding option in RStan:

```{r}
rstan_options(threads_per_chain = parallel::detectCores())
```

# Generative model

## Level 1: Trial

Each trial is simulated assuming partial report in TVA with a global processing capacity $C$, top-down selectivity $\alpha$, hemispheric bias $r$, probit-distributed memory capacity $K$ (parameters $\mu_K$ and $\sigma_K$), and Gaussian sensory thresholds $t_0$ (parameters $\mu_0$ and $\sigma_0$).

For ease of interpretation, we will later only look at the expected $K$ values rather than the distributional parameters $\mu_K$ and $\sigma_K$. The expected value (mean) can be derived from $(\mu_K,\sigma_K)$ samples like this:

```{r meanprobitk}
meanprobitK <- Vectorize(function(mK, sigmaK, nS) {
  # Cumulative normal probabilities at the thresholds 1, ..., nS
  p <- pnorm(1:nS, mK, sigmaK)
  # Lower/upper cumulative bounds enclosing the mass of K = 0, ..., nS
  lps <- c(0, p)
  ups <- c(p, 1)
  # Expected value: possible K values weighted by their probability mass
  sum(0:nS * (ups - lps))
}, c("mK", "sigmaK"))
```
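
As a quick sanity check (with arbitrary illustration values, not taken from the data), the expected $K$ for $\mu_K = 3.5$ and $\sigma_K = 0.5$ with 6 display locations evaluates to approximately 3:

```r
meanprobitK(3.5, 0.5, 6)
#> [1] 3
```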

The same is often done for $t_0$, where the expected value of a Gaussian $t_0$ is simply $\mu_0$.

*Note:* For a demonstration of simulated TVA trials, see the `ShinyTVA` app, which can be run as follows:

```r
shiny::runGitHub("mmrabe/ShinyTVA")
```

## Level 2: Experimental condition

Processing speed $C$ differs between the "low" and "high" conditions so that $C_\textrm{low}=`r exp(tva_recovery_true_params$b["C_Intercept"])`$ and $C_\textrm{high}=`r exp(tva_recovery_true_params$b["C_Intercept"]+tva_recovery_true_params$b["C_conditionhigh"])`$. There are no true differences between the conditions for any of the other model parameters. 

## Level 3: Subject-level effects (hyperparameters)

Each of the `r n_distinct(tva_recovery$subject)` subjects has an individual parameter configuration with condition-level effects; all of these parameters were randomly sampled from a multivariate normal distribution so that:

$$
\begin{pmatrix}
a_{\log C}\\
b_{\log C}\\
a_{\log\alpha}\\
b_{\log\alpha}\\
a_{\mu_K}\\
b_{\mu_K}\\
a_{\log\sigma_K}\\
b_{\log\sigma_K}\\
a_{\log\sigma_0}\\
b_{\log\sigma_0}\\
a_{\mu_0}\\
b_{\mu_0}\\
a_{\textrm{logit}(r)}\\
b_{\textrm{logit}(r)}
\end{pmatrix}\sim\mathcal{N}_{`r length(tva_recovery_true_params$b)`}\left(\begin{pmatrix}
`r paste0(tva_recovery_true_params$b, collapse = "\\\\")`
\end{pmatrix},\begin{pmatrix}
`r (tva_recovery_true_params$r_subject * tcrossprod(tva_recovery_true_params$s_subject)) %>% apply(1, paste, collapse = "&") %>% paste(collapse="\\\\")`
\end{pmatrix}\right)
$$
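
For illustration, here is a minimal sketch of how such subject-level coefficient vectors could be drawn, assuming the `MASS` package is available (this is not the actual generator, which is the script referenced in the note below):

```r
# Covariance matrix from the true correlations and SDs shown above
Sigma <- tva_recovery_true_params$r_subject *
  tcrossprod(tva_recovery_true_params$s_subject)

# One coefficient vector (intercepts and condition effects) per subject
subject_coefs <- MASS::mvrnorm(
  n = n_distinct(tva_recovery$subject),
  mu = tva_recovery_true_params$b,
  Sigma = Sigma
)
```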

## Simulations


The data frame `tva_recovery` contains simulated CombiTVA trials for `r n_distinct(tva_recovery$subject)` subjects (`r nrow(tva_recovery)/n_distinct(tva_recovery$subject)` trials per subject, i.e. `r nrow(tva_recovery)` trials in total). Half of each subject's trials were performed in the "low" condition and the other half in the "high" condition:

```{r load-data}
data("tva_recovery")
head(tva_recovery)
```
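
We can verify the balanced design by tabulating trials per subject and condition (a quick check, not evaluated here):

```r
tva_recovery %>%
  count(subject, condition)
```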

The package also contains the true values used for the simulation (including the subject-level parameters):

```{r load-true-vals}
data("tva_recovery_true_params")
str(tva_recovery_true_params)
```

These can be conveniently converted to a data frame for later comparison with the fitted values:

```{r narrow-true-vals}
true_params <- tva_recovery_true_params$coef_subject

true_params_narrow <- true_params %>%
  mutate(
    `italic(w)[lambda]` = w[,2],
    `italic(K)` = meanprobitK(mK, sK, 6)
  ) %>%
  select(-w, -mK, -sK) %>%
  rename(`italic(t)[0]` = mu0, `sigma[0]` = sigma0, `italic(C)` = C) %>%
  pivot_longer(-c(subject, condition), names_to = "param", values_to = "true_value")
head(true_params_narrow)

true_params_narrow2 <- bind_rows(
  true_params_narrow %>% filter(condition == "low" & param != "italic(C)") %>% select(-condition),
  true_params_narrow %>% filter(param == "italic(C)") %>% transmute(subject, param = sprintf("%s[%s]", param, condition), true_value)
)
head(true_params_narrow2)
```


*Note:* The R script that generates `tva_recovery` is located in the package source at `data-raw/tva_recovery.R`.



# Model likelihood

For a single trial, given the effective exposure duration $\tau=t-t_0$ of the set of displayed items $S$, and predicted individual processing rates $\mathbf v$ for those items, the probability of a memory report $M$ is given as:

$$
P_{\mathrm{M}}\left(M\mid\tau,S,K,\mathbf{v}\right)
=\begin{cases}
1 & \textrm{if }M=\emptyset\textrm{ and }K=0\\
1 & \textrm{if }M=\emptyset\textrm{ and }\tau=0\\
\bar{F}\left(\tau;\sum_{z\in S}v_{z}\right) & \textrm{if }M=\emptyset\textrm{ and }K>0\textrm{ and }\tau>0\\
\prod_{y\in M}F\left(\tau;v_{y}\right)\prod_{z\in S\setminus M}\bar{F}\left(\tau;v_{z}\right) & \textrm{if }0<\left|M\right|<K\leq\left|S\right|\textrm{ and }\tau>0\\
\prod_{y\in M}F\left(\tau;v_{y}\right)\prod_{z\in S\setminus M}\bar{F}\left(\tau;v_{z}\right) & \textrm{if }0<\left|M\right|=\left|S\right|\leq K\textrm{ and }\tau>0\\
\sum_{x\in M}\int_{0}^{\tau}f\left(t;v_{x}\right)\prod_{y\in M\setminus\left\{ x\right\} }F\left(t;v_{y}\right)\prod_{z\in S\setminus M}\bar{F}\left(t;v_{z}\right)\mathrm{d}t & \textrm{if }0<\left|M\right|=K<\left|S\right|\textrm{ and }\tau>0\\
0 & \textrm{otherwise}
\end{cases}
$$
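
In TVA, encoding times are typically assumed to be exponentially distributed, i.e. $F(t;v)=1-e^{-vt}$. Under that assumption, the case $0<\left|M\right|<K\leq\left|S\right|$ translates directly into R (a sketch for illustration, not the package's internal implementation):

```r
# P_M for the case 0 < |M| < K <= |S| and tau > 0, assuming
# exponential encoding times, F(t; v) = 1 - exp(-v * t)
p_m_open <- function(tau, v_reported, v_unreported) {
  prod(1 - exp(-v_reported * tau)) * # all reported items encoded before tau
    prod(exp(-v_unreported * tau))   # no other displayed item encoded
}
```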

For partial reports (i.e., when the display set contains both targets $S_{{\rm T}}$ and distractors $S_{{\rm D}}$), we need to iterate over all subsets of $S_{{\rm D}}$ with at most $K-\left|R_{{\rm T}}\right|$ elements, i.e. the power set $\mathcal{P}_{\leq K-\left|R_{{\rm T}}\right|}\left(S_{{\rm D}}\right)$, that could have made it into VSTM before $\tau$, in addition to the actually reported target items $R_\mathrm{T}$:

$$
P_{{\rm PR}}\left(R_{{\rm T}}\mid\tau,S_{{\rm T}},S_{{\rm D}},K,\mathbf{v}\right)
=\begin{cases}
\sum_{R_{{\rm D}}\in\mathcal{P}_{\leq K-\left|R_{{\rm T}}\right|}\left(S_{{\rm D}}\right)}P_{{\rm M}}\left(R_{{\rm T}}\cup R_{{\rm D}}\mid\tau,S_{{\rm T}}\cup S_{{\rm D}},K,\mathbf{v}\right) & \textrm{if }S_{{\rm D}}\neq\emptyset\textrm{ and }\tau\geq0\\
P_{{\rm M}}\left(R_{{\rm T}}\mid\tau,S_{{\rm T}},K,\mathbf{v}\right) & \textrm{if }S_{{\rm D}}=\emptyset\textrm{ and }\tau\geq0\\
0 & \textrm{otherwise}
\end{cases}
$$

Note that $P_{{\rm PR}}$ simplifies to $P_{\mathrm{M}}$ if there were no distractors ($S_{{\rm D}}=\emptyset$), i.e. when the trial was effectively a whole-report trial.
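
The subset enumeration itself is straightforward; here is a sketch using `utils::combn` (for illustration only, not the package's internal implementation):

```r
# All subsets of x with at most kmax elements, i.e. P_{<= kmax}(x)
subsets_up_to <- function(x, kmax) {
  unlist(
    lapply(0:min(kmax, length(x)), function(k) combn(x, k, simplify = FALSE)),
    recursive = FALSE
  )
}
```

$P_{{\rm PR}}$ then sums $P_{{\rm M}}$ over $R_{{\rm T}}\cup R_{{\rm D}}$ for each enumerated distractor subset $R_{{\rm D}}$.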

# Parameter recovery

Being based on RStan, RStanTVA can be used in different ways, i.e. with maximum-likelihood estimation or with Bayesian inference at different levels of hierarchical complexity (single condition, single participant/fixed effects, fully hierarchical/mixed effects). Hence, we can also recover model parameters in different ways.

## Maximum-likelihood estimation (MLE)

Using MLE, we can define four different inference models:

### RStanTVA-ML: Single participant, single condition

The most straightforward model uses no regularization through priors (`priors = FALSE`) and fits all model parameters for a single participant in a single condition:

```r
m <- stantva_model(
  locations = 6,
  task = "pr",
  regions = list(left = 1:3, right = 4:6),
  w_mode = "regions",
  t0_mode = "gaussian",
  K_mode = "probit",
  save_log_lik = FALSE,
  priors = FALSE
)
```
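
Such a model could then be fitted, for example, to the "low"-condition trials of a single subject (a sketch, not evaluated here):

```r
d1 <- tva_recovery %>% filter(subject == 1, condition == "low")
p1 <- optimizing(m, d1)
str(p1$par) # named vector of ML point estimates (e.g. alpha, mu0, sigma0)
```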

### RStanTVA-MLN: Single participant, condition as fixed effect

By defining a regression model for $\log C$, which includes an intercept and a treatment effect of "high" vs. "low" (`log(C) ~ 1 + condition`), we can model both experimental conditions in a single model:

```r
m_nested <- stantva_model(
  formula = list(log(C) ~ 1 + condition),
  locations = 6,
  task = "pr",
  regions = list(left = 1:3, right = 4:6),
  w_mode = "regions",
  t0_mode = "gaussian",
  K_mode = "probit",
  save_log_lik = FALSE,
  priors = FALSE
)
```

### RStanTVA-MLR: Single participant, single condition, with regularization (priors)

Omitting the `priors = FALSE` argument falls back to the default `priors = NULL`, which uses vaguely informative default priors for all parameters. The model is otherwise identical to the baseline RStanTVA-ML model above:

```r
m_reg <- stantva_model(
  locations = 6,
  task = "pr",
  regions = list(left = 1:3, right = 4:6),
  w_mode = "regions",
  t0_mode = "gaussian",
  K_mode = "probit",
  save_log_lik = FALSE
)
```

### RStanTVA-MLRN: Single participant, condition as fixed effect, with regularization (priors)

When using regularizing priors, if a parameter such as $C$ is described by a regression formula, we need to specify priors for its intercept and slopes:

```r
priors <-
  prior(normal(0,.5),dpar=C)+
  prior(normal(4.5,.6),dpar=C,coef=Intercept)

m_nested_reg <- stantva_model(
  formula = list(log(C) ~ 1 + condition),
  locations = 6,
  task = "pr",
  regions = list(left = 1:3, right = 4:6),
  w_mode = "regions",
  t0_mode = "gaussian",
  K_mode = "probit",
  save_log_lik = FALSE,
  priors = priors
)
```

For brevity, we only report the results of the RStanTVA-MLRN fits here, since those have been found to be the most reliable (see manuscript). The model can be fitted to the individual data sets (one per participant) as follows:

```{r fit_mle_nested_reg, include=FALSE}
fit_mle_nested_reg <- readRDS("tva_recovery_cache/fit_mle_nested_reg.rds")
```

```r
fit_mle_nested_reg <- lapply(1:50, function(i) {
  d <- tva_recovery %>% filter(subject == i)
  repeat {
    p <- optimizing(m_nested_reg, d)
    if(p$return_code == 0L) break
  }
  tibble(
    subject = i,
    param = c("italic(C)[low]", "italic(C)[high]", "alpha",
              "italic(w)[lambda]", "italic(t)[0]", "sigma[0]", "italic(K)"),
    fitted_value = c(
      exp(p$par["C_Intercept"]),
      exp(p$par["C_Intercept"] + p$par["C_conditionhigh"]),
      p$par[c("alpha", "r[2]", "mu0", "sigma0")],
      meanprobitK(p$par["mK"], p$par["sK"], 6)
    ),
    converged = p$return_code == 0L
  )
}) %>% bind_rows() %>% left_join(true_params_narrow2)
```

The critical function above is `optimizing(m_nested_reg, d)`, which uses MLE to fit `m_nested_reg` to the subject-level subset `d`.


## RStanTVA-NB: Bayesian non-hierarchical inference

We can, in principle, fit the exact same models as above using Bayesian inference instead of MLE, simply by replacing the `optimizing(...)` call with a `sampling(...)` call. The following code thus fits the RStanTVA-MLRN model (`m_nested_reg`) to the subject-level subsets `d` using HMC sampling in Stan:

```{r fit_bayes_fixed, include=FALSE}
fit_bayesian_nested <- readRDS("tva_recovery_cache/fit_bayesian_nested.rds")
```


```r
fit_bayesian_nested <- lapply(1:50, function(i) {
  d <- tva_recovery %>% filter(subject == i)
  sf <- sampling(m_nested_reg, d)
  p1 <- predict(sf, data.frame(subject = i, condition = "low"), c("C","alpha", "r", "mu0","sigma0","mK","sK"))
  p2 <- predict(sf, data.frame(subject = i, condition = "high"), "C")
  tibble(
    `italic(C)[low]` = t(p1$C),
    `italic(C)[high]` = t(p2$C),
    `alpha` = t(p1$alpha),
    `italic(w)[lambda]` = t(p1$r[,2,1]),
    `italic(t)[0]` = t(p1$mu0),
    `sigma[0]` = t(p1$sigma0),
    `italic(K)` = t(meanprobitK(p1$mK,p1$sK,6))
  ) %>%
    pivot_longer(everything(), names_to = "param", values_to = "samples") %>%
    rowwise(param) %>%
    reframe(
      subject = i,
      posterior_sd = sd(samples),
      fitted_value = median(samples),
      cri = t(quantile(samples, c(.025, .975))),
      converged = Rhat(matrix(samples, ncol = sf@sim$chains)) < 1.1
    )
}) %>% 
  bind_rows() %>%
  left_join(true_params_narrow2)
```

Note that we must use the method `predict(fit, data, parameters)` to derive the TVA parameter values from the fitted model. This returns a posterior distribution for each row (subject) in `data` and each specified parameter. We then calculate medians as point estimates, along with 95% credible intervals (CrIs), posterior SDs, and $\hat R$ as measures of uncertainty and convergence.


## RStanTVA-HB*: Bayesian hierarchical inference

For hierarchical inference, we define mixed-effects regressions for the model parameters $C$, $\log\alpha$, $\mu_K$, $\mu_0$, and $\textrm{logit}(r)$, while keeping $\sigma_0$ and $\sigma_K$ fixed across all subjects:

```r
m_hierarchical <- stantva_model(
  formula = list(
    C ~ 1 + condition + (1 + condition | subject), 
    log(alpha) ~ 1 + (1 | subject), 
    mK ~ 1 + (1 | subject), 
    mu0 ~ 1 + (1 | subject), 
    log(r) ~ 1 + (1 | subject)
  ),
  locations = 6,
  task = "pr",
  regions = list(left = 1:3, right = 4:6),
  C_mode = "equal",
  w_mode = "regions",
  t0_mode = "gaussian",
  K_mode = "probit",
  priors = 
    prior(normal(90, 20/sqrt2()), coef = Intercept, dpar = C) + 
    prior(normal(0, 10/sqrt2()), dpar = C) + 
    prior(gamma(2, 2/20 * sqrt2()), class = sd, coef = Intercept, dpar = C) + 
    prior(gamma(2, 2/10 * sqrt2()), class = sd, dpar = C) + 
    prior(normal(-0.4, 0.6/sqrt2()), coef = Intercept, dpar = alpha) + 
    prior(normal(0, 0.3/sqrt2()), dpar = alpha) + 
    prior(gamma(2, 2/0.6 * sqrt2()), class = sd, coef = Intercept, dpar = alpha) + 
    prior(gamma(2, 2/0.3 * sqrt2()), class = sd, dpar = alpha) + 
    prior(normal(0, 0.1/sqrt2()), coef = Intercept, dpar = r) + 
    prior(normal(0, 0.05/sqrt2()), dpar = r) + 
    prior(gamma(2, 2/0.1 * sqrt2()), class = sd, coef = Intercept, dpar = r) + 
    prior(gamma(2, 2/0.05 * sqrt2()), class = sd, dpar = r) + 
    prior(normal(3.5, 0.5/sqrt2()), coef = Intercept, dpar = mK) + 
    prior(normal(0, 0.25/sqrt2()), dpar = mK) + 
    prior(gamma(2, 2/0.5 * sqrt2()), class = sd, coef = Intercept, dpar = mK) + 
    prior(gamma(2, 2/0.25 * sqrt2()), class = sd, dpar = mK) + 
    prior(normal(20, 15/sqrt2()), coef = Intercept, dpar = mu0) +
    prior(normal(0, 7.5/sqrt2()), dpar = mu0) + 
    prior(gamma(2, 2/15 * sqrt2()), class = sd, coef = Intercept, dpar = mu0) + 
    prior(gamma(2, 2/7.5 * sqrt2()), class = sd, dpar = mu0) + 
    prior(normal(0.5, 0.05/sqrt2()), coef = Intercept, dpar = sK) + 
    prior(normal(0, 0.025/sqrt2()), dpar = sK) + 
    prior(gamma(2, 2/0.05 * sqrt2()), class = sd, coef = Intercept, dpar = sK) + 
    prior(gamma(2, 2/0.025 * sqrt2()), class = sd, dpar = sK) + 
    prior(normal(20, 15/sqrt2()), coef = Intercept, dpar = sigma0) + 
    prior(normal(0, 7.5/sqrt2()), dpar = sigma0) + 
    prior(gamma(2, 2/15 * sqrt2()), class = sd, coef = Intercept, dpar = sigma0) + 
    prior(gamma(2, 2/7.5 * sqrt2()), class = sd, dpar = sigma0)
)
```

We can then simply fit the hierarchical model `m_hierarchical` to the entire data set `tva_recovery` without subsetting:

```{r fit_bayesian_hierarchical_globals, include=FALSE}
fit_bayesian_hierarchical_globals <- readRDS("tva_recovery_cache/fit_bayesian_hierarchical_globals.rds")
```

```r
fg <- sampling(m_hierarchical, tva_recovery)

fit_bayesian_hierarchical_globals <-
  {
    p1 <- predict(fg, tibble(subject = 1:50, condition = "low"), c("C","alpha", "r", "mu0","sigma0","mK","sK"))
    p2 <- predict(fg, tibble(subject = 1:50, condition = "high"), "C")
    tibble(
      subject = 1:50,
      `italic(C)[low]` = t(p1$C),
      `italic(C)[high]` = t(p2$C),
      `alpha` = t(p1$alpha),
      `italic(w)[lambda]` = t(p1$r[,2,]),
      `italic(t)[0]` = t(p1$mu0),
      `sigma[0]` = t(p1$sigma0),
      `italic(K)` = t(matrix(meanprobitK(p1$mK,p1$sK,6),ncol=50))
    )
  } %>%
  pivot_longer(-subject, names_to = "param", values_to = "samples") %>%
  rowwise(c(subject,param)) %>%
  reframe(
    posterior_sd = sd(samples),
    fitted_value = median(samples),
    cri = t(quantile(samples, c(.025, .975))),
    converged = Rhat(t(samples)) < 1.1
  ) %>%
  left_join(true_params_narrow2)

```


## Comparison of MLE and Bayes

```{r plot-bayes, fig.width = 8, fig.height = 5, dpi = 100}
figdat <- bind_rows(
  fit_mle_nested_reg %>% add_column(method = "MLRN"),
  fit_bayesian_hierarchical_globals %>% add_column(method = "HB"),
  fit_bayesian_nested %>% add_column(method = "NB")
) %>%
  mutate(method = factor(method, levels = c("MLRN","NB","HB")))

fig <- ggplot(figdat) +
  theme_minimal() +
  theme(text = element_text(family = "sans", size = 10)) +
  labs(x = "True value", y = "Recovered value") +
  facet_wrap(method~param, ncol = 7, scales = "free", labeller = function(x) label_parsed(list(sprintf("%s~(\"%s\")", x$param, x$method)))) +
  geom_abline(linetype = "dashed") +
  geom_linerange(aes(x=true_value,ymin=cri[,1],ymax=cri[,2]), color = "gray50", linewidth = 0.2) +
  geom_point(aes(x=true_value,y=fitted_value), size = 0.5, color = "blue") +
  geom_point(
    aes(x = vx, y = vy),
    color = "transparent",
    data = figdat %>%
      group_by(param) %>%
      reframe(vx = range(true_value), vy = range(c(fitted_value, cri), na.rm = TRUE))
  )

print(fig)

```





