---
author: 
  - name: Jonathan Ö. Rittmo
    affiliation: University of Edinburgh
    address: |
      | Human Cognitive Neuroscience
      | Psychology
    email: j.rittmo@gmail.com
  - name: Robert D. McIntosh
    affiliation: University of Edinburgh
    address: |
      | Human Cognitive Neuroscience
      | Psychology
    email: r.d.mcintosh@ed.ac.uk
title: "The R package singcar: Comparing Single Cases to Small Samples"
  # formatted: "The \\proglang{R} package \\pkg{singcar}: Comparing Single Cases to Small Samples"
  # plain:     "The R package singcar: Comparing Single Cases to Small Samples"
  # short:     "\\pkg{singcar}: Comparing Single Cases to Small Samples"
abstract: >
    Statistical comparison of single cases to small samples is a methodology that
    has been used extensively in, for example, cognitive and clinical neuropsychology.
    It is most often used to determine changes in cognitive processing after
    an individual has incurred some type of brain damage. In a clinical setting
    one often wishes to infer whether a patient exhibits abnormally low performance
    on some cognitive ability. In cognitive neuropsychology, on the other hand, one
    often wishes to infer whether a patient exhibits an abnormally large discrepancy in
    performance between two cognitive abilities. Because cognitive abilities are
    seldom well represented by a single performance on a single task, one might
    additionally be interested in the abnormality of a case on several measurements
    converging on a cognitive ability of interest, or in the abnormality of a case
    in a multivariate space. Several methods for estimating case abnormality have been
    developed that keep the Type I error rate at its nominal level. However, they
    have not been available in any standard statistical software environment and
    their documentation is spread thin across multiple articles and compiled
    computer programs. This vignette aims to gather and review the most popular
    methods while presenting them and their usage in the `R` package `singcar`.
    Of particular note are the more flexible and useful methods that have not become
    as widespread as the simpler ones. These include techniques using Bayesian regression
    to allow for the inclusion of covariates and linear mixed models to handle
    repeated measures data. Additionally, statistical comparison of a single case
    to a control population is inherently low powered. To facilitate experimental
    planning and design, power calculators have been implemented in `singcar` and the
    concept of power for this type of statistical analysis is reviewed.
keywords: "single case comparisons, small samples, R"
  # formatted: [single case comparisons, small samples, "\\proglang{R}"]
  # plain:     [single case comparisons, small samples, R]
bibliography: ref.bib
link-citations: true
preamble: >
  \usepackage{setspace}\setstretch{1.3}
  \usepackage{float}
  \floatplacement{figure}{H}
  \usepackage{amsmath}
  \usepackage{bm}
  \usepackage{amsfonts}
  \usepackage{longtable}
  \usepackage{caption}
  \usepackage{pbox}
  \usepackage{booktabs}
  \usepackage{graphicx}
  \usepackage{cleveref}
  \usepackage{cancel}
  \renewcommand{\eqref}{\Cref}
  \Crefformat{equation}{#2#1#3}
output:
    bookdown::html_document2:
      base_format: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The R package singcar: Comparing Single Cases to Small Samples}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, setup, include=FALSE}
options(prompt = 'R> ', continue = '+ ')

library(singcar)
library(MASS)
library(lme4)
library(lmerTest)
```

# Introduction {#section1}

There are several reasons why one might need to probabilistically determine
whether a single observation belongs to some population distribution.
In some areas, such as neuropsychology, this methodology has been
essential for the advancement of the field, because brain-damaged patients
with patterns of associated cognitive impairments are often unique.
A single patient might then be the only available source of data for
a phenomenon of interest. In addition, clinical diagnoses and descriptions
of patient impairments are always performed for single cases.

Patients with brain damage are hence often compared to a distribution of
cognitive functioning in the healthy population,
both in research and clinical contexts. However, we do not always know
the population parameters of such a distribution, and when we do not,
these must be estimated from a sample -- we compare the single case
to a control sample. There are other potential application areas
for this method, such as studies of uncommon expertise, targeted quality
checks in small production industries, or animal studies
where rearing large experimental groups might be too costly or unethical.

However, since these methods are most commonly used within the field
of neuropsychology, the related nomenclature will be adopted. An abnormally
low score on some variate compared to the control sample will be referred
to as a *deficit*, important both in the clinic and in research. For the latter,
another concept is also considered to be of great interest: namely, an abnormally
large discrepancy between two variates. This is known as a *dissociation* and
provides evidence for independence between cognitive abilities. By mapping
such discrepancies it is possible to build theories of the architecture of the
human mind [@shalliceNeuropsychologyMentalStructure1988].

During the last 20 years, methods have been developed that allow researchers to
estimate abnormality and test for deficits and dissociations with retained
control of Type I errors. This work has mainly been carried out by John Crawford and Paul
Garthwaite, e.g. @crawfordComparingIndividualTest1998, @crawfordComparingSingleCase2011, @crawfordComparisonSingleCase2007, @crawfordInvestigationSingleCase2002 and @crawfordTestingSuspectedImpairments2005.
These methods have been implemented in freely available software, but only as
standalone programs that are not open source and run only on Windows.
Most of the programs require manual input of summary statistics and cannot
be used on raw data. Full documentation of the implemented methods is only
available in the original publications.

This was the motivation behind the development of the `R` package `singcar`
[@rittmoSingcarComparingSingle2021]. We wanted to encourage and simplify the usage of these
methods by bringing them together in a fully documented package with open source
code that works across platforms. In Section \@ref(section2) all implemented methods
will be described in detail. This contribution will provide a comprehensive
overview of the methods available together with their advantages and disadvantages.

The development of Crawford and Garthwaite's methods has been focused on
limiting Type I errors. But in recent work we showed the inherent inability to
control Type II errors with this experimental design, and argued that, for some
applications of the methods, this limitation might be even more detrimental than
an increased Type I error rate [@mcintoshPowerCalculationsSinglecase2020].
Therefore we also provide power calculators for each test function in the
package.

When we conduct single case comparisons we want to estimate the probability that
a random draw from the distribution estimated by the sample is more extreme than
the single observation we are interested in. This means that the power of a case
comparison hypothesis test is contingent on the standard deviation of the
distribution, not on the standard error of the test statistic. Increasing the
sample size (which is usually done to increase power) then only leads to a
better estimation of the control distribution, not to a reduction of the
standard error. Hence, the limit of achievable power is mostly contingent on the
size of the effect. When investigating unique cases it is not always possible to
target larger effects to get around this problem.

One approach often highlighted for its potential to increase power is
multivariate hypothesis testing. This is, however, only true to a certain extent
[@franePowerTypeError2015], and one should not force a multivariate analysis onto a
univariate research question. The focus hitherto has been univariate in
nature, but for a neuropsychological dissociation to be convincing, the optimal
experimental design would include multiple tasks converging on the cognitive
functions of interest, performed on more than one occasion
[@shalliceNeuropsychologyMentalStructure1988;
@mcintoshPowerCalculationsSinglecase2020]. Integrating such information
both across tasks and over several performances could be aided by
multivariate analyses.

This calls for methods that estimate a case's abnormality in a
multidimensional space estimated by a sample. The most obvious way to do this
would of course be via the $p$ value of a Hotelling's $T^2$ score. However,
@elfadalyPointEstimationAbnormality2016 showed that this statistic is
substantially biased as a measure of abnormality. They investigated nine other
candidates, including two point estimates based on polynomial expansion, which
gave rise to the lowest bias. These point estimates would, however, sometimes
yield estimates outside the [0, 1] range, and their interpretation is not very
intuitive. Because ease of interpretation is paramount in scientific reporting,
@elfadalyPointEstimationAbnormality2016 suggested using an adapted median
estimator when bias is not of "immediate importance"; when it is, the
polynomial estimators should be used. A compiled computer program
for this adapted median estimator is available [@garthwaiteModifiedConfidenceIntervals2017],
but again, it is not open source and does not work across platforms.
This relatively novel estimation method has also been implemented in
`singcar`.

The following section, Section \@ref(section2), will review available
methods for single case comparisons, their theory and how they can be
used in `R`. Section \@ref(power) gives a brief review of statistical
power in this experimental design and the implementation of power 
calculators in `singcar`, but for a more thorough and perhaps
more accessible review see @mcintoshPowerCalculationsSinglecase2020.


# Comparing a single case to small samples {#section2}

## Background

Variates of interest will, in a neuropsychological context, often be scores on
some task assessing a cognitive function. Therefore they will often be referred
to as "task scores". It has not been uncommon in this field to compare patients
to a control sample, treating the sample statistics as population parameters and
estimating deficits by evaluating the $p$ value associated with the patient's
$Z$ score in the estimated distribution. Dissociations were estimated in
a similar way using a method developed by
@payneStatisticsInvestigationIndividual1957, in which a $Z$ score is calculated
from the distribution of difference scores in the control sample.

This is of course problematic if small samples are used, which is often the case
in neuropsychology, since the sampling distribution of the sample variance
is right skewed for small sample sizes. If one derives a test statistic
that is divided by a quantity from a skewed distribution, the size of the
obtained values would be inflated, and thus so would the Type I error rate
[@crawfordComparingIndividualTest1998].

In the following sections I will describe methods that have been devised
as replacements for the $Z$ score methods. Both frequentist and Bayesian
significance tests have been developed and are covered in Section \@ref(sec22)
and Section \@ref(sec23) respectively. More complex model designs can be 
achieved using Bayesian regression techniques, described in Section \@ref(cov), 
or linear mixed models, described in Section \@ref(lmm). For estimating
multivariate abnormality, see Section \@ref(section4).


### Example data {#exdata}

In the following sections the theory behind the methods in `singcar` is
reviewed, and at the end of each subsection the usage of the implementations
is exemplified. To aid this, the package comes with the dataset
`size_weight_illusion`, from a neuropsychological study
investigating the size-weight illusion in the well known patient "DF"
[@hassanSizeweightIllusionVisual2020]. DF incurred damage to the lateral
occipital complex, which gave rise to visual form agnosia, potentially making her less
susceptible to the size-weight illusion [@dijkermanVisuomotorPerformancePatient2004a].
This illusion is a perceptual phenomenon whereby objects of small size are
perceived as heavier during lifting than objects of larger size but equal
weight [@buckinghamGettingGripHeaviness2014].

However, due to the location of DF's brain damage, one would only suspect
reduced susceptibility to the illusion for visual cues, since her other sensory
processing should be unaffected by her damage. In the study by @hassanSizeweightIllusionVisual2020,
the illusion was tested given visual cues in one condition and kinaesthetic
cues in another. It was predicted that DF would be abnormally less susceptible to
the visual size-weight illusion and that there would be an abnormally large discrepancy
between her visual and kinaesthetic size-weight illusions. The control sample consisted
of 28 participants; their age and sex were collected along with their performance
in the two conditions. The size-weight illusion is given in the dataset as a
scaled measure of the number of grams of perceived weight difference per cubic cm
of volume change. To use this data, start by installing and loading the package:

```{r scload, eval= FALSE, echo = TRUE}
install.packages("singcar")
library("singcar")
```
```{r dataset, echo = TRUE}
head(size_weight_illusion) 
```

`V_SWI` and `K_SWI` are the measurements of visual and kinaesthetic
size-weight illusion, respectively. Most functions in `singcar`
can take summary data as input, but for demonstrative purposes the raw
data of this dataset will be used. To illustrate usage
of the methods more generally, the scores on the variates of interest,
as well as on the covariate `YRS`, are extracted from the data and renamed:
```{r patex, echo = TRUE}
caseA <- size_weight_illusion[1, "V_SWI"]
contA <- size_weight_illusion[-1, "V_SWI"]
caseB <- size_weight_illusion[1, "K_SWI"]
contB <- size_weight_illusion[-1, "K_SWI"]
caseAGE <- size_weight_illusion[1, "YRS"]
contAGE <- size_weight_illusion[-1, "YRS"]
```
Here, `caseA` and `caseB` refer to the case scores on some tasks A
and B, respectively. Similarly, `contA` and `contB` refer to the scores
of the control sample on the same tasks.

## Frequentist approaches {#sec22}

When we compare a patient to a healthy population on some normally
distributed variate that we estimate from a small sample, we are in
fact not estimating abnormality from a normal distribution but from a central
$t$ distribution.

Hence, we must use a $t$ score rather than a $Z$ score if we want to estimate
deficits without inflating effect sizes. This was noted by
@sokalBiometryPrinciplesPractice1981 (p. 227), who devised a test
statistic for single cases in the biological sciences, but it was first popularised
within neuropsychology by @crawfordComparingIndividualTest1998. Its common
application is as a one-tailed "test of deficit" (TD) that estimates the
proportion of the healthy population that would exhibit a lower score than the
case on some cognitive task. If this proportion falls below some chosen threshold,
the patient can be diagnosed with a deficit.

This basic approach is simply a modified two-sample $t$ test,

\begin{equation}
t_{n-1} = \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}
(\#eq:TD)
\end{equation}

where
one of the samples is treated as having size $n = 1$. The degrees of freedom
then quite naturally become $n+1-2 = n -1$.
$\overline{y}, \ s$ and $n$ are the sample mean, standard deviation and size, and
$y^*$ is the case score. This method allows transparent control over the Type I
error rate [@crawfordComparingSingleCase2009; @crawfordInferentialMethodsComparing2004; @crawfordSinglecaseResearchNeuropsychology2012] when used for significance
testing. In addition, the $p$ value associated with the obtained $t$ statistic
gives an unbiased point estimate of the abnormality of the case. That is, it
provides us with the expected percentage of the normal population that would
exhibit a more extreme score than the case. This estimate is often
of main interest in neuropsychology. The simple proof of this was
demonstrated by @crawfordMethodsTestingDeficit2006 and is outlined below.

The percentage of a population expected to score lower on some variate $Y$ than
the case $y^*$ is
$$
\mathbb{P}[Y< y^*].
$$
Subtracting $\overline{y}$ from both sides of the inequality
and dividing by $s \sqrt{\frac{n + 1}{n}}$ gives
$$
\mathbb{P}[Y< y^*] =\mathbb{P}\left[\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right].
$$
The quantity to the left of the inequality, i.e., \(\frac{Y-\overline{y}}{s \sqrt{\frac{n + 1}{n}}}\),
is $t$ distributed with $n-1$ degrees of freedom. Hence,
$$
\mathbb{P}[Y< y^*] =\mathbb{P}\left[t_{n-1} < \frac{y^* - \overline{y}}{s \sqrt{\frac{n + 1}{n}}}\right].
$$
To the right of the inequality we have the test statistic from
 (\@ref(eq:TD)). Therefore, $\mathbb{P}[Y< y^*]$ is
the $p$ value from the test of deficit. This simple fact also
makes the construction of confidence intervals for the abnormality (i.e. $p$)
possible.
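The relationship above can be illustrated numerically. A minimal sketch with
assumed (hypothetical) summary statistics, rather than the package data: a case
scoring 70 against a control sample with mean 100, standard deviation 15 and
$n = 20$:

```{r tdbyhand, echo = TRUE}
ystar <- 70; ybar <- 100; s <- 15; n <- 20        # hypothetical values
t_td <- (ystar - ybar) / (s * sqrt((n + 1) / n))  # statistic from the TD
p_td <- pt(t_td, df = n - 1)  # estimated proportion of controls below the case
c(t = t_td, p = p_td)
```

Here `pt()` yields both the $p$ value of the test and the abnormality
estimate, as per the proof above.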

Although the $Z$ score is not an appropriate statistic for
significance testing when the control sample is small, it does
provide a standardised effect measure of the deficit of interest,
similar to Cohen's $d$, since insensitivity to sample size is a requirement
for effect size measures [@cohenStatisticalPowerAnalysis1988;
@crawfordPointIntervalEstimates2010]. Hence, @crawfordPointIntervalEstimates2010
proposed the use of $Z$ scores as effect size measures within single-case
neuropsychology and dubbed this measure $Z_{CC}$, where the subscript indicates
"case-controls" comparison:
\begin{equation}
Z_{CC} = \frac{y^* - \overline{y}}{s}.
(\#eq:zcc)
\end{equation}
If $p$ is the percentage of the population that would fall below or above a
case score, $Z_{CC}$ can be used to construct $100(1-\alpha)$% confidence intervals around $p$.
The method to do so was described by @crawfordInvestigationSingleCase2002.

Given \(y^* \neq \overline{y}\), $Z_{CC}\sqrt{n}$ comes from a non-central
$t$ distribution on $n-1$ degrees of freedom. We find the value $\delta_U$, being
the non-centrality parameter (NCP) of a non-central $t$ distribution with $df=n-1$,
such that its $100\frac{\alpha}{2}$th percentile equals $Z_{CC}\sqrt{n}$, and correspondingly
we find the NCP $\delta_L$ of a non-central $t$ distribution such that its $100(1-\frac{\alpha}{2})$th
percentile equals $Z_{CC}\sqrt{n}$. The easiest approach to finding these quantities
is to deploy a search algorithm. We can then construct the upper and lower boundaries
for $Z_{CC}$ as:
$$
Z_{CC_U} = \frac{\delta_U}{\sqrt{n}}, \ \ Z_{CC_L} = \frac{\delta_L}{\sqrt{n}}.
$$
When the case falls on the left side of the distribution
the boundaries for \(p\) are given by:
$$
p_U = \Phi\left(\frac{\delta_U}{\sqrt{n}}\right), \ p_L = \Phi\left(\frac{\delta_L}{\sqrt{n}}\right),
$$
where $\Phi$ is the CDF of the standard normal distribution. And
when the case falls on the right side of the distribution the boundaries
are given by:
$$
p_U = 1 - \Phi\left(\frac{\delta_L}{\sqrt{n}}\right), \ p_L = 1 - \Phi\left(\frac{\delta_U}{\sqrt{n}}\right).
$$
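The search for $\delta_U$ and $\delta_L$ can be carried out with a standard
root finder. A minimal sketch with assumed (hypothetical) summary statistics
and a case falling on the left side of the distribution; `uniroot()` and `qt()`
are base `R`, and the bracketing interval is a rough choice for these numbers:

```{r ncpsearch, echo = TRUE}
n <- 20; alpha <- 0.05
zcc <- (70 - 100) / 15          # hypothetical Z_CC
target <- zcc * sqrt(n)
# delta_U: NCP whose 100(alpha/2)th percentile equals Z_CC * sqrt(n)
delta_U <- uniroot(function(d) qt(alpha / 2, df = n - 1, ncp = d) - target,
                   interval = c(target - 10, target + 10))$root
# delta_L: NCP whose 100(1 - alpha/2)th percentile equals Z_CC * sqrt(n)
delta_L <- uniroot(function(d) qt(1 - alpha / 2, df = n - 1, ncp = d) - target,
                   interval = c(target - 10, target + 10))$root
# Case on the left side: boundaries for p
c(p_L = pnorm(delta_L / sqrt(n)), p_U = pnorm(delta_U / sqrt(n)))
```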

Evidently, estimating abnormality on a single variate is trivial. It
is equally simple to estimate abnormality of the difference between two
variates, $Y_1 - Y_2$, if the normally distributed variates $Y_1$ and $Y_2$
do not need standardisation to be comparable. In that case the
test statistic is similar to (\@ref(eq:TD)), but one divides
the difference by the standard deviation of the difference scores:
\begin{equation}
t_{n-1} = \frac{(y^*_1 - \overline{y}_1) - (y^* _2 - \overline{y}_2) }{ \sqrt{(s^2_1 +s^2_2 -2s_1 s_2 \rho_{12})(\frac{n+1}{n})}}
(\#eq:UDT)
\end{equation}
where $\rho_{12}$ is the correlation between the variates. Confidence intervals
for this "unstandardised difference test" (UDT) are constructed as for the TD.
Applications testing unstandardised differences are however scarce, at least
within neuropsychology. Much more common is the need to assess abnormality of
differences between variates that require standardisation to be comparable.
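As a numerical illustration of (\@ref(eq:UDT)), again with assumed
(hypothetical) summary statistics for two directly comparable tasks:

```{r udtbyhand, echo = TRUE}
n <- 20; rho <- 0.6                   # hypothetical sample size and correlation
ystar1 <- 70; ybar1 <- 100; s1 <- 15  # hypothetical task 1 statistics
ystar2 <- 95; ybar2 <- 100; s2 <- 14  # hypothetical task 2 statistics
t_udt <- ((ystar1 - ybar1) - (ystar2 - ybar2)) /
  sqrt((s1^2 + s2^2 - 2 * s1 * s2 * rho) * ((n + 1) / n))
p_udt <- 2 * pt(-abs(t_udt), df = n - 1)  # two-sided p value
c(t = t_udt, p = p_udt)
```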

If we just standardise the variates in (\@ref(eq:UDT)) and do not
take sample size into account, we get an effect size measure of the 
difference (dissociation) similar to $Z_{CC}$, (\@ref(eq:zcc)): 
\begin{equation}
Z_{DCC}  = \frac{z^*_1 - z^*_2}{\sqrt{2-2\rho_{12}}}.
(\#eq:PJ)
\end{equation}
The two quantities in the numerator are the standardised case scores on variate
$Y_1$ and $Y_2$ and the subscript indicates "discrepancy-case-controls". This
measure was proposed as a reporting standard in the single-case literature by
@crawfordPointIntervalEstimates2010, but was first suggested as a statistic
for significance testing by @payneStatisticsInvestigationIndividual1957. 
It is of course not appropriate for such use, since the resultant statistic
($Z_{DCC}$) would be inflated because the quantities in the numerator ($z^*_1$ and $z^*_2$) are inflated.

When we only can estimate standardised case scores from small samples and need
to test if the difference between them is abnormally large we are in fact estimating
the difference between two $t$ distributed variates. Since linear combinations
of correlated $t$ distributed variates are not $t$ distributed themselves this
is not a trivial problem. 

@garthwaiteDistributionDifferenceTwo2004 examined the distribution of such
difference scores using asymptotic expansion. They searched for quantities
that, when divided by a function of the sample correlation, would be asymptotically
$t$ distributed. They found:

\begin{equation}
\psi=\frac{\frac{(y^*_1-\overline{y}_1)}{s_{1}}-\frac{(y^*_2-\overline{y}_2)}{s_{2}}}{
\sqrt{
(\frac{n+1}{n})
\left( (2-2 \rho)+
\frac{2(1-\rho^{2})}{n-1}+
\frac{(5+c^{2})(1-\rho^{2})}{2(n-1)^{2}}+
\frac{\rho(1+c^{2})(1-\rho^{2})}{2(n-1)^{2}}\right)
}},
\end{equation}

$\rho$ being the sample correlation and $c$ a pre-specified critical two-tailed
$t$ value on $df= n-1$. They showed that $\mathbb{P}[ \psi > c] \approx\mathbb{P}[t >c]$
and that one must solve $\psi = c$ to obtain a precise
probability for $\psi$, which yields a quantity that does not depend on a
pre-specified critical value. This is a quadratic equation in \(c^2\),
choosing the positive root of which yields:

\begin{align}
c & = \sqrt{\frac{ -b + \sqrt{b^2 - 4ad}}{2a}}, \  \text{where} \\
a & = (1+r)(1-r^2), \\
b & =  (1-r)[4(n-1)^2+4(1+r)(n-1)+(1+r)(5+r)], \\
d & =  - 2\left[\frac{y^*_{1} - \overline{y}_1}{s_1}-\frac{y^*_2 -\overline{y}_2}{s_2}\right]^2\left(\frac{n(n-1)^2}{n+1}\right)
(\#eq:RSDT)
\end{align}

where $p = \mathbb{P}[t_{n-1}>c]$ is used for significance testing and is the
point estimate of the percentage of the control population that is expected to show
a more extreme difference score than the case. Note that the test statistic
$c$ will always be positive, regardless of the direction of the effect.
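The quadratic solution in (\@ref(eq:RSDT)) is straightforward to compute by
hand. A minimal sketch with assumed (hypothetical) summary statistics:

```{r rsdtbyhand, echo = TRUE}
n <- 20; r <- 0.6                 # hypothetical sample size and correlation
z1 <- (70 - 100) / 15             # hypothetical standardised case score, task 1
z2 <- (95 - 100) / 14             # hypothetical standardised case score, task 2
a <- (1 + r) * (1 - r^2)
b <- (1 - r) * (4 * (n - 1)^2 + 4 * (1 + r) * (n - 1) + (1 + r) * (5 + r))
d <- -2 * (z1 - z2)^2 * (n * (n - 1)^2 / (n + 1))
c_stat <- sqrt((-b + sqrt(b^2 - 4 * a * d)) / (2 * a))  # always positive
p_rsdt <- pt(c_stat, df = n - 1, lower.tail = FALSE)    # P[t > c]
c(c = c_stat, p = p_rsdt)
```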

@crawfordTestingSuspectedImpairments2005 refer to this test statistic as
the "revised standardised difference test" (RSDT) because it was an iteration of
a test previously developed by the authors that did not keep control of
the Type I error rate [@crawfordPayneJonesRevisited1998].
@crawfordTestingSuspectedImpairments2005 showed that the RSDT succeeds
fairly well in this respect.

Using simulations, the RSDT was shown to barely exceed the specified error rate
even for very small samples such as $n = 5$. However, when a case lacks a
dissociation but exhibits extreme scores on both variates (in the same direction,
of course), the error rate increases steeply. According to one simulation study,
the RSDT starts to lose control of the error rate for scores that are
simultaneously more extreme than two standard deviations from the mean, and
for task scores as extreme as 8 standard deviations from the mean, error
rates as high as 35% were observed [@crawfordComparisonSingleCase2007].

Another issue with the RSDT is that, because $\psi$ is only approximately
$t$ distributed, confidence intervals for estimated abnormality cannot be
constructed in the same manner as for the TD. To remedy these drawbacks,
Crawford and colleagues turned to Bayesian methodology.

### Example usage in `singcar` {#freqsing}

The test of deficit is called in the `singcar` package with the function `TD()`.
Testing for a deficit using the example data from Section \@ref(exdata) we have:
```{r tdcode, echo=TRUE}
TD(case = caseA, controls = contA, alternative = "less", conf_level = 0.95)
```
The argument `alternative` specifies the alternative hypothesis, i.e.
whether we are performing a two-sided or a one-sided test and, if the latter,
in which direction. If one instead wants to test for a dissociation, the most
common alternative hypothesis is two-sided. To use the RSDT for this test, call:
```{r rsdtcode, echo = TRUE}
RSDT(case_a = caseA, case_b = caseB, controls_a = contA, controls_b = contB,
      alternative = "two.sided")
```
Notably, this function does not produce confidence intervals. If the two tasks
are directly comparable without the need for standardisation (which is in fact the
case for this specific dataset), instead call `UDT()` with the same syntax and
confidence intervals will be given.

## Bayesian approaches {#sec23}

The most prominent difference between the frequentist and the Bayesian framework
is that in the former, parameters are treated as fixed attributes of a population,
while in the latter they are themselves treated as random variables with associated
distributions. To estimate these, one can use prior knowledge about the parameter
of interest, specified as a prior distribution. There is often
a desire that the data should be the only thing influencing the estimation. If
so, one can reduce the impact of the prior by, for example, assigning equal
probabilities to all possible parameter values. This is an example of what
is known as a non-informative prior distribution. The prior distribution is then updated
when new information is obtained from data, forming the posterior
parameter distribution. This is calculated using Bayes' theorem; if we
omit the normalising constant in the denominator of Bayes' theorem, it can be
expressed as:
\begin{equation*}
posterior \ \propto \ likelihood \times prior.
\end{equation*}

If we have a hypothesis for a parameter value, what the above is saying is that
the posterior density of this hypothesis is proportional ($\propto$) to the likelihood
of the data under that hypothesis multiplied by the prior probability
of the hypothesis. 

One of the disadvantages of Bayesian methodology is that it might be
impossible, or at least very difficult, to calculate the posterior analytically.
It is, however, often feasible to construct it numerically using
Monte Carlo methods. One of the reasons Bayesian statistics has seen such an
upsurge is the increase in computational power in recent years, which has
made many otherwise infeasible numerical strategies possible. Monte Carlo
methods are algorithms that solve numerical problems by repeated
random sampling. The algorithms vary depending on the problem at hand, but for
Bayesian methodology they all build on rules for drawing random values
based on the likelihood and prior -- as such "constructing" the posterior with
the draws after a large number of iterations. The peak of the constructed
distribution is often the parameter estimate of interest and the width is a measure
of the certainty of the estimation. With non-informative priors the peak
often coincides with the maximum likelihood estimate of the parameter. This
means that non-informative priors often yield estimators with frequentist
properties, necessary for null hypothesis significance testing.

Because the testing properties of the estimation methods presented here are of main interest,
frequentist properties were required when @crawfordComparisonSingleCase2007 and
@crawfordComparingSingleCase2011 devised Bayesian analogues to the test functions
presented in Section \@ref(sec22). The procedural details of these methods
are described in the following sections, both to document how they were
implemented in `singcar` and to give an overview of how they operate.
The methods are all based on Monte Carlo simulation and some are computationally very
intense. They are implemented in `singcar` as straight R code, but would probably
benefit from being implemented in compiled code; this is an aim
for future updates of the package. The algorithms presented below closely
follow the original articles, with a few slight changes of notation to make
the presentation more coherent.

### The Bayesian test of deficit {#BTD}

The Bayesian analogue of the test of deficit (BTD) produces asymptotically identical
output to the frequentist test of deficit: an estimate of $p$, i.e. the percentage
of controls that would exhibit a more extreme score than the case, with an
accompanying credible interval. The method is included here for completeness,
but for actual analysis there is no advantage to using it over the TD, (\@ref(eq:TD)).

From a sample of size $n$ we obtain measurements of some normally distributed variate $Y$
with unknown mean $\mu$ and unknown variance $\sigma^2$. Let $\overline{y}$ and $s^2$ denote
the sample mean and variance and $y^*$ the case score. The prior distribution of
$\mu$ is conditioned upon the prior distribution of $\sigma^2$ since both parameters
are unknown. The prior considered here is $f(\mu, \sigma^2) \propto (\sigma^2)^{-1}$,
which is the standard non-informative prior for normal data. 
The marginal posterior distribution of $\sigma^2$ is such that $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$,
and the conditional posterior distribution of $\mu$ is $\mu|\sigma^2 \sim \mathcal{N}(\overline{y}, \sigma^2/n)$;
see e.g., @gelmanBayesianDataAnalysis2013 (pp. 45, 65).

When we have a posterior distribution for the control population parameters, $p$ can
be estimated by iterating the following steps.

1.  Estimate $\hat{\sigma}^2_{(i)}$ as $\hat{\sigma}^2_{(i)} = \frac{(n-1)s^2}{\psi}$ where
    $\psi$ is generated from a $\chi^2$ distribution on $n-1$ degrees of freedom.
    The subscript $(i)$ indicates that the statistic is an estimate of $\sigma^2$
    for this iteration.
2.  The estimate for $\mu$ for this iteration is given by $\hat{\mu}_{(i)}=\overline{y}+z\sqrt{(\hat{\sigma}_{(i)}^2/n)}$,
    where $z$ is generated from a standard normal distribution.
3.  The estimate of $p$ is then calculated as if $\hat{\sigma}^2_{(i)}$ and
    $\hat{\mu}_{(i)}$ are the true population parameters. That is, put 
    $z^*_{(i)}=\frac{y^* - \hat{\mu}_{(i)}}{\sqrt{\hat{\sigma}_{(i)}^2}}$ as the standardised
    case score and obtain an iteration estimate of $p$ as $\hat{p}_{(i)}=\Phi\left(z^*_{(i)}\right)$ 
    or $\hat{p}_{(i)} = 1-\Phi\left(z^*_{(i)}\right)$ depending on alternative
    hypothesis. 

Iterating these steps a large number of times yields a distribution of $\hat{p}$,
the mean of which is the point estimate of $p$, which can be used for significance 
testing. The 2.5th and 97.5th percentiles of the distributions of $\hat{p}$ and
$z^*$ give the boundaries of the 95% credible intervals of $\hat{p}$ and
$\hat{Z}_{CC}$ respectively. The point estimate of the latter is that of
(\@ref(eq:zcc)). 
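
The three steps above can be sketched in a few lines of base `R`. This is a
minimal illustration of the sampling scheme for `alternative = "less"`, not the
implementation used by `BTD()`; the function name and output format are our own:

```{r btdsketch, echo = TRUE, eval = FALSE}
btd_sketch <- function(case, controls, iter = 10000) {
  n <- length(controls)
  y_bar <- mean(controls)
  s2 <- var(controls)
  psi <- rchisq(iter, df = n - 1)                       # Step 1
  sigma2_hat <- (n - 1) * s2 / psi
  mu_hat <- y_bar + rnorm(iter) * sqrt(sigma2_hat / n)  # Step 2
  z_star <- (case - mu_hat) / sqrt(sigma2_hat)          # Step 3
  p_hat <- pnorm(z_star)                                # alternative = "less"
  c(p = mean(p_hat), quantile(p_hat, c(0.025, 0.975)))
}
```

The mean of `p_hat` is the point estimate of $p$ and its percentiles give the
boundaries of the credible interval.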

### The Bayesian standardised difference test {#BSDT}

In contrast to the BTD, the Bayesian analogue of the difference tests does not
produce asymptotically identical results to the RSDT.  
For the Bayesian standardised difference test (BSDT) we now obtain
measurements on $Y_1$ and $Y_2$ following a bivariate normal distribution
representing two tasks of interest, from a control sample of size $n$. The goal
is to estimate the percentage $p$ of the control population that would be
expected to show a greater difference $Y_1 - Y_2$ than the case.

Let $\overline{y}_1$ and $\overline{y}_2$ be the sample means and  
\begin{equation*}
\pmb{A}=\begin{bmatrix}
s^2_{1} & s_{12} \\
s_{12} & s^2_{2} \end{bmatrix}
\end{equation*}
the sample variance-covariance matrix. 
Then \(\pmb{S} =\pmb{A}(n-1)\) is the sums of squares and cross products (SSCP) matrix.
The case scores are denoted \(y_1^*\) and \(y_2^*\).

The population parameters
\begin{equation*}
\pmb{\mu} = \begin{pmatrix}
\mu_1 \\
\mu_2 \end{pmatrix} \ \text{and} \ \Sigma=\begin{bmatrix}
\sigma^2_{1} & \sigma_{12} \\
\sigma_{12} & \sigma^2_{2} \end{bmatrix}
\end{equation*}
are both unknown. To estimate $p$ we must first obtain a posterior
of these parameters. Since both are unknown the prior distribution
of $\pmb\mu$ is again conditioned upon the prior distribution of $\Sigma$.

The prior considered for \(f(\pmb\mu, \Sigma^{-1})\) in
@crawfordComparisonSingleCase2007 was \(f(\pmb\mu, \Sigma^{-1}) \propto
|\Sigma|\). This is a common non-informative prior; however, it was chosen in
favour of the perhaps even more commonly used \(f(\pmb\mu, \Sigma^{-1})
\propto |\Sigma|^{(k+1)/2}\) for multivariate normal data, where $k$ is the
number of variates. This was because the former prior had been shown to give
asymptotically identical interval estimates to the UDT for unstandardised
case scores. Even though the latter is perhaps more commonly applied,
@crawfordComparisonSingleCase2007 refer to the former as the "standard theory"
prior.

For this prior we have that $\Sigma^{-1} \sim \mathcal{W}_k(\pmb{S}^{-1}, n)$.
That is, the posterior marginal distribution for $\Sigma^{-1}$ is a Wishart 
distribution parametrised with scale matrix $\pmb{S}^{-1}$ and $n$ degrees of
freedom. The Wishart distribution is a multivariate generalisation of the $\chi^2$ 
distribution. It is possible to view $S\sim\mathcal{W}_k(\Sigma, n)$ as a distribution
of SSCP matrices associated with $\Sigma$. Conversely, one can view the inverse
Wishart distribution parametrised with some SSCP matrix and $n$ degrees of freedom
as a distribution of variance-covariance matrices related to that SSCP matrix.

We then have that \(\pmb\mu|\Sigma \sim \mathcal{N}_k(\pmb{\overline{y}},
\Sigma/n)\) where $\mathcal{N}_k(\pmb{\overline{y}}, \Sigma/n)$ denotes a
multivariate normal distribution with mean $\pmb{\overline{y}}$ and variance
$\Sigma/n$, see e.g., @gelmanBayesianDataAnalysis2013 (p. 72-73).

The posterior marginal distribution of $\Sigma$ is then constructed
by iterating draws of random observations from $\mathcal{W}^{-1}(\pmb{S}, n)$,
each observation, \(\hat{\Sigma}_{(i)}\), being the estimation of $\Sigma$
for the $i$th iteration. 

As previously mentioned, frequentist properties are desired if the estimations
are to be used for significance testing. @bergerObjectivePriorsBivariate2008
investigated convergence to frequentist estimates for bivariate normal
distributions using various priors. They showed that the "standard theory" prior
produced converging estimates of \(\sigma_1\) and \(\sigma_2\) but that the
convergence was not as good for $\rho$ and differences in means. As a "general
purpose" prior they instead recommend using \(f(\pmb\mu, \Sigma^{-1}) \propto
\frac{1}{\sigma_1\sigma_2(1-\rho^2)}\). Posteriors under this prior are
constructed using rejection sampling. Random draws from an inverse Wishart on
$n-1$ degrees of freedom are generated, but only a subset of the
observations is retained.

@crawfordComparingSingleCase2011 considered this prior but noticed that it gave
rise to too narrow credible intervals, overestimating the certainty of the estimations.
However, they showed that if the sample was treated as being of size $n-1$ when
estimating $\Sigma$, so that the generated intervals were somewhat more conservative,
frequentist properties were retained. @crawfordComparingSingleCase2011 recommend
this approach and refer to it as the "calibrated" prior.

To construct the posterior using the calibrated prior the following steps are
iterated:

1. Since we are reducing the sample size we must put \(\pmb{S}^*= (n-2)\pmb{S}/(n-1)\) 
   to retain the unbiased property of \(\pmb{S}/(n-1)\) as an estimate
   of $\Sigma$.

2. Draw a random observation from $\mathcal{W}^{-1}(\pmb{S}^*, n-2)$. This
   draw is denoted:
   \[
   \hat{\Sigma} = \begin{bmatrix}
   \hat{\sigma}^2_{1} & \hat{\sigma}_{12} \\
   \hat{\sigma}_{12} & \hat{\sigma}^2_{2} \end{bmatrix}
   \]
   and
   \[
   \hat{\rho}= \frac{\hat{\sigma}_{12}}{\sqrt{\hat{\sigma}^2_{1}\hat{\sigma}^2_{2}}}.
   \]

3. If $u$ is a random draw from a uniform distribution \(u \sim U(0, 1)\), $\hat\Sigma$
   is accepted as the estimation of $\Sigma$ only if \(u^2 \leq 1-\hat{\rho}^2\); if
   not, the procedure is repeated until a draw of $\hat\Sigma$ is accepted. Once
   accepted we have an estimation of $\Sigma$ for this iteration and denote the draw:
   \begin{equation*}
   \hat{\Sigma}_{(i)}=\begin{bmatrix}
   \hat{\sigma}^2_{1(i)} & \hat{\sigma}_{12(i)} \\
   \hat{\sigma}_{12(i)} & \hat{\sigma}^2_{2(i)} \end{bmatrix}.
   \end{equation*}
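
These steps can be sketched in `R` using `stats::rWishart()`, exploiting the
fact that the inverse of a Wishart draw with the inverted scale matrix is a
draw from the corresponding inverse Wishart. The function name is our own and
the sketch assumes a bivariate SSCP matrix `S` from a control sample of size `n`:

```{r calpriorsketch, echo = TRUE, eval = FALSE}
draw_sigma_calibrated <- function(S, n) {
  S_star <- (n - 2) * S / (n - 1)  # Step 1: rescale the SSCP matrix
  repeat {
    # Step 2: inverse Wishart draw on n - 2 degrees of freedom
    sigma_hat <- solve(rWishart(1, df = n - 2, Sigma = solve(S_star))[, , 1])
    rho_hat <- sigma_hat[1, 2] / sqrt(sigma_hat[1, 1] * sigma_hat[2, 2])
    # Step 3: accept the draw only if u^2 <= 1 - rho^2
    if (runif(1)^2 <= 1 - rho_hat^2) return(sigma_hat)
  }
}
```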

Even though this latter calibrated prior is recommended, some might argue that frequentist
properties are of no concern and therefore choose the "standard theory" prior.
But regardless of the prior chosen, when we have an estimate of $\Sigma$, the following
steps are iterated to obtain an estimate of $p$, the percentage of controls expected
to exhibit a more extreme difference between the variates than the case. 

1.  Generate two draws from a standard normal distribution and denote them \(z_{r1}\) and \(z_{r2}\).
    Find the lower triangular matrix $\pmb{T}$ such that $\pmb{T}\pmb{T}' = \hat\Sigma_{(i)}$ using
    Cholesky decomposition. The estimate of \(\pmb{\mu}\) for this iteration is then:
    \begin{equation*}
    \pmb{\hat{\mu}}_{(i)} = \begin{pmatrix}
    \hat{\mu}_{1(i)} \\
    \hat{\mu}_{2(i)} \end{pmatrix} = \begin{pmatrix}
    \overline{y}_1 \\
    \overline{y}_2 \end{pmatrix}+ \pmb{T} \begin{pmatrix}
    z_{r1} \\
    z_{r2} \end{pmatrix} / \sqrt{n}.
    \end{equation*}
    See for example @gelmanBayesianDataAnalysis2013 (p. 580).

2.  We now have estimates of both $\pmb{\mu}$ and $\Sigma$ and calculate
    $p$ as if these were the true population parameters. The method to do so
    differs slightly depending on whether a standardised or unstandardised test
    is required. For an unstandardised test, the size of the difference score is calculated
    as:
    \begin{equation*}
    z_{(i)}^* = \frac{(y_1^* - \hat{\mu}_{1(i)}) - (y^*_2 - \hat{\mu}_{2(i)})}
    {\sqrt{\hat{\sigma}^2_{1(i)}+\hat{\sigma}^2_{2(i)}-2\hat{\sigma}_{12(i)}}}.
    \end{equation*}
    More commonly we would want to calculate the difference score from standardised
    variates. If this is the case, put:
    \begin{equation*}
    z_{1(i)} = \frac{y_1^* - \hat{\mu}_{1(i)}}{\sqrt{\hat{\sigma}^2_{1(i)}}}, \ z_{2(i)} = \frac{y_2^* -
    \hat{\mu}_{2(i)}}{\sqrt{\hat{\sigma}^2_{2(i)}}}, \ \hat{\rho}_{(i)} = \frac{\hat{\sigma}_{12(i)}}{\sqrt{\hat{\sigma}^2_{1(i)}\hat{\sigma}^2_{2(i)}}}
    \end{equation*}
    and
    \begin{equation*}
    \hat{z}^*_{(i)} = \frac{z_{1(i)} - z_{2(i)}}{\sqrt{2-2\hat{\rho}_{(i)}}}.
    \end{equation*}

3.  The estimate of $p$, \(\hat{p}_{(i)}\), for this iteration is then the tail area of a standard
    normal distribution more extreme than $z^*_{(i)}$. 

Iterate the procedure a large number of times and save the estimations for each
iteration. This will yield a distribution of $\hat{p}$ and $\hat{z}^*$, the
quantiles of which again are used to calculate the boundaries of the credible
intervals for $p$ and $Z_{DCC}$. The point estimate of the former is the mean of
the estimated distribution, while the point estimate of the latter is that of
(\@ref(eq:PJ)).
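
Given a draw \(\hat\Sigma_{(i)}\) from either posterior, one iteration of the
standardised version of these steps might be sketched as follows (the function
name is our own; `y_star` and `y_bar` are assumed to hold the two case scores
and the two control means):

```{r bsdtitersketch, echo = TRUE, eval = FALSE}
bsdt_iteration <- function(y_star, y_bar, sigma_hat, n) {
  T_mat <- t(chol(sigma_hat))  # lower triangular T with T %*% t(T) = sigma_hat
  mu_hat <- c(y_bar + T_mat %*% rnorm(2) / sqrt(n))      # Step 1
  z1 <- (y_star[1] - mu_hat[1]) / sqrt(sigma_hat[1, 1])  # Step 2: standardise
  z2 <- (y_star[2] - mu_hat[2]) / sqrt(sigma_hat[2, 2])
  rho_hat <- sigma_hat[1, 2] / sqrt(sigma_hat[1, 1] * sigma_hat[2, 2])
  z_diff <- (z1 - z2) / sqrt(2 - 2 * rho_hat)
  pnorm(z_diff)                # Step 3: one-sided tail area
}
```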

### Example usage in `singcar` {#bayesing}

Testing for a deficit using the Bayesian test of deficit, call `BTD()`:
```{r btdcode, echo=TRUE, cache=TRUE}
BTD(case = caseA, controls = contA, alternative = "less",
     iter = 10000, int_level = 0.95)
```
As can be seen, compared to `TD()` this function has the additional argument `iter`, which
indicates the number of iterations to be used for the posterior approximation. 
This number should be based on desired accuracy of the analysis.

Testing for a dissociation using the BSDT (recommended over RSDT), call:
```{r bsdtcode, echo = TRUE, cache = TRUE}
BSDT(case_a = caseA, case_b = caseB, controls_a = contA, controls_b = contB,
      alternative = "two.sided", iter = 10000, unstandardised = FALSE,
      calibrated = TRUE)
```
This function has several additional arguments compared to `RSDT()`. 
Instead of having a separate function for an unstandardised
Bayesian difference test one can specify whether to standardise
the variates or not within the function. If `unstandardised` is
set to `TRUE`, the results from `BSDT()` will converge to that of 
`UDT()`. To use the "standard theory" prior instead of the calibrated, 
set `calibrated` to `FALSE`.

### Tests using Bayesian regression to allow for covariates {#cov}

It is seldom possible to design perfect experiments. This is especially true in
psychological science, where causal relationships can be confounded by a great
variety of factors. When comparing a case to a control sample it is therefore
important to reduce noise by matching the control group to the case as well as
possible on variates beyond the ones of interest, such as age or education
level. This, however, makes the collection of control samples cumbersome.

However, this matching of covariates can instead be achieved statistically.
@crawfordComparingSingleCase2011 describe methods where they extend the 
tests presented in \@ref(sec22) to allow for the inclusion of covariates
using Bayesian regression. These methods thus allow you to compare the case's
score on the task(s) of interest against the control population conditioned
on the controls having the same values as the case on the covariate(s).
This both reduces noise and eases the collection of control data.
But because one degree of freedom is lost for each included covariate,
it is advisable, as a rule of thumb, to only include covariates with a correlation
to the variate(s) of interest larger than $0.3$ [@crawfordComparingSingleCase2011].

We assume that the variates of interest are normally distributed, but no assumption
about the distribution of the covariates is made. The procedure to get an estimate of
$p$ in the presence of covariates is presented below for the general case, that is,
whether we are interested in a deficit or a dissociation. The main difference
between the methods is the recommended prior. 

From a sample of size $n$ we obtain measurements on $k = 1, 2$ variates of interest
and $m$ covariates, denoted \(\boldsymbol{X} = (X_1, \dots, X_m)\).

We wish to estimate the regression coefficients for each covariate on the variate(s)
of interest:
\[
\boldsymbol{B} = \begin{bmatrix}
\boldsymbol{\beta}_1 & \cdots & \boldsymbol{\beta}_k
\end{bmatrix}.
\]
Here $\boldsymbol{B}$ is an $(m+1) \times k$ matrix where each column contains
the regression coefficients for one variate of interest, the first
row being the intercept(s). Further, we wish to estimate the covariance matrix
or variance for the variate(s) of interest conditioned upon the covariates;
these statistics are assumed not to vary with the covariates.
\[
\Sigma \ | \ \boldsymbol{X} =
\begin{bmatrix}
\sigma^2_1 \ | \ \boldsymbol{X} & \rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} \\
\rho\sigma_1\sigma_2 \ | \ \boldsymbol{X} & \sigma^2_2 \ | \ \boldsymbol{X}
\end{bmatrix}  \ \text{or} \ \left[\sigma^2 | \boldsymbol{X} \right].
\]

If \(\boldsymbol{X}\) is the \(n \times (m+1)\) design matrix 
and \(\boldsymbol{Y}\), the \(n \times k\) response matrix
\[
\boldsymbol{X} =
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1m} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \cdots & x_{nm}
\end{bmatrix},\ \
\boldsymbol{Y} =
\begin{bmatrix}
 y_{11} & \cdots & y_{1k} \\
 \vdots & \ddots & \vdots \\
 y_{n1} & \cdots & y_{nk}
\end{bmatrix}
\]
the data estimates of \(\boldsymbol{B}\) and \(\Sigma\) are 
\[
\boldsymbol{B}^* = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{Y} \ \text{and} \
\Sigma^* = \frac{1}{n-m-1}(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*)'(\boldsymbol{Y}-\boldsymbol{X}\boldsymbol{B}^*).
\]

When $k=1$ or the "standard theory" prior is used for $k=2$ the posterior
of $\Sigma$ is an inverse Wishart with scale matrix \((n-m-1)\Sigma^*\)
on $n-m-1$ or $n-m$ degrees of freedom for $k=1$ and $k=2$ respectively
[@crawfordComparingSingleCase2011; @tiaoBayesianEstimationMultivariate1964].
Hence, to get an estimate $\hat{\Sigma}_{(i)}$ of $\Sigma$ we generate
a draw from these distributions. 

For the "calibrated" prior we instead follow the procedure outlined in Section
\@ref(BSDT) and use \(\pmb{S}^* = (n-m-2)\pmb{S}/(n-m-1)\) as scale
matrix to sample from an inverse Wishart on $n-m-2$ degrees of freedom,
\(\pmb{S} = (n-m-1)\Sigma^*\). The estimates are then:

\begin{equation}
\hat{\Sigma}_{(i)}=[\hat{s}^2_{(i)}] \ \text{when} \ k=1 \ \text{and} \
\hat{\Sigma}_{(i)} =
\begin{bmatrix}
\hat{s}^2_{1(i)} & \hat{s}_{12(i)}  \\
\hat{s}_{12(i)}  & \hat{s}^2_{2(i)}
\end{bmatrix} \ \text{when} \ k=2. \
(\#eq:sigma)
\end{equation}

We turn the \((m+1) \times k\) matrix \(\boldsymbol{B}^*\) into a \(k(m + 1)
\times 1\) vector by concatenating the columns in \(\boldsymbol{B}^*\).
\(\boldsymbol{B}^*_{vec} = (\boldsymbol{\beta}^{*'}_1,
...,\boldsymbol{\beta}^{*'}_k)'\) is the estimate of $\boldsymbol{B}_{vec}$.
The posterior of $\boldsymbol{B}_{vec}| \hat{\Sigma}_{(i)}$ is a multivariate
normal distribution with mean $\boldsymbol{B}^*_{vec}$ and covariance
matrix \(\boldsymbol{\Lambda}_{(i)} =\boldsymbol{\hat\Sigma}_{(i)} \otimes (\boldsymbol{X'X})^{-1}\),
where $\otimes$ denotes the Kronecker product. To get the estimate $\hat{\boldsymbol{B}}_{(i)}$
we generate a random observation from this distribution, where the $k(m+1) \times 1$ vector
of obtained values 
\(\hat{\boldsymbol{B}}^*_{\text{vec(i)}} =(\hat{\boldsymbol{\beta}}'_{1(i)}, ...,\hat{\boldsymbol{\beta}}'_{k(i)})'\)
is turned back into the \((m+1) \times k\) matrix
\(\hat{\boldsymbol{B}}_{(i)} = (\hat{\boldsymbol{\beta}}_{1(i)}, ...,\hat{\boldsymbol{\beta}}_{k(i)})\).
If $\boldsymbol{x}^*$ is the $(m+1) \times 1$ vector of the case values on the covariates
(with a leading 1 for the intercept), we obtain estimates of the conditional
case scores on the task(s) of interest as:
\[
\hat{\boldsymbol{\mu}}_{(i)} = \hat{\boldsymbol{B}}'_{(i)}\boldsymbol{x}^*.
\]
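
A sketch of one such draw of \(\boldsymbol{B}\) and the resulting conditional
means, for $k = 2$, could look as follows (the function name is our own; `X` and
`Y` are the design and response matrices defined above, `sigma_hat` a current
draw of $\Sigma$, and `x_star` the case covariate vector with a leading 1):

```{r bcovsketch, echo = TRUE, eval = FALSE}
library("MASS")  # for mvrnorm
draw_mu_cond <- function(X, Y, sigma_hat, x_star) {
  XtX_inv <- solve(t(X) %*% X)
  B_star <- XtX_inv %*% t(X) %*% Y            # least squares estimate of B
  Lambda <- kronecker(sigma_hat, XtX_inv)     # covariance of vec(B)
  B_vec_hat <- mvrnorm(1, mu = c(B_star), Sigma = Lambda)
  B_hat <- matrix(B_vec_hat, ncol = ncol(Y))  # back to an (m + 1) x k matrix
  c(t(B_hat) %*% x_star)                      # conditional case means
}
```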

With these values we can now estimate the conditional effect size of the case.
@crawfordComparingSingleCase2011 denote these effect measures \(Z_{CCC}\) and
\(Z_{DCCC}\) depending on whether $k=1$ or $k=2$. As the names indicate they are
related to \(Z_{CC}\) (\@ref(eq:zcc)) and \(Z_{DCC}\) (\@ref(eq:PJ)),
but calculated from statistics conditioned upon the covariates, hence the extra
C in the subscript.

If \(y^*_1\) is the case score on a variate $Y_1$,
we would calculate the conditional deficit (\(Z_{CCC}\)) of the case on variate $Y_1$ like so:
\begin{equation}
\hat{Z}_{CCC(i)} = \frac{y^*_1-\hat{\mu}_{(i)}}{\hat{s}_{(i)}}.
(\#eq:zccc) 
\end{equation}
If \(y^*_2\) is the case score on variate $Y_2$ and we are interested
in a dissociation between $Y_1$ and $Y_2$, we denote the two
conditional means on these variates \(\hat{\mu}_{1(i)}\) and \(\hat{\mu}_{2(i)}\).
\(Z_{DCCC}\) is then calculated like so:
\begin{equation}
\hat{Z}_{DCCC(i)} = \frac{\frac{y^*_1-\hat{\mu}_{1(i)}}{\hat{s}_{1(i)}}-\frac{y^*_2-\hat{\mu}_{2(i)}}
{\hat{s}_{2(i)}}}{\sqrt{2-2\hat{\rho}_{12(i)}}}.
(\#eq:zdccc)
\end{equation}
The conditional standard deviations \(\hat{s}_{1(i)}\) and \(\hat{s}_{2(i)}\)
are obtained from \(\hat{\Sigma}_{(i)}\) in (\@ref(eq:sigma)), and the 
conditional correlation between the variates ($\hat{\rho}_{12(i)}$) is given by
\[\hat{\rho}_{12(i)} = \frac{\hat{s}_{12(i)}}{\sqrt{\hat{s}^2_{1(i)}\hat{s}^2_{2(i)}}}.\]

To get estimates of $p$ we find the tail area under the standard normal distribution
more extreme than \(\hat{Z}_{CCC(i)}\) or \(\hat{Z}_{DCCC(i)}\), depending
on the problem and alternative hypothesis specified. If testing for a deficit we would
for example have:
\[
\hat{p}_{(i)}= \Phi(\hat{Z}_{CCC(i)}).
\]

Iterate the procedure a large number of times and save the estimations for each
iteration. The mean of the resultant distribution of $\hat{p}$ is taken as the point
estimate of $p$. Point estimates of $Z_{CCC}$ and $Z_{DCCC}$ are obtained by
using (\@ref(eq:zccc)) and (\@ref(eq:zdccc)) with the conditional means,
standard deviations and correlation calculated directly from the control sample.
The $1-\alpha$ credible interval boundaries for each estimate are obtained from the
quantiles of the estimated distributions as described for the previous methods.

### Example usage in `singcar` {#covsing}

To specify a Bayesian test of deficit with one or more covariates, call:
```{r btdcov, echo=TRUE, cache=TRUE}
BTD_cov(case_task = caseA, case_covar = caseAGE, control_task = contA,
         control_covar = contAGE, alternative = "less", int_level = 0.95,
         iter = 10000, use_sumstats = FALSE)
```
For the case, the covariates should be specified as a single value or a
vector of values containing the case scores on each covariate included.
For the controls, the covariates should be specified as a vector or matrix
where each column represents one of the covariates. 

To specify a Bayesian test for a dissociation with one or more covariates present, call:
```{r bsdtcov, echo=TRUE, cache=TRUE}
BSDT_cov(case_tasks = c(caseA, caseB), case_covar = caseAGE, 
          control_tasks = cbind(contA, contB), control_covar = contAGE, 
          alternative = "two.sided", int_level = 0.95, iter = 10000,
          calibrated = TRUE, use_sumstats = FALSE)
```
Here, the case scores as well as the control scores
from variates of interest must be concatenated. The covariates are specified 
in the same manner as for `BTD_cov()`. The prior used is again specified with
the argument `calibrated`.

If using summary statistics for these functions, the argument `use_sumstats`
must be set to `TRUE`. For information on how to specify the data in that case,
consult the documentation.

## Using mixed effects models for repeated measures {#lmm}

It should be noted that the TD of course is nothing other than a linear
regression model with a dummy coded variable for belonging to the patient group.
This equivalence might seem trivial but is included here for completeness. If
$\boldsymbol{y} = (y^*_1, y_2,...,y_n)$ is the vector of scores for the case $y^*_1$ and
the control sample $(y_2,...,y_n)$ on the variate of interest, then $\boldsymbol{x} =
(1, 0, ..., 0)$ is a dummy coded variable indicating inclusion in the patient
group. The equation for a regression model with these variables is then:
$$
y_i = \beta_0+\beta_1x_i+\epsilon_i,
$$
where the error term $\epsilon_i\sim\mathcal{N}(0, \sigma^2)$. The least squares
estimate of $\beta_1$, $\hat{\beta}_1$, equals the numerator in the equation for 
the TD, (\@ref(eq:TD)). We have:
\begin{equation}
\begin{split}
\hat{\beta}_1 & = \frac{\frac{1}{n}\sum^n_{i =1}(x_i-\overline{x})(y_i-\overline{y})}{\frac{1}{n}\sum^n_{i = 1}(x_i-\overline{x})^2} =\frac{\sum^n_{i =1}y_ix_i-\frac{1}{n}\sum^n_{i =1}y_i\sum^n_{i =1}x_i}{\sum^n_{i=1}(x_i-\overline{x})^2} \\
& =\frac{\frac{ny^*_1-\sum^n_{i=1}y_i}{n}}{\left(1-\frac{1}{n}\right)^2+(n-1)\left(-\frac{1}{n}\right)^2} 
 =\frac{\frac{ny^*_1-(y^*_1+\sum^n_{i=2}y_i)}{n}}{\frac{n-1}{n}} 
 =\frac{(n-1)y^*_1-\sum^n_{i=2}y_i}{n-1} \\
& =y^*_1-\frac{\sum^n_{i=2}y_i}{n-1} 
 =y^*_1-\overline{y}_c.
\end{split}
\end{equation}
Here $\overline{y}_c$ denotes the sample mean of the controls, which is 
also the least squares estimate of the intercept, $\hat{\beta}_0$:
\begin{equation}
\begin{split}
\hat{\beta}_0 &= \overline{y}-\hat{\beta}_1\overline{x} = \frac{\sum^n_{i=1}y_i}{n} -\frac{ny^*_1-\sum^n_{i=1}y_i}{(n-1)}\frac{1}{n} \\
&= \frac{(n-1)\sum^n_{i=1}y_i-ny^*_1+\sum^n_{i=1}y_i}{n(n-1)}\\
&= \frac{n\sum^n_{i=1}y_i-ny^*_1}{n(n-1)}
= \frac{\sum^n_{i=2}y_i}{n-1} = \overline{y}_c.
\end{split}
\end{equation}

To test the slope in a simple regression model one uses the statistic
\begin{equation}
t_{n-2} = \frac{\hat{\beta}_1}{SE\left(\hat{\beta}_1\right)}
(\#eq:regt)
\end{equation}
following a $t$ distribution on $n-2$ degrees of freedom, given that the null
hypothesis is true. The degrees of freedom for this distribution are of course
equal to $n_c-1$, where $n_c$ is the size of the control sample.
Remember that the denominator in the TD was 
$s_c \sqrt{\frac{n_c + 1}{n_c}}$, where $s_c$ is the standard deviation of the control
sample. We know that the standard error for the slope coefficient
is:
$$
SE\left(\hat{\beta}_1\right) = \sqrt{\frac{\frac{1}{n-2}\sum^n_{i =1}\hat{\epsilon}^2_i}{\sum^n_{i=1}(x_i-\overline{x})^2} }
= \sqrt{\frac{\frac{1}{n-2}\sum^n_{i=1}(y_i-\hat{y}_i)^2}{\frac{n-1}{n}}}.
$$
Furthermore, $\hat{y}_i = \hat{\beta}_0+\hat{\beta}_1x_i$ and
$\hat{y}_1 = \overline{y}_c+(y^*_1-\overline{y}_c) = y^*_1$,
hence:
\begin{equation}
\begin{split}
SE\left(\hat{\beta}_1\right) 
&= \sqrt{\frac{\frac{1}{n-2}\left[\sum^n_{i=2}(y_i-\hat{y}_i)^2+\cancel{(y^*_1-\hat{y}_1)^2}\right]}{\frac{n-1}{n}}} \\
&= \sqrt{\frac{\sum^n_{i=2}(y_i-\hat{y}_i)^2}{n-2}}\sqrt{\frac{n}{n-1}} = s_c\sqrt{\frac{n_c+1}{n_c}}.
\end{split}
(\#eq:regsd)
\end{equation}
As such the $t$ test for the slope in the regression equation and the test of 
deficit are identical. To get $Z_{CC}$ if using the regression approach simply
divide $\hat{\beta}_1$ by the standard deviation of the sample, 
$Z_{CC} = \hat{\beta}_1/s_c$. 
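
This equivalence is easy to verify numerically on simulated data (the variable
names below are arbitrary):

```{r tdlmcheck, echo = TRUE, eval = FALSE}
set.seed(123)
controls <- rnorm(15, mean = 100, sd = 15)
y <- c(60, controls)              # case score first
x <- c(1, rep(0, 15))             # dummy coding case membership
t_reg <- summary(lm(y ~ x))$coefficients["x", "t value"]
t_td <- (60 - mean(controls)) / (sd(controls) * sqrt((15 + 1) / 15))
all.equal(t_reg, t_td)            # should return TRUE
```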

The fact that we can use simple linear regression instead of the TD to model
case abnormality is in itself not very interesting. However, one can
include covariates without going through the trouble of performing Bayesian
regression for the methods as described in Section \@ref(cov). To compute the
conditional effect size $Z_{CCC}$ in this case one must first calculate the
conditional mean and standard deviation from the control sample. The conditional
mean can be predicted from the fitted model given the case values on the
covariates and the conditional variance can be computed like so:
$$
\sigma^2|X = \frac{\sum^n_{i=2}\epsilon_i^2}{n-2},
$$
where $X$ denotes the covariates and $\epsilon_i$ the residuals from the fitted
model. Note also that the index starts at 2 to leave out the case. $Z_{CCC}$ is then:
$$
Z_{CCC} = \frac{y_1^*-\mu|X}{\sqrt{\sigma^2|X}}.
$$
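
This computation can be sketched with a single covariate by fitting the model
on the controls only (hypothetical function and variable names; note that this
sketch uses the residual degrees of freedom from the control-only fit as the
divisor for the conditional variance):

```{r zccclmsketch, echo = TRUE, eval = FALSE}
z_ccc_lm <- function(case_score, case_covar, control_scores, control_covars) {
  dat <- data.frame(score = control_scores, covar = control_covars)
  fit <- lm(score ~ covar, data = dat)     # fit on the controls only
  mu_cond <- predict(fit, newdata = data.frame(covar = case_covar))
  sigma2_cond <- sum(residuals(fit)^2) / fit$df.residual
  unname((case_score - mu_cond) / sqrt(sigma2_cond))
}
```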

More importantly though, we can generalise the linear regression to 
linear mixed effects models [@west2014linear]. With linear
mixed effects models (LMM) one can model data with dependency between
observations, opening up the possibility of using repeated measures data in
single case comparisons and other, more complex, designs. This is because
LMM can model both fixed and random effects (hence "mixed"). Fixed effects are
in this case constant across participants, while random effects are
specified for observations that are somehow more related to each other (e.g. 
several measurements from the same participant or hierarchically clustered
observations). The concept of applying LMM to single case comparisons was
first investigated by @huberComparingSingleCase2015a.

Since impairments on tasks converging on a cognitive construct of interest
provide stronger evidence of a potential deficit, the ability to model
random effects successfully is of great importance. Such a model would take the
form:
\begin{equation}
y_{ij} = \beta_0 + \beta_1x_i + v_{0i}+\epsilon_{ij}, \  v_{0i} \sim \mathcal{N}(0, \sigma^2_v) \ \text{and} \ \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
(\#eq:lmm1)
\end{equation}
Here $i$ indicates participant and $j$ indicates task or item. So, $y_{ij}$ is
then the response of participant $i$ for task/item $j$. $\beta_0$ and $\beta_1$ are
still the intercept of the sample and raw effect of the case while $v_{0i}$
denotes the random intercept for each participant. As such the term allows for
participants to vary from the average intercept of the sample. However, while
the model in (\@ref(eq:lmm1)) accounts for participant variability, we
must include a further term to account for task/item variability:
\begin{equation}
 y_{ij} = \beta_0 + \beta_1x_i + v_{0i} + w_{0j} +\epsilon_{ij}, \\ 
 v_{0i} \sim \mathcal{N}(0, \sigma^2_v), \ w_{0j} \sim \mathcal{N}(0, \sigma^2_w) \ \text{and} \ \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
(\#eq:lmm2)
\end{equation}
where $w_{0j}$ denotes the random intercept for tasks/items, which should be included
to avoid inflation of Type I errors [@baayenMixedeffectsModelingCrossed2008]. 

A common method to obtain $p$ values for LMMs is to use likelihood ratio tests.
However, @huberComparingSingleCase2015a show that the Type I error rate for this
method is at an acceptable level only when the size of the control sample is
larger than 50. In contrast, using parametric bootstrapping or one-tailed $t$
tests with the degrees of freedom approximated by the Satterthwaite method kept
the error rate at its nominal level (or rather somewhat above it, implying that a
slightly more conservative $\alpha$ level should be considered) even for small
control sample sizes. As a reference they compared error rate and statistical
power of the mixed model to the TD using aggregated data, i.e. taking the mean
of all tasks/items for each participant, which is what @crawfordComparingIndividualTest1998
suggests for this type of data. The $p$
value of $\beta_1$ in (\@ref(eq:lmm2)) using the Satterthwaite method
will be approximately equal to the two sided $p$ value from the TD if averaging the scores for
each participant over the tasks/items, given a large enough sample size.

Furthermore, because we can model data with dependency between observations
with LMM it is also possible to model effects within participants, i.e.
between conditions. The model equation would then be:
\begin{equation}
 y_{ij} = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}+ v_{0i} + w_{0j} +\epsilon_{ij}, \\ 
 v_{0i} \sim \mathcal{N}(0, \sigma^2_v), \ w_{0j} \sim \mathcal{N}(0, \sigma^2_w) \ \text{and} \ \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
(\#eq:lmm3)
\end{equation}
where $\beta_2$ is the raw effect of the condition indicated by the categorical
variable $x_2$. Introducing an interaction term between $x_1$ and $x_2$ in this model 
we can even test for the presence of a dissociation. This model would take the form:
\begin{equation}
 y_{ij} = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}+\beta_3(x_{1i}x_{2i})+ v_{0i} + v_{1i} + w_{0j} +\epsilon_{ij}, \\
 \begin{pmatrix}
 v_{0i} \\
 v_{1i}
 \end{pmatrix}
  \sim \mathcal{N}\left(
  \begin{bmatrix}
 0 \\
 0
 \end{bmatrix},
 \Sigma_v = 
 \begin{bmatrix}
 \sigma^2_{v_0} &  \sigma_{v_0}\sigma_{v_1} \\
 \sigma_{v_0}\sigma_{v_1} & \sigma^2_{v_1}
 \end{bmatrix} \right), \ w_{0j} \sim \mathcal{N}(0, \sigma^2_w) \ \text{and} \ \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
(\#eq:lmm4)
\end{equation}
where $\beta_3$ is the raw effect of the dissociation and $v_{1i}$ is
the random slope of the participants. That is, $v_{1i}$ accounts for 
the variability of the effect of the condition, between participants. 
A caveat of using LMM is that the model requires more observations
than specified parameters to be identifiable. If specifying random slopes
and random intercepts for each participant, one must thus have more than $2n$
observations, implying that this would not be a good model for testing a
dissociation between two tasks without repeated measurements within each task.

There are already several implementations of LMMs in various open source
software which can be used directly in the context of single case comparisons
as described above. Therefore this methodology has not been implemented in 
`singcar`. However, due to the usefulness of the concept and the lack 
of methodology in `singcar` to handle repeated measures data, we will give
a brief description of how one can specify an LMM in `R` using the package
`lme4` [@batesFittingLinearMixedeffects2015] and 
`lmerTest` [@kuznetsovaLmerTestPackageTests2017] (the latter is used to 
obtain $p$ values from $t$ tests with Satterthwaite approximation of 
the degrees of freedom).

### Example usage in `R` {#lmmexample}

Begin by installing and loading required packages:
```{r pkgload, eval=FALSE, echo = TRUE}
install.packages(c("lme4", "lmerTest", "MASS"))
library("lme4")
library("lmerTest")
library("MASS") # For multivariate simulation
```

Because the example data are not suited for repeated measures analysis,
a simulated dataset will be used. Say that we have 30 measurements from
four normally distributed variates $Y_i$, $i = 1,2,3,4$. Call these variates items. 
They are thought to converge on some cognitive ability of interest and have the distribution:
$$
\begin{pmatrix}
Y_1 \\
Y_2 \\
Y_3 \\
Y_4
\end{pmatrix} \sim \mathcal{N}\left(
\begin{bmatrix}
100 \\
80 \\ 
50 \\
20 
\end{bmatrix}, 
\begin{bmatrix}
15^2 & 108 & 78 & 30 \\
108 & 10^2 & 12 & 15 \\
78 & 12 & 10^2 & 25 \\
30 & 15 & 25 & 5^2
\end{bmatrix}
\right).
$$
The case has incurred a deficit of two standard deviations on all four variates.
These data could then be simulated as follows:
```{r simdatlmm, echo = TRUE}
simdata <-  expand.grid(case = c(1, rep(0, 29)), item = factor(1:4))
simdata$ID <- factor(rep(1:30, 4)) 

set.seed(123) # For replicability
mu <- c(100, 80, 50, 20)  
sigma <- matrix(c(15^2, 108, 78, 30,
                  108, 10^2, 12, 15,
                  78,  12, 10^2, 25,
                  30,  15,  25,  5^2), nrow = 4, byrow = TRUE)
dv <- mvrnorm(30, mu = mu, Sigma = sigma) # Dependent variable

dv[1, ] <- dv[1, ] + c(-30, -20, -20,  -10) # Case scores
simdata$dv <- c(dv)
```

The linear mixed model to test the deficit of the case would then
be
```{r lmm1, echo=TRUE}
model1 <- lmer(dv ~ case + (1|ID) + (1|item), data = simdata)
summary(model1)[["coefficients"]]
```
Here, `(1|ID)` and `(1|item)` specify the random effects structure,
in this case the random intercepts for participant and item.
This is approximately the same $t$ statistic that would be produced
by the test of deficit if averaging the scores for each participant:
```{r tdcomp, echo = TRUE}
agg_data <- rowMeans(dv)
TD(agg_data[1], agg_data[-1], alternative = "two.sided")[["statistic"]]
```

Now, say that we want to model a dissociation using LMM. We have 
collected 30 measurements from two items in each of two different
conditions. The two conditions are thought to tap two independent
cognitive abilities, while the items collected within each condition
converge on the same ability. Hence we have four variates
$Y_{ij}$, $i=1,2$, $j=1,2$, distributed according to the multivariate
distribution: 
\begin{equation}
\begin{pmatrix}
Y_{11} \\
Y_{21} \\
Y_{12} \\
Y_{22}
\end{pmatrix} \sim \mathcal{N}\left(
\begin{bmatrix}
100 \\
80 \\ 
50 \\
20 
\end{bmatrix}, 
\begin{bmatrix}
15^2 & 120 & 45 & 7 \\
120 & 10^2 & 11 & 15 \\
45 & 11 & 10^2 & 40 \\
7 & 15 & 40 & 5^2
\end{bmatrix}
\right),
(\#eq:lmmdist)
\end{equation}
where, e.g., $Y_{12}$ is the random variable
corresponding to the first item in the second condition.
Given that the two conditions in fact tap different cognitive 
structures, the incurred impairment of the case would potentially
affect performance in one of the conditions more than the other.
Hence, we impose a deficit of two standard deviations on the case scores
in the first condition only.
The data can then be simulated as follows:
```{r simdatlmm2, echo = TRUE}
simdata <-  expand.grid(case = c(1, rep(0, 29)),
                        item = factor(1:2), condition = factor(1:2))
simdata$ID <- factor(rep(1:30, 4)) 
contrasts(simdata$condition) <- contr.sum(2) # Effect coding

set.seed(123) # For replicability
mu <- c(100, 80, 50, 20)  
sigma <- matrix(c(15^2, 120,    45,  7,
                  120, 10^2,    11, 15,
                  45,    11,  10^2, 40,
                  7,     15,    40, 5^2), nrow = 4, byrow = TRUE)
dv <- mvrnorm(30, mu = mu, Sigma = sigma) # Dependent variable
dv[1, ] <- dv[1, ] + c(-30, -20, -0,  -0) # Case scores
simdata$dv <- c(dv)
```
The fully crossed linear mixed model used to test a dissociation for these
data would then be specified like so:
```{r lmm2, echo = TRUE}
model2 <- lmer(dv ~ case*condition + (condition|ID) + (1| item), data = simdata)
round(summary(model2)[["coefficients"]], 5)
```
Comparing the $p$ value of the interaction effect to the $p$ value 
obtained from `RSDT()` we see that they are far from equivalent. 
```{r rsdtcomp, echo=TRUE}
agg_data1 <- rowMeans(dv[ , 1:2])
agg_data2 <- rowMeans(dv[ , 3:4])
RSDT(agg_data1[1], agg_data2[1], agg_data1[-1], agg_data2[-1])[["p.value"]]
```
However, a small simulation study conducted with the data described above
revealed that the RSDT and the LMM yield comparable results regarding power and
Type I error. The simulation function `powerLMM()` used below takes the specified
impairments of the case for each variate and returns the proportion
of significant interaction effects from (\@ref(eq:lmm4)) out of
the total number of simulations run, as well as the same proportion
obtained from either the RSDT or the BSDT. The data in this small
simulation study follow the same distribution as (\@ref(eq:lmmdist)).

```
R> powerLMM(nsim = 1000, case_impairment = c(-30, -20, -0, -0),
+ compare_against = "RSDT", alpha = 0.05)
RSDT LMM
1 0.353 0.462
```

It seems that the method based on the linear mixed model has quite a 
good power advantage over the RSDT for data simulated from this
particular distribution. To compare the error rates of the two tests
we can use the same ratio as for power, computed when there is no true 
effect present. In this case that would be either when the case
has incurred no impairment at all or when the impairments are 
equal in both conditions.
```
R> powerLMM(nsim = 1000, case_impairment = c(-0, -0, -0, -0),
+ compare_against = "RSDT", alpha = 0.05)
RSDT LMM
1 0.032 0.039
```
Even though a power advantage of the mixed model was observed, it also seems to
produce an error rate below the nominal level of $0.05$. If this were
consistent behaviour it would eliminate the usefulness of the RSDT. However, if
the case exhibits extreme impairments in both conditions we have another
null scenario. This was discussed previously in Section \@ref(sec22), where it
was noted that one of the reasons @crawfordComparisonSingleCase2007 developed a
Bayesian approach to test for dissociations was the sharply increasing error rate
of the RSDT for increasing deficits on both tasks with no discrepancy between
them. Say that the case has impairments of six standard deviations on both items
in both conditions, and let us compare the LMM to the BSDT instead of the RSDT:
```
R> powerLMM(nsim = 1000, case_impairment = c(-90, -60, -60, -30),
+ compare_against = "BSDT", alpha = 0.05)
BSDT LMM
1 0.053 0.61
```
The BSDT seems to keep the error rate almost at its nominal level, while it is 
enormous for the LMM. Even though a more extensive simulation study with
various types of data structures should be performed to solidify these
findings, the preliminary results shown here indicate some limitations
of using LMMs for dissociations and reiterate the recommendation of 
@crawfordComparisonSingleCase2007 to use the BSDT to test for 
dissociations, even when using aggregated repeated measures data. 
Because the effects of brain damage can be severe, I would
recommend against using LMMs to model dissociations.

## Multivariate methods {#section4}

Up until now I have discussed univariate testing methods. However,
as discussed in the introduction, Section \@ref(section1), stronger
evidence for either deficits or dissociations is provided by using
multiple measures (tasks) for a cognitive function of interest. 
Conceptually, what we want to estimate or test in this scenario
is the distance between a vector of observations for a single case
and a vector of population means. A common way to do this is by 
using the Mahalanobis distance measure. If $\boldsymbol{y}^*$ is a vector
of task observations from a single case and $\boldsymbol{\mu}$ is the population
mean vector, the Mahalanobis distance, $\delta$, is a measure
of the distance between the case and the population mean, given by:
$$
\delta= \sqrt{(\boldsymbol{y}^*-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}^*-\boldsymbol{\mu})}.
$$

Given that the population follows a multivariate normal distribution 
we have that $\delta^2 \sim \chi^2_{\nu_1}$, where the degrees of freedom
$\nu_1$ equal the number of dimensions of the multivariate distribution. This
squared distance is sometimes referred to as the Mahalanobis *index*.
Because the purpose of case-comparison methodology is to estimate
abnormality, we are interested in $p$, that is, the proportion of the control
population that would exhibit a larger $\delta$ than the single case. If we
need to estimate $\boldsymbol{\mu}$ and $\Sigma$, the sample 
Mahalanobis distance is given by
$$
\hat\delta = \sqrt{(\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})},
$$
where $\overline{\boldsymbol{y}}$ is the sample mean vector and $\boldsymbol{S}$ the sample
covariance matrix. Using the Mahalanobis index between a case and the mean of
a normative sample, the abnormality is estimated with 
$$
\hat{p} = \mathbb{P}[(\boldsymbol{y}-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}-\overline{\boldsymbol{y}}) > (\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})].
$$
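As a concrete sketch of these quantities, the sample Mahalanobis index can be computed directly with base `R`; the control sample, covariance matrix and case values below are arbitrary assumptions for illustration:

```r
library("MASS") # for mvrnorm

set.seed(42)
# Simulated control sample of 20 participants on two tasks
controls <- mvrnorm(20, mu = c(0, 0), Sigma = matrix(c(1, 0.5, 0.5, 1), 2))
case <- c(-2, -1.5) # arbitrary case observations on the two tasks

# mahalanobis() returns the squared distance, i.e. the Mahalanobis index
lambda0 <- mahalanobis(case, center = colMeans(controls), cov = cov(controls))
delta_hat <- sqrt(lambda0) # the sample Mahalanobis distance
```

Here `lambda0` plays the role of $\hat\delta^2$ computed with the sample mean vector and covariance matrix.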

An obvious way to estimate $p$ would be to use Hotelling's $T^2$ test.
The $T^2$ statistic is proportional to an $F$-distributed statistic and hence its
related $p$ value seems like an intuitive choice as an estimate of $p$.
This was in fact proposed by @huizengaMultivariateNormativeComparisons2007,
but @elfadalyPointEstimationAbnormality2016 show that this estimate
is biased and instead devise several other candidates. None of these was shown to
be perfectly unbiased, but several of the estimates performed better than 
the estimate based on the $T^2$ statistic.

If we let $\lambda = \delta^2 = (\boldsymbol{y}^*-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}^*-\boldsymbol{\mu})$
and $\chi^2_{\nu_1}= (\boldsymbol{y}-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}-\boldsymbol{\mu})$ denote the case's
Mahalanobis index and the Mahalanobis index of a random observation $\boldsymbol{y}$ from the control population, respectively, 
what we want to estimate is: 
$$
p = \mathbb{P}[\chi^2_{\nu_1}> \lambda].
$$
If $\overline{\boldsymbol{y}}$ and
$\hat\Sigma = \frac{n-1}{n}\boldsymbol{S}$ denote the maximum likelihood estimates of $\boldsymbol{\mu}$ and
$\Sigma$, the Hotelling's $T^2$ based estimate of $p$, $\hat{p}_F$, is
$$
\hat{p}_F = \mathbb{P}\left[(\boldsymbol{y}-\overline{\boldsymbol{y}})'\hat\Sigma^{-1}(\boldsymbol{y}-\overline{\boldsymbol{y}})>(\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\hat\Sigma^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})\right].
$$
If we let $\lambda_0 = (\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\boldsymbol{S}^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})$, we
have that 
$$
T^2=\frac{n\lambda_0}{n+1} \ \text{and} \ \hat{p}_F = \mathbb{P}\left[F_{\nu_1, \nu_2}>\frac{\nu_2}{\nu_1(n-1)}T^2\right].
$$
That is, this estimate is the $p$ value associated with testing
the hypothesis that the case comes from the control population
using a Hotelling's $T^2$ test. As such, it is also often used as
a point estimate of $p$. 
However, we can also use the fact that the Mahalanobis index of a random
observation from the population follows a $\chi^2_{\nu_1}$ distribution,
taking the $p$ value from the $\chi^2$ distribution and using the sample
estimates only on the right-hand side of the inequality:
$$
\hat{p}_{\chi^2} = \mathbb{P}\left[(\boldsymbol{y}-\boldsymbol{\mu})'\Sigma^{-1}(\boldsymbol{y}-\boldsymbol{\mu})>(\boldsymbol{y}^*-\overline{\boldsymbol{y}})'\hat\Sigma^{-1}(\boldsymbol{y}^*-\overline{\boldsymbol{y}})\right].
$$
This reduces to:
$$
\hat{p}_{\chi^2} = \mathbb{P}\left[\chi^2_{\nu_1}>\frac{n}{n-1}\lambda_0\right].
$$
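Both plug-in estimates can be computed directly with base `R`'s distribution functions. The sketch below assumes $\nu_2 = n - \nu_1$; the example values of $\lambda_0$, $n$ and $\nu_1$ are arbitrary:

```r
# Plug-in estimates of p from a sample Mahalanobis index lambda0,
# with nu1 variates and n controls (assuming nu2 = n - nu1)
p_hat_estimates <- function(lambda0, n, nu1) {
  nu2 <- n - nu1
  T2 <- n * lambda0 / (n + 1)  # Hotelling's T^2
  p_F <- pf(nu2 / (nu1 * (n - 1)) * T2, nu1, nu2, lower.tail = FALSE)
  p_chi <- pchisq(n / (n - 1) * lambda0, df = nu1, lower.tail = FALSE)
  c(p_F = p_F, p_chi = p_chi)
}

p_hat_estimates(lambda0 = 0.4, n = 24, nu1 = 4)
```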
This is an intuitive estimator: it was shown to exhibit fairly low bias
(lower than $\hat{p}_F$) for low true values of $p$, and it had the smallest mean
square error of all estimators considered by
@elfadalyPointEstimationAbnormality2016. But unfortunately, as $p$ increases,
$\hat{p}_{\chi^2}$ starts to show substantial bias.

Both of these estimates are intuitive; however, since both are biased,
@elfadalyPointEstimationAbnormality2016 recommend using other estimators, chosen based
on the needs of the analysis. One of these is based on a modified confidence
interval for the Mahalanobis distance and two are based on quadrature polynomial approximations of
the $\chi^2$ distribution. In Section \@ref(pd) I describe two estimators
based on confidence intervals of $\delta$, both of which are implemented in `singcar`
along with $\hat{p}_F$ and $\hat{p}_{\chi^2}$.


### Estimation based on confidence intervals {#pd}

@reiserConfidenceIntervalsMahalanobis2001 suggested a method to construct
confidence intervals on $\delta^2$. Using this method @elfadalyPointEstimationAbnormality2016
propose a way to construct confidence intervals on $p$ as well. We have:
$$
F_0 = \frac{n\nu_2}{(n-1)\nu_1}\lambda_0 \sim F_{\nu_1, \nu_2}(n\lambda),
$$
that is, the sample statistic $F_0$ has a non-central $F$ distribution
on $\nu_1$ and $\nu_2$ degrees of freedom and non-centrality parameter
$n\lambda$. We call the value of $n\lambda$ for which $F_0$ is the $\alpha$-quantile
of this distribution $L_{\alpha}$. We can then construct the lower and upper boundaries
of a confidence interval for $p$ using the CDF of the $\chi^2$ distribution on $\nu_1$ degrees of 
freedom, denoted $G(.)$, as follows:
\begin{equation}
\left[1-G\left(\frac{L_{\alpha/2}}{n}\right), \ 1-G\left(\frac{L_{1-\alpha/2}}{n}\right)\right].
(\#eq:DCI)
\end{equation}
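This interval can be sketched in base `R` by inverting the non-central $F$ CDF with `uniroot()`. The helper below is an illustration only (assuming $\nu_2 = n - \nu_1$, with arbitrary example values), not the `singcar` implementation:

```r
# (1 - alpha) confidence interval for p from a sample index lambda0,
# with nu1 variates and n controls (assuming nu2 = n - nu1)
p_conf_int <- function(lambda0, n, nu1, alpha = 0.05) {
  nu2 <- n - nu1
  F0 <- n * nu2 / ((n - 1) * nu1) * lambda0
  # L(q): value of n*lambda at which F0 is the q-quantile of F(nu1, nu2, ncp)
  L <- function(q) {
    if (pf(F0, nu1, nu2, ncp = 0) <= q) return(0)  # no solution: boundary set to 0
    uniroot(function(ncp) pf(F0, nu1, nu2, ncp = ncp) - q,
            interval = c(0, 1000))$root
  }
  c(lower = 1 - pchisq(L(alpha / 2) / n, nu1),
    upper = 1 - pchisq(L(1 - alpha / 2) / n, nu1))
}

p_conf_int(lambda0 = 0.4, n = 24, nu1 = 4)
```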

Similarly, they suggest a point estimator of $p$ based on the median of 
$F_{\nu_1, \nu_2}(n\lambda)$. So that if we let $L_{0.5}$ be the value of 
$n\lambda$ where $F_0$ is the median of $F_{\nu_1, \nu_2}(n\lambda)$ the 
estimator is:
\begin{equation}
\hat{p}_D= 1-G\left(\frac{L_{0.5}}{n}\right).
(\#eq:pmed)
\end{equation}

Similarly, we can construct a confidence interval around $\lambda$ itself by 
choosing $\lambda_l$ and $\lambda_u$ such that
\begin{equation}
\mathbb{P}\left[F_{\nu_1,\nu_2}(n\lambda_l)\leq F_0\right] = 1-\frac{\alpha}{2}, \ \text{and} \  \mathbb{P}\left[F_{\nu_1,\nu_2}(n\lambda_u)\leq F_0\right] = \frac{\alpha}{2}.
(\#eq:freqci)
\end{equation}

However, this method has a drawback. Its coverage exactly matches
the nominal level, i.e. of the, say, 95% intervals (assuming $\alpha= 0.05$), 95% will in fact contain the
true value of $\delta$, as proved by
@wunderlichExactConfidenceIntervals2015. But $\mathbb{P}[F_{\nu_1,\nu_2}(n\lambda)\leq F_0]$
decreases as $\lambda$ increases, so when $F_0$ is small enough that
$\mathbb{P}[F_{\nu_1,\nu_2}(0)\leq F_0] \leq 1- \frac{\alpha}{2}$, we cannot
find a $\lambda_l$ such that $\mathbb{P}\left[F_{\nu_1,\nu_2}(n\lambda_l)\leq F_0\right] =1- \frac{\alpha}{2}$,
and it is instead set to 0. Similarly, we set $\lambda_u$ to 0 when $\mathbb{P}[F_{\nu_1,\nu_2}(0)\leq F_0] \leq \frac{\alpha}{2}$. But we know with full certainty that an interval of $[0, 0]$
will not contain $\lambda$, since $\boldsymbol{y}^*$ is continuous with $\mathbb{P}[\boldsymbol{y}^* = \boldsymbol{\mu}] = 0$.

This problem carries over to the construction of confidence intervals for $p$, 
(\@ref(eq:DCI)), as well as to the estimator $\hat{p}_D$, (\@ref(eq:pmed)).
The median of the distribution $F_{\nu_1, \nu_2}(n\lambda)$
of course decreases as the non-centrality parameter $n\lambda$ decreases. Since
$\lambda \geq 0$, the lower limit of the median of
$F_{\nu_1,\nu_2}(n\lambda)$ is the median of this distribution when 
the non-centrality parameter $n\lambda = 0$, i.e. the median of a *central*
$F$ distribution on $\nu_1$ and $\nu_2$ degrees of freedom. So, using the
approach of @reiserConfidenceIntervalsMahalanobis2001, if $F_0$ is 
smaller than this limit one sets $n\lambda$ to zero. 

This estimator is, however, biased when $F_0$ is small, even if $F_0$ is not
smaller than that lower limit.
@elfadalyPointEstimationAbnormality2016 give an example of this scenario where
we have $\nu_1 =4, \ \nu_2 = 20$ and $\lambda_0 = 0.4$, hence $n =24$ and 
$F_0 = \frac{24 \times 20}{23 \times 4}0.4 \approx 2.09$. The value of the non-centrality
parameter $n\lambda$ for which the median of $F_{\nu_1, \nu_2}(n\lambda)$ equals 2.09
is approximately 5.015, and $\hat{p}_D = 1-G_{4}\left(5.015/24\right) = 0.9949$, or 99.49%. 
So, using the estimated Mahalanobis index of $0.4$, we would infer that only 
0.51% of the population would exhibit a lower Mahalanobis index. The true
percentage of the population that would exhibit a smaller
Mahalanobis index, when 0.4 is the true case value, is $p = 1-G_4(0.4) = 0.9825$,
or 98.25%, which is quite a considerable discrepancy. 
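This worked example can be reproduced with base `R`; as a sketch, `uniroot()` searches for the non-centrality parameter at which the median of $F_{\nu_1,\nu_2}(n\lambda)$ equals $F_0$:

```r
nu1 <- 4; nu2 <- 20; n <- nu1 + nu2; lambda0 <- 0.4

F0 <- n * nu2 / ((n - 1) * nu1) * lambda0 # ~2.09

# Non-centrality parameter n*lambda at which F0 is the median of F(nu1, nu2, ncp)
L_50 <- uniroot(function(ncp) qf(0.5, nu1, nu2, ncp = ncp) - F0,
                interval = c(0, 100))$root # ~5.015

p_D <- 1 - pchisq(L_50 / n, df = nu1)   # ~0.9949
p_true <- 1 - pchisq(lambda0, df = nu1) # ~0.9825
```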

@garthwaiteModifiedConfidenceIntervals2017 proposed a solution to this in the
construction of confidence intervals of $\delta$ by modifying the method by
@reiserConfidenceIntervalsMahalanobis2001. They suggested to form a Bayesian
credible interval instead of the frequentist confidence interval, whenever
$F_0$ is small. Credible intervals have the more intuitive interpretation
that they *contain* the true value of interest with a probability of
$1-\alpha$. As such this interval will not give rise to unreasonable values
such as [0, 0]. But to form a credible interval, a prior must be specified.

Let $\boldsymbol{y}$ be a vector of values from an observation of the control population,
not necessarily the case. We denote the prior for $\delta$ when $\delta$
is the Mahalanobis distance of the case as $\pi(\delta)$, and when $\delta$
is the distance between the mean of the controls and a randomly selected observation
as $\psi(\delta)$. A priori, $\delta$ should be larger when it is calculated from
the observation of the case than from a randomly selected observation of the controls.
@garthwaiteModifiedConfidenceIntervals2017 assume that for any $\hat{\delta}^2$
and any specified quantile, $\pi_q(\delta | \hat{\delta}) \geq \psi_q(\delta | \hat{\delta})$,
where $\pi_q(.)$ and $\psi_q(.)$ are the q:th quantiles of the respective posterior distributions.

Note that they do not aim to specify $\pi(\delta)$; instead they want to put limits on 
the quantiles of the resultant posterior through $\psi(\delta)$. This is because we 
know neither $\pi(\delta)$ nor $\pi_q(\delta | \hat{\delta})$, but it is possible to 
establish $\psi(\delta|\hat{\delta})$, and the $\alpha/2$ and $1-\alpha/2$ quantiles
of this distribution are then the lower and upper boundaries of a $1-\alpha$ credible interval
for $\delta$. 

The logic behind this modification is intuitive: treat the case like a randomly drawn observation 
from the control distribution whenever the case observation is more similar to the average
of the controls than the majority of the control observations are. That is, we take whichever
boundaries are lower, those of the confidence interval or those of the credible interval. 

Again, we set $\lambda = \delta^2$ and $\hat{\lambda} = \hat{\delta}^2$, and let
$\psi^*(\lambda)$ be the prior corresponding to $\psi(\delta)$. This is a $\chi^2$
distribution on $\nu_1$ degrees of freedom:
\begin{equation}
\psi^*(\lambda) = \frac{1}{2^{\nu_1/2}\Gamma(\nu_1/2)}\lambda^{\nu_1/2-1}e^{-\lambda/2}, \ \lambda \geq 0.
(\#eq:prior)
\end{equation}
If we set $m = n/(n-1)$ the likelihood of $\lambda$, $L(\lambda; \hat{\lambda})$, is given by
\begin{equation}
L(\lambda; \hat{\lambda}) = \sum_{r=0}^{\infty}\frac{(n\lambda/2)^re^{-n\lambda/2}}{B[(n-\nu_1)/2, \ \nu_1/2+r]r!}m^{\nu_1/2+r}\left(\frac{1}{1+m\hat{\lambda}}\right)^{n/2+r}\hat{\lambda}^{\nu_1/2+r-1}, \  \lambda \geq 0.
(\#eq:likeli)
\end{equation}
Combining the prior with the likelihood, $\psi^*(\lambda)L(\lambda; \hat{\lambda})$ yields the
posterior $\psi^*(\lambda | \hat{\lambda})$:
\begin{equation}
\psi^*(\lambda | \hat{\lambda}) = \frac{1}{c} \sum_{r=0}^{\infty}\frac{(n^r/2)(\lambda/2)^{\nu_1/2+r-1}
e^{-(n+1)\lambda/2}}{B[(n-\nu_1)/2, \ \nu_1/2+r]\Gamma(\nu_1/2)r!}m^{\nu_1/2+r}
\left(\frac{1}{1+m\hat{\lambda}}\right)^{n/2+r}\hat{\lambda}^{\nu_1/2+r-1}, \  \lambda \geq 0.
(\#eq:post)
\end{equation}

Here $B(.,\ .)$ and $\Gamma(.)$ are the beta and gamma functions respectively, and $c$ is the normalising constant,
obtained through numerical integration of \@ref(eq:post) over $0 < \lambda < \infty$.

Sums over infinite ranges are problematic for practical computation. One way to deal with this
is to set a reasonable endpoint instead of infinity. For the implementation in `singcar` the endpoint of the 
infinite sum was set by noting that the denominator in (\@ref(eq:post))
goes to infinity faster than the numerator. In fact, in `R` and many other languages
using double precision, $r!$ overflows to infinity for $r > 170$, so a reasonable endpoint
for the infinite sum is $170$, because any finite value divided by infinity is 0.
However, for some parameter values infinity will be approached in the numerator before it is
approached in the denominator. Infinity divided by any real value is still infinity,
and infinity divided by infinity is undefined. 
Therefore, to be able to compute the sum, any infinite or undefined terms are removed if present.
As such, the integral over the sum in (\@ref(eq:post)) will always be convergent and the normalising constant
$c$ can be calculated.

`singcar` uses the function `integrate` from the base `R` package `stats` to do this. This 
numerical integration procedure, based on routines from the QUADPACK library [@piessens2012quadpack],
generally yields accurate integrals. However, it works by subdividing the interval to be integrated,
and when integrating to infinity the region of interest can be undersampled. Hence, 
in `singcar` we set:
$$
c = \int_0^{0.5\sqrt{\hat\lambda}}\psi^*(\lambda | \hat{\lambda})  d\lambda + \int_{0.5\sqrt{\hat\lambda}}^{\sqrt{\hat\lambda}}\psi^*(\lambda | \hat{\lambda})  d\lambda +
\int_{\sqrt{\hat\lambda}}^{1.5\sqrt{\hat\lambda}}\psi^*(\lambda | \hat{\lambda})  d\lambda +
\int_{1.5\sqrt{\hat\lambda}}^{2\sqrt{\hat\lambda}}\psi^*(\lambda | \hat{\lambda})  d\lambda +
\int_{2\sqrt{\hat\lambda}}^{\infty}\psi^*(\lambda | \hat{\lambda}) \ d\lambda.
$$
This ensures denser sampling around $\sqrt{\hat\lambda}$, which of course is the region we are most
interested in. Further, for large values of $\hat\lambda$ some discontinuities will be introduced
in the posterior, which can also be problematic for some numerical integration techniques, but
the above subdivision alleviates this problem somewhat.
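As an illustration of both the truncated sum and the piecewise integration, the sketch below evaluates the unnormalised posterior and computes $c$. The function name and the parameter values ($\hat\lambda = 2$, $n = 20$, $\nu_1 = 2$) are assumptions for illustration, not the exact `singcar` internals:

```r
# Unnormalised posterior from the equation above, with the sum truncated at r = 170
post_unnorm <- function(lambda, lambda_hat, n, nu1) {
  m <- n / (n - 1)
  sapply(lambda, function(l) {
    r <- 0:170  # factorial(r) overflows to Inf for r > 170
    terms <- (n^r / 2) * (l / 2)^(nu1 / 2 + r - 1) * exp(-(n + 1) * l / 2) /
      (beta((n - nu1) / 2, nu1 / 2 + r) * gamma(nu1 / 2) * factorial(r)) *
      m^(nu1 / 2 + r) * (1 + m * lambda_hat)^(-(n / 2 + r)) *
      lambda_hat^(nu1 / 2 + r - 1)
    sum(terms[is.finite(terms)])  # drop any Inf/NaN terms
  })
}

lambda_hat <- 2; n <- 20; nu1 <- 2
s <- sqrt(lambda_hat)
# Piecewise integration with denser subdivision around sqrt(lambda_hat)
lower <- c(0, 0.5 * s, s, 1.5 * s, 2 * s)
upper <- c(lower[-1], Inf)
c_norm <- sum(mapply(function(lo, hi)
  integrate(post_unnorm, lo, hi, lambda_hat = lambda_hat,
            n = n, nu1 = nu1)$value, lower, upper))
```

The normalised posterior is then `post_unnorm(...) / c_norm`.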

The q:th quantile of the posterior $\psi^*(\lambda | \hat{\lambda})$, $\lambda_q$, is then
found by an optimisation search procedure (`nlminb` in `stats`) that aims to minimise 
$$
\left| \int_{0}^{\lambda_q} \psi^*(\lambda | \hat{\lambda})d\lambda - q \right|.
$$

To get an estimate of $p$, we use the same procedure to find the median of the posterior, $\lambda_{0.5}$.
@elfadalyPointEstimationAbnormality2016's modified estimator of $p$, $\hat{p}_{MD}$, is then:
$$
\hat{p}_{MD} = \min\left[\hat{p}_D,  \ 1-G\left(\lambda_{0.5}\right) \right].
$$
Note, however, that because $c$ will sometimes be very small, and hence $1/c$
very large, computations of the normalised posterior can be very slow. This
mostly happens when $\hat\delta$ is so large that $\hat{p}_D$ would
be greater than $1-G(\lambda_{0.5})$ anyway. In `singcar` it is therefore recommended 
to use $\hat{p}_{D}$, and to calculate $\hat{p}_{MD}$ only if the Mahalanobis distance
of the case seems suspiciously small.

The difference between the estimators $\hat{p}_D$ and $\hat{p}_{MD}$ is small, and 
@elfadalyPointEstimationAbnormality2016 show that their bias and mean square error
are almost identical, but they recommend $\hat{p}_{MD}$. This is because, as
$\hat{\delta} \rightarrow 0$, $\hat{p}_D \rightarrow 1$ much faster than it
reasonably should. 

It should be noted that the implementation of $\hat{p}_{MD}$ in `singcar` is
unstable when compared to the implementation by
@garthwaiteModifiedConfidenceIntervals2017 written in `C`. I have not been able
to find the cause of this discrepancy, but it only happens for relatively large
$\hat{\delta}$. It is likely that the numerical integration procedures differ
and that the procedure used in the implementation by
@garthwaiteModifiedConfidenceIntervals2017 better handles discontinuous
functions, which we see in the posterior (\@ref(eq:post)) for high values of
$\hat{\delta}$. Alternatively, the discontinuities cause problems for the
optimisation routine. It is therefore recommended that both estimators are
calculated, and the smaller of the two chosen, but if that is
$\hat{p}_{MD}$ then compare the result to the implementation by
@garthwaiteModifiedConfidenceIntervals2017.


### Example usage in `singcar`

Say that we want to know the case abnormality in the example data
on both tasks simultaneously. To estimate this abnormality on a 
multivariate space, call:
```{r mtdcode, eval=FALSE, echo=TRUE}
MTD(case = c(caseA, caseB), controls = cbind(contA, contB),
    conf_level = 0.95, method = c("pd", "pchi", "pf", "pmd"),
    mahalanobis_dist = NULL, k = NULL, n = NULL)
```
As can be seen, $\hat{p}_D$ is the default method to estimate the abnormality.
This is because the difference in results between $\hat{p}_D$ and $\hat{p}_{MD}$
is very small, but the difference in efficiency is huge. In addition, `"pmd"` in
`singcar` is actually $1-G(\lambda_{0.5})$ rather than $min[\hat{p}_D,
1-G(\lambda_{0.5})]$, hence taking the minimum is left to the user. The reason
for this was to minimise confusion. However, I realise that it could have the
opposite effect, and this behaviour might be changed in future updates.

The $p$ value in the output of `MTD()` is the $p$ value associated
with the Hotelling's $T^2$ statistic and the same as the abnormality
estimate if the method `pf` is chosen. The three last arguments are
only required if summary statistics are used.

### Power calculators {#power}

A further capacity of `singcar` is that it can be used to calculate the
statistical power of its tests. The notion of power when comparing cases to
control samples has been somewhat overlooked for this class of statistical tests.
In recent work [@mcintoshPowerCalculationsSinglecase2020], we argued that,
even though power is inherently limited in this paradigm, a priori calculations are
still useful for study design and interpretation in neuropsychological and other applications.
Calculating power for the test of deficit is similar to calculating power for
any \(t\) test and can be done analytically.
\begin{equation} power = 1 - \beta =
T_{n-1}\left(t_{\alpha, \ n-1} \Bigg\rvert \frac{y^* - \overline{y}}{\sigma
\sqrt{\frac{n+1}{n}}}\right) 
(\#eq:TDpower)
\label{eq:TDpower} 
\end{equation}
where \(T_{n-1}(.\rvert \theta)\) is the cumulative distribution function of the
non-central \(t\) distribution with \(n-1\) degrees of freedom and non-centrality
parameter \(\frac{y^* - \overline{y}}{\sigma \sqrt{\frac{n+1}{n}}}\) (i.e., the TD, Equation
\@ref(eq:TD)), and \(t_{\alpha, \ n-1}\) is the \(\alpha\) quantile of the \emph{central}
\(t\) distribution on \(n-1\) degrees of freedom (note that this is for a one-sided
test). For the unstandardised difference test, power is calculated in an
analogous way by putting Equation \@ref(eq:UDT) as the non-centrality parameter. Deriving
power for the other functions in an analytic manner is however not possible (the
RSDT is only approximately \(t\) distributed) and a Monte Carlo approach has been
used for these tests. To call any power calculator in the package one simply
uses the function names with `_power` added as a suffix.
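Equation \@ref(eq:TDpower) can also be evaluated directly with base `R`'s non-central \(t\) functions. The sketch below mirrors the analytic calculation for a one-sided (`"less"`) test; it is an illustration, not the `TD_power()` implementation itself:

```r
# Analytic power for the test of deficit, one-sided ("less")
td_power_analytic <- function(case, mean, sd, sample_size, alpha = 0.05) {
  # Non-centrality parameter: the TD statistic for the expected case score
  ncp <- (case - mean) / (sd * sqrt((sample_size + 1) / sample_size))
  # Probability that the observed t falls below the alpha quantile
  pt(qt(alpha, df = sample_size - 1), df = sample_size - 1, ncp = ncp)
}

td_power_analytic(case = 70, mean = 100, sd = 15, sample_size = 16)
```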

So, for example, to calculate power for the test of deficit we call `TD_power()`.
The expected case score and either sample size or desired power must
be supplied. The mean and standard deviation of the control sample
can also be specified with the arguments `mean` and `sd`.
If not, they take the default values of 0 and 1 respectively so
that the case score is interpreted as distance from the mean
in standard deviations. A conventional \(\alpha\)-level of
\(0.05\) is assumed if nothing else is supplied. The alternative
hypothesis can also be specified by the argument `alternative`:
specify `"less"` (default) or `"greater"` for a one-tailed test, specify 
`"two.sided"` for a two-tailed test.
```{r TD-power}
TD_power(case = 70,
         mean = 100,
         sd = 15,
         sample_size = 16,
         power = NULL,
         alternative = "less",
         alpha = 0.05,
         spec = 0.005)
```

`TD_power()` can also be used to calculate required sample size for a
desired level of power. For example, if we specify a desired power level of 0.6,
leave `sample_size` to its default and let the rest of the arguments
be as in the previous example, we see from the output of the function that
power will not increase more than 0.5\% for any additional participant after a sample
size of 15. That is, the algorithm stops searching
when this level of specificity has been reached and we are nearing the
asymptotic maximum power for this effect size. We can increase the specificity
by lowering the `spec` argument.
```{r TDpowersize}
TD_power(case = 70,
         mean = 100,
         sd = 15,
         sample_size = NULL,
         power = 0.6,
         alternative = "less",
         alpha = 0.05,
         spec = 0.005)
```

Power calculators for the Bayesian tests of deficit cannot calculate
required sample size. This is because they rely on simulation methods
to estimate approximate power and deploying a search algorithm to find the required sample
size for a given level of power would be computationally too intense. The syntax
is otherwise relatively similar to that of `TD_power()`. For `BTD_power()`
we have the two extra arguments `nsim` and `iter`, indicating the number
of simulations used in the power function and by `BTD()`, respectively.
```{r BTDpower}
BTD_power(case = 70,
         mean = 100,
         sd = 15,
         sample_size = 15,
         alternative = "less",
         alpha = 0.05,
         nsim = 1000,
         iter = 1000)
```

The only difference in syntax of `BTD_cov_power()` is due to the inclusion of covariates.
The variate of interest must be specified as a vector where the first element
gives the mean and the second the standard deviation in the argument `control_task`.
The covariates can be specified similarly or as an \(m \times 2\) matrix where the first
column gives the means of each covariate and the second column gives the standard
deviations. The correlation matrix of the variates must be given as well. In the example
below, power is evaluated for a test taking two covariates into account, both with a mean
of 0 and a standard deviation of 1. The correlation is specified as a \(3\times 3\)
matrix with pairwise correlations of $0.3$. The default settings include only one
covariate having a $0.3$ correlation with the variate of interest.
This function is computationally intense and hence, the number
of simulations has, for the example below, been decreased.
```{r BTDcovpower}
covars <- matrix(c(0, 1,
                   0, 1), ncol = 2, byrow = TRUE)
BTD_cov_power(case = -2,
              case_cov = c(0.2, -0.6),
              control_task = c(0, 1),
              control_covar = covars,
              cor_mat = diag(3) + 0.3 - diag(c(0.3, 0.3, 0.3)),
              sample_size = 15,
              alternative = "less",
              alpha = 0.05,
              nsim = 100,
              iter = 100)
```

For the difference tests one must supply the expected case scores on both
variates as well as sample size. The means and standard deviations of the
control sample can also be specified. If unspecified, they take on the default values of
0 and 1 respectively, so that the expected case scores are interpreted as
distances from the means in standard deviations. `RSDT_power()`,
`BSDT_power()` and `UDT_power()` additionally require an estimate of
the sample correlation between the variates of interest, `r_ab`. If this is
not specified a correlation of 0.5 is assumed by default. For
`BSDT_cov_power()` the correlation matrix between the variates of interest
and the covariates must instead be supplied (i.e., at least a \(3\times3\) matrix
where the first correlation is the correlation between the variates of
interest).

The alternative hypothesis is by default assumed to be
`"two.sided"` since the direction of the effect is dependent on the
order of the inputs, but can be specified to be `"less"` or
`"greater"` as well. The syntax is similar for all three functions, with small
differences. For `UDT_power()` one can, as for `TD_power()`, request the
sample size required for a desired power. The calculators for the Bayesian
tests have the extra argument `calibrated`, which specifies the prior used.
`BSDT_cov_power()` requires input in the same format as `BTD_cov_power()`
for both `control_tasks` and `control_covar`. The two examples
below demonstrate usage for `RSDT_power()` and `BSDT_cov_power()`.
```{r RSDTpower}
RSDT_power(case_a = 70,
           case_b = 55,
           mean_a = 100,
           mean_b = 50,
           sd_a = 15,
           sd_b = 10,
           r_ab = 0.5,
           sample_size = 15,
           alternative = "two.sided",
           alpha = 0.05,
           nsim = 1000)
```

```{r BSDTcovpower}
cor_mat <- matrix(c(1,   0.5, 0.6,
                    0.5,   1, 0.3,
                    0.6, 0.3,   1), ncol = 3, byrow = TRUE)
BSDT_cov_power(case_tasks = c(70, 55),
               case_cov = 65,
               control_tasks = matrix(c(100, 15,
                                        50, 10), ncol = 2, byrow = TRUE),
               control_covar = c(50, 25),
               cor_mat = cor_mat,
               sample_size = 15,
               alternative = "two.sided",
               alpha = 0.05,
               nsim = 100,
               iter = 100,
               calibrated = TRUE)
```
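As a brief sketch of the sample-size mode mentioned above, the chunk below asks `UDT_power()` for the sample size required to reach a target power, by supplying `power` and leaving `sample_size` unspecified. The specific case scores and target power here are illustrative values, not taken from the examples above.
```{r UDTpower, eval=FALSE}
UDT_power(case_a = -2,
          case_b = 0,
          r_ab = 0.5,
          power = 0.6,
          alternative = "two.sided",
          alpha = 0.05)
```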

# Summary

This vignette has reviewed several statistical methods for comparing a single case to
a small control sample. It has exemplified practical usage of these methods with
implementations in the package, where possible, and in `lme4` where not.
In a very small simulation study, linear mixed models fitted to repeated
measures data were shown to potentially yield higher power than the more
standard RSDT when testing for a discrepancy between two variates. However,
when no discrepancy was present but the case exhibited severe impairments on
both variates, the mixed model yielded an extremely high error rate for the
specific scenario investigated. When estimating dissociations it is therefore
recommended to aggregate the data and use one of the Bayesian tests.

Since linear mixed models allow more complex model structures than the more
standard procedures, it is unfortunate that their usefulness for detecting
dissociations appears limited. This is, on the other hand, somewhat compensated
for by the implementation of the Bayesian regression techniques in Section
\@ref(cov). In addition, the implementation of (relatively) unbiased techniques
for estimating case abnormality in multidimensional space, Section
\@ref(section4), also offers the possibility of increased complexity.

The methods described have been developed to keep transparent control over the
Type I error rate, but power calculators have also been implemented in the
package. Consideration of power can assist researchers in study design
and in setting realistic expectations for what these types of statistical
hypothesis tests can achieve [@mcintoshPowerCalculationsSinglecase2020].

# References

<div id="refs"></div>

# Appendix {-}

Code for the function `powerLMM()` which evaluates power for the linear
mixed model used to test for the presence of a dissociation and
compares it to the power for either the RSDT or the BSDT, with data
distributed as described in Section \@ref(lmmexample).


```{r powercode, eval=FALSE, echo=TRUE}
powerLMM <- function(nsim, case_impairment = c(0, 0, 0, 0),
                     compare_against = c("RSDT", "BSDT"), alpha = 0.05) {
  require(singcar)
  require(lme4)
  require(lmerTest)
  require(MASS)
  comptest <- match.arg(compare_against)
  pdt <- vector(length = nsim)
  plm <- vector(length = nsim)
  # 30 participants (the first is the case), 2 items, 2 conditions
  simdata <- expand.grid(case = c(1, rep(0, 29)),
                         item = factor(1:2), condition = factor(1:2))
  simdata$ID <- factor(rep(1:30, 4))
  contrasts(simdata$condition) <- contr.sum(2)
  mu <- c(100, 80, 50, 20)
  sigma <- matrix(c(15^2,  120,   45,    7,
                     120, 10^2,   11,   15,
                      45,   11, 10^2,   40,
                       7,   15,   40,  5^2), nrow = 4, byrow = TRUE)
  for (i in 1:nsim) {
    dv <- mvrnorm(30, mu = mu, Sigma = sigma)
    dv[1, ] <- dv[1, ] + case_impairment  # impair the case
    simdata$dv <- c(dv)
    model2 <- lmer(dv ~ case*condition + (condition | ID) + (1 | item),
                   data = simdata)
    # Aggregate over items within each condition
    agg_data1 <- rowMeans(dv[, 1:2])
    agg_data2 <- rowMeans(dv[, 3:4])
    if (comptest == "RSDT") {
      pdt[i] <- RSDT(agg_data1[1], agg_data2[1],
                     agg_data1[-1], agg_data2[-1])[["p.value"]]
    } else {
      pdt[i] <- BSDT(agg_data1[1], agg_data2[1],
                     agg_data1[-1], agg_data2[-1])[["p.value"]]
    }
    # p value of the case-by-condition interaction from lmerTest
    plm[i] <- summary(model2)[["coefficients"]][4, 5]
  }
  if (comptest == "RSDT") {
    data.frame(RSDT = sum(pdt < alpha)/nsim, LMM = sum(plm < alpha)/nsim)
  } else {
    data.frame(BSDT = sum(pdt < alpha)/nsim, LMM = sum(plm < alpha)/nsim)
  }
}
```
