\documentclass[a4paper]{article}
%\VignetteIndexEntry{Running coalescentMCMC}
%\VignettePackage{coalescentMCMC}
\usepackage[utf8]{inputenc}
\usepackage{fancyvrb}
\usepackage{color,natbib}

\newcommand{\code}{\texttt}
\newcommand{\pkg}{\textsf}
\newcommand{\coalmcmc}{\pkg{coalescentMCMC}}
\newcommand{\ape}{\pkg{ape}}

\author{Emmanuel Paradis}
\title{Running Coalescent Analyses With \coalmcmc}

\begin{document}
\maketitle

<<echo=false,quiet=true>>=
options(width=60)
@

\noindent Coalescent analysis is a
powerful approach to investigate the demography of populations using
genetic data. The coalescent is a random process describing the
coalescent times of a genealogy with respect to population size and
mutation rate. In the majority of cases, the genealogy of individuals
within a population is unknown. So a coalescent analysis typically
integrates over the likely genealogies to make inference on
the dynamics of the population. This uses computer-intensive methods
such as Monte Carlo simulations of Markov chains. Besides, if
priors are defined on the distributions of the parameters, Bayesian
inference can be done. Several methods have been proposed for such
integrations, although currently there is no consensus on which method
is the best, or which ones are the most appropriate in some circumstances
\cite{Felsenstein2007}.

\coalmcmc\ aims to provide a general framework to run coalescent
analyses. In its current version, the package provides a simple MCMC
algorithm based on Hastings's ratio.

\coalmcmc\ has three main groups of functions that have different roles:

\begin{itemize}
\item the function \code{coalescentMCMC} itself which runs the chain;
\item some functions doing operations on tree which are called by the
  previous one to move from one tree to another;
\item some functions to infer demography from genealogies under
  various coalescent models which are typically used to analyse the
  output of a chain run.
\end{itemize}

The motivating idea behind \coalmcmc\ is that the user can have full
control over the analysis. The options of the main function are:

<<>>=
library(coalescentMCMC)
args(coalescentMCMC)
@
where \code{ntrees} are the number of trees to output, and \code{tree0}
is the initial tree (if not provided, a UPGMA tree from a JC69-based
distance matrix is used). \code{model} is either \code{NULL} in which
case $\Theta$ is assumed to be constant, or \code{"time"} in which
case a model where $\Theta$ follows an exponential growth is
used.\footnote{Other models will be implemented later. See the
vignette ``CoalescentModels''.} Finally, \code{printevery} is an integer
controlling the print frequency of the progress of the chain.

The current implementation uses neighborhood rearrangement as
proposed in \cite{Kuhner1995} calling the function
\code{NeighborhoodRearrangement} at each step of the chain. Five other
tree moves are availables taken from \cite{Drummond2002}. This can
modified by using other functions described in \code{?treeOperators}.

Let us now consider a very simple analysis with the woodmouse data
available in \ape. For the purpose of this vignette, we run a very
light analysis in order to produce small outputs in a reasonable time.

<<echo=false,quiet=true>>=
data(woodmouse)
out <- coalescentMCMC(woodmouse, ntrees = 300, printevery = 0)
@

\begin{Schunk}
\begin{Sinput}
> data(woodmouse)
> out <- coalescentMCMC(woodmouse, ntrees = 300)
\end{Sinput}
\begin{Soutput}
Running the Markov chain:
  Number of generations to run: 200
Generation    Nb of accepted trees
   300                 162
Done.
\end{Soutput}
\end{Schunk}

The output object is of class \code{"coda"}, so we can visualise it with the package of the same name (which has already been loaded):

\setkeys{Gin}{width=\textwidth}
<<fig=true>>=
plot(out)
@
The log-likelihood was relatively stable between $-1870$ and
$-1880$. The trees are stored in a special place of the memory (an `environment' in R's jargon) from where they can be retrieved with a specific function:

<<>>=
TR <- getMCMCtrees()
TR
@
Note that the trees generated during the burn-in period are not output, but the corresponding values of log-likelihood and $\Theta$ are. Hence \code{out} has 400 rows.

<<>>=
dim(out)
colnames(out)
@

We now run a model of time-dependent coalescent where $\Theta$ follows an exponential change through time:

<<echo=false,quiet=true>>=
out2 <- coalescentMCMC(woodmouse, ntrees = 300, model = "time", printevery = 0)
@

\begin{Schunk}
\begin{Sinput}
> out2 <- coalescentMCMC(woodmouse, ntrees = 300, model = "time")
\end{Sinput}
\begin{Soutput}
Running the Markov chain:
  Number of generations to run: 300
Generation    Nb of accepted trees
   300                 154
Done.
\end{Soutput}
\end{Schunk}

<<fig=true>>=
plot(out2)
@
The change in log-likelihood along the chain is similar to what was observed above. The object \code{out2} has now three columns:
<<>>=
dim(out2)
colnames(out2)
@
If we try to extract the trees as previously done and R is running in interactive mode, we will be asked which list of trees to extract:

\begin{Verbatim}[formatcom=\color{blue}]
> getMCMCtrees()
Several lists of MCMC trees are stored:

1 : TREES_001
2 : TREES_002

Return which number?
\end{Verbatim}
It is also possible to extract the trees of a specific chain with its number, which is useful when the R code is not run interactively (such as this vignette):

<<>>=
TR2 <- getMCMCtrees(2)
@

The parameters of the MCMC runs are stored separately and can be extracted with:

<<>>=
getMCMCstats()
@

We can now compare both coalescent models: the two hypotheses under consideration are:

\begin{itemize}
\item H$_0$: $\Theta$ is constant;
\item H$_1$: $\Theta$ changes through time following an exponential model.
\end{itemize}


Other things that could be done with simple R commands include:

\begin{itemize}
\item Compute confidence intervals around $\hat{\Theta}_0$ and $\hat{\rho}$ (alternatively, posterior distributions of these parameters if a Bayesian sampling is done);
\item Re-run the chain(s) with different initial trees, for instance to run branching chains taking a tree from \code{TR} or \code{TR2}.
\item Use other models of time-dependent coalescent (see the other vignette in this package).
\end{itemize}

\bibliographystyle{plain}
\bibliography{coalescentMCMC}

\end{document}
