---
title: "Command Line Arguments"
author: "Paul Staab"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{scrm-Arguments}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

scrm is uses a syntax compatible with the popular program [ms][1]. There are, however, a few differences to ms: 

* scrm can not simulate 
    * gene conversions (`-c` in ms) and
    * a fix number of segregating sites (`-s`),
* the option `-L` produces a slightly different output and
* it additionally implements the flags 
    * `-l` (approximation), 
    * `-sr` (changing recombination rate), 
    * `-st` (changing mutation rate),
    * `-eI` (sampling haplotypes at multiple time points) and
    * `-oSFS` (generates frequency spectra). 
* We do not support changing the number of populations with `-ema`. Our version of the command is just `-ema <t> <M11> <M12> ...` instead of `-ema <t> <npop> <M11> <M12> ...`.
    
For all other options, you can also refer to [ms' manual][1] to get a detailed description of what the commands are doing. scrm should happily execute any ms command that does not contain `-c`, `-s` and `-ema`. Also, scrm has somewhat stricter requirements regarding the order of arguments if population admixture (`-es`) is involved.

## General Syntax
The arguments for calling _scrm_ are

    scrm <nhap> <nrep> [...]

where _nhap_ is the total number of haplotypes (in all populations and at all times) that are simulated at each locus, and _nrep_ is the number of independent loci that will be produced. The `[...]` is an optional placeholder for an arbitrary number of command line flags described below.

## Basic Parameters

### Recombination

* `-r <R> <L>`: Set the recombination rate to _R = 4*N0*r_ and the length of all loci to L base pairs. _r_ is expected number of recombinations on the locus per generation.
* `-l <l>`: Use approximation rather than simulating the exact ARG. Within a sliding window of length _l_ base pairs all linkage information is considered when building the genealogy. To positions outside of this window, some linkage  is ignored. Setting _l=0_ produces the SMC' and _l=-1_ deactivates the approximation. Since v1.6.0, it's also possible to specify the window's length in number of recombinations. To do so, use `-l <x>r`, where x is the number of recombinations (e.g. `-l 100r` for a window spanning 100 recombinations). Also starting with version 1.6.0 **approximation is turned on by default** using a conservative window length of 500 recombinations. For most applications, it should be fine to reduce this value to 100 - 250 recombinations if runtime is a critical factor.

### Population structure & migration
In all commands, migrations rates _M = 4*N0*m_, where _m_ is the fraction of a population that is replaced with migrants from other populations each generation (looking forwards in time).

* `-I <npop> <s1> ... <sn> [<M>]`: Use an island model with _npop_ populations, where _s1_ to _sn_ haplotypes are sampled from population 1 to n, respectively. Optionally assume a symmetric migration rate of _M_.
* `-M <M>`: Assume a symmetric migration rate of _M/(npop-1)_. 
* `-m <i> <j> <M>`: Set the migration rate from population _j_ to population _i_ to _M_ (looking forward in time) [since v1.3.1].
* `-ma <M11> <M21> ... <M21> ...`: Set the migration matrix (Dimension is _npop x npop_). Diagonals elements are ignored but required (you can use `x` or `0`).  

### Population size changes
For exponential growth/decline of a population, the parameter _a_ changes the size of a population
according to the formula _N(s) = N(0)*exp(-a*s)_, where _N(0)_ is the population's size at the time of the command (e.g. _0_ for `-g <a>` and `-G <a>` and _t_ for `-eg <t> <a>` and `-eG <t> <a>`) and _N(s)_ is the size of the population _s_ time units in the past.
Looking forwards in time, a positive _a_ leads to population growth, while a negative one generates a
decline in population sizes.

* `-n <i> <n>`: Set the present day size of population _i_ to _n*N0_.
* `-G <a>`: Set the exponential growth rate of all populations to _a_.
* `-g <i> <a>`: Set the exponential growth rate of population _i_ to _a_.

### Summary Statistics

* `-t < $\theta$ >`: Set the mutation rate to $\theta = 4N_0u$, where _u_ is the neutral mutation rate per locus. If this options is given, scrm generates the segregating sites output.
* `-transpose-segsites` or `--transpose-segsites`: If given, the segregating sites are printed with each row representing a mutation and each column representing a haplotype, rather than the other way round. Additionally, the time at which a mutation occurred is reported (in units of _4 * N0_ generations) [since v1.7.0].
* `-T`: Print the local genealogies in newick format.
* `-O`: Print the local genealogies in the `oriented forest` format as described in [Kelleher _et al._ (2014)][2] [since v1.2].
* `-L`: Print the TMRCA and the local tree length for each segment (behaves different to ms). Both values are scaled in coalescent time units, e.g. in _4 * N0_ generations.
* `-oSFS`: Print the site frequency spectrum. Requires that the mutation rate $\theta$ is given with the '-t' option.
* `-SC [ms|rel|abs]`: Scaling of sequence positions. Either relative to the locus length between 0 and 1 (`rel`), absolute in base pairs (`abs`) or `ms`'s scaling (default) where the positions in the _segregating sites_ output are relative, and the positions in the trees output are absolute (`ms`) [since v1.3.0]. 

### Other options

* `-seed <SEED> [<SEED2> <SEED3>]`: Specifies a seed for the simulation. 
  You can input up to three non-negative numbers. If no seed is given, 
  scrm generates one using entropy provided by the operating system.
  To reproduce a previous simulation, use the single number in the second 
  line of the output.
* `-print-model, --print-model`: Prints information about the model defined 
   by the command line arguments, including calculated population sizes. 
   Can be useful for debugging or verifying the model [since v1.5.0]. 
* `-p <digits>`: Sets the number of significant digits used in the output [since v1.4.0].
* `-h`, `--help`: Prints a help text.
* `-v`, `--version`: Prints version information.


## Time specific parameters
The command this section all have a time _t_ as first parameter. Changes made by the commands affect the time from _t_ further back into the past. All times in units of _4*N0_ generations.

### Population structure & migration

* `-eI <t> <s1> ... <sn>`: Sample _s1_ to _sn_ haplotypes are from population _1_ to _n_, respectively, at time _t_. 
* `-eM <t> <M>`: Assume a symmetric migration rate of _M/(npop-1)_ at time _t_.
* `-em <t> <i> <j> <M>`: Set the migration rate from population _j_ to population _i_ to _M_ (looking forward in time) at time _t_  [since v1.3.1].
* `-ema <t> <M11> <M12> ... <M21> ...`: Set the migration matrix at time _t_ (Dimension is _npop x npop_). Diagonals elements are ignored but required (use 'x' or 0). The rates apply pastwards from time _t_.

### Population size changes

* `-eN <t> <n>`: Set the size of all populations to _n*N0_ at time _t_.
* `-en <t> <i> <n>`: Set the size of population _i_ to _n*N0_ at time _t_. 
* `-eg <t> <i> <a>`: Set the exponential growth rate of population _i_ to _a_ at time _t_.
* `-eG <t> <a>`: Set the exponential growth rate of all populations to _a_ at time _t_.

### Population Splits & Merges

* `-es <t> <i> <p>`: Population admixture. Replaces a fraction _1-p_ of population _i_ with haplotypes from a population _npop + 1_. Technically (and looking backwards in time), a new population _n+1_ with size _N0_ is created at time _t_. Migration (to & from) and growth rates for this population are initially 0. Each lines in population _i_ is moved to the new population with probability _1-p_. Please sort multiple `-es` arguments by their time to avoid confusion about the numbering of populations. Please give the arguments that affect the whole population (`-M`, `-N`, `-G` & `-ma`) before giving the first `-es`. Also, their timed equivalent's (`-eM`, `-eN`, `-eG`, `-eI` & `-ema`) position on the command line events must also be sorted by time, at least relative to the `-es` argument. 
`scrm` throws an error if any of these conditions is not met. In doubt, just sort all command line arguments by their time.
* `-eps <t> <i> <j> <p>`: Partial admixture. Similar to `-es` but replaces a fraction `1-p` of population _i_ with haploids from population _j_ at time _t_. Different to `-es`, population _j_ is a normal population that continues to exist at times more recent than _t_. 
Viewed backwards in time, this moves a fraction _1-p_ of the linages in population _i_ to population _j_. This does not change the number of populations, population sizes, growth or migration rates in any way [since v1.5.0].
* `-ej <t> <j> <i>`: Adds a specialization event in population _i_ that creates population _j_ (forwards in time). Technically (and looking backwards in time), it moves all lines from population _j_ into population _i_ at time _t_. Migration rates into population _j_ are set to 0 for the time further back into the past.
 
When multiple `es`, `eps` or `ej` arguments are given for the same time _t_, the migrations are executed in the order in which the commands are given. For example if we have `-es 0.08 2 .2 -ej 0.08 3 1`, first 80% of pop 2 move to a newly created pop 3 (viewed backwards in time), then everyone that just moved to pop 3 moves on to pop 1. This is equivalent to `-eps 0.08 2 1 .2`, except that the latter does not create the empty population 3.

## Sequence specific parameters
The following commands change the model parameters from at a sequence position _s_.
You should still set the initial rate with `-r` or `-t`, respectively, and then use
the commands prefixed with `s` for all changes. Note that `-r` also takes the total
length of the sequence as second argument, while `-sr` just has the rate as argument.

* `-sr <s> <R>`: Set the recombination rate to _R_ starting at position _s_.
* `-st <s> <$\theta$>`: Set the mutation rate to $\theta$ starting at position _s_. 


[1]: http://home.uchicago.edu/~rhudson1/source/mksamples.html
[2]: https://www.sciencedirect.com/science/article/pii/S0040580914000355
