---
title: 'sinaplot: an enhanced chart for simple and truthful representation of single
  observations over multiple classes'
author: Nikos Sidiropoulos, Sina Hadi Sohi, Nicolas Rapin, Frederik
  Otzen Bagger
date: '`r Sys.Date()`'
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{sinaplot}
  %\VignetteEncoding{UTF-8}
output: 
    rmarkdown::html_vignette
---

## Introduction

Recent developments in data science, in particular computational biology, often 
integrate data from several sources, over diverse experiments, or databases 
leaves a challenge of truthfully visualize data where the number of data points 
vary between classes. Plot types like bar charts, violin plot, strip charts or 
box-and-whiskers plots can provide visual information about mean/median, 
variance of the data, number of data points or density distribution of data; 
but only pairs of plots or dense overlays of these plot types will provide all 
the relevant information. To aid the presentation of datasets with differing 
sample size we have developed a new type of plot overcoming limitations of 
current standards visualization charts. 

**sinaplot** is inspired by the strip chart and the violin plot. By letting the 
normalized density of points restrict the jitter along the x-axis the plot 
displays the same contour as a violin plot, but resemble a simple strip chart 
for small number of data points. In this way the plot conveys information of 
both the number of data points, the density distribution, outliers and spread 
in a very simple, comprehensible and condensed format.

## Examples

```{r example, fig.align='center', fig.width=5, fig.height=4}
x <- c(rnorm(200, 4, 1), rnorm(200, 5, 2), rnorm(400, 6, 1.5))
groups <- c(rep("Cond1", 200), rep("Cond2", 200), rep("Cond3", 400))

library(sinaplot)
sinaplot(x, groups)
#Use any "plot" argument
sinaplot(x, groups, col = 2:4, pch = 20, bty = "n")
```

### Blood 

We use a cohort of 2095 AML, ALL and healthy bone marrow samples to illustrate 
some of the strengths  of **sinaplot**.

```{r blood, echo=FALSE, results='asis'}
knitr::kable(head(blood, 10))
```

---
```{r bloodDensity, fig.align='center', fig.height=6, fig.width=7}
sinaplot(Gene ~ Class, data = blood, pch = 20)
```

---

By setting the argument `scale = FALSE` we turn off the group-wise scaling based
on the class with the highest density.

```{r bloodScaleOff, fig.align='center', fig.height=6, fig.width=7}
sinaplot(Gene ~ Class, data = blood, pch = 20, scale = FALSE)
```

---

Using the `method = "counts"` to compute the borders we get a less smooth 
spread of the samples due to the absence of the kernel density estimate.

```{r bloodCounts, fig.align='center', fig.height=6, fig.width=7}
sinaplot(Gene ~ Class, data = blood, pch = 20, scale = FALSE, method = "counts")
```

---

Sinaplot aesthetics can be tweaked in the same manner as in 
[graphics::plot](https://rdrr.io/r/graphics/plot.html).

```{r bloodParams, fig.align='center', fig.height=6, fig.width=7}
par(mar = c(9,4,4,2) + 0.1)
n_groups <- length(levels(blood$Class))

sinaplot(Gene ~ Class, data = blood, pch = 20, xaxt = "n", col = rainbow(n_groups), 
         ann = FALSE, bty = "n")
axis(1, at = 1:n_groups, labels = FALSE)
text(x = 1:n_groups,
     y = par()$usr[3] - 0.1 * (par()$usr[4] - par()$usr[3]),
     labels = levels(blood$Class), srt = 45, xpd = TRUE, adj = 1,
     cex = .8)
```

## Comparison

Using a subset of the ```blood``` dataset we compare sinaplot with 5 popular 
plotting strategies, and show how our package integrates features from these
methods to achieve truthful and yet simple representation of multiclass
single variable data.
    <img src="../inst/figures/comparison.png" width = 650 height = 800>

## Session info

```{r sessionInfo}
sessionInfo()
```

