---
title: "rmcorr Estimates with NaN"
author: "Jonathan Bakdash and Laura Marusich"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: references.bib 
vignette: >
  %\VignetteIndexEntry{rmcorr Estimates with NaN}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, tidy = FALSE)
options(width = 80)
library(knitr)
library(rmarkdown)
library(rmcorr) 
library(ggplot2)
library(ggExtra) 

```

### NaN estimates
This synthetic dataset produces NaN estimates with rmcorr. Thanks to Shreya Ghosh for this example. 

#### Running Examples Requires ggExtra [@ggExtra]
```{r, eval = FALSE}
install.packages("ggExtra")
require(ggExtra)
```

#### Load data, visualize, and model
```{r}
load(file = "../man/data/ghosh_synth.rda")

#Look at data
ghosh_synth #Note lots of repeated zeros in A3 and A4

set.seed(40) #Make jittering reproducible 
p <- ggplot(ghosh_synth, aes(x = A4, y = A3)) +
            geom_point(alpha = 0.2) +
            geom_jitter(width = 2, height = 2) 
p1 <- ggMarginal(p, type="histogram")
p1

rmc.ghosh <- rmcorr(Subject, A3, A4, ghosh_synth)

rmc.ghosh

#The default rmcorr plot doesn't jitter values, this masks identical values because they are drawn on top of each other
plot(rmc.ghosh)
```

The NaN estimates appear to be due to insufficient varability in the dataset. A possible way to address this issue is adding a small amount of random noise.

#### Add random noise
```{r}
set.seed(67) 
small.noise1 <- rnorm(dim(ghosh_synth)[[1]], 0, 0.2)
small.noise2 <- rnorm(dim(ghosh_synth)[[1]], 0, 0.2)
    
ghosh_synth$A3.noise <- ghosh_synth$A3 + small.noise1
ghosh_synth$A4.noise <- ghosh_synth$A4 + small.noise2

rmc.ghosh.noise <- rmcorr(Subject, A3.noise, A4.noise, ghosh_synth)

rmc.ghosh.noise

p2 <- ggplot(ghosh_synth, aes(x = A3.noise, y = A4.noise, 
       group = factor(Subject), color = factor(Subject))) +
       ggplot2::geom_point(ggplot2::aes(colour = factor(Subject), 
                                        alpha = 0.10)) +
       ggplot2::geom_line(aes(y = rmc.ghosh.noise$model$fitted.values),
                         linetype = 1) +
     theme(legend.position="none")

p3 <- ggMarginal(p2, type="histogram")
p3
```

#### Caveats
The results with rmcorr should be interpreted with some caution because the data are non-normal with zero-inflation. Still, these results  provides at least a starting point: A common linear association around 0. A much more complicated alternative is fitting a multilevel model with an appropriate distribution for zero-inflated data (e.g., negative binomial distribution or zero-inflated Poisson). 
