---
title: "dmtools_intro"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{dmtools_intro}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Installation
```{r install, eval = FALSE}
# install the package from CRAN once, then load it
install.packages("dmtools")
library(dmtools)
```

## Overview
dmtools checks datasets exported from an EDC (electronic data capture) system in clinical trials.
Note that the variable names in your dataset should have a postfix (`_V1`) or a prefix (`V1_`), and the column names must be unique.

* laboratory - Did the investigator assess the laboratory test results correctly?
* dates - Do all dates correspond to the protocol's timeline?
* rename - rename the variables of the dataset
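
Before running any check, it can help to confirm that the column names follow the convention described above. The sketch below uses only base R. The data frame `toy_visits` and the visit tag pattern (`V` followed by digits) are assumptions made for illustration and are not part of dmtools.

```{r sketch_names}
# hypothetical dataset with a visit prefix (V1_, V2_) in the variable names
toy_visits <- data.frame(
  ID = c("01", "02"),
  V1_GLUC = c("5.5", "4.1"),
  V2_GLUC = c("5.1", "4.8"),
  stringsAsFactors = FALSE
)

toy_names <- names(toy_visits)

# column names should be unique: anyDuplicated() returns 0 when nothing repeats
names_unique <- anyDuplicated(toy_names) == 0

# assumed visit tag: "V" plus digits, as a prefix (V1_) or a postfix (_V1)
has_prefix <- grepl("^V[0-9]+_", toy_names)
has_postfix <- grepl("_V[0-9]+$", toy_names)

names_unique
toy_names[has_prefix]
toy_names[has_postfix]
```

If the names carry a prefix, the laboratory check is called with `is_post = FALSE`, as in the usage example of this vignette.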

## Usage

### laboratory

For the laboratory check, create an Excel table like the one in the example.

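* AGELOW - lower age limit, a number (age >= AGELOW)
* AGEHIGH - upper age limit, a number (age <= AGEHIGH); type `Inf` if there is no upper limit
* SEX - sex the range applies to; for both sexes, combine the values with `|` (e.g. `f|m`)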
* LBTEST - What was the lab test name? (can be any convenient name for you)
* LBORRES* - What was the result of the lab test?
* LBNRIND* - How [did/do] the reported values compare within the [reference/normal/expected] range?
* LBORNRLO - What was the lower limit of the reference range for this lab test? (result >= LBORNRLO)
* LBORNRHI - What was the upper limit of the reference range for this lab test? (result <= LBORNRHI)

\* Column names are given without the prefix or postfix.
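
One way to read a reference row: it applies to a subject when the age lies within `[AGELOW, AGEHIGH]` and the sex matches the `SEX` pattern, and a result is within range when it lies within `[LBORNRLO, LBORNRHI]`. A minimal base R sketch of that reading follows. The values in `toy_ref` and the helper `in_reference()` are hypothetical and do not reproduce the internals of `lab()`.

```{r sketch_reference}
# hypothetical reference row (not taken from labs_refer.xlsx)
toy_ref <- data.frame(
  AGELOW = 18, AGEHIGH = Inf, SEX = "f|m",
  LBTEST = "gluc", LBORNRLO = 3.9, LBORNRHI = 5.9,
  stringsAsFactors = FALSE
)

# hypothetical helper: TRUE if the row applies to the subject and the
# result lies inside the closed interval [LBORNRLO, LBORNRHI]
in_reference <- function(ref, age, sex, result) {
  applies <- age >= ref$AGELOW & age <= ref$AGEHIGH & grepl(ref$SEX, sex)
  applies & result >= ref$LBORNRLO & result <= ref$LBORNRHI
}

in_reference(toy_ref, age = 19, sex = "f", result = 5.5)
in_reference(toy_ref, age = 22, sex = "m", result = 9.7)
```

Because `SEX` is treated here as a regular expression, a value such as `f|m` matches both sexes, while `f` alone restricts the row to one of them.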


```{r refer, echo = FALSE, results = 'asis', warning = FALSE, message = FALSE}
library(knitr)
library(dmtools)
library(dplyr)

refs <- system.file("labs_refer.xlsx", package = "dmtools")
refers <- readxl::read_xlsx(refs)
kable(refers, caption = "lab reference ranges")
```


```{r dataset, echo = FALSE, results = 'asis'}

ID <- c("01", "02", "03")
AGE <- c("19", "20", "22")
SEX <- c("f", "m", "m")
V1_GLUC <- c("5.5", "4.1", "9.7")
V1_GLUC_IND <- c("norm", NA, "norm")
V2_AST <- c("30", "48", "31")
V2_AST_IND <- c("norm", "norm", "norm")

df <- data.frame(
  ID, AGE, SEX,
  V1_GLUC, V1_GLUC_IND,
  V2_AST, V2_AST_IND,
  stringsAsFactors = FALSE
)

kable(df, caption = "dataset")
```


```{r lab}
# "norm" and "no" are example values of the investigator's assessment; use the values that occur in your dataset
# is_post is FALSE because the variable names in the dataset have a prefix (V1_)
refs <- system.file("labs_refer.xlsx", package = "dmtools")
obj_lab <- lab(refs, ID, AGE, SEX, "norm", "no", is_post = FALSE)
obj_lab <- obj_lab %>% check(df)

# ok - tests whose result was assessed correctly
obj_lab %>% choose_test("ok")

# mis - tests whose result was assessed incorrectly
obj_lab %>% choose_test("mis")

# skip - tests with an empty assessment
obj_lab %>% choose_test("skip")

# all tests
obj_lab %>% get_result()
```
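
The three statuses can be pictured as follows: `skip` when the assessment is empty, `ok` when the assessment agrees with the reference range, and `mis` otherwise. The sketch below applies that reading to the glucose values of the example dataset. The reference limits (3.9 to 5.9) and the helper `toy_status()` are assumptions for illustration, so the output need not match what `check()` reports.

```{r sketch_status}
# hypothetical helper: compare the investigator's assessment with a reference range
toy_status <- function(result, assessment, low, high, normal = "norm") {
  in_range <- result >= low & result <= high
  said_normal <- assessment == normal
  # an empty assessment is reported first; otherwise the assessment is
  # correct when "normal" was chosen exactly for the in-range results
  ifelse(is.na(assessment), "skip",
         ifelse(in_range == said_normal, "ok", "mis"))
}

gluc <- as.numeric(c("5.5", "4.1", "9.7"))
gluc_ind <- c("norm", NA, "norm")

# assumed reference limits, not taken from labs_refer.xlsx
toy_status(gluc, gluc_ind, low = 3.9, high = 5.9)
```

The results are converted with `as.numeric()` first because the example dataset stores them as character strings.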

### dates

For the dates check, create an Excel table like the one in the example.

* MINUS, PLUS, VISITDY - parameters of the protocol timeline
* VISITNUM - clinical encounter number; it is passed to the visit-selection function, e.g. `contains(num_visit)`
* VISIT - protocol-defined description of a clinical encounter (can be any name convenient for you)
* STARTDAT - column name of the start date, with postfix or prefix
* STARTVISIT - name of the start date (can be any name convenient for you)
* IS_EQUAL - logical value (T/F): whether to check that dates within a visit are equal
* EQUALDAT - name of the column used to check date equality, with postfix or prefix
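
As a rough picture of the timeline check, assume that `VISITDY` is the planned number of days between the start date and the visit, and that `MINUS` and `PLUS` give the allowed deviation in days. Under that assumption (a reading of the column names, not a statement of how `date()` is implemented), a visit date is inside the window when it falls between `start + VISITDY - MINUS` and `start + VISITDY + PLUS`. All values below are hypothetical.

```{r sketch_window}
# hypothetical timeline row: visit planned on day 7, allowed window -1/+2 days
toy_visitdy <- 7
toy_minus <- 1
toy_plus <- 2

toy_start <- as.Date("1991-03-08")
toy_visit <- as.Date(c("1991-03-14", "1991-03-15", "1991-03-18"))

# window limits under the assumption stated above
window_from <- toy_start + toy_visitdy - toy_minus
window_to <- toy_start + toy_visitdy + toy_plus

# TRUE when the visit date is inside the window
toy_visit >= window_from & toy_visit <= window_to
```

Dates stored as character strings, as in the example dataset, have to be converted with `as.Date()` before this kind of arithmetic.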

```{r timelines, echo = FALSE, results = 'asis', warning = FALSE, message = FALSE}

dates <- system.file("dates.xlsx", package = "dmtools")
timeline <- readxl::read_xlsx(dates)
kable(timeline, caption = "timeline")
```

```{r dataset_dates, echo = FALSE, results = 'asis'}

id <- c("01", "02", "03")
screen_date_E1 <- c("1991-03-13", "1991-03-07", "1991-03-08")
rand_date_E2 <- c("1991-03-15", "1991-03-11", "1991-03-10")
ph_date_E3 <- c("1991-03-21", "1991-03-16", "1991-03-16")
bio_date_E3 <- c("1991-03-23", "1991-03-16", "1991-03-16")

df <- data.frame(
  id, screen_date_E1, rand_date_E2, ph_date_E3, bio_date_E3,
  stringsAsFactors = FALSE
)

kable(df, caption = "dataset")
```


```{r date}
# use the str_date parameter to set the pattern for finding date columns, default: "DAT"
dates <- system.file("dates.xlsx", package = "dmtools")
obj_date <- date(dates, id, dplyr::contains, dplyr::matches)
obj_date <- obj_date %>% check(df)

# out - dates outside the protocol's timeline
obj_date %>% choose_test("out")

# uneq - dates within a visit that are not equal
obj_date %>% choose_test("uneq")

# ok - correct dates
obj_date %>% choose_test("ok")

# all dates
obj_date %>% get_result()
```

`dplyr::contains` - a function that selects the columns of the necessary visit or event, e.g. `dplyr::starts_with` or `dplyr::contains`. It works like `df %>% select(contains("E1"))`. With `dplyr::starts_with`, it works like `df %>% select(starts_with("V1"))`.

`dplyr::matches` - a function that selects the date columns of the necessary visit, e.g. `dplyr::matches` or `dplyr::contains`. It works like `visit_one %>% select(contains("DAT"))`; the default is `dplyr::contains()`.
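
The two selection steps can be tried directly with dplyr on a small data frame. `toy_dates` is a hypothetical dataset. `contains()` and `matches()` are the regular selection helpers exported by dplyr, and they ignore case by default, which is why the pattern `"DAT"` also finds `date`.

```{r sketch_select}
# hypothetical dataset with a visit postfix (_E1, _E2) in the variable names
toy_dates <- data.frame(
  id = c("01", "02"),
  screen_date_E1 = c("1991-03-13", "1991-03-07"),
  weight_E1 = c("70", "82"),
  rand_date_E2 = c("1991-03-15", "1991-03-11"),
  stringsAsFactors = FALSE
)

# step 1: keep the columns of one visit
visit_one <- dplyr::select(toy_dates, dplyr::contains("E1"))
names(visit_one)

# step 2: keep the date columns of that visit
visit_one_dates <- dplyr::select(visit_one, dplyr::matches("DAT"))
names(visit_one_dates)
```

`matches()` takes a regular expression while `contains()` takes a literal string, so `matches()` is the more flexible choice when date columns do not share one fixed substring.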

### rename

`rename_dataset()` renames the variables of the dataset using the CRFs (case report forms).

```{r rename, eval = FALSE}
rename_dataset("./crfs", "old_name", "new_name", 2)
```

* "./crfs" - path to the CRFs
* "old_name" - column with the current variable names of the dataset, without postfix or prefix
* "new_name" - column with the new variable names; the names should be unique
* 2 - the position of the sheet in the Excel document where dmtools can find "old_name" and "new_name"
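
The idea of renaming through a lookup table can be illustrated in base R. This is only a sketch: the lookup table `toy_crf`, the dataset `toy_raw`, and the rule of keeping the visit prefix while replacing the rest of the name are assumptions, and `rename_dataset()` may work differently.

```{r sketch_rename}
# hypothetical lookup table, as it might appear on a CRF sheet
toy_crf <- data.frame(
  old_name = c("GLUC", "AST"),
  new_name = c("glucose", "ast_level"),
  stringsAsFactors = FALSE
)

toy_raw <- data.frame(
  ID = "01", V1_GLUC = "5.5", V2_AST = "30",
  stringsAsFactors = FALSE
)

# split each column name into an assumed visit prefix ("V1_") and a base name
col_names <- names(toy_raw)
has_tag <- grepl("^V[0-9]+_", col_names)
tag <- ifelse(has_tag, sub("^(V[0-9]+_).*$", "\\1", col_names), "")
base_name <- sub("^V[0-9]+_", "", col_names)

# replace base names found in the lookup table, keep the others unchanged
idx <- match(base_name, toy_crf$old_name)
base_name[!is.na(idx)] <- toy_crf$new_name[idx[!is.na(idx)]]

names(toy_raw) <- paste0(tag, base_name)
names(toy_raw)
```

Keeping the new names unique matters here for the same reason as in the checks: duplicated column names make later selection by name ambiguous.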