---
title: "Introduction to sdtmchecks"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to sdtmchecks}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


The purpose of the `sdtmchecks` package is to help detect and investigate potential analysis-relevant issues in SDTM data. It does this with a set of data check functions, which are intended to be **generalizable**, **actionable**, and **meaningful for analysis**.

## Setting Up

To start using `sdtmchecks`, first install it from GitHub:

```{r, message=FALSE,eval=FALSE}
# install.packages("devtools")
devtools::install_github("pharmaverse/sdtmchecks", ref="main")
```

Then load the package:

```{r, message=FALSE}
library(sdtmchecks) 
```

## Documentation

Here's how to search the help pages for the package:

```{r, message=FALSE}
# type ??sdtmchecks into the console
??sdtmchecks 
``` 
 
## Metadata

The package comes with the `sdtmchecksmeta` dataset which contains metadata on each check function.
It contains details like function name, category, priority, and descriptions.
Each function is given a **Category** (Cross Therapeutic Area, Oncology, Covid-19, Patient Reported Outcomes, Ophthalmology) and a **Priority** (High, Medium, Low).

```{r, message=FALSE, eval=FALSE}
#Just type this in
sdtmchecksmeta
```

```{r, message=FALSE, echo=FALSE}
meta<-subset(sdtmchecksmeta, select=c("check","xls_title","category","priority","domains"))
colnames(meta)<-c("check","title","category","priority", "domains")
head(meta,n=10)
    
``` 
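
Because `sdtmchecksmeta` is an ordinary data frame, it can be summarized and filtered with base R. The chunk below is a minimal sketch that relies only on the `check`, `category`, `priority`, and `domains` columns shown above. It assumes the priority is stored as the string `"High"`; adjust the value if your version of the metadata differs.

```{r, message=FALSE, eval=FALSE}
library(sdtmchecks)

# Count the checks in each category/priority combination
table(sdtmchecksmeta$category, sdtmchecksmeta$priority)

# Keep only the high-priority checks and a few descriptive columns
# (assumption: priority values are "High", "Medium", "Low")
high_priority <- subset(sdtmchecksmeta,
                        priority == "High",
                        select = c("check", "category", "domains"))
head(high_priority)
```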


## Running a Check

Let's walk through an example using `check_ae_ds_partial_death_dates(AE,DS)`.

This check flags records with partial death dates (i.e. length <10) in AE and DS. If any are found, the check returns `FALSE` with attributes containing the flagged records and a brief message explaining the result. If no issues are detected, the check returns `TRUE`.
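
To make the "length <10" rule concrete, here is a small base R sketch of the idea. It is an illustration only, not the package's internal implementation. The date strings are hypothetical, and treating missing values as "not partial" is an assumption of this sketch.

```{r, message=FALSE, eval=FALSE}
# Hypothetical date strings; a complete ISO 8601 date (YYYY-MM-DD) has 10 characters
dates <- c("2017-01-01", "2017", "2016-10", NA)

# Partial = present but shorter than 10 characters
# (assumption: missing values are not counted as partial here)
is_partial <- !is.na(dates) & nchar(dates) < 10

is_partial
# [1] FALSE  TRUE  TRUE FALSE

dates[is_partial]
# [1] "2017"    "2016-10"
```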

```{r, message=FALSE, echo=FALSE}
# Create sample data frames
 AE <- data.frame(
  USUBJID = 1:3,
  AEDECOD = c("AE1","AE2","AE3"),
  AEDTHDTC = c("2017-01-01","2017",NA),
  stringsAsFactors=FALSE
 )
 DS <- data.frame(
  USUBJID = 4:7,
  DSSCAT = "STUDY DISCON", 
  DSDECOD = "DEATH",
  DSSTDTC = c("2018-01-01","2017-03-03","2018-01-02","2016-10"),
  stringsAsFactors=FALSE
 )
``` 

```{r, message=FALSE}
# Use sample data frames.
AE
DS
# Run the data check.
check_ae_ds_partial_death_dates(AE,DS)
```  

## Running Many Checks

To run all the checks on your data, use the `run_all_checks` function.
This function assumes that all of your SDTM datasets are objects in your global environment, e.g. `ae`, `dm`, `ex`.

```{r, message=FALSE, error=FALSE, warning=FALSE, eval=FALSE }

# Read data to your global environment
ae = haven::read_sas("path/to/ae.sas7bdat")
ds = haven::read_sas("path/to/ds.sas7bdat")

# Run the checks and save as an object called "myreport"
myreport=run_all_checks(metads = sdtmchecksmeta,
               priority = c("High", "Medium", "Low"), #subset checks based on priority
               type = c("ALL", "ONC", "COVID", "PRO", "OPHTH"), #subset checks based on category
               verbose = TRUE)

class(myreport) #results in a list object
names(myreport) #each check result is saved in a slot of the list
myreport[["check_ae_aedecod"]] #investigate the results of a check
``` 
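
Since `run_all_checks` looks for the datasets in your global environment, reading them one at a time can get repetitive. The helper below is a hypothetical convenience function, not part of `sdtmchecks`: it reads every matching file in a folder and assigns each one to an object named after the lowercased file name. The function name `load_sdtm`, its arguments, and the `"path/to"` folder are all placeholders.

```{r, message=FALSE, eval=FALSE}
# Hypothetical helper (not part of sdtmchecks): read every file in `dir`
# matching `pattern` and assign it to `envir` under its lowercased file name,
# e.g. "AE.sas7bdat" becomes the object `ae`.
load_sdtm <- function(dir,
                      pattern = "\\.sas7bdat$",
                      reader = haven::read_sas,
                      envir = globalenv()) {
  files <- list.files(dir, pattern = pattern, full.names = TRUE, ignore.case = TRUE)
  obj_names <- tolower(tools::file_path_sans_ext(basename(files)))
  for (i in seq_along(files)) {
    assign(obj_names[i], reader(files[i]), envir = envir)
  }
  # Return the names of the objects that were created, invisibly
  invisible(obj_names)
}

# "path/to" is a placeholder for the folder holding your SDTM datasets
loaded <- load_sdtm("path/to")
loaded
```

Passing a different `reader` and `pattern` (for example `haven::read_xpt` with `"\\.xpt$"`) would let the same sketch handle other file formats.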


The `run_all_checks` function also lets you subset the checks by category or priority:

```{r, message=FALSE, error=FALSE, warning=FALSE, eval=FALSE }

myreport=run_all_checks(metads = sdtmchecksmeta,
               priority = c("High"),
               type = c("ONC"),
               verbose = TRUE)
``` 

You can also choose specific checks to run. Here's a starting set of checks that should work fairly well for most datasets:

```{r, message=FALSE, error=FALSE, warning=FALSE, eval=FALSE }

library(dplyr) # provides %>% and filter() used below

# Read data to your global environment
ae = haven::read_sas("path/to/ae.sas7bdat")
cm = haven::read_sas("path/to/cm.sas7bdat")
dm = haven::read_sas("path/to/dm.sas7bdat")

# Subset to checks that should work OK for most datasets
metads = sdtmchecksmeta %>%
  filter(check %in% c("check_ae_aedecod",
                      "check_ae_aetoxgr",
                      "check_ae_dup",
                      "check_cm_cmdecod",
                      "check_cm_missing_month",
                      "check_dm_age_missing",
                      "check_dm_usubjid_dup",
                      "check_dm_armcd"
                      ))

myreport=run_all_checks(metads = metads,
               verbose = TRUE)
``` 

## Writing Out Results

You can then write the results out to an .xlsx file for easy sharing:

```{r, message=FALSE, error=FALSE, warning=FALSE, eval=FALSE }

report_to_xlsx(res=myreport,outfile="check_report.xlsx")

```

## Making a Customizable Script

There's also a helper function that writes out a user-friendly R script containing all the check function calls:

```{r, message=FALSE, error=FALSE, warning=FALSE, eval=FALSE }

create_R_script(file = "run_the_checks.R")

```
