---
title: "Basic R-Usage Guide for rMZQC"
author: "Chris Bielow <chris.bielow@fu-berlin.de>"
date: '`r Sys.Date()`'
output:
  html_document: default
  pdf_document: null
vignette: >
  %\VignetteIndexEntry{Basic R-Usage Guide for rMZQC}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

# Basic R-Usage Guide for rMZQC

This vignette serves as a quickstart guide for R users to create and save an mzQC document.

**Target Audience:** R users

## Read an mzQC file and extract some data


```{r, eval=TRUE}
library(rmzqc)
data = readMZQC(system.file("./testdata/test.mzQC", package = "rmzqc", mustWork = TRUE))
cat("This file has ", length(data$runQualities), " runqualities\n")
cat("  - file: ", data$runQualities[[1]]$metadata$inputFiles[[1]]$name, "\n")
cat("  - # of metrics: ", length(data$runQualities[[1]]$qualityMetrics), "\n")
cat("    - metric #1 name: ", data$runQualities[[1]]$qualityMetrics[[1]]$name, "\n")
cat("    - metric #1 value: ", data$runQualities[[1]]$qualityMetrics[[1]]$value, "\n")

```

Hint: if you receive an error such as 
```
Error in parse_con(txt, bigint_as_char) : 
lexical error: invalid char in json text.
cursor_int": [ NaN,NaN,NaN,NaN,825282.0,308263
(right here) ------^
```
when calling`readMZQC` this indicates that the mzQC is not valid JSON, since `NaN` values should be quoted (`"NaN"`) or replaced by `null` (unquoted), depending on context. In short: `null` may become an `NA` in R if part of an array, see https://github.com/jeroen/jsonlite/issues/70#issuecomment-431433773.


## Create a minimal mzQC document


```{r, eval=TRUE}
library(rmzqc)
## we need a proper URI (i.e. no backslashes and a scheme, e.g. 'file:')
## otherwise writing will fail
raw_file = localFileToURI("c:\\data\\special.raw", FALSE)

file_format = getCVTemplate(accession = filenameToCV(raw_file))
ptxqc_software = toAnalysisSoftware(id = "MS:1003162", version = "1.0.13") ## you could use 'version = packageVersion("PTXQC")' to automate further
run1_qc = MzQCrunQuality$new(metadata = MzQCmetadata$new(label = raw_file,
                         inputFiles = list(MzQCinputFile$new(basename(raw_file),
                                                             raw_file,
                                                             file_format)),
                         analysisSoftware = list(ptxqc_software)),
                         qualityMetrics = list(toQCMetric(id = "MS:4000059", value = 13405), ## number of MS1 scans
                                               toQCMetric(id = "MS:4000063", value = list("MS:1000041" = 1:3, "UO:0000191" = c(0.1, 0.8, 0.1))) # MS2 known precursor charges fractions
                                               )
                         )

mzQC_document = MzQCmzQC$new(version = "1.0.0", 
                             creationDate = MzQCDateTime$new(), 
                             contactName = Sys.info()["user"], 
                             contactAddress = "test@user.info", 
                             description = "A minimal mzQC test document with bogus data",
                             runQualities = list(run1_qc),
                             setQualities = list(), 
                             controlledVocabularies = list(getCVInfo()))

## write it out
mzqc_filename = paste0(getwd(), "/test.mzQC")
writeMZQC(mzqc_filename, mzQC_document)
cat(mzqc_filename, "written to disk!\n")

## read it again
mq = readMZQC(mzqc_filename)

## print some basic stats
gettextf("This mzQC was created on %s and has %d quality metric(s) in total.", dQuote(mq$creationDate$datetime), length(mq$runQualities) + length(mq$setQualities))

```

## Different Value Types (single value, n-tuple, table, matrix) for QualityMetric's

mzQC allows for 4 different kinds of value types for a `qualityMetric`:
 - Single value
 - n-tuple
 - table
 - matrix
 
See `toQCMetric()` for more examples.
 
Below we will create an example for each of them, so you know what R datastructures to employ.
Using a data.frame, instead of a list, will give you the wrong JSON formatting and leads to validation failures.

### Single Value

A single value (MS:4000003) is the easiest. Just assign a string or a number to the value attribute:

```{r, eval=TRUE}
qualityMetric_singleValue = toQCMetric(id = "MS:4000059", value = 13405) ## number of MS1 scans

## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_singleValue, pretty = TRUE, auto_unbox = TRUE)
```

### n-tuple

An n-tuple (MS:4000004) is just a vector in R:


```{r, eval=TRUE}
qualityMetric_tuple = toQCMetric(id = "MS:4000061", value = c(0.45, 0.76, 0.23)) ## MS1 density quantiles

## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_tuple, pretty = TRUE, auto_unbox = TRUE)
```

### Table

An table (MS:4000005) is a list (not a data.frame!) of columns.

The example below shows a table with two columns and three rows:


```{r, eval=TRUE}
qualityMetric_table =   # MS2 known precursor charges fractions
                  toQCMetric(id = "MS:4000063", value = list("MS:1000041" = 1:3,   ## charge state
                                                             "UO:0000191" = c(0.1, 0.8, 0.1)))   ## fraction of precursors with that charge
                     
## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_table, pretty = TRUE, auto_unbox = TRUE)
```                


### Matrix

A matrix (MS:4000006) is an R matrix:

At the point of writing, there is no matrix CV term yet. So we just make one up:

```{r, eval=TRUE}
qualityMetric_matrix = toQCMetric(id = "MS:40000??", value = matrix(1:9, 3, 3), allow_unknown_id = TRUE)
qualityMetric_matrix$name = "unknown metric"
## let's look at how this looks in JSON 
jsonlite::toJSON(qualityMetric_matrix, pretty = TRUE, auto_unbox = TRUE)

```                





