---
title: "Import data from Bruker MALDI Biotyper"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{import-data-from-bruker-maldi-biotyper}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(maldipickr)
```

<!-- WARNING - This vignette is generated by {fusen} from dev/import-data.Rmd: do not edit by hand -->


The matrix-assisted laser desorption/ionization-time-of-flight (MALDI-TOF) technology is coupled with mass spectrometry in the Bruker MALDI Biotyper device in order to identify microorganisms.
The device generates two types of data:

1. A report of the identification using its proprietary database of mass spectrum projections (MSPs).
2. The raw mass spectrometry data.

This vignette describes how to streamline the import of these two types of data into R using the [`{maldipickr}`](https://github.com/ClavelLab/maldipickr) package.


# Importing generated reports from the Bruker MALDI Biotyper device

## Importing a single report

The Bruker MALDI Biotyper generates a report via the Compass software summarizing the identification of the microorganisms using its internal database.
The file is semicolon-delimited but contains no header row.
The report uses a *wide* format: a row has many columns describing the ten hits when identification is feasible, and only a few columns when no identification was possible.
Together, these characteristics make importing the table into R and manipulating it cumbersome.
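To illustrate why such a file is awkward to read, here is a minimal base R sketch on a hypothetical two-line report. The field layout and values are invented for illustration and do not reproduce the actual Bruker report structure, nor the way `read_biotyper_report()` parses it.

```r
# Hypothetical report: semicolon-separated, no header row, and a
# different number of fields per line (the layout is an assumption)
toy_report <- c(
  "colony_1;A1;Escherichia coli;2.31;Shigella sonnei;2.05",
  "colony_2;A2;no peaks found"
)
toy_path <- tempfile(fileext = ".csv")
writeLines(toy_report, toy_path)

# Find the widest line so that shorter lines can be padded
n_fields <- max(count.fields(toy_path, sep = ";"))

# fill = TRUE pads short lines; na.strings = "" turns the padding into NA
raw_table <- read.table(
  toy_path,
  sep = ";", header = FALSE, fill = TRUE,
  col.names = paste0("V", seq_len(n_fields)),
  na.strings = "", stringsAsFactors = FALSE
)
raw_table
```

The resulting columns carry only placeholder names (`V1`, `V2`, ...), which is the kind of manual bookkeeping the package function is meant to spare.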

Below is an example of importing a single Bruker MALDI Biotyper report into a [`{tibble}`](https://tibble.tidyverse.org). By default, only the best hit of each colony is reported. All hits can also be reported in the *long* format (`long_format = TRUE`) for further exploration with the [`{tidyverse}`](https://tidyverse.tidyverse.org/) suite.



```{r examples-read_biotyper_report}
# Get an example Bruker report
biotyper <- system.file("biotyper.csv", package = "maldipickr")
# Import the report as a tibble
report_tibble <- read_biotyper_report(biotyper)
# Display the tibble
report_tibble
```



## Importing multiple reports

Large-scale analyses produce batches of identification reports. These can be imported together, along with custom metadata, using the `read_many_biotyper_reports()` function.

Below is an example of such usage, where the same report is reused three times to stand in for multiple reports.


```{r examples-read_many_biotyper_reports}
# List of Bruker MALDI Biotyper reports
reports_paths <- system.file(
  c("biotyper.csv", "biotyper.csv", "biotyper.csv"),
  package = "maldipickr"
)
# Read the list of reports and combine them in a single tibble
read_many_biotyper_reports(
  reports_paths,
  report_ids = c("first", "second", "third"),
  # Additional metadata below are passed to dplyr::mutate
  growth_temperature = 37.0
)
```
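The general idea of combining several reports can be sketched in base R as follows: read each file, tag its rows with a report identifier, bind them together, and add constant metadata. The toy files and the column names (`name`, `species`, `score`) are hypothetical, and this sketch is not the implementation used by `read_many_biotyper_reports()`.

```r
# Two hypothetical headerless, semicolon-separated toy reports
toy_lines <- list(
  first = c("colony_1;Escherichia coli;2.31", "colony_2;Bacillus subtilis;1.98"),
  second = c("colony_3;Escherichia coli;2.12")
)
toy_paths <- vapply(toy_lines, function(lines) {
  path <- tempfile(fileext = ".csv")
  writeLines(lines, path)
  path
}, character(1))

# Read each file and tag its rows with the report identifier
# (column names are illustrative, not those used by {maldipickr})
tables <- Map(function(path, id) {
  tbl <- read.table(
    path,
    sep = ";", header = FALSE,
    col.names = c("name", "species", "score"),
    stringsAsFactors = FALSE
  )
  cbind(report_id = id, tbl)
}, toy_paths, names(toy_paths))

# Stack the tables and add metadata shared by every row
combined <- do.call(rbind, unname(tables))
rownames(combined) <- NULL
combined$growth_temperature <- 37.0
combined
```

Keeping the report identifier as a column makes it possible to trace each row back to its source file after the tables are merged.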

  
# Importing spectra from the Bruker MALDI Biotyper device

In addition to the identification reports, the Bruker MALDI Biotyper device outputs the raw data used for the identification (though not the reference database itself) in the form of mass spectra.
Thankfully, the [`{MALDIquant}`](https://strimmerlab.github.io/software/maldiquant/) and [`{readBrukerFlexData}`](https://cran.r-project.org/package=readBrukerFlexData) packages help users import and manipulate these data in R.


## Importing multiple spectra from a directory hierarchy

When the Bruker MALDI Biotyper device produces `acqus` files (instead of the native `acqu` files), the [`readBrukerFlexDir()`](https://rdrr.io/cran/readBrukerFlexData/man/readBrukerFlexDir.html) function from the [`{readBrukerFlexData}`](https://cran.r-project.org/package=readBrukerFlexData) package fails with the following error message:

```
Error in .readAcquFile(fidFile = fidFile, verbose = verbose) :
File ‘/data/maldi_dir/targetA/0_D10/1/1SLin/acqu’ doesn't exists!
```

The [`import_biotyper_spectra()`](https://clavellab.github.io/maldipickr/reference/import_biotyper_spectra.html) function used in the example below circumvents this error by creating a symbolic link, and conveniently helps remove calibration samples.
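The symbolic-link idea can be sketched in base R as shown here. This is only an illustration on a hypothetical temporary directory and is not necessarily how `import_biotyper_spectra()` proceeds internally. Note that `file.symlink()` may fail on Windows without the required privileges.

```r
# Hypothetical directory hierarchy holding a single `acqus` file
root_dir <- tempfile("maldi_dir")
spectrum_dir <- file.path(root_dir, "targetA", "0_D10", "1", "1SLin")
dir.create(spectrum_dir, recursive = TRUE)
writeLines("##TITLE= toy parameters", file.path(spectrum_dir, "acqus"))

# Locate every `acqus` file and the `acqu` path expected next to it
acqus_files <- list.files(
  root_dir,
  pattern = "^acqus$", recursive = TRUE, full.names = TRUE
)
acqu_files <- file.path(dirname(acqus_files), "acqu")

# Create the links only where `acqu` is missing;
# file.symlink() returns TRUE for each link it managed to create
to_link <- !file.exists(acqu_files)
linked <- file.symlink(acqus_files[to_link], acqu_files[to_link])
linked
```

Once the link exists, a reader that looks for `acqu` finds the content of `acqus` without the original files being copied or renamed.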

The toy dataset bundled with this package is a subset of a dataset in the [`{MALDIquantExamples}`](https://github.com/sgibb/MALDIquantExamples) package and consists here of six spectra:

* 1 replicate of species 1
* 2 replicates of species 2
* 3 replicates of species 3


```{r examples-import_biotyper_spectra}
# Get an example directory of six Bruker MALDI Biotyper spectra
directory_biotyper_spectra <- system.file(
  "toy-species-spectra",
  package = "maldipickr"
)
# Import the six spectra
spectra_list <- import_biotyper_spectra(directory_biotyper_spectra)
# Display the list of spectra
spectra_list
```

  
## Evaluate the quality of the spectra

Once the spectra are imported, the [`check_spectra()`](https://clavellab.github.io/maldipickr/reference/check_spectra.html) function assesses whether all the spectra in the list are non-empty, of the same length, and correspond to profile data.
If some spectra do not satisfy these criteria, the function issues a warning and indicates the faulty spectra.
Either way, the function outputs a list of logical vectors (`TRUE` or `FALSE`) indicating whether each spectrum is empty (`is_empty`), has an unusual length (`is_outlier_length`), or is not a profile spectrum (`is_not_regular`).


```{r examples-check_spectra}
# Get an example directory of six Bruker MALDI Biotyper spectra
directory_biotyper_spectra <- system.file(
  "toy-species-spectra",
  package = "maldipickr"
)
# Import the six spectra
spectra_list <- import_biotyper_spectra(directory_biotyper_spectra)
# Display the list of checks, with FALSE where no anomaly is detected
check_spectra(spectra_list)
# The overall sanity can be checked with Reduce
Reduce(any, check_spectra(spectra_list)) # Should be FALSE
```
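Beyond the overall check, the list of logical vectors can be combined element-wise to flag and discard individual spectra. The sketch below uses a hypothetical `checks` list that only mimics the structure described above, together with placeholder spectra; it does not call `check_spectra()` itself.

```r
# Hypothetical output mimicking the structure described above
checks <- list(
  is_empty = c(FALSE, FALSE, TRUE),
  is_outlier_length = c(FALSE, TRUE, FALSE),
  is_not_regular = c(FALSE, FALSE, FALSE)
)

# Element-wise OR across the vectors: TRUE where any check failed
is_faulty <- Reduce(`|`, checks)

# Positions of the spectra that failed at least one check
which(is_faulty)

# Keep only the spectra that passed every check
# (placeholder strings stand in for real spectrum objects)
toy_spectra <- list("spectrum_a", "spectrum_b", "spectrum_c")
valid_spectra <- toy_spectra[!is_faulty]
```

Using `|` rather than `any` inside `Reduce()` keeps one value per spectrum, which is what subsetting the list requires.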