---
title: "connector"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{connector}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

The `connector` package provides a set of functions to connect to different data sources (such as databases and file systems), and to read data from and write data to them through a consistent interface.

It is designed to be a generic and extensible package, so that new data sources can be added easily.

This vignette demonstrates how to use the `connector` package to connect to either a **file system** or a **database** to access different types of data.

## Connector configuration

The main function in this package is `connect()`. Based on a configuration
file or a list, it creates a `connectors` object containing one `connector`
for each of the specified data sources.
The configuration can be supplied as an R list or as a JSON or YAML file.

The input list (or configuration file) must have the following structure:

 * Only `metadata`, `env`, and `datasources` fields are allowed.
 * All elements must be named.
 * **`datasources`** is mandatory.
 * **`metadata`** and **`env`** must each be a list of named character vectors of length 1.
 * **`datasources`** must be a list of unnamed lists.
 * Each datasource must have the named character element **`name`** and the named list element **`backend`**.
 * For each backend, **`type`** must be provided.
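
As an illustration of these rules, the sketch below builds a minimal configuration as an R list and checks a few of the rules with base R. The checks are not the package's own validation, and the `adam_path` value is an arbitrary example path.

```r
# A minimal configuration list with a single file system datasource.
# The path is an illustrative assumption; any directory would do.
config <- list(
  metadata = list(
    adam_path = file.path(tempdir(), "adam")
  ),
  datasources = list(
    list(
      name = "adam",
      backend = list(
        type = "connector::connector_fs",
        path = "{metadata.adam_path}"
      )
    )
  )
)

# Base R checks mirroring some of the rules above
# (a sketch, not the validation performed by the package)
stopifnot(
  all(names(config) %in% c("metadata", "env", "datasources")),
  "datasources" %in% names(config),
  is.null(names(config$datasources)),
  all(vapply(
    config$datasources,
    function(ds) is.character(ds$name) && is.list(ds$backend),
    logical(1)
  )),
  all(vapply(
    config$datasources,
    function(ds) !is.null(ds$backend$type),
    logical(1)
  ))
)

# The list could then be passed to connect() in place of a file, e.g.
# db <- connect(config)
# (create the directory first, as the working example does)
```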

## Working example

```{r, include = FALSE}
# Use a temporary directory as working directory for the example below
tmp <- withr::local_tempdir()
knitr::opts_knit$set(root.dir = tmp)
```

The following self-contained example shows how the `connector` package works.
It uses the configuration file below, which defines file system connectors
for ADaM and TFL data.

```{r, include = FALSE}
'metadata:
  adam_path: !expr file.path(getwd(), "adam")
  tfl_path: !expr file.path(getwd(), "tfl")

datasources:
  - name: "adam"
    backend:
      type: "connector::connector_fs"
      path: "{metadata.adam_path}"
  - name: "tfl"
    backend:
      type: "connector::connector_fs"
      path: "{metadata.tfl_path}"
' |>
  writeLines("_connector.yml")
```

`_connector.yml`:
```yaml
metadata:
  adam_path: !expr file.path(getwd(), "adam")
  tfl_path: !expr file.path(getwd(), "tfl")

datasources:
  - name: "adam"
    backend:
      type: "connector::connector_fs"
      path: "{metadata.adam_path}"
  - name: "tfl"
    backend:
      type: "connector::connector_fs"
      path: "{metadata.tfl_path}"
```

The configuration file defines metadata holding the paths of the directories where the data will be stored, and two data sources, `adam` and `tfl`, both using the `connector_fs` backend to connect to file system folders.
The datasource paths refer to the metadata through placeholders (e.g., `{metadata.adam_path}`), so each path only needs to be changed in one place.
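
To make the placeholder mechanism concrete, here is a base R sketch of how a `{metadata.<name>}` placeholder could be resolved against a metadata list. The `resolve_placeholders()` helper and the example paths are hypothetical and only illustrate the idea; the package's own substitution logic may differ.

```r
# Hypothetical helper: replace each {metadata.<name>} placeholder with its value
resolve_placeholders <- function(template, metadata) {
  for (key in names(metadata)) {
    placeholder <- paste0("{metadata.", key, "}")
    # fixed = TRUE matches the braces and dots literally, not as a regex
    template <- gsub(placeholder, metadata[[key]], template, fixed = TRUE)
  }
  template
}

# Example metadata (illustrative paths, not taken from the package)
metadata <- list(
  adam_path = "/projects/study/adam",
  tfl_path = "/projects/study/tfl"
)

resolve_placeholders("{metadata.adam_path}", metadata)
resolve_placeholders("{metadata.tfl_path}", metadata)
```

Changing `adam_path` in the metadata list changes every path built from that placeholder, which is the benefit of keeping paths in `metadata`.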

Now, let's run the example:

```{r, message = FALSE}
library(connector)
library(dplyr)
library(ggplot2)

# Let's create ADaM and TFL directories
dir.create("adam")
dir.create("tfl")
```

The first step is to create the connections to the data sources. Called without arguments, `connect()` picks up the `_connector.yml` file in the working directory.

```{r}
# Load data connections
db <- connect()
```


Next, we create a subset of the `iris` dataset and store it through the `adam` connector.
The subset is saved as an RDS file in the `adam` directory.

```{r}
## Iris data
setosa <- iris |>
  filter(Species == "setosa")
## Store data
db$adam |>
  write_cnt(setosa, "setosa.rds")
```

We can also create more complex summaries and store them in the same connector.

```{r}
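# Summary statistics for every measurement column, by species.
# Naming the functions gives readable column names such as Sepal.Length_mean.
mean_for_all_iris <- iris |>
  group_by(Species) |>
  summarise(across(
    everything(),
    list(mean = mean, median = median, sd = sd, min = min, max = max)
  ))

db$adam |>
  write_cnt(mean_for_all_iris, "mean_iris.rds")

## List the content of the adam connector
db$adam |>
  list_content_cnt()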
```


We can also read back the data we just created with the `read_cnt()` function and filter it further.

```{r}
# Read and filter data
setosa_filtered <- db$adam |>
  read_cnt("setosa") |>
  filter(Sepal.Length > 5)
```
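
Because the `adam` connector points at a plain directory, the round trip above can be pictured in base R. The sketch below assumes that the file system connector stores `.rds` objects as ordinary RDS files. It works in a separate temporary directory (`adam_sketch`, a made-up name) and does not call the package.

```r
# Stand-in for the directory behind a file system connector
adam_dir <- file.path(tempdir(), "adam_sketch")
dir.create(adam_dir, showWarnings = FALSE)

# Write a subset of iris as an RDS file
setosa_base <- iris[iris$Species == "setosa", ]
saveRDS(setosa_base, file.path(adam_dir, "setosa.rds"))

# Read it back and filter further
setosa_back <- readRDS(file.path(adam_dir, "setosa.rds"))
filtered_base <- setosa_back[setosa_back$Sepal.Length > 5, ]

# Compare the object read back with the one that was written
all.equal(setosa_base, setosa_back)
```

The connector interface hides the path handling and the choice of reader, so the same calls can be kept if the backend is later changed in the configuration.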

Finally, we can create a plot with the `ggplot2` package and store it in the `tfl` connector.

```{r}
# Create a plot
plot_setosa <- ggplot(setosa_filtered) +
  aes(x = Sepal.Length, y = Sepal.Width) +
  geom_point()

## Store data and plot objects
db$tfl |>
  write_cnt(plot_setosa$data, "setosa_data.csv")
db$tfl |>
  write_cnt(plot_setosa, "setosa_plot.rds")

## Store plot image
tmp_file <- tempfile(fileext = ".png")
ggsave(tmp_file, plot_setosa)
db$tfl |>
  upload_cnt(tmp_file, "setosa_plot.png")

# List all files in the TFL directory
db$tfl |>
  list_content_cnt()
```
