---
title: "Read in SAS data in parallel into Spark"
author: "Jan Wijffels"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Read in SAS data in parallel into Spark}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This R package allows R users to easily import large [SAS](https://www.sas.com) datasets into [Spark](https://spark.apache.org) tables in parallel.


The package uses the [spark-sas7bdat Spark package](https://spark-packages.org/package/saurfang/spark-sas7bdat) in order to read a SAS dataset in Spark. That Spark package imports the data in parallel on the Spark cluster using the Parso library and this process is launched from R using the [sparklyr](https://github.com/sparklyr/sparklyr) functionality. 

More information about the spark-sas7bdat Spark package and sparklyr can be found at:

- https://spark-packages.org/package/saurfang/spark-sas7bdat and https://github.com/saurfang/spark-sas7bdat
- https://github.com/sparklyr/sparklyr

## Example
The following example reads in a file called iris.sas7bdat in parallel in a table called sas_example in Spark. Do try this with bigger data on your cluster and look at the help of the [sparklyr](https://github.com/sparklyr/sparklyr) package to connect to your Spark cluster.

```{r, eval=FALSE}
library(sparklyr)
library(spark.sas7bdat)
mysasfile <- system.file("extdata", "iris.sas7bdat", package = "spark.sas7bdat")

sc <- spark_connect(master = "local")
x <- spark_read_sas(sc, path = mysasfile, table = "sas_example")
```

The resulting pointer to a Spark table can be further used in dplyr statements. These will be executed in parallel using the Spark functionalities of the spark-sas7bdat package.
```{r, eval=FALSE}
library(dplyr)
library(magrittr)
x %>% group_by(Species) %>%
  summarise(count = n(), length = mean(Sepal_Length), width = mean(Sepal_Width))
```


## Support in big data and Spark analysis

Need support in big data and Spark analysis?
Contact BNOSAC: http://www.bnosac.be
