---
title: "R interface to TensorFlow IO"
output: 
  rmarkdown::html_vignette: default
vignette: >
  %\VignetteIndexEntry{Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
type: docs
repo: https://github.com/tensorflow/io
menu:
  main:
    name: "Using TensorFlow IO"
    identifier: "tools-tfio-using"
    parent: "tfio-top"
    weight: 10
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
```

## Overview

This is the R interface to datasets and filesystem extensions maintained by SIG-IO. Some example data sources that TensorFlow I/O supports are:

* Data source for Apache Ignite and Ignite File System (IGFS).
* Apache Kafka stream-processing.
* Amazon Kinesis data streams.
* Hadoop SequenceFile format.
* Video file format such as mp4.
* Apache Parquet format.
* Image file format such as WebP.

We provide a reference Dockerfile [here](https://github.com/tensorflow/io/blob/master/R-package/scripts/Dockerfile) for you
so that you can use the R package directly for testing. You can build it via:
```
docker build -t tfio-r-dev -f R-package/scripts/Dockerfile .
```

Inside the container, you can start your R session, instantiate a `SequenceFileDataset`
from an example [Hadoop SequenceFile](https://cwiki.apache.org/confluence/display/HADOOP2/SequenceFile)
[string.seq](https://github.com/tensorflow/io/blob/master/R-package/tests/testthat/testdata/string.seq), and then use any [transformation functions](https://tensorflow.rstudio.com/tools/tfdatasets/articles/introduction.html#transformations) provided by [tfdatasets package](https://tensorflow.rstudio.com/tools/tfdatasets/) on the dataset like the following:

```{r, eval=FALSE}
library(tfio)
dataset <- sequence_file_dataset("R-package/tests/testthat/testdata/string.seq") %>%
    dataset_repeat(2)

sess <- tf$Session()
iterator <- make_iterator_one_shot(dataset)
next_batch <- iterator_get_next(iterator)

until_out_of_range({
  batch <- sess$run(next_batch)
  print(batch)
})
```

You'll see the key-value pairs from `string.seq` file are printed as follows:

```
[1] "001"      "VALUE001"
[1] "002"      "VALUE002"
[1] "003"      "VALUE003"
[1] "004"      "VALUE004"
[1] "005"      "VALUE005"
[1] "006"      "VALUE006"
[1] "007"      "VALUE007"
[1] "008"      "VALUE008"
...
[1] "020"      "VALUE020"
[1] "021"      "VALUE021"
[1] "022"      "VALUE022"
[1] "023"      "VALUE023"
[1] "024"      "VALUE024"
[1] "025"      "VALUE025"
```
