---
title: fastTextR - Word representations
output: html_document
vignette: >
  %\VignetteIndexEntry{Word_representations}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

**fastTextR** is an **R** interface to the [fastText](https://github.com/facebookresearch/fastText)
library. It can be used for **word representation learning** *(Bojanowski et al., 2016)* and
**supervised text classification** *(Joulin et al., 2016)*.
A particular advantage of **fastText** over other software is that
it was designed for large data sets.

The following example is based on the examples provided with the **fastText** library
and shows how to use **fastTextR** for word representations.
More information about word representations can be found on the
[fastText homepage](https://fasttext.cc/docs/en/unsupervised-tutorial.html).

```{r, eval=FALSE}
library("fastTextR")
```


## Load pretrained model
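Training these models can be quite time-consuming, so pre-trained
models are a good option. The example below assumes that the English model file
`cc.en.300.bin` has already been downloaded from the fastText website into the
working directory.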

```{r, eval=FALSE}
model <- ft_load("cc.en.300.bin")
```

## Printing word vectors

```{r, eval=FALSE}
ft_word_vectors(model, c("asparagus", "pidgey", "yellow"))[,1:5]
```

```
##                   [,1]          [,2]         [,3]       [,4]         [,5]
## asparagus 0.0292057190 -0.0114405714 -0.003201437 0.03087331  0.127229080
## pidgey    0.0452978685  0.0090015158  0.067562237 0.11123407 -0.008441916
## yellow    0.0007776691 -0.0001886144  0.001824494 0.03869999  0.036413591
```

## Printing sentence vectors

```{r, eval=FALSE}
sentences <- c(
  "Poets have been mysteriously silent on the subject of cheese",
  "Who did not let the gorilla into the ballet"
)
ft_sentence_vectors(model, sentences)[,1:5]
```

```
???
```
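
To give an intuition for what a sentence vector is, the following base **R** sketch averages the length-normalised vectors of the words in a sentence. It is a simplified illustration and not the **fastTextR** implementation: the toy matrix `emb` and the helper `sentence_vector_sketch()` are made up for this example, and the sketch simply skips unknown words, whereas **fastText** can compose vectors for them from subword information.

```r
# Toy embedding matrix: one row per word (values are made up for illustration).
emb <- rbind(
  who     = c(3, 4, 0),
  let     = c(0, 0, 2),
  gorilla = c(1, 0, 0)
)

# Hypothetical helper: average of the unit-length word vectors of a sentence.
sentence_vector_sketch <- function(mat, sentence) {
  # Split on whitespace and keep only tokens that have a row in `mat`.
  tokens <- strsplit(sentence, "[[:space:]]+")[[1]]
  tokens <- tokens[tokens %in% rownames(mat)]
  vecs <- mat[tokens, , drop = FALSE]
  # Scale every row to unit Euclidean length, then average the rows.
  norms <- sqrt(rowSums(vecs^2))
  colMeans(vecs / norms)
}

sv <- sentence_vector_sketch(emb, "who let the gorilla")
print(sv)
```

Normalising each word vector before averaging keeps words with large vector norms from dominating the result.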

## Nearest neighbor queries

```{r, eval=FALSE}
ft_nearest_neighbors(model, 'asparagus', k = 5L)
```

```
##   aspargus broccolini artichokes asparagus.  asparagas 
##  0.7316202  0.6995656  0.6930545  0.6915916  0.6911229
```
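
The scores above can be read as cosine similarities between the query word and its neighbours. The following base **R** sketch shows the idea on a toy matrix. The matrix `emb` and the helpers `cosine_to_rows()` and `nearest_neighbors_sketch()` are hypothetical names introduced for illustration and are not part of **fastTextR**.

```r
# Toy embedding matrix: one row per word (values are made up for illustration).
emb <- rbind(
  asparagus = c(0.9, 0.1, 0.0),
  broccoli  = c(0.8, 0.2, 0.1),
  yellow    = c(0.1, 0.9, 0.2),
  paris     = c(0.0, 0.2, 0.9)
)

# Cosine similarity between one query vector and every row of a matrix.
cosine_to_rows <- function(mat, query) {
  as.vector(mat %*% query) / (sqrt(rowSums(mat^2)) * sqrt(sum(query^2)))
}

# Hypothetical helper: the k rows most similar to `word`, excluding the word itself.
nearest_neighbors_sketch <- function(mat, word, k = 2L) {
  sims <- cosine_to_rows(mat, mat[word, ])
  names(sims) <- rownames(mat)
  sims <- sims[names(sims) != word]
  head(sort(sims, decreasing = TRUE), k)
}

nn <- nearest_neighbors_sketch(emb, "asparagus", k = 2L)
print(nn)
```

A brute-force scan like this is linear in the vocabulary size, which is fine for a toy matrix but slow for a vocabulary with millions of words.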

## Word analogies

```{r, eval=FALSE}
ft_analogies(model, c("berlin", "germany", "france"))
```

```
##        paris      france.      avignon  montpellier       paris. 
##    0.6831182    0.6408537    0.6288283    0.6138449    0.6059716 
##       rennes       london       Paris.       toulon montparnasse 
##    0.5884554    0.5832924    0.5743204    0.5727922    0.5715630
```
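
An analogy query of the form `c(A, B, C)` can be thought of as vector arithmetic: the query vector is `A - B + C`, and the result is the set of words closest to it. The base **R** sketch below illustrates this on toy vectors, assuming each word vector is scaled to unit length before it is combined. The matrix `emb` and the helpers `unit()` and `analogy_sketch()` are hypothetical and not part of **fastTextR**.

```r
# Scale a vector to unit Euclidean length.
unit <- function(v) v / sqrt(sum(v^2))

# Toy embedding matrix (values are made up for illustration).
emb <- rbind(
  berlin  = c(1, 0, 1),
  germany = c(1, 0, 0),
  paris   = c(0, 1, 1),
  france  = c(0, 1, 0),
  london  = c(0.5, 0.5, 0.2)
)

# Hypothetical helper: rank words by cosine similarity to A - B + C,
# leaving out the three query words themselves.
analogy_sketch <- function(mat, word_a, word_b, word_c, k = 1L) {
  query <- unit(mat[word_a, ]) - unit(mat[word_b, ]) + unit(mat[word_c, ])
  sims <- apply(mat, 1, function(row) sum(unit(row) * unit(query)))
  sims <- sims[!(names(sims) %in% c(word_a, word_b, word_c))]
  head(sort(sims, decreasing = TRUE), k)
}

res <- analogy_sketch(emb, "berlin", "germany", "france")
print(res)
```

Excluding the query words matters because they are often among the closest vectors to the combined query and would otherwise crowd out the intended answer.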


## References

[1] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, [*Enriching Word Vectors with Subword Information*](https://arxiv.org/abs/1607.04606)

```
@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}
```

[2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, [*Bag of Tricks for Efficient Text Classification*](https://arxiv.org/abs/1607.01759)

```
@article{joulin2016bag,
  title={Bag of Tricks for Efficient Text Classification},
  author={Joulin, Armand and Grave, Edouard and Bojanowski, Piotr and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.01759},
  year={2016}
}
```
