---
title: "Text Alignment"
author:
  - "Lincoln Mullen"
  - "Yaoxiang Li"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Text Alignment}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Local alignment is the process of taking two documents and finding the best subset of each document that aligns with one another. A commonly used local alignment algorithm for genetics is the [Smith-Waterman algorithm](https://en.wikipedia.org/wiki/Smith-Waterman_algorithm). This package offers a version of the Smith-Waterman algorithm intended to be used for natural language processing.

Consider these two documents. The first is part of Shakespeare's *Measure for Measure*. The second is a made-up piece of literary criticism quoting the play, but our imaginary literary critic has bungled the quotation. This is a common class of problems (not bungling literary critics but) documents which contain pieces, often heavily modified, from other documents.

```{r}
shakespeare <- paste(
  "Haste still pays haste, and leisure answers leisure;",
  "Like doth quit like, and MEASURE still FOR MEASURE.",
  "Then, Angelo, thy fault's thus manifested;",
  "Which, though thou wouldst deny, denies thee vantage.",
  "We do condemn thee to the very block",
  "Where Claudio stoop'd to death, and with like haste.",
  "Away with him!")
critic <- paste(
  "The play comes to its culmination where Duke Vincentio, quoting from",
  "the words of the Sermon on the Mount, says,",
  "'Haste still goes very quickly , and leisure answers leisure;",
  "Like doth cancel like, and measure still for measure.'",
  "These titular words sum up the meaning of the play.")
```

We can uses the local alignment function to extract the part of the text that was borrowed. Notice that the resulting object shows us the changes that have been made.

```{r}
library(textreuse)
align_local(shakespeare, critic)
```

See the documentation for the function to see how to tune the match: `?align_local`. This function works with character vectors or with documents of class `TextReuseTextDocument`.

