---
title: "An Introduction to CHNOSZ"
author: "Jeffrey M. Dick"
output:
  tufte::tufte_html:
    tufte_features: ["background"]
    toc: true
    toc_depth: 4
    mathjax: null
    margin_references: false
vignette: >
  %\VignetteIndexEntry{An Introduction to CHNOSZ}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: vig.bib
link-citations: yes
csl: elementa.csl
---

```{css, echo=FALSE}
html { 
  font-size: 14px;
}
/* zero margin around pre blocks (looks more like R console output) */
pre {
  margin: 0;
  padding: 0;
}
.highlight {
  background-color: #F27121cc;
  border-radius: 6px;
  padding: 0px 4px;
  box-shadow: 0 2px 4px rgba(0, 0, 0, 0.2);
  display: inline-block;
}
```

```{js, echo=FALSE}
function ToggleDiv(ID) {
  var D = document.getElementById("D-" + ID);
  var B = document.getElementById("B-" + ID);
  if (D.style.display === "none") {
    // open the div and change button text
    D.style.display = "block";
    B.innerText = "Hide code";
  } else {
    // close the div and change button text
    D.style.display = "none";
    B.innerText = "Show code";
  }
}
```

```{r HTML, include=FALSE}
# Some frequently used HTML expressions
# Use lowercase because some of these are used as variables in the examples
h2o <- "H<sub>2</sub>O"
h2 <- "H<sub>2</sub>"
o2 <- "O<sub>2</sub>"
co2 <- "CO<sub>2</sub>"
h2s <- "H<sub>2</sub>S"
Psat <- "<i>P</i><sub>sat</sub>"
zc <- "<i>Z</i><sub>C</sub>"
```

```{r setup, include=FALSE}
library(knitr)

options(width = 180)

# Function to colorize messages 20171031
# Adapted from https://gist.github.com/yihui/2629886#file-knitr-color-msg-rnw
color_block = function(color) {
  function(x, options) sprintf('<pre style="color:%s">%s</pre>', color, x)
}
# Set knitr hooks
knit_hooks$set(
  warning = color_block('magenta'),
  error = color_block('red'),
  message = color_block('blue'),
  # Use pngquant to optimize PNG images
  pngquant = hook_pngquant
)

# Define a custom hook to use smaller plot margins
knit_hooks$set(small_mar = function(before, options, envir) {
  if(before) {
    par(mar = c(5, 5, 1, 1))
  }
})

# Define variables used below
basedpi <- if(nzchar(Sys.getenv("CHNOSZ_BUILD_LARGE_VIGNETTES"))) 60 else 40
pngquant <- if(nzchar(Sys.getenv("CHNOSZ_BUILD_LARGE_VIGNETTES"))) "--speed=1 --quality=0-25" else "--speed=1 --quality=0-10"
# Avoid errors if pngquant isn't available (perhaps on R-Forge?)
if (!nzchar(Sys.which("pngquant"))) pngquant <- NULL 

# Set chunk options
opts_chunk$set(
  tidy = FALSE,
  # Figure options
  fig.margin = TRUE, fig.width = 6, fig.height = 4,
  out.width = "100%", dpi = basedpi,
  cache = TRUE, pngquant = pngquant,
  # From "Tufte Handout" example dated 2016-12-27
  # Invalidate cache when the tufte version changes
  cache.extra = packageVersion('tufte')
)
```

This vignette introduces CHNOSZ, an R package for thermodynamic calculations relevant to geochemistry and geochemical biology.
CHNOSZ provides functions and a thermodynamic database for calculating properties of reactions involving minerals,
aqueous species, and gases across a range of temperatures and pressures.

This vignette was compiled on `r Sys.Date()` with CHNOSZ `r packageDescription("CHNOSZ")$Version`.

## Getting Started

After installing CHNOSZ from CRAN, load the package:

```{r library_CHNOSZ}
library(CHNOSZ)
```

This makes the thermodynamic database and functions available in your R session. To restore default settings at any point, use `reset()`.

## Basic Functionality

### Organization of CHNOSZ

CHNOSZ is made up of interrelated functions and supporting data.
The major components of the package are shown in the flow diagram.
Ellipses represent data sources and rectangles represent functions.
Bold arrows and functions show the most common workflows (described in italic legends).
Dashed arrows represent internal flows of data.

<!-- https://stackoverflow.com/questions/14675913/how-to-change-image-size-markdown -->
![Flow diagram for CHNOSZ](CHNOSZ.png){ width=75% }

CHNOSZ offers several primary functions for thermodynamic analysis:

#### Functions without side effects (return values)

* `info()`: Search the thermodynamic database
* `subcrt()`: Calculate thermodynamic properties of species and reactions
* `affinity()`: Calculate affinities of formation reactions
* `equilibrate()`: Calculate equilibrium chemical activities
* `diagram()`: Plot the results

#### Functions with side effects (modify system state)

* `basis()`: Set basis species and their chemical activities
* `species()`: Set species of interest and their activities
* `reset()`: Reset database and system settings to defaults

### Querying the thermodynamic database

The `info()` function provides access to the OBIGT thermodynamic database.

```{r info_CH4}
# Get database index for aqueous methane
info("CH4")
```

Searching by formula returns the aqueous species if it is available.
Use a species name or add the state to get a particular physical state - `aq`, `cr`, `gas`, or `liq`:

```{r info_methane}
# Two ways to lookup methane gas
info("methane")
info("CH4", "gas")
```

Use `info()` recursively to return thermodynamic parameters:

```{r info_info_CH4}
# Get thermodynamic properties for aqueous methane
info(info("CH4"))
```

You can access fuzzy search functionality by using partial names:

```{r info_fuzzy}
# Search for ribose-related entries
info("ribose+")
```

### Calculating thermodynamic properties

The `subcrt()` function [loosely named after SUPCRT; @JOH92] calculates standard thermodynamic properties.

The default conditions are 0.01--350 °C along the `r Psat` curve (defined here as the greater of 1 bar or the vapor-liquid saturation pressure for `r h2o`):

```{r subcrt_CH4}
# Properties of aqueous methane at default T and P
subcrt("CH4")
```

You can customize the *T*-*P* grid by passing the appropriate arguments:

<p></p>
```{r subcrt_TP}
# Custom T, P grid for water in supercritical region
subcrt("H2O", T = c(400, 500), P = c(250, 300))
```

Unit settings for `subcrt()` are handled by `T.units()`, `P.units()`, and `E.units()`:

```{r E.units}
# Change energy units from Joules to calories
E.units("cal")
subcrt("CH4", T = 25)
reset()  # Restore defaults
```

### Working with reactions

Define reactions with species names or formulas, states (optional), and coefficients:

NOTE: Reaction coefficients are negative for reactants and positive for products.

```{r subcrt_CO2_reaction}
# CO2 dissolution reaction
subcrt(c("CO2", "CO2"), c("gas", "aq"), c(-1, 1), T = 25)
```

Reactions can be automatically balanced using basis species:

- Here, the defined basis species are used to balance a reaction with `subcrt()` for calculating standard thermodynamic properties.

```{r basis_subcrt}
basis(c("CO2", "H2O", "H+", "e-"))
subcrt(c("CH4", "acetate"), c("aq", "aq"), c(1, -1), T = 25)
```

- Here, the defined basis species are used to compose the formed species with `species()` for calculating affinities and making diagrams.

```{r basis_species}
basis(c("CO2", "H2O", "H+", "e-"))
species(c("CH4", "acetate"))
```

There are some keywords (e.g. `CHNOS+`, `CHNOSe` and `QEC`) for loading predefined sets of basis species.
See the help page of `basis()` for more information.

### Chemical affinity and stability diagrams

```{r setup2, include = FALSE, cache = FALSE}
# Change the default options for the following chunks to hide results and messages
opts_chunk$set(
  results="hide", message=FALSE
)
```

The `affinity()` function calculates chemical affinities over ranges of *T*, *P*, and activities:

```{r diagram, fig.cap = "Eh--pH (Pourbaix) diagram for S"}
# Setup the C-H-N-O-S basis system with electron
basis("CHNOSe")
# Define aqueous sulfur species
species(c("SO4-2", "HSO4-", "HS-", "H2S"))
# Calculate affinities on an Eh-pH grid
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
# Create an Eh-pH (Pourbaix) diagram
diagram(a, col = 4, col.names = 4, italic = TRUE)
# Add legend
TC <- convert(a$T, "C")
legend <- c(describe.property("T", TC), quote(italic(P) == "1 bar"))
legend("topright", legend = legend, bty = "n")
```

NOTE: `diagram()` automatically adds shading to regions of water instability with respect to `r o2` or `r h2`.

For more sophisticated diagrams involving speciation of basis species, use the `mosaic()` function:

```{r mosaic, fig.cap = "Mosaic diagram showing effect of aqueous S speciation on the relative stabilities of Cu minerals and aqueous species"}
# Create a mosaic diagram for Cu-S-Cl-H2O system
basis(c("Cu", "H2S", "Cl-", "H2O", "H+", "e-"))
basis("H2S", -6)
basis("Cl-", -1)
species(c("CuCl", "CuCl2-", "CuCl3-2", "CuCl+", "CuCl2", "CuCl3-", "CuCl4-2"))
species(c("chalcocite", "tenorite", "cuprite", "copper"), add = TRUE)
bases <- c("H2S", "HS-", "HSO4-", "SO4-2")
m <- mosaic(bases, pH = c(0, 12), Eh = c(-1, 1), T = 200)
a <- m$A.species
diagram(a, lwd = 2, bold = species()$state == "cr")
diagram(m$A.bases, add = TRUE, col = 4, col.names = 4, italic = TRUE)
water.lines(m$A.species, lty = 3, col = 8)
# Add legend
legend <- lTP(convert(a$T, "C"), a$P)
legend("topright", legend = legend, bty = "n")
```

Here we've added dotted lines to help visualize the water stability limits.

### Equilibrium calculations

Calculate equilibrium distributions of species:

```{r equilibrate, fig.cap="Carbonate speciation as a function of pH and temperature"}
# Carbonate speciation vs pH
basis(c("CO2", "H2O", "H+", "e-"))
species(c("CO2", "HCO3-", "CO3-2"))
# 25 degrees C
a <- affinity(pH = c(0, 14))
e <- equilibrate(a)
diagram(e, alpha = TRUE)
# 100 degrees C
a <- affinity(pH = c(4, 12), T = 100)
e <- equilibrate(a)
diagram(e, alpha = TRUE, add = TRUE, col = 2, names = NA)
# Add legend
legend <- as.expression(list(lT(25), lT(100)))
legend("left", legend = legend, lty = 1, col = c(1, 2))
```

Calculate solubility of minerals or gases:

```{r corundum_solubility, fig.cap="Solubility of corundum (green line) and equilibrium concentrations of aqueous species (black lines)"}
# Corundum solubility vs pH
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
iaq <- info(c("Al(OH)4-", "Al(OH)3", "Al(OH)2+", "AlOH+2", "Al+3"))
s <- solubility(iaq, pH = c(2, 10), in.terms.of = "Al+3")
diagram(s, col = "green4", lwd = 2, ylim = c(-10, -2))
diagram(s, type = "loga.equil", add = TRUE)
legend("topright", c("25 °C", "1 bar"), bty = "n")
```

### Activity coefficients

Incorporate non-ideal behavior using the extended Debye--Hückel equation by setting the ionic strength parameter `IS`:

```{r corundum_solubility_IS, fig.cap="Solubility of corundum dependent on ionic strength"}
# Corundum solubility again
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
iaq <- info(c("Al(OH)4-", "Al(OH)3", "Al(OH)2+", "AlOH+2", "Al+3"))
# Calculate with ionic strength of 0 and 1 molal
s0 <- solubility(iaq, pH = c(2, 10))
s1 <- solubility(iaq, pH = c(2, 10), IS = 1)
diagram(s1, col = 4, lwd = 3, ylim = c(-8, -2))
diagram(s0, col = "green4", lwd = 2, add = TRUE)
legend("topleft", legend = c(1, 0), lwd = c(3, 2), col = c(4, "green4"), title = "I (mol/kg)", bty = "n")
legend("topright", c("25 °C", "1 bar"), bty = "n")
```

Functions that have the `IS` parameter are `subcrt()`, `affinity()`, `mosaic()`, and `solubility()`.
When a value for `IS` is specified, inputs and outputs labeled as activities are actually interpreted or reported as molalities.

### References

The CHNOSZ package incorporates data and methods from various sources. Use `thermo.refs()` to view citation information for data sources:

```{r refs, eval = FALSE}
# Return data frame with references for one or more species
thermo.refs(info("CH4", c("aq", "gas")))
# View all references in a browser
thermo.refs()
```

For citing CHNOSZ itself, see "[How should CHNOSZ be cited?](FAQ.html#how-should-chnosz-be-cited)" in the FAQ.

## Interlude: Affinity, Formation Reactions, and Balancing

The idea for creating stability diagrams in CHNOSZ came from Prof. Harold Helgeson's geochemistry course at UC Berkeley.
There, the students constructed diagrams that were "balanced on" a metal.
For instance, in a reaction between two minerals balanced on aluminum, Al is only present in the minerals on either side of the reaction and not as an ion.

The line-based method, used for making diagrams by hand, looks at reactions between pairs of species (let's call them transformation reactions),
then draws a line between stability fields where the non-standard Gibbs energy of reaction is zero.
The grid-based method, used in CHNOSZ, looks at reactions to compose individual species from the basis species (let's call them formation reactions),
then selects the most stable species according to their affinity values.

Affinity is just the opposite of non-standard Gibbs energy of reaction.
"Standard Gibbs energy of reaction" and "Gibbs energy of reaction"&nbsp;– which are different concepts&nbsp;– have unfortunately similar names except for
an optional *overall* or *non-standard* in front of the latter [the word choice varies among authors, e.g. @AL19;@STK19].
"Non-standard Gibbs energy of reaction" doesn't lend itself to a short, unambiguous function name, which is why its opposite, *affinity*, is used in CHNOSZ.

In the line-based method, transformation reactions are said to be *balanced on* a metal.
The grid-based method implements this balancing constraint by dividing the affinities of formation reactions by the stoichiometric coefficients of a basis species.
CHNOSZ uses these normalized affinities for making relative stability diagrams, a technique referred to as the maximum affinity method [@Dic19].
The line- and grid-based methods both have the same limitation: every species considered in the relative stability calculation
must have non-zero stoichiometry of the metal the transformation reactions are balanced on (or equivalently of the *conserved* basis species that has that metal).

## Caution: Quasisolubility contours on predominance diagrams underestimate total solubility

A useful technique in geochemical modeling is to construct "composite diagrams" [@GC65],
where stability fields for minerals and predominance fields for aqueous species are both displayed on the same plot.
Because mineral activities are assumed to be unity while aqueous species activities can vary,
the choice of aqueous species activity defines a concentration-dependent boundary between mineral and aqueous stability fields on composite diagrams.
These solubility boundaries between minerals and aqueous species are referred to as either "equisolubility" [@Pou74] or "isosolubility" [@Hel64;@GC65] lines.

CHNOSZ provides two distinct approaches for calculating isosolubility lines:

**First approach (quasisolubility contours):** This method loads multiple aqueous species with identical activities,
then constructs the predominance diagram using the maximum affinity method.
However, isosolubility lines calculated this way represent the reaction between the most stable mineral and *a single aqueous species* at each point on the diagram.
These can be called "quasisolubility contours" because they provide only a partial view of mineral solubility.

**Second approach (total solubility):** This method sums the activities of all candidate aqueous species in equilibrium with the most stable mineral,
providing isosolubility lines that represent the contribution from *all relevant aqueous species*.

The first approach is implemented in CHNOSZ by setting the activities of formed aqueous species,
then creating a predominance diagram with the maximum affinity method using `diagram()`.
[See below](#set-activities-of-formed-species-to-define-a-quasisolubility-contour) for an example of this method.
In essence, quasisolubility contours on predominance diagrams are equivalent to previously described "solubility limits" on activity diagrams [@OB79].
The term "quasisolubility" emphasizes that these contours provide only first-order estimates of solubility, considering just one aqueous species at a time.

CHNOSZ offers a second approach that improves upon the maximum affinity method by accounting for contributions from multiple aqueous species to total solubility.
This approach is implemented in the `solubility()` function ([see below](#use-solubility-to-draw-total-solubility-contours)).

The following example demonstrates the difference between these two approaches:

```{r quasisolubility, echo=FALSE, fig.cap="The total solubility contour (green) at log *m*Fe = -6 lies inside the quasisolubility contour (boundary between tan and blue areas), showing that the latter underestimates solubility; the largest contributions to total solubility are from Fe^+2^ and FeCl^+^ at low pH and FeO~2~^-^ and HFeO~2~^-^ at high pH", fig.margin=FALSE, fig.width=12, fig.height=4.2, cache=TRUE}
par(mfrow = c(1, 2))

# System definition
metal <- "Fe"
# The concentration to be used for the quasisolubility contour
logm_metal <- -6
T <- 150
P <- "Psat"
Stot <- 1e-3
# Approximate chloride molality and ionic strength for mNaCl = 1.9
m_Clminus <- IS <- 1.5
# Plot variables
res <- 300
pH <- c(2, 12, res)
O2 <- c(-55, -40, res)

# Setup basis species
basis(c(metal, "H2S", "Cl-", "oxygen", "H2O", "H+"))
basis("H2S", log10(Stot))
basis("Cl-", log10(m_Clminus))
# Define basis species to swap through for mosaic calculation
bases <- c("H2S", "HS-", "HSO4-", "SO4-2")

# Get minerals and aqueous species
icr <- info(c("hematite", "magnetite", "pyrite", "pyrrhotite"))
iaq <- retrieve(metal, c("Cl", "S", "O", "H"), state = "aq")

## First plot: predominance diagram (quasisolubility) overlaid with total solubility contours
species(icr)
species(iaq, logm_metal, add = TRUE)
m <- mosaic(bases, pH = pH, O2 = O2, T = T, P = P, IS = IS)
diagram(m$A.species, bold = species()$state == "cr")

# Make total solubility contours
species(icr)
s <- solubility(iaq, bases = bases, pH = pH, O2 = O2, T = T, P = P, IS = IS, in.terms.of = metal)
levels <- seq(-9, -3, 3)
diagram(s, levels = levels, contour.method = "flattest", add = TRUE, col = "green4", lwd = 2)
# Add a legend
leg_list <- c(
  lTP(T, P),
  lNaCl(1.9),
  lS(Stot)
)
leg_expr <- lex(leg_list)
legend("bottomleft", legend = leg_expr, bg = "white")
title("Total solubility is higher than quasisolubility", font.main = 1)

## Second plot: activities of aqueous species in solubility calculation
basis("O2", -46)
s <- solubility(iaq, bases = bases, pH = pH, T = T, P = P, IS = IS, in.terms.of = metal)
diagram(s, col = "green4", lwd = 2, ylim = c(-10, -6))
# Locate the stability boundary between pyrite and magnetite along the the pH axis 
ipH <- which.min(abs(s$loga.equil[[3]] - s$loga.equil[[2]]))
# Show py-mag stability boundary
pH_py_mag <- s$vals$pH[ipH]
abline(v = pH_py_mag, col = 8)
text(pH_py_mag, -6, "pyrite", adj = c(1.2, 1.5), font = 2)
text(pH_py_mag, -6, "magnetite", adj = c(-0.1, 1.5), font = 2)

# Consider each mineral separately to get activities of aqueous speices
species("pyrite")
s_py <- solubility(iaq, bases = bases, pH = pH, T = T, P = P, IS = IS, in.terms.of = metal)
species("magnetite")
s_mag <- solubility(iaq, bases = bases, pH = pH, T = T, P = P, IS = IS, in.terms.of = metal)
# Merge activities for stable regions of each mineral
loga.equil_both <- lapply(1:length(s_py$loga.equil), function(ispecies) {
  c(s_py$loga.equil[[ispecies]][1:(ipH-1)], s_mag$loga.equil[[ispecies]][ipH:res])
})
# Put merged activities into new object for plotting
s_both <- s_py
s_both$loga.equil <- loga.equil_both
diagram(s_both, type = "loga.equil", add = TRUE)

leg_expr <- expr.species("O2", "gas", value = -46, log = TRUE)
legend("bottomleft", legend = leg_expr, bg = "white")
title("Species contributing to total solubility", font.main = 1)
```

<button id="B-quasisolubility" onclick="ToggleDiv('quasisolubility')">Show code</button>
<div id="D-quasisolubility" style="display: none;">
```{r quasisolubility, eval=FALSE, cache=FALSE}
```
</div>


This comparison illustrates how quasisolubility boundaries on classical predominance diagrams systematically underestimate mineral solubility.
The discrepancy between quasisolubility and total solubility is minimized when one aqueous species strongly predominates over all others.
However, in systems where multiple aqueous species contribute significantly to solubility, this mismatch becomes more pronounced.
For such systems, `solubility()` provides more accurate estimates of isosolubility contours.

It is important to note that neither approach satisfies overall mass balance constraints in complex multicomponent systems.
For applications requiring rigorous mass-balance solutions, more sophisticated techniques not currently implemented in CHNOSZ should be considered [@PEHB18; @DK21].

## Advanced Uses

Having seen basic examples of the main features of CHNOSZ, here are some ideas for building more complex calculations and diagrams.

### 1. Use helper functions to create formatted labels for diagrams

Labeling diagrams is an important but often difficult step for creating publication-ready figures.

CHNOSZ provides the `axis.label()` and `expr.species()` functions to create formatted axis labels and chemical formulas.
Let's revisit the `r co2` dissolution example seen earlier and add two other gases (carbon monoxide and methane).
This plot is similar to Figure 18 of @MSS13.

```{r dissolution_logK, fig.cap="Equilibrium constants of dissolution reactions", small_mar = TRUE}
T <- seq(0, 350, 10)
CO2_logK <- subcrt(c("CO2", "CO2"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
CO_logK <- subcrt(c("CO", "CO"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
CH4_logK <- subcrt(c("CH4", "CH4"), c("gas", "aq"), c(-1, 1), T = T)$out$logK
logK <- data.frame(T, CO2_logK, CO_logK, CH4_logK)
matplot(logK[, 1], logK[, -1], type = "l", col = 1, lty = 1,
        xlab = axis.label("T"), ylab = axis.label("logK"))
text(80, -1.7, expr.species("CO2"))
text(240, -2.37, expr.species("CO"))
text(300, -2.57, expr.species("CH4"))
# Add legend
legend <- describe.property("P", "Psat")
legend("topright", legend = legend, bty = "n")
```

CHNOSZ has some other helper functions for labeling diagrams:

- `describe.reaction()` to format chemical reactions from the output of `subcrt()`
- `describe.property()` to format property values as equations (e.g. "*T* = 25 °C")
- `describe.basis()` to format the activities of basis species
- `lT()`, `lP()`, `lTP()`, and related functions for more compact representations of conditions (e.g. "25 °C, 1 bar")

There is no single best approach to formatting text for legends, and sometimes it's easiest to use basic R functions:

- See the documentation for `plotmath()` for general information on formatting mathematical expressions in R.
- `bquote()` is useful for putting values of variables into expressions.
- Some of the examples in this vignette and in the demos use `as.expression()` on the combined output from other functions
  to produce complex legends for the `legend()` function.

### 2. Use <span style="color:green;font-family:monospace;">retrieve()</span> to search species by elements

Want to find all the Al complexes in the database?
List them by calling `retrieve()` with the main element optionally followed by the ligand elements and state.

```{r retrieve_Al_default, results = "show"}
# List aqueous Al species in the default database
iaq <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Print the first few rows and columns
info(iaq)[1:3, 1:5]
# Use the species index or name in subcrt()
subcrt(iaq[1:3], T = 25)
#subcrt(names(iaq)[1:3], T = 25)  # same as above
```

The result of `retrieve()` can also be used to add species to a diagram; see the example in #3 below.

```{r Al_diagram}
basis(c("Al+3", "H2O", "F-", "H+", "e-"))
species(iaq)
species(names(iaq))  # same as above
```

### 3. Load optional data with <span style="color:red;font-family:monospace;">add.OBIGT()</span>

Perhaps you'd like to use data from older databases that have been superseded by later updates.
The OBIGT vignette briefly summarizes the superseded data for [SUPCRT92](OBIGT.html#optional-SUPCRT92) and [SLOP98](OBIGT.html#optional-SLOP98) [@SLOP98].
Use `add.OBIGT()` to load these old data entries.

```{r add_OBIGT_SLOP98, results = "show"}
add.OBIGT("SLOP98")
iaq_all <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Use difference between sets to get the added species
info(setdiff(iaq_all, iaq))
```

NOTE: Anhydrous species are commonly used for the revised Helgeson-Kirkham-Flowers (HKF) model.
For example, @SSWS97 reported the properties of AlO~2~^&minus;^, which is available in the optional SLOP98 database.
This species is the anhydrous form of Al(OH)~4~^&minus;^, which is present in the default database (see output above)
using parameters from a more recent compilation for Al species [@TS01].
Because they are effectively the same species, only one form of this species is listed in the default database.
Unless you have a specific reason to compare them, redundant species should not be considered together in an equilibrium calculation.

### 4. Use <span style="color:red;font-family:monospace;">OBIGT()</span> and <span style="color:red;font-family:monospace;">reset()</span> to restore the default database and settings

`OBIGT()` restores the default database without affecting other settings.
`reset()` restores the default database and all other settings in CHNOSZ.
These functions are useful for both interactive use and scripts that compare different versions of data or plots for different systems or conditions.

Let's put items #1--4 together to remake the corundum solubility plot using only species available in SLOP98.
To do this, we use `add.OBIGT()` followed by `retrieve()` to gather the species indices for all Al species, then take only those species sourced from @SSWS97.

```{r corundum_solubility_2, fig.cap="Corundum solubility with species from SLOP98"}
# Add superseded species from SLOP98
add.OBIGT("SLOP98")
# List all aqueous Al species
iaq <- retrieve("Al", state = "aq")
# Keep only species from Shock et al. (1997)
iaq <- iaq[grepl("SSWS97", info(iaq)$ref1)]
# Plot corundum solubility vs pH
basis(c("Al+3", "H2O", "H+", "e-"))
species("corundum")
s <- solubility(iaq, pH = c(2, 10), in.terms.of = "Al+3")
## Alternatively, we could use the species names
#s <- solubility(names(iaq), pH = c(2, 10), in.terms.of = "Al+3")
diagram(s, col = "green4", lwd = 2, ylim = c(-10, -2))
diagram(s, type = "loga.equil", add = TRUE)
legend("topright", c("25 °C", "1 bar"), bty = "n")
# Reset the database for subsequent calculations
reset()
```

### 5. Use <span style="color:red;font-family:monospace;">basis()</span> species to define the compositional space

A common question is: what are the basis species used for?
The basis species define the compositional variables of a system.
The composition of any species that can possibly be formed in that system can be represented by a linear combination of the basis species.
The basis species may also be referred to as thermodynamic components, although a strict definition of the latter does not include charged species.

CHNOSZ requires that the number of basis species is equal to the number of different elements in the basis species (plus charge, if present).
If you were studying the relative stability of F- and OH-complexes with Al, you might be tempted to try this basis definition:

```{r basis_Al_F_OH_1, error=TRUE}
basis(c("Al+3", "F-", "OH-"))
```

According to the message, we don't have enough basis species for the number of elements.
Since the composition of hydroxide is water minus a proton (i.e., OH^&minus;^&nbsp;=&nbsp;`r h2o` &minus; H^+^), we could try this instead:

```{r basis_Al_F_OH_2, error=TRUE}
basis(c("Al+3", "F-", "H+", "H2O"))
```

That's still not enough species.
As is often the case, we need to include a basis species representing oxidation-reduction (redox) reactions,
even if there are no redox reactions between the formed species.
Here are two possible basis definitions that do not give an error.

```{r basis_Al_F_OH_3}
# Use "oxygen" to get oxygen gas (for log fO2 diagrams)
basis(c("Al+3", "F-", "H+", "H2O", "oxygen"))
# Use "e-" to get aqueous electron (for Eh diagrams)
basis(c("Al+3", "F-", "H+", "H2O", "e-"))
```

### 6. Set activities of formed <span style="color:red;font-family:monospace;">species()</span> to define a quasisolubility contour

In order to make a diagram with stability fields for different species, CHNOSZ needs to know about the activities of all the species in the reaction.
The activities of the basis species start with constant values as shown in the output above (`logact` column).
Selected basis species can be assigned to plot axes (with a range of values) in `affinity()`.

NOTE: `logact` is the logarithm of *activity* for aqueous species, solids, and liquids, or logarithm of *fugacity* for gases.
All logarithms in CHNOSZ are common logarithms (R function: `log10()`) unless indicated otherwise.

How about the formed species in the system - that is, the species whose stability fields we want to visualize?
We both list the species and set their activities using `species()`.
The function defaults to activities of 10^-3^ (`logact` of -3) for aqueous species and unit activity (`logact` = 0) for minerals, gases, and liquids.
Let's change this to activities of 10^-6^ for the formed species.

```{r species_logact}
basis(c("Al+3", "F-", "H+", "H2O", "e-"))
iaq <- retrieve("Al", ligands = c("F", "H", "O"), state = "aq")
# Check that the data are from the same source
stopifnot(all(info(iaq)$ref1 == "TS01"))
species(iaq, -6)
```

NOTE: The value of `logact` passed to `species()` defines a quasisolubility contour,
which is less than the total solubility to the extent that more than one aqueous species is abundant
([see above](#caution-quasisolubility-contours-on-predominance-diagrams-underestimate-total-solubility)).

### 7. When to use <span style="font-family:monospace;">add = TRUE</span>

There are two places where you might see `add = TRUE`.
First, in `species()` to add species not already in the list.
Without `add = TRUE`, any existing species are discarded.
Second, in `diagram()` to add to an existing plot.

Let's put items #5--7 together to make a Pourbaix (Eh--pH) diagram for Al with two quasisolubility contours.

```{r Pourbaix_Mn, fig.cap = "Pourbaix diagram for Mn with two quasisolubility contours"}
basis(c("Mn+2", "H+", "H2O", "e-"))
icr <- retrieve("Mn", ligands = c("H", "O"), state = "cr")
iaq <- retrieve("Mn", ligands = c("H", "O"), state = "aq")
# First layer, logact(aq) = -3
species(icr)
species(iaq, add = TRUE)
a <- affinity(pH = c(4, 14), Eh = c(-1, 1), T = 100)
# Use names = NA to avoid plotting labels twice
diagram(a, lty = 2, names = NA)
# Second layer, logact(aq) = -4
species(icr)
species(iaq, -4, add = TRUE)
a <- affinity(pH = c(4, 14), Eh = c(-1, 1), T = 100)
d <- diagram(a, bold = species()$state == "cr", add = TRUE)
# Add water stability limits
water.lines(d, lty = 3, col = 8)
# Add legends
legend("topright", legend = c(lT(100), lP("Psat")), bty = "n")
title = as.expression(quote(log~italic(a)*"Mn(aq)"))
legend("bottomleft", legend = c(-3, -4), lty = c(2, 1), title = title, bty = "n")
```

The shaded areas in the diagram represent water instability regions and are automatically added by `diagram()`.
We use `water.lines()` here to plot the water stability limits with dotted lines.

### 8. Set grid resolution and constant *T*, *P*, or ionic strength in <span style="color:green;font-family:monospace;">affinity()</span>

After defining the basis species and formed species (and their constant activities), you have some choices about what variables to put on the plot,
the grid resolution, and values for a few other variables.
`affinity()` accepts one or more named arguments that specify ranges of variables as `c(min, max)` using the default grid resolution of 256,
or ranges and a custom grid resolution as `c(min, max, res)`.
The number of such arguments is the dimensionality of the final plot.
The grid resolution (`res`) defaults to 256 and can be different for each variable.
The names of the variables can be the formulas of any of the basis species, or `T`, `P`, or `IS` for temperature, pressure, or ionic strength.
These last three default to 25 °C, `r Psat` (the greater of 1 bar or the vapor-liquid saturation pressure for `r h2o`), and 0 mol/kg.

I often start with a low grid resolution to quickly iterate a calculation, then switch to a higher resolution when I'm satisfied with the result.

### 9. Use <span style="color:green;font-family:monospace;">NaCl()</span> to estimate ionic strength from NaCl concentration

Sodium chloride (NaCl) solutions are commonly used reference points for geochemical models.
The `NaCl()` function provides a quick-and-dirty way to estimate ionic strength and activity of chloride (Cl^&minus;^) for a given total amount of NaCl added to 1 kg of `r h2o`.
These values can then be used in setting up a calculation that involves these variables.

This function does not use either the `basis()` or `species()` definitions.
The following example runs a calculation for 0.8 mol/kg NaCl and given *T*, *P*, and pH.
See `demo('sum_S')` for a more fully worked-out example that uses this code [based on a diagram in @SW02].

```{r NaCl, results = "show"}
T <- 300
P <- 1000
pH <- 5
m_NaCl = 0.8
NaCl <- NaCl(m_NaCl = m_NaCl, T = T, P = P, pH = pH)
print(paste("mol NaCl added to 1 kg H2O:", m_NaCl))
print(paste("Ionic strength (mol/kg):", NaCl$IS))
print(paste("Chloride concentration (mol/kg):", NaCl$m_Clminus))
```

### 10. Use <span style="color:green;font-family:monospace;">solubility()</span> to draw total solubility contours

It's possible to to make a series of quasisolubility contours by setting activities of aqueous species (see the [example for Mn above](#when-to-use-add-true)).
A more refined solution that accounts for multiple aqueous species is to use `solubility()` to visualize total solubility contours.
The basic idea is to first load the mineral(s) containing a single metal as the formed `species()`.
Then, list the aqueous species with that metal as the first argument to `solubility()`.
The remaining arguments to the function define the plot variables, just as in `affinity()` and `mosaic()`.

Let's put items #8--10 together to make a set of diagrams for a single metal.
The example here uses Fe; try changing it to Cu, Zn, Pb, Au, or something else!

CHNOSZ generates warning messages about being above the Cp limits for various iron oxyhydroxides.
For the present calculation, the warnings are probably harmless because the predicted set of stable minerals
(pyrite, pyrrhotite, magnetite, and hematite) is consistent with published diagrams.

NOTE: If you see warning messages like this, it's a good idea not to ignore them,
but to consider whether you might be pushing the extrapolations of the Cp equation too far.

```{r solubility, echo=FALSE, fig.cap="Stability diagram for minerals; predominance diagram for aqueous species; composite diagram with quasisolubility contour; diagram with total solubility contours in units of log *m*", fig.margin=FALSE, fig.fullwidth=TRUE, fig.width=16, fig.height=3, cache=TRUE, dpi=1.2*basedpi}
par(mfrow = c(1, 4))

# System definition
metal <- "Fe"
# The concentration to be used for the quasisolubility contour
logm_metal <- -6
T <- 150
P <- "Psat"
Stot <- 1e-3
wt_percent_NaCl <- 10
# Plot variables
res <- 300
pH <- c(0, 14, res)
O2 <- c(-55, -35, res)

# Work out NaCl molality from weight percent
# Convert to permil to get parts out of 1000 g (1 kg) of solution
wt_permil_NaCl <- wt_percent_NaCl * 10
# Divide by molecular weight to get moles of NaCl in 1000 g of solution
moles_NaCl <- wt_permil_NaCl / mass("NaCl")
# Subtract from 1000 g to get mass of H2O
grams_H2O <- 1000 - wt_permil_NaCl
# This gives the moles of NaCl added to 1 kg of H2O
m_NaCl <- 1000 * moles_NaCl / grams_H2O
# Now calculate ionic strength and molality of Cl-
NaCl_res <- NaCl(m_NaCl, T = T, P = P)
IS = NaCl_res$IS
m_Clminus = NaCl_res$m_Clminus

# Setup basis species
basis(c(metal, "H2S", "Cl-", "oxygen", "H2O", "H+"))
basis("H2S", log10(Stot))
basis("Cl-", log10(m_Clminus))
# Define basis species to swap through for mosaic calculation
bases <- c("H2S", "HS-", "HSO4-", "SO4-2")

# Get minerals and aqueous species
icr <- retrieve(metal, c("Cl", "S", "O", "H"), state = "cr")
iaq <- retrieve(metal, c("Cl", "S", "O", "H"), state = "aq")

# Make diagram for minerals
species(icr)
mcr <- mosaic(bases, pH = pH, O2 = O2, T = T, P = P, IS = IS)
diagram(mcr$A.species, bold = TRUE, fill = "#FAEBD788")
diagram(mcr$A.bases, lty = 2, col = 4, col.names = 4, italic = TRUE, add = TRUE)
title(paste(metal, "minerals"), font.main = 1)
# Add a legend
leg_list <- c(
  lTP(T, P),
  lNaCl(m_NaCl),
  lS(Stot)
)
leg_expr <- lex(leg_list)
legend("topright", legend = leg_expr, bty = "n")

# Make diagram for aqueous species
species(iaq)
maq <- mosaic(bases, pH = pH, O2 = O2, T = T, P = P, IS = IS)
diagram(maq$A.species, fill = "#F0F8FF88")
title(paste(metal, "aqueous species"), font.main = 1)

# Make diagram for minerals and aqueous species
species(icr)
species(iaq, logm_metal, add = TRUE)
m <- mosaic(bases, pH = pH, O2 = O2, T = T, P = P, IS = IS)
diagram(m$A.species, bold = species()$state == "cr")
diagram(m$A.bases, lty = 2, col = 4, col.names = 4, italic = TRUE, add = TRUE)
main = bquote("Quasisolubility contour for" ~ log ~ italic(m)*.(metal)*"(aq)" == .(logm_metal))
title(main, font.main = 1)

# Make total solubility contours
species(icr)
s <- solubility(iaq, bases = bases, pH = pH, O2 = O2, T = T, P = P, IS = IS, in.terms.of = metal)
levels <- seq(-12, 9, 3)
diagram(s, levels = levels, contour.method = "flattest", col = "green4")
title("Total solubility contours", font.main = 1)
```

<button id="B-solubility" onclick="ToggleDiv('solubility')">Show code</button>
<div id="D-solubility" style="display: none;">
```{r solubility, eval=FALSE, cache=FALSE}
```
</div>

NOTE: As [explained above](#caution-quasisolubility-contours-on-predominance-diagrams-underestimate-total-solubility),
total solubility is higher than quasisolubility.

There are further examples of Eh--pH composite diagrams (denoting quasisolubility) overlaid with total solubility contours in `demo("Pourbaix")`.

### 11. Use <span style="color:green;font-family:monospace;">convert()</span> for common unit conversions

The `convert()` function offers several unit conversions.
It implements reciprocal conversion between pairs of units, so only the destination unit needs to be specified.
Some simple uses are to convert temperature, pressure, or energy values:

```{r convert}
# Convert 100 degrees C to value in Kelvin
convert(100, "K")
# Convert 273.15 K to value in degrees C
convert(273.15, "C")
# Convert 1 bar to value in MPa
convert(1, "MPa")
# Convert 100 MPa to value in bar
convert(100, "bar")
# Convert 1 cal/mol to value in J/mol
convert(1, "J")
# Convert 10 J/mol to value in cal/mol
convert(10, "cal")
```

Another use of `convert()` is to convert the output of `solubility()` from log activity to ppm, ppb, log&nbsp;ppm, or log&nbsp;ppb.
The following code continues from the example above:

```{r convert_ppm, fig.cap = "Solubility in units of ppm"}
sppm <- convert(s, "ppm")
levels <- c(1e-6, 1e-3, 1e0, 1e3)
diagram(sppm, levels = levels, col = "green4")
```

### 12. Use the transect mode of <span style="color:green;font-family:monospace;">affinity()</span> for synchronized variables

Specify 4 or more values for one or more variables (each variable should have the same number of values, or be set to a constant)
to activate the *transect* mode of `affinity()`.
The *transect* mode allows defining an arbitrary path in multidimensional space.

Here's a simple example:

```{r transect_setup}
basis("CHNOSe")
species(c("NO3-", "NO2-", "NH4+", "NH3"))
a <- affinity(pH = c(6, 8, 6, 8), Eh = c(0.5, 0.5, -0.5, -0.5))
```
```{r transect_results, results="show", cache=FALSE}
# Print pH and Eh values used for calculation
a$vals
# Print affinity values calculated for each species
a$values
```

NOTE: `affinity()` returns dimensionless values of affinity (i.e., *A*/2.303*RT*).

Below we'll see how to convert these to energetic units.

### 13. Choose non-default balancing constraints in <span style="color:green;font-family:monospace;">diagram()</span>

`diagram()` looks for the first basis species that has non-zero coefficients in each of the formed species.
This is called the conserved basis species in the documentation.
The affinity values are then divided by the coefficients of the conserved basis species to give normalized affinities.
This is how "balancing on a metal" is implemented.

- The conserved basis species requires an activity to be assign to calculate affinities.
  For the purposes of stability or predominance diagrams, this is a placeholder value, since the conserved basis species has the same stoichiometry
  in each normalized formation reaction.
- However, choosing a different conserved basis species can greatly affect the geometry of diagrams, owing to a different normalization vector.
- Also, choosing a balancing constraint is important for comparing affinity (or its opposite, non-standard Gibbs energy of reaction) of different reactions.

Let's put items #11--13 together to calculate affinities of organic synthesis reactions in mixed seawater and hydrothermal fluid from the Rainbow vent field
using speciation results from @SC10:

```{r rainbow, echo=FALSE, fig.cap="Affinities of organic synthesis reactions per mole of C, H<sub>2</sub>, or formed species", fig.margin=FALSE, fig.fullwidth=TRUE, fig.width=12, fig.height=3, cache=TRUE, dpi=1.2*basedpi}
basis(c("CO2", "H2", "NH4+", "H2O", "H2S", "H+"))
# Constant activity of CH4 is a simplification of the model
species("CH4", -3)
# Add other organic species with lower activity
species(c("propanoic acid", "adenine", "leucine", "deoxyribose"), -6, add = TRUE)
# Read file with variable conditions
file <- system.file("extdata/misc/SC10_Rainbow.csv", package = "CHNOSZ")
# Use `check.names = FALSE` to preserve the `NH4+` column name
df <- read.csv(file, check.names = FALSE)
# Reverse the rows to get increasing T
df <- df[rev(rownames(df)), ]
# Define six variables on a transect
a <- affinity(T = df$T, CO2 = df$CO2, H2 = df$H2,
              `NH4+` = df$`NH4+`, H2S = df$H2S, pH = df$pH)
# Get T in Kelvin to make unit conversions
T <- convert(a$vals[[1]], "K")
# Convert dimensionless affinity to Gibbs energy in units of J/mol
# nb. this is the same as converting logK to ΔG of reaction
a$values <- lapply(a$values, convert, "G", T)
# Convert J/mol to cal/mol
a$values <- lapply(a$values, convert, "cal")
# Convert cal/mol to kcal/mol
a$values <- lapply(a$values, `*`, -0.001)
# Setup figure
par(mfrow = c(1, 3))
par(cex = 0.9)
# First plot: balance on C
ylab = quote(italic(A)~"(kcal/mol) per C")
diagram(a, ylim = c(-20, 40), ylab  = ylab, col = 1:5)
abline(h = 0, lty = 2, lwd = 2)
# Second plot: balance on H2
ylab = quote(italic(A)~"(kcal/mol) per H2")
diagram(a, balance = "H2", ylim = c(-15, 10), ylab = ylab, col = 1:5)
abline(h = 0, lty = 2, lwd = 2)
# Third plot: no balancing
ylab <- quote(italic(A)~"(kcal/mol) per species")
diagram(a, balance = 1, ylim = c(-100, 100), ylab = ylab, col = 1:5)
abline(h = 0, lty = 2, lwd = 2)
```

<button id="B-rainbow" onclick="ToggleDiv('rainbow')">Show code</button>
<div id="D-rainbow" style="display: none;">
```{r rainbow, eval=FALSE, cache=FALSE}
```
</div>

Although `affinity()` uses all of the variables in the transect, `diagram()` labels the x-axis with only the first variable (temperature).
We obtain three plots:

  1. The reactions are balanced on the first basis species with non-zero coefficients, `r co2`.
     This is the same as normalizing on C, because no other basis species has C.
  2. Balance on a different species, `r h2`.
  3. No balancing constraint (`balance = 1`).
     This just shows the affinity of each reaction as given (that is, per mole of formed species),
     which is how the results were presented by @SC10.

### 14. Calculate adjusted and non-standard Gibbs energy with <span style="color:green;font-family:monospace;">subcrt()</span>

In normal use, `subcrt()` returns *standard* molal properties (G, H, S, Cp, and V) for species in their standard states, defined as unit activity or fugacity.
Two deviations from this default setting are possible: *non-standard* properties for specified activity, and *adjusted* properties for activity coefficients.

First let's look at how *adjusted* properties depend on activity coefficients.
This example uses a particular nonideality model based on @Alb03:

```{r subcrt_nonideal, results = "show"}
# Set the nonideality model
old_ni <- nonideal("Alberty")
# Calculate standard and adjusted Gibbs energy at 25 °C
species <- c("MgH2ATP", "MgHATP-", "MgATP-2")
subcrt(species, T = 25, IS = c(0, 0.25), property = "G")$out
# Restore the original nonideality model
nonideal(old_ni)
```

Notice how logarithms of the activity coefficients (loggam) are more negative for the higher-charged species.
The activity coefficients have a stabilizing effect: *adjusted* Gibbs energies (at *I* > 0) are less than the *standard* Gibbs energies (at *I* = 0).

Now let's change the activities to get *non-standard* properties.

```{r subcrt_affinity, results = "show"}
species <- c("Mg+2", "ATP-4", "MgATP-2")
coeffs <- c(-1, -1, 1)
T <- c(25, 50, 100)
# Drop some columns for more compact output
idrop <- c(2:4, 6:9)
# Unit activity: affinity is opposite of standard Gibbs energy
subcrt(species, coeffs, T = T, logact = c(0, 0, 0))$out[, -idrop]
# Non-unit activity: affinity is opposite of non-standard Gibbs energy
subcrt(species, coeffs, T = T, logact = c(-5, -4, -3))$out[, -idrop]
```

NOTE:

- The `logK`, `G`, `H`, `S`, `V`, and `Cp` columns in the output of `subcrt()` are always *standard* or *adjusted* thermodynamic properties, not non-standard ones.
- Columns for `logQ` and `A` are added if the `logact` argument is provided.
- The `logact` argument specifies the activities of species in the same order as the first argument.
- `A` in the output of `subcrt()` has the same units as `G` (J/mol by default); this differs from `affinity()`, which outputs dimensionless values (*A*/2.303*RT*).

The first call above specifies unit activities of all the species in the reaction.

- **Only** if all species have *unit activities*, then the affinity values (`A` in the output) are the opposite of the *standard* Gibbs energy (`G` in the output).

The second call specifies non-unit activities of the species.

- The result shows that affinity values **in general** are *not the opposite* of standard Gibbs energy.
- Instead, they are the opposite of *non-standard* (or overall) Gibbs energy.

### 15. Calculate non-standard Gibbs energy with <span style="color:green;font-family:monospace;">affinity()</span>

#### And swap basis species, remove formed species, and label reactions

Above we used `subcrt()` to calculate non-standard Gibbs energy, but doing it with `affinity()` can be more convenient for making diagrams.
This example plots non-standard Gibbs energies for hydrogenotrophic methanogenesis, acetate oxidation, and acetate oxidation, and is based on Fig. 4b of @MDS_13.
We combine some of the features described above with new ones for swapping basis species and removing formed species:

- `swap.basis()`: Changes one species for another in an existing basis definition
- `species()` with `delete = TRUE`: Removes one or more species from the set of formed species
- `describe.reaction()`: Formats reactions for plots using the output of `subcrt()`
- `names` and `srt` arguments of `diagram()`: Use supplied reaction labels with rotation

```{r organic, echo=FALSE, fig.cap="Non-standard Gibbs energies of organic reactions as a function of CO<sub>2</sub> fugacity", cache=TRUE}
# Format reactions with describe.reaction()
basis(c("CO2", "H2", "H2O", "H+"))
reactions <- c(
  # hydrogenotrophic methanogenesis
  describe.reaction(subcrt("CH4", 1)$reaction),
  # acetate oxidation
  describe.reaction(subcrt("acetate", -1)$reaction),
  # acetoclastic methanogenesis
  describe.reaction(subcrt(c("acetate", "CH4"), c(-1, 1))$reaction)
)
# Add space before "CO2" to avoid getting cut off to "O2" by the plot limits
reactions[[1]][2][[1]][[2]][[2]][[2]][[3]] <- "       CO"

# System 1: one C species in basis; two C species in formation reactions
basis(c("CO2", "H2", "H2O", "H+"))
# Change CO2 and H2 to gas
basis(c("CO2", "H2"), "gas")
# Set H2 (log fugacity) and pH
basis(c("H2", "pH"), c(-3.92, 7.3))
# Write reactions to form methane (gas) and acetate (aq) at given activity and fugacity
species(c("methane", "acetate"), c(-0.18, -3.4))
# Calculate affinities of formation reactions with varying log fugacity of CO2
a1 <- affinity(CO2 = c(-3, 3), T = 55, P = 50)

# System 2: two C species in basis; three C species in formation reactions
# Swap H2 for CH4 and set fugacity
swap.basis("H2", "CH4")
basis("CH4", "gas")
basis("CH4", -0.18)
# Remove methane as formed species - we only want acetate now
species(1, delete = TRUE)
a2 <- affinity(CO2 = c(-3, 3), T = 55, P = 50)

# Convert from dimensionless affinity to Gibbs energy in kJ/mol
TK <- convert(55, "K")
a1$values[[1]] <- convert(a1$values[[1]], "G", T = TK) / 1000
# Negate to reverse reaction (acetate as a reactant)
a1$values[[2]] <- -convert(a1$values[[2]], "G", T = TK) / 1000
a2$values[[1]] <- -convert(a2$values[[1]], "G", T = TK) / 1000

# Make diagram without balancing constraint (no normalization by C)
diagram(a1, balance = 1, lty = 1, lwd = 2, col = c(4, 2), names = reactions[1:2], ylab = axis.label("DG", prefix = "k"), srt = c(-13.5, 26))
diagram(a2, balance = 1, lwd = 2, col = 3, names = reactions[3], srt = 14, add = TRUE)
abline(h = 0, lty = 2, col = 8)
```

<button id="B-organic" onclick="ToggleDiv('organic')">Show code</button>
<div id="D-organic" style="display: none;">
```{r organic, eval=FALSE, cache=FALSE}
```
</div>

After running the code to make the diagram, we can print the formation reactions for each of the species.
This shows how the two acetate reactions (acetate oxidation and acetoclastic methanogenesis) are implemented by swapping `r h2` and CH~4~ in the basis species.
```{r organic_reactions, results="show"}
a1$species
a2$species
```

NOTE: This example uses `balance = 1` in the call to `diagram()` to prevent normalizating the reactions by a balancing constraint
(i.e., normalization by number of C) in order to reproduce the calculations of @MDS_13.
In most other cases (especially for making relative stability diagrams), this argument should not be used.

### 16. Extract results from the output of <span style="color:green;font-family:monospace;">diagram()</span>

Sometimes it's useful to make further computations on the results of a `diagram()` call.
For example, a system might dominated by a few stable species, but you'd rather visualize the relative stabilities of less stable (i.e., metastable) species.
Here we do this for all the aqueous S species in the OBIGT database, accessed using `retrieve()`.
We use `plot.it = FALSE` to suppress the first plot (which would look like the [Eh--pH diagram above for S](#chemical-affinity-and-stability-diagrams)),
but save the output with `d <- diagram()` to access the identified stable species in `d$predominant`.
After removing these stable species from the system, we recalculate affinities for the remaining metastable species and make a diagram for them.

```{r metastable_sulfur, fig.cap = "Eh--pH diagram for metastable S species"}
basis("CHNOSe")
iaq <- retrieve("S", c("H", "O"), state = "aq")
species(iaq)
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
# Save results but don't plot the first diagram
d <- diagram(a, plot.it = FALSE)
# Remove the stable species
istable <- unique(as.numeric(d$predominant))
species(istable, delete = TRUE)
# Make a diagram for the metastable species
a <- affinity(pH = c(0, 14), Eh = c(-0.8, 1.2))
d <- diagram(a, col = 4, col.names = 4, italic = TRUE)
```

Other possibly useful parts of the `diagram()` output are:

- `plotvals`: Affinity for each species, after normalizing by the balancing constraint
- `predominant.values`: Normalized affinity for the predominant species at each point on the diagram
- `names`: Names used for labeling lines or fields (includes formatting for chemical formulas)
- `namesx`, `namesy`: Locations for labels
- `linesout`: x, y coordinates of boundary lines between stability fields

NOTE: If `diagram()` was passed the output of `equilibrate()` or `solubility()`,
then its output contains logarithms of activities instead of dimensionless affinities.

### 17. Writing chemical formulas and counting and summing elements with <span style="color:green;font-family:monospace;">makeup()</span>

`makeup()` is used internally by some functions in CHNOSZ but is also available for the user.
It counts the elements (and charge, if present) in a chemical formula(s) formatted as a character object.
If supplied with a species index in the OBIGT database, it uses their formulas.
Setting `sum = TRUE` in the function call instructs `makeup()` to sum the elemental compositions.

The data frame returned by `makeup()` can be used in `as.chemical.formula()` to generate the character string for a formula.

```{r makeup, results="show"}
# Element count of a species in the database
iCli <- info("carrollite")
makeup(iCli)
# Sum the elements of formulas supplied as character strings
summed_elements <- makeup(c("CO2", "CH4"), sum = TRUE)
# Use the result to write a new chemical formula
as.chemical.formula(summed_elements)
```

### 18: Accessing and changing settings with <span style="color:red;font-family:monospace;">thermo()</span>

All the data used by CHNOSZ - from the thermodynamic data in OBIGT to the basis species defined by the user -
are stored in an object named `thermo` in the package environment.
Sometimes it's useful to peek inside CHNOSZ's memory banks, or more rarely to directly modify them.
The `thermo()` function returns the current value of this object and can also update it.
Here we display the first level structure of `thermo`, then show the structure of the database (`thermo()$OBIGT`) in more detail.

```{r thermo, results="show"}
str(thermo(), max.level = 1)
str(thermo()$OBIGT)
```

Call `thermo()` with a named argument to assign a value.
In this case we change the temperature units for `subcrt()`:

```{r thermo_T.units, results="show"}
# This has the same effect as T.units("K")
thermo("opt$T.units" = "K")
# Return to default
thermo("opt$T.units" = "C")
```

See the help page for the `Berman()` function for a practical example of adding thermodynamic data with the @Ber88 model,
which are stored outside of the OBIGT database.

## Interlude: From Activity to Molality

The default model for activity coefficients uses the extended Debye--Hückel equation with parameters for NaCl-dominated solutions from @HKF81.
The species-species parameters are charge and (for the default `Bdot` model) ion-size parameters used in the HCh program [@SB99].
By contrast, the `bgamma` model uses an extended term parameter that is derived from data of @Hel69, @HKF81, and high-P extrapolations of @MSS13.
The `Alberty` model uses parameters listed in Chapter 3 of @Alb03, which are applicable to relatively low temperatures.
Choose from these models with `nonideal()`.

NOTE: By default, H^+^ is assumed to have unit activity coefficient for any ionic strength.
Enable calculations of activity coefficients for H^+^ by running `thermo("opt$ideal.H" = FALSE)`.

Invoke calculations of activity coefficients by setting the `IS` argument in `subcrt()`, `affinity()`, `mosaic()`, or `solubility()`.
This has the effect of transforming activity to molality in the CHNOSZ workflow.
A set of calculations demonstrating this tranformation is in `test-logmolality.R` in the package test directory.
Key variables affected by this transformation are listed here:

* In `subcrt()`, the `logact` argument stands for log molality of aqueous species and
  calculated values of `G` are the adjusted Gibbs energy at specified ionic strength
  [this is written as &Delta;*G*°(*I*) by @Alb96].

* In `affinity()`, the following stand for log molality of aqueous species:
    + `logact` set by `basis()`
    + `logact` set by `species()`

* In `solubility()` and `equilibrate()`, the following stand for log molality of aqueous species:
    + `loga.balance` (logarithm of total molality of the conserved basis species)
    + `loga.equil` (logarithm of molality of each species).

Because function arguments have static names, we're stuck with `logact` even when it means log molality.
However, `diagram()` automatically changes labels from "log&nbsp;*a*" to "log&nbsp;*m*" when run on the output of `affinity()` with a non-NULL value for `IS`.

## Buffers

Buffers are assemblages of one or more species whose presence constrains the chemical activities (or fugacities) of basis species in a thermodynamic system.
Buffers play a critical role in geochemical modeling by providing realistic constraints on system variables like pH and oxidation state.
This section explores the implementation and application of buffers in CHNOSZ.

### 1. Defining buffers with <span style="color:red;font-family:monospace;">mod.buffer()</span>

The `mod.buffer()` function defines or modifies buffers by specifying the species that constitute the buffer and their activities:

```{r buffer_1}
# View available buffers
thermo()$buffer
# Define a new buffer or modify an existing one
mod.buffer("PPM", c("pyrite", "pyrrhotite", "magnetite"), "cr", 0)
# Buffer made of one species (acetic acid with log activity = -10)
mod.buffer("AC", "acetic acid", "aq", -10)
```

### 2. Retrieving buffered activities with <span style="color:green;font-family:monospace;">affinity(return.buffer = TRUE)</span>

Use `basis()` to associate one or more basis species with a buffer (all the elements in the buffer must be present in the basis species).
Then use `affinity(return.buffer = TRUE)` to calculate and retrieve the buffered activities or fugacities of basis species:

```{r buffer_2}
# Specify basis species
basis(c("FeCHNOS"))
# Associate O2 with the PPM buffer
basis("O2", "PPM")
# Calculate and retrieve the buffered fugacity of O2
a <- affinity(T = 200, P = 2000, return.buffer = TRUE)
# Access the buffered O2 fugacity (log fO2)
log_fO2 <- a$O2  # -44.28
# Calculate buffered fugacities across temperature range
a <- affinity(T = c(200, 400, 11), P = 2000, return.buffer = TRUE)
```

### 3. Working with multiple buffered species (e.g., `r h2s` and `r o2` in PPM)

Some buffers constrain multiple basis species simultaneously:

```{r buffer_3}
# Setup basis species
basis(c("FeS2", "H2S", "O2", "H2O"), c(0, -10, -50, 0))
# Associate both H2S and O2 with the PPM buffer
basis(c("H2S", "O2"), c("PPM", "PPM"))
# Retrieve values for both buffered species
buffer_values <- affinity(T = 300, P = 2000, return.buffer = TRUE)
log_aH2S <- buffer_values$H2S  #  -2.57
log_fO2 <- buffer_values$O2    # -37.16
```

### 4. Using <span style="color:green;font-family:monospace;">diagram(type = )</span> to display buffered activities

The `diagram()` function with the `type` argument can solve for and display activities of basis species.
This example reproduces part of Fig. 6 in @SS95:

```{r buffer_4, fig.cap = "Equilibrium log&nbsp;H<sub>2</sub> fugacity for 10^-6^ activity of HCN or formaldehyde with water, 1 bar of N<sub>2</sub> and 10 bar of CO<sub>2</sub>"}
# Setup a system in terms of gases and liquid water
basis(c("hydrogen", "carbon dioxide", "nitrogen", "water"))
# Use 10 bar of CO2 and 1 bar of other gases (default)
basis("CO2", 1)
# Load aqueous species with given log activity
species(c("HCN", "formaldehyde"), c(-6, -6))
# Calculate affinities to form aqueous species from basis species
a <- affinity(T = c(0, 350), P = 300)
# Create diagram showing H2 activity where affinity = 0
d <- diagram(a, type = "H2", lty = c(2, 3))
legend("bottomright", c("HCN", "formaldehyde"), lty = c(2, 3))
```

See `demo("buffer")` for a fully worked-out example based on the figure in @SS95.

NOTE: This feature works independently from buffers defined in `thermo()$buffer`, but produces equivalent results for certain systems;
see `test-diagram.R` in the package test directory.

### 5. Using *f*`r o2` Buffers in downstream calculations

Redox buffers like QFM, HM, and PPM can be used as inputs for subsequent calculations:

```{r buffer_5, fig.cap = "Gold solubility at 300 °C with PPM buffer for *f*O<sub>2</sub> and *a*H<sub>2</sub>S"}
# Setup system for gold solubility calculation
basis(c("Au", "Fe", "H2S", "H2O", "oxygen", "H+"))
# Apply PPM buffer for fO2 and H2S
basis("O2", "PPM")
basis("H2S", "PPM")
# Calculate gold solubility using the buffered values
species("Au")
iaq <- info(c("Au(HS)2-", "AuHS", "AuOH"))
s <- solubility(iaq, pH = c(2, 8), T = 300, P = 1000)
# Create solubility diagram
diagram(s, ylim = c(-10, -5), col = "green4", lwd = 2)
col <- c("#ED4037", "#F58645", "#0F9DE2")  # Au(HS)2-, AuHS, AuOH
diagram(s, type = "loga.equil", add = TRUE, col = col)
```

### 6. Using buffer calculations in transects with <span style="color:green;font-family:monospace;">affinity()</span>

Buffers can be used in transect calculations to model changes across gradients.
An interesting application is to add a delta to values obtained with `return.buffer = TRUE`:

```{r buffer_6}
# Set basis species
basis(c("Fe", "SiO2", "CO3-2", "H2O", "oxygen", "H+"))
# Calculate log fO2 in QFM buffer across temperature range
basis("O2", "QFM")
T <- seq(600, 1000, 100)
buf <- affinity(T = T, P = 5000, return.buffer = TRUE)
# Use buffered fO2 values in downstream calculations
species(c("CH4", "CO2", "HCO3-", "CO3-2"))
# Set values of pH for transect
pH <- seq(3.8, 4.3, length.out = length(T))
# Adding 2 log units below QFM buffer
a <- affinity(T = T, O2 = buf$O2 - 2, pH = pH, P = 5000)
```

### 7. Calculating the neutral pH of water

Neutral pH at various temperatures and pressures can be determined from the dissociation constant of water:

```{r buffer_7}
# Calculate neutral pH at 300°C and 1000 bar
T <- 300
P <- 1000
neutral_pH <- -subcrt(c("H2O", "H+", "OH-"), c(-1, 1, 1), T = T, P = P)$out$logK/2
```

### 8. Working with mineral pH buffers

Mineral assemblages like K-feldspar--muscovite--quartz (KMQ) can buffer pH:

```{r buffer_8}
# Define the KMQ buffer
mod.buffer("KMQ", c("quartz", "muscovite", "K-feldspar"), "cr", 0)
# Setup the system
basis(c("Al2O3", "quartz", "K+", "H2O", "oxygen", "H+"))
# Set K+ molality for KCl solution
basis("K+", log10(0.5))
# Associate H+ with the KMQ buffer
basis("H+", "KMQ")
# Calculate buffered pH
pH_KMQ <- -affinity(T = 300, P = 1000, return.buffer = TRUE)$`H+`
```

### Comprehensive example: Ore formation environments

Mineral buffers constrain both pH and redox state in geological systems, which in turn control metal solubility and transport in ore-forming fluids.
This example illustrates how different buffers affect gold solubility in hydrothermal systems.

```{r buffer_gold, echo=FALSE, fig.cap="Effects of different buffers on gold solubility", fig.margin=FALSE, fig.fullwidth=TRUE, fig.width=12, fig.height=3, cache=TRUE, dpi=1.2*basedpi}
# Setup the system
basis(c("Au", "Al2O3", "quartz", "Fe", "K+", "Cl-", "H2S", "H2O", "oxygen", "H+"))
# Set log activities for 0.5 m NaCl + 0.1 m KCl solution (assuming ideality for now)
basis("Cl-", log10(0.6))
basis("K+", log10(0.1))
# Set 0.01 m H2S
basis("H2S", -2)
# Setup the KMQ pH buffer
mod.buffer("KMQ", c("quartz", "muscovite", "K-feldspar"), "cr", 0)
basis("H+", "KMQ")
# Define colors for gold species
col <- c("#ED4037", "#F58645", "#0F9DE2", "#22CC88")  # Au(HS)2-, AuHS, AuOH, AuCl2-
# Create temperature transect with HM and PPM redox buffers
temps <- seq(200, 500, 10)
P <- 1000
# Initialize plot layout
par(mfrow = c(1, 4), mar = c(4, 4, 3, 1), font.main = 1, cex.main = 1.1)

# 1. Compare gold species distribution under HM buffer
basis("O2", "HM")
species("Au")
iaq <- info(c("Au(HS)2-", "AuHS", "AuOH", "AuCl2-"))
s_HM <- solubility(iaq, T = temps, P = P)
diagram(s_HM, xlab = "Temperature (°C)", ylab = "log activity", ylim = c(-10, -4),
        col = "green4", lwd = 2, main = "Solubility vs T (KMQ: pH, HM: fO2)")
diagram(s_HM, type = "loga.equil", col = col, add = TRUE)

# 2. Compare gold species distribution under PPM buffer
basis("O2", "PPM")
basis("H2S", "PPM")
s_PPM <- solubility(iaq, T = temps, P = P)
diagram(s_PPM, xlab = "Temperature (°C)", ylab = "log activity", ylim = c(-10, -4),
        col = "green4", lwd = 2, main = "Solubility vs T (KMQ: pH, PPM: fO2 and aH2S)")
diagram(s_PPM, type = "loga.equil", col = col, add = TRUE)

# 3. Gold solubility as function of pH at 350°C under PPM buffer
# Remove KMQ buffer
basis("H+", 0)
T_fixed <- 350
# Calculate solubility across pH range
s_pH <- solubility(iaq, pH = c(2, 8), T = T_fixed, P = P)
diagram(s_pH, xlab = "pH", ylab = "log activity", ylim = c(-10, -4),
        col = "green4", lwd = 2, main = "Solubility vs pH (350 °C, PPM: fO2 and aH2S)")
diagram(s_pH, type = "loga.equil", col = col, add = TRUE)
# Get neutral pH at this temperature
neutral_pH <- -subcrt(c("H2O", "H+", "OH-"), c(-1, 1, 1), T = T_fixed, P = P)$out$logK/2
# Add neutral pH line
abline(v = neutral_pH, lty = 2)
text(neutral_pH + 0.2, -5.5, "neutral pH", srt = 90, cex = 0.8)
# Add KMQ pH buffer line
basis("H+", "KMQ")
pH_KMQ <- -affinity(T = T_fixed, P = P, return.buffer = TRUE)$`H+`
abline(v = pH_KMQ, lty = 3, col = 4)
text(pH_KMQ - 0.2, -5.5, "KMQ", srt = 90, col = 4, cex = 0.8)

# 4. log fO2-pH diagram with dominant gold species at 350°C
# Remove KMQ pH buffer
basis("H+", 0)
# Remove PPM fO2 buffer
basis("O2", 0)
# Use constant aH2S from PPM buffer
a <- affinity(pH = c(2, 8), T = T, P = P, return.buffer = TRUE)
log_aH2S <- unique(a$H2S)
basis("H2S", log_aH2S)
# Load aqueous species for relative stability calculation
species(iaq)
# Define basis species to swap through for mosaic calculation
bases <- c("H2S", "HS-", "SO4-2", "HSO4-")
m <- mosaic(bases = bases, pH = c(2, 8), O2 = c(-32, -24), T = T_fixed, P = P)
a <- m$A.species
fill <- adjustcolor(col, alpha.f = 0.5)
d <- diagram(a, fill = fill, main = "Predominance fO2 vs pH (350°C, PPM: aH2S)")
water.lines(d, lty = 3)
# Add mineral buffer lines
# PPM buffer line
basis("O2", "PPM")
O2_PPM <- affinity(T = T_fixed, P = P, return.buffer = TRUE)$O2
abline(h = O2_PPM, lty = 2, col = "darkred")
text(2.5, O2_PPM, "PPM", adj = c(0, -0.5), col = "darkred", cex = 0.8)
# HM buffer line
basis("O2", "HM")
O2_HM <- affinity(T = T_fixed, P = P, return.buffer = TRUE)$O2
abline(h = O2_HM, lty = 2, col = "darkred")
text(2.5, O2_HM, "HM", adj = c(0, -0.5), col = "darkred", cex = 0.8)
# Add KMQ pH buffer line and neutral pH
abline(v = neutral_pH, lty = 2)
text(neutral_pH + 0.2, -28, "neutral pH", srt = 90, cex = 0.8)
abline(v = pH_KMQ, lty = 2, col = 4)
text(pH_KMQ - 0.2, -28, "KMQ", srt = 90, col = 4, cex = 0.8)
```

<button id="B-buffer_gold" onclick="ToggleDiv('buffer_gold')">Show code</button>
<div id="D-buffer_gold" style="display: none;">
```{r buffer_gold, eval=FALSE, cache=FALSE}
```
</div>

The diagrams show:

1. Temperature dependence of gold species distribution under the hematite-magnetite (HM) buffer [cf. Fig. 2A in @WBM09]
2. Temperature dependence under the pyrite-pyrrhotite-magnetite (PPM) buffer [cf. Fig. 2B in @WBM09]
3. pH dependence under PPM at fixed temperature, with neutral pH and KMQ buffer lines [cf. Fig. 7 in @AZ01]
4. Species predominance in log&nbsp;*f*`r o2`--pH space with redox and pH buffer lines

Note the following limitation:

- `mosaic()` calculations currently aren't supported for basis species that are associated with a buffer.
- Because of this, for the last diagram we precomputed the value of log&nbsp;*a*`r h2s` from the PPM buffer then assigned that value in `basis()`
  to use in the mosaic calculation.

See also:

- `demo("gold")` for chloride and ionic strength effects (plots show molality instead of activity)
- The FAQ for handling the K/Cl activity ratio for the KMQ buffer

## Proteins

Proteins in CHNOSZ are treated differently from other chemical species.
Instead of direct thermodynamic data, CHNOSZ uses amino acid group additivity to calculate the thermodynamic properties of proteins.
This approach requires knowledge of the amino acid composition of each protein.

### 1. Identifying proteins

In CHNOSZ, protein identifiers have a specific format that combines the protein name and organism with an underscore separator,
modeled after UniProt names (e.g., `LYSC_CHICK` for chicken lysozyme C).
This naming convention uniquely identifies each protein in the database.

```{r protein_1}
# Search by name in thermo()$protein
ip1 <- pinfo("LYSC_CHICK")       # Using protein_organism format
ip2 <- pinfo("LYSC", "CHICK")    # Using separate arguments
# Search for the same protein in different organisms
ip3 <- pinfo("MYG", c("HORSE", "PHYCA"))
```

### 2. Adding proteins from FASTA or CSV files

CHNOSZ has a small built-in database of amino acid compositions for selected proteins, but you can expand this by adding proteins from FASTA or CSV files.

```{r protein_2}
# Reading amino acid compositions from a CSV file
file <- system.file("extdata/protein/POLG.csv", package = "CHNOSZ")
aa <- read.csv(file)
# Add the proteins to CHNOSZ and get their indices
iprotein <- add.protein(aa)

# For FASTA files, use the canprot package
fasta_file <- system.file("extdata/protein/rubisco.fasta", package = "CHNOSZ")
aa <- canprot::read_fasta(fasta_file)
iprotein <- add.protein(aa)
```

The `add.protein()` function adds the amino acid compositions to the database and returns the row indices of the added proteins.

### 3. Calculating protein properties

Once proteins are added to the database, you can calculate various properties such as length, formula, and thermodynamic properties.

#### Protein length and formula

```{r protein_3, results = "show"}
# Get protein length (number of amino acids)
pl <- protein.length("LYSC_CHICK")
# Get chemical formula
pf <- protein.formula("LYSC_CHICK")
# Display results
list(length = pl, formula = pf)
```

#### Carbon oxidation state

The average oxidation state of carbon, calculated with `ZC()`, provides insight into the redox state of protein sequences:

```{r protein_4}
# Calculate ZC for a protein
ZC_value <- ZC(protein.formula("LYSC_CHICK"))
# For multiple proteins
proteins <- c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN")
ZC_values <- ZC(protein.formula(proteins))
```

### 4. Thermodynamic calculations for proteins

CHNOSZ provides several functions for calculating thermodynamic properties of proteins.

#### Standard thermodynamic properties

Calculate standard thermodynamic properties of non-ionized proteins using `subcrt()`:

```{r protein_5, results = "show"}
# Properties of non-ionized protein
subcrt("LYSC_CHICK")$out[[1]][1:6, ]
```

#### Ionization effects

For more accurate calculations, especially in biological systems, protein ionization must be considered [@DLH06].
CHNOSZ handles this through the `ionize.aa()` function, which allows specifying the temperature, pressure and pH conditions:

```{r protein_6}
# Calculate ionization properties
aa <- pinfo(pinfo("LYSC_CHICK"))
charge <- ionize.aa(aa, pH = c(4, 7, 9))
# Calculate heat capacity of ionization
Cp_ionization <- ionize.aa(aa, property = "Cp", pH = 7, T = c(25, 100))
```

### 5. Setting up a chemical system with proteins

To include proteins in a chemical system, first define the basis species, then add proteins to the system:

```{r protein_7}
# Define the basis species with H+
basis("CHNOS+")
# Add proteins to the system
species(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
```

### 6. Calculating affinities and equilibrium distributions

#### Affinities of formation reactions

The `affinity()` function accounts for ionization effects when calculating affinities of formation reactions:

```{r protein_8}
# Calculate affinity as a function of pH
basis("CHNOS+")
species(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
a1 <- affinity(pH = c(0, 14))
```

For performance optimization, use protein indices directly with the `iprotein` argument to `affinity()`.
This doesn't require proteins to be added with `species()`:

```{r protein_9}
species(delete = TRUE)
ip <- pinfo(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
# Set logarithm of activity with loga.protein
a2 <- affinity(pH = c(0, 14), iprotein = ip, loga.protein = -3)

# Check that both methods produce equivalent results
for(i in 1:3) stopifnot(all.equal(a1$values[[i]], a2$values[[i]]))
```

#### Equilibrium distributions

Calculate the relative abundance of proteins in metastable equilibrium using `equilibrate()`.
This example uses averaged amino acid compositions of protein sequences in metagenomes from a temperature and chemical gradient in a hot spring [@DS11]:

```{r protein_10, fig.cap = "Metastable equilibrium of proteins from hot-spring metagenomes"}
# Calculate equilibrium distribution as a function of Eh
basis("CHNOSe")
proteins <- paste("overall", paste0("bison", c("N", "S", "R", "Q", "P")), sep = "_")
ip <- pinfo(proteins)
a <- affinity(Eh = c(-0.35, -0.15), iprotein = ip)
# Normalize by protein length to get residue-equivalent distribution
# Set loga.balance to distribute proteins with total activity of residues equal to 1
e <- equilibrate(a, normalize = TRUE, loga.balance = 0)
col <- c("darkred", "red", "darkgray", "blue", "darkblue")
diagram(e, ylim = c(-4, -2.5), col = col, lwd = 2)
legend("bottomleft", c("High-T reducing", NA, NA, NA, "Low-T oxidizing"),
  lty = 1:5, col = col, title = "Environmental Conditions", inset = c(0.1, 0))
```

The `normalize = TRUE` option is important for proteins because their large size leads to extreme separation of activities in metastable equilibrium.
Normalizing by protein length (calculating per-residue equivalents) compresses the range of relative abundances to be more experimentally realistic.

#### Scaling protein abundances

Use `unitize()` to scale abundances of proteins so that number of residues sums to unity:

```{r protein_11}
# Sample protein data from YeastGFP study
protein <- c("YDL195W", "YHR098C", "YLR208W", "YNL049C", "YPL085W")
abundance <- c(1840, 12200, 21400, 1720, 358)
# Find protein indices in CHNOSZ database
ip <- match(protein, thermo()$protein$protein)
# Get protein lengths
pl <- protein.length(ip)
# Scale protein abundance so total abundance of residues is unity
scaled_abundance <- 10^unitize(log10(abundance), pl)
# Check that sum for residues is unity
stopifnot(all.equal(sum(scaled_abundance * pl), 1))
```

Unit total activity of residues is set by `equilibrate(loga.balance = 0)`, allowing comparison between experimental and predicted abundances:

```{r protein_12, fig.cap = "Unoptimized predicted abundances of proteins compared to experimental abundances, both scaled to unit total activity of residues; dashed line is 1:1 line", small_mar = TRUE}
basis("CHNOSe")
# Make a guess for Eh
basis("Eh", -0.5)
a <- affinity(iprotein = ip)
e <- equilibrate(a, normalize = TRUE, loga.balance = 0)
# Check for unit total abundance of residues
predicted_abundance <- 10^unlist(e$loga.equil)
stopifnot(all.equal(sum(predicted_abundance * pl), 1))
plot(log10(scaled_abundance), log10(predicted_abundance), pch = 19)
# Show 1:1 line
lims <- range(par("usr"))
lines(c(lims[1], lims[2]), c(lims[1], lims[2]), lty = 2)
MAE <- mean(abs(log10(scaled_abundance) - log10(predicted_abundance)))
legend("topleft", paste("MAE =", round(MAE, 2)), bty = "n")
```

### 7. Additional protein analysis

The **canprot** package provides a different interface for calculating `r zc` and other chemical analyses of proteins from their amino acid composition:

```{r protein_13}
# Load canprot package
library(canprot)
# Get amino acid compositions
ip <- pinfo(c("LYSC_CHICK", "MYG_HORSE", "RNAS1_BOVIN"))
aa <- pinfo(ip)
# canprot has Zc(); CHNOSZ has ZC()
Zc(aa)
# Stoichiometric oxygen and water content
nO2(aa)
nH2O(aa)
# Isoelectric point and GRAVY
pI(aa)
GRAVY(aa)
```

### Comprehensive example: Parameter optimization to fit experimental protein abundances

Let's analyze the relative abundances of proteins from the ER-to-Golgi location in *S. cerevisiae* (yeast)
and compare theoretical predictions with experimental measurements from the YeastGFP study [@GHB_03]:

```{r protein_optimization, echo=FALSE, fig.cap = "Optimizing redox potential to fit experimental protein abundances", fig.margin=FALSE, fig.fullwidth=TRUE, fig.width=12, fig.height=3, cache=TRUE, dpi=1.2*basedpi}
# Protein data from YeastGFP study
protein <- c("YDL195W", "YHR098C", "YLR208W", "YNL049C", "YPL085W")
abundance <- c(1840, 12200, 21400, 1720, 358)
# Find protein indices in CHNOSZ database
ip <- match(protein, thermo()$protein$protein)
# Calculate protein lengths
pl <- protein.length(ip)
# Scale experimental abundances to unit total abundance of residues
scaled_logabundance <- unitize(log10(abundance), pl)

# Assign colors from low to high oxidation state (red to blue)
col_in <- c("darkred", "red", "darkgray", "blue", "darkblue")
aa <- pinfo(ip)
Zc_values <- canprot::Zc(aa)
col <- col_in[rank(Zc_values)]

# Setup chemical system with H+ and e-
basis("CHNOSe")
# Calculate affinities as a function of redox potential
a <- affinity(Eh = c(-0.6, 0), iprotein = ip)

# Calculate equilibrium distributions without normalization
e <- equilibrate(a, loga.balance = 0)
# Plot results
par(mfrow = c(1, 4))
diagram(e, ylim = c(-5, -2), col = col, lwd = 2, main = "Without normalization")

# With normalization
e_norm <- equilibrate(a, normalize = TRUE, loga.balance = 0)
diagram(e_norm, ylim = c(-5, -2.5), col = col, lwd = 2, main = "With normalization")

# Calculate and plot mean absolute error as a function of Eh
predicted_logabundances <- sapply(e_norm$loga.equil, c)
MAE <- apply(abs(t(predicted_logabundances) - scaled_logabundance), 2, mean)
par(xaxs = "r", yaxs = "r")
plot(a$vals$Eh, MAE)
imin <- which.min(MAE)
abline(v = a$vals$Eh[imin], lty = 2)
title("Optimized model Eh")

# Show calculated and experimental abundances
optimized_logabundance <- predicted_logabundances[imin, ]
plot(scaled_logabundance, optimized_logabundance, pch = 19, col = col)
# Show 1:1 line
lims <- range(par("usr"))
lines(c(lims[1], lims[2]), c(lims[1], lims[2]), lty = 2)
title("Predicted vs experimental abundance")
# Add legends
legend("topleft", paste("MAE =", round(MAE[imin], 2)), bty = "n")
legend("bottomright", c("High Zc", NA, NA, NA, "Low Zc"), pch = 19, col = rev(col_in))
```

<button id="B-protein_optimization" onclick="ToggleDiv('protein_optimization')">Show code</button>
<div id="D-protein_optimization" style="display: none;">
```{r protein_optimization, eval=FALSE, cache=FALSE}
```
</div>

The diagrams show:

1. Theoretical abundances of proteins calculated with `normalize = FALSE` are highly divergent.
2. The normalization step compresses the range of abundances, making the comparison with experimental data more meaningful.
3. Mean absolute error between logarithms of experimental and predicted abundances, both scaled to unit total abundance of residues.
   The MAE minimizes at the Eh indicated by the vertical line.
4. Comparison of experimental and optimized theoretical relative abundances of proteins (with normalization).

The correspondence between theoretical predictions and experimental measurements depends on
normalization of protein formulas and optimizing physicochemical parameters.
The metastable equilibrium model provides a theoretical framework for predicting how chemical conditions influence relative protein abundances.

### Environmental controls on protein evolution: A case study with CRISPR-Cas systems

Evolution doesn't happen in a vacuum&nbsp;– organisms and their molecular machinery must cope with changing environmental conditions over geological time.
Just as geochemists use relative stability diagrams to predict which minerals are stable under different physicochemical conditions,
we can apply similar thermodynamic principles to understand protein evolution.
The central hypothesis is that environmental variables, such as pH and redox conditions (Eh),
have shaped the amino acid compositions of proteins throughout Earth's history.
This approach becomes especially powerful when comparing not individual proteins, but entire evolutionary lineages.

CRISPR-Cas systems are molecular scissors that bacteria and archaea use as immune systems against viruses,
and which biotechnologists have adapted for precise gene editing.
These systems evolved into six major types (I--VI), each representing an evolutionary branch with multiple representatives across different genomes [@MWI_20].

Rather than comparing individual proteins, we can ask:
which types of CRISPR-Cas systems would be most thermodynamically stable under different environmental conditions?
The following diagram introduces the players by showing the carbon oxidation state (`r zc`) and size of the effector modules;
the effector module combines with CRISPR RNA (crRNA) to form the effector complex that does the actual cutting work on a target DNA sequence.
Class 1 systems tend to have larger effector modules made up of multiple proteins,
which were combined with the `sum_aa()` function from **canprot** before calculating `r zc`.

```{r Cas_Zc, echo=FALSE, fig.cap = "Carbon oxidation state and size of CRISPR-Cas effector modules", cache=TRUE, small_mar = TRUE}
# Read data table
file <- system.file("extdata/protein/Cas/Cas_uniprot.csv", package = "CHNOSZ")
dat <- read.csv(file)
# Use UniProt ID as the file name
ID <- dat$UniProt
# In case UniProt ID is missing, use alternate ID
ID[ID == ""] <- dat$Protein[ID == ""]
# Store ID in data frame
dat$ID <- ID
# Remove missing IDs
dat <- subset(dat, ID != "")
# Keep proteins in effector complexes - the functional units that cut DNA
dat <- subset(dat, Effector)

# Read amino acid compositions
aafile <- system.file("extdata/protein/Cas/Cas_aa.csv", package = "CHNOSZ")
aa <- read.csv(aafile)

# Loop over evolutionary subtypes (I-A, I-B etc.)
subtypes <- unique(dat$Subtype)
effector_aa_list <- lapply(subtypes, function(subtype) {
  # Get the IDs for this evolutionary subtype
  idat <- dat$Subtype == subtype
  ID <- dat$ID[idat]
  # Combine amino acid compositions of all effector proteins in this subtype
  iaa <- aa$ref %in% ID
  all_aa <- aa[iaa, ]
  summed_aa <- canprot::sum_aa(all_aa)
  # Store gene names and IDs for reference
  summed_aa$protein <- paste(all_aa$protein, collapse = ",")
  summed_aa$ref <- paste(all_aa$ref, collapse = ",")
  # Label with the evolutionary subtype
  summed_aa$abbrv <- subtype
  # Return the combined amino acid composition for this evolutionary branch
  summed_aa
})
# Create data frame of amino acid compositions for each evolutionary branch
effector_aa <- do.call(rbind, effector_aa_list)

# Identify the six major evolutionary types (I to VI)
type_names <- c("I", "II", "III", "IV", "V", "VI")
# Map subtypes to their parent types
subtype_type <- sapply(strsplit(effector_aa$abbrv, "-"), "[", 1)
# Assign evolutionary classes (1 or 2) based on system architecture
class <- ifelse(subtype_type %in% c("I", "III", "IV"), 1, 2)

# Color scheme to distinguish evolutionary classes
col1 <- hcl.colors(5, "Peach")[1:3]
col2 <- hcl.colors(5, "Purp")[1:3]
type_col <- c(
  # Class 1 - multi-protein complexes
  I = col1[1], III = col1[2], IV = col1[3],
  # Class 2 - single-protein effectors
  II = col2[1], V = col2[2], VI = col2[3]
)
# Point sizes to show types in each class
type_cex <- c(
  I = 1, III = 1.5, IV = 2,
  II = 1, V = 1.5, VI = 2
)
# Apply colors and sizes to subtypes
subtype_col <- type_col[subtype_type]
subtype_cex <- type_cex[subtype_type]

# Calculate chemical features of effector complexes
Zc <- canprot::Zc(effector_aa)
naa <- protein.length(effector_aa)
# Plot chemical features
plot(naa, Zc, pch = class + 20, cex = subtype_cex, col = subtype_col, bg = subtype_col, xlab = "Number of amino acids", ylab = quote(italic(Z)[C]))
legend("topright", c("I               ", "III", "IV"), pch = 21, pt.cex = type_cex[1:3], col = col1, pt.bg = col1, title = "Class 1               ", cex = 0.95)
legend("topright", c("II", "V", "VI"), pch = 22, pt.cex = type_cex[4:6], col = col2, pt.bg = col2, title = "Class 2", bty = "n", cex = 0.95)
```

<button id="B-Cas_Zc" onclick="ToggleDiv('Cas_Zc')">Show code</button>
<div id="D-Cas_Zc" style="display: none;">
```{r Cas_Zc, eval=FALSE, cache=FALSE}
```
</div>

Each type (I--VI) represents an evolutionary branch containing multiple genome representatives&nbsp;– some branches have just a few members, others have more.
This creates a methodological challenge: how do we fairly assess the relative stability of groups with unequal membership?

The solution lies in CHNOSZ's `rank.affinity()` function, which calculates formation energies for all individual proteins,
ranks them, then finds the average rank for each evolutionary group.
A rescaling step ensures that groups with different numbers of members can be compared fairly.
To see why rescaling is necessary, consider that the average rank of a group with one member is bounded by 1 and `r length(Zc)`
(the total number of genomes in this calculation),
but the average rank of a group with three members is bounded by 2 (the average of 1, 2, and 3) and 41 (the average of 40, 41, and 42).

Rather than representing maximum affinity as in previous diagrams, the resulting stability fields represent maximum average rank of formation affinity after rescaling.
This provides a thermodynamic framework for predicting which CRISPR-Cas types would predominate under different environmental conditions.

NOTE: This code normalizes proteins to single residue equivalents *before* calling `affinity()` by using the `as.residue = TRUE` argument in `add.protein()`.
If we didn't do that, then larger effector complexes would have higher affinity rankings just because of their size.

```{r Cas_stability, echo=FALSE, fig.cap = "Groupwise relative stabilities of effector modules in different types of CRISPR-Cas systems as a function of Eh and pH at 25 °C; dashed lines are water stability limits", cache=TRUE}
# Setup figure
par(mfrow = c(1, 2))

# Setup basis species to represent chemical variables
basis("QEC+")
swap.basis("O2", "e-")

# Load amino acid compositions normalized to one residue equivalent
iprotein <- add.protein(effector_aa, as.residue = TRUE)

# Calculate formation affinities for Class 1 effectors across environmental gradients
aout <- affinity(pH = c(0, 14), Eh = c(-0.8, 0.8), iprotein = iprotein[class == 1])
# Group proteins by evolutionary type for comparative analysis
class_1_groups <- sapply(c("I", "III", "IV"), `==`, subtype_type[class == 1], simplify = FALSE)
# Calculate average rank of formation affinity for each evolutionary group
arank <- rank.affinity(aout, groups = class_1_groups)
# Create stability diagram showing thermodynamically favored types
d <- diagram(arank, fill = col1)
water.lines(d, lty = 2)
title("Class 1", font.main = 1)

# Generate second diagram for Class 2
aout <- affinity(pH = c(0, 14), Eh = c(-0.8, 0.8), iprotein = iprotein[class == 2])
class_2_groups <- sapply(c("II", "V", "VI"), `==`, subtype_type[class == 2], simplify = FALSE)
arank <- rank.affinity(aout, groups = class_2_groups)
d <- diagram(arank, fill = col2)
water.lines(d, lty = 2)
title("Class 2", font.main = 1)
```

<button id="B-Cas_stability" onclick="ToggleDiv('Cas_stability')">Show code</button>
<div id="D-Cas_stability" style="display: none;">
```{r Cas_stability, eval=FALSE, cache=FALSE}
```
</div>

The stability diagrams reveal a compelling pattern: Type III systems dominate at reducing conditions (low Eh).
This finding gains evolutionary significance when we consider that Type III was likely the first CRISPR-Cas system to evolve in Class 1 [@MWK22].
The thermodynamic preference for reducing conditions aligns with the hypothesis that these ancient immune systems arose
when Earth's atmosphere and oceans were more reducing than today.

Another interesting result is that Type I systems appear to be less stable compared to others in Class 1 at low pH.
This could be due to lower frequencies of acidic amino acids in Type I effector modules.
Furthermore, Type II systems (in Class 2) are not visualized as stable relative to other Class 2 systems in this chemical space.
Changes to other physicochemical variables&nbsp;– represented by some combination of `r h2o` and the elements C, N, and O,
as well as temperature and pressure&nbsp;– might be needed to stabilize this group.

The key innovation here&nbsp;– using `rank.affinity()` to compare groups rather than individuals&nbsp;–
opens up possibilities for analyzing any system where evolutionary lineages contain multiple representatives, from enzyme families to entire metabolic pathways.
This groupwise approach to relative stability analysis enriches the geobiologist's toolkit with methods 
for mapping the environmental niches where different protein families achieve maximum thermodynamic stability.

## Further Resources - Demos

Explore demos with `demo(package = "CHNOSZ")`.
You can also use `demos()` to run all the demos without pausing or just one (e.g. `demos("mosaic")`).

### More use cases for <span style="color:green;font-family:monospace;">mosaic()</span>

  - `mosaic`: Speciating more than one set of basis species
  - `sum_S`, `uranyl`: Using summed activities of speciated basis species
  - `comproportionation`: Non-standard Gibbs energy of reaction with speciated basis species
  - `arsenic`, `copper`: More examples of Eh--pH diagrams
  - `sphalerite`, `contour`, `minsol`: Solubility calculations with speciated basis species

### Solubility contours with <span style="color:green;font-family:monospace;">solubility()</span>

  - `Pourbaix`: Isosolubility lines for various metals (try Fe, Cu, Mn)
  - `contour`: Solubility contours for gold
  - `minsol` Solubility contours for multiple minerals
  - `solubility`: `r co2` and calcite

### Other contour plots

  - `saturation`: Saturation lines (where affinity = 0) and labels for activity ratios
  - `ionize`: Protein ionization properties
  - `TCA`: Citric acid cycle energetics
  - `comproportionation`: Using a color scale (image map)

### Calculations using the output of <span style="color:green;font-family:monospace;">diagram()</span>

  - `buffer`: Place labels next to lines
  - `MgATP`: Calculate number of protons bound per ATP molecule

### Activity buffers

  - `buffer`: Plotting buffers as a function of temperature
  - `DEW`: Applying calculated values of log&nbsp;*f*`r o2` in `affinity()`
  - `gold`: Settting pH and *f*`r o2` buffers in `basis()` for solubility of gold
  - `protbuff`: Using proteins as buffer species

### Other thermodynamic models

  - `DEW`: Deep Earth Water model (extension of HKF to high pressures)
  - `AD`: Akinfiev-Diamond model for aqueous nonelectrolytes

### Calculations with proteins

  - `Shh`: Relative stabilities of transcription factors along redox and water activity gradient
  - `carboxylase`: Predicted rank abundance with varying temperature and redox
  - `rank.affinity`: Average rank of affinity for groupwise stability comparisons

## Further Resources - Vignettes

Additional vignettes cover:

### Frequently asked questions

The [FAQ](FAQ.html) is a non-comprehensive collection of questions and answers about CHNOSZ.

### OBIGT thermodynamic database

The [OBIGT](OBIGT.html) vignette is generated from reference information in the database
and lists all literature citations for species arranged by default and optional data files.

### Customizing the thermodynamic database

The [custom_data](custom_data.html) vignette describes `add.OBIGT()` for adding data from files,
`mod.OBIGT()` for updating or adding parameters of particular species, and `logK.to.OBIGT()` for generating parameters from logK values.

### Fitting thermodynamic data

The [eos-regress](eos-regress.html) vignette shows how to fit experimental data (volume and heat capacity) using constructed equation-of-state models.

### Creating multi-metal diagrams

The [multi-metal](multi-metal.html) vignette has some techniques for overcoming the limitation of balancing reactions on a single metal.

## Getting Help

R provides convenient access to documentation in a local browser:

- If you haven't installed CHNOSZ yet, run `install.packages("CHNOSZ")`
- Run `help.start()` to open a browser window for the R help system
- Choose "Packages" followed by "CHNOSZ"
- Select the help topic for any function to see detailed documentation and examples
- Follow the links at the top of the main page for *demos* (longer examples) and *vignettes* (more in-depth documentation; this document is a vignette)

You can also:

- Browse the package documentation with `help(package = "CHNOSZ")`
  - See function-specific help: `?info`, `?subcrt`, etc.
- Use the [Discussions forum on GitHub](https://github.com/jedick/CHNOSZ/discussions) for questions about CHNOSZ
- Contact the maintainer for bug reports or feature requests
  - Use `maintainer("CHNOSZ")` to get contact info

## Document History

- 2010-09-30 First version, titled "Getting started with CHNOSZ".
- 2017-02-15 Rewritten and switched from Sweave to [knitr](https://yihui.org/knitr/).
- 2025-05-10 Major revision; used AI assistance for [Basic Functionality](#basic-functionality), [Buffers](#buffers), and [Proteins](#proteins).

<p>
```{r the_end}
   ######    ##   ##    ##   ##    ######     #####  #####
 ##         ##===##    ## \\##   ##    ##     \\       //
 ######    ##   ##    ##   ##    ######    #####      #####
```
</p>

```{r sessionInfo, echo = FALSE}
## For finding what versions of packages are on R-Forge and winbuilder
#sessionInfo()
```

## References

