\documentclass[nojss]{jss}

\usepackage{enumitem}
\usepackage{amsmath}
\usepackage{float}

\clubpenalty = 10000
\widowpenalty = 10000
\displaywidowpenalty = 10000

\author{Manuel J. A. Eugster\\{\small
		Ludwig-Maximilians-Universit{\"a}t M{\"u}nchen} \And
        Friedrich Leisch\\{\small Ludwig-Maximilians-Universit{\"a}t
		M{\"u}nchen}}

\title{From Spider-Man to Hero --\\
	Archetypal Analysis in \proglang{R}}

\Plainauthor{Manuel J. A. Eugster, Friedrich Leisch}
\Plaintitle{From Spider-Man to Hero -- Archetypal Analysis in R}
\Shorttitle{Archetypal Analysis in \proglang{R}}
\Keywords{archetypal analysis, convex hull, \proglang{R}}
\Plainkeywords{archetypal analysis, convex hull, R}

\Abstract{
This introduction to the \proglang{R} package \pkg{archetypes} is a
(slightly) modified version of \citet{Eugster+Leisch@2009}, published
in the {\it Journal of Statistical Software}.

\bigskip

Archetypal analysis has the aim to represent observations in a
multivariate data set as convex combinations of extremal points. This
approach was introduced by \citet{Cutler+Breiman@1994}; they defined
the concrete problem, laid out the theoretical foundations and
presented an algorithm written in \proglang{Fortran}. In this paper we
present the \proglang{R} package \pkg{archetypes} which is available
on the Comprehensive \proglang{R} Archive Network. The package
provides an implementation of the archetypal analysis algorithm within
\proglang{R} and different exploratory tools to analyze the algorithm
during its execution and its final result. The application of the
package is demonstrated on two examples.
}

\Address{
  Manuel J. A. Eugster\\
  Department of Statistics\\
  Ludwig-Maximilians-Universit{\"a}t M{\"u}nchen\\
  80539 Munich, Germany\\
  E-mail: \email{Manuel.Eugster@stat.uni-muenchen.de}\\
  URL: \url{http://www.statistik.lmu.de/~eugster/}

  \medskip
  Friedrich Leisch\\
  Department of Statistics\\
  Ludwig-Maximilians-Universit{\"a}t M{\"u}nchen\\
  80539 Munich, Germany\\
  E-mail: \email{Friedrich.Leisch@R-project.org}\\
  URL: \url{http://www.statistik.lmu.de/~leisch/}

}

%%\usepackage{Sweave} %% already provided by jss.cls
%%\VignetteIndexEntry{From Spider-Man to Hero -- Archetypal Analysis in R}
%%\VignetteDepends{archetypes}
%%\VignetteKeywords{archetypal analysis, convex hull, R}
%%\VignettePackage{archetypes}

\SweaveOpts{eps=FALSE, keep.source=TRUE}
<<echo=FALSE,print=FALSE,results=hide>>=
options(width=80, prompt='R> ')
@

\begin{document}

\section[Introduction]{Introduction\label{intro}}

The \citet{merriam-webster:archetype} defines an archetype as {\it the
  original pattern or model of which all things of the same type are
  representations or copies}. The aim of archetypal analysis is to
find ``pure types'', the archetypes, within a set defined in a
specific context. The concept of archetypes is used in many different
areas, the set can be defined in terms of literature, philosophy,
psychology and also statistics. Here, the concrete problem is to find
a few, not necessarily observed, points (archetypes) in a set of
multivariate observations such that all the data can be well
represented as convex combinations of the archetypes. The title of
this article illustrates the concept of archetypes on the basis of
archetypes in literature: the {\it Spider-Man} personality belongs to
the generic {\it Hero} archetype, and archetypal analysis tries to
find this coherence.

In statistics archetypal analysis was first introduced by
\citet{Cutler+Breiman@1994}. In their paper they laid out the
theoretical foundations, defined the concrete problem as a nonlinear
least squares problem and presented an alternating minimizing
algorithm to solve it. It has found applications in different areas,
with recently grown popularity in economics, e.g.,
\citet{Li+Wang+Louviere+Carson@2003} and
\citet{Porzio+Ragozini+Vistocco@2008}. In spite of the rising interest
in this computer-intensive but numerically sensitive method, no
``easy-to-use'' and freely available software package has been
developed yet. In this paper we present the software package
\pkg{archetypes} within the \proglang{R} statistical environment
\citep{R} which provides an implementation of the archetypal analysis
algorithm. Additionally, the package provides exploratory tools to
visualize the algorithm during the minimization steps and its final
result. The newest released version of \pkg{archetypes} is always
available from the Comprehensive R Archive Network at
\url{http://CRAN.R-project.org/package=archetypes}.

The paper is organized as follows: In Section~\ref{algorithm} we
outline the archetypal analysis with its different conceptual
parts. We present the theoretical background as far as we need it for
a sound introduction of our implementation; for a complete explanation
we refer to the original paper. Section~\ref{pkg} demonstrates how to use
\pkg{archetypes} based on a simple artificial data set, with details about
numerical problems and the behavior of the algorithm. Section~\ref{comp}
 presents a simulation study to show how the implementation scales
with numbers of observations, attributes and archetypes. In
Section~\ref{body} we show a real word example -- the archetypes of
human skeletal diameter measurements. Section~\ref{outlook} concludes
the article with future investigations.



\section[Archetypal analysis]{Archetypal analysis\label{algorithm}}

Given is an $n \times m$ matrix $X$ representing a multivariate data
set with $n$ observations and $m$ attributes. For a given $k$ the
archetypal analysis finds the matrix $Z$ of $k$ $m$-dimensional
archetypes according to the two fundamentals:
\begin{enumerate}
	[label=(\arabic{enumi})]

	\item The data are best approximated by convex combinations of the
	archetypes, i.e., they minimize
	\[
		\mbox{RSS} = \|X - \alpha Z^{\top}\|_2
	\]
	with $\alpha$, the coefficients of the archetypes, an $n \times k$
	matrix; the elements are required to be greater equal $0$ and
	their sum must be $1$, i.e., $\sum_{j=1}^{k} \alpha_{ij} = 1$
	with $\alpha_{ij} \geq 0$ and $i = 1, \ldots, n$. $\|\cdot\|_2$
	denotes an appropriate matrix norm.

	\item The archetypes are convex combinations of the data points:
	\[
		Z = X^{\top} \beta
	\]
	with $\beta$, the coefficients of the data set, a $n \times k$
	matrix where the elements are required to be greater equal $0$ and
	their sum must be $1$, i.e., $\sum_{i=1}^{n} \beta_{ji} = 1$
	with $\beta_{ji} \geq 0$ and $j = 1, \ldots, k$.
\end{enumerate}
These two fundamentals also define the basic principles of the
estimation algorithm:
it alternates between finding the best $\alpha$ for given
archetypes $Z$ and finding the best archetypes $Z$ for given
$\alpha$; at each step several convex least squares problems are
solved, the overall $\mbox{RSS}$ is reduced successively.

With a view to the implementation, the algorithm consists of the
following steps:
\begin{enumerate}
	[label=\arabic{enumi}.,ref=\arabic{enumi},itemsep=6pt]

	\item[] Given the number of archetypes $k$:

	\item\label{alg:pre} Data preparation and initialization: scale
	data, add a dummy row (see below) and initialize $\beta$
	in a way that the the constraints are fulfilled to calculate the
	starting archetypes $Z$.

	\item\label{alg:loop} Loop until $\mbox{RSS}$ reduction is
	sufficiently small or the number of maximum iterations is reached:

	\begin{enumerate}
		[label=\arabic{enumi}.\arabic{enumii}.,
		ref=\arabic{enumi}.\arabic{enumii}, itemsep=6pt]

		\item\label{alg:loop-alpha} Find best $\alpha$ for
		the given set of archetypes $Z$: solve $n$ convex least
		squares problems ($i = 1, \ldots, n$)
		\[
			\min_{\alpha_i} \frac{1}{2} \|X_i - Z \alpha_i\|_2
			\mbox{ subject to} \; \alpha_i \geq 0 \;
			\mbox{ and } \; \sum_{j=1}^{k} \alpha_{ij} = 1\mbox{.}
		\]

		\item\label{alg:loop-zt} Recalculate archetypes $\tilde{Z}$:
		solve system of linear equations $X = \alpha \tilde{Z}^{\top}$.

		\item\label{alg:loop-beta} Find best $\beta$ for the
		given set of archetypes $\tilde{Z}$: solve $k$ convex least
		squares problems ($j = 1, \ldots, k$)
		\[
			\min_{\beta_j} \frac{1}{2} \|\tilde{Z}_j - X \beta_j\|_2
			\mbox{ subject to} \; \beta_j \ge 0 \;
			\mbox{ and } \; \sum_{i=1}^{n} \beta_{ji} = 1\mbox{.}
		\]

		\item\label{alg:loop-z} Recalculate archetypes $Z$: $Z = X \beta$.

		\item\label{alg:loop-rss} Calculate residual sum of squares
		$\mbox{RSS}$.
	\end{enumerate}

	\item\label{alg:post} Post-processing: remove dummy row and rescale
	archetypes.
\end{enumerate}
The algorithm has to deal with several numerical problems,
i.e. systems of linear equations and convex least squares
problems. In the following we explain each step in detail.

\paragraph{Solving the convex least squares problems:} In Step
\ref{alg:loop-alpha} and \ref{alg:loop-beta} several convex
least squares problems have to be solved. \citet{Cutler+Breiman@1994}
use a penalized version of the non-negative least squares algorithm by
\citet{Lawson+Hanson@1974} \citep[as general reference
see,e.g.,][]{Luenberger@1984}. In detail, the problems to solve are of
the form  $\|u - Tw\|_2$  with $u, w$ vectors and $T$ a matrix, all of
appropriate dimensions, and the non-negativity and equality
constraints. The penalized version adds an extra element $M$ to $u$
and to each observation of $T$; then
\[
	\|u - Tw\|_2 + M^2 \|1 - w\|_2
\]
is minimized under non-negativity restrictions. For large
$M$, the second term dominates and forces the equality constraint to be
approximately satisfied while maintaining the non-negativity
constraint. The hugeness of the value $M$ varies from problem to
problem and thus can be seen as a hyperparameter of the
algorithm. Default value in the package is $200$.

\paragraph{Solving the system of linear equations:} In Step
\ref{alg:loop-zt} the system of linear equations
\[
	\tilde{Z} = \alpha^{-1} X
\]
has to be solved. A lot of methods exist, one approach is the
Moore-Penrose pseudoinverse which provides an approximated unique
solution by a least squares approach: given the pseudoinverse
$\alpha^{+}$ of $\alpha$,
\[
	\tilde{Z} = \alpha^{+} X\mbox{,}
\]
is solved. Another approach is the usage of $QR$ decomposition:
$\alpha = QR$, where $Q$ is an orthogonal and $R$ an upper triangular
matrix, then
\[
	\tilde{Z} = Q^{\top} X R^{-1}\mbox{,}
\]
is solved. Default approach in the package is the $QR$ decomposition
using the \code{solve()} function.

\paragraph{Calculating the residual sum of squares:} In Step
\ref{alg:loop-rss} the $\mbox{RSS}$ is calculated. It uses the
spectral norm \citep[see, e.g.,][]{Golub+Loan@1996}. The spectral
norm of a matrix $X$ is the largest singular value of $X$ or the
square root of the largest eigenvalue of $X^{*} X$,
\[
	\|X\|_2 = \sqrt{\lambda_{max}(X^{*} X)}\mbox{,}
\]
where $X^{*}$ is the conjugate transpose of $X$.

\paragraph{Avoiding local minima:} \citet{Cutler+Breiman@1994} show
that the algorithm converges in all cases, but not necessarily to a
global minimum. Hence, the algorithm should be started several times
with different initial archetypes. It is important that these are not
too close together, this can cause slow convergence or convergence to
a local minimum.

\paragraph{Choosing the correct number of archetypes:} As in many cases
there is no rule for the correct number of archetypes $k$. A simple
method the determine the value of $k$ is to run the algorithm for
different numbers of $k$ and use the ``elbow criterion'' on the
$\mbox{RSS}$ where a ``flattening'' of the curve indicates the correct
value of $k$.

\paragraph{Approximation of the convex hull:} Through the definition
of the problem, archetypes lie on the boundary of the convex hull of
the data. Let $N$ be the number of data points which define the
boundary of the convex hull, then \citet{Cutler+Breiman@1994} showed:
if $1 < k < N$, there are $k$ archetypes on the boundary which
minimize $\mbox{RSS}$; if $k = N$, exactly the data points which
define the convex hull are the archetypes with $\mbox{RSS} = 0$; and
if $k = 1$, the sample mean minimizes the $\mbox{RSS}$. In practice,
these theoretical results can not always be achieved as we will see in
the following two sections.



\section[Using archetypes]{Using package \pkg{archetypes}\label{pkg}}

The package is loaded into \proglang{R} using the
\code{library()} or \code{require()} command:
<<echo=FALSE>>=
library(archetypes)
@
\begin{Schunk}
\begin{Sinput}
R> library("archetypes")
\end{Sinput}
\begin{Soutput}
Loading required package: nnls
\end{Soutput}
\end{Schunk}
It requires the packages \pkg{nnls} \citep{nnls} for solving the
convex least squares problems.

We use a simple artificial two-dimensional data set to explain the
usage of the implementation, and the behavior of the archetypal
analysis at all. The advantage is that we can imagine the results and
simply visualize them, Section \ref{body} then shows a more realistic
example. Note that in the following the plot commands do not produce
exactly the shown graphics concerning primitives coloring, width,
etc.; to increase readability we have reduced the presented commands
to the significant arguments. E.g., if we use a different plotting
symbol in a scatter plot the corresponding (\code{pch = \ldots}) will be
omitted.
<<eval=FALSE>>=
data("toy")
plot(toy)
@
\begin{figure}[h!t]
	\centering
\setkeys{Gin}{width=3in}
<<fig=TRUE,echo=FALSE,width=3,height=3>>=
data("toy")

par(mar = c(2,2,0,0) + 0.1, ps = 9)
plot(toy, xlab = "", ylab = "", xlim = c(0,20), ylim = c(0,20),
     pch = 19, col = gray(0.7), cex = 0.6)
@
	\caption{Data set {\tt toy}.}
	\label{fig:toy-data}
\end{figure}
Data set \code{toy} consists of the two attributes $x$ and $y$, and
$\Sexpr{nrow(toy)}$ observations. According to the shape of the data,
it seems to be a good idea to apply archetypal analysis with $k = 3$
archetypes.
<<>>=
suppressWarnings(RNGversion("3.5.0"))
set.seed(1986)
a <- archetypes(toy, 3, verbose = TRUE)
@
During the fit, the function reports its improvement and stops after a
maximum number of iterations (default is \code{maxIterations = 100})
or if the improvement is less than a defined value (default is
\verb|minImprovement = sqrt(.Machine$double.eps)|). As basis for our %$
further research, the implementation is a flexible framework where the
problem solving mechanisms of the individual steps can be
exchanged. The default values are the ``original ones'' described
in the previous section (\code{family = archetypesFamily()}). The
result is an \proglang{S}3 \code{archetypes} object,
<<>>=
a
@
containing the three final archetypes:
<<>>=
parameters(a)
@

The \code{plot()} function visualizes archetypes for two-dimensional
data sets; for higher-dimensional data sets parallel coordinates are
used.
<<eval=FALSE>>=
xyplot(a, toy, chull = chull(toy))
xyplot(a, toy, adata.show = TRUE)
@
\begin{figure}[h!t]
	\centering
\setkeys{Gin}{width=6in}
<<fig=TRUE,echo=FALSE,width=6,height=3>>=
par(mfrow = c(1,2), mar = c(2,2,0,0)+0.1, ps = 9)
xyplot(a, toy, chull = chull(toy),
       xlab = "", ylab = "", xlim = c(0, 20), ylim = c(0, 20), cex = 0.6)
xyplot(a, toy, adata.show = TRUE,
       xlab = "", ylab = "", xlim = c(0, 20), ylim = c(0, 20), cex = 0.6)
@
	\caption{Visualization of three archetypes.}
	\label{fig:toy-a}
\end{figure}
The left plot of Figure~\ref{fig:toy-a} shows the archetypes, their
approximation of the convex hull (red dots and lines) and the convex
hull (black dots and lines) of the data. The right plot of
Figure~\ref{fig:toy-a} additionally shows the approximation of the data
through the archetypes and the corresponding $\alpha$ values (green
symbols, and grey connection lines); as we can see, all data points
outside the approximated convex hull are mapped on its boundary. This
plot is based on an idea and \proglang{MATLAB} source code of
\citet{pc:Pailthorpe}.

With \code{saveHistory = TRUE} (which is set per default) each step of
the execution is saved and we can examine the archetypes in each
iteration using the \code{a\$history} memento object; the initial archetypes,
for example, are \code{a\$history\$get(0)}. This can be used to
create an ``evolution movie'' of the archetypes,
<<eval=FALSE>>=
movieplot(a, toy)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=6in}
<<fig=TRUE,echo=FALSE,width=6,height=3>>=
par(mfrow = c(2, 4), mar = c(0, 0, 0, 0) + 0.1, ps = 9)
movieplot(a, toy, xlim = c(0, 20), ylim = c(0, 20), cex = 0.6, axes = FALSE,
          postfn = function(iter) {
            box()
            text(1, 19, paste(iter + 1, ".", sep = ""), cex = 1)
          })
@
	\caption{Visualization of the algorithm
          iterations.}
	\label{fig:toy-movieplot}
\end{figure}
The figure shows the plots of the seven steps (the random
initialization and the six iterations) from top to  bottom and left
to right. In each step the three archetypes move further to the three
corners of the data set. A movie of the approximated data is shown when
setting parameter \code{show = "adata"}\footnote{Real animations are
  as Flash movies along with this paper and available from
  \url{http://www.jstatsoft.org/v30/i08/}.}.

In the previous section we mentioned that the algorithm should be
started several times to avoid local minima. This is done using the
\code{stepArchetypes()} function; it passes all arguments to the
\code{archetypes()} function and additionally has the argument
\code{nrep} which specifies the number of repetitions.
<<>>=
set.seed(1986)
a4 <- stepArchetypes(data = toy, k = 3, verbose = FALSE, nrep = 4)
@
The result is an \proglang{S}3 \code{stepArchetypes} object,
<<>>=
a4
@
where \code{summary()} provides an overview of each repetition by
showing the final residual sum of squares and number of iterations:
<<>>=
summary(a4)
@
There are no huge differences in the residual sum of squares, thus if
there are different local minima then they are all equally good. But
the following plot shows that the repetition starts all nearly found
the same final archetypes (and thus the same local minima),
<<eval=FALSE>>=
xyplot(a4, toy)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=3in}
<<fig=TRUE,echo=FALSE,width=3,height=3>>=
par(mar = c(2, 2, 0, 0) + 0.1, ps = 9)
xyplot(a4, toy, cex = 0.6, xlim = c(0, 20), ylim = c(0, 20),
       xlab = "", ylab = "")
@
	\caption{Visualization of different repetitions.}
	\label{fig:toy-a4}
\end{figure}
However, the model of repetition $3$ has the lowest residual sum of
squares and is the best model:
<<eval=FALSE>>=
bestModel(a4)
@
<<echo=FALSE>>=
print(bestModel(a4), full = FALSE)
@

At the beginning of the example we decided by looking at the data
that three archetypes may be a good choice. Visual inspection does not
necessarly lead to the best choice, and is not an option for
higher-dimensional data sets.
As already mentioned in the previous section, one simple way
to choose a good number of archetypes is to run the algorithm for
different numbers of $k$ and use the ``elbow criterion'' on the
residual sum of squares. The \code{stepArchetypes()} function allows a
vector as value of argument $k$ and executes for each $k_i$ the
\code{archetypes()} function $nrep$ times.
<<echo=FALSE>>=
file <- "as.RData"
if ( file.exists(file) ) {
  load(file = file)
} else {
  set.seed(1986)
  as <- stepArchetypes(data = toy, k = 1:10, verbose = FALSE, nrep = 4)
  save(as, file = file)
}
@
\begin{Schunk}
\begin{Sinput}
R> set.seed(1986)
R> as <- stepArchetypes(data = toy, k = 1:10, verbose = FALSE, nrep = 4)
\end{Sinput}
\begin{Soutput}
There were 23 warnings (use warnings() to see)
\end{Soutput}
\end{Schunk}
The occurred warnings indicate that errors occured during the
execution, in this case, singular matrices in solving the linear
equation system in Step \ref{alg:loop-zt} as from $k = 4$:
\begin{Schunk}
\begin{Sinput}
R> warnings()
\end{Sinput}
\begin{Soutput}
Warnings:
1: In archetypes(..., k = k[i], verbose = verbose) ... :
  k=4: Error in qr.solve(alphas %*% t(alphas)): singular matrix 'a' in solve
2: In archetypes(..., k = k[i], verbose = verbose) ... :
  k=5: Error in qr.solve(alphas %*% t(alphas)): singular matrix 'a' in solve
3: In archetypes(..., k = k[i], verbose = verbose) ... :
  k=5: Error in qr.solve(alphas %*% t(alphas)): singular matrix 'a' in solve
[...]
\end{Soutput}
\end{Schunk}
In these cases the residual sum of squares is \code{NA}:
<<>>=
rss(as)
@
And all errors occured during the first iteration,
<<>>=
t(sapply(as, function(a) sapply(a, '[[', 'iters')))
@
which is an indication for an afflicted random initialisation. If
warnings occur within repetitions for $k_i$ archetypes, the pretended
meaningful solutions (according to the $RSS$) have to be examined
carefully; Section~\ref{ginv} illustrates a pretended meaningful
solution where the plot then uncovers problems. But up to $k = 5$
there is always at least one start with a meaningful result and the
residual sum of squares curve of the best models shows that by the
``elbow criterion'' three or maybe seven is the best number of
archetypes:
<<eval=FALSE>>=
screeplot(as)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=4in}
<<fig=TRUE,echo=FALSE,width=4,height=3>>=
par(mar = c(3, 4, 0.1, 0) + 0.1, ps = 9)
screeplot(as, cex = 0.6, ylim = c(0, 0.08), axes = FALSE)
mtext("Archetypes", side = 1, line = 2)
axis(2, las = 2)
box()
@
	\caption{Screeplot of the residual sum of squares.}
	\label{fig:toy-rssplot}
\end{figure}
We already have seen the three archetypes in detail; the seven
archetypes of the best repetition and their approximation of the data
are:
<<eval=FALSE>>=
a7 <- bestModel(as[[7]])
xyplot(a7, toy, chull = chull(toy))
xyplot(a7, toy, adata.show = TRUE)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=6in}
<<fig=TRUE,echo=FALSE,width=6,height=3>>=
a7 <- bestModel(as[[7]])

par(mfrow = c(1, 2), mar = c(2, 2, 0, 0) + 0.1, ps = 9)
xyplot(a7, toy, chull = chull(toy),
       xlim = c(0, 20), ylim = c(0, 20), cex = 0.6, xlab = "", ylab = "")
xyplot(a7, toy, adata.show = TRUE,
       xlim = c(0, 20), ylim = c(0, 20), cex = 0.6, xlab = "", ylab = "")
@
        \caption{Visualization of seven archetypes.}
        \label{fig:toy-a7}
\end{figure}
The approximation of the convex hull is now clearly visible.


\subsection[Alternative numerical methods]{Alternative numerical
methods\label{ginv}}

As we mentioned in Section \ref{algorithm}, there are many ways to
solve linear equation systems. One other possibility is the
Moore-Penrose pseudoinverse:
<<echo=FALSE>>=
file <- "gas.RData"
if ( file.exists(file) ) {
  load(file = file)
} else {
  set.seed(1986)
  gas <- stepArchetypes(data = toy, k = 1:10,
                        family = archetypesFamily("original",
                        zalphasfn = archetypes:::ginv.zalphasfn),
                        verbose = FALSE, nrep = 4)
  save(gas, file = file)
}
@
\begin{Schunk}
\begin{Sinput}
R> set.seed(1986)
R> gas <- stepArchetypes(data = toy, k = 1:10,
+                        family = archetypesFamily("original",
+                        zalphasfn = archetypes:::ginv.zalphasfn),
+                        verbose = FALSE, nrep = 4)
\end{Sinput}
\begin{Soutput}
Loading required package: MASS
There were 23 warnings (use warnings() to see)
\end{Soutput}
\end{Schunk}
We use the \code{ginv()} function from the \pkg{MASS} package to
calculate the pseudoinverse. The function ignores ill-conditioned
matrices and ``just solves the linear equation system'', but the
\code{archetypes} function throws warnings of ill-conditioned matrices
if the matrix condition number $\kappa$ is bigger than an upper bound
(default is \code{maxKappa = 1000}):
\begin{Schunk}
\begin{Sinput}
R> warnings()
\end{Sinput}
\begin{Soutput}
Warnings:
1: In archetypes(..., k = k[i], verbose = verbose) ... :
  k=4: alphas > maxKappa
2: In archetypes(..., k = k[i], verbose = verbose) ... :
  k=5: alphas > maxKappa
3: In archetypes(..., k = k[i], verbose = verbose) ... :
  k=5: alphas > maxKappa
[...]
\end{Soutput}
\end{Schunk}
In comparison with the $QR$ decomposition, the warnings occured for the
same number of archetypes $k_i$ during the same repetition. In most of
these cases the residual sum of squares is about $12$,
<<>>=
rss(gas)
@
and the randomly chosen initial archetypes ``collapse'' to the center
of the data as we exemplarily see for $k = 9$, $r = 3$:
<<eval=FALSE>>=
movieplot(gas[[9]][[3]], toy)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=6in}
<<fig=TRUE,echo=FALSE,width=6,height=1.5>>=
par(mfrow = c(1, 4), mar = c(0, 0, 0, 0) + 0.1, ps = 9)
movieplot(gas[[9]][[3]], toy, xlim = c(0, 20), ylim = c(0, 20), cex = 0.6,
          axes = FALSE, postfn = function(iter) {
            box()
            text(1, 19, paste(iter + 1, ".", sep = ""), cex = 1)
          })
@
	\caption{Visualization of algorithm iterations until
          ``collapsing''.}
	\label{fig:toy-movieplot-gas}
\end{figure}
The figure shows the four steps (from left to right), the random
initialization and the three iterations until all archetypes are in
the center of the data.

All other residual sum of squares are nearly equivalent to the ones
calculated with $QR$ decomposition. Further investigations would show
that three or maybe seven is the best number of archetypes, and in
case of $k = 3$ nearly the same three points are the best
archetypes. An interesting exception is the case $k = 7, r = 2$; the
residual sum of squares is exactly the same, but not the
archetypes. The plots of the archetypes and their approximation of the
data:
<<eval=FALSE>>=
ga7 <- bestModel(gas[[7]])
xyplot(ga7, toy, chull = chull(toy))
xyplot(ga7, toy, adata.show = TRUE)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=6in}
<<fig=TRUE,echo=FALSE,width=6,height=3>>=
ga7 <- bestModel(gas[[7]])

par(mfrow = c(1, 2), mar = c(2, 2, 0, 0) + 0.1, ps = 9)
xyplot(ga7, toy, chull = chull(toy),
       xlim = c(0, 20), ylim = c(0, 20), cex = 0.6, xlab = "", ylab = "")
xyplot(ga7, toy, adata.show = TRUE,
       xlim = c(0, 20), ylim = c(0, 20), cex = 0.6, xlab = "", ylab = "")
@
\caption{Visualization of seven archetypes; one archetype ``collapsed''.}
        \label{fig:toy-a7-gas}
\end{figure}
Interesting is the one archetype in the center of the data set and
especially the approximation of the data in the right area of
it. As the data are approximated by a linear combination of archetypes
and non-negative $\alpha$, the only possibility for this kind of
approximation is when $\alpha$ for this archetype is always zero:
<<>>=
apply(coef(ga7, 'alphas'), 2, range)
@
As we can see, $\alpha$ of archetype $1$ (column one) is $0$ for all
data points. Theoretically, this is not possible, but ill-conditioned
matrices during the fit process lead to such results in
practice. The occurred warnings (\code{k=7: alphas > max.kappa})
notify that solving the convex least squares problems lead to the
ill-conditioned matrices. Our simulations showed that this behavior
mostly appears when requesting a relatively large number of archetypes
in relation to size of the data set.



\section[Computational complexity]{Computational complexity\label{comp}}

In Section~\ref{intro} we stated that the archetypal analysis
algorithm is computer-intensive; in this section we now present a
simulation to show how the implementation scales with numbers of
observations, attributes and archetypes. In general, the speed of the
algorithm is determined by the efficiency of the convex least squares
method. The penalized non-negative least squares method is quite slow,
but is appealing because it can be used when the number of attributes
is larger than the number of observations \citep[according
to][]{Cutler+Breiman@1994}.

The simulation setup is the following \citep[cf.~the {\it simulation
problem} by][]{Hothorn+Leisch+Zeileis+Hornik@2005}: we consider the
multivariate standard normal distribution as data generating process
(spherical data). A data set is generated for each combination of $m =
5, 10, 25, 50$ attributes, $n = 100, 1000, 10000, 10000$ observations
and $20$ replications in each case. On each data set $k = 1, \ldots,
10$ archetypes are fitted; stop criteria are $100$ iterations or an
improvement less than \verb|sqrt(.Machine$double.eps)|.  %$
Each $m, n, k$ configuration is fitted with randomly chosen initial
archetypes. For each of the $m \times n \times k \times 20$ fits the
computation time and the number of iterations until convergence are
measured.

We present two main results of the simulation: (1) the computation
time per iteration, i.e., focus on the efficiency of the convex least
squares methods; (2) the computation time per fit, where the
convergence behavior matters. The results are presented in
Figure~\ref{fig:rt-iteration} and Figure~\ref{fig:rt-algorithm},
respectively. Both show four panels, one for each possible number of
observations and each panel contains four lines, one for each possible
number of attributes. The $x$-axis is the number of archetypes and the
$y$-axis the measured value; notice that the scale of the $y$-axis
varies between the panels.

Figure~\ref{fig:rt-iteration} shows the computation time per iteration
in seconds, i.e., the computation time per fit divided by the number
of iterations until convergence. In general the behavior is as
expected: in case of fixed $n$ and $m$ (each individual line),
increasing number of archetypes imply a linear increase of the
computation time. Similarly, in case of fixed $k$ and $m$ (dots of same
color and $x$-axis position, different panels), increasing
observations imply approximately $10$ times more computation time. And
in the case of fixed $k$ and $n$ (dots of different colors, same
$x$-axis position and panel), increasing attributes indicate a
polynomial increase of the computation time. Noticeable is that in
high dimensions ($n = 10000, 100000$ and $m = 50$) two, three and four
archetypes need more computation time than the remaining numbers of
archetypes.

\begin{figure}[h!t]
	\centering
\setkeys{Gin}{width=5in}
\includegraphics{rt-002}
	\caption{Computation time per iteration in seconds.}
	\label{fig:rt-iteration}
\end{figure}

Figure~\ref{fig:rt-algorithm} shows the (median) computation time per
fit in seconds. The figure shows that an increasing number of
observations approximately imply a linear increase of the computation
time. By contrast, for fixed $n$ the computation time is approximately
constant in $k$. Except from $n = 100000$ the peak at two archetypes
is strongly distinct. Currently we have no reasonable explanation and
more detailed simulations need to be done.

\begin{figure}[h!t]
	\centering
\setkeys{Gin}{width=5in}
\includegraphics{rt-003}
	\caption{Computation time per fit in seconds.}
	\label{fig:rt-algorithm}
\end{figure}

\clearpage


\section[Example]{Example: Skeletal archetypes\label{body}}

In this section we apply archetypal analysis on an interesting
real world example: in \citet{Heinz+Peterson+Johnson+Kerk@2003} the
authors took body girth measurements and skeletal diameter
measurements, as well as age, weight, height and gender on $247$ men
and $260$ women in their twenties and early thirties, with a scattering
of older man and woman, and all physically active. The full data are
available within the package as \code{data("body")}, but we are only
interested in a subset, the skeletal measurements and the height (all
measured in centimeters),
<<>>=
data("skel")
skel2 <- subset(skel, select = -Gender)
@

The skeletal measurements consist of nine diameter measurements:
biacromial (\code{Biac}), shoulder diameter; biiliac (\code{Biil}),
pelvis diameter; bitrochanteric (\code{Bitro}) hip diameter; chest
depth between spine and sternum at nipple level, mid-expiration
(\code{ChestDp}); chest diameter at nipple level, mid-expiration
(\code{ChestDiam}); elbow diameter, sum of two elbows
(\code{ElbowDiam}); wrist diameter, sum of two wrists
(\code{WristDiam}); knee diameter, sum of two knees (\code{KneeDiam});
ankle diameter, sum of two ankles (\code{AnkleDiam}). See the original
publication for a full anatomical explanation of the skeletal
measurements and the process of measuring. We use basic elements of
{\it Human Modeling and Animation} to model the skeleton and create a
schematic representation of an individual, e.g.,
\code{skeletonplot(skel2[1, ])} for observation number one. The
function \code{jd()} (for ``John Doe'') uses this plot and shows a
generic individual with explanations of the measurements:
<<eval=FALSE>>=
jd()
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=6in}
<<fig=TRUE,echo=FALSE,width=6,height=3.7>>=
par(mar = c(1, 4, 0, 0) + 0.1, ps = 9)
jd()
@
	\caption{Generic skeleton with explanations of the
          measurements.}
        \label{fig:body-jd}
\end{figure}
For visualizing the full data set, parallel coordinates with
axes arranged according to the ``natural order'' are used,
<<eval=FALSE>>=
pcplot(skel2)
@
\begin{figure}[H]
 	\centering
\setkeys{Gin}{width=6in}
<<echo=FALSE>>=
datacol <- rgb(178, 178, 178, maxColorValue = 255,
               alpha = round(255 * 0.2))
@
<<eval=FALSE,fig=TRUE,echo=FALSE,width=6,height=4.5>>=
par(mar = c(5, 0.4, 0, 0.4) + 0.1, ps = 9)
pcplot(skel2, las = 2, col = datacol)
@
<<results=tex,echo=FALSE>>=
png(filename = "body-pcplot-raw.png", units = "px",
    width = 590, height = 430, pointsize = 12)
par(mar = c(5.5, 0.4, 0, 0.4) + 0.1)
pcplot(skel2, las = 2, col = datacol)
graphics.off()
cat("\\includegraphics{body-pcplot-raw.png}\n")
@
        \caption{Parallel coordinates of the {\tt skel2} data set.}
        \label{fig:body-data}
\end{figure}
At first view no patterns are visible and it is difficult to guess a
meaningful number of archetypes. Therefore, we calculate the
archetypes for $k = 1, \ldots, 15$ with three repetitions each
time,
<<echo=FALSE>>=
file <- "bas.RData"
if ( file.exists(file) ) {
  load(file = file)
} else {
  set.seed(1981)
  as <- stepArchetypes(skel2, k = 1:15, verbose = FALSE)
  save(as, file = file)
}
@
\begin{Schunk}
\begin{Sinput}
R> set.seed(1981)
R> as <- stepArchetypes(skel2, k = 1:15, verbose = FALSE, nrep = 3)
\end{Sinput}
\begin{Soutput}
There were 12 warnings (use warnings() to see)
\end{Soutput}
\end{Schunk}
The warnings indicate ill-conditioned matrices as from $k =
11$, but, not as in the previous section, the residual sum of squares
contains no \code{NA} values. The corresponding curve of the best
model in each case is:
<<eval=FALSE>>=
screeplot(as)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=5in}
<<fig=TRUE,echo=FALSE,width=5,height=1.8>>=
par(mar = c(3, 4, 0.4, 0) + 0.1, ps = 9)
screeplot(as, cex = 0.6, axes = FALSE)
mtext("Archetypes", side = 1, line = 2)
axis(2, las = 2)
box()
@
	\caption{Screeplot of the residual sum of squares.}
	\label{fig:body-rss}
\end{figure}
And according to the ``elbow criterion'' $k = 3$ or maybe $k = 7$ is
the best number of archetypes. Corresponding to Occam's razor we
proceed with three archetypes,
<<>>=
a3 <- bestModel(as[[3]])
@
The three archetypes are (transposed for better readability):
<<>>=
t(parameters(a3))
@
Or as a barplot in relation to the original data:
<<eval=FALSE>>=
barplot(a3, skel2, percentiles = TRUE)
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=5in}
<<fig=TRUE,echo=FALSE,width=5,height=4>>=
par(mar = c(5, 4, 0.4, 0) + 0.1, ps = 9)
barplot(a3, skel2, percentiles = TRUE,
        below.compressed.height = 0.4,
        below.compressed.srt = 90)
@
	\caption{Barplot visualizing the three archetypes.}
	\label{fig:body-barplot}
\end{figure}
Archetype 2 (gray) represents individuals which are ``huge'' in all
measurements; on the other hand, archetype 3 (lightgray) represents
individuals which are ``small''. Archetype 1 (darkgray) represents
individuals with average measures except the bitrochanteric and
biiliac -- the meaning of this is best visible when looking at the
data with gender information (men are blue, women are green colored,
with alpha transparency) and the archetypes (red),
<<eval=FALSE>>=
pcplot(a3, skel2, data.col = as.numeric(skel$Gender))
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=6in}
<<echo=FALSE>>=
datacol <- c(rgb(0, 205, 0, maxColorValue = 255,
                 alpha = round(255 * 0.2)),
             rgb(0, 0, 255, maxColorValue = 255,
                 alpha=round(255 * 0.2)))
@
<<eval=FALSE,fig=TRUE,echo=FALSE,width=6,height=4.5>>=
par(mar = c(5, 0.4, 0, 0.4) + 0.1, ps = 9)
pcplot(a3, skel2, las = 2, data.col = datacol[skel$Gender])
@
<<results=tex,echo=FALSE>>=
png(filename = "body-pcplot-gender.png", units = "px",
    width = 590, height = 430, pointsize = 12)
par(mar = c(5.5, 0.4, 0, 0.4) + 0.1)
pcplot(a3, skel2, las = 2, data.col = datacol[skel$Gender])
graphics.off()
cat("\\includegraphics{body-pcplot-gender.png}\n")
@
        \caption{Parallel coordinates with colors related to the
          gender and the three archetypes.}
        \label{fig:body-a3}
\end{figure}
Archetype 2 reflects the primary difference between men and women in
body structure -- the comparatively wider hip and pelvis of women.
A verification of this interpretation can be done by looking at the
coefficients $\alpha$ and see how much each archetype contributes to
the approximation of each individual observation. For three
archetypes, a ternary plot is a usable graphical representation
\citep[e.g., package \pkg{vcd} by][]{vcd}:
<<eval=FALSE>>=
ternaryplot(coef(a3, 'alphas'), col = as.numeric(skel$Gender))
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=2in}
<<fig=TRUE,echo=FALSE,width=2,height=2>>=
library("vcd")
ternaryplot(coef(a3, 'alphas'), dimnames = 1:3, cex = 0.3,
            col = datacol[skel$Gender], main = NULL,
            labels = "none", grid = FALSE)
grid.text("1", x = 3, y = 3)
@
	\caption{Ternary plot visualizing the $\alpha$, colored
          according to the gender.}
        \label{fig:body-alpha-ternary}
\end{figure}
Clearly, males cluster close to archetype $2$ and women mixes mainly
the first and the third archetype. Therefore, males are primarly
approximated by archetype $2$, women by linear combination of
archetypes $1$ and $3$. For more than three archetypes parallel
coordinates with an axis for each archetype projecting the
corresponding coefficients (in range $[0,1]$) can be used to
investigate the coefficients $\alpha$.

Finally, the \code{skeletonplot()} visualizes the three skeleton
archetypes:
<<eval=FALSE>>=
skeletonplot(parameters(a3))
@
\begin{figure}[H]
	\centering
\setkeys{Gin}{width=6in}
<<fig=TRUE,echo=FALSE,width=6,height=3.7>>=
par(mar = c(1, 4, 0, 0) + 0.1, ps = 9)
skeletonplot(parameters(a3), skel.height = 190)
@
        \caption{Skeleton plot visualizing the three archetypes.}
        \label{fig:body-a3-real}
\end{figure}
The left skeleton visualizes archetype $2$ with the wider hip and
pelvis; the middle skeleton visualizes archetype $1$ which is
``huge'', the right skeleton visualizes archetype $3$ which is
``small'' in all measurements. Assume that we want to automatically
fabricate blue worker overalls from given templates (e.g., paper
patterns). Combinations of archetypes $1$ and $3$ gives overalls in
different sizes which each have similar width at shoulder and
waist. If we add archetype $2$, we get overalls with a wider waist
(compared to the shoulder).



\section[Summary and outlook]{Summary and outlook\label{outlook}}

In Section~\ref{algorithm} we explained the different steps of the
algorithms according to the original paper. However, for each problem
a number of methods exist and the choice of the right method often
depends on the concrete data set. In Section~\ref{pkg} we have already
used the \code{archetypesFamily()} function to use different linear
equations solver, but the function has to be extendend to allow
abritary problem solving blocks for each step of the algorithm. Of
primary interest is the usage of different optimization methods to
solve the optimization problem.

Additionally, the two fundamentals defined in Section~\ref{algorithm}
can be generalized to allow for example arbitrary loss functions,
arbitrary conditions on the coefficients $\alpha$ or arbitrary matrix
norms. As the algorithm strategy is similar to the least squares
formulation of the $k$-means algorithm, this leads to a general
framework for this class of $k$-means-like algorithms
\citep[$k$-means, $k$-median, Fuzzy $k$-means, etc.; see,
e.g.,][]{Steinley@2006}.

Altogether, the short-term goal for the package \pkg{archetypes} was
the implementation of a general framework with interchangeable blocks
for the concrete problem solving mechanisms of each algorithm
step. Now, further work is the design of a clean archetypes family
object, especially with a view to our long-term goal, the
generalization towards the class of $k$-means-like algorithms.


\appendix

\section*{Computational details}

<<print=TRUE,echo=TRUE>>=
sessionInfo()
@

The newest released version of \pkg{archetypes} is always available
from the Comprehensive R Archive Network at
\url{http://CRAN.R-Project.org/package=archetypes}. The development
version is hosted on R-Forge as project \pkg{Archetypal Analysis
(archetypes)} and available from
\url{http://r-forge.r-project.org/projects/archetypes}. The source
code is documented using the Roxygen documentation system for
\proglang{R} \citep{roxygen}. An up-to-date version of this manuscript
is contained in the package as a vignette.


\section*{Acknowledgments}

The authors thank Bernard Pailthorpe, University of Queensland, for
providing his \proglang{MATLAB} code for comparison with our code.


\bibliography{archetypes}

\end{document}
