
\documentclass{article}

%\VignetteIndexEntry{The compare package fundamentals}
%\VignettePackage{compare}

\title{More details about the compare package}
\author{Paul Murrell}

\SweaveOpts{keep.source=TRUE}

\usepackage{Rd}

\begin{document}
\maketitle

<<echo=FALSE>>=
library(compare)

@ 
This document provides more information about how the
\pkg{compare} package works.  It should only be read
after the accompanying vignette ``Introduction to the compare package''.

We will look in more detail at how the \code{compare()}
function works, plus we will explore some of the lower-level
functions that the \code{compare()} function calls.

\subsection*{The \code{"comparison"} class}

The value returned by \code{compare()} is an object of class
\code{"comparison"}.  Importantly, this is not simply a logical
value.  In order to test whether the comparison was successful
overall, for example as the condition in an \code{if} statement,
you must use the \code{isTRUE()} function.

<<>>=
isTRUE(compare(1:10, 1:10))

@ 
The \code{transforms()} function can be used to extract the
vector of transformations from a \code{"comparison"} object.

<<>>=
obj1 <- c("a", "a", "b", "c")
obj1

<<>>=
obj2 <- factor(obj1)
obj2

<<>>=
transforms(compare(obj1, obj2[1:3], allowAll=TRUE))

@ 
\subsection*{Order and persistence of transformations}

The \code{compare()} function performs transformations in a particular
order.  

Any rounding of numeric values, trimming or upper-casing of
strings, sorting or dropping of factor levels, 
reordering of dimensions for arrays, matrices, and tables, 
and reordering
of data frame columns or list components occurs as part of
the test for equality.  This means that these transformations 
are only available if \code{equal=TRUE}, but also means that
these transformations will be applied again after every other transformation
that is attempted.

The remaining transformations occur in the following order:  
coerce, shorten, ignore sort
order, ignore case of names, ignore names altogether, ignore attributes
altogether.  The justification for this order is that sorting only makes
sense once the objects are of the same class and size, attributes are less
fundamental than the core data structure so they should come later, and
names are just a special case of general attributes, so they should come 
earlier.

In general, a transformation is persistent.  For example, if coercion has
been attempted, but the objects being compared are not the same 
after coercion, then sorting the objects will be attempted and \emph{this
sorting will occur on the coerced objects}.  The exception to this rule
is that any transformations applied during a test for equality are
\emph{not} persistent.  This is because tests for equality are repeated
after every other transformation.

The following manufactured example demonstrates these rules about the
 order and persistence of transformations.  
In this case, the comparison object is a different class from the 
model object, the comparison is in a different order from the model,
\emph{and} the comparison needs to be rounded to be equal to the model.

<<>>=
compare(as.numeric(1:10), 
        as.character(10:1 + .1), 
        round=TRUE, coerce=TRUE, ignoreOrder=TRUE)

@ 
First of all, the objects are tested for equality (\code{equal=TRUE} by
default).  In this case, despite the fact that
 \code{round=TRUE}, no rounding occurs because the 
comparison is not numeric.  

The objects are not the same, but \code{coerce=TRUE} so the
comparison object is coerced to a numeric object \emph{and the two objects
are checked again for equality}.  Furthermore, because 
\code{round=TRUE}, the coerced comparison values are rounded as well.

The objects are still not the same, but because \code{ingoreOrder=TRUE}
the comparison 
object is now sorted.  This sorting occurs on the \emph{coerced} but
\emph{not rounded} comparison object because equality
transformations are \emph{not} persistent, but all other 
transformations \emph{are} persistent.  The coerced and sorted comparison
object is then tested for equality with the model object, which again involves
rounding the comparison object, and this time the objects are the same.

The overall result is success
and the transformations that lead to success were coercion,
followed by sorting,
followed by rounding.

If the order imposed by \code{compare()} 
is not appropriate, it is possible to resort to performing
the transformations individually (see the next section).


\section*{Extending the \pkg{compare} package}

The \code{compare()} function is built on a set of functions that perform 
the individual 
transformations.  This means that it should be relatively
straightforward to produce an alternative to \code{compare()} 
by simply reordering the calls to transform functions
\emph{and} it should be relatively easy to extend the comparisons 
by writing new transformation functions.

\subsection*{Standalone comparisons}

The \code{compare()} function is built upon a set of 
standalone comparison functions.  Table \ref{table:comparefuns} 
lists the standalone functions currently available.

\begin{table}
\caption{\label{table:comparefuns}Standalone comparison functions 
upon which the 
compare() function is built.}
\begin{center}
\begin{tabular}{l p{.6\textwidth}}
Name & Description  \\\hline
\code{compareIdentical()} & 
Test whether two objects are identical. \\[2mm]
\code{compareEqual()} & 
Compare whether two objects are equal. \\
& Rounds numeric values, trims leading and trailing whitespace
and ignores case in strings, drops unused levels and ignores
level order in factors, ignores dimension order in arrays (and matrices
and tables), orders columns or components by name (ignoring case)
for lists and data frames. \\[2mm]
\code{compareCoerce()} & 
If necessary, coerce comparison object to class of
model object, then compare for equality. \\
& Orders columns or components by name (ignoring case)
for lists and data frames. \\[2mm]
\code{compareShorten()} &
If necessary, shorten the longer of the model and comparison
objects so that the two objects are the same ``size''. \\
& For arrays, drops entire dimensions, for matrices, forces
comparison to be two-dimensional, and for tables, collapses (sums)
across extra dimensions.
For data frames will drop columns and rows.
Orders columns or components by name (ignoring case)
for lists and data frames. \\[2mm]
\code{compareIgnoreOrder()} &
If necessary, sort both model and comparison objects,
then compare for equality. \\
& Orders columns or components by name (ignoring case)
for lists and data frames. \\[2mm]
\code{compareIgnoreNameCase()} &
If necessary, upper cases name attributes of both model and comparison,
then compare for equality. \\
& For data frames, upper cases rownames as well as column names.
Orders columns or components by name (ignoring case)
for lists and data frames. \\[2mm]
\code{compareIgnoreNames()} &
If necessary, drops name attributes from both model and comparison
then compare for equality. \\
& Orders columns or components by name (ignoring case)
for lists and data frames. \\[2mm]
\code{compareIgnoreAttrs()} &
If necessary, drop all attributes from both model and comparison,
then compare for equality. \\  \hline
\end{tabular}
\end{center}
\end{table}

The \code{compareIdentical()} function is just a wrapper for
\code{identical()} that returns a \code{"comparison"} object
as the result (so that it can play nicely with the other 
comparison functions.

The \code{compareEqual()} function relaxes the comparison
(depending on the arguments it is given) and allows for minor
differences.

All of the other comparison functions call \code{compareIdentical()},
and if that fails, and \code{equal=TRUE}, they call \code{compareEqual()}.
These functions allow individual transformations to be attempted
in isolation.  A basic design feature of these functions is that they
attempt to check whether they need to perform their transformation
before they do it so that only necessary or potentially beneficial 
trasformations are attempted.  For example, if two objects are of the
same class, \code{compareCoerce()} will \emph{not} attempt
to coerce the comparison object.

A more general design feature is that the transformations have been 
broken into separate functions on the basis of orthogonality.  Every 
effort is made to make the transformations independent in the sense
that one transformation does not make any subsequent transformation
impossible or nonsensical.  As an example, a transformation that
sorts objects is completely compatible with later dropping the names
attributes from the objects.

All of these comparison functions are generic so that appropriate
methods can be written for different classes.  There are methods
for the basic vector types and for arrays, matrices, tables, data frames,
and lists.  By having them as standalone functions, new methods
can easily be developed and incorporated.

In general, comparisons involving recursive objects 
(data frames and 
lists) will transform
columns or components instead of or as well as transforming the overall object.
For example, \code{compareCoerce().data.frame} coerces the 
overall comparison object to a data frame \emph{and}
all columns of the comparison to the same classes as the corresponding
columns of the model data frame.

These recursive comparison methods
 should provide the option of reordering the columns or components by name
before performing transformations on the columns or components.

All comparisons other than \code{compareIdentical()} and 
\code{compareEqual()} should return a ``partial'' result as well 
as a full record of transformations and transformed objects.  This
partial result will contain transformations and transformed objects
from the primary transformation for that function, but \emph{not}
any subsequent transformations due to \code{compareEqual()}.
This allows for calling a sequence of standalone comparison functions
without repeatedly recording unsuccessful transformations performed by 
\code{compareEqual()}.
The utility function \code{same()} takes care of this issue 
automatically.

\subsection*{More about the \code{"comparison"} class}

In addition to the overall result and the transformations attempted
during a comparison, the \code{"comparison"} class also 
contains a copy of the transformed model and comparison objects.
These objects provide a record of what the original objects look
like after a \emph{successful} transformation (i.e., when the transformed
objects are equal).

In addition to this information, a set of ``partial'' results
are also stored.  These are the transformations, plus the
transformed model and comparison objects, \emph{without} any 
transformations that were carried out as part of the test for
equality.  These objects are useful following an \emph{unsuccessful}
comparison, as the basis for further transformations.

\subsection*{Combining comparisons}

The \code{compare()} function works by calling the standalone
comparison functions in sequence until a successful result
is achieved (or all possible transformations have been attempted).
The calls are arranged in the following general format:

\begin{verbatim}
comp <- comparisonA(model, object, ...)
if (!comp$result && doComparisonB) {
    comp <- comparisonB(comp$tMpartial,
                        comp$tCpartial,
                        comp$partialTransform,
                        ...)
}
\end{verbatim}

If the first comparison is successful, the full result is returned,
so all transformations, including anything by \code{compareEqual()}
are reported.  However, if the first comparison fails, only the
partial result is passed on;  \code{compareEqual()} will be called
again and will have another chance to try all of its transformations.

This basic pattern can easily be implemented ``by hand'' to produce a specific
subset and a specific ordering of standalone comparisons, or as the basis
of a variation on the \code{compare()} function.  It is also a simple
pattern to follow if new transformations need to be inserted into 
or added onto the current set available in \code{compare()}.

\section*{A worked example}

This section describes the addition of comparison 
methods for \code{"list"} objects
to give an idea of the design considerations that have gone into 
the existing code and as a template for the possible addition of
methods for further classes.  This should be read in conjunction
with the relevant source code.

There is no need for an \code{identical.list()} method.  The default,
which calls \code{identical()} works already for lists and does the
appropriate thing;  tests whether the model and comparison objects
are exactly the same.

The \code{compareEqual()} method differs from \code{compareIdentical()}
because it allows the model and comparison objects to have minor
differences.  In the case of a list, this means that we will allow
two lists to be the equal if they consist of the same number of 
components \emph{and} the names of the components are the same, even 
though they may be in a different order and they may even differ
in terms of case.  Of course, the actual components themselves must 
also be equal, so this function calls \code{compareEqual()} for each
pair of model-component and comparison-component.  For additional
flexibility, the argument \code{recurseFun} actually specifies which
function is called on the pairs of components.  This allows \code{compare()}
to specify itself as the recursion function.  The relaxation in 
terms of order and case of component names is not ``on'' by default;
arguments are provided to enable these relaxations.  NOTE that 
the uppercasing of names and reordering of names is NOT recorded in
the partial results (i.e., these transformations are NOT persistent).
The result is a \code{"multipleComparison"}, which includes not only 
the overall result, but also a breakdown per component of 
which components were equal.

The first step in \code{compareCoerce.list()} is to make sure that 
the comparison object is a list, coercing if necessary.
After that, things run very similarly to \code{compareEqual()};  we 
reorder components by name if possible and necessary, then we 
attempt to compare all components of the model and comparison,
coercing components as necessary.  The one major difference is
that any transformations---coercion and reordering of components---\emph{are}
persistent this time, so become part of the partial result.

The \code{compareShorten()} method starts to get a bit easier.  Again,
we reorder components if necessary, but then all we need to do is drop
any extra components and call the \code{same()} function to do the 
identical/equal check on the resulting model and comparison (which are
now of the same length).

\code{compareIgnoreOrder()} is even easier;  we just reorder the components
as before, then call \code{same()}.  Likewise \code{compareIgnoreNameCase}. 
For \code{compareIgnoreNames()} there's just the additional action of dropping
names attributes \emph{after} ordering by name (in case \emph{some} names
are in common, but other names conflict).

The default \code{compareIgnoreAttrs()} will do for lists.

Finally, for the \code{compare()} function, we need to add the
\code{ignoreComponentOrder} argument so that this can be included in
a general comparison, with this argument defaulting to the current
setting of \code{allowAll}.

\end{document}
