\documentclass[a4paper]{article}
\usepackage[round]{natbib}
\usepackage{boxedminipage} % for compare.tex
\usepackage{hyperref} % for \url
\bibliographystyle{abbrvnat}

\usepackage{Sweave}
%\VignetteIndexEntry{Introduction to the compare package}
%\VignettePackage{compare}

\newcommand{\pkg}[1]{{\bfseries #1}}
\newcommand{\code}[1]{{\ttfamily #1}}
\newcommand{\R}{{\sffamily R}}
\newcommand{\dfn}[1]{\emph{#1}}

\begin{document}

\title{Comparing Non-Identical Objects\\
Introducing the `compare' package}
\author{by Paul Murrell}

\maketitle

The \pkg{compare} package provides functions
for comparing two \R{} objects for equality,
while allowing for a range of ``minor''
differences.  Objects may be reordered, rounded, or resized, they may
have names or attributes removed, or they may 
even be coerced to a new class if necessary in order
to achieve equality.  

The results 
of comparisons report not just whether the objects are the same, but also
include a record of any modifications
that were performed.

This package was developed for the purpose of partially
automating the marking of coursework 
involving \R{} code submissions, so functions are also
provided to convert the results of comparisons into numeric 
grades and to provide feedback for students.

\section*{Motivation}

STATS 220 is a second-year university course run by
the Department of Statistics at the University of 
Auckland.\footnote{\url{http://www.stat.auckland.ac.nz/courses/stage2/\#STATS220}}
The course covers a range of ``Data Technologies'', including
HTML, XML, databases, SQL, and, as a general purpose data
processing tool, \R{}.

In addition to larger assignments, students in the course must complete
short exercises in weekly computer labs.  

For the \R{} section of the course, students must write short pieces
of \R{} code to produce specific \R{} objects.  Figure
\ref{figure:lab} shows two examples of basic, introductory exercises.

\begin{figure*}
\begin{boxedminipage}{\linewidth}
\begin{minipage}[t]{.45\linewidth}
\begin{enumerate}
\item 
Write R code to create the three
{\bfseries vectors} and the {\bfseries factor}
shown below, with names {\tt id}, {\tt age},
{\tt edu}, and {\tt class}.

You should end up with objects that look like this:

\begin{verbatim}
> id
[1] 1 2 3 4 5 6

> age
[1] 30 32 28 39 20 25

> edu
[1] 0 0 0 0 0 0

> class
[1] poor   poor   poor   middle 
[5] middle middle
Levels: middle poor
\end{verbatim}
\end{enumerate}
\end{minipage}\hfill%
\begin{minipage}[t]{.45\linewidth}
\begin{enumerate}
\setcounter{enumi}{1}
\item 
Combine the objects from Question 1 together to make a {\bfseries data
frame} called {\tt IndianMothers}.

You should end up with an object that looks like this:

\begin{verbatim}
> IndianMothers
  id age edu  class
1  1  30   0   poor
2  2  32   0   poor
3  3  28   0   poor
4  4  39   0 middle
5  5  20   0 middle
6  6  25   0 middle
\end{verbatim}
\end{enumerate}
\end{minipage}\hspace*{\fill}
\end{boxedminipage}
\caption{\label{figure:lab}Two simple examples of the exercises
that STATS 220 students are asked to perform.}
\end{figure*}

The students submit their answers to the exercises
as a file containing \R{} code, which means that it is possible to
recreate their answers by calling \code{source()} on the
submitted files.

At this point, the \R{} objects generated by the students' code can be compared
with a set of model \R{} objects in order to establish whether
the students' answers are correct.  

How this comparison occurs is the focus of this article.

\subsection*{Black and white comparisons}

The simplest and most strict test for equality between two objects
in the base \R{} system \citep{R}
is provided by the function \code{identical()}.  This returns 
\code{TRUE} if the two objects are \emph{exactly} the same, otherwise
it returns \code{FALSE}.

The problem with this function is that it is very strict indeed and
will fail for objects that are, for all practical purposes, the same.
The classic example is the comparison of two real (floating-point) values,
as demonstrated in the following code, where differences can arise
simply due to the limitations of how numbers are represented in 
computer memory \citep[see R FAQ 7.31,][]{rfaq}.

\begin{Schunk}
\begin{Sinput}
> identical(0.3 - 0.2, 0.1)
\end{Sinput}
\begin{Soutput}
[1] FALSE
\end{Soutput}
\end{Schunk}
Using this function to test for equality would 
clearly be unreasonably harsh when marking any student answer that
involves calculating a numeric result.

The \code{identical()} function, by itself, is not sufficient for
comparing student answers with model answers.
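
The size of the discrepancy can be inspected directly.  The following
sketch uses only base \R{} (the variable name \code{delta} is just
for illustration) and shows that the two values differ by an amount
that is not zero, but is far smaller than a typical numerical
tolerance such as the square root of the machine epsilon
(roughly \code{1.5e-8}).

\begin{Schunk}
\begin{Sinput}
> # The difference is tiny, but it is not exactly zero
> delta <- (0.3 - 0.2) - 0.1
> delta == 0
\end{Sinput}
\begin{Soutput}
[1] FALSE
\end{Soutput}
\begin{Sinput}
> # A tolerance-based test treats the values as equal
> abs(delta) < sqrt(.Machine$double.eps)
\end{Sinput}
\begin{Soutput}
[1] TRUE
\end{Soutput}
\end{Schunk}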

\subsection*{Shades of grey}

The recommended solution to the floating-point comparison problem
described above is to use the \code{all.equal()} function.
This function allows for ``insignificant'' differences between
numeric values, as shown below.

\begin{Schunk}
\begin{Sinput}
> all.equal(0.3 - 0.2, 0.1)
\end{Sinput}
\begin{Soutput}
[1] TRUE
\end{Soutput}
\end{Schunk}
This makes \code{all.equal()} a much more appropriate function
for comparing student answers with model answers.

What is less well-known about the \code{all.equal()} function
is that it also works for comparing other sorts of \R{} objects,
besides numeric vectors, \emph{and} that it does more than just
report equality between two objects.  

If the objects being compared
have differences, then \code{all.equal()} does not simply return
\code{FALSE}.  Instead, it returns a character vector 
containing messages that describe the differences between the objects.
The following code gives a simple example, where \code{all.equal()}
reports that the two character vectors have different lengths,
and that, of the two pairs of strings that can be compared, one pair
of strings does not match.

\begin{Schunk}
\begin{Sinput}
> all.equal(c("a", "b", "c"), c("a", "B"))
\end{Sinput}
\end{Schunk}
{\footnotesize
\begin{Schunk}
\begin{Soutput}
[1] "Lengths (3, 2) differ (string compare on first 2)"
[2] "1 string mismatch"                                
\end{Soutput}
\end{Schunk}
} % {\small

This feature is actually very useful for marking student work.
Information about whether a student's answer is correct is useful
for determining a raw mark, but it is also useful to have 
information about what the student did wrong.  This information
can be used
as the basis for assigning partial marks for an answer that
is close to the correct answer, and for providing feedback 
to the student about where marks were lost.
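
Because a failed comparison returns a character vector rather than
\code{FALSE}, the result of \code{all.equal()} needs to be wrapped in
\code{isTRUE()} whenever a single logical value is required.  The
following base \R{} sketch separates the two uses:  a yes/no answer
for a raw mark and the messages as raw material for feedback
(the variable names are only illustrative).

\begin{Schunk}
\begin{Sinput}
> res <- all.equal(c("a", "b", "c"), c("a", "B"))
> # TRUE only when the two objects are equal
> isTRUE(res)
\end{Sinput}
\begin{Soutput}
[1] FALSE
\end{Soutput}
\begin{Sinput}
> # Otherwise, keep the messages as feedback
> feedback <- if (isTRUE(res)) "" else 
+     paste(res, collapse="; ")
\end{Sinput}
\end{Schunk}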

The \code{all.equal()} function has some useful features that
make it a helpful 
tool for comparing student answers with model answers.
However, there is an approach that can perform better than this.

The \code{all.equal()} function looks for equality between two objects
and, if that fails,
provides information about the sort of differences that exist.
An alternative approach, when two objects are not equal, is to 
try to \dfn{transform} the objects to make them equal, and report
on which transformations were necessary in order to achieve equality.

As an example of the difference between these approaches, consider the
two objects below:  a character vector and a factor.

\begin{Schunk}
\begin{Sinput}
> obj1 <- c("a", "a", "b", "c")
> obj1
\end{Sinput}
\begin{Soutput}
[1] "a" "a" "b" "c"
\end{Soutput}
\end{Schunk}
\begin{Schunk}
\begin{Sinput}
> obj2 <- factor(obj1)
> obj2
\end{Sinput}
\begin{Soutput}
[1] a a b c
Levels: a b c
\end{Soutput}
\end{Schunk}
The \code{all.equal()} function reports that these objects are different
for three reasons:  they differ in terms of their fundamental mode,
one has attributes and the other does not, and
each object is of a different class.

\begin{Schunk}
\begin{Sinput}
> all.equal(obj1, obj2)
\end{Sinput}
\end{Schunk}
{\footnotesize
\begin{Schunk}
\begin{Soutput}
[1] "Modes: character, numeric"                      
[2] "Attributes: < target is NULL, current is list >"
[3] "target is character, current is factor"         
\end{Soutput}
\end{Schunk}
} % {\small

The alternative approach would be to allow various transformations of
the objects to see if they can be transformed to be the same.
The following code shows this approach, which reports that the objects
are equal, if the second one is coerced from a factor to a character
vector.  This is more information than was provided by \code{all.equal()}
and, in the particular case of comparing student answers 
to model answers, it tells us a lot about how close the student
got to the right answer.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> compare(obj1, obj2, allowAll=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <factor> to <character>
\end{Soutput}
\end{Schunk}
Another limitation of \code{all.equal()} is that it does not
report on some other possible differences between objects.
For example, it is possible for a student to have the correct values
for an \R{} object,
but have the values in the wrong order.  Another common mistake is to get the
case wrong in a set of string values (e.g., in a character vector
or in the \code{names} attribute of an object).

In summary, while \code{all.equal()} provides some desirable features
for comparing student answers to model answers,
we can do better by allowing for a wider range of differences between
objects and by taking a different approach that
attempts to transform the student answer to be the same as the
model answer, if at all possible, while reporting which
transformations were necessary.
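
The idea can be pictured with a few lines of base \R{}.  The function
below is a hand-written sketch for a single transformation (factor to
character vector);  its name, \code{transformCompare()}, is
hypothetical and it is not the code used by the \pkg{compare} package.
It tests for identity, applies the transformation only if that test
fails, and returns both the outcome and a record of what was attempted.

\begin{Schunk}
\begin{Sinput}
> transformCompare <- function(model, comparison) {
+     # Record of the transformations attempted
+     transforms <- character()
+     if (!identical(model, comparison) &&
+         is.character(model) && 
+         is.factor(comparison)) {
+         comparison <- as.character(comparison)
+         transforms <- 
+             "coerced from <factor> to <character>"
+     }
+     list(result=identical(model, comparison),
+          transform=transforms)
+ }
> res <- transformCompare(
+     c("a", "a", "b", "c"),
+     factor(c("a", "a", "b", "c")))
> res$result
\end{Sinput}
\begin{Soutput}
[1] TRUE
\end{Soutput}
\begin{Sinput}
> res$transform
\end{Sinput}
\begin{Soutput}
[1] "coerced from <factor> to <character>"
\end{Soutput}
\end{Schunk}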

The remainder of this article describes the \pkg{compare} package,
which provides functions for producing these sorts of comparisons.

\section*{The \code{compare()} function}

The main function in the \pkg{compare} package is the
\code{compare()} function.  This function checks whether two 
objects are the same and, if they are not, carries out various
transformations on the objects and checks them again to see if
they are the same after they have been transformed.

By default, \code{compare()} only succeeds if the two objects are
identical (using the \code{identical()} function) \emph{or} the two
objects are numeric and they are equal
(according to \code{all.equal()}).  If the objects are not the same,
no transformations
of the objects are considered.  In other words, by default, 
\code{compare()} is simply a convenience wrapper for 
\code{identical()} and \code{all.equal()}.  As a simple example,
the following comparison takes account of the fact that the 
values being compared are numeric and uses \code{all.equal()}
rather than \code{identical()}.

\begin{Schunk}
\begin{Sinput}
> compare(0.3 - 0.2, 0.1)
\end{Sinput}
\begin{Soutput}
TRUE
\end{Soutput}
\end{Schunk}
\subsection*{Transformations}

The more interesting uses of \code{compare()} involve specifying
one or more of the arguments that
allow transformations of the objects that are being compared.
For example, the \code{coerce} argument specifies that the second
argument may be coerced to the class of the first argument.
This allows for more flexible comparisons such as between
a factor and a character
vector.

\begin{Schunk}
\begin{Sinput}
> compare(obj1, obj2, coerce=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <factor> to <character>
\end{Soutput}
\end{Schunk}

It is important to note that there is a definite order to the
objects;  the \dfn{model} object is given first and the 
\dfn{comparison} object is given second.  Transformations
attempt to make the comparison object like the model object,
though in a number of cases (e.g., 
when ignoring the case of strings) the model object may also
be transformed.  In the example above, the comparison object
has been coerced to be the same class as
the model object.  The following code demonstrates the effect
of reversing the order of the objects in the comparison.  Now
the character vector is being coerced to a factor.

\begin{Schunk}
\begin{Sinput}
> compare(obj2, obj1, coerce=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <character> to <factor>
\end{Soutput}
\end{Schunk}
Of course, transforming an object is not guaranteed to produce 
identical objects if the original objects are genuinely different.

\begin{Schunk}
\begin{Sinput}
> compare(obj1, obj2[1:3], coerce=TRUE)
\end{Sinput}
\begin{Soutput}
FALSE
  coerced from <factor> to <character>
\end{Soutput}
\end{Schunk}
Notice, however, that even though the comparison failed,
 the result still reports the transformation
that was attempted.  This result indicates that the comparison
object was converted from a factor (to a character vector), but it
\emph{still} did not end up being the same as the model object.
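
The outcome and the record of transformations can also be used
programmatically.  The sketch below rests on two assumptions that
should be checked against the installed version of the package:  that
\pkg{compare} supplies an \code{isTRUE()} method for its results, and
that a comparison result is a list with a \code{transform} component
holding the messages shown in the printed output.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> obj1 <- c("a", "a", "b", "c")
> obj2 <- factor(obj1)
> res <- compare(obj1, obj2[1:3], coerce=TRUE)
> # Overall outcome of the comparison
> isTRUE(res)
> # Messages describing the attempted transformations
> # (assumed component name)
> res$transform
\end{Sinput}
\end{Schunk}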

A number of other transformations are available 
in addition to coercion.  For example, differences in length, as
in the previous example, can also be ignored.

\begin{Schunk}
\begin{Sinput}
> compare(obj1, obj2[1:3], 
+         shorten=TRUE, coerce=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  coerced from <factor> to <character>
  shortened model
\end{Soutput}
\end{Schunk}
It is also possible to allow values to
be sorted, or rounded, or to convert all character values to upper case
(i.e., ignore the case of strings).
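
As a brief sketch of two of these, using the \code{ignoreOrder} and
\code{ignoreCase} arguments listed in Table \ref{table:transforms}
(the example values are made up for illustration):

\begin{Schunk}
\begin{Sinput}
> library(compare)
> # Same values in a different order
> compare(c(1, 2, 3), c(3, 1, 2), 
+         ignoreOrder=TRUE)
> # Same strings apart from their case
> compare(c("a", "b"), c("A", "B"), 
+         ignoreCase=TRUE)
\end{Sinput}
\end{Schunk}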

Table \ref{table:transforms} provides a complete list of the
transformations that are currently allowed (in version 0.2
of \pkg{compare}) and the arguments
that are used to enable them.  

A further argument to the \code{compare()} function, \code{allowAll},
controls the default setting for most of these transformations, so
specifying \code{allowAll=TRUE} is a quick way of enabling
all possible transformations.  Specific transformations 
can still be \emph{excluded} by explicitly setting the appropriate argument
to \code{FALSE}. 
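
For example, the following sketch enables every transformation except
coercion, so the factor can no longer be converted to a character
vector in order to match the model object.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> obj1 <- c("a", "a", "b", "c")
> obj2 <- factor(obj1)
> # Everything is allowed except coercion
> compare(obj1, obj2, 
+         allowAll=TRUE, coerce=FALSE)
\end{Sinput}
\end{Schunk}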

\begin{table*}
\begin{center}
\caption{\label{table:transforms}Arguments to the \code{compare()}
function that control which transformations are attempted 
when comparing a model object to a comparison object.}
\begin{tabular}{l p{.5\textwidth}}
Argument & Meaning \\ \hline 
\code{equal} & 
Compare objects for ``equality'' as well as ``identity''
(e.g., use \code{all.equal()}
if model object is numeric). \\[2mm]
\code{coerce} & 
Allow coercion of comparison object to class of model object. \\[2mm]
\code{shorten} &
Allow either the model or the comparison to be shrunk so that
the objects have the same ``size''.\\[2mm]
\code{ignoreOrder} & 
Ignore the original order of the comparison and model objects;
allow both comparison object and model object to be sorted.\\[2mm]
\code{ignoreNameCase} &
Ignore the case of the \code{names} attribute for both 
comparison and model objects;  the \code{names} attributes 
for both objects are converted to upper case. \\[2mm]
\code{ignoreNames} &
Ignore any differences in the \code{names} attributes of the
comparison and model objects;  any \code{names} attributes are
dropped. \\[2mm]
\code{ignoreAttrs} &
Ignore all attributes of both the comparison and model objects;
all attributes are dropped.\\[2mm]
\code{round${}^*$} & 
Allow numeric values to be rounded; either \code{FALSE} (the default),
or an integer value giving the number of decimal places for rounding,
or a function of one argument, e.g., \code{floor}. \\[2mm]
\code{ignoreCase${}^*$} & 
Ignore the case of character vectors;  both comparison and model
are converted to upper case. \\[2mm]
\code{trim${}^*$} &
Ignore leading and trailing spaces in character vectors;
leading and trailing spaces are trimmed from both comparison 
and model.\\[2mm]
\code{ignoreLevelOrder${}^*$} &
Ignore original order of levels of factor objects;
the levels of the comparison object are sorted to the order
of the levels of the model object.\\[2mm]
\code{dropLevels${}^*$} & 
Ignore any unused levels in factors;  unused levels are
dropped from both comparison and model objects. \\[2mm]
\code{ignoreDimOrder} & 
Ignore the order of dimensions in array, matrix, or table objects;
the dimensions are reordered by name. \\[2mm]
\code{ignoreColOrder} &
Ignore the order of columns in data frame objects;
the columns in the comparison object are reordered to match the
model object.\\[2mm]
\code{ignoreComponentOrder} &
Ignore the order of components in a list object;
the components are reordered by name. \\
\hline
\multicolumn{2}{l}{${}^*$These transformations only occur if 
\code{equal=TRUE}}\\
\end{tabular}
\end{center}
\end{table*}

The \code{equal} argument is a bit of a special case because
it is \code{TRUE} by default, whereas almost all others
are \code{FALSE}.  The \code{equal} argument is also especially influential
because objects are compared after every transformation and this 
argument controls
what sort of comparison takes place.
Objects are always compared using \code{identical()} first,
which will only succeed if the objects have exactly
the same representation in memory.  If the test using
\code{identical()} fails and \code{equal=TRUE}, then a more
lenient comparison is also performed.
By default, this just means that numeric values are compared
using \code{all.equal()}, but various other arguments can extend this
to allow things like differences in case for character values (see the
asterisked arguments in Table \ref{table:transforms}).

The \code{round} argument is also special because  it always defaults to
\code{FALSE}, even if \code{allowAll=TRUE}.
This means that the \code{round} argument
must be specified explicitly in order to enable rounding.  
The default is set up this way because the value of 
the \code{round} argument is either \code{FALSE} or an 
integer value specifying the 
number of decimal places to round to.  For this argument, the value
\code{TRUE} corresponds to rounding to zero decimal places.
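
The two forms of the \code{round} argument might be used as in the
following sketch.  The values are made up for illustration and have
been chosen so that rounding the comparison object alone is enough to
produce a match.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> # Round to one decimal place before comparing
> compare(c(1.1, 2.2), c(1.14, 2.21), round=1)
> # Use a function of one argument instead
> compare(c(1, 2), c(1.7, 2.3), round=floor)
\end{Sinput}
\end{Schunk}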

Finally, there is an additional argument \code{colsOnly} for comparing
data frames.  This argument controls whether transformations 
are only applied to columns (and not to rows).  For example, 
by default, a data frame will only allow columns to be dropped,
but not rows, if \code{shorten=TRUE}.  
Note, however, that \code{ignoreOrder} means ignore the order of
\emph{rows} for data frames and \code{ignoreColOrder} must be used
to ignore the order of columns in comparisons involving data frames.
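
A small sketch of the column case, with two made-up data frames that
contain the same columns in a different order:

\begin{Schunk}
\begin{Sinput}
> library(compare)
> model <- data.frame(a=1:3, b=c(4, 5, 6))
> comparison <- model[, c("b", "a")]
> # Columns are matched up regardless of order
> compare(model, comparison, 
+         ignoreColOrder=TRUE)
\end{Sinput}
\end{Schunk}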

\subsection*{The \code{compareName()} function}

The \code{compareName()} function offers a slight variation on the
\code{compare()} function.  

For this function, only the \emph{name} of
the comparison object is specified, rather than an explicit object.
The advantage of this is that it
 allows for variations in case in the names of objects.  For 
example, a student might create a variable called \code{indianMothers}
rather than the desired \code{IndianMothers}.  This 
case-insensitivity is enabled via
the \code{ignore.case} argument.

Another advantage of this function is that it is possible to specify,
via the \code{compEnv} argument,
a particular environment to search within for the comparison object
(rather than just the current workspace).  This becomes useful
when checking the answers from several students because each student's
answers may be generated within a separate environment in order to avoid
any interactions between code from different students.

The following code shows a simple demonstration of this function,
where a comparison object is created within a temporary environment
and the name of the comparison object is uppercase when it should 
be lowercase.

\begin{Schunk}
\begin{Sinput}
> tempEnv <- new.env()
> with(tempEnv, X <- 1:10)
> compareName(1:10, "x", compEnv=tempEnv)
\end{Sinput}
\begin{Soutput}
TRUE
  renamed object
\end{Soutput}
\end{Schunk}
Notice that, as with the transformations in \code{compare()},
 the \code{compareName()} 
function records whether it needed to ignore the case
of the name of the comparison object.
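
The case-insensitive lookup can be pictured with base \R{} alone.
The following is only an illustration of the idea, under the
assumption that the name is matched against the objects in the
environment with case ignored;  it is not the code used inside
\code{compareName()}.

\begin{Schunk}
\begin{Sinput}
> tempEnv <- new.env()
> assign("X", 1:10, envir=tempEnv)
> # Names that match "x" when case is ignored
> candidates <- grep("^x$", ls(tempEnv),
+                    ignore.case=TRUE, 
+                    value=TRUE)
> candidates
\end{Sinput}
\begin{Soutput}
[1] "X"
\end{Soutput}
\begin{Sinput}
> # Retrieve the object under the name actually used
> found <- get(candidates[1], envir=tempEnv)
\end{Sinput}
\end{Schunk}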

\subsection*{A pathological example}

This section shows a manufactured example that demonstrates 
some of the flexibility of the \code{compare()} function.

We will compare two data frames that have a number of 
simple differences.  The model object is a data frame
with three columns:  a numeric vector, a character vector,
and a factor.

\begin{Schunk}
\begin{Sinput}
> model <- 
+     data.frame(x=1:26, 
+                y=letters, 
+                z=factor(letters),
+                row.names=letters,
+                stringsAsFactors=FALSE)
\end{Sinput}
\end{Schunk}
The comparison object contains essentially the same
information, except that there is an extra column, the
column names are uppercase rather than lowercase, the columns
are in a different order,
the \code{y} variable is a factor rather than a 
character vector, and the \code{z} variable is a
character variable rather than a factor.  The
\code{y} variable and the row names are also uppercase rather than lowercase.

\begin{Schunk}
\begin{Sinput}
> comparison <- 
+     data.frame(W=26:1,
+                Z=letters,
+                Y=factor(LETTERS), 
+                X=1:26, 
+                row.names=LETTERS,
+                stringsAsFactors=FALSE)
\end{Sinput}
\end{Schunk}
The \code{compare()} function can detect that these
two objects are essentially the same as long as
we reorder the columns (ignoring the case of the column names),
coerce the \code{y} and \code{z} variables, drop
the extra variable, ignore the case of the \code{y} variable,
and ignore the case of the row names.

\begin{Schunk}
\begin{Sinput}
> compare(model, comparison, allowAll=TRUE)
\end{Sinput}
\begin{Soutput}
TRUE
  renamed
  reordered columns
  [Y] coerced from <factor> to <character>
  [Z] coerced from <character> to <factor>
  shortened comparison
  [Y] ignored case
  renamed rows
\end{Soutput}
\end{Schunk}

Notice that we have used \code{allowAll=TRUE} to allow \code{compare()}
to attempt all possible transformations at its disposal.

\section*{Comparing files of \R{} code}

Returning now to the original motivation for the \pkg{compare} package,
the \code{compare()} function provides an excellent basis for determining
not only whether a student's answers are correct, but also how much
incorrect answers differ from the model answer.

As described earlier, submissions by students in the STATS 220 course consist
of files of \R{} code.  Marking these submissions consists of using 
\code{source()} to run the code, then comparing the resulting objects
with model answer objects.
With approximately 100 students
in the STATS 220 course, with weekly labs, and with multiple questions per lab,
each of which may contain more than one \R{} object, there is 
a reasonable marking burden.
Consequently, there is a strong incentive to automate
as much
of the marking process as possible.  

\subsection*{The \code{compareFile()} function}

The \code{compareFile()} function can be used to run \R{} code from a 
specific file and compare the results with a set of model answers.
This function
requires three pieces of 
information:  the name of a file containing the ``comparison code'', 
which is run
within a local environment,
using \code{source()}, to generate the comparison values; 
a vector of ``model names'', which are the names of the objects that will 
be looked for in the local environment after the comparison code has been run;
and the model answers, either as the name of a binary file to 
\code{load()}, or as the name of a file of \R{} code to 
\code{source()}, or as a list object containing the ready-made
model answer objects.

Any argument to \code{compare()} may also be included in the call.

Once the comparison code has been run, 
\code{compareName()} is called for each of the model names and the result
is a list of 
\code{"comparison"} objects.
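
The following sketch shows the third form of model answers, a list of
ready-made objects.  The temporary file stands in for a student
submission, and the list is named to match the model names;  whether
the names are required is an assumption, so the list is also given in
the same order as the model names.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> # A stand-in for a student submission
> studentFile <- tempfile(fileext=".R")
> writeLines(
+     c("id <- 1:6",
+       "age <- c(30, 32, 28, 39, 20, 25)"),
+     studentFile)
> # Ready-made model answers
> answers <- 
+     list(id=1:6,
+          age=c(30, 32, 28, 39, 20, 25))
> res <- compareFile(studentFile, 
+                    c("id", "age"), 
+                    answers)
> res
\end{Sinput}
\end{Schunk}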

As a simple demonstration, consider
the basic questions shown in Figure \ref{figure:lab}.
The model names in this case are the following:

\begin{Schunk}
\begin{Sinput}
> modelNames <- c("id", "age", 
+                 "edu", "class", 
+                 "IndianMothers")
\end{Sinput}
\end{Schunk}
One student's submission for this exercise is in a file called 
\code{student1.R}, within a directory called \code{Examples}.  
The model answer is in a file called \code{model.R}
in the same directory.
We can evaluate this student's submission and compare it to the
model answer with the following code:

\begin{Schunk}
\begin{Sinput}
> compareFile(file.path("Examples", 
+                       "student1.R"),
+             modelNames,
+             file.path("Examples", 
+                       "model.R"))
\end{Sinput}
\begin{Soutput}
$id
TRUE

$age
TRUE

$edu
TRUE

$class
FALSE

$IndianMothers
FALSE
  object not found
\end{Soutput}
\end{Schunk}
This provides a strict check and shows that the student created the first
three objects correctly, but not the last two.  In fact, the student's
code completely failed to generate an object with the name 
\code{IndianMothers}.

We can provide extra
arguments to allow transformations of the student's 
answers, as in the following code:

\begin{Schunk}
\begin{Sinput}
> compareFile(file.path("Examples", 
+                       "student1.R"),
+             modelNames,
+             file.path("Examples", 
+                       "model.R"),
+             allowAll=TRUE)
\end{Sinput}
\begin{Soutput}
$id
TRUE

$age
TRUE

$edu
TRUE

$class
TRUE
  reordered levels

$IndianMothers
FALSE
  object not found
\end{Soutput}
\end{Schunk}
This shows that, although the student's answer for the \code{class}
object was not perfect, it was pretty close;  it just had the 
levels of the factor in the wrong order.

\subsection*{The \code{compareFiles()} function}

The \code{compareFiles()} function builds on \code{compareFile()}
by allowing a vector of comparison file names.  This allows a whole
set of student submissions to be tested at once.
The result of this function is a list of lists of 
\code{"comparison"} objects and a special print method provides
a simplified view of this result.

Continuing the example from above,
the \code{Examples} directory contains submissions from a further
four students.  We can compare all of these submissions
with the model answers and produce a summary of the results with
a single call to \code{compareFiles()}. The appropriate code
and output are shown in Figure
\ref{figure:comparefiles}.

\begin{figure*}
\begin{Schunk}
\begin{Sinput}
> files <- list.files("Examples", 
+                     pattern="^student[0-9]+[.]R$",
+                     full.names=TRUE)
> results <- compareFiles(files,
+                         modelNames,
+                         file.path("Examples", "model.R"),
+                         allowAll=TRUE,
+                         resultNames=gsub("Examples.|[.]R", "", files))
> results
\end{Sinput}
\end{Schunk}
{\small
\begin{Schunk}
\begin{Soutput}
         id    age   edu   class                                     IndianMothers         
student1 TRUE  TRUE  TRUE  TRUE reordered levels                     FALSE object not found
student2 TRUE  TRUE  TRUE  TRUE                                      TRUE                  
student3 TRUE  TRUE  TRUE  TRUE coerced from <character> to <factor> FALSE object not found
student4 TRUE  TRUE  TRUE  TRUE coerced from <character> to <factor> TRUE renamed object   
student5 TRUE  TRUE  TRUE  FALSE object not found                    FALSE object not found
\end{Soutput}
\end{Schunk}
}
\caption{\label{figure:comparefiles}%
Using the \code{compareFiles()} function to run \R{} code from several
files and compare the results to model objects.  The result of this sort
of comparison can easily get quite wide, so it is often useful to 
print the result with \code{options(width)} set to some large value
and using a small font, as has been done here.}
\end{figure*}

The results show that all five students created the first three objects
correctly.  They had more trouble with the fourth object,
with one getting the factor levels in the wrong order, two others
producing a character vector rather than a factor, and one failing
to create the object at all.  Only one student,
\code{student2},
got the final object exactly right and only one other, 
\code{student4}, got essentially
the right answer, though this student spelt the name of the object
wrong.  

\section*{Assigning marks and\\giving feedback}

The result returned by \code{compareFiles()} is a list of lists of 
comparison results, where each result is itself a list of information
including whether two objects are the same and a record of 
how the objects were 
transformed during the comparison.  This represents a wealth 
of information with which to assess the performance of students
on a set of \R{} exercises, but it can be a little unwieldy to deal with.

The \pkg{compare} package provides further functions that make it easier
to deal with this information for the purpose of determining a final mark
and for the purpose of providing comments
for each student submission.

In order to determine a final mark, we use the \code{questionMarks()} function
to specify which object names are involved in a particular question,
to provide a maximum mark for the question, and to
specify a set of rules that determine how many marks should be deducted for
various deviations from the correct answers.
  
The \code{rule()} function is used
to define a marking rule.  It takes an object name, a number of marks
to deduct if the comparison for that object is \code{FALSE}, 
plus any number of 
transformation rules.  The latter are
 generated using the \code{transformRule()} function, which
associates a regular expression with a number of
marks to deduct.  If the regular expression is matched in the record of 
transformations for a comparison, 
then the appropriate number of marks is deducted.  
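
The arithmetic behind a transformation rule can be sketched in base
\R{}.  The record of transformations below is copied by hand from the
kind of output shown earlier, and the variable names are hypothetical;
this illustrates the matching step only, not the package's own code.

\begin{Schunk}
\begin{Sinput}
> # Record of transformations for one comparison
> transforms <- 
+     "coerced from <character> to <factor>"
> maxMark <- 2
> # Deduct 1 mark if "coerced" appears in the record
> deduction <- 
+     if (any(grepl("coerced", transforms))) 1 else 0
> maxMark - deduction
\end{Sinput}
\begin{Soutput}
[1] 1
\end{Soutput}
\end{Schunk}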

A simple example, based on the second question in Figure
\ref{figure:lab}, is shown below.  This specifies that the question only
involves an object named \code{IndianMothers}, that there is a maximum
mark of 1 for this question, and that 1 mark is deducted if the 
comparison is \code{FALSE}.

\begin{Schunk}
\begin{Sinput}
> q2 <- 
+     questionMarks("IndianMothers",
+                   maxMark=1,
+                   rule("IndianMothers", 1))
\end{Sinput}
\end{Schunk}
The first question from Figure \ref{figure:lab} provides a more
complex example.  In this case, there are four different objects
involved and the maximum mark is 2.  The rules below specify that
any \code{FALSE} comparison drops a mark \emph{and} that, for
the comparison involving the object named \code{"class"}, 
a mark should also be deducted 
if coercion was necessary to get
a \code{TRUE} result. 

\begin{Schunk}
\begin{Sinput}
> q1 <- 
+     questionMarks(
+         c("id", "age", "edu", "class"),
+         maxMark=2,
+         rule("id", 1),
+         rule("age", 1),
+         rule("edu", 1),
+         rule("class", 1,
+              transformRule("coerced", 1)))
\end{Sinput}
\end{Schunk}
Having set up this marking scheme, marks are generated using 
the \code{markQuestions()} function, as shown by the following code.

\begin{Schunk}
\begin{Sinput}
> markQuestions(results, q1, q2)
\end{Sinput}
\begin{Soutput}
         id-age-edu-class IndianMothers
student1                2             0
student2                2             1
student3                1             0
student4                1             1
student5                1             0
\end{Soutput}
\end{Schunk}
For the first question, the third and fourth students lose a mark because
of the coercion, and 
the fifth student loses a mark because they have not generated 
the required object.
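
A total for each student is then a matter of summing across questions.
The sketch below copies the marks shown above into a matrix by hand,
rather than assuming anything about the class of the object returned
by \code{markQuestions()}.

\begin{Schunk}
\begin{Sinput}
> marks <- 
+     cbind("id-age-edu-class"=c(2, 2, 1, 1, 1),
+           "IndianMothers"=c(0, 1, 0, 1, 0))
> rownames(marks) <- paste0("student", 1:5)
> # Total mark per student across both questions
> rowSums(marks)
\end{Sinput}
\begin{Soutput}
student1 student2 student3 student4 student5 
       2        3        1        2        1 
\end{Soutput}
\end{Schunk}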

A similar suite of functions is provided to associate comments,
rather than mark deductions, with  
particular transformations.  The following code provides
 a simple demonstration.

\begin{Schunk}
\begin{Sinput}
> q1comments <-
+     questionComments(
+         c("id", "age", "edu", "class"),
+         comments(
+             "class",
+             transformComment(
+                 "coerced",
+                 "'class' is a factor!")))
> commentQuestions(results, q1comments)
\end{Sinput}
\begin{Soutput}
         id-age-edu-class      
student1 ""                    
student2 ""                    
student3 "'class' is a factor!"
student4 "'class' is a factor!"
student5 ""                    
\end{Soutput}
\end{Schunk}
In this case, we have just generated feedback for the students
who generated a character vector instead of the desired factor
in Question 1 of the exercise.

\section*{Summary, discussion, and\\future directions}

The \pkg{compare} package is based around the \code{compare()} 
function, which compares two objects for equality and, if they
are not equal, attempts
to transform the objects to make them equal.  It reports whether 
the comparison succeeded overall and provides a record of 
the transformations that were
attempted during the comparison.

Further functions are provided on top of the \code{compare()} 
function to facilitate marking exercises where students in a class
submit \R{} code in a file to create a set of \R{} objects.

This article has given some basic demonstrations of the use of 
the \pkg{compare} package for comparing objects and marking 
student submissions.  The package could also be useful for the
students themselves, both to check whether they have the correct 
answer and to provide feedback about how their answer differs 
from the model answer.  More generally, the \code{compare()}
function may have application wherever the \code{identical()}
and \code{all.equal()} functions are currently in use.  For example,
it may be useful when debugging code and for performing regression
tests as part of a quality control process.
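
A regression test along these lines might look like the following
sketch.  The function \code{centre()} and the stored result are
hypothetical, and the sketch assumes that the \code{isTRUE()} method
supplied by \pkg{compare} reports the overall outcome of a comparison.

\begin{Schunk}
\begin{Sinput}
> library(compare)
> # Function under test (hypothetical)
> centre <- function(x) x - mean(x)
> # Stored result from a trusted earlier run
> expected <- c(-1, 0, 1)
> # Fail loudly if the result no longer matches
> current <- centre(c(1, 2, 3))
> stopifnot(isTRUE(compare(expected, current)))
\end{Sinput}
\end{Schunk}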

Obvious extensions of the \pkg{compare} package include adding
new transformations and providing comparison 
methods for other classes of objects.  
More details about how the package works and how these 
extensions might be developed are discussed in
the vignette, ``Fundamentals of the Compare Package'',
which is installed as part of the \pkg{compare} package.

\section*{Acknowledgements}

Many thanks to the editors and anonymous reviewers for their useful
comments and suggestions, on both this article and the 
\pkg{compare} package itself.

\bibliography{compare}

\end{document}
