| Type: | Package |
| Title: | Distribution-Free Goodness-of-Fit Testing for Regression |
| Version: | 1.2 |
| Depends: | R (≥ 4.4) |
| Imports: | calculus, clue, lme4, methods |
| Suggests: | knitr, rmarkdown |
| VignetteBuilder: | knitr, rmarkdown |
| Date: | 2026-05-05 |
| Maintainer: | Jesse Miller <smallepsilon@proton.me> |
| Description: | Implements the distribution-free goodness-of-fit regression testing procedure, introduced by Estate Khmaladze (2021, <doi:10.1007/s10463-021-00786-3>) to test whether or not the mean structure of a parametric model belongs to a specified model family. The test is implemented for general mean functions with minimal distributional assumptions as well as common models (e.g., lm, glm) with the usual model assumptions. |
| License: | GPL-3 |
| NeedsCompilation: | no |
| Packaged: | 2026-05-06 20:33:47 UTC; j |
| Author: | Jesse Miller |
| Repository: | CRAN |
| Date/Publication: | 2026-05-08 08:00:13 UTC |
Distribution-Free Goodness-of-Fit Testing for Regression
Description
Implements the distribution-free goodness-of-fit regression testing procedure, introduced by Estate Khmaladze (2021, <doi:10.1007/s10463-021-00786-3>) to test whether or not the mean structure of a parametric model belongs to a specified model family. The test is implemented for general mean functions with minimal distributional assumptions as well as common models (e.g., lm, glm) with the usual model assumptions.
Details
The DESCRIPTION file:
| Package: | distfreereg |
| Type: | Package |
| Title: | Distribution-Free Goodness-of-Fit Testing for Regression |
| Version: | 1.2 |
| Depends: | R (>= 4.4) |
| Imports: | calculus, clue, lme4, methods |
| Suggests: | knitr, rmarkdown |
| VignetteBuilder: | knitr, rmarkdown |
| Date: | 2026-05-05 |
| Authors@R: | person(given = "Jesse", family = "Miller", role = c("aut", "cre"), email = "smallepsilon@proton.me", comment = c(ORCID = "0009-0005-9465-7461")) |
| Maintainer: | Jesse Miller <smallepsilon@proton.me> |
| Description: | Implements the distribution-free goodness-of-fit regression testing procedure, introduced by Estate Khmaladze (2021, <doi:10.1007/s10463-021-00786-3>) to test whether or not the mean structure of a parametric model belongs to a specified model family. The test is implemented for general mean functions with minimal distributional assumptions as well as common models (e.g., lm, glm) with the usual model assumptions. |
| License: | GPL-3 |
| Author: | Jesse Miller [aut, cre] (ORCID: <https://orcid.org/0009-0005-9465-7461>) |
Index of help topics:
asymptotics Convenience Function for Exploring Asymptotic
Behavior and Sample Size Adequacy
coef.distfreereg Extract Estimated Parameters from 'distfreereg'
Objects
compare Compare the Distributions of Empirical and
Theoretical Statistics Used in
Distribution-Free Parametric Regression Testing
confint.distfreereg Calculate Confidence Intervals with a
'distfreereg' Object
distfreereg Distribution-Free Parametric Regression Testing
distfreereg-package Distribution-Free Goodness-of-Fit Testing for
Regression
fitted.distfreereg Extract Fitted Values from 'distfreereg'
Objects
formula.distfreereg Extract Formulas from 'distfreereg' Objects
ks.test.compare Formally Compare Empirical and Theoretical
Statistics from a 'compare' Object
plot.compare Summary and Diagnostic Plots for 'compare'
Objects
plot.distfreereg Summary and Diagnostic Plots for 'distfreereg'
Objects
predict.distfreereg Generate Predicted Values from 'distfreereg'
Objects
print.compare Printing 'compare' Objects
print.distfreereg Printing 'distfreereg' Objects
rejection Compute Rejection Rates of a Distribution-Free
Test from a 'compare' Object
residuals.distfreereg Extract Residuals from 'distfreereg' Objects
update.distfreereg Update 'distfreereg' Objects
vcov.distfreereg Estimate Parameter Covariance Matrices from
'distfreereg' Objects
Further information is available in the following vignettes:
v1_introduction | An Introduction to the distfreereg Package (source) |
v2_compare | Comparing Distributions with the distfreereg Package (source) |
v3_plotting | Plotting with the distfreereg Package (source) |
v4_parameter-estimation | Parameter Estimation with distfreereg (source) |
v5_advanced-options | Advanced Options for the distfreereg Package (source) |
Author(s)
Jesse Miller [aut, cre] (ORCID: <https://orcid.org/0009-0005-9465-7461>)
Maintainer: Jesse Miller <smallepsilon@proton.me>
Convenience Function for Exploring Asymptotic Behavior and Sample Size Adequacy
Description
This is a convenience function that calls compare using object to create the true and test model specifications.
Usage
asymptotics(object, ...)
Arguments
object |
Object of class distfreereg. |
... |
Additional arguments to pass to compare. |
Details
An important step in implementing distfreereg is determining the plausibility that the sample size of a data set is adequate to produce the desired asymptotic behavior of the simulated statistics. This can be done by calling compare with the appropriate argument values. Because of the importance of this step, this convenience function automates that call using a single argument, namely the object of class distfreereg in question.
Value
Object of class compare.
Warning
The fitted model in object is assumed to be the true model, not an estimation of the true model. Further, the simulation of values requires assuming a particular error distribution. See the "Warnings" section in compare's documentation for details.
Author(s)
Jesse Miller
See Also
Extract Estimated Parameters from distfreereg Objects
Description
This is a coef method for objects of class distfreereg. It extracts the estimated parameters from a model in a distfreereg object.
Usage
## S3 method for class 'distfreereg'
coef(object, ...)
Arguments
object |
Object of class distfreereg. |
... |
Additional parameters passed to or from other methods. Currently ignored. |
Value
Numeric vector of estimated model parameters.
Author(s)
Jesse Miller
See Also
distfreereg, vcov.distfreereg, confint.distfreereg
Compare the Distributions of Empirical and Theoretical Statistics Used in Distribution-Free Parametric Regression Testing
Description
Simulate response data repeatedly with true_mean as the mean and
true_covariance as the covariance structure, each time running
distfreereg on the simulated data. The observed statistics and
p-values are saved, as are the simulated statistics from the first
replication.
This function is intended to facilitate exploring sample size adequacy and
the tests' powers. For a convenient wrapper to expedite investigating sample
size adequacy using a distfreereg object, see
asymptotics.
See the Comparing Distributions with the
distfreereg Package vignette for an introduction.
Usage
compare(true_mean, true_method = NULL, true_method_args = NULL, true_covariance,
true_X = NULL, true_data = NULL, theta = NULL, n = NULL, reps = 1e3,
prog = reps/10, simulate_args = NULL, err_dist_fun = NULL,
err_dist_args = NULL, keep = NULL, manual = NULL, update_args = NULL, ...)
Arguments
true_mean |
Object specifying the mean structure of the true model. It is used to generate
the true values of |
true_method |
Character vector of length one; specifies the function (e.g.,
|
true_method_args |
Optional list; values are passed to the function specified by
|
true_covariance |
Named list; specifies the covariance structures of the true error
distribution in the format described in the documentation for the
|
true_X, true_data |
Optional numeric matrix or data frame, respectively; specifies the covariate
values for the true model. |
theta |
Numeric vector; used as the (true) parameter values for the model when
|
n |
Optional integer; indicates how long each simulated data vector should be.
Required only when no covariate values are specified for either the true or
test mean; should be |
reps |
Integer; specifies number of replications. Silently converted to integer if numeric. |
prog |
Integer or |
simulate_args |
Optional list; specifies additional named arguments to pass to
|
err_dist_fun |
Character string; specifies the name of the function to be used to simulate
errors when |
err_dist_args |
Optional list; specifies additional named arguments to pass to
|
keep |
A vector of integers, or the character string " |
manual |
Optional function; applied to the |
update_args |
Optional named list; specifies arguments to pass to
|
... |
Additional arguments passed to |
Details
This function allows the user to explore the asymptotic behavior of the
distributions involved in the test conducted by distfreereg. If
the sample size is large enough and the true covariance matrix of the errors
is known or is estimated well enough, then the observed and simulated
statistics have nearly the same distribution. How large the sample size must
be depends on the details of the situation. This function can be used to
determine how large the sample size must be to obtain approximately equal
distributions, and to estimate the power of the test against a specific
alternative.
The user specifies a particular true model which is used to generate outcome values. There are three cases:
- When true_mean is a function, this function determines the mean of the outcome values and err_dist_fun is used to generate errors. The error-generating function will usually include an element of true_covariance as an argument, and in that case must accept the appropriate class of object. For example, if the true covariance is a list of matrices corresponding to a block-diagonal covariance matrix, then err_dist_fun must accept such a list as an argument.
- When true_mean is an nls object, or when it is a formula and true_method is "nls", the function determined by the formula (in the model call or user-specified, respectively) is used to determine the mean function, and err_dist_fun generates the errors.
- When true_mean is a model object that is not an nls object, or a formula and true_method is not "nls", then simulate is used to generate outcome values.
If none of these cases apply to true_mean, then compare()
cannot be used. (E.g., true_mean cannot be a glm object fitted
using a "quasi" family, because simulate does
not work for that family.)
The user also specifies arguments to pass to distfreereg, most
notably a model to test comprising a mean function test_mean and a
covariance structure specified by covariance. For each repetition,
compare sends the simulated data, as Y or as part of
data, to distfreereg.
The true_covariance argument specifies the covariance structure that
is available to err_dist_fun for generating errors. The needs of
err_dist_fun can vary (for example, the default function uses
SqrtSigma to generate multivariate normal errors), so any one of the
elements Sigma, SqrtSigma, P, and Q (defined in
the documentation of distfreereg) can be specified. Any
element needed by err_dist_fun is calculated automatically if not
supplied.
The value of err_dist_fun must be a function whose output is a
numeric matrix with n rows and reps columns. Each column is
used as the vector of errors in one repetition. The error function's
arguments can include the special values n, reps,
Sigma, SqrtSigma, P, and Q. These arguments are
automatically assigned their corresponding values from the values passed to
compare. For example, the default value rmvnorm uses
SqrtSigma to generate multivariate normal values with mean 0 and
covariance Sigma.
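As a rough illustration of these requirements, the following base R sketch defines a hypothetical error generator, rt_errors, that returns an n-by-reps matrix of rescaled Student-t errors. The function name, the df argument, and the assumption that SqrtSigma arrives as a numeric matrix are illustrative choices, not part of the package.

```r
# Hypothetical error generator (not part of the package): Student-t errors
# rescaled to unit variance and then colored by SqrtSigma.
rt_errors <- function(n, reps, SqrtSigma, df = 5) {
  stopifnot(df > 2)                       # variance of t_df is finite only for df > 2
  z <- rt(n * reps, df = df) / sqrt(df / (df - 2))
  Z <- matrix(z, nrow = n, ncol = reps)   # one column per repetition
  SqrtSigma %*% Z                         # each column has covariance Sigma
}

set.seed(1)
n <- 4
SqrtSigma <- diag(sqrt(c(1, 2, 3, 4)))    # square root of a diagonal Sigma
E <- rt_errors(n = n, reps = 6, SqrtSigma = SqrtSigma)
dim(E)                                    # n rows, reps columns
```

Under the argument descriptions above, a function like this would presumably be supplied by name through err_dist_fun, with df passed through err_dist_args.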
The argument keep is useful for diagnosing problems, but caution
should be used lest a very large object be created. It is often sufficient
to save the distfreereg objects from only the first few replications.
For more specialized needs, the manual argument allows the
calculation and saving of objects during each repetition. For example, using
manual = function(x) residuals(x) will save the (raw) residuals from
each repetition.
The first repetition creates a distfreereg object. During each
subsequent repetition, this object is passed to
update.distfreereg to create a new object. The
update_args argument can be used to modify this call.
If necessary, global_override can be used to pass an override
argument to distfreereg in each repetition. For example, using
global_override = list(theta_hat = theta) forces the estimated
parameter vector used in the test in each call to be the true parameter
vector theta.
Value
An object of class compare with the
following components:
call |
The matched call. |
Y |
The matrix whose columns contain the model outcome values used for the corresponding repetitions. |
theta |
Supplied vector of parameter values. |
true_mean |
Supplied object specifying the true mean function. |
true_covariance |
List containing element(s) that specify the true covariance structure. |
true_X |
Supplied matrix of true covariate values. |
true_data |
Supplied data frame of true covariate values. |
test_mean |
Supplied object specifying the mean function being tested. |
covariance |
List containing element(s) that specify the test covariance structure. |
X |
Supplied matrix of test covariate values. |
data |
Supplied data frame of test covariate values. |
empirical_stats |
The statistics observed in each repetition. |
theoretical_stats |
The simulated statistics from
the first repetition. (They are the same for each repetition, because
|
p |
The p-values for the observed statistics. |
dfrs |
A list containing the
outputs of |
manual |
A list containing the results of the function specified by the
argument |
Warnings
The generation of new outcome values requires specifying an error
distribution. The default behavior when true_mean is a
function, an nls object, or a formula with
true_method equal to "nls" is to use a multivariate normal error
distribution, but different error-generating functions can be defined by the
user. When true_mean is a model object that is not an nls
object, or a formula and true_method is not "nls", then the
errors are generated using simulate and are therefore
distributed according to that function's specifications. In short, the
asymptotic behavior is determined for a specific (true) error distribution,
even though the test itself is distribution-free.
Simulated outcome values for a model using a subset argument can result
in unexpected behavior when the subsetting condition involves the
outcome variable. Such subsetting is strongly discouraged, since the
subsetting will occur after outcome values have been replaced and therefore
different subsets might be selected for each repetition. Further, because of
the way that a data set is retrieved from an nls object, subsetting
with nls using "self-referential" conditions can cause
unexpected results. For example, if a condition such as x > median(x)
is used, then the median will be calculated on each subset in turn, which
will shrink the sample size with each repetition and throw an error.
Note
Some of the processing of the elements of true_covariance is analogous
to the processing of covariance by distfreereg. Any
values of solve_tol and symmetric specified in
distfreereg's control argument are used by compare
to similar effect in processing true_covariance.
Support for glm objects is limited to those created using a
family that has a simulate element.
The presence of call in the value allows a compare object to be
passed to update.
Author(s)
Jesse Miller
See Also
asymptotics, distfreereg,
ks.test.compare, plot.compare,
rejection
Examples
set.seed(20240201)
n <- 100
func <- function(X, theta) theta[1] + theta[2]*X[,1]
Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1]
theta <- c(2,5)
X <- matrix(rexp(n, rate = 1))
# In practice, 'reps' should be much larger
cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig),
test_mean = func, X = X, covariance = list(Sigma = Sig),
reps = 10, prog = Inf, theta = theta, theta_init = rep(1, length(theta)))
cdfr$p
Calculate Confidence Intervals with a distfreereg Object
Description
This is a confint method for objects of class
distfreereg. It calculates confidence intervals for the estimated
parameters of a model in a distfreereg object.
Usage
## S3 method for class 'distfreereg'
confint(object, parm, level = 0.95, ...)
Arguments
object |
Object of class distfreereg. |
parm |
Numeric or character vector; specifies which parameters are to be given confidence intervals. If missing, all parameters are considered. |
level |
Numeric vector of length one; specifies the confidence level. |
... |
Additional parameters passed to other methods. Currently ignored. |
Details
When object contains a model object (either in the test_mean or
model element), then this model is sent to confint.
Otherwise, object is sent to confint.default.
Value
The output from the appropriate confint method.
Note
If object was created by calling distfreereg with no
test mean function (that is, with test_mean equal to NULL),
there is no estimated parameter vector, and therefore this function does not
apply.
Author(s)
Jesse Miller
See Also
distfreereg, vcov.distfreereg, confint
Distribution-Free Parametric Regression Testing
Description
Conduct distribution-free parametric regression testing using the process
introduced in Khmaladze (2021). A parametric model for the
conditional mean (specified by test_mean) is checked against the data
by fitting the model, transforming the resulting residuals, and then
calculating a statistic on the empirical partial sum process of the
transformed residuals. The statistic's null distribution can be simulated in
a straightforward way, thereby producing a p-value.
Using f to denote the mean function being tested, the specific test
has the following null and alternative hypotheses:
\[
H_0\colon\ \exists\,\theta\in\Theta\subseteq\mathbb{R}^p \mathrel{\bigl|} \mathrm{E}(Y \mid X)=f(X;\theta)
\quad\text{against}\quad
H_1\colon\ \forall\,\theta\in\Theta\subseteq\mathbb{R}^p \mathrel{\bigl|} \mathrm{E}(Y \mid X)\neq f(X;\theta).
\]
This assumes a known or consistently estimated covariance matrix. See the
vignette "An Introduction to the distfreereg Package" for an introduction.
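The last two steps of the procedure (forming a statistic from the partial sum process and simulating its null distribution) can be sketched in base R. This is a simplified illustration, not the package's implementation. It assumes that the transformed residuals behave like independent standard normal values under the null hypothesis, and that the two statistics take the forms max|w_i| and mean(w_i^2). The helper names are hypothetical.

```r
set.seed(2)
n <- 200
B <- 2000                                  # Monte Carlo sample size (illustrative)

# Scaled partial sums of a residual vector taken in its given order
partial_sums <- function(e) cumsum(e) / sqrt(length(e))
ks_stat  <- function(w) max(abs(w))        # Kolmogorov-Smirnov-type statistic (assumed form)
cvm_stat <- function(w) mean(w^2)          # Cramer-von Mises-type statistic (assumed form)

# Stand-in for transformed residuals (assumed approx. i.i.d. N(0,1) under H0)
e_transformed <- rnorm(n)
w_obs <- partial_sums(e_transformed)
observed <- c(KS = ks_stat(w_obs), CvM = cvm_stat(w_obs))

# Simulated null distributions of the same statistics: a 2 x B matrix
simulated <- replicate(B, {
  w <- partial_sums(rnorm(n))
  c(KS = ks_stat(w), CvM = cvm_stat(w))
})

# p-value: proportion of simulated statistics at least as large as the observed one
p_values <- rowMeans(simulated >= observed)
p_values
```

Because the simulated statistics depend only on n and not on the error distribution of the original data, the same simulated sample can be reused across data sets of the same size.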
Usage
distfreereg(test_mean = NULL, covariance = NULL, Y = NULL, X = NULL,
theta_init = NULL, data = NULL, method = NULL, method_args = NULL,
stat = c("KS", "CvM"), B = 1e4, ordering = "simplex", group = TRUE,
control = NULL, override = NULL, verbose = TRUE)
Arguments
test_mean |
A specification of the mean function to be tested. Accepted classes are
|
covariance |
Named list; specifies the covariance structure of the model's error
distribution. Must be 'NULL' when 'test_mean' is a model object, in
which case the covariance structure is extracted automatically from the
model object. Valid element names are "Sigma", "SqrtSigma", "P", and "Q".
See Details. |
Y |
Numeric vector of observations used when |
X |
Optional numeric matrix of covariates used when |
theta_init |
Numeric vector; specifies the starting parameter values passed to the
optimization function used to estimate the parameter vector. Used when
|
data |
Data frame of covariate values used when |
method |
Character vector; specifies the function to use for fitting the model
when |
method_args |
Optional list of argument values to be passed to the function specified
by the |
stat |
Character vector; specifies the names of the functions used to calculate the desired statistics. By default, a Kolmogorov–Smirnov statistic and a Cramér–von Mises-like statistic are calculated:
where |
B |
Numeric vector of length one; specifies the Monte Carlo sample size used when simulating statistics. Silently converted to integer. |
ordering |
A character string or a list; specifies how to order the residuals to form the empirical partial sum process. Valid character strings are:
If |
group |
Logical; if |
control |
Optional named list of elements that control the details of the algorithm's computations. The following elements are accepted for all methods:
The following named elements, all but the first of which control the
process of calculating the generalized least squares estimation of the
parameter vector, are accepted for the
|
override |
Optional named list of arguments that override internally calculated
values. Used primarily by
|
verbose |
Logical; if |
Details
This function implements distribution-free parametric regression testing. The model is specified by a mean structure and a covariance structure.
The mean structure is specified by the argument test_mean. This can
be an object of class function, formula, glm,
lm, lmerMod, or nls. It can also be NULL.
If test_mean is a function, then it must have one or two arguments:
either theta only, or theta and either X (uppercase) or
x (lowercase). An uppercase X is interpreted in the function
definition as a matrix, while a lowercase x is interpreted as a
vector. (See examples and this
vignette.) The primary reason to use a lowercase x is to allow
for a function definition using an R function that is not vectorized.
In general, an uppercase X should be preferred for speed.
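The difference between the two forms can be seen in a small base R sketch. Here the row-by-row evaluation is mimicked with apply, which is an assumption about how such a function could be evaluated and not a description of the package's internals.

```r
theta <- c(2, 5)
X <- matrix(c(1, 2, 3,    # first covariate
              4, 5, 6),   # second covariate
            nrow = 3)

# Uppercase X: the whole covariate matrix is handled in one vectorized call
func <- function(X, theta) X[, 1]^theta[1] + theta[2] * X[, 2]

# Lowercase x: a single observation (one row of X) at a time
func_lower <- function(x, theta) x[1]^theta[1] + theta[2] * x[2]

vectorized <- func(X, theta)
row_by_row <- apply(X, 1, func_lower, theta = theta)
all.equal(vectorized, row_by_row)
```

Both calls give the same fitted values. The vectorized form avoids one function call per observation, which is why it is usually faster.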
If test_mean is a formula, then it must be a formula that can be
passed to glm, lm, lmer, or
nls, and the data argument must be specified. The
appropriate model is created and is then sent to distfreereg() for
method dispatch.
The function method estimates parameter values, and then uses those
to evaluate the Jacobian of the mean function and to calculate fitted
values.
All of these methods create a Jacobian matrix and a vector of fitted values.
These, along with the covariance structure, are sent to the default method. The
default method also allows the user to implement the algorithm even when the
mean structure is not specified in R, as long as the Jacobian, fitted
values, and covariance structure can be imported. (This is useful if a
particularly complicated function is defined in another language and cannot
easily be copied into R.)
The covariance structure for Y|X must be specified using the
covariance argument for the function and default methods. For
other methods, the covariance is estimated automatically.
Any element of covariance can be a numeric matrix, a list of numeric
matrices, or a numeric vector. If it is a vector, its length must be either
1 or the sample size. This option is mathematically equivalent to setting a
covariance list element to a diagonal matrix with the specified value(s)
along the diagonal. Using vectors, when possible, is more efficient than
using the corresponding matrix. If an element of covariance is a list
of numeric matrices, then these matrices are interpreted as the blocks of a
block diagonal matrix.
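The matrices that these compact forms stand for can be written out explicitly in base R. The helper block_diag below is hypothetical and exists only for this illustration. The package is not assumed to build the full matrix internally.

```r
# Vector form versus the equivalent diagonal matrix
v <- c(1.5, 2, 3)
Sigma_diag <- diag(v)

# Hypothetical helper (not part of the package): assemble a block-diagonal
# matrix from a list of square blocks
block_diag <- function(blocks) {
  sizes <- vapply(blocks, nrow, integer(1))
  out <- matrix(0, sum(sizes), sum(sizes))
  end <- cumsum(sizes)
  start <- end - sizes + 1L
  for (i in seq_along(blocks)) {
    idx <- start[i]:end[i]
    out[idx, idx] <- blocks[[i]]          # place block i on the diagonal
  }
  out
}

blocks <- list(matrix(c(2, 1, 1, 2), 2), matrix(3, 1, 1))
Sigma_block <- block_diag(blocks)
Sigma_block
```

For large samples the compact forms store far fewer values than the full n-by-n matrix, which is consistent with the efficiency remark above.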
Internally, distfreereg() only needs Q, so some efficiency can
be gained by supplying that directly when available. When Q is not
specified, it is calculated using whichever element is specified. When more
than one of the other elements are specified, no verification of
compatibility of the elements is done. Q is calculated using the
supplied element(s) by preferring one-step operations and operations on the
square-root level. (For example, if both P and SqrtSigma
elements are supplied, then Q is calculated using SqrtSigma
only. If Sigma and P are supplied, then Q is
calculated using P only.)
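To make the relationships among these elements concrete, the following base R sketch computes a precision matrix and one possible square root of it from a small covariance matrix. It assumes that P is the inverse of Sigma (as stated under Warnings) and that Q is a symmetric square root of P. The helper inv_sqrt and the eigendecomposition approach are illustrative choices and may differ from what the package does.

```r
# Symmetric inverse square root via the eigendecomposition (one possible choice)
inv_sqrt <- function(Sigma) {
  eig <- eigen(Sigma, symmetric = TRUE)
  stopifnot(all(eig$values > 0))           # Sigma must be positive definite
  d <- diag(1 / sqrt(eig$values), nrow = length(eig$values))
  eig$vectors %*% d %*% t(eig$vectors)
}

Sigma <- matrix(c(4, 1, 1, 3), 2)
P <- solve(Sigma)                           # precision matrix
Q <- inv_sqrt(Sigma)

all.equal(Q %*% Q, P)                       # Q is a square root of P
all.equal(Q %*% Sigma %*% Q, diag(2))       # Q "spheres" Sigma
```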
The override argument is used primarily by
update.distfreereg to avoid unnecessary and potentially
computationally expensive recomputation. This update method
imports appropriate values automatically from a previously created object of
class distfreereg, and therefore validation is not always done. Use
manually with caution.
The res_order element of override must be a vector of integers
from 1 to n (the sample size) that determines the order of the
residuals to use when forming the empirical partial sum process. Elements of
the vector can be repeated, in which case the residuals corresponding to
matching res_order values are grouped when group is
TRUE.
Value
An object of class distfreereg with the following components:
call |
The matched call. |
data |
A list containing the |
test_mean |
The value supplied to the argument |
model |
The model built when using the |
covariance |
The list of covariance matrices, containing at least
|
theta_hat |
The estimated parameter vector. When the model being tested
has class |
optimization_output |
The output of |
fitted_values |
The vector of fitted values,
|
J |
The Jacobian matrix, |
mu |
The mu matrix. |
r |
The matrix of transformation anchor vectors. |
r_tilde |
The matrix of modified transformation anchor vectors. |
residuals |
A named list of three vectors containing raw, sphered, and transformed residuals. |
res_order |
A numeric vector indicating the ranking of the residuals
used to form the empirical partial sum process, in a format to be used
as the input of |
grouping_matrix |
The matrix used to group residuals; present if
|
epsp |
The empirical partial sum process formed by calculating the
scaled partial sums of the transformed residuals ordered according to
|
observed_stats |
A named list of the observed statistic(s) corresponding to the transformed residuals. |
theoretical_stats |
A named list, each element of which contains the values of a simulated statistic. |
p |
A named list with two elements: |
Warnings
Methods for model objects (e.g., lm objects) are intended to be used
with objects created using a data argument that contains all
variables used by the model. This data argument is assumed to be defined in
the same environment as the model object, and this is assumed to be the
environment in which distfreereg() is called.
Consistency between test_mean and theta_init is verified only
indirectly. Uninformative errors can occur when, for example,
theta_init does not have the correct length. The most common error
message that arises in this case is "f_out cannot have NA values",
which occurs when theta_init is too short. To be safe, always define
test_mean to use every element of theta.
No verification of consistency is done when multiple elements of
covariance are specified. For example, if P and Sigma
are both specified, then the code will use only one of these, and will not
verify that P is the inverse of Sigma.
When using the control argument element optimization_fun to
specify an optimization function other than optim, the
verification that theta_hat_name actually matches the name of an
element of the optimization function's output is done only after the
optimization has been done. If this optimization will likely take a long
time, it is important to verify the value of theta_hat_name before
running distfreereg().
The default values of sym_tol and sym_tol1 are intended to
check for substantial asymmetry such as would be caused by the user
inputting the wrong matrix. Therefore, they do not check for symmetry with
high precision. In particular, these default values are orders of magnitude
larger than the default values in isSymmetric.
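The effect of a looser symmetry tolerance can be seen directly with isSymmetric in base R. The tolerance 1e-5 below is an arbitrary illustrative value, not the package's default.

```r
M <- matrix(c(1, 0.5, 0.5 + 1e-8, 1), nrow = 2)   # tiny asymmetry off the diagonal

strict  <- isSymmetric(M)                # default tolerance, near machine precision
lenient <- isSymmetric(M, tol = 1e-5)    # illustrative looser tolerance
c(strict = strict, lenient = lenient)
```

A matrix with rounding-level asymmetry fails the strict check but passes the lenient one, whereas a matrix supplied by mistake would typically fail both.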
The theory described by Khmaladze (2021) does not apply directly to
generalized linear models or to linear mixed-effects models, but extensive
simulations indicate that the method nonetheless applies to these models.
Further investigation can be done using asymptotics. Non-null
values for the weights argument in glm are not
supported.
Author(s)
Jesse Miller
References
Khmaladze, Estate V. Distribution-free testing in linear and parametric regression, 2021-03, Annals of the Institute of Statistical Mathematics, Vol. 73, No. 6, p. 1063–1087. doi:10.1007/s10463-021-00786-3
See Also
coef.distfreereg, confint.distfreereg,
fitted.distfreereg, formula.distfreereg,
plot.distfreereg, predict.distfreereg,
print.distfreereg, residuals.distfreereg,
update.distfreereg, vcov.distfreereg
Examples
set.seed(20240218)
n <- 1e2
func <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2]
Sig <- runif(n, min = 1, max = 3)
theta <- c(2,5)
X <- matrix(runif(2*n, min = 1, max = 5), nrow = n)
Y <- X[,1]^theta[1] + theta[2]*X[,2] + rnorm(n, sd = sqrt(Sig))
(dfr <- distfreereg(Y = Y, X = X, test_mean = func,
covariance = list(Sigma = Sig),
theta_init = c(1,1)))
# Same test with lowercase "x" for reference;
# use uppercase whenever possible.
func_lower <- function(x, theta) x[1]^theta[1] + theta[2]*x[2]
(dfr_lower <- distfreereg(Y = Y, X = X, test_mean = func_lower,
covariance = list(Sigma = Sig),
theta_init = c(1,1)))
Extract Fitted Values from distfreereg Objects
Description
This is a fitted method for objects of class distfreereg.
Usage
## S3 method for class 'distfreereg'
fitted(object, ...)
Arguments
object |
Object of class distfreereg. |
... |
Additional parameters passed to or from other methods. Currently ignored. |
Value
Numeric vector of fitted values.
Author(s)
Jesse Miller
See Also
Extract Formulas from distfreereg Objects
Description
This is a formula method for objects of class distfreereg. It extracts the formula from a model in a distfreereg object.
Usage
## S3 method for class 'distfreereg'
formula(x, ...)
Arguments
x |
Object of class distfreereg. |
... |
Additional parameters passed to or from other methods. Currently ignored. |
Value
Formula extracted from x$test_mean, or NULL if such a formula cannot be extracted.
Author(s)
Jesse Miller
See Also
Formally Compare Empirical and Theoretical Statistics from a compare Object
Description
This is a ks.test method for objects of class compare.
It performs a two-sample Kolmogorov–Smirnov test to compare the observed
and simulated statistics in an object of class compare.
Usage
## S3 method for class 'compare'
ks.test(x, ..., stat = NULL)
Arguments
x |
Object of class compare. |
... |
Additional parameters passed to |
stat |
Character string specifying the statistic on which to run the test. |
Details
When stat is NULL, the default value is the first statistic
appearing in the empirical_stats element of x.
Value
A list of the form specified in ks.test.
Author(s)
Jesse Miller
See Also
Examples
# In practice, set "reps" larger than 200.
set.seed(20240201)
n <- 100
func <- function(X, theta) theta[1] + theta[2]*X[,1]
Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1]
theta <- c(2,5)
X <- matrix(rexp(n, rate = 1))
cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig),
test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 200,
prog = Inf, theta = theta, theta_init = rep(1, length(theta)))
ks.test(cdfr)
ks.test(cdfr, stat = "CvM")
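The underlying comparison can also be reproduced by hand with stats::ks.test. The sketch below uses simulated stand-in vectors and does not extract values from a compare object, since the internal layout of the stored statistics is not assumed here.

```r
set.seed(3)
# Hypothetical stand-ins for the two sets of statistics in a 'compare' object
empirical   <- replicate(200,  max(abs(cumsum(rnorm(50)))) / sqrt(50))
theoretical <- replicate(2000, max(abs(cumsum(rnorm(50)))) / sqrt(50))

res <- stats::ks.test(empirical, theoretical)   # two-sample test
res$p.value
```

A small p-value would suggest that the two samples come from different distributions, which in this setting would cast doubt on the adequacy of the sample size.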
Summary and Diagnostic Plots for compare Objects
Description
This is a plot method for objects of class compare. It
automates the creation of four summary and diagnostic plots for
compare objects. See the Plotting with
the distfreereg Package vignette for examples.
Usage
## S3 method for class 'compare'
plot(x, y, ..., which = "cdf", stat = NULL, hlines = NULL, curve_args = NULL,
confband_args = FALSE, density_args = NULL, poly = NULL, legend = NULL,
qqline = NULL)
Arguments
x |
Object of class compare. |
y |
Optional object of class |
... |
Additional parameters passed to a plotting function depending on the
value of |
which |
Character string. Acceptable values are "
|
stat |
Character string, specifies the statistic to plot. |
hlines |
An optional list of arguments to pass to |
curve_args |
An optional list used to pass arguments to |
confband_args |
An optional list of values that control the calculation and plotting of
confidence bands when
Setting equal to |
density_args |
An optional list of arguments passed to |
poly |
An optional list of arguments passed to |
legend |
An optional list of arguments passed to |
qqline |
An optional list of arguments passed to |
Details
This function produces a plot of a type specified by which. The values
plotted depend on whether or not y is present and the value of
which. When y is present, the plots compare the empirical
statistics in x and the empirical statistics in y. When y
is missing, the plots compare the empirical and theoretical statistics in
x. (The exception is when which is "qqp", which is only
available when y is missing.)
When which is "cdf" or "dens", the plotting region and
associated labels, tick marks, etc., are created by an initial call to
plot. The curves themselves are drawn with lines.
The arguments specified in ... are passed to the initial call to
plot.
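The two-stage drawing described here (an initial plot call to set up the region, then lines for the curves) can be sketched in base R. The data are simulated stand-ins, and the colors and labels are arbitrary. The method's actual defaults are not assumed.

```r
set.seed(4)
empirical   <- rexp(200)      # hypothetical stand-in values
theoretical <- rexp(2000)

F_emp  <- ecdf(empirical)
F_theo <- ecdf(theoretical)
grid <- seq(0, max(empirical, theoretical), length.out = 200)

pdf(NULL)                     # null device so the sketch runs without a display
# Initial call sets up the plotting region, axes, and labels only
plot(range(grid), c(0, 1), type = "n",
     xlab = "Statistic", ylab = "Cumulative probability")
lines(grid, F_emp(grid), col = "blue")    # curves are then added with lines()
lines(grid, F_theo(grid), col = "red")
legend("bottomright", legend = c("empirical", "theoretical"),
       col = c("blue", "red"), lty = 1)
invisible(dev.off())
```

Separating the two stages means that axis and title arguments affect only the initial plot call, which is why arguments in ... are routed there.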
Value
The values used to create the curves (or points, in the case of a Q–Q plot)
are returned invisibly. The details depend on the value of which:
- cdf: A list with two or four elements, all lists. The first two sub-lists contain the x- and y-values for the cdf curves. If confidence bands are plotted, then two additional elements are included with output from the confidence band calculations, including elements w, cb_lower, and cb_upper, which contain, respectively, the x-coordinates for both the upper and lower bounds of the band, the y-coordinates for the lower band, and the y-coordinates for the upper band.
- dens: A list with two or four elements, all lists. The first two sub-lists contain the x- and y-values for the density curves. If confidence bands are plotted, then two additional sub-lists are supplied, with contents identical to what is described for "cdf".
- qq, qqp: The output of qqplot.
For "cdf" and "dens", the names of the elements of the returned
list depend on whether or not a value for the argument y was supplied.
Author(s)
Jesse Miller
References
Flegal, James M. et al. Simultaneous confidence bands for (Markov chain) Monte Carlo simulations, forthcoming.
Summary and Diagnostic Plots for distfreereg Objects
Description
This is a plot method for objects of class distfreereg.
It automates the creation of summary and diagnostic plots for
distfreereg objects. See the Plotting
with the distfreereg Package vignette for examples.
Usage
## S3 method for class 'distfreereg'
plot(x, which = "dens", stat = NULL, density_args = NULL,
polygon_args = NULL, confband_args = NULL, abline_args = NULL,
shade_col = rgb(1,0,0,0.5), text_args = NULL, ...)
Arguments
x |
Object of class distfreereg. |
which |
Character string. Acceptable values are "
|
stat |
Character vector of length one specifying the name of the statistic to
plot when |
density_args |
An optional list of arguments to pass to |
polygon_args |
An optional list of arguments to pass to
|
confband_args |
An optional list of values that control the calculation and plotting of confidence bands. Any of the following named elements are allowed.
Setting equal to |
abline_args |
An optional list of arguments to pass to |
shade_col |
Character string or other value specifying the color to use to shade the
upper tail of the distribution when |
text_args |
An optional list of arguments to pass to |
... |
Additional arguments to pass to |
Details
This function produces one of several plots, depending on the value
of which.
When which is "dens" or "ecdf", a plot of the estimated
density or empirical cumulative distribution function, respectively, of the
simulated statistics is produced, including a vertical line at the value of
the observed test statistic with the p-value displayed.
The default placement of the p-value text is on the left side of the line
indicating the statistic value. Specifically, the default values of x
and y passed to text are the statistic value itself and
the midpoint between zero and the maximum value of the density curve. The
default value passed to adj is c(1,0.5), meaning that the text
is right-justified at the point (x,y), so that it sits to the left of the
line, and is centered vertically on that point. (The default value for the text itself, which can be modified via the
label argument of text, includes a space on the left
and the right for padding so the text does not overlap the vertical line
itself.) To align the text so it appears on the right side (for example, to
avoid overlapping the density curve), use text_args = list(adj =
c(0,0.5)). See documentation for text for details on this and
other arguments.
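The placement rules above can be tried out in base R graphics without a distfreereg object. The following is a minimal sketch under stated assumptions: sim_stats and obs_stat are made-up stand-ins for the simulated and observed statistics, and computing the p-value as the proportion of simulated statistics at least as large as the observed one is an assumption made for illustration, not a description of the package's internals.

```r
set.seed(1)
sim_stats <- abs(rnorm(1000))  # hypothetical simulated statistics
obs_stat <- 2                  # hypothetical observed statistic

# Assumed p-value: share of simulated statistics at least as large as observed.
p_value <- mean(sim_stats >= obs_stat)

dens <- density(sim_stats)
plot(dens, main = "")
abline(v = obs_stat)

# Label padded with a space on each side so it does not touch the line.
lab <- paste0(" p = ", format(p_value, digits = 3), " ")

# adj = c(1, 0.5): right-justified at (x, y), so the text sits left of the line.
text(x = obs_stat, y = max(dens$y) / 2, labels = lab, adj = c(1, 0.5))

# adj = c(0, 0.5) would instead place the text to the right of the line:
# text(x = obs_stat, y = max(dens$y) / 2, labels = lab, adj = c(0, 0.5))
```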
When which is "residuals", a time-series-like plot is produced
showing transformed residuals in the order given by x$res_order. In
the case that the null hypothesis is rejected, this plot can help determine
where (in terms of the linearly ordered covariates) a discrepancy between
the model and the data occurs.
When which is "epsp", a plot of the empirical partial sum
process is produced; that is, the y-values are
y_j = \frac{1}{\sqrt{n}} \sum_{i=1}^{j} \hat e_i,
where \hat e_i is the ith
transformed residual in the order given by x$res_order. Similar to
the case when which is "residuals", this plot can help
determine where (in terms of the linearly ordered covariates) a discrepancy
between the model and the data occurs.
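The partial sum formula above can be computed directly with cumsum. This is a sketch in base R only: e_hat and res_order are simulated stand-ins for the transformed residuals and for x$res_order, and treating res_order as a permutation of the indices 1, ..., n is an assumption made for illustration.

```r
set.seed(7)
n <- 50
e_hat <- rnorm(n)       # hypothetical transformed residuals
res_order <- sample(n)  # hypothetical ordering (assumed to be a permutation)

# y_j = (1 / sqrt(n)) * sum of the first j residuals in the given order
epsp <- cumsum(e_hat[res_order]) / sqrt(n)

plot(seq_len(n), epsp, type = "l", xlab = "j", ylab = "y_j")
abline(h = 0, lty = 2)  # reference line at zero
```

A sustained drift away from zero over a range of j suggests that the residuals in that part of the ordering share a sign, which is the kind of discrepancy this plot is meant to reveal.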
Value
When which is "dens" or "ecdf", the values used to
create the plot are returned invisibly in a list with two named elements,
x and y. If the confidence band is plotted, then it is
included as an element named confband.
For other values of which, nothing is returned.
Author(s)
Jesse Miller
References
Flegal, James M. et al. Simultaneous confidence bands for (Markov chain) Monte Carlo simulations, forthcoming.
Generate Predicted Values from distfreereg Objects
Description
This is a predict method for objects of class distfreereg.
Usage
## S3 method for class 'distfreereg'
predict(object, ...)
Arguments
object |
Object of class distfreereg. |
... |
Additional arguments passed to other |
Details
When object$test_mean is a model object ("lm", "glm", or "nls"), object$test_mean is sent to predict for method dispatch. When object$test_mean is of class "formula", object$model is sent to predict.
Value
Numeric vector of predicted values.
Author(s)
Jesse Miller
Printing compare Objects
Description
This is a print method for objects of class compare.
Usage
## S3 method for class 'compare'
print(x, ...)
Arguments
x |
Object of class compare. |
... |
Additional parameters, currently ignored. |
Details
This function prints a useful summary of the compare object x.
Value
No return value (NULL).
Author(s)
Jesse Miller
Printing distfreereg Objects
Description
This is a print method for objects of class distfreereg.
Usage
## S3 method for class 'distfreereg'
print(x, ..., digits = 3, col_sep = 2, show_params = TRUE)
Arguments
x |
Object of class distfreereg. |
... |
Additional parameters, currently ignored. |
digits |
Integer; passed to |
col_sep |
Integer; specifies the padding (in units of spaces) between columns in the printed table of statistics. |
show_params |
Logical; determines whether or not the parameter estimates, if present in |
Details
This function prints a useful summary of the distfreereg object x.
Value
No return value (NULL).
Author(s)
Jesse Miller
Compute Rejection Rates of a Distribution-Free Test from a compare Object
Description
Compute the rejection rates of the tests simulated in a compare
object. Specifically, this function estimates the rejection rates of the
tests conducted with specified statistics of the hypothesis that the mean
function is test_mean when the true mean function is
true_mean.
Usage
rejection(object, alpha = 0.05, stat = names(object[["empirical_stats"]]), ...)
Arguments
object |
Object of class compare. |
alpha |
Numeric vector; specifies the |
stat |
Character vector; specifies the names of the statistics to use. The default
value computes the rejection rate associated with every statistic in
|
... |
Additional arguments to pass to |
Value
Data frame containing estimated rejection rates and associated Monte Carlo
standard errors, with one row for each combination of stat and
alpha elements.
Warning
The reported Monte Carlo standard error does not account for the uncertainty
of the estimation of the 1-\alpha quantiles of the distribution of
simulated statistics. The number of Monte Carlo simulations should be large
enough to make this estimate sufficiently accurate that it can be considered
known for practical purposes. The standard errors of estimated quantiles can
be calculated using the mcmcse package.
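To make the warning concrete, the calculation can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: simulated_stats and observed_stats are made-up stand-ins for the two sets of statistics, and the binomial formula used for the Monte Carlo standard error is an assumption. The critical value is treated as known, which is exactly the simplification the warning describes.

```r
set.seed(42)
reps <- 1000
simulated_stats <- abs(rnorm(reps))             # hypothetical null statistics
observed_stats <- abs(rnorm(reps, mean = 0.5))  # hypothetical observed statistics
alpha <- c(0.05, 0.1)

# Estimated (1 - alpha) quantiles of the simulated statistics.
crit <- quantile(simulated_stats, probs = 1 - alpha, names = FALSE)

# Rejection rate: share of observed statistics above each critical value.
rate <- vapply(crit, function(cv) mean(observed_stats > cv), numeric(1))

# Assumed binomial standard error; ignores the uncertainty in crit.
mcse <- sqrt(rate * (1 - rate) / reps)

rates <- data.frame(alpha = alpha, rate = rate, mcse = mcse)
print(rates)
```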
Author(s)
Jesse Miller
Examples
# In practice, set "reps" much larger than 20.
set.seed(20240201)
n <- 100
func <- function(X, theta) theta[1] + theta[2]*X[,1]
Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1]
theta <- c(2,5)
X <- matrix(rexp(n, rate = 1))
cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig),
                test_mean = func, X = X, covariance = list(Sigma = Sig), reps = 20,
                prog = Inf, theta = theta, theta_init = rep(1, length(theta)))
rejection(cdfr)
rejection(cdfr, stat = "CvM")
rejection(cdfr, alpha = c(0.1, 0.2))
Extract Residuals from distfreereg Objects
Description
This is a residuals method for objects of class
distfreereg. It can extract any of the three available types of
residuals.
Usage
## S3 method for class 'distfreereg'
residuals(object, ..., type = "raw")
Arguments
object |
Object of class distfreereg. |
... |
Additional parameters passed to or from other methods. Currently ignored. |
type |
Character string specifying the type of residuals to return. Must be one of
" |
Value
Numeric vector of residuals.
Author(s)
Jesse Miller
Update distfreereg Objects
Description
This is a distfreereg method for update. The method takes
advantage of the override argument of distfreereg to
prevent unnecessary recalculation of potentially computationally expensive
objects.
Usage
## S3 method for class 'distfreereg'
update(object, ..., smart = TRUE)
Arguments
object |
Object of class distfreereg. |
... |
Additional named parameters to pass to |
smart |
Logical. If |
Details
This function updates an object of class distfreereg. By default, it
does so "intelligently" in the sense that it does not unnecessarily recompute
elements that are already saved in object. For example, if a new value
for covariance is not included in ..., then the value of
covariance saved in object is automatically passed to the new
call, which avoids recalculating Q. If a new value of covariance
is specified, then all objects dependent on that (e.g., \hat\theta) are
recomputed by default.
In particular, the simulated samples depend on the data and function only
through the number of observations, the covariates (if any), and the dimension
of the parameter space of the function. If none of these change, then the
updated object reuses the simulated samples from the supplied object.
This function uses the 'override' argument of distfreereg, so care is
needed when a value of 'override' is itself supplied in the updated call. To
create the value of 'override' for the updated call, the following three named
lists are combined. When a name appears in more than one of these lists,
priority is given in the order in which the lists are shown:
- The value of 'override' supplied to update.
- Values of 'fitted_values' and 'J' supplied to the 'override' argument in the 'call' element of 'object', if the value of 'test_mean' in the updated call is 'NULL'.
- The list of "intelligently chosen" override values determined by update.distfreereg.
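The priority rule for combining the three lists can be illustrated in base R. This is a sketch of the rule only, not the code used by update.distfreereg: the helper combine_by_priority, the three example lists, their string values, and the element name other are all hypothetical.

```r
# Hypothetical helper: concatenate named lists, keep the first occurrence of
# each name, so earlier lists take priority over later ones.
combine_by_priority <- function(...) {
  combined <- do.call(c, list(...))
  combined[!duplicated(names(combined))]
}

# Hypothetical stand-ins for the three sources, in priority order.
user_override  <- list(J = "J from update call")
call_override  <- list(fitted_values = "fitted values from object call",
                       J = "J from object call")
smart_override <- list(J = "J chosen automatically",
                       other = "another automatically chosen value")

override <- combine_by_priority(user_override, call_override, smart_override)
str(override)
```

Here J is taken from the first list because that list has the highest priority, while fitted_values and other come from the only lists that supply them.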
Value
An updated object of class distfreereg.
Note
The usual behavior of update is to create an updated call and
then evaluate that call. This is what update.distfreereg does, as well,
but some of the updated elements are drawn from object itself for use
as override values. In general, an object created by update.distfreereg
is not identical to the object created by
distfreereg using corresponding arguments, because the
call values will differ.
Author(s)
Jesse Miller
Examples
set.seed(20240218)
n <- 1e2
func <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2]
Sig <- runif(n, min = 1, max = 3)
theta <- c(2,5)
X <- matrix(runif(2*n, min = 1, max = 5), nrow = n)
Y <- X[,1]^theta[1] + theta[2]*X[,2] + rnorm(n, sd = sqrt(Sig))
dfr_1 <- distfreereg(Y = Y, X = X, test_mean = func,
                     covariance = list(Sigma = Sig),
                     theta_init = c(1,1))
func_updated <- function(X, theta) X[,1]^theta[1] + theta[2]*X[,2]^2
dfr_2 <- update(dfr_1, test_mean = func_updated)
Estimate Parameter Covariance Matrices from distfreereg Objects
Description
This is a vcov method for objects of class distfreereg.
It estimates the covariance matrix of the estimated parameters in a model
from a distfreereg object.
Usage
## S3 method for class 'distfreereg'
vcov(object, ..., jacobian_args, hessian_args)
Arguments
object |
Object of class distfreereg. |
... |
Additional parameters passed to other methods when |
jacobian_args, hessian_args |
Lists of additional arguments to pass to |
Details
When the test_mean element of object is of class
function, the covariance matrix is estimated using the method
described in section 5.3 of Van der Vaart (2007). Otherwise,
test_mean is of a class that has its own method for
vcov, which is used to calculate the output.
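For orientation, the textbook result in that section has the familiar sandwich form. The display below is a sketch of that general result, not a statement of the exact computation performed by this method: the symbols m_\theta (the criterion function), \dot m_\theta (its gradient in \theta), and V_\theta (its expected second-derivative matrix) are the textbook's notation rather than names from this package, and any connection between these derivatives and jacobian_args and hessian_args is an assumption.

```latex
\sqrt{n}\,(\hat\theta_n - \theta_0) \rightsquigarrow
N\!\left(0,\; V_{\theta_0}^{-1}\,
\mathrm{E}\!\left[\dot m_{\theta_0}\,\dot m_{\theta_0}^{T}\right]
V_{\theta_0}^{-1}\right)
```

Under this result, an estimate of the covariance matrix of \hat\theta_n is obtained by replacing the expectation and V_{\theta_0} with sample analogues evaluated at \hat\theta_n and dividing by n.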
Value
Named numeric matrix equal to the estimated covariance matrix of the parameter
estimates from object.
Warning
This calculation can be computationally intensive when the sample size is
large and object$test_mean is a function.
Note
If object was created by calling distfreereg with no
test mean function (that is, with test_mean equal to NULL),
there is no estimated parameter vector, and therefore this function does not
apply.
Author(s)
Jesse Miller
References
Van der Vaart, A. W. Asymptotic statistics, 2007, Cambridge series on statistical and probabilistic mathematics, Cambridge University Press.