boottest.lm
is an S3 method that provides fast wild cluster
bootstrap inference for objects of class lm. It implements
the fast wild bootstrap algorithms developed in Roodman et al. (2019)
and MacKinnon, Nielsen & Webb (2022).
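To make the idea concrete, the following is a minimal, textbook-style sketch of a wild cluster bootstrap with the null imposed (WCR) and Rademacher weights, written in base R. It is not the fast algorithm that boottest() implements: the simulated data, the helper cluster_t() and all variable names are assumptions made for illustration, and small-sample corrections are omitted.

```r
set.seed(123)

# Simulated data (hypothetical): 10 clusters with 20 observations each
G <- 10
n_g <- 20
cluster <- rep(seq_len(G), each = n_g)
x <- rnorm(G * n_g) + rnorm(G)[cluster]
y <- 1 + 0.5 * x + rnorm(G)[cluster] + rnorm(G * n_g)
X <- cbind(1, x)

# Cluster-robust t statistic for H0: beta_j = r (no small-sample correction)
cluster_t <- function(y, X, cluster, j, r) {
  XtX_inv <- solve(crossprod(X))
  beta <- XtX_inv %*% crossprod(X, y)
  u <- drop(y - X %*% beta)
  scores <- rowsum(X * u, cluster)  # one row of summed scores per cluster
  V <- XtX_inv %*% crossprod(scores) %*% XtX_inv
  (beta[j] - r) / sqrt(V[j, j])
}

j <- 2  # column of X holding the tested coefficient
r <- 0  # hypothesized value
t_stat <- cluster_t(y, X, cluster, j, r)

# Restricted fit: impose beta_j = r and estimate the remaining coefficients
X0 <- X[, -j, drop = FALSE]
b0 <- solve(crossprod(X0), crossprod(X0, y - r * X[, j]))
fitted_r <- drop(X0 %*% b0) + r * X[, j]
resid_r <- y - fitted_r

# Bootstrap: flip the sign of the restricted residuals cluster by cluster
B <- 999
t_boot <- replicate(B, {
  v <- sample(c(-1, 1), G, replace = TRUE)  # Rademacher weights
  y_star <- fitted_r + resid_r * v[cluster]
  cluster_t(y_star, X, cluster, j, r)
})

# Two-tailed bootstrap p-value
p_val <- mean(abs(t_boot) >= abs(t_stat))
```

Each replication refits the full regression, which is exactly the cost the fast algorithms avoid; the sketch is only meant to show what quantity is being computed.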
Usage
# S3 method for lm
boottest(
object,
param,
B,
clustid = NULL,
bootcluster = "max",
conf_int = TRUE,
R = NULL,
r = 0,
beta0 = NULL,
sign_level = 0.05,
type = "rademacher",
impose_null = TRUE,
bootstrap_type = "fnw11",
p_val_type = "two-tailed",
tol = 1e-06,
maxiter = 10,
sampling = "dqrng",
nthreads = getBoottest_nthreads(),
ssc = boot_ssc(adj = TRUE, fixef.K = "none", cluster.adj = TRUE, cluster.df =
"conventional"),
engine = getBoottest_engine(),
floattype = "Float64",
maxmatsize = FALSE,
bootstrapc = FALSE,
getauxweights = FALSE,
...
)
Arguments
- object
An object of class lm
- param
A character vector or rhs formula. The name of the regression coefficient(s) for which the hypothesis is to be tested
- B
Integer. The number of bootstrap iterations. When the number of clusters is low, increasing B adds little additional runtime.
- clustid
A character vector or rhs formula containing the names of the cluster variables. If NULL, a heteroskedasticity-robust (HC1) wild bootstrap is run.
- bootcluster
A character vector or rhs formula of length 1. Specifies the bootstrap clustering variable or variables. If more than one variable is specified, then bootstrapping is clustered by the intersections of clustering implied by the listed variables. To mimic the behavior of Stata's boottest command, the default ("max") is to cluster by the intersection of all the variables specified via the clustid argument, even though that is not necessarily recommended (see the paper by Roodman et al. cited below, section 4.2). Other options include "min", where bootstrapping is clustered by the cluster variable with the fewest clusters. Further, the subcluster bootstrap (MacKinnon & Webb, 2018) is supported; see vignette("fwildclusterboot", package = "fwildclusterboot") for details.
- conf_int
A logical vector. If TRUE, boottest computes confidence intervals by test inversion. If FALSE, only the p-value is returned.
- R
Hypothesis vector giving linear combinations of coefficients. Must be either NULL or a vector of the same length as param. If NULL, a vector of ones of the same length as param is used.
- r
A numeric. Shifts the null hypothesis H0: param = r vs H1: param != r
- beta0
Deprecated function argument. Replaced by function argument 'r'.
- sign_level
A numeric between 0 and 1 which sets the significance level of the inference procedure. E.g. sign_level = 0.05 returns 95% confidence intervals. By default, sign_level = 0.05.
- type
Character or function. The character string specifies the type of bootstrap to use: one of "rademacher", "mammen", "norm" and "webb". Alternatively, type can be a function(n) for drawing wild bootstrap factors. "rademacher" by default. For the Rademacher distribution, if the number of replications B exceeds the number of possible draw combinations, 2^(number of clusters), then boottest() will use each possible combination once (enumeration).
- impose_null
Logical. Controls whether the null hypothesis is imposed on the bootstrap dgp. The null is imposed (WCR) by default. If FALSE, the null is not imposed (WCU).
- bootstrap_type
Determines which wild cluster bootstrap type should be run. Options are "fnw11", "11", "13", "31" and "33" for the wild cluster bootstrap and "11" and "31" for the heteroskedastic bootstrap. For more information, see the details section. "fnw11" is the default for the cluster bootstrap, which runs a "11" type wild cluster bootstrap via the algorithm outlined in "Fast and wild" (Roodman et al., 2019). "11" is the default for the heteroskedastic bootstrap.
- p_val_type
Character vector of length 1. Type of p-value. By default "two-tailed". Other options include "equal-tailed", ">" and "<".
- tol
Numeric vector of length 1. The desired accuracy (convergence tolerance) used in the root finding procedure to find the confidence interval. 1e-6 by default.
- maxiter
Integer. Maximum number of iterations used in the root finding procedure to find the confidence interval. 10 by default.
- sampling
'dqrng' or 'standard'. If 'dqrng', the 'dqrng' package is used for random number generation (when available). If 'standard', functions from the 'stats' package are used when available. This argument is mostly a convenience to control random number generation in wildrwolf, a wrapper package around fwildclusterboot. I recommend using the fast option ('dqrng').
- nthreads
The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 1 core.
- ssc
An object of class boot_ssc.type obtained with the function boot_ssc(). Represents how the small sample adjustments are computed. The defaults are adj = TRUE, fixef.K = "none", cluster.adj = TRUE, cluster.df = "conventional". You can find more details in the help file for boot_ssc(). The function is purposefully designed to mimic fixest's fixest::ssc() function.
- engine
Character scalar. Either "R", "R-lean" or "WildBootTests.jl". Controls whether boottest() should run via its native R implementation or WildBootTests.jl. "R" is the default and implements the cluster bootstrap as in Roodman et al. (2019). "WildBootTests.jl" executes the wild cluster bootstrap via the WildBootTests.jl package. For it to run, Julia and WildBootTests.jl need to be installed. The "R-lean" algorithm is a memory-friendly but less performant rcpp-armadillo based implementation of the wild cluster bootstrap. Note that if no cluster is provided, boottest() always defaults to the "R-lean" algorithm. You can set the employed algorithm globally by using the setBoottest_engine() function.
- floattype
Float64 by default. Other option: Float32. Should floating point numbers in Julia be represented as 32 or 64 bit? Only relevant when 'engine = "WildBootTests.jl"'.
- maxmatsize
FALSE by default, meaning no limit. Otherwise, a numeric scalar that sets the maximum size of the auxiliary weight matrix (v), in gigabytes. Only relevant when 'engine = "WildBootTests.jl"'.
- bootstrapc
Logical scalar, FALSE by default. TRUE to request bootstrap-c instead of bootstrap-t. Only relevant when 'engine = "WildBootTests.jl"'
- getauxweights
Logical. Whether to save the auxiliary weight matrix (v). FALSE by default.
- ...
Further arguments passed to or from other methods.
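The enumeration rule described under the type argument can be illustrated with a small base R sketch. This is an illustration under assumptions, not the package's internal code: the names n_clusters, n_combinations and weights are hypothetical, and the cluster count is chosen small so that the full set of sign combinations is easy to inspect.

```r
n_clusters <- 4  # hypothetical number of clusters
B <- 9999        # requested number of bootstrap iterations

# Rademacher weights take the values -1 and 1, so there are 2^G distinct draws
n_combinations <- 2^n_clusters

if (B > n_combinations) {
  # Enumeration: use every possible sign combination exactly once
  weights <- as.matrix(expand.grid(rep(list(c(-1, 1)), n_clusters)))
} else {
  # Otherwise draw B random sign vectors, one row per bootstrap iteration
  weights <- matrix(sample(c(-1, 1), B * n_clusters, replace = TRUE), nrow = B)
}
dimnames(weights) <- NULL

dim(weights)
```

With few clusters the effective number of bootstrap draws is therefore capped at 2^(number of clusters), regardless of how large B is set.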
Value
An object of class boottest
- p_val
The bootstrap p-value.
- conf_int
The bootstrap confidence interval.
- param
The tested parameter.
- N
Sample size. Might differ from the regression sample size if the cluster variables contain NA values.
- boot_iter
Number of Bootstrap Iterations.
- clustid
Names of the cluster Variables.
- N_G
Dimension of the cluster variables as used in boottest.
- sign_level
Significance level used in boottest.
- type
Distribution of the bootstrap weights.
- impose_null
Whether the null was imposed on the bootstrap dgp or not.
- R
The vector "R" in the null hypothesis of interest Rbeta = r.
- r
The scalar "r" in the null hypothesis of interest Rbeta = r.
- point_estimate
R'beta. A scalar: the constraints vector times the regression coefficients.
- grid_vals
All t-statistics calculated while calculating the confidence interval.
- p_grid_vals
All p-values calculated while calculating the confidence interval.
- t_stat
The 'original' regression test statistics.
- t_boot
All bootstrap t-statistics.
- regression
The regression object used in boottest.
- call
Function call of boottest.
- engine
The employed bootstrap algorithm.
- nthreads
The number of threads employed.
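The relationship between param, R, r and point_estimate can be sketched in base R. The example below uses the built-in mtcars data and a hypothetical hypothesis; it only reproduces the arithmetic of R'beta and does not call boottest(). The t statistic shown uses the classical lm variance purely for illustration, whereas boottest works with cluster-robust or heteroskedasticity-robust statistics.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

param <- c("wt", "hp")  # coefficients entering the hypothesis
R <- c(1, 1)            # weights of the linear combination
r <- 2                  # hypothesized value: H0 is wt + hp = 2

# point_estimate: the constraints vector times the regression coefficients
point_estimate <- sum(R * coef(fit)[param])

# Illustration only: t statistic based on the classical (non-robust) variance
se_classical <- sqrt(drop(t(R) %*% vcov(fit)[param, param] %*% R))
t_classical <- (point_estimate - r) / se_classical
```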
Setting Seeds
To guarantee reproducibility, you need to set a global random seed via
- set.seed() when using
  - the lean algorithm (via engine = "R-lean"), including the heteroskedastic wild bootstrap
  - the wild cluster bootstrap via engine = "R" with Mammen weights, or
  - engine = "WildBootTests.jl"
- dqrng::dqset.seed() when using engine = "R" for Rademacher, Webb or Normal weights
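As a minimal sketch of what setting these seeds achieves, the snippet below resets each generator and checks that the same draws come back. The seed value and variable names are arbitrary assumptions, and the dqrng part only runs if that package is installed.

```r
# Base R generator, seeded via set.seed()
set.seed(2023)
draw_a <- sample(c(-1, 1), 8, replace = TRUE)
set.seed(2023)
draw_b <- sample(c(-1, 1), 8, replace = TRUE)
identical(draw_a, draw_b)

# dqrng generator, seeded separately via dqrng::dqset.seed()
if (requireNamespace("dqrng", quietly = TRUE)) {
  dqrng::dqset.seed(2023)
  dq_a <- dqrng::dqrnorm(5)
  dqrng::dqset.seed(2023)
  dq_b <- dqrng::dqrnorm(5)
  identical(dq_a, dq_b)
}
```

The two generators are independent of each other, which is why setting only one of the seeds is not enough when switching between engines or weight types.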
Via the engine function argument, it is possible to specify different
variants of the wild cluster bootstrap, and whether the algorithm should
be run via R or WildBootTests.jl.
Confidence Intervals
boottest computes confidence intervals by inverting p-values.
In practice, the following procedure is used:
Based on an initial guess for starting values, calculate p-values for 26 equally spaced points between the starting values.
Out of the 26 calculated p-values, find the two pairs of values x for which the corresponding p-values px cross the significance level sign_level.
Feed the two pairs of x into a numerical root finding procedure and solve for the root. boottest currently relies on stats::uniroot, which by default is run with an absolute tolerance of 1e-06 and stops after 10 iterations (see the tol and maxiter arguments).
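The inversion procedure can be sketched in base R. To keep the example deterministic, the sketch inverts the classical t-test p-value of an lm coefficient instead of a bootstrap p-value; the data (mtcars), the starting values of plus and minus four standard errors, and all variable names are assumptions for illustration and do not reflect how boottest chooses its starting values.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)
beta_hat <- coef(fit)[["wt"]]
se_hat <- sqrt(vcov(fit)["wt", "wt"])
df_res <- df.residual(fit)
sign_level <- 0.05

# p-value of H0: beta_wt = r as a function of the hypothesized value r
p_value <- function(r) 2 * pt(-abs((beta_hat - r) / se_hat), df = df_res)

# Step 1: evaluate the p-value on 26 equally spaced points
grid <- seq(beta_hat - 4 * se_hat, beta_hat + 4 * se_hat, length.out = 26)
excess <- p_value(grid) - sign_level

# Step 2: find the two adjacent pairs where the p-value crosses sign_level
cross <- which(diff(sign(excess)) != 0)
stopifnot(length(cross) == 2)

# Step 3: solve for the crossing point within each bracketing pair
conf_int <- vapply(cross, function(i) {
  uniroot(function(r) p_value(r) - sign_level,
          lower = grid[i], upper = grid[i + 1],
          tol = 1e-6, maxiter = 10)$root
}, numeric(1))

conf_int
```

Because the p-value used here is the classical one, the result can be compared with confint(fit, "wt"); with a bootstrap p-value the same three steps apply, but each evaluation of the p-value requires a bootstrap run.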
Run boottest quietly
You can suppress all warnings and messages by setting the following global
options:
options(rlib_warning_verbosity = "quiet")
options(rlib_message_verbosity = "quiet")
Note that this will turn off all warnings and messages produced via rlang::warn() and rlang::inform(), which might not be desirable if you use other software built on rlang, e.g. the tidyverse.
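One way to limit this side effect, sketched below under the assumption that only a specific block of code should run quietly, is to store the previous option values and restore them afterwards. The variable name old_opts is arbitrary.

```r
# options() returns the previous values of the options it changes
old_opts <- options(
  rlib_warning_verbosity = "quiet",
  rlib_message_verbosity = "quiet"
)

# ... run boottest() here without rlang warnings and messages ...

# Restore the previous settings so other rlang-based packages are unaffected
options(old_opts)
```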
Stata, Julia and Python Implementations
The fast wild cluster bootstrap algorithms are further implemented in the following software packages:
Stata: boottest
Julia: WildBootTests.jl
Python: wildboottest
References
Roodman et al., 2019, "Fast and wild: Bootstrap inference in Stata using boottest", The Stata Journal. (https://ideas.repec.org/p/qed/wpaper/1406.html)
MacKinnon, James G., Morten Ørregaard Nielsen, and Matthew D. Webb. Fast and reliable jackknife and bootstrap methods for cluster-robust inference. No. 1485. 2022.
Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller. "Bootstrap-based improvements for inference with clustered errors." The Review of Economics and Statistics 90.3 (2008): 414-427.
Cameron, A. Colin & Douglas L. Miller. "A practitioner's guide to cluster-robust inference." Journal of Human Resources (2015) doi:10.3368/jhr.50.2.317
Davidson & MacKinnon. "Wild Bootstrap Tests for IV regression." Journal of Business & Economic Statistics (2010) doi:10.1198/jbes.2009.07221
MacKinnon, James G., and Matthew D. Webb. "The wild bootstrap for few (treated) clusters." The Econometrics Journal 21.2 (2018): 114-135.
MacKinnon, James G., Morten Ørregaard Nielsen, and Matthew D. Webb. "Cluster-robust inference: A guide to empirical practice." Journal of Econometrics (2022) doi:10.1016/j.jeconom.2022.04.001
MacKinnon, James. "Wild cluster bootstrap confidence intervals." L'Actualite economique 91.1-2 (2015): 11-33.
Webb, Matthew D. Reworking wild bootstrap based inference for clustered errors. No. 1315. Queen's Economics Department Working Paper, 2013.
Examples
if (FALSE) {
  # attach the package so that boottest() and the voters data are available
  library(fwildclusterboot)
  data(voters)
  lm_fit <- lm(proposition_vote ~ treatment + ideology1 + log_income +
    Q1_immigration,
  data = voters
  )
  # one-way clustering
  boot1 <- boottest(lm_fit,
    B = 9999,
    param = "treatment",
    clustid = "group_id1"
  )
  # two-way clustering
  boot2 <- boottest(lm_fit,
    B = 9999,
    param = "treatment",
    clustid = c("group_id1", "group_id2")
  )
  # test treatment = 2 and report 80% confidence intervals
  boot3 <- boottest(lm_fit,
    B = 9999,
    param = "treatment",
    clustid = c("group_id1", "group_id2"),
    sign_level = 0.2,
    r = 2
  )
  # test treatment + ideology1 = 2
  boot4 <- boottest(lm_fit,
    B = 9999,
    clustid = c("group_id1", "group_id2"),
    param = c("treatment", "ideology1"),
    R = c(1, 1),
    r = 2
  )
  # methods for objects of class boottest
  summary(boot1)
  print(boot1)
  plot(boot1)
  nobs(boot1)
  pval(boot1)
  confint(boot1)
  generics::tidy(boot1)
  # run different bootstrap types following MacKinnon, Nielsen & Webb (2022):
  # default: the fnw algorithm
  boot_fnw11 <- boottest(lm_fit,
    B = 9999,
    param = "treatment",
    clustid = "group_id1",
    bootstrap_type = "fnw11"
  )
  # WCR 31
  boot_WCR31 <- boottest(lm_fit,
    B = 9999,
    param = "treatment",
    clustid = "group_id1",
    bootstrap_type = "31"
  )
  # WCU 33 (null not imposed)
  boot_WCU33 <- boottest(lm_fit,
    B = 9999,
    param = "treatment",
    clustid = "group_id1",
    bootstrap_type = "33",
    impose_null = FALSE
  )
}