Simple tool that aggregates the value of CATT coefficients in
staggered difference-in-difference setups with inference based on
a wild cluster bootstrap (see details) - similar to fixest::aggregate()
Source: R/boot_aggregate.R
This function helps replicate the estimator from Sun and Abraham (2021, Journal of Econometrics). You first need to perform an estimation with cohort and relative period dummies (typically using the function i), which yields estimates of the cohort average treatment effect on the treated (CATT). You can then use this function to retrieve the average treatment effect for each relative period, or to aggregate the CATT in any other way you wish.
Usage
boot_aggregate(
  x,
  agg,
  full = FALSE,
  use_weights = TRUE,
  clustid = NULL,
  B,
  bootstrap_type = "fnw11",
  bootcluster = "max",
  fe = NULL,
  sign_level = 0.05,
  beta0 = NULL,
  type = "rademacher",
  impose_null = TRUE,
  p_val_type = "two-tailed",
  nthreads = getBoottest_nthreads(),
  tol = 1e-06,
  maxiter = 10,
  ssc = boot_ssc(adj = TRUE, fixef.K = "none", cluster.adj = TRUE,
    cluster.df = "conventional"),
  engine = getBoottest_engine(),
  floattype = "Float64",
  maxmatsize = FALSE,
  bootstrapc = FALSE,
  getauxweights = FALSE,
  sampling = "dqrng",
  ...
)
Arguments
- x
An object of type fixest estimated using sunab().
- agg
A character scalar describing the variable names to be aggregated; it is pattern-based. All variables that match the pattern will be aggregated. It must be of the form "(root)"; the parentheses must be there, and the resulting variable name will be "root". You can add another root with parentheses: "(root1)regex(root2)", in which case the resulting name is "root1::root2". To name the resulting variable differently, you can pass a named vector: c("name" = "pattern") or c("name" = "pattern(root2)"). It's a bit intricate, sorry; please see the examples.
- full
Logical scalar, defaults to FALSE. If TRUE, then all coefficients are returned, not only the aggregated coefficients.
- use_weights
Logical, default is TRUE. If the estimation was weighted, whether the aggregation should take the weights into account. Basically, if the weights reflect frequencies, it should be TRUE.
- clustid
A character vector or rhs formula containing the names of the cluster variables. If NULL, a heteroskedasticity-robust (HC1) wild bootstrap is run.
- B
Integer. The number of bootstrap iterations. When the number of clusters is low, increasing B adds little additional runtime.
- bootstrap_type
Determines which wild cluster bootstrap type should be run. The option "fnw11" runs a "11" type wild cluster bootstrap via the algorithm outlined in "Fast and Wild" (Roodman et al., 2019).
- bootcluster
A character vector or rhs formula of length 1. Specifies the bootstrap clustering variable or variables. If more than one variable is specified, then bootstrapping is clustered by the intersections of clustering implied by the listed variables. To mimic the behavior of Stata's boottest command, the default is to cluster by the intersection of all the variables specified via the clustid argument, even though that is not necessarily recommended (see the paper by Roodman et al. cited below, section 4.2). Other options include "min", where bootstrapping is clustered by the cluster variable with the fewest clusters. Further, the subcluster bootstrap (MacKinnon & Webb, 2018) is supported; see the vignette("fwildclusterboot", package = "fwildclusterboot") for details.
- fe
A character vector or rhs formula of length one which contains the name of the fixed effect to be projected out in the bootstrap. Note: if regression weights are used, fe needs to be NULL.
- sign_level
A numeric between 0 and 1 which sets the significance level of the inference procedure. E.g. sign_level = 0.05 returns 95% confidence intervals. By default, sign_level = 0.05.
- beta0
Deprecated function argument. Replaced by function argument 'r'.
- type
Character or function. The character string specifies the type of bootstrap to use: one of "rademacher", "mammen", "norm" and "webb". Alternatively, type can be a function(n) for drawing wild bootstrap factors. "rademacher" by default. For the Rademacher distribution, if the number of replications B exceeds the number of possible draw combinations, 2^(number of clusters), then boottest() will use each possible combination once (enumeration).
- impose_null
Logical. Controls whether the null hypothesis is imposed on the bootstrap dgp. The null is imposed (WCR) by default. If FALSE, the null is not imposed (WCU).
- p_val_type
Character vector of length 1. Type of p-value. By default "two-tailed". Other options include "equal-tailed", ">" and "<".
- nthreads
The number of threads. Can be: a) an integer lower than, or equal to, the maximum number of threads; b) 0: meaning all available threads will be used; c) a number strictly between 0 and 1 which represents the fraction of all threads to use. The default is to use 1 core.
- tol
Numeric vector of length 1. The desired accuracy (convergence tolerance) used in the root finding procedure to find the confidence interval. 1e-6 by default.
- maxiter
Integer. Maximum number of iterations used in the root finding procedure to find the confidence interval. 10 by default.
- ssc
An object of class boot_ssc.type obtained with the function boot_ssc(). Represents how the small sample adjustments are computed. The defaults are adj = TRUE, fixef.K = "none", cluster.adj = TRUE, cluster.df = "conventional". You can find more details in the help file for boot_ssc(). The function is purposefully designed to mimic fixest's fixest::ssc() function.
- engine
Character scalar. Either "R", "R-lean" or "WildBootTests.jl". Controls whether boottest() should run via its native R implementation or WildBootTests.jl. "R" is the default and implements the cluster bootstrap as in Roodman (2019). "WildBootTests.jl" executes the wild cluster bootstrap via the WildBootTests.jl package. For it to run, Julia and WildBootTests.jl need to be installed. The "R-lean" algorithm is a memory-friendly, but less performant, RcppArmadillo-based implementation of the wild cluster bootstrap. Note that if no cluster is provided, boottest() always defaults to the "R-lean" algorithm. You can set the employed algorithm globally by using the setBoottest_engine() function.
- floattype
Float64 by default. Other option: Float32. Should floating point numbers in Julia be represented as 32 or 64 bit? Only relevant when 'engine = "WildBootTests.jl"'
- maxmatsize
NULL by default = no limit. Else numeric scalar to set the maximum size of the auxiliary weight matrix (v), in gigabytes. Only relevant when 'engine = "WildBootTests.jl"'
- bootstrapc
Logical scalar, FALSE by default. TRUE to request bootstrap-c instead of bootstrap-t. Only relevant when 'engine = "WildBootTests.jl"'
- getauxweights
Logical. Whether to save the auxiliary weight matrix (v).
- sampling
'dqrng' or 'standard'. If 'dqrng', the 'dqrng' package is used for random number generation (when available). If 'standard', functions from the 'stats' package are used when available. This argument is mostly a convenience to control random number generation in a wrapper package around fwildclusterboot, wildrwolf. I recommend using the fast option.
- ...
misc function arguments
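The enumeration rule described under type can be illustrated with a small base-R sketch. This is not the package's internal code; n_clusters and B are hypothetical values chosen for illustration. It counts the distinct Rademacher weight vectors and checks whether the requested number of replications exceeds that count.

```r
# Hypothetical setting: 4 clusters and B = 99 requested bootstrap iterations
n_clusters <- 4
B <- 99

# Each cluster receives a weight of -1 or +1, so there are
# 2^(number of clusters) distinct weight vectors
n_combinations <- 2^n_clusters

# Enumerate every possible weight vector (one column per combination)
all_draws <- t(expand.grid(rep(list(c(-1, 1)), n_clusters)))

# If B exceeds the number of combinations, random draws would only repeat
# weight vectors, so each combination is used exactly once instead
use_enumeration <- B > n_combinations
print(use_enumeration)
```

With few clusters the set of distinct draws is small, which is why the number of clusters, and not only B, limits how fine-grained the bootstrap distribution can be.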
Details
Note that, contrary to the Sun and Abraham (2021) article, the cohort share in the sample is here considered to be a perfect measure of the cohort share in the population.
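As a rough illustration of what this means for the point estimate, the following base-R sketch aggregates CATT values for a single relative period using sample cohort shares as fixed weights. The estimates and observation counts are made up, and the sketch assumes a share-weighted mean; it is not the function's internal code.

```r
# Hypothetical CATT estimates for one relative period, one per cohort
catt <- c("cohort::2005" = 1.0, "cohort::2006" = 2.0, "cohort::2007" = 4.0)

# Hypothetical number of observations contributing to each estimate
n_obs <- c(50, 30, 20)

# Sample cohort shares, treated as known (non-random) weights
shares <- n_obs / sum(n_obs)

# Aggregated effect for this relative period: share-weighted mean of the CATTs
att_period <- sum(shares * catt)
print(att_period)
```

Because the shares are treated as fixed, the sampling uncertainty in the shares themselves does not enter the inference on the aggregated coefficient.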
Most of this function was written by Laurent Bergé for the fixest package, published under GPL-3 (https://cran.r-project.org/web/packages/fixest/index.html), with minor changes by Alexander Fischer.
Examples
if (FALSE) {
  if (requireNamespace("fixest")) {
    library(fixest)
    data(base_stagg)
    # The DiD estimation
    res_sunab = feols(y ~ x1 + sunab(year_treated, year) | id + year, base_stagg)
    res_sunab_3ref = feols(
      y ~ x1 + sunab(year_treated, year, ref.p = c(.F + 0:2, -1)) | id + year,
      cluster = "id",
      data = base_stagg,
      ssc = ssc(adj = FALSE, cluster.adj = FALSE)
    )
    aggregate(res_sunab, agg = "ATT")
    # test ATT equivalence
    boot_att <-
      boot_aggregate(
        res_sunab,
        B = 9999,
        agg = "ATT",
        clustid = "id"
      )
    head(boot_att)
    # without clustid, a heteroskedasticity-robust wild bootstrap is run
    boot_agg2 <-
      boot_aggregate(
        res_sunab,
        B = 99999,
        agg = TRUE,
        ssc = boot_ssc(adj = FALSE, cluster.adj = FALSE)
      )
  }
}
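The pattern form of agg is not exercised above, so here is a minimal base-R sketch of how a "(root)" pattern could select and name coefficients. It assumes the pattern is applied as a Perl-style regular expression with a capture group. The coefficient names are hypothetical, and this is an illustration of the idea rather than the package's code.

```r
# Hypothetical coefficient names, in the style produced by sunab() / i()
coef_names <- c(
  "x1",
  "year::-2:cohort::2005", "year::-2:cohort::2006",
  "year::0:cohort::2005",  "year::0:cohort::2006"
)

# Pattern of the form "(root)": the parentheses capture the part of the
# name that identifies each aggregated coefficient
agg <- "(year::-?[0-9]+)"

# Step 1: keep only the coefficients that match the pattern
matched <- grepl(agg, coef_names, perl = TRUE)

# Step 2: extract the captured root; coefficients sharing a root
# are aggregated together under that name
root <- gsub(paste0(".*", agg, ".*"), "\\1", coef_names[matched], perl = TRUE)

# Coefficients grouped by the name of the resulting aggregate
groups <- split(coef_names[matched], root)
print(groups)
```

Here the two cohort-specific coefficients for each relative period fall into one group, while x1 does not match the pattern and is left out of the aggregation.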