1000x faster Wild Cluster Bootstrap Inference in R with fwildclusterboot
When you suspect that the error terms in your regression model are correlated within clusters, and the number of clusters is small, trouble might be headed your way. In such a situation, common cluster-robust standard errors tend to be downward biased - they are too eager to reject the null hypothesis. Since Cameron, Gelbach & Miller first suggested that the wild cluster bootstrap might be preferable to sandwich standard errors when the number of clusters is small, it has become common practice among empirical economists to check their cluster-robust inferences against the wild cluster bootstrap.
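To fix ideas, here is a minimal base-R sketch of the "slow" wild cluster bootstrap with Rademacher weights, testing the null that a slope coefficient is zero. All variable names are illustrative, and for brevity the loop uses classical rather than cluster-robust t-statistics - the real procedure re-computes a cluster-robust t-statistic in every iteration:

```r
set.seed(1)
G <- 20                                        # small number of clusters
n_per <- 50
cl <- rep(seq_len(G), each = n_per)            # cluster ids
x  <- rnorm(G)[cl] + rnorm(G * n_per)          # regressor with a cluster component
y  <- 0 * x + rnorm(G)[cl] + rnorm(G * n_per)  # H0 is true: beta_x = 0

fit   <- lm(y ~ x)
t_obs <- coef(summary(fit))["x", "t value"]

# Impose the null: fitted values and residuals from the restricted model
r_fit <- lm(y ~ 1)
u0    <- resid(r_fit)
yhat0 <- fitted(r_fit)

B <- 999
t_boot <- numeric(B)
for (b in seq_len(B)) {
  v  <- sample(c(-1, 1), G, replace = TRUE)[cl]  # one Rademacher draw per cluster
  yb <- yhat0 + v * u0                           # bootstrap outcome
  t_boot[b] <- coef(summary(lm(yb ~ x)))["x", "t value"]
}
p_boot <- mean(abs(t_boot) >= abs(t_obs))        # symmetric bootstrap p-value
```

The expensive part is obvious: every iteration re-fits the full regression. The "fast algorithm" discussed below avoids exactly this, by precomputing everything that does not depend on the bootstrap weights.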
At some point, I found myself in a "small number of clusters" situation. I was trying to estimate a treatment effect for a sample of a few thousand observations, which were grouped into around 20 clusters. So I started to search for R packages that implement the wild cluster bootstrap, and found two implementations on CRAN: sandwich and clusterSEs. I opted for the sandwich package (because it's actually a really great package), fit my regression model via lm(), and started to bootstrap with sandwich's vcovBS() function.
So the bootstrap ran ... and I waited. Eventually, I left my office to get some coffee with a colleague, returned to my desk ... and the bootstrap was still running, so I waited even longer.
But while the bootstrap was running, I browsed the web and stumbled over the "Fast & Wild" paper by Roodman et al. (2019). The performance claimed in the paper seemed too good to be true: bootstrap inference with several thousand iterations, in a fraction of a second? The paper presents a Stata implementation of the fast algorithm, boottest, and that was a good enough reason for me to start up a Stata session and try it out.
And indeed, boottest is mind-blowingly fast: the bootstrap finished almost instantaneously. I was hooked: how was it possible that boottest was so damn fast?
Luckily, the "Fast & Wild" paper explains the algorithm powering boottest in great detail. Out of curiosity, I started to implement it in R, and the fwildclusterboot package is the result of this effort. Now, was it worth all the work? How much faster is the "fast algorithm" implemented in fwildclusterboot?
To compare fwildclusterboot's performance to sandwich's, I simulate a data set with \(N = 10{,}000\) observations and \(N_G = 42\) distinct clusters (42 is the magic number of clusters at which the economics profession has decided that large-N asymptotics fail, see Angrist & Pischke's "Mostly Harmless Econometrics", Chapter 8.2.3) and fit a regression model via lm().
library(fwildclusterboot)
library(sandwich)
library(lmtest)
library(bench)
# simulate data
seed <- 236723478
N <- 10000
data <- fwildclusterboot:::create_data(
  N = N,
  N_G1 = 42, icc1 = 0.1,
  N_G2 = 20, icc2 = 0.8,
  numb_fe1 = 10,
  numb_fe2 = 10,
  seed = seed,
  weights = 1:N
)
lm_fit <- lm(proposition_vote ~ treatment + as.factor(Q1_immigration) + as.factor(Q2_defense), data)
In the first experiment, the bootstrap will run for \(B = 9999\) iterations. For the estimation via vcovBS(), we will use 4 cores.
B <- 9999
# wild cluster bootstrap via sandwich::vcovBS
bench1 <-
bench::mark(
boot_slow = sandwich::vcovBS(lm_fit,
R = B,
cluster = ~ group_id1,
cores = 4),
iterations = 1
)
bench1
## # A tibble: 1 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 boot_slow 36.9s 36.9s 0.0271 12.7MB 0
vcovBS() finishes in around 37 seconds - that's not too bad, is it?
# wild cluster bootstrap via fwildclusterboot::boottest()
bench1f <-
bench::mark(boot_fast =
fwildclusterboot::boottest(lm_fit,
clustid = c("group_id1"),
B = B,
param = "treatment",
seed = 3,
nthreads = 1),
iterations = 25)
bench1f
## # A tibble: 1 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 boot_fast 73.3ms 81.7ms 9.48 98.7MB 26.2
While sandwich::vcovBS() takes almost 37 seconds, fwildclusterboot::boottest() runs in around a tenth of a second. Yes, really: a tenth of a second! That's a speed gain of a factor of 451! If you don't have 4 cores available, the performance difference gets even more extreme (e.g. if you only have one core, you have to multiply the 37 seconds by a number slightly smaller than 4).
How do vcovBS()'s and boottest()'s results compare?
summary(boot_fast)
## boottest.lm(object = lm_fit, clustid = c("group_id1"), param = "treatment",
## B = B, seed = 3, nthreads = 1)
##
## Hypothesis: 1*treatment = 0
## Observations: 10000
## Bootstr. Iter: 9999
## Bootstr. Type: rademacher
## Clustering: 1-way
## Confidence Sets: 95%
## Number of Clusters: 42
##
## term estimate statistic p.value conf.low conf.high
## 1 1*treatment = 0 0.002 0.516 0.605 -0.007 0.012
lmtest::coeftest(x = lm_fit, vcov = boot_slow)[2,]
## Estimate Std. Error t value Pr(>|t|)
## 0.002387792 0.004571759 0.522291836 0.601478745
lmtest::coefci(x = lm_fit, vcov = boot_slow)[2,]
## 2.5 % 97.5 %
## -0.006573777 0.011349362
Between the two implementations, the bootstrapped t-statistics, p-values and confidence intervals are almost identical. They are not exactly identical for two reasons: first, due to sampling uncertainty in the bootstrap; second, because vcovBS does not appear to apply any small-sample adjustments (at least I could not find anything related to small-sample adjustments in either the documentation or the source code).
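For reference, a widely used small-sample correction for cluster-robust variance matrices is the "CR1" factor \(\frac{G}{G-1} \cdot \frac{N-1}{N-K}\), where \(G\) is the number of clusters and \(K\) the number of estimated coefficients. A quick sketch of what that correction looks like - the helper function is mine, not part of any package:

```r
# Hypothetical helper: apply the common "CR1" small-sample correction
# to a cluster-robust variance matrix
cr1_adjust <- function(vcov_mat, G, N, K) {
  (G / (G - 1)) * ((N - 1) / (N - K)) * vcov_mat
}

# For the example above: G = 42 clusters, N = 10000 observations,
# and roughly K = 20 coefficients (intercept, treatment, factor dummies).
# The factor is about 1.026, i.e. standard errors grow by roughly 1.3%.
(42 / 41) * ((10000 - 1) / (10000 - 20))
```

With many observations and 42 clusters the correction is small, but with fewer clusters the \(G/(G-1)\) term alone can move p-values noticeably - which is why it matters whether an implementation applies it.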
The speed gains of fwildclusterboot scale well in the number of bootstrap iterations: for \(B = 99{,}999\) iterations, it finishes in around one second. For vcovBS, you can expect run-time to increase linearly in the number of bootstrap iterations: a ten-fold increase in iterations pushes run-time to around 370 seconds.
B <- 99999
bench2f <-
bench::mark(
boot_fast =
fwildclusterboot::boottest(lm_fit,
clustid = c("group_id1"),
B = B,
param = "treatment",
seed = 3,
nthreads = 1),
iterations = 10
)
bench2f
## # A tibble: 1 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 boot_fast 476ms 571ms 1.72 727MB 11.4
What happens if we increase the sample size to \(N = 100{,}000\)?
N <- 100000
data <- fwildclusterboot:::create_data(
  N = N,
  N_G1 = 50, icc1 = 0.1,
  N_G2 = 20, icc2 = 0.8,
  numb_fe1 = 10,
  numb_fe2 = 10,
  seed = seed,
  weights = 1:N
)
lm_fit <- lm(proposition_vote ~ treatment + as.factor(Q1_immigration) + as.factor(Q2_defense), data)
B <- 9999
# wild cluster bootstrap via sandwich::vcovBS
bench3 <- bench::mark(
boot_slow = sandwich::vcovBS(lm_fit,
R = B,
cluster = ~ group_id1,
cores = 4),
iterations = 1)
bench3
## # A tibble: 1 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 boot_slow 8.32m 8.32m 0.00200 31.2MB 0
More than 8 minutes pass before vcovBS() finishes. How does boottest() do in comparison?
# wild cluster bootstrap via fwildclusterboot::boottest()
bench3f <-
bench::mark(
boot_fast =
fwildclusterboot::boottest(lm_fit,
clustid = c("group_id1"),
B = B,
param = "treatment",
seed = 3,
nthreads = 1),
iterations = 5)
bench3f
## # A tibble: 1 x 6
## expression min median `itr/sec` mem_alloc `gc/sec`
## <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
## 1 boot_fast 310ms 333ms 2.68 308MB 10.7
With \(B = 9999\) iterations, boottest() runs for around 0.33 seconds, while vcovBS() only finishes after 499.36 seconds. fwildclusterboot::boottest() is 1499 times faster than sandwich::vcovBS()!
As a conclusion: if you face a "small number of clusters" problem and want to reduce your daily coffee consumption, you should consider using fwildclusterboot, Stata's boottest, or WildBootTests.jl, a novel Julia implementation of the "fast algorithm". If all of this seems like black magic to you and you want to learn more about the "fast algorithm", I cannot recommend the "Fast & Wild" paper highly enough.
Literature
- "Fast & Wild", Roodman et al. (2019), The Stata Journal
- "Bootstrap-Based Improvements for Inference with Clustered Errors", Cameron, Gelbach & Miller (2008), The Review of Economics and Statistics
- "Cluster-Robust Inference: A Guide to Empirical Practice", MacKinnon, Ørregaard Nielsen & Webb (2020), Working Paper
- "Mostly Harmless Econometrics", Angrist & Pischke (2009), Princeton University Press