Fast Wild Cluster Bootstrapping in Python via wildboottest 🐍
Aleksandr Michuda and I have just released version 0.1 of wildboottest to PyPI.
wildboottest is a Python package for fast wild cluster bootstrap inference. It implements the wild cluster bootstrap following the algorithms sketched out in MacKinnon (2021) and MacKinnon, Nielsen & Webb (2022, henceforth MNW). Most importantly, it supports all eight variants of the wild cluster bootstrap discussed in MNW as well as CRV3 inference via the cluster jackknife. Some of these new variants appear to perform even better than the “standard” (WCR11) wild cluster bootstrap in situations where the textbook CRV1 cluster-robust variance estimator is known to struggle. And thanks to the excellent numba library, it is actually quite fast!
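To give a rough sense of what the package computes, here is a deliberately naive sketch of the WCR11 bootstrap: the null is imposed when computing the residuals, and one Rademacher weight is drawn per cluster in every bootstrap iteration. The function names and the bare-bones CRV1 formula (no small-sample corrections) are mine, purely for illustration, and the inputs are plain numpy arrays; wildboottest itself implements the much faster algorithms from MacKinnon (2021) instead of an explicit loop over bootstrap draws.

```python
import numpy as np

def crv1_se(X, resid, cluster, XtX_inv, j):
    # CRV1 sandwich: sum over clusters of outer products of the score vectors
    # (no small-sample corrections, purely for illustration)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster):
        score = X[cluster == g].T @ resid[cluster == g]
        meat += np.outer(score, score)
    vcov = XtX_inv @ meat @ XtX_inv
    return np.sqrt(vcov[j, j])

def naive_wcr11_pvalue(X, y, cluster, j, B=999, seed=None):
    # Bootstrap p-value for H0: beta_j = 0 via a plain WCR11 loop
    rng = np.random.default_rng(seed)
    groups = np.unique(cluster)
    gid = np.searchsorted(groups, cluster)  # map each observation to its cluster

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    t_obs = beta_hat[j] / crv1_se(X, y - X @ beta_hat, cluster, XtX_inv, j)

    # re-estimate with the null beta_j = 0 imposed and keep the restricted residuals
    X_r = np.delete(X, j, axis=1)
    beta_r = np.linalg.solve(X_r.T @ X_r, X_r.T @ y)
    fitted_r, resid_r = X_r @ beta_r, y - X_r @ beta_r

    t_boot = np.empty(B)
    for b in range(B):
        v = rng.choice([-1.0, 1.0], size=len(groups))  # one Rademacher weight per cluster
        y_star = fitted_r + resid_r * v[gid]
        beta_b = XtX_inv @ (X.T @ y_star)
        t_boot[b] = beta_b[j] / crv1_se(X, y_star - X @ beta_b, cluster, XtX_inv, j)

    # share of bootstrap t-statistics at least as extreme as the observed one
    return np.mean(np.abs(t_boot) >= np.abs(t_obs))
```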
In terms of functionality, wildboottest still lags behind its sister packages (Stata’s boottest, R’s fwildclusterboot and Julia’s WildBootTests.jl). wildboottest supports:
- The wild cluster bootstrap for OLS (Cameron, Gelbach & Miller, 2008; Roodman et al., 2019).
- Multiple new versions of the wild cluster bootstrap as described in MacKinnon, Nielsen & Webb (2022), including the WCR13, WCR31, WCR33, WCU13, WCU31 and WCU33 variants.
- CRV1 and CRV3 robust variance estimation, including the CRV3-Jackknife as described in MacKinnon, Nielsen & Webb (2022).
Features that are currently not (yet) supported:
- The (non-clustered) wild bootstrap for OLS (Wu, 1986).
- The subcluster bootstrap (MacKinnon and Webb 2018).
- Confidence intervals formed by inverting the test and iteratively searching for bounds.
- Multiway clustering.
- Regression Weights (Weighted Least Squares / WLS).
You can install the package from PyPI by running
pip install wildboottest
Here’s a small example of how to use wildboottest:
from wildboottest.wildboottest import wildboottest
import statsmodels.api as sm
import numpy as np
import pandas as pd
# simulate data: N observations, k regressors, G clusters
np.random.seed(12312312)
N = 1000
k = 10
G = 25
X = np.random.normal(0, 1, N * k).reshape((N, k))
X = pd.DataFrame(X)
X.rename(columns={0: "X1"}, inplace=True)  # X1 is the regressor whose coefficient we test
beta = np.random.normal(0, 1, k)
beta[0] = 0.005                            # small true effect for X1
u = np.random.normal(0, 1, N)
Y = 1 + X @ beta + u
cluster = np.random.choice(list(range(0, G)), N)  # random cluster assignment
# estimation
model = sm.OLS(Y, X)
wildboottest(model, param = "X1", cluster = cluster, B = 9999, bootstrap_type = "11")
# param statistic p-value
# 0 X1 [-1.0530803154504016] 0.308831
wildboottest(model, param = "X1", cluster = cluster, B = 9999, bootstrap_type = "31")
# param statistic p-value
# 0 X1 [-1.0530803154504016] 0.307631
wildboottest(model, param = "X1", cluster = cluster, B = 9999, bootstrap_type = "33")
# param statistic p-value
# 0 X1 [-1.0394791020434824] 0.294286
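As a quick cross-check (not part of the original output above), the textbook asymptotic CRV1 p-value for the same hypothesis can be obtained from statsmodels’ built-in cluster-robust covariance, reusing the model and cluster objects from the example:

```python
# asymptotic CRV1 inference for the same model, no bootstrap
fit = model.fit(cov_type="cluster", cov_kwds={"groups": cluster})
print(fit.tvalues["X1"], fit.pvalues["X1"])
```

This is the estimator the new bootstrap variants are meant to improve upon when the number of clusters is small or cluster sizes are very unbalanced.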
This was the first time I have worked on a Python package, and it has been quite a nice experience. After having used quite a bit of Python at work, I have now actually started to enjoy Python and object-oriented programming! The wild cluster bootstrap variants really fit nicely into an OOP framework, and I am really impressed by the numba JIT compiler. Submitting to PyPI was a surprisingly smooth experience as well 😄.
What are the next steps for wildboottest? We need to close a few performance bottlenecks, in particular for the WCRx3 bootstrap types, and then I’d like to close the functionality gaps discussed above. I’d also like to allow users to call WildBootTests.jl, which is just blazing fast. And ideally, we’ll make the package callable from statsmodels and linearmodels.
And no, despite having a lot of fun working on wildboottest and some recent troubles getting fwildclusterboot back onto CRAN, I don’t plan to stop developing in R 😄
References
- MacKinnon (2021). Fast cluster bootstrap methods for linear regression models. Econometrics & Statistics.
- MacKinnon, Nielsen & Webb (2022). Fast and Reliable Jackknife and Bootstrap Methods for Cluster-Robust Inference. Queen’s University Working Paper No. 1485.