Fast Wild Cluster Bootstrapping in Python via wildboottest 🐍

Aleksandr Michuda and I have just released version 0.1 of wildboottest to PyPI.

wildboottest is a Python package for fast wild cluster bootstrap inference. It implements the wild cluster bootstrap following the algorithms sketched out in MacKinnon (2021) and MacKinnon, Nielsen & Webb (2022; MNW).
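
To make the idea concrete, here is a minimal NumPy sketch of the classic restricted wild cluster bootstrap with Rademacher weights and a CRV1 t-statistic (the variant labelled WCR11 in MNW). It is an illustration under my own simplifying assumptions, not the code that wildboottest runs: the function names crv1_tstat and wcr11_pvalue are hypothetical, the small-sample correction is just one common choice, and the plain Python loop is much slower than the fast algorithms the package implements.

```python
import numpy as np


def crv1_tstat(y, X, codes, j):
    """t-statistic for H0: beta_j = 0 with a CRV1 sandwich variance.

    `codes` holds integer cluster labels 0, ..., G-1 (hypothetical helper).
    """
    N, k = X.shape
    G = codes.max() + 1
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sum the scores x_i * u_i within each cluster -> array of shape (G, k)
    scores = np.zeros((G, k))
    np.add.at(scores, codes, X * resid[:, None])
    meat = scores.T @ scores
    # One common small-sample correction (an assumption of this sketch)
    correction = G / (G - 1) * (N - 1) / (N - k)
    V = correction * XtX_inv @ meat @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])


def wcr11_pvalue(y, X, cluster, j, B=999, seed=0):
    """Restricted wild cluster bootstrap p-value for H0: beta_j = 0."""
    rng = np.random.default_rng(seed)
    _, codes = np.unique(cluster, return_inverse=True)
    G = codes.max() + 1
    t_obs = crv1_tstat(y, X, codes, j)

    # Impose the null by dropping column j, then keep fitted values and residuals
    X_r = np.delete(X, j, axis=1)
    beta_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r

    t_boot = np.empty(B)
    for b in range(B):
        # One Rademacher draw per cluster, shared by all observations in it
        v = rng.choice([-1.0, 1.0], size=G)
        y_star = fitted_r + resid_r * v[codes]
        t_boot[b] = crv1_tstat(y_star, X, codes, j)

    # Symmetric two-sided p-value
    return t_obs, np.mean(np.abs(t_boot) >= np.abs(t_obs))


# Simulated data with cluster-level random effects; the tested coefficient is zero
rng = np.random.default_rng(42)
N, G = 500, 20
cluster = rng.integers(0, G, size=N)
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = 1.0 + 0.5 * X[:, 2] + rng.normal(size=G)[cluster] + rng.normal(size=N)

t_obs, p_val = wcr11_pvalue(y, X, cluster, j=1, B=999)
print(f"t = {t_obs:.3f}, bootstrap p-value = {p_val:.3f}")
```

The key design point is that the bootstrap weights are drawn once per cluster rather than once per observation, so the dependence within each cluster is preserved in every bootstrap sample.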

Most importantly, it supports all eight variants of the wild cluster bootstrap discussed in MNW as well as CRV3 inference via the cluster jackknife. Some of these new variants appear to perform even better than the “standard” (WCR11) wild cluster bootstrap in situations where the textbook CRV1 cluster robust variance estimator is known to struggle. And thanks to the excellent numba library, it is actually quite fast!
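
For readers who have not met CRV3 before, the cluster jackknife can be sketched in a few lines of NumPy. This is a rough illustration under my own assumptions rather than the package's implementation: crv3_jackknife is a hypothetical name, the variance is centred on the full-sample estimate (one common definition), and the brute-force loop re-estimates the model once per cluster.

```python
import numpy as np


def crv3_jackknife(y, X, cluster):
    """Cluster jackknife (CRV3-style) variance, centred on the full-sample estimate.

    Hypothetical helper; assumes every leave-one-cluster-out design has full rank.
    """
    _, codes = np.unique(cluster, return_inverse=True)
    G = codes.max() + 1
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

    deltas = np.empty((G, X.shape[1]))
    for g in range(G):
        keep = codes != g  # drop all observations of cluster g
        beta_g = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        deltas[g] = beta_g - beta_hat

    V = (G - 1) / G * deltas.T @ deltas
    return beta_hat, V


# Simulated data with cluster-level random effects
rng = np.random.default_rng(7)
N, G = 400, 20
cluster = np.repeat(np.arange(G), N // G)
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(size=G)[cluster] + rng.normal(size=N)

beta_hat, V3 = crv3_jackknife(y, X, cluster)
se3 = np.sqrt(np.diag(V3))
print("estimates:", beta_hat)
print("jackknife standard errors:", se3)
```

Because each leave-one-out estimate drops an entire cluster, the spread of these estimates reflects how much any single cluster drives the result.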

Figure 1: Rejection frequencies of different wild cluster bootstrap variants (figure from MNW, 2022; full citation below). The main takeaway is that the new bootstrap variants appear to perform really, really well!

In terms of functionality, wildboottest still lags behind its sister packages (Stata’s boottest, R’s fwildclusterboot and Julia’s WildBootTests.jl).

Features that are currently not (yet) supported:

  • The (non-clustered) wild bootstrap for OLS (Wu, 1986).
  • The subcluster bootstrap (MacKinnon and Webb 2018).
  • Confidence intervals formed by inverting the test and iteratively searching for bounds.
  • Multiway clustering.
  • Regression weights (weighted least squares / WLS).

You can install the package from PyPI by running

pip install wildboottest

Here’s a small example of how to use wildboottest:

from wildboottest.wildboottest import wildboottest
import statsmodels.api as sm
import numpy as np
import pandas as pd

# create data
np.random.seed(12312312)
N = 1000
k = 10
G = 25
X = np.random.normal(0, 1, N * k).reshape((N,k))
X = pd.DataFrame(X)
X.rename(columns = {0:"X1"}, inplace = True)
beta = np.random.normal(0,1,k)
beta[0] = 0.005
u = np.random.normal(0,1,N)
Y = 1 + X @ beta + u
cluster = np.random.choice(list(range(0,G)), N)

# estimation
model = sm.OLS(Y, X)

wildboottest(model, param = "X1", cluster = cluster, B = 9999, bootstrap_type = "11")
#   param              statistic   p-value
# 0    X1  [-1.0530803154504016]  0.308831

wildboottest(model, param = "X1", cluster = cluster, B = 9999, bootstrap_type = "31")
#   param              statistic   p-value
# 0    X1  [-1.0530803154504016]  0.307631

wildboottest(model, param = "X1", cluster = cluster, B = 9999, bootstrap_type = "33")
#   param              statistic   p-value
# 0    X1  [-1.0394791020434824]  0.294286

This was the first time I have worked on a Python package, and it has been quite a nice experience. After having used quite a bit of Python at work, I have now actually started to enjoy Python and object-oriented programming! The wild cluster bootstrap variants fit nicely into an OOP framework, and I am really impressed by the numba JIT compiler. Submitting to PyPI was a surprisingly smooth experience as well 😄.

What are the next steps for wildboottest? We need to close a few performance bottlenecks, in particular for the WCRx3 bootstrap types, and then I’d like to close the functionality gaps discussed above. I’d also like to allow users to call WildBootTests.jl, which is just blazing fast. Ideally, we’ll also make the package callable from statsmodels and linearmodels.

And no, despite having a lot of fun working on wildboottest and some recent trouble getting fwildclusterboot back onto CRAN, I don’t plan to stop developing in R 😄

References