FastRerandomize Package Tutorial
2026-01-24
Introduction
This vignette demonstrates how to use the fastrerandomize package for generating and testing experimental designs in an agricultural study setting based on:
- Burchardi, K.B., Gulesci, S., Lerva, B. and Sulaiman, M., 2019. Moral Hazard: Experimental Evidence from Tenancy Contracts. The Quarterly Journal of Economics, 134(1), pp.281-347.
We will walk through:
- Generating synthetic covariate data
- Creating candidate randomizations
- Exploring the S3 methods for printing, summarizing, and plotting randomization objects
- Conducting a randomization-based inference test
Key Concepts
- Balance statistic M: Mahalanobis distance between treatment and control covariate mean vectors. Smaller M = better balance.
- Acceptance rule: keep an assignment if M ≤ a for a chosen cutoff a.
- Acceptance probability q: q = P(M ≤ a) under complete randomization. Smaller q = stricter balance.
- Accepted set: assignments that pass the rule; inference resamples from this set.
- Shrinkage factor: how much rerandomization reduces covariate imbalance variance; larger reduction yields more precision when covariates predict outcomes.
Stringency refers to how selective the acceptance rule is, typically expressed by the acceptance probability q (lower q means more stringent). The shrinkage factor is the scalar reduction in covariate imbalance variance when conditioning on acceptance. In the paper's diagnostics, acceptance at q = 1% yields a shrinkage factor of about 0.45 at d = 30 but about 0.88 at d = 1000, so stringency has diminishing returns in very high-dimensional settings. In the same illustrative setup (nT = nC = 500, R² = 0.5), that corresponds to about 15% RMSE reduction at d = 30 versus about 3% at d = 1000.
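These RMSE figures follow from simple arithmetic. A sketch, assuming the standard rerandomization variance decomposition in which the variance of the difference-in-means estimator shrinks by the factor 1 − R²(1 − s), where s is the shrinkage factor:

```r
# Approximate RMSE reduction implied by shrinkage factor s and outcome R^2,
# assuming Var_rerand / Var_complete = 1 - R^2 * (1 - s) for the
# difference-in-means estimator (standard rerandomization decomposition).
rmse_reduction <- function(s, R2) 1 - sqrt(1 - R2 * (1 - s))

rmse_reduction(s = 0.45, R2 = 0.5)  # ~0.149: about 15% RMSE reduction at d = 30
rmse_reduction(s = 0.88, R2 = 0.5)  # ~0.030: about 3% RMSE reduction at d = 1000
```

This reproduces the 15% versus 3% comparison above and makes the diminishing returns concrete: as s approaches 1, almost no imbalance variance is removed, so extra stringency buys little precision.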
Note: This tutorial assumes you have installed the fastrerandomize package, either from source or via GitHub:
# If you haven't installed or set up the package:
# devtools::install_github("cjerzak/fastrerandomize-software", subdir = "fastrerandomize")
# (Done once) Optionally build the JAX backend if needed
# fastrerandomize::build_backend()
# note that this vignette can be found like so:
# vignette(package = "fastrerandomize")
1. Obtain Pre-treatment Covariate Data
Covariate prep checklist
- Use pre-treatment covariates only—never condition on post-treatment variables
- Standardize/scale when appropriate (especially if covariates have very different units)
- Avoid severe collinearity (Mahalanobis distance relies on the covariance matrix being invertible)
For illustration, we use several pre-treatment covariates from the original experiment.
library(fastrerandomize)
# Obtain pre-treatment covariates
data(QJEData, package = "fastrerandomize")
myCovariates <- c("children","married","hh_size","hh_sexrat")
QJEData <- QJEData[!is.na(rowSums(QJEData[,myCovariates])),]
X <- QJEData[,myCovariates]
# Analysis parameters
n_units <- nrow(X)
n_treated <- round(nrow(X)/2)
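For intuition, the balance statistic M for a single candidate assignment can be written out in base R. This is an illustrative sketch of the Mahalanobis balance measure, not the package's accelerated implementation:

```r
# Mahalanobis balance statistic for one assignment vector w (1 = treated):
# M = (n_T * n_C / n) * (xbar_T - xbar_C)' S^{-1} (xbar_T - xbar_C),
# where S is the sample covariance matrix of the covariates.
balance_M <- function(w, X) {
  X <- as.matrix(X)
  diff_means <- colMeans(X[w == 1, , drop = FALSE]) -
                colMeans(X[w == 0, , drop = FALSE])
  S_inv <- solve(cov(X))  # requires a non-singular covariance matrix
  n1 <- sum(w); n0 <- sum(1 - w); n <- n1 + n0
  as.numeric((n1 * n0 / n) * t(diff_means) %*% S_inv %*% diff_means)
}

# Example: balance of one completely random assignment
w_example <- sample(rep(c(1, 0), c(n_treated, n_units - n_treated)))
balance_M(w_example, X)
```

Note the `solve(cov(X))` step: this is why the checklist above warns against severe collinearity among covariates.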
2. Generate Randomizations
We now create our candidate randomizations using the function
generate_randomizations() from fastrerandomize. Depending
on the scale and complexity of your experiment, you may choose either
exact enumeration ("exact") or Monte Carlo ("monte_carlo")
randomization.
Choosing stringency (q) in practice
- Start with a target acceptance probability (common: q = 0.01 for moderate, q = 0.001 for high stringency).
- Use diagnose_rerandomization() to check implied balance improvement and computational implications.
- Plan candidate draws for a sufficiently large accepted pool (aim for thousands of accepted assignments for stable inference).
- Remember: smaller q → better balance, but more draws needed and diminishing returns in high dimensions.
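The planning rules above reduce to simple arithmetic. A sketch, under the assumption that each Monte Carlo draw is accepted independently with probability q:

```r
# Planning sketch: expected accepted pool size and minimum attainable p-value,
# assuming each candidate draw passes the balance rule with probability q.
q         <- 1e-4   # target acceptance probability
max_draws <- 1e5    # number of candidate draws

expected_accepted <- q * max_draws        # expected size of the accepted pool
min_p_value       <- 1 / expected_accepted # p-values are bounded below by 1/M_accept

expected_accepted  # 10
min_p_value        # 0.1
```

With the demonstration settings used below (q = 0.0001, 100,000 draws), the expected accepted pool is only about 10 assignments, so the smallest attainable p-value is about 0.1. For real analyses, increase max_draws (or raise q) until thousands of assignments are accepted.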
In this example, we'll use the Monte Carlo approach. Note that
randomization_accept_prob is set very low (0.0001) purely
for demonstration.
In high-dimensional planning, diagnostics can recommend extremely small q values. For example, in the paper's illustrative case (d = 1000, nT = nC = 500, R² = 0.4, σ = 1, |τ| = 0.2), the diagnostic inversion implies q* ≈ 0.000012 (about 1.2e-5), or roughly 81,226 draws per accepted assignment. Holding q fixed, more draws mainly increase the accepted pool (lowering the minimum attainable p-value) without changing expected balance.
RunMainAnalysis <- (!is.null(check_jax_availability()))
# if FALSE, consider calling build_backend()
if(RunMainAnalysis){
CandRandomizations <- generate_randomizations(
n_units = n_units,
n_treated = n_treated,
X = X,
randomization_type = "monte_carlo",
max_draws = 100000L,
batch_size = 1000L,
randomization_accept_prob = 0.0001
)
}
## Using monte carlo randomization
## Starting batch processing with 100 batches.
## On batch 100 of 100
## MC Loop Time (s): 1.2409
3. S3 Methods: Print, Summary, and Plot
generate_randomizations() returns an S3 object of class
fastrerandomize_randomizations. We can use its associated
print, summary, and plot methods to inspect the candidate
randomizations.
if(RunMainAnalysis){
# 3a. Print the object
print(CandRandomizations)
# 3b. Summary
summary(CandRandomizations)
# 3c. Plot the balance distribution
plot(CandRandomizations)
}
## Object of class 'fastrerandomize_randomizations'
##
## Call:
## generate_randomizations(n_units = n_units, n_treated = n_treated,
## X = X, randomization_accept_prob = 1e-04, max_draws = 100000L,
## batch_size = 1000L, randomization_type = "monte_carlo")
##
## Number of candidate randomizations: 10
## Number of units (columns): 302
## Balance statistics available for each randomization.
## Summary of 'fastrerandomize_randomizations' object:
##
## Balance statistics (summary):
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.003574 0.008231 0.008746 0.009835 0.013152 0.017181
4. Randomization Test
Next, we showcase a randomization-based inference procedure using fastrerandomize.
Why the randomization test conditions on acceptance
Under rerandomization, treatment was assigned from the subset of assignments that passed the balance rule. To keep p-values calibrated, the randomization test must resample only from assignments that would also pass the same rule. This is what "design-respecting inference" means.
Randomization tests are conditioned on the accepted assignments: the observed statistic is compared to its distribution over the accepted pool. As a result, p-values are lower-bounded by 1 / M_accept, and optional fiducial intervals are available by inverting the test.
4a. Setup Simulated Outcomes
We simulate an outcome Y that depends on our covariates
X, the chosen randomization, and some true treatment
effect τ. We pick the first acceptable randomization from our set as
our “observed” assignment Wobs.
if(RunMainAnalysis){
CoefY <- rnorm(ncol(X)) # Coefficients for outcome model
Wobs <- CandRandomizations$randomizations[1, ] # 1st acceptable randomization
tau_true <- 1 # true treatment effect
# Generate Y = X * Coef + W*tau + noise
Yobs <- as.numeric(as.matrix(X) %*% as.matrix(CoefY) +
Wobs * tau_true + rnorm(n_units, sd = 0.1))
}
4b. Run the Randomization Test
We pass our observed treatment assignment (obsW),
observed outcomes (obsY), and the full matrix of
candidate randomizations
(candidate_randomizations) to
randomization_test(). This test:
- Computes the observed difference in means
- Permutes the treatment assignment across our candidate randomizations
- Estimates the p-value by comparing the observed statistic to its permutation distribution
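The steps above can be written out by hand. A minimal sketch of a design-respecting test (difference-in-means statistic, resampling over the accepted pool only), illustrating what randomization_test() automates:

```r
# Manual randomization test over the accepted pool (illustration only).
# tau_obs: observed difference in means; tau_perm: the same statistic
# recomputed under every accepted assignment in W_pool (one row per assignment).
manual_randomization_test <- function(obsW, obsY, W_pool) {
  diff_means <- function(w, y) mean(y[w == 1]) - mean(y[w == 0])
  tau_obs  <- diff_means(obsW, obsY)
  tau_perm <- apply(W_pool, 1, diff_means, y = obsY)
  # Two-sided p-value: share of accepted assignments at least as extreme
  p_value <- mean(abs(tau_perm) >= abs(tau_obs))
  list(tau_obs = tau_obs, p_value = p_value)
}
```

Because the comparison set contains only the accepted assignments (including the observed one), the p-value can never fall below 1 divided by the accepted pool size.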
if(RunMainAnalysis){
randomization_test_results <- randomization_test(
obsW = Wobs,
obsY = Yobs,
candidate_randomizations = CandRandomizations$randomizations,
findFI = TRUE
)
# Inspect results
print(randomization_test_results)
summary(randomization_test_results)
plot(randomization_test_results)
}
## Object of class 'fastrerandomize_test'
##
## Call:
## randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations,
## findFI = TRUE)
##
## P-value: 0.1
## Observed effect (tau_obs): 0.9888765
## Fiducial interval: [-547.2352, 498.731]
## Summary of 'fastrerandomize_test' object:
##
## Call:
## randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations,
## findFI = TRUE)
##
## P-value:
## 0.1
##
## Observed effect (tau_obs):
## 0.9888765
##
## Fiducial Interval:
## [-547.2352, 498.731]
The findFI = TRUE flag attempts to locate a fiducial
interval (FI) for the treatment effect.
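Conceptually, the fiducial interval comes from test inversion: a candidate effect τ₀ belongs to the interval if, after removing τ₀ from the treated outcomes, the randomization test fails to reject. A sketch of the membership check (not the package's actual search procedure):

```r
# Test-inversion sketch: tau0 is in the interval if subtracting it from
# treated outcomes makes the randomization test fail to reject at level alpha.
in_interval <- function(tau0, obsW, obsY, W_pool, alpha = 0.05) {
  Y_adj <- obsY - obsW * tau0  # outcomes with the hypothesized effect removed
  diff_means <- function(w, y) mean(y[w == 1]) - mean(y[w == 0])
  tau_obs  <- diff_means(obsW, Y_adj)
  tau_perm <- apply(W_pool, 1, diff_means, y = Y_adj)
  mean(abs(tau_perm) >= abs(tau_obs)) > alpha  # TRUE = not rejected
}
```

This also explains the extremely wide interval in the output above: with only 10 accepted assignments, the smallest attainable p-value is 0.1, which exceeds alpha = 0.05, so no τ₀ can be rejected and the interval is bounded only by the search range. A larger accepted pool yields informative intervals.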
5. What to Report
Recommended reporting checklist
- Covariates used in the balance check and balance statistic (e.g., Mahalanobis M)
- Acceptance rule (M ≤ a) and either a or the implied acceptance probability q
- Number of candidate draws and the resulting accepted pool size
- That inference was conducted conditional on the acceptance rule (design-respecting inference)
Transparent reporting helps readers understand how the design affects inference and enables replication.
Conclusion
This tutorial demonstrated a basic workflow with fastrerandomize:
- Generating synthetic or real covariate data
- Producing a set of acceptable treatment assignments via randomization
- Visualizing and summarizing these assignments
- Testing for treatment effects with a randomization-based approach
We encourage you to consult the package documentation for more advanced functionalities, including GPU-accelerated computations via JAX and flexible definitions of balance criteria.
Citation
Connor T. Jerzak, Rebecca Goldstein, Aniket Kamat, and Fucheng Warren Zhu. FastRerandomize: An R Package for Fast Rerandomization Using Accelerated Computing. SoftwareX, 2026.
@article{jerzak2025fastrerandomize,
title={FastRerandomize: An R Package for Fast Rerandomization Using Accelerated Computing},
  author={Jerzak, Connor T. and Goldstein, Rebecca and Kamat, Aniket and Zhu, Fucheng Warren},
journal={SoftwareX},
year={2026}
}