FastRerandomize Package Tutorial

2026-01-24

Introduction

This vignette demonstrates how to use the fastrerandomize package for generating and testing experimental designs in an agricultural study setting based on:

  • Burchardi, K.B., Gulesci, S., Lerva, B. and Sulaiman, M., 2019. Moral Hazard: Experimental Evidence from Tenancy Contracts. The Quarterly Journal of Economics, 134(1), pp.281-347.

We will walk through:

  • Generating synthetic covariate data
  • Creating candidate randomizations
  • Exploring the S3 methods for printing, summarizing, and plotting randomization objects
  • Conducting a randomization-based inference test

Key Concepts

  • Balance statistic M: Mahalanobis distance between treatment and control covariate mean vectors. Smaller M = better balance.
  • Acceptance rule: keep an assignment if M ≤ a for a chosen cutoff a.
  • Acceptance probability q: q = P(M ≤ a) under complete randomization. Smaller q = stricter balance.
  • Accepted set: assignments that pass the rule; inference resamples from this set.
  • Shrinkage factor: the multiplicative factor by which rerandomization shrinks the variance of covariate imbalance; smaller factors mean stronger shrinkage, and hence more precision when covariates predict outcomes.
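To make the balance statistic and acceptance rule concrete, here is a minimal base-R sketch on simulated data. This is illustrative only; fastrerandomize computes the statistic internally (optionally on accelerated hardware via JAX), and the function name below is our own:

```r
# Mahalanobis balance statistic M for one assignment W over covariates X.
# Illustrative sketch, not the package's internal implementation.
mahalanobis_balance <- function(X, W) {
  X  <- as.matrix(X)
  n  <- nrow(X); nT <- sum(W); nC <- n - nT
  d  <- colMeans(X[W == 1, , drop = FALSE]) -
        colMeans(X[W == 0, , drop = FALSE])   # difference in covariate means
  S_inv <- solve(cov(X))                      # requires invertible covariance
  as.numeric((nT * nC / n) * t(d) %*% S_inv %*% d)
}

# Acceptance rule M <= a, with a chosen so that P(M <= a) = q under
# complete randomization (chi-square approximation, df = ncol(X)).
set.seed(1)
X_demo <- matrix(rnorm(100 * 3), ncol = 3)
W_demo <- sample(rep(c(1, 0), each = 50))
M      <- mahalanobis_balance(X_demo, W_demo)
accept <- M <= qchisq(0.01, df = ncol(X_demo))   # q = 0.01
```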

Stringency refers to how selective the acceptance rule is, typically expressed by the acceptance probability q (lower q means more stringent). The shrinkage factor is the scalar reduction in covariate imbalance variance when conditioning on acceptance. In the paper's diagnostics, acceptance at q = 1% yields a shrinkage factor of about 0.45 at d = 30 but about 0.88 at d = 1000, so stringency has diminishing returns in very high-dimensional settings. In the same illustrative setup (nT = nC = 500, R² = 0.5), that corresponds to about 15% RMSE reduction at d = 30 versus about 3% at d = 1000.
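These shrinkage numbers follow from the standard chi-square approximation, under which the shrinkage factor is v_a = P(chi²_{d+2} ≤ a) / P(chi²_d ≤ a), with a the cutoff implied by q. A base-R sketch (not a package function) reproduces the figures quoted above:

```r
# Shrinkage factor under the chi-square approximation:
# v_a = P(chisq_{d+2} <= a) / P(chisq_d <= a), where a = qchisq(q, df = d).
shrinkage_factor <- function(q, d) {
  a <- qchisq(q, df = d)
  pchisq(a, df = d + 2) / pchisq(a, df = d)
}

shrinkage_factor(q = 0.01, d = 30)    # about 0.45
shrinkage_factor(q = 0.01, d = 1000)  # about 0.88: diminishing returns
```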

Note: This tutorial assumes you have installed the fastrerandomize package, either from source or via GitHub:


# If you haven't installed or set up the package:
# devtools::install_github("cjerzak/fastrerandomize-software", subdir = "fastrerandomize")

# (Done once) Optionally build the JAX backend if needed
# fastrerandomize::build_backend()

# note that this vignette can be found like so:
# vignette(package = "fastrerandomize")
          

1. Obtain Pre-treatment Covariate Data

Covariate prep checklist

  • Use pre-treatment covariates only—never condition on post-treatment variables
  • Standardize/scale when appropriate (especially if covariates have very different units)
  • Avoid severe collinearity (Mahalanobis distance relies on the covariance matrix being invertible)
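A quick way to screen for the last item is to inspect the rank and condition number of the covariate covariance matrix. The helper below is hypothetical (not part of fastrerandomize):

```r
# Check that the covariate covariance matrix is invertible (needed for
# the Mahalanobis distance) and report its condition number.
check_covariance <- function(X) {
  S <- cov(as.matrix(X))
  list(rank_ok          = qr(S)$rank == ncol(S),  # full rank?
       condition_number = kappa(S))   # large values signal near-collinearity
}

set.seed(2)
X_ok  <- matrix(rnorm(200 * 4), ncol = 4)
X_bad <- cbind(X_ok, X_ok[, 1])      # exact duplicate column
check_covariance(X_ok)$rank_ok       # TRUE
check_covariance(X_bad)$rank_ok      # FALSE
```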

For illustration, we use several pre-treatment covariates from the original experiment.


library(fastrerandomize)

# Obtain pre-treatment covariates 
data(QJEData, package = "fastrerandomize")
myCovariates <- c("children","married","hh_size","hh_sexrat")
QJEData <- QJEData[!is.na(rowSums(QJEData[,myCovariates])),]
X <- QJEData[,myCovariates]

# Analysis parameters
n_units   <- nrow(X)
n_treated <- round(nrow(X)/2)
          

2. Generate Randomizations

We now create our candidate randomizations using the function generate_randomizations() from fastrerandomize. Depending on the scale and complexity of your experiment, you may choose either exact enumeration ("exact") or Monte Carlo ("monte_carlo") randomization.

Choosing stringency (q) in practice

  • Start with a target acceptance probability (common: q = 0.01 for moderate, q = 0.001 for high stringency).
  • Use diagnose_rerandomization() to check implied balance improvement and computational implications.
  • Plan candidate draws for a sufficiently large accepted pool (aim for thousands of accepted assignments for stable inference).
  • Remember: smaller q → better balance, but more draws needed and diminishing returns in high dimensions.

In this example, we'll use the Monte Carlo approach. Note that randomization_accept_prob is set very low (0.0001) purely for demonstration.

In high-dimensional planning, diagnostics can recommend extremely small q values. For example, in the paper's illustrative case (d = 1000, nT = nC = 500, R² = 0.4, σ = 1, |τ| = 0.2), the diagnostic inversion implies q* ≈ 0.000012 (about 1.2e-5), or roughly 81,226 draws per accepted assignment. Holding q fixed, more draws mainly increase the accepted pool (lowering the minimum attainable p-value) without changing expected balance.
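The planning arithmetic is worth making explicit. With this vignette's own settings (max_draws = 100000, randomization_accept_prob = 1e-4), the expected accepted pool and the implied floor on the p-value are:

```r
# Expected accepted pool size and minimum attainable p-value
# (illustrative arithmetic, not a package function).
q         <- 1e-4   # randomization_accept_prob used below
max_draws <- 1e5    # Monte Carlo candidate draws used below
expected_accepted <- max_draws * q   # about 10 accepted assignments
min_p <- 1 / expected_accepted       # about 0.1, the smallest reachable p-value
```

With only about 10 accepted assignments, no p-value below 0.1 is attainable; real applications should target an accepted pool in the thousands.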


RunMainAnalysis <- (!is.null(check_jax_availability()))
# if FALSE, consider calling build_backend()

if(RunMainAnalysis){
  CandRandomizations <- generate_randomizations(
    n_units = n_units,
    n_treated = n_treated,
    X = X,
    randomization_type = "monte_carlo", 
    max_draws = 100000L,
    batch_size = 1000L,
    randomization_accept_prob = 0.0001
  )
}
          
## Using monte carlo randomization
## Starting batch processing with 100 batches.
## On batch 100 of 100
## MC Loop Time (s): 1.2409

3. S3 Methods: Print, Summary, and Plot

generate_randomizations() returns an S3 object of class fastrerandomize_randomizations. We can use its associated print, summary, and plot methods to inspect the candidate randomizations.


if(RunMainAnalysis){
  # 4a. Print the object
  print(CandRandomizations)

  # 4b. Summary
  summary(CandRandomizations)

  # 4c. Plot the balance distribution
  plot(CandRandomizations)
}
          
## Object of class 'fastrerandomize_randomizations'
## 
## Call:
##    generate_randomizations(n_units = n_units, n_treated = n_treated, 
##       X = X, randomization_accept_prob = 1e-04, max_draws = 100000L, 
##       batch_size = 1000L, randomization_type = "monte_carlo") 
## 
## Number of candidate randomizations: 10 
## Number of units (columns): 302 
## Balance statistics available for each randomization.
## Summary of 'fastrerandomize_randomizations' object:
## 
## Balance statistics (summary):
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.003574 0.008231 0.008746 0.009835 0.013152 0.017181

4. Randomization Test

Next, we showcase a randomization-based inference procedure using fastrerandomize.

Why the randomization test conditions on acceptance

Under rerandomization, treatment was assigned from the subset of assignments that passed the balance rule. To keep p-values calibrated, the randomization test must resample only from assignments that would also pass the same rule. This is what "design-respecting inference" means.

Randomization tests are conditioned on the accepted assignments: the observed statistic is compared to its distribution over the accepted pool. As a result, p-values are lower-bounded by 1 / M_accept, and optional fiducial intervals are available by inverting the test.

4a. Set Up Simulated Outcomes

We simulate an outcome Y that depends on our covariates X, the chosen randomization, and some true treatment effect τ. We pick the first acceptable randomization from our set as our “observed” assignment Wobs.


if(RunMainAnalysis){
  CoefY <- rnorm(ncol(X))          # Coefficients for outcome model
  Wobs  <- CandRandomizations$randomizations[1, ]  # 1st acceptable randomization
  tau_true <- 1                    # true treatment effect

  # Generate Y = X * Coef + W*tau + noise
  Yobs <- as.numeric(as.matrix(X) %*% as.matrix(CoefY) +
                     Wobs * tau_true + rnorm(n_units, sd = 0.1))
}
            

4b. Run the Randomization Test

We pass our observed treatment assignment (obsW), observed outcomes (obsY), and the full matrix of candidate randomizations (candidate_randomizations) to randomization_test(). This test:

  • Computes the observed difference in means
  • Recomputes that statistic under each accepted candidate assignment, holding outcomes fixed under the sharp null of no effect
  • Estimates the p-value by comparing the observed statistic to this randomization distribution
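The steps above can be sketched in base R as follows (an illustrative sketch of the logic; randomization_test() implements this with accelerated backends):

```r
# Design-respecting randomization test: compare the observed statistic to
# its distribution over the accepted assignments, holding outcomes fixed.
diff_in_means <- function(W, Y) mean(Y[W == 1]) - mean(Y[W == 0])

rand_test_sketch <- function(obsW, obsY, W_accepted) {
  tau_obs  <- diff_in_means(obsW, obsY)
  tau_null <- apply(W_accepted, 1, diff_in_means, Y = obsY)
  list(tau_obs = tau_obs,
       p_value = mean(abs(tau_null) >= abs(tau_obs)))  # two-sided p-value
}

# Demo on simulated data: 50 units, 50 accepted assignments
set.seed(3)
W_pool <- t(replicate(50, sample(rep(c(1, 0), each = 25))))
Y_demo <- rnorm(50)
res <- rand_test_sketch(W_pool[1, ], Y_demo, W_pool)
```

Because the observed assignment is itself in the accepted pool, the p-value is bounded below by one over the pool size.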

if(RunMainAnalysis){
  randomization_test_results <- randomization_test(
    obsW = Wobs,
    obsY = Yobs,
    candidate_randomizations = CandRandomizations$randomizations,
    findFI = TRUE
  )

  # Inspect results
  print(randomization_test_results)
  summary(randomization_test_results)
  plot(randomization_test_results)
}
            
## Object of class 'fastrerandomize_test'
## 
## Call:
##    randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations, 
##       findFI = TRUE) 
## 
## P-value:  0.1 
## Observed effect (tau_obs):  0.9888765 
## Fiducial interval: [-547.2352, 498.731]
## Summary of 'fastrerandomize_test' object:
## 
## Call:
##    randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations, 
##       findFI = TRUE) 
## 
## P-value:
##    0.1 
## 
## Observed effect (tau_obs):
##    0.9888765 
## 
## Fiducial Interval:
##   [-547.2352, 498.731]

The findFI = TRUE flag attempts to locate a fiducial interval (FI) for the treatment effect.
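Conceptually, the interval comes from inverting the test: for a hypothesized effect tau0, subtract it from treated outcomes and re-test; the interval collects the tau0 values that are not rejected. A self-contained sketch (illustrative helper, not the package's internal search):

```r
# p-value for the hypothesis tau = tau0, by removing the hypothesized
# effect and rerunning the design-respecting test (illustrative sketch).
p_at_tau0 <- function(tau0, obsW, obsY, W_accepted) {
  Y_adj    <- obsY - obsW * tau0   # outcomes adjusted for hypothesized effect
  stat     <- function(W, Y) mean(Y[W == 1]) - mean(Y[W == 0])
  tau_obs  <- stat(obsW, Y_adj)
  tau_null <- apply(W_accepted, 1, stat, Y = Y_adj)
  mean(abs(tau_null) >= abs(tau_obs))
}
# The fiducial interval is (approximately) the set of tau0 values with
# p_at_tau0(tau0, ...) above the chosen alpha level.
```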

5. What to Report

Recommended reporting checklist

  • Covariates used in the balance check and balance statistic (e.g., Mahalanobis M)
  • Acceptance rule (M ≤ a) and either a or the implied acceptance probability q
  • Number of candidate draws and the resulting accepted pool size
  • That inference was conducted conditional on the acceptance rule (design-respecting inference)

Transparent reporting helps readers understand how the design affects inference and enables replication.

Conclusion

This tutorial demonstrated a basic workflow with fastrerandomize:

  • Generating synthetic or real covariate data
  • Producing a set of acceptable treatment assignments via randomization
  • Visualizing and summarizing these assignments
  • Testing for treatment effects with a randomization-based approach

We encourage you to consult the package documentation for more advanced functionalities, including GPU-accelerated computations via JAX and flexible definitions of balance criteria.

Citation

Connor T. Jerzak, Rebecca Goldstein, Aniket Kamat, and Fucheng Warren Zhu. FastRerandomize: An R Package for Fast Rerandomization Using Accelerated Computing. SoftwareX, 2026.

@article{jerzak2025fastrerandomize,
  title={FastRerandomize: An R Package for Fast Rerandomization Using Accelerated Computing},
  author={Jerzak, Connor T. and Rebecca Goldstein and Aniket Kamat and Fucheng Warren Zhu},
  journal={SoftwareX},
  year={2026}
}