FastRerandomize Package Tutorial

2025-01-22

Introduction

This vignette demonstrates how to use the fastrerandomize package for generating and testing experimental designs in an agricultural study setting based on:

  • Burchardi, K.B., Gulesci, S., Lerva, B. and Sulaiman, M., 2019. Moral hazard: Experimental Evidence from Tenancy Contracts. The Quarterly Journal of Economics, 134(1), pp.281-347.

We will walk through:

  • Generating synthetic covariate data
  • Creating candidate randomizations
  • Exploring the S3 methods for printing, summarizing, and plotting randomization objects
  • Conducting a randomization-based inference test

Note: This tutorial assumes you have installed the fastrerandomize package, either from source or via GitHub:


# If you haven't installed or set up the package:
# devtools::install_github("cjerzak/fastrerandomize-software/fastrerandomize")

# (Done once) Optionally build the JAX backend if needed
# fastrerandomize::build_backend()

# note that this vignette can be found like so:
# vignette(package = "fastrerandomize")
          

1. Obtain Pre-treatment Covariate Data

For illustration, we use several pre-treatment covariates from the original experiment.


library(fastrerandomize)

# Obtain pre-treatment covariates 
data(QJEData, package = "fastrerandomize")
myCovariates <- c("children","married","hh_size","hh_sexrat")
QJEData <- QJEData[!is.na(rowSums(QJEData[,myCovariates])),]
X <- QJEData[,myCovariates]

# Analysis parameters
n_units   <- nrow(X)
n_treated <- round(nrow(X)/2)
          

2. Generate Randomizations

We now create our candidate randomizations using the function generate_randomizations() from fastrerandomize. Depending on the scale and complexity of your experiment, you may choose either exact enumeration (“exact”) or Monte Carlo (“monte_carlo”) randomization.

In this example, we’ll use the Monte Carlo approach. Note that randomization_accept_prob is set very low (0.0001) purely for demonstration.


RunMainAnalysis <- (!is.null(check_jax_availability()))
# if FALSE, consider calling build_backend()

if(RunMainAnalysis){
  CandRandomizations <- generate_randomizations(
    n_units = n_units,
    n_treated = n_treated,
    X = X,
    randomization_type = "monte_carlo", 
    max_draws = 100000L,
    batch_size = 1000L,
    randomization_accept_prob = 0.0001
  )
}
          
## Using monte carlo randomization
## Starting batch processing with 100 batches.
## Skipping to batch 100 of 100
## On batch 100 of 100
## MC Loop Time (s): 1.2409

4. S3 Methods: Print, Summary, and Plot

generate_randomizations() returns an S3 object of class fastrerandomize_randomizations. We can use its associated print, summary, and plot methods to inspect the candidate randomizations.


if(RunMainAnalysis){
  # 4a. Print the object
  print(CandRandomizations)

  # 4b. Summary
  summary(CandRandomizations)

  # 4c. Plot the balance distribution
  plot(CandRandomizations)
}
          
## Object of class 'fastrerandomize_randomizations'
## 
## Call:
##    generate_randomizations(n_units = n_units, n_treated = n_treated, 
##       X = X, randomization_accept_prob = 1e-04, max_draws = 100000L, 
##       batch_size = 1000L, randomization_type = "monte_carlo") 
## 
## Number of candidate randomizations: 10 
## Number of units (columns): 302 
## Balance statistics available for each randomization.
## Summary of 'fastrerandomize_randomizations' object:
## 
## Balance statistics (summary):
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.003574 0.008231 0.008746 0.009835 0.013152 0.017181

3. Randomization Test

Next, we showcase a randomization-based inference procedure using fastrerandomize.

3a. Setup Simulated Outcomes

We simulate an outcome Y that depends on our covariates X, the chosen randomization, and some true treatment effect τ. We pick the first acceptable randomization from our set as our “observed” assignment Wobs.


if(RunMainAnalysis){
  CoefY <- rnorm(ncol(X))          # Coefficients for outcome model
  Wobs  <- CandRandomizations$randomizations[1, ]  # 1st acceptable randomization
  tau_true <- 1                    # true treatment effect

  # Generate Y = X * Coef + W*tau + noise
  Yobs <- as.numeric(as.matrix(X) %*% as.matrix(CoefY) +
                     Wobs * tau_true + rnorm(n_units, sd = 0.1))
}
            

3b. Run the Randomization Test

We pass our observed treatment assignment (obsW), observed outcomes (obsY), and the full matrix of candidate randomizations (candidate_randomizations) to randomization_test(). This test:

  • Computes the observed difference in means
  • Permutes the treatment assignment across our candidate randomizations
  • Estimates the p-value by comparing the observed statistic to its permutation distribution

if(RunMainAnalysis){
  randomization_test_results <- randomization_test(
    obsW = Wobs,
    obsY = Yobs,
    candidate_randomizations = CandRandomizations$randomizations,
    findFI = TRUE
  )

  # Inspect results
  print(randomization_test_results)
  summary(randomization_test_results)
  plot(randomization_test_results)
}
            
## Object of class 'fastrerandomize_test'
## 
## Call:
##    randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations, 
##       findFI = TRUE) 
## 
## P-value:  0.1 
## Observed effect (tau_obs):  0.9888765 
## Fiducial interval: [-547.2352, 498.731]
## Summary of 'fastrerandomize_test' object:
## 
## Call:
##    randomization_test(obsW = Wobs, obsY = Yobs, candidate_randomizations = CandRandomizations$randomizations, 
##       findFI = TRUE) 
## 
## P-value:
##    0.1 
## 
## Observed effect (tau_obs):
##    0.9888765 
## 
## Fiducial Interval:
##   [-547.2352, 498.731]

The findFI = TRUE flag attempts to locate a fiducial interval (FI) for the treatment effect.

Conclusion

This tutorial demonstrated a basic workflow with fastrerandomize:

  • Generating synthetic or real covariate data
  • Producing a set of acceptable treatment assignments via randomization
  • Visualizing and summarizing these assignments
  • Testing for treatment effects with a randomization-based approach

We encourage you to consult the package documentation for more advanced functionalities, including GPU-accelerated computations via JAX and flexible definitions of balance criteria.