Regression Discontinuity in Stata: From Theory to Code
A practical guide to sharp and fuzzy RD designs in Stata using rdrobust, with bandwidth selection, visualization, and validation checks.
Regression discontinuity is one of the most credible causal inference designs available. When treatment is assigned by a threshold rule (you get a scholarship if your GPA is above 3.5, you’re eligible for a program if your income is below $40,000), the units just above and just below the cutoff are as good as randomly assigned, provided they cannot precisely manipulate the running variable. It’s a natural experiment baked into the institution.
The canonical reference is Cattaneo, Idrobo, and Titiunik (2020). The rdrobust package, written by Calonico, Cattaneo, Farrell, and Titiunik, has become the standard in Stata. This guide walks through the theory and implementation, from sharp designs to fuzzy designs to validation.
Sharp RD: The Basic Case
In a sharp RD design, treatment is a deterministic function of the running variable. Everyone above the cutoff is treated; everyone below is not. There’s no fuzziness.
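A minimal sketch of the sharp case, meant to be run as a do-file. The data are simulated purely for illustration: the variable names y, x, and treated, the sample size, and the true jump of 1 are all assumptions, and rdrobust is assumed to be installed already.

```stata
* Simulated data for illustration only (names and values are assumptions)
clear
set seed 12345
set obs 2000
* Running variable, with the cutoff at 0
generate x = runiform(-1, 1)
* Sharp rule: treated if and only if x is at or above the cutoff
generate treated = (x >= 0)
* Outcome with a true jump of 1 at the cutoff
generate y = 0.5*x + 1*treated + rnormal(0, 0.5)

* Sharp RD estimate at cutoff 0 with the default bandwidth selector
rdrobust y x, c(0)

* Same estimate, printing the conventional, bias-corrected, and robust rows
rdrobust y x, c(0) all
```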
In the basic call rdrobust y x, c(0), y is the outcome, x is the running variable, and c(0) sets the cutoff (the default is 0). rdrobust automatically selects the MSE-optimal bandwidth using the method of Calonico, Cattaneo, and Titiunik (2014), fits local polynomial regressions on either side of the cutoff, and reports robust bias-corrected confidence intervals.
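With the all option, the output reports three rows:
- Conventional: The standard local polynomial point estimate with its conventional confidence interval. The interval ignores smoothing bias, so it tends to undercover at the MSE-optimal bandwidth.
- Bias-corrected: The point estimate after subtracting an estimate of the leading smoothing bias, still paired with the conventional standard error.
- Robust: The bias-corrected estimate with a standard error that accounts for the extra variability introduced by estimating the bias. This confidence interval is the one to use for inference.
Cattaneo, Idrobo, and Titiunik (2020) recommend reporting the conventional point estimate together with the robust confidence interval.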
Bandwidth Selection
The bandwidth determines how many observations near the cutoff are used in estimation. Too narrow and you lose power. Too wide and you introduce bias from observations far from the cutoff.
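To see what the data-driven selectors suggest before committing to one, rdbwselect (shipped with rdrobust) can be run on its own. This is a sketch on simulated data; the variable names and data-generating process are assumptions made for illustration.

```stata
* Simulated data for illustration only (names and values are assumptions)
clear
set seed 12345
set obs 2000
generate x = runiform(-1, 1)
generate y = 0.5*x + 1*(x >= 0) + rnormal(0, 0.5)

* Report the bandwidths chosen by every available selector
rdbwselect y x, c(0) all
```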
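The all option of rdbwselect reports bandwidths from multiple selectors (MSE-optimal, CER-optimal, and variants that allow a different bandwidth on each side of the cutoff). For the main specification, use the MSE-optimal bandwidth (mserd, the rdrobust default). For robustness, show that results are stable across different bandwidth choices.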
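One way to sketch such a robustness check is to re-estimate with the CER-optimal selector and then with a few manual bandwidths passed through h(). The data are simulated and the bandwidth grid (0.1, 0.2, 0.4) is an arbitrary assumption; in real work the grid would be chosen around the MSE-optimal value.

```stata
* Simulated data for illustration only (names and values are assumptions)
clear
set seed 12345
set obs 2000
generate x = runiform(-1, 1)
generate y = 0.5*x + 1*(x >= 0) + rnormal(0, 0.5)

* Main specification: MSE-optimal bandwidth
rdrobust y x, c(0) bwselect(mserd)

* Alternative data-driven choice: CER-optimal bandwidth
rdrobust y x, c(0) bwselect(cerrd)

* Manual bandwidths (the grid is arbitrary and only illustrative)
foreach h of numlist 0.1 0.2 0.4 {
    display as text "Bandwidth h = `h'"
    rdrobust y x, c(0) h(`h')
}
```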
Visualization: The RD Plot
Every RD paper needs the RD plot. It’s both a visual argument and a diagnostic. Readers should see a clear discontinuity at the cutoff.
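A minimal sketch using rdplot, which is part of the rdrobust package. It bins the data on each side of the cutoff and overlays a global polynomial fit. The data are simulated, and the titles passed through graph_options() are placeholders.

```stata
* Simulated data for illustration only (names and values are assumptions)
clear
set seed 12345
set obs 2000
generate x = runiform(-1, 1)
generate y = 0.5*x + 1*(x >= 0) + rnormal(0, 0.5)

* Binned scatter with polynomial fits on each side of the cutoff
rdplot y x, c(0) graph_options(title("RD plot") xtitle("Running variable") ytitle("Outcome"))
```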
Fuzzy RD
In practice, thresholds are often imperfectly enforced. A student just below the GPA cutoff might still get the scholarship through an appeal. A household just above the income threshold might still enroll in the program. When treatment probability jumps at the cutoff but doesn’t go from 0 to 1, you have a fuzzy RD.
Fuzzy RD is conceptually equivalent to instrumental variables: the cutoff instruments for treatment.
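A sketch of a fuzzy design on simulated data. The take-up probabilities (0.2 below the cutoff, 0.8 above), the treatment variable name d, and the constant effect of 1 are assumptions chosen only to produce imperfect compliance.

```stata
* Simulated data for illustration only (names and values are assumptions)
clear
set seed 12345
set obs 4000
generate x = runiform(-1, 1)
* Imperfect compliance: take-up probability jumps from 0.2 to 0.8 at the cutoff
generate d = runiform() < cond(x >= 0, 0.8, 0.2)
* Outcome depends on actual treatment received, not on eligibility
generate y = 0.5*x + 1*d + rnormal(0, 0.5)

* Fuzzy RD: the cutoff acts as an instrument for d
rdrobust y x, c(0) fuzzy(d)
```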
The fuzzy() option specifies the treatment variable. rdrobust estimates the local Wald ratio: the jump in the outcome divided by the jump in the treatment probability. The result is a local average treatment effect (LATE) for compliers at the cutoff.
Validation Checks
An RD design is only credible if units can’t manipulate the running variable to sort themselves around the cutoff. Several diagnostic tests are standard:
1. Density test (McCrary test)
If there’s a discontinuity in the density of the running variable at the cutoff, it suggests sorting. People near the threshold are gaming the system. A significant test statistic here is a serious threat to identification.
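In current Stata practice this check is often run with rddensity (Cattaneo, Jansson, and Ma), a local polynomial density test in the same spirit as McCrary's but not the same estimator. The sketch below assumes rddensity and its dependency lpdensity are installed. The running variable is simulated without any sorting, so the test should usually not reject here.

```stata
* Simulated running variable with no manipulation (illustration only)
clear
set seed 12345
set obs 2000
generate x = runiform(-1, 1)

* Test for a discontinuity in the density of x at the cutoff
rddensity x, c(0)
```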
2. Covariate balance at the cutoff
Pre-determined variables (things that can’t be affected by treatment) should show no discontinuity at the cutoff. If they do, it suggests that the groups just above and just below the cutoff differ in ways that aren’t captured by the running variable.
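A simple way to sketch this check is to rerun rdrobust with each pre-determined covariate in place of the outcome. The covariates age and female are hypothetical and simulated independently of the running variable, so no jump is built into the data.

```stata
* Simulated data for illustration only (names and values are assumptions)
clear
set seed 12345
set obs 2000
generate x = runiform(-1, 1)
* Hypothetical pre-determined covariates, unrelated to the cutoff by construction
generate age = rnormal(40, 10)
generate female = runiform() < 0.5

* Treat each covariate as the outcome and look for a jump at the cutoff
foreach v of varlist age female {
    display as text "Covariate: `v'"
    rdrobust `v' x, c(0)
}
```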
3. Placebo cutoffs
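If you find treatment effects at cutoffs where no treatment actually occurs, your design has a problem. Significant jumps at placebo cutoffs suggest that the outcome is not smooth in the running variable, or that the specification is mistaking curvature for a discontinuity. Run each placebo test using only observations on one side of the true cutoff, so the real treatment effect does not contaminate it.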
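A sketch of two placebo tests on simulated data. The placebo cutoffs of 0.5 and -0.5 are arbitrary assumptions. Each test keeps only observations on one side of the true cutoff at 0.

```stata
* Simulated data for illustration only (names and values are assumptions)
clear
set seed 12345
set obs 2000
generate x = runiform(-1, 1)
generate y = 0.5*x + 1*(x >= 0) + rnormal(0, 0.5)

* Placebo cutoff above the true cutoff, using treated observations only
rdrobust y x if x >= 0, c(0.5)

* Placebo cutoff below the true cutoff, using control observations only
rdrobust y x if x < 0, c(-0.5)
```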
How Sytra Handles RD
Tell Sytra: “Estimate a regression discontinuity design. The running variable is test_score, the cutoff is 80, and the treatment is scholarship.”
Sytra detects the design (sharp vs. fuzzy based on whether there’s perfect compliance), installs rdrobust if needed, estimates the effect, runs the density test, checks covariate balance, and produces the RD plot — all in one loop. If the McCrary test flags potential manipulation, Sytra tells you before reporting results.
Further Reading
- Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs. Cambridge University Press.
- Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica, 82(6), 2295-2326.
- McCrary, J. (2008). “Manipulation of the Running Variable in the Regression Discontinuity Design.” Journal of Econometrics, 142(2), 698-714.