Methodology
2026-03-06 · 12 min read

Regression Discontinuity in Stata: From Theory to Code

A practical guide to sharp and fuzzy RD designs in Stata using rdrobust, with bandwidth selection, visualization, and validation checks.

Sytra Team
Research Engineering Team, Sytra AI

Regression discontinuity is one of the most credible causal inference designs available. When treatment is assigned by a threshold rule (you get a scholarship if your GPA is above 3.5; you’re eligible for a program if your income is below $40,000), the units just above and just below the cutoff are comparable, provided they cannot precisely control which side they land on. Near the cutoff, assignment is as good as random. It’s a natural experiment baked into the institution.

The canonical reference is Cattaneo, Idrobo, and Titiunik (2020). The rdrobust package, written by Calonico, Cattaneo, Farrell, and Titiunik, has become the standard in Stata. This guide walks through the theory and implementation, from sharp designs to fuzzy designs to validation.

Sharp RD: The Basic Case

In a sharp RD design, treatment is a deterministic function of the running variable. Everyone above the cutoff is treated; everyone below is not. There’s no fuzziness.

* Install the RD packages if needed
ssc install rdrobust, replace
ssc install rdlocrand, replace
ssc install rddensity, replace
* The plot option of rddensity also relies on lpdensity
ssc install lpdensity, replace
 
* Sharp RD — basic estimation
rdrobust y x, c(0)

Here, y is the outcome, x is the running variable, and c(0) is the cutoff (default is 0). rdrobust automatically selects the optimal bandwidth using the method of Calonico, Cattaneo, and Titiunik (2014), fits local polynomial regressions on either side of the cutoff, and reports bias-corrected confidence intervals.

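The output involves three key quantities. The default table shows the conventional and robust rows; add the all option to display all three:

  • Conventional estimate: The standard local polynomial estimate at the selected bandwidth. This is the usual point estimate to report.
  • Bias-corrected estimate: The conventional estimate minus an estimate of its leading smoothing bias. Robust inference is centered on this value.
  • Robust confidence interval: Centered on the bias-corrected estimate, with a standard error that accounts for the extra variability from estimating the bias. Report this interval for inference, as Cattaneo, Idrobo, and Titiunik (2020) recommend.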
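
To make the conventional estimate less of a black box, here is a minimal sketch that reproduces it by hand as a kernel-weighted regression. It assumes the same y and x as above; the bandwidth of 1 and the variable names above and w_tri are hypothetical choices for illustration. The two point estimates should agree, but the standard errors will not, because rdrobust uses its own variance estimator.

```stata
* Hypothetical fixed bandwidth, chosen only for illustration
local h = 1

* Indicator for being at or above the cutoff (guard against missing x)
generate byte above = (x >= 0) if !missing(x)

* Triangular kernel weights: 1 at the cutoff, falling linearly to 0 at distance h
generate double w_tri = max(0, 1 - abs(x) / `h') if !missing(x)

* Separate linear fits on each side; the coefficient on 1.above is the jump at x = 0
regress y i.above##c.x [aweight = w_tri] if abs(x) < `h', vce(robust)

* Same bandwidth, kernel, and polynomial order in rdrobust for comparison
rdrobust y x, c(0) h(`h') kernel(triangular) p(1)
```

Run the whole block in one go from a do-file so the local macro h stays defined. In real work, let rdrobust pick the bandwidth; the fixed value here only keeps the comparison transparent.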

Bandwidth Selection

The bandwidth determines how many observations near the cutoff are used in estimation. Too narrow and you lose power. Too wide and you introduce bias from observations far from the cutoff.

* Optimal bandwidth selection
rdbwselect y x, c(0) all

The all option reports bandwidths from multiple methods (MSE-optimal, CER-optimal, etc.). For the main specification, use the MSE-optimal (mserd). For robustness, show that results are stable across different bandwidth choices:

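* Sensitivity: rerun at 50%, 75%, 100%, 125%, and 150% of the MSE-optimal bandwidth
* Save the bandwidth first, because each rdrobust call overwrites e()
rdbwselect y x, c(0) bwselect(mserd)
local h_opt = e(h_mserd)
forvalues m = 50(25)150 {
    local bw = `h_opt' * `m' / 100
    rdrobust y x, c(0) h(`bw')
}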
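
Full rdrobust output for every bandwidth is hard to compare at a glance. The sketch below prints one line per bandwidth instead. The stored-result names (e(tau_cl), e(ci_l_rb), e(ci_r_rb), e(N_h_l), e(N_h_r)) are given as we recall them from the rdrobust help file; run ereturn list after rdrobust to confirm them on your installed version.

```stata
* Save the MSE-optimal bandwidth before rdrobust overwrites e()
quietly rdbwselect y x, c(0) bwselect(mserd)
local h_opt = e(h_mserd)

foreach m in 50 75 100 125 150 {
    local bw = `h_opt' * `m' / 100
    quietly rdrobust y x, c(0) h(`bw')
    * Effective sample size: observations inside the bandwidth on both sides
    local n_eff = e(N_h_l) + e(N_h_r)
    display %4.0f `m' "%  h = " %7.3f `bw' ///
        "  est = " %8.3f e(tau_cl) ///
        "  robust CI = [" %8.3f e(ci_l_rb) ", " %8.3f e(ci_r_rb) "]" ///
        "  N = " `n_eff'
}
```

A stable design shows point estimates of similar size across rows, with intervals that widen as the bandwidth shrinks.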

Visualization: The RD Plot

Every RD paper needs the RD plot. It’s both a visual argument and a diagnostic. Readers should see a clear discontinuity at the cutoff.

* Standard RD plot with confidence intervals
rdplot y x, c(0) ci(95) shade
 
* Custom bins and aesthetics
rdplot y x, c(0) nbins(20 20) binselect(es) ///
    graph_options(title("Treatment Effect at the Cutoff") ///
    xtitle("Running Variable") ytitle("Outcome"))
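
By default rdplot fits a global polynomial over the whole support, which can look quite different from the local fit that rdrobust actually estimates. A sketch of a plot restricted to the estimation window, assuming the same y and x and a local linear specification:

```stata
* Estimate once to obtain the MSE-optimal bandwidth
quietly rdrobust y x, c(0) kernel(triangular) p(1) bwselect(mserd)
local bw = e(h_l)

* Plot only observations inside the bandwidth, with the same linear fit
rdplot y x if abs(x) <= `bw', c(0) p(1) h(`bw') kernel(triangular) ///
    graph_options(title("RD Plot Within the Estimation Window"))
```

This version shows the data the estimate is built from, so the jump in the figure corresponds to the reported coefficient.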

Fuzzy RD

In practice, thresholds are often imperfectly enforced. A student just below the GPA cutoff might still get the scholarship through an appeal. A household just above the income threshold might still enroll in the program. When treatment probability jumps at the cutoff but doesn’t go from 0 to 1, you have a fuzzy RD.

Fuzzy RD is conceptually equivalent to instrumental variables: an indicator for being above the cutoff serves as the instrument for treatment received.

* Fuzzy RD — treatment is endogenous, cutoff instruments
rdrobust y x, c(0) fuzzy(treatment)

The fuzzy() option specifies the treatment variable. rdrobust estimates the local Wald ratio: the jump in the outcome divided by the jump in the treatment probability. The result is a local average treatment effect (LATE) for compliers at the cutoff.
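
The Wald ratio can be checked by hand. The sketch below estimates the reduced form and the first stage separately at the bandwidth chosen by the fuzzy run, then divides one by the other. The scalar names are hypothetical, and e(tau_cl) is assumed to hold the conventional point estimate in each run; confirm with ereturn list.

```stata
* Fuzzy estimate, and the bandwidth it selected
quietly rdrobust y x, c(0) fuzzy(treatment)
local h = e(h_l)
scalar tau_fuzzy = e(tau_cl)

* First stage: jump in treatment probability at the cutoff
quietly rdrobust treatment x, c(0) h(`h')
scalar jump_treat = e(tau_cl)

* Reduced form: jump in the outcome at the cutoff
quietly rdrobust y x, c(0) h(`h')
scalar jump_y = e(tau_cl)

display "Wald ratio by hand:     " %9.4f jump_y / jump_treat
display "rdrobust fuzzy estimate:" %9.4f tau_fuzzy
```

The two numbers should match up to rounding. A first-stage jump close to zero is a warning sign: the ratio becomes unstable, just as with a weak instrument.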

Validation Checks

An RD design is only credible if units can’t manipulate the running variable to sort themselves around the cutoff. Several diagnostic tests are standard:

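1. Density test (manipulation test)

* Test for manipulation of the running variable
rddensity x, c(0) plot

rddensity implements the local polynomial density test of Cattaneo, Jansson, and Ma, a successor to the original McCrary (2008) test. A discontinuity in the density of the running variable at the cutoff suggests sorting: units near the threshold may be manipulating their value of x to land on the preferred side. A significant test statistic here is a serious threat to identification.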
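
A plain histogram is a useful companion to the formal test, because bunching just on one side of the cutoff is often visible to the eye. A minimal sketch, where the window of 2 and the bin width of 0.1 are hypothetical values to be adapted to the scale of your running variable:

```stata
* Bins start at -2 with width 0.1, so a bin edge falls at the cutoff
* and no bin mixes observations from both sides
histogram x if abs(x) <= 2, width(0.1) start(-2) frequency ///
    xline(0) xtitle("Running Variable")
```

Aligning a bin edge with the cutoff matters: a bin that straddles the threshold would average away exactly the jump you are looking for.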

2. Covariate balance at the cutoff

* Check that pre-determined covariates are smooth at the cutoff
foreach var in age gender income_baseline {
    rdrobust `var' x, c(0)
}

Pre-determined variables (things that can’t be affected by treatment) should show no discontinuity at the cutoff. If they do, it suggests that the groups just above and just below the cutoff differ in ways that aren’t captured by the running variable.
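
With more than a few covariates, a one-line summary per variable is easier to read than the full output. A sketch using the same hypothetical covariate names as above; e(tau_cl) and e(pv_rb) are the stored-result names as we recall them from the rdrobust help file, so check them with ereturn list.

```stata
* One line per covariate: estimated jump at the cutoff and robust p-value
foreach var in age gender income_baseline {
    quietly rdrobust `var' x, c(0)
    display "`var'" _col(20) "jump = " %8.3f e(tau_cl) ///
        _col(42) "robust p = " %6.3f e(pv_rb)
}
```

With many covariates, expect an occasional small p-value by chance alone; a pattern of several imbalances is what should worry you.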

3. Placebo cutoffs

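* Test at false cutoffs: should find no effect
* Use only one side of the true cutoff so the real jump at 0 does not contaminate the estimate
foreach c in -2 -1 {
    rdrobust y x if x < 0, c(`c')
}
foreach c in 1 2 {
    rdrobust y x if x >= 0, c(`c')
}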

If you find treatment effects at cutoffs where no treatment actually occurs, your design has a problem. Significant effects at placebo cutoffs suggest a smooth relationship between the running variable and the outcome that’s being mistaken for a discontinuity.

How Sytra Handles RD

Tell Sytra: “Estimate a regression discontinuity design. The running variable is test_score, the cutoff is 80, and the treatment is scholarship.”

Sytra detects the design (sharp vs. fuzzy based on whether there’s perfect compliance), installs rdrobust if needed, estimates the effect, runs the density test, checks covariate balance, and produces the RD plot — all in one loop. If the McCrary test flags potential manipulation, Sytra tells you before reporting results.

Further Reading

  • Cattaneo, M. D., Idrobo, N., & Titiunik, R. (2020). A Practical Introduction to Regression Discontinuity Designs. Cambridge University Press.
  • Calonico, S., Cattaneo, M. D., & Titiunik, R. (2014). “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica, 82(6), 2295-2326.
  • McCrary, J. (2008). “Manipulation of the Running Variable in the Regression Discontinuity Design.” Journal of Econometrics, 142(2), 698-714.
#RDD #Stata #Causal Inference #Economics
