Methodology

2026-03-06 · 12 min read

Regression Discontinuity in Stata: From Theory to Code

A practical guide to sharp and fuzzy RD designs in Stata using rdrobust, with bandwidth selection, visualization, and validation checks.

Sytra Team

Research Engineering Team, Sytra AI

Regression discontinuity is one of the most credible causal inference designs available. When treatment is assigned by a threshold rule (you get a scholarship if your GPA is above 3.5; you’re eligible for a program if your income is below $40,000), units just above and just below the cutoff are, provided nothing else changes discontinuously at the threshold, as good as randomly assigned. It’s a natural experiment baked into the institution.

A standard practical reference is Cattaneo, Idrobo, and Titiunik (2020). The rdrobust package, developed by Calonico, Cattaneo, Farrell, and Titiunik, has become the standard implementation in Stata. This guide walks through the theory and implementation, from sharp designs to fuzzy designs to validation.

Sharp RD: The Basic Case

In a sharp RD design, treatment is a deterministic function of the running variable. Everyone above the cutoff is treated; everyone below is not. There’s no fuzziness.

* Install rdrobust and its companion packages if needed
ssc install rdrobust, replace
ssc install rdlocrand, replace
ssc install rddensity, replace

* Sharp RD: basic estimation
rdrobust y x, c(0)

Here, y is the outcome, x is the running variable, and c(0) sets the cutoff (0 is also the default). By default, rdrobust selects an MSE-optimal bandwidth, fits local linear regressions with a triangular kernel on either side of the cutoff, and reports robust bias-corrected confidence intervals following Calonico, Cattaneo, and Titiunik (2014).
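To make the local polynomial step concrete, the conventional point estimate can be reproduced by hand with a kernel-weighted regression. The sketch below is an illustration on simulated data, not a description of rdrobust's internals. The data-generating process, the true jump of 2, and the bandwidth of 0.3 are all assumptions chosen for the example.

```stata
* Simulated sharp RD (hypothetical data; true jump at the cutoff is 2)
clear
set seed 12345
set obs 2000
* Running variable with the cutoff at 0
gen x = runiform(-1, 1)
* Sharp assignment rule: treated if and only if x >= 0
gen byte treated = (x >= 0)
gen y = 1 + 0.5*x + 2*treated + rnormal()

* Hand-picked bandwidth, for illustration only (not data-driven)
local h = 0.3
* Triangular kernel weights: 1 at the cutoff, falling to 0 at distance h
gen w = max(0, 1 - abs(x)/`h')

* Local linear fit with separate slopes on each side of the cutoff
regress y i.treated##c.x [aw = w] if abs(x) < `h', vce(robust)

* The coefficient on 1.treated is the estimated jump at x = 0.
* Compare with the conventional estimate at the same bandwidth:
rdrobust y x, c(0) h(`h') p(1) kernel(triangular)
```

The two point estimates should agree, since both are differences in kernel-weighted intercepts at the cutoff. The standard errors will generally differ, because rdrobust uses its own variance estimator and adds the bias correction.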

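With the all option, the output reports three rows (the default output omits the bias-corrected row):

Conventional: The standard local polynomial point estimate with conventional standard errors.
Bias-corrected: The point estimate after subtracting an estimate of the leading bias, still with conventional standard errors.
Robust: The bias-corrected estimate with standard errors that account for the extra variability from estimating the bias. Cattaneo, Idrobo, and Titiunik (2020) recommend reporting the conventional point estimate together with this robust confidence interval.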
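For tables or further scripting, the same quantities can be read from e() after estimation. The following is a minimal sketch on simulated data (the data-generating process is hypothetical). The stored-result names are given as I understand them from the rdrobust help file, so treat them as assumptions and confirm against ereturn list on your installed version.

```stata
* Hypothetical simulated data so the snippet runs on its own
clear
set seed 12345
set obs 2000
gen x = runiform(-1, 1)
gen y = 1 + 0.5*x + 2*(x >= 0) + rnormal()

quietly rdrobust y x, c(0)

* Stored-result names as listed in the rdrobust help file;
* confirm on your installed version with: ereturn list
display "Conventional estimate:     " %9.3f e(tau_cl)
display "Bias-corrected estimate:   " %9.3f e(tau_bc)
display "Robust CI:                 [" %9.3f e(ci_l_rb) ", " %9.3f e(ci_r_rb) "]"
display "Bandwidth (left, right):   " %9.3f e(h_l) %9.3f e(h_r)
display "Effective N (left, right): " %9.0f e(N_h_l) %9.0f e(N_h_r)
```

Reporting the bandwidth and the effective number of observations on each side alongside the estimate lets readers judge how local the comparison really is.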

Bandwidth Selection

The bandwidth determines how many observations near the cutoff are used in estimation. Too narrow and you lose power. Too wide and you introduce bias from observations far from the cutoff.

* Optimal bandwidth selection
rdbwselect y x, c(0) all

The all option reports bandwidths from multiple methods (MSE-optimal, CER-optimal, etc.). For the main specification, use the MSE-optimal bandwidth (mserd), which is also the rdrobust default. For robustness, show that results are stable across different bandwidth choices:

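* Sensitivity: scale the MSE-optimal bandwidth to 50%, 75%, 100%, 125%, 150%
* Save the bandwidth first, because each rdrobust call overwrites e()
quietly rdbwselect y x, c(0) all
local h_opt = e(h_mserd)
forvalues m = 50(25)150 {
    local bw = `h_opt' * `m' / 100
    rdrobust y x, c(0) h(`bw')
}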

Visualization: The RD Plot

Every RD paper needs the RD plot. It’s both a visual argument and a diagnostic. Readers should see a clear discontinuity at the cutoff.

* Standard RD plot with confidence intervals
rdplot y x, c(0) ci(95) shade

* Custom bins and aesthetics
rdplot y x, c(0) nbins(20 20) binselect(es) ///
    graph_options(title("Treatment Effect at the Cutoff") ///
    xtitle("Running Variable") ytitle("Outcome"))

Fuzzy RD

In practice, thresholds are often imperfectly enforced. A student just below the GPA cutoff might still get the scholarship through an appeal. A household just above the income threshold might still enroll in the program. When treatment probability jumps at the cutoff but doesn’t go from 0 to 1, you have a fuzzy RD.

Fuzzy RD is conceptually equivalent to instrumental variables: an indicator for crossing the cutoff serves as the instrument for treatment.

* Fuzzy RD: treatment take-up is endogenous; crossing the cutoff is the instrument
rdrobust y x, c(0) fuzzy(treatment)

The fuzzy() option specifies the treatment variable. rdrobust estimates the local Wald ratio: the jump in the outcome divided by the jump in the treatment probability. The result is a local average treatment effect (LATE) for compliers at the cutoff.
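The Wald ratio can be checked by hand, which also makes the instrumental-variables interpretation explicit. This is a sketch on simulated data under stated assumptions: the take-up probabilities (0.2 below the cutoff, 0.8 above), the true effect of 2, the bandwidth of 0.3, and the variable names above, x_above, and w are all hypothetical choices for the example.

```stata
* Simulated fuzzy RD (hypothetical data): crossing the cutoff raises the
* probability of treatment from 0.2 to 0.8; the true effect of treatment is 2
clear
set seed 12345
set obs 4000
gen x = runiform(-1, 1)
gen byte above = (x >= 0)
gen byte treatment = (runiform() < 0.2 + 0.6*above)
gen y = 1 + 0.5*x + 2*treatment + rnormal()

* First stage: the jump in treatment probability at the cutoff
rdrobust treatment x, c(0)

* Reduced form: the jump in the outcome at the cutoff
rdrobust y x, c(0)

* Wald ratio by hand: kernel-weighted 2SLS with above as the instrument
* Hand-picked bandwidth, for illustration only
local h = 0.3
gen w = max(0, 1 - abs(x)/`h')
gen x_above = x*above
ivregress 2sls y x x_above (treatment = above) [aw = w] if abs(x) < `h', vce(robust)

* Compare with the conventional fuzzy estimate at the same bandwidth
rdrobust y x, c(0) fuzzy(treatment) h(`h') p(1) kernel(triangular)
```

Because the model is exactly identified, the 2SLS coefficient on treatment equals the reduced-form jump divided by the first-stage jump, so the point estimate should line up with the conventional fuzzy estimate from rdrobust at the same bandwidth. A first-stage jump close to zero is a warning sign: the ratio becomes unstable, just as with a weak instrument.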

Validation Checks

An RD design is only credible if units can’t manipulate the running variable to sort themselves around the cutoff. Several diagnostic tests are standard:

1. Density test (McCrary-style manipulation test)

* Test for manipulation of the running variable
rddensity x, c(0) plot

If there’s a discontinuity in the density of the running variable at the cutoff, it suggests sorting: people near the threshold are gaming the system. rddensity implements the local polynomial density test of Cattaneo, Jansson, and Ma, a successor to the original McCrary (2008) test. A significant test statistic here is a serious threat to identification.

2. Covariate balance at the cutoff

* Check that pre-determined covariates are smooth at the cutoff
foreach var in age gender income_baseline {
    rdrobust `var' x, c(0)
}

Pre-determined variables (things that can’t be affected by treatment) should show no discontinuity at the cutoff. If they do, it suggests that the groups just above and just below the cutoff differ in ways that aren’t captured by the running variable.

3. Placebo cutoffs

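* Test at false cutoffs: should find no effect
* Use only one side of the true cutoff (0) so the real jump cannot contaminate the placebo fit
foreach c in -2 -1 {
    rdrobust y x if x < 0, c(`c')
}
foreach c in 1 2 {
    rdrobust y x if x >= 0, c(`c')
}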

If you find treatment effects at cutoffs where no treatment actually occurs, your design has a problem. Significant effects at placebo cutoffs suggest a smooth relationship between the running variable and the outcome that’s being mistaken for a discontinuity.

How Sytra Handles RD

Tell Sytra: “Estimate a regression discontinuity design. The running variable is test_score, the cutoff is 80, and the treatment is scholarship.”

Sytra detects the design (sharp vs. fuzzy based on whether there’s perfect compliance), installs rdrobust if needed, estimates the effect, runs the density test, checks covariate balance, and produces the RD plot, all in one loop. If the density test flags potential manipulation, Sytra tells you before reporting results.
