Methodology
2026-02-11 · 14 min read

Difference-in-Differences in Stata: A Complete Guide

A complete guide to difference-in-differences estimation in Stata — from basic 2x2 DiD to staggered adoption with Callaway-Sant'Anna. Includes code, diagnostics, and AI-assisted workflow.

Sytra Team
Research Engineering Team, Sytra AI

Difference-in-differences is the most commonly used causal inference method in applied economics. It’s also the one that has changed the most in the last five years. If you learned DiD from a pre-2020 textbook, most of what you learned about the estimator is either incomplete or wrong.

This guide covers everything: the classic 2×2 case, multi-period panel estimation with fixed effects, the staggered adoption problem, and the modern solutions from Callaway and Sant’Anna (2021), Sun and Abraham (2021), and others. With full Stata code at every step.

The Classic 2×2 DiD

The simplest DiD setup: one treatment group, one control group, one pre-period, one post-period. The treatment goes into effect at a single point in time for all treated units.

* Setup: Generate treatment and period indicators
gen treat = (group == "treatment")
gen post = (year > 2015)
 
* Estimate the DiD
reg y treat##post, vce(robust)

The interaction coefficient 1.treat#1.post is your DiD estimate — the differential change in y for the treated group relative to the control group. Stata’s factor variable notation (##) automatically creates the main effects and the interaction, so you don’t need to manually gen the interaction term.

How to read the output:

  • 1.treat — the baseline difference between treatment and control groups (pre-treatment)
  • 1.post — the time trend for the control group
  • 1.treat#1.post — the treatment effect (the DiD estimate)
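
To see what the regression is doing, you can compute the same number by hand from the four cell means. A minimal sketch, using the treat and post indicators defined above:

* The 2x2 DiD from four cell means — identical to the interaction coefficient
summarize y if treat == 1 & post == 1
scalar m11 = r(mean)
summarize y if treat == 1 & post == 0
scalar m10 = r(mean)
summarize y if treat == 0 & post == 1
scalar m01 = r(mean)
summarize y if treat == 0 & post == 0
scalar m00 = r(mean)
display "DiD = " (m11 - m10) - (m01 - m00)

The regression route is preferred in practice because it gives you a standard error for free.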

Panel DiD with Fixed Effects

The real world is messy. You usually have many units observed over many periods and a treatment that turns on at a specific time for some units. For this, you add unit and time fixed effects.

* Panel DiD with two-way fixed effects
reghdfe y treatment, absorb(unit year) vce(cluster unit)

Why reghdfe and not xtreg?

  • Speed: reghdfe uses an iterative demeaning algorithm that’s orders of magnitude faster for large datasets with high-dimensional FE.
  • Multiple FE: xtreg, fe can absorb one dimension. For two-way FE (unit + time), you’d need to include time dummies manually with i.year. reghdfe handles arbitrary dimensions with absorb().
  • Singleton dropping: reghdfe automatically drops singleton observations (groups with only one observation in a FE category). These don’t contribute to identification and can inflate your F-statistic. xtreg keeps them.
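
For a simple two-way FE model the two commands estimate the same coefficient. A sketch of both routes, assuming panel variables unit and year:

* Equivalent two-way FE specifications
xtset unit year
xtreg y treatment i.year, fe vce(cluster unit)     // unit FE via fe, year FE via dummies
reghdfe y treatment, absorb(unit year) vce(cluster unit)

Point estimates should match; standard errors can differ slightly because of reghdfe’s singleton dropping and degrees-of-freedom adjustments.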

Common mistake: forgetting to install reghdfe. It’s a community-contributed package:

ssc install reghdfe, replace
ssc install ftools, replace


Staggered Adoption: The Modern DiD Crisis

Here’s where things get interesting — and where most pre-2020 textbooks are wrong.

If treatment is adopted at different times by different units (e.g., states adopt a policy in different years), the standard TWFE (two-way fixed effects) estimator is biased. This was demonstrated by Goodman-Bacon (2021) in a paper that shook the field.

The problem: TWFE uses already-treated units as controls for later-treated units. If the treatment effect changes over time (dynamic treatment effects), the comparison is contaminated. The estimated coefficient is a weighted average of all possible 2×2 DiD comparisons — and some of those weights are negative. You can get a positive TWFE coefficient even when the treatment effect is negative for every single unit.
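
A small simulation makes the failure concrete. The sketch below is illustrative (variable names and parameters are made up, and reghdfe is assumed installed): it builds a panel with early and late adopters, no never-treated group, and a treatment effect that grows each year after adoption — exactly the setting where TWFE’s forbidden comparisons bite.

* Illustrative simulation: staggered adoption with dynamic effects
clear
set seed 42
set obs 50
gen unit = _n
gen first_treat = cond(unit <= 25, 2005, 2010)   // early vs. late adopters
expand 15
bysort unit: gen year = 1999 + _n                // panel runs 2000-2014
gen treatment = (year >= first_treat)
* the true effect is (years since adoption + 1): positive for every unit
gen y = rnormal() + cond(treatment, year - first_treat + 1, 0)
reghdfe y treatment, absorb(unit year) vce(cluster unit)

Here the TWFE coefficient understates the average effect, because late adopters are compared against early adopters whose effect is still growing.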

The Bacon Decomposition

The first diagnostic step is to decompose your TWFE estimate into its component comparisons:

* Bacon decomposition — see where the TWFE weights come from
ssc install bacondecomp, replace
xtset unit year
bacondecomp y treatment, ddetail

This shows you how much weight each comparison type gets: treated vs. never-treated, early vs. late treated, late vs. early treated. If the “late vs. early” comparisons have large negative weights, your TWFE estimate is unreliable.

Callaway and Sant’Anna (2021): The csdid Command

The solution is to use an estimator that only makes “clean” comparisons — newly-treated units vs. not-yet-treated (or never-treated) units. Callaway and Sant’Anna’s estimator does exactly this.

* Install the package
ssc install csdid, replace
ssc install drdid, replace
 
* Estimate group-time ATTs
csdid y, ivar(unit) time(year) gvar(first_treat) method(dripw)
 
* Aggregate into an event study and plot it
estat event
csdid_plot, xtitle("Periods relative to treatment")
 
* Or get the overall ATT
estat simple

Key arguments:

  • ivar(unit) — the panel identifier
  • time(year) — the time variable
  • gvar(first_treat) — the variable indicating when each unit was first treated (0 for never-treated)
  • method(dripw) — doubly-robust inverse probability weighting (the recommended default)
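
csdid does not build gvar for you. If all you have is a 0/1 treatment indicator, a common recipe (a sketch assuming variables unit, year, and treatment) is:

* Construct first_treat from a binary treatment indicator
bysort unit: egen first_treat = min(cond(treatment == 1, year, .))
replace first_treat = 0 if missing(first_treat)   // never-treated coded 0, as csdid expects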

Sun and Abraham (2021): eventstudyinteract

An alternative approach that’s particularly popular for event study designs:

* Sun and Abraham interaction-weighted estimator
ssc install eventstudyinteract, replace
ssc install avar, replace
 
* Relative-time indicators; the never-treated cohort has first_treat == 0
gen never_treat = (first_treat == 0)
gen rel_time = year - first_treat if !never_treat
tab rel_time, gen(rel_)
 
* Omit the reference period (rel_time == -1) to avoid collinearity
* — replace rel_5 with whichever dummy tab created for rel_time == -1 in your data
drop rel_5
 
* Run the estimator
eventstudyinteract y rel_*, cohort(first_treat) control_cohort(never_treat) absorb(unit year) vce(cluster unit)

Diagnostics and Validation

No DiD paper is complete without these checks:

The identifying assumption of DiD is that treated and control groups would have followed the same trend in the absence of treatment. You can’t test this directly (it’s a counterfactual), but you can check pre-treatment trends:

* Event study with pre-treatment coefficients
* Factor-variable operators reject negative values, so shift rel_time first
* (assumes event time runs from -5 to +5; adjust the shift to your window)
gen rel_shift = rel_time + 5
reghdfe y ib4.rel_shift, absorb(unit year) vce(cluster unit)
 
* Plot the event study
coefplot, keep(*.rel_shift) vertical yline(0) xline(4.5, lpattern(dash))

The pre-treatment coefficients (periods before the treatment) should be close to zero and statistically insignificant. If they’re not, your parallel trends assumption may be violated, and the DiD estimate is unreliable.

But as Roth (2022) warns: pre-testing has low power. The absence of a statistically significant pre-trend does not guarantee that parallel trends holds. Be cautious.

How Sytra Automates This

Tell Sytra: “Run a difference-in-differences with staggered adoption. Unit and year fixed effects. Cluster at the unit level. Show me the event study.”

Sytra selects the appropriate estimator (csdid for staggered adoption), generates the code, installs required packages if needed, runs the regression, produces the event study plot, and checks the pre-treatment coefficients for parallel trends. If the coefficients are significant, it flags the issue in the output.

The entire pipeline — from English prompt to validated event study — happens in one step. No copy-paste. No debugging. No reading documentation to figure out the difference between csdid and didregress.

Further Reading

  • Callaway, B., & Sant’Anna, P. H. (2021). “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, 225(2), 200-230.
  • Goodman-Bacon, A. (2021). “Difference-in-Differences with Variation in Treatment Timing.” Econometrica, 89(5), 2381-2423.
  • Sun, L., & Abraham, S. (2021). “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics, 225(2), 175-199.
  • Roth, J. (2022). “Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.” American Economic Review: Insights, 4(3), 305-322.
Tags: Diff-in-Diff · Stata · Causal Inference · Economics
