Methodology
2026-02-11 · 14 min read

Difference-in-Differences in Stata: A Complete Guide

A complete guide to difference-in-differences estimation in Stata — from basic 2x2 DiD to staggered adoption with Callaway-Sant'Anna. Includes code, diagnostics, and AI-assisted workflow.

Sytra Team
Research Engineering Team, Sytra AI

Difference-in-differences is the most commonly used causal inference method in applied economics. It’s also the one that has changed the most in the last five years. If you learned DiD from a pre-2020 textbook, most of what you learned about the estimator is either incomplete or wrong.

This guide covers everything: the classic 2×2 case, multi-period panel estimation with fixed effects, the staggered adoption problem, and the modern solutions from Callaway and Sant’Anna (2021), Sun and Abraham (2021), and others. With full Stata code at every step.

The Classic 2×2 DiD

The simplest DiD setup: one treatment group, one control group, one pre-period, one post-period. The treatment goes into effect at a single point in time for all treated units.

* Setup: Generate treatment and period indicators
gen treat = (group == "treatment")
gen post = (year > 2015)
 
* Estimate the DiD
reg y treat##post, vce(robust)

The interaction coefficient 1.treat#1.post is your DiD estimate — the differential change in y for the treated group relative to the control group. Stata’s factor variable notation (##) automatically creates the main effects and the interaction, so you don’t need to manually gen the interaction term.

How to read the output:

  • 1.treat — the baseline difference between treatment and control groups (pre-treatment)
  • 1.post — the time trend for the control group
  • 1.treat#1.post — the treatment effect (the DiD estimate)
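
To see what the regression is doing, you can compute the same number by hand from the four cell means. A minimal sketch, using the treat and post indicators defined above:

* The 2x2 DiD from four cell means — identical to the interaction coefficient
summarize y if treat == 1 & post == 1
scalar m11 = r(mean)
summarize y if treat == 1 & post == 0
scalar m10 = r(mean)
summarize y if treat == 0 & post == 1
scalar m01 = r(mean)
summarize y if treat == 0 & post == 0
scalar m00 = r(mean)
display "DiD = " (m11 - m10) - (m01 - m00)

The regression route is preferred in practice because it gives you a standard error for free.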

Panel DiD with Fixed Effects

The real world is messy. You usually have many units observed over many periods and a treatment that turns on at a specific time for some units. For this, you add unit and time fixed effects.

* Panel DiD with two-way fixed effects
reghdfe y treatment, absorb(unit year) vce(cluster unit)

Why reghdfe and not xtreg?

  • Speed: reghdfe uses an iterative demeaning algorithm that’s orders of magnitude faster for large datasets with high-dimensional FE.
  • Multiple FE: xtreg, fe can absorb one dimension. For two-way FE (unit + time), you’d need to include time dummies manually with i.year. reghdfe handles arbitrary dimensions with absorb().
  • Singleton dropping: reghdfe automatically drops singleton observations (groups with only one observation in a FE category). These don’t contribute to identification and can inflate your F-statistic. xtreg keeps them.
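
For a simple two-way FE model the two commands estimate the same coefficient. A sketch of both routes, assuming panel variables unit and year:

* Equivalent two-way FE specifications
xtset unit year
xtreg y treatment i.year, fe vce(cluster unit)     // unit FE via fe, year FE via dummies
reghdfe y treatment, absorb(unit year) vce(cluster unit)

Point estimates should match; standard errors can differ slightly because of reghdfe’s singleton dropping and degrees-of-freedom adjustments.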

Common mistake: forgetting to install reghdfe. It’s a community-contributed package:

ssc install reghdfe, replace
ssc install ftools, replace


Staggered Adoption: The Modern DiD Crisis

Here’s where things get interesting — and where most pre-2020 textbooks are wrong.

If treatment is adopted at different times by different units (e.g., states adopt a policy in different years), the standard TWFE (two-way fixed effects) estimator is biased. This was demonstrated by Goodman-Bacon (2021) in a paper that shook the field.

The problem: TWFE uses already-treated units as controls for later-treated units. If the treatment effect changes over time (dynamic treatment effects), the comparison is contaminated. The estimated coefficient is a weighted average of all possible 2×2 DiD comparisons — and some of those weights are negative. You can get a positive TWFE coefficient even when the treatment effect is negative for every single unit.
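
A small simulation makes the failure concrete. The sketch below is illustrative (variable names and parameters are made up, and reghdfe is assumed installed): it builds a panel with early and late adopters, no never-treated group, and a treatment effect that grows each year after adoption — exactly the setting where TWFE’s forbidden comparisons bite.

* Illustrative simulation: staggered adoption with dynamic effects
clear
set seed 42
set obs 50
gen unit = _n
gen first_treat = cond(unit <= 25, 2005, 2010)   // early vs. late adopters
expand 15
bysort unit: gen year = 1999 + _n                // panel runs 2000-2014
gen treatment = (year >= first_treat)
* the true effect is (years since adoption + 1): positive for every unit
gen y = rnormal() + cond(treatment, year - first_treat + 1, 0)
reghdfe y treatment, absorb(unit year) vce(cluster unit)

Here the TWFE coefficient understates the average effect, because late adopters are compared against early adopters whose effect is still growing.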

The Bacon Decomposition

The first diagnostic step is to decompose your TWFE estimate into its component comparisons:

* Bacon decomposition — see where the TWFE weights come from
ssc install bacondecomp, replace
xtset unit year
bacondecomp y treatment, ddetail

This shows you how much weight each comparison type gets: treated vs. never-treated, early vs. late treated, late vs. early treated. If the “late vs. early” comparisons have large negative weights, your TWFE estimate is unreliable.

Callaway and Sant’Anna (2021): The csdid Command

The solution is to use an estimator that only makes “clean” comparisons — newly-treated units vs. not-yet-treated (or never-treated) units. Callaway and Sant’Anna’s estimator does exactly this.

* Install the package
ssc install csdid, replace
ssc install drdid, replace
 
* Estimate group-time ATTs
csdid y, ivar(unit) time(year) gvar(first_treat) method(dripw)
 
* Aggregate into an event study and plot it
estat event
csdid_plot, xtitle("Periods relative to treatment")
 
* Or get the overall ATT
estat simple

Key arguments:

  • ivar(unit) — the panel identifier
  • time(year) — the time variable
  • gvar(first_treat) — the variable indicating when each unit was first treated (0 for never-treated)
  • method(dripw) — doubly-robust inverse probability weighting (the recommended default)
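
csdid does not build gvar for you. If all you have is a 0/1 treatment indicator, a common recipe (a sketch assuming variables unit, year, and treatment) is:

* Construct first_treat from a binary treatment indicator
bysort unit: egen first_treat = min(cond(treatment == 1, year, .))
replace first_treat = 0 if missing(first_treat)   // never-treated coded 0, as csdid expects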

Sun and Abraham (2021): eventstudyinteract

An alternative approach that’s particularly popular for event study designs:

* Sun and Abraham interaction-weighted estimator
ssc install eventstudyinteract, replace
ssc install avar, replace
 
* Relative-time indicators; the never-treated cohort has first_treat == 0
gen never_treat = (first_treat == 0)
gen rel_time = year - first_treat if !never_treat
tab rel_time, gen(rel_)
 
* Omit the reference period (rel_time == -1) to avoid collinearity
* — replace rel_5 with whichever dummy tab created for rel_time == -1 in your data
drop rel_5
 
* Run the estimator
eventstudyinteract y rel_*, cohort(first_treat) control_cohort(never_treat) absorb(unit year) vce(cluster unit)

Diagnostics and Validation

No DiD paper is complete without these checks:

The identifying assumption of DiD is that treated and control groups would have followed the same trend in the absence of treatment. You can’t test this directly (it’s a counterfactual), but you can check pre-treatment trends:

* Event study with pre-treatment coefficients
* Factor-variable operators reject negative values, so shift rel_time first
* (assumes event time runs from -5 to +5; adjust the shift to your window)
gen rel_shift = rel_time + 5
reghdfe y ib4.rel_shift, absorb(unit year) vce(cluster unit)
 
* Plot the event study
coefplot, keep(*.rel_shift) vertical yline(0) xline(4.5, lpattern(dash))

The pre-treatment coefficients (periods before the treatment) should be close to zero and statistically insignificant. If they’re not, your parallel trends assumption may be violated, and the DiD estimate is unreliable.

But as Roth (2022) warns: pre-testing has low power. The absence of a statistically significant pre-trend does not guarantee that parallel trends holds. Be cautious.

How Sytra Automates This

Tell Sytra: “Run a difference-in-differences with staggered adoption. Unit and year fixed effects. Cluster at the unit level. Show me the event study.”

Sytra selects the appropriate estimator (csdid for staggered adoption), generates the code, installs required packages if needed, runs the regression, produces the event study plot, and checks the pre-treatment coefficients for parallel trends. If the coefficients are significant, it flags the issue in the output.

The entire pipeline — from English prompt to validated event study — happens in one step. No copy-paste. No debugging. No reading documentation to figure out the difference between csdid and didregress.

Further Reading

  • Callaway, B., & Sant’Anna, P. H. (2021). “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics, 225(2), 200-230.
  • Goodman-Bacon, A. (2021). “Difference-in-Differences with Variation in Treatment Timing.” Econometrica, 89(5), 2381-2423.
  • Sun, L., & Abraham, S. (2021). “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics, 225(2), 175-199.
  • Roth, J. (2022). “Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends.” American Economic Review: Insights, 4(3), 305-322.
Tags: Diff-in-Diff · Stata · Causal Inference · Economics
