Workflow
2026-03-13 · 9 min read

The Reproducibility Crisis in Stata: What .do Files Aren't Solving

Do-files are necessary but not sufficient for reproducibility. Here's what's still breaking — and how AI-assisted, logged execution changes the equation.

Sytra Team
Research Engineering Team, Sytra AI

Every economist knows the mantra: “Put everything in a .do file.” It’s the first rule of reproducible research. Write your code in a script, run it from top to bottom, and anyone can replicate your results.

Except they can’t. And the data editor at the American Economic Association has the receipts.

What .do Files Don’t Solve

Lars Vilhuber, the Data Editor of the American Economic Association (which publishes the AER), published a report showing that roughly 30% of submitted replication packages fail to reproduce. Sometimes the code does not run at all; sometimes it runs but doesn’t produce the published numbers. Why?

  • Path dependencies: The .do file uses cd "C:\Users\John\Dropbox\Paper\". Unless you’re John, it doesn’t run.
  • Undocumented packages: The analysis requires reghdfe, estout, and ftools — but the .do file doesn’t install them. It assumes they’re there.
  • Stata version differences: Code written in Stata 17 may behave differently in Stata 18 if default behaviors changed. Without version 17 at the top, there’s no guarantee.
  • Data versioning: The .do file refers to data_v3_final_FINAL.dta. Which version? When was it last modified?
  • Interactive modifications: Somewhere between the code and the published table, someone manually changed a column header in Excel. The .do file didn’t capture that.
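The path problem in the first bullet is the easiest one to avoid. A minimal sketch, assuming the .do file is launched from the project root; the root and data globals and the data folder name are illustrative choices, not a standard:

```stata
* Sketch: define the project root once, then build every path from it.
* Assumption: Stata's working directory is the project root when this runs.
global root "`c(pwd)'"
global data "$root/data"          // hypothetical folder name

cap mkdir "$data"                 // create the folder if it does not exist

* Forward slashes in paths work on Windows, macOS, and Linux
sysuse auto, clear
save "$data/auto_copy.dta", replace
use "$data/auto_copy.dta", clear
```

A replicator then only has to start Stata in the project folder; no line of the script names a user-specific directory.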

The Do-File Is Necessary but Not Sufficient

A .do file is a necessary condition for reproducibility. It captures the commands. But it doesn’t capture:

  • The computational environment (Stata version, installed packages)
  • The data state at each step
  • The reasoning behind analytical choices
  • Whether the code actually ran successfully from start to finish
  • The output that was produced (logs are often not saved)

What Would Complete Reproducibility Look Like?

1. Environment lockfile

Stata needs an equivalent of Python’s requirements.txt or R’s renv.lock: a file that specifies the Stata version, the installed ado-files with their versions, and any system dependencies. No native solution currently exists.
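Until such a lockfile exists, a rough substitute is to install packages into a project-local folder and write the environment into the log. The sketch below uses only built-in commands; the ado folder name is an assumption, and the approach records versions rather than enforcing them:

```stata
* Sketch: keep packages inside the project and record the environment.
* Assumption: the working directory is the project root.

* Send community-contributed packages to a project-local folder
cap mkdir "ado"
sysdir set PLUS "`c(pwd)'/ado"

* Write the Stata environment into the log
display "Stata release:   `c(stata_version)'"
display "Version in use:  `c(version)'"
display "Flavor and OS:   `c(flavor)' on `c(os)'"

* List installed packages, then the file and version line of each dependency
cap noisily ado dir
foreach pkg in reghdfe ftools estout {
    cap noisily which `pkg'
}
```

Because PLUS points inside the project, any later ssc install lands in that folder and can be shipped with the replication package.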

2. Execution logging

Every command should be logged with its timestamp, output, and any errors or warnings. Not just the code, but the results. A log file from log using captures the commands and their output, but researchers forget to start it, or it gets overwritten.
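One way to guard against overwritten logs is to put the date and time in the file name and leave off the replace option. A minimal sketch, assuming a logs folder under the working directory and a log named main (both hypothetical):

```stata
* Sketch: a named, timestamped log that is never overwritten.
* Assumption: logs live in a "logs" folder under the working directory.
cap mkdir "logs"
cap log close _all

* Build a stamp such as 13Mar2026_140533 from the current date and time
local stamp "`c(current_date)'_`c(current_time)'"
local stamp = subinstr("`stamp'", " ", "", .)
local stamp = subinstr("`stamp'", ":", "", .)

* No replace option: an existing log with the same name stops the run
log using "logs/run_`stamp'.log", text name(main)
set rmsg on        // print elapsed time and clock time after each command

sysuse auto, clear
regress price mpg weight

set rmsg off
log close main
```

With set rmsg on, each command in the log is followed by a return message showing how long it took and when it finished, which gives a per-command timestamp.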

3. Data provenance

Each dataset should have a checksum (hash) recorded at each stage: raw data, cleaned data, analysis data. If the data changes, the hash changes, and you know something is different.
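Stata ships two commands that cover this: datasignature for the data in memory and checksum for a file on disk. The sketch below uses the built-in auto dataset as a stand-in for a real pipeline stage:

```stata
* Sketch: record a data signature and a file checksum at each stage.
sysuse auto, clear

* Signature of the data in memory
datasignature
local sig_raw "`r(datasignature)'"
display "raw data signature:   `sig_raw'"

* Any change to the data changes the signature
replace price = price + 1 in 1
datasignature
local sig_new "`r(datasignature)'"
display "signature after edit: `sig_new'"
assert "`sig_new'" != "`sig_raw'"

* Checksum of a saved file on disk
tempfile stage
save "`stage'"
checksum "`stage'"
display "file checksum: " %12.0f r(checksum) "   length: " %12.0f r(filelen)
```

Writing these values into the log at each stage gives a record to compare against later. datasignature set and datasignature confirm go one step further: the first stores the signature with the dataset, and the second raises an error if the data no longer match it.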

4. Intent documentation

Why did you choose reghdfe instead of xtreg? Why cluster at the state level? The .do file shows what you did. It doesn’t show why.
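Part of the "why" can be kept inside Stata itself, next to the results it explains. A minimal sketch using the built-in auto dataset; the rationale text and the stored name m_price are made up for illustration:

```stata
* Sketch: keep the "why" next to the "what".
sysuse auto, clear

* Note attached to the dataset
notes: Built-in auto data, used unmodified as a stand-in for the analysis sample

* Hypothetical rationale: residual variance is expected to grow with price,
* so homoskedastic standard errors are not assumed.
regress price mpg weight, vce(robust)

* Attach the rationale to the estimation results so it travels with them
estimates notes: vce(robust) chosen because error variance likely rises with price
estimates store m_price

* Show what was recorded
notes
estimates notes list
```

Notes added this way are saved with the dataset and with stored or saved estimates, so the reasoning does not depend on a separate document staying in sync.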

How AI Changes the Equation

An AI-assisted workflow can embed reproducibility by design. When Sytra generates code, it also generates:

  • A timestamped execution log of every command and its output
  • The natural language prompt that generated each code block (intent documentation)
  • Package dependency tracking (which ado-files were used)
  • Automatic version pinning (version 18 headers)
  • Output validation (did the regression converge? are the standard errors sensible?)

The result is a replication package that generates itself as a byproduct of doing the analysis. You don’t need to remember to start the log or document your choices — the system does it for you.
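Output validation of this kind can also be written by hand in plain Stata. The sketch below is not Sytra’s implementation; it only illustrates the idea with assert on the built-in auto dataset, and the specific checks are assumptions about what is worth validating:

```stata
* Sketch: stop the run if the estimation output looks wrong.
sysuse auto, clear
logit foreign mpg weight

* Did the optimizer converge?
assert e(converged) == 1

* Are all standard errors present and positive?
foreach v in mpg weight _cons {
    assert !missing(_se[`v']) & _se[`v'] > 0
}

* Was the expected estimation sample used?
assert e(N) == 74
```

Because assert halts the .do file when a condition fails, a run that reaches the end of the script has passed every check.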

Practical Steps You Can Take Today

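* At the top of every master .do file:
version 18
clear all
set more off

* Define the log folder (example name; adjust to your project) and create it
global logdir "logs"
cap mkdir "$logdir"

* Start a dated log; c(current_date) contains spaces, so strip them first
cap log close
local today = subinstr("`c(current_date)'", " ", "", .)
log using "$logdir/analysis_`today'.log", replace

* Document required packages and install any that are missing
foreach pkg in ftools reghdfe estout {
    cap which `pkg'
    if _rc ssc install `pkg'
}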

This is the minimum. But the field needs more — and the tools to deliver it are coming.
