Workflow
2026-03-27 · 11 min read

Building a Replication Package in Stata: The Complete Checklist

AER, QJE, and REStud now require replication packages. Here's a complete checklist for building one in Stata — directory structure, master .do file, data documentation, and automated testing.

Sytra Team
Research Engineering Team, Sytra AI

AER, QJE, REStud, Econometrica — the top journals now require replication packages. If your code can’t reproduce your results from scratch, your paper doesn’t get published. This isn’t a suggestion; it’s a gate.

Here’s a complete checklist for building a replication package in Stata that passes the data editor’s review on the first try.

Directory Structure

replication/
├── README.md
├── master.do
├── code/
│ ├── 01_clean.do
│ ├── 02_construct.do
│ ├── 03_analysis.do
│ └── 04_tables.do
├── data/
│ ├── raw/
│ └── derived/
├── output/
│ ├── tables/
│ └── figures/
└── logs/
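
The tree above can be created from Stata itself, so every replicator starts from the same layout. This is a minimal sketch, assuming `$root` is set as in master.do; the fallback to the current working directory is only there so the snippet can be tried on its own.

```stata
* Sketch: create the folder skeleton shown above.
* Assumption: $root is set as in master.do; for a dry run it falls back
* to the current working directory.
if "$root" == "" global root "`c(pwd)'"

* Parents are listed before children because mkdir does not create
* intermediate folders. capture ignores "already exists" errors.
foreach dir in code data data/raw data/derived output output/tables output/figures logs {
    capture mkdir "$root/`dir'"
}
```

Because `capture` also hides other failures (for example, a root folder that does not exist), it is worth checking once that the folders actually appeared.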

The Master .do File

* master.do — Run this file to reproduce all results
version 18
clear all
set more off
set maxvar 10000
 
* Set paths (EDIT THIS LINE ONLY)
global root "/path/to/replication"
global code "$root/code"
global data "$root/data"
global raw "$data/raw"
global derived "$data/derived"
global output "$root/output"
global tables "$output/tables"
global figures "$output/figures"
global logs "$root/logs"
 
* Install required packages
foreach pkg in reghdfe ftools estout csdid drdid {
    cap which `pkg'
    if _rc ssc install `pkg', replace
}
 
* Start log
cap log close
log using "$logs/master_`c(current_date)'.log", replace
 
* Run scripts sequentially
do "$code/01_clean.do"
do "$code/02_construct.do"
do "$code/03_analysis.do"
do "$code/04_tables.do"
 
log close
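
One thing the installation block does not control is which version of each package gets installed: `ssc install` fetches whatever is current on the day it runs. A common way to hold versions fixed is to keep user-written packages in a folder inside the replication package. The sketch below assumes a hypothetical `ado/` folder under `$root` (it is not part of the directory tree above) and would sit just before the package installation block.

```stata
* Sketch: keep user-written packages inside the replication folder.
* Assumption: ado/ is a hypothetical folder, not part of the tree above.
if "$root" == "" global root "`c(pwd)'"   // dry-run fallback only

capture mkdir "$root/ado"
sysdir set PLUS "$root/ado"   // ssc install now writes to this folder
sysdir                        // print the search paths into the log
```

With PLUS redirected, the `which` check in the installation block only finds packages already stored in that folder, so shipping the folder with the package lets the replicator run the same package versions the authors used.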

The README

The README is the single most important file. Data editors read it first. Include:

  • Overview: One paragraph describing the paper and the replication package.
  • Data Availability Statement: Where the data comes from. If it’s restricted, explain how to request access.
  • Computational Requirements: Stata version, required packages, estimated runtime, hardware requirements.
  • Instructions: “Edit the root path in master.do, then run master.do.”
  • Output Map: Which script produces which table/figure in the paper.
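
Most of the Computational Requirements entry can be read straight from Stata instead of typed from memory. The following is a sketch, not a required format: it prints the version, edition, and platform, and times the run. Timer slot 1 is an arbitrary choice, and `c(edition_real)` assumes Stata 17 or later.

```stata
* Sketch: print the facts the README's requirements section asks for.
display "Stata version : `c(stata_version)'"
display "Edition       : `c(edition_real)'"
display "OS / machine  : `c(os)' / `c(machine_type)'"
display "Processors    : `c(processors)'"
display "Run date      : `c(current_date)' `c(current_time)'"

* Time the full run; timer slot 1 is an arbitrary choice.
timer clear 1
timer on 1
* ... the four "do" lines from master.do would go here ...
timer off 1
quietly timer list 1
display "Runtime (min) : " %6.1f r(t1)/60
```

If these lines run inside master.do after the log is opened, the log file records the environment alongside the results.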

Checklist

1. README with data availability statement
2. Single master .do file that runs everything
3. Only ONE path to edit (the root directory)
4. version command at the top of master.do
5. Package installation block
6. Log file generated automatically
7. Raw data included (or access instructions)
8. All intermediate data generated by code
9. Output map: script → table/figure number
10. No absolute paths except the root
11. No manual steps between scripts
12. Tested on a clean Stata installation
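
Some checklist items can be checked mechanically. As one example, the sketch below scans the scripts in `$code` for lines that look like absolute paths (the "no absolute paths except the root" item). The three patterns are illustrative assumptions, not an exhaustive rule, and the fallback to the current folder is only for trying the snippet on its own.

```stata
* Sketch: flag lines in code/*.do that look like absolute paths.
* Assumptions: $code is set as in master.do (dry-run fallback: current
* folder), and the three patterns below are illustrative, not exhaustive.
if "$code" == "" global code "`c(pwd)'"

local dofiles : dir "$code" files "*.do"
local hits = 0
foreach f of local dofiles {
    tempname fh
    file open `fh' using "$code/`f'", read text
    local n = 0
    file read `fh' line
    while r(eof) == 0 {
        local ++n
        * Matches a drive letter followed by a slash, /Users/ or /home/
        if regexm(`"`macval(line)'"', "[A-Za-z]:[/\\]|/Users/|/home/") {
            display as error "`f', line `n': possible hardcoded path"
            local ++hits
        }
        file read `fh' line
    }
    file close `fh'
}
display "Possible hardcoded paths: `hits'"
```

The drive-letter pattern also matches URLs such as `https://...`, so a hit is a prompt to look at the line, not proof of a problem.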

Common Reasons for Rejection

  • Hardcoded paths: "C:\Users\Jane\Desktop\..." appears 47 times across 12 .do files. Solution: use globals set in master.do.
  • Missing packages: The code uses reghdfe but doesn’t install it. The replicator gets “command reghdfe is unrecognized” (error r(199)). Solution: include a package installation block in master.do.
  • Interactive steps: “Run 03_analysis.do, then manually copy the coefficient from the log and paste it into 04_tables.do.” No.
  • Unlabeled output: Table 3 in the paper doesn’t correspond to any named file in the output folder.
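  • Version sensitivity: Code works in Stata 17 but not Stata 18 because a default changed. Solution: put a version command at the top of master.do so built-in commands are interpreted under the release the results were produced with.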
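
The interactive-steps problem usually has a programmatic fix: the analysis script writes its estimates to a file and the tables script reads them back. The sketch below uses Stata's built-in `estimates save` and `estimates use`. The `auto` dataset stands in for the real data, `main_model` is a hypothetical file name, and the fallback to the current folder is only for trying the snippet on its own.

```stata
* Sketch: hand results from the analysis script to the tables script
* through a file, not by hand. Assumptions: sysuse auto stands in for
* the real data, and main_model is a hypothetical file name.
if "$derived" == "" global derived "`c(pwd)'"   // dry-run fallback only

* --- would live in 03_analysis.do ---
sysuse auto, clear
regress price mpg weight
estimates save "$derived/main_model", replace

* --- would live in 04_tables.do ---
estimates use "$derived/main_model"
display "Coefficient on mpg: " %9.3f _b[mpg]
display "Standard error    : " %9.3f _se[mpg]
```

Since the saved file is regenerated on every run of master.do, the tables always reflect the current estimates and nothing is copied by hand.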

How Sytra Automates This

When you build your analysis through Sytra, it automatically generates the replication package structure: numbered scripts, a master .do file with the correct globals, package dependencies, and an execution log that serves as the README’s computational documentation. The output map is built as you create each table and figure.

#Reproducibility #Stata #Workflow #Economics
