How to Structure a Stata Project: Directory Layout, Naming, and Automation
A clean Stata project structure saves you hours. Here's the directory layout, naming conventions, and master .do file template used by top economics departments.
A clean project structure is the difference between “I can resume this analysis after three months” and “I need to start over.” Most researchers learn this the hard way — usually during revisions, when a referee asks for a robustness check and you can’t find the right .do file.
The Recommended Layout
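Here is a minimal sketch of one such layout, created from inside Stata. The folder names (code, data/raw, data/intermediate, output/tables, output/figures, logs) are assumptions chosen for illustration, not a fixed standard, and the root is assumed to be the current working directory.

```stata
* Hypothetical project root: here, whatever directory Stata is currently in.
global root "`c(pwd)'"

* Illustrative folder names. Parents are listed before children because
* mkdir creates one level at a time.
foreach dir in code data data/raw data/intermediate output output/tables output/figures logs {
    capture mkdir "$root/`dir'"    // capture: ignore the error if the folder already exists
}
```

Keeping raw and intermediate data in separate folders makes it safe to delete everything in data/intermediate and output, because the scripts regenerate both.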
Naming Conventions
- Numbered prefixes: 01_, 02_, etc. enforce execution order.
- Descriptive names: 03_construct_treatment_vars.do, not analysis_v2_new.do.
- No spaces in file names. Use underscores.
- Data files: Include the date or version (panel_data_2024.dta), but only in the raw folder. Intermediate files are regenerated.
The config.do Pattern
Every .do file starts with do "$root/config.do". Change a path in one place, and it propagates everywhere. Change the control variables, and every regression updates.
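A config.do along these lines might look like the sketch below. It assumes the global macro root has already been defined before config.do is called (for example at the top of the master script). The folder names and control variables are hypothetical.

```stata
* config.do: a sketch. All folder and variable names are hypothetical.
* Assumes the global macro root was defined earlier in the session.
set more off
set varabbrev off          // require full variable names, so typos fail loudly

* Paths, defined once relative to $root
global raw     "$root/data/raw"
global interim "$root/data/intermediate"
global tables  "$root/output/tables"

* Control variables shared by every regression (placeholder names)
global controls "age female educ_years"
```

A regression in any later script could then be written as regress outcome treatment $controls, where outcome and treatment are placeholder names. Editing the controls global in config.do changes every specification that uses it.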
Rules for Raw Data
- Never modify raw data files. They are read-only inputs.
- Document provenance. Where did each file come from? When was it downloaded? What’s the URL?
- Include checksums. Run datasignature after loading raw data to generate a hash you can verify later.
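The checksum step can be sketched as follows. The file path reuses the example name from above and assumes a data/raw folder under $root. The commented-out assert shows one possible way to verify the signature later and contains a placeholder, not a real value.

```stata
* Load the raw file (path is illustrative) and compute its signature.
use "$root/data/raw/panel_data_2024.dta", clear
datasignature                        // prints the signature of the data in memory
local sig "`r(datasignature)'"       // datasignature also returns it in r()
display "Signature: `sig'"

* After recording the string above, paste it in and uncomment the check:
* assert "`sig'" == "PASTE_RECORDED_SIGNATURE_HERE"
```

If the raw file is ever altered or replaced, the assert stops the script before any downstream result is produced from the wrong data.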
The Master .do File Pattern
Run do master.do from a fresh Stata session. If it completes without error, your results are reproducible. If it doesn’t, fix it until it does.
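A master.do in this spirit could look like the sketch below. Apart from 03_construct_treatment_vars.do, the script names are hypothetical, as are the code and logs folders and the placeholder root path.

```stata
* master.do: a sketch. Script names and folders are illustrative.
clear all
macro drop _all                      // start from a clean slate of macros
global root "/path/to/project"       // placeholder: the only machine-specific line

capture log close
log using "$root/logs/master.log", replace text

do "$root/config.do"
do "$root/code/01_import_raw.do"
do "$root/code/02_clean.do"
do "$root/code/03_construct_treatment_vars.do"
do "$root/code/04_estimate.do"

log close
```

Because do stops at the first error, a run that reaches log close means every step executed in order from the raw data.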
Version Control
Put your code in Git. Not your data (unless it’s small). Not your output (it’s regenerated). Just the code and the config.