What Happens When You Ask Copilot to Run a Regression in Stata
GitHub Copilot can autocomplete Python in its sleep. But ask it to run a Stata regression and things fall apart. Here's a side-by-side test.
GitHub Copilot is extraordinary at Python. You start typing a function signature and it fills in the entire body — docstring, edge cases, type hints, the lot. For JavaScript, TypeScript, Go, Rust — it’s transformative.
But Stata is not Python. And when you open a .do file in VS Code and start typing, Copilot goes from brilliant to bewildered.
We ran a systematic test. Here’s what happened.
The Setup
We created a fresh .do file in VS Code with Copilot enabled. We typed natural language comments describing common regression tasks, then accepted whatever Copilot suggested. We tested five scenarios that cover 80% of what applied economists and epidemiologists do daily.
Test 1: Basic OLS with Controls
Fair enough. Copilot handles basic regress fine. But this is the “Hello World” of Stata — nobody needs AI for this. Let’s see what happens when the task gets real.
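For orientation, a basic OLS with controls looks roughly like the sketch below. It uses Stata's bundled auto dataset as a stand-in, so the variables are illustrative and are not the ones from the test.

```stata
* Minimal sketch: OLS with controls on Stata's bundled example data
* Assumption: the auto dataset stands in for real research data
sysuse auto, clear

* Regress price on mpg, controlling for weight and a foreign-make indicator
regress price mpg weight i.foreign, vce(robust)
```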
Test 2: Panel Fixed Effects
Three problems here:
- xtreg requires you to first xtset the panel. Copilot didn’t generate it.
- xtreg with fe can only absorb one dimension. The i.year dummies work but are slow and memory-intensive. reghdfe is the standard for multi-way FE.
- The cluster(firm) syntax is wrong. It should be vce(cluster firm).
The correct version:
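A sketch of what that might look like, using the public nlswork example panel as a stand-in. Here idcode plays the role of firm, the outcome and regressors are illustrative, and reghdfe is a community-contributed command that has to be installed first.

```stata
* Assumption: nlswork stands in for the firm panel; variable names are illustrative
* One-time install of the community-contributed commands:
*   ssc install ftools
*   ssc install reghdfe
webuse nlswork, clear

* Declare the panel structure before any xt command
xtset idcode year

* One-way FE with year dummies, using the documented vce() option
xtreg ln_wage tenure age i.year, fe vce(cluster idcode)

* Two-way FE absorbed directly, clustered at the panel-unit level
reghdfe ln_wage tenure age, absorb(idcode year) vce(cluster idcode)
```

The two estimation commands should give the same coefficients on tenure and age. The reghdfe version simply avoids estimating the year dummies explicitly.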
Test 3: Difference-in-Differences
This is the 2005 way of doing DiD. It works for the 2×2 case. But it doesn’t use Stata’s factor variable notation (treat##post), doesn’t include fixed effects, doesn’t cluster standard errors, and completely ignores the last decade of econometric advances on staggered treatment timing. A modern DiD estimation would use csdid or at minimum reghdfe with proper FE and clustering.
Copilot doesn’t know about Callaway and Sant’Anna (2021). It doesn’t know about Goodman-Bacon (2021). It generates code from 15 years ago because that’s what’s in the training data.
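To make the contrast concrete, here is a hedged sketch of a staggered-adoption setup on simulated data. Every variable name (id, year, first_treat, treated, y) and the data-generating process are assumptions made up for illustration. csdid and reghdfe are community-contributed commands.

```stata
* One-time install of the community-contributed commands:
*   ssc install drdid
*   ssc install csdid
*   ssc install ftools
*   ssc install reghdfe

* Simulated panel (hypothetical): 200 units, 2000-2009, three treatment cohorts
clear
set seed 12345
set obs 200
gen id = _n
* first_treat = 0 marks never-treated units, as csdid expects
gen first_treat = cond(id <= 50, 0, cond(id <= 100, 2004, cond(id <= 150, 2006, 2008)))
expand 10
bysort id: gen year = 1999 + _n
gen treated = first_treat > 0 & year >= first_treat
gen y = 0.05*(year - 2000) + 0.5*treated + rnormal()

* The "at minimum" version: two-way FE with clustered standard errors
reghdfe y treated, absorb(id year) vce(cluster id)

* Callaway and Sant'Anna group-time effects, then an event-study summary
csdid y, ivar(id) time(year) gvar(first_treat)
estat event
```

With a constant treatment effect, as simulated here, the two approaches should broadly agree. They can diverge when effects vary across cohorts or over time, which is the case the recent literature is concerned with.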
Test 4: Instrumental Variables
The syntax is right this time. But Copilot stops after the estimation command. No estat firststage to check the first-stage F-statistic. No estat overid for overidentification. No vce(robust). A researcher would never run ivregress and stop — the entire point of IV is that you need to validate the instrument. Copilot doesn’t know that because it doesn’t understand what IV estimation is. It just knows the syntax.
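A fuller workflow might look like the sketch below. It uses hsng2, the example dataset from Stata's ivregress documentation, as a stand-in, so the variables are not the ones from the test.

```stata
* Assumption: hsng2 stands in for real data; the specification is illustrative
webuse hsng2, clear

* 2SLS: hsngval is treated as endogenous, instrumented by faminc and region dummies
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)

* First-stage diagnostics, including the first-stage F-statistic
estat firststage

* Overidentification test (needs more instruments than endogenous regressors)
estat overid
```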
Test 5: Publication Table Export
outreg2 still works, but the field has largely moved to esttab (from the estout package) which is more flexible, better documented, and produces cleaner LaTeX. Copilot suggested the older tool — again, training data bias toward legacy code.
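For comparison, a minimal esttab sketch follows. The dataset, the two specifications, and the output filename table1.tex are placeholders chosen for illustration.

```stata
* One-time install of the community-contributed package:
*   ssc install estout
sysuse auto, clear

* Store two specifications
eststo clear
eststo: regress price mpg, vce(robust)
eststo: regress price mpg weight i.foreign, vce(robust)

* Export a LaTeX table with standard errors, variable labels, and booktabs rules
esttab using "table1.tex", replace se label booktabs ///
    star(* 0.10 ** 0.05 *** 0.01)
```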
The Pattern
Across all five tests, a clear pattern emerges:
- Trivial tasks: Copilot handles them fine. But you don’t need AI for regress y x.
- Real tasks: Copilot generates syntactically plausible code that is either wrong (option syntax errors) or incomplete (missing post-estimation, wrong commands for the era).
- Context awareness: Zero. Copilot doesn’t know what data is in memory, what commands are installed, or what the research design requires.
The fundamental issue is that Copilot is an autocomplete engine. It predicts the next token based on surrounding code. It doesn’t understand statistical methodology, it doesn’t validate inference, and it can’t execute code to check if it works. For Stata — where “does it run?” is the least important question — that’s a serious limitation.
What Would Actually Help
A useful AI for Stata needs to go beyond autocomplete. It needs to understand the estimation lifecycle, know which commands require which preconditions (e.g., xtset before xtreg), suggest post-estimation diagnostics, and — critically — run the code to verify it actually works.
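As a rough illustration of what checking preconditions could mean in practice, the sketch below tests for an installed command and a declared panel before estimating. It is only an assumption about how such checks might be written in a .do file, not a description of how any particular tool works.

```stata
* Sketch: check preconditions before estimating
* Assumption: a panel dataset is expected to be in memory

* Is the community-contributed command installed?
capture which reghdfe
if _rc {
    display as error "reghdfe not found; try: ssc install reghdfe"
}

* Has the panel been declared? xtset with no arguments errors if it has not
capture xtset
if _rc {
    display as error "panel not declared; run xtset panelvar timevar first"
}
```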
That’s the difference between a code completer and a research assistant. Copilot is the former. Sytra is built to be the latter.