Methodology

2026-03-13 · 11 min read

Instrumental Variables in Stata: When and How

A guide to IV estimation in Stata — when to use instruments, how to test for weak instruments, and why ChatGPT gets the syntax wrong.

Sytra Team

Research Engineering Team, Sytra AI

Instrumental variables (IV) estimation is one of the most powerful tools in the econometrician’s arsenal, and one of the most frequently misapplied. A good instrument can solve endogeneity problems that no amount of controls can fix. A bad instrument can produce estimates that are worse than OLS. The difference between the two is not always obvious, which is why the diagnostics matter as much as the estimation.

When You Need IV

You need instrumental variables when your key explanatory variable is correlated with the error term — i.e., when ordinary least squares is biased. This happens because of:

Omitted variable bias: There’s an unobserved confounder that affects both your X and your Y.
Reverse causality: Y causes X, not just the other way around.
Measurement error: X is measured with noise, and that noise attenuates the coefficient.

An instrument Z must satisfy two conditions: (1) relevance — Z is correlated with X, and (2) exclusion — Z affects Y only through X. The first can be tested. The second cannot — it’s an untestable assumption that must be argued on theoretical grounds.
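
The bias that IV is meant to remove is easy to reproduce with simulated data. In the minimal sketch below, the variable names (confounder, z, x, y), the seed, and the coefficients are all invented for illustration: an unobserved confounder drives both x and y, and z satisfies relevance and exclusion by construction. OLS should overstate the true effect of 2 (its probability limit in this design is about 2.57), while 2SLS using z should land close to 2.

```stata
* Simulated data: an omitted confounder biases OLS
clear
set seed 20260313
set obs 10000
generate confounder = rnormal()                 // unobserved in real data
generate z = rnormal()                          // instrument, independent of confounder
generate x = 0.8*z + confounder + rnormal()     // relevance: z moves x
generate y = 1 + 2*x + 1.5*confounder + rnormal() // exclusion: z affects y only via x

* OLS omits the confounder, so the coefficient on x absorbs part of its effect
regress y x
scalar b_ols = _b[x]

* 2SLS uses only the variation in x that comes from z
ivregress 2sls y (x = z), vce(robust)
scalar b_iv = _b[x]

display "OLS: " b_ols "   2SLS: " b_iv "   truth: 2"
```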

Basic 2SLS in Stata

* Two-stage least squares

ivregress 2sls y controls (x = z1 z2), vce(robust)

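The syntax: the endogenous variable x goes inside the parentheses, with the instruments z1 z2 after the equals sign. Exogenous controls go before the parentheses; Stata automatically includes them in the first stage as well, so they serve as their own instruments. Always use vce(robust) or vce(cluster clustvar). The default standard errors assume homoskedastic, independent errors, an assumption that rarely holds in applied work.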
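
Running the two stages by hand shows why exogenous controls belong before the parentheses. In the sketch below (simulated data; w, z1, z2, and the coefficients are hypothetical), the manual first stage regresses x on every exogenous variable, including w, and the manual second stage reproduces the ivregress point estimate. Its standard errors, however, are wrong, because regress computes residuals with the fitted values of x instead of x itself. That is one more reason to let ivregress do both stages.

```stata
clear
set seed 42
set obs 5000
generate confounder = rnormal()                // unobserved confounder
generate w  = rnormal()                        // exogenous control
generate z1 = rnormal()
generate z2 = rnormal()
generate x  = 0.5*z1 + 0.5*z2 + 0.7*w + confounder + rnormal()
generate y  = 1 + 2*x - 1*w + 1.5*confounder + rnormal()

* One command: w sits before the parentheses and is used in both stages
ivregress 2sls y w (x = z1 z2), vce(robust)
scalar b_iv = _b[x]

* Manual first stage: x on ALL exogenous variables (w, z1, z2)
regress x w z1 z2
predict double xhat, xb

* Manual second stage: same point estimate, invalid standard errors
regress y w xhat
scalar b_manual = _b[xhat]

display "ivregress: " b_iv "   manual: " b_manual
```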

ChatGPT commonly generates:

ivregress 2sls y (x = z1 z2)

→ Missing controls placement (minor)

→ Missing vce(robust) (critical)

→ Missing all post-estimation (fatal)

The First Stage: Testing Instrument Strength

This is the single most important diagnostic in IV estimation. A weak instrument, one that is only weakly correlated with the endogenous variable, produces IV estimates that are biased toward OLS, are highly imprecise, and come with conventional standard errors and confidence intervals that can be badly misleading.
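
The pull toward OLS can be reproduced with a small Monte Carlo. In the sketch below, the program name weakiv, the sample size of 200, the 500 replications, and the first-stage coefficient of 0.05 are arbitrary illustrative choices. Each replication runs without error, yet the median 2SLS estimate should sit much closer to the OLS probability limit (about 2.75 in this design) than to the true value of 2.

```stata
* One simulated sample with a very weak instrument; returns the 2SLS slope
capture program drop weakiv
program define weakiv, rclass
    drop _all
    set obs 200
    generate confounder = rnormal()
    generate z = rnormal()
    generate x = 0.05*z + confounder + rnormal()      // weak first stage
    generate y = 1 + 2*x + 1.5*confounder + rnormal()
    ivregress 2sls y (x = z)
    return scalar b = _b[x]
end

* 500 replications, each contributing one 2SLS estimate
simulate b = r(b), reps(500) nodots seed(2019): weakiv

* Use the median: weak-IV estimates have extremely heavy tails
summarize b, detail
scalar med_b = r(p50)
display "median 2SLS estimate: " med_b "  (truth 2, OLS limit about 2.75)"
```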

* Run the first-stage diagnostics

estat firststage

A widely used rule of thumb, tied to Stock and Yogo (2005): the first-stage F-statistic should be at least 10. Below that, your instruments are likely weak and your IV estimates unreliable. More recent work by Andrews, Stock, and Sun (2019) cautions that even F > 10 isn’t always sufficient, particularly with heteroskedastic or clustered errors. The appropriate threshold also depends on the number of instruments and on the maximum bias you are willing to tolerate, which is what the Stock-Yogo critical values encode.
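
With one endogenous regressor, the first-stage F is the joint test that the excluded instruments have zero coefficients in a regression of the endogenous variable on all exogenous variables. The sketch below computes it by hand for a strong and a weak case (simulated data; names and coefficients are hypothetical), using the same robust VCE as the main model, and flags anything under the rule-of-thumb threshold of 10. It is meant for scripting a warning; estat firststage remains the standard way to report the statistic.

```stata
clear
set seed 7
set obs 1000
generate w  = rnormal()                        // exogenous control
generate z1 = rnormal()
generate z2 = rnormal()
generate x_strong = 0.6*z1 + 0.6*z2 + w + rnormal()
generate x_weak   = 0.02*z1 + 0.02*z2 + w + rnormal()

* First stage: endogenous variable on controls and excluded instruments
regress x_strong w z1 z2, vce(robust)
test z1 z2                                     // joint test of excluded instruments
scalar F_strong = r(F)

regress x_weak w z1 z2, vce(robust)
test z1 z2
scalar F_weak = r(F)

* Flag any first stage below the F >= 10 rule of thumb
foreach s in F_strong F_weak {
    if scalar(`s') < 10 {
        display as error "`s' = " scalar(`s') ": instruments look weak"
    }
    else {
        display "`s' = " scalar(`s') ": passes the rule of thumb"
    }
}
```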

ChatGPT never generates this check. In every test we ran, ChatGPT produced the ivregress command and stopped. No first-stage diagnostics, no weak instrument test. This is like running a t-test without looking at your sample size — technically you get a number, but it might mean nothing.

Overidentification: The Hansen/Sargan Test

If you have more instruments than endogenous variables (overidentification), you can test whether the extra instruments are valid. The logic: if all instruments are truly exogenous, they should all give you the same answer. If they don’t, at least one is invalid.

* Overidentification test (requires homoskedasticity for Sargan)

estat overid

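A significant test statistic suggests that at least one of your instruments fails the exclusion restriction. This is bad news: it means your IV estimates are likely inconsistent. The converse does not hold. The test can only compare instruments against each other, so if all of them are invalid in the same direction, it may fail to reject.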
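
The logic of the test can be reproduced by hand. In the sketch below (simulated data; names and coefficients are hypothetical), z2 is made invalid by letting it enter y directly. The two just-identified estimates then disagree, and Sargan’s statistic, N times the R-squared from regressing the 2SLS residuals on all exogenous variables, rejects. This is the homoskedastic Sargan form, shown for intuition rather than as a substitute for estat overid.

```stata
clear
set seed 11
set obs 5000
generate confounder = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate x  = 0.7*z1 + 0.7*z2 + confounder + rnormal()
* z2 violates the exclusion restriction: it affects y directly
generate y  = 1 + 2*x + 1.5*confounder + 0.5*z2 + rnormal()

* Each instrument alone gives its own just-identified estimate
ivregress 2sls y (x = z1)
scalar b_z1 = _b[x]
ivregress 2sls y (x = z2)
scalar b_z2 = _b[x]
display "z1 only: " b_z1 "   z2 only: " b_z2

* Overidentified model, then Sargan's N*R-squared statistic
ivregress 2sls y (x = z1 z2)
predict double uhat, residuals
regress uhat z1 z2
scalar sargan   = e(N)*e(r2)
scalar p_sargan = chi2tail(1, sargan)   // df = 2 instruments - 1 endogenous
display "Sargan chi2(1) = " sargan "   p = " p_sargan
```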

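Important nuance: Sargan’s test assumes homoskedastic errors. If you estimated with ivregress 2sls and vce(robust), estat overid reports Wooldridge’s heteroskedasticity-robust score test instead; Hansen’s J statistic is what you get after ivregress gmm. These test the same null hypothesis under different assumptions, and with heteroskedastic errors a robust version is almost always what you want.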
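
The statistic estat overid reports depends on the estimator and the VCE, and the header of its output names the test. The sketch below fits one simulated model (hypothetical names; both instruments valid by construction) three ways so the outputs can be compared side by side. The comments reflect our reading of the ivregress postestimation documentation, which is the place to confirm the details for your Stata version.

```stata
clear
set seed 11
set obs 5000
generate confounder = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate x = 0.7*z1 + 0.7*z2 + confounder + rnormal()
generate y = 1 + 2*x + 1.5*confounder + rnormal()   // both instruments valid

* Default (homoskedastic) 2SLS: Sargan and Basmann statistics
ivregress 2sls y (x = z1 z2)
estat overid

* Robust 2SLS: Wooldridge's robust score test
ivregress 2sls y (x = z1 z2), vce(robust)
estat overid

* Two-step GMM: Hansen's J statistic
ivregress gmm y (x = z1 z2)
estat overid
```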

The Full IV Pipeline

Here’s what a complete IV estimation in Stata actually looks like — the version ChatGPT never generates:

* Complete IV estimation pipeline

* 1. Estimate the model

ivregress 2sls y age i.race (education = distance_college parents_educ), vce(robust)

* 2. Check instrument strength

estat firststage

* 3. Test overidentifying restrictions

estat overid

* 4. Compare IV to OLS (Hausman-style test)

estat endogeneity

* 5. Store estimates for comparison table

estimates store iv_main

Five commands. ChatGPT gives you one of them. The other four are where the actual science happens.
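
To run the pipeline without real data, the sketch below simulates a dataset with the variable names used above. Everything about the data-generating process is invented for illustration, including the distance and parental-education effects, an unobserved ability term, and a true return to education of 0.08. The instruments are valid only because the simulation imposes it; with real data, the exclusion restriction still has to be argued.

```stata
* Hypothetical data with the variable names from the pipeline above
clear
set seed 2026
set obs 4000
generate age  = 25 + floor(30*runiform())
generate race = 1 + floor(3*runiform())              // three categories for i.race
generate ability = rnormal()                          // unobserved confounder
generate distance_college = 50*runiform()
generate parents_educ = 12 + round(2*rnormal())
generate education = 14 - 0.08*distance_college + 0.4*(parents_educ - 12) ///
    + ability + rnormal()
generate y = 9 + 0.08*education + 0.01*age + 0.05*(race == 2) ///
    + 0.3*ability + 0.5*rnormal()

* 1. Estimate the model
ivregress 2sls y age i.race (education = distance_college parents_educ), vce(robust)
* 2. Check instrument strength
estat firststage
* 3. Test overidentifying restrictions
estat overid
* 4. Test whether education is endogenous
estat endogeneity
* 5. Store estimates for a comparison table
estimates store iv_main
```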

Common Mistakes

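Too many instruments: The bias of 2SLS toward OLS grows with the number of instruments, and piling on weak instruments dilutes the first-stage F. With more than one endogenous regressor, the individual first-stage F statistics can each look acceptable while the instruments are collectively weak. Use the Cragg-Donald minimum eigenvalue statistic reported by estat firststage and compare it with the Stock-Yogo critical values, not just the first-stage F from a single equation.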
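Ignoring endogeneity: IV is only better than OLS if the endogeneity is real. The Durbin-Wu-Hausman test (estat endogeneity) checks this. If the test is insignificant, OLS is usually preferred because it is more efficient, but the test has little power when instruments are weak, so a failure to reject is not proof of exogeneity.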
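Using ivreg instead of ivregress: The old built-in ivreg command was superseded by ivregress in Stata 10. Always use ivregress, which supports vce() options and the estat diagnostics shown above. (This is distinct from the community-contributed ivreg2, which remains widely used.)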
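Reporting only the second stage: Journals expect to see first-stage results. Add the first option to ivregress to display them, or store the first-stage regression and report both stages with estimates table or esttab (from the community-contributed estout package).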
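
The sketch below (simulated data; w, z1, z2, and the coefficients are hypothetical) illustrates two of the points above: testing for endogeneity and reporting both stages. It adds the first-stage residual to an OLS regression, a regression-based (control-function) version of the endogeneity test: if x were exogenous, that residual would have no explanatory power. In the linear model, the coefficient on x in this augmented regression equals the 2SLS estimate. The first stage is stored so both stages can be shown side by side with the built-in estimates table. For published results, estat endogeneity and a proper table command remain the tools to use.

```stata
clear
set seed 314
set obs 3000
generate confounder = rnormal()               // unobserved in practice
generate w  = rnormal()                       // exogenous control
generate z1 = rnormal()
generate z2 = rnormal()
generate x  = 0.6*z1 + 0.4*z2 + 0.5*w + confounder + rnormal()
generate y  = 1 + 2*x + 0.5*w + 1.5*confounder + rnormal()

* First stage, stored for reporting; keep its residual
regress x w z1 z2, vce(robust)
estimates store first_stage
predict double vhat, residuals

* Control-function test: a significant vhat signals that x is endogenous
regress y x w vhat, vce(robust)
test vhat
scalar p_endog = r(p)
scalar b_cf    = _b[x]

* IV estimates; the point estimate on x matches b_cf
ivregress 2sls y w (x = z1 z2), vce(robust)
estimates store iv_main
scalar b_iv = _b[x]

* Both stages side by side
estimates table first_stage iv_main, b(%9.3f) se stats(N)
display "endogeneity p = " p_endog "   CF: " b_cf "   2SLS: " b_iv
```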

How Sytra Handles IV

When you tell Sytra “estimate the effect of education on income using distance to college as an instrument,” it generates the full pipeline: ivregress with proper syntax, estat firststage, estat overid, and estat endogeneity. If the first-stage F is below 10, Sytra flags it with a warning. If the overidentification test rejects, it suggests revisiting your instrument set.

The AI doesn’t just write the regression command. It runs the entire inferential chain — because that’s what valid IV estimation requires.
