2026-03-13 · 11 min read

Instrumental Variables in Stata: When and How

A guide to IV estimation in Stata — when to use instruments, how to test for weak instruments, and why ChatGPT gets the syntax wrong.

Sytra Team
Research Engineering Team, Sytra AI

Instrumental variables (IV) estimation is one of the most powerful tools in the econometrician’s arsenal, and one of the most frequently misapplied. A good instrument can solve endogeneity problems that no amount of controls can fix. A bad instrument can produce estimates that are further from the truth than OLS. The difference between the two is not always obvious, which is why the diagnostics matter as much as the estimation.

When You Need IV

You need instrumental variables when your key explanatory variable is correlated with the error term — i.e., when ordinary least squares is biased. This happens because of:

  • Omitted variable bias: There’s an unobserved confounder that affects both your X and your Y.
  • Reverse causality: Y causes X, not just the other way around.
  • Measurement error: X is measured with noise, and that noise attenuates the coefficient.

An instrument Z must satisfy two conditions: (1) relevance — Z is correlated with X, and (2) exclusion — Z affects Y only through X. The first can be tested. The second cannot — it’s an untestable assumption that must be argued on theoretical grounds.
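
To make the two conditions concrete, here is a minimal simulated sketch. Every variable name and parameter value is hypothetical and chosen only for illustration: u plays the role of the unobserved confounder, z satisfies relevance and exclusion by construction, and the true effect of x on y is set to 1.

```stata
* Simulated illustration; every name and parameter value here is hypothetical
clear
set seed 12345
set obs 5000

* u is a confounder the analyst never observes
generate u = rnormal()
* z shifts x (relevance) and is drawn independently of u (exclusion holds by construction)
generate z = rnormal()
* x is endogenous because it depends on u
generate x = 0.5*z + u + rnormal()
* The true effect of x on y is 1
generate y = 1*x + 2*u + rnormal()

* OLS: the slope on x also picks up part of u's effect on y
regress y x, vce(robust)

* IV: uses only the variation in x that comes from z
ivregress 2sls y (x = z), vce(robust)
```

With this setup the OLS slope should land well above 1, while the IV estimate should sit close to 1. In real data you cannot check exclusion this way, because u is never observed.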

Basic 2SLS in Stata

* Two-stage least squares
ivregress 2sls y controls (x = z1 z2), vce(robust)

The syntax: the endogenous variable x goes inside the parentheses, followed by an equals sign and the excluded instruments z1 z2. Exogenous controls go outside the parentheses. Always use vce(robust) or vce(cluster clustvar): the default standard errors assume homoskedastic errors, an assumption that almost never holds in practice.
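
For intuition about what "two-stage" means, the sketch below reproduces the 2SLS point estimate by hand on simulated data. All names (u, w, z1, z2, xhat) and coefficients are hypothetical. It is meant as an illustration of the mechanics, not as a recommended workflow.

```stata
* Simulated data; names and parameter values are hypothetical
clear
set seed 2024
set obs 2000
generate u  = rnormal()
generate w  = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate x  = 0.4*z1 + 0.3*z2 + 0.5*w + u + rnormal()
generate y  = 1*x + 0.5*w + 2*u + rnormal()

* One-step 2SLS: correct point estimates and standard errors
ivregress 2sls y w (x = z1 z2), vce(robust)

* The same point estimate by hand, for intuition only
* Stage 1: regress x on the instruments and the exogenous control
regress x z1 z2 w
predict double xhat, xb
* Stage 2: replace x with its fitted value
regress y xhat w
* The coefficient on xhat matches ivregress; its standard error does not,
* because the second regression computes residuals from xhat rather than x
```

The manual version is useful for understanding but not for inference: use ivregress for anything you report.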

ChatGPT commonly generates:

ivregress 2sls y (x = z1 z2)
→ Missing controls placement (minor)
→ Missing vce(robust) (critical)
→ Missing all post-estimation (fatal)

The First Stage: Testing Instrument Strength

This is the single most important diagnostic in IV estimation. A weak instrument — one that is only weakly correlated with the endogenous variable — produces IV estimates that are biased toward OLS, have enormous standard errors, and can be wildly misleading.

* Run the first-stage diagnostics
estat firststage

The familiar rule of thumb, which goes back to Staiger and Stock (1997) and was refined by Stock and Yogo (2005), is that the first-stage F-statistic should be at least 10. Below that, your instruments are weak and your IV estimates are unreliable. Stock and Yogo show that the appropriate threshold depends on the number of instruments and the maximum bias you are willing to tolerate, and Andrews, Stock, and Sun (2019) caution that F > 10 is not always sufficient, particularly when errors are heteroskedastic or clustered.
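
The first-stage F is simply a joint test that the excluded instruments have zero coefficients in the regression of x on all instruments and controls. The sketch below, on simulated data with hypothetical names and parameter values, runs the built-in diagnostic and then the equivalent regression by hand. The forcenonrobust option is used as we understand it from Stata's ivregress postestimation documentation; check the help file for your version.

```stata
* Simulated data; names and parameter values are hypothetical
clear
set seed 2024
set obs 2000
generate u  = rnormal()
generate w  = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate x  = 0.4*z1 + 0.3*z2 + 0.5*w + u + rnormal()
generate y  = 1*x + 0.5*w + 2*u + rnormal()

ivregress 2sls y w (x = z1 z2), vce(robust)

* Built-in first-stage summary (robust F because vce(robust) was used)
estat firststage

* Also show the minimum eigenvalue statistic and Stock-Yogo critical values,
* which assume i.i.d. errors and are suppressed by default after vce(robust)
estat firststage, forcenonrobust

* The same idea by hand: joint test of the excluded instruments
regress x z1 z2 w, vce(robust)
test z1 z2
```

The hand-computed F should be comparable to the one estat firststage reports; the manual version is mainly useful for seeing exactly which coefficients are being tested.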

ChatGPT never generates this check. In every test we ran, ChatGPT produced the ivregress command and stopped. No first-stage diagnostics, no weak instrument test. This is like running a t-test without looking at your sample size — technically you get a number, but it might mean nothing.

Overidentification: The Hansen/Sargan Test

If you have more instruments than endogenous variables (overidentification), you can test whether the extra instruments are valid. The logic: if all instruments are truly exogenous, they should all give you the same answer. If they don’t, at least one is invalid.

* Overidentification test (requires homoskedasticity for Sargan)
estat overid

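A significant test statistic suggests that at least one of your instruments fails the exclusion restriction, in which case your IV estimates are not consistent. A non-rejection is weaker evidence than it looks: the test only checks whether the instruments agree with each other, so it has no power when all of them are invalid in the same way.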

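Important nuance: which statistic estat overid reports depends on how you estimated the model. After ivregress 2sls with default standard errors it reports the Sargan and Basmann tests, which assume homoskedasticity. After ivregress 2sls with vce(robust) it reports Wooldridge’s robust score test. Hansen’s J statistic is what you get after ivregress gmm. The robust versions test the same hypothesis without assuming homoskedasticity and are almost always what you want.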
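
A short sketch of how the reported overidentification test changes with the estimator and VCE, using simulated data in which both instruments are valid by construction (all names and values are hypothetical):

```stata
* Simulated data; names and parameter values are hypothetical
clear
set seed 2024
set obs 2000
generate u  = rnormal()
generate w  = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate x  = 0.4*z1 + 0.3*z2 + 0.5*w + u + rnormal()
generate y  = 1*x + 0.5*w + 2*u + rnormal()

* 2SLS with default standard errors: Sargan and Basmann tests
ivregress 2sls y w (x = z1 z2)
estat overid

* 2SLS with robust standard errors: Wooldridge's robust score test
ivregress 2sls y w (x = z1 z2), vce(robust)
estat overid

* Two-step GMM with a robust weighting matrix: Hansen's J statistic
ivregress gmm y w (x = z1 z2), wmatrix(robust)
estat overid
```

Because both instruments are valid here, none of the three tests should reject in most draws. The test requires at least one more instrument than endogenous regressors; with an exactly identified model there is nothing to test.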

The Full IV Pipeline

Here’s what a complete IV estimation in Stata actually looks like — the version ChatGPT never generates:

* Complete IV estimation pipeline
 
* 1. Estimate the model
ivregress 2sls y age i.race (education = distance_college parents_educ), vce(robust)
 
* 2. Check instrument strength
estat firststage
 
* 3. Test overidentifying restrictions
estat overid
 
* 4. Compare IV to OLS (Hausman-style test)
estat endogeneity
 
* 5. Store estimates for comparison table
estimates store iv_main

Five commands. ChatGPT gives you one of them. The other four are where the actual science happens.

Common Mistakes

  • Too many instruments: With many instruments, 2SLS is pulled toward OLS even when the joint first-stage F looks acceptable, and the threshold the F must clear rises with the number of instruments. Compare the minimum eigenvalue (Cragg-Donald) statistic against the Stock-Yogo critical values that estat firststage reports, rather than relying on a flat F > 10.
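  • Ignoring endogeneity: IV is only better than OLS if the endogeneity is real. estat endogeneity tests this: it reports the Durbin and Wu-Hausman statistics after estimation with default standard errors, and robust analogues after vce(robust). If the test does not reject and your instruments are strong, OLS is the more efficient choice. With weak instruments the test has little power, so a non-rejection is not evidence that x is exogenous.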
  • Using ivreg instead of ivregress: The old ivreg command is deprecated. Always use ivregress, which supports vce() options and modern diagnostics.
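  • Reporting only the second stage: Journals expect to see first-stage results. Add the first option to ivregress to display the first-stage regression, and use estimates table or the community-contributed esttab to report both stages.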
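
One way to put OLS, the first stage, and 2SLS side by side is sketched below on simulated data. The variable names, parameter values, and stored-estimate names (ols, first_stage, iv_main) are hypothetical; the display formats are a matter of taste.

```stata
* Simulated data; names and parameter values are hypothetical
clear
set seed 2024
set obs 2000
generate u  = rnormal()
generate w  = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate x  = 0.4*z1 + 0.3*z2 + 0.5*w + u + rnormal()
generate y  = 1*x + 0.5*w + 2*u + rnormal()

* OLS benchmark
regress y x w, vce(robust)
estimates store ols

* First stage as its own regression, so its coefficients can go in the table
regress x z1 z2 w, vce(robust)
estimates store first_stage

* 2SLS; the first option also prints the first-stage regression
ivregress 2sls y w (x = z1 z2), vce(robust) first
estimates store iv_main

* Coefficients and standard errors side by side
estimates table ols first_stage iv_main, b(%9.3f) se(%9.3f) stats(N)
```

Keeping the OLS column in the table lets readers see how much the IV correction moves the estimate, which is usually the first thing a referee asks.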

How Sytra Handles IV

When you tell Sytra “estimate the effect of education on income using distance to college as an instrument,” it generates the full pipeline: ivregress with proper syntax, estat firststage, estat overid, and estat endogeneity. If the first-stage F is below 10, Sytra flags it with a warning. If the overidentification test rejects, it suggests revisiting your instrument set.

The AI doesn’t just write the regression command. It runs the entire inferential chain — because that’s what valid IV estimation requires.

Further Reading

  • Angrist, J. D., & Pischke, J. S. (2009). Mostly Harmless Econometrics. Princeton University Press. Chapter 4.
  • Stock, J. H., & Yogo, M. (2005). “Testing for Weak Instruments in Linear IV Regression.” In Identification and Inference for Econometric Models.
  • Andrews, I., Stock, J. H., & Sun, L. (2019). “Weak Instruments in Instrumental Variables Regression.” Annual Review of Economics, 11, 727-753.