Stata & R Guides
Tutorials, reference guides, and solutions to every Stata error — written by researchers, for researchers.
Knowledge Tree
Follow structured Stata learning paths by category, including start-here sequences, command indexes, and troubleshooting hubs.
Most Discussed on Statalist
How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
The definitive guide to merging in Stata. Covers every merge type, _merge diagnostics, keepusing, common errors, and when to use joinby instead.
Reshape in Stata: Wide to Long and Long to Wide with Real Panel Data
reshape is one of the most confusing Stata commands. Here's how i() and j() work, with real panel data examples and error debugging.
Stata Loops: foreach and forvalues Tutorial with 20 Practical Examples
Stop writing the same command 50 times. Here are 20 real-world loop patterns — from basic iteration to nested loops and automated tables.
reghdfe in Stata: High-Dimensional Fixed Effects Made Simple
reghdfe absorbs any number of fixed effects without creating dummy variables. Here's the full tutorial — install, syntax, absorb(), cluster(), and singleton handling.
Stata margins: Complete Guide to Marginal Effects with Interpretation
AME, MEM, MER — all demystified. margins after OLS, logit, probit, interactions, continuous variables, and marginsplot customization.
Finding and Removing Duplicates in Stata: duplicates tag, report, drop
Duplicates break merges, inflate standard errors, and corrupt analysis. Here's how to detect, understand, and remove them safely.
Latest
Stata Data Quality Checklist: Uniqueness, Ranges, Missingness, Logs
Build a reproducible data-quality check workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
set maxvar in Stata: Wide-Data Limits and Better Alternatives
Build a reproducible set maxvar workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
compress in Stata: Reduce Memory Safely Without Semantic Drift
Build a reproducible compress workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Speed Up reshape and merge in Stata: Performance Design Patterns
Build a reproducible reshape performance workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Unicode in Stata: Encoding Fixes for Broken Strings and Merges
Fix Unicode encoding problems in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
export delimited in Stata: Controlled CSV Exports for Replication
Build a reproducible export delimited workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
import delimited in Stata: CSV Imports Without Type Breaks
Use import delimited in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
putdocx in Stata: Build Automated Word Reports
Build a reproducible putdocx workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
putexcel in Stata: Export Results to Excel Reliably
Build a reproducible putexcel workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Maps in Stata: Choropleths, Shapefile Joins, and QA
Create maps in Stata with publication-oriented styling, export controls, and interpretation guardrails.
Heatmaps in Stata: Matrix-Style Visuals with twoway Patterns
Create heatmaps in Stata with publication-oriented styling, export controls, and interpretation guardrails.
Stata Graph Schemes: Consistent Visual Style Across Projects
Create graph schemes in Stata with publication-oriented styling, export controls, and interpretation guardrails.
graph combine in Stata: Multi-Panel Layouts That Print Cleanly
Create combined graphs in Stata with publication-oriented styling, export controls, and interpretation guardrails.
Coefficient Plots in Stata: coefplot Workflow for Papers
Create coefplot outputs in Stata with publication-oriented styling, export controls, and interpretation guardrails.
Bar Charts in Stata: graph bar Design and Reliable Comparisons
Create graph bar charts in Stata with publication-oriented styling, export controls, and interpretation guardrails.
Histograms in Stata: Bins, Density, and Distribution Checks
Create histograms in Stata with publication-oriented styling, export controls, and interpretation guardrails.
Line Plots in Stata: Time-Series Trends and Multi-Line Layouts
Create line plots in Stata with publication-oriented styling, export controls, and interpretation guardrails.
Scatter Plots in Stata: twoway scatter with Labels and Groups
Create scatter plots in Stata with publication-oriented styling, export controls, and interpretation guardrails.
graph export in Stata: Formats, DPI, and Journal Requirements
Create graph export outputs in Stata with publication-oriented styling, export controls, and interpretation guardrails.
preserve/restore Pitfalls in Stata: Nested State and Recovery
Fix preserve/restore errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata "matrix not found": e(), r(), and Scope Debugging
Fix "matrix not found" errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Mata Conformability Error: Dimension Debugging and Fixes
Fix Mata conformability errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata "no room to add more variables": maxvar and Redesign
Fix "no room to add more variables" in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata "too many variables specified": Macro Expansion and Limits
Fix "too many variables specified" in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata r(198) invalid syntax: Systematic Debug Workflow
Fix r(198) invalid syntax in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata r(601) file not found: Paths, Quotes, and Project Hygiene
Fix r(601) file not found in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata "command not found": Path, Version, and Package Diagnosis
Fix "command not found" errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
adopath in Stata: Ado Conflicts and Command Discovery Fixes
Fix adopath problems in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Installing Stata Packages: ssc install, net install, Conflicts
Build a reproducible ssc install workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
version in Stata Do-Files: Cross-Machine Consistency Control
Build a reproducible version workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Reproducibility in Stata: set seed, RNG Streams, version
Build a reproducible set seed workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
simulate in Stata: Monte Carlo Designs That Replicate
Build a reproducible simulate workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
jackknife in Stata: Use Cases, Limits, and Pitfalls
Build a reproducible jackknife workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
bootstrap in Stata: Correct Resampling Standard Errors
Build a reproducible bootstrap workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Dynamic Panels in Stata: xtabond2 and Instrument Discipline
Implement xtabond2 in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
GMM in Stata: Moment Setup and Convergence Debugging
Run gmm in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Instrumental Variables in Stata: Concept, Code, and Diagnostics
Implement instrumental variables in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
IPW in Stata: Stabilized Weights, Trimming, and Diagnostics
Implement inverse probability weighting in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Treatment Effects in Stata: RA, IPW, AIPW Patterns
Implement teffects in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
PSM in Stata: Matching Workflow with Balance Diagnostics
Implement propensity score matching in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Regression Discontinuity in Stata: Bandwidth, Bins, and Reporting
Implement regression discontinuity in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Synthetic Control in Stata: Data Prep, Weights, and Placebos
Implement synthetic control in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Parallel Trends in Stata: Diagnostics, Placebos, and Visual Checks
Implement parallel trends tests in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Event Study in Stata: Leads/Lags with Factor Variables
Implement event studies in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Staggered Adoption DiD in Stata: csdid Workflow and Event Plots
Implement csdid in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Difference-in-Differences in Stata with didregress
Implement didregress in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Few Clusters in Stata: Better Inference with Clustered SE Limits
Implement few-cluster inference in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
Heteroskedasticity Tests in Stata: estat hettest and BP/White
Run heteroskedasticity tests in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Regression Diagnostics in Stata: Residuals, Leverage, Influence
Run regression diagnostics in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
stset Errors in Stata: Time, Failure, and Censoring Fixes
Fix stset errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Survival Analysis in Stata: stset, stcox, and Hazard Workflow
Run survival analysis in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
glm in Stata: Families, Links, and margins Interpretation
Run glm in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Negative Binomial in Stata: nbreg and Overdispersion Checks
Run nbreg in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
poisson in Stata: Count Models, Exposure, and Robust SE
Run poisson in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
probit in Stata: Interpretation with margins and Use Cases
Run probit in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
logit in Stata: Odds Ratios, AMEs, and Predicted Probabilities
Run logit in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
hausman Test in Stata: FE vs RE and Common Failure Modes
Run the hausman test in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Weak Instruments in Stata: First-Stage F, KP, and Reporting
Implement weak instrument tests in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
ivreg2 in Stata: Robust IV Workflow and Reporting Standards
Implement ivreg2 in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
2SLS in Stata: ivregress 2sls with Required Diagnostics
Implement ivregress 2sls in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
areg in Stata: Absorbed Fixed Effects and Practical Limits
Run areg in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
reghdfe vs xtreg in Stata: High-Dimensional FE Tradeoffs
Run reghdfe in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Fixed Effects in Stata with xtreg, fe: Assumptions and Output
Run xtreg, fe in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
tabstat and table in Stata: Summary Tables for Reporting
Build a reproducible tabstat workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
summarize, detail in Stata: Percentiles, Skewness, and Checks
Use summarize, detail in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
tabulate in Stata: One-Way, Two-Way, Missing, and Percentages
Use tabulate in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
contract in Stata: Frequency Tables You Can Merge Back
Use contract in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
isid in Stata: Enforce Key Uniqueness and Repair Failures
Use isid in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Unique IDs in Stata: egen group(), isid, and Key Discipline
Use egen group() in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Sorting in Stata: sort, gsort, stable, and Tie Handling
Use sort, stable in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
preserve and restore in Stata: Safe Temporary Transforms
Build a reproducible preserve/restore workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
assert in Stata: Data Validation Checks That Fail Fast
Build a reproducible assert workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
capture, noisily, and assert in Stata: Robust Script Patterns
Build a reproducible capture noisily workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
File Paths in Stata: Windows vs Mac/Linux and Space-Safe Paths
Build a reproducible file-path workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
tempfile and tempname in Stata: Safer Intermediate Pipelines
Build a reproducible tempfile workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Stata Quoting Rules: Compound Quotes, Nested Macros, and Paths
Fix local macro quoting problems in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
forvalues in Stata: Numeric Iteration with Clean Indexing
Build a reproducible forvalues workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
foreach in Stata: Iterate Variables and Files Safely
Build a reproducible foreach loop workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Local vs Global Macros in Stata: Scope, Safety, and Patterns
Build a reproducible workflow for global and local macros in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Stata Do-File Structure: A Professional Reproducible Template
Build a reproducible do-file workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Stata Logs: log using, SMCL vs Text, and Troubleshooting Records
Build a reproducible log using workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
asdoc in Stata: Exporting Tables to Word Safely
Build a reproducible asdoc workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
outreg2 in Stata: Fast Regression Tables and Caveats
Build a reproducible outreg2 workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
esttab and eststo in Stata: Consistent Regression Tables
Build a reproducible esttab workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Interaction Terms in Stata: c.x##i.group with margins
Run interaction-term models in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Factor Variables in Stata: i., c., Interactions, and Base Levels
Run models with i. factor variables in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
marginsplot in Stata: Publication-Ready Effects Visualizations
Create marginsplot outputs in Stata with publication-oriented styling, export controls, and interpretation guardrails.
margins in Stata: Predictions, AMEs, and Clean Interpretation
Run margins in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Stata "omitted because of collinearity": Correct Interpretation
Fix "omitted because of collinearity" in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Multicollinearity in Stata: VIF, Dropped Variables, and Redesign
Run multicollinearity checks in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Clustered Standard Errors in Stata: vce(cluster) Do and Don't
Run models with clustered standard errors in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Robust Standard Errors in Stata: vce(robust) and Interpretation
Run models with robust standard errors in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
regress in Stata: OLS Basics and Correct Interpretation
Run regress in Stata with coefficient interpretation, inference checks, and practical modeling decisions for real datasets.
Panel Diagnostics in Stata: xtdescribe, xtsum, and Balance Checks
Implement xtdescribe in Stata with identification-first setup, diagnostics, and reporting standards used in applied research.
tsset in Stata: Time-Series Setup, Formats, and Duplicate Times
Use tsset in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
xtset in Stata: Panel IDs, Time Variables, and Gap Pitfalls
Use xtset in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Survival Analysis in R: survminer, Cox PH, and Competing Risks
A practical guide to survival analysis in R — from Surv objects to Cox PH to competing risks — for public health and biostatistics researchers.
Open Source Statistical Software in 2026: The Landscape
R, Julia, Python statsmodels, and now Sytra. A map of the open-source statistical computing landscape in 2026 — what each tool does best and where the gaps are.
Fix "factor variables may not contain noninteger values" in Stata
Fix "factor variables may not contain noninteger values" in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata no observations r(2000): Filters, Merges, and Missingness
Fix no observations r(2000) in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata variable not found r(111): Why It Happens and Prevention
Fix variable not found r(111) in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata type mismatch r(109): A Diagnostic Workflow
Fix type mismatch r(109) in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata Error "string variables not allowed": Causes and Fixes
Fix "string variables not allowed" in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
reshape wide in Stata: Stubs, Suffixes, and Sparse Panels
Fix reshape wide errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
reshape long in Stata: Fix i/j Errors and Duplicate Keys
Fix reshape long errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Panel Data in R: fixest vs. plm vs. lfe
Comparing fixest, plm, and lfe for panel data estimation in R. Speed benchmarks, syntax differences, and when to use each.
AI and the Future of Econometrics: A Working Researcher's Perspective
AI won't replace econometricians. But it will change how we work. Here's what a PhD student running 400K-observation regressions thinks about the next 5 years.
merge keepusing() and keep(match) in Stata: Cleaner Joins
Use merge with keepusing() in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Range Joins in Stata: Match Rows Within Date and Value Intervals
Use rangejoin in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
joinby in Stata: Many-to-Many Matching with Explicit Audits
Use joinby in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Why merge m:m in Stata Is Dangerous and What to Do Instead
Fix merge m:m problems in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
Stata Frames: Working with Multiple Datasets in Memory
Build a reproducible frames workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
order and sort in Stata: Stable, Readable Datasets for Reproducibility
Build a reproducible order workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
keep/drop in Stata: Subset Data Without Accidental Loss
Use keep and drop in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Multilevel Models in Stata: HLM for Education Research
A practical guide to multilevel/HLM models in Stata using the mixed command. Students nested in schools. Patients nested in hospitals. Here's how.
Difference-in-Differences in R: didimputation, did, and fixest
A complete guide to modern DiD estimation in R — using fixest, did, and didimputation — with parallel trends testing and event study plots.
rename in Stata: Bulk Rename Patterns with Wildcards
Build a reproducible rename workflow in Stata with execution logs, fail-fast assertions, and review-ready outputs.
Value Labels in Stata: label define, label values, and Label Hygiene
Use label values in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
recode in Stata: Safer Category Edits and Missing Handling
Use recode in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
tostring in Stata: Convert Without Breaking IDs and Merges
Use tostring in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Duplicates in Stata: report/list/drop with an Audit Checklist
Use duplicates drop in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
bysort in Stata: Reusable Within-Group Transform Patterns
Use bysort in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Missing Values in Stata: ., .a-.z, and Safe Recode Rules
Handle missing values in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Building a Replication Package in Stata: The Complete Checklist
AER, QJE, and REStud now require replication packages. Here's a complete checklist for building one in Stata — directory structure, master .do file, data documentation, and automated testing.
Why ChatGPT Produces Invalid R Code for Statistical Analysis
ChatGPT knows R syntax better than it knows Stata syntax, but it still produces statistically invalid code: wrong standard errors, missing diagnostics, and unvalidated inference.
Cursor vs. RStudio vs. Rao: AI Coding Assistants for R Users
A head-to-head comparison of Cursor, RStudio, and Rao (Lotas) for AI-assisted R programming — features, limitations, and what's still missing.
Stata Dates: daily(), %td, %tm, and Import Trap Fixes
Use Stata date formats with full runnable code, realistic panel variables, and QA checks before downstream estimation.
encode vs egen group() in Stata: Correct Category IDs for Modeling
Use encode in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
destring and real() in Stata: Convert String Numbers Safely
Fix destring errors in Stata with a stepwise diagnosis flow, exact error handling, and checks that prevent repeat failures.
egen in Stata: Group IDs, Totals, Ranks, and Practical Cookbook Patterns
Use egen in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
collapse in Stata: Group Summaries Without Losing Design Integrity
Use collapse in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
append in Stata: Stack Datasets Safely with Variable Alignment Checks
Use append in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
import excel in Stata: Clean Types, Headers, Ranges, and Dates
Use import excel in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Event Studies in Stata: Finance and Economics Applications
How to implement event studies in Stata for finance and economics research — abnormal returns, CAR estimation, and visualization.
Matching Estimators in Stata: PSM, CEM, and Modern Alternatives
A guide to matching methods in Stata — propensity score matching, coarsened exact matching, and why you should probably use teffects instead of psmatch2.
From Raw Data to Published Paper: The 7-Step Stata Pipeline
A step-by-step pipeline from raw .csv to published table: import, clean, construct, analyze, visualize, export, validate. With code at every step.
reshape in Stata: Wide to Long and Back with Repeatable Patterns
Use reshape in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
merge in Stata: 1:1, m:1, 1:m with Match Audits
Use merge in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
Instrumental Variables in Stata: When and How
A guide to IV estimation in Stata — when to use instruments, how to test for weak instruments, and why ChatGPT gets the syntax wrong.
The Reproducibility Crisis in Stata: What .do Files Aren't Solving
Do-files are necessary but not sufficient for reproducibility. Here's what's still breaking — and how AI-assisted, logged execution changes the equation.
Regression Discontinuity in Stata: From Theory to Code
A practical guide to sharp and fuzzy RD designs in Stata using rdrobust, with bandwidth selection, visualization, and validation checks.
Logistic Regression in Stata: Marginal Effects That Actually Make Sense
Odds ratios are confusing. Marginal effects are what you actually want. Here's how to compute and interpret them correctly in Stata.
What Would a Truly Intelligent Statistical AI Look Like?
Not a chatbot. Not a code completer. A system that understands estimation, validates inference, and self-corrects. Here's the architecture.
Linked Datasets in Stata: frlink/frget Workflows Instead of Repeated Merges
Use Stata frames to connect datasets with frlink and frget, reduce merge mistakes, and keep clean, reproducible data pipelines.
GIS Data in Stata: Spatial Coordinates, Distance Features, and Regional Plots
A practical GIS data workflow in Stata using latitude/longitude validation, distance engineering, and map-ready regional outputs.
Stata preserve/restore and tempvar: Safe Data Manipulation Patterns
preserve and restore let you make destructive changes safely. tempvar and tempfile prevent namespace pollution. Here are the patterns every do-file needs.
Stata ODBC Connection Guide: Query SQL Databases and Reproducible Extracts
Connect Stata to SQL databases through ODBC, run reproducible queries, and stage extracts for transparent downstream analysis.
Stata Factor Variables: i., c., ibn., and # Notation Explained
Factor variable notation is powerful but confusing. Here's when to use i., c., ibn., ##, and # — with interaction examples and common mistakes.
The Copy-Paste Workflow Is Killing Your Research
Open ChatGPT. Describe analysis. Get code. Copy. Paste. Error. Repeat. This workflow is how 90% of researchers use AI today — and it's destroying reproducibility.
Survival Analysis in Stata: A Guide for Epidemiologists
A practical guide to survival analysis in Stata — from stset to Cox PH to competing risks. Written for public health researchers and epidemiologists.
API Data in Stata: Import JSON/CSV Feeds and Build Analysis-Ready Panels
Pull API data into Stata, parse fields, and turn daily feeds into clean panel datasets with key checks and reproducible staging.
Stata Macros: local, global, and Extended Functions Explained
Macros are Stata's variable system for code. local vs global, backtick-apostrophe syntax, extended functions, and the patterns that make do-files readable.
Stata Weights Explained: fweight, pweight, aweight, iweight — When to Use Which
Four weight types, four different purposes. Here's when each weight type is correct, with real examples from survey data and aggregated data.
Stata predict: Postestimation Commands for Fitted Values, Residuals, and More
Everything predict can do after regression — fitted values, residuals, influence statistics, predicted probabilities, and margins.
Panel Data in Stata: xtreg vs. reghdfe vs. areg
When to use xtreg, areg, or reghdfe for panel data in Stata. Includes speed benchmarks, syntax comparison, and common mistakes.
The Inference Problem: Why AI Tools Need to Think Like Statisticians, Not Programmers
AI coding assistants optimize for "does it run?" Statistical computing demands "is the inference valid?" This distinction is the most important problem in AI-assisted research.
Interaction Effects in Stata: Factor Variables, margins, and Interpretation
Interactions trip everyone up. Here's how i.x##i.z and c.x#c.z work, how to use margins and marginsplot to interpret them, and the common mistakes.
Clustered Standard Errors in Stata: vce(cluster) Explained with Examples
When and why to cluster standard errors. vce(cluster), vce(robust), two-way clustering, and the most common mistake researchers make.
Stata margins: Complete Guide to Marginal Effects with Interpretation
AME, MEM, MER — all demystified. margins after OLS, logit, probit, interactions, continuous variables, and marginsplot customization.
reghdfe in Stata: High-Dimensional Fixed Effects Made Simple
reghdfe absorbs any number of fixed effects without creating dummy variables. Here's the full tutorial — install, syntax, absorb(), cluster(), and singleton handling.
Export Regression Tables in Stata: esttab Tutorial with LaTeX, Word, and CSV
The most detailed esttab guide on the internet. Multi-model tables, custom stats, LaTeX with booktabs, Word output, and side-by-side comparison with outreg2.
Stata Dates: Formatting, Converting, and Working with Date Variables
Stata dates are stored as numbers. Here's the complete date handling reference — converting strings, display formats, date arithmetic, and tsset.
Finding and Removing Duplicates in Stata: duplicates tag, report, drop
Duplicates break merges, inflate standard errors, and corrupt analysis. Here's how to detect, understand, and remove them safely.
Cursor for Stata? Why General AI Coding Tools Miss the Point
Cursor is a brilliant code editor. But it doesn't know that your instrument is weak or that your standard errors need clustering. Here's why statistical computing needs a different kind of AI.
Publication-Ready Tables in Stata: esttab, outreg2, and collect
Stop manually formatting regression tables. Here's how to produce camera-ready LaTeX and Word tables from Stata using esttab, outreg2, and the new collect framework.
Stata Labels: Variable Labels, Value Labels, and label define
Well-labeled data is self-documenting. Here's the complete guide to variable labels, value labels, encode, decode, and label management.
Importing Data into Stata: Excel, CSV, Fixed-Width, SAS, and SPSS
How to get data into Stata from every format — with the exact import syntax, encoding options, and gotchas that waste 30 minutes.
Stata String Functions: substr, strpos, regexm, and 30 More with Examples
Every string function you need in Stata — from basic substr and trim to regex matching — with copy-paste examples for data cleaning.
Stata collapse: How to Aggregate Data with Examples
Need to go from individual-level to group-level data? collapse does it in one line. Full syntax, aggregation functions, and gotchas.
What Happens When You Ask Copilot to Run a Regression in Stata
GitHub Copilot can autocomplete Python in its sleep. But ask it to run a Stata regression and things fall apart. Here's a side-by-side test.
Stata egen Functions: Complete Reference with Examples for Every Function
Every egen function in one place — mean, total, count, max, min, rowmean, rowtotal, group, tag, rank — with examples for each.
Stata Loops: foreach and forvalues Tutorial with 20 Practical Examples
Stop writing the same command 50 times. Here are 20 real-world loop patterns — from basic iteration to nested loops and automated tables.
Reshape in Stata: Wide to Long and Long to Wide with Real Panel Data
reshape is one of the most confusing Stata commands. Here's how i() and j() work, with real panel data examples and error debugging.
Difference-in-Differences in Stata: A Complete Guide
A complete guide to difference-in-differences estimation in Stata — from basic 2x2 DiD to staggered adoption with Callaway-Sant'Anna. Includes code, diagnostics, and AI-assisted workflow.
How to Structure a Stata Project: Directory Layout, Naming, and Automation
A clean Stata project structure saves you hours. Here's the directory layout, naming conventions, and master .do file template used by top economics departments.
How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
The definitive guide to merging in Stata. Covers every merge type, _merge diagnostics, keepusing, common errors, and when to use joinby instead.
Stata 'variable already defined': Why gen Fails and How to Fix It
You ran gen and Stata said the variable already exists. Here's when to use replace, when to drop first, and the safe pattern for do-files.
Stata 'not sorted' Error in Merge: The Fix That Takes 5 Seconds
Stata says your data is not sorted. The fix is one line — but understanding why prevents hours of debugging merge issues.
Stata Error r(100): 'varlist required' and 'varlist not allowed' Explained
r(100) means Stata expected variable names and didn't get them; the closely related r(101) means it got them when it didn't want them. Here's every scenario and the fix.
Stata 'matsize too small' Error: How to Fix It (and When You Shouldn't)
set matsize is the quick fix. But if you need a matsize of 11,000, the real fix is reghdfe. Here's how to tell the difference.
Stata Type Mismatch Error in Merge: String vs Numeric Key Variables
One dataset stores FIPS codes as numbers. The other as strings. Stata refuses to merge. Here's the 5-second diagnosis and fix.
Singleton Observations in Stata reghdfe: What They Are and What to Do
reghdfe dropped 12,000 observations and you don't know why. They're singletons — fixed effect groups with one observation. Here's what that means for your paper.
Stata 'Convergence Not Achieved': Causes and Solutions for ML Estimation
Your logit, probit, or MLE model won't converge. Here's why — separation, multicollinearity, bad starting values — and the options that fix it.
Why ChatGPT Fails at Stata: The Imperative-Declarative Divide
ChatGPT treats Stata like Python with different syntax. It's not — it's a declarative inference language. Here are 5 real failures and why they happen.
7 Stata Commands ChatGPT Gets Wrong (And the Correct Syntax)
We tested ChatGPT on 7 common Stata commands. It got the syntax wrong on 5 of them. Here's what it generates vs. what actually works.
Stata Error r(198): Every Cause of 'Invalid Syntax' and How to Fix It
r(198) is Stata's most cryptic error. Here are the 15 causes — from invisible characters to smart quotes — and the one-line fix for each.
Stata Error r(111): '[variable] not found' — Complete Fix Guide
Your variable exists. You can see it. But Stata says "not found." Here's every reason that happens and how to fix each one in under a minute.
Stata Error r(2000) and r(2001): 'No Observations' — Why It Happens and How to Fix
You had 50,000 observations a minute ago. Now Stata says zero. Here's why your if condition, merge, or drop wiped your dataset — and how to recover.