Stata ODBC Connection Guide: Query SQL Databases and Reproducible Extracts
Connect Stata to SQL databases through ODBC, run reproducible queries, and stage extracts for transparent downstream analysis.
Your SQL query works in one session, then fails on another machine because DSN names and query text drifted.
You will set up a robust ODBC extraction pattern in Stata that stays reproducible across reruns and collaborators.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Configure and test your DSN with `odbc query` before extraction.
- Run `odbc load, exec("...")` with explicit selected columns.
- Save staged extracts to `.dta` and validate keys immediately.
- Run models from staged files, not live database calls.
Build Deterministic SQL-to-Stata Extraction Scripts
Query database rows with ODBC and stage the extract
ODBC pulls should be thin and deterministic: request only columns required for analysis and stage the result immediately.
If a DSN is unavailable in a development environment, keep a reproducible fallback block so the do-file still runs end-to-end for testing.
If you are extending this pipeline, also review From Raw Data to Published Paper: The 7-Step Stata Pipeline and Stata margins: Complete Guide to Marginal Effects.
1clear all2version 183set more off4capture mkdir "build"56capture noisily odbc query7capture noisily odbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel WHERE year >= 2016") dsn("research_warehouse") clear89if _rc {10 display as error "ODBC source unavailable in this environment; creating a deterministic fallback extract."11 clear12 set obs 160013 gen worker_id = _n14 gen firm_id = ceil(_n/16)15 gen year = 2016 + mod(_n,8)16 gen education = 10 + floor(runiform()*8)17 gen wage = 14 + 0.85*education + 0.25*(year-2016) + rnormal(0,2)18}1920isid worker_id21save "build/odbc_extract_stage.dta", replace22summ wage education Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
wage | 1,600 22.48124 3.257119 12.18453 33.90677
education | 1,600 13.497 2.286991 10 17Aggregate staged rows and merge macro controls safely
Once staged, treat the data like any other Stata source: aggregate to analysis grain and merge external controls with explicit key checks.
Separating extraction from transformation reduces debugging complexity and keeps failures isolated.
1clear all2version 183set more off4capture mkdir "build"56capture noisily odbc query7capture noisily odbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel WHERE year >= 2016") dsn("research_warehouse") clear89if _rc {10 display as error "ODBC source unavailable in this environment; creating a deterministic fallback extract."11 clear12 set obs 160013 gen worker_id = _n14 gen firm_id = ceil(_n/16)15 gen year = 2016 + mod(_n,8)16 gen education = 10 + floor(runiform()*8)17 gen wage = 14 + 0.85*education + 0.25*(year-2016) + rnormal(0,2)18}1920isid worker_id21save "build/odbc_extract_stage.dta", replace22summ wage education2324* ---- Section-specific continuation ----25use "build/odbc_extract_stage.dta", clear2627collapse (mean) mean_wage=wage mean_edu=education, by(firm_id year)28isid firm_id year2930tempfile controls31preserve32 keep firm_id year33 gen cpi = 96 + 0.9*(year-2016) + runiform()34 save `controls'35restore3637merge 1:1 firm_id year using `controls'38gen wage_real = mean_wage/(cpi/100)39summ wage_realVariable | Obs Mean Std. dev. Min Max -------------+--------------------------------------------------------- wage_real | 800 22.43157 2.228408 16.86492 30.97964
Common Errors and Fixes
"Data source name not found and no default driver specified"
Stata could not locate the DSN or its driver on the machine running the script.
Verify DSN name in your OS ODBC manager, confirm driver install, and rerun `odbc query` to list recognized sources.
Data source name not found and no default driver specified r(682);
odbc load, exec("SELECT * FROM labor_panel") dsn("prod_db") clearodbc queryodbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel") dsn("research_warehouse") clear1odbc query2capture noisily odbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel") dsn("research_warehouse") clear3if _rc {4 display as error "Check DSN and driver configuration"5 exit _rc6}. odbc query Data Source Name Description ---------------------------------------------------- research_warehouse Postgres warehouse DSN
Command Reference
odbc load
Stata docs โExecutes SQL through configured ODBC data sources and loads the result set into Stata.
exec("SQL")Runs explicit SQL text against the DSNdsn("name")Selects configured ODBC data source nameclearReplaces in-memory data with query outputlowercaseConverts imported variable names to lowercaseHow Sytra Handles This
Sytra can draft ODBC query blocks from plain English, add fallback checks for DSN availability, and stage validated extracts for modeling.
A direct natural-language prompt for this exact workflow:
Connect to an ODBC DSN, run a SQL query selecting worker_id firm_id year wage education, stage the extract, aggregate to firm-year means, and merge CPI controls with key validation.Sytra catches these errors before you run.
Sytra can draft ODBC query blocks from plain English, add fallback checks for DSN availability, and stage validated extracts for modeling.
Join the Waitlist โFAQ
Do I need to configure a DSN before using odbc load?
Yes. Configure a DSN in your operating system ODBC manager first, then reference that DSN name in Stata.
How can I make database extracts reproducible?
Store SQL queries in do-files, pull into a staged .dta file, and log extraction date plus row counts for each run.
Should I run models directly on live ODBC queries?
No. Pull a staged extract first, validate keys, and run models on the staged dataset to avoid drift across reruns.
Related Guides
- Importing Data into Stata: Excel, CSV, Fixed-Width, SAS, and SPSS
- API Data in Stata: Import JSON/CSV Feeds and Build Analysis-Ready Panels
- Linked Datasets in Stata: frlink/frget Workflows Instead of Repeated Merges
- How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
- Explore the data management pillar page
- Open the full data management guide index
- Browse all Stata & R guides on the blog index
- Browse all Stata pillars
We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.