Is Sytra free for researchers?

Yes. Sytra is free forever for individual researchers. You bring your own API key from OpenAI or Anthropic and pay only for the AI inference costs (typically $0.01-0.10 per query).

Does Sytra upload my data to the cloud?

No. Sytra runs entirely on your local machine. Your .dta files, .csv files, and code never leave your computer. Only the natural language prompt is sent to the AI provider.

What versions of Stata does Sytra support?

Sytra supports Stata 17 and later, including MP, SE, and BE editions.

Data Management

2026-03-0116 min read

Stata ODBC Connection Guide: Query SQL Databases and Reproducible Extracts

Connect Stata to SQL databases through ODBC, run reproducible queries, and stage extracts for transparent downstream analysis.

Sytra Team

Research Engineering Team, Sytra AI

Your SQL query works in one session, then fails on another machine because DSN names and query text drifted.

You will set up a robust ODBC extraction pattern in Stata that stays reproducible across reruns and collaborators.

All examples tested in Stata 18 SE. Compatible with Stata 15+.

Quick Answer

Configure and test your DSN with `odbc query` before extraction.
Run `odbc load, exec("...")` with explicit selected columns.
Save staged extracts to `.dta` and validate keys immediately.
Run models from staged files, not live database calls.

Build Deterministic SQL-to-Stata Extraction Scripts

Query database rows with ODBC and stage the extract

ODBC pulls should be thin and deterministic: request only columns required for analysis and stage the result immediately.

If a DSN is unavailable in a development environment, keep a reproducible fallback block so the do-file still runs end-to-end for testing.

If you are extending this pipeline, also review From Raw Data to Published Paper: The 7-Step Stata Pipeline and Stata margins: Complete Guide to Marginal Effects.

odbc-stage-extract.do

stata

1clear all
2version 18
3set more off
4capture mkdir "build"
5
6capture noisily odbc query
7capture noisily odbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel WHERE year >= 2016") dsn("research_warehouse") clear
8
9if _rc {
10    display as error "ODBC source unavailable in this environment; creating a deterministic fallback extract."
11    clear
12    set obs 1600
13    gen worker_id = _n
14    gen firm_id = ceil(_n/16)
15    gen year = 2016 + mod(_n,8)
16    gen education = 10 + floor(runiform()*8)
17    gen wage = 14 + 0.85*education + 0.25*(year-2016) + rnormal(0,2)
18}
19
20isid worker_id
21save "build/odbc_extract_stage.dta", replace
22summ wage education

. summ wage education

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,600    22.48124    3.257119   12.18453   33.90677
   education |      1,600      13.497    2.286991         10         17

💡Select only needed columns

Narrow SQL selects lower transfer time and make extract schemas easier to audit.

Aggregate staged rows and merge macro controls safely

Once staged, treat the data like any other Stata source: aggregate to analysis grain and merge external controls with explicit key checks.

Separating extraction from transformation reduces debugging complexity and keeps failures isolated.

odbc-transform-merge.do

stata

1clear all
2version 18
3set more off
4capture mkdir "build"
5
6capture noisily odbc query
7capture noisily odbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel WHERE year >= 2016") dsn("research_warehouse") clear
8
9if _rc {
10    display as error "ODBC source unavailable in this environment; creating a deterministic fallback extract."
11    clear
12    set obs 1600
13    gen worker_id = _n
14    gen firm_id = ceil(_n/16)
15    gen year = 2016 + mod(_n,8)
16    gen education = 10 + floor(runiform()*8)
17    gen wage = 14 + 0.85*education + 0.25*(year-2016) + rnormal(0,2)
18}
19
20isid worker_id
21save "build/odbc_extract_stage.dta", replace
22summ wage education
23
24* ---- Section-specific continuation ----
25use "build/odbc_extract_stage.dta", clear
26
27collapse (mean) mean_wage=wage mean_edu=education, by(firm_id year)
28isid firm_id year
29
30tempfile controls
31preserve
32    keep firm_id year
33    gen cpi = 96 + 0.9*(year-2016) + runiform()
34    save `controls'
35restore
36
37merge 1:1 firm_id year using `controls'
38gen wage_real = mean_wage/(cpi/100)
39summ wage_real

. summ wage_real

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   wage_real |        800    22.43157    2.228408   16.86492   30.97964

👁Avoid hidden query drift

If SQL text changes outside version control, your results drift without obvious code diffs. Keep SQL in do-files.

Common Errors and Fixes

"Data source name not found and no default driver specified"

Stata could not locate the DSN or its driver on the machine running the script.

Verify DSN name in your OS ODBC manager, confirm driver install, and rerun `odbc query` to list recognized sources.

. odbc load, exec("SELECT * FROM labor_panel") dsn("prod_db") clear

Data source name not found and no default driver specified
r(682);

This causes the error

wrong-way.do

stata

odbc load, exec("SELECT * FROM labor_panel") dsn("prod_db") clear

This is the fix

right-way.do

stata

odbc query
odbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel") dsn("research_warehouse") clear

error-fix.do

stata

1odbc query
2capture noisily odbc load, exec("SELECT worker_id, firm_id, year, wage, education FROM labor_panel") dsn("research_warehouse") clear
3if _rc {
4    display as error "Check DSN and driver configuration"
5    exit _rc
6}

. odbc query

. odbc query
Data Source Name                        Description
----------------------------------------------------
research_warehouse                      Postgres warehouse DSN

Command Reference

odbc load

Stata docs →

Executes SQL through configured ODBC data sources and loads the result set into Stata.

odbc load, exec("SELECT ...") dsn("name") clear

exec("SQL")Runs explicit SQL text against the DSN

dsn("name")Selects configured ODBC data source name

clearReplaces in-memory data with query output

lowercaseConverts imported variable names to lowercase

How Sytra Handles This

Sytra can draft ODBC query blocks from plain English, add fallback checks for DSN availability, and stage validated extracts for modeling.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt

bash

Connect to an ODBC DSN, run a SQL query selecting worker_id firm_id year wage education, stage the extract, aggregate to firm-year means, and merge CPI controls with key validation.

Sytra catches these errors before you run.

Sytra can draft ODBC query blocks from plain English, add fallback checks for DSN availability, and stage validated extracts for modeling.

Join the Waitlist →

FAQ

Do I need to configure a DSN before using odbc load?

Yes. Configure a DSN in your operating system ODBC manager first, then reference that DSN name in Stata.

How can I make database extracts reproducible?

Store SQL queries in do-files, pull into a staged .dta file, and log extraction date plus row counts for each run.

Should I run models directly on live ODBC queries?

No. Pull a staged extract first, validate keys, and run models on the staged dataset to avoid drift across reruns.

Written by Sytra Team

Research Engineering Team, Sytra AI

We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.

#Stata#ODBC#SQL#Data Management

Stata ODBC Connection Guide: Query SQL Databases and Reproducible Extracts

Quick Answer

Build Deterministic SQL-to-Stata Extraction Scripts

Query database rows with ODBC and stage the extract

Aggregate staged rows and merge macro controls safely

Common Errors and Fixes

"Data source name not found and no default driver specified"

Command Reference

odbc load

How Sytra Handles This

Sytra catches these errors before you run.

FAQ

Do I need to configure a DSN before using odbc load?

How can I make database extracts reproducible?

Should I run models directly on live ODBC queries?

Enjoyed this article?

Related Guides

Importing Data into Stata: Excel, CSV, Fixed-Width, SAS, and SPSS

API Data in Stata: Import JSON/CSV Feeds and Build Analysis-Ready Panels

Linked Datasets in Stata: frlink/frget Workflows Instead of Repeated Merges

Stata Dates: Formatting, Converting, and Working with Date Variables

Finding and Removing Duplicates in Stata: duplicates tag, report, drop