Importing Data into Stata: Excel, CSV, Fixed-Width, SAS, and SPSS
How to get data into Stata from every format, with the exact import syntax, encoding options, and gotchas that waste 30 minutes.
Your import worked yesterday, but today all IDs are shifted, dates are malformed, and Unicode characters are broken.
You will build a robust import-export block that protects schema, encoding, and key fields.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Import raw files once and save clean `.dta` staging datasets.
- Specify encoding and string handling options explicitly.
- Check key variables and date fields immediately after import.
- Export only finalized outputs with controlled formats.
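The third bullet's date check can be sketched as follows. This is a minimal example under stated assumptions: the raw file is taken to carry a hypothetical `date_str` column in YYYY-MM-DD form, and three rows are typed in by hand instead of imported, one of them deliberately invalid.

```stata
* Sketch: validate a string date column right after import.
* Assumption: date_str is a hypothetical YYYY-MM-DD column; rows are typed in by hand.
clear
input str10 date_str
"2021-03-15"
"2021-13-01"
""
end

* daily() returns missing when the string does not parse with the "YMD" mask
gen interview_date = daily(date_str, "YMD")
format interview_date %td

* Non-empty strings that failed to parse point to malformed dates in the raw file
count if missing(interview_date) & date_str != ""
list date_str if missing(interview_date) & date_str != ""
```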
Treat I/O as a Controlled Boundary in Your Pipeline
Import Excel and CSV with explicit typing checks
Raw file ingest is where many silent data-type errors enter applied projects. Stabilize import commands with explicit options and checks.
After import, inspect variable types and key uniqueness before any transformation.
If you are extending this pipeline, also review How to Structure a Stata Project and Clustered Standard Errors in Stata.
```stata
clear all

capture mkdir "raw"
capture mkdir "build"
capture mkdir "exports"

* Fix the random-number seed so the simulated source files are reproducible
set seed 12345

* Create a reproducible CSV source file
set obs 300
gen firm_id = "F" + string(ceil(_n/3), "%03.0f")
gen year = 2015 + mod(_n,8)
gen wage = 14 + rnormal(0,2)
gen education = 8 + floor(runiform()*10)
export delimited using "raw/worker_panel.csv", replace

* Create a reproducible Excel source file
preserve
clear
set obs 100
gen firm_id = "F" + string(_n, "%03.0f")
gen year = 2015 + mod(_n,8)
gen firm_size = 50 + floor(runiform()*500)
export excel using "raw/firm_covariates.xlsx", firstrow(variables) replace
restore

* Import CSV with explicit settings
import delimited using "raw/worker_panel.csv", clear varnames(1) encoding("UTF-8")

describe firm_id year wage education
count if missing(firm_id) | missing(year)

* Save staging dataset
save "build/worker_panel_stage.dta", replace

* Example Excel import with first row names
import excel "raw/firm_covariates.xlsx", sheet("Sheet1") firstrow clear
save "build/firm_covariates_stage.dta", replace
```

```
              storage   display    value
variable name   type    format     label
---------------------------------------------
firm_id         str4    %9s
year            int     %8.0g
wage            float   %9.0g
education       byte    %8.0g
```
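The block above describes the imported variables but never tests key uniqueness. A minimal sketch of those checks follows; to stay self-contained it rebuilds only the firm_id-year keys of the simulated panel, whereas in the pipeline you would run the checks directly after `import delimited`.

```stata
* Sketch: type and key checks right after import.
* Self-contained: rebuilds only the firm_id-year keys of the simulated panel.
clear
set obs 300
gen firm_id = "F" + string(ceil(_n/3), "%03.0f")
gen year = 2015 + mod(_n, 8)

* Stop early if the ID came in as numeric: leading zeros would already be gone
confirm string variable firm_id

* Key fields must never be missing
assert !missing(firm_id, year)

* Show any duplicate keys, then fail hard if firm_id-year is not unique
duplicates report firm_id year
isid firm_id year
```

`isid` exits with an error when the key is not unique, so a bad import stops the do-file before any merge or reshape can silently multiply rows.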
Export reproducible tables and analysis extracts
Export steps should be deterministic and versioned. Use one script that creates all outbound files from validated staging data.
Keep IDs and date formats explicit before exporting to spreadsheet consumers.
```stata
* Continues from the import block above: assumes build/worker_panel_stage.dta exists
use "build/worker_panel_stage.dta", clear

* Keep analysis subset for collaborators
keep firm_id year wage education
export delimited using "exports/worker_panel_clean.csv", replace

* Export QA summary to Excel
collapse (mean) mean_wage=wage mean_edu=education, by(year)
export excel using "exports/yearly_summary.xlsx", firstrow(variables) replace
```

```
file exports/worker_panel_clean.csv saved
```
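Keeping IDs and dates explicit does not show up in the export block itself, so here is a small sketch. The variables `firm_num` and `interview_date` and the output file `exports/firm_dates.csv` are hypothetical; the idea is to store the ID as a zero-padded string and to ask `export delimited` to write values with their display formats via `datafmt`.

```stata
* Sketch: explicit ID and date formats before handing a CSV to spreadsheet users.
* Assumption: firm_num and interview_date are hypothetical illustration variables.
capture mkdir "exports"
clear
input firm_num year
7 2016
42 2017
end

* Store the ID as a zero-padded string so the CSV contains "007", not 7
gen firm_id = string(firm_num, "%03.0f")

* Hypothetical daily date, displayed in ISO form
gen interview_date = mdy(6, 15, year)
format interview_date %tdCCYY-NN-DD

* datafmt writes each variable using its display format
export delimited firm_id year interview_date using "exports/firm_dates.csv", datafmt replace
```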
Common Errors and Fixes
"file raw/worker_panel.csv not found"
Stata resolves the path relative to the current working directory and cannot find the file there.
Run `pwd` to check the working directory, verify the relative path, and set the project root once with `cd` in the master do-file.
```
file raw/worker_panel.csv not found
r(601);
```

Fragile:

```stata
import delimited using "worker_panel.csv", clear
```

Fixed:

```stata
cd "/project/root"
import delimited using "raw/worker_panel.csv", clear
```

Defensive check:

```stata
pwd
capture confirm file "raw/worker_panel.csv"
if _rc {
    display as error "raw file missing"
    exit 601
}
import delimited using "raw/worker_panel.csv", clear
```

```
. capture confirm file "raw/worker_panel.csv"
. import delimited using "raw/worker_panel.csv", clear
(encoding automatically selected: UTF-8)
```
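With more than one raw input, the same check can run as a pre-flight loop at the top of the import script. This sketch assumes the two raw files used in this guide; it reports every missing file before stopping with r(601), so a single run reveals all path problems at once.

```stata
* Sketch: pre-flight check of every raw input used in this guide
display "working directory: `c(pwd)'"

local missing_files = 0
foreach f in "raw/worker_panel.csv" "raw/firm_covariates.xlsx" {
    capture confirm file "`f'"
    if _rc {
        display as error "missing raw file: `f'"
        local missing_files = `missing_files' + 1
    }
}

* Stop with the same return code Stata uses for "file not found"
if `missing_files' > 0 {
    exit 601
}
```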
Command Reference
import delimited / import excel
Reads external data files into Stata while controlling encoding and variable handling.
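| Option | Command | What it does |
|---|---|---|
| `encoding("UTF-8")` | import delimited | Sets the character encoding explicitly |
| `varnames(1)` | import delimited | Uses the first row as variable names (`firstrow` is the import excel equivalent) |
| `clear` | both | Replaces the data in memory |
| `allstring` | import excel | Imports all columns as strings for strict ID control (`stringcols(_all)` in import delimited) |

How Sytra Handles This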
Sytra can generate import blocks with file existence checks, encoding options, and automatic staging-file saves.
A direct natural-language prompt for this exact workflow:
"Write an import pipeline for worker_panel.csv and firm_covariates.xlsx with UTF-8 encoding checks, key validation for firm_id-year, staging saves to build/, and export of a yearly summary Excel file."
Sytra catches these errors before you run.
FAQ
Why do imported IDs lose leading zeros?
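Stata reads a column of digits as numeric, which drops leading zeros (`00501` becomes `501`). Import such IDs as strings (`allstring` in import excel, `stringcols()` in import delimited), or, if the ID has a known fixed width, rebuild it with a padded format such as `string(id, "%05.0f")` before export.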
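A small sketch of both routes, assuming a throwaway CSV (`raw/zip_ids.csv`, hypothetical) written on the fly with `file write` so its contents are exactly known. The default import turns `02139` into a number; padding with `string()` recovers it only when the width is known, while `stringcols()` avoids the loss. For Excel files, `import excel ..., allstring` plays the same role.

```stata
* Sketch: leading zeros are lost on a default import and kept with stringcols()
capture mkdir "raw"

* Write a hypothetical demo CSV with unquoted, zero-padded codes
tempname fh
file open `fh' using "raw/zip_ids.csv", write replace text
file write `fh' "zip,n" _n "02139,1" _n "00501,2" _n
file close `fh'

* Default import: the column of digits is read as numeric (02139 -> 2139)
import delimited using "raw/zip_ids.csv", clear varnames(1)
confirm numeric variable zip

* Recovery when the width is known (5 digits here): rebuild a padded string
gen zip_str = string(zip, "%05.0f")
list zip zip_str

* Better: force the first column to stay a string at import time
import delimited using "raw/zip_ids.csv", clear varnames(1) stringcols(1)
list zip
```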
How do I handle encoding problems when importing CSV?
Set the file's encoding explicitly with `encoding()` in import delimited, then check string variables that contain non-ASCII characters before saving the staging file.
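One concrete form of that check is to count string values that are not valid UTF-8 with `ustrinvalidcnt()` (available since Stata 14). The sketch plants a single Latin-1 byte by hand so it runs without a mis-encoded file; in a real pipeline the loop would follow `import delimited ..., encoding("UTF-8")`.

```stata
* Sketch: flag string variables holding invalid UTF-8 after import.
* Self-contained: one Latin-1 byte is planted by hand to simulate a bad import.
clear
input str20 city
"Berlin"
"Montreal"
end
* char(233) is the single Latin-1 byte for e-acute, which is not valid UTF-8
replace city = "Montr" + char(233) + "al" in 2

ds, has(type string)
local strvars `r(varlist)'
local n_bad = 0
foreach v of local strvars {
    quietly count if ustrinvalidcnt(`v') > 0
    if r(N) > 0 {
        display as error "`v': " r(N) " value(s) with invalid UTF-8"
        local n_bad = `n_bad' + r(N)
    }
}
* If n_bad > 0, re-import with the file's actual encoding, e.g. encoding("ISO-8859-1")
```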
Should I import in every do-file run?
Import raw data once into `.dta` staging files, then load those `.dta` files in downstream scripts; this is faster and keeps the pipeline reproducible.
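A minimal sketch of the import-once pattern, assuming the `raw/` and `build/` layout and the `raw/worker_panel.csv` file from this guide. The `REBUILD_STAGE` global is a hypothetical switch for forcing a fresh import when the raw file changes.

```stata
* Sketch: import the raw CSV only when the staging file is missing
* or a rebuild is requested through the hypothetical global REBUILD_STAGE
capture mkdir "build"
capture confirm file "build/worker_panel_stage.dta"
if _rc | ("$REBUILD_STAGE" == "1") {
    import delimited using "raw/worker_panel.csv", clear varnames(1) encoding("UTF-8")
    save "build/worker_panel_stage.dta", replace
}

* Downstream code always starts from the staged .dta
use "build/worker_panel_stage.dta", clear
```

Deleting the staging file, or running `global REBUILD_STAGE 1` before the script, forces a fresh import; otherwise a stale staging file would keep being reused after the raw data changes.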
Related Guides
- API Data in Stata: Import JSON/CSV Feeds and Build Analysis-Ready Panels
- Stata ODBC Connection Guide: Query SQL Databases and Reproducible Extracts
- Stata Dates: Formatting, Converting, and Working with Date Variables
- How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.