Data Management
2026-02-1711 min read

Importing Data into Stata: Excel, CSV, Fixed-Width, SAS, and SPSS

How to get data into Stata from every format โ€” with the exact import syntax, encoding options, and gotchas that waste 30 minutes.

Sytra Team
Research Engineering Team, Sytra AI

Your import worked yesterday, but today all IDs are shifted, dates are malformed, and Unicode characters are broken.

You will build a robust import-export block that protects schema, encoding, and key fields.

All examples tested in Stata 18 SE. Compatible with Stata 15+.


Quick Answer

  1. Import raw files once and save clean `.dta` staging datasets.
  2. Specify encoding and string handling options explicitly.
  3. Check key variables and date fields immediately after import.
  4. Export only finalized outputs with controlled formats.

Treat I/O as a Controlled Boundary in Your Pipeline

Import Excel and CSV with explicit typing checks

Raw file ingest is where many silent data-type errors enter applied projects. Stabilize import commands with explicit options and checks.

After import, inspect variable types and key uniqueness before any transformation.

If you are extending this pipeline, also review How to Structure a Stata Project and Clustered Standard Errors in Stata.

import-main.do
stata
1clear all
2
3capture mkdir "raw"
4capture mkdir "build"
5capture mkdir "exports"
6
7* Create a reproducible CSV source file
8set obs 300
9gen firm_id = "F" + string(ceil(_n/3), "%03.0f")
10gen year = 2015 + mod(_n,8)
11gen wage = 14 + rnormal(0,2)
12gen education = 8 + floor(runiform()*10)
13export delimited using "raw/worker_panel.csv", replace
14
15* Create a reproducible Excel source file
16preserve
17clear
18set obs 100
19gen firm_id = "F" + string(_n, "%03.0f")
20gen year = 2015 + mod(_n,8)
21gen firm_size = 50 + floor(runiform()*500)
22export excel using "raw/firm_covariates.xlsx", firstrow(variables) replace
23restore
24
25* Import CSV with explicit settings
26import delimited using "raw/worker_panel.csv", clear varnames(1) encoding("UTF-8")
27
28describe firm_id year wage education
29count if missing(firm_id) | missing(year)
30
31* Save staging dataset
32save "build/worker_panel_stage.dta", replace
33
34* Example Excel import with first row names
35import excel "raw/firm_covariates.xlsx", sheet("Sheet1") firstrow clear
36save "build/firm_covariates_stage.dta", replace
. describe firm_id year wage education
              storage   display    value
variable name   type    format     label
---------------------------------------------
firm_id         str8    %9s
year            int     %8.0g
wage            float   %9.0g
education       byte    %8.0g
๐Ÿ’กStage raw imports
Use a build/staging layer between raw files and analysis. This isolates vendor format changes from modeling scripts.

Export reproducible tables and analysis extracts

Export steps should be deterministic and versioned. Use one script that creates all outbound files from validated staging data.

Keep IDs and date formats explicit before exporting to spreadsheet consumers.

export-main.do
stata
1clear all
2
3capture mkdir "raw"
4capture mkdir "build"
5capture mkdir "exports"
6
7* Create a reproducible CSV source file
8set obs 300
9gen firm_id = "F" + string(ceil(_n/3), "%03.0f")
10gen year = 2015 + mod(_n,8)
11gen wage = 14 + rnormal(0,2)
12gen education = 8 + floor(runiform()*10)
13export delimited using "raw/worker_panel.csv", replace
14
15* Create a reproducible Excel source file
16preserve
17clear
18set obs 100
19gen firm_id = "F" + string(_n, "%03.0f")
20gen year = 2015 + mod(_n,8)
21gen firm_size = 50 + floor(runiform()*500)
22export excel using "raw/firm_covariates.xlsx", firstrow(variables) replace
23restore
24
25* Import CSV with explicit settings
26import delimited using "raw/worker_panel.csv", clear varnames(1) encoding("UTF-8")
27
28describe firm_id year wage education
29count if missing(firm_id) | missing(year)
30
31* Save staging dataset
32save "build/worker_panel_stage.dta", replace
33
34* Example Excel import with first row names
35import excel "raw/firm_covariates.xlsx", sheet("Sheet1") firstrow clear
36save "build/firm_covariates_stage.dta", replace
37
38* ---- Section-specific continuation ----
39use "build/worker_panel_stage.dta", clear
40
41* Keep analysis subset for collaborators
42keep firm_id year wage education
43export delimited using "exports/worker_panel_clean.csv", replace
44
45* Export QA summary to Excel
46collapse (mean) mean_wage=wage mean_edu=education, by(year)
47export excel using "exports/yearly_summary.xlsx", firstrow(variables) replace
. export delimited using "exports/worker_panel_clean.csv", replace
file exports/worker_panel_clean.csv saved
โš ๏ธAvoid manual spreadsheet edits
If exported files are edited manually and then re-imported, you lose reproducibility. Keep source of truth in scripted .do files.

Common Errors and Fixes

"file raw/worker_panel.csv not found"

Stata cannot locate the path relative to current working directory.

Run `pwd`, verify relative path, and standardize project root via `cd` in master do-file.

. import delimited using "worker_panel.csv", clear
file raw/worker_panel.csv not found
r(601);
This causes the error
wrong-way.do
stata
import delimited using "worker_panel.csv", clear
This is the fix
right-way.do
stata
cd "/project/root"
import delimited using "raw/worker_panel.csv", clear
error-fix.do
stata
1pwd
2capture confirm file "raw/worker_panel.csv"
3if _rc {
4 display as error "raw file missing"
5 exit 601
6}
7import delimited using "raw/worker_panel.csv", clear
. capture confirm file "raw/worker_panel.csv"
. capture confirm file "raw/worker_panel.csv"

. import delimited using "raw/worker_panel.csv", clear
(encoding automatically selected: UTF-8)

Command Reference

import delimited / import excel

Stata docs โ†’

Reads external data files into Stata while controlling encoding and variable handling.

import delimited using filename, clear [encoding()] [varnames()]
encoding("UTF-8")Sets character encoding explicitly
varnames(1)Uses first row as variable names
clearReplaces data in memory
allstringImports all columns as strings for strict ID control

How Sytra Handles This

Sytra can generate import blocks with file existence checks, encoding options, and automatic staging-file saves.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt
bash
Write an import pipeline for worker_panel.csv and firm_covariates.xlsx with UTF-8 encoding checks, key validation for firm_id-year, staging saves to build/, and export of a yearly summary Excel file.

Sytra catches these errors before you run.

Sytra can generate import blocks with file existence checks, encoding options, and automatic staging-file saves.

Join the Waitlist โ†’

FAQ

Why do imported IDs lose leading zeros?

Stata often reads ID columns as numeric. Use `allstring` or convert with display formats before export to preserve leading zeros.

How do I handle encoding problems when importing CSV?

Use the encoding() option in import delimited and verify with sample string checks after import.

Should I import in every do-file run?

Import raw data once into .dta staging files, then load .dta in downstream scripts for faster and reproducible pipelines.


Written by Sytra Team
Research Engineering Team, Sytra AI

We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.

#Stata#Import#Export#Data Management

Enjoyed this article?