reshape in Stata: Wide to Long and Back with Repeatable Patterns
Use reshape in Stata with full runnable code, realistic panel variables, and QA checks before downstream estimation.
You are applying reshape under deadline pressure, and one unnoticed data issue can invalidate the full analysis pass.
This guide gives you a deterministic path from raw files to validated analysis input, anchored to one task: turning raw partner files into one analysis-ready firm-year panel.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Start with a defined research task before running reshape.
- Run reshape only after preflight checks on keys, types, and missingness.
- Audit command output immediately and document expected vs observed counts.
- Add a reusable QA block focused on key uniqueness, row counts, and type stability.
Execution Blueprint: reshape for turning raw partner files into one analysis-ready firm-year panel
Anchor the use case and run preflight checks
This workflow is built for turning raw partner files into one analysis-ready firm-year panel. A small key mismatch can break merges and remove observations without obvious warnings.
Run a deterministic setup first so every command in later sections executes against known data structure and known variable types.
If you are extending this pipeline, also review destring and real() in Stata: Convert String Numbers Safely and rename in Stata: Bulk Rename Patterns with Wildcards.
    clear all
    version 18   // lower this to match your installation if you run Stata 15-17
    set seed 260210
    set obs 1200
    gen firm_id = ceil(_n/12)
    gen year = 2014 + mod(_n,10)
    gen worker_id = _n
    gen education = 10 + floor(runiform()*8)
    gen wage = 18 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)

    * Preflight checks
    assert !missing(firm_id, year)
    assert !missing(wage, education)
    count

Output of count:

    1,200
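The preflight block above checks missingness only. The following is a minimal sketch of how the key-type part of the preflight could look, assuming the same simulated firm_id and year variables; the 2014-2023 range is an assumption taken from how year is generated here, not a general rule.

```stata
clear all
set seed 260210
set obs 1200
gen firm_id = ceil(_n/12)
gen year = 2014 + mod(_n,10)
gen wage = 18 + rnormal(0,2)

* Keys must be numeric: reshape expects a numeric j() unless the string option is given
confirm numeric variable firm_id year

* Keys should be whole numbers inside the expected range (range is an assumption)
assert firm_id == floor(firm_id) & firm_id > 0
assert year == floor(year) & inrange(year, 2014, 2023)

* Record how many distinct firms and years the reshape should produce
quietly tabulate firm_id
display "firms: " r(r)
quietly tabulate year
display "years: " r(r)
```

Knowing the number of distinct firms and years up front gives you an expected row count for the wide and long forms before reshape runs.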
Execute reshape with full diagnostics
Run reshape as its own block and inspect output before proceeding. This preserves a clean debug boundary and supports peer review.
The command example below is complete and runnable; it is designed to mirror real panel workflows rather than toy x/y placeholders.
    clear all
    version 18   // lower this to match your installation if you run Stata 15-17
    set seed 260210
    set obs 1200
    gen firm_id = ceil(_n/12)
    gen year = 2014 + mod(_n,10)
    gen worker_id = _n
    gen education = 10 + floor(runiform()*8)
    gen wage = 18 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)

    * Preflight checks
    assert !missing(firm_id, year)
    assert !missing(wage, education)
    count

    * ---- Section-specific continuation ----
    * Core execution block for reshape
    * Collapse worker rows to one row per firm-year so i() and j() identify the data
    collapse (mean) wage education, by(firm_id year)
    reshape wide wage education, i(firm_id) j(year)
    reshape long wage education, i(firm_id) j(year)
    isid firm_id year

    * Immediate output audit
    isid firm_id year

isid prints nothing when firm_id and year uniquely identify the observations. If they did not, it would stop the do-file with an error.
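One way to audit the wide-to-long round trip more strictly is to compare the result against a snapshot taken before reshaping. The sketch below assumes the same simulated data; the tempfile name before and the local n_long are illustrative names, not part of the original workflow.

```stata
clear all
set seed 260210
set obs 1200
gen firm_id = ceil(_n/12)
gen year = 2014 + mod(_n,10)
gen education = 10 + floor(runiform()*8)
gen wage = 18 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)
collapse (mean) wage education, by(firm_id year)

* Snapshot the long panel before reshaping
sort firm_id year
tempfile before
save "`before'"
local n_long = _N

* Wide: expect one row per firm
reshape wide wage education, i(firm_id) j(year)
quietly count
display "wide rows: " r(N)

* Back to long: expect the original row count
reshape long wage education, i(firm_id) j(year)
assert _N == `n_long'

* cf exits with an error if any value differs from the snapshot
sort firm_id year
cf _all using "`before'"
```

Because cf compares observation by observation, both datasets are sorted on the same keys before the comparison.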
Harden for production: assertions, logs, and reusable checks
After command execution, enforce key uniqueness, row counts, and type stability so downstream inference and exports remain stable across reruns.
This final block makes the workflow team-ready: logs are captured, failures are explicit, and diagnostics are repeatable.
    clear all
    version 18   // lower this to match your installation if you run Stata 15-17
    set seed 260210
    set obs 1200
    gen firm_id = ceil(_n/12)
    gen year = 2014 + mod(_n,10)
    gen worker_id = _n
    gen education = 10 + floor(runiform()*8)
    gen wage = 18 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)

    * Preflight checks
    assert !missing(firm_id, year)
    assert !missing(wage, education)
    count

    * ---- Section-specific continuation ----
    * Production hardening block
    capture log close
    log using "reshape-stata-wide-long-qa.log", text replace

    collapse (mean) wage education, by(firm_id year)
    reshape wide wage education, i(firm_id) j(year)
    reshape long wage education, i(firm_id) j(year)
    isid firm_id year

    assert !missing(firm_id, year)
    count
    isid firm_id year
    log close

As before, isid is silent on success, so the log shows the command with no message when firm_id and year uniquely identify the observations.
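The key-uniqueness, row-count, and type checks can also be wrapped in a small reusable program. This is only a sketch: qa_panel is a hypothetical helper name (not a built-in Stata command), and its keys() and expected() options are assumptions made for illustration.

```stata
clear all

* Hypothetical helper: qa_panel is not a built-in command
program define qa_panel
    // varlist(numeric) rejects string variables, which covers the type check
    syntax varlist(numeric), Keys(varlist) [EXPected(integer -1)]
    // keys must uniquely identify the observations
    isid `keys'
    // row count is checked only when expected() is supplied
    if `expected' >= 0 {
        assert _N == `expected'
    }
    // no missing values in keys or core variables
    foreach v of varlist `keys' `varlist' {
        assert !missing(`v')
    }
end

set seed 260210
set obs 1200
gen firm_id = ceil(_n/12)
gen year = 2014 + mod(_n,10)
gen education = 10 + floor(runiform()*8)
gen wage = 18 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)
collapse (mean) wage education, by(firm_id year)
local n_long = _N

reshape wide wage education, i(firm_id) j(year)
reshape long wage education, i(firm_id) j(year)

* Same checks, one line, reusable after any later transformation
qa_panel wage education, keys(firm_id year) expected(`n_long')
```

Calling the helper after each transformation keeps the checks identical across reruns, and any failed assertion stops the do-file at the offending step.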
Common Errors and Fixes
"variables firm_id year do not uniquely identify observations in the using data"
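Cause: in an m:1 merge the using data must have exactly one row per firm_id-year key, and here it contains duplicates.
Fix: run duplicates report on the key variables, then either aggregate the using data to one row per key or redefine the key.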
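Before aggregating, it can help to look at the offending rows rather than only counting them. The sketch below uses a tiny hypothetical using-side dataset typed in with input; the values and the variable name dup_count are made up for illustration.

```stata
* Hypothetical using-side data with a duplicated firm-year key
clear
input firm_id year education
1 2014 12
1 2014 14
1 2015 13
2 2014 11
end

* Flag every observation whose key appears more than once
duplicates tag firm_id year, generate(dup_count)
list firm_id year education if dup_count > 0, sepby(firm_id year)

* One possible resolution: average within key, then confirm uniqueness
collapse (mean) education, by(firm_id year)
isid firm_id year
```

Averaging is only one option. If the duplicates reflect a data-entry problem or a missing key component, redefining the key is the safer route.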
Error output:

    variables firm_id year do not uniquely identify observations in the using data
    r(459);

Failing command:

    merge m:1 firm_id year using using_data.dta

Fixed version:

    use using_data.dta, clear
    collapse (mean) education, by(firm_id year)
    save using_data_unique.dta, replace
    use master_data.dta, clear
    merge m:1 firm_id year using using_data_unique.dta

Diagnostic:

    use using_data.dta, clear
    duplicates report firm_id year
    collapse (mean) education, by(firm_id year)
    isid firm_id year

Output of duplicates report:

    Duplicates in terms of firm_id year

    --------------------------------------
       Copies | Observations       Surplus
    ----------+---------------------------
            2 |          120            60
    --------------------------------------

Command Reference
reshape
Stata docs: primary command reference for reshape workflows in Stata.
- Preflight checks: Validate keys, types, and missingness before execution
- Execution block: Run the command in an isolated, reviewable section
- Diagnostics: Inspect output immediately and compare against expectations
- QA footer: Keep assertions and logs for reproducible reruns
How Sytra Handles This
Sytra can execute reshape as a staged workflow: preflight validation, runnable Stata code generation, and QA assertions before final output.
A direct natural-language prompt for this exact workflow:
Execute reshape stata for a firm_id-year wage dataset. Use variables wage, education, firm_id, and year. Include preflight checks, runnable Stata code, output diagnostics, and post-command assertions with a log file.
Sytra catches these errors before you run.
FAQ
What is the safest order for reshape in a production do-file?
Use a three-step order: preflight checks, reshape execution, and post-command assertions. This sequence catches failures before models or exports depend on the result.
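To illustrate why the preflight step comes first, the sketch below shows reshape wide refusing to run when firm_id and year do not identify the rows, and succeeding once the data are collapsed to one row per key. The five-row dataset is hypothetical and typed in only for illustration.

```stata
* Hypothetical worker-level data: firm 1 has two rows for 2014
clear
input firm_id year wage
1 2014 20
1 2014 22
1 2015 21
2 2014 19
2 2015 23
end

* Preflight: capture keeps the do-file alive so the return code can be inspected
capture isid firm_id year
display "isid return code: " _rc

* Without a fix, reshape wide stops on the duplicated i-j pair
capture noisily reshape wide wage, i(firm_id) j(year)
assert _rc != 0

* Resolve to one row per firm-year, re-check, then reshape
collapse (mean) wage, by(firm_id year)
isid firm_id year
reshape wide wage, i(firm_id) j(year)

* Post-command assertion: one row per firm
assert _N == 2
```

When a reshape fails this way interactively, typing reshape error lists the problem observations.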
How do I verify that reshape did not damage my sample?
Track count before and after each transformation, then validate key uniqueness and missingness changes on core variables. Keep those checks in the script, not in ad hoc console runs.
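A minimal sketch of that before-and-after tracking, assuming the simulated firm-year data used in this guide. The 5% of wages set to missing is introduced purely so the missingness check has something to track, and the local names are illustrative.

```stata
clear all
set seed 260210
set obs 1200
gen firm_id = ceil(_n/12)
gen year = 2014 + mod(_n,10)
gen wage = 18 + 0.2*(year-2014) + rnormal(0,2)

* Illustrative only: knock out some wages
replace wage = . if runiform() < 0.05

collapse (mean) wage, by(firm_id year)

* Before: rows and missing wages in the long panel
local n_before = _N
quietly count if missing(wage)
local miss_before = r(N)

reshape wide wage, i(firm_id) j(year)
reshape long wage, i(firm_id) j(year)

* After: the same two quantities, compared explicitly
quietly count if missing(wage)
local miss_after = r(N)
display "rows: `n_before' -> " _N
display "missing wage: `miss_before' -> `miss_after'"
assert _N == `n_before'
assert `miss_after' == `miss_before'
isid firm_id year
```

In an unbalanced panel, reshaping wide and then long creates rows for firm-years that did not exist in the original data, so the row-count assertion would fail. That failure is the intended signal that the sample changed.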
Which Stata versions are compatible with this workflow?
All examples are tested in Stata 18 SE and are compatible with Stata 15+, with installation checks included when community packages are used.
Related Guides
- merge in Stata: 1:1, m:1, 1:m with Match Audits
- import excel in Stata: Clean Types, Headers, Ranges, and Dates
- append in Stata: Stack Datasets Safely with Variable Alignment Checks
- collapse in Stata: Group Summaries Without Losing Design Integrity
- egen in Stata: Group IDs, Totals, Ranks, and Practical Cookbook Patterns
We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.