How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
The definitive guide to merging in Stata. Covers every merge type, _merge diagnostics, keepusing, common errors, and when to use joinby instead.
You have been staring at a merge error for 20 minutes, and every rerun gives a different number of matched rows.
By the end, you will know exactly which merge type to use, how to validate it, and how to recover from bad merges safely.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Confirm the merge key with `isid` on the side that must be unique.
- Run the right merge type (`1:1`, `m:1`, or `1:m`) and keep `_merge`.
- Audit unmatched observations before dropping anything.
- Save a validated merged dataset only after diagnostics pass.
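The four steps above can be sketched on a toy example. The dataset, variable names, and tempfile names below are hypothetical and exist only to illustrate the order of operations; adapt the keys and assertions to your own data.

```stata
* Hypothetical toy data: 4 firms in master (3 workers each), 3 firms in using
clear
set obs 12
gen firm_id = ceil(_n/3)
gen worker_id = _n
tempfile workers firms merged
save `workers'

clear
set obs 3
gen firm_id = _n
gen firm_size = 100*_n
* Step 1: the using side must be unique on the key for m:1
isid firm_id
save `firms'

* Step 2: run the merge and keep _merge for diagnostics
use `workers', clear
merge m:1 firm_id using `firms'
tab _merge

* Step 3: inspect unmatched rows before deciding what to drop
list worker_id firm_id if _merge == 1, sepby(firm_id)
count if _merge == 1
* In this toy setup, firm 4 has no covariates, so 3 rows are master-only
assert r(N) == 3

* Step 4: save only after the diagnostics look as expected
assert _merge != 2
save `merged'
```

Writing the expected unmatched count as an `assert` makes the do-file stop if the data change later, instead of silently producing a different sample.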
Build a Merge Workflow That Survives Peer Review
Create stable keys and run the correct merge type
Most merge failures start with unstable keys. In firm-year data, key drift usually comes from string IDs, missing years, or duplicate observations that went unnoticed during cleaning.
Build numeric keys, check uniqueness, and only then run the merge. This order catches most merge bugs before they surface later in regressions.
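As a minimal sketch of the key-building step, assume the firm identifier arrives as a digit-only string with stray whitespace (the variable names `firm_str` and `firm_id` and the values are hypothetical). Note that converting to numeric drops leading zeros, which is only safe if both files are converted the same way.

```stata
* Hypothetical raw data with padded string IDs
clear
input str8 firm_str int year
" 0012"  2014
"0012 "  2015
"0345"   2014
end

* Trim stray whitespace, then convert digit-only strings to numbers
replace firm_str = strtrim(firm_str)
destring firm_str, generate(firm_id)

* A key is only usable if it is complete and unique
assert !missing(firm_id, year)
isid firm_id year
```

If the IDs contain letters, `destring` will not create `firm_id`, and a different strategy (for example, merging on the cleaned string itself) would be needed.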
If you are extending this pipeline, also review "reghdfe in Stata: High-Dimensional Fixed Effects Made Simple" and "How to Structure a Stata Project".
```stata
clear all
set seed 260211

* Master dataset: worker-year wages
set obs 1200
gen firm_id = ceil(_n/12)
gen year = 2014 + mod(_n, 10)
gen worker_id = _n
gen education = 10 + floor(runiform()*8)
gen wage = 14 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)
tempfile master using
save `master'

* Using dataset: firm-year covariates
clear
set obs 1000
gen firm_id = ceil(_n/10)
gen year = 2014 + mod(_n, 10)
gen industry = mod(firm_id, 6) + 1
gen firm_size = 40 + floor(runiform()*300)
isid firm_id year
save `using'

use `master', clear
merge m:1 firm_id year using `using'
tab _merge
```

Because the simulated using data covers all 100 firms in all 10 years, every master observation finds a match:

```
   Matching result from |
                  merge |      Freq.     Percent        Cum.
------------------------+-----------------------------------
            Matched (3) |      1,200      100.00      100.00
------------------------+-----------------------------------
                  Total |      1,200      100.00
```

In real data, expect `Master only (1)` and `Using only (2)` rows as well, and audit them before moving on.
Diagnose duplicates and key conflicts before estimation
If keys are duplicated on a side that must be unique, Stata stops with a hard error. This is useful: it protects your design from accidental many-to-many merges.
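When duplicates are real, aggregate to one row per key first or change the merge design, for example by using `joinby` when every pairwise combination within a key is genuinely wanted. Never force a merge and hope that model fixed effects will absorb the data problem.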
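One way to "aggregate first" is `collapse`. The sketch below uses hypothetical values and assumes that the mean is the right summary for conflicting `firm_size` records; that choice depends on why the duplicates exist.

```stata
* Hypothetical using data with two conflicting rows for firm 1 in 2014
clear
input firm_id year firm_size
1 2014 100
1 2014 120
1 2015 110
2 2014  80
end

* isid would stop the do-file here, so capture it and confirm the failure
capture isid firm_id year
assert _rc == 459

* Aggregate to one row per firm-year (the mean is an assumption)
collapse (mean) firm_size, by(firm_id year)
isid firm_id year
list, sepby(firm_id)
```

Unlike tagging and dropping, `collapse` keeps only the variables you name, so list every covariate you need in the call.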
```stata
clear all
set seed 260211

* Master dataset: worker-year wages
set obs 1200
gen firm_id = ceil(_n/12)
gen year = 2014 + mod(_n, 10)
gen worker_id = _n
gen education = 10 + floor(runiform()*8)
gen wage = 14 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)
tempfile master using
save `master'

* Using dataset: firm-year covariates
clear
set obs 1000
gen firm_id = ceil(_n/10)
gen year = 2014 + mod(_n, 10)
gen industry = mod(firm_id, 6) + 1
gen firm_size = 40 + floor(runiform()*300)
isid firm_id year
save `using'

use `master', clear
merge m:1 firm_id year using `using'
tab _merge

* ---- Section-specific continuation ----
use `using', clear

* Check uniqueness directly
isid firm_id year

* If the command above fails, diagnose duplicates
duplicates report firm_id year
duplicates tag firm_id year, gen(dup)
list firm_id year if dup > 0, sepby(firm_id year)

* Safe cleanup example: average firm_size within each key, keep one row
bysort firm_id year: egen avg_firm_size = mean(firm_size)
by firm_id year: keep if _n == 1
replace firm_size = avg_firm_size
drop avg_firm_size dup
isid firm_id year
```

The simulated using data is already unique on `firm_id year`, so `duplicates report` shows all 1,000 observations under one copy with no surplus. For comparison, this is what the report looks like for a using file in which 16 firm-years appear twice:

```
Duplicates in terms of firm_id year

--------------------------------------
   Copies | Observations       Surplus
----------+---------------------------
        1 |          968             0
        2 |           32            16
--------------------------------------
```

Common Errors and Fixes
"variables firm_id year do not uniquely identify observations in the using data"
This happens when your using dataset has duplicate key combinations while your command expects uniqueness on that side.
Run `duplicates report firm_id year` in the using dataset and decide whether to aggregate or redefine the merge key.
```
variables firm_id year do not uniquely identify observations in the using data
r(459);
```

The command that triggers the error:

```stata
use worker_panel.dta, clear
merge m:1 firm_id year using firm_covariates.dta
```

The fix, which averages `firm_size` within each duplicated firm-year, saves a clean copy, and reruns the merge:

```stata
use firm_covariates.dta, clear
bysort firm_id year: egen firm_size_avg = mean(firm_size)
by firm_id year: keep if _n==1
replace firm_size = firm_size_avg
drop firm_size_avg
save firm_covariates_clean.dta, replace

use worker_panel.dta, clear
merge m:1 firm_id year using firm_covariates_clean.dta
```

To verify the result:

```stata
use worker_panel.dta, clear
merge m:1 firm_id year using firm_covariates_clean.dta
tab _merge
assert _merge != 2
```

```
(0 real changes made)

. assert _merge != 2
```
Command Reference
merge
Combines master and using datasets by key variables while generating merge diagnostics via `_merge`.
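A short sketch of how a few commonly used `merge` options fit together in one call. The toy data and the variable `notes` are hypothetical; `keepusing()`, `assert()`, and `nogenerate` are standard `merge` options.

```stata
* Hypothetical firm-level lookup with one variable we do not need
clear
set obs 3
gen firm_id = _n
gen industry = mod(_n, 2) + 1
gen notes = "not needed"
tempfile firms
save `firms'

* Hypothetical worker-level master: two workers per firm
clear
set obs 6
gen firm_id = ceil(_n/2)
gen worker_id = _n

* keepusing(): bring over industry only
* assert(match): stop with an error unless every observation matches
* nogenerate: skip _merge, since assert() already enforces the check
merge m:1 firm_id using `firms', keepusing(industry) assert(match) nogenerate

describe, short
```

Combining `assert(match)` with `nogenerate` is reasonable only when a full match is what the design requires; otherwise keep `_merge` and inspect it.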
- `keepusing(varlist)`: Pull only required variables from using dataset
- `nogen`: Suppresses `_merge` generation when diagnostics are already captured
- `assert(match)`: Fails fast when unmatched observations appear
- `update replace`: Controlled overwrites when merging revised fields

How Sytra Handles This
Sytra can run key uniqueness checks, type checks, and merge diagnostics before writing the final merge command, reducing silent data loss.
A direct natural-language prompt for this exact workflow:
"Merge worker_panel.dta with firm_covariates.dta on firm_id and year using m:1. Before merge, verify key uniqueness in using, report duplicates, aggregate duplicates by mean firm_size, then rerun merge and output a table of _merge counts."

FAQ
What merge type should I use in Stata?
Use merge 1:1 when both datasets are unique on the key, merge m:1 when master has repeated keys and using is unique, and merge 1:m for the opposite case.
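The three cases can be sketched on toy files. All dataset and variable names here are hypothetical.

```stata
* Hypothetical firm-level file: one row per firm
clear
set obs 3
gen firm_id = _n
gen industry = _n
tempfile firms workers ceo
save `firms'

* Hypothetical worker-level file: two rows per firm
clear
set obs 6
gen firm_id = ceil(_n/2)
gen worker_id = _n
save `workers'

* Hypothetical CEO file: one row per firm
clear
set obs 3
gen firm_id = _n
gen ceo_age = 50 + _n
save `ceo'

* 1:1, both sides unique on firm_id
use `firms', clear
merge 1:1 firm_id using `ceo'
assert _merge == 3
drop _merge

* 1:m, master (firms) unique, using (workers) repeats firm_id
merge 1:m firm_id using `workers'
assert _merge == 3
assert _N == 6
drop _merge

* m:1, the mirror image: workers in memory, firms on disk
use `workers', clear
merge m:1 firm_id using `firms'
assert _merge == 3
assert _N == 6
```

The choice between `1:m` and `m:1` depends only on which file is in memory; the matched rows are the same either way.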
Do I need to sort before merge 1:1?
No. Modern merge syntax in Stata 11+ does not require manual sorting, but key uniqueness checks are still required for valid results.
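A minimal check of this, using hypothetical toy data in which the master is deliberately in descending key order:

```stata
* Hypothetical using file
clear
set obs 5
gen id = _n
gen x = 10*_n
tempfile right
save `right'

* Master in descending id order on purpose, with no sort before the merge
clear
set obs 5
gen id = 6 - _n
merge 1:1 id using `right'

* Every row matched, and each id received its own x
assert _merge == 3
assert x == 10*id
```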
How do I verify merge quality quickly?
Always tabulate _merge, inspect unmatched rows, and run isid on keys before merging so you catch duplicates before they damage estimates.
Related Guides
- Stata Type Mismatch Error in Merge: String vs Numeric Key Variables
- Stata 'not sorted' Error in Merge: The Fix That Takes 5 Seconds
- Linked Datasets in Stata: frlink/frget Workflows Instead of Repeated Merges
- Finding and Removing Duplicates in Stata: duplicates tag, report, drop
We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.