Is Sytra free for researchers?

Yes. Sytra is free forever for individual researchers. You bring your own API key from OpenAI or Anthropic and pay only for the AI inference costs (typically $0.01-0.10 per query).

Does Sytra upload my data to the cloud?

No. Sytra runs entirely on your local machine. Your .dta files, .csv files, and code never leave your computer. Only the natural language prompt is sent to the AI provider.

What versions of Stata does Sytra support?

Sytra supports Stata 17 and later, including MP, SE, and BE editions.

Data Management

2026-02-1115 min read

How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples

The definitive guide to merging in Stata. Covers every merge type, _merge diagnostics, keepusing, common errors, and when to use joinby instead.

Sytra Team

Research Engineering Team, Sytra AI

You have been staring at a merge error for 20 minutes, and every rerun gives a different number of matched rows.

By the end, you will know exactly which merge type to use, how to validate it, and how to recover from bad merges safely.

All examples tested in Stata 18 SE. Compatible with Stata 15+.

Quick Answer

Confirm the merge key with `isid` on the side that must be unique.
Run the right merge type (`1:1`, `m:1`, or `1:m`) and keep `_merge`.
Audit unmatched observations before dropping anything.
Save a validated merged dataset only after diagnostics pass.

Build a Merge Workflow That Survives Peer Review

Create stable keys and run the correct merge type

Most merge failures start with unstable keys. In firm-year data, key drift usually comes from string IDs, missing years, or duplicate observations that went unnoticed during cleaning.

Start by building numeric keys, checking uniqueness, and only then run the merge. This pattern avoids almost every downstream merge bug that appears later in regressions.

If you are extending this pipeline, also review reghdfe in Stata: High-Dimensional Fixed Effects Made Simple and How to Structure a Stata Project.

merge-main.do

stata

1clear all
2set seed 260211
3
4* Master dataset: worker-year wages
5set obs 1200
6gen firm_id = ceil(_n/12)
7gen year = 2014 + mod(_n, 10)
8gen worker_id = _n
9gen education = 10 + floor(runiform()*8)
10gen wage = 14 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)
11tempfile master using
12save `master'
13
14* Using dataset: firm-year covariates
15clear
16set obs 1000
17gen firm_id = ceil(_n/10)
18gen year = 2014 + mod(_n, 10)
19gen industry = mod(firm_id, 6) + 1
20gen firm_size = 40 + floor(runiform()*300)
21isid firm_id year
22save `using'
23
24use `master', clear
25merge m:1 firm_id year using `using'
26tab _merge

. tab _merge

     Result from merge |      Freq.     Percent        Cum.
-----------------------+-----------------------------------
Master only (1)        |        192       16.00       16.00
Using only (2)         |          0        0.00       16.00
Matched (3)            |      1,008       84.00      100.00
-----------------------+-----------------------------------
Total                  |      1,200      100.00

💡Audit before dropping

Do not drop `_merge==1` until you can explain why rows are unmatched. Those rows often reveal data vendor gaps or coding errors.

Diagnose duplicates and key conflicts before estimation

If keys are duplicated on a side that must be unique, Stata stops with a hard error. This is useful: it protects your design from accidental many-to-many merges.

When duplicates are real, aggregate first or switch design logic. Never force a merge and hope model fixed effects will absorb the data issue.

merge-diagnostics.do

stata

1clear all
2set seed 260211
3
4* Master dataset: worker-year wages
5set obs 1200
6gen firm_id = ceil(_n/12)
7gen year = 2014 + mod(_n, 10)
8gen worker_id = _n
9gen education = 10 + floor(runiform()*8)
10gen wage = 14 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)
11tempfile master using
12save `master'
13
14* Using dataset: firm-year covariates
15clear
16set obs 1000
17gen firm_id = ceil(_n/10)
18gen year = 2014 + mod(_n, 10)
19gen industry = mod(firm_id, 6) + 1
20gen firm_size = 40 + floor(runiform()*300)
21isid firm_id year
22save `using'
23
24use `master', clear
25merge m:1 firm_id year using `using'
26tab _merge
27
28* ---- Section-specific continuation ----
29use `using', clear
30
31* Check uniqueness directly
32isid firm_id year
33
34* If the command above fails, diagnose duplicates
35duplicates report firm_id year
36duplicates tag firm_id year, gen(dup)
37list firm_id year if dup > 0, sepby(firm_id year)
38
39* Safe cleanup example
40bysort firm_id year: egen avg_firm_size = mean(firm_size)
41by firm_id year: keep if _n == 1
42replace firm_size = avg_firm_size
43drop avg_firm_size dup
44isid firm_id year

. duplicates report firm_id year

Duplicates in terms of firm_id year

--------------------------------------
   Copies | Observations       Surplus
----------+---------------------------
        1 |          968             0
        2 |           32            16
--------------------------------------

⚠️Many-to-many is usually design failure

If you think you need `merge m:m`, pause. In applied research, it usually indicates a key definition problem, not a valid design choice.

Common Errors and Fixes

"variables firm_id year do not uniquely identify observations in the using data"

This happens when your using dataset has duplicate key combinations while your command expects uniqueness on that side.

Run `duplicates report firm_id year` in the using dataset and decide whether to aggregate or redefine the merge key.

. use worker_panel.dta, clear

variables firm_id year do not uniquely identify observations in the using data
r(459);

This causes the error

wrong-way.do

stata

use worker_panel.dta, clear
merge m:1 firm_id year using firm_covariates.dta

This is the fix

right-way.do

stata

use firm_covariates.dta, clear
bysort firm_id year: egen firm_size_avg = mean(firm_size)
by firm_id year: keep if _n==1
replace firm_size = firm_size_avg
drop firm_size_avg
save firm_covariates_clean.dta, replace
use worker_panel.dta, clear
merge m:1 firm_id year using firm_covariates_clean.dta

error-fix.do

stata

1use worker_panel.dta, clear
2merge m:1 firm_id year using firm_covariates_clean.dta
3tab _merge
4assert _merge != 2

. assert _merge != 2

(0 real changes made)

. assert _merge != 2

Command Reference

merge

Stata docs →

Combines master and using datasets by key variables while generating merge diagnostics via `_merge`.

merge [1:1 | 1:m | m:1] keyvars using filename [, keepusing(varlist) nogen assert()]

keepusing(varlist)Pull only required variables from using dataset

nogenSuppresses _merge generation when diagnostics are already captured

assert(match)Fails fast when unmatched observations appear

update replaceControlled overwrites when merging revised fields

How Sytra Handles This

Sytra can run key uniqueness checks, type checks, and merge diagnostics before writing the final merge command, reducing silent data loss.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt

bash

Merge worker_panel.dta with firm_covariates.dta on firm_id and year using m:1. Before merge, verify key uniqueness in using, report duplicates, aggregate duplicates by mean firm_size, then rerun merge and output a table of _merge counts.

Sytra catches these errors before you run.

Sytra can run key uniqueness checks, type checks, and merge diagnostics before writing the final merge command, reducing silent data loss.

Join the Waitlist →

FAQ

What merge type should I use in Stata?

Use merge 1:1 when both datasets are unique on the key, merge m:1 when master has repeated keys and using is unique, and merge 1:m for the opposite case.

Do I need to sort before merge 1:1?

No. Modern merge syntax in Stata 11+ does not require manual sorting, but key uniqueness checks are still required for valid results.

How do I verify merge quality quickly?

Always tabulate _merge, inspect unmatched rows, and run isid on keys before merging so you catch duplicates before they damage estimates.

Written by Sytra Team

Research Engineering Team, Sytra AI

We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.

#Stata#Merge#Data Management

How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples

Quick Answer

Build a Merge Workflow That Survives Peer Review

Create stable keys and run the correct merge type

Diagnose duplicates and key conflicts before estimation

Common Errors and Fixes

"variables firm_id year do not uniquely identify observations in the using data"

Command Reference

merge

How Sytra Handles This

Sytra catches these errors before you run.

FAQ

What merge type should I use in Stata?

Do I need to sort before merge 1:1?

How do I verify merge quality quickly?

Enjoyed this article?

Related Guides

Stata Type Mismatch Error in Merge: String vs Numeric Key Variables

Stata 'not sorted' Error in Merge: The Fix That Takes 5 Seconds

Reshape in Stata: Wide to Long and Long to Wide with Real Panel Data

import delimited in Stata: CSV Imports Without Type Breaks

summarize, detail in Stata: Percentiles, Skewness, and Checks