Linked Datasets in Stata: frlink/frget Workflows Instead of Repeated Merges
Use Stata frames to connect datasets with frlink and frget, reduce merge mistakes, and keep clean, reproducible data pipelines.
You keep merging the same lookup table over and over, and each pass risks duplicate-key mistakes or overwritten variables.
You will replace repeated merge cycles with a frame-link pattern that is safer and easier to audit.
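All examples tested in Stata 18 SE. Frames, and therefore `frlink` and `frget`, require Stata 16 or later. The examples set `version 18`, so lower that line to run them on Stata 16 or 17.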
Quick Answer
- Keep base and lookup datasets in separate frames.
- Use `frlink` to define key relationships explicitly.
- Use `frget` to pull only required fields.
- Refresh linked fields after source-frame updates.
Use Frames as a Safer Alternative to Repeated Merge Cycles
Create linked worker and firm-year frames
Frames let you hold multiple datasets at once, which reduces brittle file save/load cycles and helps isolate transformations.
Define relationship cardinality with frlink, then pull only the columns needed for the current analysis block.
If you are extending this pipeline, also review Stata Weights Explained and Importing Data into Stata.
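Before linking, it can help to confirm that the key really identifies one row per firm-year in the lookup frame, and afterwards to count rows that found no match. The following is a minimal sketch on toy data; the values and the unmatched firm 3 are made up for illustration and are not part of the simulation used in this guide.

```stata
* Toy data for illustration only
clear all

frame create firms
frame change firms
input firm_id year firm_size
1 2014 120
1 2015 130
2 2014 300
end

* Stop early if firm_id-year does not uniquely identify lookup rows
capture isid firm_id year
if _rc {
    display as error "firm_id year is not a unique key in frame firms"
    duplicates report firm_id year
    exit 459
}

frame create workers
frame change workers
input worker_id firm_id year
1 1 2014
2 1 2015
3 2 2014
4 3 2014
end

* Many worker rows point to one firm-year row
frlink m:1 firm_id year, frame(firms)

* The link variable (named firms by default) is missing for unmatched rows
count if missing(firms)
list worker_id firm_id year if missing(firms)

* Summarize what the link connects
frlink describe firms
```

In this toy setup the only unmatched row is the worker assigned to firm 3, which has no entry in the lookup frame.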
```stata
clear all
version 18
set seed 2602104

* Worker-level frame: 1,600 workers spread over 100 firms and 10 years
frame create workers
frame change workers
set obs 1600
gen worker_id = _n
gen firm_id = ceil(_n/16)
gen year = 2014 + mod(_n,10)
gen education = 10 + floor(runiform()*8)
gen wage = 16 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)

* Firm-year lookup frame: one row per firm_id-year
frame create firms
frame change firms
set obs 1000
gen firm_id = ceil(_n/10)
gen year = 2014 + mod(_n,10)
gen industry = mod(firm_id,5) + 1
gen firm_size = 60 + floor(runiform()*500)
isid firm_id year

* Link workers to firm-years, then copy only the needed columns
frame change workers
frlink m:1 firm_id year, frame(firms)
frget industry firm_size, from(firms)
summ wage firm_size
```

```
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,600    22.11548    3.198274   11.97053   33.43844
   firm_size |      1,600    309.2969    146.3947         60        559
```

Refresh linked variables after source-frame updates
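`frget` copies values into the current frame at the moment it runs; later edits in the source frame do not propagate on their own. After changing source values, drop the copied variables and run `frget` again so your analysis frame reflects the new assumptions.
This keeps updates explicit and avoids stale merged columns that are hard to trace later.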
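A related case is when the source frame is re-sorted or loses rows rather than just having values edited in place. The link variable stores positions in the linked frame, so it may need to be rebuilt before pulling again. A minimal sketch on toy data (the values are illustrative assumptions):

```stata
* Toy data for illustration only
clear all

frame create firms
frame change firms
input firm_id year firm_size
2 2014 300
1 2015 130
1 2014 120
end

frame create workers
frame change workers
input worker_id firm_id year
1 1 2014
2 1 2015
3 2 2014
end

frlink m:1 firm_id year, frame(firms)
frget firm_size, from(firms)

* Reorder the lookup frame; row positions in firms change
frame firms: sort firm_id year

* Rebuild the link against the reordered frame, then pull again
frlink rebuild firms
drop firm_size
frget firm_size, from(firms)
list worker_id firm_id year firm_size
```

Rebuilding before the second `frget` is a cautious default here: it costs little and guarantees the copied values come from the rows the key actually points to.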
```stata
clear all
version 18
set seed 2602104

frame create workers
frame change workers
set obs 1600
gen worker_id = _n
gen firm_id = ceil(_n/16)
gen year = 2014 + mod(_n,10)
gen education = 10 + floor(runiform()*8)
gen wage = 16 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)

frame create firms
frame change firms
set obs 1000
gen firm_id = ceil(_n/10)
gen year = 2014 + mod(_n,10)
gen industry = mod(firm_id,5) + 1
gen firm_size = 60 + floor(runiform()*500)
isid firm_id year

frame change workers
frlink m:1 firm_id year, frame(firms)
frget industry firm_size, from(firms)
summ wage firm_size

* ---- Section-specific continuation ----
* Update the source frame: scale firm size up 3% from 2020 onward
frame change firms
replace firm_size = firm_size*1.03 if year >= 2020

* Drop the stale copy and pull the updated values
frame change workers
drop firm_size
frget firm_size, from(firms)

collapse (mean) mean_wage=wage mean_edu=education mean_size=firm_size, by(year)
list year mean_wage mean_edu mean_size in 1/6
```

```
     +------------------------------------------+
     | year   mean_wage   mean_edu    mean_size |
     |------------------------------------------|
  1. | 2014   20.738514   13.54375    282.65152 |
  2. | 2015   21.011284   13.45000    287.10433 |
  3. | 2016   21.312661   13.53750    296.55761 |
  4. | 2017   21.679348   13.53125    302.02096 |
  5. | 2018   22.017271   13.48125    309.44518 |
  6. | 2019   22.335993   13.52500    315.99237 |
     +------------------------------------------+
```

Common Errors and Fixes
"variable firm_key not found"
frlink was called with a key variable name that does not exist in the active frame.
Confirm key names in both frames with `describe` and align names before frlink.
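```
variable firm_key not found
r(111);
```

Wrong:

```stata
frame change workers
frlink m:1 firm_key year, frame(firms)
```

Right:

```stata
frame change workers
frlink m:1 firm_id year, frame(firms)
```

Full fix, checking the key variables first:

```stata
frame change workers
describe firm_id year
frlink m:1 firm_id year, frame(firms)
frget industry firm_size, from(firms)
```

```
. frlink m:1 firm_id year, frame(firms)
  (all observations in frame workers matched)
```

The link is stored in a new variable named after the linked frame (`firms` here) unless you set another name with `generate()`.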
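If the key really is named differently in the two frames, renaming is not the only option: `frlink` accepts a second variable list inside `frame()` that names the matching variables in the linked frame. A minimal sketch, assuming a lookup frame whose firm identifier is called `firm_key` (toy data, for illustration only):

```stata
* Toy data for illustration only
clear all

frame create firms
frame change firms
input firm_key year firm_size
1 2014 120
1 2015 130
2 2014 300
end

frame create workers
frame change workers
input worker_id firm_id year
1 1 2014
2 1 2015
3 2 2014
end

* Match workers' (firm_id, year) to firms' (firm_key, year)
frlink m:1 firm_id year, frame(firms firm_key year)
frget firm_size, from(firms)
list worker_id firm_id year firm_size
```

Renaming the variable in one frame so both sides share a name works too and keeps later commands easier to read; the second variable list is useful when the source frame should stay untouched.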
Command Reference
frlink / frget
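Connects frames through key relationships and retrieves selected columns without full merge rewrites.
- `1:1` / `m:1`: Declares the relationship cardinality for the frame link. `frlink` accepts only these two; there is no `1:m` link, so create the link from the "many" side.
- `frame(name)`: Specifies the linked source frame (`frlink`).
- `from(linkvar)`: Names the link variable that `frget` pulls through. By default the link variable has the same name as the linked frame, so `from(firms)` works in the examples here.
- `generate(newvar)`: Stores the link in a custom variable name (`frlink`).
How Sytra Handles This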
Sytra can choose frame link cardinality automatically, generate frlink/frget pipelines, and annotate where key uniqueness must be enforced.
A direct natural-language prompt for this exact workflow:
Set up a workers frame and a firms frame, create an m:1 frlink on firm_id-year, pull industry and firm_size with frget, then refresh firm_size after a source-frame update and summarize by year.
FAQ
When should I use frames instead of merge?
Use frames when you need multiple datasets active in one session and want to pull columns on demand without rewriting your base dataset repeatedly.
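One way to read linked values on demand, without copying a column at all, is the `frval()` function, which looks up a variable through an existing link inside an expression. A minimal sketch on toy data; the 200-employee cutoff and the variable name `large_firm` are assumptions made for illustration:

```stata
* Toy data for illustration only
clear all

frame create firms
frame change firms
input firm_id firm_size
1 120
2 300
end

frame create workers
frame change workers
input worker_id firm_id wage
1 1 20
2 1 24
3 2 30
end

frlink m:1 firm_id, frame(firms)

* Flag workers at large firms without copying firm_size into this frame
generate byte large_firm = frval(firms, firm_size) >= 200 if !missing(firms)
list worker_id firm_id wage large_firm
```

The `if !missing(firms)` guard leaves the flag missing for any unmatched row instead of letting a missing firm size compare as large.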
Does frlink require unique keys?
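Yes, on the linked side. The match variables must uniquely identify observations in the linked frame for both m:1 and 1:1 links, and a 1:1 link also requires them to be unique in the current frame. In this guide, many worker rows link to one firm-year row, so `firm_id year` must be unique in the firms frame.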
Can I refresh linked fields after updating a frame?
Yes. `frget` copies values when it runs, so after updating the source frame, drop the previously copied variables and run `frget` again so downstream values reflect current data.
Related Guides
- How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
- Stata preserve/restore and tempvar: Safe Data Manipulation Patterns
- Stata ODBC Connection Guide: Query SQL Databases and Reproducible Extracts
- API Data in Stata: Import JSON/CSV Feeds and Build Analysis-Ready Panels