Workflow
2026-03-04 · 15 min read

Linked Datasets in Stata: frlink/frget Workflows Instead of Repeated Merges

Use Stata frames to connect datasets with frlink and frget, reduce merge mistakes, and keep clean, reproducible data pipelines.

Sytra Team
Research Engineering Team, Sytra AI

You keep merging the same lookup table over and over, and each pass risks duplicate-key mistakes or overwritten variables.

You will replace repeated merge cycles with a frame-link pattern that is safer and easier to audit.

All examples tested in Stata 18 SE. Frames, frlink, and frget require Stata 16 or later; on Stata 16 or 17, lower the `version 18` line accordingly.


Quick Answer

  1. Keep base and lookup datasets in separate frames.
  2. Use `frlink` to define key relationships explicitly.
  3. Use `frget` to pull only required fields.
  4. Refresh linked fields after source-frame updates.

Use Frames as a Safer Alternative to Repeated Merge Cycles

Create linked worker and firm-year frames

Frames let you hold multiple datasets at once, which reduces brittle file save/load cycles and helps isolate transformations.
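Here is a minimal sketch of working across frames, using a hypothetical `lookup` frame with made-up values. The `frame name:` prefix runs a single command in another frame without changing the active one, and `frame copy` gives you a scratch copy so an experimental transformation cannot alter the original. Nothing is saved to or loaded from disk.

```stata
clear all
version 16

* Hypothetical lookup frame, built entirely with the frame prefix
frame create lookup
frame lookup: set obs 5
frame lookup: gen id = _n
frame lookup: gen label_val = 10*_n

* The default frame stays active while we inspect the others
frame dir
frame pwf
frame lookup: summarize label_val

* Scratch copy: transform it without touching the original frame
frame copy lookup lookup_scratch
frame lookup_scratch: replace label_val = label_val + 1

* The original frame is unchanged
frame lookup: list in 1/2
```

Because both frames live in memory, comparing the original and the transformed version needs no save/use round trip.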

Define relationship cardinality with frlink, then pull only the columns needed for the current analysis block.
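A sketch of the checks around frlink, with hypothetical data in which worker 6 points to firm 9, which has no row in the lookup frame. `isid` confirms that the target-frame key is unique before linking. After linking, the link variable (named `firms` here, because frlink defaults to the target frame's name) is missing for unmatched rows, so a simple `count` flags them.

```stata
clear all
version 16

* Hypothetical lookup frame: one row per firm
frame create firms
frame change firms
set obs 4
gen firm_id = _n
gen industry = mod(_n, 2) + 1
isid firm_id                      // m:1 needs unique keys in the target frame

* Base frame: workers 1-5 belong to firms 1-3, worker 6 to firm 9 (no lookup row)
frame change default
set obs 6
gen worker_id = _n
gen firm_id = cond(_n <= 5, ceil(_n/2), 9)

frlink m:1 firm_id, frame(firms)

* Unmatched rows have a missing link value
count if missing(firms)
list worker_id firm_id firms if missing(firms)
```

Whether unmatched rows signal a data error or are expected depends on your sample; the point is to make the count visible before pulling any fields.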

If you are extending this pipeline, also review Stata Weights Explained and Importing Data into Stata.

frames-link-core.do
```stata
clear all
version 18
set seed 260210

* Base frame: one row per worker observation
frame create workers
frame change workers
set obs 1600
gen worker_id = _n
gen firm_id = ceil(_n/16)
gen year = 2014 + mod(_n,10)
gen education = 10 + floor(runiform()*8)
gen wage = 16 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)

* Lookup frame: one row per firm-year (100 firms x 10 years)
frame create firms
frame change firms
set obs 1000
gen firm_id = ceil(_n/10)
gen year = 2014 + mod(_n,10)
gen industry = mod(firm_id,5) + 1
gen firm_size = 60 + floor(runiform()*500)
isid firm_id year    // an m:1 link requires unique keys in this frame

* Link each worker row to its firm-year row, then copy two attributes
frame change workers
frlink m:1 firm_id year, frame(firms)
frget industry firm_size, from(firms)
summ wage firm_size
```
```text
. summ wage firm_size
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |      1,600    22.11548    3.198274   11.97053   33.43844
   firm_size |      1,600    309.2969    146.3947         60        559
```
💡 Link once, pull many times
A stable frlink setup lets you frget additional fields later without rerunning full merges.
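A sketch of pulling more fields later through an existing link, with a hypothetical `firms` frame whose `region` and `founded` variables exist only for illustration. `frget newvar = varname` copies a single variable under a new name, and the `prefix()` option marks pulled variables so their origin stays visible.

```stata
clear all
version 16

* Hypothetical lookup frame with several attributes
frame create firms
frame change firms
set obs 3
gen firm_id = _n
gen industry = _n + 10
gen region = _n + 20
gen founded = 1990 + _n

* Base frame: two rows per firm, linked once
frame change default
set obs 6
gen firm_id = ceil(_n/2)
frlink m:1 firm_id, frame(firms)

* First analysis block needs only industry
frget industry, from(firms)

* Later blocks reuse the same link; no new merge
frget founded_year = founded, from(firms)   // copy under a new name
frget region, from(firms) prefix(f_)        // creates f_region
describe
```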

Refresh linked variables after source-frame updates

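frget copies values into the active frame at the moment it runs; those copies do not change when the source frame changes later. If source values change, pull the affected variables again explicitly so your analysis frame reflects the new assumptions.

This makes updates transparent and avoids stale merged columns that are hard to trace later.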

frames-refresh.do
```stata
clear all
version 18
set seed 260210

* Same setup as frames-link-core.do
frame create workers
frame change workers
set obs 1600
gen worker_id = _n
gen firm_id = ceil(_n/16)
gen year = 2014 + mod(_n,10)
gen education = 10 + floor(runiform()*8)
gen wage = 16 + 0.8*education + 0.2*(year-2014) + rnormal(0,2)

frame create firms
frame change firms
set obs 1000
gen firm_id = ceil(_n/10)
gen year = 2014 + mod(_n,10)
gen industry = mod(firm_id,5) + 1
gen firm_size = 60 + floor(runiform()*500)
isid firm_id year

frame change workers
frlink m:1 firm_id year, frame(firms)
frget industry firm_size, from(firms)
summ wage firm_size

* ---- Section-specific continuation ----
* Update the source frame: firms are 3% larger from 2020 onward
frame change firms
replace firm_size = firm_size*1.03 if year >= 2020

* The copy of firm_size in workers is now stale: drop it and pull again
frame change workers
drop firm_size
frget firm_size, from(firms)

collapse (mean) mean_wage=wage mean_edu=education mean_size=firm_size, by(year)
list year mean_wage mean_edu mean_size in 1/6
```
```text
. list year mean_wage mean_edu mean_size in 1/6
     +-----------------------------------------+
     | year   mean_wage   mean_edu   mean_size |
     |-----------------------------------------|
  1. | 2014   20.738514   13.54375   282.65152 |
  2. | 2015   21.011284   13.45000   287.10433 |
  3. | 2016   21.312661   13.53750   296.55761 |
  4. | 2017   21.679348   13.53125   302.02096 |
  5. | 2018   22.017271   13.48125   309.44518 |
  6. | 2019   22.335993   13.52500   315.99237 |
     +-----------------------------------------+
```
โš ๏ธDropped variable is intentional
Drop and re-pull fields when source-frame values change so old columns do not persist silently.
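If you would rather refresh a copied field in place than drop and re-pull it, the `frval()` function is another option: `frval(linkvar, varname)` returns the linked frame's value for each row, so `replace` can overwrite the stale copy. Like frget, it reads values when the command runs, not continuously. A sketch with hypothetical data:

```stata
clear all
version 16

* Hypothetical source frame
frame create firms
frame change firms
set obs 2
gen firm_id = _n
gen firm_size = 100*_n

* Base frame linked m:1 to firms
frame change default
set obs 4
gen firm_id = ceil(_n/2)
frlink m:1 firm_id, frame(firms)

* frval(linkvar, varname) returns the linked frame's current value
gen firm_size = frval(firms, firm_size)
gen size_before = firm_size

* Update the source frame, then refresh the copy in place
frame firms: replace firm_size = firm_size*1.03
replace firm_size = frval(firms, firm_size)
list firm_id size_before firm_size
```

The drop-and-frget pattern in the article and this `replace` pattern end in the same state; `replace` just leaves the variable's position and label untouched.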

Common Errors and Fixes

"variable firm_key not found"

frlink was called with a key variable name that does not exist in the active frame.

Confirm key names in both frames with `describe` (use `frame firms: describe` for the frame that is not active) and align the names before running frlink.
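When the two frames use different names for the same key, you can also map them directly instead of renaming: frlink accepts the target frame's key variables inside `frame()`, matched in order. A sketch with hypothetical data where the base frame stores `firm_key` and the lookup frame stores `firm_id`:

```stata
clear all
version 16

* Hypothetical lookup frame: key is called firm_id here
frame create firms
frame change firms
set obs 2
gen firm_id = _n
gen year = 2020
gen industry = _n + 10

* Base frame: the same key is called firm_key
frame change default
set obs 4
gen firm_key = ceil(_n/2)
gen year = 2020

* Confirm key names in both frames
describe firm_key year
frame firms: describe firm_id year

* Map firm_key -> firm_id and year -> year, in that order
frlink m:1 firm_key year, frame(firms firm_id year)
frget industry, from(firms)
```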

```text
. frlink m:1 firm_key year, frame(firms)
variable firm_key not found
r(111);
```
This causes the error
wrong-way.do
```stata
frame change workers
frlink m:1 firm_key year, frame(firms)
```
This is the fix
right-way.do
```stata
frame change workers
frlink m:1 firm_id year, frame(firms)
```
error-fix.do
```stata
frame change workers
* Confirm the key names in both frames before linking
describe firm_id year
frame firms: describe firm_id year
frlink m:1 firm_id year, frame(firms)
frget industry firm_size, from(firms)
```
```text
. frlink m:1 firm_id year, frame(firms)
  (all observations in frame workers matched)
```

Command Reference

frlink / frget

Stata docs →

Connects frames through key relationships and retrieves selected columns without full merge rewrites.

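frlink m:1 keyvars, frame(framename) ; frget varlist, from(linkvar)
`m:1` / `1:1`: Declares relationship cardinality for the frame link (frlink supports only these two)
`frame(name [varlist2])`: Specifies the linked source frame, optionally followed by its own key names
`from(linkvar)`: Names the link variable frget pulls through; by default it has the same name as the source frame
`generate(newvar)`: Stores the link in a custom variable name instead of the default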
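A sketch of the options and maintenance subcommands around one link, with hypothetical data: `generate()` names the link variable (`lnk_firm` is an arbitrary choice), `frlink dir` and `frlink describe` list and document existing links, and `frlink rebuild` recomputes a link after the source frame is re-sorted. The link variable stores observation numbers in the source frame, which is why re-sorting calls for a rebuild.

```stata
clear all
version 16

* Hypothetical lookup frame
frame create firms
frame change firms
set obs 3
gen firm_id = _n
gen industry = _n + 10

* Base frame: two rows per firm
frame change default
set obs 6
gen firm_id = ceil(_n/2)

* Custom link name; frget's from() then refers to lnk_firm
frlink m:1 firm_id, frame(firms) generate(lnk_firm)

* Inspect the links defined in the current frame
frlink dir
frlink describe lnk_firm

* Re-sorting the source frame moves rows, so recompute the link first
frame firms: gsort -firm_id
frlink rebuild lnk_firm
frget industry, from(lnk_firm)
list firm_id industry
```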

How Sytra Handles This

Sytra can choose frame link cardinality automatically, generate frlink/frget pipelines, and annotate where key uniqueness must be enforced.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt
```text
Set up a workers frame and a firms frame, create an m:1 frlink on firm_id-year, pull industry and firm_size with frget, then refresh firm_size after a source-frame update and summarize by year.
```

Sytra catches these errors before you run.


FAQ

When should I use frames instead of merge?

Use frames when you need multiple datasets active in one session and want to pull columns on demand without rewriting your base dataset repeatedly.

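Do linked frames need unique keys?

Yes, in the frame you link to. For `m:1`, the key variables must uniquely identify observations in the source frame, such as one row per firm-year when worker rows link to firm-year attributes; for `1:1`, they must be unique in both frames.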
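A sketch of what happens when that key structure does not hold, using a hypothetical lookup frame where firm 2 appears twice. Both `isid` and `frlink m:1` stop with an error; the return codes are captured and displayed rather than hard-coded here, since the point is only that the link is refused.

```stata
clear all
version 16

* Hypothetical lookup frame where firm 2 appears twice
frame create firms
frame change firms
set obs 3
gen firm_id = cond(_n == 3, 2, _n)
gen industry = _n + 10

frame change default
set obs 4
gen firm_id = ceil(_n/2)

* isid stops with an error when keys repeat
capture noisily frame firms: isid firm_id
local isid_rc = _rc
display "isid return code: `isid_rc'"

* An m:1 link is refused when the target-frame keys are not unique
capture noisily frlink m:1 firm_id, frame(firms)
local link_rc = _rc
display "frlink return code: `link_rc'"
```

Resolve the duplicates in the source frame (for example, by deciding which firm-year row is authoritative) before linking again.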

Can I refresh linked fields after updating a frame?

Yes. After updating the source frame, drop the previously copied variables and re-run frget so downstream values reflect the current data.


Written by Sytra Team
Research Engineering Team, Sytra AI

We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.

#Stata #Frames #frlink #Workflow
