Stata egen Functions: Complete Reference with Examples for Every Function
Every egen function in one place: mean, total, count, max, min, rowmean, rowtotal, group, tag, rank, with examples for each.
You need grouped means, tags, and row totals in one script, but each analyst on your team uses different ad hoc code.
You will get a single egen playbook with reproducible patterns that scale from cleaning to estimation prep.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Use `egen` for grouped, row-wise, and tagging functions unavailable in plain `generate`.
- Pair `bysort` with egen to avoid accidental cross-group calculations.
- Validate derived variables with quick summaries and duplicates checks.
- Prefer one clear egen pass over repeated patch edits.
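The validation step in the list above can be sketched as follows. This is a minimal illustration on a hypothetical six-row panel (the data values are invented, and the three checks are one reasonable choice, not the only one):

```stata
* Hypothetical toy panel: two firms, three years each, one missing wage
clear
input byte firm_id int year double wage
1 2020 10
1 2021 12
1 2022 14
2 2020 20
2 2021 .
2 2022 30
end

* Derived variable under test: firm-level mean wage
bysort firm_id: egen double firm_mean_wage = mean(wage)

* Check 1: a group mean must be constant within its group
bysort firm_id (firm_mean_wage): assert firm_mean_wage[1] == firm_mean_wage[_N]

* Check 2: firm_id and year should uniquely identify rows
duplicates report firm_id year
isid firm_id year

* Check 3: compare the range of the derived variable with its source
summarize wage firm_mean_wage
```

`assert` and `isid` stop the do-file with an error when a check fails, so a broken derived variable cannot slip silently into later steps.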
Standardize Derived Variables Before Modeling
Compute grouped statistics with bysort + egen
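Grouped statistics are a frequent source of silent errors: drop the `bysort` prefix and `egen newvar = mean(x)` quietly returns the overall mean instead of the group mean. Writing the grouping into the command itself, with `bysort group:` or the `by()` option, keeps the group context explicit.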
For firm-year or school-cohort work, calculate group means once and document how they were built.
If you are extending this pipeline, also review How to Merge Datasets in Stata and Export Regression Tables in Stata: esttab Tutorial.
    * Simulated panel: 100 firms with 6 rows each (no seed set, so random values vary by run)
    clear all
    set obs 600
    gen firm_id = ceil(_n/6)
    gen year = 2015 + mod(_n,8)
    gen wage = 18 + rnormal(0,4)
    gen education = 9 + floor(runiform()*9)

    * Group-level statistics, attached to every row of the group
    bysort firm_id: egen firm_mean_wage = mean(wage)
    bysort year: egen year_mean_education = mean(education)
    * Each firm's six rows fall in six different years, so this count is always 1
    bysort firm_id year: egen n_firm_year = count(wage)

    summarize firm_mean_wage year_mean_education n_firm_year

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    firm_mean_~e |        600    18.04231    1.581243    13.9042    22.1194
    year_mean_~n |        600    12.97167    .3678021    12.4211    13.4667
     n_firm_year |        600           1           0          1          1
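The same `bysort` pattern extends to the other grouped functions named at the top of this guide: `total()`, `max()`, `min()`, and `group()`. A minimal sketch on a hypothetical five-row sales table (the variable names `region`, `store`, and `sales` are invented for illustration):

```stata
* Hypothetical data: two regions, one missing sales figure
clear
input str1 region byte store double sales
"N" 1 100
"N" 2 150
"S" 1 80
"S" 2 .
"S" 3 120
end

* Group sum, maximum, and minimum, repeated on every row of the group
bysort region: egen double region_total = total(sales)
bysort region: egen double region_max   = max(sales)
bysort region: egen double region_min   = min(sales)

* Numeric group identifier (1, 2, ...) built from the string variable
egen region_id = group(region)

list, sepby(region)
```

Note that `total()` treats the missing value as zero, while `max()` and `min()` skip it, so region "S" gets a total of 200, a maximum of 120, and a minimum of 80.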
Row-wise and tagging functions for QA workflows
Row-wise functions are useful when combining multiple survey items into indices. Tagging functions support duplicate audits and sample construction.
These functions are reliable if you keep variable lists explicit and verify edge cases like missing values.
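The missing-value edge case is worth seeing on a tiny example before running the larger one. The sketch below uses three hypothetical survey items (`q1` to `q3`, invented for illustration) and shows how `rowtotal()`, `rowmean()`, and `rownonmiss()` each treat missing items:

```stata
* Hypothetical survey items: complete row, partial row, fully missing row
clear
input double(q1 q2 q3)
4 5 3
4 . 2
. . .
end

egen double total_default = rowtotal(q1 q2 q3)            // missing items counted as 0
egen double total_strict  = rowtotal(q1 q2 q3), missing   // missing when every item is missing
egen double item_mean     = rowmean(q1 q2 q3)             // mean over nonmissing items only
egen byte   n_answered    = rownonmiss(q1 q2 q3)          // number of nonmissing items

list
```

The fully missing third row gets a default total of 0, which can look like a genuine score of zero. The `missing` option or an `n_answered` filter is one way to keep such rows from contaminating an index.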
    clear all
    set obs 600
    gen firm_id = ceil(_n/6)
    gen year = 2015 + mod(_n,8)
    gen wage = 18 + rnormal(0,4)
    gen education = 9 + floor(runiform()*9)

    bysort firm_id: egen firm_mean_wage = mean(wage)
    bysort year: egen year_mean_education = mean(education)
    bysort firm_id year: egen n_firm_year = count(wage)

    summarize firm_mean_wage year_mean_education n_firm_year

    * ---- Section-specific continuation ----
    gen score_math = floor(runiform()*100)
    gen score_read = floor(runiform()*100)
    gen score_science = floor(runiform()*100)

    egen score_total = rowtotal(score_math score_read score_science)
    egen score_mean = rowmean(score_math score_read score_science)
    egen firm_tag = tag(firm_id)
    egen wage_rank = rank(wage), by(year)

    list firm_id year score_total score_mean firm_tag wage_rank in 1/8

         +------------------------------------------------------------------+
         | firm_id   year   score_total   score_mean   firm_tag   wage_rank |
         |------------------------------------------------------------------|
      1. |       1   2015           196     65.33333          1          43 |
      2. |       1   2016           168           56          0          51 |
      3. |       1   2017           214     71.33333          0          66 |
         +------------------------------------------------------------------+

Common Errors and Fixes
"unknown egen function rowmeans()"
The function name is misspelled. egen functions are strict and often differ from expected plural forms.
Run `help egen` and copy function names exactly; rowmean is singular.
    unknown egen function rowmeans()
    r(133);

Before:

    egen avg_score = rowmeans(score_math score_read score_science)

After:

    egen avg_score = rowmean(score_math score_read score_science)

Corrected run:

    capture drop avg_score
    egen avg_score = rowmean(score_math score_read score_science)
    summarize avg_score

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
       avg_score |        600    49.84111    17.94028          5     95.667
Command Reference
egen
Creates derived variables using grouped, row-wise, and specialized functions.
- `by()`: Apply function within groups
- `rowtotal()`: Row-wise sum across listed variables
- `tag()`: Flags first observation in each group
- `rank(), by()`: Within-group ranking for percentile work
How Sytra Handles This
Sytra can translate grouped-statistics requests into exact egen patterns and flag when collapse might be more efficient.
A direct natural-language prompt for this exact workflow:
Generate firm-level and year-level grouped summaries with egen, then build row-wise score indices and a duplicate tag variable for firm_id.
Sytra catches these errors before you run.
FAQ
What is the difference between gen and egen in Stata?
gen computes observation-level expressions, while egen adds grouped and row-wise functions such as mean by group, rowtotal, tag, and rank.
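A classic place where the difference bites is summation: `sum()` inside `gen` is a running sum, while `total()` inside `egen` puts the overall total on every row. A minimal sketch on three hypothetical values:

```stata
* Three hypothetical observations
clear
input double x
1
2
3
end

gen  double running = sum(x)     // running sum: 1, 3, 6
egen double overall = total(x)   // overall total on every row: 6, 6, 6

list
```

If you need a single total repeated across rows (for shares or percentages, say), `egen ... = total()` is the function to reach for; `gen ... = sum()` is for cumulative series.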
Can egen be slow on large datasets?
It can be slower than specialized commands on very large panels. Use bysort with efficient grouping and avoid repeated egen calls when one pass is enough.
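One commonly suggested alternative on large panels is to build the statistic with plain `by:` and `gen` instead of `egen`. The sketch below reproduces a group mean that way on hypothetical data (variable names `g` and `x` are invented). Whether it is actually faster depends on your data and Stata version, so time both approaches before switching:

```stata
* Hypothetical data: two groups, one missing value
clear
input byte g double x
1 2
1 4
2 10
2 .
2 20
end

* Reference result from egen
bysort g: egen double m_egen = mean(x)

* Same statistic by hand: running sum and running count of nonmissing values
bysort g: gen double run_sum = sum(x)          // sum() treats missing as 0
by g: gen long run_n = sum(!missing(x))
by g: gen double m_gen = run_sum[_N] / run_n[_N]

* The two versions should agree exactly on this example
assert m_egen == m_gen
drop run_sum run_n
```

The hand-built version reuses one sort for several statistics, which is where any speed gain would come from; `egen` remains the more readable default.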
How do I compute group means without collapsing data?
Use `bysort group: egen newvar = mean(oldvar)` so your original row-level data stays intact.
Related Guides
- Stata collapse: How to Aggregate Data with Examples
- Stata Loops: foreach and forvalues Tutorial with 20 Practical Examples
- How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
- Finding and Removing Duplicates in Stata: duplicates tag, report, drop