Data Management

2026-02-14 · 16 min read

Stata egen Functions: Complete Reference with Examples for Every Function

Every egen function in one place (mean, total, count, max, min, rowmean, rowtotal, group, tag, rank), with examples for each.

Sytra Team

Research Engineering Team, Sytra AI

You need grouped means, tags, and row totals in one script, but each analyst on your team uses different ad hoc code.

You will get a single egen playbook with reproducible patterns that scale from cleaning to estimation prep.

All examples tested in Stata 18 SE. Compatible with Stata 15+.

Quick Answer

Use `egen` for grouped, row-wise, and tagging functions unavailable in plain `generate`.
Pair `bysort` with egen to avoid accidental cross-group calculations.
Validate derived variables with quick `summarize` output and `duplicates report` checks.
Prefer one clear egen pass over repeated patch edits.
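
The validation step in the list above can be sketched as follows. This is a minimal illustration on simulated data, not a prescribed workflow: the variable names, the seed, and the choice of `assert` and `isid` as checks are assumptions made for the example.

```stata
* Validation sketch; simulated data and variable names are illustrative assumptions
clear all
set seed 12345
set obs 600
gen firm_id = ceil(_n/6)
gen year = 2015 + mod(_n, 8)
gen wage = 18 + rnormal(0, 4)

bysort firm_id: egen firm_mean_wage = mean(wage)

* The group mean should never be missing when wage is fully observed
assert !missing(firm_mean_wage)

* The group mean must be constant within each firm
bysort firm_id (firm_mean_wage): assert firm_mean_wage[1] == firm_mean_wage[_N]

* Confirm that firm_id and year uniquely identify observations
duplicates report firm_id year
isid firm_id year

summarize wage firm_mean_wage
```

`assert` and `isid` stop the do-file with an error when a check fails, which is usually preferable to spotting a problem later in a regression table.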

Standardize Derived Variables Before Modeling

Compute grouped statistics with bysort + egen

Grouped statistics are a frequent source of silent errors when analysts forget sorting or grouping logic. egen handles this cleanly with explicit group context.

For firm-year or school-cohort work, calculate group means once and document how they were built.

If you are extending this pipeline, also review "How to Merge Datasets in Stata" and "Export Regression Tables in Stata: esttab Tutorial".

egen-grouped-stats.do

stata

clear all
set obs 600

* Simulated panel: 100 firms with 6 observations each
gen firm_id = ceil(_n/6)
gen year = 2015 + mod(_n,8)
gen wage = 18 + rnormal(0,4)
gen education = 9 + floor(runiform()*9)

* Attach group-level statistics to every row
bysort firm_id: egen firm_mean_wage = mean(wage)
bysort year: egen year_mean_education = mean(education)
bysort firm_id year: egen n_firm_year = count(wage)

summarize firm_mean_wage year_mean_education n_firm_year

. summarize firm_mean_wage year_mean_education n_firm_year

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
firm_mean_~e |        600    18.04231    1.581243    13.9042    22.1194
year_mean_~n |        600    12.97167    .3678021    12.4211    13.4667
n_firm_year  |        600           1           0          1          1

💡 Name derived variables explicitly

Use prefixes like `firm_` or `year_` so collaborators can tell whether a variable is raw or derived without scanning your do-file.
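
The subtitle also lists total, max, min, and group, which the script above does not use. The sketch below shows one way they might look on the same kind of simulated panel. The variable names and the seed are assumptions for illustration only.

```stata
* Sketch of total(), max(), min(), and group(); names are illustrative assumptions
clear all
set seed 12345
set obs 600
gen firm_id = ceil(_n/6)
gen year = 2015 + mod(_n, 8)
gen wage = 18 + rnormal(0, 4)

* Firm-level wage total, highest wage, and lowest wage
bysort firm_id: egen firm_total_wage = total(wage)
bysort firm_id: egen firm_max_wage = max(wage)
bysort firm_id: egen firm_min_wage = min(wage)

* group() assigns consecutive integer IDs (1, 2, ...) to each firm-year cell
egen firm_year_id = group(firm_id year)

* Sanity check: the wage range within a firm cannot be negative
gen firm_wage_range = firm_max_wage - firm_min_wage
assert firm_wage_range >= 0

summarize firm_total_wage firm_max_wage firm_min_wage firm_year_id
```

`group()` is handy when a later command needs a single numeric identifier, for example as a panel or cluster variable.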

Row-wise and tagging functions for QA workflows

Row-wise functions are useful when combining multiple survey items into indices. Tagging functions support duplicate audits and sample construction.

These functions are reliable if you keep variable lists explicit and verify edge cases like missing values.

egen-row-tag.do

stata

clear all
set obs 600
gen firm_id = ceil(_n/6)
gen year = 2015 + mod(_n,8)
gen wage = 18 + rnormal(0,4)
gen education = 9 + floor(runiform()*9)

bysort firm_id: egen firm_mean_wage = mean(wage)
bysort year: egen year_mean_education = mean(education)
bysort firm_id year: egen n_firm_year = count(wage)

summarize firm_mean_wage year_mean_education n_firm_year

* Simulated test scores (integers from 0 to 99)
gen score_math = floor(runiform()*100)
gen score_read = floor(runiform()*100)
gen score_science = floor(runiform()*100)

* Row-wise indices across the three score items
egen score_total = rowtotal(score_math score_read score_science)
egen score_mean = rowmean(score_math score_read score_science)

* Flag one observation per firm, then rank wages within year
egen firm_tag = tag(firm_id)
egen wage_rank = rank(wage), by(year)

list firm_id year score_total score_mean firm_tag wage_rank in 1/3

. list firm_id year score_total score_mean firm_tag wage_rank in 1/3

     +------------------------------------------------------------------+
     | firm_id   year   score_total   score_mean   firm_tag   wage_rank |
     |------------------------------------------------------------------|
  1. |       1   2016           196     65.33333          1          43 |
  2. |       1   2017           168           56          0          51 |
  3. |       1   2018           214     71.33333          0          66 |
     +------------------------------------------------------------------+

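👁 rowtotal handles missing differently

rowtotal treats missing values as 0 by default. If all inputs are missing, the result is 0 rather than missing, so verify whether that is appropriate for your design. Add the `missing` option if an all-missing row should produce a missing total.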
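
A small sketch makes the missing-value behavior concrete. The three-observation dataset and the variable names `total_default`, `total_strict`, `mean_items`, and `n_answered` are made up for illustration.

```stata
* Missing-value behavior of row-wise functions; data are an illustrative assumption
clear all
set obs 3
gen score_math = .
gen score_read = .
gen score_science = .

* Observation 1: all items answered; observation 2: one item; observation 3: none
replace score_math = 80 in 1
replace score_read = 70 in 1
replace score_science = 90 in 1
replace score_math = 60 in 2

* Default behavior: missing items count as 0, so observation 3 gets a total of 0
egen total_default = rowtotal(score_math score_read score_science)

* With the missing option, observation 3 gets a missing total instead
egen total_strict = rowtotal(score_math score_read score_science), missing

* rowmean averages the nonmissing items and is missing when every item is missing
egen mean_items = rowmean(score_math score_read score_science)

* rownonmiss counts answered items, which helps enforce a minimum-response rule
egen n_answered = rownonmiss(score_math score_read score_science)

list
```

Observation 2 is the case to watch: its total of 60 and mean of 60 both rest on a single answered item, which `n_answered` makes visible.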

Common Errors and Fixes

"unknown egen function rowmeans()"

The function name is misspelled. egen function names must match exactly, and they often differ from the plural form you might expect.

Run `help egen` and copy function names exactly; `rowmean` is singular.

. egen avg_score = rowmeans(score_math score_read score_science)

unknown egen function rowmeans()
r(133);

This causes the error

wrong-way.do

stata

egen avg_score = rowmeans(score_math score_read score_science)

This is the fix

right-way.do

stata

egen avg_score = rowmean(score_math score_read score_science)

error-fix.do

stata

* Drop any earlier attempt, then rebuild with the correct function name
capture drop avg_score
egen avg_score = rowmean(score_math score_read score_science)
summarize avg_score

. summarize avg_score

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
   avg_score |        600    49.84111    17.94028          5     95.667

Command Reference

egen

Creates derived variables using grouped, row-wise, and specialized functions.

egen newvar = function(arguments) [, by(groupvars)]

by(): Apply function within groups

rowtotal(): Row-wise sum across listed variables

tag(): Flags first observation in each group

rank(), by(): Within-group ranking for percentile work
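
As a rough illustration of the percentile use of `rank()` with `by()`, the sketch below converts within-year ranks into a percentile position. The 0.5 offset is one common convention rather than anything egen prescribes, and the names `n_year` and `wage_pctile` are assumptions for the example.

```stata
* Within-year percentile sketch; the formula and names are illustrative assumptions
clear all
set seed 12345
set obs 600
gen year = 2015 + mod(_n, 8)
gen wage = 18 + rnormal(0, 4)

* Rank wages within year (1 = lowest wage; ties share the average rank)
egen wage_rank = rank(wage), by(year)

* Number of nonmissing wages in each year
egen n_year = count(wage), by(year)

* Percentile position strictly between 0 and 100
gen wage_pctile = 100 * (wage_rank - 0.5) / n_year

summarize wage_pctile
```

If ties should not share an average rank, `rank()` also accepts the `field`, `track`, and `unique` options described in `help egen`.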

How Sytra Handles This

Sytra can translate grouped-statistics requests into exact egen patterns and flag when collapse might be more efficient.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt

text

Generate firm-level and year-level grouped summaries with egen, then build row-wise score indices and a duplicate tag variable for firm_id.

FAQ

What is the difference between gen and egen in Stata?

gen computes observation-level expressions, while egen adds grouped and row-wise functions such as mean by group, rowtotal, tag, and rank.

Can egen be slow on large datasets?

It can be slower than specialized commands on very large panels. Use bysort with efficient grouping and avoid repeated egen calls when one pass is enough.

How do I compute group means without collapsing data?

Use `bysort group: egen newvar = mean(oldvar)` so your original row-level data stays intact.
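
To make the contrast with `collapse` concrete, here is a minimal sketch on simulated data. The variable names and the use of `preserve`/`restore` are assumptions chosen for illustration.

```stata
* egen versus collapse; simulated data and names are illustrative assumptions
clear all
set seed 12345
set obs 600
gen firm_id = ceil(_n/6)
gen wage = 18 + rnormal(0, 4)

* egen keeps all 600 rows and attaches the firm mean to each one
bysort firm_id: egen firm_mean_wage = mean(wage)
count

* collapse replaces the data with one row per firm, so wrap it in preserve/restore
preserve
collapse (mean) firm_mean_wage_c = wage, by(firm_id)
count
restore

* After restore, the original row-level data is back
count
```

Use the egen form when later steps need row-level data, and `collapse` when the analysis only needs one row per group.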
