Stata Factor Variables: i., c., ibn., and # Notation Explained
Factor variable notation is powerful but confusing. Here's when to use i., c., ibn., ##, and #, with interaction examples and common mistakes.
You know your model needs categorical controls, but one small syntax slip turns continuous covariates into hundreds of dummy variables.
By the end, you will be able to write factor-variable syntax correctly for main effects and interactions, with interpretable base categories.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Use `i.var` for categorical predictors and `c.var` for continuous predictors.
- Use `#` for an interaction term on its own and `##` to include main effects plus the interaction in one term.
- Set the reference category with `ib#.` notation (for example, `ib3.sector`) when interpretation depends on a specific baseline.
- Inspect expanded terms with `fvexpand` when debugging.
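The relationship between `#` and `##` in the list above can be checked directly. This is a minimal sketch that assumes Stata's bundled `auto` dataset rather than this article's simulated data; the matrix names `b_short` and `b_long` are only illustrative.

```stata
* Illustration on the bundled auto dataset (an assumption, not the article's data)
sysuse auto, clear

* ## gives both main effects plus their interaction
regress price i.foreign##c.mpg
matrix b_short = e(b)

* The same model written out term by term with #
regress price i.foreign c.mpg i.foreign#c.mpg
matrix b_long = e(b)

* Relative difference between the two coefficient vectors
display mreldif(b_short, b_long)
```

If the two specifications are equivalent, the displayed relative difference should be zero up to rounding.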
Control Model Design Through Explicit Variable Typing
Estimate models with categorical and continuous terms
Factor-variable notation tells Stata how to encode each regressor. This avoids errors from hand-built dummy variables and keeps postestimation commands such as `margins` compatible with the model.
Explicit typing is essential in interaction models, where treating a variable as categorical or continuous changes what each coefficient means.
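To see why typing matters inside interactions, the sketch below counts the terms Stata generates with and without the `c.` prefix. It assumes the bundled `auto` dataset, and the local macro names `n_cont` and `n_cat` are hypothetical. Inside `#`, a variable with no prefix is treated as categorical, which is how a continuous covariate can expand into a long list of indicators.

```stata
* Illustration on the bundled auto dataset (an assumption, not the article's data)
sysuse auto, clear

* With c., mpg enters the interaction as a slope
fvexpand i.foreign#c.mpg
local n_cont : word count `r(varlist)'

* Without a prefix, mpg inside # is treated as categorical
fvexpand i.foreign#mpg
local n_cat : word count `r(varlist)'

display "terms with c.mpg:    `n_cont'"
display "terms with bare mpg: `n_cat'"
```

The second count grows with the number of distinct values of the unprefixed variable, so it is worth running this kind of check before estimating a large model.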
If you are extending this pipeline, also review Export Regression Tables in Stata and How to Merge Datasets in Stata.
```stata
* Simulated data: 200 firms with 11 observations each
clear all
set obs 2200
gen firm_id = ceil(_n/11)
gen year = 2010 + mod(_n,10)
gen sector = mod(firm_id,4) + 1                // four sectors, constant within firm
gen education = 8 + floor(runiform()*10)       // integer-valued, 8 to 17
gen experience = 18 + floor(runiform()*20)     // integer-valued, 18 to 37
gen wage = 10 + 0.8*education + 0.25*experience + 0.9*(sector==2) + rnormal(0,2)

* Continuous covariates use c., categorical controls use i.
regress wage c.education c.experience i.sector i.year, vce(cluster firm_id)
```

Output (abridged; the year indicators and the constant are not shown):

```
------------------------------------------------------
        wage | Coefficient  std. err.      t    P>|t|
-------------+----------------------------------------
   education |   .8014075   .0262187    30.57   0.000
  experience |   .2461084   .0109241    22.53   0.000
             |
      sector |
          2  |   .9318814   .1746318     5.34   0.000
          3  |   .4421158   .1700442     2.60   0.010
          4  |   .1934431   .1694074     1.14   0.256
------------------------------------------------------
```

Set base categories and inspect expanded terms
Reference-category control matters when interpretation focuses on a specific benchmark group. Stata supports this directly through the `ib#.` prefix, where `#` is the level to use as the base.
Use fvexpand to inspect exactly which terms Stata generated before final reporting.
```stata
* Continues from the simulated data and regression above

* Set sector 3 as base category
regress wage c.education ib3.sector, vce(cluster firm_id)

* Inspect expanded term list, keeping sector 3 as the base
fvexpand ib3.sector##c.education
display "expanded terms: `r(varlist)'"
```

```
expanded terms: 1.sector 2.sector 4.sector c.education 1.sector#c.education 2.sector#c.education 4.sector#c.education
```

The `ib3.` prefix applies only to the command it appears in, so it is repeated in the `fvexpand` call. The listing shows the estimated terms only; Stata's own list typically also carries the base level, flagged with `b` (for example, `3b.sector`).
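When the same base category is needed across many commands, `fvset base` can store the choice with the variable instead of repeating `ib#.` each time. The following is a sketch on the bundled `auto` dataset (an assumption; `rep78` stands in for any categorical variable, and the matrix names are illustrative).

```stata
* Illustration on the bundled auto dataset (an assumption, not the article's data)
sysuse auto, clear

* ib3. sets the base for this command only
regress price ib3.rep78
matrix b_once = e(b)

* fvset base stores the choice with the variable, so plain i. picks it up
fvset base 3 rep78
regress price i.rep78
matrix b_set = e(b)

* Relative difference between the two coefficient vectors
display mreldif(b_once, b_set)

* Return rep78 to Stata's default base level
fvset clear rep78
```

The per-command prefix keeps the choice visible in each specification, while `fvset base` reduces repetition; which one fits better depends on how many models share the baseline.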
Common Errors and Fixes
"factor variables may not contain noninteger values"
A continuous variable was prefixed with i., so Stata expected integer category codes.
Use c. for continuous covariates and reserve i. for categorical integer-coded variables.
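```
factor variables may not contain noninteger values
r(452);
```

```stata
* Triggers r(452) when education holds noninteger values
regress wage i.education

* Fix: treat education as continuous
regress wage c.education

* Check the values, then fit the full model
summarize education
regress wage c.education i.sector
```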
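The error can be reproduced and diagnosed on a variable that is known to be noninteger. This sketch assumes the bundled `auto` dataset, where `gear_ratio` takes values such as 3.58; the local macro names are hypothetical.

```stata
* Illustration on the bundled auto dataset (an assumption, not the article's data)
sysuse auto, clear

* gear_ratio is noninteger, so the i. prefix is rejected
capture noisily regress price i.gear_ratio
local rc = _rc
display "return code: `rc'"

* Count noninteger values before deciding between i. and c.
count if gear_ratio != floor(gear_ratio) & !missing(gear_ratio)
local n_nonint = r(N)
display "noninteger values: `n_nonint'"
```

A nonzero count means the variable needs `c.` or must be recoded into integer categories before `i.` can be used.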
Command Reference
factor-variable notation
Specifies variable type and interaction structure directly in estimation commands.
- `i.var`: Categorical treatment coding
- `c.var`: Continuous variable marker
- `ib#.`: Custom base category selection
- `ibn.`: All categories included (no omitted base)

How Sytra Handles This
Sytra can translate plain-language model descriptions into correct factor-variable syntax with explicit base-category choices.
A direct natural-language prompt for this exact workflow:
Rewrite my regression using factor-variable notation: treat sector as categorical with base category 3, education as continuous, include sector-by-education interactions, and show expanded term names.

Sytra catches these errors before you run.
FAQ
What does i. mean in Stata regressions?
i. marks a variable as categorical, so Stata creates indicator contrasts automatically using a base category.
When do I use c. prefix?
Use c. for continuous variables, especially inside interactions, so Stata treats them numerically rather than categorically.
Why would I use ibn. prefix?
ibn. includes indicators for all categories without omitting a base, useful in constrained or no-constant specifications.
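The no-constant use of `ibn.` mentioned in the last answer can be illustrated as follows. This is a sketch on the bundled `auto` dataset (an assumption), with `mean3` as a hypothetical scalar name. With every level included and the intercept dropped, each coefficient equals the mean of the outcome within that level.

```stata
* Illustration on the bundled auto dataset (an assumption, not the article's data)
sysuse auto, clear

* ibn. keeps every level; noconstant avoids collinearity with the intercept
regress price ibn.rep78, noconstant

* Compare one coefficient with the group mean it should reproduce
summarize price if rep78 == 3, meanonly
scalar mean3 = r(mean)
display _b[3.rep78] - mean3
```

The displayed difference should be zero up to rounding, which is a quick way to confirm that no base category was omitted.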
Related Guides
- Interaction Effects in Stata: Factor Variables, margins, and Interpretation
- Stata margins: Complete Guide to Marginal Effects with Interpretation
- Stata Macros: local, global, and Extended Functions Explained
- Logistic Regression in Stata: Marginal Effects That Actually Make Sense