Workflow
2026-03-01 · 10 min read

Stata Factor Variables: i., c., ibn., and # Notation Explained

Factor variable notation is powerful but confusing. Here's when to use i., c., ibn., ##, and #, with interaction examples and common mistakes.

Sytra Team
Research Engineering Team, Sytra AI

You know your model needs categorical controls, but one small syntax slip turns continuous covariates into hundreds of dummy variables.

You will use factor-variable syntax correctly for main effects and interactions, with interpretable base categories.

All examples tested in Stata 18 SE. Compatible with Stata 15+.


Quick Answer

  1. Use `i.var` for categorical predictors and `c.var` for continuous predictors.
  2. Use `##` to include main effects plus interactions in one term.
  3. Set the reference category with `ib#.` notation (for example, `ib3.sector`) when interpretation depends on a specific baseline group.
  4. Inspect expanded terms with `fvexpand` when debugging.

Control Model Design Through Explicit Variable Typing

Estimate models with categorical and continuous terms

Factor-variable notation tells Stata how to encode each regressor. This avoids the errors that creep in with hand-built dummies and lets postestimation commands such as margins know which regressors are categorical.

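Explicit typing is essential in interaction models. Inside # and ##, Stata treats an unprefixed variable as categorical, so a continuous variable needs c. to enter the interaction as a slope rather than as a set of indicators.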
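The difference between `##` and `#` is easiest to see side by side. The sketch below is illustrative only: it simulates data along the lines of the article's examples (the seed and the simplified wage equation are assumptions) and checks that `##` fits the same model as the spelled-out version.

```stata
* Sketch: ## is shorthand for main effects plus the # interaction.
* Data are simulated; the seed and wage equation are assumptions.
clear all
set seed 12345
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen wage = 10 + 0.8*education + 0.9*(sector==2) + rnormal(0,2)

* Full factorial: main effects and interaction in one term
regress wage i.sector##c.education
scalar r2_short = e(r2)

* The same model spelled out term by term
regress wage i.sector c.education i.sector#c.education
scalar r2_long = e(r2)

* Both specifications should fit identically
assert reldif(r2_short, r2_long) < 1e-10

* Interaction only: one education slope per sector, common intercept
regress wage i.sector#c.education
```

Using `#` on its own is a deliberate modeling choice, because it drops the sector intercept shifts. If you want the usual interaction model, `##` is the safer default.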

If you are extending this pipeline, also review Export Regression Tables in Stata and How to Merge Datasets in Stata.

factor-core.do
stata
clear all
set obs 2200
gen firm_id = ceil(_n/11)
gen year = 2010 + mod(_n,10)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen experience = 18 + floor(runiform()*20)
gen wage = 10 + 0.8*education + 0.25*experience + 0.9*(sector==2) + rnormal(0,2)

regress wage c.education c.experience i.sector i.year, vce(cluster firm_id)
. regress wage c.education c.experience i.sector i.year, vce(cluster firm_id)
------------------------------------------------------------------------------
        wage | Coefficient  std. err.      t    P>|t|
-------------+----------------------------------------
   education |   .8014075   .0262187    30.57   0.000
  experience |   .2461084   .0109241    22.53   0.000
             |
      sector |
          2  |   .9318814   .1746318     5.34   0.000
          3  |   .4421158   .1700442     2.60   0.010
          4  |   .1934431   .1694074     1.14   0.256
------------------------------------------------------------------------------
Tip: Let Stata handle dummies
Manual dummy generation is harder to audit. Factor syntax keeps coding and base-level tracking inside estimation.
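One practical payoff of keeping the coding inside the estimation command is that postestimation commands understand it. The following is a minimal sketch on simulated data (seed and wage equation are assumptions, not the article's exact setup) using `testparm` and `margins`.

```stata
* Sketch: postestimation commands read factor-variable terms directly.
clear all
set seed 12345
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen wage = 10 + 0.8*education + 0.9*(sector==2) + rnormal(0,2)

regress wage c.education i.sector, vce(cluster firm_id)

* Joint test that all sector indicators are zero
testparm i.sector

* Predicted mean wage for each sector, averaging over education
margins sector

* Average marginal effect of education
margins, dydx(education)
```

With hand-built dummies, `margins sector` would not be available, because Stata would have no way to know that the separate indicator variables describe one categorical regressor.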

Set base categories and inspect expanded terms

Reference-category control matters when interpretation focuses on specific benchmark groups. Stata supports this directly through ib# prefixes.

Use fvexpand to inspect exactly which terms Stata generated before final reporting.

factor-base-expand.do
stata
clear all
set obs 2200
gen firm_id = ceil(_n/11)
gen year = 2010 + mod(_n,10)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen experience = 18 + floor(runiform()*20)
gen wage = 10 + 0.8*education + 0.25*experience + 0.9*(sector==2) + rnormal(0,2)

regress wage c.education c.experience i.sector i.year, vce(cluster firm_id)

* ---- Section-specific continuation ----
* Set sector 3 as base category
regress wage c.education ib3.sector, vce(cluster firm_id)

* Inspect expanded term list (same base as the regression above)
fvexpand ib3.sector##c.education
display "expanded terms: `r(varlist)'"
. display "expanded terms: `r(varlist)'"
expanded terms: 1.sector 2.sector 4.sector c.education 1.sector#c.education 2.sector#c.education 4.sector#c.education
๐Ÿ‘Base category changes coefficient meaning
When you switch base levels, dummy coefficients shift by construction. Document base choices in tables and notes.
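To see the shift concretely, the sketch below (simulated data, arbitrary seed, both assumptions) checks that the sector 2 coefficient under base 3 equals the difference between the sector 2 and sector 3 coefficients under the default base. It also shows two other base selectors and the persistent `fvset base` setting.

```stata
* Sketch: switching the base re-expresses the same contrasts.
clear all
set seed 12345
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen wage = 10 + 0.8*education + 0.9*(sector==2) + rnormal(0,2)

* Default base: lowest level (sector 1)
regress wage c.education i.sector
scalar diff_2_vs_3 = _b[2.sector] - _b[3.sector]

* Sector 3 as base: 2.sector now measures sector 2 relative to sector 3
regress wage c.education ib3.sector
assert reldif(_b[2.sector], diff_2_vs_3) < 1e-8

* Other base selectors: highest level, or most frequent level
regress wage c.education ib(last).sector
regress wage c.education ib(freq).sector

* Persistent choice: later i.sector terms use sector 3 as base
fvset base 3 sector
regress wage c.education i.sector
```

The fit of the model does not change across these calls. Only the group that each coefficient is compared against changes, which is why the base belongs in table notes.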

Common Errors and Fixes

"factor variables may not contain noninteger values"

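A continuous variable was prefixed with i., so Stata expected integer category codes. The error appears only when the variable holds noninteger values (for example, education measured in fractional years). An integer-valued continuous variable passes silently and expands into one indicator per distinct value.

Use c. for continuous covariates and reserve i. for categorical integer-coded variables.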

. regress wage i.education
factor variables may not contain noninteger values
r(452);
This causes the error:
wrong-way.do
stata
regress wage i.education
This is the fix:
right-way.do
stata
regress wage c.education
error-fix.do
stata
summarize education
regress wage c.education i.sector
. regress wage c.education i.sector
Linear regression
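A quick pre-check can catch both failure modes: the hard error on noninteger values and the silent explosion of indicators on integer-valued continuous variables. The sketch below is one possible guard, not a built-in Stata feature. The variable `hours` and the simulated data are hypothetical.

```stata
* Sketch: check a variable before prefixing it with i.
* hours is a hypothetical noninteger covariate; data are simulated.
clear all
set seed 12345
set obs 500
gen hours = 20 + 25*runiform()
gen education = 8 + floor(runiform()*10)
gen wage = 5 + 0.1*hours + 0.8*education + rnormal(0,2)

* i. on a noninteger variable stops with an error; capture keeps the do-file running
capture noisily regress wage i.hours
display "return code: " _rc

* Guard: report whether each variable is integer-valued and how many levels it has
foreach v of varlist hours education {
    capture assert `v' == floor(`v')
    local is_int = (_rc == 0)
    quietly tabulate `v'
    display "`v': integer-valued = `is_int', distinct values = " r(r)
}
```

A variable that passes the integer check but has many distinct values is usually a continuous covariate that belongs under `c.`.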

Command Reference

factor-variable notation

Specifies variable type and interaction structure directly in estimation commands.

regress y c.x##i.z
i.var    Categorical treatment coding
c.var    Continuous variable marker
ib#.     Custom base category selection
ibn.     All categories included (no omitted base)
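As an illustration of `ibn.`, the sketch below pairs it with `noconstant` so that every sector gets its own coefficient. In a model with no other regressors, each coefficient is simply that sector's mean wage. The data and seed are assumptions for the example.

```stata
* Sketch: ibn. with noconstant reports one coefficient per category.
clear all
set seed 12345
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen wage = 10 + 0.9*(sector==2) + rnormal(0,2)

* All four sector indicators, no constant
regress wage ibn.sector, noconstant

* With no other regressors, the sector 1 coefficient equals the sector 1 mean
quietly summarize wage if sector == 1
assert reldif(_b[1.sector], r(mean)) < 1e-6
```

Without `noconstant`, the full set of indicators is collinear with the constant, so Stata has to omit one term.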

How Sytra Handles This

Sytra can translate plain-language model descriptions into correct factor-variable syntax with explicit base-category choices.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt
text
Rewrite my regression using factor-variable notation: treat sector as categorical with base category 3, education as continuous, include sector-by-education interactions, and show expanded term names.

FAQ

What does i. mean in Stata regressions?

i. marks a variable as categorical, so Stata automatically creates one indicator per level and omits a base category (the lowest level by default).

When do I use c. prefix?

Use c. for continuous variables, especially inside interactions, so Stata treats them numerically rather than categorically.
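The effect of leaving out `c.` inside an interaction can be checked with `fvexpand` before any model is run. This is a minimal sketch on simulated data (an assumption) that counts the expanded terms with and without the prefix.

```stata
* Sketch: count expanded interaction terms with and without c.
clear all
set seed 12345
set obs 2200
gen sector = mod(ceil(_n/11),4) + 1
gen education = 8 + floor(runiform()*10)

* Without c., education is treated as categorical inside the interaction
fvexpand i.sector#education
local n_cat : word count `r(varlist)'

* With c., each sector gets a single education slope
fvexpand i.sector#c.education
local n_cont : word count `r(varlist)'

display "terms without c.: `n_cat'"
display "terms with c.:    `n_cont'"
assert `n_cat' > `n_cont'
```

The first expansion grows with the number of distinct education values, which is how a small slip produces a very large set of indicators.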

Why would I use ibn. prefix?

ibn. includes indicators for all categories without omitting a base, useful in constrained or no-constant specifications.

