Workflow

2026-03-01, 10 min read

Stata Factor Variables: i., c., ibn., and # Notation Explained

Factor variable notation is powerful but confusing. Here's when to use i., c., ibn., ##, and # — with interaction examples and common mistakes.

Sytra Team

Research Engineering Team, Sytra AI

You know your model needs categorical controls, but one small syntax slip turns continuous covariates into hundreds of dummy variables.

You will use factor-variable syntax correctly for main effects and interactions, with interpretable base categories.

All examples tested in Stata 18 SE. Compatible with Stata 15+.

Quick Answer

Use `i.var` for categorical predictors and `c.var` for continuous predictors.
Use `##` to include main effects plus interactions in one term.
Set reference categories with `ib#.` notation when interpretation needs explicit baseline control.
Inspect expanded terms with `fvexpand` when debugging.
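As a quick check of the `##` shorthand in the list above, the sketch below fits the same model twice, once with `##` and once with the main effects and `#` product terms written out. It assumes simulated data in the style of this article's examples; the seed and the scalar names are arbitrary choices for illustration.

```stata
* Sketch: ## is shorthand for main effects plus # interaction terms
clear all
set seed 2026
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen wage = 10 + 0.8*education + 0.9*(sector==2) + rnormal(0,2)

* Short form: ## adds i.sector, c.education, and their interaction.
* Inside # and ##, an unprefixed variable is treated as categorical,
* so education needs the c. prefix here.
regress wage i.sector##c.education
scalar r2_short = e(r2)
scalar k_short  = e(df_m)

* Long form: main effects listed explicitly, # adds only the product terms
regress wage i.sector c.education i.sector#c.education
scalar r2_long = e(r2)
scalar k_long  = e(df_m)

display "R-squared (## form):  " r2_short
display "R-squared (long form): " r2_long
```

Both fits should report the same R-squared and the same model degrees of freedom, since they describe one specification written two ways.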

Control Model Design Through Explicit Variable Typing

Estimate models with categorical and continuous terms

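Factor-variable notation tells Stata how to encode each regressor. This avoids hand-built dummy errors and lets postestimation commands such as margins and contrast recognize which regressors are categorical.

Explicit typing is essential in interaction models, where variable class changes interpretation. Inside # and ## terms, Stata treats an unprefixed variable as categorical, so a continuous covariate needs the c. prefix there.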

If you are extending this pipeline, also review Export Regression Tables in Stata and How to Merge Datasets in Stata.

factor-core.do

```stata
clear all
* Arbitrary seed for reproducible draws; estimates will differ from the sample output
set seed 12345
set obs 2200
gen firm_id = ceil(_n/11)
gen year = 2010 + mod(_n,10)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen experience = 18 + floor(runiform()*20)
gen wage = 10 + 0.8*education + 0.25*experience + 0.9*(sector==2) + rnormal(0,2)

* Continuous covariates take c., categorical controls take i.
regress wage c.education c.experience i.sector i.year, vce(cluster firm_id)
```

. regress wage c.education c.experience i.sector i.year, vce(cluster firm_id)

------------------------------------------------------------------------------
        wage | Coefficient  std. err.      t    P>|t|
-------------+----------------------------------------
   education |   .8014075   .0262187    30.57   0.000
  experience |   .2461084   .0109241    22.53   0.000
             |
      sector |
          2  |   .9318814   .1746318     5.34   0.000
          3  |   .4421158   .1700442     2.60   0.010
          4  |   .1934431   .1694074     1.14   0.256
------------------------------------------------------------------------------

💡 Let Stata handle dummies

Manual dummy generation is harder to audit. Factor syntax keeps coding and base-level tracking inside estimation.
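To make the auditing point concrete, here is a minimal sketch comparing hand-built indicators with `i.sector`. It assumes simulated data like the example above; the seed, the `sec_` stub, and the scalar names are illustrative choices, not part of the original workflow.

```stata
* Sketch: manual dummies vs factor-variable notation
clear all
set seed 2026
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen wage = 10 + 0.8*education + 0.9*(sector==2) + rnormal(0,2)

* Manual route: create sec_1 ... sec_4, then remember to omit one yourself
tabulate sector, generate(sec_)
regress wage education sec_2 sec_3 sec_4
scalar b_manual = _b[sec_2]

* Factor-variable route: Stata builds the indicators and tracks the base level
regress wage c.education i.sector
scalar b_factor = _b[2.sector]

display "manual sec_2:    " b_manual
display "factor 2.sector: " b_factor

* After the factor-variable fit, margins knows sector is categorical
margins sector
```

The two coefficients should agree, but only the factor-variable fit records that the indicators belong to one categorical variable, which is what `margins sector` relies on.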

Set base categories and inspect expanded terms

Reference-category control matters when interpretation focuses on specific benchmark groups. Stata supports this directly through ib#. prefixes, for example ib3.sector to make sector 3 the base.

Use fvexpand to inspect exactly which terms Stata generated before final reporting.

factor-base-expand.do

```stata
clear all
* Arbitrary seed for reproducible draws
set seed 12345
set obs 2200
gen firm_id = ceil(_n/11)
gen year = 2010 + mod(_n,10)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen experience = 18 + floor(runiform()*20)
gen wage = 10 + 0.8*education + 0.25*experience + 0.9*(sector==2) + rnormal(0,2)

regress wage c.education c.experience i.sector i.year, vce(cluster firm_id)

* ---- Section-specific continuation ----
* Set sector 3 as base category
regress wage c.education ib3.sector, vce(cluster firm_id)

* Inspect expanded term list, using the same base category as the model above
fvexpand ib3.sector##c.education
display "expanded terms: `r(varlist)'"
```

. display "expanded terms: `r(varlist)'"

expanded terms: 1.sector 2.sector 3b.sector 4.sector education 1.sector#c.education 2.sector#c.education 3b.sector#co.education 4.sector#c.education

The b marker flags the base level and o flags a term omitted from estimation. fvexpand lists these terms rather than dropping them, which makes the base choice visible.

👁 Base category changes coefficient meaning

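Each indicator coefficient is a contrast against the base level, so switching the base changes every sector coefficient by construction while leaving the fitted values unchanged. Document base choices in tables and notes.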
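The arithmetic behind a base switch can be checked directly. The sketch below assumes simulated data like the examples above (seed and scalar names are arbitrary) and compares the sector 2 coefficient under base 1 and base 3.

```stata
* Sketch: how coefficients move when the base category changes
clear all
set seed 2026
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen wage = 10 + 0.8*education + 0.9*(sector==2) + rnormal(0,2)

* Default base (lowest level, sector 1)
regress wage c.education i.sector
scalar b2_base1 = _b[2.sector]
scalar b3_base1 = _b[3.sector]
scalar rmse1    = e(rmse)

* Same model with sector 3 as base
regress wage c.education ib3.sector
scalar b2_base3 = _b[2.sector]
scalar rmse3    = e(rmse)

* Sector 2 vs 3 equals (2 vs 1) minus (3 vs 1)
scalar b2_implied = b2_base1 - b3_base1
display "2.sector with base 1:        " b2_base1
display "2.sector with base 3:        " b2_base3
display "implied from base-1 results: " b2_implied
```

The root MSE is identical across the two fits because only the parameterization changes, not the model.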

Common Errors and Fixes

"factor variables may not contain noninteger values"

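A variable holding noninteger values was prefixed with i., and Stata requires factor variables to contain nonnegative integer category codes.

Use c. for continuous covariates and reserve i. for categorical integer-coded variables. Note that the simulated education variable above is integer-valued (8 to 17), so i.education would not raise this error on that data. It would run and silently expand into one indicator per level, which is the quieter version of the same mistake. The output below assumes an education variable recorded with fractional values.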

. regress wage i.education

factor variables may not contain noninteger values
r(452);

This causes the error

wrong-way.do

```stata
regress wage i.education
```

This is the fix

right-way.do

```stata
regress wage c.education
```

error-fix.do

```stata
summarize education
regress wage c.education i.sector
```

. regress wage c.education i.sector

Linear regression
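Before choosing a prefix, it can help to check whether a variable has fractional values and how many levels it has. The sketch below is one way to do that under stated assumptions: `educ_frac` is a hypothetical fractional-years variable built only to trigger the error, and the seed and scalar names are arbitrary.

```stata
* Sketch: diagnose a variable before choosing i. or c.
clear all
set seed 2026
set obs 2200
gen education = 8 + floor(runiform()*10)
* Hypothetical variable: half of the observations get a fractional value
gen educ_frac = education + 0.5*mod(_n,2)
gen wage = 10 + 0.8*education + rnormal(0,2)

* Count observations with a fractional part
count if educ_frac != floor(educ_frac)
scalar n_frac = r(N)

* i. on a noninteger variable stops with r(452); capture keeps the do-file running
capture noisily regress wage i.educ_frac
scalar rc_frac = _rc
display "return code with i.educ_frac: " rc_frac

* i. on integer-valued education runs, creating one indicator per level
quietly tabulate education
display "distinct education levels: " r(r)
regress wage i.education

* c. treats education as a single numeric slope
regress wage c.education
```

A nonzero fractional count means `i.` will fail outright. A zero count with many distinct levels means `i.` will run but may not be the model you intended.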

Command Reference

factor-variable notation


Specifies variable type and interaction structure directly in estimation commands.

regress y c.x##i.z

`i.var`: Categorical treatment coding

`c.var`: Continuous variable marker

`ib#.`: Custom base category selection

`ibn.`: All categories included (no omitted base)
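Of the prefixes listed, `ibn.` is the least familiar, so a short sketch may help. It assumes simulated data like the earlier examples (seed and scalar names are arbitrary) and pairs `ibn.` with `noconstant`, so each sector gets its own intercept instead of a contrast against a base.

```stata
* Sketch: ibn. with noconstant gives one intercept per category
clear all
set seed 2026
set obs 2200
gen firm_id = ceil(_n/11)
gen sector = mod(firm_id,4) + 1
gen education = 8 + floor(runiform()*10)
gen wage = 10 + 0.8*education + 0.9*(sector==2) + rnormal(0,2)

* No base level and no constant: four sector-specific intercepts
regress wage ibn.sector c.education, noconstant
scalar int1 = _b[1.sector]
scalar int2 = _b[2.sector]
scalar gap_ibn = int2 - int1

* Default coding: constant is the sector 1 intercept, 2.sector is the gap
regress wage i.sector c.education
scalar gap_base = _b[2.sector]
scalar cons     = _b[_cons]

display "sector 2 minus sector 1 intercept (ibn.): " gap_ibn
display "2.sector coefficient (base 1):            " gap_base
```

The two parameterizations carry the same information. Without `noconstant`, the four indicators would be collinear with the constant and Stata would drop one of them.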

How Sytra Handles This

Sytra can translate plain-language model descriptions into correct factor-variable syntax with explicit base-category choices.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt

```text
Rewrite my regression using factor-variable notation: treat sector as categorical with base category 3, education as continuous, include sector-by-education interactions, and show expanded term names.
```


FAQ

What does i. mean in Stata regressions?

i. marks a variable as categorical, so Stata creates indicator contrasts automatically using a base category.

When do I use c. prefix?

Use c. for continuous variables, especially inside interactions, so Stata treats them numerically rather than categorically.

Why would I use ibn. prefix?

ibn. includes indicators for all categories without omitting a base, useful in constrained or no-constant specifications.
