2026-02-24 · 11 min read

Clustered Standard Errors in Stata: vce(cluster) Explained with Examples

When and why to cluster standard errors. vce(cluster), vce(robust), two-way clustering, and the most common mistake researchers make.

Sytra Team
Research Engineering Team, Sytra AI

Your coefficient stays the same, but significance flips after clustering, and now you need to explain why to reviewers.

By the end, you will be able to choose the clustering level from the research design and report your inference choices with diagnostics that hold up to review.

All examples tested in Stata 18 SE. Compatible with Stata 15+.


Quick Answer

  1. Start from design: cluster where shocks or treatment assignment vary.
  2. Add `vce(cluster cluster_id)` to the estimation command, where `cluster_id` is your grouping variable.
  3. Report the number of clusters and sensitivity checks.
  4. Avoid defaulting to `vce(robust)` when within-cluster dependence is expected.
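
The four steps above can be sketched on simulated data as follows. This is a minimal illustration, not a prescribed workflow: the variable names, the seed, and the data-generating values are all hypothetical, and the regressor is given a firm-level component (an assumption made here) so that clustering visibly matters.

```stata
* Minimal sketch of the four steps; all names and values are hypothetical
clear all
set seed 12345
set obs 5000
gen firm_id = ceil(_n/25)

* One shock and one regressor component per firm, copied to every row
bysort firm_id: gen double firm_shock = rnormal(0,1) if _n == 1
by firm_id: replace firm_shock = firm_shock[1]
by firm_id: gen double firm_edu = floor(runiform()*6) if _n == 1
by firm_id: replace firm_edu = firm_edu[1]

gen education  = 8 + firm_edu + floor(runiform()*5)
gen experience = 18 + floor(runiform()*20)
gen wage = 10 + 0.8*education + 0.3*experience + firm_shock + rnormal(0,2)

* Step 4 baseline: heteroskedasticity-robust SEs only
quietly regress wage education experience, vce(robust)
scalar se_robust = _se[education]

* Steps 1-2: cluster at the level where the shock varies (the firm)
regress wage education experience, vce(cluster firm_id)
scalar se_cluster = _se[education]

* Step 3: report the cluster count and a simple sensitivity ratio
display "Clusters: " e(N_clust)
display "Cluster/robust SE ratio (education): " %5.2f se_cluster/se_robust
```

Storing each standard error in a scalar keeps the comparison explicit; `e(N_clust)` is the cluster count that `regress` saves after `vce(cluster ...)`.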

Align Inference with Data-Generating Dependence

Compare robust and clustered inference directly

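The vce() option does not change the OLS point estimates; it changes only the estimated variance of those estimates, and the resulting standard errors can differ substantially. The gap grows with cluster size and with the within-cluster correlation of both the regressors and the errors. What changes is your inference (t statistics, p-values, confidence intervals), not the fitted values.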

A side-by-side comparison is the cleanest way to communicate why cluster-robust inference is required.

If you are extending this pipeline, also review How to Merge Datasets in Stata and How to Structure a Stata Project.

cluster-vs-robust.do
stata
clear all
* Arbitrary seed so reruns give identical draws; exact estimates depend on it
set seed 12345
set obs 5000
* 200 firms with 25 observations each
gen firm_id = ceil(_n/25)
gen year = 2010 + mod(_n,10)
gen education = 8 + floor(runiform()*10)
gen experience = 18 + floor(runiform()*20)

* Cluster-level shock component: one draw per firm, copied to every row
bysort firm_id: gen firm_shock = rnormal(0,1) if _n==1
bysort firm_id: replace firm_shock = firm_shock[1]

gen wage = 10 + 0.8*education + 0.3*experience + firm_shock + rnormal(0,2)

* Same model, two variance estimators
regress wage education experience, vce(robust)
estimates store robust_model

regress wage education experience, vce(cluster firm_id)
estimates store cluster_model

* Side-by-side coefficients and standard errors
estimates table robust_model cluster_model, b se
. estimates table robust_model cluster_model, b se
-------------------------------------------------
    Variable |  robust_model      cluster_model
-------------+-----------------------------------
   education |     0.8033           0.8033
             |    (0.0161)         (0.0248)
  experience |     0.2975           0.2975
             |    (0.0072)         (0.0109)
-------------------------------------------------
💡 Point estimates are not the issue
Clustering is mainly about valid uncertainty quantification under within-group dependence.
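
How much within-group dependence matters can be approximated with the Moulton factor. The expression below is the textbook approximation for equal-sized clusters, not something derived in this article; the symbols $m$ (observations per cluster), $\rho_x$ (intraclass correlation of the regressor) and $\rho_u$ (intraclass correlation of the errors) are notation introduced here, and the value $\rho_x = 0.5$ is purely hypothetical.

```latex
% Approximate variance inflation relative to the conventional OLS variance
\tau \;\approx\; 1 + (m - 1)\,\rho_x\,\rho_u ,
\qquad
\frac{\mathrm{SE}_{\text{cluster}}}{\mathrm{SE}_{\text{conventional}}} \;\approx\; \sqrt{\tau}

% Worked example: m = 25, \rho_u = 1/(1+4) = 0.2 (firm shock variance 1,
% idiosyncratic variance 4), and a hypothetical \rho_x = 0.5
\tau \approx 1 + 24 \times 0.5 \times 0.2 = 3.4 ,
\qquad
\sqrt{3.4} \approx 1.84
```

When a regressor is drawn independently across observations within a cluster, $\rho_x \approx 0$ and $\tau \approx 1$, so robust and clustered standard errors will be close even if the errors are strongly correlated within clusters.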

Check cluster counts and support by design

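Cluster-robust standard errors are justified asymptotically in the number of clusters, not the number of observations. Reporting the cluster count and the dispersion of cluster sizes should be standard in appendices.

If clusters are few, include robustness checks such as the wild cluster bootstrap where appropriate.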

cluster-diagnostics.do
stata
clear all
* Arbitrary seed so reruns give identical draws; exact estimates depend on it
set seed 12345
set obs 5000
* 200 firms with 25 observations each
gen firm_id = ceil(_n/25)
gen year = 2010 + mod(_n,10)
gen education = 8 + floor(runiform()*10)
gen experience = 18 + floor(runiform()*20)

* Cluster-level shock component: one draw per firm, copied to every row
bysort firm_id: gen firm_shock = rnormal(0,1) if _n==1
bysort firm_id: replace firm_shock = firm_shock[1]

gen wage = 10 + 0.8*education + 0.3*experience + firm_shock + rnormal(0,2)

regress wage education experience, vce(robust)
estimates store robust_model

regress wage education experience, vce(cluster firm_id)
estimates store cluster_model

estimates table robust_model cluster_model, b se

* ---- Section-specific continuation ----
* Observations per firm, then the number of distinct firms
egen n_by_firm = count(wage), by(firm_id)
quietly levelsof firm_id, local(firms)
local n_clusters : word count `firms'
display "Number of clusters: " `n_clusters'

* Summarize cluster sizes using one row per firm, so large firms are not over-weighted
egen byte firm_tag = tag(firm_id)
summarize n_by_firm if firm_tag

* Coverage of the time dimension
tab year
. display "Number of clusters: " `n_clusters'
Number of clusters: 200
โš ๏ธFew clusters warning
If cluster count is below common thresholds, report this explicitly and consider finite-sample robust alternatives.
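
One such alternative is the wild cluster bootstrap. The sketch below assumes Stata 18 or later, where the official `wildbootstrap` command is available; on earlier versions a community-contributed command would be needed instead. The 12-cluster design, the seed, and all variable names are hypothetical and chosen only to mimic a few-clusters setting.

```stata
* Sketch: wild cluster bootstrap with few clusters (assumes Stata 18+)
clear all
set seed 2024
set obs 300
* 12 clusters of 25 observations each
gen firm_id = ceil(_n/25)
gen education  = 8 + floor(runiform()*10)
gen experience = 18 + floor(runiform()*20)

* One shock per firm, copied to every row of that firm
bysort firm_id: gen double firm_shock = rnormal(0,1) if _n == 1
by firm_id: replace firm_shock = firm_shock[1]

gen wage = 10 + 0.8*education + 0.3*experience + firm_shock + rnormal(0,2)

* Conventional cluster-robust inference, for reference
regress wage education experience, vce(cluster firm_id)

* Wild cluster bootstrap p-value and confidence interval for education
wildbootstrap regress wage education experience, cluster(firm_id) coefficients(education) rseed(2024)
```

Reporting both sets of results lets readers see whether the conclusion depends on the large-cluster approximation.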

Common Errors and Fixes

"varname required"

The cluster option was supplied without a clustering variable.

Provide a valid grouping variable inside vce(cluster ...).

. regress wage education experience, vce(cluster)
varname required
r(100);
This causes the error
wrong-way.do
stata
regress wage education experience, vce(cluster)
This is the fix
right-way.do
stata
regress wage education experience, vce(cluster firm_id)
error-fix.do
stata
* Verify that the clustering variable exists before estimating
confirm variable firm_id
regress wage education experience, vce(cluster firm_id)
. confirm variable firm_id

Command Reference

vce(cluster)

Stata docs →

Computes cluster-robust variance estimates allowing arbitrary dependence within clusters.

regress y x1 x2, vce(cluster clustvar)

vce(robust)        Heteroskedasticity-robust only
vce(cluster id)    Cluster-robust by id
small              Finite-sample correction where available
dfadj              Degrees-of-freedom adjustment in some estimators
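
For reference, the cluster-robust variance that `regress` reports can be written as a sandwich estimator. The notation is introduced here and is not from the article: $G$ clusters, $N$ observations, $K$ estimated coefficients including the constant, $X_g$ the rows of $X$ belonging to cluster $g$, and $\hat{u}_g$ the corresponding residual vector.

```latex
% Cluster-robust (sandwich) variance with the finite-sample factor used by regress
\widehat{V}_{\text{cluster}}
  = c\,(X'X)^{-1}
    \Bigl(\sum_{g=1}^{G} X_g'\,\hat{u}_g\,\hat{u}_g'\,X_g\Bigr)
    (X'X)^{-1},
\qquad
c = \frac{G}{G-1}\cdot\frac{N-1}{N-K}

% Heteroskedasticity-robust counterpart: every observation is its own cluster
\widehat{V}_{\text{robust}}
  = \frac{N}{N-K}\,(X'X)^{-1}
    \Bigl(\sum_{i=1}^{N} \hat{u}_i^{2}\,x_i x_i'\Bigr)
    (X'X)^{-1}
```

With the simulated example's $G = 200$, $N = 5000$ and $K = 3$, the factor is $c = (200/199)(4999/4997) \approx 1.0054$, so nearly all of the difference between the two estimators comes from the within-cluster cross-products in the middle term. After `vce(cluster ...)`, Stata also bases t statistics on $G - 1$ degrees of freedom.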

How Sytra Handles This

Sytra can infer likely clustering levels from design variables and warn when cluster counts are too small for reliable asymptotics.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt
text
Estimate wage on education and experience with both robust and firm-clustered SEs, compare coefficient tables, report cluster count and average cluster size, and flag if clusters are too few.


FAQ

When should I cluster standard errors?

Cluster when residual dependence is likely within groups such as firms, schools, or regions that share shocks over time.

Can I cluster with very few clusters?

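Inference becomes unreliable with very few clusters because the cluster-robust variance is itself estimated from only as many independent blocks as there are clusters. Consider the wild cluster bootstrap or alternative design choices when the cluster count is low.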

What is the difference between robust and cluster?

vce(robust) allows heteroskedasticity but treats observations as independent, while vce(cluster clustvar) additionally allows arbitrary correlation among observations within the same cluster.

