Singleton Observations in Stata reghdfe: What They Are and What to Do
reghdfe dropped 12,000 observations and you don't know why. They're singletons — fixed effect groups with one observation. Here's what that means for your paper.
You ran reghdfe and the output says it dropped 12,000 observations. Your carefully constructed sample just lost a quarter of its observations and you have no idea why. The answer is almost always singletons.
    (MWFE estimator converged in 5 iterations)

    HDFE Linear regression                            Number of obs   =     38,247
    Absorbing 2 HDFE groups                           F(2, 4182)      =     124.56
                                                      Prob > F        =     0.0000
                                                      R-squared       =     0.4231
                                                      Adj R-squared   =     0.3891
                                                      Within R-sq.    =     0.0892
    Number of clusters (firm_id) =      4,183         Root MSE        =     8.2341

    (12,403 singleton observations dropped)

Singleton observations are observations that sit alone in their fixed effect group. If a firm appears exactly once in your dataset, that observation is a “singleton” for the firm fixed effect. reghdfe drops them because they cannot contribute to estimation and their inclusion biases standard errors.
All examples use the reghdfe package by Sergio Correia, which depends on the ftools package. Install both with: ssc install ftools, replace and ssc install reghdfe, replace
Quick Answer
Singletons are observations in fixed effect groups with only one member. reghdfe drops them iteratively because:
- A single observation perfectly identifies the fixed effect — there is no within-group variation to estimate from
- Including them biases standard errors downward (overstates precision)
- They do not contribute to the coefficient estimates at all
This is correct behavior. If many singletons are dropped, check your data structure — it may indicate a data problem rather than a normal estimation feature.
What Exactly Is a Singleton?
Consider a firm-year panel. Firm ABC appears in years 2015, 2016, 2017, 2018. Firm XYZ appears only in year 2020. With firm fixed effects, the XYZ observation is a singleton: the firm fixed effect for XYZ is perfectly determined by that single observation, absorbing all its variation.
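To see why a lone observation carries no within-firm information, the sketch below builds a five-row toy panel and demeans wage by firm, which is what a firm fixed effect does implicitly. The wage values and the variable names mean_wage and wage_within are made up for illustration.

```stata
* Toy panel: ABC appears in four years, XYZ in one (hypothetical wages)
clear
input str3 firm year wage
"ABC" 2015 10
"ABC" 2016 11
"ABC" 2017 12
"ABC" 2018 13
"XYZ" 2020 20
end

* Within transformation: subtract each firm's own mean
bysort firm: egen mean_wage = mean(wage)
gen wage_within = wage - mean_wage

list firm year wage wage_within, sepby(firm)
```

The XYZ row ends up with wage_within equal to zero by construction, whatever its wage is, so it cannot help estimate any slope.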
    * Check for singletons before running reghdfe
    bysort firm_id: gen firm_count = _N
    tab firm_count if firm_count == 1

     firm_count |      Freq.     Percent        Cum.
    ────────────┼───────────────────────────────────
              1 |     12,403      100.00      100.00
    ────────────┼───────────────────────────────────
          Total |     12,403      100.00

With two-way fixed effects (firm + year), the singleton problem is more complex. An observation can become a singleton iteratively: after dropping first-round singletons from one dimension, new singletons may appear in the other dimension. reghdfe handles this iterative process automatically.
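The iterative pruning described above can be approximated by hand. This is a rough sketch of the idea, not reghdfe's actual algorithm. It assumes a firm-year panel with the wage, education, experience, firm_id, and year variables used in this article, and the helper names n_firm_tmp and n_year_tmp are made up for illustration.

```stata
* Sketch: repeat singleton removal until neither dimension has any left
preserve

* Mimic the estimation sample: drop rows with missing regression variables
keep if !missing(wage, education, experience, firm_id, year)

local total = 0
local dropped = 1
while `dropped' > 0 {
    * Recount group sizes after every round of deletions
    bysort firm_id: gen long n_firm_tmp = _N
    bysort year: gen long n_year_tmp = _N

    quietly count if n_firm_tmp == 1 | n_year_tmp == 1
    local dropped = r(N)
    local total = `total' + `dropped'

    drop if n_firm_tmp == 1 | n_year_tmp == 1
    drop n_firm_tmp n_year_tmp
}
display "Singletons removed over all rounds: `total'"

* Return to the original data
restore
```

Comparing this total with a one-pass count shows how many singletons only appear after the first round.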
reghdfe repeats until no singletons remain. The number dropped can therefore exceed the count from a simple one-pass check such as bysort firm_id: gen count = _N.

Why Does reghdfe Drop Singletons?
The key insight from Correia (2015): including singletons in fixed effect estimation does not affect point estimates but does affect standard errors. Singletons contribute zero degrees of freedom to the residual but are counted in the degrees-of-freedom adjustment for standard errors. The result: standard errors are too small and t-statistics are too large.
    * areg keeps singletons: SEs may be too small
    areg wage education experience, ///
        absorb(firm_id) cluster(firm_id)
    * SE on education: 0.0123

    * reghdfe drops singletons: SEs are correct
    reghdfe wage education experience, ///
        absorb(firm_id) cluster(firm_id)
    * SE on education: 0.0141

The difference is often small (5-15%) but can matter for borderline significance. Note that areg and reghdfe can also differ in their degrees-of-freedom adjustment when the absorbed effects are nested within clusters, so the cleanest way to isolate the singleton effect is to run reghdfe with and without keepsingleton. In published research, this is exactly the kind of detail that replication teams check.
How to Check for Singletons Before Estimation
Check your data structure before running the regression. This helps you understand how many observations you’ll lose and whether the singleton count is reasonable.
    * One-way: check groups with only one observation
    bysort firm_id: gen n_firm = _N
    count if n_firm == 1
    drop n_firm

    * Two-way: check both dimensions
    bysort firm_id: gen n_firm = _N
    bysort year: gen n_year = _N
    count if n_firm == 1
    count if n_year == 1

    * For exact count, use reghdfe with verbose option
    reghdfe wage education experience, absorb(firm_id year) cluster(firm_id) verbose(1)

The keepsingleton Option
reghdfe provides a keepsingleton option that prevents dropping. This exists primarily for comparability with areg and xtreg,fe — not because keeping singletons is a good idea.
    * Keep singletons (NOT recommended for final results)
    reghdfe wage education experience, ///
        absorb(firm_id year) cluster(firm_id) keepsingleton

    * Compare with default (singletons dropped)
    reghdfe wage education experience, ///
        absorb(firm_id year) cluster(firm_id)

keepsingleton produces standard errors that are biased downward. Reviewers familiar with Correia (2015) will flag this. Use it only for diagnostic comparison, not for your final tables.

Comparison: reghdfe vs areg vs xtreg,fe
reghdfe
High-dimensional fixed effects estimator with automatic singleton deletion and multi-way clustering.
    absorb()         Fixed effects to absorb (any number)
    cluster()        Cluster variable(s) for robust SEs
    keepsingleton    Do not drop singletons (not recommended)
    verbose(#)       Show iteration details

    * Keeps singletons (biased SEs)
    * Only one set of fixed effects
    areg wage education, absorb(firm_id) robust
    xtreg wage education, fe cluster(firm_id)

    * Drops singletons (correct SEs)
    * Multiple fixed effects
    reghdfe wage education, ///
        absorb(firm_id year) ///
        cluster(firm_id)

Key differences:
- Singleton handling: reghdfe drops singletons by default; areg and xtreg,fe keep them
- Multiple FEs: reghdfe handles any number of fixed effects; areg handles one
- Speed: reghdfe is dramatically faster for high-dimensional fixed effects
- Two-way clustering: reghdfe supports cluster(var1 var2)
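As a minimal sketch of the last point, assuming the same wage, education, experience, firm_id, and year variables used throughout, two-way clustering only requires listing both cluster variables. The vce(cluster ...) spelling is used here; the cluster() shorthand from the earlier examples is expected to behave the same way.

```stata
* Sketch: firm and year fixed effects, SEs clustered by firm and by year
reghdfe wage education experience, ///
    absorb(firm_id year) ///
    vce(cluster firm_id year)
```

With only a handful of years there are few year clusters, so treat the year dimension of the clustering with caution in short panels.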
When Singletons Signal a Data Problem
If reghdfe drops a large fraction of your sample (say >30%), that’s not just a statistical technicality — it likely indicates a structural issue with your data or research design:
- Too many fixed effect categories. If you have almost as many firm IDs as observations, most groups will be singletons. Consider whether you need that granularity.
- Unbalanced panel. Short panels where most firms appear for only one or two years lose many observations as singletons, since any firm left with a single usable year is dropped.
- Wrong fixed effect specification. Using zip code × year fixed effects when you should be using state × year.
- Sample restriction too aggressive. After subsetting, the remaining data may be too sparse for the fixed effect structure.
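One way to check whether the fixed effect specification is too fine is to count one-pass singletons under each candidate grouping before estimating anything. The sketch below assumes hypothetical variables zip and state alongside year; the names zip_year, state_year, and n_in_cell are made up for illustration.

```stata
* Build candidate fixed effect cells (zip and state are hypothetical variables)
* Rows with a missing component get a missing group id and are counted together
egen long zip_year = group(zip year)
egen long state_year = group(state year)

* One-pass singleton count under each grouping
foreach g in zip_year state_year {
    bysort `g': gen long n_in_cell = _N
    quietly count if n_in_cell == 1
    display "`g': " r(N) " singleton observations out of " _N
    drop n_in_cell
}
```

If the finer grouping leaves a large share of cells with one observation while the coarser one does not, that is a sign the finer specification asks more of the data than it can support.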
    * Understand your panel structure
    xtset firm_id year
    xtdescribe

    * How many observations per firm?
    bysort firm_id: gen T_firm = _N
    summarize T_firm, detail

    * How many firms per year?
    bysort year: gen N_year = _N
    summarize N_year, detail

Reporting Singletons in Your Paper
Always report singleton information. A standard approach:
“Our initial sample contains 50,650 firm-year observations. The reghdfe estimator drops 12,403 singleton observations, leaving an estimation sample of 38,247. Results are robust to including singletons (Online Appendix Table A3).”
For the appendix, show the keepsingleton comparison to demonstrate that your point estimates are stable and that the standard error differences are modest.
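A side-by-side table for the appendix could be put together along these lines. This is a sketch assuming the article's wage, education, experience, firm_id, and year variables; the stored-estimate names keep_s and drop_s are arbitrary. The option is written in its full form, keepsingletons.

```stata
* Run both versions and store the results
quietly reghdfe wage education experience, ///
    absorb(firm_id year) vce(cluster firm_id) keepsingletons
estimates store keep_s
local n_keep = e(N)

quietly reghdfe wage education experience, ///
    absorb(firm_id year) vce(cluster firm_id)
estimates store drop_s
local n_drop = e(N)

* Singleton count implied by the two sample sizes
display "Singletons dropped: " `n_keep' - `n_drop'

* Coefficients and standard errors side by side, with N for each column
estimates table drop_s keep_s, b(%9.4f) se(%9.4f) stats(N)
```

The coefficient rows should match across the two columns, with only the standard errors and N differing, which is the pattern the appendix is meant to document.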
FAQ
What are singleton observations in reghdfe?
Singleton observations belong to fixed effect groups that contain only one observation. For example, if firm XYZ appears only once in your panel, that observation is a singleton. reghdfe drops them because they cannot contribute to within-group estimation and their inclusion biases standard errors.
Why does reghdfe drop singletons but areg does not?
areg and xtreg,fe were written before the statistical issue was well understood. Correia (2015) showed that including singletons biases standard errors downward. reghdfe implements the correct approach by default.
Should I report singleton drops in my paper?
Yes. Always report the initial sample size, the number of singletons dropped, and the final estimation sample. Include a keepsingleton robustness check in the appendix.
Can I keep singletons in reghdfe?
Yes, use reghdfe y x, absorb(fe) keepsingleton. But this is not recommended for published results — it biases standard errors downward, potentially inflating significance.