Stata 'Convergence Not Achieved': Causes and Solutions for ML Estimation
Your logit, probit, or MLE model won't converge. Here's why — separation, multicollinearity, bad starting values — and the options that fix it.
Your logit, probit, or maximum likelihood model starts iterating. The log-likelihood values bounce around. Then Stata gives up:
```
Iteration 0:   log pseudolikelihood = -24831.204
Iteration 1:   log pseudolikelihood = -18422.891
Iteration 2:   log pseudolikelihood = -17956.332
...
Iteration 25:  log pseudolikelihood = -17901.445
convergence not achieved
r(430);
```
“Convergence not achieved” means Stata’s optimization algorithm could not find stable parameter estimates. The log-likelihood function never settled at a maximum within the allowed number of iterations. This guide covers every common cause and the specific fix for each.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
Convergence failure in ML estimation has a few main causes:
- Separation (perfect prediction) — by far the most common cause in logit/probit
- Multicollinearity — near-identical predictors confuse the optimizer
- Too many parameters for the sample size
- Bad starting values — the optimizer starts too far from the solution
- Flat or irregular likelihood surface
The fastest diagnostic: check for separation first, then try the difficult option, then simplify the model.
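That triage can be sketched in a few commands. This is a minimal sketch; the variables y, x1, and x2 are placeholders, not from a real dataset:

```stata
* Step 1: look for separation -- zero cells in a crosstab are the red flag
tab y x1

* Step 2: give the optimizer a more robust algorithm and a longer leash
logit y x1 x2, difficult iterate(100)

* Step 3: if it still fails, drop terms until it converges,
* then add them back one at a time to find the culprit
logit y x1, difficult
```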
What Convergence Means in Maximum Likelihood
Maximum likelihood estimation (MLE) is iterative. Stata starts with initial parameter guesses, evaluates the log-likelihood function, adjusts the parameters to increase the likelihood, and repeats. “Convergence” means the parameters stopped changing — the algorithm found a stable maximum.
When convergence fails, it means:
- The parameters keep changing between iterations (no stable maximum exists)
- One or more parameters are drifting toward infinity
- The algorithm is oscillating between two regions
- The likelihood surface is too flat for the algorithm to find a direction
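You can watch the optimizer to tell which of these is happening. The trace and gradient options are part of Stata's standard maximize options; the variable names below are placeholders:

```stata
* trace prints the full coefficient vector at every iteration;
* gradient also prints the gradient vector
logit y x1 x2, trace gradient

* A coefficient that grows steadily without bound suggests separation;
* a log likelihood that oscillates suggests a ridge or a flat region
```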
Cause 1: Separation (Perfect Prediction) in Logit/Probit
This is the most common cause of convergence failure in binary outcome models. Separation occurs when a predictor (or combination of predictors) perfectly predicts the outcome. The MLE coefficient wants to go to infinity — which obviously prevents convergence.
```stata
* Example: rare disease with strong predictor
* All patients with biomarker > 100 have the disease
logit disease biomarker age gender   // convergence not achieved

* Diagnose: check for perfect prediction
tab disease if biomarker > 100    // all 1s
tab disease if biomarker <= 100   // mix of 0s and 1s
```

```
    disease |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        847      100.00      100.00
------------+-----------------------------------
      Total |        847      100.00
```

Fixes for separation
```stata
* 1. Remove the separating variable
logit disease age gender, robust

* 2. Use Firth's penalized likelihood (reduces bias from separation)
firthlogit disease biomarker age gender

* 3. Categorize the continuous predictor to break separation
gen bio_cat = irecode(biomarker, 25, 50, 75, 100)
logit disease i.bio_cat age gender

* 4. Use Bayesian estimation (puts a prior on the coefficient)
* bayes: logit disease biomarker age gender
```

Run ssc install firthlogit to get Firth's penalized likelihood, which handles separation gracefully. You can also rerun logit with iterate(100) and watch the iteration log: if one coefficient keeps growing without bound, that variable is the separator.

Cause 2: Multicollinearity
When two or more predictors are highly correlated (or perfectly collinear), the likelihood surface becomes a ridge with no clear maximum. The optimizer wanders along this ridge without converging.
```stata
* income_thousands and income_dollars are perfectly correlated
logit outcome income_thousands income_dollars age
// convergence not achieved: two measures of the same thing

* Diagnose collinearity
correlate income_thousands income_dollars
* r = 1.0: perfectly collinear

* Fix: drop one of the collinear variables
logit outcome income_thousands age, robust

* For near-collinearity, check VIF after OLS
regress outcome income_thousands income_dollars age
vif
```

```
    Variable |      VIF       1/VIF
-------------+---------------------
income_tho~s | 99999.99    0.000010
income_dol~s | 99999.99    0.000010
         age |     1.02    0.980392
-------------+---------------------
    Mean VIF | 66667.00
```

Cause 3: Too Many Parameters for the Sample Size
ML estimation requires enough observations per estimated parameter. A logit model with 50 dummy variables and 200 observations is asking the optimizer to find a needle in a very high-dimensional haystack.
```stata
* 200 observations; 48 state dummies + treatment, age, income, constant
logit outcome treatment age income i.state, robust
// convergence not achieved: 52 parameters from 200 observations

* Fix: reduce the number of parameters
* Option 1: fewer fixed effects
logit outcome treatment age income i.region, robust

* Option 2: penalized likelihood
firthlogit outcome treatment age income i.state

* Option 3: conditional logit (for grouped data)
clogit outcome treatment age income, group(state)
```

Cause 4: Bad Starting Values
Stata picks starting values automatically, usually from a simplified version of the model. If the default starting values are far from the true maximum, the optimizer may get stuck in a flat region or diverge.
```stata
* Provide your own starting values with from()
* First, estimate a simpler model
logit outcome treatment age, robust
matrix b0 = e(b)

* Use those estimates as starting values for the full model
* (skip tells Stata to ignore parameters missing from b0)
logit outcome treatment age income education, from(b0, skip) robust

* Or provide specific values by name
logit outcome treatment age income, ///
    from(treatment=0.5 age=0.02 income=0.001 _cons=-2) robust
```

Cause 5: The iterate() and difficult Options
Sometimes convergence is simply slow: the algorithm is making progress but needs more iterations than the default maximum allows (controlled by set maxiter; 300 in recent versions of Stata).
ML estimation options
Options that control the convergence behavior of maximum likelihood estimation commands.
- iterate(#): maximum number of iterations (default varies by command)
- difficult: use a more robust but slower optimization algorithm
- from(): provide starting values
- technique(): optimization algorithm (nr, bhhh, dfp, bfgs)

```stata
* Increase max iterations
logit outcome treatment age income, iterate(100) robust

* Use the difficult option (slower but more robust algorithm)
logit outcome treatment age income, difficult robust

* Combine both
logit outcome treatment age income, iterate(200) difficult robust

* Try a different optimization algorithm
logit outcome treatment age income, technique(bfgs) robust

* Hybrid: start with one algorithm, switch to another
logit outcome treatment age income, technique(bfgs 10 nr 20) robust
```

Note that iterate() only helps when the algorithm is making steady progress toward convergence.

Cause 6: Small Sample Size
With very small samples, the likelihood surface can be irregular — multiple local maxima, saddle points, or flat regions that confuse the optimizer.
```stata
* With N = 30, logit can struggle
logit outcome treatment age if subsample == 1, robust
// convergence not achieved

* Options for small samples:
* 1. Exact logistic regression
exlogistic outcome treatment age

* 2. Firth's penalized likelihood
firthlogit outcome treatment age

* 3. Linear probability model (robust in small samples)
regress outcome treatment age, robust
```

Simplifying the Model
When nothing else works, simplify. Remove variables one at a time to isolate which predictor is causing the convergence failure. Start with the most complex terms (interactions, polynomials, high-dimensional dummies).
```stata
* Full model: does not converge
logit outcome treatment age income education ///
    i.state i.year i.industry c.age#c.income, robust

* Step 1: Remove the interaction
logit outcome treatment age income education ///
    i.state i.year i.industry, robust

* Step 2: Remove the smallest FE group
logit outcome treatment age income education ///
    i.state i.year, robust

* Step 3: Continue until it converges
* Then add terms back one at a time to find the culprit
```

Sytra catches these errors before you run.
Sytra validates your model specification before estimation. It checks for separation, near-collinearity, and events-per-variable ratios — warning you before you hit convergence failure. Describe your analysis and get code that converges on the first try.
Debugging Checklist
- Check for separation. Cross-tabulate the outcome with each predictor. Look for cells with zero counts.
- Check for multicollinearity. Run OLS first and check VIF.
- Try difficult. Add , difficult to your command.
- Increase iterations. Add iterate(100) and watch the iteration log.
- Provide starting values. Estimate a simpler model first and use from().
- Simplify. Remove variables one at a time until convergence is achieved.
- Consider alternatives. Firth logit, exact logistic, or a linear probability model.
FAQ
What does convergence not achieved mean in Stata?
It means Stata’s maximum likelihood optimization algorithm could not find stable parameter estimates. The log-likelihood function did not settle at a maximum within the allowed number of iterations.
How do I fix convergence not achieved in Stata logit?
The most common cause is separation (perfect prediction). Check for variables that perfectly predict the outcome. Other fixes: use difficult, increase iterate(), provide starting values with from(), or simplify the model.
What is separation in logistic regression?
Separation occurs when a predictor perfectly predicts the outcome for a subset of observations. The MLE coefficient for that variable wants to go to infinity, which prevents convergence. Firth’s penalized likelihood (firthlogit) is the standard solution.
Should I just increase iterate() until it converges?
Not blindly. If the log-likelihood is still changing meaningfully between iterations, more iterations may help. If it’s barely moving or oscillating, the problem is structural and more iterations won’t fix it. Check for separation and collinearity first.
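One way to make that call: run with a generous cap, then inspect e(converged), which ml-based estimation commands store after fitting. A sketch with placeholder variable names:

```stata
* Give the optimizer a generous cap and watch the log
logit y x1 x2, iterate(500)

* e(converged) is 1 if the model converged, 0 otherwise
display e(converged)

* If the log likelihood was still climbing when the cap was hit,
* a higher cap may help; if it was flat or cycling, diagnose
* separation or collinearity instead
```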
We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.