Stata Errors
2026-02-08 · 10 min read

Stata 'Convergence Not Achieved': Causes and Solutions for ML Estimation

Your logit, probit, or MLE model won't converge. Here's why — separation, multicollinearity, bad starting values — and the options that fix it.

Sytra Team
Research Engineering Team, Sytra AI

Your logit, probit, or maximum likelihood model starts iterating. The log-likelihood values bounce around. Then Stata gives up:

. logit outcome treatment age income i.state, robust
Iteration 0:   log pseudolikelihood = -24831.204
Iteration 1:   log pseudolikelihood = -18422.891
Iteration 2:   log pseudolikelihood = -17956.332
...
Iteration 25:  log pseudolikelihood = -17901.445
convergence not achieved
r(430);

“Convergence not achieved” means Stata’s optimization algorithm could not find stable parameter estimates. The log-likelihood function never settled at a maximum within the allowed number of iterations. This guide covers every common cause and the specific fix for each.

All examples tested in Stata 18 SE. Compatible with Stata 15+.


Quick Answer

Convergence failure in ML estimation has a few main causes:

  1. Separation (perfect prediction) — by far the most common cause in logit/probit
  2. Multicollinearity — near-identical predictors confuse the optimizer
  3. Too many parameters for the sample size
  4. Bad starting values — the optimizer starts too far from the solution
  5. Flat or irregular likelihood surface

The fastest diagnostic: check for separation first, then try the difficult option, then simplify the model.


What Convergence Means in Maximum Likelihood

Maximum likelihood estimation (MLE) is iterative. Stata starts with initial parameter guesses, evaluates the log-likelihood function, adjusts the parameters to increase the likelihood, and repeats. “Convergence” means the parameters stopped changing — the algorithm found a stable maximum.

When convergence fails, it means:

  • The parameters keep changing between iterations (no stable maximum exists)
  • One or more parameters are drifting toward infinity
  • The algorithm is oscillating between two regions
  • The likelihood surface is too flat for the algorithm to find a direction
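You can watch these failure modes directly in the optimizer's output. A minimal sketch using Stata's standard maximize options (variable names follow the examples in this guide):

watch-optimizer.do
stata
* trace prints the coefficient vector at each iteration;
* gradient prints the gradient vector alongside it
logit outcome treatment age income, trace gradient

* The convergence criteria themselves are maximize options.
* Loosening them hides problems rather than fixing them — use with care.
logit outcome treatment age income, tolerance(1e-4) ltolerance(1e-7)

If `trace` shows one coefficient growing without bound, suspect separation; if coefficients oscillate between two sets of values, suspect a ridge in the likelihood from collinearity.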

Cause 1: Separation (Perfect Prediction) in Logit/Probit

This is the most common cause of convergence failure in binary outcome models. Separation occurs when a predictor (or combination of predictors) perfectly predicts the outcome. The MLE coefficient on the separating variable drifts toward infinity, so the log-likelihood never reaches a finite maximum and the algorithm cannot converge.

separation-example.do
stata
* Example: rare disease with strong predictor
* All patients with biomarker > 100 have the disease
logit disease biomarker age gender    // convergence not achieved

* Diagnose: check for perfect prediction
tab disease if biomarker > 100        // all 1s
tab disease if biomarker <= 100       // mix of 0s and 1s
. tab disease if biomarker > 100
    disease |      Freq.     Percent        Cum.
────────────┼───────────────────────────────────
          1 |        847      100.00      100.00
────────────┼───────────────────────────────────
      Total |        847      100.00

Fixes for separation

separation-fixes.do
stata
* 1. Remove the separating variable
logit disease age gender, robust

* 2. Use Firth's penalized likelihood (reduces bias from separation)
firthlogit disease biomarker age gender

* 3. Categorize the continuous predictor to break separation
gen bio_cat = irecode(biomarker, 25, 50, 75, 100)
logit disease i.bio_cat age gender

* 4. Use Bayesian estimation (puts a prior on the coefficient)
* bayes: logit disease biomarker age gender
💡Detecting separation
Run ssc install firthlogit to get Firth’s penalized likelihood, which handles separation gracefully. You can also fit the model with logit, iterate(100) and watch the iteration log — if one coefficient keeps growing without bound, that variable is the separating one.
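One way to put that tip into practice: cap the iterations, then inspect the non-converged estimates for a runaway coefficient. A sketch, reusing the variable names from the example above:

detect-separation.do
stata
* firthlogit is community-contributed; install it from SSC
ssc install firthlogit

* Cap the run so Stata stops early and still reports estimates
logit disease biomarker age gender, iterate(20)

* A coefficient with huge magnitude flags the separated variable
matrix list e(b)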

Cause 2: Multicollinearity

When two or more predictors are highly correlated, the likelihood surface becomes a ridge with no clear maximum, and the optimizer wanders along the ridge without converging. (Stata automatically omits exactly collinear variables; it is near-perfect collinearity that stalls the optimizer.)

collinearity.do
stata
* income_thousands and income_dollars measure the same quantity
logit outcome income_thousands income_dollars age
// convergence not achieved — near-perfect collinearity

* Diagnose collinearity
correlate income_thousands income_dollars
* r ≈ 1 — effectively collinear

* Fix: drop one of the collinear variables
logit outcome income_thousands age, robust

* For near-collinearity, check VIF after OLS
regress outcome income_thousands income_dollars age
vif
. vif
    Variable |       VIF       1/VIF
─────────────┼──────────────────────
income_tho~s |  99999.99    0.000010
income_dol~s |  99999.99    0.000010
         age |      1.02    0.980392
─────────────┼──────────────────────
    Mean VIF |  66667.00
⚠️Rule of thumb
VIF > 10 suggests problematic multicollinearity. VIF > 100 almost certainly causes convergence issues in MLE. Drop or combine the collinear variables.

Cause 3: Too Many Parameters for the Sample Size

ML estimation requires enough observations per estimated parameter. A logit model with 50 dummy variables and 200 observations is asking the optimizer to find a needle in a very high-dimensional haystack.

too-many-params.do
stata
* 200 observations, 48 state dummies + 3 covariates
logit outcome treatment age income i.state, robust
// convergence not achieved — ~52 parameters from 200 observations

* Fix: reduce the number of parameters
* Option 1: fewer fixed effects
logit outcome treatment age income i.region, robust

* Option 2: penalized likelihood
firthlogit outcome treatment age income i.state

* Option 3: conditional logit (for grouped data)
clogit outcome treatment age income, group(state)
💡Rule of thumb
For logit/probit, you need roughly 10-20 events (cases where outcome = 1) per estimated parameter. With rare outcomes, you run out of events quickly as you add dummies.
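A quick back-of-the-envelope check along these lines (a sketch; outcome is the binary dependent variable from the examples above):

check-epv.do
stata
* Count events and translate the 10-per-parameter rule into a budget
count if outcome == 1
display "Events: " r(N)
display "Rough parameter budget at 10 EPV: " floor(r(N)/10)

If your model estimates more parameters than that budget, thin the specification before fighting the optimizer.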

Cause 4: Bad Starting Values

Stata picks starting values automatically, usually from a simplified version of the model. If the default starting values are far from the true maximum, the optimizer may get stuck in a flat region or diverge.

starting-values.do
stata
* Provide your own starting values with from()
* First, estimate a simpler model
logit outcome treatment age, robust
matrix b0 = e(b)

* Use those estimates as starting values for the full model
logit outcome treatment age income education, from(b0) robust

* Or provide specific values by name
logit outcome treatment age income, ///
    from(treatment=0.5 age=0.02 income=0.001 _cons=-2) robust

Cause 5: The iterate() and difficult Options

Sometimes convergence is simply slow — the algorithm is making progress but needs more iterations than the default maximum (300 in recent versions of Stata, controlled by set maxiter).

ML estimation options

Options that control the convergence behavior of maximum likelihood estimation commands.

logit y x, [iterate(#)] [difficult] [from(matname)] [technique(algo)]
  iterate(#)     Maximum number of iterations (default varies by command)
  difficult      Use a more robust but slower optimization algorithm
  from()         Provide starting values
  technique()    Optimization algorithm: nr, bhhh, dfp, bfgs
convergence-options.do
stata
* Increase max iterations
logit outcome treatment age income, iterate(100) robust

* Use the difficult option (slower but more robust algorithm)
logit outcome treatment age income, difficult robust

* Combine both
logit outcome treatment age income, iterate(200) difficult robust

* Try a different optimization algorithm
logit outcome treatment age income, technique(bfgs) robust

* Hybrid: start with one algorithm, switch to another
logit outcome treatment age income, technique(bfgs 10 nr 20) robust
👁When iterate() won't help
If the model hasn’t converged after 100 iterations and the log-likelihood is barely changing, more iterations won’t help. The problem is structural — go back and check for separation or multicollinearity. iterate() only helps when the algorithm is making steady progress toward convergence.

Cause 6: Small Sample Size

With very small samples, the likelihood surface can be irregular — multiple local maxima, saddle points, or flat regions that confuse the optimizer.

small-sample.do
stata
* With N = 30, logit can struggle
logit outcome treatment age if subsample == 1, robust
// convergence not achieved

* Options for small samples:
* 1. Exact logistic regression
exlogistic outcome treatment age

* 2. Firth's penalized likelihood
firthlogit outcome treatment age

* 3. Linear probability model (OLS — no iterative optimization)
regress outcome treatment age, robust

Simplifying the Model

When nothing else works, simplify. Remove variables one at a time to isolate which predictor is causing the convergence failure. Start with the most complex terms (interactions, polynomials, high-dimensional dummies).

simplify.do
stata
* Full model — doesn't converge
logit outcome treatment age income education ///
    i.state i.year i.industry c.age#c.income, robust

* Step 1: Remove the interaction
logit outcome treatment age income education ///
    i.state i.year i.industry, robust

* Step 2: Remove the smallest FE group
logit outcome treatment age income education ///
    i.state i.year, robust

* Step 3: Continue until it converges
* Then add terms back one at a time to find the culprit

Sytra catches these errors before you run.

Sytra validates your model specification before estimation. It checks for separation, near-collinearity, and events-per-variable ratios — warning you before you hit convergence failure. Describe your analysis and get code that converges on the first try.

Join the Waitlist →

Debugging Checklist

  1. Check for separation. Cross-tabulate the outcome with each predictor. Look for cells with zero counts.
  2. Check for multicollinearity. Run OLS first and check VIF.
  3. Try difficult. Add , difficult to your command.
  4. Increase iterations. Add iterate(100) and watch the iteration log.
  5. Provide starting values. Estimate a simpler model first and use from().
  6. Simplify. Remove variables one at a time until convergence is achieved.
  7. Consider alternatives. Firth logit, exact logistic, or a linear probability model.
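Step 1 of the checklist can be semi-automated. A sketch of a separation scan — the varlist here is hypothetical; substitute the categorical predictors from your own model:

separation-scan.do
stata
* Cross-tab the outcome against each categorical predictor.
* Any cell with zero observations is a candidate for separation.
foreach v of varlist treatment gender region {
    display _newline "=== outcome vs `v' ==="
    tab outcome `v'
}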

FAQ

What does convergence not achieved mean in Stata?

It means Stata’s maximum likelihood optimization algorithm could not find stable parameter estimates. The log-likelihood function did not settle at a maximum within the allowed number of iterations.

How do I fix convergence not achieved in Stata logit?

The most common cause is separation (perfect prediction). Check for variables that perfectly predict the outcome. Other fixes: use difficult, increase iterate(), provide starting values with from(), or simplify the model.

What is separation in logistic regression?

Separation occurs when a predictor perfectly predicts the outcome for a subset of observations. The MLE coefficient for that variable wants to go to infinity, which prevents convergence. Firth’s penalized likelihood (firthlogit) is the standard solution.

Should I just increase iterate() until it converges?

Not blindly. If the log-likelihood is still changing meaningfully between iterations, more iterations may help. If it’s barely moving or oscillating, the problem is structural and more iterations won’t fix it. Check for separation and collinearity first.

Written by Sytra Team
Research Engineering Team, Sytra AI

We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.

#Stata #Errors #MLE #Logit #Probit
