Matching Estimators in Stata: PSM, CEM, and Modern Alternatives
A guide to matching methods in Stata — propensity score matching, coarsened exact matching, and why you should probably use teffects instead of psmatch2.
Matching aims to make treated and control groups comparable by pairing units with similar observable characteristics. When randomization isn't possible, matching approximates the balance you'd get from a randomized experiment, but only on observed covariates.
But matching in Stata has a complicated ecosystem. There’s psmatch2 (the old standard), teffects (the official built-in), cem (coarsened exact matching), kmatch (kernel matching), and nnmatch (nearest neighbor). This guide tells you which to use, when, and what diagnostic checks to run.
Propensity Score Matching (PSM)
The Old Way: psmatch2
psmatch2 has been the workhorse PSM command for about 20 years. It works. But it has limitations: its standard errors do not adjust for the fact that the propensity score is estimated, and balance and overlap diagnostics require separate commands.
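As a rough sketch of the legacy workflow, the block below runs 1:1 nearest-neighbor matching with psmatch2 and then checks balance with pstest. The dataset (Stata's cattaneo2 example data) and the covariate list are assumptions picked for illustration; they do not come from this guide.

```stata
* psmatch2 is community-contributed; install once from SSC
* ssc install psmatch2

webuse cattaneo2, clear      // example data; an assumption for illustration

* Nearest-neighbor results can depend on sort order when scores tie,
* so put the data in a reproducible random order first
set seed 12345
generate double u = runiform()
sort u

* ATT of maternal smoking on birthweight: logit propensity score,
* one nearest neighbor, restricted to common support
psmatch2 mbsmoke mmarried mage medu fbaby, outcome(bweight) ///
    neighbor(1) common logit

* Covariate balance before and after matching
pstest mmarried mage medu fbaby, both
```

The standard error that psmatch2 prints here treats the propensity score as known, which is the limitation described above.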
The Modern Way: teffects psmatch
Advantages of teffects over psmatch2:
- Correct standard errors: teffects accounts for the fact that propensity scores are estimated, not known. psmatch2 doesn't, so its reported standard errors are unreliable (they can be too large or too small).
- Built-in diagnostics: teoverlap and tebalance are integrated. With psmatch2, you run separate commands such as pstest and psgraph.
- Multiple estimators: teffects supports PSM, IPW, and augmented IPW in a unified syntax.
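A minimal teffects psmatch call might look like the following. As before, the cattaneo2 data and the covariates are illustrative assumptions, not a recommended specification.

```stata
webuse cattaneo2, clear      // example data; an assumption for illustration

* Outcome in the first parentheses, treatment model in the second.
* atet requests the effect on the treated; nneighbor(1) is 1:1 matching.
teffects psmatch (bweight) (mbsmoke mmarried mage medu fbaby, logit), ///
    atet nneighbor(1)
```

Dropping atet returns the average treatment effect (ATE) instead, which is the default.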
Coarsened Exact Matching (CEM)
CEM takes a different approach: instead of matching on a single propensity score, it coarsens each covariate into bins and matches exactly on the binned values. This guarantees balance within the bins.
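A sketch of a cem call, assuming the community-contributed cem package from SSC and, again, the cattaneo2 example data with an arbitrary choice of covariates and coarsening:

```stata
* cem is community-contributed; install once from SSC
* ssc install cem

webuse cattaneo2, clear      // example data; an assumption for illustration

* (#k) requests automatic equal-width coarsening controlled by k;
* (#0) leaves the variable uncoarsened, so matching on it is exact.
* showbreaks prints the cutpoints actually used.
cem mage (#5) medu (#4) mmarried (#0) fbaby (#0), ///
    treatment(mbsmoke) showbreaks

* cem stores matching weights in cem_weights; use them in the outcome model
regress bweight mbsmoke [iweight = cem_weights]
```

Unmatched units receive a weight of zero, so they drop out of the weighted regression.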
The numbers after # specify the number of bins. More bins = more precise matching but more unmatched units. #0 means exact matching on that variable (useful for categorical variables).
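CEM's main advantage: balance holds by construction. Matched units share the same binned covariate values, so the imbalance on each covariate is bounded by the bin width. Some imbalance can remain within bins, so the imbalance statistics that cem reports are still worth a look, especially when bins are wide.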
Inverse Probability Weighting (IPW)
IPW reweights observations by the inverse of their estimated probability of treatment. Treated units with a high probability of treatment get lower weight (they’re not adding much information); control units with a high probability of treatment get higher weight (they’re the best counterfactuals).
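To make the weighting concrete, the sketch below runs the built-in estimator and then rebuilds the ATE weights by hand: 1/p for treated units and 1/(1 - p) for controls, where p is the estimated propensity score. The data, covariates, and the variable names ps and w_ate are illustrative assumptions.

```stata
webuse cattaneo2, clear      // example data; an assumption for illustration

* Built-in IPW estimator (ATE by default)
teffects ipw (bweight) (mbsmoke mmarried mage medu fbaby, logit)

* The same weighting logic by hand, for intuition only
logit mbsmoke mmarried mage medu fbaby
predict double ps, pr
generate double w_ate = cond(mbsmoke == 1, 1/ps, 1/(1 - ps))

* Weighted difference in means; these standard errors ignore that ps
* was estimated, so prefer the teffects output for inference
regress bweight mbsmoke [pweight = w_ate]
```

Summarizing w_ate by treatment status is a quick way to spot extreme weights, which usually signal poor overlap.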
Augmented IPW (AIPW) is doubly robust: it's consistent if either the propensity score model or the outcome model is correctly specified. That makes it a strong default for observational studies.
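A minimal AIPW call, under the same illustrative assumptions about data and covariates, specifies both models side by side:

```stata
webuse cattaneo2, clear      // example data; an assumption for illustration

* First parentheses: outcome model (linear by default).
* Second parentheses: treatment model (logit here).
teffects aipw (bweight mmarried mage medu fbaby) ///
              (mbsmoke mmarried mage medu fbaby, logit)
```

teffects aipw reports the ATE and potential-outcome means, so compare it against ATE estimates from the other methods rather than ATET estimates.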
Diagnostics: What to Check
- Common support: Run teoverlap after teffects. If the propensity score distributions for treated and control groups don't overlap, matching is extrapolating, and the results are unreliable.
- Balance: Run tebalance summarize. As a rule of thumb, standardized differences should be below 0.1 in absolute value, and variance ratios should be between 0.8 and 1.25.
- Sensitivity to unobservables: Matching only works if selection is on observables. Use the Rosenbaum bounds test (rbounds, a community-contributed command) to assess how sensitive your results are to unobserved confounders.
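The three checks above could be run roughly as follows. This is a sketch under assumptions: the cattaneo2 data and covariates are illustrative, and because rbounds works on matched-pair outcome differences, the sensitivity step uses psmatch2 without replacement rather than teffects.

```stata
webuse cattaneo2, clear      // example data; an assumption for illustration
teffects psmatch (bweight) (mbsmoke mmarried mage medu fbaby, logit), atet

* Common support: overlaid densities of the estimated propensity scores
teoverlap

* Balance: standardized differences and variance ratios, raw vs. matched
tebalance summarize

* Sensitivity: rbounds is community-contributed
* ssc install rbounds
* 1:1 matching without replacement gives one matched control per treated unit
psmatch2 mbsmoke mmarried mage medu fbaby, outcome(bweight) ///
    noreplacement logit

* psmatch2 stores each treated unit's matched outcome in _bweight
generate double delta = bweight - _bweight if _treated == 1 & _support == 1

* Rosenbaum bounds for hidden-bias odds ratios from 1 to 2
rbounds delta, gamma(1 (0.25) 2)
```

The value of gamma at which the bounds stop rejecting the null indicates how strong an unobserved confounder would need to be to overturn the result.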
Which Matching Method to Use
- Small sample, few covariates: CEM
- Large sample, many continuous covariates: PSM via teffects psmatch
- Want robustness: AIPW via teffects aipw
- Legacy code / compatibility: psmatch2
When in doubt, run AIPW. It's doubly robust, it works with the built-in diagnostics, and its standard errors account for the estimated propensity score and outcome models. If the results differ meaningfully from PSM or CEM, investigate why; the difference usually reveals a specification issue.