2026-03-20 · 11 min read

Matching Estimators in Stata: PSM, CEM, and Modern Alternatives

A guide to matching methods in Stata — propensity score matching, coarsened exact matching, and why you should probably use teffects instead of psmatch2.

Sytra Team

Research Engineering Team, Sytra AI

Matching aims to make treated and control groups comparable by pairing units with similar observable characteristics. When randomization isn’t possible, matching approximates the balance you’d get from a randomized experiment, but only on the covariates you observe and match on.

But matching in Stata has a complicated ecosystem. There’s psmatch2 (the old standard), teffects (the official built-in), cem (coarsened exact matching), kmatch (kernel matching), and nnmatch (nearest neighbor). This guide tells you which to use, when, and what diagnostic checks to run.

Propensity Score Matching (PSM)

The Old Way: psmatch2

* PSM with psmatch2 (still widely used)

ssc install psmatch2, replace

psmatch2 treatment age income education i.race, outcome(y) caliper(0.01) neighbor(1)

* Check balance

pstest age income education i.race

psmatch2 has been the workhorse PSM command for 20 years. It works. But it has limitations: its standard errors do not account for the fact that the propensity score is estimated, its matching algorithms are limited, and its diagnostics require separate commands.

The Modern Way: teffects psmatch

* Official Stata PSM with correct standard errors

teffects psmatch (y) (treatment age income education i.race), atet nn(1)

* Check overlap (common support)

teoverlap

* Balance table

tebalance summarize

* Balance density plots

tebalance density age

Advantages of teffects over psmatch2:

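Correct standard errors: teffects psmatch reports Abadie-Imbens standard errors that account for the fact that propensity scores are estimated, not known. psmatch2 treats the estimated score as if it were known, so its standard errors can be misstated: they tend to be too large for the ATE and can be off in either direction for the ATT.
Built-in diagnostics: teoverlap and tebalance are integrated postestimation commands. With psmatch2, balance and overlap checks run through separate community-written commands such as pstest and psgraph.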
Multiple estimators: teffects supports PSM, IPW, and augmented IPW in a unified syntax.
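
The workflow above can be tried end to end on a dataset that ships with Stata. This is a minimal sketch, assuming the cattaneo2 example data (birthweight and maternal smoking) rather than the article's hypothetical variables. The generate() stub name match_ is arbitrary; it stores the matched-observation indices that the postestimation commands use.

```stata
* Sketch on Stata's cattaneo2 example data (an assumption, not the article's data)
webuse cattaneo2, clear

* ATT of maternal smoking on birthweight, one nearest neighbor on the propensity score
* generate() saves the indices of the matched observations for later diagnostics
teffects psmatch (bweight) (mbsmoke mmarried mage fbaby medu), atet nneighbor(1) generate(match_)

* Overlap of the estimated propensity scores across treatment groups
teoverlap

* Standardized differences and variance ratios, raw versus matched
tebalance summarize

* Density of one covariate before and after matching
tebalance density mage
```

Because everything runs off the same estimation result, the diagnostics always describe the model that was actually fit.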

Coarsened Exact Matching (CEM)

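CEM takes a different approach: instead of matching on a single propensity score, it coarsens each covariate into bins and matches exactly on the binned values. Matched treated and control units therefore fall in the same bin on every covariate, so any remaining imbalance is bounded by the bin widths you chose.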

* CEM matching

ssc install cem, replace

cem age (#5) income (#10) education (#3) race (#0), treatment(treatment)

* Estimate treatment effect on matched sample

regress y treatment [iweight = cem_weights], vce(robust)

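The numbers after # specify the number of bins. More bins mean closer matches but more unmatched units. #0 means exact matching on that variable (useful for categorical variables). cem stores its results in new variables, including cem_weights, which is zero for unmatched units, so the weighted regression above uses only the matched sample.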

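CEM’s main advantage: balance is set by construction, before you look at the outcome. If units are matched, they have the same binned covariate values, so you control the maximum imbalance through the bin widths. Covariates can still differ within a bin, so with wide bins it is worth checking the remaining imbalance on the matched sample.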
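
Bins can also be set at substantively meaningful cutpoints instead of a bin count. The sketch below reuses the article's hypothetical variable names, and the age cutpoints are illustrative assumptions, not recommendations. It relies on the imb command and the cem_matched variable that, to the best of our knowledge, come with the cem package; check help cem on your installation.

```stata
* Hypothetical variables, following the article's example

* Imbalance in the raw data, before any matching
imb age income education, treatment(treatment)

* Explicit cutpoints for age (illustrative values), bin counts elsewhere
cem age (25 35 50 65) income (#10) education (#3) race (#0), treatment(treatment)

* How many treated and control units were matched
tabulate cem_matched treatment
```

Comparing the imbalance that cem reports after matching with the pre-match imb output shows how much the coarsening bought you, and the tabulation shows how many units it cost.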

Inverse Probability Weighting (IPW)

* IPW estimation

teffects ipw (y) (treatment age income education i.race), atet

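* Augmented IPW (doubly robust); teffects aipw estimates the ATE and does not accept atet

teffects aipw (y age income education) (treatment age income education i.race), ate

* For a doubly robust ATET, use teffects ipwra with the atet option instead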

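IPW reweights observations by the inverse of the estimated probability of the treatment they actually received. For the ATE, a treated unit with propensity score p gets weight 1/p and a control unit gets weight 1/(1 - p). Treated units with a high probability of treatment get lower weight (they’re not adding much information); control units with a high probability of treatment get higher weight (they’re the best counterfactuals). For the ATET, as in the teffects ipw call above, treated units keep weight 1 and controls are weighted by p/(1 - p).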
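
To see what the weighting does, it can help to build the weights by hand. This is a sketch on Stata's cattaneo2 example data (an assumption, not the article's data), using a logit propensity model, which is the default treatment model in teffects ipw. The variable names ps, w_ate, and w_atet are made up for illustration.

```stata
webuse cattaneo2, clear

* Step 1: estimate the propensity score
logit mbsmoke mmarried mage fbaby medu
predict double ps, pr

* Step 2: build the weights
* ATE:  treated 1/p, controls 1/(1 - p)
* ATET: treated 1,   controls p/(1 - p)
generate double w_ate  = cond(mbsmoke == 1, 1/ps, 1/(1 - ps))
generate double w_atet = cond(mbsmoke == 1, 1, ps/(1 - ps))

* Step 3: weighted difference in mean birthweight
regress bweight mbsmoke [pweight = w_ate]
regress bweight mbsmoke [pweight = w_atet]

* Compare with the built-in estimator
teffects ipw (bweight) (mbsmoke mmarried mage fbaby medu), ate
teffects ipw (bweight) (mbsmoke mmarried mage fbaby medu), atet
```

The coefficients on mbsmoke should be close to the teffects ipw estimates. The standard errors will differ, because regress treats the weights as fixed while teffects accounts for the estimated propensity score.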

Augmented IPW (AIPW) is doubly robust: it’s consistent if either the propensity score model or the outcome model is correctly specified. That makes it a strong default for observational studies, although it still relies on selection on observables.

Diagnostics: What to Check

Common support: Run teoverlap after teffects. If the propensity score distributions for treated and control groups don’t overlap, matching is extrapolating — and the results are unreliable.
Balance: Run tebalance summarize. As rules of thumb, standardized differences should be below 0.1 in absolute value, and variance ratios should be between 0.8 and 1.25.
Sensitivity to unobservables: Matching only works if selection is on observables. Use the Rosenbaum bounds test (rbounds) to assess how sensitive your results are to unobserved confounders.
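
The two balance statistics are easy to compute by hand, which makes the thresholds less of a black box. This is a sketch for the raw (unmatched) sample on Stata's cattaneo2 example data, an assumption rather than the article's data. The standardized difference here divides the difference in means by the square root of the average of the two group variances.

```stata
webuse cattaneo2, clear

* Mean and variance of mother's age among smokers (treated)
quietly summarize mage if mbsmoke == 1
scalar mean_t = r(mean)
scalar var_t  = r(Var)

* Mean and variance among nonsmokers (control)
quietly summarize mage if mbsmoke == 0
scalar mean_c = r(mean)
scalar var_c  = r(Var)

* Standardized difference: gap in means scaled by the pooled standard deviation
display "Standardized difference: " (scalar(mean_t) - scalar(mean_c)) / sqrt((scalar(var_t) + scalar(var_c)) / 2)

* Variance ratio: treated variance over control variance
display "Variance ratio: " scalar(var_t) / scalar(var_c)
```

These numbers should correspond to the raw columns that tebalance summarize reports for mage; the matched or weighted columns apply the same formulas to the adjusted sample.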

Which Matching Method to Use

Small sample, few covariates: CEM
Large sample, many continuous covariates: PSM via teffects psmatch
Want robustness: AIPW via teffects aipw
Legacy code / compatibility: psmatch2

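When in doubt, run AIPW. It’s doubly robust, it comes with built-in diagnostics, and its standard errors account for the estimated propensity score and outcome models. Keep in mind that teffects aipw targets the ATE. If the results differ meaningfully from PSM or CEM, first check that you are comparing the same estimand, then investigate why: the difference usually reveals a specification issue.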
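
One way to run that comparison is to store each set of results and tabulate them side by side. This is a sketch on Stata's cattaneo2 example data (an assumption, not the article's data), with all three estimators targeting the ATE and sharing the same covariates. The stored names psm, ipw, and aipw are arbitrary.

```stata
webuse cattaneo2, clear

* Same covariates everywhere so the comparison isolates the estimator
local x mmarried mage fbaby medu

quietly teffects psmatch (bweight) (mbsmoke `x')
estimates store psm

quietly teffects ipw (bweight) (mbsmoke `x')
estimates store ipw

quietly teffects aipw (bweight `x') (mbsmoke `x')
estimates store aipw

* Side-by-side estimates with standard errors; compare the rows in the ATE equation
estimates table psm ipw aipw, se
```

Run it as a do-file so the local macro is defined when the teffects lines execute. The table also lists auxiliary equations for ipw and aipw; only the ATE rows are comparable across the three columns.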

Tags: Matching, Stata, Causal Inference, Political Science
