PSM in Stata: Matching Workflow with Balance Diagnostics
Implement propensity score matching in Stata with an identification-first setup, diagnostics, and the reporting standards used in applied research.
You are applying propensity score matching under deadline pressure, and one unnoticed data issue can invalidate the entire analysis.
You will connect identification logic to runnable Stata code and mandatory diagnostics. This guide keeps the path anchored to estimating policy effects in staggered, noisy observational data.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Start with a defined research task before running propensity score matching.
- Run psmatch2 only after preflight checks on keys, types, and missingness.
- Audit command output immediately and document expected vs observed counts.
- Add a reusable QA block focused on assumption diagnostics, design checks, and transparent reporting.
Execution Blueprint: propensity score matching for estimating policy effects in staggered, noisy observational data
Anchor the use case and run preflight checks
This workflow is built for estimating policy effects in staggered, noisy observational data. Causal commands are easy to run and easy to misuse when assumptions are left implicit.
Run a deterministic setup first so every command in later sections executes against known data structure and known variable types.
If you are extending this pipeline, also review "regress in Stata: OLS Basics and Correct Interpretation" and "rename in Stata: Bulk Rename Patterns with Wildcards".
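    clear all
    * Lower the version number (for example, version 15) on an older Stata release
    version 18
    set seed 260210

    * Simulated panel: 200 firms with 10 rows each, years 2012 to 2023
    set obs 2000
    gen firm_id = ceil(_n/10)
    gen year = 2012 + mod(_n,12)
    gen education = 9 + floor(runiform()*10)
    gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)

    * Treatment group, post period, and an outcome with a built-in effect of 0.8
    gen treated = firm_id <= 90
    gen post = year >= 2018
    gen outcome = wage + 0.8*(treated*post)

    * Preflight checks
    assert !missing(firm_id, year)
    assert !missing(wage, education)
    count

    . count
      2,000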
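The preflight block above covers missingness. A minimal sketch of the remaining key and type checks follows, assuming the same simulated firm-year structure. The specific checks chosen here (isid, confirm, and the within-firm treatment assertion) are illustrative assumptions, not part of the original workflow.

```stata
* Minimal sketch: structural preflight checks on a simulated firm-year panel
clear all
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n, 12)
gen education = 9 + floor(runiform()*10)
gen treated = firm_id <= 90

* Keys: firm_id and year together must uniquely identify rows
* (isid stops with an error if they do not)
isid firm_id year

* Types: variables entering the propensity model must be numeric
confirm numeric variable education treated

* Treatment must be a clean 0/1 indicator with both groups present
assert inlist(treated, 0, 1)
quietly count if treated == 1
assert r(N) > 0 & r(N) < _N

* Treatment is assigned by firm here, so it must not vary within a firm
bysort firm_id (year): assert treated == treated[1]
```

Each check stops the do-file on failure, so a structural problem surfaces before any matching command runs.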
Execute psmatch2 with full diagnostics
Run psmatch2 as its own block and inspect the output before proceeding. This preserves a clean debug boundary and supports peer review.
The command example below is complete and runnable; it is designed to mirror real panel workflows rather than toy x/y placeholders.
    clear all
    * Lower the version number (for example, version 15) on an older Stata release
    version 18
    set seed 260210
    set obs 2000
    gen firm_id = ceil(_n/10)
    gen year = 2012 + mod(_n,12)
    gen education = 9 + floor(runiform()*10)
    gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
    gen treated = firm_id <= 90
    gen post = year >= 2018
    gen outcome = wage + 0.8*(treated*post)

    * Preflight checks
    assert !missing(firm_id, year)
    assert !missing(wage, education)
    count

    * ---- Section-specific continuation ----
    * Core execution block for propensity score matching
    * Install the community package psmatch2 from SSC if it is not found
    capture which psmatch2
    if _rc ssc install psmatch2
    * One-to-one nearest-neighbor matching on the estimated propensity score
    psmatch2 treated education wage, outcome(outcome) neighbor(1)

    * Immediate output audit
    count

    . count
      2,000
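The execution block above stops at a row count. A minimal sketch of the balance and common-support checks that usually follow is shown below, assuming psmatch2 and its companion command pstest are installed from SSC. The common option and the choice of checks are illustrative assumptions rather than the guide's prescribed settings.

```stata
* Minimal sketch: balance and common-support checks after nearest-neighbor matching
clear all
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n, 12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

capture which psmatch2
if _rc ssc install psmatch2

* common restricts matching to the region of common support (assumed setting)
psmatch2 treated education wage, outcome(outcome) neighbor(1) common

* ATT and its standard error as returned by psmatch2
display "ATT = " r(att) "  SE = " r(seatt)

* How many observations fall off common support?
tab _treated _support

* Standardized bias for each covariate, before and after matching
pstest education wage, both
```

Reporting the pstest table alongside the ATT lets a reviewer judge whether matching actually improved covariate balance.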
Harden for production: assertions, logs, and reusable checks
After command execution, enforce assumption diagnostics, design checks, and transparent reporting so downstream inference and exports remain stable across reruns.
This final block makes the workflow team-ready: logs are captured, failures are explicit, and diagnostics are repeatable.
    clear all
    * Lower the version number (for example, version 15) on an older Stata release
    version 18
    set seed 260210
    set obs 2000
    gen firm_id = ceil(_n/10)
    gen year = 2012 + mod(_n,12)
    gen education = 9 + floor(runiform()*10)
    gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
    gen treated = firm_id <= 90
    gen post = year >= 2018
    gen outcome = wage + 0.8*(treated*post)

    * Preflight checks
    assert !missing(firm_id, year)
    assert !missing(wage, education)
    count

    * ---- Section-specific continuation ----
    * Production hardening block
    * Close any open log, then capture this run in a text log
    capture log close
    log using propensity-score-matching-stata-balance-qa.log, text replace

    capture which psmatch2
    if _rc ssc install psmatch2
    psmatch2 treated education wage, outcome(outcome) neighbor(1)

    * Post-command diagnostics and assertions
    tab treated post
    bysort treated: summ outcome
    assert !missing(outcome)
    log close

    . tab treated post

               |         post
       treated |         0          1 |     Total
    -----------+----------------------+----------
             0 |       551        549 |     1,100
             1 |       450        450 |       900
    -----------+----------------------+----------
         Total |     1,001        999 |     2,000

Common Errors and Fixes
"factor variables may not contain noninteger values"
A factor variable was not integer encoded.
Encode categories or switch to continuous notation for truly numeric variables.
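    factor variables may not contain noninteger values
    r(452);

Triggers the error when education is not integer coded:

    regress wage i.education

Fix:

    regress wage c.education

Check the variable first, then fit the model with continuous notation:

    summ education
    regress wage c.education i.year

Output (truncated, from a 1,200-observation sample):

    Linear regression    Number of obs = 1,200    F(10, 1189) = 42.61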
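Whether a variable is safe for i. notation can be tested directly rather than discovered through r(452). The following is a minimal sketch on a small simulated sample. The sample size and seed are arbitrary assumptions chosen for illustration.

```stata
* Minimal sketch: decide between i. and c. notation from the data itself
clear
set seed 1
set obs 100
* Integer valued, so it is safe for i. notation
gen education = 9 + floor(runiform()*10)
* Continuous, so it is not valid with i. notation
gen wage = 18 + 0.7*education + rnormal(0,2)

* Integer and nonnegative check: passes for education
assert education == floor(education) & education >= 0

* The same integer check fails for wage, which is why i.wage would be rejected
capture assert wage == floor(wage)
display "wage integer check return code: " _rc

* education passed the check, so factor notation is valid here
regress wage i.education
```

A nonzero return code from the captured assert signals that the variable needs c. notation or an explicit recoding step before it can be used as a factor.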
Command Reference
Primary command reference for propensity score matching workflows in Stata.
- Preflight checks: Validate keys, types, and missingness before execution
- Execution block: Run the command in an isolated, reviewable section
- Diagnostics: Inspect output immediately and compare against expectations
- QA footer: Keep assertions and logs for reproducible reruns
How Sytra Handles This
Sytra can execute propensity score matching as a staged workflow: preflight validation, runnable Stata code generation, and QA assertions before final output.
A direct natural-language prompt for this exact workflow:
Execute propensity score matching for a firm_id-year wage dataset. Use variables wage, education, firm_id, and year. Include preflight checks, runnable Stata code, output diagnostics, and post-command assertions with a log file.
FAQ
What is the safest order for propensity score matching in a production do-file?
Use a three-step order: preflight checks, psmatch2 execution, and post-command assertions. This sequence catches problems before models or exports depend on the result.
How do I verify that propensity score matching did not damage my sample?
Track count before and after each transformation, then validate key uniqueness and missingness changes on core variables. Keep those checks in the script, not in ad hoc console runs.
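The before-and-after tracking described in this answer can be scripted as follows. This is a minimal sketch assuming the simulated firm-year data used throughout the guide. The local macro names n_before and miss_before are hypothetical and chosen only for illustration.

```stata
* Minimal sketch: confirm a matching step leaves rows, keys, and missingness unchanged
clear all
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n, 12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

capture which psmatch2
if _rc ssc install psmatch2

* Snapshot before the transformation
local n_before = _N
quietly count if missing(outcome)
local miss_before = r(N)

psmatch2 treated education wage, outcome(outcome) neighbor(1)

* psmatch2 adds variables such as _pscore and _weight
* but should not add or drop rows
assert _N == `n_before'

* Keys must still identify rows uniquely
isid firm_id year

* Missingness on the core outcome must be unchanged
quietly count if missing(outcome)
assert r(N) == `miss_before'
```

Keeping the snapshot and the assertions in the same do-file means every rerun repeats the audit automatically.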
Which Stata versions are compatible with this workflow?
All examples are tested in Stata 18 SE and are compatible with Stata 15+, with installation checks included when community packages are used.
Related Guides
- Panel Diagnostics in Stata: xtdescribe, xtsum, and Balance Checks
- 2SLS in Stata: ivregress 2sls with Required Diagnostics
- ivreg2 in Stata: Robust IV Workflow and Reporting Standards
- Weak Instruments in Stata: First-Stage F, KP, and Reporting
- Few Clusters in Stata: Better Inference with Clustered SE Limits
We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.