
Causal Inference

2026-06-13 · 21 min read

PSM in Stata: Matching Workflow with Balance Diagnostics

Implement propensity score matching in Stata with an identification-first setup, balance diagnostics, and the reporting standards used in applied research.

Sytra Team

Research Engineering Team, Sytra AI

You are running propensity score matching in Stata under deadline pressure, and one unnoticed data issue can invalidate the entire analysis.

This guide connects identification logic to runnable Stata code and mandatory diagnostics, anchored throughout to one use case: estimating policy effects in staggered, noisy observational data.

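All examples were tested in Stata 18 SE. To run them in Stata 15 through 17, change the version 18 line to your release; Stata stops with an error when a do-file requests a newer version than the one installed.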

Quick Answer

Define the research question and the treatment before running propensity score matching.
Run psmatch2 only after preflight checks on keys, types, and missingness.
Audit the output immediately and document expected versus observed counts.
Add a reusable QA block for assumption diagnostics, design checks, and transparent reporting.

Execution Blueprint: propensity score matching in Stata for estimating policy effects in staggered, noisy observational data

Anchor the use case and run preflight checks

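This workflow is built for estimating policy effects in staggered, noisy observational data. Causal commands are easy to run and easy to misuse when assumptions are left implicit. For matching, state two assumptions up front: selection on observables (given the matching covariates, treatment is as good as randomly assigned) and common support (treated and control units overlap in their propensity scores).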
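One way to keep the common-support assumption from staying implicit is to estimate a propensity score yourself and compare its range across treated and control units before any matching. The sketch below rebuilds the simulated panel used throughout this guide. It uses a probit because we understand that to be psmatch2's default score model; the scalar names (sc_lo, sc_hi, and so on) and the min/max overlap rule are illustrative assumptions, not the only way to define common support.

```stata
* Same simulated panel as the setup block (outcome not needed here)
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90

* Propensity score from the covariates used for matching later
probit treated education wage
predict double pscore_check, pr

* Score distribution by group
bysort treated: summarize pscore_check

* Common support = range of scores shared by both groups (min/max rule)
summarize pscore_check if treated == 1, meanonly
scalar sc_tmin = r(min)
scalar sc_tmax = r(max)
summarize pscore_check if treated == 0, meanonly
scalar sc_lo = max(sc_tmin, r(min))
scalar sc_hi = min(sc_tmax, r(max))
display "Common support: [" %6.4f sc_lo ", " %6.4f sc_hi "]"

* Treated units with no comparable controls
count if treated == 1 & !inrange(pscore_check, sc_lo, sc_hi)
```

A nonzero count means some treated units have no comparable controls. Dropping them changes the estimand to the effect on treated units that are on support, which should be stated in the write-up.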

Run a deterministic setup first so every command in later sections executes against known data structure and known variable types.

If you are extending this pipeline, also review "regress in Stata: OLS Basics and Correct Interpretation" and "rename in Stata: Bulk Rename Patterns with Wildcards."

propensity-score-matching-stata-balance-setup.do

```stata
clear all
version 18
set seed 260210
set obs 2000

* Simulated panel: 200 firms with 10 rows each, years 2012-2023
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)

* Firms 1-90 are treated; the policy applies from 2018 on, with a true effect of 0.8
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

* Preflight checks
assert !missing(firm_id, year)
assert !missing(wage, education)
count
```

. count
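The setup asserts cover missingness, but the Quick Answer also names keys and types. A minimal sketch of those two checks, assuming firm_id and year together identify rows (true in this simulated panel, where each firm's ten rows fall in ten different years). The local n_expected is an illustrative name for the sample size you expect to carry through matching.

```stata
* Same simulated panel as the setup block
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

* Keys: isid stops with an error if firm_id-year is not unique
isid firm_id year

* Types: confirm stops with an error if a listed variable is absent or a string
confirm numeric variable firm_id year education wage treated post outcome

* Indicators must be strictly 0/1
assert inlist(treated, 0, 1)
assert inlist(post, 0, 1)

* Expected sample size, to be checked again after matching
local n_expected = 2000
assert _N == `n_expected'
```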

💡Use realistic variable names

Keep names like wage, education, firm_id, and year so collaborators can audit logic quickly.

Execute psmatch2 with full diagnostics

Run psmatch2 in its own block and inspect the output before proceeding. This keeps a clean debugging boundary and makes peer review easier.

The example below is complete and runnable (the first run needs internet access to install psmatch2 from SSC). It is designed to mirror real panel workflows rather than toy x/y placeholders.

propensity-score-matching-stata-balance-execution.do

```stata
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

* Preflight checks
assert !missing(firm_id, year)
assert !missing(wage, education)
count

* ---- Section-specific continuation ----
* Core execution block for propensity score matching
capture which psmatch2
if _rc ssc install psmatch2
psmatch2 treated education wage, outcome(outcome) neighbor(1)

* Immediate output audit
count
```

. count
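A bare count confirms only that rows still exist. A fuller audit compares expected and observed counts using the variables psmatch2 adds to the dataset. This sketch assumes, as we understand psmatch2's conventions, that it leaves the ATT in r(att), marks treated units on common support with _support == 1, and leaves _weight missing for controls never used as a match; run return list after psmatch2 on your installed version to confirm. The scalar name att_hat is illustrative.

```stata
* Same simulated panel as the setup block
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

capture which psmatch2
if _rc ssc install psmatch2

local n_before = _N
psmatch2 treated education wage, outcome(outcome) neighbor(1)
* Save the estimate before another r-class command (e.g., count) overwrites r()
scalar att_hat = r(att)

* psmatch2 adds variables (_pscore, _support, _weight, ...) but keeps all rows
assert _N == `n_before'

* Expected vs observed: treated units in total and on common support
count if treated == 1
count if treated == 1 & _support == 1

* Distinct controls used as matches (unmatched controls have missing _weight)
count if treated == 0 & !missing(_weight)

display "ATT (psmatch2) = " %6.3f att_hat
```

If the two treated counts differ, some treated units did not enter the estimate, and the reported ATT describes only those on support.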

⚠️Treat diagnostics as part of the estimator

In causal work, diagnostics are not optional extras. Keep them in the same script as the main estimate.
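Balance is the diagnostic that tells you whether matching did its job. A minimal sketch computes the standardized mean difference (SMD, as a percentage of the pooled standard deviation) for each matching covariate, before and after matching. It assumes psmatch2's _support and _weight variables (treated units carry weight 1, matched controls carry the number of times they were used); the smd_* scalar names are illustrative, and using the unmatched pooled SD as the denominator for both columns is one common convention, not the only one.

```stata
* Same simulated panel as the setup block
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

capture which psmatch2
if _rc ssc install psmatch2
quietly psmatch2 treated education wage, outcome(outcome) neighbor(1)

* SMD in % of the pooled SD from the unmatched sample, so both columns share one scale
foreach v of varlist education wage {
    quietly summarize `v' if treated == 1
    scalar smd_mt = r(mean)
    scalar smd_vt = r(Var)
    quietly summarize `v' if treated == 0
    scalar smd_mc = r(mean)
    scalar smd_sd = sqrt((smd_vt + r(Var)) / 2)

    * Matched sample: treated on support vs controls weighted by _weight
    quietly summarize `v' if treated == 1 & _support == 1
    scalar smd_mt_m = r(mean)
    quietly summarize `v' if treated == 0 & !missing(_weight) [aweight = _weight]
    scalar smd_mc_m = r(mean)

    display "`v'" _col(14) "SMD before: " %7.2f 100*(smd_mt - smd_mc)/smd_sd ///
        _col(38) "SMD after: " %7.2f 100*(smd_mt_m - smd_mc_m)/smd_sd
}
```

A commonly used rule of thumb treats an absolute SMD below 10 percent as acceptable balance. The psmatch2 package on SSC also ships pstest, which reports a comparable bias table; see help pstest for its syntax.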

Harden for production: assertions, logs, and reusable checks

After command execution, enforce assumption diagnostics, design checks, and transparent reporting so downstream inference and exports remain stable across reruns.

This final block makes the workflow team-ready: logs are captured, failures are explicit, and diagnostics are repeatable.
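A second, independent estimate makes a production script harder to fool. This sketch swaps in Stata's official teffects psmatch, which is a different implementation from psmatch2 with its own defaults (a logit treatment model) and its own standard errors, plus its built-in tebalance summarize diagnostic. Treat it as a cross-check under those assumptions: the ATET should be close to psmatch2's estimate but need not match it exactly. The names nnid, b_te, and att_teffects are illustrative.

```stata
* Same simulated panel as the setup block
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

* Built-in estimator: ATET with one nearest neighbor on the propensity score;
* generate() stores each observation's matched-neighbor index in nnid1
teffects psmatch (outcome) (treated education wage), atet nneighbor(1) generate(nnid)

* Save the ATET from e(b) before running anything else
matrix b_te = e(b)
scalar att_teffects = b_te[1,1]

* Standardized differences and variance ratios, raw vs matched
tebalance summarize

display "ATET (teffects psmatch) = " %6.3f att_teffects
```

A large gap between this estimate and psmatch2's is a signal to revisit the score model and matching options before reporting either one.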

propensity-score-matching-stata-balance-qa.do

```stata
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

* Preflight checks
assert !missing(firm_id, year)
assert !missing(wage, education)
count

* ---- Section-specific continuation ----
* Production hardening block
capture log close
log using propensity-score-matching-stata-balance-qa.log, text replace

capture which psmatch2
if _rc ssc install psmatch2
psmatch2 treated education wage, outcome(outcome) neighbor(1)

tab treated post
bysort treated: summ outcome
assert !missing(outcome)
log close
```

. tab treated post

           |          post
   treated |         0          1 |     Total
-----------+----------------------+----------
         0 |       551        549 |     1100
         1 |       450        450 |      900
-----------+----------------------+----------
     Total |      1001        999 |     2000

💡Keep a reusable QA footer

A standard QA footer with assert and count checks prevents repeat debugging in future projects.
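A minimal sketch of such a footer as a small Stata program. The name qa_footer and its n() option are hypothetical, not part of Stata or psmatch2; adapt the variable list to your own key and outcome variables. It exits with return code 9, the same code a failed assert produces, so a do-file stops at the first failure.

```stata
* Same simulated panel as the setup block
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

* Hypothetical reusable footer: qa_footer varlist, n(#)
*   varlist: variables that must have no missing values
*   n():     expected number of observations
capture program drop qa_footer
program define qa_footer
    version 18
    syntax varlist, n(integer)
    if _N != `n' {
        display as error "qa_footer: expected `n' observations, found " _N
        exit 9
    }
    foreach v of local varlist {
        quietly count if missing(`v')
        if r(N) > 0 {
            display as error "qa_footer: `v' has " r(N) " missing values"
            exit 9
        }
    }
    display as text "qa_footer passed: " _N " observations, no missing values"
end

* Usage at the end of a do-file
qa_footer firm_id year education wage treated post outcome, n(2000)
```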

Common Errors and Fixes

"factor variables may not contain noninteger values"

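The variable used with the i. prefix contains noninteger values (for example, 12.5 years of schooling). Factor-variable notation requires nonnegative integer codes.

If the variable is categorical, recode it to integer category codes (egen group() does this); if it is genuinely continuous, use c. notation instead.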

. regress wage i.education

factor variables may not contain noninteger values
r(452);

This causes the error

wrong-way.do

```stata
regress wage i.education
```

This is the fix

right-way.do

```stata
regress wage c.education
```

error-fix.do

```stata
summ education
regress wage c.education i.year
```

. regress wage c.education i.year

Linear regression

Number of obs   =      1,200
F(10, 1189)     =      42.61
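The error is easy to reproduce on purpose, which makes both fixes concrete. This sketch uses a hypothetical noninteger schooling variable (educ_years) and hypothetical category codes (educ_cat); neither appears in the guide's simulated panel.

```stata
clear all
version 18
set seed 260210
set obs 2000
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + rnormal(0,2)

* Hypothetical noninteger schooling measure (some half-years recorded as .5)
gen educ_years = education + 0.5*(runiform() < 0.3)

* i. requires nonnegative integer codes, so this fails with r(452)
capture regress wage i.educ_years
assert _rc == 452

* Fix 1: the variable is genuinely continuous, so use c. notation
regress wage c.educ_years

* Fix 2: the variable is categorical, so map each distinct value to 1, 2, 3, ...
egen educ_cat = group(educ_years)
regress wage i.educ_cat
```

egen group() numbers the distinct values in sorted order, so i.educ_cat adds one indicator per observed schooling level, while c.educ_years estimates a single slope.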

Command Reference

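psmatch2

Primary command for the propensity score matching workflows in this guide. psmatch2 is community-contributed and installed from SSC (ssc install psmatch2); it is not part of official Stata.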

psmatch2 treat xvars, outcome(y)

Preflight checks: validate keys, types, and missingness before execution

Execution block: run the command in an isolated, reviewable section

Diagnostics: inspect output immediately and compare against expectations

QA footer: keep assertions and logs for reproducible reruns
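The reference card shows the minimal form. The score model, the common-support rule, and the caliper are design choices worth writing down explicitly. A sketch of a stricter call, using options we understand psmatch2 to provide (logit, common, caliper()); confirm them with help psmatch2 for your installed version, and treat the 0.05 caliper as a placeholder rather than a recommendation.

```stata
* Same simulated panel as the setup block
clear all
version 18
set seed 260210
set obs 2000
gen firm_id = ceil(_n/10)
gen year = 2012 + mod(_n,12)
gen education = 9 + floor(runiform()*10)
gen wage = 18 + 0.7*education + 0.2*(year-2012) + rnormal(0,2)
gen treated = firm_id <= 90
gen post = year >= 2018
gen outcome = wage + 0.8*(treated*post)

capture which psmatch2
if _rc ssc install psmatch2

* Reference-card form: one nearest neighbor, default score model
psmatch2 treated education wage, outcome(outcome) neighbor(1)

* Stricter variant: logit score, common support imposed, and a 0.05
* caliper on the propensity score distance (psmatch2 rebuilds its _ variables)
psmatch2 treated education wage, outcome(outcome) neighbor(1) logit common caliper(0.05)

* Treated units psmatch2 flagged as off support under these restrictions
count if treated == 1 & _support == 0
```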

How Sytra Handles This

Sytra can run propensity score matching in Stata as a staged workflow: preflight validation, runnable Stata code generation, and QA assertions before final output.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt

```text
Run propensity score matching in Stata for a firm_id-year wage dataset. Use variables wage, education, firm_id, and year. Include preflight checks, runnable Stata code, output diagnostics, and post-command assertions with a log file.
```

Sytra catches these errors before you run.


FAQ

What is the safest order for propensity score matching in a Stata production do-file?

Use a three-step order: preflight checks, psmatch2 execution, and post-command assertions. This sequence catches problems before models or exports depend on the result.

How do I verify that propensity score matching did not damage my sample?

Track count before and after each transformation, then validate key uniqueness and missingness changes on core variables. Keep those checks in the script, not in ad hoc console runs.

Which Stata versions are compatible with this workflow?

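All examples were tested in Stata 18 SE. They should also run in Stata 15 and later after the version 18 line is changed to your release, because Stata refuses to run a do-file that requests a newer version than the one installed. Installation checks are included wherever community packages such as psmatch2 are used.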

Written by Sytra Team

Research Engineering Team, Sytra AI

We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.

#Stata #propensity #Causal Inference


Related Guides

Panel Diagnostics in Stata: xtdescribe, xtsum, and Balance Checks

2SLS in Stata: ivregress 2sls with Required Diagnostics

ivreg2 in Stata: Robust IV Workflow and Reporting Standards

Weak Instruments in Stata: First-Stage F, KP, and Reporting

Regression Discontinuity in Stata: Bandwidth, Bins, and Reporting