Public Health

2026-02-28 · 11 min read

Survival Analysis in Stata: A Guide for Epidemiologists

A practical guide to survival analysis in Stata — from stset to Cox PH to competing risks. Written for public health researchers and epidemiologists.

Sytra Team

Research Engineering Team, Sytra AI

Survival analysis is the backbone of epidemiological research. Whether you’re estimating time to disease onset, time to hospital readmission, or time to death after a clinical intervention, the analytical framework is the same: you have time-to-event data with censoring, and you need an estimator that handles both.

Stata’s survival analysis suite is among the most mature in any statistical software. But it requires a specific workflow — starting with stset — that is unlike anything in the rest of Stata. If you skip or misconfigure this step, every downstream analysis is wrong. And this is exactly where ChatGPT falls apart.

Step 1: stset — Declaring Survival Data

Before any survival analysis, you must tell Stata your data is survival data. The stset command is not optional — it defines the time variable, the failure event, and any entry time or censoring structure.

* Basic stset: time variable + failure indicator

stset followup_time, failure(died)

* With late entry (left truncation)

stset followup_time, failure(died) enter(time entry_time)

* With ID variable for multiple records per subject

stset followup_time, failure(died) id(patient_id)

Key decisions at this stage:

What is failure? The failure() option defines what counts as an event. If died = 1 means the event occurred and died = 0 means censored, specify failure(died). If the failure variable has multiple values (e.g., 1 = disease, 2 = death), specify which value counts as the event: failure(event == 1).
Scale matters. Is time in days, months, or years? The scale sets the units of the survival times, incidence rates, and baseline hazards that stsum, stci, and parametric models report. Hazard ratios from stcox do not change when time is rescaled, so a hazard ratio near 1.0001 points to a covariate measured in very small units (for example, age in days) rather than to the time scale. To analyze time in years when it is recorded in days, add scale(365.25) to stset.
Check your stset. After running stset, always run stsum to see the summary statistics and verify that the number of subjects, failures, and time at risk look correct.
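The checks above can be sketched on one of Stata's example datasets. This is a minimal illustration, not a template: it assumes internet access for webuse and uses the drugtr data (variables studytime, in months, and died), which are not part of the original examples.

```stata
* Sketch: declare and verify survival data on an example dataset
webuse drugtr, clear

* Declare the survival structure explicitly
stset studytime, failure(died)

* Verify subjects, failures, and total time at risk
stsum
stdescribe

* Inspect the variables stset created:
* _t0 = entry time, _t = exit time, _d = failure flag, _st = record used
list studytime died _t0 _t _d _st in 1/5

* Optional: analyze time in years instead of months
stset studytime, failure(died) scale(12)
stsum
```

If the counts from stsum do not match what you expect from the raw data, fix the stset call before fitting any model.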

Step 2: Descriptive Survival — Kaplan-Meier

* Kaplan-Meier survival curves

sts graph, by(treatment) ci

* Log-rank test for equality of survival functions

sts test treatment

* Median survival time

stci, by(treatment)

The Kaplan-Meier curve is the standard visualization for survival data. Always include confidence intervals (ci). Always run the log-rank test. And always report median survival time — it’s more interpretable than the hazard ratio for non-technical audiences.
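For reference, the quantity that sts graph plots is the Kaplan-Meier product-limit estimator. The notation below is an assumption introduced here for illustration: t_j are the distinct observed event times, d_j is the number of events at t_j, and n_j is the number of subjects still at risk just before t_j.

```latex
\hat{S}(t) = \prod_{j:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right),
\qquad
\hat{t}_{\text{median}} = \min \{\, t : \hat{S}(t) \le 0.5 \,\}
```

Censored subjects leave the risk set n_j without contributing an event, which is how the estimator handles censoring. If the curve never drops to 0.5 during follow-up, the median cannot be estimated and another percentile has to be reported.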

Step 3: Cox Proportional Hazards

* Cox PH model

stcox treatment age i.sex i.comorbidity, vce(robust)

* Report hazard ratios (default — but be explicit)

stcox treatment age i.sex i.comorbidity, vce(robust) hr

The Cox model estimates hazard ratios: HR = 1.5 means the treatment group has a 50% higher instantaneous rate of the event at any given time, conditional on covariates. An HR < 1 means the treatment is protective.
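The interpretation follows from the form of the model. As a sketch in notation assumed here (x_1 is the treatment indicator, h_0(t) is the unspecified baseline hazard):

```latex
h(t \mid x) = h_0(t)\, \exp(\beta_1 x_1 + \dots + \beta_k x_k),
\qquad
\mathrm{HR}_{\text{treatment}}
  = \frac{h(t \mid x_1 = 1, x_2, \dots, x_k)}{h(t \mid x_1 = 0, x_2, \dots, x_k)}
  = \exp(\beta_1)
```

The baseline hazard cancels in the ratio, so the hazard ratio does not depend on t. An HR of 1.5 corresponds to a coefficient of ln(1.5), roughly 0.405, which is what stcox displays when you ask for coefficients with nohr.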

Critical: the proportional hazards assumption.

The Cox model assumes that hazard ratios are constant over time. If the effect of treatment changes as time passes (e.g., a drug works well initially but wears off), the PH assumption is violated and the hazard ratio is misleading.

* Test the proportional hazards assumption

estat phtest, detail

* Visual check: Schoenfeld residuals

estat phtest, plot(treatment)

A significant p-value on estat phtest is evidence that the PH assumption is violated for that variable. Solutions: (1) stratify on the offending variable (stcox ..., strata(variable)), (2) include an interaction with time using the tvc() and texp() options of stcox, or (3) switch to a model that does not assume proportional hazards, such as an accelerated failure time model fitted with streg.
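The three remedies might look like the following sketch. It assumes the drugtr example dataset (variables studytime, died, drug, age) loaded with webuse, and it is meant to show the syntax rather than to claim that PH fails in that data.

```stata
* Sketch: responses to a PH violation, shown on an example dataset
webuse drugtr, clear
stset studytime, failure(died)

* Baseline Cox model and PH test
stcox drug age
estat phtest, detail

* Option 1: stratify on the offending variable
* (each stratum gets its own baseline hazard; no HR is reported for drug)
stcox age, strata(drug)

* Option 2: let the effect of drug vary with log analysis time
stcox drug age, tvc(drug) texp(ln(_t))

* Option 3: accelerated failure time model (lognormal)
streg drug age, distribution(lognormal)
```

Stratification is the simplest fix when the offending variable is a nuisance covariate. When it is the exposure of interest, the time interaction keeps an estimate of its effect in the model.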

ChatGPT never runs this test. It generates stcox and stops. But a Cox model without a PH test is like a regression without checking residuals — you’re publishing results from a model whose key assumption may be violated.

Step 4: Competing Risks

In many studies, there are multiple ways to “fail.” A cancer patient might die from cancer, die from cardiovascular disease, or die from other causes. If you treat non-cancer deaths as censoring and read cumulative risk off the Kaplan-Meier curve, you overestimate the cumulative incidence of cancer death, because you’re assuming that patients who die from other causes could still have gone on to die from cancer.
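The size and direction of that bias can be seen from the definitions. In the notation assumed here, h_1(u) is the cause-specific hazard of the event of interest, H_1(t) is its integral, and S(u) is overall survival from all causes combined.

```latex
F_1(t) = \Pr(T \le t,\ \text{cause} = 1) = \int_0^t S(u)\, h_1(u)\, du
\;\le\;
\int_0^t e^{-H_1(u)}\, h_1(u)\, du = 1 - e^{-H_1(t)}
```

The right-hand side is what one minus the Kaplan-Meier estimate targets when competing events are censored. The inequality holds because S(u) can never exceed exp(-H_1(u)), so the naive curve overstates the cumulative incidence whenever the competing event actually occurs.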

* Competing risks: Fine-Gray subdistribution hazard model

stset time, failure(cause == 1) id(patient_id)

* Fine-Gray model

stcrreg treatment age i.sex, compete(cause == 2)

* Cumulative incidence function

stcurve, cif at1(treatment=0) at2(treatment=1)

The Fine-Gray model estimates subdistribution hazard ratios (SHR), which account for the competing risk. An SHR > 1 means the treatment group has a higher cumulative incidence of the event of interest, accounting for the fact that some patients experience the competing event instead.
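A runnable version of this workflow can be sketched with the hypoxia dataset used in Stata's stcrreg documentation. The variable names (dftime, failtype, ifp, tumsize, pelnode) come from that dataset, not from the examples above, and the coding of failtype noted in the comments is my understanding of it, so confirm it with describe and tabulate first.

```stata
* Sketch: Fine-Gray competing-risks regression on an example dataset
webuse hypoxia, clear

* Check how the failure types are coded before declaring anything
describe failtype
tabulate failtype

* Assumed coding: 1 = event of interest, 2 = competing event, 0 = censored
stset dftime, failure(failtype == 1)

* Subdistribution hazard model with the competing event declared
stcrreg ifp tumsize pelnode, compete(failtype == 2)

* Cumulative incidence curves for two values of a binary covariate
stcurve, cif at1(pelnode=0) at2(pelnode=1)
```

Both the failure() and compete() conditions refer to the same variable, so every record is classified as the event of interest, the competing event, or censored.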

Step 5: Parametric Models

When you have a theoretical reason to believe the hazard follows a specific functional form, parametric models can be more efficient than Cox:

* Weibull model (common in engineering and epidemiology)

streg treatment age i.sex, distribution(weibull)

estimates store weibull

* Exponential model (constant hazard)

streg treatment age i.sex, distribution(exponential)

estimates store exponential

* Compare the stored models with AIC/BIC

estimates stats weibull exponential
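The two distributions are nested, which is why comparing them is informative. As a sketch in notation assumed here (p is the Weibull shape parameter), the Weibull hazard in the proportional hazards metric is:

```latex
h(t \mid x) = p\, t^{\,p-1} \exp(x'\beta),
\qquad
\mathrm{AIC} = -2 \ln L + 2k,
\qquad
\mathrm{BIC} = -2 \ln L + k \ln N
```

Setting p = 1 gives the exponential model with constant hazard exp(x'β), p > 1 gives a hazard that rises over time, and p < 1 gives one that falls. In the information criteria, L is the maximized likelihood, k the number of estimated parameters, and N the number of observations; lower values indicate the better trade-off between fit and complexity.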

How Sytra Handles Survival Analysis

Sytra understands the full survival analysis pipeline. When you say “run a Cox regression of time to readmission on treatment, adjusting for age, sex, and comorbidity,” it generates:

stset with the correct time and failure variables
stsum and stci for descriptive statistics
sts graph for Kaplan-Meier curves
stcox with vce(robust)
estat phtest to check the PH assumption
If PH is violated, it suggests stratification or a parametric alternative

If your data has competing risks, Sytra detects multiple failure types and suggests the Fine-Gray model, because it understands the epidemiological methodology, not just the Stata syntax.

#Survival Analysis #Stata #Public Health #Epidemiology