Stata Errors
2026-02-10 · 7 min read

Stata 'variable already defined': Why gen Fails and How to Fix It

You ran gen and Stata said the variable already exists. Here's when to use replace, when to drop first, and the safe pattern for do-files.

Sytra Team
Research Engineering Team, Sytra AI

You’re running your do-file for the second time. The first run worked perfectly. Now you get:

. gen log_wage = log(wage)
variable log_wage already defined
r(110);

This happens because gen can only create new variables. If the variable already exists — because you created it on the first run and didn’t clear the data — Stata refuses to overwrite it. Here’s how to handle it properly.
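Before changing anything, it can help to confirm that the variable really is sitting in memory. The sketch below is one way to check, using Stata's bundled auto dataset as a stand-in for your own data; log_price is an illustrative name, not something from your do-file.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear

* confirm variable errors out if the variable is absent; capture stores the code in _rc
capture confirm variable log_price
display _rc        // nonzero: log_price is not in the data yet

gen log_price = log(price)

capture confirm variable log_price
display _rc        // 0: it exists now, so another gen would stop with r(110)
```

If the second check returns 0 in your own session, the variable is left over from an earlier run and one of the fixes below applies.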

All examples tested in Stata 18 SE. Compatible with Stata 15+.


Quick Answer

quick-fix.do
stata
// Option 1: Replace if variable exists
replace log_wage = log(wage)

// Option 2: Drop first, then generate
capture drop log_wage
gen log_wage = log(wage)

// Option 3: Start fresh by reloading the data
use analysis_data.dta, clear
gen log_wage = log(wage)

gen vs. replace: The Core Distinction

Stata enforces a strict separation between creating and modifying variables:

Command    Variable must...    If not...
gen        NOT exist           r(110): already defined
replace    ALREADY exist       r(111): not found

This is intentional. Stata prevents you from accidentally overwriting variables. In interactive use, it’s a safety net. In do-files you run repeatedly, it’s an annoyance you need to handle.
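Both return codes are easy to reproduce. The following is a minimal sketch on the bundled auto dataset; log_price and no_such_var are made-up names chosen for the illustration.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear

gen log_price = log(price)

* gen on a variable that already exists: capture traps the error, _rc holds the code
capture gen log_price = log(price)
display "gen on an existing variable: r(" _rc ")"

* replace on a variable that does not exist
capture replace no_such_var = 1
display "replace on a missing variable: r(" _rc ")"
```

The first display reports return code 110 and the second 111, matching the two rows of the comparison above.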


Pattern 1: Use replace When Modifying Values

If the variable exists and you want to change its values, use replace:

replace-pattern.do
stata
// First run: create the variable
gen treatment_post = treatment * post_period

// Second run: update values (maybe you changed the definition)
replace treatment_post = treatment * post_period

// Replace with conditional
replace wage = . if wage < 0 // Set negative wages to missing
// Top-code education; missing values count as larger than any number, so exclude them
replace education = 16 if education > 16 & !missing(education)
💡Tip
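replace preserves variable labels, value labels, and notes. If you use drop + gen, the new variable starts blank: its variable label, value-label attachment, notes, and display format are lost and must be reapplied.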
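The difference in how labels are handled can be checked directly. This is a small sketch on the bundled auto dataset, with log_price as an illustrative variable; the extended macro function `: variable label` returns the label text.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear

gen log_price = log(price)
label variable log_price "Log of price"

* replace changes values only; the label stays attached
replace log_price = log(price / 1000)
display "`: variable label log_price'"

* drop + gen starts from a blank variable, so the label is gone
drop log_price
gen log_price = log(price)
display "`: variable label log_price'"
```

The first display prints the label and the second prints an empty line, so a do-file that uses drop + gen needs to reapply its label commands afterwards.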

Pattern 2: capture drop + gen (The Do-File Pattern)

The most common pattern for do-files that run repeatedly:

capture-drop.do
stata
// capture suppresses the error if the variable doesn't exist
// drop removes the variable if it does exist
// gen creates it fresh

capture drop log_wage
gen log_wage = log(wage)

capture drop age_sq
gen age_sq = age^2

capture drop treatment_post
gen treatment_post = treatment * post_period

The capture prefix tells Stata: “Run this command. If it throws an error, ignore it and continue.” If log_wage doesn’t exist yet, drop log_wage would fail, but capture swallows the error.

⚠️Don't overuse capture
capture hides ALL errors, not just “variable not found.” Don’t wrap your entire do-file in capture blocks — you’ll miss real problems. Use it narrowly: capture drop varname.
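One way to keep capture narrow is to inspect _rc right after the captured command and re-raise anything unexpected. The sketch below assumes the only acceptable failure is r(111), variable not found; it uses the bundled auto dataset and an illustrative variable name.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear

* Capture exactly one command and save its return code immediately
capture drop log_price
local rc = _rc

* 0 means the drop worked, 111 means there was nothing to drop;
* any other code is a real problem, so raise it instead of hiding it
if !inlist(`rc', 0, 111) {
    error `rc'
}

gen log_price = log(price)
```

For a plain drop this check is rarely triggered, but the same structure is worth having whenever the captured command can fail in more than one way.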

Pattern 3: Reload the Data

The simplest and safest approach for do-files: reload the original data at the top.

reload-pattern.do
stata
// Master do-file pattern: always starts clean
clear all
set more off

// Load raw data
use "data/raw/survey_2024.dta", clear

// All gen commands work because we started fresh
gen log_wage = log(wage)
gen age_sq = age^2
gen treatment_post = treatment * post_period

// Save constructed dataset
save "data/constructed/analysis_sample.dta", replace
💡Best practice
Structure your do-files to load raw data at the top and save processed data at the bottom. Each run starts clean. No need for capture drop anywhere. This is the most reproducible approach.

Pattern 4: Use tempvar for Intermediate Variables

If you’re creating temporary variables for intermediate calculations, use tempvar. Temporary variables are automatically dropped when the do-file or program ends.

tempvar-pattern.do
stata
// Temporary variables are cleaned up automatically
tempvar residual predicted
regress wage education experience
predict `predicted'
gen `residual' = wage - `predicted'

// Use temporary variables for intermediate calculations
summarize `residual', detail

// When the do-file ends, these variables disappear
// No "already defined" errors on next run
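The cleanup is easiest to see inside a program, where temporary variables are dropped as soon as the program exits. The sketch below uses the bundled auto dataset; show_resid is a hypothetical helper name, and c(k) is Stata's count of variables in memory.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear
local k_before = c(k)      // number of variables before the helper runs

* show_resid is a hypothetical helper written for this illustration
capture program drop show_resid
program define show_resid
    tempvar fitted resid
    quietly regress price mpg weight
    quietly predict double `fitted'
    quietly gen double `resid' = price - `fitted'
    summarize `resid'
end

show_resid
show_resid                 // second call: no r(110)

* 0 if no temporary variables were left behind
display c(k) - `k_before'
```

Calling the helper twice in a row works because each call receives fresh temporary names and both variables are dropped when the program ends.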

Common Mistake: gen with an if Condition

A subtle issue: gen with an if condition creates the variable for ALL observations, setting unmatched observations to missing. Running it twice still fails.

Fails on second run
stata
// Creates log_wage for ALL obs
// (missing for wage <= 0)
gen log_wage = log(wage) if wage > 0
// Second run:
gen log_wage = log(wage) if wage > 0
// r(110) — already defined!
Works every time
stata
// Safe pattern:
capture drop log_wage
gen log_wage = log(wage) if wage > 0
// OR: reload data first
use mydata.dta, clear
gen log_wage = log(wage) if wage > 0
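A related point concerns swapping in replace here: replace with an if condition only touches the observations that match, so values from an earlier run can linger if the condition has changed. The sketch below illustrates this on the bundled auto dataset, with foreign standing in for whatever condition your do-file uses.

```stata
* Stand-in data: the auto dataset ships with Stata
sysuse auto, clear

* First run: defined for foreign cars only
gen log_price = log(price) if foreign == 1

* Later the condition changes; replace updates only the rows matching the new if
replace log_price = log(price) if foreign == 0
count if !missing(log_price) & foreign == 1    // first-run values are still there

* capture drop + gen rebuilds the variable from scratch
capture drop log_price
gen log_price = log(price) if foreign == 0
count if !missing(log_price) & foreign == 1    // none left over
```

This is why the drop-and-regenerate or reload patterns are the safer choice when the if condition itself may be edited between runs.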

Sytra catches these errors before you run.

Sytra tracks which variables exist in your dataset and automatically uses gen for new variables and replace for existing ones. No more r(110) errors.


FAQ

What does “variable already defined” mean in Stata?

It means you used gen to create a variable that already exists in your dataset. gen can only create new variables. To modify an existing variable, use replace.

What is the difference between gen and replace in Stata?

gen creates a new variable that does not yet exist. replace modifies the values of an existing variable. Using gen on an existing variable throws r(110); using replace on a non-existent variable throws r(111).

How do I safely overwrite a variable in a do-file?

Use capture drop varname followed by gen varname = expression. The capture suppresses errors if the variable doesn’t exist yet. Or better: reload your data at the top of the do-file with use data.dta, clear.

Should I use capture drop or replace?

Use replace when modifying values of an existing variable (it preserves labels and notes). Use capture drop + gen when recreating a variable from scratch. Best practice: reload data at the top of each do-file so everything starts clean.


