Stata Errors
2026-02-098 min read

Stata Type Mismatch Error in Merge: String vs Numeric Key Variables

One dataset stores FIPS codes as numbers. The other as strings. Stata refuses to merge. Here's the 5-second diagnosis and fix.

Sytra Team
Research Engineering Team, Sytra AI

You have two datasets. Both have a variable called state_fips. You try to merge and Stata says:

. merge 1:1 state_fips using census_data.dta
variable state_fips is str5 in the master data but byte in the using data
    You could -tostring- the using variable before merging, or
    -destring- the master variable before merging.
r(106);

The fix takes 5 seconds once you know it. But the diagnosis matters — because the wrong conversion can silently destroy your data.

All examples tested in Stata 18 SE. Compatible with Stata 15+.


Quick Answer

quick-fix.do
stata
1// Step 1: Check types in both datasets
2describe state_fips
3use census_data.dta, clear
4describe state_fips
5
6// Step 2: Convert the mismatched variable
7// Option A: Convert string to numeric
8destring state_fips, replace
9
10// Option B: Convert numeric to string
11tostring state_fips, replace
12
13// Step 3: Now merge
14merge 1:1 state_fips using census_data.dta

Why Type Mismatch Happens

Stata stores variables as either numeric (byte, int, long, float, double) orstring (str#, strL). When two datasets have the same variable name but different storage types, Stata cannot merge them.

The most common scenarios:

  • CSV imports: One file has "06" (string), another has 6 (numeric)
  • Excel imports: Excel cells formatted as “Text” vs. “Number”
  • Different sources: Census data uses string FIPS codes; your dataset uses numeric
  • Manual entry errors: Someone typed "N/A" in a numeric column, forcing the entire column to string

Step 1: Diagnose with describe

Before fixing anything, check what you’re dealing with in both datasets:

diagnose.do
stata
1// Check the master dataset
2use master_data.dta, clear
3describe state_fips
4codebook state_fips
5
6// Check the using dataset
7use census_data.dta, clear
8describe state_fips
9codebook state_fips
. describe state_fips
Variable      Storage   Display    Value
    name         type    format    label      Variable label
─────────────────────────────────────────────────────────────
state_fips      str5    %5s                   State FIPS code
. use census_data.dta, clear\ndescribe state_fips
Variable      Storage   Display    Value
    name         type    format    label      Variable label
─────────────────────────────────────────────────────────────
state_fips      byte    %8.0g                 State FIPS code

One is str5 (string). The other is byte (numeric). That’s your type mismatch.


Step 2: Decide Which Direction to Convert

⚠️Leading zeros matter
If your variable has leading zeros (FIPS codes, ZIP codes, NAICS codes), converting to numeric destroys the leading zeros. The string "06" becomes the number 6. You cannot get the zero back without knowing the expected format.

When to convert to numeric (destring)

  • The variable is a pure number with no leading zeros (person ID, year, age)
  • The string version contains only digits
  • You want to do arithmetic with the variable later

When to convert to string (tostring)

  • The variable has meaningful leading zeros (FIPS: "06" for California)
  • The variable is a code, not a quantity (ZIP codes, industry codes)
  • You need exact character matching

Fix A: Convert String to Numeric with destring

destring-fix.do
stata
1// Basic destring
2destring state_fips, replace
3
4// If the string contains non-numeric characters, force it
5destring state_fips, replace force
6// WARNING: force converts non-numeric values to missing
7
8// Generate a new variable instead of replacing
9destring state_fips, generate(state_fips_num)
10
11// Check for problems before converting
12destring state_fips, replace
13// If this fails, check what's in the variable:
14tab state_fips if real(state_fips) == .
Never use force blindly
destring var, replace force silently converts non-numeric values to missing (.). If your string contains “N/A”, “missing”, or any text, those observations become missing without warning. Always check with tab first.

Fix B: Convert Numeric to String with tostring

tostring-fix.do
stata
1// Basic tostring
2tostring state_fips, replace
3
4// Add leading zeros to match the string format
5tostring state_fips, replace format(%02.0f)
6// Now 6 becomes "06", 12 stays "12"
7
8// For 5-digit FIPS (state + county):
9tostring county_fips, replace format(%05.0f)
10// Now 6037 becomes "06037" (LA County, CA)
11
12// Check the result
13list state_fips in 1/5
. tostring state_fips, replace format(%02.0f)\nlist state_fips in 1/5
     ┌────────────┐
     │ state_fips │
     ├────────────┤
  1. │         01 │
  2. │         02 │
  3. │         06 │
  4. │         12 │
  5. │         36 │
     └────────────┘

The Leading Zeros Problem

This deserves its own section because it trips up every researcher working with geographic data.

Wrong — loses leading zeros
stata
// Dataset 1: FIPS as string "06"
// Dataset 2: FIPS as numeric 6
// Converting string to numeric:
destring state_fips, replace
// "06" becomes 6 — they now match!
// But: you've lost the leading zero forever
// Later when you export:
// California shows as "6" not "06"
// Merge with another string dataset? Fails again.
Right — preserves leading zeros
stata
// Convert numeric to string WITH format:
tostring state_fips, replace format(%02.0f)
// 6 becomes "06" — they now match!
// Leading zero is preserved
// Even better — keep both:
tostring state_fips, gen(state_fips_str) format(%02.0f)
// Now you have both versions
💡Best practice for geographic codes
Always store FIPS codes, ZIP codes, NAICS codes, and similar identifiers as strings. This preserves leading zeros and prevents accidental arithmetic on codes. Add confirm string variable fipsat the top of your do-file to catch mistakes early.

Preventing Type Mismatch

prevention.do
stata
1// Check types before merging — add this to your do-file
2confirm string variable state_fips
3// Throws an error if state_fips is not string
4
5confirm numeric variable year
6// Throws an error if year is not numeric
7
8// Or use assert with storage type checking:
9describe state_fips
10assert r(type) == "str5"

Handling Whitespace in String Merges

Even after matching types, string merges can fail silently if one variable has trailing spaces or different capitalization. The merge runs but matches zero observations.

string-cleanup.do
stata
1// Remove leading and trailing whitespace
2replace state_name = strtrim(state_name)
3
4// Remove ALL internal extra spaces
5replace state_name = stritrim(state_name)
6
7// Standardize capitalization
8replace state_name = strupper(state_name)
9// or
10replace state_name = strlower(state_name)
11// or
12replace state_name = strproper(state_name)
👁Silent merge failures
If your merge key is a string with trailing spaces in one dataset, the merge will run without errors but match zero observations. Always strtrim() string keys before merging.

Sytra catches these errors before you run.

Sytra detects type mismatches before you run the merge and suggests the correct conversion — including leading zero preservation for geographic codes.

Join the Waitlist →

FAQ

What does type mismatch mean in Stata merge?

Type mismatch means the key variable is stored as a string in one dataset and as a number in the other. Stata cannot merge incompatible types. Use describe in both datasets to check.

How do I fix type mismatch in Stata merge?

Use destring to convert string to numeric, or tostring to convert numeric to string. Make both datasets use the same type before merging.

Why are my FIPS codes causing type mismatch?

FIPS codes like 01, 02, 06 are often stored as strings to preserve leading zeros. When one dataset stores them as numeric (1, 2, 6), you get a type mismatch. Convert numeric to string withtostring var, replace format(%02.0f) to add leading zeros.

Should I convert to string or numeric for merging?

For codes with leading zeros (FIPS, ZIP, NAICS), convert to string. For numeric IDs without leading zeros, numeric is simpler and avoids whitespace issues.


Written by Sytra Team
Research Engineering Team, Sytra AI

We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.

#Stata#Errors#Merge#Data Management

Enjoyed this article?