Graphics
2026-03-03 · 18 min read

GIS Data in Stata: Spatial Coordinates, Distance Features, and Regional Plots

A practical GIS data workflow in Stata using latitude/longitude validation, distance engineering, and map-ready regional outputs.

Sytra Team
Research Engineering Team, Sytra AI

You imported coordinates, but one malformed latitude value can push distance features off by hundreds of kilometers.

You will build a defensible GIS workflow in Stata that validates coordinates and produces model-ready spatial variables.

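All examples were tested in Stata 18 SE and use only commands available in Stata 15+. On an older release, change the `version 18` line to your installed version, because Stata refuses a `version` number newer than itself.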


Quick Answer

  1. Validate latitude and longitude bounds before any feature engineering.
  2. Create stable keys (`firm_id`, `year`) and region labels early.
  3. Compute distances with a documented formula and inspect distributions.
  4. Collapse to analysis grain and visualize regional trends before modeling.

Create Spatial Features You Can Defend in Review

Validate spatial coordinates and construct regional groups

Spatial workflows fail silently when coordinate quality checks are skipped: a swapped or mis-signed coordinate still produces a number, just the wrong one. Run bound assertions first so malformed rows are caught before they propagate into derived variables.

After validation, assign region tags that match your study design and are easy to audit in summaries.
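One way to make a region rule auditable is to tabulate the coordinate range inside each region and then turn the rule into assertions. The sketch below assumes the same 37-degree cut used in this article; the 200 simulated rows are illustrative only.

```stata
clear
set seed 260210
set obs 200
gen double latitude = 25 + runiform()*24

* Same cut as in the workflow: 37 degrees north splits the sample
gen region = cond(latitude >= 37, "north", "south")

* Min and max latitude per region should sit on either side of the cut
tabstat latitude, by(region) statistics(n min max)

* Make the rule an executable check
assert latitude >= 37 if region == "north"
assert latitude <  37 if region == "south"
```

If the study design later changes the cut, the two assertions fail loudly instead of letting stale labels flow into the summaries.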

If you are extending this pipeline, also review "reghdfe in Stata: High-Dimensional Fixed Effects" and "Importing Data into Stata".

gis-qa-regions.do
stata
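clear all
version 18
set seed 260210
set obs 1500

* Simulated panel: 250 firms with 6 distinct years each
gen firm_id = ceil(_n/6)
gen year = 2016 + mod(_n,8)
gen education = 10 + floor(runiform()*8)
gen wage = 18 + 0.7*education + 0.2*(year-2016) + rnormal(0,2)

* Simulated coordinates inside a box that roughly covers the contiguous United States
gen latitude = 25 + runiform()*24
gen longitude = -124 + runiform()*58

* Stop here if any coordinate is out of range; inrange() is also false for missing values
assert inrange(latitude, -90, 90)
assert inrange(longitude, -180, 180)

* Two regions split at 37 degrees north
gen region = cond(latitude >= 37, "north", "south")
tab region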
. tab region
     region |      Freq.     Percent        Cum.
------------+-----------------------------------
      north |        745       49.67       49.67
      south |        755       50.33      100.00
------------+-----------------------------------
      Total |      1,500      100.00
💡 Bounds checks are not optional
Coordinate assertions are cheap and prevent subtle downstream errors in spatial features.
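A failed `assert` stops the do-file but does not show which rows are wrong. A minimal sketch of a non-halting audit follows; the three toy rows and the `bad_coord` flag name are illustrative assumptions, not part of the workflow above.

```stata
* Toy data: row 2 has latitude and longitude swapped, row 3 has a missing latitude
clear
input double latitude double longitude
  34.05  -118.24
-118.24    34.05
      .   -87.63
end

* inrange() returns 0 for missing values, so missing coordinates are flagged too
gen byte bad_coord = !inrange(latitude, -90, 90) | !inrange(longitude, -180, 180)

count if bad_coord
list latitude longitude if bad_coord, noobs

* Once the offending rows are fixed or dropped, restore the hard stop:
* assert bad_coord == 0
```

Listing before asserting keeps the strictness of the check while giving you the row-level evidence needed to repair the import.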

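Distance-to-hub features are often more informative than raw coordinates in economic applications. The example below uses the haversine great-circle formula with a hub at 41.8781°N, 87.6298°W (Chicago) and a mean Earth radius of 6,371 km. Keep the formula explicit and document each constant.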

After feature construction, aggregate by region-year and plot trends to spot structural differences before formal estimation.
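Because `collapse` replaces the data in memory, it can help to wrap the aggregation in `preserve` and `restore` so the observation-level rows survive for later modeling. The sketch below is one possible arrangement on simulated data; the `n_obs` variable and the `region_year` tempfile name are illustrative.

```stata
clear
set seed 260210
set obs 160
gen year = 2016 + mod(_n, 8)
gen region = cond(runiform() < 0.5, "north", "south")
gen wage = 25 + rnormal(0, 2)

preserve
    collapse (mean) mean_wage = wage (count) n_obs = wage, by(region year)
    * One row per region-year cell
    isid region year
    * Thin cells make noisy trend lines, so look at the counts first
    summarize n_obs
    tempfile region_year
    save `region_year'
restore

* The observation-level data are back in memory
count
```

Keeping a cell count next to each mean makes it easier to tell a real regional difference from a cell with only a handful of observations.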

gis-distance-plot.do
stata
clear all
version 18
set seed 260210
set obs 1500

gen firm_id = ceil(_n/6)
gen year = 2016 + mod(_n,8)
gen education = 10 + floor(runiform()*8)
gen wage = 18 + 0.7*education + 0.2*(year-2016) + rnormal(0,2)

gen latitude = 25 + runiform()*24
gen longitude = -124 + runiform()*58

assert inrange(latitude, -90, 90)
assert inrange(longitude, -180, 180)

gen region = cond(latitude >= 37, "north", "south")
tab region

* ---- Section-specific continuation ----
* Haversine distance to the hub at (41.8781, -87.6298); angles converted to radians
gen dlat = (latitude-41.8781)*c(pi)/180
gen dlon = (longitude+87.6298)*c(pi)/180
gen a = sin(dlat/2)^2 + cos(latitude*c(pi)/180)*cos(41.8781*c(pi)/180)*sin(dlon/2)^2
* min(1, .) guards asin() against rounding slightly above 1
gen c_arc = 2*asin(min(1,sqrt(a)))
* 6371 km is the mean Earth radius
gen distance_km = 6371*c_arc
drop dlat dlon a c_arc

* One row per region-year; this replaces the firm-level data in memory
collapse (mean) mean_wage=wage mean_edu=education mean_distance=distance_km, by(region year)

twoway (line mean_wage year if region=="north", lcolor(navy)) ///
 (line mean_wage year if region=="south", lcolor(maroon)), ///
 legend(order(1 "North region" 2 "South region")) ///
 ytitle("Mean wage") xtitle("Year")

summ mean_distance
. summ mean_distance
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
mean_distance|         16    1842.553    496.2241   1194.822   2710.447
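The haversine block in `gis-distance-plot.do` can be sanity-checked against two cases with known answers before it is trusted on real data. The sketch below is an assumption-laden test harness, not part of the original script: the local macro names are illustrative, and the two test points are the hub itself and a point one degree due north of it.

```stata
clear
* Hub and radius as named constants (values match the workflow; names are illustrative)
local hub_lat 41.8781
local hub_lon -87.6298
local radius_km 6371

* Two test points: the hub itself, and one degree due north of it
input double latitude double longitude
41.8781 -87.6298
42.8781 -87.6298
end

gen double dlat = (latitude  - (`hub_lat')) * c(pi)/180
gen double dlon = (longitude - (`hub_lon')) * c(pi)/180
gen double a = sin(dlat/2)^2 + cos(latitude*c(pi)/180) * cos(`hub_lat'*c(pi)/180) * sin(dlon/2)^2
gen double distance_km = `radius_km' * 2 * asin(min(1, sqrt(a)))

list latitude longitude distance_km, noobs

* The hub is at distance zero from itself
assert abs(distance_km[1]) < 1e-6
* One degree of latitude along a meridian is radius * pi/180, about 111.19 km
assert abs(distance_km[2] - `radius_km'*c(pi)/180) < 1e-6
```

Holding the hub and radius in one place also means a change of hub touches a single line rather than every occurrence of the constant.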
⚠️ Document radius and hub assumptions
Distance features depend on the Earth radius constant and hub coordinates; report these choices in methods notes.
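One lightweight way to keep those choices attached to the data is to store them as a variable label and a note, which are saved with the dataset. The sketch below uses a one-row placeholder dataset; the wording of the label and note is only a suggestion.

```stata
clear
set obs 1
gen double distance_km = 0

label variable distance_km "Great-circle distance to hub (km)"
notes distance_km: Haversine formula; Earth radius 6371 km; hub at 41.8781, -87.6298.

* Notes travel with the dataset when it is saved
notes list distance_km
```

A reviewer who opens the saved file can then run `notes list` and see the assumptions without hunting through do-files.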

Common Errors and Fixes

"type mismatch"

Latitude or longitude was imported as a string and then used in arithmetic without conversion.

Inspect variable types with `describe` and convert coordinate strings using `destring` before calculations.

. gen distance_km = abs(latitude-37.77)*111
type mismatch
r(109);
This causes the error
wrong-way.do
stata
gen latitude = "34.50"
gen distance_km = abs(latitude-37.77)*111
This is the fix
right-way.do
stata
gen latitude = "34.50"
destring latitude, replace
gen distance_km = abs(latitude-37.77)*111
error-fix.do
stata
describe latitude longitude
destring latitude longitude, replace
gen distance_km = abs(latitude-37.77)*111
summ distance_km
. destring latitude longitude, replace
latitude: all characters numeric; replaced as double
longitude: all characters numeric; replaced as double
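The output above is the clean case. When some values carry stray text, `destring` without `force` leaves the variable as a string, so it can be safer to convert into a new variable and inspect what failed. A minimal sketch, with made-up values and an illustrative `lat_num` name:

```stata
clear
input str10 latitude
"34.50"
"41.88N"
"n/a"
end

* force turns non-numeric entries into missing instead of leaving the variable untouched
destring latitude, generate(lat_num) force

* Rows that had text but produced a missing number need manual review
list latitude lat_num if missing(lat_num) & latitude != "", noobs
count if missing(lat_num) & latitude != ""
```

Writing to a new variable keeps the raw strings available, so you can decide per row whether to clean, recode, or drop.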

Command Reference

Plots spatially grouped trends after coordinate-derived feature engineering.

twoway (line y x if group==...), legend(order()) ytitle() xtitle()
  `if` condition: Draws separate layers for regions or groups
  `lcolor()`: Assigns clear color separation across regions
  `legend(order())`: Controls legend text and order
  `ytitle()`/`xtitle()`: Adds publication-ready axis labels

How Sytra Handles This

Sytra can audit coordinate quality, generate distance features from hub locations, and build reusable spatial feature blocks for panel regressions.

A direct natural-language prompt for this exact workflow:

sytra-prompt.txt
text
Validate latitude and longitude bounds, create region indicators, compute distance_km from a specified hub, collapse to region-year means, and produce a two-line trend chart for mean wage.


FAQ

Can Stata handle GIS workflows without external mapping software?

Yes. Stata can validate coordinates, engineer distance features, and produce spatially structured plots for many empirical workflows.

What is the first quality check for spatial data?

Check latitude and longitude bounds immediately. Out-of-range coordinates can silently corrupt distance calculations and regional assignments.

How should I merge spatial features with panel data?

Create stable keys such as firm_id-year, compute spatial features in one script, and merge only after uniqueness checks with isid.
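A minimal sketch of that merge pattern follows. Both input datasets are hypothetical three-row examples, and `spatial` is an illustrative tempfile name; the point is the order of operations: check uniqueness on each side, then merge and require every row to match.

```stata
clear
* Hypothetical spatial-feature file: one row per firm-year
input firm_id year distance_km
1 2016 120.5
1 2017 120.5
2 2016 980.2
end
isid firm_id year
tempfile spatial
save `spatial'

* Hypothetical panel file with the same keys
clear
input firm_id year wage
1 2016 31.2
1 2017 32.0
2 2016 29.4
end
isid firm_id year

* assert(match) stops the script if any firm-year is missing on either side
merge 1:1 firm_id year using `spatial', assert(match) nogenerate
```

If unmatched rows are expected in your design, relax `assert(match)` and tabulate `_merge` instead of dropping `nogenerate` silently.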


