GIS Data in Stata: Spatial Coordinates, Distance Features, and Regional Plots
A practical GIS data workflow in Stata using latitude/longitude validation, distance engineering, and map-ready regional outputs.
You imported coordinates, but one malformed latitude value can push distance features off by hundreds of kilometers.
You will build a defensible GIS workflow in Stata that validates coordinates and produces model-ready spatial variables.
All examples tested in Stata 18 SE. Compatible with Stata 15+.
Quick Answer
- Validate latitude and longitude bounds before any feature engineering.
- Create stable keys (`firm_id`, `year`) and region labels early.
- Compute distances with a documented formula and inspect distributions.
- Collapse to analysis grain and visualize regional trends before modeling.
Create Spatial Features You Can Defend in Review
Validate spatial coordinates and construct regional groups
Spatial workflows go wrong at the first step when coordinate quality checks are skipped. Run bound assertions first so malformed rows are caught before they propagate into derived variables.
After validation, assign region tags that match your study design and are easy to audit in summaries.
If you are extending this pipeline, also review "reghdfe in Stata: High-Dimensional Fixed Effects" and "Importing Data into Stata".
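The bound checks described above can also be run in a non-fatal form that lists offending rows instead of stopping at the first failure. The sketch below is illustrative only: the four-row dataset and the names `site_id` and `bad_coord` are assumptions made for the example, not part of the original workflow.

```stata
clear
input str4 site_id double latitude double longitude
"a01"  41.88   -87.63
"a02"  418.8   -87.63
"a03"  34.05  -118.24
"a04"  .       -73.99
end

* Flag rows whose coordinates are missing or outside valid bounds.
* inrange() returns 0 when its first argument is missing, so
* missing coordinates are flagged as well.
gen byte bad_coord = !inrange(latitude, -90, 90) | !inrange(longitude, -180, 180)

* Inspect the offending rows, then count them
list site_id latitude longitude if bad_coord
count if bad_coord
```

Here row `a02` (a misplaced decimal point) and row `a04` (a missing latitude) are flagged. Once the flagged rows are corrected or dropped, the `assert` lines can serve as a hard gate.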

    clear all
    version 18
    set seed 260210
    set obs 1500

    gen firm_id = ceil(_n/6)
    gen year = 2016 + mod(_n,8)
    gen education = 10 + floor(runiform()*8)
    gen wage = 18 + 0.7*education + 0.2*(year-2016) + rnormal(0,2)

    gen latitude = 25 + runiform()*24
    gen longitude = -124 + runiform()*58

    assert inrange(latitude, -90, 90)
    assert inrange(longitude, -180, 180)

    gen region = cond(latitude >= 37, "north", "south")
    tab region

         region |      Freq.     Percent        Cum.
    ------------+-----------------------------------
          north |        745       49.67       49.67
          south |        755       50.33      100.00
    ------------+-----------------------------------
          Total |      1,500      100.00

Engineer distance features and visualize regional wage trends
Distance-to-hub features are often more informative than raw coordinates in economic applications. Keep the formula explicit and document the constants you choose, such as the hub coordinates and the Earth radius.
After feature construction, aggregate by region-year and plot trends to spot structural differences before formal estimation.

    clear all
    version 18
    set seed 260210
    set obs 1500

    gen firm_id = ceil(_n/6)
    gen year = 2016 + mod(_n,8)
    gen education = 10 + floor(runiform()*8)
    gen wage = 18 + 0.7*education + 0.2*(year-2016) + rnormal(0,2)

    gen latitude = 25 + runiform()*24
    gen longitude = -124 + runiform()*58

    assert inrange(latitude, -90, 90)
    assert inrange(longitude, -180, 180)

    gen region = cond(latitude >= 37, "north", "south")
    tab region

    * ---- Section-specific continuation ----
    * Haversine great-circle distance to a hub at 41.8781 N, 87.6298 W
    * (these are the coordinates of Chicago); 6371 km is the mean Earth radius
    gen dlat = (latitude-41.8781)*c(pi)/180
    gen dlon = (longitude+87.6298)*c(pi)/180
    gen a = sin(dlat/2)^2 + cos(latitude*c(pi)/180)*cos(41.8781*c(pi)/180)*sin(dlon/2)^2
    gen c_arc = 2*asin(min(1,sqrt(a)))
    gen distance_km = 6371*c_arc
    drop dlat dlon a c_arc

    * Aggregate to one row per region-year
    collapse (mean) mean_wage=wage mean_edu=education mean_distance=distance_km, by(region year)

    twoway (line mean_wage year if region=="north", lcolor(navy)) ///
        (line mean_wage year if region=="south", lcolor(maroon)), ///
        legend(order(1 "North region" 2 "South region")) ///
        ytitle("Mean wage") xtitle("Year")

    summ mean_distance

        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
    mean_distance|         16    1842.553    496.2241   1194.822   2710.447

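One way to make the distance formula easier to defend is to test it on points whose answer is known in advance. The sketch below is an illustration under stated assumptions: the two-row dataset and the local macro names (`hub_lat`, `hub_lon`, `earth_km`, `d2r`) are introduced here for the example and are not part of the original script. It applies the same haversine expression to the hub itself, which should give 0, and to a point one degree of latitude due north, which should give 6371*pi/180, about 111.19 km.

```stata
clear
input double latitude double longitude
41.8781  -87.6298
42.8781  -87.6298
end

* Constants gathered in one place so they are documented once
local hub_lat  = 41.8781
local hub_lon  = -87.6298
local earth_km = 6371          // mean Earth radius in km
local d2r      = c(pi)/180     // degrees to radians

* Same haversine expression as the main example, stored in double precision
gen double dlat = (latitude  - (`hub_lat')) * `d2r'
gen double dlon = (longitude - (`hub_lon')) * `d2r'
gen double a = sin(dlat/2)^2 + cos(latitude*`d2r') * cos(`hub_lat'*`d2r') * sin(dlon/2)^2
gen double distance_km = `earth_km' * 2 * asin(min(1, sqrt(a)))

* Row 1 is the hub itself; row 2 is one degree of latitude due north
assert distance_km[1] == 0
assert abs(distance_km[2] - 6371*c(pi)/180) < 1e-6
```

If either assertion fails after you edit the formula, a sign or unit error has probably crept in. Using `double` here is a choice made for the check; the main example uses Stata's default storage type.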
Common Errors and Fixes
"type mismatch"
Latitude or longitude was imported as a string and then used in arithmetic without conversion.
Inspect variable types with `describe` and convert coordinate strings using `destring` before calculations.
This code triggers the error because `latitude` is a string:

    gen latitude = "34.50"
    gen distance_km = abs(latitude-37.77)*111

    type mismatch
    r(109);

Converting the variable first resolves it:

    gen latitude = "34.50"
    destring latitude, replace
    gen distance_km = abs(latitude-37.77)*111

A fuller check-and-convert sequence:

    describe latitude longitude
    destring latitude longitude, replace
    gen distance_km = abs(latitude-37.77)*111
    summ distance_km

    . destring latitude longitude, replace
    latitude: all characters numeric; replaced as double
    longitude: all characters numeric; replaced as double

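Real imports often contain coordinate strings that are not purely numeric, in which case `destring, replace` leaves the variable unchanged. A minimal sketch of one way to handle that, assuming a hypothetical four-row string column and the illustrative names `lat_num` and `lat_unparsed`: convert into a new variable with `force`, then flag the rows that failed to parse so they can be reviewed rather than silently lost.

```stata
clear
input str8 latitude
"34.50"
"40.71N"
"n/a"
"41.88"
end

* force converts what it can and sets the rest to missing;
* generate() keeps the original string variable for auditing
destring latitude, generate(lat_num) force

* Rows that contained text but did not convert need manual review
gen byte lat_unparsed = missing(lat_num) & !missing(latitude)
list latitude lat_num if lat_unparsed
count if lat_unparsed
```

In this example the rows `40.71N` and `n/a` are flagged. Whether to repair such values (for instance by stripping a hemisphere suffix) or drop them depends on the data source.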
Command Reference
twoway line
Plots spatially grouped trends after coordinate-derived feature engineering.
- `if` condition: Draws separate layers for regions or groups
- `lcolor()`: Assigns clear color separation across regions
- `legend(order())`: Controls legend text and order
- `ytitle()`/`xtitle()`: Adds publication-ready axis labels
How Sytra Handles This
Sytra can audit coordinate quality, generate distance features from hub locations, and build reusable spatial feature blocks for panel regressions.
A direct natural-language prompt for this exact workflow:
Validate latitude and longitude bounds, create region indicators, compute distance_km from a specified hub, collapse to region-year means, and produce a two-line trend chart for mean wage.
Sytra catches these errors before you run.
FAQ
Can Stata handle GIS workflows without external mapping software?
Yes. Stata can validate coordinates, engineer distance features, and produce spatially structured plots for many empirical workflows.
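As a rough illustration of a spatially structured plot built with standard graphics only, the sketch below scatters simulated coordinates with one layer per region. It is an assumption-laden example, not a substitute for a projected map: the 200 points are simulated in the same coordinate box as the examples above, and the dashed line marks the latitude cutoff of 37 used for the region tags.

```stata
clear
set seed 260210
set obs 200

* Simulated coordinates in the same box as the examples above
gen latitude  = 25 + runiform()*24
gen longitude = -124 + runiform()*58
gen region = cond(latitude >= 37, "north", "south")

* Longitude on the x-axis, latitude on the y-axis, one layer per region
twoway (scatter latitude longitude if region=="north", mcolor(navy) msize(small)) ///
       (scatter latitude longitude if region=="south", mcolor(maroon) msize(small)), ///
       legend(order(1 "North region" 2 "South region")) ///
       ytitle("Latitude") xtitle("Longitude") yline(37, lpattern(dash))
```

A plot like this is mainly useful as a visual audit: points that land far outside the expected box usually indicate swapped or mis-signed coordinates.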
What is the first quality check for spatial data?
Check latitude and longitude bounds immediately. Out-of-range coordinates can silently corrupt distance calculations and regional assignments.
How should I merge spatial features with panel data?
Create stable keys such as firm_id-year, compute spatial features in one script, and merge only after uniqueness checks with isid.
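The merge advice above can be sketched as follows. The two small datasets and their values are invented for illustration; the pattern is to run `isid` on both sides before merging and to let `merge` itself assert that every row matched.

```stata
* Panel side: one row per firm-year
clear
input int firm_id int year double wage
1 2016 20.1
1 2017 21.4
2 2016 19.8
2 2017 22.0
end
isid firm_id year              // stops with an error if the key is not unique
tempfile panel
save `panel'

* Spatial-feature side: also one row per firm-year
clear
input int firm_id int year double distance_km
1 2016 12.5
1 2017 12.5
2 2016 340.2
2 2017 340.2
end
isid firm_id year

* assert(match) makes the merge fail loudly if any row is unmatched
merge 1:1 firm_id year using `panel', assert(match) nogenerate
```

If the spatial features are constant within firm, an `m:1` merge on `firm_id` alone may be the better fit; in that case run `isid firm_id` on the feature file instead.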
Related Guides
- Stata Dates: Formatting, Converting, and Working with Date Variables
- API Data in Stata: Import JSON/CSV Feeds and Build Analysis-Ready Panels
- How to Merge Datasets in Stata: 1:1, m:1, 1:m with Complete Examples
- Linked Datasets in Stata: frlink/frget Workflows Instead of Repeated Merges
We build practical, reproducible workflows for Stata and R teams working on real empirical research pipelines.