CacheU
High Level Design

Geospatial System Design

A deep dive into geospatial indexing and querying using Quadtrees, Geohashing, and Google S2. Explains how large-scale systems like ride-sharing and mapping platforms efficiently store and query location data at global scale.

Geospatial System Design

Modern applications frequently deal with location-based data:

  • Ride-sharing apps
  • Food delivery services
  • Navigation systems
  • Location-based search
  • Social check-ins
  • Logistics tracking

Companies like:

  • :contentReference[oaicite:0]{index=0}
  • :contentReference[oaicite:1]{index=1}
  • :contentReference[oaicite:2]{index=2}

process millions of geospatial queries every second.

Examples of such queries include:

  • Find drivers within 3 km
  • Show restaurants near the user
  • Track vehicles moving across regions
  • Detect objects inside a geographic boundary

Efficiently answering these queries requires specialized spatial indexing techniques.

This article explores three key methods used in distributed systems:

  1. Quadtrees
  2. Geohashing
  3. Google S2 Geometry

The Challenge of Geospatial Queries

Location data is typically represented as:

(latitude, longitude)

Example:

LocationLatitudeLongitude
New York40.7128-74.0060
London51.5072-0.1276
Tokyo35.6762139.6503

A naive database query might look like:

SELECT * FROM drivers
WHERE distance(driver_location, user_location) < 5km;

This becomes inefficient at scale because:

  • Distance calculations are expensive
  • Millions of points must be scanned
  • Databases cannot easily index spherical coordinates

Thus we need spatial partitioning techniques.


Spatial Partitioning

The core idea:

Divide the Earth's surface into smaller regions so queries only search relevant areas.

Diagram
flowchart TD Earth --> Region1 Earth --> Region2 Earth --> Region3 Earth --> Region4

Benefits:

  • Faster queries
  • Efficient indexing
  • Scalable distributed systems

Quadtree

Concept

A Quadtree recursively divides a 2D space into four equal quadrants.

Each node represents a geographic area.

If too many points exist in a region, it is subdivided further.


Quadtree Structure

Diagram
graph TD Root --> Q1 Root --> Q2 Root --> Q3 Root --> Q4 Q1 --> Q11 Q1 --> Q12 Q1 --> Q13 Q1 --> Q14

Each subdivision increases spatial resolution.


Example

Imagine storing drivers across a city.

Diagram
flowchart TD City --> A City --> B City --> C City --> D A --> A1 A --> A2 A --> A3 A --> A4

If region A contains many drivers, it is subdivided further.


Quadtree Node Structure

Example structure:

class QuadTreeNode {
  constructor(boundary, capacity) {
    this.boundary = boundary
    this.capacity = capacity
    this.points = []
    this.divided = false
  }
}

Querying with Quadtrees

To find nearby objects:

  1. Identify the quadrant containing the query location
  2. Search that node
  3. Search adjacent nodes if needed

Quadtree Query Flow

Diagram
sequenceDiagram participant Client participant GeoService participant QuadTree Client->>GeoService: Find drivers near location GeoService->>QuadTree: Locate quadrant QuadTree-->>GeoService: Candidate drivers GeoService-->>Client: Nearby drivers

Advantages

BenefitExplanation
Efficient range queriesSearch limited area
Simple structureEasy to implement
DynamicHandles uneven distributions

Limitations

ProblemExplanation
Uneven densityUrban areas cause deep trees
Memory overheadMany nodes
Poor global indexingHarder for distributed systems

Geohashing

Concept

Geohashing converts geographic coordinates into string-based grid identifiers.

Example:

Latitude: 37.7749
Longitude: -122.4194
Geohash: 9q8yy

Nearby locations share similar prefixes.


Geohash Grid

Diagram
flowchart TD World --> Cell1 World --> Cell2 World --> Cell3 World --> Cell4

Each cell represents a rectangular region.

Longer geohashes = higher precision.


Precision Levels

Geohash LengthArea Size
3~150 km
5~5 km
7~150 m
9~5 m

Geohash Example

Nearby locations:

Location A → 9q8yy
Location B → 9q8yz
Location C → 9q8yv

Shared prefix:

9q8y

This indicates they are geographically close.


Geohash Indexing in Databases

Location data can be indexed as strings.

Example schema:

CREATE TABLE drivers (
  id INT,
  geohash VARCHAR(12),
  latitude FLOAT,
  longitude FLOAT
);

Query:

SELECT * FROM drivers
WHERE geohash LIKE '9q8y%';

Geohash Architecture

Diagram
flowchart LR Location --> GeohashEncoder GeohashEncoder --> Database Query --> PrefixSearch PrefixSearch --> Results

Advantages

BenefitExplanation
Simple indexingWorks with standard DB indexes
Fast prefix queriesEfficient for nearby searches
CompactSmall representation

Limitations

ProblemExplanation
Edge issuesNearby points may fall in different cells
Rectangular gridNot ideal for spherical Earth
Precision tradeoffsSmaller cells increase complexity

Google S2 Geometry

Concept

The S2 geometry system represents Earth as a hierarchical grid of cells mapped onto a sphere.

It was developed by:

  • Google

and is used in:

  • Google Maps
  • Uber

Why Not Use Flat Grids?

Earth is spherical.

Flat grid systems introduce distortions.

S2 solves this by mapping the sphere onto a cube projection.


S2 Projection Model

Diagram
flowchart TD Earth --> Cube Cube --> Face1 Cube --> Face2 Cube --> Face3 Cube --> Face4 Cube --> Face5 Cube --> Face6

Each cube face is then subdivided into smaller cells.


S2 Hierarchical Cells

Each cell has a unique identifier.

Cells can be subdivided recursively.

Diagram
graph TD Cell --> SubCell1 Cell --> SubCell2 Cell --> SubCell3 Cell --> SubCell4

This creates a multi-resolution spatial index.


Example Cell Levels

LevelCell Size
Level 5~250 km
Level 10~10 km
Level 15~300 m
Level 20~10 m

S2 Cell ID

Each cell is encoded as a 64-bit integer.

Example:

Cell ID: 9260949627242122337

This allows efficient database indexing.


Spatial Query with S2

Diagram
sequenceDiagram participant Client participant GeoService participant S2Index Client->>GeoService: Find nearby drivers GeoService->>S2Index: Compute cell ID S2Index-->>GeoService: Neighbor cells GeoService-->>Client: Nearby drivers

Comparing the Approaches

FeatureQuadtreeGeohashS2
Space representation2D gridRectangular gridSpherical cells
Index typeTreeString prefixInteger hierarchy
Query performanceGoodGoodExcellent
Distributed systemsModerateGoodExcellent
Global accuracyModerateModerateHigh

Large Scale Geospatial Architecture

Diagram
flowchart LR MobileApp --> API API --> GeoService GeoService --> SpatialIndex SpatialIndex --> Database

Components:

ComponentRole
Mobile appSends location queries
APIReceives requests
Geo serviceHandles spatial logic
Spatial indexEfficient location lookup
DatabaseStores entities

Example: Ride Matching System

Used by ride-sharing services.

Diagram
flowchart LR Rider --> RideService RideService --> GeoIndex GeoIndex --> DriverDatabase DriverDatabase --> NearbyDrivers

Steps:

  1. Rider requests ride
  2. System computes geospatial cell
  3. Nearby drivers retrieved
  4. Matching algorithm selects driver

Geospatial Scaling Challenges

Large systems must handle:

ChallengeExplanation
Billions of locationsVehicles, users
Real-time updatesGPS updates every few seconds
Low latencyQuery must return quickly
Global coverageEntire Earth indexed

Optimization Techniques

Region-Based Sharding

Split data by geographic regions.

Diagram
flowchart LR World --> USShard World --> EuropeShard World --> AsiaShard

Caching Hot Regions

Cities with heavy traffic are cached.

Example:

CityCache
New YorkHot region
LondonHot region
TokyoHot region

Streaming Location Updates

Vehicle positions are streamed through messaging systems.

Example architecture:

Diagram
flowchart LR DriverApp --> LocationStream LocationStream --> GeoIndexer GeoIndexer --> SpatialDB

Best Practices

Use Hierarchical Spatial Indexes

Hierarchical systems support efficient range queries.


Combine Spatial Index with Database

Use geospatial indexes with distributed databases.


Cache Hot Locations

Urban areas produce the majority of queries.


Batch Location Updates

Frequent updates can overwhelm systems.


Summary

Geospatial system design is essential for location-based applications.

Three common spatial indexing approaches are:

TechniqueUse Case
QuadtreesSimple spatial partitioning
GeohashingDatabase-friendly indexing
S2 GeometryHigh precision global systems

Large-scale platforms like:

  • Uber
  • Google Maps

rely on these techniques to efficiently process millions of geospatial queries every second.

Understanding these spatial indexing methods is crucial for designing scalable location-aware distributed systems.