Distributed ID Generation

A comprehensive guide to distributed ID generation in large-scale systems, covering why unique IDs are hard in distributed environments, common strategies like UUID, database sequences, Twitter Snowflake, and modern approaches used by large-scale platforms.

Introduction

Every system needs identifiers.

Examples:

| Entity | Example ID |
| --- | --- |
| User | `user_984223` |
| Order | `order_883912` |
| Tweet/Post | `174823749128` |
| Payment | `pay_223821` |

At small scale, generating IDs is simple:

Auto Increment: 1, 2, 3, 4, 5

But in distributed systems, things become much harder.

Modern platforms like:

  • Twitter
  • Instagram
  • Uber

generate IDs at very high rates, potentially millions per second, across thousands of servers.

The challenge becomes:

| Requirement | Explanation |
| --- | --- |
| Global uniqueness | No two IDs should ever collide |
| High throughput | Millions of IDs per second |
| Distributed | Generated across many machines |
| Orderable (sometimes) | IDs may need time ordering |
| Fault tolerant | System should work even if nodes fail |

Generating such IDs reliably is the problem of Distributed ID Generation.


Why ID Generation Becomes Hard in Distributed Systems

Imagine a system with 100 application servers.

If every server generates IDs locally:

Server A → ID 1
Server B → ID 1
Server C → ID 1

Now collisions happen.

Multiple servers generate duplicate IDs.

This creates severe issues:

  • Data corruption
  • Wrong records
  • Inconsistent references
  • Broken relationships between entities
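
The duplicate IDs described above can be reproduced with a minimal sketch. The `LocalCounter` class and the server names are hypothetical; they stand in for application servers that each keep their own in-memory counter with no coordination.

```typescript
// Hypothetical stand-in for an application server that numbers records
// with its own in-memory counter and no coordination.
class LocalCounter {
  private next = 1;
  constructor(readonly serverName: string) {}

  nextId(): number {
    return this.next++;
  }
}

const servers = ["A", "B", "C"].map((name) => new LocalCounter(name));

// Each server hands out two IDs; record which servers produced each value.
const producers = new Map<number, string[]>();
for (const server of servers) {
  for (let i = 0; i < 2; i++) {
    const id = server.nextId();
    producers.set(id, [...(producers.get(id) ?? []), server.serverName]);
  }
}

// Any ID produced by more than one server is a collision.
const collisions = Array.from(producers.entries()).filter(
  ([, names]) => names.length > 1,
);
console.log(collisions); // IDs 1 and 2 were each issued by servers A, B, and C
```

Every value is issued once per server, so the number of duplicates grows with the number of servers rather than shrinking.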

Distributed System Architecture Problem

Diagram
flowchart LR
    User1 --> Server1
    User2 --> Server2
    User3 --> Server3
    Server1 --> DB
    Server2 --> DB
    Server3 --> DB

Each server must generate IDs without coordination bottlenecks.


Key Requirements of Distributed ID Generators

| Property | Why It Matters |
| --- | --- |
| Uniqueness | Prevent collisions |
| Scalability | Handle millions of requests |
| Availability | Work even when nodes fail |
| Performance | Low latency ID generation |
| Time ordering | Useful for logs, analytics |
| Compactness | Smaller storage size |

Approaches to Distributed ID Generation

Several strategies exist.

| Method | Ordering | Scalability | Coordination |
| --- | --- | --- | --- |
| Database Auto Increment | Yes | Poor | Centralized |
| UUID | No | Excellent | None |
| Database Sequence with Sharding | Partial | Good | Moderate |
| Twitter Snowflake | Yes | Excellent | Minimal |
| Central ID Service | Yes | Medium | Centralized |

Let's explore them deeply.


1. Database Auto-Increment IDs

How It Works

The database generates IDs automatically.

CREATE TABLE users (
   -- MySQL requires an AUTO_INCREMENT column to be defined as a key
   id BIGINT AUTO_INCREMENT PRIMARY KEY,
   name VARCHAR(100)
);

Insert query:

INSERT INTO users(name) VALUES ('Alice');

Generated IDs:

1
2
3
4

Architecture

Diagram
flowchart LR
    App1 --> DB
    App2 --> DB
    App3 --> DB
    DB --> ID["Auto Increment ID Generator"]

Problems

| Issue | Explanation |
| --- | --- |
| Bottleneck | All requests hit one DB |
| Scalability | Limits horizontal scaling |
| Latency | DB round trip needed |
| Failure risk | DB outage blocks ID generation |

For large-scale systems, this approach does not scale.


2. UUID (Universally Unique Identifier)

UUID is a 128-bit identifier designed to be globally unique.

Example:

550e8400-e29b-41d4-a716-446655440000

UUID Generation

Generated locally without coordination.

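// Node.js built-in module; no external dependency is needed.
import { randomUUID } from "crypto"

// Generates a random (version 4) UUID string locally.
const id = randomUUID()
console.log(id)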

Structure (UUID v4)

Diagram
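flowchart LR
    RandomBits --> UUID["128-bit ID (122 random bits)"]

Of the 128 bits in a version 4 UUID, 122 are random; the remaining 6 are fixed version and variant bits.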

Advantages

| Advantage | Explanation |
| --- | --- |
| No coordination | Each node generates IDs independently |
| Highly scalable | Works across millions of nodes |
| Practically collision-free | Extremely low collision probability |
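
To put a number on "extremely low", the sketch below applies the standard birthday-bound approximation to version 4 UUIDs. The function name `approxCollisionProbability` is hypothetical, and the result is an estimate that assumes the random bits are uniformly distributed.

```typescript
// Birthday-bound approximation: p ≈ 1 - e^(-n^2 / (2N)) for n IDs drawn
// uniformly from N possible values.
// A version 4 UUID has 122 random bits (6 of the 128 bits are fixed).
const RANDOM_BITS = 122;

function approxCollisionProbability(
  idCount: number,
  randomBits: number = RANDOM_BITS,
): number {
  const space = Math.pow(2, randomBits);
  // -expm1(-x) equals 1 - e^(-x) without losing precision for tiny x.
  return -Math.expm1(-(idCount * idCount) / (2 * space));
}

console.log(approxCollisionProbability(1e9));  // roughly 9.4e-20
console.log(approxCollisionProbability(1e12)); // roughly 9.4e-14
```

Even after a trillion generated IDs, the estimated chance of any collision stays far below one in a trillion.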

Problems

| Problem | Explanation |
| --- | --- |
| Large size | 128-bit storage |
| Poor indexing | Random order harms DB indexes |
| No ordering | Cannot determine creation time |

In high-scale databases, random UUIDs used as primary keys scatter inserts across the whole index and fragment it heavily.


3. Centralized ID Generation Service

Another approach is creating a dedicated ID generation service.

Architecture:

Diagram
flowchart LR
    App1 --> IDService
    App2 --> IDService
    App3 --> IDService
    IDService --> DB

Each request:

GET /generate-id

Response:

983482394

Advantages

| Advantage | Explanation |
| --- | --- |
| Central control | Easy management |
| Ordered IDs | Sequential generation |

Problems

| Problem | Explanation |
| --- | --- |
| Single point of failure | Service outage blocks system |
| Scalability limits | High load on ID service |
| Network latency | Extra API call |

Large systems generally avoid putting a single ID service on the path of every write.


4. Twitter Snowflake ID Generator

Snowflake, developed by Twitter, is one of the best-known distributed ID generators.

Key Idea

Generate IDs using:

Timestamp + Machine ID + Sequence

This ensures:

  • uniqueness
  • ordering
  • scalability

Snowflake ID Structure

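64-bit integer:

| Bits | Component |
| --- | --- |
| 1 bit | Unused sign bit (always 0) |
| 41 bits | Timestamp (milliseconds since a custom epoch) |
| 10 bits | Machine ID |
| 12 bits | Sequence Number |

Visual Structure

Diagram
flowchart LR
    Sign["1 bit\nUnused"]
    Timestamp["41 bits\nTimestamp"]
    Machine["10 bits\nMachine ID"]
    Sequence["12 bits\nSequence"]
    Sign --> ID
    Timestamp --> ID
    Machine --> ID
    Sequence --> ID
    ID["64-bit Snowflake ID"]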

Example ID

154742918274918

Internally contains:

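timestamp = creation time (milliseconds since the generator's epoch)
machine id = server that generated the ID
sequence = counter of IDs issued by that server within the same millisecond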
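
The three fields can be recovered from an ID with shifts and masks. The following is a minimal sketch that assumes the 41/10/12 bit layout described above; `decodeSnowflake` and `SnowflakeParts` are hypothetical names, and the timestamp it returns is relative to whatever epoch the generator used.

```typescript
// Field widths from the layout above: 41-bit timestamp, 10-bit machine ID,
// 12-bit sequence (the top bit is unused).
const MACHINE_BITS = 10n;
const SEQUENCE_BITS = 12n;

interface SnowflakeParts {
  timestampMs: bigint; // milliseconds since the generator's custom epoch
  machineId: bigint;
  sequence: bigint;
}

function decodeSnowflake(id: bigint): SnowflakeParts {
  const sequenceMask = (1n << SEQUENCE_BITS) - 1n;
  const machineMask = (1n << MACHINE_BITS) - 1n;
  return {
    // Lowest 12 bits.
    sequence: id & sequenceMask,
    // Next 10 bits.
    machineId: (id >> SEQUENCE_BITS) & machineMask,
    // Everything above the lower 22 bits.
    timestampMs: id >> (SEQUENCE_BITS + MACHINE_BITS),
  };
}

// Decode the example ID shown above.
console.log(decodeSnowflake(154742918274918n));
```

`bigint` is used because a full 64-bit ID does not fit in a JavaScript `number` without losing precision.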

How Snowflake Generates IDs

Steps:

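  1. Get the current timestamp in milliseconds
  2. Use the server's machine ID
  3. Increment the sequence number if the timestamp matches the previous ID's; otherwise reset it to 0
  4. Combine the three fields with bit shifts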
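
The steps above can be sketched as a small generator class. This is an illustrative sketch, not Twitter's implementation: `SnowflakeGenerator`, the `EPOCH_MS` value, and the injectable `now` clock are assumptions made for the example.

```typescript
// Hypothetical custom epoch (2020-01-01T00:00:00Z); real deployments pick their own.
const EPOCH_MS = 1577836800000n;
const MACHINE_BITS = 10n;
const SEQUENCE_BITS = 12n;
const MAX_MACHINE_ID = (1n << MACHINE_BITS) - 1n; // 1023
const MAX_SEQUENCE = (1n << SEQUENCE_BITS) - 1n; // 4095

class SnowflakeGenerator {
  private lastTimestamp = -1n;
  private sequence = 0n;
  private readonly machineId: bigint;

  // `now` is injectable so the clock can be replaced in tests.
  constructor(machineId: number, private readonly now: () => number = Date.now) {
    const id = BigInt(machineId);
    if (id < 0n || id > MAX_MACHINE_ID) {
      throw new RangeError(`machineId must be in [0, ${MAX_MACHINE_ID}]`);
    }
    this.machineId = id;
  }

  nextId(): bigint {
    let timestamp = BigInt(this.now()); // step 1
    if (timestamp < this.lastTimestamp) {
      throw new Error("clock moved backwards; refusing to generate IDs");
    }
    if (timestamp === this.lastTimestamp) {
      // Step 3: same millisecond, so advance the sequence.
      this.sequence = (this.sequence + 1n) & MAX_SEQUENCE;
      if (this.sequence === 0n) {
        // Sequence exhausted for this millisecond: spin until the next one.
        while (timestamp <= this.lastTimestamp) timestamp = BigInt(this.now());
      }
    } else {
      this.sequence = 0n; // new millisecond, restart the sequence
    }
    this.lastTimestamp = timestamp;
    // Step 4: timestamp in the high bits, then machine ID (step 2), then sequence.
    return (
      ((timestamp - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS)) |
      (this.machineId << SEQUENCE_BITS) |
      this.sequence
    );
  }
}

const generator = new SnowflakeGenerator(7);
const first = generator.nextId();
const second = generator.nextId();
console.log(first < second); // true
```

Within one generator, each ID is strictly greater than the previous one; across machines, IDs are only roughly ordered because the clocks are not perfectly aligned.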

Snowflake Generation Flow

Diagram
flowchart TD
    Request --> Timestamp
    Timestamp --> MachineID
    MachineID --> Sequence
    Sequence --> Combine
    Combine --> GeneratedID

Benefits

| Feature | Explanation |
| --- | --- |
| Distributed | Each node generates IDs locally |
| Ordered | IDs roughly follow time |
| Fast | No network calls |
| Compact | Only 64 bits |

Capacity

Snowflake supports:

| Metric | Capacity |
| --- | --- |
| Machines | 1024 (2^10) |
| IDs per machine per millisecond | 4096 (2^12) |
| IDs per machine per second | About 4.1 million (4096 × 1000) |
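
These capacity figures follow directly from the bit widths. The arithmetic below is a sketch of theoretical upper bounds under the 41/10/12 layout; the constant names are illustrative, and the cluster-wide figure assumes every one of the 1024 machines runs at its limit.

```typescript
const TIMESTAMP_BITS = 41;
const MACHINE_BITS = 10;
const SEQUENCE_BITS = 12;

const maxMachines = 2 ** MACHINE_BITS; // 1024
const idsPerMachinePerMs = 2 ** SEQUENCE_BITS; // 4096
const idsPerMachinePerSecond = idsPerMachinePerMs * 1000; // 4,096,000
const idsPerSecondCluster = idsPerMachinePerSecond * maxMachines; // 4,194,304,000

// A 41-bit millisecond counter overflows after 2^41 ms.
const msPerYear = 1000 * 60 * 60 * 24 * 365.25;
const timestampYears = 2 ** TIMESTAMP_BITS / msPerYear; // about 69.7

console.log({ maxMachines, idsPerMachinePerSecond, idsPerSecondCluster, timestampYears });
```

The last figure is why Snowflake-style generators count from a custom epoch: the timestamp field covers roughly 69.7 years from whatever start date is chosen.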

Real-World Systems Using Snowflake-like IDs

Many companies built similar systems.

| Company | System |
| --- | --- |
| Twitter | Snowflake |
| Instagram | Sharded ID generation |
| Discord | Snowflake variant |
| Sony | Sonyflake |

5. Database Sequence with Sharding

Another approach:

Split sequences across shards.

Example:

Server1 generates:

1,4,7,10

Server2 generates:

2,5,8,11

Server3 generates:

3,6,9,12
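
The interleaving above amounts to giving each shard its own starting offset and a step equal to the number of shards. The `ShardedSequence` class below is a hypothetical in-memory sketch of that rule; in a real deployment the counter would live in each shard's database (MySQL, for instance, exposes `auto_increment_offset` and `auto_increment_increment` for this purpose).

```typescript
// Each shard starts at its own offset and advances by the total shard count,
// so the sequences never overlap.
class ShardedSequence {
  private current: number;

  constructor(shardIndex: number, private readonly shardCount: number) {
    if (!Number.isInteger(shardIndex) || shardIndex < 1 || shardIndex > shardCount) {
      throw new RangeError("shardIndex must be an integer in [1, shardCount]");
    }
    this.current = shardIndex;
  }

  nextId(): number {
    const id = this.current;
    this.current += this.shardCount;
    return id;
  }
}

const shardCount = 3;
const shards = [1, 2, 3].map((index) => new ShardedSequence(index, shardCount));
for (const shard of shards) {
  console.log([shard.nextId(), shard.nextId(), shard.nextId(), shard.nextId()]);
}
// Shard 1 yields 1, 4, 7, 10; shard 2 yields 2, 5, 8, 11; shard 3 yields 3, 6, 9, 12
```

One consequence of this scheme is that the shard count is baked into the step, so adding a shard later requires re-planning the offsets.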

Architecture

Diagram
flowchart LR
    App1 --> DB1
    App2 --> DB2
    App3 --> DB3
    DB1 --> Seq1
    DB2 --> Seq2
    DB3 --> Seq3

Trade-offs

| Advantage | Disadvantage |
| --- | --- |
| Simple | Still DB dependent |
| Ordered within shard | Not globally ordered |

ID Ordering and Database Indexing

Ordering matters because most relational databases store primary-key indexes as B-trees, which keep keys in sorted order.

Sequential IDs:

1 → 2 → 3 → 4

Work efficiently.

Random IDs:

A23 → 7FF → 91A

Cause:

  • page splits
  • index fragmentation
  • slower inserts

Snowflake-style IDs largely avoid this: because new IDs are roughly time ordered, inserts land near the end of the index.
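
The difference can be illustrated with a toy model. The sketch below is not a B-tree; it uses a sorted array and counts how many keys must be inserted somewhere other than the end, as a rough proxy for the inserts that cause page splits. The function name `countMiddleInserts` and the small pseudo-random generator are assumptions made for the example.

```typescript
// Counts how many keys land in the middle of a sorted array rather than
// being appended at the end.
function countMiddleInserts(keys: number[]): number {
  const sorted: number[] = [];
  let middleInserts = 0;
  for (const key of keys) {
    // Binary search for the first position whose value is greater than key.
    let lo = 0;
    let hi = sorted.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (sorted[mid] <= key) lo = mid + 1;
      else hi = mid;
    }
    if (lo < sorted.length) middleInserts++;
    sorted.splice(lo, 0, key);
  }
  return middleInserts;
}

const sequentialKeys = Array.from({ length: 1000 }, (_, i) => i + 1);

// Deterministic pseudo-random keys from a small linear congruential generator.
let state = 42;
const randomKeys = Array.from({ length: 1000 }, () => {
  state = (state * 1664525 + 1013904223) % 4294967296;
  return state;
});

console.log(countMiddleInserts(sequentialKeys)); // 0: every insert is an append
console.log(countMiddleInserts(randomKeys)); // most inserts land mid-structure
```

Sequential keys always append, while random keys almost always land between existing entries, which is the access pattern that fragments a real index.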


Global ID Generation Architecture

Typical distributed architecture:

Diagram
flowchart LR
    UserRequests --> AppServer1
    UserRequests --> AppServer2
    UserRequests --> AppServer3
    AppServer1 --> Snowflake1
    AppServer2 --> Snowflake2
    AppServer3 --> Snowflake3
    Snowflake1 --> DB
    Snowflake2 --> DB
    Snowflake3 --> DB

Each server generates IDs locally.


Failure Handling

Edge cases:

Clock Drift

If the system clock moves backward, a Snowflake generator can reuse a timestamp it has already issued IDs for and produce duplicates.

Solutions:

  • NTP synchronization
  • wait until time catches up
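
The "wait until time catches up" option can be sketched as a small guard that runs before each ID is generated. `waitForClock` and its `maxWaitMs` tolerance are hypothetical; the policy shown (wait out small jumps, reject large ones) is one reasonable choice, not the only one.

```typescript
// Returns a timestamp that is never smaller than lastTimestamp.
// Small backward jumps are waited out; large ones are rejected.
function waitForClock(
  lastTimestamp: number,
  now: () => number = Date.now,
  maxWaitMs: number = 5, // hypothetical tolerance
): number {
  let current = now();
  if (lastTimestamp - current > maxWaitMs) {
    throw new Error(`clock moved backwards by ${lastTimestamp - current} ms`);
  }
  while (current < lastTimestamp) {
    current = now(); // busy-wait; a real service might sleep instead
  }
  return current;
}

// Usage with a fake clock that starts 3 ms behind the last issued timestamp.
const fakeTicks = [100, 101, 102, 105];
let tick = 0;
const fakeNow = () => fakeTicks[Math.min(tick++, fakeTicks.length - 1)];
console.log(waitForClock(103, fakeNow)); // 105
```

Rejecting large jumps keeps a badly skewed node from stalling requests for seconds or minutes while it waits.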

Machine ID Conflicts

If two machines use the same machine ID, they can generate identical IDs within the same millisecond.

Solutions:

  • assign IDs via configuration
  • service discovery
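
For the configuration-based option, a startup check can catch bad or duplicated machine IDs before any IDs are issued. The sketch below assumes a 10-bit machine ID field; `parseMachineId`, `findDuplicateMachineIds`, and the `MACHINE_ID` variable mentioned in the comment are hypothetical names.

```typescript
const MAX_MACHINE_ID = 1023; // largest value that fits in a 10-bit field

// Parses a machine ID from a configuration value, e.g. a hypothetical
// MACHINE_ID environment variable, and rejects anything out of range.
function parseMachineId(raw: string | undefined): number {
  if (raw === undefined || !/^\d+$/.test(raw.trim())) {
    throw new Error("machine ID is missing or not a non-negative integer");
  }
  const id = Number(raw.trim());
  if (id > MAX_MACHINE_ID) {
    throw new RangeError(`machine ID ${id} exceeds ${MAX_MACHINE_ID}`);
  }
  return id;
}

// Detects duplicates in a statically configured assignment (host -> machine ID).
function findDuplicateMachineIds(assignments: Record<string, number>): number[] {
  const seen = new Set<number>();
  const duplicates = new Set<number>();
  for (const id of Object.values(assignments)) {
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return Array.from(duplicates);
}

console.log(parseMachineId("42")); // 42
console.log(findDuplicateMachineIds({ "host-a": 1, "host-b": 2, "host-c": 1 })); // [1]
```

Static checks like this only cover known hosts; dynamically scaled fleets usually need the machine ID to be leased from a coordination service instead.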

Comparison of ID Generation Strategies

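| Method | Ordered | Distributed | Performance | Storage |
| --- | --- | --- | --- | --- |
| Auto Increment | Yes | No | Slow | Small |
| UUID | No | Yes | Fast | Large |
| Central Service | Yes | Limited | Medium | Small |
| Snowflake | Yes | Yes | Very Fast | Small |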

Snowflake-style generators are a widely adopted choice for high-scale systems today.


Best Practices

Prefer 64-bit IDs

Efficient storage and indexing.


Avoid Random UUIDs in Databases

Used as primary keys, they cause index fragmentation.


Use Snowflake-like systems

For:

  • distributed microservices
  • high throughput platforms

Ensure Clock Synchronization

Critical for time-based IDs.


Final Architecture Summary

Diagram
flowchart TD
    Requests --> ServiceCluster
    ServiceCluster --> Node1
    ServiceCluster --> Node2
    ServiceCluster --> Node3
    Node1 --> IDGenerator1
    Node2 --> IDGenerator2
    Node3 --> IDGenerator3
    IDGenerator1 --> Database
    IDGenerator2 --> Database
    IDGenerator3 --> Database

Every node generates unique, roughly time-ordered IDs without per-request coordination.


Key Takeaways

| Concept | Insight |
| --- | --- |
| Distributed ID generation | Essential for scalable systems |
| Auto increment | Not suitable for distributed architecture |
| UUID | Highly scalable but poor indexing |
| Snowflake | Best balance of ordering and scalability |
| Time-based IDs | Improve database performance |

Distributed ID generation is a fundamental building block of scalable architectures, enabling large platforms to create billions of records reliably without coordination bottlenecks.