Lesson 07 • Cracking the System Design Interview • 9 min read

Design a Ride-Sharing App (Uber)

April 09, 2026

TL;DR

Design Uber's core system: use GeoHash/QuadTree for spatial indexing of millions of drivers, a matching service that ranks by ETA and rating, WebSockets for real-time tracking, and surge pricing driven by supply-demand ratios.

Ride-sharing platforms like Uber and Lyft process millions of rides daily. Behind the seamless “tap a button, get a car” experience lies a distributed system that must locate nearby drivers in milliseconds, match them intelligently, track vehicles in real time, and compute fares dynamically. This lesson walks through the architecture that makes it all work.

Understanding the Problem

Before we draw a single box, let’s pin down what the system must do and how well it must do it.

Functional Requirements

  1. Request a ride — Rider specifies pickup and drop-off locations
  2. Match with a nearby driver — Find available drivers and assign the best one
  3. Real-time tracking — Both rider and driver see each other’s location live
  4. Fare estimation — Show estimated cost before confirming
  5. Payment processing — Charge rider, pay driver (minus commission)
  6. Ride history — Past trips for both riders and drivers
  7. Ratings — Mutual rating after each ride (1-5 stars)
  8. Surge pricing — Increase fares during high-demand periods

Non-Functional Requirements

Requirement                  Target
Matching latency             < 30 seconds from request to driver assignment
Location update frequency    Every 3-4 seconds per active driver
System availability          99.99% (riders stranded = bad)
ETA accuracy                 Within 2 minutes of actual arrival
Scale                        5 million active drivers, 20 million rides/day

A back-of-the-envelope calculation: 5 million drivers sending location updates every 4 seconds = 1.25 million writes/second to the location store alone. This is the defining scalability challenge.
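
A quick sanity check on that arithmetic, spelled out:

active_drivers   = 5_000_000   # scale target from the table above
update_period_s  = 4           # one GPS ping per driver every ~4 seconds

writes_per_second = active_drivers / update_period_s
print(f"{writes_per_second:,.0f} location writes/sec")   # 1,250,000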

Core Entities and APIs

Data Model

-- Core entities
CREATE TABLE riders (
    id          UUID PRIMARY KEY,
    name        VARCHAR(100),
    email       VARCHAR(255) UNIQUE,
    phone       VARCHAR(20),
    rating_avg  DECIMAL(2,1) DEFAULT 5.0,
    created_at  TIMESTAMP DEFAULT NOW()
);

CREATE TABLE drivers (
    id              UUID PRIMARY KEY,
    name            VARCHAR(100),
    vehicle_type    VARCHAR(20),  -- 'economy', 'premium', 'xl'
    license_plate   VARCHAR(20),
    rating_avg      DECIMAL(2,1) DEFAULT 5.0,
    is_available    BOOLEAN DEFAULT false,
    current_lat     DECIMAL(10,7),
    current_lng     DECIMAL(10,7),
    updated_at      TIMESTAMP
);

CREATE TABLE rides (
    id              UUID PRIMARY KEY,
    rider_id        UUID REFERENCES riders(id),
    driver_id       UUID REFERENCES drivers(id),
    status          VARCHAR(20),  -- MATCHING, EN_ROUTE, IN_PROGRESS, COMPLETED, CANCELLED
    pickup_lat      DECIMAL(10,7),
    pickup_lng      DECIMAL(10,7),
    dropoff_lat     DECIMAL(10,7),
    dropoff_lng     DECIMAL(10,7),
    fare_estimate   DECIMAL(10,2),
    fare_actual     DECIMAL(10,2),
    surge_multiplier DECIMAL(3,2) DEFAULT 1.00,
    requested_at    TIMESTAMP,
    started_at      TIMESTAMP,
    completed_at    TIMESTAMP
);

CREATE TABLE ratings (
    id          UUID PRIMARY KEY,
    ride_id     UUID REFERENCES rides(id),
    from_user   UUID,
    to_user     UUID,
    score       SMALLINT CHECK (score BETWEEN 1 AND 5),
    comment     TEXT
);

API Design

# Rider APIs
POST   /api/v1/rides/estimate
  body: { pickup: {lat, lng}, dropoff: {lat, lng}, vehicle_type }
  response: { fare_min, fare_max, surge_multiplier, eta_minutes }

POST   /api/v1/rides
  body: { pickup: {lat, lng}, dropoff: {lat, lng}, vehicle_type, payment_method_id }
  response: { ride_id, status: "MATCHING", eta_seconds }

GET    /api/v1/rides/{ride_id}
  response: { ride_id, status, driver: {name, vehicle, lat, lng}, eta }

DELETE /api/v1/rides/{ride_id}
  response: { cancellation_fee }

POST   /api/v1/rides/{ride_id}/rate
  body: { score: 5, comment: "Great driver" }

# Driver APIs
PUT    /api/v1/drivers/me/location
  body: { lat, lng, heading, speed }

PUT    /api/v1/drivers/me/availability
  body: { is_available: true }

POST   /api/v1/rides/{ride_id}/accept
POST   /api/v1/rides/{ride_id}/decline
POST   /api/v1/rides/{ride_id}/start
POST   /api/v1/rides/{ride_id}/complete

High-Level Design

The system decomposes into six core services, each owning a clear domain.

[Figure: high-level architecture. Rider and Driver apps connect through an API Gateway to the Ride, Matching, Location, ETA, Payment, and Notification services, with Redis for driver locations, PostgreSQL for trips, and Kafka for events.]

Ride Service orchestrates the trip lifecycle: create, match, start, complete, cancel. It’s the single source of truth for ride state.

Location Service ingests millions of driver location updates per second. It maintains a geospatial index (GeoHash in Redis) so the matching service can query “find all available drivers within 3km of this point” in under 10ms.

Matching Service takes a ride request, queries nearby drivers from the location service, ranks them, and sends ride offers. If a driver declines or times out, it moves to the next candidate.

ETA Service integrates with routing/map data (OSRM or Google Maps) to estimate travel time. This feeds into fare estimation and driver ranking.

Payment Service handles fare calculation, payment processing, and driver payouts. It integrates with Stripe or similar payment gateways.

Notification Service pushes real-time updates to riders and drivers via WebSockets, falling back to push notifications when the app is backgrounded.

Real-Time Communication

REST APIs handle request-response flows, but real-time tracking needs persistent connections:

Rider App  ←→  WebSocket Server  ←→  Location Service
Driver App ←→  WebSocket Server  ←→  Location Service

When a driver’s location updates, the WebSocket server pushes the new position to the matched rider’s connection. During an active ride, this happens every 3-4 seconds, giving the smooth “car moving on map” experience.

Deep Dive: Geospatial Indexing

The most critical data structure question in this design: how do you efficiently answer “find all drivers within 3km of point (lat, lng)”?

[Figure: QuadTree vs. GeoHash spatial indexing. A QuadTree adaptively subdivides dense areas, while GeoHash uses fixed grid cells whose string prefixes encode proximity.]

Option 1: GeoHash (the Recommended Choice)

GeoHash encodes a latitude/longitude pair into a string. Points that are geographically close share a common prefix:

San Francisco downtown: 9q8yy
San Francisco Mission:  9q8yz
New York Times Square:  dr5ru

The key insight: string prefix = spatial proximity. To find nearby drivers, you search for all GeoHash strings sharing a prefix with the rider’s location.
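
To make the prefix idea concrete, here is a minimal sketch of the standard base32 GeoHash encoding (illustrative only; real systems use a library or Redis itself). The printed prefixes match the examples above:

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lng, precision=5):
    """Encode a lat/lng pair as a GeoHash string by interleaving longitude/latitude bits."""
    lat_range, lng_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, bit_count, even = [], 0, 0, True   # even bit -> longitude, odd -> latitude
    while len(chars) < precision:
        rng, val = (lng_range, lng) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        bits <<= 1
        if val >= mid:
            bits |= 1
            rng[0] = mid          # keep the upper half of the range
        else:
            rng[1] = mid          # keep the lower half of the range
        even, bit_count = not even, bit_count + 1
        if bit_count == 5:        # every 5 bits become one base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))   # downtown San Francisco -> 9q8yy
print(geohash_encode(40.7580, -73.9855))    # Times Square, New York -> dr5ru

In production you would not hand-roll this; Redis supports GeoHash-backed commands natively: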

# Redis stores driver locations with GEOADD
# GEOADD key longitude latitude member
GEOADD drivers:available -122.4194 37.7749 "driver_abc"
GEOADD drivers:available -122.4089 37.7839 "driver_xyz"

# Find drivers within 3km radius
GEOSEARCH drivers:available
  FROMLONLAT -122.4194 37.7749
  BYRADIUS 3 km
  ASC                    # sort by distance
  COUNT 20               # limit results
  WITHCOORD WITHDIST     # include coordinates and distance

Redis’s GEOSEARCH is backed by a sorted set whose scores are GeoHash-encoded, giving O(N + log(M)) performance, where N is the number of members in the bounding box around the search area and M is the total number of members in the set.
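
The same radius query from application code, as the matching service would issue it. This sketch uses redis-py's generic execute_command so it does not depend on a specific client version (newer redis-py releases also expose a geosearch() helper); the host and key names are assumptions:

import redis

r = redis.Redis(host="localhost", port=6379)

def find_nearby_drivers(lat, lng, radius_km=3, limit=20):
    """Return up to `limit` available drivers within radius_km, closest first."""
    return r.execute_command(
        "GEOSEARCH", "drivers:available",
        "FROMLONLAT", lng, lat,
        "BYRADIUS", radius_km, "km",
        "ASC", "COUNT", limit,
        "WITHCOORD", "WITHDIST",
    )

# e.g. find_nearby_drivers(37.7749, -122.4194)
# -> [[b'driver_abc', b'1.2043', [lon, lat]], ...]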

Why GeoHash wins for Uber:

  • Trivially sharded — partition by GeoHash prefix (city/region)
  • Redis native support — no custom data structures needed
  • Simple neighbor queries — check the 8 adjacent cells for edge cases
  • Horizontal scaling — add Redis nodes per region as demand grows

Option 2: QuadTree

A QuadTree recursively subdivides 2D space into four quadrants. Dense areas (downtown Manhattan) get subdivided further; sparse areas (rural Wyoming) stay as large cells.

class Rectangle:
    """Axis-aligned box: center (cx, cy), half-width w, half-height h."""
    def __init__(self, cx, cy, w, h):
        self.cx, self.cy, self.w, self.h = cx, cy, w, h

    def contains(self, point):
        px, py = point                      # point is an (x, y) tuple
        return (self.cx - self.w <= px <= self.cx + self.w and
                self.cy - self.h <= py <= self.cy + self.h)

    def intersects(self, other):
        return not (other.cx - other.w > self.cx + self.w or
                    other.cx + other.w < self.cx - self.w or
                    other.cy - other.h > self.cy + self.h or
                    other.cy + other.h < self.cy - self.h)

class QuadTree:
    def __init__(self, boundary, capacity=10):
        self.boundary = boundary   # Rectangle covering this node's region
        self.capacity = capacity   # Max points before split
        self.drivers = []
        self.divided = False
        self.nw = self.ne = self.sw = self.se = None

    def _subdivide(self):
        """Split this node into four equally sized child quadrants."""
        cx, cy = self.boundary.cx, self.boundary.cy
        w, h = self.boundary.w / 2, self.boundary.h / 2
        self.nw = QuadTree(Rectangle(cx - w, cy + h, w, h), self.capacity)
        self.ne = QuadTree(Rectangle(cx + w, cy + h, w, h), self.capacity)
        self.sw = QuadTree(Rectangle(cx - w, cy - h, w, h), self.capacity)
        self.se = QuadTree(Rectangle(cx + w, cy - h, w, h), self.capacity)
        self.divided = True

    def insert(self, driver):
        """driver.location is an (x, y) tuple; returns True if stored in this subtree."""
        if not self.boundary.contains(driver.location):
            return False
        if len(self.drivers) < self.capacity:
            self.drivers.append(driver)
            return True
        if not self.divided:
            self._subdivide()
        return (self.nw.insert(driver) or self.ne.insert(driver) or
                self.sw.insert(driver) or self.se.insert(driver))

    def query_range(self, search_area):
        """Find all drivers within a search rectangle."""
        results = []
        if not self.boundary.intersects(search_area):
            return results
        for driver in self.drivers:
            if search_area.contains(driver.location):
                results.append(driver)
        if self.divided:
            results.extend(self.nw.query_range(search_area))
            results.extend(self.ne.query_range(search_area))
            results.extend(self.sw.query_range(search_area))
            results.extend(self.se.query_range(search_area))
        return results

QuadTrees are elegant but harder to distribute across servers. They’re better suited for in-memory, single-node scenarios. Uber historically used a custom S2-based approach (Google’s S2 geometry library), which combines the best of both worlds: hierarchical 64-bit cell IDs whose prefixes behave like GeoHash prefixes.

Option 3: S2 Cells (Google’s S2 Geometry)

S2 projects Earth’s surface onto a cube and recursively subdivides each face into cells at different levels. Each cell has a 64-bit ID. Like GeoHash, nearby cells share prefixes. Unlike GeoHash, S2 cells have no edge discontinuities across poles or the antimeridian.

For this interview, GeoHash + Redis is the recommended answer — it’s well-understood, production-proven, and Redis provides native support.

Deep Dive: Driver Matching Algorithm

When a rider requests a ride, the matching service must pick the optimal driver. This is not simply “closest driver.”

[Figure: driver matching sequence. Ride request, find nearby drivers via GeoHash, rank by ETA and rating, send the offer to the top driver, handle accept or decline, confirm the ride.]

The Ranking Formula

# eta_service is the ETA Service client; vehicle_match encapsulates the
# ride-type rules. Both are assumed to be provided elsewhere.

def normalize(eta_seconds, max_eta=600):
    """Map an ETA to a 0-1 score where a lower ETA yields a higher score."""
    return 1.0 - min(eta_seconds, max_eta) / max_eta

def rank_drivers(rider_location, candidate_drivers, ride_details):
    """
    Score each driver. Higher score = better match.
    """
    scored = []
    for driver in candidate_drivers:
        eta = eta_service.get_eta(driver.location, rider_location)

        score = (
            W_ETA     * normalize(eta, max_eta=600)            +  # 0-1, lower ETA = higher
            W_RATING  * (driver.rating / 5.0)                  +  # 0-1
            W_ACCEPT  * driver.acceptance_rate                 +  # 0-1
            W_VEHICLE * vehicle_match(driver, ride_details)       # 1 if match, 0.5 if upgrade
        )
        scored.append((driver, score, eta))

    # Sort descending by score
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored

# Typical weights (tuned via A/B testing)
W_ETA     = 0.50   # ETA dominates — riders want fast pickup
W_RATING  = 0.20   # Quality matters
W_ACCEPT  = 0.20   # Prefer drivers who accept (fewer re-matches)
W_VEHICLE = 0.10   # Vehicle type match

The Offer Cascade

The matching service doesn’t blast the ride to all nearby drivers simultaneously. It uses a cascading offer pattern:

  1. Send the ride offer to the top-ranked driver
  2. Start a 15-second timer
  3. If the driver accepts → match confirmed, notify rider
  4. If the driver declines or times out → move to the next driver
  5. Repeat up to 5 attempts
  6. If all 5 fail → expand the search radius and try again
  7. After 60 seconds with no match → notify rider “no drivers available”

This approach prevents the “thundering herd” problem where all drivers receive the same request and only one wins.
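
A compact sketch of that cascade. The send_offer and wait_for_response callables stand in for the real notification and response-collection plumbing, which this lesson does not pin down:

from typing import Callable, Optional

OFFER_TIMEOUT_S = 15   # per-driver decision window
MAX_ATTEMPTS = 5       # candidates tried before expanding the search radius

def dispatch_ride(ride_id: str,
                  ranked_drivers: list,            # output of rank_drivers()
                  send_offer: Callable,            # pushes an offer to one driver
                  wait_for_response: Callable) -> Optional[object]:
    """Offer the ride to one driver at a time, best-ranked first."""
    for driver, score, eta in ranked_drivers[:MAX_ATTEMPTS]:
        send_offer(driver, ride_id)
        answer = wait_for_response(driver, ride_id, timeout_s=OFFER_TIMEOUT_S)
        if answer == "ACCEPTED":
            return driver              # match confirmed; caller notifies the rider
        # DECLINED or timed out: quietly move on to the next candidate
    return None                        # caller expands the radius, then gives up at ~60s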

Deep Dive: Real-Time Location Tracking at Scale

5 million active drivers, each sending GPS coordinates every 4 seconds. That’s 1.25 million location writes per second. Here’s how to handle it.

Write Path

Driver App → API Gateway → Location Service → Redis (GEOADD)
                                            → Kafka (location-updates topic)

The location service is stateless. It receives PUT /drivers/me/location and, as sketched below, does two things:

  1. Updates the driver’s position in Redis with GEOADD
  2. Publishes the update to Kafka for downstream consumers (analytics, ETA model training, surge pricing)
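
A minimal sketch of that two-step handler, assuming redis-py (4.x) and kafka-python; the hosts are placeholders, the topic name and key layout mirror the diagram above, and error handling is omitted:

import json
import time

import redis
from kafka import KafkaProducer

r = redis.Redis(host="localhost", port=6379)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def handle_location_update(driver_id, lat, lng, heading, speed):
    """PUT /drivers/me/location: update the geo index, then fan out to Kafka."""
    # 1. Update the driver's position in the geospatial index (longitude comes first)
    r.geoadd("drivers:available", (lng, lat, driver_id))

    # 2. Publish for downstream consumers (analytics, ETA training, surge pricing)
    producer.send("location-updates", value={
        "driver_id": driver_id, "lat": lat, "lng": lng,
        "heading": heading, "speed": speed, "ts": time.time(),
    })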

Why Redis Handles This

A single Redis instance can handle ~100K GEOADD operations per second. With 13 Redis shards (partitioned by geographic region), we comfortably handle 1.3M writes/second.

Region Shard Map:
  us-west    → redis-geo-01, redis-geo-02
  us-east    → redis-geo-03, redis-geo-04
  eu-west    → redis-geo-05, redis-geo-06
  southeast-asia → redis-geo-07, redis-geo-08
  ...

Each shard has a hot standby replica for failover. If a shard goes down, the replica promotes within seconds.

Read Path (During Active Ride)

When a rider is tracking their driver:

Location update from driver
  → Redis GEOADD
  → Publish to "ride:{ride_id}:location" channel
  → WebSocket server pushes to rider's connection
  → Rider's map updates

Latency from driver GPS to rider’s screen: under 500ms typically.
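
A minimal sketch of that fan-out, assuming redis-py's asyncio client and the websockets package (handler signatures differ slightly across websockets versions, and the "first frame names the ride" protocol is an assumption):

import asyncio
import json

import redis.asyncio as aioredis
import websockets

r = aioredis.Redis(host="localhost", port=6379)

async def stream_ride_location(websocket, ride_id):
    """Forward every location published on the ride's channel to the rider's socket."""
    pubsub = r.pubsub()
    await pubsub.subscribe(f"ride:{ride_id}:location")
    async for message in pubsub.listen():
        if message["type"] == "message":
            await websocket.send(message["data"].decode())   # JSON: {"lat": ..., "lng": ...}

async def handler(websocket):
    # First frame from the rider app says which ride it is tracking
    hello = json.loads(await websocket.recv())
    await stream_ride_location(websocket, hello["ride_id"])

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()   # serve forever

if __name__ == "__main__":
    asyncio.run(main())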

Deep Dive: Surge Pricing

Surge pricing balances supply and demand. When demand exceeds supply in an area, prices increase, which simultaneously:

  • Encourages more drivers to go online (supply increases)
  • Discourages marginal ride requests (demand decreases)

Algorithm

MAX_SURGE = 5.0   # hard cap on the multiplier

def compute_surge_multiplier(geohash_cell):
    """
    Compute surge for a geographic cell.
    Run every 2 minutes per cell.
    count_requests / count_available_drivers are assumed helpers
    backed by the Kafka event stream and the Redis geo index.
    """
    # Count active ride requests in this cell (last 5 min)
    demand = count_requests(geohash_cell, window_minutes=5)

    # Count available drivers in this cell right now
    supply = count_available_drivers(geohash_cell)

    if supply == 0:
        return MAX_SURGE  # 5.0x cap

    ratio = demand / supply

    # Piecewise multiplier
    if ratio <= 1.0:
        return 1.0           # No surge
    elif ratio <= 2.0:
        return 1.0 + (ratio - 1.0) * 0.5   # 1.0x - 1.5x
    elif ratio <= 3.0:
        return 1.5 + (ratio - 2.0) * 1.0   # 1.5x - 2.5x
    else:
        return min(2.5 + (ratio - 3.0) * 0.5, MAX_SURGE)  # Up to 5.0x

Surge pricing is computed at the GeoHash cell level (the 5-character cells shown below are roughly 5 km on a side, about neighborhood scale). A Kafka consumer reads ride request events and driver availability updates, recomputes the multiplier every 2 minutes, and stores it in Redis:

HSET surge:multipliers "9q8yy" "1.8"
HSET surge:multipliers "9q8yz" "1.0"

The fare estimation API reads this value when computing the estimate shown to the rider.
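
A hedged sketch of how the estimate endpoint could combine that multiplier with distance and time rates. The rate constants and the minimum-fare rule are illustrative assumptions, not Uber's actual pricing:

import redis

r = redis.Redis(host="localhost", port=6379)

BASE_FARE = 2.50    # flag drop (illustrative)
PER_KM    = 1.20
PER_MIN   = 0.35
MIN_FARE  = 7.00

def estimate_fare(pickup_cell, distance_km, duration_min):
    """Estimate = max(base + distance + time, minimum fare) * surge for the pickup cell."""
    raw = BASE_FARE + PER_KM * distance_km + PER_MIN * duration_min
    stored = r.hget("surge:multipliers", pickup_cell)
    surge = float(stored.decode()) if stored else 1.0
    return round(max(raw, MIN_FARE) * surge, 2)

# e.g. estimate_fare("9q8yy", distance_km=6.4, duration_min=18) applies the 1.8x surge above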

Deep Dive: Handling Network Partitions

What happens when a driver goes through a tunnel and loses connectivity?

Location staleness detection:

# On each location update, also refresh a heartbeat key with a 30-second TTL
SET driver:{id}:heartbeat 1 EX 30   # expires in 30 seconds

# Background job sweeps for stale drivers (redis-py sketch; get_active_drivers
# and mark_driver_status are assumed helpers from the driver service)
def cleanup_stale_drivers():
    for driver_id in get_active_drivers():
        if not redis_client.exists(f"driver:{driver_id}:heartbeat"):
            # No heartbeat for 30+ seconds: remove from the geo index, mark offline
            redis_client.zrem("drivers:available", driver_id)
            mark_driver_status(driver_id, "OFFLINE")

If a driver is mid-ride and disconnects:

  1. The rider sees the last known position (location stops updating)
  2. After 60 seconds of no updates, the system flags the ride as “driver unreachable”
  3. The rider gets the option to cancel without a fee
  4. When the driver reconnects, their location history (buffered on the device) is batch-uploaded to reconcile the trip

Key Takeaways

Decision          Choice                            Why
Spatial index     GeoHash + Redis GEOSEARCH         Native support, easy to shard by region
Location store    Redis Cluster                     1M+ writes/sec, GEOADD/GEOSEARCH built-in
Trip data         PostgreSQL                        ACID for financial transactions (fares, payments)
Real-time push    WebSockets                        Bidirectional, low-latency location streaming
Event bus         Kafka                             Decouple services, replay-able location history
Matching          Ranked scoring + cascade          Balance speed, quality, and fairness
Surge pricing     Cell-level demand/supply ratio    Recomputed every 2 min, stored in Redis

The ride-sharing design is a showcase for geospatial data structures, real-time systems, and event-driven architecture. The interviewer will almost certainly probe on the GeoHash vs QuadTree trade-off and how you handle the scale of location updates — be ready to discuss both in depth.