software-design|March 19, 2026|13 min read

System Design Patterns for Scaling Writes

TL;DR

Writes are harder to scale than reads because you can't cache them. Use async queues to absorb spikes, shard your database to distribute write load, batch small writes into bulk operations, and pick storage engines (LSM-tree based) optimized for write throughput. Event sourcing gives you an immutable audit trail as a bonus.

System Design Patterns for Scaling Writes

In the companion article on scaling reads, we covered caching, replicas, and CQRS. Reads are the easier problem — you can throw caches and replicas at them. Writes are fundamentally harder because every write must eventually hit durable storage, maintain consistency, and survive failures.

This article covers the patterns I reach for when a system needs to handle high write throughput — from tens of thousands to millions of writes per second.

The Architecture at a Glance

Here’s how write-scaling patterns compose in a real system:

Write-Heavy Architecture Overview

The key insight: writes flow through layers. Each layer absorbs pressure so the next layer can process at its own pace. Let’s work through each pattern.


Pattern 1: Async Write Queues — Absorb the Spike

The single most effective pattern for write scaling: don’t write to the database synchronously. Put a queue in front of it.

enqueue

dequeue

dequeue

dequeue

API Server

Message Queue

Worker 1

Worker 2

Worker 3

Database

The queue acts as a shock absorber. Traffic spikes hit the queue, not the database. Workers consume at a steady rate the database can handle.

Implementation with Kafka

import { Kafka, Partitioners } from 'kafkajs';

const kafka = new Kafka({
  brokers: [process.env.KAFKA_BROKER],
  clientId: 'write-service',
});

// --- Producer: API Server ---
const producer = kafka.producer({
  createPartitioner: Partitioners.DefaultPartitioner,
});

async function handleOrderCreate(req: Request, res: Response) {
  const order = req.body;

  // Validate synchronously
  if (!order.items?.length) {
    return res.status(400).json({ error: 'Empty order' });
  }

  // Enqueue — returns instantly
  await producer.send({
    topic: 'orders',
    messages: [{
      key: order.userId, // Same user = same partition = ordering
      value: JSON.stringify({
        type: 'ORDER_CREATED',
        payload: order,
        timestamp: Date.now(),
      }),
    }],
  });

  // Return 202 Accepted — not 201 Created
  res.status(202).json({
    status: 'accepted',
    message: 'Order is being processed',
  });
}

// --- Consumer: Worker ---
const consumer = kafka.consumer({ groupId: 'order-writers' });

async function startWorker() {
  await consumer.connect();
  await consumer.subscribe({ topic: 'orders', fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message, heartbeat }) => {
      const event = JSON.parse(message.value.toString());

      try {
        await processOrder(event.payload);
        // Offset auto-committed on success
      } catch (err) {
        // Send to dead-letter queue for manual review
        await producer.send({
          topic: 'orders-dlq',
          messages: [{ value: message.value }],
        });
      }

      await heartbeat();
    },
  });
}

Key Design Decisions

Return 202, not 201. The write hasn’t happened yet. The client gets a fast acknowledgment, and the actual write happens asynchronously. If the client needs confirmation, use a webhook or polling endpoint.

Partition by entity ID. Kafka guarantees ordering within a partition. By keying on userId, all writes for the same user are processed sequentially — no race conditions.

Dead-letter queues. Failed writes go to a DLQ instead of being retried infinitely. An operations team reviews and replays them.

When NOT to Use Async Writes

  • Financial transactions requiring synchronous confirmation
  • Operations where the user must see the result immediately
  • Writes that other writes depend on in the same request

For these cases, write synchronously but use the other patterns below to handle throughput.


Pattern 2: Database Sharding — Distribute the Load

A single database instance has a write ceiling — typically 5,000-50,000 writes/second depending on hardware and write complexity. Sharding splits your data across multiple database instances so each handles a fraction of the load.

Sharding Strategies

Hash-Based Sharding in Practice

import { createHash } from 'crypto';
import { Pool } from 'pg';

class ShardedDatabase {
  private shards: Pool[];

  constructor(shardConfigs: { host: string; port: number }[]) {
    this.shards = shardConfigs.map(
      (config) =>
        new Pool({
          host: config.host,
          port: config.port,
          database: 'app',
          max: 20,
        })
    );
  }

  // Consistent hashing — deterministic shard selection
  private getShardIndex(key: string): number {
    const hash = createHash('md5').update(key).digest('hex');
    const numeric = parseInt(hash.substring(0, 8), 16);
    return numeric % this.shards.length;
  }

  getShard(key: string): Pool {
    return this.shards[this.getShardIndex(key)];
  }

  async write(userId: string, sql: string, params: any[]): Promise<any> {
    const shard = this.getShard(userId);
    return shard.query(sql, params);
  }

  // Scatter-gather for cross-shard queries (expensive!)
  async queryAll(sql: string, params: any[]): Promise<any[]> {
    const results = await Promise.all(
      this.shards.map((shard) => shard.query(sql, params))
    );
    return results.flatMap((r) => r.rows);
  }
}

// Usage
const db = new ShardedDatabase([
  { host: 'shard-0.db.internal', port: 5432 },
  { host: 'shard-1.db.internal', port: 5432 },
  { host: 'shard-2.db.internal', port: 5432 },
]);

// All writes for user "abc123" go to the same shard
await db.write('abc123', 'INSERT INTO orders ...', [orderData]);

The Resharding Problem

The biggest pain with hash-based sharding: adding a shard changes hash % N for every key. Consistent hashing minimizes the damage:

import ConsistentHash from 'consistent-hash';

class ConsistentShardRouter {
  private ring: ConsistentHash;
  private shards: Map<string, Pool> = new Map();

  constructor(shardConfigs: { name: string; host: string }[]) {
    this.ring = new ConsistentHash();

    for (const config of shardConfigs) {
      // Add each shard multiple times (virtual nodes) for better distribution
      this.ring.add(config.name, 150);
      this.shards.set(
        config.name,
        new Pool({ host: config.host, database: 'app', max: 20 })
      );
    }
  }

  getShard(key: string): Pool {
    const shardName = this.ring.get(key);
    return this.shards.get(shardName)!;
  }

  // Adding a new shard only moves ~1/N of keys
  addShard(name: string, host: string): void {
    this.ring.add(name, 150);
    this.shards.set(
      name,
      new Pool({ host, database: 'app', max: 20 })
    );
  }
}

With consistent hashing, adding a 4th shard to a 3-shard cluster only moves ~25% of keys instead of rehashing everything.

Choosing a Shard Key

This is the most important decision in sharding. Get it wrong and you’ll have hotspots or impossible cross-shard queries.

Shard Key Good For Bad For
user_id Social apps, per-user data Analytics across all users
tenant_id Multi-tenant SaaS Tenants with wildly different sizes
order_id E-commerce writes Queries by customer
timestamp Time-series / IoT Everything writes to latest shard
geo_region Regional data laws Users who travel

Rule of thumb: shard by the key that appears in your WHERE clause most often.


Pattern 3: Write-Ahead Log (WAL) — Durability Before Performance

Every serious database uses a write-ahead log. The idea: before modifying actual data, write the intended change to a sequential, append-only log. If the system crashes mid-write, replay the log to recover.

Write Request

Append to WAL

Acknowledge Client

Apply to Data Pages

Checkpoint

Truncate Old WAL

The crucial insight: sequential writes are 100x faster than random writes on both SSD and HDD. The WAL converts random updates into sequential appends.

Building a Simple WAL

import { createWriteStream, readFileSync, appendFileSync } from 'fs';

interface WALEntry {
  lsn: number;       // Log Sequence Number
  timestamp: number;
  operation: 'INSERT' | 'UPDATE' | 'DELETE';
  table: string;
  data: Record<string, any>;
}

class WriteAheadLog {
  private lsn = 0;
  private logPath: string;
  private stream: ReturnType<typeof createWriteStream>;

  constructor(logPath: string) {
    this.logPath = logPath;
    this.stream = createWriteStream(logPath, { flags: 'a' });
    this.recoverLSN();
  }

  private recoverLSN(): void {
    try {
      const content = readFileSync(this.logPath, 'utf-8');
      const lines = content.trim().split('\n').filter(Boolean);
      if (lines.length > 0) {
        const lastEntry = JSON.parse(lines[lines.length - 1]);
        this.lsn = lastEntry.lsn;
      }
    } catch {
      this.lsn = 0;
    }
  }

  // Append to WAL — must be durable before returning
  async append(
    operation: WALEntry['operation'],
    table: string,
    data: Record<string, any>
  ): Promise<number> {
    this.lsn++;
    const entry: WALEntry = {
      lsn: this.lsn,
      timestamp: Date.now(),
      operation,
      table,
      data,
    };

    return new Promise((resolve, reject) => {
      const line = JSON.stringify(entry) + '\n';
      this.stream.write(line, (err) => {
        if (err) reject(err);
        else resolve(this.lsn);
      });
    });
  }

  // Replay all entries after a given LSN
  async replay(afterLSN: number): Promise<WALEntry[]> {
    const content = readFileSync(this.logPath, 'utf-8');
    return content
      .trim()
      .split('\n')
      .filter(Boolean)
      .map((line) => JSON.parse(line) as WALEntry)
      .filter((entry) => entry.lsn > afterLSN);
  }
}

// Usage
const wal = new WriteAheadLog('/data/wal/orders.log');

// 1. Write to WAL first (fast — sequential append)
const lsn = await wal.append('INSERT', 'orders', {
  id: 'ord_123',
  amount: 4999,
  userId: 'usr_456',
});

// 2. Then apply to actual database (can be async)
await db.query('INSERT INTO orders ...', [orderData]);

// 3. On crash recovery
const missed = await wal.replay(lastCheckpointLSN);
for (const entry of missed) {
  await applyToDatabase(entry);
}

WAL in PostgreSQL

PostgreSQL’s WAL is why it can guarantee ACID even after a power failure. You can tune it for write throughput:

-- postgresql.conf tuning for write-heavy workloads

-- Larger WAL buffers = fewer disk flushes
wal_buffers = 64MB

-- Group commit: wait up to 10ms to batch WAL flushes
-- Trades tiny latency increase for massive throughput gain
commit_delay = 10000
commit_siblings = 5

-- Async commit: acknowledge before WAL flush
-- Risk: lose last ~200ms of transactions on crash
synchronous_commit = off

-- Checkpoint tuning
checkpoint_completion_target = 0.9
max_wal_size = 4GB

Setting synchronous_commit = off is the single biggest PostgreSQL write throughput optimization. You trade a tiny crash-recovery window (~200ms of commits) for 2-5x write throughput.


Pattern 4: Event Sourcing — Writes as Immutable Facts

Instead of storing the current state, store every change as an immutable event. The current state is derived by replaying events. This is a fundamentally write-optimized pattern because writes are append-only — no updates, no deletes, no locking.

Event Store (append-only)

AccountCreated

MoneyDeposited $100

MoneyWithdrawn $30

MoneyDeposited $50

MoneyWithdrawn $10

Current State: $110

Implementation

interface DomainEvent {
  eventId: string;
  aggregateId: string;
  type: string;
  data: Record<string, any>;
  timestamp: number;
  version: number;
}

class EventStore {
  private db: Pool;

  constructor(db: Pool) {
    this.db = db;
  }

  // Append events — optimistic concurrency via version check
  async append(
    aggregateId: string,
    events: Omit<DomainEvent, 'eventId' | 'timestamp'>[],
    expectedVersion: number
  ): Promise<void> {
    const client = await this.db.connect();

    try {
      await client.query('BEGIN');

      // Optimistic concurrency check
      const result = await client.query(
        `SELECT MAX(version) as current_version
         FROM events WHERE aggregate_id = $1`,
        [aggregateId]
      );
      const currentVersion = result.rows[0]?.current_version ?? 0;

      if (currentVersion !== expectedVersion) {
        throw new Error(
          `Concurrency conflict: expected version ${expectedVersion}, ` +
          `got ${currentVersion}`
        );
      }

      // Append all events
      for (const event of events) {
        await client.query(
          `INSERT INTO events
             (event_id, aggregate_id, type, data, timestamp, version)
           VALUES ($1, $2, $3, $4, $5, $6)`,
          [
            crypto.randomUUID(),
            aggregateId,
            event.type,
            JSON.stringify(event.data),
            Date.now(),
            event.version,
          ]
        );
      }

      await client.query('COMMIT');
    } catch (err) {
      await client.query('ROLLBACK');
      throw err;
    } finally {
      client.release();
    }
  }

  // Load all events for an aggregate
  async loadEvents(aggregateId: string): Promise<DomainEvent[]> {
    const result = await this.db.query(
      `SELECT * FROM events
       WHERE aggregate_id = $1
       ORDER BY version ASC`,
      [aggregateId]
    );
    return result.rows;
  }
}

// --- Aggregate: reconstruct state from events ---
class BankAccount {
  id: string;
  balance: number = 0;
  version: number = 0;

  private pendingEvents: DomainEvent[] = [];

  static fromEvents(events: DomainEvent[]): BankAccount {
    const account = new BankAccount();
    for (const event of events) {
      account.apply(event);
    }
    return account;
  }

  deposit(amount: number): void {
    if (amount <= 0) throw new Error('Amount must be positive');
    this.addEvent('MONEY_DEPOSITED', { amount });
  }

  withdraw(amount: number): void {
    if (amount > this.balance) throw new Error('Insufficient funds');
    this.addEvent('MONEY_WITHDRAWN', { amount });
  }

  private addEvent(type: string, data: any): void {
    const event = {
      aggregateId: this.id,
      type,
      data,
      version: this.version + 1,
    } as DomainEvent;

    this.apply(event);
    this.pendingEvents.push(event);
  }

  private apply(event: DomainEvent): void {
    switch (event.type) {
      case 'ACCOUNT_CREATED':
        this.id = event.aggregateId;
        break;
      case 'MONEY_DEPOSITED':
        this.balance += event.data.amount;
        break;
      case 'MONEY_WITHDRAWN':
        this.balance -= event.data.amount;
        break;
    }
    this.version = event.version;
  }
}

Snapshots: Avoiding the Replay Penalty

Replaying 10 million events to get current state is slow. Take periodic snapshots:

class SnapshotStore {
  async saveSnapshot(
    aggregateId: string,
    state: any,
    version: number
  ): Promise<void> {
    await this.db.query(
      `INSERT INTO snapshots (aggregate_id, state, version)
       VALUES ($1, $2, $3)
       ON CONFLICT (aggregate_id)
       DO UPDATE SET state = $2, version = $3`,
      [aggregateId, JSON.stringify(state), version]
    );
  }

  async loadAggregate(aggregateId: string): Promise<BankAccount> {
    // 1. Load latest snapshot
    const snap = await this.db.query(
      `SELECT * FROM snapshots WHERE aggregate_id = $1`,
      [aggregateId]
    );

    let account: BankAccount;
    let fromVersion = 0;

    if (snap.rows[0]) {
      account = Object.assign(new BankAccount(), JSON.parse(snap.rows[0].state));
      fromVersion = snap.rows[0].version;
    } else {
      account = new BankAccount();
    }

    // 2. Replay only events AFTER snapshot
    const events = await this.eventStore.db.query(
      `SELECT * FROM events
       WHERE aggregate_id = $1 AND version > $2
       ORDER BY version ASC`,
      [aggregateId, fromVersion]
    );

    for (const event of events.rows) {
      account.apply(event);
    }

    return account;
  }
}

Rule of thumb: snapshot every 100-1000 events depending on event complexity.


Pattern 5: Storage Engine Choice — LSM-Tree vs B-Tree

The storage engine under your database determines raw write performance. This is an infrastructure choice, not an application pattern, but it’s the most impactful decision for write throughput.

LSM-Tree vs B-Tree

When to Pick Which

LSM-Tree (Cassandra, ScyllaDB, RocksDB, DynamoDB):

  • Writes are sequential appends to a memtable, then flushed as sorted files
  • Write throughput: 100,000+ writes/sec per node easily
  • Reads require checking multiple levels (bloom filters help)
  • Best for: write-heavy workloads, time-series, IoT, logging

B-Tree (PostgreSQL, MySQL, MongoDB):

  • Writes require finding and updating the right page in-place
  • Write throughput: 5,000-50,000 writes/sec depending on indexes
  • Reads are always O(log N) — predictable
  • Best for: read-heavy or balanced workloads, complex queries, transactions
// Cassandra write — optimized for throughput
const cassandra = new Client({
  contactPoints: ['node1', 'node2', 'node3'],
  localDataCenter: 'dc1',
  keyspace: 'analytics',
});

// This write is absurdly fast — just appends to memtable
await cassandra.execute(
  `INSERT INTO events (partition_key, event_time, event_type, data)
   VALUES (?, ?, ?, ?)`,
  [partitionKey, new Date(), 'page_view', JSON.stringify(eventData)],
  { prepare: true, consistency: types.consistencies.localOne }
);

With consistency: localOne, Cassandra acknowledges after writing to a single replica’s memtable — sub-millisecond latency. The tradeoff: you might lose that write if that one node dies before replication.


Pattern 6: Write Batching and Buffering

Small writes are expensive. A database round-trip costs 1-5ms regardless of payload size. Batch 1,000 small writes into one bulk operation and you’ve saved 999 round trips.

Application-Level Batching

class WriteBatcher<T> {
  private buffer: T[] = [];
  private timer: NodeJS.Timeout | null = null;
  private readonly maxSize: number;
  private readonly maxWaitMs: number;
  private readonly flushFn: (items: T[]) => Promise<void>;

  constructor(options: {
    maxSize: number;
    maxWaitMs: number;
    flushFn: (items: T[]) => Promise<void>;
  }) {
    this.maxSize = options.maxSize;
    this.maxWaitMs = options.maxWaitMs;
    this.flushFn = options.flushFn;
  }

  async add(item: T): Promise<void> {
    this.buffer.push(item);

    if (this.buffer.length >= this.maxSize) {
      await this.flush();
    } else if (!this.timer) {
      this.timer = setTimeout(() => this.flush(), this.maxWaitMs);
    }
  }

  async flush(): Promise<void> {
    if (this.buffer.length === 0) return;

    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = null;
    }

    const batch = this.buffer.splice(0);

    try {
      await this.flushFn(batch);
    } catch (err) {
      // Put items back for retry
      this.buffer.unshift(...batch);
      throw err;
    }
  }
}

// Usage: batch up to 500 analytics events or flush every 2 seconds
const batcher = new WriteBatcher<AnalyticsEvent>({
  maxSize: 500,
  maxWaitMs: 2000,
  flushFn: async (events) => {
    // Single bulk INSERT instead of 500 individual ones
    const values = events.map(
      (e) => `('${e.userId}', '${e.event}', '${e.timestamp}')`
    );
    await db.query(
      `INSERT INTO analytics (user_id, event, timestamp)
       VALUES ${values.join(',')}`
    );
  },
});

// Each call is near-instant — actual write happens in batches
await batcher.add({ userId: 'u1', event: 'page_view', timestamp: Date.now() });

PostgreSQL COPY for Maximum Throughput

For bulk loading, COPY is 5-10x faster than INSERT:

import { pipeline } from 'stream/promises';
import { from as copyFrom } from 'pg-copy-streams';

async function bulkLoad(records: any[]): Promise<void> {
  const client = await pool.connect();

  try {
    const stream = client.query(
      copyFrom(`COPY events (user_id, event_type, payload, created_at)
                FROM STDIN WITH (FORMAT csv)`)
    );

    for (const record of records) {
      const line = [
        record.userId,
        record.eventType,
        JSON.stringify(record.payload),
        new Date().toISOString(),
      ].join(',') + '\n';

      stream.write(line);
    }

    stream.end();
    await pipeline(stream);
  } finally {
    client.release();
  }
}

Putting It All Together: The Decision Framework

How many writes/second do you need?
│
├── < 5,000/sec
│   └── Single PostgreSQL with tuned WAL settings
│       (synchronous_commit=off, wal_buffers=64MB)
│
├── 5,000 - 50,000/sec
│   ├── Can writes be async? → Queue + Workers
│   ├── Are writes small? → Batching
│   └── Still hitting limits? → Vertical scaling first
│
├── 50,000 - 500,000/sec
│   ├── Shard the database (hash-based)
│   ├── Queue + Sharded workers
│   └── Consider LSM-tree database (Cassandra, ScyllaDB)
│
└── > 500,000/sec
    ├── LSM-tree database + sharding
    ├── Event sourcing for audit-heavy domains
    └── Multi-region with conflict resolution

Cost vs Complexity

Pattern Complexity Throughput Gain When to Add
WAL tuning Low 2-5x Day 1
Write batching Low 5-10x for small writes When round-trip latency dominates
Async queues Medium Absorbs 10-100x spikes When writes are bursty
Sharding High Linear with shard count When single node maxed out
LSM-tree DB Medium 10-50x vs B-tree When writes dominate reads
Event sourcing High Append-only = fast writes When you need full audit trail

The Golden Rule

Optimize the write path in this order:

  1. Tune what you have — WAL settings, connection pooling, index cleanup
  2. Batch — stop doing one-at-a-time inserts
  3. Go async — queue writes and process at the database’s pace
  4. Shard — only when a single node is truly maxed out
  5. Change engines — switch to an LSM-tree database for extreme write loads

Most systems never need to go past step 3. The teams that jump straight to sharding usually regret it — you’re trading simple queries for distributed systems complexity. Make sure you’ve exhausted the simple options first.


Further Reading

Related Posts

Deep Dive on Apache Kafka: A System Design Interview Perspective

Deep Dive on Apache Kafka: A System Design Interview Perspective

“Kafka is not a message queue. It’s a distributed commit log that happens to be…

Deep Dive on Consistent Hashing

Deep Dive on Consistent Hashing

Every distributed system eventually faces the same problem: you have N servers…

System Design Patterns for Scaling Reads

System Design Patterns for Scaling Reads

Most production systems are read-heavy. A typical web application sees 90-9…

Deep Dive on Redis: Architecture, Data Structures, and Production Usage

Deep Dive on Redis: Architecture, Data Structures, and Production Usage

“Redis is not just a cache. It’s a data structure server that happens to be…

Deep Dive on Elasticsearch: A System Design Interview Perspective

Deep Dive on Elasticsearch: A System Design Interview Perspective

“If you’re searching, filtering, or aggregating over large volumes of semi…

Deep Dive on Caching: From Browser to Database

Deep Dive on Caching: From Browser to Database

“There are only two hard things in Computer Science: cache invalidation and…

Latest Posts

Claude Code Skills — Build a Better Engineering Workflow with AI-Powered Code Reviews, Security Scans, and More

Claude Code Skills — Build a Better Engineering Workflow with AI-Powered Code Reviews, Security Scans, and More

Most developers use Claude Code like a search engine — ask a question, get an…

Building an AI Voicebot for Visitor Check-In — A Practical Guide to Handling the Messy Parts

Building an AI Voicebot for Visitor Check-In — A Practical Guide to Handling the Messy Parts

Every office lobby has the same problem: a visitor walks in, nobody’s at the…

Server Security Best Practices — Complete Hardening Guide for Production Systems

Server Security Best Practices — Complete Hardening Guide for Production Systems

Every breach post-mortem tells the same story: an unpatched service, a…

Staff Engineer Study Plan for MAANG Interviews — The Complete 12-Week Roadmap

Staff Engineer Study Plan for MAANG Interviews — The Complete 12-Week Roadmap

If you’re a Senior Engineer (L5) preparing for Staff (L6+) roles at MAANG…

XSS and CSRF Explained — The Complete Guide with Real Attack Examples and Defenses

XSS and CSRF Explained — The Complete Guide with Real Attack Examples and Defenses

XSS and CSRF have been in the OWASP Top 10 for over a decade. They’re among the…

OWASP Top 10 (2021) — Every Vulnerability Explained with Code

OWASP Top 10 (2021) — Every Vulnerability Explained with Code

The OWASP Top 10 is the industry standard for web application security risks. If…