    System Design Cheat Sheet: The Tables I'd Print Before an Interview

    Updated May 2026
    7 min read

    This is the reference card, not the tutorial. Lookup tables for latency, datastore selection, and scaling patterns. Skim the morning of, not the night before. If you need the full process walkthrough, the minute-by-minute guide covers that.

    Five questions to ask in minute one

    1. What's in scope? What's explicitly out?
    2. What's the scale? (DAU, peak QPS, average payload size, read/write ratio.)
    3. What's the latency target? (p50, p99, separate per operation if reads and writes differ.)
    4. What consistency model? (Strong, eventual, read-your-writes, monotonic.)
    5. What region setup? Single-region or multi-region?
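The answers to question 2 turn into load figures with a few lines of arithmetic. The sketch below is one way to do that conversion; the function name, the peak factor, and every input value are illustrative assumptions, not numbers from a real system.

```python
SECONDS_PER_DAY = 86_400

def estimate_load(dau, requests_per_user_per_day, peak_factor,
                  avg_payload_bytes, read_fraction):
    """Back-of-envelope load figures from the minute-one answers."""
    avg_qps = dau * requests_per_user_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "peak_read_qps": peak_qps * read_fraction,
        "peak_write_qps": peak_qps * (1 - read_fraction),
        # Payload traffic only; ignores headers and protocol overhead.
        "peak_bandwidth_mb_s": peak_qps * avg_payload_bytes / 1_000_000,
    }

# Hypothetical inputs: 10M DAU, 20 requests per user per day,
# peak at 3x average, 2 KB average payload, 90% reads.
load = estimate_load(10_000_000, 20, 3, 2_000, 0.9)
for name, value in load.items():
    print(f"{name}: {value:,.1f}")
```

With these assumed inputs the average works out to roughly 2,300 QPS and the peak to roughly 6,900 QPS, which is enough to decide whether a single primary database is plausible before drawing anything.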

    Latency numbers worth memorising

    The Jeff Dean "numbers every programmer should know" list, updated for current hardware:

    Operation | Latency | Relative cost
    L1 cache reference | 0.5 ns | baseline
    L2 cache reference | 7 ns | 14x L1
    Main memory reference | 100 ns | 200x L1
    Send 1 KB over 1 Gbps network | 10 μs | 100x main memory
    SSD random read | 150 μs | 15x network 1 KB
    Read 1 MB sequentially from SSD | 1 ms | n/a
    Disk seek (spinning, rare now) | 10 ms | 10x SSD sequential
    Same-region datacentre round-trip | 500 μs | n/a
    Cross-region (US-East to US-West) | 70 ms | 140x same-region
    Cross-continent (US to EU) | 150 ms | n/a

    Two implications that come up in interviews: (1) a request that touches three services in series, each calling its own DB, can't beat ~5 ms even in the happy path. That is six same-region round trips at 500 μs each, so 3 ms of network time before any query or compute work. (2) Anything cross-region in the hot path is a design smell. Cache aggressively or shard by region.
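The first implication can be checked against the table. This is a rough sketch, assuming one round trip to reach each service, one more from the service to its DB, and a single SSD random read per query; the function name is hypothetical, and queueing, serialization, TLS, and compute are all ignored.

```python
# Latencies taken from the table above, in microseconds.
SAME_REGION_RTT_US = 500
SSD_RANDOM_READ_US = 150
CROSS_REGION_RTT_US = 70_000

def serial_chain_floor_us(num_services,
                          rtt_us=SAME_REGION_RTT_US,
                          db_read_us=SSD_RANDOM_READ_US):
    """Lower bound for N services called in series, each making one DB call."""
    # One hop to the service, one hop to its DB, one SSD read for the query.
    per_service = rtt_us + rtt_us + db_read_us
    return num_services * per_service

floor_us = serial_chain_floor_us(3)
print(f"3 services in series: {floor_us / 1000:.2f} ms floor")
print(f"with one cross-region hop: {(floor_us + CROSS_REGION_RTT_US) / 1000:.2f} ms")
```

Under these assumptions the floor is 3.45 ms, which leaves about 1.5 ms for everything the sketch ignores before the ~5 ms figure is reached. A single cross-region hop adds 70 ms on top, which is why it dominates any hot path it appears in.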

    Pick the right datastore

    Workload | Pick | Why
    Transactional, joins, ACID | Postgres / MySQL | Mature, predictable, you know it
    High write throughput, append-only | Cassandra / Scylla | Linear write scaling, tunable consistency
    Key-value, sub-ms reads | Redis / DynamoDB | In-memory (Redis) or SSD-backed (DynamoDB; single-digit ms unless fronted by DAX)
    Time-series / metrics | TimescaleDB / InfluxDB | Time-partitioned compression
    Full-text search | Elasticsearch / OpenSearch | Inverted index, relevance scoring
    Graph traversal | Neo4j / Neptune | Native graph storage, Cypher
    Vector similarity (LLM apps) | Pinecone / pgvector / Qdrant | ANN index for embeddings
    Object / file blobs | S3 / GCS | Cheap, durable, range reads

    Pick the workload first. The datastore follows. The reverse order is how candidates end up justifying MongoDB for a join-heavy workload.

    Scaling patterns by problem

    Problem | Pattern | Watch out for
    Read-heavy | Read replicas + cache | Cache stampede on cold start
    Write-heavy | Sharding by user_id / hash | Hot shards on celebrity accounts
    Bursty traffic | Queue + async workers | Consumer lag, dead-letter handling
    Cross-region reads | CDN + edge cache | Cache invalidation lag
    Fanout (timeline, notifications) | Fan-out-on-write + pull for whales | Materialise lag, follower-count spikes
    Idempotent writes | Client-generated idempotency key | Key cardinality, key TTL
    Long-running jobs | Job queue with status polling | Status table contention
    Rate limiting | Token bucket in Redis | Distributed clock skew
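To make the last row concrete, here is a minimal single-process sketch of a token bucket. It is not the Redis version: the class name and parameters are illustrative, and a Redis-backed limiter would have to keep the token count and last-refill timestamp in shared state and perform the refill-and-take step atomically. The injectable clock is an assumption added so the time source is explicit, which is where the clock-skew concern enters once several nodes are involved.

```python
import time

class TokenBucket:
    """In-memory token bucket: refills at a steady rate up to a fixed capacity."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last_refill = clock()

    def allow(self, cost=1):
        """Return True and spend `cost` tokens if enough are available."""
        now = self.clock()
        # Guard against a clock that moves backwards.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Usage: 5 requests per second sustained, bursts of up to 10.
limiter = TokenBucket(rate_per_sec=5, capacity=10)
if not limiter.allow():
    print("429 Too Many Requests")
```

Capacity sets the burst size and rate sets the sustained throughput, so the two can be tuned independently.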

    CAP one-liner the panel actually wants

    "Under network partition, you pick between consistency and availability. In practice, you pick eventual consistency for most user-facing reads and strong consistency only for the operations where stale data corrupts something: payment confirmation, inventory count, account balance. For everything else, eventual is cheaper, faster, and fine."

    That's the answer. Don't quote the CAP paper. Don't draw the triangle.

    Failure-mode checklist for the last 5 minutes

    Walk the panel through each:

    • What happens if the cache layer goes down? (Latency spike, degrade reads, don't return 5xx.)
    • What happens if a downstream service is slow? (Circuit breaker, timeout, fallback path.)
    • What happens if traffic spikes 10×? (Auto-scale, shed load at the gateway, absorb the burst in a queue.)
    • What happens if a single shard becomes hot? (Resharding, request hedging, fan-out exception.)
    • What happens during a deploy? (Canary, rolling, traffic split, automated rollback on error-rate alarm.)
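The slow-downstream answer (circuit breaker, timeout, fallback path) can be sketched in a few lines. This is a simplified illustration rather than a production breaker: the class and parameter names are assumptions, the wrapped call is expected to enforce its own timeout and raise when it expires, and there is no locking for concurrent callers.

```python
import time

class CircuitBreaker:
    """Fails fast to a fallback after repeated errors, then retries after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after_sec=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_sec = reset_after_sec
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_sec:
                return fallback()  # open: skip the downstream call entirely
            # Cooldown elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result

# Usage with a hypothetical downstream call that always times out.
def fetch_profile():
    raise TimeoutError("profile service exceeded its deadline")

breaker = CircuitBreaker(failure_threshold=2, reset_after_sec=10.0)
for _ in range(3):
    print(breaker.call(fetch_profile, fallback=lambda: {"name": "unknown"}))
```

After the second failure the third call never reaches the downstream service, which is the point: the caller's latency stays bounded while the dependency recovers.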

    References

    The system-design-primer is the standard. The Google SRE Book is free online and has the load-shedding and SLO chapters every senior round expects you to have read.

    Run a timed mock with feedback

    LastRound AI runs mock system-design rounds with a timer per phase and live prompts when you skip the failure-mode walk-through that panels want.

    Written by

    Venkat

    Engineering, LastRound AI

    Engineer at LastRound AI. Writes about full-stack engineering interviews, certifications, and how technical hiring is shifting in the AI era.
