System Design Cheat Sheet: The Tables I'd Print Before an Interview
This is the reference card, not the tutorial. Lookup tables for latency, datastore selection, and scaling patterns. Skim the morning of, not the night before. If you need the full process walkthrough, the minute-by-minute guide covers that.
Five questions to ask in minute one
- What's in scope? What's explicitly out?
- What's the scale? (DAU, peak QPS, average payload size, read/write ratio.)
- What's the latency target? (p50, p99, separate per operation if reads and writes differ.)
- What consistency model? (Strong, eventual, read-your- writes, monotonic.)
- What region setup? Single-region or multi-region?
Latency numbers worth memorising
The Jeff Dean "numbers every programmer should know" list, updated for current hardware:
| Operation | Latency | What you can do in that time |
|---|---|---|
| L1 cache reference | 0.5 ns | ~ |
| L2 cache reference | 7 ns | 14x L1 |
| Main memory reference | 100 ns | 200x L1 |
| Send 1KB over 1 Gbps network | 10 μs | 100x main memory |
| SSD random read | 150 μs | 15x network 1KB |
| Read 1MB sequentially from SSD | 1 ms | n/a |
| Disk seek (spinning, rare now) | 10 ms | 10x SSD sequential |
| Same-region datacentre round-trip | 500 μs | n/a |
| Cross-region (US-East to US-West) | 70 ms | 140x same-region |
| Cross-continent (US to EU) | 150 ms | n/a |
Two implications that come up in interviews: (1) a request that touches three services in series, each calling its own DB, can't beat ~5ms even in the happy path. (2) Anything cross-region in the hot path is a design smell. Cache aggressively or shard by region.
Pick the right datastore
| Workload | Pick | Why |
|---|---|---|
| Transactional, joins, ACID | Postgres / MySQL | Mature, predictable, you know it |
| High write throughput, append-only | Cassandra / Scylla | Linear write scaling, tunable consistency |
| Key-value, sub-ms reads | Redis / DynamoDB | Memory or SSD-backed key-value |
| Time-series / metrics | TimescaleDB / InfluxDB | Time-partitioned compression |
| Full-text search | Elasticsearch / OpenSearch | Inverted index, relevance scoring |
| Graph traversal | Neo4j / Neptune | Native graph storage, Cypher |
| Vector similarity (LLM apps) | Pinecone / pgvector / Qdrant | ANN index for embeddings |
| Object / file blobs | S3 / GCS | Cheap, durable, range reads |
Pick the workload first. The datastore follows. The reverse order is how candidates end up justifying MongoDB for a join-heavy workload.
Scaling patterns by problem
| Problem | Pattern | Watch out for |
|---|---|---|
| Read-heavy | Read replicas + cache | Cache stampede on cold start |
| Write-heavy | Sharding by user_id / hash | Hot shards on celebrity accounts |
| Bursty traffic | Queue + async workers | Consumer lag, dead-letter handling |
| Cross-region reads | CDN + edge cache | Cache invalidation lag |
| Fanout (timeline, notifications) | Fan-out-on-write + pull for whales | Materialise lag, follower-count spikes |
| Idempotent writes | Client-generated idempotency key | Key cardinality, key TTL |
| Long-running jobs | Job queue with status polling | Status table contention |
| Rate limiting | Token bucket in Redis | Distributed clock skew |
CAP one-liner the panel actually wants
"Under network partition, you pick between consistency and availability. In practice, you pick eventual consistency for most user-facing reads and strong consistency only for the operations where stale data corrupts something: payment confirmation, inventory count, account balance. For everything else, eventual is cheaper, faster, and fine."
That's the answer. Don't quote the CAP paper. Don't draw the triangle.
Failure-mode checklist for the last 5 minutes
Walk the panel through each:
- What happens if the cache layer goes down? (Latency spike, degrade reads, don't return 5xx.)
- What happens if a downstream service is slow? (Circuit breaker, timeout, fallback path.)
- What happens if traffic spikes 10×? (Auto-scale, load shed at the gateway, queue absorb.)
- What happens if a single shard becomes hot? (Resharding, request hedging, fan-out exception.)
- What happens during a deploy? (Canary, rolling, traffic split, automated rollback on error-rate alarm.)
References
The system-design-primer is the standard. The Google SRE Book is free online and has the load-shedding and SLO chapters every senior round expects you to have read.
Run a timed mock with feedback
LastRound AI runs mock system-design rounds with a timer per phase and live prompts when you're missing the failure-mode walk-through panels want.
Written by
Venkat
Engineering, LastRound AI
Engineer at LastRound AI. Writes about full-stack engineering interviews, certifications, and how technical hiring is shifting in the AI era.
Further reading
- NeetCode 150 — Curated DSA practice with video explanations
- System Design Primer — 270k★ open-source system design study guide
- Designing Data-Intensive Applications — Industry-standard distributed-systems text
Share this post
Related articles
Technical prep
Data Structures for Coding Interviews 2026: The 8 You Actually Need | LastRound AI
Technical prep
LeetCode Patterns Cheat Sheet 2026: The 15 Patterns That Cover 90% of Problems | LastRound AI
Technical prep
NeetCode 150 vs Blind 75: Which Should You Study? 2026 Guide | LastRound AI
Technical prep
AI Coding Interview Assistant: What Actually Works in Technical Interviews (2025)
