System Design Cheat Sheet 2026: Patterns, Numbers, Templates

This is the reference card, not the tutorial. Lookup tables for latency, datastore selection, and scaling patterns. Skim the morning of, not the night before. If you need the full process walkthrough, the minute-by-minute guide covers that.

Five questions to ask in minute one

What’s in scope? What’s explicitly out?
What’s the scale? (DAU, peak QPS, average payload size, read/write ratio.)
What’s the latency target? (p50, p99, separate per operation if reads and writes differ.)
What consistency model? (Strong, eventual, read-your- writes, monotonic.)
What region setup? Single-region or multi-region?

Latency numbers worth memorising

The Jeff Dean “numbers every programmer should know” list, updated for current hardware:

Operation	Latency	What you can do in that time
L1 cache reference	0.5 ns	~
L2 cache reference	7 ns	14x L1
Main memory reference	100 ns	200x L1
Send 1KB over 1 Gbps network	10 μs	100x main memory
SSD random read	150 μs	15x network 1KB
Read 1MB sequentially from SSD	1 ms	n/a
Disk seek (spinning, rare now)	10 ms	10x SSD sequential
Same-region datacentre round-trip	500 μs	n/a
Cross-region (US-East to US-West)	70 ms	140x same-region
Cross-continent (US to EU)	150 ms	n/a

Two implications that come up in interviews: (1) a request that touches three services in series, each calling its own DB, can’t beat ~5ms even in the happy path. (2) Anything cross-region in the hot path is a design smell. Cache aggressively or shard by region.

Pick the right datastore

Workload	Pick	Why
Transactional, joins, ACID	Postgres / MySQL	Mature, predictable, you know it
High write throughput, append-only	Cassandra / Scylla	Linear write scaling, tunable consistency
Key-value, sub-ms reads	Redis / DynamoDB	Memory or SSD-backed key-value
Time-series / metrics	TimescaleDB / InfluxDB	Time-partitioned compression
Full-text search	Elasticsearch / OpenSearch	Inverted index, relevance scoring
Graph traversal	Neo4j / Neptune	Native graph storage, Cypher
Vector similarity (LLM apps)	Pinecone / pgvector / Qdrant	ANN index for embeddings
Object / file blobs	S3 / GCS	Cheap, durable, range reads

Pick the workload first. The datastore follows. The reverse order is how candidates end up justifying MongoDB for a join-heavy workload.

Scaling patterns by problem

Problem	Pattern	Watch out for
Read-heavy	Read replicas + cache	Cache stampede on cold start
Write-heavy	Sharding by user_id / hash	Hot shards on celebrity accounts
Bursty traffic	Queue + async workers	Consumer lag, dead-letter handling
Cross-region reads	CDN + edge cache	Cache invalidation lag
Fanout (timeline, notifications)	Fan-out-on-write + pull for whales	Materialise lag, follower-count spikes
Idempotent writes	Client-generated idempotency key	Key cardinality, key TTL
Long-running jobs	Job queue with status polling	Status table contention
Rate limiting	Token bucket in Redis	Distributed clock skew

CAP one-liner the panel actually wants

“Under network partition, you pick between consistency and availability. In practice, you pick eventual consistency for most user-facing reads and strong consistency only for the operations where stale data corrupts something: payment confirmation, inventory count, account balance. For everything else, eventual is cheaper, faster, and fine.”

That’s the answer. Don’t quote the CAP paper. Don’t draw the triangle.

Failure-mode checklist for the last 5 minutes

Walk the panel through each:

What happens if the cache layer goes down? (Latency spike, degrade reads, don’t return 5xx.)
What happens if a downstream service is slow? (Circuit breaker, timeout, fallback path.)
What happens if traffic spikes 10×? (Auto-scale, load shed at the gateway, queue absorb.)
What happens if a single shard becomes hot? (Resharding, request hedging, fan-out exception.)
What happens during a deploy? (Canary, rolling, traffic split, automated rollback on error-rate alarm.)

References

The system-design-primer is the standard. The Google SRE Book is free online and has the load-shedding and SLO chapters every senior round expects you to have read.

Run a timed mock with feedback

LastRound AI runs mock system-design rounds with a timer per phase and live prompts when you’re missing the failure-mode walk-through panels want.

System Design Cheat Sheet: The Tables I’d Print Before an Interview

Five questions to ask in minute one

Latency numbers worth memorising

Pick the right datastore

Scaling patterns by problem

CAP one-liner the panel actually wants

Failure-mode checklist for the last 5 minutes

References

Run a timed mock with feedback

Leave a Reply Cancel reply

System Design Cheat Sheet: The Tables I’d Print Before an Interview

Five questions to ask in minute one

Latency numbers worth memorising

Pick the right datastore

Scaling patterns by problem

CAP one-liner the panel actually wants

Failure-mode checklist for the last 5 minutes

References

Run a timed mock with feedback

Keep reading

Leave a Reply Cancel reply