Last month, our database infrastructure handled over 1 million requests per second during a major product launch. Here's what we learned along the way.
The Problem
Our original architecture was simple: one primary database, a few read replicas. It worked great until it didn't.
At around 100k requests per second, latency began to climb. At 200k, requests were timing out. Something had to change.
Solution 1: Read Replicas
We added more read replicas. This helped—briefly. The primary was still a bottleneck for writes, and replication lag meant reads could return stale data.
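The replica setup boils down to read/write splitting: writes go to the primary, reads fan out across replicas. A minimal sketch of that routing, with hypothetical endpoint names (the post doesn't describe the actual implementation):

```python
import itertools

class ReadWriteRouter:
    """Route writes to the primary and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def route(self, query):
        # Writes must hit the primary; reads can use any replica,
        # at the cost of replication lag (a reader may see stale data).
        verb = query.lstrip().split()[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE"):
            return self.primary
        return next(self._replicas)

router = ReadWriteRouter("primary:5432", ["replica1:5432", "replica2:5432"])
print(router.route("SELECT * FROM users"))      # replica1:5432
print(router.route("UPDATE users SET name=1"))  # primary:5432
```

The lag-induced staleness this sketch tolerates is exactly the consistency issue described above: a tenant that writes to the primary and immediately reads from a lagging replica may not see its own write.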
Solution 2: Sharding
We implemented horizontal sharding based on tenant ID. Each shard handles a subset of tenants, distributing the load.
Key decisions:
- Consistent hashing for shard assignment
- 64 logical shards, 8 physical servers
- Each physical server hosts 8 shards
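The scheme above can be sketched as follows — a hash ring with 64 logical shard points, each packed onto one of 8 physical servers. This is an illustrative reconstruction, not our production code; the key names and hash choice are assumptions:

```python
import bisect
import hashlib

NUM_LOGICAL_SHARDS = 64
SHARDS_PER_SERVER = 8  # 64 logical shards / 8 physical servers

def _hash(key: str) -> int:
    # Stable hash so every app server maps tenants identically
    # (Python's built-in hash() is randomized per process).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

# Place each logical shard at a fixed point on the hash ring.
_ring = sorted((_hash(f"shard-{i}"), i) for i in range(NUM_LOGICAL_SHARDS))
_points = [p for p, _ in _ring]

def logical_shard(tenant_id: str) -> int:
    # Walk clockwise to the first shard point at or after the tenant's hash.
    idx = bisect.bisect(_points, _hash(tenant_id)) % len(_ring)
    return _ring[idx][1]

def physical_server(tenant_id: str) -> int:
    # Logical shards are packed onto physical servers in groups of 8,
    # so shards can later move between servers without rehashing tenants.
    return logical_shard(tenant_id) // SHARDS_PER_SERVER
```

Over-provisioning logical shards is what makes rebalancing cheap: adding a ninth server means moving whole logical shards, not re-hashing every tenant.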
Solution 3: Connection Pooling
Connection pooling was a game-changer. We went from thousands of direct connections to a pool of 100 persistent connections per shard.
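The idea, in sketch form: a fixed-size pool hands out long-lived connections and takes them back, so request spikes queue for a connection instead of opening thousands of new ones. This is a minimal illustration (the `connect` factory is a placeholder), not the pooler we actually run:

```python
import queue
import contextlib

class ConnectionPool:
    """Fixed-size pool: connections are created once and reused,
    instead of opening a fresh connection per request."""

    def __init__(self, connect, size=100):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    @contextlib.contextmanager
    def connection(self, timeout=5.0):
        # Blocks (up to `timeout`) when all connections are checked out,
        # which naturally back-pressures request spikes.
        conn = self._pool.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._pool.put(conn)  # return to the pool, never close
```

In practice you'd reach for an off-the-shelf pooler (e.g. PgBouncer for Postgres) rather than rolling your own, but the mechanics are the same: one pool of ~100 persistent connections per shard, shared by all app servers.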
What We'd Do Differently
- Start with connection pooling from day one
- Build observability early—we were flying blind for too long
- Plan for sharding even if you don't need it yet
Scaling databases is hard. But with the right architecture, it's entirely tractable. The key is to plan ahead and iterate based on real data.