Designing LeetCode at Scale
LeetCode looks simple. Users browse problems, write code, and get feedback.
The reality is different. Behind the screen is a distributed system. It must handle millions of users and hundreds of code submissions every second. It must survive massive traffic spikes during contests.
Here is how you design a scalable coding platform.
Core Features
- Browse and search problems by difficulty or tags.
- Write code in multiple languages.
- Submit solutions and receive results in seconds.
- Join weekly contests with live leaderboards.
The Scale Requirements
- 10 million registered users.
- 2 million daily active users.
- 100k concurrent users during contests.
- 20k code submissions per minute.
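To see what these numbers mean for the execution tier, here is a rough back-of-the-envelope sketch. Only the 20k submissions per minute comes from the list above. The 2-second average run time and the 3x contest burst factor are assumptions picked for illustration, and the function name is hypothetical.

```python
import math

SUBMISSIONS_PER_MINUTE = 20_000  # from the scale requirements
AVG_RUN_SECONDS = 2.0            # assumption: mean sandbox time per submission
PEAK_FACTOR = 3.0                # assumption: contest burst over the average rate


def sandboxes_needed(per_minute: int, avg_seconds: float, peak: float = 1.0) -> int:
    """Estimate how many sandboxes are busy at once.

    Little's law: jobs in flight = arrival rate * time each job takes.
    """
    return math.ceil(per_minute * avg_seconds * peak / 60)


if __name__ == "__main__":
    print(f"average rate: {SUBMISSIONS_PER_MINUTE / 60:.0f} submissions/s")
    print("steady state:", sandboxes_needed(SUBMISSIONS_PER_MINUTE, AVG_RUN_SECONDS))
    print("contest peak:", sandboxes_needed(SUBMISSIONS_PER_MINUTE, AVG_RUN_SECONDS, PEAK_FACTOR))
```

Under these assumptions the platform needs several hundred sandboxes running at once on average, and more during a contest. Measured run times would replace the guesses in a real capacity plan.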
System Architecture
Do not run code execution synchronously inside the API request. If the API waits for the result, every slow or queued run holds a connection open, and under load this causes timeouts.
Instead, use an asynchronous approach:
- User submits code via API.
- The system places the submission into a Message Queue (like Kafka or RabbitMQ).
- Code Execution Workers pull tasks from the queue.
- Workers run the code in a sandbox.
- The Result Service records the verdict and updates the submission status, which the client can then fetch.
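The flow above can be sketched in a single process. This is a minimal illustration, not a production design: Python's queue.Queue stands in for Kafka or RabbitMQ, a dictionary stands in for the Result Service, and run_in_sandbox is a hypothetical placeholder that does not execute any code. All function names here are assumptions.

```python
import queue
import threading
import uuid

submissions = queue.Queue()      # stand-in for the message queue
results = {}                     # stand-in for the Result Service's store
results_lock = threading.Lock()


def submit(code: str) -> str:
    """API handler: enqueue the job and return an ID without waiting."""
    submission_id = uuid.uuid4().hex
    with results_lock:
        results[submission_id] = "PENDING"
    submissions.put((submission_id, code))
    return submission_id


def run_in_sandbox(code: str) -> str:
    """Hypothetical placeholder for the sandboxed run; it only checks for non-empty code."""
    return "ACCEPTED" if code.strip() else "REJECTED"


def worker() -> None:
    """Pull tasks from the queue until a None sentinel arrives."""
    while True:
        item = submissions.get()
        if item is None:
            submissions.task_done()
            break
        submission_id, code = item
        verdict = run_in_sandbox(code)
        with results_lock:
            results[submission_id] = verdict
        submissions.task_done()


def get_status(submission_id: str) -> str:
    """What the client reads when it asks for the result."""
    with results_lock:
        return results[submission_id]


def start_workers(count: int) -> list:
    threads = [threading.Thread(target=worker, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def stop_workers(threads: list) -> None:
    for _ in threads:
        submissions.put(None)    # one sentinel per worker
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    pool = start_workers(4)
    sid = submit("print('hello')")
    submissions.join()           # wait here only for the demo; the API never does
    print(sid, get_status(sid))
    stop_workers(pool)
```

The key property is that submit returns right away with an ID. Adding capacity means starting more workers, not changing the API.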
The Sandbox is the most critical part. Running random code on your servers is dangerous. You must use isolated environments.
Docker containers are a common choice because they are:
- Lightweight.
- Fast to start.
- Easy to scale horizontally.
- Safer when you limit CPU, memory, and network access. Containers still share the host kernel, so these limits are essential.
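As a sketch of what those limits can look like, the helper below builds a docker run command with CPU, memory, and network restrictions. It only assembles and prints the command; it does not start a container. The image name, paths, and limit values are assumptions, and the --pids-limit and --read-only flags go beyond what the list above mentions. A real worker would also need a wall-clock timeout and cleanup of stuck containers.

```python
import shlex


def docker_run_command(
    image: str,
    source_dir: str,
    run_cmd: list,
    memory: str = "256m",   # assumption: per-submission memory cap
    cpus: str = "0.5",      # assumption: half a CPU core
    pids: int = 64,         # assumption: cap on processes to blunt fork bombs
) -> list:
    """Build an argv list for running one submission in a locked-down container."""
    return [
        "docker", "run", "--rm",          # remove the container when it exits
        "--network", "none",              # no network access
        "--memory", memory,               # memory limit
        "--cpus", cpus,                   # CPU limit
        "--pids-limit", str(pids),        # process count limit
        "--read-only",                    # read-only root filesystem
        "-v", f"{source_dir}:/sandbox:ro",  # mount the submission read-only
        "-w", "/sandbox",
        image,
        *run_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical image and path, for illustration only.
    cmd = docker_run_command("python:3.12-slim", "/tmp/sub-123", ["python", "main.py"])
    print(shlex.join(cmd))
```

Passing the command as a list instead of a shell string avoids shell injection through file names or arguments.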
Data Storage Strategy
Use different databases for different needs:
- Relational Databases (PostgreSQL/MySQL): Use these for users, contests, and rankings. They ensure data integrity.
- NoSQL Databases (DynamoDB/MongoDB): Use these for problems, test cases, and submission logs. They scale better as data grows.
- Caching (Redis): Use this for problem details and contest info to reduce database load.
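The caching bullet can be illustrated with the cache-aside pattern: read from the cache first, fall back to the database on a miss, then store the result with an expiry. This is a sketch under stated assumptions. TTLCache is an in-memory stand-in for Redis, PROBLEM_DB is a stand-in for the problem store with sample data, and the 300-second TTL is arbitrary.

```python
import time


class TTLCache:
    """In-memory stand-in for a Redis GET and a SET with expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


# Sample data standing in for the problem database.
PROBLEM_DB = {1: {"title": "Two Sum", "difficulty": "Easy"}}
db_reads = 0
cache = TTLCache()


def load_problem_from_db(problem_id):
    """Hypothetical database read; counts calls so cache hits are visible."""
    global db_reads
    db_reads += 1
    return PROBLEM_DB.get(problem_id)


def get_problem(problem_id, ttl_seconds=300):
    """Cache-aside read: try the cache, then the database, then fill the cache."""
    key = f"problem:{problem_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    problem = load_problem_from_db(problem_id)
    if problem is not None:
        cache.set(key, problem, ttl_seconds)
    return problem


if __name__ == "__main__":
    get_problem(1)
    get_problem(1)
    print("database reads:", db_reads)
```

Problem details change rarely and are read constantly, which is why a time-based expiry is usually enough here.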
Real-Time Leaderboards
During contests, rankings change every second. Do not recalculate the entire leaderboard every time someone submits code.
Instead:
- Update scores asynchronously.
- Use Redis Sorted Sets to manage rankings.
- Push updates to users via WebSockets.
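To make the ranking step concrete, here is a small in-memory stand-in that mirrors three Redis Sorted Set operations: ZINCRBY, ZREVRANGE, and ZREVRANK. It is a sketch, not a Redis client. The class and method names are hypothetical, and breaking ties by user name is an assumption of this example. It also sorts on every read, whereas Redis keeps members ordered as scores change.

```python
class Leaderboard:
    """In-memory stand-in for a Redis Sorted Set keyed by user."""

    def __init__(self):
        self._scores = {}

    def add_points(self, user: str, points: int) -> int:
        """Like ZINCRBY: add to the user's score and return the new total."""
        self._scores[user] = self._scores.get(user, 0) + points
        return self._scores[user]

    def top(self, n: int) -> list:
        """Like ZREVRANGE 0 n-1 WITHSCORES: highest scores first."""
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]

    def rank(self, user: str):
        """Like ZREVRANK: 0-based position from the top, or None if absent."""
        if user not in self._scores:
            return None
        ordered = [name for name, _ in self.top(len(self._scores))]
        return ordered.index(user)


if __name__ == "__main__":
    board = Leaderboard()
    board.add_points("alice", 300)
    board.add_points("bob", 500)
    board.add_points("alice", 300)   # a second accepted problem
    print(board.top(2))
    print(board.rank("bob"))
```

Each accepted submission touches one member's score, so nothing is recalculated for the other participants. Pushing the changed rows to clients over WebSockets is a separate step and is not shown here.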
Reliability and Trade-offs
In this design, availability is the priority. A slight delay in a leaderboard update is fine. A system crash during a contest is not. We choose eventual consistency to keep the platform running.
How would you design the code execution engine? Would you use Docker or microVMs?
Source: https://dev.to/himanshudevgupta/system-design-designing-leetcode-at-scale-45op
