Popular System Design Questions
These are the most common system design questions asked at top tech companies. Understanding these patterns will prepare you for variations and similar problems.
Communication & Messaging Systems
Design a Chat Application (WhatsApp/Messenger/Discord)
A classic question that tests your understanding of real-time communication.
Requirements & Scale
Functional Requirements:
- 1-on-1 messaging
- Group chats
- Online presence indicators
- Message history
- Read receipts
- Media sharing
Non-Functional Requirements:
- Millions of concurrent users
- Billions of messages per day
- Real-time delivery (<1 second)
- High availability required
Key Design Components
User Login Flow:
- User logs in and establishes WebSocket connection
- Presence service receives notification
- Updates user’s online status
- Notifies user’s contacts about presence
Message Flow:
- Alice sends message to Bob via WebSocket
- Message routed to Chat Service
- Sequencing service generates unique message ID
- Message persisted in message store
- Message sent to sync queue
- Message sync service checks Bob’s presence
- If online: deliver via WebSocket
- If offline: send push notification
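The send path from Alice to Bob can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the `ChatService` class, its fields, and the returned channel names are all hypothetical, and a real system would split these responsibilities across the services listed in this section.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ChatService:
    """Toy stand-in for the chat, sequencing, and sync services (all names hypothetical)."""
    online: set = field(default_factory=set)    # presence: users with an open WebSocket
    store: list = field(default_factory=list)   # message store
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def send(self, sender, recipient, text):
        # Sequencing: assign a unique, increasing message ID.
        message = {"id": next(self._ids), "from": sender, "to": recipient, "text": text}
        # Persist before attempting delivery so the message survives a failed send.
        self.store.append(message)
        # Sync: choose the delivery channel from the recipient's presence.
        channel = "websocket" if recipient in self.online else "push"
        return message["id"], channel
```

Persisting before delivery is the design choice worth calling out in an interview: if delivery fails, the message is still in the store and can be synced when the recipient reconnects.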
Technical Decisions
Communication Protocol: WebSocket for bidirectional real-time communication
Database Design:
- User data: SQL (PostgreSQL)
- Messages: NoSQL (Cassandra) for horizontal scaling
- Presence: Redis for fast in-memory lookups
Core Services:
- Chat Service: Handle message routing
- Presence Service: Track online/offline status
- Sequencing Service: Generate message IDs
- Message Sync Service: Deliver to recipients
- Push Notification Service: Offline delivery
Scalability:
- Shard users across multiple chat servers
- Use message queues (Kafka) for reliable delivery
- Cache active conversations in Redis
- CDN for media content
Design a Notification System
Tests understanding of multi-channel communication and async processing.
Notification Channels
- In-App Notifications: Real-time updates within the application
- Email Notifications: Marketing, summaries, important updates
- SMS/OTP: Verification codes, critical alerts
- Push Notifications: Mobile device alerts
- Social Media: Twitter, Facebook posts
Architecture Components
Flow:
- Business services send notifications to gateway
- Gateway accepts single or batch notifications
- Distribution service validates and formats messages
- Template repository provides message formats
- Preference repository determines delivery channels
- Routers (message queues) distribute to channels
- Channel services communicate with delivery providers
- Tracking service captures delivery metrics
Key Features:
- Rate limiting per channel
- Retry logic for failed deliveries
- User notification preferences
- Template management
- Analytics and tracking
- Priority queues for urgent notifications
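As a rough sketch of how user preferences and priority queues might fit together, the example below filters requested channels against a user's preferences and delivers urgent notifications first. The `NotificationRouter` class, the `PRIORITY` levels, and the tuple layout are assumptions made for illustration; a real router would sit on a message queue rather than an in-process heap.

```python
import heapq
import itertools

PRIORITY = {"urgent": 0, "normal": 1}  # hypothetical levels; lower number is delivered first


class NotificationRouter:
    def __init__(self, preferences):
        self.preferences = preferences    # user ID -> set of channels the user enabled
        self._queue = []                  # min-heap ordered by (priority, arrival order)
        self._order = itertools.count()   # tie-breaker keeps FIFO order within a priority

    def submit(self, user, channels, body, priority="normal"):
        allowed = self.preferences.get(user, set())
        for channel in channels:
            if channel in allowed:        # silently drop channels the user opted out of
                heapq.heappush(
                    self._queue,
                    (PRIORITY[priority], next(self._order), channel, user, body),
                )

    def next_delivery(self):
        """Return the next (channel, user, body) to send, or None if the queue is empty."""
        if not self._queue:
            return None
        _, _, channel, user, body = heapq.heappop(self._queue)
        return channel, user, body
```

With preferences `{"alice": {"email", "push"}}`, an urgent push submitted after a normal email is still returned first by `next_delivery()`, and an SMS request for Alice is dropped because she has not enabled that channel.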
Content & Media Systems
Design Netflix/YouTube
A comprehensive question covering video streaming, storage, and CDN.
Core Requirements
Functional:
- Video upload and processing
- Video playback with adaptive quality
- Search and recommendations
- User profiles and watch history
- Subtitles and multiple audio tracks
Non-Functional:
- Millions of concurrent viewers
- Low latency streaming (<2s buffer time)
- High availability (99.99%)
- Global distribution
- Petabytes of video storage
System Architecture
Video Upload Pipeline:
- Upload to object storage (S3)
- Trigger transcoding service
- Generate multiple quality versions (1080p, 720p, 480p, etc.)
- Create thumbnails and previews
- Extract metadata
- Distribute to CDN edge locations
- Update database with video info
Video Playback Flow:
- User requests video
- API returns video metadata and CDN URLs
- Client requests appropriate quality based on bandwidth
- CDN serves video chunks (HLS/DASH)
- Track watch progress
- Update recommendations
Technology Stack:
- Storage: Object storage (S3) for source videos
- CDN: CloudFront/Akamai for global delivery
- Transcoding: AWS Elastic Transcoder or custom
- Database: SQL for metadata, NoSQL for viewing history
- Streaming: HLS or DASH protocols
- Caching: Redis for metadata, CDN for content
Optimization Strategies
Caching (How Netflix uses caching):
- Edge caching for popular content
- Pre-fetching upcoming video chunks
- Metadata caching for quick browsing
- Thumbnail and preview caching
Scaling and Performance:
- Distribute transcoding across worker fleet
- Shard user data by region
- Use CDN POPs in every major city
- Adaptive bitrate streaming
- Lazy loading for UI elements
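Adaptive bitrate streaming comes down to the client choosing a rendition that fits its measured bandwidth. The sketch below shows one simple way to make that choice. The bitrate ladder values and the `safety` margin are illustrative assumptions, not figures from any real service, and production players also weigh buffer level and recent throughput history.

```python
# Hypothetical bitrate ladder: (label, required kbit/s), highest quality first.
LADDER = [("1080p", 5000), ("720p", 2800), ("480p", 1400), ("240p", 400)]


def pick_rendition(measured_kbps, safety=0.8):
    """Return the highest rendition whose bitrate fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety   # leave headroom so small dips do not stall playback
    for label, required in LADDER:
        if required <= budget:
            return label
    return LADDER[-1][0]              # fall back to the lowest rendition rather than failing
```

Because HLS and DASH serve video as short chunks, the client can re-run this decision for every chunk and switch quality mid-stream.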
Design Gmail
Email system design covering SMTP, storage, and search.
Email Flow
Sending an Email:
- Alice composes email in client (Outlook)
- Client sends via SMTP to mail server
- Outlook server queries DNS for recipient’s server
- Transfers email via SMTP to Gmail server
- Gmail stores email in recipient’s mailbox
Receiving an Email:
- Bob’s Gmail client connects to server
- Client fetches new emails via IMAP/POP3
- Emails downloaded to client
- Mark as read, delete, archive, etc.
Key Components
- SMTP Server: Send and receive emails
- IMAP/POP Server: Client email retrieval
- Storage: Email content and attachments
- Search Index: Fast email search
- Spam Filter: Machine learning-based filtering
- Attachment Service: Handle large files
- Sync Service: Multi-device synchronization
Collaborative & Document Systems
Design Google Docs
Tests knowledge of real-time collaboration and conflict resolution.
Real-Time Collaboration Challenge
The biggest challenge: How do multiple users edit the same document simultaneously without conflicts?
Conflict Resolution Algorithms:
- Operational Transformation (OT): Used by Google Docs
- Conflict-free Replicated Data Type (CRDT): Active research area
- Differential Synchronization (DS): Alternative approach
Architecture
Components:
- WebSocket Server: Handle real-time communication
- Message Queue: Persist document operations
- File Operation Server: Transform and apply edits
- Storage:
  - File metadata (SQL)
  - File content (Document DB)
  - Operations log (NoSQL)
Edit Flow:
- User makes edit in browser
- Send operation via WebSocket
- Operation persisted in queue
- Server transforms operation using OT
- Broadcast to all connected clients
- Clients apply the transformed operation to their local copy
- Periodically save snapshots
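To make the OT step concrete, here is a minimal sketch that handles only concurrent text insertions. It is not Google Docs' actual algorithm: the `Insert` type, the `site` tie-breaker, and the `transform` rule are simplified assumptions, and a real system also has to handle deletions, formatting, and server-side ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int    # character offset where the text is inserted
    text: str
    site: int   # hypothetical client ID, used only to break ties at equal positions


def apply(doc, op):
    """Apply an insert operation to a document string."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, other):
    """Rewrite `op` so it can be applied after the concurrent operation `other`."""
    shifts = other.pos < op.pos or (other.pos == op.pos and other.site < op.site)
    if shifts:
        # `other` inserted text at or before our position, so our offset moves right.
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op
```

The property to check is convergence: for concurrent operations `a` and `b`, applying `a` then `transform(b, a)` yields the same document as applying `b` then `transform(a, b)`.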
Location-Based Systems
Design Google Maps
Comprehensive system covering location services, routing, and map rendering.
Three Core Components
1. Location Service:
- Records user location updates (every few seconds)
- Detects new and closed roads
- Improves map accuracy over time
- Feeds live traffic data
2. Map Rendering:
- World map divided into tiles
- Pre-calculated at different zoom levels
- Served via CDN from S3
- Client loads necessary tiles
- Efficient zooming and panning
3. Navigation Service:
- Geocoding: Address → GPS coordinates
- Route Planning:
  - Calculate top-K shortest paths (Dijkstra’s, A*)
  - Estimate time based on traffic
  - Rank paths by user preferences
- Turn-by-turn directions
- Real-time rerouting
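For the route-planning step, a compact version of Dijkstra's algorithm over a road graph looks like the sketch below. The graph format (a dict mapping each node to `(neighbor, travel_seconds)` pairs) and the function name are assumptions for illustration; it returns a single shortest path rather than the top-K paths mentioned above.

```python
import heapq


def shortest_path(graph, source, target):
    """graph: node -> list of (neighbor, travel_seconds). Returns (total_seconds, path) or None."""
    best = {source: 0}        # cheapest known cost to reach each node
    previous = {}             # back-pointers for rebuilding the path
    frontier = [(0, source)]  # min-heap of (cost so far, node)
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == target:
            path = [node]
            while node in previous:
                node = previous[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue          # stale heap entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = cost + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(frontier, (candidate, neighbor))
    return None               # target is unreachable from source
```

Feeding live traffic into the edge weights is what turns this from a distance calculation into a travel-time estimate.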
Technical Considerations
Geospatial Indexing:
- Quad-trees or Geohash for location indexing
- Quick nearby location queries
- Efficient spatial searches
Routing Algorithms:
- Dijkstra’s algorithm for shortest path
- A* for heuristic-guided shortest-path search (optimal when the heuristic never overestimates)
- Contraction Hierarchies for fast routing
Scale:
- Petabytes of map imagery
- Billions of location updates daily
- Millions of concurrent users
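Geohash indexing works by interleaving longitude and latitude bits and encoding them in base32, so that nearby points usually share a prefix. The sketch below follows the standard public geohash scheme. The function name and default length are choices made here for illustration.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash(lat, lon, length=6):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    chars, bits, value = [], 0, 0
    use_lon = True                            # bits alternate, starting with longitude
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1             # upper half: emit a 1 bit
            rng[0] = mid
        else:
            value = value * 2                 # lower half: emit a 0 bit
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:                         # every 5 bits become one base32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)
```

A "nearby" query can then be approximated as a prefix lookup on the hash, with the caveat that two close points on opposite sides of a cell boundary may not share a prefix, so implementations typically also search the neighboring cells.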
Social Media & Feed Systems
Design Twitter/News Feed
Classic question testing feed generation and timeline algorithms.
Requirements
Functional:
- Post tweets (280 characters)
- Follow users
- View timeline (following + recommendations)
- Like, retweet, reply
- Trending topics
Non-Functional:
- 300M daily active users
- 600M tweets/day
- 100:1 read-to-write ratio
- Timeline load <300ms
Feed Generation Strategies
Fan-out on Write (Twitter’s approach):
- When user posts, immediately push to followers’ feeds
- Pros: Fast reads
- Cons: Slow writes for users with many followers
- Solution: Hybrid approach for celebrities
Fan-out on Read:
- Generate feed when user requests it
- Pros: Fast writes
- Cons: Slow reads
Hybrid Approach:
- Fan-out on write for normal users
- Fan-out on read for celebrities
- Best of both worlds
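The hybrid strategy can be sketched as follows: posts from ordinary accounts are pushed into followers' precomputed feeds at write time, while posts from high-follower accounts are merged in at read time. The `FeedService` class and the `celebrity_threshold` default are hypothetical; the right cutoff depends on the workload.

```python
from collections import defaultdict


class FeedService:
    def __init__(self, followers, celebrity_threshold=10_000):
        self.followers = followers               # author -> set of follower IDs
        self.threshold = celebrity_threshold     # hypothetical cutoff for "celebrity"
        self.following = defaultdict(set)        # follower -> set of authors
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].add(author)
        self.feeds = defaultdict(list)           # precomputed timelines (fan-out on write)
        self.posts = defaultdict(list)           # per-author posts (read lazily for celebrities)
        self.clock = 0                           # logical timestamp for ordering

    def is_celebrity(self, author):
        return len(self.followers.get(author, ())) >= self.threshold

    def post(self, author, text):
        self.clock += 1
        entry = (self.clock, author, text)
        self.posts[author].append(entry)
        if not self.is_celebrity(author):
            for fan in self.followers.get(author, ()):
                self.feeds[fan].append(entry)    # push to each follower's feed now

    def timeline(self, user, limit=20):
        entries = list(self.feeds[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                entries.extend(self.posts[author])  # pull celebrity posts at read time
        return sorted(entries, reverse=True)[:limit]
```

The write cost for a celebrity post is now constant instead of proportional to the follower count, at the price of a small merge on every read for users who follow celebrities.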
System Components
- Tweet Service: Create and store tweets
- Timeline Service: Generate user feeds
- Follow Graph: Store user relationships
- Fan-out Service: Distribute tweets to feeds
- Cache Layer: Redis for hot timelines
- Search Service: Index tweets for search
- Trending Service: Calculate trending topics
E-Commerce & Marketplace Systems
Design Amazon/E-Commerce Platform
Complex system covering inventory, orders, payments, and recommendations.
Core Services
- Product Catalog: Search and browse products
- Inventory Management: Track stock levels
- Shopping Cart: Temporary order storage
- Order Service: Process purchases
- Payment Service: Handle transactions
- Recommendation Engine: Suggest products
- Review Service: User ratings and reviews
Critical Challenges
Inventory Consistency:
- Prevent overselling
- Handle concurrent purchases
- Use optimistic locking or distributed locks
Payment Processing:
- Idempotency for retry safety
- Two-phase commit for orders
- Integration with payment gateways
- Handle refunds and cancellations
Search and Discovery:
- Elasticsearch for product search
- ML-based recommendations
- Faceted search and filters
- Personalized rankings
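Returning to the Inventory Consistency points, optimistic locking can be illustrated with a version column: an update succeeds only if the row still has the version that was read. The sketch below uses SQLite for convenience and assumes a hypothetical `inventory(sku, stock, version)` table; the same pattern applies to any SQL database.

```python
import sqlite3


def reserve(conn, sku, quantity):
    """Try to decrement stock. Returns False if stock is short or a concurrent writer won."""
    row = conn.execute(
        "SELECT stock, version FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    if row is None or row[0] < quantity:
        return False                      # unknown SKU or not enough stock: refuse to oversell
    stock, version = row
    cursor = conn.execute(
        # The version check makes the write conditional on nobody else having updated the row.
        "UPDATE inventory SET stock = ?, version = ? WHERE sku = ? AND version = ?",
        (stock - quantity, version + 1, sku, version),
    )
    conn.commit()
    return cursor.rowcount == 1           # 0 rows updated means we lost the race
```

When `reserve` returns False because of a lost race, the caller would typically re-read the row and retry a bounded number of times.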
Infrastructure & Platform Systems
Design a URL Shortener (bit.ly)
Simpler question, great for demonstrating fundamentals.
Core Functionality
Requirements:
- Shorten long URLs to short codes
- Redirect short URLs to originals
- Optional: Custom aliases, expiration
- Optional: Analytics (click tracking)
Scale:
- 100M new URLs per month
- 100:1 read-to-write ratio
- Low latency (<100ms redirects)
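A quick back-of-the-envelope estimate turns these figures into request rates. The calculation below assumes a 30-day month and evenly spread traffic, both simplifications; real traffic has peaks that can be several times the average.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60        # assumes a 30-day month

new_urls_per_month = 100_000_000
read_to_write_ratio = 100

writes_per_second = new_urls_per_month / SECONDS_PER_MONTH   # roughly 39 writes/s
reads_per_second = writes_per_second * read_to_write_ratio   # roughly 3,900 redirects/s

print(f"writes/s: {writes_per_second:.0f}, reads/s: {reads_per_second:.0f}")
```

The takeaway is that the write path is modest, while the read path is the one that needs caching and replicas.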
Short URL Generation
Option 1: Hash Function
- Use MD5/SHA256 on long URL
- Take first 7 characters
- Risk: Collisions
Option 2: Base62 Encoding
- Use auto-incrementing ID
- Convert to base62 (a-z, A-Z, 0-9)
- 7 characters = 62^7 ≈ 3.5 trillion URLs
- No collisions
Option 3: Random Generation
- Generate random string
- Check for collisions
- Retry if exists
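The auto-incrementing ID approach needs only a small base62 codec. The sketch below uses the alphabet order given above (a-z, A-Z, 0-9); the function names are illustrative, and any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits  # a-z, A-Z, 0-9


def encode(n):
    """Convert a non-negative integer ID to a base62 short code."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))       # most significant digit first


def decode(code):
    """Convert a base62 short code back to its integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Because the code is a reversible encoding of a unique ID, no collision check is needed. One tradeoff is that sequential IDs make codes guessable, which matters if short links are expected to be private.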
System Design
- Cache popular URLs in Redis
- Database read replicas
- CDN for global access
- Rate limiting to prevent abuse
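Rate limiting can be implemented in several ways; a token bucket is one common choice and is used here purely as an example, since the design above does not name an algorithm. The `TokenBucket` class and its injectable `clock` parameter are assumptions made to keep the sketch testable.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.clock = clock                # injectable time source (assumption, eases testing)
        self.tokens = float(capacity)     # start with a full bucket
        self.updated = clock()

    def allow(self):
        """Return True and consume a token if the request is within the limit."""
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a multi-server deployment the bucket state would live in a shared store such as Redis, keyed by client or API key, rather than in process memory.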
Design Stack Overflow
Q&A platform testing knowledge of search, ranking, and reputation systems.
Surprising Reality
What people expect:
- Microservices architecture
- Cloud-native deployment
- Heavy sharding and caching
- Event sourcing with CQRS
What it actually is:
- Monolithic architecture
- Only 9 on-premises web servers
- No cloud infrastructure
- Serves all traffic efficiently
Key Features to Design
- Questions & Answers: Post, edit, delete
- Voting: Upvote/downvote with reputation
- Tags: Categorization and filtering
- Search: Full-text search across Q&A
- Reputation System: Points and badges
- User Profiles: Activity and statistics
Problem-Solving Patterns
Common System Design Problems & Solutions
Recognize these patterns and apply appropriate solutions.
Read-Heavy System
Problem: Most traffic is reads, database becomes bottleneck
Solution: Use caching extensively
- Application cache (Redis/Memcached)
- Database query cache
- CDN for static content
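At the application layer, the usual way to apply this is the cache-aside pattern: check the cache first, fall back to the database on a miss, then store the result with a time-to-live. The sketch below keeps the cache in a dict for illustration; in practice the dict would be replaced by Redis or Memcached. The `CacheAside` class and its parameters are hypothetical.

```python
import time


class CacheAside:
    def __init__(self, load, ttl_seconds=60, clock=time.monotonic):
        self.load = load                  # function that reads from the database
        self.ttl = ttl_seconds
        self.clock = clock                # injectable time source (assumption, eases testing)
        self._entries = {}                # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]               # cache hit: the database is not touched
        value = self.load(key)            # cache miss or expired: read through
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read fetches fresh data."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a cached value can get, and explicit invalidation on writes tightens that further for data that must look fresh.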
High Write Traffic
Problem: Database can’t handle write volume
Solution:
- Use async workers to process writes
- Choose databases optimized for writes (LSM-trees)
- Examples: Cassandra, RocksDB, LevelDB
Single Point of Failure
Problem: Critical component failure breaks entire system
Solution:
- Implement redundancy for critical components
- Database replication (primary + replicas)
- Multiple application server instances
- Geographic distribution
High Availability Requirements
Problem: System must stay operational with 99.9%+ uptime
Solution:
- Load balancing across healthy instances
- Database replication for durability
- Auto-failover mechanisms
- Health checks and monitoring
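Load balancing and health checks combine naturally: the balancer rotates through backends and skips any that a health check has marked down. The sketch below is a simplified round-robin example with hypothetical names; real load balancers probe backends on a timer and usually require several consecutive failures before removing one.

```python
import itertools


class RoundRobinBalancer:
    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)             # updated by health checks
        self._cycle = itertools.cycle(self.backends)  # fixed rotation order

    def mark(self, backend, is_healthy):
        """Record the result of a health check for one backend."""
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def pick(self):
        """Return the next healthy backend, skipping unhealthy ones."""
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backends")
```

Marking a backend healthy again puts it straight back into rotation, which is the basic mechanism behind automatic failover and recovery.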
High Latency
Problem: Users experiencing slow response times
Solution:
- CDN for global content delivery
- Edge computing for processing close to users
- Database query optimization
- Connection pooling
Handling Large Files
Problem: Need to store and serve large media files
Solution:
- Block storage for structured large files
- Object storage (S3) for unstructured data
- CDN for delivery
- Chunked upload/download
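Chunked transfer can be sketched as splitting a file into indexed parts with checksums, so parts can be sent in parallel, retried individually, and verified on arrival. The function names and the choice of SHA-256 are illustrative assumptions; object stores such as S3 provide their own multipart upload APIs with similar semantics.

```python
import hashlib


def split_into_chunks(data, chunk_size):
    """Yield (index, chunk, sha256 hex digest) for each fixed-size piece of `data`."""
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        yield offset // chunk_size, chunk, hashlib.sha256(chunk).hexdigest()


def reassemble(parts):
    """Rebuild the original bytes from parts received in any order, verifying each checksum."""
    ordered = sorted(parts, key=lambda part: part[0])
    for index, chunk, checksum in ordered:
        if hashlib.sha256(chunk).hexdigest() != checksum:
            raise ValueError(f"chunk {index} failed its checksum")
    return b"".join(chunk for _, chunk, _ in ordered)
```

If one chunk fails verification, only that chunk needs to be resent, which is the main advantage over uploading a large file in a single request.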
Monitoring & Alerting
Problem: Need visibility into system health
Solution:
- Centralized logging (ELK stack)
- Metrics collection (Prometheus, DataDog)
- Distributed tracing (Jaeger)
- Alert management (PagerDuty)
Practice Strategy
Recommended Practice Order
- Start Simple: URL Shortener
- Add Complexity: Chat Application
- Scale Up: Twitter/News Feed
- Real-Time: Google Docs
- Media Heavy: Netflix/YouTube
- Location Services: Google Maps
- Full Stack: E-Commerce Platform
How to Practice
Solve Alone:
- Time yourself (45 minutes)
- Follow the 7-step framework
- Draw diagrams on paper
- Talk through your solution out loud
Review Solutions:
- Compare your approach to published solutions
- Identify what you missed
- Understand alternative approaches
- Note different tradeoffs
Remember: In real interviews, you’ll likely encounter variations of these questions. Focus on understanding the underlying patterns and principles rather than memorizing specific solutions.
Next Steps
- Essential Algorithms - Learn the algorithms that power these systems
- How to Ace Interviews - Review the 7-step framework