Scalability Strategies - System Design 101

What is Scalability?

Scalability is the ability of a system to handle an increased workload without losing performance. More precisely, it’s the system’s ability to handle increased workload by repeatedly applying a cost-effective strategy.

Scalability isn’t just about handling more load - it’s about doing so in a financially viable way. A system can be difficult to scale beyond a certain point if the scaling strategy becomes too expensive.

Understanding Scalability

What do Amazon, Netflix, and Uber have in common? They are extremely good at scaling their systems whenever needed.

Three Main Bottlenecks to Scalability

Centralized Components

Can become a single point of failure

High Latency Components

Perform time-consuming operations

Tight Coupling

Makes components difficult to scale

Core Principles for Scalable Systems

To build a scalable system, follow these principles:

Statelessness - Don’t rely on server-specific data
Loose Coupling - Minimize dependencies between components
Asynchronous Processing - Handle long-running tasks in the background

8 Must-Know Scalability Strategies

Here are 8 essential strategies to scale your system effectively:

1. Stateless Services

Why Stateless Services Matter

Design stateless services because they don’t rely on server-specific data and are easier to scale.

Benefits:

Any server can handle any request
Easy to add/remove servers
Simpler load balancing
Better fault tolerance

Implementation:

Store session data in distributed cache (Redis)
Use JWT tokens for authentication
Externalize configuration
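
The first implementation item can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary stands in for a distributed cache such as Redis; `SessionStore`, `AppServer`, and their methods are hypothetical names, not part of any library. It shows that once session data lives outside the server process, any server can handle any request.

```python
import secrets


class SessionStore:
    """Stand-in for a shared cache such as Redis (assumption: in-memory dict)."""

    def __init__(self):
        self._data = {}

    def create(self, user):
        session_id = secrets.token_hex(8)
        self._data[session_id] = {"user": user, "cart": []}
        return session_id

    def get(self, session_id):
        return self._data[session_id]


class AppServer:
    """A stateless server: it keeps no per-user data between requests."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared, external state

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        session["cart"].append(item)
        return f"{self.name} added {item} for {session['user']}"


store = SessionStore()
server_a = AppServer("server-a", store)
server_b = AppServer("server-b", store)

sid = store.create("alice")
server_a.add_to_cart(sid, "book")
server_b.add_to_cart(sid, "pen")   # a different server handles the next request
cart = store.get(sid)["cart"]      # both items are present
```

Because neither server holds the cart itself, either one could be removed or replaced between the two requests without losing the session.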

2. Horizontal Scaling

Add more servers so that the workload can be shared across multiple instances.

Horizontal Scaling

Add more machines to distribute load

Vertical Scaling

Add more power (CPU, RAM) to existing machines

Horizontal Scaling Benefits:

No hard single-machine ceiling on capacity
Better fault tolerance
Cost-effective with commodity hardware
Easier rollback and updates

3. Load Balancing

Use a load balancer to distribute incoming requests evenly across multiple servers, preventing any single server from becoming a bottleneck.

Load Balancing Algorithms:

Round Robin
Least Connections
IP Hash
Weighted Round Robin
Least Response Time
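
Two of the algorithms above, Round Robin and Least Connections, can be sketched in a few lines. This is an illustrative model only: the class names are hypothetical, servers are plain strings, and a production load balancer would also handle health checks and concurrency.

```python
import itertools


class RoundRobinBalancer:
    """Cycles through servers in a fixed order."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a request finishes."""
        self.active[server] -= 1


rr = RoundRobinBalancer(["a", "b", "c"])
rr_picks = [rr.pick() for _ in range(4)]   # wraps around after the last server

lc = LeastConnectionsBalancer(["a", "b"])
first = lc.pick()    # both idle, so the tie goes to the first server listed
second = lc.pick()   # the other server now has fewer connections
lc.release(first)
third = lc.pick()    # the released server is the least loaded again
```

Round Robin ignores how busy each server is, while Least Connections adapts when some requests take much longer than others.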

Benefits:

Even distribution of traffic
High availability
Health checking
SSL termination

4. Auto Scaling

Implement auto-scaling policies to adjust resources based on real-time traffic.

Auto Scaling Strategies

Scaling Triggers:

CPU utilization
Memory usage
Request count
Custom metrics

Scaling Policies:

Target Tracking: Maintain a specific metric target
Step Scaling: Scale by different amounts based on thresholds
Scheduled Scaling: Scale based on predictable patterns
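
As a rough illustration of target tracking, the sketch below applies a proportional rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler: scale the instance count by the ratio of the observed metric to the target. The function name, parameters, and limits are assumptions for this example; real autoscalers also apply cooldowns and tolerances, which are omitted here.

```python
import math


def desired_capacity(current_instances, current_metric, target_metric,
                     min_instances=1, max_instances=10):
    """Proportional target tracking: resize so the metric returns to target."""
    raw = math.ceil(current_instances * current_metric / target_metric)
    # Clamp to the configured bounds so a metric spike cannot scale without limit.
    return max(min_instances, min(max_instances, raw))


# Average CPU at 90% against a 60% target: grow the fleet.
scale_out = desired_capacity(4, current_metric=90.0, target_metric=60.0)
# Average CPU at 20% against a 60% target: shrink the fleet.
scale_in = desired_capacity(4, current_metric=20.0, target_metric=60.0)
# The proportional answer (16) exceeds max_instances, so it is capped.
capped = desired_capacity(8, current_metric=100.0, target_metric=50.0)
```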

Cloud Provider Solutions:

AWS Auto Scaling
Azure Virtual Machine Scale Sets
Google Cloud Autoscaler
Kubernetes Horizontal Pod Autoscaler

5. Caching

Use caching to reduce the load on the database and handle repetitive requests at scale. Cache frequently accessed data in faster storage layers.

Caching Layers:

Client-side caching - Browser, mobile app
CDN caching - Edge locations
Application caching - Redis, Memcached
Database caching - Query result caching

Popular Caching Solutions:

Redis
Memcached
Varnish
CloudFront

6. Database Replication

Replicate data across multiple nodes to scale read operations while improving redundancy.

Read Replicas

Handle read queries to reduce primary database load

Primary-Replica

Write to primary, read from replicas
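
The primary-replica split can be modeled with a small toy class. This is a sketch under stated assumptions: dictionaries stand in for database nodes, `ReplicatedDatabase` is a hypothetical name, and replication is triggered by hand to make the delay of asynchronous replication visible.

```python
import itertools


class ReplicatedDatabase:
    """Routes writes to the primary and reads to replicas (toy model)."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        self.primary[key] = value

    def replicate(self):
        """Copy primary state to every replica; real replication is continuous."""
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key):
        # Spread reads across replicas in round-robin order.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "alice")
stale = db.read("user:1")    # None: the replicas have not caught up yet
db.replicate()
fresh = db.read("user:1")    # "alice" once replication has run
```

The stale read is the trade-off to keep in mind: with asynchronous replication, a read that must see the latest write may need to go to the primary.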

Benefits:

Improved read performance
High availability
Disaster recovery
Geographic distribution

Replication Patterns:

Primary-replica (master-slave) replication
Multi-primary (master-master) replication
Multi-region replication

7. Database Sharding

Distribute data across multiple database instances to scale both writes and reads. Each shard is a horizontal partition of the data.

Sharding Strategies:

Common Sharding Approaches

1. Range-based Sharding

Partition by ID ranges
Simple but can create hotspots

2. Hash-based Sharding

Use hash function on key
Even distribution
Difficult to add/remove shards

3. Geographic Sharding

Partition by location
Reduces latency
Supports data compliance

4. Directory-based Sharding

Lookup table for routing
Flexible but adds complexity
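
The range-based and hash-based approaches can be compared with a short sketch. The function names, the key format, and the shard boundaries are assumptions chosen for illustration; the example also counts how many keys change shard when a hash-sharded cluster grows from four to five shards.

```python
import bisect
import hashlib


def hash_shard(key, shard_count):
    """Hash-based: a stable hash of the key, modulo the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def range_shard(user_id, upper_bounds):
    """Range-based: upper_bounds[i] is the exclusive upper ID for shard i.

    IDs at or above the last bound fall into one extra, final shard.
    """
    return bisect.bisect_right(upper_bounds, user_id)


keys = [f"user-{i}" for i in range(1000)]
# Count keys whose shard changes when going from 4 to 5 shards.
moved = sum(hash_shard(k, 4) != hash_shard(k, 5) for k in keys)

bounds = [1000, 2000, 3000]          # shard 0: < 1000, shard 1: < 2000, ...
shard_for_1500 = range_shard(1500, bounds)
```

With plain modulo hashing, most keys land on a different shard after the shard count changes, which is why adding or removing shards is difficult. Range-based routing keeps neighboring IDs together, which is simple but lets a busy ID range become a hotspot.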

Challenges:

Complex queries across shards
Distributed transactions
Rebalancing shards
Maintaining referential integrity

8. Async Processing

Move time-consuming and resource-intensive tasks to background workers so that request-handling servers stay free to accept new requests.

Use Cases:

Email sending
Image/video processing
Report generation
Data analytics
Batch jobs

Implementation Patterns:

Message queues (RabbitMQ, SQS)
Event streams (Kafka)
Task queues (Celery, Bull)
Job schedulers (Airflow, Cron)
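
The queue-and-worker pattern behind these tools can be sketched with the standard library alone. This is a minimal illustration, assuming an in-process `queue.Queue` stands in for a message broker and a thread stands in for a separate worker service; `handle_request` and `worker` are hypothetical names.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    """Background worker: pulls jobs off the queue until it sees None."""
    while True:
        job = tasks.get()
        if job is None:
            break
        results.append(f"sent email to {job}")  # stands in for slow work


def handle_request(address):
    """Request handler: enqueue the slow task and return immediately."""
    tasks.put(address)
    return "202 Accepted"


thread = threading.Thread(target=worker)
thread.start()
responses = [handle_request(a) for a in ["a@example.com", "b@example.com"]]
tasks.put(None)   # sentinel: tell the worker to stop
thread.join()     # wait only so the demo can inspect the results
```

The handler responds as soon as the job is queued; the slow work happens later, and workers can be scaled independently of the request-handling servers.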

Database Scaling Strategies

7 Must-Know Database Scaling Techniques

1. Indexing

Indexing Best Practices

Check the query patterns of your application and create the right indexes.

Types of Indexes:

B-tree indexes (default)
Hash indexes
Full-text indexes
Geospatial indexes

Best Practices:

Index columns used in WHERE clauses
Index foreign keys
Avoid over-indexing (writes slow down)
Monitor index usage
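
The effect of indexing a column used in a WHERE clause can be observed with SQLite, which ships with Python. This is a small sketch: the `orders` table, its columns, and the index name are made up for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT total FROM orders WHERE customer_id = ?"


def plan(sql):
    """Return SQLite's human-readable plan text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " ".join(row[-1] for row in rows)


plan_before = plan(query)   # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query)    # with the index: a search using idx_orders_customer
```

Comparing the two plans is also a simple way to monitor whether an index is actually used by the queries it was created for.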

2. Materialized Views

Pre-compute complex query results and store them for faster access.

Benefits:

Faster query performance
Reduced computation load
Better for complex aggregations

Trade-offs:

Additional storage required
Need refresh strategy
Potential staleness
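
The refresh and staleness trade-offs can be demonstrated with a hand-rolled equivalent. SQLite has no native materialized views, so this sketch emulates one with a summary table that is rebuilt on demand; databases such as PostgreSQL provide the feature directly. The table and function names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("eu", 10.0), ("eu", 5.0), ("us", 7.0)],
)


def refresh_sales_by_region():
    """Recompute the stored aggregate (a full-refresh strategy)."""
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )


def total_for(region):
    """Read the pre-computed total instead of aggregating the base table."""
    rows = conn.execute(
        "SELECT total FROM sales_by_region WHERE region = ?", (region,)
    ).fetchall()
    return rows[0][0]


refresh_sales_by_region()
conn.execute("INSERT INTO sales VALUES ('eu', 100.0)")
stale_total = total_for("eu")    # 15.0: the stored result predates the insert
refresh_sales_by_region()
fresh_total = total_for("eu")    # 115.0 after the refresh
```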

3. Denormalization

Reduce complex joins to improve query performance by storing redundant data.

When to Denormalize:

Read-heavy workloads
Complex joins impacting performance
Data that changes infrequently

Considerations:

Data consistency challenges
Increased storage
Update complexity

4. Vertical Scaling

Boost your database server by adding more CPU, RAM, or storage.

Limitations:

Hardware limits
Expensive at scale
Downtime during upgrades
Single point of failure

5. Caching

Store frequently accessed data in a faster storage layer to reduce database load.

Caching Strategies:

Cache-aside
Write-through
Write-behind
Refresh-ahead
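
Of these strategies, cache-aside is the simplest to sketch: the application checks the cache first and falls back to the database on a miss. The example below is a minimal illustration with a time-to-live; `CacheAside`, `slow_lookup`, and the 60-second default are assumptions, and a dictionary stands in for a cache such as Redis or Memcached.

```python
import time


class CacheAside:
    """Cache-aside: the application checks the cache, then the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                      # cache hit
        value = self._load(key)                  # cache miss: go to the database
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)


db_calls = []


def slow_lookup(key):
    db_calls.append(key)          # records each simulated database query
    return key.upper()


cache = CacheAside(slow_lookup)
first = cache.get("sku-1")        # miss, loads from the "database"
second = cache.get("sku-1")       # hit, no database call
cache.invalidate("sku-1")
third = cache.get("sku-1")        # miss again after invalidation
```

Three reads cost only two database calls here; the saving grows with the hit rate, at the price of possibly serving data up to one TTL old.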

6. Replication

Create replicas of your primary database on different servers for scaling reads.

7. Sharding

Split your database tables into smaller pieces and spread them across servers. Used for scaling both writes and reads.

Scaling from One to Millions of Users

The diagram illustrates the evolution of a simplified eCommerce website from a monolithic design on a single server to a service-oriented/microservice architecture.

Step 1: Separate Application and Database

Initial Separation

With the growth of the user base, one single application server cannot handle the traffic anymore.

Solution: Put the application server and database server on separate servers.

Benefits:

Better resource allocation
Independent scaling
Improved security

Step 2: Application Server Cluster

The business continues to grow, and a single application server is no longer enough.

Solution: Deploy a cluster of application servers.

Step 3: Load Balancer

Now the incoming requests have to be routed to multiple application servers. How can we ensure each application server gets an even load? A load balancer sits in front of the cluster and distributes requests across the servers.

Step 4: Database Read Replicas

With business continuing to grow, the database might become the bottleneck.

Solution: Separate reads and writes so that frequent read queries go to read replicas.

Benefits:

Greatly increased throughput for database reads
Reduced load on primary database
Better performance

Step 5: Horizontal Partition and Caching

One single database cannot handle the load on both the inventory table and user table.

Vertical Scaling

Add more power (CPU, RAM) to the database server (has hard limits)

Horizontal Partition

Add more database servers

Caching Layer

Offload read requests

Step 6: Microservices Architecture

Modularize the functions into different services. The architecture becomes service-oriented/microservice.

Benefits:

Independent deployment
Technology flexibility
Team autonomy
Better fault isolation
Easier scaling

Common Scalability Techniques Summary

Load Balancing

Spread requests across multiple servers to prevent bottlenecks

Caching

Store commonly requested information in memory

Event-Driven Processing

Use async processing for long-running tasks

Sharding

Split large datasets into smaller shards for horizontal scalability

Performance Optimization

Reduce Latency Strategies

Top 5 Strategies to Reduce Latency

Use CDN - Serve static content from edge locations
Optimize Database Queries - Add indexes, optimize joins
Implement Caching - Multiple layers of caching
Async Processing - Move work to background
Connection Pooling - Reuse database connections
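
Connection pooling, the last item above, can be sketched with a queue of pre-opened connections. This is a single-threaded illustration, assuming in-memory SQLite connections stand in for connections to a networked database, where opening a connection is the expensive step; `ConnectionPool` is a hypothetical class, not a library API.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Opens a fixed number of connections once and hands them out on demand."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue()
        self.opened = 0
        for _ in range(size):
            self._idle.put(sqlite3.connect(database))
            self.opened += 1

    @contextmanager
    def connection(self):
        conn = self._idle.get()       # blocks if every connection is in use
        try:
            yield conn
        finally:
            self._idle.put(conn)      # return it to the pool instead of closing


pool = ConnectionPool(size=2)
answers = []
for _ in range(10):
    with pool.connection() as conn:
        answers.append(conn.execute("SELECT 1 + 1").fetchall()[0][0])
```

Ten queries run over only two connections, so the connection setup cost is paid twice rather than ten times.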

System Design Trade-offs

Common Trade-offs:

Consistency vs. Availability (CAP theorem)
Latency vs. Throughput
Read vs. Write Performance
Cost vs. Performance
Complexity vs. Maintainability

Monitoring and Observability

Key Metrics to Track

Response Time

Track latency at different percentiles (p50, p95, p99)
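
The reason to track percentiles rather than an average can be seen in a small worked example. The sketch uses the nearest-rank definition of a percentile, one of several in common use, and made-up latency samples.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Nine fast requests and one very slow one (milliseconds).
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 18, 950, 17]

p50 = percentile(latencies_ms, 50)   # 14: the typical request is fast
p95 = percentile(latencies_ms, 95)   # 950: the slow tail shows up here
p99 = percentile(latencies_ms, 99)   # 950
mean = sum(latencies_ms) / len(latencies_ms)   # 107.8: describes no real request
```

The median shows that most users see fast responses, the high percentiles expose the slow outlier, and the mean blurs the two together.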

Throughput

Requests per second (RPS)

Error Rate

Percentage of failed requests

Resource Utilization

CPU, memory, disk, network usage

Logging and Tracing

Observability Stack:

Metrics: Prometheus, Grafana
Logging: ELK Stack, Loki
Tracing: Jaeger, Zipkin
APM: New Relic, Datadog, Dynatrace

Best Practices

Start Simple

Don’t over-engineer initially. Scale when you need to.

Measure Everything

You can’t optimize what you don’t measure.

Plan for Failure

Design systems to be resilient to failures.

Automate Scaling

Use auto-scaling to handle traffic spikes.

Test at Scale

Load test before production traffic hits.

Common Pitfalls to Avoid

What NOT to Do

Premature Optimization:

Don’t scale before you need to
Measure first, optimize later
Start with vertical scaling if appropriate

Over-Engineering:

Keep it simple initially
Add complexity only when necessary
Consider operational overhead

Ignoring Bottlenecks:

Profile and identify real bottlenecks
Don’t assume - measure
Fix the biggest bottleneck first

Not Planning for Failure:

Everything fails eventually
Design for graceful degradation
Implement proper monitoring

Next Steps

Software Architecture

Learn architectural patterns for scalability

Microservices

Scale with microservices architecture

Design Patterns

Apply patterns for scalable systems
