Introduction to System Design

What is System Design?

System design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. It focuses on creating scalable, reliable, and maintainable solutions that can handle real‑world workloads.

Why System Design Matters for Software Engineers

  • Translates business needs into technical specifications
  • Ensures the system can grow with increasing traffic and data
  • Improves reliability and fault tolerance
  • Helps identify trade‑offs early (e.g., latency vs. consistency)
  • Prepares engineers for technical interviews and architecture reviews

Core Principles of System Design

  1. Scalability
  2. Reliability & Fault Tolerance
  3. Performance (Latency & Throughput)
  4. Maintainability
  5. Security
  6. Cost Efficiency

Scalability

Scalability is the ability of a system to handle increased load by adding resources. It can be vertical (scale‑up) or horizontal (scale‑out). Horizontal scaling is preferred for large‑scale services because it provides better fault isolation and cost control.

Reliability & Fault Tolerance

A reliable system continues to operate correctly even when components fail. Techniques include redundancy, graceful degradation, retries, and circuit breakers.
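As a minimal sketch of the retry technique, the helper below retries a flaky call with exponential backoff and jitter (`flaky_operation` is a hypothetical downstream call used only for illustration):

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base_delay=0.1):
    """Retry a flaky operation, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Jitter prevents synchronized retry storms across many clients.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            time.sleep(delay)

calls = {"count": 0}

def flaky_operation():
    """Hypothetical downstream call that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("backend unavailable")
    return "ok"

print(retry_with_backoff(flaky_operation))  # prints "ok" on the third attempt
```

Without the jitter term, thousands of clients that failed at the same instant would all retry at the same instant, re-overloading the recovering service.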

Performance

Performance is measured in terms of latency (time to respond) and throughput (requests per second). Optimizing one often impacts the other, so engineers must balance them based on product requirements.
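One way to make the balance concrete is Little's Law, which relates latency and throughput through the number of in-flight requests (the numbers below are illustrative):

```python
def max_throughput(concurrency, latency_seconds):
    # Little's Law: sustained throughput = concurrent requests / avg latency.
    return concurrency / latency_seconds

# 100 in-flight requests at 50 ms average latency:
print(max_throughput(100, 0.050))  # 2000.0 requests/second
```

Halving latency doubles the throughput a fixed worker pool can sustain, which is why the two metrics are usually tuned together rather than in isolation.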

Typical Steps in a System Design Interview

  1. Clarify requirements and define scope
  2. Identify core entities and their relationships
  3. Sketch high‑level architecture (clients, API layer, services, storage, etc.)
  4. Discuss data flow and API contracts
  5. Address scalability, reliability, and consistency
  6. Consider trade‑offs and choose appropriate technologies
  7. Summarize the design and highlight potential improvements

Key Architectural Components

  • Load Balancer
  • Caching Layer
  • Database (SQL / NoSQL)
  • Message Queue / Pub‑Sub
  • Search Engine
  • Content Delivery Network (CDN)
  • Monitoring & Alerting

Load Balancer

Distributes incoming traffic across multiple backend instances to achieve high availability and better resource utilization.

# Simple round‑robin load balancer in Python
import itertools, socket, threading

LISTEN_ADDR = ("0.0.0.0", 8080)
BACKENDS = [("127.0.0.1", 9001), ("127.0.0.1", 9002)]

# One shared iterator keeps round‑robin order across connections; a fresh
# itertools.cycle per connection would always start at the first backend.
backend_cycle = itertools.cycle(BACKENDS)
cycle_lock = threading.Lock()

def pipe(src, dst):
    # Copy bytes from src to dst until EOF, then close both sockets.
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()

def handle_client(client_sock):
    for _ in range(len(BACKENDS)):  # try each backend at most once
        with cycle_lock:
            addr = next(backend_cycle)
        try:
            backend = socket.create_connection(addr, timeout=5)
            break
        except OSError:
            continue
    else:  # no healthy backend available
        client_sock.close()
        return
    # Proxy data in both directions between client and backend.
    threading.Thread(target=pipe, args=(client_sock, backend), daemon=True).start()
    threading.Thread(target=pipe, args=(backend, client_sock), daemon=True).start()

if __name__ == "__main__":
    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(LISTEN_ADDR)
    server.listen()
    while True:
        client, _ = server.accept()
        threading.Thread(target=handle_client, args=(client,), daemon=True).start()

Caching Layer

Caches store frequently accessed data closer to the client, reducing latency and load on the primary database. Common choices are Redis, Memcached, and CDN edge caches.
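A read-through cache, where the cache itself falls back to the origin store on a miss, can be sketched in-process like this (`fake_db_lookup` stands in for a real database query):

```python
import time

class ReadThroughCache:
    """Minimal in-process read-through cache with TTL expiry."""

    def __init__(self, loader, ttl_seconds=60):
        self.loader = loader     # fallback on a miss, e.g. a database query
        self.ttl = ttl_seconds
        self.store = {}          # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0]      # cache hit
        value = self.loader(key) # cache miss: load from origin and remember
        self.store[key] = (value, time.time() + self.ttl)
        return value

db_calls = []

def fake_db_lookup(key):
    """Hypothetical stand-in for a database query."""
    db_calls.append(key)
    return f"value-for-{key}"

cache = ReadThroughCache(fake_db_lookup, ttl_seconds=60)
cache.get("user:1")   # miss: hits the "database"
cache.get("user:1")   # hit: served from memory
print(len(db_calls))  # 1
```

Dedicated systems such as Redis add eviction policies, distribution, and persistence on top of this same hit/miss/load pattern.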

Database Choices

Relational databases provide strong consistency and complex queries, while NoSQL databases offer flexible schemas and horizontal scaling. The choice depends on the data model and consistency requirements.

Aspect         SQL Databases                    NoSQL Databases
Schema         Fixed & normalized               Dynamic / schema‑less
Consistency    Strong (ACID)                    Eventual or configurable
Scalability    Vertical / limited horizontal    Horizontal by design
Use Cases      Transactions, analytics          Large‑scale reads, flexible data

Design Example: Scalable URL Shortener

Let’s walk through a classic interview problem: designing a service like tinyurl.com that shortens long URLs and redirects users efficiently.

Requirements Clarification

  • Create a short alias for any given URL
  • Redirect short URL to original URL
  • Support 100M+ URLs
  • Low latency (< 50 ms) for redirects
  • High availability (99.99% uptime)
  • Analytics (optional) – number of clicks per URL

High‑Level Architecture Diagram (textual)

Client → API Gateway → URL Service (Create/Read)
URL Service → Cache (Redis) → DB (MySQL) on cache miss
URL Service → Message Queue → Analytics Service (Worker) → Storage

Component Details

  • API Gateway: Handles authentication, rate limiting, and routing
  • URL Service: Generates a unique short code (base62) and stores mapping
  • Cache: Stores hot URL mappings for O(1) read latency
  • Database: Persistent storage of URL‑code pairs
  • Analytics Service: Consumes click events from a queue and aggregates counts
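The base62 step above can be sketched as follows; the unique numeric ID is assumed to come from elsewhere (a counter or ticket service), which is what guarantees codes never collide:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(n):
    """Encode a non-negative integer as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def from_base62(code):
    """Decode a base62 string back to its integer ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(to_base62(125))                    # "21"
assert from_base62(to_base62(10**9)) == 10**9  # round-trips cleanly
```

Because the encoding is reversible, the service can decode a short code back to its row ID without any lookup table.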

Scalability Strategies

  • Sharding the URL table by short code hash
  • Read‑through caching – fallback to DB on cache miss
  • Asynchronous write‑behind for analytics (Kafka → Spark)
  • Stateless API servers behind a load balancer
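The sharding strategy in the first bullet can be sketched with a stable hash; `NUM_SHARDS` and the choice of MD5 here are illustrative, not a prescription:

```python
import hashlib

NUM_SHARDS = 8

def shard_for(short_code):
    """Map a short code to one of NUM_SHARDS database shards.

    A stable digest (not Python's process-randomized hash()) keeps the
    routing consistent across servers and restarts.
    """
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

print(shard_for("abc123"))  # the same code always lands on the same shard
```

Note that plain modulo sharding reshuffles most keys when `NUM_SHARDS` changes; consistent hashing is the usual fix when shards are added or removed frequently.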

Reliability Measures

  • Multi‑AZ deployment for each service
  • Automatic failover for Redis (Redis Sentinel) and MySQL (replication)
  • Circuit breaker pattern for downstream services
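A minimal sketch of the circuit breaker pattern from the last bullet (thresholds are illustrative; production systems typically reach for a library such as resilience4j or pybreaker):

```python
import time

class CircuitBreaker:
    """Open the circuit after repeated failures; fail fast while open,
    then allow a trial call once the reset timeout elapses."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def call(self, operation):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.time()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Once the threshold is reached, callers get an immediate error instead of queueing on a dead dependency, which protects upstream thread pools from exhaustion.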

Trade‑Off Discussion

Choosing a short code length balances collision probability against URL length. A 7‑character base62 code yields 62⁷ ≈ 3.5 × 10¹² combinations; with 100 M URLs stored, a randomly generated code collides with an existing one only about 0.003% of the time, so a uniqueness check with a retry on conflict handles collisions cheaply.
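The keyspace arithmetic is easy to check directly (the collision estimate below assumes codes are generated uniformly at random; counter-based codes never collide):

```python
keyspace = 62 ** 7                 # distinct 7-character base62 codes
print(f"{keyspace:,}")             # 3,521,614,606,208

# Chance that one new random code collides with 100M existing ones:
stored = 100_000_000
p_next_collision = stored / keyspace
print(f"{p_next_collision:.6f}")   # ~0.000028, i.e. roughly 0.003%
```

At that rate a write path that checks uniqueness and retries on conflict does the extra round trip only a handful of times per million inserts.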

In system design, every decision involves a trade‑off. Always justify choices with respect to the primary product metrics (e.g., latency, throughput, cost).
⚠ Warning: Never store raw user‑provided URLs without validation; they can contain malicious payloads.
💡 Tip: Hashing the long URL deterministically (e.g., with MurmurHash) makes repeated shortens of the same URL yield the same code, avoiding duplicate rows; hash collisions, while rare, still require a uniqueness check at write time.
📝 Note: If analytics is not required, you can omit the message queue and worker, simplifying the design.
📘 Summary: System design bridges business goals and technical solutions. By mastering core principles—scalability, reliability, performance, and trade‑offs—engineers can architect systems that grow gracefully, remain resilient, and deliver a great user experience.

Q: What is the difference between vertical and horizontal scaling?
A: Vertical scaling adds more resources (CPU, RAM) to a single node, while horizontal scaling adds more nodes to distribute the load. Horizontal scaling is generally more fault‑tolerant and cost‑effective for large systems.


Q: When should I choose a NoSQL database over a relational one?
A: Choose NoSQL when you need flexible schemas, massive horizontal scalability, or when the data access pattern is simple key‑value or document‑oriented. Use relational databases for complex transactions and strong consistency requirements.


Q: How does a CDN improve system performance?
A: A CDN caches static assets at edge locations close to users, reducing latency and offloading traffic from origin servers.


Q: Which component is primarily responsible for distributing traffic across multiple service instances?
  • Cache
  • Load Balancer
  • Message Queue
  • Database

Answer: Load Balancer
A load balancer routes incoming requests to healthy backend instances, enabling horizontal scaling and high availability.

Q: If you need strong consistency for financial transactions, which storage type should you prioritize?
  • NoSQL (eventual consistency)
  • SQL (ACID)
  • In‑memory cache
  • File storage

Answer: SQL (ACID)
SQL databases provide ACID guarantees, ensuring that all parts of a transaction either complete successfully or roll back together.
