Caching Strategies in System Design

Caching is a cornerstone technique in modern system design that dramatically improves performance, reduces latency, and lowers load on backend services. This tutorial walks you through the fundamental concepts, various caching strategies, implementation patterns, and best‑practice guidelines for building robust, scalable systems.

Why Caching Matters

In high‑traffic applications, repeatedly fetching the same data from a database or external service can become a bottleneck. By storing frequently accessed data closer to the consumer—whether in memory, on a local machine, or in a distributed cache—you can achieve:

  • Reduced response times (often from tens or hundreds of milliseconds down to sub‑millisecond for in‑memory lookups)
  • Decreased database load, allowing the primary store to focus on write‑heavy operations
  • Improved scalability as cache nodes can be added horizontally

Fundamental Concepts

  • Cache Hit vs. Miss
  • Cache Eviction Policies (LRU, LFU, FIFO, TTL)
  • Cold Start & Warm‑up
  • Cache Coherency and Consistency
  • Staleness and Freshness
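To make one of these concepts concrete, an LRU eviction policy can be sketched in a few lines of Python using `collections.OrderedDict` (a minimal illustration; production systems rely on libraries such as Caffeine or Redis's built‑in eviction policies):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry at capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None  # cache miss
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
```

Reading an entry refreshes its position, so a recently read key survives eviction while an untouched one does not.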

Types of Caches

  1. In‑Memory Cache (e.g., Guava, Caffeine, ConcurrentHashMap)
  2. Local Process Cache (embedded caches like Ehcache)
  3. Distributed Cache (Redis, Memcached, Amazon ElastiCache)
  4. CDN Edge Cache (for static assets)
  5. Browser Cache (client‑side HTTP cache)

Cache Type    | Scope        | Typical Latency | Persistence      | Use Case
In‑Memory     | Process      | ≤ 1 µs          | No               | Session data, request‑level objects
Local Process | JVM/Node     | ≈ 1 µs          | Optional (disk)  | Feature flags, short‑lived lookups
Distributed   | Cluster      | ≈ 1‑5 ms        | Yes (snapshot)   | User profiles, product catalogs
CDN Edge      | Network Edge | ≈ 10‑20 ms      | Yes (replicated) | Static assets, media files

Caching Strategies

Cache Aside (Lazy Loading)

The application checks the cache first; if the data is missing, it loads the data from the source, stores it in the cache, and then returns it. This approach is simple and gives precise control over what gets cached.

# Python example using redis-py (cache-aside)
import redis, json
r = redis.Redis(host='localhost', port=6379, db=0)

def get_user(user_id):
    key = f'user:{user_id}'
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    # Fallback to DB (placeholder)
    user = db_query_user(user_id)
    r.setex(key, 300, json.dumps(user))  # TTL 5 minutes
    return user

// Java example using Jedis (Redis client)
import redis.clients.jedis.Jedis;
import com.fasterxml.jackson.databind.ObjectMapper;

public class UserCache {
    private static final Jedis jedis = new Jedis("localhost"); // not thread-safe; use JedisPool in production
    private static final ObjectMapper mapper = new ObjectMapper();

    public static User getUser(String userId) throws Exception {
        String key = "user:" + userId;
        String cached = jedis.get(key);
        if (cached != null) {
            return mapper.readValue(cached, User.class);
        }
        User user = DB.fetchUser(userId); // your DB call
        jedis.setex(key, 300, mapper.writeValueAsString(user));
        return user;
    }
}

Read‑Through Cache

The cache sits in front of the data source and automatically loads missing entries. The application always reads from the cache; the cache itself handles fetching from the backing store when needed.

// Pseudocode for a read‑through wrapper: callers only ever talk to the cache
cache = new RedisCache()
db = new MySQLDataSource()

function get(key) {
    value = cache.get(key)
    if (value == null) {
        value = db.read(key)   // the cache layer, not the caller, hits the DB
        cache.put(key, value)
    }
    return value
}
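The same wrapper can be made concrete in Python. This is a sketch, not a specific library's API: the loader function and the dictionary standing in for the backing store are assumptions for illustration.

```python
class ReadThroughCache:
    """Cache that loads missing entries from the backing store itself,
    so callers only ever read from the cache."""
    def __init__(self, loader):
        self.loader = loader  # function that reads from the source of truth
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.loader(key)  # transparent load on miss
        return self.store[key]

db = {"user:1": "alice"}        # stand-in for the real data store
users = ReadThroughCache(db.get)
```

Note that once an entry is loaded it is served from the cache even if the source changes, which is why read‑through is usually paired with a TTL or explicit invalidation.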

Write‑Through Cache

Writes go to both the cache and the underlying data store synchronously, ensuring that the cache is always up‑to‑date.

📝 Note: Write‑through is ideal for workloads where write latency is not a primary concern and strong consistency is required.

# Write‑through in Python
import redis, json
r = redis.Redis()

def update_user(user_id, data):
    key = f'user:{user_id}'
    # Update DB first (placeholder)
    db_update_user(user_id, data)
    # Then update the cache so subsequent reads see the new value
    r.set(key, json.dumps(data))

Write‑Behind (Write‑Back) Cache

Writes are applied to the cache first and persisted to the backing store asynchronously. This yields the lowest write latency but introduces potential data loss on cache failure.

💡 Tip: Use a durable write‑behind queue (e.g., Kafka) to guarantee eventual consistency even if the cache node crashes.
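The deferred-persistence behaviour can be sketched as follows. This is a deliberately simplified, single‑threaded illustration: the pending list stands in for a durable queue, and a real implementation would flush asynchronously in a background worker.

```python
class WriteBehindCache:
    """Writes land in the cache immediately; persistence is deferred
    until flush() drains the pending queue to the backing store."""
    def __init__(self, persist):
        self.persist = persist  # function that writes to the durable store
        self.store = {}
        self.pending = []       # stand-in for a durable queue (e.g., Kafka)

    def put(self, key, value):
        self.store[key] = value         # fast path: cache only
        self.pending.append((key, value))

    def flush(self):
        while self.pending:
            self.persist(*self.pending.pop(0))  # done asynchronously in practice
```

Between `put` and `flush`, the cache and the backing store disagree; that window is exactly where a crash can lose updates, hence the durable-queue tip above.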

Cache Invalidation Techniques

  • Time‑Based Expiration (TTL)
  • Least Recently Used (LRU) / Least Frequently Used (LFU)
  • Write Invalidation (explicit delete on update)
  • Versioned Keys (append a version suffix to the key and bump it on update)
  • Cache‑Aside Manual Refresh

⚠ Warning: Improper invalidation can lead to stale reads, which may cause severe business logic errors (e.g., displaying out‑of‑date pricing).
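Versioned keys deserve a brief illustration, since they avoid explicit deletes entirely: bumping the version makes old entries unreachable, so they simply age out under TTL or LRU. The helper names below (`bump_version`, `cached_read`) are illustrative, not a library API.

```python
versions = {}
cache = {}

def versioned_key(base):
    # e.g. "user:1" -> "user:1:v1"
    return f"{base}:v{versions.get(base, 1)}"

def cached_read(base, load):
    key = versioned_key(base)
    if key not in cache:
        cache[key] = load()  # miss: load from the source of truth
    return cache[key]

def bump_version(base):
    # Old versioned entries become unreachable and age out naturally
    versions[base] = versions.get(base, 1) + 1
```

The trade‑off is extra memory: stale versions linger until eviction, in exchange for never racing a delete against a concurrent read.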

Cache Consistency Models

Model                | Guarantee                                | Typical Latency      | Complexity
Strong Consistency   | Read always returns the latest write     | Higher (synchronous) | High
Eventual Consistency | Reads converge to the latest value       | Low                  | Medium
Read‑Your‑Writes     | Client sees its own writes immediately   | Low‑Medium           | Medium

Choosing the Right Strategy

Selecting a caching strategy depends on three primary dimensions:

  1. Data Volatility – How often does the data change?
  2. Read‑Write Ratio – Is the workload read‑heavy or write‑heavy?
  3. Consistency Requirements – Can the system tolerate stale data?

A practical decision matrix:

Scenario                                        | Recommended Strategy | Rationale
Product catalog (read‑heavy, low updates)       | Cache Aside + TTL    | Simple, low staleness risk
User session data (frequent writes, short life) | Write‑Through        | Ensures session updates are instantly visible
Analytics aggregates (batch updated)            | Write‑Behind         | Maximizes write throughput, tolerates slight delay

📘 Summary: Caching, when applied thoughtfully, can turn a bottleneck into a performance accelerator. Understanding the trade‑offs between latency, consistency, and complexity is essential for architecting resilient systems.

Q: When should I avoid using a cache?
A: If the data is highly volatile, requires strong real‑time consistency, or the overhead of cache management exceeds the performance gains, it may be better to query the source directly.


Q: How do I monitor cache health?
A: Track hit/miss ratios, eviction rates, latency, and resource utilization (CPU/memory). Tools like Redis INFO, Prometheus exporters, and APM solutions provide these metrics.
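For example, the hit ratio can be derived from the `keyspace_hits` and `keyspace_misses` counters that Redis exposes through INFO; here the stats dict stands in for the result of `r.info('stats')` from redis-py:

```python
def hit_ratio(stats):
    """Compute the cache hit ratio from Redis INFO-style counters."""
    hits = stats["keyspace_hits"]
    misses = stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 0.0
```

A sustained drop in this ratio is often the first sign that the working set no longer fits in the cache or that TTLs are too aggressive.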


Q: What is the difference between TTL and LRU?
A: TTL expires entries after a fixed duration regardless of usage, while LRU evicts the least recently accessed items when the cache reaches its size limit.


Q: Which caching strategy guarantees that a read after a write will see the latest value?
  • Cache Aside
  • Write‑Through
  • Write‑Behind
  • Read‑Through

Answer: Write‑Through
Write‑Through synchronously updates both cache and data store, ensuring reads always reflect the most recent write.

Q: What is the primary risk associated with a Write‑Behind cache?
  • Higher read latency
  • Cache stampede
  • Potential data loss on crash
  • Increased DB load

Answer: Potential data loss on crash
Since writes are persisted asynchronously, a cache failure before the write is flushed can result in lost updates.
