🎯 Web Caching
Caching keeps information that many clients request in memory and serves it directly in response to those requests. While the information is still valid, it can be served potentially millions of times without the cost of re-creating it.
Introduction
One of the reasons that websites are so highly responsive is that the internet is littered with web caches. Web caches store a copy of a given resource—for example, a web page or an image—for a defined time period. The caches intercept client requests and, if they have a requested resource cached locally, they return the copy rather than forwarding the request to the target service. Hence, many requests can be satisfied without placing a burden on the service. Also, as the caches are physically closer to the client, the requests will have lower latencies.
Multiple levels of caches exist, starting with the client’s web browser cache and local organization-based caches. ISPs also implement general web proxy caches, and reverse proxy caches can be deployed within the application service's execution domain. Web browser caches are also known as private caches (for a single user). Organizational and ISP proxy caches are shared caches that support requests from multiple users. Edge caches, also known as content delivery networks (CDNs), live at various strategic geographical locations globally, so that they cache frequently accessed data close to clients.
Types of web caches:
- Browser cache (private cache): Stores resources locally for a single user.
- Proxy cache (shared cache): Used by organizations or ISPs to serve multiple users.
- Reverse proxy cache: Placed in front of servers to cache responses for many clients (e.g., Varnish, NGINX).
- CDN (edge cache): Globally distributed caches that bring content closer to users.
Caches typically store the results of HTTP GET requests only, and the cache key is the URI of the associated GET. When a client sends a GET request, it may be intercepted by one or more caches along the request path. Any cache with a fresh copy of the requested resource may respond to the request. If no cached content is found, the request is served by the service endpoint, which is also called the origin server in web technology parlance. Services can control what results are cached and for how long they are stored by using HTTP caching directives. Services set these directives in various HTTP response headers.
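The request flow described above can be sketched as a toy in-memory cache. This is only an illustration under simplifying assumptions: the class name SimpleCache, the fetch_from_origin callback, and the single fixed ttl_seconds are all hypothetical, and a real cache would derive freshness from the response headers rather than from one global value.

```python
import time
from typing import Callable, Dict, Tuple


class SimpleCache:
    """Toy cache keyed by URI that stores the results of GET requests only."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],  # hypothetical origin call
        ttl_seconds: float,                         # assumed fixed freshness period
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def handle(self, method: str, uri: str) -> Tuple[bytes, str]:
        """Return (body, source), where source is 'cache' or 'origin'."""
        if method != "GET":
            # Non-GET requests bypass the cache entirely in this sketch.
            return self._fetch(uri), "origin"

        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh copy available: answer without contacting the origin.
            return entry[1], "cache"

        # Miss or stale copy: forward to the origin and store the result.
        body = self._fetch(uri)
        self._entries[uri] = (now, body)
        return body, "origin"
```

Calling handle("GET", uri) twice within the freshness period contacts the origin only once; the second call is answered from the stored copy.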
Cache-Control
The Cache-Control HTTP header can be used by client requests and service responses to specify how caching should be applied to the resources of interest. Possible values are:
- no-store: Specifies that a resource from a request/response should never be cached. This is typically used for sensitive data (e.g., banking information, personal data) that must always be retrieved from the origin server.
- no-cache: Specifies that a cached resource must be revalidated with an origin server before use. The cache can store the resource, but must check with the server before serving it.
- private: Specifies a resource can be cached only by a user-specific device such as a web browser. Shared caches (like proxies or CDNs) must not store it.
- public: Specifies a resource can be cached by any cache, including shared caches.
- max-age: Defines the maximum amount of time (in seconds) a cached copy of a resource is considered fresh. After that, the copy is stale and a cache normally refreshes it by sending a request to the origin server.
- s-maxage: Like max-age, but applies only to shared caches (e.g., CDNs, proxies).
- must-revalidate: Once a resource becomes stale, caches must revalidate it with the origin server before serving it.
- proxy-revalidate: Like must-revalidate, but applies only to shared caches.
- immutable: Indicates that the resource will not be updated during its freshness lifetime, so the browser can use the cached version without revalidation.
Example:
Cache-Control: public, max-age=3600, must-revalidate
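To make the structure of the header concrete, here is a minimal sketch of splitting a Cache-Control value into its directives. The function name parse_cache_control is an illustrative assumption, and the parsing is deliberately naive: it splits on commas and does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(header_value: str) -> Dict[str, Optional[str]]:
    """Map each directive name (lower-cased) to its argument, or None."""
    directives: Dict[str, Optional[str]] = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        # Directives look like "name" or "name=value".
        name, sep, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if sep else None
    return directives


parsed = parse_cache_control("public, max-age=3600, must-revalidate")
# parsed == {"public": None, "max-age": "3600", "must-revalidate": None}
```

For the example header, the result says that any cache may store the response, that it stays fresh for one hour, and that it must be revalidated once stale.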
HTTP is designed to cache as much as possible, so even if no Cache-Control header is given, responses may be stored and reused if certain conditions are met. This is called heuristic caching. Heuristic caching is a workaround that predates widespread Cache-Control support, and in practice all responses should explicitly specify a Cache-Control header.
Expires and Last-Modified
The Expires and Last-Modified HTTP headers interact with the max-age directive to control how long cached data is retained. Caches have limited storage resources and hence must periodically evict items from memory to create space. To influence cache eviction, services can specify how long resources in the cache should remain valid, or fresh. When a request arrives for a fresh resource, the cache serves the locally stored results without contacting the origin server. Once any specified retention period for a cached resource expires, it becomes stale and becomes a candidate for eviction.
Freshness is calculated using a combination of header values. The Cache-Control: max-age=N header is the primary directive, and this value specifies the freshness period in seconds. If max-age is not specified, the Expires header is checked next. If this header exists, then it is used to calculate the freshness period. The Expires header specifies an explicit date and time after which the resource should be considered stale. For example:
Expires: Wed, 21 Oct 2015 07:28:00 GMT
As a last resort, the Last-Modified header can be used to calculate resource retention periods. This header is set by the origin server to specify when a resource was last updated, and uses the same format as the Expires header. A cache server can use Last-Modified to determine the freshness lifetime of a resource based on a heuristic calculation that the cache supports. The calculation uses the Date header, which specifies the time a response message was sent from an origin server. How long to reuse a response is up to the implementation, but the specification suggests about 10% of the time since the resource was last modified. The freshness lifetime then becomes the value of the Date header minus the value of the Last-Modified header, with that difference divided by 10: (Date - Last-Modified) / 10.
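The precedence described above (max-age first, then Expires, then the Last-Modified heuristic) can be sketched as follows. This is an illustration, not a complete implementation: the function name freshness_lifetime is hypothetical, header names are assumed to be lower-cased already, and the 10% factor is the commonly suggested heuristic rather than a fixed rule.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[timedelta]:
    """Estimate how long a response stays fresh; None if it cannot be determined."""
    # 1. Cache-Control: max-age=N is the primary directive.
    for part in headers.get("cache-control", "").split(","):
        name, sep, value = part.strip().partition("=")
        if name.lower() == "max-age" and sep and value.strip().isdigit():
            return timedelta(seconds=int(value))

    # Both remaining rules are measured relative to the Date header.
    if "date" not in headers:
        return None
    date = parsedate_to_datetime(headers["date"])

    # 2. Expires gives an explicit point in time after which the copy is stale.
    if "expires" in headers:
        return parsedate_to_datetime(headers["expires"]) - date

    # 3. Heuristic: 10% of the interval since the resource was last modified.
    if "last-modified" in headers:
        return (date - parsedate_to_datetime(headers["last-modified"])) / 10

    return None
```

For instance, a response with a Date of Wed, 21 Oct 2015 07:28:00 GMT and a Last-Modified ten days earlier would be treated as fresh for one day under this heuristic.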
ETag
HTTP provides another header that can be used to check cache item freshness. This is known as an ETag. An ETag (entity tag) is an identifier assigned by the server to a specific version of a resource. When a client requests a resource, the server includes an ETag in the response header. When the client later requests the same resource, it can send the ETag value in the If-None-Match header to check if the resource has changed.
This is known as revalidation.
There are two possible responses to this request:
- If the ETag in the request matches the value associated with the resource in the service, the cached value is still valid. The origin server can therefore return a 304 (Not Modified) response. No response body is needed as the cached value is still current, thus saving bandwidth, especially for large resources. The response may also include new cache directives to update the freshness of the cached resource.
- If the resource has changed, the origin server responds with a 200 OK response code, a response body, and a new ETag representing the latest version of the resource.
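The two outcomes above can be sketched from the origin server's side. The names make_etag and respond are hypothetical, deriving the ETag from a truncated SHA-256 hash of the body is just one possible choice, and the max-age=60 value is an arbitrary placeholder.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the content (one assumed scheme among many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    # If-None-Match uses weak comparison, so a leading W/ is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}  # placeholder directive
    if if_none_match is not None:
        candidates = [_strip_weak(tag) for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # Cached copy is still current: no body, only refreshed headers.
            return 304, headers, b""
    # First request, or the resource changed: send the full representation.
    return 200, headers, body
```

A client that stored the ETag from an earlier 200 response and sends it back in If-None-Match receives a 304 with an empty body as long as the content is unchanged.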
Tip: ETags are especially useful for caching resources that change infrequently or are expensive to generate, such as images, large documents, or API responses.
Cache Invalidation and Purging
A critical aspect of caching is invalidation—ensuring that outdated or incorrect data is not served to users. There are several strategies:
- Time-based expiration: Use max-age, Expires, or similar headers to automatically expire cached content after a certain period.
- Manual purging: Administrators or applications can explicitly remove or refresh cached content (e.g., via CDN APIs).
- Revalidation: Use ETag or Last-Modified headers to check with the origin server if the cached resource is still valid.
- Cache busting: Change the resource URL (e.g., by adding a version or hash) to force clients to fetch a new version.
Pitfall: If cache invalidation is not handled properly, users may see stale or incorrect data. Always design your cache strategy with invalidation in mind.
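As an illustration of the cache-busting strategy listed above, the sketch below embeds a short content hash in a file name, so that any change to the content produces a new URL. The function name busted_name and the eight-character SHA-256 prefix are assumptions made for this example; build tools typically do this automatically with their own naming schemes.

```python
import hashlib
from pathlib import PurePosixPath


def busted_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the file extension, e.g. app.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old_url = busted_name("static/app.js", b"console.log('v1');")
new_url = busted_name("static/app.js", b"console.log('v2');")
# old_url and new_url differ, so caches treat the new version as a separate resource.
```

Because the cache key is the URI, the old cached copy is simply never requested again once pages reference the new name.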
Vary Header
The Vary header tells caches to store different versions of a resource based on the values of the specified request headers. For example, if a server responds with Vary: Accept-Encoding, the cache will store separate versions for gzip, br, or identity encodings. Several header names can be listed, separated by commas:
Vary: Accept-Encoding, User-Agent
This is important for content negotiation (e.g., language, encoding, device-specific content) to ensure the correct version is served to each client.
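One way to picture the effect of Vary is as an extension of the cache key: the URI plus the values of the request headers named in Vary. The sketch below assumes a hypothetical cache_key helper and compares header values literally; real caches may normalize values first, and a response with Vary: * is generally not reusable at all.

```python
from typing import Mapping, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def cache_key(uri: str, request_headers: Mapping[str, str], vary: str) -> CacheKey:
    """Build a key from the URI and the request headers listed in Vary."""
    # Header names are case-insensitive, so compare them lower-cased.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # A missing request header is recorded as an empty string.
    return uri, tuple((name, lowered.get(name, "")) for name in names)


gzip_key = cache_key("/page", {"Accept-Encoding": "gzip"}, "Accept-Encoding")
br_key = cache_key("/page", {"Accept-Encoding": "br"}, "Accept-Encoding")
# gzip_key != br_key, so the two encodings are stored as separate entries.
```

Listing a header such as User-Agent in Vary multiplies the number of stored variants, since nearly every distinct value yields a distinct key.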
Practical Caching Strategies
- Static assets (JS, CSS, images): Use long max-age and immutable for versioned files. Use cache busting (e.g., file hashes) to force updates.
- API responses: Use short max-age or no-cache for dynamic data. Use ETag or Last-Modified for revalidation.
- HTML pages: Use conservative caching, as content may change frequently. Consider using stale-while-revalidate for improved perceived performance.
- Sensitive data: Always use no-store to prevent caching.
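These strategies could be expressed as a small policy function that picks a Cache-Control value per response. Everything specific here is an assumption for illustration: the function name cache_control_for, the /api/ path prefix, the hashed-file-name pattern, and the concrete durations would all need to be adapted to a real application.

```python
import re

# Assumed naming scheme for versioned assets: name.<hex hash>.<extension>
_HASHED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(js|css|png|jpg|svg|woff2)$")


def cache_control_for(path: str, sensitive: bool = False) -> str:
    """Choose an illustrative Cache-Control value for a response."""
    if sensitive:
        # Sensitive data: never store anywhere.
        return "no-store"
    if _HASHED_ASSET.search(path):
        # Versioned static asset: cache for a year and skip revalidation.
        return "public, max-age=31536000, immutable"
    if path.startswith("/api/"):
        # Dynamic data: store, but revalidate (e.g. via ETag) before reuse.
        return "no-cache"
    # HTML and everything else: short lifetime, serve stale while refreshing.
    return "max-age=60, stale-while-revalidate=300"
```

Keeping the policy in one place makes it easier to review which responses are cacheable and for how long.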
Summary
When used effectively, web caching can significantly reduce latencies and save network bandwidth. This is especially true for large items such as images and documents. Further, as web caches handle requests rather than application services, this reduces the request load on origin servers, creating additional capacity.
Key takeaways:
- Use Cache-Control headers to explicitly define caching behavior.
- Combine Expires, Last-Modified, and ETag for robust cache management.
- Always consider cache invalidation and purging strategies.
- Use the Vary header for content negotiation.
- Caching is a powerful tool, but must be used thoughtfully to avoid serving stale data.
For more details, see the MDN HTTP caching documentation.