This chapter offers advice on determining which Drupal queries may be safely cached, why, and for how long, that is, which time to live (TTL) to assign. The guidance is based on documentation from the Drupal Structure Guide and on query frequency statistics gathered from internal product testing.
7.1 Heuristic guidance
Drupal issues dozens of database queries for every web page, so tuning a TTL value for each query individually is rarely practical. Consequently, it may be more appropriate to base TTL values on heuristics such as the site’s purpose, how frequently the content changes, and the impact of site failure on the organization.
7.1.1 Site purpose
One heuristic to consider when setting TTL values is what the overall purpose of the Drupal site is. A site publishing gossip about popular culture has a different purpose and revenue-generation model than a site that books tickets for specialized events. While the former site may meet its mission with some content being as much as five minutes out of date, the latter may not be able to afford being even five seconds out of date.
7.1.2 Update frequency
Another heuristic to consider is how often site content is updated. Let’s consider Whitehouse.gov, which averages four to six thousand page views per day.8 For the week of March 16-22, 2014, the site had twenty blog posts, which is on the order of three publishing events per day. Given eight hours per work day, this rate equates to one publishing event every 160 minutes.
Using the Nyquist-Shannon sampling theorem,9 which calls for sampling at no less than twice the rate of change, a TTL value of 80 minutes (half the publishing interval) should be short enough that, in theory, no visitor will miss more than the most recent blog post. Depending on a site’s fidelity goal, computed TTL values may be relaxed or tightened.
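The arithmetic above can be sketched as a small helper. This is only an illustration of the heuristic: the function name, its parameters, and the `relax` factor (standing in for the "relaxed or tightened" adjustment) are assumptions made for this example, not part of any product or Drupal API.

```python
def ttl_minutes(events_per_day: float,
                work_hours_per_day: float = 8.0,
                relax: float = 1.0) -> float:
    """Return a TTL in minutes equal to half the average publishing interval.

    events_per_day      publishing events per work day (about 3 in the example)
    work_hours_per_day  hours in which publishing happens (8 in the example)
    relax               hypothetical multiplier: >1 relaxes, <1 tightens the TTL
    """
    if events_per_day <= 0:
        raise ValueError("events_per_day must be positive")
    # Average time between two publishing events, in minutes.
    interval = work_hours_per_day * 60.0 / events_per_day
    # Refresh twice per publishing interval, then apply the fidelity adjustment.
    return interval / 2.0 * relax


# The Whitehouse.gov example: 3 events per 8-hour day gives a 160-minute
# interval, so the TTL is 80 minutes.
print(ttl_minutes(3))  # 80.0
```

A site with a stricter fidelity goal could pass, for instance, `relax=0.5` to halve the computed value.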
7.1.3 Organizational impact
A third heuristic to consider is the consequence of site failure to the organization. Consider also which type of failure is more important to avoid: a lack of availability or a lack of integrity.
This impact may change over time. Consider a site serving an annual conference as an example. In the days leading up to the conference, lack of availability to accept new registrations may have the highest impact. During this time, the best course would be to set TTL values longer so that website performance is fastest. During the conference itself, lack of integrity to report last-minute room assignments may be more worrisome. At this point, setting TTL values shorter is important.
7.1.4 Be conservative
In debating whether a TTL value should be longer or shorter, be conservative. Our testing shows that TTL values as short as one minute yield large performance benefits: a 15-fold improvement in the time between a query request and its associated response. Although longer TTL values should yield higher performance, more conservative values may be sufficient to address a site’s performance issues.
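One way to see why short TTL values can already be sufficient is a simple back-of-the-envelope model. The sketch below assumes, purely for illustration, that an identical query arrives at a steady rate and that exactly one request per TTL window misses the cache. The function name and the rate of 10 queries per second are hypothetical and do not come from the testing described above.

```python
def hit_ratio(queries_per_second: float, ttl_seconds: float) -> float:
    """Fraction of requests served from cache under a one-miss-per-TTL model."""
    requests_per_window = queries_per_second * ttl_seconds
    if requests_per_window <= 1:
        # At most one request per window: every request goes to the database.
        return 0.0
    return 1.0 - 1.0 / requests_per_window


# Assumed rate of 10 identical queries per second, for illustration only.
for ttl in (60, 300, 3600):
    print(f"TTL {ttl:>4} s -> hit ratio {hit_ratio(10, ttl):.4%}")
```

Under these assumptions a one-minute TTL already serves roughly 99.8% of requests from cache, so lengthening the TTL adds little performance while increasing the risk of stale data.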
7.2 Structure-based guidance
We can offer specific guidance on setting a TTL value based on what table is queried. Some Drupal tables are used to store content, others store associations between and groupings of content, and still others serve site administration. The following sections offer guidance for each of these table types.
8 According to Alexa, Whitehouse.gov is the 935th most popular website in the US.
7.2.1 Optimization tables
Drupal’s schema includes several tables that optimize its interaction with the database. Notably, Drupal implements its own caching scheme. It also stores session information in the database rather than in the file system. Further, Drupal keeps lock contention information in the database, and that information must always be current. Consequently, we recommend not caching the following tables:
cache*, queue, semaphore, sessions, system, watchdog
Other Drupal tables can be cached as discussed in the following sections.
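The exclusion list from section 7.2.1 can be expressed as a simple pattern check. This is a minimal sketch: the names `NO_CACHE_PATTERNS` and `is_cacheable` are invented for this example, and how a caching layer actually extracts table names from a query is outside its scope.

```python
from fnmatch import fnmatchcase

# Tables recommended above as not cacheable. "session*" is written as a
# pattern so that it also covers Drupal's table name, sessions.
NO_CACHE_PATTERNS = ("cache*", "queue", "semaphore", "session*", "system", "watchdog")


def is_cacheable(table: str) -> bool:
    """Return False if the table matches one of the exclusion patterns."""
    name = table.lower()
    return not any(fnmatchcase(name, pattern) for pattern in NO_CACHE_PATTERNS)


for table in ("cache_page", "sessions", "semaphore", "node", "taxonomy_term_data"):
    print(table, is_cacheable(table))
```

Shell-style patterns keep the list as short as the one in the text while still covering every table whose name begins with cache.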
7.2.2 Content
Drupal uses nodes as its main content structure. Nodes can be of different types, such as blog posts or pages. Custom nodes can serve specific purposes.
If a site is fairly static, the node* tables can be cached with fairly long TTL values. For a site that publishes five posts a day, a TTL value of five minutes is sufficiently generous.
7.2.3 Taxonomy
Drupal uses two sets of tables to build a taxonomy. The field* tables define the element names and descriptions, and the taxonomy* tables define the relations between elements.
TTL values should be selected based on how often the site taxonomy changes. Sites that use a relatively static taxonomy, updated only by content owners, might consider longer TTL values for these tables. Sites that have dynamic updates to these tables from site users – such as sites offering end-user tagging – should consider shorter TTL values.
7.2.4 Site administration
Drupal has two sets of tables to administer users and permissions. The user* tables define accounts, while the role* tables define the permission set for each account.
TTL values should be selected based on the site’s purpose. Broadcast sites not accepting end-user content submissions may consider longer TTL values. Blogging sites, in contrast, should consider shorter TTL values for these tables.
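The table-based guidance in sections 7.2.2 through 7.2.4 can be collected into a lookup from table-name pattern to TTL. The sketch below is illustrative only. The five-minute value for content tables comes from the text; the other TTL values are placeholders chosen for this example, and the names `EXAMPLE_POLICY` and `ttl_for` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence, Tuple

# (pattern, TTL in seconds). Only the 300-second content value is taken from
# the text; the remaining numbers are placeholders to be tuned per site.
EXAMPLE_POLICY: Sequence[Tuple[str, int]] = (
    ("node*", 300),      # content: five minutes on a fairly static site
    ("field*", 600),     # taxonomy element names (placeholder value)
    ("taxonomy*", 600),  # taxonomy relations (placeholder value)
    ("user*", 60),       # accounts (placeholder value)
    ("role*", 60),       # permission sets (placeholder value)
)


def ttl_for(table: str,
            policy: Sequence[Tuple[str, int]] = EXAMPLE_POLICY) -> Optional[int]:
    """Return the TTL of the first matching pattern, or None if none match."""
    name = table.lower()
    for pattern, ttl in policy:
        if fnmatchcase(name, pattern):
            return ttl
    return None  # unlisted tables are left uncached in this sketch


print(ttl_for("node_revision"))  # 300
print(ttl_for("users"))          # 60
print(ttl_for("cache_page"))     # None
```

A site offering end-user tagging might lower the taxonomy values, and a broadcast site might raise the user and role values, in line with the guidance above.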