There are two key requirements for architecting a centralized logging strategy.
Direct Logs to an Isolated Storage Area
The first requirement is to direct all logs to a redundant and isolated storage area. In an Infrastructure as a Service (IaaS) implementation, a design for a centralized logging solution might look like that depicted in Figure 10.1.
In this design, all logs are directed to syslog instead of being written directly to disk on the local machine. Syslog on each server is piped directly to a dedicated logging server farm. The logging server farm should be redundant across data centers, or availability zones (AZs) as they are called on Amazon Web Services. Once the data arrives on the logging server farm, a number of logging solutions, both open source and commercial, can transform the data and load it into a NoSQL database. These tools also provide a rich user interface that allows the end user to search the logs, schedule jobs, trigger alerts, create reports, and much more. This strategy provides the following benefits:
■ Allows administrators to block all developer access to servers in the production environment. Developers can access only the production logging servers, and only through a secure user interface (UI) or an application programming interface (API).
■ Auditing becomes much simpler since all logs are in one place.
■ Data mining and trend analysis become feasible because all logging data is stored in a NoSQL database.
■ Implementing intrusion detection becomes simpler because tools can be run on top of the central logging database.
■ Loss of log data is minimized because data is not stored on the local disks of servers that may be deprovisioned on the fly.
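The design described above, in which each server forwards its messages to syslog rather than writing them to local disk, can be sketched with Python's standard logging.handlers.SysLogHandler. This is a minimal sketch under stated assumptions: the function name build_central_logger, the LOCAL0 facility, and the use of UDP are illustrative choices, and the collector host and port are placeholders rather than values from the text.

```python
import logging
import logging.handlers
import socket


def build_central_logger(name, host, port=514):
    """Return a logger that forwards records to a syslog collector over UDP.

    `host` and `port` identify the central logging farm. They are
    placeholders (assumptions), not values taken from the text.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also write through root handlers

    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
        socktype=socket.SOCK_DGRAM,  # UDP; nothing is written to local disk
    )
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A server would call build_central_logger once at startup with the address of the logging farm and then log through the returned logger as usual. A production setup would more likely use TCP or TLS and a local syslog daemon for buffering, so treat the UDP choice here as the simplest possible illustration.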
For applications built on top of IaaS, there are a few options. If the application team wants to build and manage its own logging solution, it will need to stand up a logging server (or two or more for redundancy) and configure the operating system and applications to use a tool such as syslogd for Linux-based systems, Log4j for Java applications, or log4net for .NET. These are just a few of the many tools that assist with logging. Once the logs are all routed to a central repository, many open source and commercial products can sit on top of that repository and provide easy-to-use searching, job scheduling, event processing, and notification capabilities for sifting through log data.
[Figure 10.1 Centralized Logging Strategy. Web, API, database, and utility servers send their logs to a central logging service. Admins have total access; developers access the log server only.]
Another option is to leverage a Software as a Service (SaaS) logging solution. In this model the logs are sent to a cloud-based centralized logging Database as a Service solution. SaaS logging solutions have many advantages.
First, the team no longer has to build, manage, and maintain logging functionality, which is usually not a core competency. Second, the logs are maintained off-site on a third party's scalable, reliable cloud infrastructure. Third, if any part of the data center goes down, the logging service will not be impacted.
If a company is leveraging more than one cloud platform (for example, AWS and Rackspace), SaaS logging solutions are even more attractive because the log files from the different cloud service providers' (CSPs') platforms can be managed and maintained in one place.
Many Platform as a Service (PaaS) solutions are integrated with the most popular logging SaaS solutions, like Loggly and Logentries, which provide API access to the central logging solution. Instead of building and managing logging services, a PaaS user can simply pay for what it uses. Logging add-ons, or plug-ins, as these are often called on PaaS platforms, are one of the reasons why PaaS is so attractive to developers. Developers can simply turn on plug-ins like logging, monitoring, Database as a Service, message queues, payment services, and more without having to write all of that code or figure out how to integrate with these solutions.
Some companies may choose to manage the logs themselves because they do not want any of their data to leave their premises. These companies are trading speed-to-market for control, since they are taking on much more work than they would have had to do with a SaaS solution. Another advantage of leveraging a SaaS solution for logging becomes clear when the CSP has an outage. If a company is managing the logging system on that CSP's infrastructure, the logging solution might be down as well. Without access to the logs, it will likely be incredibly difficult to troubleshoot the issues that the outage is causing.
In the SaaS model, all the logs would be available during the CSP outage.
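As a rough illustration of the SaaS model, the sketch below ships each log record as JSON to a hosted endpoint over HTTPS, so the logs live outside the CSP that runs the application. It does not use any real vendor's API: the class name HttpsJsonHandler, the ingestion URL, the bearer-token header, and the payload field names are all hypothetical, and a real service would document its own endpoint and authentication scheme.

```python
import json
import logging
import urllib.request


class HttpsJsonHandler(logging.Handler):
    """Send each log record as a JSON document to a hosted log endpoint."""

    def __init__(self, url, token, send=None):
        super().__init__()
        self.url = url      # hypothetical ingestion URL
        self.token = token  # hypothetical API token
        # `send` can be replaced (for example in tests) to avoid network I/O.
        self._send = send or self._post

    def build_payload(self, record):
        # Field names are assumptions, not a vendor schema.
        return {
            "timestamp": record.created,
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }

    def _post(self, body):
        request = urllib.request.Request(
            self.url,
            data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": "Bearer " + self.token,
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            response.read()

    def emit(self, record):
        try:
            body = json.dumps(self.build_payload(record)).encode("utf-8")
            self._send(body)
        except Exception:
            # Logging failures must never crash the application.
            self.handleError(record)
```

Sending one HTTPS request per record keeps the sketch short. A real shipper would normally batch records and retry on failure, which most vendor agents and PaaS add-ons already handle.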
Standardize Log Formats
The second key requirement of a centralized logging strategy is to standardize all log formats, naming conventions, severity levels, and error codes for all messages. Storing all logs in a central location is a good start, but if the actual log messages are not designed in a standard way, the value of the data is very limited. A best practice is to build a utility service for writing application messages with a common log message format. In addition, APIs should be designed to use standard HTTP status codes and leverage a standard like the RFC 5424 Syslog protocol to standardize on severity levels. See Table 10.1.
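One way to picture the utility service for a common message format is a single function that every application calls to build its log lines. The sketch below is an assumption-laden illustration: the severity names and codes follow RFC 5424, but the function name format_message, the JSON layout, and the field names are hypothetical choices.

```python
import json
from datetime import datetime, timezone
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels and numeric codes defined by RFC 5424."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    NOTICE = 5
    INFORMATIONAL = 6
    DEBUG = 7


def format_message(severity, server, module, description, when=None):
    """Build one log line with a fixed set of fields (an assumed layout)."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": when.isoformat(),
            "severity_code": int(severity),
            "severity": severity.name,
            "server": server,
            "module": module,
            "description": description,
        },
        sort_keys=True,  # stable key order makes lines easier to compare
    )
```

Because every message passes through the same function, a search on a field such as severity_code or module behaves the same way across all applications that adopt it.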
Finally, create a common vocabulary for error descriptions, including tracking attributes such as date, time, server, module, or API name, and consistently use the same terms. For example, if a system has numerous areas where IDs are being authenticated, always use the same terms, such as "authentication failed" or "access denied". If the same terms are always used, then a simple search from the logging tools will provide consistent results. One way to enforce consistent naming is to use a database or XML data store that the developers can pull from. This eliminates the chance that developers use different descriptions, which would diminish the value of the logging data. Also, by storing these attributes in a data store, changes can be made to the data without the need for a build or a deployment.
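The shared vocabulary can be sketched as a small lookup loaded from an XML data store. Everything specific here is assumed for illustration: the XML layout, the term keys AUTH_FAILED and ACCESS_DENIED, and the function names load_vocabulary and describe. In practice the XML would be read from a shared file or service rather than embedded in the code.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary document; a real one would live in a shared store.
VOCABULARY_XML = """
<vocabulary>
  <term key="AUTH_FAILED">authentication failed</term>
  <term key="ACCESS_DENIED">access denied</term>
</vocabulary>
"""


def load_vocabulary(xml_text):
    """Parse the vocabulary into a dict of key -> standard description."""
    root = ET.fromstring(xml_text.strip())
    return {term.get("key"): term.text.strip() for term in root.iter("term")}


def describe(vocabulary, key):
    """Return the standard description, rejecting terms not in the store."""
    try:
        return vocabulary[key]
    except KeyError:
        raise KeyError(
            f"unknown log term {key!r}; add it to the shared vocabulary"
        ) from None
```

Rejecting unknown keys, rather than falling back to free text, is what keeps developers from inventing their own descriptions. Editing the XML changes the wording everywhere without a build or a deployment.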
Table 10.1 RFC 5424 Severity Codes
Code  Severity
0     Emergency: system is unusable
1     Alert: action must be taken immediately
2     Critical: critical conditions
3     Error: error conditions
4     Warning: warning conditions
5     Notice: normal but significant condition
6     Informational: informational messages
7     Debug: debug-level messages
Source: tools.ietf.org/html/rfc5424.
Standards are crucial for optimizing searches and producing consistent results. The more that the data contained in log messages is standardized, the more automation can be designed. Instead of being reactive and paying people to search for anomalies in the logs, jobs can be run to detect patterns and alert the appropriate personnel if the log content is inconsistent. Audit reports can be automatically generated. Trend reports can be derived, detecting common issues. Statistics can be tied to deployments to proactively analyze the quality of each deployment. There is no end to the number of proactive insights that can be programmed if all log data is standardized. This is a key strategy for increasing automation and proactive monitoring, which leads to stronger service level agreements (SLAs) and higher customer satisfaction.
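As one small example of the automation that standardized logs make possible, the sketch below ties error statistics to deployments and flags the deployments that exceed a threshold. It assumes each parsed log record is a dict carrying an RFC 5424 severity_code and a deployment identifier. Both field names, the function names, and the threshold are hypothetical.

```python
from collections import Counter


def error_counts_by_deployment(records, max_error_code=3):
    """Count records at severity Error (3) or more severe, per deployment.

    Lower RFC 5424 codes are more severe, so "<= 3" selects
    Emergency, Alert, Critical, and Error.
    """
    counts = Counter()
    for record in records:
        if record["severity_code"] <= max_error_code:
            counts[record["deployment"]] += 1
    return counts


def deployments_to_alert(records, threshold):
    """Return the deployments whose error count exceeds the threshold."""
    counts = error_counts_by_deployment(records)
    return sorted(name for name, count in counts.items() if count > threshold)
```

A scheduled job could run deployments_to_alert over the most recent records and notify the appropriate personnel for each deployment it returns. This only works because every record carries the same fields and severity scale.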