IT Critical Systems and Failure Points 1
IT Critical Systems and Failure Points
Summary
This report identifies critical IT systems and/or potential critical failure points for STRATFOR and their current state of reliability and redundancy. Each entry is broken down as follows: • Description of the critical system or potential critical failure point • Current state of the critical system or critical potential failure point • Recommended actions to improve stability and/or redundancy Entry order is “Company‐wide” issues first, followed by specific systems and issues.Company‐wide Issues and Systems
STRATFOR.com Website Description The STRATFOR website consists of 3 key systems: • Apache web server running Drupal Content Management System • MySQL database server • STRATFOR custom mail “QUEUE” server These servers together constitute the entire functionality of the STRATFOR website, and provide together the functionality necessary to allow STRATFOR customers access to content both via the “web” and via e‐mail. Current State Physical failure of these systems are mitigated by: • Physically being located at a secure facility that provides redundant power and network connectivity • Redundant power supply and storage in each server • “Spare” servers prepared for immediate fail‐over in cases of catastrophic system failure • Nightly off‐site backup of data and configuration The Drupal software currently used, Drupal Version 5, is a significantly customized version launched in late 2007. Because of its heavy customization and age it now creates risk in several ways: • Significant overhead in labor costs for adding new functionality due to age and poorly written customizations• Limited support from Drupal in the form of security updates for version 5 of Drupal. Currently iPay failure creates a critical failure point. Our ability to process credit card transactions is effectively halted if iPay becomes unavailable. Credit Card security does not adhere to Visa/American Express requirements. Credit Card numbers are not encrypted and employee access is not rigidly controlled. Recommendations Upgrade to a standard version of Drupal 6 is already in progress for delivery on October 15th, 2009. This action is intended to shorten future development cycles, significantly lessen the number of “bugs” and unexpected behavior experienced by customers and employees, and make security updates available from Drupal easier to routinely implement. Physical stability, performance, and redundancy is currently acceptable. If desired, improvements could be achieved by adding more “spare” or “redundant” servers. Improve Credit Card security by encrypting and otherwise following Visa requirements both physical and electronic. Incorporate and implement support for a second credit card merchant to provide backup in case of iPay outage. Company E‐Mail System Description The STRATFOR corporate email system is arguably the most critical communication medium used by the company staff. Failures in the email system can effectively destroy our ability to produce content, communicate with customers, and have a functional analytical staff. Current State The STRATFOR e‐mail system consists of two highly redundant servers physically located at the same secure facility that houses our production website. The primary server runs a collaborative email software system called Zimbra and provides e‐mail, calendaring, and address book services to users. Zimbra duplicates functionality provided by a Microsoft Exchange Server while providing further email and calendaring support to Apple computers, iPhones, and our mailing list software.
IT Critical Systems and Failure Points 3 The secondary server acts as a gateway system to other mail servers on the Internet and hosts our mailing list software, Mailman. Mailman provides support for large mailing lists along with mailing list archives and our sophisticated “tagging” system for [email protected] list content. Web‐based email access is available for employees in cases where their e‐mail client has failed or is unavailable at: https://core.stratfor.com/ E‐mail access is available via desktop and laptop computers, iPhones, Palm phones, Blackberries, Windows Mobile Phones, and other diverse devices. Our corporate email client deployment is overly fragmented. Consisting of multiple versions of Microsoft Outlook, Mozilla Thunderbird, Apple’s Mail.app among others. This constitutes an IT support nightmare and guarantees that IT desktop support cannot maintain expertise for all clients and that client email behavior is not uniform for all employees. Mail for employees that maintain server side storage is backed up nightly in an encrypted format. This allows for recovery in cases of catastrophic failure of the employees computer. Recommendations Maintain regular update schedule for both Zimbra and Mailman. Zimbra 6.01, a recent release, includes significant improvements to the ability for users to share calendars. Zimbra regularly releases updates and each update provides further functionality. Replace mailman with a list manager that provides even more sophisticated archives of mailing lists such as [email protected] or [email protected]. Mailing List archive improvements allowing for more sophisticated searches would be particularly useful to the analytical staff. Providing staff with the ability to easily make changes to their mailing list subscriptions at will without IT assistance would also be useful. Allowing for easy handling by department managers of lists they “own” and allowing employees to suspend mailing list subscriptions during vacations or business trips would be advantageous. Standardize our email client deployment to 3 distinct software solutions: • Microsoft Outlook 2008 • Mozilla Thunderbird 2.x • Zimbra Desktop 1.x This action will allow IT support to be highly familiar with every employees email setup and minimize the chance of crippling bugs or other service interruptions caused by email client issues.
Company Phone System Description After our e‐mail system the company phone system is our second most critical corporate communications medium. It provides a medium for customers to communication with our support team, sales team, and business staff. It is a key medium for employee communication and an integral part of the analytical team’s toolset. The ability for employees to hold teleconferences between staff members and customers has become a critical ability and improvements and support for this functionality consistently improve efficiency for staff. Current State Our phone system runs in the Austin office on a redundant Internet connection. The phone system is a VOIP solution provided by Digium Corporation called Asterisk Business Edition. A VOIP based phone system provides us with a high‐level of potential integration with other critical corporate systems along with the ability to provide software and hardware based phones to off‐site users that tie directly into the corporate phone system. Recommendations Better company “phone list” support is needed. Multiple easy to use means for employees to access and view the company phone directory need to be provided. Preferably the phone directory should be available via web browser, Instant Messaging client, and e‐mail client. Existing ability to manage teleconferencing via web interface needs to be explained and taught to staff comprehensively. General phone system usage should be well documented and taught to staff. Documentation should be easily available to staff. Integration of phone system with Instant Messaging and other corporate systems should continue to be expanded upon. This includes access to phone directory and ad‐hoc teleconference abilities. Instant Messaging System Description Instant Messaging provides a third communication method to employees ‐ one that is more “real‐time” than email but less intrusive than a phone call. Our instant
IT Critical Systems and Failure Points 5 messaging system consists of the “Openfire” instant messaging server and two client applications used by employees: “Spark” and “Adium”. Current State We are experiencing an unacceptable level of issues with the “Spark” client including disconnects, missed messages, and general user frustration. This is complicated by Spark being the most feature rich solution currently available with diverse support for integration with other systems and support for Windows and Apple platforms. Recommendations Continue to work with Spark developers to address existing “bugs” and reliability issues with Spark. Continue to monitor status of alternatives to the Spark client: Bria – At $50 per user and with some annoying missing IM features, Bria is a mixed bag. On the other hand it has an integrated high‐quality software based VOIP phone allowing for integration between Instant Messaging and the phone system that is unavailable with any other solution currently. Unfortunately, an Apple version is not unavailable yet, but is expected in Q4 2009. This is a likely replacement for Spark when the Apple version becomes available. Adium – Apple solution, minor functionality loss, and some functionality is less intuitive. Pidgin – Windows solution, minor functionality loss, and some functionality is less intuitive. Regardless of client solution IT should continue to improve phone system integration level. Including access to phone directory and ad‐hoc conferencing.
Specific Critical Issues and Failure Points
PGP Deployment / Encrypted Communications Description PGP currently provides a critical secure communications system for Analytical staff via encrypted email. Current State PGP deployment has been severely hampered by lack of standardization of e‐mail clients as discussed earlier regarding the e‐mail system as a whole. Available PGP solutions for different email clients differ widely in functionality and reliability.PGP Corporation further hampered the STRATFOR solution by implementing significant changes in PGP 9.x that decreased integration with Microsoft Outlook ‐ one of the most utilized email clients in the company. Recommendations Standardize email client deployment in the company, allowing for standardization of the encryption solutions used in the company. • IT Desktop support should be heavily familiar with deployed encryption solution. • List of encryption users should be maintained • A solution for easy deployment of up‐to‐date encryption keys should be provided Critical Staff Equipment and Connectivity Redundancy Description Several staff members are critical failure points in and of themselves. Loss of Internet connectivity remotely, or computer failure can destroy the company’s ability to function by halting the OS monitoring process, the editorial process, or otherwise impacting critical business. Current State Although not all inclusive the following users are susceptible to critical failure due to lack of redundancy by other staff members or due to critical importance: • George Friedman – phone or internet connectivity remotely can result in significant corporate impact • Kelly Polden – Night‐time editor/writer – home internet connectivity is susceptible to potential failure at critical time • Off‐site analytical staff / AOR leads – home internet connectivity or computer failure can lead to analytical staff work stoppage or unacceptable level of loss in AOR coverage • John Gibbons – Home Internet connectivity or laptop failure can lead to significant customer service quality depreciation Recommendations Maintain replacement laptops for immediate deployment in cases of equipment failure. In cases of off‐site users provide replacement solutions or allow the user to expense purchase of replacement equipment for use in emergencies. Off‐site users in a critical position should have a second computer available for emergencies.
IT Critical Systems and Failure Points 7 Provide backup Internet connectivity to critical users who travel routinely or are regularly off‐site. Wireless Phone Network solutions like those provided by AT&T are excellent solutions for Internet connectivity redundancy. Similar solutions should be identified for individuals outside the United States. Aging Computer Fleet Description The desktop and laptop deployment at STRATFOR is aging. Both critical and non‐ critical users are operating on slow or dangerously old laptops and desktops. Current State This list of individuals is currently running on equipment 4‐7 years old, personal equipment, or a desktop when a laptop would be more appropriate: • Kamran Bokhari • Reva Bhalla • Nate Hughes • Marko Papic • Matt Gertken • Kristen Cooper • Alex Posey • Ben West • Rodger Baker • Jennifer Richmond • Entire Intern computer fleet Recommendations Start replacing this equipment as soon as feasible ‐ On a monthly schedule if necessary. Trickle down from executive staff by buying new equipment for executives and handing down replaced equipment possibly. Work with management for each department to make sure employees have appropriate equipment. Vertical Response for E‐mail Marketing Description Vertical Response is the solution provider we have chosen for distributing our email marketing campaigns. Current State This is a single point of failure for our email marketing campaigns. If Vertical Response is down, we cannot send email campaigns until they have recovered.
Or resign ourselves to delaying campaigns until Vertical Response has recovered from any outage they are experiencing.