by Laura DiDio, Enabling Technologies Enterprise, Software Economics and Infrastructure Research Fellow, [email protected], 617-598-7265
© Copyright 2008. Yankee Group Research, Inc. All rights reserved.
Yankee Group published this content for the sole use of Yankee Group subscribers. It may not be duplicated, reproduced or retransmitted in whole or in part without the express permission of Yankee Group Prudential Tower, 800 Boylston St. 27th Floor, Boston, MA 02199. Phone: (617) 598-7200. Fax: (617) 598-7400. E-mail: [email protected]. All rights reserved. All opinions and estimates herein constitute our judgment as of this date and are subject to change without notice.
Unix, Linux Uptime and Reliability Increase; Patch Management Woes Plague Windows
The Bottom Line: Linux reliability has come of age. Several major Linux distributions including Novell SUSE, Red Hat Enterprise Linux and Ubuntu scored high in reliability, improved over 2006 uptime statistics and achieved near parity with Unix distributions. Security incidents plagued Windows Server 2003, which saw uptime decrease by roughly 10% during last year. Corporations must continue to adhere to best practices in configuration, management and security to ensure optimal server operating system and application reliability.
Key Concepts: Reliability, uptime, performance, management, service-level agreement, SLA, compliance
Who Should Read: CEO, CIO, CTO, VP of IT, network administrator
Practice Leader: Zeus Kerravala, Enterprise Research Senior Vice President, [email protected], 617-598-7235
Executive Summary
Unix, the leading Linux distributions from Novell and Red Hat as well as open source Ubuntu were the clear winners in the Yankee Group 2007-2008 Global Server Operating System Reliability Survey.
Yankee Group’s second annual Global Server Operating System Reliability Survey polled 400 users from 27 countries worldwide. The latest independent, nonsponsored web-based survey revealed that all versions of Unix, which typically carry very high workloads, are near bulletproof, achieving 99.99999% reliability. IBM’s AIX Unix led all server operating systems for reliability with just more than 30 minutes per server of annual downtime. The top Linux distributions, Red Hat Enterprise Linux (RHEL) and Novell SUSE Linux, notched the biggest reliability improvements in the latest survey. Each decreased annual per-server downtime by an average of 75%.
The biggest and most unwelcome surprise in the survey was that Windows Server 2003 downtime increased by 25% to nearly 9 hours per server per year compared to the results it achieved in the Yankee Group 2006 Global Server Operating System Reliability Survey (see the March 2006 Yankee Group Report, Unix, Windows and Custom Linux Score Well on Yankee Group 2006 Global Server Reliability Survey). Windows Server 2003’s decreased reliability is attributable to a series of security alerts Microsoft issued in the summer and fall that caused network administrators to take their Windows Server 2003 machines offline for significantly longer periods of time to apply remedial patches.
During the past 2 years, the Yankee Group polls have indicated that all of the major server operating system platforms achieved a much higher degree of reliability than they experienced in the prior decade. In general, none of the major server operating systems—Linux, Macintosh, Windows and Unix—are beset by the long list of bugs that plagued their predecessors in the 1980s and 1990s. Additionally, there is far less disparity now in both the number and severity of unplanned server outages and the actual downtime that businesses experience on their standard Linux, Windows and Unix platforms than at any time in recent memory.
The survey results indicated that individual corporate Linux, Windows and Unix servers experience an average of one to four failures per server per year, resulting in 1 hour to up to 10 hours of annual downtime for each server. The actual amount of downtime depends on the server operating system and its specific configuration (see Exhibit 1).
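As a rough illustration of what these downtime figures mean in availability terms, the sketch below converts annual downtime hours into an uptime percentage. It assumes a 365-day, around-the-clock service year (8,760 hours), which the survey does not specify, and the function name is purely illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes 24x7 service, leap years ignored


def availability_percent(downtime_hours_per_year: float) -> float:
    """Return uptime as a percentage of a full year for one server."""
    if not 0 <= downtime_hours_per_year <= HOURS_PER_YEAR:
        raise ValueError("downtime must be between 0 and 8,760 hours")
    return 100.0 * (1.0 - downtime_hours_per_year / HOURS_PER_YEAR)


# The two ends of the per-server downtime range quoted above.
for hours in (1, 10):
    print(f"{hours:>2} h/year of downtime -> {availability_percent(hours):.3f}% availability")
```

The same function can be applied to any per-server downtime figure in this report to compare platforms on a common availability scale.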
Exhibit 1. Unix, Linux Server Operating Systems Show Improved Reliability in Server Downtime
[Chart: hours per year of end-user downtime (axis 0 to 10), 2007 versus 2006, for each platform surveyed: Unix (e.g., Solaris, AIX, HP-UX); Ubuntu; HP-UX v11i; IBM AIX; Sun Microsystems Solaris; other Linux with customizations; other Linux (e.g., Turbolinux, Mandriva); Linux from SUSE with customizations; Linux from SUSE; Linux from Red Hat with customizations; Linux from Red Hat; open source Linux (e.g., Debian); Windows Server 2003; Windows 2000]
Source: Yankee Group 2007-2008 Global Server Operating System Reliability Survey
Other survey highlights include:
• Unix-based servers, which represent about 10% of the installed base of server operating systems, achieved the highest reliability ratings among mainstream distributions.
• IBM’s AIX achieved the highest level of reliability, with corporate enterprises reporting an average of only 36 minutes of downtime per server in a 12-month period. Hewlett-Packard’s HP-UX version 11i recorded 1.1 hours of downtime for each of its servers on a yearly basis, while Sun Microsystems’ Solaris customers reported 1.4 hours of downtime per server per year.
• Both versions of Novell SUSE Linux, the standard off-the-shelf distribution as well as the custom implementation, saw downtime decline by 73% from more than 4 hours in the Yankee Group 2006 Global Server Reliability Survey to a little more than 1 hour of downtime per server in the latest poll. The off-the-shelf version of Novell SUSE Linux bested Red Hat reliability by recording 37 minutes less downtime for each server compared to the comparable off-the-shelf RHEL implementation. The customized version of SUSE Linux experienced 65 minutes of downtime per server per year, roughly 13 minutes more for each server than its chief competitor RHEL in a custom configuration. Additionally, Novell’s market share climbed from approximately 13% in last year’s survey to roughly 17% in the current poll.
• Linux market leader Red Hat scored similarly rosy results. Per-server downtime for the standard off-the-shelf distribution decreased by 75% to 1.75 hours annually, down from more than 7.1 hours in Yankee Group’s 2006 survey. Custom implementations of RHEL delivered even greater reliability with a scant 52 minutes of unplanned downtime per server per year. Red Hat Enterprise Linux also increased its enterprise presence: this year, 31% of the survey respondents reported they have standard RHEL present in their shops, up 5 percentage points from the 26% who had it installed in the 2006 survey.
• Debian, a popular open source distribution that posted the highest number of outage minutes last year, saw significant improvement in the Yankee Group 2007-2008 Global Server Operating System Reliability Survey. Debian servers experienced more than 5 hours of annual downtime, a 41% decrease from the downtime figure it posted in the 2006 survey. The open source operating system also increased its presence year-over-year, with 24% of the respondents reporting they had at least one Debian server in their network compared to 15% who had it installed in the 2006 time frame.
• Ubuntu, which appears in the Yankee Group Global Server Operating System Reliability Survey for the first time this year, has also come on strong and is an open source operating system to be reckoned with. Twenty-two percent of the survey respondents are running at least one Ubuntu server at their sites. It has proven highly reliable, with 1.1 hours of downtime per server per annum.
The survey identifies the real-time resources and monies needed to manage and maintain various server operating systems so that corporate users can make more informed choices on which individual server operating system or combination of servers best suits their business and budgets.
Table of Contents
I. Introduction
Survey Methodology
Survey Demographics
II. Data and Analysis
Unix: Rock Solid Stability
Security Woes Increase Windows Server 2003 Downtime and Time to Patch
Windows, Custom Linux Patch Management Time Increases
Automated Group Policy vs. Manual Patching
Tier 2, Tier 3 Network Outages Show Overall Decline
III. Conclusions and Recommendations
Recommendations for Vendors
Recommendations for Corporate Customers
IV. Further Reading
I. Introduction
Just as in the 2006 Global Server Reliability Survey, the Yankee Group 2007-2008 Global Server Operating System Reliability Survey polled nearly 400 global businesses ranging from small and medium businesses (SMBs) to large enterprises with more than 10,000 employees. The intent was to quantify the reliability of 14 different server operating system (OS) platforms to identify the most reliable platforms, highlight user management trends and assist businesses in deciding which server OS or heterogeneous combination is most suitable for their respective environments.
The Yankee Group survey asked corporate IT managers and executives to detail the number of Tier 1, Tier 2 and Tier 3 outages each server experiences annually. We polled corporate executives and IT administrators on reliability, outage time and the amount of time they spent applying patches across 10 different server operating system platforms, including a variety of niche market Linux and open source distributions:
• Windows 2000 Server
• Windows Server 2003
• HP-UX 11i
• IBM AIX
• Sun Microsystems Solaris
• Red Hat Enterprise Linux standard distribution
• Novell SUSE Linux standard distribution
• Novell SUSE Linux with customization
• Debian open source Linux
• Ubuntu open source Linux
• Other Linux distributions (e.g., Turbolinux, Mandriva)
• Open source Linux distributions with customization
Survey Methodology
The Yankee Group 2007-2008 Global Server Operating System Reliability Survey was an independent web-based poll. To provide our customers with the most unbiased, accurate and reliable information, Yankee Group accepted no vendor sponsorship money for either the online poll or the subsequent first-person interviews conducted in connection with this project. None of the survey respondents received any remuneration. Additionally, none of the more than two dozen enterprise users interviewed by Yankee Group received any remuneration.
No vendors had any input or influence on the questions or responses. Yankee Group used authentication and tracking tools to ensure that no tampering occurred and to prohibit multiple responses by the same parties. Yankee Group used the same phrasing for the Windows, Linux, Unix and open source responses to maintain objectivity. Corporate respondents were also provided the opportunity to make additional comments in an essay format. Approximately 20% of respondents, or one in five, provided anecdotal remarks to supplement their statistical survey responses.
Survey Demographics
Companies of all sizes and vertical markets were represented in the server poll. Approximately 35% of the survey respondents came from SMBs with 1 to 100 employees, 30% from midsize companies with 100 to 500 employees, 8% from corporations with 500 to 1,000 employees, 18% from corporations that employ 1,000 to 10,000 people, and the remaining 11% from large enterprises with more than 10,000 workers. The survey was truly global. Roughly 85% of the respondents were from North America (Canada and the United States) while 15% were international users in Europe, Asia-Pacific, Australia and South America.
II. Data and Analysis
The Yankee Group 2007-2008 Global Server Operating System Reliability Survey delves deep into the topic by querying corporations on a wide variety of reliability-related functions including:
• The amount of downtime experienced per year per server
• The time to patch each server
• The percentage of businesses applying patches manually versus automatically downloading patches via Active Directory’s Group Policy procedures
• The actual length of downtime associated with each server
Yankee Group distinguishes the server outages by the following tier categories:
• Tier 1: A Tier 1 incident causes downtime for dependent users of less than 30 minutes. Tier 1 incidents are usually minor incidents that require a straightforward reboot and do not involve any data loss.
• Tier 2: A Tier 2 incident causes downtime for dependent users between 1 and 4 hours. It generally requires direct intervention by a network administrator and sometimes disrupts corporate business.
• Tier 3: A Tier 3 incident is the most severe. It causes downtime of more than 4 hours for dependent users and almost always requires the direct intervention of several network administrators. It may involve data loss and possibly have an adverse impact on the company’s core line of business.
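The tier definitions above can be sketched as a simple classification by outage duration. This is a minimal illustration only, not part of the survey instrument: the function name is hypothetical, the Tier 2 range is assumed to include both endpoints, and outages lasting between 30 minutes and 1 hour are returned as unclassified because the definitions do not cover them.

```python
from typing import Optional


def classify_outage(downtime_minutes: float) -> Optional[str]:
    """Map an outage's duration to the report's tier labels.

    Returns None for durations the tier definitions do not cover
    (30 minutes up to 1 hour) rather than guessing a tier.
    """
    if downtime_minutes < 0:
        raise ValueError("downtime cannot be negative")
    if downtime_minutes < 30:
        return "Tier 1"  # less than 30 minutes
    if 60 <= downtime_minutes <= 240:
        return "Tier 2"  # between 1 and 4 hours (endpoints assumed inclusive)
    if downtime_minutes > 240:
        return "Tier 3"  # more than 4 hours
    return None  # gap between the Tier 1 and Tier 2 definitions


for minutes in (10, 45, 90, 300):
    print(minutes, "minutes ->", classify_outage(minutes))
```

Duration is only one part of each definition; the need for administrator intervention and the risk of data loss would have to be tracked separately.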
The length and severity of each of these actions equate to a specific line-item cost for the business and can positively or negatively impact the overall total cost of ownership (TCO). The all-important ability to meet service-level agreements (SLAs) hinges on server reliability, uptime and manageability. These are key indicators that enable organizations to determine which server operating system platform or combination of platforms best suits the company’s business and technology needs.
The latest survey validates the heterogeneity Yankee Group identified in the 2006 poll: 95% of the 400 respondents have an average of three disparate server operating systems in their environments. Microsoft Windows 2000 Server and Windows Server 2003 together account for about 60% of the worldwide server operating system market. Linux’s market share grew about 5% and now represents about 30% of the installed base. All versions of Unix combined (IBM AIX, HP-UX and Sun Microsystems Solaris) have approximately 10% of the market share.
Unix: Rock Solid Stability
IBM, HP and Sun Microsystems distributions all continue to deliver rock solid reliability backed by the superior service and support that have long been the hallmark of all three vendors. The IBM AIX per-server annual downtime rate of 37 minutes is the best of any server operating system.
HP continues to win plaudits from customers interviewed by Yankee Group for its pan-agnostic approach to servicing and supporting a variety of server environments including Linux, open source Ubuntu and Windows. Sun Microsystems, which has struggled in recent years and seen its Solaris market share erode at the hands of Red Hat Enterprise Linux, appears to be on solid ground again. That’s directly attributable to Sun’s aggressive pricing initiatives on its SPARC server hardware and service and support contracts. It also helps Sun’s sales that the company made its Solaris operating system free.
The survey respondents uniformly extolled the stability, reliability and overall performance of all Unix server OS platforms. High-end Unix systems will retain their solid niche in corporate data centers.
Security Woes Increase Windows Server 2003 Downtime and Time to Patch
As Exhibit 1 indicates, one of the most unwelcome surprises in the latest reliability survey was the decline in uptime for both Windows 2000 Server and Windows Server 2003.
Windows Server 2003 showed a 25% increase in per-server annual downtime, while Windows 2000 Server downtime rose only slightly, by about 5%. In the latest poll, Windows Server 2003 recorded 8.9 hours of downtime per server versus just more than 7 hours in the prior 12-month period. Its predecessor, Windows 2000 Server, fared better in relative terms: downtime per server edged up by about 5% to just less than 9.9 hours, compared to 9.3 hours per server per year in the 2006 poll. Even so, Windows 2000 Server, now nearly 9 years old, recorded the most per-server yearly downtime of any platform in the survey.
A decrease in uptime for Windows or any stable and mature operating system must be considered an anomaly. The decline in Windows Server 2003 reliability is dismaying to corporations because the Microsoft server operating system is in use at 91% of the sites we surveyed, while 74% of businesses still use Windows 2000 Server, down from 87% in the 2006 Global Server Reliability Survey.
Upon deeper investigation, security was found to be the clear culprit. In the summer and fall when Yankee Group conducted its survey, Microsoft issued more than a dozen security alerts and patches. And to make matters worse, many of these were critical vulnerabilities.
These statistics are significant because a majority of Windows servers carry the bulk of the line-of-business applications in their firms, particularly Exchange Server messaging and SQL Server databases. The increased downtime and patch management time mean more work for network administrators.
Many corporations expressed satisfaction with the performance and ease of use of Windows Server Update Services (WSUS). However, enterprises were clearly distressed by the spike in security flaws, which precipitated more patching and a subsequent server reboot for every patch. Typical comments included:
• “Microsoft security is improving but I wish there were fewer security patch needs.”
• “Sometimes it seems like patches are coming out on a daily basis. If I didn’t schedule installations for once a week, I would be rebooting my servers every day.”
• “The window of attack between locating vulnerability and a patch being released is decreasing, but still large enough to be a worry.”
• “Windows patch management is a nightmarish experience. There were regular incidents of one or more applications or services getting adversely affected because of some patch.”
• “A reboot is still needed for Windows patches, which means that they must be applied during the night.”
The rash of security flaws hit some companies especially hard. In this year’s survey, 10% of the 400 respondents reported spending more than 4 hours to apply patches, most of them security related. By contrast, only 2% of those polled in last year’s survey said they spent more than 4 hours applying patches.
Windows, Custom Linux Patch Management Time Increases
As Exhibit 2 shows, an overwhelming 90% majority of Windows administrators reported they spent roughly 26 minutes applying patches to each server this year compared to 18 minutes in 2006, an increase of more than 40%. Although that extra 8 minutes may seem negligible, it quickly adds up when you consider the overall server count in an organization. The additional 8 minutes will add about 2.7 hours of administrative time in a company with 20 servers and 13.3 hours of additional management time in a company that applies patches to 100 servers.
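The scaling arithmetic above can be sketched as follows. This is a minimal illustration that assumes every server in the fleet is patched once and incurs the same extra time; the function name is hypothetical.

```python
def extra_admin_hours(extra_minutes_per_server: float, server_count: int) -> float:
    """Total added patching time, in hours, across a fleet of servers."""
    if extra_minutes_per_server < 0 or server_count < 0:
        raise ValueError("inputs must be non-negative")
    return extra_minutes_per_server * server_count / 60.0


# The 8 extra minutes per Windows server, applied to 20 and 100 servers.
for servers in (20, 100):
    hours = extra_admin_hours(8, servers)
    print(f"{servers} servers: {hours:.1f} extra hours")
```

The same calculation applies to the custom Linux figures discussed later in this section by substituting the extra minutes per server reported for those platforms.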
Exhibit 2. Linux Patch Management More Efficient for Most Distributions
[Chart: average time to patch a server in minutes per patch (axis 0 to 50), excluding those who reported greater than 4 hours to patch, 2007 versus 2006, for each platform surveyed: Unix (e.g., Solaris, AIX, HP-UX); Ubuntu; HP-UX v11i; IBM AIX; Sun Microsystems Solaris; other Linux with customizations; other Linux (e.g., Turbolinux, Mandriva); Linux from SUSE with customizations; Linux from SUSE; Linux from Red Hat with customizations; Linux from Red Hat; open source Linux (e.g., Debian); Windows Server 2003; Windows 2000]
Source: Yankee Group 2007-2008 Global Server Operating System Reliability Survey
The Yankee Group 2007-2008 Global Server Operating System Reliability Survey indicated that on average, Windows Server 2003 workloads were approximately 45% to 55% heavier than workloads of comparable Linux servers. The Windows servers also carry a disproportionately higher percentage of sensitive company data running on mission-critical applications.
Windows 2000 Server and Windows Server 2003 were not the only server operating systems to experience a rise in the time it took to apply patches. The custom versions of both Red Hat Enterprise Linux and Novell SUSE Linux also required more time to patch this year than they did in 2006.
IT managers at shops with customized RHEL deployments saw a 35% increase in the time it took them to apply patches to their OS environments, which works out to an additional 9 minutes per server. Once again, while that figure seems trivial, it quickly adds up in shops with a significant server contingent. In a shop with 20 RHEL custom servers, that’s an extra 3 hours spent applying patches. A company that has 100 RHEL custom servers would allocate an additional 15 hours to patch.
Novell SUSE Linux custom deployments saw patch management time soar by 40% or roughly 7 minutes per server. That works out to 2.3 extra management hours in a company with 20 custom SUSE Linux servers and roughly 11.7 extra hours to install patches on 100 specially configured SUSE Linux servers.
Unlike the Microsoft Windows environment, where the incremental patch management time was attributable to security issues, the spike in custom Linux patch management time is due to the complex nature of a custom server OS.
Additional time is required to install patches in custom server OS environments because off-the-shelf patches frequently will not work in a custom environment. That forces network administrators to either customize the software patch or strip out a certain portion of the customization in the underlying server OS to ensure interoperability.
However, organizations that opted to deploy the off-the-shelf versions of Novell SUSE Linux and Red Hat Enterprise Linux fared well; both of those distributions saw a slight decline in the time it took to roll out patches.
Automated Group Policy vs. Manual Patching
Windows administrators exhibited the greatest overall comfort level with automated patch management and reported the highest rate of applying patches via automated Group Policy mechanisms. Thirty-three percent of Windows 2000 Server administrators and 32% of Windows Server 2003 administrators use Active Directory’s Group Policy to automatically install updates throughout the enterprise.
By contrast, Sun Microsystems Solaris administrators were least likely to use Group Policy to download and deploy their patches—only 9% use the automated mechanism.
However, Linux administrators are embracing Group Policy to speed patch deployment. Red Hat administrators’ use of automated Group Policy quadrupled from 4% in the Yankee Group 2006 Global Server Reliability Survey to 16% in the current poll. Novell SUSE Linux Group Policy deployments were up 42%, with nearly one-quarter of the network managers using the automated method to install their patches.
Hands-on Manual Patching Still Popular
Despite the additional time and effort required to manually test and apply patches server by server, many network administrators prefer this mechanism.
As Exhibit 3 shows, Windows Server 2003 administrators were kept busy applying a plethora of security-related fixes. The time they spent on manual patch jobs rose a whopping 146%, to 89 minutes (about 1.5 hours) compared with the 36 minutes they dedicated to the same chore according to the Yankee Group 2006 Global Server Reliability Survey. That’s more than double or triple the amount of time their Linux, open source and Unix administrative peers spent doing the same thing.
It’s also true, though, that Windows is a far bigger server OS. Windows 2000 Server contains about 30 million lines of code and Windows Server 2003 has more than 40 million lines of code compared to about 10 million lines of code for the latest Linux distributions and roughly 25 million lines of code for the various Unix distributions.
Nonetheless, the increase in the amount of time the Windows IT managers spent patching means that they had less time to devote to other network tasks such as assisting end users. Yankee Group hopes that the Windows server figures decline to their more reasonable 2006 levels in the upcoming 12 months.
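The year-over-year comparisons in this section are relative changes against the 2006 baseline. The sketch below shows that calculation for the manual patch times quoted above for Windows Server 2003 (36 minutes in 2006, 89 minutes in 2007). The function name is illustrative, and because the inputs are the rounded minute values from the text, the result can differ by a point or so from the percentage Yankee Group reports, which was presumably computed from unrounded survey data.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed as a percentage of old."""
    if old == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (new - old) / old


# Manual patch time per Windows Server 2003 server, in minutes (2006 vs. 2007).
change = percent_change(36, 89)
print(f"Manual patch time changed by {change:.0f}%")
```

A negative result indicates a decrease, as with the downtime declines reported for the Linux distributions.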
Exhibit 3. Administrators Spending More Time Applying Windows Patches
[Chart: time spent applying patches manually in minutes per application (axis 0 to 100), excluding policy patching, 2007 versus 2006, for each platform surveyed: Unix (e.g., Solaris, AIX, HP-UX); Ubuntu; HP-UX v11i; IBM AIX; Sun Microsystems Solaris; other Linux with customizations; other Linux (e.g., Turbolinux, Mandriva); Linux from SUSE with customizations; Linux from SUSE; Linux from Red Hat with customizations; Linux from Red Hat; open source Linux (e.g., Debian); Windows Server 2003; Windows 2000]
Source: Yankee Group 2007-2008 Global Server Operating System Reliability Survey
Tier 2, Tier 3 Network Outages Show Overall Decline
Another crucial reliability metric is the actual number of per-server yearly outages. Typically, Tier 1 outages, which are the least pernicious and usually a minor inconvenience for both end-user productivity and IT managers, constitute the largest number and percentage of a company’s unplanned downtime.
All of the server OS vendors fared well in this category, with all of the Windows, Linux, open source and Unix distributions registering a decline. The aggregate average number of per-server per-year Tier 1 outages was roughly 1.5. Windows Server 2003 servers averaged 2.8 per year while IBM AIX recorded the fewest Tier 1 incidents with 0.48, or less than one-half of 1 outage annually.
Exhibit 4 shows the combined percentage total of the more significant Tier 2 and Tier 3 outages.
The open source Debian server operating system had the highest percentage at 59%, which was an increase of 8% from the Yankee Group 2006 Global Server Reliability Survey. Similarly, Windows 2000 Server and Windows Server 2003 also saw their aggregate percentages of Tier 2 and Tier 3 server incidents increase by 2% and 6% respectively.
Exhibit 4. Serious Tier 2 and Tier 3 Network Outages Decline for Linux, Rise Slightly for Windows
[Chart: percent of Tier 2 and Tier 3 incidents (axis 0 to 60), 2007 versus 2006, for each platform surveyed: Unix (e.g., Solaris, AIX, HP-UX); Ubuntu; HP-UX v11i; IBM AIX; Sun Microsystems Solaris; other Linux with customizations; other Linux (e.g., Turbolinux, Mandriva); Linux from SUSE with customizations; Linux from SUSE; Linux from Red Hat with customizations; Linux from Red Hat; open source Linux (e.g., Debian); Windows Server 2003; Windows 2000]
Source: Yankee Group 2007-2008 Global Server Operating System Reliability Survey
By contrast, the most popular Linux distributions, RHEL and Novell SUSE, saw their combined Tier 2 and Tier 3 network outages decline by a very significant 45% and 39%, respectively.
Contextually, the actual number of combined Tier 2 and Tier 3 incidents is quite small for all of the vendors. There were approximately two for each Windows 2000 Server and Windows Server 2003 per year, and about one for each of the Linux distributions. Unix solidified its reputation for reliability. None of the Unix distributions recorded any Tier 3 incidents and all averaged less than one full Tier 2 outage, which is another key indicator of a network’s health. Tier 3 outages in particular can last for hours or even days and usually involve some loss of productivity and sometimes data for corporate end users, outside business partners or customers. As such, Tier 3 outages can cost a business tens of thousands or even hundreds of thousands of dollars for a single incident that involves multiple servers and several network administrators.
III. Conclusions and Recommendations
In aggregate, the Yankee Group 2007-2008 Global Server Operating System Reliability Survey findings show that all of the server operating system platforms exhibit a high and acceptable degree of reliability. As we said of the Unix distributions in last year’s poll, “Unix server operating systems are like well-oiled machines. Their performance and reliability consistently have been excellent during the past decade. There is every indication that it will remain so.” Yankee Group holds to that conclusion based on the 2007-2008 Global Server Operating System Reliability Survey results.
The poll also shows that the uptime and dependability of all server operating systems have markedly improved in the past 3 to 5 years, particularly for the top Linux distributions, Novell SUSE Linux and Red Hat Enterprise Linux, and the open source distributions Debian and Ubuntu.
The server outage spike due to the necessity of applying security patches to the Windows Server 2003 OS platform is hopefully an isolated occurrence. If not, corporate Windows customers will make their displeasure known and felt in a very vocal manner. Up until this latest poll, all versions of the Microsoft Windows server operating system were achieving greater reliability while carrying heavy workloads. And in the Yankee Group 2006 Global Server Reliability Survey, the Windows Server 2003 operating system recorded reliability figures that were equal to or better than any mainstream platform with the exception of the high-end Unix distributions.
Recommendations for Vendors
Yankee Group reiterates its findings from the March 2006 Report, Unix, Windows and Custom Linux Score Well on Yankee Group 2006 Global Server Reliability Survey:
• Microsoft should get an even firmer grip on security and improve its patch management economies of scale. It is even more imperative that Microsoft do so because of the imminent release of the next-generation server, Windows Server 2008. Microsoft must realize the historical 20% to 30% improvements of its predecessors to keep pace with its Linux, open source and Unix rivals. If security woes continue to plague Windows Server 2003, it will almost certainly have an adverse impact on customer deployment plans for Windows Server 2008.
• Linux and open source should continue to hone reliability. Reliability is very good and it will continue to improve as network administrators become more experienced. However, documentation and fixes are still somewhat lacking compared to the more mature Unix and Windows platforms. Additionally, vendors should consider offering more free on-site training courses for network administrators.
Finally, results of the Yankee Group 2007-2008 Global Server Operating System Reliability Survey show that the standard and custom implementations of all of the major Linux and open source distributions, particularly Novell SUSE Linux, Red Hat Enterprise Linux, Debian and Ubuntu, have come of age. On paper they are nearing parity with Unix, although the majority still lag considerably behind IBM’s AIX, HP-UX and Microsoft Windows in the server workloads they carry.
Recommendations for Corporate Customers
Server operating system reliability, performance and security inherently depend on the corporation’s specific implementation, configuration and ability to properly manage the environment and quickly respond to Tier 1, Tier 2 and Tier 3 network outages.
Even the most reliable server operating system and hardware platform will fall prey to outages and fail if it is misconfigured and mismanaged. There are inherent dependencies between the underlying capabilities of a particular server operating system and an individual corporation’s ability to adhere to best deployment practices with respect to training, testing and configuration. Don’t let your server operating system environment be undone by a bad configuration or a lack of integration and interoperability with pivotal hardware or applications. Likewise, businesses must pay close attention to other factors that may adversely impact performance and reliability such as the use of incompatible or unapproved memory and logic chips, hardware, peripherals and software drivers. Yankee Group strongly advises corporate customers to:
• Buy the most robust hardware configuration your budget will allow. This will help to ensure that the
underlying server can adequately handle the OS and application workloads. To achieve optimal performance from both the server and the accompanying OS, corporations must ensure that the server hardware is robust enough to carry both the current and anticipated workloads for the lifecycle of both.
• Consider virtualization. When properly configured and deployed, a virtual data center can help a firm consolidate its resources and provide it with four times the computing power for the cost of a single server license.
• Strictly adhere to best practices. Thoroughly review and adhere to your vendors’ list of approved, compatible hardware, software and applications. Don’t let your administrators be cowboys or iconoclasts. Many network administrators who are highly proficient on a single software or hardware platform or product feel that it’s acceptable to bend the rules and turbocharge the product by overclocking the server, tinkering around in the server OS registry and rearchitecting the machine or the OS. This can lead to serious reliability problems.
• Get the appropriate training and recertification for your IT administrators and engineers. There
are no shortcuts or silver bullets and there is no substitute for a well-trained staff when it comes to achieving and maintaining maximum uptime and health of your network.
• Crack down on rogue developers. Software developers are in a class by themselves. By virtue of their
specialized knowledge and their close, quantifiable ties to the revenue stream, software developers will often flout the rules and conventions and ignore company policies and procedures. Do not allow this.
• Perform regular asset management testing. Corporations should schedule asset management reviews on a yearly, biannual or quarterly basis as necessary. Knowing what’s on your network will help your firm lower its TCO by remaining compliant with software licensing contracts. It will also keep the firm current on the various versions of software present on the network and enable the IT department to better plan the timing and extent of network upgrades. Asset management checks also ensure that companies are better equipped to meet their SLA requirements to maintain optimal performance and uptime.
• Keep your software updated with the latest necessary patches and upgrades. Research which
patches are crucial and should be installed immediately, and which ones your organization can do without. Construct and adhere to a regular schedule to apply patches, preferably on a monthly basis.
• Standardize the environment to the greatest possible extent. Yankee Group survey data indicates that standardization (following a prescribed configuration and version for the company’s hardware, software and network infrastructure components) can lower TCO by 15%. Standardization benefits all users, including organizations that have custom configurations.
• Customized server OS environments require expert administrators. Corporations considering a
custom Linux distribution are well advised to seek out and hire experts. Alternatively, businesses should employ the services of a respected systems integrator or outsourcer with the appropriate expertise.
• Regularly monitor server OS usage, reliability and security levels. Network administrators and VPs of IT should require network managers to compile weekly or monthly reports covering the duration and severity of server operating system downtime; a breakdown of the Tier 1, Tier 2 and Tier 3 outages; the time it took to recover from those failures; the number of work hours and administrators needed to bring the server back up; and the cost of the outage.
• Make an informed decision on whether to apply patches manually or automatically via Group Policy. Companies should also regularly review whether it is feasible for the firm to migrate away from manual patch management and when it makes sense to manually patch their server operating systems.
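The outage reports recommended above (downtime by tier, recovery time, work hours and cost) can be compiled with very little tooling. The following Python sketch is illustrative only and is not part of the Yankee Group survey methodology. The Outage record, its field names, the hourly cost figure and the assumption that labor hours equal downtime multiplied by the number of administrators are all hypothetical and should be adapted to each firm's own incident records.

```python
from collections import defaultdict
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


@dataclass
class Outage:
    # Hypothetical record layout; adapt to your own incident log.
    server: str
    tier: int              # 1, 2 or 3, as classified by the administrator
    downtime_hours: float  # time from failure to full recovery
    admins: int            # administrators needed to restore service
    hourly_cost: float     # assumed cost of one hour of downtime


def summarize(outages):
    """Total outages, downtime, labor hours and cost for each tier."""
    report = defaultdict(lambda: {"outages": 0, "downtime_hours": 0.0,
                                  "work_hours": 0.0, "cost": 0.0})
    for o in outages:
        row = report[o.tier]
        row["outages"] += 1
        row["downtime_hours"] += o.downtime_hours
        # Assumption: every administrator worked for the whole outage.
        row["work_hours"] += o.downtime_hours * o.admins
        row["cost"] += o.downtime_hours * o.hourly_cost
    return dict(report)


def availability(downtime_hours, servers=1):
    """Percentage uptime over one year across the given servers."""
    total = HOURS_PER_YEAR * servers
    return 100.0 * (total - downtime_hours) / total


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    log = [
        Outage("srv1", 1, 0.5, 1, 1000.0),
        Outage("srv2", 3, 4.0, 2, 1000.0),
    ]
    for tier, row in sorted(summarize(log).items()):
        print(f"Tier {tier}: {row}")
    total_down = sum(o.downtime_hours for o in log)
    print(f"Availability: {availability(total_down, servers=2):.4f}%")
```

Running the same summary weekly or monthly gives managers a consistent baseline for comparing outage severity and cost over time.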
IV. Further Reading
Yankee Group Link Research
Best Practices: Making a Business and Technology Plan for a SOA Migration, Report, August 2007
Microsoft, Sun Strike Back by Making OSs Low-Cost Alternatives to Linux, Report, January 2007
Counterfeit Software Is Counterproductive, Report, January 2007
Virtualization, Part 1: Technology Goes Mainstream, Nets Corporations Big TCO Gains and Fast ROI, Report, July 2006
Optimizing Processes Using SOA: Responsive and Efficient Supply Chains, Report, May 2006
Unix, Windows and Custom Linux Score Well on Yankee Group 2006 Global Server Reliability Survey, Report, March 2006
Virtualization, Multicore Hardware Technologies Spark Debate on Software License Price Models, Report, January 2006
SOA Business Values Capture the Attention of CIOs, Report, October 2005