Why Automate Data Center Operations?
White Paper
Zefflin Systems LLC
1. Introduction
2. What Processes Should I Target for Automation?
3. What Problems Can I Solve?
4. Today’s Software Tools
5. What Are My Peers Doing?
6. How Do I Integrate Data Center Automation into My Organization and Environment?
7. What Value Does Data Center Automation Bring to My Organization?
8. Summary
© 2015 Zefflin Systems All Rights Reserved
1. Introduction
As an IT leader, you are faced with supporting a growing business while budgets remain flat. At the same time you are expected to increase the speed, quality and reliability of service – all as technology constantly evolves and changes. Virtualization of the IT compute and storage infrastructure has
revolutionized operations, increased productivity and resource utilization, and brought new agility to IT. But virtualization has also created new challenges and problems. Even after making virtualization an integral part of operations, many IT organizations find themselves asking “What else can we improve
upon?”
The answer lies with the next logical stage in virtualization’s evolutionary path: automate IT operations functions to increase productivity. Regardless of the cloud architecture chosen (public, private, hybrid), automation of processes in the areas of Catalog/Request Management, Approvals,
Chargeback, Provisioning (not only OS, but storage, network, database and application), Governance and Compliance yields a significant ROI for today’s IT
organization.
Software tools that are used to automate IT processes have matured significantly in recent years. They now cost less to implement and are easier to integrate into existing infrastructure and tools. Superior integration capability means that previous investments in areas like virtualization can be preserved and a best-of-breed approach can be taken without forcing vendor lock-in. Open source software like OpenStack™ has put tremendous downward pricing pressure on traditional enterprise software. This means that the ROI of automating specific parts of IT Operations has changed in favor of the CIO, and what a short time ago might have been a significant financial commitment with high risk is now much less in both cost and risk. Automation is feasible, affordable and carries much lower risk than even one year ago.
2. What Processes Should I Target for Automation?
Data center processes can be both numerous and complex. Each process should be evaluated in terms of the cost of automation (i.e., implementation and maintenance) versus the labor and other cost savings gained. There is, however, a set of core processes that have a large impact on IT service speed, quality and repeatability, as outlined below.
Process: Request/Catalog Management
Benefit of Automation: Enable self-service requests for complex computing environments, resulting in better control over the standards that are used in both development and production.
ROI:
• Faster response time – no waiting for an administrator to analyze environments.
• Reduction in administration time spent on analysis of requests, demand management and capacity planning.

Process: Approvals
Benefit of Automation: Creates an audit trail of all requests and approvals. Brings transparency to the process, so requestors can see where approvals are stuck and how long they can expect them to take.
ROI:
• Reduced time for all stakeholders, because approvals have full visibility – no more wondering what the hold-up is.

Process: Chargeback/Show-back
Benefit of Automation: Enable cost accounting at a department level, which can be an improvement over public cloud providers by requiring less paperwork, like expense reports and manual chargeback.
ROI:
• Reduced administration and accounting.
• Beginning of management of real IT cost.

Process: Provisioning (Operating System, Storage, Database, Network, Application)
Benefit of Automation:
• Better control, reduction in human error
• More efficient use of storage and server capacity
• Standardized OS images
• Faster provisioning of complex computing environments
ROI:
• Increase in Administrator/Engineer productivity for Systems, Network, Storage and Database (DBA).

Process: Governance
Benefit of Automation: Better control over environments, resulting in reduced management cost.
ROI:
• Reduced administration time – no one is required to constantly monitor development environments to see if they are still being used.
• Increase in application development resource utilization – development environments are archived and retired per policy.
• Reduced infrastructure cost.

Process: Compliance
Benefit of Automation:
• Automatically flags out-of-compliance situations for key areas including PCI, internal security and ISO.
• Helps to identify previously unknown processes that result in non-compliance (such as application hotfixes).
• Provides flexibility as to how to deal with out-of-compliance situations (like opening a Service Management incident, routing to a person for correction, or automatically correcting, then notifying key personnel).
ROI:
• Dramatic increase in staff productivity.
• Reduces the number of out-of-compliance incidents.
• Reduces the resources required to maintain compliance.
• Improves security without hiring new staff.

Process: General Policy Automation
Benefit of Automation: This category features use of orchestration solutions to automate numerous repeatable IT operations tasks such as:
• Automating password reset policy
• Automating event remediation (e.g., app restart or server reboot)
• Workflow integration with existing systems
ROI:
• Improve staff productivity.
• Redeploy staff from operations to more strategic initiatives.

Step back and objectively look at which processes to automate.
3. What Problems Can I Solve?
There are many day-to-day activities and tasks that are performed by system administrators and IT support staff. The opportunities for automation are endless and the
following describes just a few.
Operating System Provisioning
Today most IT shops run various flavors of Windows and Linux. Manual configuration is often done after deployment in a virtualized or physical environment. This is time-consuming and prone to manual mistakes. With an automation approach, templates can be built for standard versions of each OS (with various configurations depending upon the purpose of the server). Those standard templates can then be presented in a catalog format so administrators can pick which OS versions they want to deploy. These same templates can also be used post-deployment to validate images against known standards. This is particularly effective when automating the audit process: each server is checked against an approved image template. If there are differences, remediation can also be automated, either by automatically deploying changes to return the server to compliance or by notifying compliance personnel to investigate further. Manual intervention in this process can be advisable, especially when it is first automated, to ensure operational continuity and avoid any undesired rollbacks. Once OS templates are built and the deployment process is automated, the IT infrastructure is better controlled, standards are more easily enforced and compliance is improved – all while increasing productivity of existing staff.
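The post-deployment check described above can be pictured as a comparison between an approved template and a server's observed settings. The sketch below is illustrative only and does not represent any particular tool's API; the template fields, the sample values and the names `find_drift` and `plan_remediation` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical approved template: setting name -> expected value.
APPROVED_TEMPLATE: Dict[str, str] = {
    "os_version": "linux-7.1",
    "ssh_root_login": "disabled",
    "ntp_server": "ntp.example.internal",
}


def find_drift(template: Dict[str, str],
               observed: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (setting, expected, actual) for every setting that differs."""
    drift = []
    for setting, expected in sorted(template.items()):
        actual = observed.get(setting, "<missing>")
        if actual != expected:
            drift.append((setting, expected, actual))
    return drift


def plan_remediation(drift: List[Tuple[str, str, str]],
                     auto_correct: bool) -> List[str]:
    """Turn drift into actions: correct automatically or notify a person."""
    verb = "correct" if auto_correct else "notify compliance about"
    return [f"{verb} {setting}: expected {expected}, found {actual}"
            for setting, expected, actual in drift]


# A server whose observed settings deviate from the template in two places.
server = {"os_version": "linux-7.1", "ssh_root_login": "enabled"}
drift = find_drift(APPROVED_TEMPLATE, server)
actions = plan_remediation(drift, auto_correct=False)
for line in actions:
    print(line)
```

Starting with `auto_correct=False` mirrors the advice to keep a person in the loop when the process is first automated; the flag can be switched once the checks have proven reliable.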
DevOps and Automation
DevOps encompasses the process involved in moving from code through build, testing, release, and production rollout. Traditionally, application development organizations have not communicated or coordinated with operations and/or support teams. As a result, bug tracking and feedback on production application releases/upgrades were often reactive and unstructured. For example, the helpdesk may have been surprised by a flood of calls resulting from a new release of which they were not aware.
Development and Operations historically didn't get along.
DevOps as a discipline has improved the situation. Like any process, once the workflow of code → build → test → release → production is well defined, it can be automated. Automation should not only strive to reduce manual effort and improve speed and quality; it should facilitate coordination and communication between development and operations. A well-automated process can bring tremendous advantages in agility and competitiveness to companies who invest in it.
The build process, consisting of code merge, compiling and packaging, is commonly automated with development management tools. Once the build is complete, testing can be automated via integration with orchestration, functional, and performance testing tools. Performance testing is particularly important for large user applications. It is not uncommon for multiple tests to be run to check many different infrastructure scenarios. Using orchestration, it is possible to queue up scenarios automatically, provision the environments, run the tests, record the results, de-provision the environment and provision the next environment, and so on until all tests are completed and passed. This accelerates the testing process substantially. Additionally, the results can be forwarded in report form automatically.
Using orchestration, the next phase of the process, migration to production, is completed. To facilitate communication and remain ITIL compliant, the orchestration can open/close a change ticket, recording the release and all cases resolved with the new release (for the support organization to communicate that to the end users) and notify the help desk (so they can prepare for a potential increase in support calls).
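The queue-provision-test-de-provision loop described above can be sketched in a few lines. This is a simplified illustration, not the behavior of any specific orchestration product; `run_scenarios`, the scenario names and the stand-in `provision`, `run_test` and `deprovision` callables are all hypothetical placeholders for integrations with real tools.

```python
from collections import deque
from typing import Callable, Dict, List


def run_scenarios(
    scenarios: List[str],
    provision: Callable[[str], str],
    run_test: Callable[[str], bool],
    deprovision: Callable[[str], None],
) -> List[Dict[str, object]]:
    """Work through queued scenarios one environment at a time."""
    queue = deque(scenarios)
    results: List[Dict[str, object]] = []
    while queue:
        scenario = queue.popleft()
        env = provision(scenario)
        try:
            passed = run_test(env)
        finally:
            # Always release the environment, even if the test raises.
            deprovision(env)
        results.append({"scenario": scenario, "passed": passed})
    return results


# Stand-ins for real provisioning and test tooling (assumptions).
released: List[str] = []
results = run_scenarios(
    ["2cpu-4gb", "8cpu-32gb"],
    provision=lambda scenario: f"env-{scenario}",
    run_test=lambda env: env != "env-8cpu-32gb",
    deprovision=released.append,
)
for row in results:
    print(row["scenario"], "passed" if row["passed"] else "failed")
```

The recorded `results` list is what a real workflow would forward as the automatic report.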
Automated Problem Remediation
In complex IT environments, there are many day-to-day operational tasks that represent workarounds or temporary fixes. These tasks are done on a regular basis and take up significant administrator time. Many are not even tracked – IT support staff just complete them on an individual,
isolated basis. This means that:
a. It takes an unknown amount of resources;
b. There is no way to measure the impact to IT or to the business; and
c. The helpdesk never knows about it.
Automation not only frees up staff; it also offers a way to record, track and measure these kinds of tasks, including how often they happen, the degree of interruption of service, and how long it takes to correct them. Consider the example of an application with a memory leak. Periodically, the server runs out of memory and crashes, interrupting service. The vendor promises the issue is “fixed in the next release”, but until then the service has to be restarted when it comes close to depleting system memory. In an automation framework, an orchestration tool is integrated with a monitoring tool, and when system memory gets low, it automatically calls a script to restart the service, timing the whole process. When the service comes back online, the orchestration tool opens a service desk ticket and closes it, recording the downtime and the impact, satisfying ITIL audit requirements and keeping the helpdesk in the loop. As a result, service impact and human intervention are minimized until the next vendor patch release. Implementation of this kind of workflow is very low cost given the functionality of today’s orchestration tools.
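The memory-leak workaround above follows a simple pattern: check a threshold, restart, time the outage, and record a ticket. A minimal sketch of that pattern follows, assuming the monitoring tool supplies the free-memory reading and the service desk is represented by a plain list; `Ticket`, `remediate_low_memory` and the threshold values are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Ticket:
    summary: str
    downtime_seconds: float
    status: str = "closed"  # opened and closed in one step, as in the text


def remediate_low_memory(
    free_mb: float,
    threshold_mb: float,
    restart_service: Callable[[], None],
    tickets: List[Ticket],
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Ticket]:
    """Restart the service when memory is low and record the downtime."""
    if free_mb >= threshold_mb:
        return None  # nothing to do
    started = clock()
    restart_service()
    downtime = clock() - started
    ticket = Ticket(
        summary=f"Automated restart: free memory {free_mb} MB below {threshold_mb} MB",
        downtime_seconds=downtime,
    )
    tickets.append(ticket)
    return ticket


# Demo with a fake clock and a stand-in restart script (assumptions).
ticks = iter([100.0, 112.5])
restarts: List[str] = []
tickets: List[Ticket] = []
ticket = remediate_low_memory(
    free_mb=180,
    threshold_mb=256,
    restart_service=lambda: restarts.append("app"),
    tickets=tickets,
    clock=lambda: next(ticks),
)
print(ticket)
```

Because every restart leaves a ticket behind, the frequency and duration of the workaround become measurable rather than invisible.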
Manual work-arounds are impossible to track and measure without automation.
Free up valuable staff from repeatable tasks.
Virtual Sprawl
Virtual sprawl is a common challenge. Virtualization has made it easy to deploy computing environments for development, testing, or production. This has been good for IT, providing flexibility and the ability to deploy servers quickly. It also creates new problems in keeping track of all those virtual environments. Monitoring tools can help, but the management of any development, test or production computing environment is manual and resource intensive. If an administrator needs to tear down an environment to free up resources, they have to find the owner, check whether they still need it, and then back up any data or applications that need to be preserved. With automation, this can be achieved by implementing policies up front, combined with use of orchestration and other automation tools to support the process. In an automated environment, policies can be built into the request process (which is easier to do if it is catalog-based). For example, when an individual requests a new computing environment to be deployed (server, storage, network, database, application), they would pick a time limit (e.g., 30 days, 90 days, or indefinite). Once the system is live, orchestration tools monitor the environment (via integration with the virtualization platform, OS and network monitoring tools, and system logs) for key policy parameters such as:
a) last user to log in
b) network, CPU, memory, or IO activity
c) uptime
d) log activity
For time-based policies, it is more straightforward. When a computing environment is past its time limit, orchestration runs shutdown scripts, initiates backup of data and application (via integration with those tools), and opens/closes a change request, documenting the event. Where there is no time limit, a governance policy is enforced. For example, a policy may be that if there has been no network traffic to an application, no one has logged in for 30 days and there has been limited log activity in the last 10 days, the orchestration automatically notifies the owner, opens a change ticket, shuts the environment down, takes a snapshot of it, backs up the data, and closes the change ticket. This can all be done via an orchestration tool integrated into monitoring, backup/archive and change management systems, with little to no human intervention to enforce the policy.
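The two policy types above (time-limited and activity-based) reduce to a small decision rule. The sketch below is one possible reading, not a vendor implementation: the `Environment` fields, the `should_retire` name and the `log_threshold` value are hypothetical, and it assumes the "no network traffic" test uses the same 30-day window as the login test, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Environment:
    name: str
    created: date
    time_limit_days: Optional[int]  # None means "indefinite"
    last_login: date
    last_network_traffic: date
    log_entries_last_10_days: int


def should_retire(env: Environment, today: date,
                  idle_days: int = 30, log_threshold: int = 5) -> bool:
    """Apply the time-limit policy, or the governance policy if indefinite."""
    if env.time_limit_days is not None:
        return today > env.created + timedelta(days=env.time_limit_days)
    cutoff = today - timedelta(days=idle_days)
    return (env.last_login < cutoff
            and env.last_network_traffic < cutoff
            and env.log_entries_last_10_days <= log_threshold)


# Order of actions taken from the example policy in the text.
RETIREMENT_STEPS = [
    "notify owner", "open change ticket", "shut down",
    "snapshot environment", "back up data", "close change ticket",
]

today = date(2015, 6, 1)
expired = Environment("qa-7", date(2015, 4, 1), 30,
                      date(2015, 5, 30), date(2015, 5, 30), 40)
active = Environment("dev-2", date(2015, 1, 5), None,
                     date(2015, 5, 25), date(2015, 5, 31), 120)
idle = Environment("poc-9", date(2015, 1, 5), None,
                   date(2015, 4, 1), date(2015, 4, 15), 2)

for env in (expired, active, idle):
    if should_retire(env, today):
        print(env.name, "->", ", ".join(RETIREMENT_STEPS))
```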
Password Reset
Most companies have a specific password reset policy when it comes to root and Administrator access on virtual and physical servers. A typical policy might dictate that passwords are changed every 90 days (or immediately if an employee with access leaves the company). This typically involves an administrator with privileged access going to each server, manually logging in, setting the new password and manually recording it in a secure location. Often conventions are used for development, staging and production servers (e.g., password_dev, password_test, password_prod). Manual errors can be made, resulting in additional administrator time to correct. As the number of managed servers increases, this process becomes more time-consuming and prone to error. Now consider an automated solution using an orchestration tool. With a small amount of effort, a workflow can be built to initiate the process for all servers using a seed string that will automatically generate the passwords, add “_dev”, “_test” or “_prod” for the appropriate servers, and record all passwords in a secure location for authorized access. With automation, you have an efficient, secure solution that largely eliminates human error. At 2-3 minutes of manual effort per server, a shop with 1000 servers spends up to 50 man-hours each time passwords are reset. Automating this process would yield a compelling ROI.
Virtual environments are easier to manage with automation.
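As a rough illustration of the seed-based workflow and the effort estimate above, the sketch below derives one password per server from a seed and appends the environment suffix. The derivation method (HMAC-SHA256 of the hostname, keyed by the seed) is an assumption chosen for the example, not something the text prescribes; `derive_password`, `rotate_all`, the host names and the in-memory dictionary standing in for a secure store are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Dict

SUFFIXES = {"dev": "_dev", "test": "_test", "prod": "_prod"}


def derive_password(seed: bytes, hostname: str, environment: str,
                    length: int = 16) -> str:
    """Derive a per-server password from the seed, then add the suffix."""
    digest = hmac.new(seed, hostname.encode("utf-8"), hashlib.sha256).digest()
    body = base64.urlsafe_b64encode(digest).decode("ascii")[:length]
    return body + SUFFIXES[environment]


def rotate_all(seed: bytes, servers: Dict[str, str]) -> Dict[str, str]:
    """Return hostname -> new password; a real workflow would write to a vault."""
    return {host: derive_password(seed, host, env)
            for host, env in servers.items()}


servers = {"web01": "prod", "build02": "dev"}  # hypothetical inventory
vault = rotate_all(b"rotation-seed-example", servers)
for host, password in vault.items():
    print(host, "password length:", len(password))

# Manual effort from the text: 1000 servers at 2-3 minutes each.
manual_hours = [1000 * minutes / 60 for minutes in (2, 3)]
print("manual effort: %.0f to %.0f hours" % tuple(manual_hours))
```

Anyone holding the seed can regenerate every password, so in this scheme the seed needs the same protection as the passwords themselves.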
4. Today’s Software Tools
There are many software tools on the market today, with new ones emerging regularly. The list below is not a comprehensive one, but shows some of the industry
leaders. Tools vary greatly in maturity, cost and scalability. The key is to select the right tools for your organization that will minimize cost, risk and resource investment, while enabling your organization to grow the solution as your company grows.
New automation software comes to market faster than ever, with seemingly endless choices.
Software: OpenStack™
Description: There are many different OpenStack distributions. All use the core OpenStack code, then add on their own IP, including utilities, architecture and APIs. The architectures and engineering approaches are different, resulting in significant differences between distributions, from installation to scalability to user interface. Major distributions include:
• Hewlett Packard (Helion™)
• Mirantis®
• Piston Cloud®
• Red Hat® (RDO™)
• Canonical®
• cloudscaling®
• MORPHlabs®

Software: Scalr™
Description: Cloud Management Platform, with out-of-box functionality designed to automate the entire lifecycle of complex computing environments: service catalog, self-service, cloud environment management, governance, compliance and analytics.

Software: Red Hat™ CloudForms
Description: Cloud Management Platform, with out-of-box functionality designed to automate the entire lifecycle of complex computing environments: service catalog, self-service, cloud environment management, governance, compliance and analytics.

Software: Puppet™
Description: Automation and orchestration tool, designed for DevOps and configuration management processes.

Software: Chef™
Description: Automation and orchestration tool, designed for DevOps processes.

Software: RightScale™
Description: Cloud Management Platform, with out-of-box functionality designed to automate the entire lifecycle of complex computing environments: service catalog, self-service, cloud environment management, governance, compliance and analytics.

Software: Hewlett Packard
Description: Cloud System Automation Suite™: server provisioning, automated change detection (for audit) and orchestration.
• HP Server Automation™ – Server provisioning, application provisioning, patching, configuration, compliance and governance
• HP Cloud System Automation™ – Catalog, server and storage provisioning
• HP Operations Orchestrator™ – Orchestration
• HP Network Automation™ – Network provisioning and configuration

Software: VMware® vCloud™ Suite
Description: Includes the vRealize™ suite, also known as:
• vCloud Orchestrator™ (vCO), orchestration tool
• vCloud Automation Center™ (vCAC), for automation of server provisioning and compliance audits
• vCenter Operations™ (vCOPS), for monitoring of systems

Software: IBM®
Description:
• IBM Cloud Orchestrator™
• IBM Cloud Manager™ – Server provisioning and virtual environment deployment, approvals, chargeback

Software: Cisco®
Description: This includes the service catalog, NewScale™, acquired by Cisco in 2011.

Software: BMC®
Description:
• Blade Logic Server Automation™ – Server provisioning
• Blade Logic Database Automation™ – Database provisioning and operations
• Blade Logic Network Automation™ – Network provisioning and configuration
• Blade Logic Middleware Automation™ – Deploys, configures, and troubleshoots Java EE applications
• Cloud Lifecycle Management™ – Service catalog, server provisioning, governance and compliance
• Atrium Orchestrator™ – Orchestration

Software: CA®
Description: CA Automation Suite™
• CA Server Automation™ – Server provisioning, application provisioning, patching and OS configuration
• CA Process Automation™ – Orchestration
5. What Are My Peers Doing?
IT organizations are now realizing that virtualizing the compute and storage environments is just the starting point, and in order to continue to reduce the cost of computing, further investment in automation is necessary. As a result, they are now starting to automate operational processes
surrounding the request, approval, provisioning, monitoring, maintenance, compliance and governance processes of their complex computing environments. The important thing to remember is that
virtualization lays the groundwork for automation. Virtualization is not, by itself, automation. Also, having a public, private or hybrid cloud
environment does not eliminate the need for automation. It is relevant and necessary no matter what the architecture, because the business processes around requesting, configuring, chargeback, provisioning,
governance and compliance are just as relevant if your applications are running on a public, private or hybrid cloud. In fact, it should not matter where your computing resources are running. A properly implemented automation framework will serve as an abstraction layer
between the users and the computing infrastructure.
Today’s progressive and forward-thinking IT organizations are well past virtualization and templating of OS images. They are investing in the next round of productivity increases because they have to if they want to stay relevant and keep their company competitive. Their companies are growing and their IT budget as a percentage of company revenue is shrinking. If they don’t automate, streamline and enable their administrators to do more (much more) with less, they know IT will eventually be the organization that inhibits company growth. No CIO wants to be the subject of an analyst call. Medium to large
organizations are implementing full private cloud environments, from fully defined service catalogs to automated provisioning, compliance audits and policy-based governance of most computing
environments, especially application development environments. In addition, they are looking at every opportunity to automate all processes in their environment, including, but not limited to, those
discussed in this white paper.
Forward thinking IT organizations are starting to reap significant ROI from automation.
6. How Do I Integrate Data Center Automation into My Organization and Environment?
An automation strategy and plan go hand-in-hand with a cloud strategy. A cloud strategy and architecture, whether private, public or hybrid, is an essential first step, but is only part of the answer. Once you can easily and adaptably deploy OS and storage in a virtual environment, you should think about how to automate the processes around that cloud environment. These processes include service catalog, approvals, chargeback, application and database provisioning, governance and compliance.
The following steps are essential in adopting an automation strategy.
1. Cloud strategy, architecture and roadmap. It is important to understand what you will be working with before considering automation. For example, choosing AWS as your primary platform provider may affect the choice of automation tools (like orchestration or server provisioning) and processes (like application provisioning or compliance).
2. Step back and look at all manual processes. It is important to objectively look at any manual processes. It is equally essential to look at each process from an ROI perspective: How much do I have to invest in automating this process? How much do I have to invest in maintaining it? And how much labor can I save as a result? Caution: pride of ownership and turf protection can influence the outcome of this review – it must be strictly objective. Some processes may have to be adjusted or re-engineered, which adds to the cost. Examples of simple processes to automate would be server root password reset or event remediation. More complex processes might include application provisioning and configuration.
3. Develop a short, medium and long term strategy and objectives, with ROI expectations for each stage. This will help prioritize and set expectations. Often it is good to start with short, quick-win automation projects to prove the success and generate internal momentum for further investment in automation. This planning should be done with a firm understanding of what is possible, feasible and risk appropriate.
4. Identify software tools. Today there is an overwhelming number and variety of software tools, from open source to startups and well-established enterprise software companies, that purport to automate data center processes of all kinds. New tools appear on a weekly basis. It is important to filter out the noise, cut through the hype and find out what will work for your organization at a reasonable cost. It is also crucial to determine if you already own some of the software that can be used, which will dramatically cut cost. For example, if your company has a EULA with an enterprise software company in place, you may have access to some tools already under the terms of that EULA. A solid orchestration tool is essential, as orchestration is the centerpiece of automating data center processes. It should be flexible, allow custom workflows to be developed without extensive training and have a large library of plug-ins or APIs that can be used to integrate with your existing applications such as service desk, change management or DevOps tools.
5. Take a baseline for future comparison. A baseline is essential in order to measure progress and success of future automation efforts. A baseline should encompass metrics for cost and speed of service and can include measurements like:
a. Average number of admins per server
b. Server utilization (not just systems deployed, but those that are used)
c. Average time to deploy:
i. A development environment
ii. Production servers and applications
d. Average compliance rate:
i. Security
ii. PCI
iii. Internal standards
e. Cost of ensuring compliance, including manual effort
A structured, incremental approach makes automation manageable. Measure success.
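The three ROI questions in step 2 (cost to automate, cost to maintain, labor saved) can be turned into a simple comparison across candidate processes. The sketch below is a back-of-the-envelope model under stated assumptions: the `ProcessEstimate` class, its field names and every dollar figure are hypothetical, and costs are treated as flat annual amounts with no discounting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessEstimate:
    name: str
    implementation_cost: float      # one-time cost to automate
    annual_maintenance_cost: float  # ongoing cost to keep it running
    annual_labor_savings: float     # labor no longer spent manually

    def net_benefit(self, years: int) -> float:
        """Savings minus all costs over the given number of years."""
        annual_net = self.annual_labor_savings - self.annual_maintenance_cost
        return years * annual_net - self.implementation_cost

    def payback_years(self) -> Optional[float]:
        """Years until the one-time cost is recovered, or None if never."""
        annual_net = self.annual_labor_savings - self.annual_maintenance_cost
        if annual_net <= 0:
            return None
        return self.implementation_cost / annual_net


# Hypothetical figures for a simple and a more complex process.
candidates = [
    ProcessEstimate("application provisioning", 60000, 10000, 30000),
    ProcessEstimate("root password reset", 5000, 1000, 6000),
]

# Quick wins first: shortest payback at the top.
ranked = sorted(candidates, key=lambda p: p.payback_years() or float("inf"))
for p in ranked:
    print(p.name, "payback:", p.payback_years(),
          "3-year net:", p.net_benefit(3))
```

Running the same calculation again after implementation, with measured values from the baseline in step 5, shows whether the expected return actually materialized.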
7. What Value Does Data Center Automation Bring to My Organization?
Automation can bring tremendous value if implemented well. Improvements in agility, speed, control, cost and end user satisfaction are all attainable, clearly demonstrating IT’s value as a budget-focused partner to the rest of the business.
IT is a competitive weapon, as demonstrated through:
• End user satisfaction. End users, who historically waited days or weeks for a new server, are now ecstatic at consistent wait times measured in hours. It is also a win for those users who were circumventing IT and are used to providing a credit card number to public cloud providers and getting instant infrastructure. Previously they had to go around IT and fill out an expense report. Now they can go to the IT portal and get the same service while avoiding the expense report or other record keeping. This reduces friction in getting their jobs done.
• Speed of delivery. With automation and cloud computing, it is common to be able to take a
request, route it for approval, calculate chargeback and provision a complex environment (i.e., server/OS, storage, database, network and application) in minutes or hours, rather than days or weeks. The ability to do that predictably, reliably and in a repeatable way, has tremendous impact on the business and agility of the entire company.
• Quality of service. When any IT process is standardized and automated, results become predictable and repeatable, which raises the quality of services. With provisioning, this means that business users can count on getting a computing environment up and running with a predictable turnaround time so they can plan their projects more effectively and obtain better business outcomes faster. With compliance automated, out-of-compliance situations are flagged much more frequently and reliably, increasing the rate of compliance. When governance policies are automated, the computing environment lifecycle is better controlled and resources are more efficiently utilized.
• Cost of service. Cost of delivering IT services drops significantly after automation. When fewer administrators are required to achieve a higher throughput and resource utilization increases, costs will go down dramatically.
• Agility. Business agility is increased because users know that they can get turnaround on successful
deployment of complex computing environments within hours. They can plan their business deliverables around this, which enables the company to react to market changes with more agility and urgency, potentially before the competition.
• Dramatic increase in productivity. It is not uncommon for organizations to go from one admin for 50 servers to one for 300 when adopting a full automation strategy. This often involves rearranging skill sets; some resources are diverted to maintaining automation tools and functions while others focus on developing new ones. Still, the resource investment is much smaller than the labor savings gained by automating.
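To make the admin-to-server ratios in the last bullet concrete, the short calculation below applies them to an estate of 3000 servers. The estate size is a hypothetical figure chosen for illustration, and the helper name `admins_needed` is not from the text; only the 1:50 and 1:300 ratios come from the paragraph above.

```python
import math


def admins_needed(servers: int, servers_per_admin: int) -> int:
    """Round up, since a fraction of an administrator cannot be staffed."""
    return math.ceil(servers / servers_per_admin)


servers = 3000  # hypothetical estate size
before = admins_needed(servers, 50)    # one admin per 50 servers
after = admins_needed(servers, 300)    # one admin per 300 servers
print(f"before: {before}, after: {after}, freed up: {before - after}")
```

Some of the freed-up capacity is absorbed by maintaining and extending the automation itself, as the bullet notes, so the difference is an upper bound on redeployable staff rather than a net saving.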
8. Summary
Data center automation is not just an option anymore. You, as an IT leader, must continually provide value at a lower cost. In order for your IT organization to continue supporting a growing business, remain relevant and prepare for the future, automation has to be an essential part of the strategy. We have outlined some of the possible approaches, challenges, benefits, risks and returns in this white paper. Every IT organization is different and should develop an automation strategy and plan in line with the objectives, resources and constraints of its particular business.
9. About Zefflin
Zefflin’s focus is exclusively on Data Center Automation and Cloud Management solutions
implementation and integration. As a world-class, agile, center of excellence, our aim is to work with best of breed software, combined with the industry's best technical consulting and integration talent. We cut through the hype, identifying which tools can be implemented and integrated to effectively automate application development and IT operations. We offer high quality, cost effective solutions addressing the automation of the entire lifecycle of complex computing environments, from
request/catalog management, automated provisioning (OS, application, database, storage, network), to policy governance and compliance. Our vision is to bring to market consulting/software solutions that enable the lights-out data center. This will allow our customers to implement fully automated, private, public and hybrid cloud systems, delivering low cost, high quality services to their customers while minimizing personnel cost.