
Data Centre Efficiency Management Concurrent Thinking Appliances

A Technical Overview

Product Version v4.2.0 (May 2011)

 


Table of Contents

1 Introduction

2 The concurrentCOMMAND™ Appliance

2.1 Metric definition and monitoring

2.2 Power Metrics

2.3 Customizable Dashboard

2.4 Interactive Rack View

2.5 Dynamic Graphs

2.6 Threshold Based Alerts

2.7 Events and Change Management

2.8 Customizable Groups

2.9 Scheduled Tasks

2.10 System Management

2.11 Image Capture

2.12 Image Deployment

2.13 Appliance Accessibility

2.14 Hardware Inventory

3 The concurrentCONTROL™ Appliance

3.1 Scale-out OoB monitoring and management

3.2 Sensors


1 Introduction

Concurrent Thinking’s concurrentCOMMAND™ and concurrentCONTROL™ products provide a powerful yet easy-to-use solution for Data Centre Efficiency Management (DCEM). They offer capabilities for managing both physical infrastructure and IT systems within the data centre, including power monitoring and control, support for environmental sensors, out-of-band server monitoring, OS deployment and performance monitoring.

As data centres grow in size and complexity, keeping track of the status, power consumption and performance of all IT components requires increasingly sophisticated management techniques. The Concurrent Thinking family of appliances provides an intuitive point-and-click interface to the managed environment, allowing administrators to monitor and control their whole data centre simply and easily. concurrentCOMMAND is a 1U rack appliance that delivers a web-based GUI for configuring, monitoring and managing systems from any internet-capable device.

With support for up to 16 wired sensors and two Ethernet networks within a discrete ‘0U’ form-factor, concurrentCONTROL can be thought of as a repeater device which provides the necessary infrastructure to deal with large-scale system deployments.

The two appliances can easily be added to any existing data centre infrastructure using a number of different Ethernet network topologies. A typical topology is illustrated below: in this case concurrentCOMMAND is connected to both the external/office/corporate network and to the data centre management network, whereas concurrentCONTROL appliances are connected to the management network and a dedicated LOM/IPMI/BMC network. Other similar network topologies are also supported.

2 The concurrentCOMMAND™ Appliance

2.1 Metric definition and monitoring

concurrentCOMMAND and concurrentCONTROL can monitor and record information (or ‘metrics’) from numerous sources, including wired environmental sensors, intelligent PDUs, servers, operating systems and virtual machines.

concurrentCOMMAND uses round-robin databases (RRDs) to store the information that it collects. This technique is lossy: as time passes, the oldest data is consolidated to a coarser granularity. In return, large amounts of temporal data can be stored without dramatic increases in the required storage space, a problem typical of many monitoring systems.
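The trade-off described above can be illustrated with a minimal sketch of the general round-robin idea. It is not the appliance's storage format: the class name, the averaging rule and the archive sizes are all assumptions chosen for illustration.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size ring of consolidated values; the oldest rows are overwritten."""

    def __init__(self, steps_per_row, rows):
        self.steps_per_row = steps_per_row  # raw samples averaged into one row
        self.rows = deque(maxlen=rows)      # bounded storage, never grows
        self._pending = []

    def update(self, value):
        self._pending.append(value)
        if len(self._pending) == self.steps_per_row:
            # Consolidate the pending samples into a single averaged row.
            self.rows.append(sum(self._pending) / len(self._pending))
            self._pending = []


def record(archives, value):
    """Feed one raw sample to every archive (fine and coarse resolutions)."""
    for archive in archives:
        archive.update(value)


# Hypothetical layout: 4 rows at full resolution, plus 3 rows that each
# average 2 samples (coarser, but covering a longer period).
fine = RoundRobinArchive(steps_per_row=1, rows=4)
coarse = RoundRobinArchive(steps_per_row=2, rows=3)
for sample in [10, 20, 30, 40, 50, 60, 70, 80]:
    record([fine, coarse], sample)
```

After eight samples the fine archive holds only the last four values, while the coarse archive still covers the last six samples as three averages. Storage stays constant however long monitoring runs.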

Monitoring is non-invasive and can be performed ‘Out-of-Band’ using standard protocols over Ethernet: in particular, the Intelligent Platform Management Interface (IPMI) can be used to monitor server health, while SNMP can be configured within the Linux and Windows operating systems to provide information on server utilisation or error status. However, it is also possible to monitor operating systems, virtual machines and even applications by installing a special ‘daemon’ application (provided and supported by Concurrent Thinking) within the operating system. This application is compatible with the open source ‘Ganglia’ framework¹, so users can write their own scripts to define new system metrics via the Ganglia command line facility (‘gmetric’); these metrics then appear automatically in the concurrentCOMMAND framework.
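As a hedged illustration of a user-defined metric script, the sketch below wraps the Ganglia `gmetric` command using its standard `--name`, `--value`, `--type` and `--units` options. The helper names and the example metric are hypothetical, and `publish` only works on a host where the Ganglia tools are installed.

```python
import subprocess


def gmetric_command(name, value, metric_type="float", units=""):
    """Build the argument list for a gmetric call."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


def publish(name, value, metric_type="float", units=""):
    """Run gmetric; requires the Ganglia tools to be installed on the host."""
    subprocess.run(gmetric_command(name, value, metric_type, units), check=True)


# Hypothetical metric: the number of jobs waiting in an application queue.
example = gmetric_command("queue_depth", 12, "uint32", "jobs")
```

A script like this would typically be run at regular intervals (for example from cron) so that the metric is refreshed.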

Any combination of metrics that have been monitored can also be combined through a user-defined formula to define ‘derived metrics’. Derived metrics are very useful for providing indicators of such things as power efficiency, utilisation levels or fault indicators. For example, a powerful indicator of IT usage effectiveness may be CPU utilisation / watt of power consumed, as this could allow an end-user to identify which server resources are under-utilised relative to others. Once derived metrics have been defined, they can be monitored using the same tools that are available for standard metrics.
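The CPU utilisation per watt example can be sketched as follows. The server names and readings are invented for illustration, and the handling of a zero power reading is an assumption rather than documented appliance behaviour.

```python
def cpu_utilisation_per_watt(cpu_percent, power_watts):
    """Derived metric: CPU utilisation (%) divided by power draw (W)."""
    if power_watts <= 0:
        return None  # no usable power reading, so the metric is undefined
    return cpu_percent / power_watts


# Hypothetical readings: (CPU utilisation %, power in watts) per server.
servers = {
    "node01": (80.0, 200.0),
    "node02": (5.0, 150.0),
    "node03": (40.0, 0.0),
}

derived = {
    name: cpu_utilisation_per_watt(cpu, watts)
    for name, (cpu, watts) in servers.items()
}

# Least effective servers first; servers without a valid value are skipped.
ranked = sorted(
    (name for name, value in derived.items() if value is not None),
    key=lambda name: derived[name],
)
```

Here `node02` ranks first because it draws substantial power while doing very little work, which is the kind of under-utilised resource the derived metric is meant to expose.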

Stored metrics can be viewed as discrete values, using real time visualizations (see section 2.4), as historical graphs (see section 2.5), or across groups of servers (see section 2.8). They can also be used to trigger events (see section 2.7) and alerts (see section 2.6).

      

1 http://ganglia.sourceforge.net/ 


2.2 Power Metrics

Power metrics, which can be collected from numerous sources, are brought together in the GUI to provide information on the power utilisation of servers or other rack-based equipment within the data centre. concurrentCOMMAND uses the inventory database (see section 2.14) to associate power that is monitored over IPMI, from PDUs with monitored outlets, or from split-core current sensors, with individual servers or groups of servers.

A power cost schedule allows the user to specify the periods for which different energy tariffs should be used, as well as values for the “Effective Grid Carbon Intensity” (kgCO2/kWh), as determined by the power provider. These settings are then used with total server-based power metrics to calculate energy consumption (kWh), CO2 emissions (kgCO2) and cost (in a pre-defined currency).
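The arithmetic behind such a schedule can be sketched as below. This is an illustration under stated assumptions, not the appliance's implementation: the `TariffPeriod` structure, the tariff prices and the carbon intensity figures are all hypothetical, and power is assumed to be available as hourly averages.

```python
from dataclasses import dataclass


@dataclass
class TariffPeriod:
    start_hour: int        # inclusive, 0-23
    end_hour: int          # exclusive, 1-24
    price_per_kwh: float   # in the configured currency
    kg_co2_per_kwh: float  # effective grid carbon intensity


def find_period(schedule, hour):
    """Return the tariff period that covers the given hour of the day."""
    for period in schedule:
        if period.start_hour <= hour < period.end_hour:
            return period
    raise ValueError(f"no tariff covers hour {hour}")


def summarise(schedule, hourly_watts):
    """hourly_watts: (hour_of_day, average_watts) pairs, one per hour."""
    energy = cost = co2 = 0.0
    for hour, watts in hourly_watts:
        kwh = watts / 1000.0  # average watts over one hour gives kWh
        period = find_period(schedule, hour)
        energy += kwh
        cost += kwh * period.price_per_kwh
        co2 += kwh * period.kg_co2_per_kwh
    return energy, cost, co2


# Hypothetical schedule: a cheaper night tariff and a day tariff.
schedule = [TariffPeriod(0, 7, 0.08, 0.4), TariffPeriod(7, 24, 0.16, 0.5)]
energy, cost, co2 = summarise(schedule, [(6, 2000.0), (7, 2000.0)])
```

The two hours in this example use the same energy but fall in different tariff periods, so they contribute different costs and emissions to the totals.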

2.3 Customizable Dashboard

The product comes with a fully customizable dashboard, or overview screen, whose settings can be easily altered. The power dial can display instantaneous power, cost and CO2 emissions for a pre-defined group of servers. Full appliance status is also shown, including service state, fault status and the status of pending upgrades which have been automatically downloaded and are ready to be installed.

2.4 Interactive Rack View

The interactive rack view allows the user to visualize the instantaneous values of any server-based metric. The rack view consists of a ‘logical view’ and a ‘physical view’, the latter providing a realistic representation of the individual racks, servers and auxiliary devices within the data centre. The logical view allows metric values to be displayed in a traditional bar chart (ordered by rack position or the metric value itself) while the physical view overlays these values graphically on top of the associated server (through a colour map and/or the use of coloured bars). Various filters are available to show only those servers and associated metrics whose values lie within certain limits. Zooming and picking capabilities allow the user to focus on individual racks, blade chassis, servers or auxiliary devices and only graph values associated with these. There is also an option to create groups of these filtered servers and subsequently carry out actions on the group e.g. reboot or script execution (see section 2.8).

The rack view is a very powerful tool for quickly identifying various inefficiencies within the data centre. In particular, it can be used to show excessive power consumption or power imbalance, to identify misused, unused, or under-used IT equipment (particularly through the use of derived metrics - see section 2.1), or to identify and correct environmental problems.

Once a suspected problem or inefficient server has been identified, hovering the mouse over the associated server image displays a tool tip with high-level information; a simple click-through process can then be used to view low-level details, or to connect directly to the server and conduct a detailed investigation.

A number of new features, including new physical views, are planned to augment the rack view capability in the near future.

2.5 Dynamic Graphs

Historical values of any metric can be visualized in a dynamic graph that allows the user to focus on a particular period of interest using an interactive scroll bar. Multiple metrics can be visualized simultaneously using line or stacked graphs, with user-defined colours for each value; individual metrics can then be enabled, disabled or filtered in or out at the click of a mouse button. Metrics with two different units can be compared on the same graph using multiple axes. Graphs are dynamically updated as new values are received.

Metrics can be displayed for individual servers, groups of servers or the whole managed environment. Average, minimum and maximum values for a given group can also be visualized.

Incorporated above the graph is a cut-down version of the “events time-line” (see section 2.7), which shows a condensed view of the events that have taken place within the period of interest, allowing the user to cross-check changes in metrics against changes within the managed environment.

2.6 Threshold Based Alerts

Alerts can be defined for any metric or derived metric based on user-defined thresholds. Furthermore, concurrentCOMMAND’s configurable task manager allows administrators to create customized alerts via a scripting repository; examples include the execution of power actions, system reboots or automatic notification by email, pager or SMS via an available GSM service. Ranged thresholds allow a sequence of escalation actions to run as a metric deviates further from its normal range, and relaxation actions to restore normal service as the metric returns to its normal run-time state.
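One way to picture ranged thresholds is as a set of bands, with actions attached to each band boundary. The sketch below is an assumption about how such logic could be structured, not the appliance's task manager; the threshold values and action names are hypothetical.

```python
def band_for(value, thresholds):
    """Return the index of the band a value falls in (0 = normal)."""
    return sum(1 for limit in thresholds if value >= limit)


def actions_for_transition(previous_band, current_band, escalate, relax):
    """List the actions to run when a metric moves between bands."""
    if current_band > previous_band:
        # Run each escalation step that has just been crossed, lowest first.
        return [escalate[b] for b in range(previous_band + 1, current_band + 1)]
    if current_band < previous_band:
        # Run relaxation steps on the way back down, highest first.
        return [relax[b] for b in range(previous_band, current_band, -1)]
    return []


# Hypothetical server inlet temperature limits in degrees Celsius.
thresholds = [27.0, 32.0, 38.0]
escalate = {1: "email_admin", 2: "send_sms", 3: "power_off_rack"}
relax = {1: "email_all_clear", 2: "cancel_sms_escalation", 3: "power_on_rack"}

band = 0
log = []
for reading in [25.0, 33.0, 39.0, 30.0, 24.0]:
    new_band = band_for(reading, thresholds)
    log.append(actions_for_transition(band, new_band, escalate, relax))
    band = new_band
```

Tracking the previous band means each action fires once per crossing rather than on every reading, and a jump across several bands triggers every intermediate step in order.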

2.7 Events and Change Management

The appliance comes with a comprehensive ‘events’ tracking capability whereby all automated or user-defined events are logged in the events database for display via a simple tabular log or a scalable graphical time-line. Example events include, but are not limited to: server reboots, DHCP requests, PXE boot requests, task execution and breach events. The graphical events time-line highlights periods of particular activity, and the user can then zoom in to view detailed events via a double click of the mouse.

The ability to add user-defined events relating to a single server, group of servers or to any monitored device helps to ensure an audit trail which can be useful for diagnosing cause and effect within the data centre – for example, an air-conditioning failure leading to a localised increase in rack temperatures and subsequent breaches in server inlet temperatures.

2.8 Customizable Groups

concurrentCOMMAND administrators may create static groups of servers in order to simplify system monitoring and efficiency management. A typical example is the association of servers with end-users, applications or customers in order to charge back power costs, or to compare the IT efficiency of these end-users or business functions; another example is the definition of different server types that may be suitable for specific tasks or applications.

Once server groups have been defined, the appliance can simplify complicated tasks by performing automated actions across these groups. For example, users can perform power actions, configure network boot options and run their own tasks and scripts across these groups of servers. This concept can also be used for simultaneously deploying an image across a large number of servers (see section 2.12).

2.9 Scheduled Tasks

concurrentCOMMAND has a built-in scheduler that allows users to automate actions across individual servers or groups of servers at regular intervals or at a pre-defined day and time. The scheduler can run user-defined scripts that are stored in the script repository. This capability can be used, for example, to power down unused servers at night, at weekends or on public holidays, or otherwise force them into low power mode.
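The decision that such a scheduled script might make can be sketched as follows. The function name, the night-time window and the holiday list are assumptions for illustration; the actual power action would be carried out by the appliance.

```python
from datetime import date, datetime


def should_power_down(now, holidays=frozenset(), night_start=20, night_end=6):
    """Return True when unused servers may be powered down at this time."""
    if now.date() in holidays:
        return True
    if now.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return True
    # Night is assumed to run from night_start until night_end the next day.
    return now.hour >= night_start or now.hour < night_end


# Hypothetical holiday calendar supplied by the administrator.
holidays = frozenset({date(2011, 5, 2)})
weekday_noon = should_power_down(datetime(2011, 5, 3, 12, 0), holidays)
weekday_night = should_power_down(datetime(2011, 5, 3, 23, 0), holidays)
```

Keeping the rule in a small pure function makes it easy to check against known dates before a script that uses it is added to the scheduler.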

2.10 System Management

System administrators can run node diagnostics and control the power state of any combination of server nodes, providing additional assistance to data centre support teams. The appliance supports power control at the OS, BMC and PDU levels, helping administrators to remotely control nodes within their managed environment. A comprehensive appliance console also allows remote access to the secure management network space (through serial-over-IPMI and SSH protocols), allowing concurrentCOMMAND to be the support portal into the inner workings of the managed data centre environment.

2.11 Image Capture

The capture of software images for server nodes is now easier than ever with concurrentCOMMAND; users are walked through the process with a simple wizard that assists administrators at every stage of the process. The capture process logs an audit trail and any changes made to the captured image are also logged. Users are notified of the successful completion of an image capture task via the notification manager. Events are also triggered at the start and finish of an image capture, and there is a full historical audit trail of which images were deployed to which servers.


2.12 Image Deployment

concurrentCOMMAND offers a truly parallel and scalable image deployment mechanism which allows large server groups to be redeployed in minutes (typically no longer than it would take to image one or two servers). During the parallel deployment phase a matrix view of all nodes is presented to the user, with an instant status of every node in the deployment process. Information relating to the deployment status of a specific server is displayed after selecting a server icon image from the matrix.


2.13 Appliance Accessibility

Secure connection via an HTTPS/SSL-encrypted interface allows the user to connect to the appliance from a remote location. A role-based authentication mechanism limits operations that can be performed by non-administrative users, allowing such users to monitor the managed environment remotely without them having privileged access.

The appliance provides single click access to either a node’s BMC/IPMI interface or directly to the operating system, thereby offering true lights-out management.

2.14 Hardware Inventory

concurrentCOMMAND has a hardware inventory database that must be populated in order to enable the rack view, as well as other physical views planned for upcoming releases (see section 2.4). The hardware inventory stores the position of individual servers and auxiliary devices within each rack, as well as details such as manufacturer and serial number. There is support for customised chassis, blade servers, virtual machines and server-templates, allowing for a fully customisable configuration. A comprehensive wizard is provided to aid the process of filling the database, although it is a simple task to add new devices into empty slots of existing racks.

The hardware inventory also stores information on which servers are connected to which managed PDU outlets, with a simple drag-and-drop interface to simplify the appliance configuration process. Servers with multiple PSUs can be associated with individual outlets on single or multiple PDUs, allowing concurrentCOMMAND to associate power readings with individual servers when individual outlet monitoring is possible. In the same way, the appliance provides a simple mechanism for turning servers on and off directly from the rack view.
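The association between outlets and servers amounts to summing the readings of every outlet that feeds a server. The sketch below assumes a simple mapping keyed by (PDU name, outlet number); the data layout, names and readings are hypothetical rather than the appliance's inventory schema.

```python
def power_per_server(outlet_watts, server_outlets):
    """Sum the readings of all outlets that feed each server."""
    totals = {}
    for server, outlets in server_outlets.items():
        totals[server] = sum(outlet_watts[outlet] for outlet in outlets)
    return totals


# Hypothetical readings keyed by (PDU name, outlet number), in watts.
outlet_watts = {
    ("pdu-a", 1): 110.0,
    ("pdu-b", 1): 105.0,
    ("pdu-a", 2): 180.0,
}

# node01 has two PSUs fed from different PDUs; node02 has a single PSU.
server_outlets = {
    "node01": [("pdu-a", 1), ("pdu-b", 1)],
    "node02": [("pdu-a", 2)],
}

totals = power_per_server(outlet_watts, server_outlets)
```

With this mapping, a dual-PSU server is reported with its combined draw even when its supplies are split across PDUs for redundancy.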

3 The concurrentCONTROL™ Appliance

3.1 Scale-out OoB monitoring and management

concurrentCONTROL is a space-optimized ‘Zero U’ appliance which can easily be mounted in the back or side of any 19” rack. It provides a rack-based environmental sensor capability, while undertaking other control and monitoring tasks locally so as to offer a scale-out monitoring infrastructure. Typically, a single concurrentCONTROL will manage two to three racks of equipment, polling for OoB SNMP and IPMI metrics locally and then collating these before returning information to concurrentCOMMAND.

concurrentCONTROL is simple to add into the data centre and is automatically configured via the concurrentCOMMAND appliance.

Typical network configurations are detailed in section 1.


3.2 Sensors

concurrentCONTROL supports up to 16 wired 0-5V sensors with varying cable lengths that allow the sensors to be placed within reasonable proximity to the appliance. Supported sensors are presently limited to temperature and humidity, although support for new sensor types is constantly being added.

 
