• No results found

IT Service Management System at the Leibniz Supercomputing Centre

N/A
N/A
Protected

Academic year: 2021

Share "IT Service Management System at the Leibniz Supercomputing Centre"

Copied!
26
0
0

Loading.... (view fulltext now)

Full text

(1)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

IT Service Management System at the

Leibniz Supercomputing Centre

Dr. Michael Brenner [email protected]

(2)

Leibniz Supercomputing Centre (LRZ)

Who we are, what we do

(3)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

Leibniz Supercomputing Centre (LRZ)

• Owned by the Bavarian Academy of Sciences and Humanities • Staff: ca. 170 full time employees

- The LRZ operates

>10 routers; >1.000 switches; >2.000 access points; >80 dark fibers;

>600 physical servers; a VMware ESX cluster with 512 cores;

a supercomputer (currently 62-78 TFlop, next year ca. 3 PFlop); a Linux cluster with 43 TFlop; a virtual reality laboratory; archiving systems with >42 petabyte capacity; etc.

3

(4)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

LRZ: Mission (our "Business")

• National and European Supercomputing Centre

- Part of the Gauss Centre for Supercomputing

- Integrated in National and European HPC and Grid projects

• Regional Computer Centre for all Bavarian Universities

- Capacity computing, special equipment, backup and archiving centre (15 petabyte, more than 9 billion files)

- Competence centre (Networks, HPC, Grid Computing, IT Management)

• Computer Centre for all Munich Universities

- Student users: more than 90,000

- Professional users: more than 30,000; including 8,500 scientists

4

(5)

The case for an

organizational / process approach

5

(6)

IT Service Management

Why we need (better) Service Management

• LRZ services become more 'business critical'

- University administration processes more time-critical and IT-dependent - University institutions act as (semi-)commercial service providers

- New services for state libraries, for the office for the central allocation of study places, etc.

• Interest of many customers is shifting from the performance to the availability and dependability of LRZ's services

6

Rising complexity

of service provisioning

Higher availability and dependability demands

(7)

IT Service Management

Architectural view of IT service provisioning

7 Client Tier Presentation Tier Business Logic Tier Integration Tier Resources Tier Application Layer Application Infrastructure Layer Platform Layer Web Server

JSP Skript ApplicationBusiness

Application Integration Broker Stored Procedures Application Server Transaction Middleware relationales DBMS Network Layer Browser OS HW-Plattform

IP-Service IP-Service IP-Service IP-Service IP-Service OS HW-Plattform OS HW-Plattform OS HW-Plattform Java Virtual Machine Java-Applet

Example, based on Sun 3-D Architectural

Framework

• Manyfold dependencies across architecture layers and tiers

• Additional dependencies through centralized infrastructure services (identity management, load balancing, ...)

(8)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 8

Where do IT service incidents originate?

80% People &

Process Issues

Sources of downtime, according to Gartner (1999):

20% Technology Infrastructure improvements Software Engineering IT Service Management

• Technology issues are not the main source of service downtime • Main source of downtime is "somebody has tinkered with the

configuration" Network and systems management

(9)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

How are IT service incidents resolved?

9

Communication Networks

User Services and Systems

High Performance

Systems Central Services

Directories, E-Mail

Internet Services and Databases Graphics, Visualization and Multimedia Desktop Management Compute Server High Performance Computing Distributed Resources

File- and Storage Systems Operation of Communication Networks Planing of Communication Networks Maintenance of Communication Networks Administration Facility Management Public Relations, Licenses, Courses and

Administrative IT User Information and

IT Support

Directorate

Executive Management

Organizational Structure of the LRZ

User can't access his data. Called us, but our systems are running fine.

Gotta be a NAS fault

It's not the NAS-Filer. Has to be a network problem.

Network backbone is fine. Has to be a local problem

No, everything is fine. Maybe there's something wrong

with the authorization? IM-System is running

fine? What account are we talking about?

Dear Sir or Madam,  I will be out of office until Nov

(10)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

Why we need IT Service Management

• Traditional organiational structures of IT service providers divide functions along infrastructure layers and tiers (or technical

specialties)

• But today, most IT services are multi-layered and multi-tiered

- Resolution of IT failures needs to be coordinated across specialties and parts of the infrastructure

- Effects of changes do not respect organizational boundaries - ...

We need

- Better cross-functional coordination of activities - Repeatable, predictable outcomes of activities

- Better cross-functional documentation of infrastructure interdependencies

We need IT Service Management processes

(11)

Introducing IT Service Management (ITSM)

as a project

11

(12)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

What kind of a project is introducing ITSM?

• This is a process improvement project, because

- Without defined process, workflow remains ad-hoc and results won't be repeatable and reliable.

- Processes need to be defined, established, measured and improved (Continuous improvement, "PDCA cycle").

• This is a technology project, because

- Tools enable process coordination, reporting and continuos improvement.

- A good tool can make staff accept change more readily, a bad tool will make everything worse.

• This is a change (organizational change, "people") project, because

- A process organization is laid over the existing structure

- Roles, authorities and responsibilities need to be defined and established 12 People Process Techno-logy

(13)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

Introducing ITSM as a "Technochange"

• Introducing IT Service Management should be seen as a "Technochange" project (organizational

change with a strong tech-component)

- Technochange projects are high risk, historic failure rate ca. 75%

- Biggest source of risk: people's negative reaction to change

- IT people do not easily get excited about user support, change control, documentation etc.

- An ITSM system introduces a level of bureaucracy that is usually uncomfortable for IT people

• "Best practices" on organizational just as important as those on ITSM processes

• If staff resistance is too high, your ITSM project is doomed.

13

Techno-logy

(14)

ITSM at the LRZ

People, Process and Technology

(15)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

New Roles and Responsibilities

15

Communication Networks

User Services and Systems

High Performance

Systems Central Services

Directories, E-Mail

Internet Services and Databases Graphics, Visualization and Multimedia Desktop Management Compute Server High Performance Computing Distributed Resources

File- and Storage Systems Operation of Communication Networks Planing of Communication Networks Maintenance of Communication Networks Administration Facility Management Public Relations, Licenses, Courses and

Administrative IT User Information and

IT Support

Directorate

Executive Management

Process Owner -

Incident Management Incident Manager Incident Owner Support Team Member

Process Owner -

Change Management Change Manager Change Owner

Process Owner - Configuration

Management

Configuration

Manager Configuration Librarian CI Owner

(16)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 16

Project Marketing and Training

• General Project Marketing (there is never enough communication) • Training Program

- All staff eligible to receive in-house foundation training and certification

- Cooperation with German TÜV Süd (developer of ITSM qualification scheme) - More than 150 staff members achieved "ISO/IEC 20000 Foundation" or higher

certification since 2007

Participants of the worldwide first pilot exams for new ISO/IEC 20000 personell certification scheme at LRZ, November 2007 People

(17)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

About the ITSM Frameworks

ITSM frameworks provide overall process structure, best practices for the defined processes and common terminology

- ITIL V3

- Most popular ITSM framework, describes ca. 25 processes.

- 5 books, more than 1500 pages (about 1000 pages net content).

- Comprehensive, but verbose and badly edited: Abundant contradictions, inconsistencies and obscurities. (2011 edition might be somewhat improved.)

Could probably be abridged to 250 pages without losing much useful content. - ISO/IEC 20000

- The "certifiable" standard for ITSM. Defines requirements for 13 processes. About half as popular as ITIL V3.

- Core parts (20000-1 and 20000-2) less than 100 pages in total.

- Concise to a fault. Too short to be useful as ITSM guidance by itself. - MOF

- Could be called "Microsoft's ITIL adaption", describes 16 "Service Management Functions". About half as popular as ISO/IEC 20000.

- Large number of documents available at microsoft.com/MOF at no cost.

- Somewhat complex structure, but contains a lot of useful and well-edited content.

17

(18)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 18

ISO/IEC 20000 as basic framework

Business Relationship Management

Budgeting & Accounting for IT Services

Information Security Management

Service Level Management

Service Reporting

Capacity Management Service Continuity & Availability Management

Incident Management ✔

Configuration Management

Control Processes

Resolution Processes Relationship Processes

Service Delivery Processes

• Selected at the LRZ for its relative simplicity, clarity and consistency

• Includes a "separation of cross-cutting concerns" in the process framework • Includes requirements for individual processes as well as for general

management practices (similar to those in ISO 9000)

Change Management

Release and Deployment Management

Problem Management Supplier Management

(19)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

Defining Processes

• Prerequisites

- Process framework (goals and interfaces of each process)

- Document control

- Policy and templates for process definitions

• E.g.: Defining LRZ

"Incident Management"

- Objective

- Workflow: activities, control flow - Roles and responsibilities

- Information to be recorded - Tools used

- Control parameters, "key performance indicators"

19

(20)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

ITSM Suites

• Over 100 self-proclaimed "ITSM-tools" in a very in-transparent market

• An "ITSM suite" commonly integrates applications for several ITSM processes with each other and with a Configuration Management Database (CMDB)

20

Technology Platform

Client

Database

Developer Tool Self Help ...

IT Personnel Developers / Admins Users / Customers

ITSM Application

Workflows Forms ITSM Objects ...

e.g. Monitoring, external ticketing

other Systems

Techno-logy

(21)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 21

Tool: Incident form (user client)

Techno-logy

(22)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 22

contact info (from LRZ-SIM)

description of incident impacted service

impact and urgency

assigned staff (owner and expert/admin)

type and status

Tool: Incident form (user client)

Techno-logy

(23)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 23

target time for

resolution from SLA

documentation and communication

Knowledge Base entries

priority ID

Link to Configuration Items (CMDB)

linked problems and changes

linked incidents

linked workflow

contact history

Techno-logy

(24)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities 24

This CI is marked as "down" At least one active problem linked to this "VM"-CI

At least one active incident linked to this "Service

Building Block" CI

Tool: CMDB representation

Techno-logy

The "Service Building

Block"'s function depends on the VM's function

(25)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

After

Before

Overall Goal

25 I T S M -S u i t e

Process Tech- People nology

ticket

phone

email

Process Tech- People nology

- Undocumented processes (maturity level ≤2)

- Many management tools - Manual workflow control

- Fragmented information flow - Redundant documentation

- Documented processes (maturity level ≥3)

- Moderate number of management tools - Workflow management support

- Controlled information flow

(26)

Leibniz Supercomputing Centre of the Bavarian Academy of Sciences and Humanities

Summary and Conclusion

• Highly available and reliable services require systematic and process-oriented ITSM

• Establishing an ITSM system and achieving ISO/IEC 20000 certification is an on-going project at the LRZ

• Introducing ITSM is labor and cost intensive

• ITSM projects need to be consider not just process, but also people and technology aspects

References

Related documents

In this thesis, we investigate how the combination of quantum physics and information theory could deliver solutions at the forefront of information security, and, in particular,

However, the presence of more difficulties with immaturity, attention problems, and self-control problems in the OO adolescents indicate that they are exhibiting mild

New autoclaves Before placing an autoclave into service, a test load approximating the weight and density of the type of waste generated shall be autoclaved with test spore vials.

Ochatt, INRA UMR1347 Agroécologie, France Reviewed by: Rosario Muleo, Università degli Studi della Tuscia, Italy Wen-Wu Guo, Huazhong Agricultural University, China

Basic concepts in memory interfacing (Rules) Q.1) Design a microprocessor system for 8085 such that it should contain 8k byte of EPROM and 8k byte of RAM. using 1) Absolute /

5) To ensure the verification of successful testing to the degree required by organization standards before change introduction to the production environment. 2) In the

edible fi lms by their high content of isolated proteins (12–20 wt%) obtained after fat extraction, providing a solution to the accumulation or burning of these

Thus, if pastors experience exhaustion, it not only has negative effects on pastors’ spiritual sense of duty and psychological stability, but it can also be damaging to their