(1)

High Availability for Microsoft Servers

Brendan Murphy

IFIP, New York

June 2000

(2)

Agenda

• Windows 2000 dependability.
• Tomorrow's problems.

(3)

Dependability Goal Setting.

From the customer's perspective.

• Win 2k dependability > NT4.
  Irrespective of additional new features.

[Figure: OpenVMS Metrics (running on VAX Systems), Post Installation Behaviour. Y axis: Rate Of System Outages (0 to 100), plotted as Upper Confidence Bound, Average and Lower Confidence Bound. X axis: Operating System Version (V5.5, V5.5-1, V5.5-2, V6.0, V6.1, V6.2, V7.0, V7.1). One point is annotated "Exception". VAX Servers: ©FTSC 1999 Madison, Murphy, Davies, Compaq.]

Note the reliability improvements between each ‘service pack’ release.

Note the drop in reliability between a ‘service pack’ and a major release.

(4)

Making NT Measurable

Logging System Events

• Capturing time and type of outage.
  – 6006 = clean shutdown.
  – 6008 = blue screen.
  – 6005 = time of reboot.
• Capturing OS version and upgrade.
  – 6009 = OS version on reboot.
  – 4353 = installation of service pack.
• Capturing cause of system crash.
  – 1001 = brief description of crash details.
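As a rough illustration of how the event IDs above could be turned into outage measurements, the following Python sketch counts outages by type and sums the downtime after clean shutdowns. The `(timestamp, event_id)` record format, the function name and the rule of pairing each 6006 with the next 6005 are assumptions made for this example; they are not taken from the talk or from any real event log API.

```python
from datetime import datetime

# Event IDs and meanings exactly as listed on the slide.
EVENT_MEANING = {
    6006: "clean shutdown",
    6008: "blue screen",
    6005: "reboot",
    6009: "OS version on reboot",
    4353: "service pack installed",
    1001: "crash details",
}

def summarize_outages(records):
    """Count outages by type and total downtime after clean shutdowns.

    records: iterable of (timestamp, event_id) tuples. This is a
    hypothetical export format, not a real event log interface.
    """
    counts = {"clean shutdown": 0, "blue screen": 0}
    downtime_seconds = 0.0
    last_clean_shutdown = None
    for timestamp, event_id in sorted(records):
        meaning = EVENT_MEANING.get(event_id)
        if meaning in counts:
            counts[meaning] += 1
        if event_id == 6006:
            # Remember when the system went down cleanly.
            last_clean_shutdown = timestamp
        elif event_id == 6005 and last_clean_shutdown is not None:
            # Assumption: downtime runs from the 6006 to the next 6005.
            downtime_seconds += (timestamp - last_clean_shutdown).total_seconds()
            last_clean_shutdown = None
    return counts, downtime_seconds

# Made-up sample data, for illustration only.
sample = [
    (datetime(2000, 1, 1, 8, 0, 0), 6006),
    (datetime(2000, 1, 1, 8, 5, 0), 6005),
    (datetime(2000, 1, 3, 9, 0, 0), 6008),
    (datetime(2000, 1, 3, 9, 0, 1), 6005),
]
counts, downtime = summarize_outages(sample)
print(counts, downtime)
```

Downtime after a blue screen is deliberately left out of the sum, because the slide does not say how the time of the crash itself is recovered from the log.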

(5)

NT4 Failure Classes

• Core NT: 43%
• Device drivers: 16%
• Other third-party drivers: 16%
• Hardware failure: 13%
• Anti-virus: 12%

Source: Sample from PSS Incidents: NT Server 4.0 8/5/96-4/7/98

(6)

Windows 2000 Bluescreen Reduction

• Anti-virus
  – Anti-virus dev labs
  – Driver verifier
  – Better DDK
• Core NT
  – Kernel verifier
  – Driver verifier
  – Full time source code reviewers
  – Better longhaul testing
  – Better component stress
  – PREfix source scanning
• Hardware failures
  – Hardware compatibility list
• Device drivers
  – Broader device test matrix
  – Driver verifier
  – Better DDK
• Other third-party drivers
  – File system dev labs
  – Driver verifier

(7)

NT4 Server Reboot Causes

• OS install: 27%
• Hardware install/config: 3%
• Preventative reboots: 20%
• OS configuration: 7%
• Application install/config: 8%
• Application failure: 21%
• System failure: 14%

35% of outages are unplanned.
65% of outages are planned.

Source: One site, 1,180 servers, 9/1/98-5/7/99, SP4
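The 35%/65% split can be checked by summing the individual causes. The short Python sketch below does this under one assumption that the slide does not state explicitly: only the two failure categories are treated as unplanned, and every other cause is treated as planned. The variable and function names are illustrative.

```python
# Reboot cause percentages as given on the slide.
reboot_causes = {
    "OS install": 27,
    "Hardware install/config": 3,
    "Preventative reboots": 20,
    "OS configuration": 7,
    "Application install/config": 8,
    "Application failure": 21,
    "System failure": 14,
}

# Assumption: only the two failure categories count as unplanned.
ASSUMED_UNPLANNED = {"Application failure", "System failure"}

def split_planned_unplanned(causes, unplanned_names):
    """Return (planned, unplanned) percentage totals."""
    unplanned = sum(p for name, p in causes.items() if name in unplanned_names)
    planned = sum(p for name, p in causes.items() if name not in unplanned_names)
    return planned, unplanned

planned, unplanned = split_planned_unplanned(reboot_causes, ASSUMED_UNPLANNED)
print(f"planned: {planned}%, unplanned: {unplanned}%")
```

Under that assumption the totals match the slide: the failure categories account for 35% of reboots and the remaining causes for 65%.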

(8)

Windows 2000 Reboot Reduction

Unplanned reboot reduction

• System failure
  – Bluescreen reduction
• App failure
  – Resource partitioning
  – Task mgr “kill proc tree”
  – IIS restart

Planned reboot reduction

• Preventative reboots
  – Published best practices
  – Resource partitioning
  – IIS restart
• App install/config
  – Windows file protection
  – Windows installer
• OS configuration
  – Eliminated dozens of configuration reboots
• OS install
  – Service pack slipstreaming
• Hardware install/config
  – Plug-n-Play

(9)

Windows 2000 Verification Problem

[Diagram: Windows 2000 Verification. Recoverable labels: Features; Bug Fixes + features; Test Development; Field Tests (Beta Testing); Application Test & Development; 38 packs/working day; 288 variants/working day; 7000+; >5000 Customer Servers; 5000+ Supported Computers.]

(10)

Testing Process.

Incremental Improvement

• Daily Builds
• Weekly Builds
• RC/Beta Builds
• RTM

(11)

Windows 2000 Failure Analysis.

Windows 2000:

• System config: 34%
• HW failure: 22%
• Drivers for non-HCL HW: 20%
• Other 3rd party kernel code: 11%
• Drivers for HCL HW: 7%
• Anti-virus: 4%
• MS internal code: 2%
• Other IFS drivers: 0%

NT4:

• Core NT: 43%
• Device drivers: 16%
• Other third-party drivers: 16%
• Hardware failure: 13%
• Anti-virus: 12%

Source: Sample from PSS Incidents:

(12)

Future/current dependability issues

• Scalability
  – Vertical
    • Complex faults.
    • Change control.
    • Application configuration management.
  – Horizontal
    • System management.
    • Shared data.

(13)

Future/current dependability issues (cont.)

• Measurement and characterization.
  – System not the metal box.
    • Measure MSN behaviour?
  – Relate to reality.
    • Not 99.999% availability claims.
  – Drive correct behaviour.
    • For both developers and customers.
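To relate a figure such as 99.999% to something concrete, an availability percentage can be converted into the downtime it allows per year. The Python sketch below is a worked illustration only; the function name is made up for this example, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year

def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime per year allowed by an availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

At 99.999% the budget is a little over five minutes per year, which is less than a single reboot typically takes.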

(14)

Future/current dependability issues (cont.)

• Size.
  – Geographic solutions.
  – Reassess failure rates.
    • Failure rates per TB of memory.
• Real time change management.
• Cost of ownership.
  – Decrease the skill level and number of system managers required.

(15)

Future/current dependability issues (cont.)

• Lack of hierarchy.
  – Storage, computer, network, applications.
• The system is not the computer.
  – MSN gets a rack of computers every 2 weeks.
• Mobiles.
  – Security/trust.
  – Synchronization.

(16)

Microsoft Research

Development tools

• Code analysis.
  – Generic analysis applied to all new code.
    • Decrease false positives.
  – Automating verification of best practices (e.g. driver development).
• Driver verifiers.
  – Fault injection(ish).
  – Software development environments.
  – Using OS as test environment.
  – Objective is to protect the operating system.

(17)

Testing

$162 million problem and rising

• Testing.
  – Virtual labs.
  – Specific labs.
  – Test development.
  – Attribute testing.
  – Measurements.
  – Release criteria.
  – Motivation is $$$$$$$$

(18)

Failure predictions

• Are hardware failure prediction methodologies applicable to software failures?
• Analysis of SQL and Exchange using NT logs.
• Apply to other applications.

(19)

System Fault Management.

• Assume no hierarchy.
• Assume limited knowledge between different elements (i.e. abstraction).
• System is not a metal box.
• Need to work under failure conditions.
• Time-based pattern recognition.
  – Pattern variations based on system speed.

(20)

Conclusions/Issues

• Dependability is improving.
  – In spite of using ‘wrong’ methods?
• Industry has problems and research has solutions.
  – But what is the relationship?
• Industry is addressing current problems through innovation and brute force.
  – Opportunities to address future issues (risky predicting the future).
