Hope is Not A Strategy
Avoiding 5 Common Network Management Mistakes
Tim Connelly
Director, Systems Engineers www.netcordia.com
In the News…
• February 18, 2010 – TechCrunch crippled for 4 hours
Cause: an “unscheduled router change”
• December 17, 2009 – BlackBerry loses ability to send email
Cause: routine router maintenance caused an “unplanned result”
• August 3, 2009 – PayPal goes down; payment services are lost for over 6 hours
Cause: a “back-end router” issue, complicated by a change that had taken place
• August 9, 2009 – Cisco’s website goes down
Cause: “human error”
• September 1, 2009 - Gmail goes down worldwide
A Few of the Responses
• “I hope it will be much longer than four years before we face a problem like this again.”
• Matt Mullenweg, WordPress founder, on the blog outage impacting TechCrunch
• “Something went wrong… these things happen.”
• Crystal Davis, Sprint, on the BlackBerry outage
• Is “Hope” a strategy?
Mistake 1
Manually Dealing with Network Change
Facts on the Network Infrastructure
• 60-70% of all network issues are tied to network change
• Over 70% of the day is spent on unplanned work
• Up to 80% of all IT resources are consumed just to maintain the status quo
• Over 90% of organizations use significant manual processes
The IT Capability Gap
[Chart, 2000–2010: IT resource/staff vs. importance of network/risk of downtime; the difference between the two is the IT capability gap]
The Growing Pain of Manual Processes
• The misnomer of “a quick & easy manual change”
• An “easy” change takes 3-5 minutes per device
• Access device, log in, check current setting, change, save, verify, hope nothing breaks
• Doing the math
  • 4 minutes per device
  • × 2.3 changes per week
  • × 275 devices
  • = 2,530 minutes (over 42 hours) per week
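The slide's arithmetic can be reproduced, and re-run with your own figures, in a few lines of Python. The function name `weekly_manual_minutes` is hypothetical; the inputs are the slide's example values, and the sketch assumes each device receives 2.3 changes per week at a flat 4 minutes per change.

```python
def weekly_manual_minutes(minutes_per_change: float,
                          changes_per_week: float,
                          devices: int) -> float:
    """Minutes per week spent making manual changes, assuming every device
    receives `changes_per_week` changes at a flat `minutes_per_change` each."""
    return minutes_per_change * changes_per_week * devices


if __name__ == "__main__":
    # Example values from the slide: 4 min/change, 2.3 changes/week, 275 devices.
    minutes = weekly_manual_minutes(4, 2.3, 275)
    print(f"{minutes:,.0f} minutes per week (about {minutes / 60:.1f} hours)")
```

Because the cost is a straight product, doubling the device count doubles the weekly time, which is why per-device manual work stops scaling.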
Mistake 2
Not Understanding the Difference between Change Management and a Change Management Process
Request a change → Review change → Receive approval → Schedule ticket → Make change/close ticket
Change Management Process
Request a change → Review change → Approval → Schedule ticket → Change/close ticket
Good change? Impact on neighbors? Within policy? Successful?
Mistake 3
Believing Managing Performance and Managing Change Should Be Separate
Performance and Change Are Separate
Performance Team
• Track up/down status
• Monitor response time
• Measure usage and utilization
• Verify service level agreements
Change Team
• Manage network devices
• Implement changes and configurations
• Maintain consistency and standards
Only Meet When Problems Occur
• Teams only get together when troubleshooting starts, to ask:
  • Who caused the problem?
  • What happened?
  • When did the cause start?
  • Where is the impact?
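A shared change log that both teams can query turns those questions into a lookup instead of a meeting. A minimal sketch, assuming each change is recorded with device, user, summary and timestamp; the `ChangeRecord` type, the `suspect_changes` function, the device names and the 24-hour default window are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeRecord:
    device: str     # which device was changed
    user: str       # who made the change
    summary: str    # what was changed
    time: datetime  # when it was made


def suspect_changes(changes, device, incident_time, window=timedelta(hours=24)):
    """Return changes made to `device` in the `window` before the incident,
    most recent first, as the first candidates to investigate."""
    start = incident_time - window
    hits = [c for c in changes
            if c.device == device and start <= c.time <= incident_time]
    return sorted(hits, key=lambda c: c.time, reverse=True)


if __name__ == "__main__":
    incident = datetime(2010, 2, 18, 12, 0)
    log = [
        ChangeRecord("edge-rtr-1", "alice", "ACL update", incident - timedelta(hours=2)),
        ChangeRecord("edge-rtr-1", "bob", "NTP server change", incident - timedelta(days=3)),
        ChangeRecord("core-sw-1", "carol", "VLAN added", incident - timedelta(minutes=30)),
    ]
    for c in suspect_changes(log, "edge-rtr-1", incident):
        print(c.time, c.user, c.summary)
```

In practice the affected device's neighbors are worth checking the same way, since the cause may sit one hop away.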
Mistake 4
Key Driver for Compliance – Internal
• Internal best practices – gold standard
  • Designed by IT/networking staff
  • Typically defined on paper and taught
  • Generally accepted by IT
  • ISO, ITIL and custom standards
• External mandates – governance
  • Required by government or industry sector
  • Generally disliked by IT
  • Extensive audit and reporting needs
Common Issue Today for Compliance
Define Standards on Paper
Wait Until an Auditor Report Is Needed
Five Critical Steps for Compliance
1. Define Standards & Policies
2. Implementation Across the Device Set
3. Proactive Monitoring & Alerting
4. Violation Remediation & Verification
5. Reporting – Up-to-the-Minute & Scheduled
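Steps 1, 3 and 5 can be sketched as policies written as data rather than on paper, checked against each device's configuration, and rolled up into a report. This assumes configurations are available as text; the `Rule` type, the functions and the IOS-style example lines are hypothetical illustrations, not any particular product's rule format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str    # regular expression matched against each config line
    required: bool  # True: a matching line must exist; False: it must not


def violations(config: str, rules: list[Rule]) -> list[str]:
    """Return the names of the rules that `config` violates."""
    lines = config.splitlines()
    failed = []
    for rule in rules:
        present = any(re.search(rule.pattern, line) for line in lines)
        if present != rule.required:
            failed.append(rule.name)
    return failed


def compliance_report(configs: dict[str, str], rules: list[Rule]) -> dict[str, list[str]]:
    """Map each non-compliant device name to the rules it violates."""
    report = {}
    for device, config in sorted(configs.items()):
        failed = violations(config, rules)
        if failed:
            report[device] = failed
    return report


if __name__ == "__main__":
    rules = [
        Rule("ssh-version-2", r"^ip ssh version 2$", required=True),
        Rule("no-telnet", r"transport input .*telnet", required=False),
    ]
    configs = {
        "branch-rtr-1": "hostname branch-rtr-1\nip ssh version 2\nline vty 0 4\n transport input ssh",
        "branch-rtr-2": "hostname branch-rtr-2\nline vty 0 4\n transport input telnet ssh",
    }
    print(compliance_report(configs, rules))
```

Running the same report on a schedule, or right after every change, is what moves compliance from "wait for the auditor" to proactive alerting.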
Mistake 5
Change on Network Neighbors
• Typically, changes are focused on individual devices
• Reviews should consider the impact on neighboring devices or along the path
• Challenges occur with unintended consequences on device neighbors
• A limited view of the impact of change increases risk along the path
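One way to widen a review beyond the single device is to list everything within a few hops of it in the topology. A minimal sketch, assuming the topology is available as an undirected adjacency map (for example, built from CDP/LLDP neighbor data); `impacted_devices` and the device names are hypothetical. Hop distance is only a proxy: a path-based review would also need routing information to know which flows actually cross the changed device.

```python
from collections import deque


def impacted_devices(topology: dict[str, set[str]], changed: str,
                     max_hops: int = 1) -> dict[str, int]:
    """Breadth-first search from `changed`; return every device within
    `max_hops` links, mapped to its hop distance."""
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand past the review radius
        for neighbor in topology.get(node, set()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[changed]  # the changed device itself is already under review
    return distance


if __name__ == "__main__":
    topology = {
        "core-1": {"dist-1", "dist-2"},
        "dist-1": {"core-1", "access-1"},
        "dist-2": {"core-1", "access-2"},
        "access-1": {"dist-1"},
        "access-2": {"dist-2"},
    }
    print(impacted_devices(topology, "dist-1", max_hops=2))
```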
Understanding the Impact of Change
Cause & Effect
• Help users identify hard-to-find issues
• See whether a change had a positive or negative impact on health
• Verify whether a change affects policy compliance
• View impact on device
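Seeing whether a change helped or hurt can start with comparing a health metric in equal windows before and after the change time. A minimal sketch, assuming samples arrive as (timestamp in seconds, value) pairs; `change_impact`, its parameters and the tolerance rule are hypothetical, and a plain comparison of means ignores the noise and daily patterns a real tool would need to handle.

```python
from statistics import mean


def change_impact(samples, change_time, window, higher_is_better=False, tolerance=0.0):
    """Compare the mean of a health metric in the `window` seconds before and
    after `change_time`. Returns (delta, verdict), or None if a window is empty."""
    before = [v for t, v in samples if change_time - window <= t < change_time]
    after = [v for t, v in samples if change_time <= t < change_time + window]
    if not before or not after:
        return None
    delta = mean(after) - mean(before)
    if abs(delta) <= tolerance:
        verdict = "no significant change"
    elif (delta > 0) == higher_is_better:
        verdict = "positive"
    else:
        verdict = "negative"
    return delta, verdict


if __name__ == "__main__":
    # Response time in ms, sampled every 60 s; the change lands at t = 600.
    samples = [(t, 20.0) for t in range(0, 600, 60)] + [(t, 35.0) for t in range(600, 1200, 60)]
    print(change_impact(samples, change_time=600, window=600))
```

The `higher_is_better` flag exists because the same rise is bad for response time but good for, say, availability.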
Enforce Compliance and Standardization
Build Consistency
• Over 140 pre-packaged rules
• Wizard encoding of complex rule logic
• Proactive alerts for policy violations
• Built-in remediation
• Live and historical status, trends and reports
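Remediation and verification belong in the same loop: apply the fix, then re-run the check rather than assuming the fix worked. A minimal sketch, assuming each policy is a single required global-configuration line; `remediate`, `missing_policies` and the example lines are hypothetical, and real remediation for lines under interface or line sub-modes would need context-aware edits, not an append.

```python
def missing_policies(config_lines: list[str], required: dict[str, str]) -> list[str]:
    """Return the names of policies whose required line is absent."""
    present = {line.strip() for line in config_lines}
    return [name for name, line in required.items() if line not in present]


def remediate(config_lines: list[str], required: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (new_config, added_lines); the input list is not modified."""
    missing = [required[name] for name in missing_policies(config_lines, required)]
    return config_lines + missing, missing


if __name__ == "__main__":
    required = {
        "ssh-version-2": "ip ssh version 2",
        "ntp-server": "ntp server 192.0.2.1",
    }
    config = ["hostname branch-rtr-1", "ip ssh version 2"]
    fixed, added = remediate(config, required)
    print("added:", added)
    # Verification step: the same check must now come back clean.
    print("still missing:", missing_policies(fixed, required))
```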
Improve Staff Efficiency and Productivity
Empower Staff
• Automate data collection & analysis
• Reduce manual time and effort
• Become proactive
• Remediation options
• Multi-user roles
• Views based on individual needs
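The manual routine from Mistake 1 (log in, check the current setting, change, save, verify, hope nothing breaks) is the kind of work automation can take over, with a rollback in place of the hope. A minimal sketch against a hypothetical in-memory `Device` stand-in; no real device CLI or API is used, and the `verify` callback represents whatever post-change check (reachability, compliance, health) you would actually run.

```python
from typing import Callable, Optional


class Device:
    """Hypothetical stand-in for a device session: running vs. saved config."""

    def __init__(self, name: str, config: dict[str, str]):
        self.name = name
        self.running = dict(config)
        self.saved = dict(config)

    def get(self, key: str) -> Optional[str]:
        return self.running.get(key)

    def set(self, key: str, value: Optional[str]) -> None:
        if value is None:
            self.running.pop(key, None)  # setting None removes the key
        else:
            self.running[key] = value

    def save(self) -> None:
        self.saved = dict(self.running)


def apply_change(device: Device, key: str, value: str,
                 verify: Callable[[Device], bool]) -> str:
    """Check the current setting, change it, verify, then save or roll back."""
    old = device.get(key)
    if old == value:
        return "skipped"   # already at the desired setting
    device.set(key, value)
    if verify(device):
        device.save()
        return "committed"
    device.set(key, old)   # roll back to the previous running value
    return "rolled back"


def apply_to_all(devices: list[Device], key: str, value: str,
                 verify: Callable[[Device], bool]) -> dict[str, str]:
    return {d.name: apply_change(d, key, value, verify) for d in devices}


if __name__ == "__main__":
    devices = [
        Device("branch-rtr-1", {"ntp": "192.0.2.1"}),
        Device("branch-rtr-2", {"ntp": "192.0.2.9"}),
        Device("branch-rtr-3", {}),
    ]
    # Simulated post-change check that fails on one device.
    print(apply_to_all(devices, "ntp", "192.0.2.1",
                       verify=lambda d: d.name != "branch-rtr-3"))
```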
Meeting New Technology Requirements
Build Foundation
• Understand the impact of virtualization, cloud computing and data center consolidation
• Visibility into a dynamic world
• Build consistency/standardization
• Scalability/distributed
Don’t Fall Into the Same Traps as Others
• Manage change – take control of the #1 cause of network outages
• Understand the impact of change
• Use automation to achieve goals
• Control access
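Controlling access can be as simple as checking a user's role and device scope before any change is pushed, and refusing to let anyone approve their own change. A minimal role-based sketch; the role names, permission sets, `User` type and functions are hypothetical examples, not a prescribed model.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "viewer": frozenset({"view"}),
    "operator": frozenset({"view", "change"}),
    "admin": frozenset({"view", "change", "approve"}),
}


@dataclass
class User:
    name: str
    role: str
    device_groups: set[str] = field(default_factory=set)  # groups the user may touch


def can(user: User, action: str, device_group: str) -> bool:
    """True if the user's role allows `action` on devices in `device_group`."""
    allowed = ROLE_PERMISSIONS.get(user.role, frozenset())
    return action in allowed and device_group in user.device_groups


def can_approve(approver: User, requester: str, device_group: str) -> bool:
    """Separation of duties: nobody approves their own change."""
    return approver.name != requester and can(approver, "approve", device_group)


if __name__ == "__main__":
    alice = User("alice", "operator", {"branch"})
    bob = User("bob", "admin", {"branch", "core"})
    print(can(alice, "change", "branch"), can(alice, "change", "core"))
    print(can_approve(bob, "alice", "branch"), can_approve(bob, "bob", "core"))
```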