Ticket Management & Best Practices. April 29, 2014


(1)

Ticket Management

& Best Practices

(2)

Trouble Ticketing System Objectives

Assumption: Network Operations Centers are governed by two principles

1. Pursuit of excellence in customer service
2. Operating the most cost-effective NOC possible

Common elements of these two principles:

Have as few outages as possible, with the shortest MTTR achievable

Have systems perform work for you to act quickly & effectively with regular updates to customers

In support of these principles, what do we need from a trouble ticketing system?

Efficiency – Easy to use with simplicity & automation

Customer service – updates, interaction and self-service

Analytical capabilities – Ability to reduce trouble ticket volume through analytics; repository of data for reporting on drivers of troubles and high TTR, with a focus on fault and TTR reduction

CURRENT STATE

An effective trouble ticketing system is the foundation of operational success

(3)

Trouble Ticketing System Requirements

Efficient

Intuitive / easy to train & use

Automation & Auto-population of data- have system perform as much of the work as possible

Flexible & Extensible

Easily customizable (preferably by in-house personnel)

Extensible with add-ins and additional functionality

Capable of integration (input/output) with other systems

Inventory systems for circuit details (single system for NCC technicians)

Alarm systems for ticket / alarm automation

APIs for integration with external vendors/customers

Extensive data capturing and reporting capability

Association of each case with customer circuit ID, account, locations

Root cause codes (4 or 5 levels)- down to circuit pack part number for equipment failures

Responsible party (customer or provider)

MTTR (auto calculated)

Type II vendor performance (where involved)

Site / City / State / Region / Legacy Network (as applicable)
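As a minimal sketch of the data-capture requirements above, the record below holds one field per listed item and derives time to restore from timestamps instead of manual entry. The class and field names are hypothetical and are not taken from any particular ticketing product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class TroubleTicket:
    """Hypothetical ticket record holding the data fields listed above."""
    ticket_id: str
    circuit_id: str                 # customer circuit ID; drives account association
    account: str
    location: str                   # site / city / state / region / legacy network
    responsible_party: Optional[str] = None   # "customer" or "provider"
    root_cause: list[str] = field(default_factory=list)  # closure code levels 1..N
    type2_vendor: Optional[str] = None        # Type II provider, where involved
    created_at: Optional[datetime] = None
    restored_at: Optional[datetime] = None

    @property
    def ttr_hours(self) -> Optional[float]:
        """Time to restore, auto-calculated from timestamps (None while still open)."""
        if self.created_at is None or self.restored_at is None:
            return None
        return (self.restored_at - self.created_at).total_seconds() / 3600
```

MTTR for a reporting period is then simply the mean of `ttr_hours` over the restored tickets in that period.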

CURRENT STATE

The trouble ticketing system should be designed to accommodate current operational needs and be extensible to support future requirements

(4)

Trouble Ticketing System Architecture

(5)

Trouble Ticket Lifecycle

1. Create
2. Update
3. Close
4. Analyze
5. Act

Capturing, analyzing, and acting on accurate trouble ticket data is critical to operational improvement

Best-in-class trouble ticket handling requires 5 phases of trouble ticket handling…

Traditional lifecycle of a trouble ticket: open, update, close

Value-add phases of the trouble ticket lifecycle: analyze and act on the data

(6)

Trouble Ticket Lifecycle – Phase 1 (Case Creation)

1. Create Ticket

Primary Objectives for Creating Case:

Associate case to service & validate contacts on file

Capture all customer information in first conversation (pursue one call resolution)

Establish case framework with all information needed to successfully work and close ticket

Populate required trouble ticket fields

Use a standardized case subject e.g. “Customer XYZ, DS-3 HIYX/123456//ZYO Down Hard”

Associate to customer circuit ID (and thereby account)

Capture all available information

Customer description of issue, including circuit status (Is the circuit currently hard down or degraded? Is intrusive testing permitted?)

Troubleshooting steps the customer has taken- have they checked CPE for power (for hard down issues)?

Customer information to include trouble ticket number, contact name, phone number and email address(es)

Access process for customer location (if applicable)

Update customer with details on next steps towards resolution

Provide customer with service provider ticket number

Automatically send email to customer with ticket details, link to portal (if applicable)

Suggest additional troubleshooting steps customer might take (bounce ports, restart router, enable lasers, etc.)

Articulate next steps in the process (e.g. dispatching a field technician to site, engaging our Tier II technicians, etc.) and set expectations for the next communication to the customer

Immediately initiate troubleshooting and repairs

Queues cost customers- the outage may be significantly impacting the customer’s business- act like it

Immediately route to appropriate Tier II organization or other fix agent

The case creation stage is critical to ensuring correct association to circuit and account for SLA reporting and customer updates
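The standardized case subject can be generated rather than typed, which keeps it consistent across technicians. A minimal sketch, assuming the subject is always customer, service type, circuit ID, and status in that order (inferred from the single example subject in this phase); the function name is hypothetical.

```python
def case_subject(customer: str, service: str, circuit_id: str, status: str) -> str:
    """Build a standardized subject such as
    'Customer XYZ, DS-3 HIYX/123456//ZYO Down Hard'.

    Field order is an assumption inferred from one example, not a fixed standard.
    """
    parts = {"customer": customer, "service": service,
             "circuit_id": circuit_id, "status": status}
    # Refuse to create a subject with a blank component (a required-field check).
    for name, value in parts.items():
        if not value.strip():
            raise ValueError(f"{name} is required for the case subject")
    return f"{customer.strip()}, {service.strip()} {circuit_id.strip()} {status.strip()}"
```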

(7)

Single Customer vs. Network (Multi-customer) Cases

Network Case (case comments, case status, closure codes, case closure) → Customer Case #1, #2, #3, #4, #5 … #N

Network Case Handling (multiple customer circuits affected)

- For issues affecting multiple customers, create a single parent / network case that reflects the overall event and a child / customer case for each service impacted

- All information entered into the parent case cascades down to the child cases

- Proactively create the cases (ideally with system automation) and email customers upon case opening; attempt to inform customers of the event BEFORE they contact the NOC to report a service issue

- Enter “public” comments frequently into the parent case to send emails to each affected customer; over-communicate updates to significantly reduce call volume into the NOC (and increase customer satisfaction with event handling)
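A minimal sketch of the parent/child cascade, assuming a simple in-memory model: a public comment on the network case is appended to every customer case and queued as one email per affected customer, and a status change on the parent flows down the same way. The class names and the `outbox` list (standing in for a real email integration) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerCase:
    case_id: str
    customer_email: str
    comments: list[str] = field(default_factory=list)
    status: str = "Open"


@dataclass
class NetworkCase:
    """Hypothetical parent case for a multi-customer event."""
    case_id: str
    children: list[CustomerCase] = field(default_factory=list)
    status: str = "Open"
    outbox: list[tuple[str, str]] = field(default_factory=list)  # (email, message)

    def add_public_comment(self, text: str) -> None:
        # Cascade the comment to every child and queue one email per customer.
        for child in self.children:
            child.comments.append(text)
            self.outbox.append((child.customer_email, f"[{child.case_id}] {text}"))

    def set_status(self, status: str) -> None:
        # Status entered on the parent (through closure) flows down to children.
        self.status = status
        for child in self.children:
            child.status = status
```

Keeping the cascade in the parent means technicians update one case while every affected customer still receives an update.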

(8)

Trouble Ticket Lifecycle – Phase 2 (Case Updates)

2. Update Ticket

Primary objectives for working case:

Resolve the issue as quickly as possible (i.e. work with a great sense of urgency)

Over-communicate with affected customers until issue fully resolved

Thoroughly document event and actions taken

Tactical Approach

Enter thorough, detailed case comments- include names, phone numbers, IP addresses, location details, equipment alarm logs, etc. The more detail, the better.

Document every action taken, every conversation held- “if it isn’t in the ticket, it didn’t happen”

Case comments should have automatic timestamps for reconstruction of event

Case status changes should automatically drive MTTR logs

> Case Created (starts MTTR clock)

> Repair in Process

> Technician dispatched

> Technician arrived

> Service restored (stops clock)

Enter “Public” comments as frequently as possible, never less than once per hour for long duration events

Escalate as needed

Engage higher level resources as needed and involve Tier III, Engineering, vendor resources as required- don’t get stuck

Update management on critical issues- don’t let management team be caught by surprise

When required resources are not reachable (e.g. field technicians), escalate up their management chain immediately- “once around and up”
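The status-driven MTTR clock can be sketched as follows, assuming each case keeps a log of (timestamp, status) pairs using the status names listed in this phase. Only "Case Created" and "Service Restored" start and stop the clock; the intermediate statuses are kept for reconstructing the event. Function names are hypothetical.

```python
from datetime import datetime
from statistics import mean

START, STOP = "Case Created", "Service Restored"  # status names from the list above


def time_to_restore_hours(status_log: list[tuple[datetime, str]]) -> float:
    """Hours from 'Case Created' to the first later 'Service Restored'."""
    log = sorted(status_log)  # order entries by timestamp
    start = next((ts for ts, status in log if status == START), None)
    if start is None:
        raise ValueError("status log has no 'Case Created' entry")
    stop = next((ts for ts, status in log if status == STOP and ts >= start), None)
    if stop is None:
        raise ValueError("case has not been restored yet")
    return (stop - start).total_seconds() / 3600


def mttr_hours(status_logs: list[list[tuple[datetime, str]]]) -> float:
    """Mean time to restore, in hours, across restored cases."""
    return mean(time_to_restore_hours(log) for log in status_logs)
```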

Outages will occur; acting with urgency and providing frequent updates to customers improves customer satisfaction and reduces attrition
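The "never less than once per hour" rule for public comments on long-duration events can be audited automatically. A minimal sketch, assuming public-comment timestamps are available per case; the function name and its return shape are hypothetical.

```python
from datetime import datetime, timedelta


def update_gaps(opened: datetime, now: datetime, public_comments: list[datetime],
                max_gap: timedelta = timedelta(hours=1)) -> list[tuple[datetime, datetime]]:
    """Return (start, end) intervals longer than max_gap with no public comment,
    measured from case opening up to `now` (or the restoration time)."""
    in_range = sorted(t for t in public_comments if opened <= t <= now)
    points = [opened] + in_range + [now]
    # Compare each consecutive pair of timestamps against the allowed gap.
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > max_gap]
```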

(9)

Trouble Ticket Lifecycle – Phase 3 (Case Closure)

3. Close Ticket

Primary Objectives for Closing Case:

Close-out communications with customer- “wrap it up”

Capture closure code details for subsequent reporting

Summarize the event in 2 to 3 sentences for future internal and external consumption

Communicate to the customer that the case is being closed

Summarize case details, provide preliminary RFO, let customer know that case is being closed or placed into monitor status

Create succinct, descriptive closing summary

“Customer reported DS-3 down hard, dispatched technician and isolated to failed DSX-3 module, replaced DSX-3 module to restore”

Capture closure codes with accurate detail

Level 1: Zayo owned equipment or fiber

Level 2: Equipment Failure

Level 3: Telect

Level 4: DSX-3 Module

Specific part number captured by equipment replacement request

Review MTTR logs for accuracy, correct if needed

Close case or set to monitor status with auto-close (i.e. try not to touch it again)
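A closure-code hierarchy can be enforced at closure time so that incomplete codes never reach reporting. In this sketch only the Level 1 to 4 path from the DSX-3 example above comes from the source; the "Fiber Cut" branch is a hypothetical placeholder, and the validation rule (at least four levels unless a branch ends earlier) is an assumption.

```python
# Illustrative closure-code tree. Only the Level 1-4 DSX-3 path is from the source;
# "Fiber Cut" is a hypothetical placeholder branch with no deeper levels.
CLOSURE_CODES: dict = {
    "Zayo owned equipment or fiber": {
        "Equipment Failure": {
            "Telect": {
                "DSX-3 Module": {},
            },
        },
        "Fiber Cut": {},
    },
}


def validate_closure_code(levels: list[str], tree: dict = CLOSURE_CODES,
                          min_depth: int = 4) -> bool:
    """True if `levels` is a valid path through `tree` that is either at least
    `min_depth` levels deep or ends on a leaf (a branch with no further levels)."""
    if not levels:
        return False
    node = tree
    for level in levels:
        if level not in node:
            return False  # unknown code at this level
        node = node[level]
    return len(levels) >= min_depth or not node
```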

Accurate case closure codes are required for reporting on drivers of trouble volumes and high TTR; customer-consumable closure summaries reduce

(10)

Trouble Ticket Lifecycle – Phase 4 (Analyze Ticket Data)

4. Analyze Ticket Data

Primary Objectives for Analyzing Trouble Ticket Data:

Determine most frequent causes of trouble tickets

Determine drivers of high TTR

Identify chronic issues (before the customer does)

Analyze trouble by closure codes

Create pareto charts to determine top drivers of trouble volumes

Determine fault frequency rate of equipment issues; expect <2.5% failures per annum

Review cases from different perspectives

> Troubles by vendor, equipment make/model, circuit pack (part number), software load

> Troubles by service type

> Troubles by site

> Troubles by legacy network

Identify and review chronic issues

Identify repeat/recurring troubles on specific circuits

Repeat events at site (high temp, low temp, power loss, card failures, circuit errors, etc.)- may be indicative of power/grounding/lightning/cabling issues

Specific routes subject to failure- fiber cuts, power outages, intermittent errors (e.g. PMD identification)

Analyze Root Cause of Faults and drivers of high TTR

Analyze events with high MTTR to determine drivers

> Regional, state, city (sparing, technician locations, tools, training, OSP repair processes and capabilities, local management)

> Equipment type (NOC technician training & capabilities, OSS systems, software issues, vendor support)

> OSP repair processes (cut isolation and repair approach, 3rd party performance, OSP restoration contractors and capabilities)

Customer responsible troubles

Drivers of customer responsible troubles

Specific circuits with high volumes of customer responsible issues

Specific customers with high volumes of customer responsible issues

Weekly, monthly, quarterly, and annual analyses provide different perspectives
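The Pareto analysis of closure codes comes down to counting, sorting, and accumulating shares. A minimal sketch, assuming closure codes are available as plain strings; the 80% cutoff is the conventional Pareto line rather than a value from the source, and the function name is hypothetical.

```python
from collections import Counter


def pareto(codes: list[str], cutoff: float = 0.8) -> list[tuple[str, int, float]]:
    """Rank closure codes by ticket count with cumulative share, stopping once
    the cumulative share reaches `cutoff`."""
    counts = Counter(codes).most_common()  # highest count first
    total = sum(n for _, n in counts)
    rows, running = [], 0
    for code, n in counts:
        running += n
        rows.append((code, n, running / total))
        if running / total >= cutoff:
            break
    return rows
```

Running the same function on the Level 1, Level 2, or Level 3 code gives the different views (cause, failure type, manufacturer) described on the next slide.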

Invest the time required to thoroughly analyze trouble ticket metrics to determine root cause drivers of outages and high MTTR
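Repeat or recurring troubles on a specific circuit can be flagged with a rolling-window count. The threshold (3 tickets) and window (30 days) below are illustrative assumptions, as is the function name.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def chronic_circuits(tickets: list[tuple[str, datetime]], threshold: int = 3,
                     window: timedelta = timedelta(days=30)) -> set[str]:
    """Circuit IDs with at least `threshold` tickets inside any rolling `window`.

    Threshold and window are illustrative defaults, not values from the source.
    """
    by_circuit = defaultdict(list)
    for circuit_id, opened in tickets:
        by_circuit[circuit_id].append(opened)
    chronic = set()
    for circuit_id, times in by_circuit.items():
        times.sort()
        # Check every run of `threshold` consecutive tickets on this circuit.
        for i in range(len(times) - threshold + 1):
            if times[i + threshold - 1] - times[i] <= window:
                chronic.add(circuit_id)
                break
    return chronic
```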

(11)

Pareto Chart Analysis of Equipment Failures

Top 3 levels of closure codes provide a view down to the equipment manufacturer

In this example, the fault frequency rate of Force10 equipment was determined to be >8% across ~500 network elements (vs. several thousand Accedian and Westell devices)

Data was used to create a business case for removal of the equipment as part of network modernization; this resulted in significant improvement in trouble ticket volumes
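Fault frequency rate is failures per year divided by the installed base; assuming the rate above is annual, >8% across ~500 elements means more than about 40 failures per year. A minimal sketch that flags vendors above the <2.5% per annum expectation stated earlier; function names are hypothetical and the input figures in the check are made up for illustration.

```python
def fault_frequency_rate(failures: int, element_count: int, years: float = 1.0) -> float:
    """Annualized equipment failures as a fraction of the installed base."""
    if element_count <= 0 or years <= 0:
        raise ValueError("element_count and years must be positive")
    return failures / (element_count * years)


def flag_vendors(stats: dict[str, tuple[int, int]], years: float = 1.0,
                 threshold: float = 0.025) -> dict[str, float]:
    """Map vendor -> rate for vendors whose annual fault frequency rate exceeds
    `threshold`. `stats` maps vendor -> (failures, element_count)."""
    rates = {v: fault_frequency_rate(f, n, years) for v, (f, n) in stats.items()}
    return {v: r for v, r in rates.items() if r > threshold}
```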

(12)

Trouble Ticket Lifecycle – Phase 5 (Improvement Activity)

5. Act on Trouble Ticket Data Analysis

Primary Objectives:

Reduce trouble ticket volumes

Reduce mean time to restore

Identify Opportunities to eliminate outages and reduce MTTR

Determine actions that can be taken to eliminate outages (Software upgrades, equipment replacements, process improvements, training, systems, power audits, etc.)

Engage technology vendors and demand product improvement as appropriate (e.g. >2.5% annual fault frequency rate); don’t accept subpar technologies

Hold Type II providers to high standards; report to them on their performance and request corrective action plans as appropriate. Ensure that vendor performance influences buying decisions

Identify potential to reduce MTTR (troubleshooting processes & training, field technician locations, tools & equipment, sparing, restoration and power contractors, etc.)

If it is worth doing, put it on an Action Item register and commit to completing it; create impactful corrective actions and assign an individual who is accountable for each action item, with a due date. Don’t create trivial corrective actions, as this diminishes the importance of urgent action items

Have a system for following each action item to completion; for larger organizations consider dedicating an employee just to this function- it’s that important

Determine methodology to reduce customer responsible troubles

Inform customers of potential chronic issues on their side; suggest potential improvement initiatives, noting that some customers may not have the ability to identify chronic issues or capability to reduce issues

Enable customer self-service where possible (DNS updates, routing updates, equipment PMs, circuit status, etc.)- pursue advanced portable capabilities

Bill customers for repetitive abuse of the system- i.e. using service provider to troubleshoot customer equipment or isolate among multiple providers

In rare cases, consider “firing your customer”
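An Action Item register needs only a few fields to support the accountability rules above: a description, one accountable owner, a due date, and a completion date. A minimal sketch with hypothetical names that rejects items without an owner and lists overdue items for follow-up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    description: str
    owner: str                       # one accountable individual
    due: date
    completed: Optional[date] = None

    def __post_init__(self) -> None:
        # Every action item must have an accountable owner.
        if not self.owner.strip():
            raise ValueError("action item needs an accountable owner")


def overdue(register: list[ActionItem], today: date) -> list[ActionItem]:
    """Open items past their due date, earliest due date first."""
    late = [a for a in register if a.completed is None and a.due < today]
    return sorted(late, key=lambda a: a.due)
```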

Acting on the trouble ticket data dramatically improves network

performance & customer service while reducing operational costs

(13)

Trouble Ticket Lifecycle – General Commentary

Always attempt to contact the customer before they contact you

Proactively notify of outages

Over-communicate with frequent case updates; don’t make customer ask

Internal escalation- if appropriate, escalate to management before customer escalates and have upper management engage

Attempt to interact with customers in the method(s) that they prefer

Carrier customers prefer to interact via phone

Enterprise customers (particularly IP) generally prefer to interact via email or portal

Enable self-service; allow customers to service themselves for non-outage requests

DNS Updates

Routing Updates

Bandwidth utilization

Contact updates

Build a NOC model focused on continuous improvement

Continuous reduction in fault frequency rate and trouble ticket volumes

Improvement in MTTR until consistently meeting target objectives

Pursuit of these initiatives results in delivering on the most critical NOC objectives:

Delivering the best in customer service

(14)
(15)

Greg Hadlock

303-731-6662

[email protected]

Thank You.
