Utilizing Equipment Data for Proactive Asset Management

Mark Williams, Atmos Energy; Russell W. Burch, Atmos Energy; David Krenek, Caterpillar, Inc. – Global Petroleum Group

Gas Machinery Conference – October 2012 – Austin, Texas

Executive Summary

Starting with the Clean Air Act of 1970, a number of dynamics in the natural gas compression industry have posed challenges to operators of natural gas compression equipment, from both a technical and a business perspective. One has been the development and use of higher speed, higher output natural gas engines to meet exhaust emission regulations as they have become more stringent. Another has been the demand for higher power density equipment. The technologies and controls needed to meet these requirements have resulted in engines that are more complex than those used in the past. Business pressures to improve reliability and reduce costs have increased the challenge faced by operations and maintenance organizations as they acquire and maintain engines with the newer technology. At the same time that organizations are trying to adapt to these technology changes, many are also trying to control costs by reducing manpower at compressor sites and are trying to react to a changing workforce. One way that Atmos Energy has chosen to address these challenges is to employ a service that leverages new technology to assist in the proactive management of its equipment.

Background

Atmos Energy Corporation, headquartered in Dallas, Texas, is one of the country's largest natural‐gas‐only distributors, serving about three million natural gas distribution customers in over 1,400 communities in nine states from the Blue Ridge Mountains in the East to the Rocky Mountains in the West. Atmos Energy also provides natural gas marketing and procurement services to industrial, commercial and municipal customers, primarily in the Midwest and Southeast, and manages company‐owned natural gas pipeline and storage assets, including one of the largest intrastate natural gas pipeline systems in Texas (Figure 1).

Atmos Energy operates a mixed fleet of natural gas compressors in its Atmos Pipeline – Texas Division totaling 98,700 hp at 5 storage facilities and 11 mainline compressor stations. The mix consists of:

• Natural Gas Engine Driven High‐Speed, Separable Compressors – 44,980 hp
• Gas Turbines – 41,400 hp
• Integrals – 12,400 hp

One of the compressor stations discussed in this paper is located northwest of Fort Worth, Texas. The other site is located near Ennis, Texas. Both sites have two (2) Ariel JGC‐6, single‐stage compressors driven by Caterpillar G3612 engines (Figure 2).

Atmos decided to try the service at the first site because of new contractual requirements that changed the operating mode of the station. Under the new contract, both units were required to operate 24 hours a day, 7 days a week. The compressor units are configured for remote starting and stopping by Gas Control to help in the management of system demand – i.e., pressure and flow control. Each unit has 9 load steps that utilize both head end cylinder deactivation and fixed clearance pockets to regulate flow based upon set point. The station is manned 5 days a week during normal business hours. It is unmanned at night and on the weekends – it is this gap in operational surveillance where the monitoring service has been the most helpful.

Figure 1 ‐ Atmos Pipeline‐Texas System Map

The station operator has a very good working knowledge of the engine and compressor and possesses excellent diagnostic skills. One aspect of the service is that the operator meets via teleconference every other Tuesday with the service provider’s Condition Monitoring team. The purpose of the meeting is to review the flagged items that the team has observed, items that the operator has observed and any corrective actions that the operator has taken. Topics of the meetings include shutdowns that may have occurred (both scheduled and unscheduled), vibration issues, engine operational parameters such as ignition secondary voltages, and any event or diagnostic codes generated by the engine. This process is very helpful in identifying potential problems before they cause a significant failure.

The monitoring system’s trending capabilities enable the operator to go back and look at the performance of the engine at any given time for any area of concern or recurring problem. Within the trending options the operator can spot a problem area and troubleshoot it. This gives the operator a chance to compare the engine control system’s desired and actual parameters. For example, the operator can review the combustion timing trend and the secondary voltage trend and observe what the engine’s performance has been for any given period of time. This allows him to determine which cylinder or cylinders are causing the engine to run unstably or misfire. This data makes daily operation of the unit much smoother and enhances the operator’s ability to improve equipment reliability. Another benefit of the system is that the Condition Monitoring team is able to observe how the engine controls (such as the waste gate, fuel and choke systems) are performing during failed start attempts. Examples are elaborated in the following sections.

Figure 2 ‐ Ariel JGC‐6 / Caterpillar G3612 Compressor Package

Electronic Data Condition Monitoring Service

Overview

The Condition Monitoring service was developed to address challenges that many operators of natural gas compression equipment were facing. These include

1. Increasing pressure to improve reliability and at the same time decrease operating and maintenance costs.

2. Managing business with a changing workforce – primarily with the loss of talent from retirees.

3. Increasingly more complex natural gas engines.

4. Trying to maintain reliability of operations with either reduced attendance or unattended compressor stations.

5. Operating equipment in areas of gas production where no previous infrastructure or skill base exists.

In order to meet the higher power density requirements and lower exhaust emission requirements, natural gas engines have become increasingly more complex. The complexity results from the controls and the corresponding sensors necessary to more precisely manage combustion. One of the benefits of this increased number of sensors on current production engines is that there is more data available to provide insight into the health of the engine. Unfortunately, the volume of data and its complexity make it impractical to analyze the data at the site. Several factors contribute to this:

1. Operators (if there is one at the site) do not have time to review the data.

2. There are very few one‐to‐one cause‐effect relationships between parameters. Relationships are typically multidimensional.

3. Dedicated reliability engineers or analysts are not located at the site.

The lack of available site resources was a key factor in designing the service to be based upon shipping the data offsite for analysis. This methodology provides the additional advantage of a central location from which analysis results can be distributed to experts for interpretation.

The primary objective of the service is to provide early warning of pending component failures in order to reduce the number of unscheduled outages. Unscheduled outages can result from a variety of failure types:

1. Random early hour failure of new components ‐ generally referred to as “Infant Mortality”.

2. Random failure of components during their useful life – generally characterized as a machine’s “Reliability”.

3. Random failure of components as they reach the end of their useful life under normal conditions.

4. Early end of life of components induced by operational and/or maintenance practices.

Key Processes

There are 5 key steps in turning data into corrective action (see Figure 3):

1. Aggregating Data ‐ Data from the various assets at the site must be aggregated to a central jump‐off point where the data can receive a common timestamp. Sometimes this poses technology challenges because disparate devices using proprietary protocols are in use at the site.

2. Transmitting Data – The most practical solution available for the site and data rates must be chosen for transporting the data off of the site. Multiple technologies exist to accommodate this.

3. Finding Exceptions – The first significant technological challenge in the process is finding those pieces of data that are indicative of component degradation while requiring only minimal effort from highly skilled analysts.

4. Determining Root Causes – Once exceptions in data are found, further analysis is required to determine the component and associated failure mode that is driving the behavior observed in the data. This is more difficult when multiple components are influencing the same parameter.

5. Developing Corrective Action – Finally, a timely course of action must be developed to address the potential failure before it causes an unscheduled outage.

Figure 4 – Condition Monitoring Service Model

Figure 4 illustrates the overall process of performing the Condition Monitoring Service that incorporates these 5 key processes. The Condition Monitoring team that executes the service consists of:

1. Condition Monitoring Analyst

2. Performance Engineer

3. Application Specialist

1. Aggregating Data

The first step in the overall process is to collect the electronic data at the site. The data can come from the engine, the driven equipment and any ancillary equipment. OSIsoft’s PI System is used for the service’s data infrastructure – to provide data aggregation, data archiving, analytics and visualization. Multiple interfaces exist to allow data collection from a wide variety of industrial control systems, DCSs and SCADA systems. The data is collected from the various I/O devices and controllers at the site utilizing the site process control network. The interface is Windows‐based and requires a Windows workstation. Data is sampled once every second in order to have sufficiently high fidelity data to capture transients and short duration, intermittent events that are precursors to some failure modes. For the engine, the data set includes analog values measured by on‐engine sensors (such as pressures), values derived by the engine controller (such as load), digital states and diagnostic codes.
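As a rough illustration of what giving data from disparate devices a common timestamp means at a one‐second sample rate, the following minimal Python sketch groups readings from several sources into one row per second. It is not the PI System interface; the function name, the tuple layout and the tag names are assumptions made only for this example.

```python
from collections import defaultdict


def aggregate_to_common_timestamp(readings):
    """Group readings from several devices into rows keyed by whole seconds.

    `readings` is an iterable of (source, tag, epoch_seconds, value) tuples.
    All names are hypothetical; this only illustrates the idea of a common
    1-second timestamp and is not how the PI System works internally.
    """
    rows = defaultdict(dict)
    for source, tag, epoch_seconds, value in readings:
        second = int(epoch_seconds)  # truncate to the 1-second bucket
        # If a tag appears twice within one second, the reading that comes
        # later in the input overwrites the earlier one.
        rows[second][f"{source}.{tag}"] = value
    return dict(sorted(rows.items()))
```

For example, an engine load reading at t = 100.2 s and a compressor discharge temperature at t = 100.7 s would land in the same row for second 100.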

Site Implementation Considerations

Define the Scope of Work

To ensure project success when implementing a condition monitoring solution, develop a scope of work early in the project and define which party is responsible for the overall implementation of the work to be done. Generally, there are three parties working simultaneously to design and install a condition monitoring system project:

1. The company providing the monitoring system and service.

2. The integration contractor.

3. The operations and/or maintenance group that owns the equipment being monitored.

Choose a reputable condition monitoring service provider that will specify and supply all equipment and software needed to manage the data, that can provide engineering, testing and commissioning, and that will support the equipment should it fail. It is important to define and document in the contract who owns the equipment supplied by the service provider once it has been installed on site. Equipment commonly provided by the condition monitoring service provider can include items such as a desktop PC, router, modem, cell booster and antennae, and cables.

The integration contractor should be responsible for moving the data from the unit level into registers that the monitoring company’s system can access, and should also verify that the data is still available for use by SCADA and other host systems. The integration contractor may also provide additional oversight during the design and installation phases of the project.

Personnel responsible for the daily operation and maintenance of the monitored equipment should become familiar with the system layout and design during the installation and commissioning phases. They should also be responsive to corrective measures identified by the monitoring service provider once the system “goes live” so that the equipment is maintained properly and functions reliably.

Network/Data Security

Consult with your IT Network Security Group to ensure that the solution will be consistent with data privacy, data security and network security policies. It is essential to do this early in the project as it may affect how the solution is designed. While most IT groups are very knowledgeable in PC data and network management, some lack experience with the security risks associated with control system networks. This can delay project completion and require supplemental hardware to mitigate those risks.

Capital and O&M Costs

The equipment, installation and integration effort portion of this type of project can be charged as a capital expense. Upfront capital costs can vary depending on available data bandwidth on the existing network and the type of control system in use. A robust communication network is preferred for this type of system since all data points associated with the engine(s) and compressor(s) will be duplicated into the condition monitoring service provider’s interface. The monthly monitoring service, analysis and reporting costs would be included in the monitored company’s operations or maintenance budget. One consideration in choosing a monitoring service is whether the cost is based upon a flat monthly rate as opposed to equipment operating hours. For instance, storage and injection units generally operate less frequently than mainline transmission units. Some monitoring companies monitor only the driven equipment data, while others monitor both the driver and driven equipment data, which can result in faster response in capturing and correcting issues associated with the driven equipment.

Figure 5 ‐ Network Topology for Station 1

Figure 5 illustrates the network topology at Station 1. Each Unit Control Panel (UCP) has an Allen‐Bradley CompactLogix 5332C PLC as the processor. The engine control network utilizes a proprietary protocol referred to as Cat Data Link (CDL).

Figure 6 ‐ Station 1 UCP before Modification

If engine data is desired in the UCP, an off‐boarding device is required to extract the data from the CDL and convert it to a non‐proprietary protocol. In the original configuration of the UCP at Station 1, Caterpillar’s legacy Customer Communications Module (CCM) was the off‐boarding device (Figure 6). The M5X protocol output of the CCM was converted to Modbus 232 using Monico’s legacy CCM Translator and routed to a ProSoft MVI56‐MCM Communication Module. The original installation was not configured to extract all of the engine data that the Condition Monitoring service utilizes. To avoid the complications of working with legacy devices, the CCM/CCM Translator pair of devices was replaced with a Caterpillar PL1000E translator configured for Modbus 232 (see Figure 7).

The site utilizes Allen‐Bradley’s ControlNet communications platform for the process control network. Each UCP is connected via ControlNet to the Station PLC, which is an Allen‐Bradley ControlLogix 5561 PLC. The operator HMI workstation is on the same ControlNet network and runs Allen‐Bradley’s RSLinx.

Data is exposed to the PI System by RSLinx acting as an OPC Server via an OPC interface installed on the HMI workstation. The interface is the single point collector for engine and compressor data from both units. The hardware and installation requirements can vary by site depending upon the type of unit and station control panels and network architecture. For the second site at which the Condition Monitoring service was implemented, the interface was installed on a separate PC, and Monico’s CDL Gateway was installed as the engine off‐boarding device. The topology for this site is illustrated in Figure 8.

Site Systems Integration Considerations

Data Interface

Since there are many different manufacturers of control systems, how the data is collected and stored in registers can vary widely. At the first site the PL1000E communication interface was used.

At the second site Monico’s CDL Gateway (Figure 9) was utilized to move the data into the proper registers. Either device can be configured to accept and translate signals from the data link over different communication protocols. Both devices can be used with RS232, RS422, RS485 (2 or 4 wire) or Modbus TCP. The CDL Gateway was chosen for the second site for several reasons:

1. It communicates natively with the Allen‐Bradley PLC via Ethernet. This eliminates the extra layer of Modbus translation, which saves on installation cost, improves system reliability and reduces integration labor associated with data mapping.

2. It delivers data already scaled.

3. It has a smaller footprint and can be mounted with standard industry hardware.

4. It doesn’t require special service connectors ‐ just a standard USB cable.

5. It is furnished with multiple status indicators, which simplifies troubleshooting.

Figure 9 ‐ Station 2 UCP After Modification (labeled components: Unit PLC, Monico CDL Gateway)

Data Management (Mapping)

Developing a “Modbus Map” (see Figure 10) early in the project can be essential to the project outcome and overall timeline. The Modbus map should identify all of the data point addresses that are accessed and shared with the condition monitoring system. It also ensures that all data points use consistent scaling and engineering units.
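To make the idea of a Modbus map concrete, here is a minimal Python sketch of one possible representation: a table of register address, tag name, scale factor and engineering units, with a check for duplicate entries and a helper that applies the scaling. The register addresses, tag names and scale factors are invented for illustration and do not come from either site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    register: int   # holding register address (hypothetical)
    tag: str        # name shared by the PLC and the monitoring system
    scale: float    # multiply the raw register value by this factor
    units: str      # engineering units after scaling


# Hypothetical entries; a real map would list every shared data point.
MODBUS_MAP = [
    MapEntry(40001, "engine_speed", 1.0, "rpm"),
    MapEntry(40002, "engine_load", 0.1, "%"),
    MapEntry(40003, "oil_filter_dp", 0.1, "psi"),
]


def validate_map(entries):
    """Raise ValueError if a register address or tag name is used twice."""
    registers = [e.register for e in entries]
    tags = [e.tag for e in entries]
    if len(set(registers)) != len(registers):
        raise ValueError("duplicate register address in map")
    if len(set(tags)) != len(tags):
        raise ValueError("duplicate tag name in map")


def decode(entries, raw_registers):
    """Convert raw register values to (scaled value, units) per tag."""
    return {e.tag: (raw_registers[e.register] * e.scale, e.units)
            for e in entries}
```

Keeping the scale and units next to each address is one simple way to ensure that every consumer of the data interprets a register the same way.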

Figure 10 ‐ Data Mapping

Transmitting Data

From the site aggregation point, data is transmitted to the central offsite server. A variety of communication solutions can be utilized for transmitting the data depending upon several factors:

1. Availability of any given communications technology at the site.

2. Cost of any available communications technology at the site.

3. Network security requirements imposed on the local process control network.

Communication methods that can be used include:

1. Cellular

2. Satellite

3. DSL or other land‐based solution

4. A co‐managed network connection through the operator’s Wide Area Network

5. A co‐managed network connection between the operator’s central server and the service provider’s server.

For the subject sites, data is transmitted via cell network using a Sierra Wireless RavenX modem. Data and network security is provided by a router that establishes an IPSec/GRE encrypted VPN tunnel between the interface and the server. Figure 11 shows the equipment used for data aggregation and transmission at Station 1.

Finding Exceptions

After the data is received by the central server, the server sends a snapshot of the equipment analog values once every 30 minutes to a data analysis application to identify exceptions (or changes) in the data. The service utilizes SmartSignal’s EPICenter analytics solution for this effort. EPICenter utilizes non‐parametric, multivariate data analysis to find anomalies in the data. Traditional single‐channel alarms utilize a fixed setpoint for determining when a parameter has exceeded acceptable limits. Because of this, the setpoint has to be set outside the normal operating range of the measured process in order to prevent spurious notifications. The technique used with the Condition Monitoring service differs from traditional single‐channel alarms in that dynamic or “floating” setpoints are used to set the limits for parameters as they vary within the normal operating range. This allows for the identification of parameter deviations before the parameter reaches an excessively high or low value – thus providing an earlier warning.

Figure 11 ‐ Interface and Communication Arrangement (labeled components: Windows PC HMI Workstation, VPN Router, Cell Modem)

The software creates the floating setpoints for each parameter by estimating what the expected value of a given parameter is in any given snapshot of data. It creates the estimate based upon the value of all of the parameters in the current snapshot of the analysis model and based upon the values of the same parameters in a historical reference data set. The reference data set is unique to the asset and is created from data captured during the first 30 days of data feed.
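The analytics vendor's algorithm is not described in this paper, so the following Python sketch should be read only as a generic illustration of how an expected value can be estimated from a reference data set. It uses a simple kernel‐weighted average (reference snapshots that resemble the current snapshot receive more weight), which is one common non‐parametric technique and is an assumption here, not the method used by the service. The function names and the `bandwidth` parameter are hypothetical.

```python
import math


def estimate_snapshot(current, reference, bandwidth=1.0):
    """Estimate expected parameter values for one snapshot.

    `current` is a list of parameter values; `reference` is a list of
    historical snapshots with the same parameter order. Parameters should
    be normalized to comparable ranges before calling (not shown here).
    """
    weights = []
    for ref in reference:
        dist_sq = sum((c - r) ** 2 for c, r in zip(current, ref))
        # Gaussian kernel: similar reference snapshots get larger weights.
        weights.append(math.exp(-dist_sq / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("snapshot is too far from all reference data")
    return [
        sum(w * ref[i] for w, ref in zip(weights, reference)) / total
        for i in range(len(current))
    ]


def residuals(current, estimate):
    """Actual minus estimated value for each parameter."""
    return [c - e for c, e in zip(current, estimate)]
```

The residual for each parameter plays the role of the distance from the floating setpoint: it stays near zero while the parameter moves with the rest of the machine and grows when one parameter departs from its usual relationship to the others.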

When a persistent deviation occurs between the actual value and the estimated value of a parameter, the system creates an incident and posts it to a dashboard. The threshold that determines when an exception is created varies from parameter to parameter and is tuned by the Performance Engineer to balance false positives against missed detections.
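The paper does not define "persistent" numerically. As a minimal sketch, assuming persistence means that the deviation exceeds the threshold for a fixed number of consecutive snapshots, the rule could look like the following. The function name and both parameters are hypothetical.

```python
def find_incidents(residuals, threshold, persistence):
    """Return the start index of each run of consecutive snapshots whose
    absolute residual exceeds `threshold` for at least `persistence`
    snapshots. Each qualifying run is reported once.
    """
    incidents = []
    run_start = None
    for i, r in enumerate(residuals):
        if abs(r) > threshold:
            if run_start is None:
                run_start = i
            # Report the run exactly when it first reaches the required length.
            if i - run_start + 1 == persistence:
                incidents.append(run_start)
        else:
            run_start = None
    return incidents
```

Raising `threshold` or `persistence` reduces false positives at the cost of later or missed detections, which is the trade‐off the Performance Engineer tunes per parameter.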

Determining Root Cause

Once each day, the Performance Engineer reviews the dashboard for each asset to see if any new exceptions have occurred during the previous 24‐hour period. If there are new exceptions, the next step depends upon the nature of the exception, which falls into one of three categories:

1. The exception is a suspected false positive – commonly created by changes in equipment operation.

2. The exception and corresponding data represent a well‐defined failure mode.

3. The exception and corresponding data represent a less‐defined failure mode.

Once each week, the Condition Monitoring team meets with the asset operator via teleconference. In the case of a suspected false positive, the exception is reviewed by the Condition Monitoring Analyst and the operator to determine whether a change in operating conditions caused the false positive – such as a change in suction and/or discharge pressures of the compressor or in compressor cylinder loading. If there are new operating conditions, then the Performance Engineer trains the new data into the reference data set to prevent future false positives.

For those cases where the failure mode is well defined, notification will take one of two paths depending upon the anticipated time to failure. For those failure modes with a window to failure of two weeks or more, the Condition Monitoring Analyst or Performance Engineer advises the operator during the normal, weekly scheduled update meeting. For shorter windows, an email is sent to the operator.
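The two notification paths amount to a simple rule on the anticipated time to failure. A minimal sketch, assuming the window is expressed in days and that "2 plus weeks" means 14 days or more (the function name and return strings are hypothetical):

```python
def notification_path(days_to_failure):
    """Choose how the operator is advised of a well-defined failure mode."""
    if days_to_failure < 0:
        raise ValueError("days_to_failure must be non-negative")
    # Two weeks or more: wait for the regular update meeting.
    if days_to_failure >= 14:
        return "scheduled meeting"
    # Shorter window: notify the operator directly.
    return "email"
```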

For those cases where the failure mode is less defined or new, the Condition Monitoring Analyst and/or Performance Engineer will take additional steps depending upon the circumstances. These include:

1. Obtaining assistance from an Application Specialist

2. Obtaining assistance from Factory Experts

3. Requesting the operator to gather additional site data

4. Asking the operator if any operational changes were made and/or if maintenance activities had been performed.

During this phase of the process, the team may face the challenge of discerning the effects of concurrent component failures as they drive a change in common parameters ‐ either in the same direction or in opposite directions. For example, exhaust port temperature can change with air/fuel ratio, power, speed, air inlet temperature, misfire and ignition timing. Therefore, either a failing spark plug or a failing pre‐chamber check valve can be the root cause of a temperature deviation. Narrowing down the potential root cause may involve drilling down into the data, obtaining deep expertise from Subject Matter Experts, interviewing the operator on recent changes or requesting the operator to perform simple follow‐up measurements.

Developing Corrective Action

After potential root causes have been identified for the exceptions in data, the next process is to develop and report the corrective action. This process is a collaborative effort involving the Condition Monitoring Analyst, Performance Engineer, Application Specialist and the asset owner’s operations and maintenance staff. While there may be little variation in the scope of the corrective work, the scheduling can change depending upon:

1. Production requirements

2. Current outage schedule based upon planned maintenance or production

3. Estimated time to failure

4. Severity of failure consequences

This planning is one of the key topics of the bi‐weekly meeting of the Condition Monitoring Team. Other topics of this meeting include review of existing outstanding actions and validating that previously performed corrective actions have resolved the underlying root cause.

Reporting

Another feature of the service is a web application that provides visibility of asset health and performance to the various stakeholders in the value chain. One of the views available is real time data of the monitored parameters for stakeholders to observe current health. There are also standard trends for monitoring changes in operation as well as the effects of corrective action. Figure 12 is an example of a high‐level site view showing current status, runtimes and speed and torque trends. Figure 13 is an example of a detailed package‐level view. Three histograms show utilization of the asset by speed, torque (as measured by the engine) and power, both over the lifetime of the available data and over the previous 28 days of operation (Figure 14).
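A utilization histogram of this kind can be sketched in a few lines of Python. The version below is only an illustration under stated assumptions: samples are evenly spaced in time (so the share of samples approximates the share of operating time), timestamps are expressed in days, and the function and parameter names are hypothetical rather than part of the web application.

```python
from collections import Counter


def utilization_histogram(samples, bin_width, since=None):
    """Fraction of samples falling in each bin of width `bin_width`.

    `samples` is an iterable of (time_in_days, value) pairs. If `since`
    is given, only samples at or after that time are counted, e.g.
    since = latest_time - 28 for the previous 28 days.
    Returns {bin_lower_edge: fraction}, ordered by bin edge.
    """
    values = [v for t, v in samples if since is None or t >= since]
    if not values:
        return {}
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return {edge: counts[edge] / len(values) for edge in sorted(counts)}
```

Calling the function once with `since=None` and once with a 28‐day cutoff yields the lifetime and recent views side by side.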

Figure 12 ‐ Site View

Figure 14 ‐ Histogram Example

Examples of Proactive Measures

Pre‐chamber Port Erosion

Figure 15 illustrates a series of exceptions reported by the data analytics software over a period of 6 months. This example is from an engine that was not part of this project. It is included here to show how slow‐developing events are tracked, because top end overhauls were performed on the subject engines shortly after the Condition Monitoring service started. The plot is for filtered combustion time on a cylinder of a G3600 natural gas engine. The blue trace is the actual measured combustion time. The green trace is the estimated value from the analysis application. During the period there were numerous incidents reported: in some the actual value was higher than estimated, and in others it was lower. During the entire period the cylinder performed within acceptable limits. With the approach of an out‐of‐service preventive maintenance operation, the Condition Monitoring Team recommended that the operator borescope the pre‐chamber of this cylinder during the planned outage. Port erosion or a pre‐chamber coolant leak was suspected. Inspection confirmed that the ports had eroded, and the pre‐chamber was replaced during the outage. After replacement, the pattern returned to normal. Early warning of pre‐chamber erosion allows the operator to replace the pre‐chamber during a scheduled outage instead of in a possible emergency later, when the engine needs to be tuned to maintain emissions compliance or is shutting down on detonation of the other cylinders because of excessive misfire on the subject cylinder.

Figure 15 ‐ Combustion Time Exceptions

Nevertheless, with only 2 months of data on Unit #2 at Station 1, five cylinders were identified as potential suspects for port erosion as the top end overhaul approached. Four were replaced during the overhaul. Figure 16 shows the variation in the combustion time of one of the suspected cylinders, Cylinder #6 (cyan trace), prior to the top end overhaul.

Engine Lube Oil Filter – Scheduling Replacement During Routine Maintenance

Prior to the scheduled change of the engine lube oil filters, data analytics reported an incident of increasing engine lube oil filter differential pressure on Unit #2 (see Figure 17). The incident was reported in the regularly scheduled weekly meeting. While the value at the time of reporting was not at the engine alarm (15 psi) or shutdown (45 psi) point, the team wanted to confirm that no abnormal condition was driving the increase in pressure drop. The latest lube oil analysis report was consulted and was found to be normal. The operator reported no other unusual behavior of the engine. The incident was put on watch to ensure that a shutdown condition would not occur before the next scheduled change. On 16 August the operator took the opportunity of a shutdown for routine compressor maintenance and replaced the oil filters to avoid shutting the engine down later for the replacement. The gradual and then accelerated rise of oil filter pressure differential, as well as the return to normal pressure, can be seen in Figure 18. The period of time is 274 days.
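Putting an incident "on watch" implies some estimate of how long the trend can continue before a limit is reached. The paper does not say how the team made that judgment; the Python sketch below shows one simple possibility, a least‐squares straight‐line fit extrapolated to a threshold such as the 15 psi alarm point. The function name is hypothetical, and because the rise described above accelerated over time, a straight‐line fit would tend to overestimate the remaining time and should be treated only as a rough first look.

```python
def days_until_threshold(days, values, threshold):
    """Extrapolate a linear trend to estimate days until `threshold`.

    `days` and `values` are equal-length sequences of sample times (in
    days) and readings. Returns the estimated number of days after the
    last sample, 0.0 if the fitted line is already at or above the
    threshold, or None if the fitted trend is flat or falling.
    """
    n = len(days)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(days) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("sample times must not all be identical")
    slope = sum((t - mean_t) * (v - mean_v)
                for t, v in zip(days, values)) / sxx
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])
```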

Figure 16 ‐ Unit 2, Cylinder #6 Filtered Combustion Time

Figure 17 ‐ Engine Oil Filter Differential Incident

Figure 18 ‐ Trend of Oil Filter Differential Pressure (plot annotations: Oil Filter Differential Pressure, Incident Reported, Filter Changed)

Compressor Performance

The technology used in the service works for reciprocating gas compressors as well as engines. Figure 19 is an example of the data analysis for the compressor as load steps are changed. When the outer end of the compressor cylinder is unloaded with suction valve unloaders, the discharge temperature of the crank side increases due to the pre‐heating of the suction gas as it is moved in and out of the de‐activated end. By including the load step in the analysis and training in the data, a false positive in the analysis is prevented. Note that incidents are being created on cylinder #1 discharge temperature but not on #3 and #5 as the load step changes between Load Step 3 and Load Step 4.

Figure 19 ‐ Compressor Pressure and Temperature Changes with Load Steps (plot traces: Load Step, Cylinder #1 Discharge Temperature, Discharge Pressure)

Spark Plug Failure

Figure 20 is an example of a catch involving a failing J‐type spark plug. The failure mode involved the precious metal tip becoming delaminated from the center electrode. As the tip is pushed away from the base material it reduces the gap with the side electrode. This failure was identified first by an incident with decreasing secondary transformer voltage. Drilling down into the data revealed an approximately 1‐hour long event in which misfire gradually increased, peaked and returned to normal. During this episode, the engine did not shut down. It is typical in this failure mode for the first occurrence to result in a short duration misfire episode. However, as the degradation progresses, the cylinder will eventually go into continuous misfire as the precious metal tip grounds the center and side electrodes. Loss of a power cylinder while the engine is near rated torque generally results in a detonation shutdown by one of the other cylinders as load is increased on the remaining cylinders in order to maintain speed. Time to failure from the first incident can be a few days to a couple of weeks. With this type of failure mode, the operator is typically advised by email at the time of the first event. For this particular case the operator chose to shut the engine down on the day before a 4‐day holiday weekend, to minimize the risk of a callout during the long weekend.

Sensor Failures

One of the benefits of higher frequency data sampling is the ability to capture intermittent, short duration events (10 to 30 seconds) that would otherwise go unnoticed by operators. Some failure modes manifest themselves in short duration events before giving a persistent indication. This is true of some sensor failures and of some mechanical failures such as “sticking” linkages. One of the engine control system’s functions is to monitor sensor and electrical system health. If voltages or currents are determined to be outside of normal ranges, “Diagnostic” warnings are issued. How the Condition Monitoring team reacts to these depends on the sensor, the criticality of the sensor and the observed failure mode. One example of such a failure is the turbine inlet temperature sensor, a thermocouple. Catching the short duration, intermittent diagnostics issued by the engine early in the sensor’s failure gives anywhere from 2 to 4 weeks of visibility of the sensor failure before it would be recognized by the operator. The recommendation given when these failures are observed depends upon the failure mode. The control system has both a high warning and a high shutdown setpoint for turbine inlet temperature. If the pending failure is resulting in a false low temperature being reported, the team recommends replacing the sensor at the next scheduled service, when the engine will be down. However, if the failure mode is resulting in a false high temperature being reported, the recommendation is to replace the sensor as soon as practical in order to avoid a spurious shutdown. Figure 21 illustrates such a case.
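The triage rule described above can be summarized as a small decision function. This is only a minimal sketch of the reasoning in the text, not the Condition Monitoring team's actual logic; the function name, the `bias` labels and the returned strings are hypothetical.

```python
def thermocouple_recommendation(bias: str) -> str:
    """Map a suspected turbine inlet thermocouple failure mode to an action.

    `bias` is a hypothetical label: "false_low" or "false_high".
    """
    if bias == "false_low":
        # A false low reading cannot cross the high warning or high shutdown
        # setpoints, so replacement can wait for planned downtime.
        return "replace at next scheduled service"
    if bias == "false_high":
        # A false high reading can cross the high shutdown setpoint and
        # cause a spurious shutdown, so act sooner.
        return "replace as soon as practical"
    raise ValueError(f"unknown failure mode: {bias!r}")


for mode in ("false_low", "false_high"):
    print(mode, "->", thermocouple_recommendation(mode))
```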

Pre‐Chamber Check Valve

The 2‐way dialog between the operator and the Condition Monitoring Analyst is essential in narrowing the potential root causes in situations where any one of several events can drive data exceptions, or where multiple events are driving the exceptions. Such was the case in one example that resulted in the replacement of a pre‐chamber check valve. In May 2012 the data analytics posted an incident for the combustion time of Cylinder #10 on Unit #1 running slower than normal (see Figure 22). It is not unusual for cylinder combustion times to drift either faster or slower as components wear or become fouled. The standard approach to such incidents is to monitor the combustion time for continued degradation, or for improvement when engine calibration is performed. In this particular case, the operator noted to the team that the combustion time did not respond to calibration. The Condition Monitoring team then recommended replacing the pre‐chamber check valve at the next scheduled outage, as secondary transformer voltage did not indicate an excessive gap or a spark plug fouling issue. Replacing the check valve during a scheduled outage prevented a future unscheduled outage that would have resulted from a dead cylinder as the check valve failure mode progressed. Figure 23 shows the change in misfire after the check valve replacement.

Figure 21 ‐ Sensor Failure (8‐hour plot, 08‐Apr‐10: turbine inlet temperature with a high temperature spike; turbine outlet temperature reading normal; intermittent, short duration diagnostic code 1491)

Figure 22 ‐ Filtered Combustion Time Increase

High Fidelity Data’s Role

As mentioned earlier, most parameters are sampled at 1 second intervals. Figure 24 and Figure 25 illustrate how faster sampling frequencies can give better insight into process behavior. What appear to be 3 separate processes are actually the same process, misrepresented by three slower data capture rates with slightly different sampling frequencies.

Figure 23 ‐ Pre‐chamber Check Valve Replacement

Figure 24 ‐ Trends for Apparent Process
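The sampling effect described above is ordinary aliasing, and it can be reproduced numerically. The sketch below is illustrative only: the 10 second oscillation period and the three logging intervals are assumed values, not taken from the figures. It uses the standard result that a sampled sinusoid appears at a frequency equal to the distance between its true frequency and the nearest integer multiple of the sampling frequency.

```python
import math


def apparent_period(true_period_s: float, sample_interval_s: float) -> float:
    """Period (s) that a sinusoid appears to have after uniform sampling."""
    f_true = 1.0 / true_period_s
    f_sample = 1.0 / sample_interval_s
    # Distance from the true frequency to the nearest multiple of f_sample.
    f_alias = abs(f_true - round(f_true / f_sample) * f_sample)
    return math.inf if f_alias == 0.0 else 1.0 / f_alias


TRUE_PERIOD_S = 10.0  # assumed process oscillation period (hypothetical)

for interval_s in (1.0, 9.5, 10.5, 11.0):  # assumed logging intervals
    period = apparent_period(TRUE_PERIOD_S, interval_s)
    print(f"sampled every {interval_s:4.1f} s -> apparent period {period:.0f} s")
```

Under these assumptions, 1 second sampling recovers the true 10 second period, while the three slower loggers report apparent periods of roughly 190, 210 and 110 seconds: one process that looks like three.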

Following is a case where high fidelity data allowed the capture of an early warning of a component issue and led to the narrowing of the scope of corrective action.

One piece of data used in the analysis was a series of 3 event codes issued by the engine management system over a period of one minute. Each of the codes was active for 10 to 20 seconds. The 3 codes were:

Timestamp            Code     Description
27‐Nov‐11 21:11:07   1045‐1   Low Intake Manifold Pressure
27‐Nov‐11 21:11:22   242‐1    Engine Overload
27‐Nov‐11 21:12:02   411‐1    Cylinder #11 Detonation Warning

The scope of potential root causes for these events was first narrowed down by overlaying them on the actual engine parameters at the time of the events (Figure 26). A 10‐minute view of the data (Figure 27) reveals that Gas Control had unloaded the compressor (yellow trace) to the minimum load step and was loading it up again. Just prior to the Low Intake Manifold Pressure Warning and the Engine Overload Warning, no further block loads were being applied and the engine was at 60% Indicated Load. Other traces showed steady suction and discharge pressures, thereby eliminating a process upset as the cause. Referring to Figure 28, the team could see that, during the loading of the compressor, the engine air manifold pressure was not increasing as the engine commanded the wastegate to close. The command went to full close with no increase in air pressure. Shortly after the command reached 0%, there was a sudden increase in air pressure (an apparent sudden closing of the wastegate), which caused the engine to go into lean misfire. The higher boost and fuel flow during this period caused the calculated load to exceed 110% even though the compressor load was still around 60%. During the subsequent control recovery the mixture went momentarily rich, causing Cylinder #11 to detonate.

Figure 25 ‐ Recorded Trend vs Actual Process

Figure 26 ‐ Event Codes Superimposed on Parameters (3‐minute view, 27‐Nov‐11 21:10:00 to 21:13:00. Traces: Engine Speed, Engine Load Factor, Compressor Load Step, Engine ECM Active Event CID Number 01, Desired Air Manifold Pressure, Actual Air Manifold Pressure, Wastegate Position Command)

Figure 27 ‐ 10 minute view of events (27‐Nov‐11 21:04:00 to 21:14:00, same traces as Figure 26)
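The overlay step used in this analysis, lining up each event code with the parameter samples recorded at the same moment, can be sketched as a nearest-timestamp lookup. This is an illustration under stated assumptions, not the service's implementation: the function names are hypothetical, and the 1-second "channel" below is a synthetic placeholder (a sample counter) rather than measured engine data. Only the event timestamps and codes come from the paper.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Event codes as listed in the paper.
EVENTS = [
    (datetime(2011, 11, 27, 21, 11, 7), "1045-1", "Low Intake Manifold Pressure"),
    (datetime(2011, 11, 27, 21, 11, 22), "242-1", "Engine Overload"),
    (datetime(2011, 11, 27, 21, 12, 2), "411-1", "Cylinder #11 Detonation Warning"),
]


def nearest_sample(times, t):
    """Index of the sample time closest to t (times must be sorted)."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1


def overlay(events, times, channels):
    """Attach the channel values nearest in time to each event."""
    rows = []
    for stamp, code, text in events:
        k = nearest_sample(times, stamp)
        row = {"time": stamp, "code": code, "description": text}
        row.update({name: values[k] for name, values in channels.items()})
        rows.append(row)
    return rows


# Synthetic 1-second placeholder series for 21:10:00 to 21:13:00.
start = datetime(2011, 11, 27, 21, 10, 0)
times = [start + timedelta(seconds=s) for s in range(181)]
channels = {"sample_index": list(range(181))}

for row in overlay(EVENTS, times, channels):
    print(row["time"].time(), row["code"], row["sample_index"])
```

With real data, `channels` would hold the recorded traces (for example desired and actual air manifold pressure), so that each event row shows the engine state at the moment the code was issued.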

Even though there are a number of root causes that could generate the event codes listed, the synchronized higher frequency data allowed it to be narrowed to the air system. The recommendations to the operator for this event were:

1. Inspect Waste Gate linkage for excessive looseness or binding
2. Repair / Replace Waste Gate actuator
3. Inspect / Replace Waste Gate for carbon buildup or worn shaft bushings.

As a comparison, Figure 29 shows how the parameters would have appeared with a 1‐minute sample frequency. With 1‐minute sampling, the tell‐tale event codes would not have been captured. The increase in desired air pressure would have appeared to coincide with a block load on the engine. An early warning of a potential failure would have been missed.
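Why a 1‐minute logger misses these codes can be checked with a few lines of arithmetic. The sketch below assumes that samples are taken on whole minutes counted from 21:04:00 and that each code becomes active at its listed timestamp; the function name and these alignment assumptions are illustrative, not from the paper. The 10 and 20 second durations bracket the active times stated earlier.

```python
import math
from datetime import datetime

EVENT_STARTS = [
    datetime(2011, 11, 27, 21, 11, 7),   # 1045-1
    datetime(2011, 11, 27, 21, 11, 22),  # 242-1
    datetime(2011, 11, 27, 21, 12, 2),   # 411-1
]
WINDOW_START = datetime(2011, 11, 27, 21, 4, 0)  # assumed sampling epoch


def captured(event_start, duration_s, sample_interval_s, epoch=WINDOW_START):
    """True if a sample instant falls inside [event_start, event_start + duration)."""
    offset = (event_start - epoch).total_seconds()
    # First sample instant at or after the event start.
    k = math.ceil(offset / sample_interval_s)
    return k * sample_interval_s < offset + duration_s


for interval_s in (1, 60):
    for duration_s in (10, 20):
        hits = [captured(t, duration_s, interval_s) for t in EVENT_STARTS]
        print(f"{interval_s:>2} s sampling, {duration_s} s events: "
              f"{sum(hits)} of {len(hits)} captured")
```

With a different sampling phase a 1‐minute logger could catch one of the codes by chance, but it cannot be relied on to capture events shorter than its sample interval.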

Figure 28 ‐ 1‐minute Sequence (3‐minute view, 27‐Nov‐11 21:10:00 to 21:13:00. Traces include Engine Speed, Engine ECM Active Event CID Number 01, Desired Air Manifold Pressure, Actual Air Manifold Pressure and Wastegate Position Command). Annotations on the figure:

1. Actual Air Pressure (White) is less than Desired Air Pressure (Red); Engine Commands Wastegate (Blue) to close
2. Air Pressure Not Increasing
3. Command Now Full Closed
4. Low Pressure Alarm Active; Increasing Air Pressure Indicates Wastegate has Closed
5. Engine Indicated Load Peaks (Cyan) and Overload Fires
6. As A/F Ratio overshoots on the rich side, Cylinder #11 Detonates

Hydrax Pressure Switch Failure

Another case where high fidelity data assisted in capturing intermittent, short duration events occurred with a Hydrax Pressure Switch. The engine’s fuel valve, wastegate and choke are actuated with hydraulic pressure supplied by an engine mounted pump. To ensure that sufficient hydraulic pressure exists during engine cranking to control the fuel, a pressure switch is used to generate the permissive. After a series of intermittent failed starts annunciated as Overcrank by the engine control system, review of the data showed that no fuel command had been issued during the cranks. This could be caused by a worn pump or a faulty pressure switch. A check of the hydraulic pressure during cranking revealed sufficient pressure. Replacing the pressure switch resolved the problem. Figure 30 shows the failed cranking cycle with “0” fuel command and the subsequent issue of the “Overcrank” event code.

Figure 29 – 1‐minute Sampling (10‐minute view, 27‐Nov‐11 21:04:00 to 21:14:00, 1‐minute samples. Traces: Speed, Load, Compressor Load Step, Event, Desired Air Pressure, Actual Air Pressure, Wastegate Command)

Figure 30 ‐ Start Failure ‐ Hydrax Pressure Switch

Insight Gained Through Starting Transients

Insight into the condition of engine components is also obtained through transients exhibited during the loading of the engine. The relatively large changes in speed and load during the startup phase require greater manipulation of control elements than the steadier at‐load conditions. Figure 31 illustrates the response of the air control system during one starting episode. As the required air pressure increased, the engine commanded the wastegate to close. Even after the wastegate was commanded completely closed, the actual air pressure did not meet the demand. In this particular case, it appears that the wastegate did not start closing until approximately 2 minutes after being commanded to the full close position. Once air pressure demand was met, normal operation took place. With such warnings, linkage adjustments or actuator replacement can be performed during scheduled outages, before component condition becomes poor enough to cause an unscheduled shutdown.
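One way to turn this kind of starting transient into a trendable number is to measure how long the air pressure takes to meet demand once the wastegate has been commanded fully closed. The sketch below is a hypothetical proxy, not the analytics used by the service. It assumes 1-second samples and that a command of 0% means full close, as in the wastegate case described earlier. The data are synthetic and chosen only to mimic a delay of about 2 minutes.

```python
def seconds_to_meet_demand(command_pct, desired, actual, dt_s=1.0, closed_pct=0.0):
    """Seconds from the first full-close command until actual pressure meets desired.

    Returns None if the command never reaches full close, or if demand is
    still unmet at the end of the record.
    """
    start = next((i for i, c in enumerate(command_pct) if c <= closed_pct), None)
    if start is None:
        return None
    for i in range(start, len(actual)):
        if actual[i] >= desired[i]:
            return (i - start) * dt_s
    return None


# Synthetic example: command ramps to 0% in 10 s, pressure responds 120 s later.
command = [50 - 5 * i for i in range(10)] + [0] * 170
desired = [30.0] * 180
actual = [20.0] * 130 + [31.0] * 50

lag = seconds_to_meet_demand(command, desired, actual)
print(f"air pressure met demand {lag:.0f} s after the full-close command")
```

Trending such a lag from start to start would show a linkage or actuator that is becoming slower to respond, which is the kind of early warning the text describes.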


Results

The subject condition monitoring service was implemented on March 16, 2011 at one of Atmos’ compressor stations. Leveraging this technology has resulted in better insight into the utilization of their compression assets and more facts available on which to base operational and maintenance decisions.

The first month of service was used to collect data in order to build the initial analytics models. Figure 33 and Figure 32 are histograms for speed, torque (as measured by engine) and power for Units #1 and #2, respectively, showing the utilization of the assets during the period included in this paper. Table 1 shows key dates, events and runtimes.

                                            Unit #1       Unit #2
Top End Overhaul Performed                  23 JUN 2011   27 MAY 2011
Engine Hours at TE                          22894         22228
Run Hours from 16 MAR 2011 to 1 SEP 2012    10667         10698
Utilization from 1 APR 2011 to 1 SEP 2012   85.6%         85.9%
Utilization from 1 JAN 2012 to 1 SEP 2012   97.2%         97.1%

Table 1 Utilization

Figure 32 ‐ Unit 2 Histograms

Figure 33 ‐ Unit 1 Histograms


The Average Reliability of the 2 units for the period (running hours divided by the sum of running hours and downtime hours for unscheduled outages) was 99.5%. The Average Mean Time Between Outages was 488 hours. The regular dialog between the Station Operator and the Condition Monitoring team has produced an additional benefit: through the explanation of why certain events are occurring, the operator’s skill in troubleshooting and diagnosing engine events has increased, contributing to the ongoing effort to improve reliability.
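The reliability definition above, together with the utilization figures in Table 1, can be written out as simple arithmetic. This is a sketch under assumptions: the paper does not state how utilization was computed, so run hours divided by calendar hours is assumed here, and the Table 1 run hours (tabulated from 16 MAR 2011) are applied to the 1 APR 2011 to 1 SEP 2012 window. The downtime figure in the reliability example is hypothetical, since the paper does not report downtime hours.

```python
from datetime import datetime


def reliability(run_hours, unscheduled_down_hours):
    """Running hours divided by running hours plus unscheduled downtime hours."""
    return run_hours / (run_hours + unscheduled_down_hours)


def utilization(run_hours, period_start, period_end):
    """Run hours as a fraction of calendar hours in the period (assumed definition)."""
    calendar_hours = (period_end - period_start).total_seconds() / 3600.0
    return run_hours / calendar_hours


start, end = datetime(2011, 4, 1), datetime(2012, 9, 1)
for unit, hours in (("Unit #1", 10667), ("Unit #2", 10698)):  # run hours from Table 1
    print(unit, f"{100 * utilization(hours, start, end):.1f}%")

# Hypothetical figures only: 995 run hours with 5 hours of unscheduled downtime.
print(f"{100 * reliability(995, 5):.1f}%")
```

Under these assumptions the computed utilizations round to the 85.6% and 85.9% shown in Table 1, which suggests, but does not confirm, that this is how the tabulated values were derived.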

As a result of increased investment in operator and technician training, improvements in overall maintenance strategy, and condition monitoring, callouts for Station 1 have decreased from multiple occurrences per week to only 2 callouts over the past 6 months.

While the technology employed with the Condition Monitoring service reduces the amount of human effort required to assess equipment health and condition, it is important to note that the overall success of a condition monitoring program is still dependent upon human interaction. A culture needs to exist that promotes active dialogue among the various stakeholders in the value chain. Having someone constantly looking over your shoulder can be intimidating to an operator unless management has created an atmosphere of trust where discoveries are treated as learning experiences as opposed to reprimands.

Acknowledgements

The authors would like to express their thanks and gratitude to the following:

• Bharat Trivedi and Marlan Jarzombek, Atmos Energy ‐ For their leadership and support in leveraging new technologies to assist operations and maintenance staff in meeting their reliability goals.
• Justin O’Dell, Atmos Energy Plant Control Specialist at Station 1 ‐ For his effort and patience to apply the new technology.
• Jeremy Ash and Larry Thayn, Holt Cat ‐ For their initiative and ongoing support of the service.
• The following individuals who contributed to the overall integration and implementation of the service for the sites:
  - Boyce Hardin, BH Systems Consulting, LLC
  - Paul Franks, Universal Automation Systems, Inc.
  - Michael Hjorten, Casne Engineering
  - Tarun Mannepalli, Caterpillar
  - Shawn Sellers, Caterpillar
  - Mark Grondin, Caterpillar