
Overview of Reliability Problems

Reliability is often defined as the ability of a device to fulfil its intended function during an interval of time [9]. Today, reliability engineering is applied to many products because failures can have catastrophic consequences, such as aircraft accidents. Even minor failures, such as that of a mobile phone, are a nuisance and can damage the reputation of the manufacturer. According to an industry-based survey in 2009 [10], the converter is one of the least reliable parts of an electrical system operating in a harsh environment such as a wind turbine. The cost of converter failure and maintenance is high for off-shore installations because the system is difficult to access. Replacing devices before they fail can greatly improve system reliability, which makes converter reliability a critical issue.

Figure 1.5: Idealized bathtub curve (rate of failure versus time, showing the infant mortality, useful life and wear-out regions, with the mission time marked).

The failure rate of a product over its lifetime is commonly described by the idealized bathtub curve shown in Fig. 1.5. The curve has three regions: infant mortality, useful life and wear-out.

- In the first region, failures are mainly caused by manufacturing defects or poor workmanship. Defective products tend to fail early, so the failure rate decreases with time.
- In the useful life region, the failure rate settles at a roughly constant level, and the failures observed are considered random.
- In the last region, the products start to wear out as accumulated stresses take effect, so the failure rate rises.
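One common way to sketch the bathtub shape is to add three hazard terms:

- a decreasing Weibull hazard (shape β < 1) for infant mortality;
- a constant rate for random failures;
- an increasing Weibull hazard (β > 1) for wear-out.

This is one modelling choice among several, not a method taken from this work. The shape, scale and rate values below are hypothetical and are not fitted to any real converter.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


def bathtub_hazard(t, infant=(0.5, 1e5), random_rate=1e-6, wearout=(4.0, 2e5)):
    """Illustrative bathtub curve in failures per hour (t in hours, t > 0).

    infant:      (shape, scale) of a Weibull hazard with shape < 1 (decreasing)
    random_rate: constant hazard of the useful-life region
    wearout:     (shape, scale) of a Weibull hazard with shape > 1 (increasing)
    All default values are hypothetical.
    """
    return weibull_hazard(t, *infant) + random_rate + weibull_hazard(t, *wearout)


if __name__ == "__main__":
    for hours in (10, 1_000, 50_000, 100_000, 200_000):
        print(f"t = {hours:>7} h: failure rate = {bathtub_hazard(hours):.2e} per hour")
```

With these assumed parameters the printed rate first falls, flattens in the middle of the range and then rises again, reproducing the three regions qualitatively.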

The availability of a system over its life cycle is considered a factor of system reliability. It is defined as the percentage of its designed life cycle during which the system is operational. Availability is the goal of most system users. According to Eqn. 1.1, it can be increased either by improving reliability (a longer Mean Time Between Failures, MTBF) or by reducing the Mean Time To Repair (MTTR).

Early failures can be detected by a burn-in test before devices leave the factory, which improves device reliability. Maintenance can then be optimised in three steps: estimate the lifetime of the devices, detect when they enter the wear-out region, and replace the potentially faulty ones. This reduces the MTTR and hence greatly increases system availability. The benefit is largest in off-shore wind power generation, shown in Fig. 1.2, because the system is difficult to access.

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \tag{1.1}$$

where MTBF is the Mean Time Between Failures and MTTR is the Mean Time To Repair.
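Eqn. 1.1 makes it easy to see how much a shorter repair time is worth. In the minimal sketch below, the MTBF and both repair times are hypothetical numbers chosen only to contrast an unplanned off-shore repair with a planned replacement.

```python
def availability(mtbf_h, mttr_h):
    """Steady-state availability, Eqn. 1.1: MTBF / (MTBF + MTTR)."""
    if mtbf_h <= 0 or mttr_h < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_h / (mtbf_h + mttr_h)


if __name__ == "__main__":
    mtbf = 20_000.0  # hypothetical converter MTBF in hours
    # Hypothetical repair times: waiting for off-shore access vs. planned replacement.
    cases = (("unplanned off-shore repair", 30 * 24.0), ("planned replacement", 2 * 24.0))
    for label, mttr in cases:
        print(f"{label:>26}: MTTR = {mttr:5.0f} h, availability = {availability(mtbf, mttr):.4f}")
```

Availability improves noticeably even though the MTBF is unchanged. This is why predicting wear-out and replacing devices in advance pays off when access is slow.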

Generally, four methods are used to evaluate the lifetime of a power module [11]:

- the qualification procedure;
- theoretical calculation;
- field failure experience;
- the physics of failure method.

The qualification procedure refers to a standard test with defined conditions. International standards organisations, such as the IEC, normally define the test conditions for a specific industry sector, and products that pass these tests are considered reliable. This method is simple for companies because no application-specific study is needed. However, it covers only general conditions and does not take the actual application into account, so its accuracy is not guaranteed.

Theoretical calculation refers to estimating reliability from accelerated tests and statistical models. The test results are fitted to predefined statistical models and then extrapolated to field conditions to estimate the failure rate. The ratio of the lifetime at the normal use level to the lifetime at the higher test stress level is termed the acceleration factor (AF) [12].

For package failures, the lifetime is determined by cyclic thermomechanical stress. The Coffin-Manson equation, Eqn. 1.2, is widely used to estimate the number of cycles to failure $N_f$:

$$N_f = C\,(\Delta T_j)^{-b} \tag{1.2}$$

where $\Delta T_j$ is the junction temperature swing, and $C$ and $b$ are constants fitted from accelerated test results. In this case, the AF is given by Eqn. 1.3:

$$AF = \left(\frac{\Delta T_{ja}}{\Delta T_{jn}}\right)^{b} \tag{1.3}$$

where $\Delta T_{jn}$ is the IGBT junction temperature cycle in the field and $\Delta T_{ja}$ is the IGBT junction temperature cycle under accelerated test conditions.

This method is widely used to estimate reliability. However, it has two drawbacks. Collecting the test data is difficult, and the failure mechanisms at accelerated test conditions may differ from those at field operating conditions.
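The fitting and extrapolation steps can be sketched in Python. The sketch assumes the Coffin-Manson form $N_f = C\,(\Delta T_j)^{-b}$ with $b > 0$, so the acceleration factor is $(\Delta T_{ja}/\Delta T_{jn})^{b}$. Taking logarithms turns the fit into ordinary least squares on a straight line. The test numbers in the demo are invented for illustration and are not measured data.

```python
import math


def fit_coffin_manson(delta_tj, n_f):
    """Least-squares fit of N_f = C * dTj**(-b) on log-log axes.

    ln N_f = ln C - b * ln dTj, so a straight-line fit gives slope -b and
    intercept ln C. Returns (C, b).
    """
    xs = [math.log(dt) for dt in delta_tj]
    ys = [math.log(n) for n in n_f]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(y_mean - slope * x_mean), -slope


def cycles_to_failure(c, b, delta_tj):
    """Coffin-Manson cycles to failure for a junction temperature swing delta_tj (K)."""
    return c * delta_tj ** (-b)


def acceleration_factor(delta_tj_test, delta_tj_field, b):
    """AF = N_f(field) / N_f(test) = (dTj_test / dTj_field) ** b."""
    return (delta_tj_test / delta_tj_field) ** b


if __name__ == "__main__":
    # Made-up accelerated power-cycling results, for illustration only.
    delta_tj = [80.0, 100.0, 120.0]
    n_f = [120_000.0, 40_000.0, 16_000.0]
    c, b = fit_coffin_manson(delta_tj, n_f)
    print(f"C = {c:.3e}, b = {b:.2f}")
    print(f"AF (100 K test vs 40 K field): {acceleration_factor(100.0, 40.0, b):.1f}")
```

The log-log fit is the simplest choice. A real analysis would usually also account for scatter in the test data, for example with a Weibull distribution of cycles to failure at each stress level.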

The field failure experience method involves collecting and analysing failure data from the field. Although this method gives an accurate reliability estimate, it is often impractical because such data are difficult to collect. Furthermore, it is not suitable for estimating the reliability of a new technology, since that would require testing the new product under field conditions. For a product with a designed lifetime of 40 years, for example, such a test is impossible.

The reliability estimates of the above three methods are based on test or field data; the physics of the failure mechanisms is not considered. In contrast, the physics of failure method [13,14], which has developed rapidly in recent years, estimates device reliability from the root cause of the failure process. This method requires a good understanding of the failure mechanisms, including the device material properties and the initiation, accumulation and propagation of damage. Physics-based failure models are built for the different failure mechanisms to estimate the remaining lifetime of the device. To obtain an accurate reliability estimate under field conditions, a mission profile is generated and used as the input to the lifetime damage simulation. The mission profile is a set of typical operating conditions that the device experiences during field operation.

There are various failure modes for IGBT modules. Two of the most commonly observed are both packaging failures: bond wire damage, shown in Fig. 1.6(a), and die-attach solder fatigue, shown in Fig. 1.6(b). Fatigue is a failure process caused by repeated stress cycles below the tensile strength of the material [3]. Both failure modes are caused by temperature cycling during operation combined with the mismatch in the coefficient of thermal expansion (CTE) between adjacent layers. The IGBT junction temperature $T_j$ is therefore one of the critical factors for lifetime estimation. However, $T_j$ is difficult to measure during operation because the chip is buried inside the module package. One possible option is to build an accurate electrothermal model that calculates the junction temperature during operation.
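The sketch below shows, under stated assumptions, how a mission profile can be turned into a lifetime estimate. It chains three standard techniques, none of which is presented as the method used in this work:

1. A Foster RC thermal network, used as a stand-in electrothermal model, converts a power-loss profile into $T_j$.
2. Three-point rainflow counting (the ASTM E1049 style algorithm) extracts temperature cycles from $T_j$.
3. Palmgren-Miner linear damage accumulation combines the cycles using the Coffin-Manson form $N_f = C\,(\Delta T_j)^{-b}$.

The Foster parameters, $C$, $b$ and the power profile are all hypothetical. Practical lifetime models often also include mean temperature and heating time, which this sketch ignores.

```python
import math
from collections import Counter


def foster_junction_temperature(power, dt, r_th, tau, t_ref):
    """Junction temperature (degC) from a Foster RC thermal network.

    power: power loss per time step (W), assumed constant within each step
    dt:    time step (s)
    r_th:  thermal resistance of each Foster branch (K/W), hypothetical values
    tau:   time constant R*C of each Foster branch (s), hypothetical values
    t_ref: reference temperature, e.g. case or heatsink (degC)
    """
    decay = [math.exp(-dt / t) for t in tau]
    rise = [0.0] * len(r_th)  # temperature rise across each branch
    tj = []
    for p in power:
        # Exact step response of each first-order branch to a constant power p.
        rise = [th * d + r * p * (1.0 - d) for th, d, r in zip(rise, decay, r_th)]
        tj.append(t_ref + sum(rise))
    return tj


def turning_points(series):
    """Reduce a series to its reversals (local peaks and valleys plus end points)."""
    points = [series[0]]
    for x in series[1:]:
        if x == points[-1]:
            continue  # flat segment adds no new reversal
        if len(points) >= 2 and (points[-1] - points[-2]) * (x - points[-1]) > 0:
            points[-1] = x  # same direction as before: extend the current ramp
        else:
            points.append(x)
    return points


def rainflow(series):
    """Three-point rainflow counting; returns {cycle range: cycle count}."""
    counts = Counter()
    stack = []
    for point in turning_points(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])  # newest range
            y = abs(stack[-2] - stack[-3])  # range before it
            if x < y:
                break
            if len(stack) == 3:
                counts[y] += 0.5  # y contains the starting point: half cycle
                stack.pop(0)
            else:
                counts[y] += 1.0  # y is a closed full cycle: drop its two points
                last = stack.pop()
                del stack[-2:]
                stack.append(last)
    for p, q in zip(stack, stack[1:]):
        counts[abs(p - q)] += 0.5  # residual ranges are counted as half cycles
    return counts


def miner_damage(tj, c, b):
    """Palmgren-Miner damage for one pass of the profile, with N_f = c * dTj**(-b)."""
    return sum(n / (c * dtj ** (-b)) for dtj, n in rainflow(tj).items())


if __name__ == "__main__":
    # Hypothetical mission profile: 60 s at 300 W then 60 s at 30 W, repeated 10 times.
    losses = ([300.0] * 60 + [30.0] * 60) * 10
    tj = foster_junction_temperature(losses, dt=1.0, r_th=[0.02, 0.05, 0.08],
                                     tau=[0.05, 0.5, 5.0], t_ref=60.0)
    damage = miner_damage(tj, c=1e12, b=5.0)
    print(f"Tj range: {min(tj):.1f} to {max(tj):.1f} degC")
    print(f"Damage per profile: {damage:.2e}, profiles to failure: {1 / damage:.2e}")
```

Failure is predicted when the accumulated damage reaches 1, so the reciprocal of the damage per profile estimates how many repetitions of the mission profile the device survives.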

Figure 1.6: Two common failure mechanisms: (a) bond wire lift-off [15]; (b) die-attach solder crack [16].