[008] Why Do Power Supplies Fail?
Switching power supplies remain a weak link of most electronic systems and fail more often than they should. The reasons for failure are discussed.
Introduction
Modern switching power supplies have been with us for three decades now. From full-bridge circuits operating at power levels up to 10 kW to flyback circuits operating at less than 1 W, an enormous amount of research and development has transpired over the years. And yet, despite 30 years of design maturity, switching power supplies remain a weak link of most electronic systems. Most of them are noisy, hot, and fail more often than they should.
The Heart of the Problem
Let’s get right to the problem—it is the power FET. Don’t misunderstand this statement. Modern FETs from reputable manufacturers are rugged and reliable when operated properly. The problem is that it is easy for them to be improperly operated.
There is a huge disparity between the electrical power capability of a FET and the thermal rating of its package. For example, the common IRF640 device has an on-resistance of 0.15 ohms and a voltage rating of 200 V. Electrically, taking the rated voltage across the on-resistance (V²/R), the power rating is 266.67 kW! Thermally, however, the power rating is about 10 W for a TO220 package with an appropriate heatsink. If the components around the FET that make up the power supply do not behave properly, a huge amount of stress can be placed on the device, leading to failure.
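The size of this disparity is easy to check by hand. The short sketch below uses only the IRF640 figures quoted above; treating the electrical rating as V²/R (rated voltage across the on-resistance) is my assumption about how such a number is derived, and computed this way it comes to roughly 267 kW.

```python
# Figures quoted in the text for the IRF640. Treating the "electrical"
# rating as V^2/R is an assumption about how the number is derived.
V_RATED = 200.0    # drain-source voltage rating, volts
R_ON = 0.15        # on-resistance, ohms
P_THERMAL = 10.0   # approximate TO220 limit with heatsink, watts

def electrical_power(v_rated, r_on):
    """Power if the full rated voltage appeared across the on-resistance."""
    return v_rated ** 2 / r_on

p_electrical = electrical_power(V_RATED, R_ON)
ratio = p_electrical / P_THERMAL

print(f"Electrical: {p_electrical / 1e3:.2f} kW")
print(f"Thermal:    {P_THERMAL:.0f} W")
print(f"Ratio:      {ratio:,.0f} to 1")
```

Whatever the exact electrical figure, the point stands: the silicon can pass thousands of times more power than the package can remove as heat.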
It is almost always the FET that eventually fails, but it is not always the direct cause of the problem. The reason for failure can often be very difficult to observe or prove. There are hundreds of parts in some power supplies, and numerous mechanisms that can cause destructive waveforms. Improper operation of the FET can lead to failure in less than 1 µs, and unless you just happen to be probing the exact waveform at the right time, you cannot see the failure occurring.
As I started planning this article, I envisioned giving detailed waveforms showing the failure mechanisms. However, there are so many potential causes, all there is time for is a list of types of failure that can be encountered. Each one of these could merit a full length conference paper in itself.
MOSFETs
As mentioned above, the FET is usually the ultimate victim of erroneous operation in a power supply. In my experience, it is not often the major cause of a failure, but just the ultimate symptom that lets you know something is not working properly. Since so much destructive energy can be unleashed in the FET, it is the most dramatic sign of failure.
The FET itself can sometimes be the root cause of the failure. The most common events I have seen include the following:
Overheating.
This is mainly due to improper heatsinking, packaging, and airflow. Low estimation of the conduction losses (which increase with temperature) and switching losses also lead to excessive temperature. With small packages, be aware that junction temperatures can be substantially higher than package surfaces.
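Because conduction loss rises with temperature, and temperature rises with loss, a single-pass calculation at 25°C underestimates the junction temperature. The sketch below iterates the two together until they settle. The linear on-resistance model, the coefficient `alpha`, the thermal resistance, and the operating currents are all illustrative assumptions, not data for any particular device; use the curves in the actual data sheet.

```python
def junction_temperature(i_rms, r_on_25, theta_ja, t_ambient,
                         alpha=0.007, t_max=175.0, iterations=200):
    """Iterate Tj = Ta + I^2 * R(Tj) * theta_ja until it settles.

    R(Tj) is modeled as r_on_25 * (1 + alpha * (Tj - 25)), an assumed
    linear approximation. Returns the settled junction temperature in
    deg C, or None if the estimate climbs past t_max.
    """
    tj = t_ambient
    for _ in range(iterations):
        r_on = r_on_25 * (1.0 + alpha * (tj - 25.0))
        tj_next = t_ambient + i_rms ** 2 * r_on * theta_ja
        if tj_next > t_max:
            return None          # too hot, or thermal runaway
        if abs(tj_next - tj) < 0.01:
            return tj_next       # converged
        tj = tj_next
    return tj

# Hypothetical operating point: 0.15 ohm at 25 C, 10 C/W junction to
# ambient, 50 C ambient.
naive = 50.0 + 5.0 ** 2 * 0.15 * 10.0          # ignores the rise in R
settled = junction_temperature(5.0, 0.15, 10.0, 50.0)
overloaded = junction_temperature(8.0, 0.15, 10.0, 50.0)

print(f"Naive estimate at 5 A:   {naive:.1f} C")
print(f"Settled estimate at 5 A: {settled:.1f} C")
print(f"Result at 8 A:           {overloaded}")
```

With these assumed numbers the settled temperature is more than 20°C above the naive estimate, and a modest increase in current pushes the part past its limit entirely.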
Drain overvoltage.
Early FETs in the 1980s had problems with overvoltages. Later on, they became very rugged, and today, manufacturers even provide avalanche ratings which allow you to operate beyond the normal voltage specifications. My advice is not to use avalanche capability unless you have very good data for your application, such as data you may have to collect yourself. Even then, I will not personally design a circuit to operate in this region under any conditions. Be aware that the ICs that incorporate control logic and a FET in the same package have seriously compromised drain-source ratings, and they reduce further with temperature.
Drain overcurrent.
It is hard to destroy a FET with overcurrent alone, since FETs are very rugged devices. Overcurrent leads either to overvoltage when the FET is turned off, or to overheating of the package.
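One way to see how overcurrent turns into overvoltage is to estimate the spike produced when the current is interrupted through stray or leakage inductance, V = L × di/dt. This is a rough sketch of that one mechanism only; the inductance, current, fall time, and bus voltage below are hypothetical values chosen for illustration.

```python
def turn_off_spike(l_stray, i_off, t_fall):
    """Voltage across a stray inductance when current i_off falls
    linearly to zero in t_fall seconds: V = L * di/dt."""
    return l_stray * i_off / t_fall

V_BUS = 150.0       # hypothetical dc bus voltage, volts
V_RATING = 200.0    # FET drain-source rating, volts
L_STRAY = 50e-9     # hypothetical loop inductance, henries
T_FALL = 20e-9      # hypothetical current fall time, seconds

for i_off in (10.0, 30.0):   # normal current, then an overcurrent event
    v_peak = V_BUS + turn_off_spike(L_STRAY, i_off, T_FALL)
    status = "exceeds rating" if v_peak > V_RATING else "within rating"
    print(f"{i_off:4.0f} A turn-off -> {v_peak:.0f} V peak ({status})")
```

The spike scales directly with the current being interrupted, so a design with comfortable margin at normal load can exceed the drain rating during an overcurrent event.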
Gate overvoltage
A weak electrical feature of FETs is the gate-source voltage rating. If you apply just a few volts over the rating, the gate oxide will fail very quickly.
Antiparallel diode.
The other Achilles heel of a FET is the internal diode. If the internal diode of a FET conducts, its reverse blocking capability is severely compromised with fast application of voltage. This frequently causes problems with full-bridge circuits, and can occur even at very light loads.
DC blocking
If you apply a dc voltage for a long time, a FET can begin to fail. I’ve only heard about this one time, and it’s an unusual application for a FET.
Power Diodes
Like the FET, power diodes can have high voltage and current ratings. However, due to the diode action, it is impossible for both of the ratings to be applied concurrently for any extended period of time. High current and voltage are only applied simultaneously during switching transitions with inductive loads. Diode failures can be caused by:
Overheating
As with the FET, this is mainly due to improper heatsinking, packaging, and airflow.
Schottky overvoltage
Do not for a minute think that Schottky diodes can be pushed as hard as FETs. One cycle of overvoltage, and they are likely to fail. Derate the component, and make sure this never happens. Special care must be paid to full- and half-bridge circuits with dc blocking capacitors in the primary.
Switching losses.
These can be higher than anticipated due to reverse recovery of fast diodes. This is particularly a problem in boost power factor correction circuits, where there are no magnetic elements to slow down rise and fall times.
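A first-order feel for reverse recovery loss comes from the commonly used approximation P ≈ Qrr × V × fsw, where the recovered charge is pulled through the full blocking voltage once per cycle. The sketch below applies it to a boost stage. The approximation and all component values are assumptions for illustration; real recovery charge depends strongly on di/dt, forward current, and temperature, so measured data should take precedence.

```python
def reverse_recovery_loss(q_rr, v_reverse, f_sw):
    """First-order reverse recovery loss in watts: Qrr * V * fsw.

    Much of this energy is typically dissipated in the switch rather
    than the diode, but it is loss in the converter either way.
    """
    return q_rr * v_reverse * f_sw

# Hypothetical boost PFC stage: 400 V output, 100 kHz switching.
V_OUT = 400.0
F_SW = 100e3

p_data_sheet = reverse_recovery_loss(100e-9, V_OUT, F_SW)   # Qrr as quoted
p_hot = reverse_recovery_loss(300e-9, V_OUT, F_SW)          # Qrr at high di/dt, hot

print(f"Loss with 100 nC: {p_data_sheet:.1f} W")
print(f"Loss with 300 nC: {p_hot:.1f} W")
```

The second case is there to show why the loss surprises people: the charge quoted at the head of a data sheet is usually measured under gentler conditions than the circuit imposes.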
Capacitors
Modern capacitors provide tremendous energy storage by volume, and are truly amazing components. However, they are often poorly specified by the vendor (as discussed in an earlier article in this magazine [1]) and are prone to failure if overstressed.
Overheating
This is an issue due to excessive current, or placement on the PC board near hot components. Overheating will lead to early failure.
MLC capacitor breakage.
These capacitors can easily crack in the large format packages. Proper stress relief must be provided for soldering operations, and for operation with high currents.
Overvoltage.
MLC capacitors will not tolerate overvoltage.
Temperature Sensitivity.
Electrolytic capacitors dry out with prolonged exposure to high ambient temperature and current. They freeze at low temperatures, and this can lead to instability.
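To put a rough number on how strongly temperature affects electrolytic life, designers often use the rule of thumb that life doubles for every 10°C reduction below the rated temperature. The sketch below applies that rule. The rule itself and the example ratings are assumptions brought in for illustration; vendor life models, which also account for ripple current heating, should be used for real designs.

```python
def estimated_life_hours(rated_life_hours, rated_temp_c, operating_temp_c):
    """Rule-of-thumb electrolytic life: doubles for each 10 C below
    the rated temperature (and halves for each 10 C above it)."""
    return rated_life_hours * 2.0 ** ((rated_temp_c - operating_temp_c) / 10.0)

# Hypothetical part rated 2000 hours at 105 C.
for temp in (105.0, 85.0, 65.0):
    life = estimated_life_hours(2000.0, 105.0, temp)
    print(f"{temp:5.0f} C -> {life:8.0f} hours ({life / 8760.0:.1f} years)")
```

Under this rule, moving a capacitor away from a hot component and lowering its temperature by 20°C buys a factor of four in expected life.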
Safety Hazards.
Tantalums can fail in a short-circuit condition, leading to potential fire safety hazards. Like MLCs, they do not tolerate overvoltage well.
There are no ideal capacitors, and all power capacitors must be designed into the circuit carefully with a full understanding of their characteristics and tradeoffs.
Magnetics
Actual catastrophic failure of the magnetics is an extreme condition. The insulation on the wire has to melt or fail, and usually you will see the semiconductors fail before this happens. However, the magnetics can enter regions of nonlinearity that overstress the rest of the circuit. Magnetic failures can be caused by:
Overtemperature.
Hot spots are often created in magnetics, and insulation can melt inside the parts even before an appreciable temperature rise is felt at the surface.
Saturation of the core.
Whether due to poor design margins, or overtemperature, inductances can reduce by more than an order of magnitude when too much voltage or current is applied.
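A quick margin check for an inductor is to compute the peak flux density from B = L × I / (N × Ae) and compare it with the saturation flux density of the core material at operating temperature. The values below are hypothetical, and the 0.3 T hot saturation figure is an assumed round number for a power ferrite; take the real figure from the core vendor's data at your worst-case temperature.

```python
def peak_flux_density(inductance, i_peak, turns, core_area_m2):
    """Peak flux density in tesla: B = L * I / (N * Ae)."""
    return inductance * i_peak / (turns * core_area_m2)

def saturation_current(inductance, b_sat, turns, core_area_m2):
    """Current at which the core reaches b_sat: I = B * N * Ae / L."""
    return b_sat * turns * core_area_m2 / inductance

# Hypothetical inductor: 100 uH, 20 turns, 100 mm^2 core area.
L_IND = 100e-6
TURNS = 20
AE = 100e-6          # 100 mm^2 expressed in m^2
B_SAT_HOT = 0.3      # assumed ferrite saturation at high temperature, tesla

b_normal = peak_flux_density(L_IND, 5.0, TURNS, AE)
i_sat = saturation_current(L_IND, B_SAT_HOT, TURNS, AE)

print(f"Flux density at 5 A peak: {b_normal:.2f} T")
print(f"Saturation begins near:   {i_sat:.1f} A")
```

In this example the margin between normal peak current and saturation is only 20%, which a startup transient or overload can easily consume. Once the inductance collapses, current rises far faster and the stress lands on the FET.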
Insulation failure.
This can occur due to overtemperature, overvoltage, damage due to initial construction, long term abrasion of adjacent parts, or corona breakdown with high voltages.
Core breakage.
Breakage can occur if a core is dropped during construction or with excessive mechanical stress during use. Cores are brittle like glass, and can have a complete crack that is undetectable on initial testing.
Thermal aging of cores.
This happens when cores contain binders that may degrade with time and temperature. [2]
Excessive ac current
If proximity loss calculations are not performed on the magnetics, ac currents can lead to very high dissipations. [3]
Control Circuits
Control chips are low cost parts placed at the nerve center of the power supply. Every control chip I have ever worked with has had some unusual region of behavior which, if misunderstood, can lead to failure.
The control chip determines when and how the FET is turned on and off. It must be done perfectly every time to avoid failure.
Overtemperature.
Power dissipation in the control chip can be surprisingly high, especially when driving large FETs at high frequency. At the same time, package sizes and thermal capability keep dropping.
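The gate drive contribution to this dissipation can be estimated from the standard relation P = Qg × Vdrive × fsw. The sketch below shows how quickly it grows with frequency. The gate charge, drive voltage, and the assumption that all of the drive power is dissipated in the chip's internal driver are illustrative; external gate resistors share some of this loss in practice.

```python
def gate_drive_power(q_gate, v_drive, f_sw, n_fets=1):
    """Average power in watts needed to charge and discharge the gates:
    P = n * Qg * Vdrive * fsw."""
    return n_fets * q_gate * v_drive * f_sw

Q_GATE = 70e-9     # hypothetical total gate charge, coulombs
V_DRIVE = 12.0     # hypothetical gate drive voltage, volts

for f_sw in (100e3, 500e3, 1e6):
    p = gate_drive_power(Q_GATE, V_DRIVE, f_sw)
    print(f"{f_sw / 1e3:6.0f} kHz -> {p * 1e3:6.0f} mW")
```

A few hundred milliwatts is a serious load for a small surface-mount control chip, which is why this term deserves a place in the thermal budget.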
Erratic clock operation.
Improper layout can make chips very susceptible to noise, either generated internally, or from other sources in the power stage.
Instability.
A majority of converters shipped don’t have their loops properly compensated. Oscillations can lead to overstress and failure. All conditions must be checked – start up, overcurrent, step loads, and all combinations of line and load.
Improper current limiting.
This is a major cause of failures. Regardless of how careful you are, power supplies will always be susceptible to spurious events beyond your control, and fast, precise current limiting is the backup protection mechanism. Don’t build a power supply without cycle-by-cycle current limiting if you want rugged operation.
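The value of cycle-by-cycle limiting shows up clearly in even a crude model. The sketch below steps a buck-type inductor current through successive switching cycles during an overload, with and without a comparator that ends the on-time when the current reaches the limit. It is an idealized illustration under stated assumptions: an ideal comparator with no delay or blanking time, a fixed collapsed output voltage, no saturation, and hypothetical component values.

```python
def peak_currents(v_in, v_out, inductance, f_sw, duty_cmd,
                  i_limit, cycles, limit_enabled=True):
    """Return the inductor peak current for each switching cycle."""
    period = 1.0 / f_sw
    slope_on = (v_in - v_out) / inductance    # A/s while the FET is on
    slope_off = v_out / inductance            # A/s while the FET is off
    i = 0.0
    peaks = []
    for _ in range(cycles):
        t_on = duty_cmd * period              # what the control loop asks for
        if limit_enabled and slope_on > 0:
            # Comparator ends the pulse as soon as i reaches i_limit.
            t_trip = max(0.0, (i_limit - i) / slope_on)
            t_on = min(t_on, t_trip)
        i_peak = i + slope_on * t_on
        peaks.append(i_peak)
        i = max(0.0, i_peak - slope_off * (period - t_on))
    return peaks

# Hypothetical overload: 48 V in, output pulled down to 1 V, loop
# demanding 50% duty, 10 uH inductor, 200 kHz, 10 A limit.
args = dict(v_in=48.0, v_out=1.0, inductance=10e-6, f_sw=200e3,
            duty_cmd=0.5, i_limit=10.0, cycles=20)
limited = peak_currents(limit_enabled=True, **args)
unlimited = peak_currents(limit_enabled=False, **args)

print(f"Peak after 20 cycles with limiting:    {max(limited):.1f} A")
print(f"Peak after 20 cycles without limiting: {max(unlimited):.1f} A")
```

With the output collapsed, the inductor has almost no voltage available to reset, so without limiting the current staircases upward every cycle. Twenty cycles at 200 kHz is only 100 µs, far faster than any average-current or thermal protection can respond.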
Familiarity.
Stay with suppliers you know, trust, and have experience with. Don’t just shoot for the lowest cost part. Be especially wary of parts that don’t have complete data sheets, or that have conflicting data. If you’ve used a part on several projects already, stay with it on the next project, or be prepared for the time needed to learn new quirks of operation.
Gate drive circuits.
Whether using gate drive transformers (preferred for off-line supplies) or high side drivers, all transient events must be considered to ensure the gate drives are well designed. Gate drive speed must be carefully controlled.
Printed Circuit Boards
I never put a power supply design on a single-sided board. It is simply impossible to achieve the level of noise reduction and reliability that a good power supply needs. Going beyond double sided, however, also has its drawbacks. Buried planes for noise reduction can be useful, but can also conceal weak points of the design.
Board related failures include the following:
External spacing.
Proper spacing between high voltage traces must be maintained according to safety standards for your industry. If there are no safety standard requirements, high voltage traces must still be well separated or they will fail. This is especially an issue with TO220 packages which have woefully small clearances when inserted in the board with standard layout software.
Internal spacing.
Spacing on internal layers must be carefully checked. Even within a board, arcing can occur. Don’t assume there is sufficient insulation in the board construction.
Degradation.
PCB materials degrade with excessive temperature for long periods of time. Be careful in choosing cheaper PCB materials.
Closing loops.
High current loops formed by board traces must be closed to prevent generation of magnetic fields.
Shielding.
High voltage parts should be shielded to prevent generation of electrostatic fields.
Inadequate Specifications
Power supply specifications are often inadequate for the application. Depending on the conservative nature of the design, reliability can vary greatly. Most designs I see have not been adequately tested for the extremes that will be seen in the field.
Thermal Ratings.
Worst case airflow, ambient temperature, and electrical conditions must be combined to show true temperature rise of all parts.
Input voltage ratings.
Power supplies should be tested in excess of the ratings to have a good margin, both at the low and high end. For example, most power supplies are designed for high-line operation of 264 VAC. However, IBM’s power line studies in the 1970s showed that 300 VAC for short periods of time would be regularly experienced, and designs should allow for this.
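The consequence of the higher line voltage is easiest to see at the rectified peak, Vpeak = √2 × Vrms. The sketch below compares the two line conditions mentioned above against a bulk capacitor voltage rating. The 400 V rating is a hypothetical example of a common choice, not a figure from any specific design.

```python
import math

def rectified_peak(v_rms):
    """Peak of a sinusoidal line voltage: sqrt(2) * Vrms."""
    return math.sqrt(2.0) * v_rms

CAP_RATING = 400.0   # hypothetical bulk capacitor voltage rating, volts

for v_rms in (264.0, 300.0):
    v_peak = rectified_peak(v_rms)
    margin = CAP_RATING - v_peak
    print(f"{v_rms:.0f} VAC -> {v_peak:.1f} V peak, margin {margin:+.1f} V")
```

A part that looks adequate at the nominal high-line figure is overstressed at 300 VAC, and the same peak also adds directly to the drain voltage of the primary FET.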
Load ratings.
Outputs must be loaded to full load, and then increased indefinitely until protection steps in to limit current. Short-circuit conditions are not usually the worst case situation for a power supply. Output load resistances should be gradually increased to identify the worst-case operating points.
Input surge ratings.
High frequency ringwaves and other test waveforms are used to verify ruggedness in a real-world environment. Surge suppression devices and filter components are needed to survive these events, but they are often omitted to save money.
Mechanical Packaging
Mechanical packaging is often the cause of many failures. Unfortunately, the mechanical parts are often the first part of the system designed, and cannot be changed to accommodate thermal conditions inside the power supply. The following are common mistakes:
- Input connectors inadequately rated for mechanical stress and electrical current.
- Violation of spacing issues.
- Inadequate mounting of hot parts to heatsinks.
- Inadequate ventilation.
- Improper orientation of parts for proper cooling.
- Improper shock and vibration design.
Summary
The list presented here is by no means complete. Almost every power supply I see has some new and unique combination of events that lead to failure, and good analog problem solving skills and experience are essential to redesign power supplies.
Additional Reading
- “Capacitors for Switching Power Supplies”, Power Systems Design Europe, May 2007