August 5, 2014

The following post is taken from a PDHonline course this author has written for professional engineers. The entire course may be found from PDHonline; look for Introduction to Reliability Engineering.


One of the most difficult issues when designing a product is determining how long it will last and how long it should last. If the product is robust to the point of lasting “forever”, the purchase price will probably be prohibitive compared with the competition. If it “dies” the first week, you will eventually lose all sales momentum and your previous marketing efforts will be for naught. It is amazing to me how many products are dead on arrival. They don’t work, right out of the box. This is an indication of slipshod design, manufacturing, assembly or all of the above. It is definitely possible to design and build quality and reliability into a product so that the end user is very satisfied and feels as though he got his money’s worth.

The medical, automotive, aerospace and weapons industries depend upon reliability methods to ensure safe and usable products, so that premature failure is not an issue. The same can be said for consumer products if reliability methods are applied during the design phase of the development program. Reliability methodology will provide products that “fail safe”, if they fail at all. Component failures are not uncommon in any assembly of parts, but how a component fails can mean the difference between a product that just won’t work and one that can cause significant injury or even death to the user.

It is very interesting to note that German and Japanese companies have put more effort into designing in quality at the product development stage. U.S. companies seem to place a greater emphasis on solving problems after a product has been developed. [5] Engineers in the United States do an excellent job of cost reducing a product through part elimination, standardization, material substitution, etc., but sometimes those efforts relegate reliability to the “back burner”.

Producibility, reliability, and quality start with design, at the beginning of the process, and should remain the primary concern throughout product development, testing and manufacturing.


There seems to be general confusion between quality and reliability. Quality is the “totality of features and characteristics of a product that bear on its ability to satisfy given needs; fitness for use”. “Reliability is a design parameter associated with the ability or inability of a product to perform as expected over a period of time”. It is definitely possible to have a product of considerable quality but one with questionable reliability. Quality AND reliability are crucial today given the degree of technological sophistication, even in consumer products. As you well know, the incorporation of computer-driven and computer-controlled products has exploded over the past two decades. There is now an engineering discipline called MECHATRONICS that focuses solely on the combining of mechanics, electronics, control engineering and computing. Mr. Tetsuro Mori, a senior engineer working for a Japanese company called Yaskawa, first coined this term. The discipline is also referred to as electromechanical systems. With added complexity comes the very real need to “design in” quality and reliability and to quantify the characteristics of operation, including the failure rate, the “mean time between failures” (MTBF) and the “mean time to failure” (MTTF). Adequate testing will also indicate which components and subsystems are susceptible to failure under given conditions of use. This information is critical to marketing, sales, engineering, manufacturing, quality and, of course, the VP of Finance who pays the bills.

Every engineer involved with the design and manufacture of a product should have a basic knowledge of quality and reliability methods and practices.


I think it’s appropriate to define Reliability and Reliability Engineering.  As you will see, there are several definitions, all basically saying the same thing, but important to mention, thereby grounding us for the course to follow.

“Reliability is, after all, engineering in its most practical form.”

James R. Schlesinger

Former Secretary of Defense

“Reliability is a projection of performance over periods of time and is usually defined as a quantifiable design parameter. Reliability can be formally defined as the probability or likelihood that a product will perform its intended function for a specified interval under stated conditions of use.”

John W. Priest

Engineering Design for Producibility and


“Reliability engineering provides the tools whereby the probability and capability of an item performing intended functions for specified intervals in specified environments without failure can be specified, predicted, designed-in, tested, demonstrated, packaged, transported, stored, installed, and started up; and their performance monitored and fed back to all organizations.”


“Reliability is the science aimed at predicting, analyzing, preventing and mitigating failures over time.”

John D. Healy, PhD

“Reliability is blood, sweat, and tears engineering to find out what could go wrong, to organize that knowledge so it is useful to engineers and managers, and then to act on that knowledge.”

Ralph A. Evans

“The conditional probability, at a given confidence level, that the equipment will perform its intended function for a specified mission time when operating under the specified application and environmental stresses.”

The General Electric Company

“By its most primitive definition, reliability is the probability that no failures will occur in a given time interval of operation. This time interval may be a single operation, such as a mission, or a number of consecutive operations or missions. The opposite of reliability is unreliability, which is defined as the probability of failure in the same time interval.”

Igor Bazovsky

“Reliability Theory and Practice”

Personally, I like the definition given by Dr. Healy, although the phrase “performing intended functions for specified intervals in specified environments” adds a reality to the definition that really should be there. Also, a confidence level is generally associated with reliability data. We will definitely discuss confidence level later on and how it factors into the reliability process. Reliability, like all other disciplines, has its own specific vocabulary, and understanding “the words” is absolutely critical to the overall process we wish to follow.


The main goal of reliability engineering is to minimize failure rate by maximizing MTTF.  The two main goals of design for reliability are:

  • Predict the reliability of an item, i.e., component, subsystem and system (fit the life model and/or estimate the MTTF or MTBF)
  • Design for environments that promote failure. [10] To do this, we must understand the Key Noise Parameters (KNPs) and the Key Control Parameters (KCPs) of the entire system or at least the mission critical subassemblies of the system.
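
As a rough illustration of the first goal, the sketch below estimates MTTF from a small set of failure times and converts it into a reliability figure. It assumes the simplest life model, a constant failure rate (exponential distribution), and complete data in which every unit was run to failure. The function names and the bench-test hours are hypothetical and are not taken from the course.

```python
import math


def estimate_mttf(failure_hours):
    """Point estimate of MTTF: the mean of the observed times to failure.

    Assumes complete data (every unit ran to failure, no censoring).
    """
    if not failure_hours:
        raise ValueError("need at least one failure time")
    return sum(failure_hours) / len(failure_hours)


def reliability(t_hours, mttf_hours):
    """R(t) = exp(-t / MTTF), valid only under an assumed constant failure rate."""
    return math.exp(-t_hours / mttf_hours)


# Hypothetical bench-test data: hours to failure for five units.
failure_hours = [820.0, 1130.0, 960.0, 1410.0, 680.0]

mttf = estimate_mttf(failure_hours)
failure_rate = 1.0 / mttf  # failures per hour under the exponential model

print(f"MTTF estimate: {mttf:.0f} h")
print(f"Failure rate:  {failure_rate:.5f} per h")
print(f"R(100 h):      {reliability(100.0, mttf):.3f}")
print(f"R(MTTF):       {reliability(mttf, mttf):.3f}")
```

Note that under this model the reliability at t = MTTF is exp(-1), or roughly 37%, so an MTTF figure by itself should not be read as a “guaranteed life”.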

The overall effort is concerned with eliminating early failures by observing their distribution and determining, accordingly, the length of time necessary for debugging and methods used to debug a system or subsystem.    Further, it is concerned with preventing wearout failures by observing the statistical distribution of wearout and determining the preventative replacement periods for the various parts.  This equates to knowing the MTTF and MTBF.   Finally, its main attention is focused on chance failures and their prevention, reduction or complete elimination because it is the chance failures that most affect equipment reliability in actual operation.  One method of accomplishing the above two goals is by the development and refinement of mathematical models.   These models, properly structured, define and quantify the operation and usage of components and systems.
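
One commonly used mathematical model for the three failure regimes described above is the Weibull distribution, shown here only as an illustrative sketch and not as the model the course prescribes. Its shape parameter indicates the regime: below 1 the failure rate decreases with time (early failures), at 1 it is constant (chance failures), and above 1 it increases (wearout). The function names and numeric values are assumptions chosen for the example.

```python
import math


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def time_to_reliability(target, shape, scale):
    """Time at which reliability has dropped to `target` (0 < target < 1).

    One possible way to pick a preventive replacement period for a wearout part.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be between 0 and 1")
    return scale * (-math.log(target)) ** (1.0 / shape)


def failure_regime(shape):
    """Map the Weibull shape parameter to the regime it suggests."""
    if math.isclose(shape, 1.0):
        return "chance failures (constant rate)"
    return "early failures (decreasing rate)" if shape < 1.0 else "wearout (increasing rate)"


# Hypothetical parameters: characteristic life of 1000 h, three shape values.
for shape in (0.5, 1.0, 3.0):
    early = weibull_hazard(100.0, shape, 1000.0)
    late = weibull_hazard(900.0, shape, 1000.0)
    print(f"shape={shape}: h(100)={early:.5f}, h(900)={late:.5f} -> {failure_regime(shape)}")

# Replace a wearout part (shape 3) before its reliability falls below 90%.
print(f"Replace at about {time_to_reliability(0.90, 3.0, 1000.0):.0f} h")
```

In practice the shape and scale values would come from fitting observed failure data, which this sketch does not attempt.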


No mechanical or electromechanical product will last forever without preventative maintenance and/or replacement of critical components. Reliability engineering seeks to discover the weakest link in the system or subsystem so any eventual product failure may be predicted and consequently forestalled. Operational interruptions may be avoided by periodically replacing a part or an assembly of parts prior to failure. This predictive ability is achieved by knowing the mean time to failure (MTTF) and the mean time between failures (MTBF) for “mission critical” components and assemblies. With this knowledge, we can provide for continuous and safe operation, relative to a given set of environmental conditions and proper usage of the equipment itself. The test, analyze and fix (TAAF or TAAR) approach is used throughout reliability testing to discover which components are candidates for continuous “preventative maintenance” and possibly ultimate replacement. Sometimes designing redundancy into a system can prolong the operational life of a subsystem or system, but that is generally costly for consumer products. Usually, this is only done when the product absolutely must survive the most rigorous environmental conditions and circumstances. Most consumer products do not have redundant systems. Airplanes, medical equipment and aerospace equipment represent products that must have redundant systems for the sake of continued safety for those using the equipment. As mentioned earlier, at the very worst, we ALWAYS want our mechanism to “fail safe” with absolutely no harm to the end-user or other equipment. This can be accomplished through engineering design and a strong adherence to accepted reliability practices. With this in mind, we start this process by recommending the following steps:

  • Establish reliability goals and allocate reliability targets.
  • Develop functional block diagrams for all critical systems
  • Construct P-diagrams to identify and define KCPs and KNPs
  • Benchmark current designs
  • Identify the mission critical subsystems and components
  • Conduct FMEAs
  • Define and execute pre-production life tests; i.e. growth testing
  • Conduct life predictions
  • Develop and execute reliability audit plans
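
To make the block diagram and allocation steps a little more concrete, here is a minimal sketch of how reliabilities combine in a functional block diagram. It assumes that blocks fail independently, that blocks in series must all work, and that a redundant (parallel) group works if any one member works. The component names and reliability values are hypothetical.

```python
from math import prod


def series(*reliabilities):
    """All blocks must work: R = R1 * R2 * ... (independent failures assumed)."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """At least one block must work: R = 1 - (1 - R1) * (1 - R2) * ..."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical mission reliabilities for three blocks in series.
controller, pump, sensor = 0.99, 0.90, 0.95

baseline = series(controller, pump, sensor)

# Same system with a second, redundant pump in parallel with the first.
with_redundancy = series(controller, parallel(pump, pump), sensor)

print(f"Baseline system:     {baseline:.4f}")
print(f"With redundant pump: {with_redundancy:.4f}")
```

The series result is always lower than the weakest block, which is why identifying the mission critical components matters; the second figure shows what redundancy buys, at the cost of an extra part.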

It is appropriate to mention now that this document assumes the product design is, at least, in the design confirmation phase of the development cycle and we have been given approval to proceed. Most new product introduction (NPI) methodologies carry a product through design guidance, design confirmation, pre-pilot, pilot and production phases. Generally, at the pre-pilot point, the design is solidified so that evaluation and reliability testing can be conducted with assurance that any and all changes will be fairly minor and will not involve a “wholesale” redesign of any component or subassembly. This is not to say that when “mission critical components” fail we do not make all efforts to correct the failure(s) and put the product back into reliability testing. At the pre-pilot phase, the market surveys, consumer focus studies and all of the quality function deployment (QFD) work have been accomplished and we have tentative specifications for our product. Initial prototypes have been constructed and upper management has “signed off” and given approval to proceed into the next development cycles of the project. ONE CAUTION: Any issues involving safety of use must be addressed, regardless of the changes needed for an adequate “fix”. This is imperative and must occur if failures arise, no matter what phase of the program is in progress.

Critical to these efforts will be conducting HALT (highly accelerated life testing) and HAST (highly accelerated stress testing) to “make the product fail”. This will involve DOE (Design of Experiments) planning to quantify AND verify FMEA estimates. Significant time may be saved by carefully structuring a reliability evaluation plan to be accomplished at the component, subsystem and system levels. If you couple these tests with appropriate field-testing, you will develop a product that will “go the distance” relative to your goals and stay well within your SCR (Service Call Rate) requirements. Reliability testing must be an integral part of the basic design process and time must be given to this effort. The NPI process always includes reliability testing and the assessment of the results from that testing. Invariably, some degree of component or subsystem redesign results from HALT or HAST because weaknesses are made known that can and will be eliminated by redesign. In times past, the usual engineering practice was to assign a “safety factor” to any design. This safety factor takes into consideration “unknowns” that may affect the basic design. Unfortunately, this may produce a design that is structurally robust but fails due to Key Noise Parameters (KNPs) or Key Control Parameters (KCPs).
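
As a small illustration of the DOE planning mentioned above, the sketch below lists the runs of a two-level full factorial design, in which every combination of low and high settings is tested once. This is only one of several possible DOE layouts, and the stress factors and their levels are hypothetical placeholders, not values recommended by the course.

```python
from itertools import product

# Hypothetical stress factors, each with a (low, high) level.
factors = {
    "temperature_C": (25, 85),
    "vibration_grms": (5, 20),
    "relative_humidity_pct": (30, 85),
}


def full_factorial(factors):
    """Return one dict per test run, covering every combination of factor levels."""
    names = list(factors)
    level_sets = (factors[name] for name in names)
    return [dict(zip(names, levels)) for levels in product(*level_sets)]


runs = full_factorial(factors)

print(f"{len(runs)} runs for {len(factors)} two-level factors")
for number, run in enumerate(runs, start=1):
    print(number, run)
```

With three two-level factors this gives 2 x 2 x 2 = 8 runs; the count doubles with each added factor, which is one reason the evaluation plan needs careful structuring.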


As you might expect, this is a “lick and a promise” relative to the subject of reliability.  It’s a very complex subject but one that has provided remarkable life and quality to consumer and commercial products.   I would invite you to take a look at the literature and further your understanding of the “ins and outs” of the technology.  As always, I welcome your comments.


What do you think?