Software Failure Modes and Effects Analysis
Jacob J. Stadler,
Neal J. Seidl, General Electric Healthcare
Key Words: FMEA, medical devices, safety, software
SUMMARY & CONCLUSIONS
Failure modes and effects analysis (FMEA) is an effective
way to identify and mitigate potential problems within the
design of a system. By adapting the general process outlined
in MIL-STD-1629A [1] to the design of software, a rigorous
software FMEA (SFMEA) process has been developed to
drive the identification of risks to safety, reliability, and
customer satisfaction.
1 INTRODUCTION
FMEA is an inductive, bottom-up analysis of potential
failure modes within a system and an assessment of their
associated effects on system functionality. It is used to identify
potential design weaknesses so that they can be mitigated in
the early stages of a design program. It also helps prioritize
engineers' time by focusing it on the most significant problems.
Weaknesses within the software portion of systems are of
particular concern. Medical devices are increasingly reliant on
software to perform their intended functions and to control
potential risks to patients. However, as software increases in
size and complexity it cannot typically be exhaustively
verified [2] and, as a result, latent software faults have become
a leading source of safety and quality issues for medical
devices. In 2011, over 20% of all medical device recalls in the
United States were due to software failures [3].
Because it is typically not feasible to exhaustively verify
the complex software incorporated in today's medical devices,
it is essential to ensure that it is developed under a controlled
process that focuses rigor on the subset of software items that
are most critical. Medical device manufacturers typically
institute software development lifecycle processes that are
compliant with IEC 62304, an international process standard
for medical device software development that is widely
recognized and required by medical device regulators
worldwide [4]. Incorporating FMEA into the software
development process is a potentially powerful way to improve
medical device dependability and safety. However, traditional
FMEA techniques, which have been oriented toward hardware
and system designs, must be adapted before they can be applied
to the analysis of software.
A variety of proposals have been made over the years
regarding how to apply FMEA to software. For example,
Reifer proposed SFMEA for software requirements analysis
[5]; Goddard described SFMEA at both the software
architecture and detailed design levels [6]; Ozarin and Siracusa
described SFMEA at the source code level [7]; and Sozer et al.
proposed a scenario-based software architecture analysis
technique wherein the up-front scenario definition is based on
FMEA [8]. Despite such important contributions, the field
remains relatively immature, and there is little clear consensus,
particularly from a medical device design perspective,
regarding how to select among potential SFMEA approaches
and how best to integrate them into the development lifecycle.
This paper describes an SFMEA process devised at
General Electric Healthcare (GEHC). SFMEA is not a one-time
analysis; it is part of an iterative process that involves three
main steps:
• SFMEA pre-work
• Software failure modes and effects analysis
• SFMEA follow-up
2 SFMEA PRE-WORK
One of the common pitfalls in conducting FMEA is
attempting to populate an FMEA table or form as the first step
in the analysis. This often leads to the FMEA being performed
at too low or too high a level. Too high a level
(e.g., only the high-level software architecture) may result in a
superficial analysis, but it is inefficient and often impractical
to perform detailed low-level analysis (e.g., of the detailed design or
source code) on the entire system.
Our approach to SFMEA includes the pre-work necessary to
set an appropriate scope for the analysis, applying the most
rigor to the most critical software items and ensuring that the
focus is on actionable design improvements. In particular, we
leverage two key aspects of IEC 62304, namely its requirements for
software architecture analysis and for software detailed design
activities, which naturally enable the software development
process to incorporate focused and efficient SFMEA.
2.1 Top Level Failure Mode and Hazard Identification
The first step in SFMEA pre-work is to identify key top-level
(i.e. product-level) failure modes and potential safety
hazards of interest (i.e. those to which software may
contribute). A technique commonly used for identifying
potential hazards at very early design concept stages is
preliminary hazard analysis (PHA) [9]. PHA is an excellent
starting point for SFMEA pre-work for most medical devices,
but it is often important to extend the analysis beyond safety
concerns (e.g. threats to reliability or information security).
The baseline analysis technique described below is valuable
for this purpose.
978-1-4673-4711-2/13/$31.00 2013 IEEE
Authorized licensed use limited to: George Mason University. Downloaded on September 30,2020 at 23:08:47 UTC from IEEE Xplore. Restrictions apply.
The objective of a baseline analysis is to identify a
representative baseline product or design and understand
when, where, and how that baseline fails, including but not
limited to safety-related failures. This baseline information is
used to identify potential hazards or other consequences,
failure modes and failure causes, as well as established design
patterns that can contribute to, avoid, or mitigate these threats.
Representative baseline, in this case, means a system,
sub-system, or item similar to the new design in terms of
function, design patterns or technologies. To obtain failure
information on a representative baseline, an engineer may
have to be creative and use whatever information is available.
Sources of information could include:
• Analysis of field failures / complaints and associated corrections
• Software of unknown provenance (SOUP) anomaly lists from software suppliers
• Anecdotal data from service dispatches
The idea is to obtain information about the types and rates of
failure modes and failure causes of the representative baseline
design to help predict these in the design under consideration.
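The following is a minimal sketch of how baseline failure information might be summarized. The Python snippet counts failure modes across field records, ranks them, and gives a crude rate per unit-year of exposure. The record format, the field names `failure_mode` and `source`, and the exposure figures are hypothetical. Real field data is usually incomplete, so rates of this kind are only rough indicators.

```python
from collections import Counter


def failure_mode_pareto(records, units_in_field, years_in_service):
    """Rank failure modes of a baseline design by count (Pareto order).

    Returns (failure_mode, count, rate) tuples, where rate is a crude
    number of reports per unit-year of field exposure.
    """
    counts = Counter(r["failure_mode"] for r in records)
    exposure = units_in_field * years_in_service  # unit-years in the field
    return [(mode, n, n / exposure) for mode, n in counts.most_common()]


# Hypothetical records drawn from complaints, service dispatches and a SOUP anomaly list.
records = [
    {"source": "complaint", "failure_mode": "no output (display freeze)"},
    {"source": "service", "failure_mode": "output too late"},
    {"source": "complaint", "failure_mode": "no output (display freeze)"},
    {"source": "SOUP anomaly list", "failure_mode": "incorrect output value"},
    {"source": "complaint", "failure_mode": "no output (display freeze)"},
    {"source": "service", "failure_mode": "output too late"},
]

for mode, n, rate in failure_mode_pareto(records, units_in_field=200, years_in_service=0.5):
    print(f"{mode}: {n} reports, {rate:.3f} per unit-year")
```

The most frequent baseline failure modes are natural candidates to examine first in the new design.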
2.2 Software Architecture Definition
The next step of SFMEA pre-work is software
architecture development and documentation, which is also
required by IEC 62304 for the majority of medical device
software systems (specifically, for those systems that can
potentially contribute to a safety hazard, i.e. safety
classifications B and C). The software architecture describes
the structure of the software system by decomposing it into
software items and defining the items' interrelationships and
interfaces, including those between items and external systems
(hardware or software). The documented software architecture
provides a framework for subsequent analysis (including
hierarchical per-item/unit safety classification, where a subset
of items/units may have a lower classification than their
parent), so care should be taken to ensure that the architecture
documentation scope and level (e.g. views and degree of
design decomposition) are appropriate and meaningful.
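The hierarchical safety classification mentioned above can be recorded alongside the architecture. The following is a minimal Python sketch using hypothetical item names. It assumes that an item without an explicit class inherits its parent's class. It resolves each item's effective class and rejects any child classified higher than its parent. It does not model the justification (e.g. segregation) that a lower classification would need to be supported by.

```python
from dataclasses import dataclass, field
from typing import Optional

# Software safety classes, ordered from least to most severe.
CLASS_RANK = {"A": 0, "B": 1, "C": 2}


@dataclass
class SoftwareItem:
    name: str
    safety_class: Optional[str] = None  # None: inherit the parent's class
    children: list = field(default_factory=list)


def resolve_classes(item, parent_class=None, result=None):
    """Resolve the effective safety class of every item in the hierarchy.

    A child may be assigned a lower class than its parent, never a higher one.
    """
    if result is None:
        result = {}
    effective = item.safety_class or parent_class
    if effective is None:
        raise ValueError(f"{item.name}: no safety class assigned or inherited")
    if parent_class is not None and CLASS_RANK[effective] > CLASS_RANK[parent_class]:
        raise ValueError(f"{item.name}: class {effective} exceeds parent class {parent_class}")
    result[item.name] = effective
    for child in item.children:
        resolve_classes(child, effective, result)
    return result


# Hypothetical patient-monitor decomposition.
system = SoftwareItem("patient_monitor", "C", [
    SoftwareItem("alarm_manager"),       # inherits class C
    SoftwareItem("trend_display", "B"),  # lower class than its parent
    SoftwareItem("service_log", "A"),
])
print(resolve_classes(system))
# {'patient_monitor': 'C', 'alarm_manager': 'C', 'trend_display': 'B', 'service_log': 'A'}
```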
Many different views and types of diagrams may be used
to emphasize different aspects of the software architecture,
including:
• Data flow diagrams
• Component diagrams
• Class diagrams
• Object diagrams
2.3 Software Detailed Design
Software detailed design further refines software items
into software units, which represent the lowest level of item
decomposition, and elaborates data representations, algorithms
and interface details. In general, decomposing software into
smaller granularity units facilitates efficient but impactful
SFMEA by allowing intensive analysis to be focused on the
most critical parts of the software. IEC 62304 requires
development of a detailed design covering all software units
that can potentially contribute to a risk of serious injury or loss
of life (safety classification C). In addition, extending detailed
design activities to cover software items identified as
contributing to critical non-safety risks is recommended to
facilitate SFMEA on these items.
Aspects of software detailed design to be considered and
documented may include static organization (e.g. object
hierarchy, data elements) and dynamic behavior such as event
sequencing, data and control flow, state transitions,
messaging, and communication protocols.
To support SFMEA, it is particularly important for the
detailed design to address potential failures, faults and external
failure causes and not be limited to nominal behaviors,
conditions and success scenarios. For example, consider
contributing causes and failures of risk control measures
including:
• Incorrect or incomplete specification of functionality
• Software implementation defects
• SOUP anomalies
• Hardware failures
• Reasonably foreseeable misuse and use errors
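One way to keep a detailed design from describing only the success scenario is to check every (state, event) pair of a unit's state machine. Each pair should have a designed response: either a transition or an explicit decision to ignore the event. The following Python sketch applies this check to a hypothetical non-invasive blood pressure (NIBP) measurement unit. The states, the events, and the deliberately missing entry are all invented for illustration.

```python
from itertools import product

STATES = ("IDLE", "MEASURING", "FAULT")
EVENTS = ("start", "valid_reading", "timeout", "overpressure", "reset")

# Designed responses: (state, event) -> next state, including failure paths.
TRANSITIONS = {
    ("IDLE", "start"): "MEASURING",
    ("IDLE", "overpressure"): "FAULT",
    ("MEASURING", "valid_reading"): "IDLE",
    ("MEASURING", "timeout"): "FAULT",
    ("FAULT", "reset"): "IDLE",
}

# Events deliberately ignored in a given state (a design decision worth recording).
IGNORED = {
    ("IDLE", "valid_reading"), ("IDLE", "timeout"), ("IDLE", "reset"),
    ("MEASURING", "start"), ("MEASURING", "reset"),
    ("FAULT", "start"), ("FAULT", "valid_reading"),
    ("FAULT", "timeout"), ("FAULT", "overpressure"),
}


def unspecified_behavior(states, events, transitions, ignored):
    """List (state, event) pairs for which the design says nothing."""
    return [(s, e) for s, e in product(states, events)
            if (s, e) not in transitions and (s, e) not in ignored]


print(unspecified_behavior(STATES, EVENTS, TRANSITIONS, IGNORED))
# [('MEASURING', 'overpressure')]
```

The check flags that this design never states what happens if overpressure is detected during a measurement. That is exactly the kind of gap SFMEA would otherwise have to discover.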
2.4 SFMEA Scoping
The final step of SFMEA pre-work is SFMEA scoping.
SFMEA can be performed at differing levels of design
abstraction. It is generally best to start with SFMEA at the
system and architecture level early in a design program and
progress to more detailed software item and unit levels as the
design matures. But because it is not typically practical to
perform detailed SFMEA on the entire system, a key issue is
how to select the most critical items and units for detailed
analysis.
SFMEA scoping relies upon the results of prior SFMEA
pre-work steps. Insights from baseline analysis can be useful
for identifying parts of the design that are likely to be failure
prone due to experience with similar designs. The safety
classification assigned during architecture and detailed design
is another basis for selection of software items and units for
detailed SFMEA (e.g. include all class C software units in
scope). In addition, it is often desirable to extend the scope of
SFMEA beyond safety-related failures to improve other facets
of product quality.
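The scoping criteria above can be combined into a simple selection rule. The following Python sketch selects units for detailed SFMEA using three criteria: safety class C, failure-prone in the baseline analysis, or flagged for a critical non-safety risk. It also records why each unit was selected, so that the scope decision can be reviewed. The unit names and flags are hypothetical, and a real scoping decision would also draw on FTA results and engineering judgment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitProfile:
    name: str
    safety_class: str                     # "A", "B" or "C"
    baseline_failure_prone: bool = False  # flagged by baseline analysis
    critical_non_safety: bool = False     # e.g. reliability or security concern


def detailed_sfmea_scope(units):
    """Return (name, reasons) for every unit selected for detailed SFMEA."""
    scope = []
    for u in units:
        reasons = []
        if u.safety_class == "C":
            reasons.append("class C")
        if u.baseline_failure_prone:
            reasons.append("failure-prone in baseline")
        if u.critical_non_safety:
            reasons.append("critical non-safety risk")
        if reasons:
            scope.append((u.name, reasons))
    return scope


units = [
    UnitProfile("nibp_algorithm", "C"),
    UnitProfile("alarm_manager", "C", baseline_failure_prone=True),
    UnitProfile("trend_display", "B", baseline_failure_prone=True),
    UnitProfile("export_service", "A", critical_non_safety=True),
    UnitProfile("help_viewer", "A"),
]
for name, reasons in detailed_sfmea_scope(units):
    print(name, "->", ", ".join(reasons))
```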
A further tool for identifying the most critical software
items is fault tree analysis (FTA). FTA is a top-down analysis
to identify combinations of failures, faults and events, both
internal and external to the software system, that can result in a
given top-level product failure mode or hazard of interest.
Using the software architecture and detailed design, a fault tree
can be constructed for each top-level failure mode and hazard
identified through baseline analysis and PHA. Software items
or units that appear in the minimal cut sets of each tree represent
sensitive areas for which detailed SFMEA should be
considered.
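The following Python sketch makes the minimal cut set idea concrete. It expands a small fault tree of AND/OR gates into cut sets, then removes any set that contains a smaller one (absorption). This simple recursive expansion is adequate for small trees but exponential in the worst case, so it is not a production FTA tool. The tree and its event names are hypothetical. A minimal cut set containing a single event marks a single point of failure.

```python
from itertools import product


def minimal(sets):
    """Absorption: drop any cut set that strictly contains another one."""
    return {s for s in sets if not any(other < s for other in sets)}


def cut_sets(node):
    """Cut sets of a fault tree given as a basic-event name or (gate, children)."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any one child's cut set is enough to cause the gate's output event.
        result = set().union(*child_sets)
    elif gate == "AND":
        # One cut set from every child must occur together.
        result = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate {gate!r}")
    return minimal(result)


# Hypothetical tree for the top-level failure "incorrect blood pressure displayed".
tree = ("OR", [
    "bp_algorithm_defect",
    ("AND", ["adc_driver_fault", "plausibility_check_fault"]),
    ("AND", ["bp_algorithm_defect", "display_driver_fault"]),
])
mcs = cut_sets(tree)
print(sorted(sorted(s) for s in mcs))
# [['adc_driver_fault', 'plausibility_check_fault'], ['bp_algorithm_defect']]
```

The third branch is absorbed because `bp_algorithm_defect` alone already causes the top event. The events remaining in the minimal cut sets point to the units to consider for detailed SFMEA.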
3 FAILURE MODES AND EFFECTS ANALYSIS
After completion of the SFMEA pre-work, the actual work
of conducting the SFMEA may begin. Many FMEA templates
exist, but in their most general form they include defining
failure modes, failure effects, failure causes, and current
design controls. Some form of rating of concern (e.g. by
severity, occurrence, and/or detectability) is often also
included as a way to prioritize design actions to prevent or
mitigate failure causes, failure modes or failure effects.
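The rating step is often implemented as a risk priority number (RPN): the product of the severity, occurrence and detection ratings. The paper does not prescribe a particular scheme. The following Python sketch therefore simply assumes this common convention with 1-10 scales, and its rows, ratings and names are hypothetical. Rating scales, and even the direction of the detection scale, differ between organizations. Here a higher detection number means the current design controls are less likely to catch the problem.

```python
from dataclasses import dataclass


@dataclass
class SfmeaRow:
    item: str
    failure_mode: str
    cause: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (remote) .. 10 (very likely)
    detection: int   # 1 (controls almost certainly catch it) .. 10 (no effective control)

    def __post_init__(self):
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(rows):
    """Highest RPN first; ties broken by severity."""
    return sorted(rows, key=lambda r: (r.rpn, r.severity), reverse=True)


rows = [
    SfmeaRow("nibp_algorithm", "reported BP too large", "wrong calibration constant", 8, 3, 4),
    SfmeaRow("alarm_manager", "no alarm output", "deadlock with display task", 9, 2, 6),
    SfmeaRow("trend_display", "output too late", "heap fragmentation", 3, 5, 7),
]
for r in prioritize(rows):
    print(r.item, r.rpn)
```

In this example, the low-severity `trend_display` row (RPN 105) outranks the severity-8 row (RPN 96). This is a known weakness of ranking by a bare RPN, so high-severity rows usually deserve review regardless of their RPN.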
3.1 Failure Mode Identification
From the perspective of a given software unit, a failure is
any deviation between its actual behavior or performance and
its required or expected behavior or performance (i.e., its
intended function). A failure mode is a specific manner in
which the software unit could potentially fail (e.g. reported
blood pressure value too large, reported blood pressure value
too small, no blood pressure value output, etc.), each of which
may have distinct effects.
Brainstorming can be a very useful tool for failure mode
identification, but used on its own it can result in
incomplete identification. Therefore, it is beneficial to employ
a structured approach that considers potential failure modes
according to predefined categories. The following software
failure mode taxonomy can be employed to promote
complete identification:
• Output (e.g. return value, memory or register written,
  message generated, resources affected) failure modes
  o Incorrect output value
    ▪ Numerical output value(s) inaccurate
      - Numerical output value too small
      - Numerical output value too large
      - Numerical output values representing a signal are
        noisy or corrupt
    ▪ Unexpected non-numerical or special numerical output
      value (e.g. floating-point NaN, +∞, -∞, -0,
      denormal/subnormal value)
    ▪ Pointer or index output incorrect (e.g. wrong
      address/index, NULL)
    ▪ Enumerated value output incorrect
    ▪ Other incorrect or invalid output (e.g. involving
      multiple fields of a complex data structure)
  o Wrong output timing
    ▪ Output too early / frequent
    ▪ Output too late / infrequent
    ▪ Out-of-sequence outputs
  o No/missing output (e.g. message not generated,
    memory or register not written)
  o Extra/spurious output (e.g. extra message(s)
    generated, memory corruption, failure to release
    resources)
• Execution flow failure modes
  o Diverted execution (e.g. incorrect branch or function
    call, raised exception or interrupt)
  o Wrong execution timing
    ▪ Early return / completion
    ▪ Late return / completion
    ▪ Fail to return / complete (e.g. hang,
      halt/terminate)
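Several output failure modes in this taxonomy can be checked mechanically on numeric outputs. The following Python sketch classifies a floating-point output against the special-value and range categories. The plausible range for a reported blood pressure (20 to 300 mmHg in the example) is a hypothetical figure, not a clinical limit. A value that is in range but inaccurate passes unnoticed, so a check like this covers only part of the taxonomy.

```python
import math
import sys


def numeric_output_failure_modes(value: float, low: float, high: float) -> list:
    """Return the taxonomy categories that a numeric output falls into.

    An empty list means none of these particular checks found a problem.
    """
    if math.isnan(value):
        return ["special value: NaN"]
    if math.isinf(value):
        return ["special value: +inf" if value > 0 else "special value: -inf"]
    modes = []
    if value == 0.0 and math.copysign(1.0, value) < 0:
        modes.append("special value: -0")
    elif value != 0.0 and abs(value) < sys.float_info.min:
        modes.append("special value: subnormal")  # below the smallest normal double
    if value < low:
        modes.append("value too small")
    elif value > high:
        modes.append("value too large")
    return modes


# Hypothetical plausible range for a reported systolic pressure, in mmHg.
for reading in (120.0, 350.0, -0.0, float("nan")):
    print(reading, numeric_output_failure_modes(reading, 20.0, 300.0))
```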
3.2 Failure Effects
A failure effect is the consequence of a failure mode on
the top-level function of the product/process as it is perceived
by the customer. Failure effects are described in terms of what
the customer or user might experience and include examples
such as alarms inoperative, degraded image quality, noisy
electrocardiogram waveform, or loss of display. Top-level
product failure modes and hazards identified during baseline
analysis and PHA provide an initial list of important potential
effects to consider during SFMEA, but other effects may also
be identified.
3.3 Failure Causes
Failure causes include faults internal to the item under
analysis as well as factors/events external to the item or entire
software system (hardware failure, use error, etc.) that could
cause the failure mode in question to occur. Examples of
potential intrinsic software faults are as follows:
• Infinite loop
• Multi-process / task / thread deadlock
• Unhandled exception
• Wrong / invalid code executed (e.g. code space
  corruption, defect involving use of function pointers)
• Non-reentrant code executed reentrantly
• Logic defects
• Calculation error (e.g. wrong formula, units, constants)
• Numerical overflow / underflow / saturation
• Counter rollover
• Initialization error (e.g. missing, incorrect)
• Invalid operation (e.g. divide by zero, logarithm of zero,
  square root of negative value)
• Finite precision error accumulation
• Stack or heap size insufficient
• Memory leaks (failure to free or delete dynamically
  allocated memory / objects)
• Unprotected critical sections where data in use may be
  modified by another task or interrupt
Note that a failure cause (e.g. corrupted data) of a
software item under consideration may correspond to a failure
mode (e.g. writing to an incorrect memory location) of some
other item.
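Two of the listed causes are easy to demonstrate: counter rollover and finite precision error accumulation. The following Python sketch emulates a hypothetical 16-bit tick counter. Python integers do not overflow, so the wrap is applied with a mask. The sketch contrasts naive and wrap-safe elapsed-time arithmetic, then shows rounding error building up in a repeated sum. Both parts illustrate a fault and one possible mitigation; they are not a complete treatment.

```python
import math

TICK_MASK = 0xFFFF  # hypothetical 16-bit free-running tick counter (wraps after 65535)


def elapsed_naive(start: int, now: int) -> int:
    """Fault: plain subtraction goes wrong once the counter has rolled over."""
    return now - start


def elapsed_wrap_safe(start: int, now: int) -> int:
    """Modular subtraction: correct while the true interval is below 2**16 ticks."""
    return (now - start) & TICK_MASK


start = 65_530
now = (start + 10) & TICK_MASK        # ten ticks later the counter reads 4
print(elapsed_naive(start, now))      # -65526
print(elapsed_wrap_safe(start, now))  # 10

# Finite precision error accumulation: 0.1 has no exact binary representation.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)                   # False
print(math.fsum([0.1] * 10) == 1.0)   # True: fsum avoids the accumulated rounding error
```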
3.4 Design Controls and Recommended Actions
MIL-HDBK-338B [10] defines current design controls as
“prevention, design verification/validation, or other activities
which will assure the design adequacy for the failure mode
and/or cause/mechanism under consideration.” The main types
of design controls are:
• Process controls employed during the development phase
  of the product lifecycle to:
  o Prevent defect injection through fault avoidance
    techniques such as following design or coding
    standards.
  o Detect and remove defects (e.g., design reviews,
    timing analysis, static code analysis, or software
    reliability growth testing).
• Controls implemented within the design itself to:
  o Prevent or reduce the likelihood of a software fault or
    other cause (e.g., a user interface design that prevents
    entering an incorrect selection).
  o Prevent or reduce the likelihood of a failure mode
    resulting from a software fault or other cause (e.g.,
    architectural segregation tactics, error handling, input
    checking, redundancy or other fault-tolerant design
    provisions).
  o Detect or mitigate a failure mode after it has occurred
    (e.g., fail-safe design, watchdog tasks, automatic
    failure detection and recovery, technical alarms to
    alert the user).
Note that design controls may be specific to one or more
causes or may apply to a failure mode. Typically, design
controls that prevent faults, failures and effects are preferred
to those that merely detect and mitigate faults or failures after
they have occurred.
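A software watchdog is an example of a control that detects a failure mode after it has occurred. The following Python sketch models one that reports any monitored task that has not checked in within a timeout, which covers the hang and late-completion modes of the taxonomy. The class, the task names and the 0.5 s timeout are hypothetical. The clock is injectable, so the behavior can be exercised without waiting. In a real device, a software watchdog of this kind may be complemented by a hardware watchdog.

```python
import time
from typing import Callable, Dict, Set


class SoftwareWatchdog:
    """Reports monitored tasks that stop checking in (hang, halt, late completion)."""

    def __init__(self, timeout_s: float, on_expiry: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.on_expiry = on_expiry  # e.g. raise a technical alarm, enter a safe state
        self.clock = clock
        self.last_kick: Dict[str, float] = {}
        self.expired: Set[str] = set()

    def kick(self, task: str) -> None:
        """Called by a task each time it completes a cycle."""
        self.last_kick[task] = self.clock()
        self.expired.discard(task)

    def check(self) -> None:
        """Called periodically, e.g. from a supervisor task."""
        now = self.clock()
        for task, last in self.last_kick.items():
            if now - last > self.timeout_s and task not in self.expired:
                self.expired.add(task)  # report each expiry once until the task recovers
                self.on_expiry(task)


# Usage with a fake clock so no real waiting is needed.
fake_now = [0.0]
alarms = []
wd = SoftwareWatchdog(0.5, alarms.append, clock=lambda: fake_now[0])
wd.kick("nibp_task")
wd.kick("ecg_task")
fake_now[0] = 0.4
wd.kick("ecg_task")  # ecg_task keeps running; nibp_task has stopped checking in
fake_now[0] = 0.7
wd.check()
print(alarms)        # ['nibp_task']
```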
Recommended actions (and identification of an owner)
are at the heart of the DFMEA process. Recommended actions
are those activities that will:
• Reduce the occurrence rating by eliminating or reducing
  the probability of faults or other failure causes,
• Reduce the severity rating by changing the nature of the
  failure effects, or
• Increase the detectability rating (i.e. the efficacy of the design
  controls to prevent a failure cause from resulting in the
  failure mode and/or to prevent a failure mode from
  resulting in a failure effect) by:
  o Adding tests and/or analyses to the development
    process to detect and remove faults, or
  o Adding diagnostics, additional logic, exception
    handling, redundancy, etc. to the system design to
    reduce the likelihood that a fault or other failure
    cause will result in a failure mode or that a failure
    mode will result in a failure effect.
Figure 1 illustrates a prioritization of various general
types of actions. Specific examples of some possible actions
for software are as follows:
• Improvement of testing to cover more scenarios or
  include more severe stresses
• Design constraints or simplifications (e.g. replacing
  dynamic memory allocation with static allocation,
  reducing the number of interfaces)
• Implementing segregation tactics (e.g. partitioning the
  application into separate processes, use of protected
  memory / execution space)
• Adding exception or other fault handling
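Replacing dynamic allocation with static allocation can be sketched as a fixed-capacity ring buffer. Its storage is reserved once and never grows, and it has an explicit policy for what happens when it is full. The following Python class only illustrates that design constraint; in an embedded C implementation, the storage would be a statically allocated array. The policy of overwriting the oldest sample, and counting how often that happens, is an assumption. A design might instead reject new data when the buffer is full.

```python
class FixedRingBuffer:
    """Fixed-capacity sample buffer: storage is reserved once and never grows."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity  # reserved up front, never resized
        self._head = 0                   # index of the oldest sample
        self._count = 0
        self.overwritten = 0             # samples dropped because the buffer was full

    def push(self, sample) -> None:
        capacity = len(self._slots)
        tail = (self._head + self._count) % capacity
        self._slots[tail] = sample
        if self._count < capacity:
            self._count += 1
        else:
            # Full: the new sample replaced the oldest one, so advance the head.
            self._head = (self._head + 1) % capacity
            self.overwritten += 1

    def snapshot(self) -> list:
        """Samples from oldest to newest."""
        capacity = len(self._slots)
        return [self._slots[(self._head + i) % capacity] for i in range(self._count)]


buf = FixedRingBuffer(3)
for sample in (1, 2, 3, 4, 5):
    buf.push(sample)
print(buf.snapshot(), buf.overwritten)  # [3, 4, 5] 2
```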
4 FOLLOW UP
The output of the SFMEA is a list of actions with the
potential for improving the design. The implementation of
those actions and evaluation of their effectiveness is the way
in which an SFMEA ultimately provides value. Actions
identified in the SFMEA must be implemented and then need
to be evaluated to ensure they have the intended benefit.
Figure 1. Hierarchy of Recommended Actions
Findings from SFMEA will often require pre-work to be
revisited. For example, the analysis may
identify additional software items or units as critical cause
contributors, requiring item/unit safety classification or
scoping of detailed SFMEA to be revisited. As a further
example, if SFMEA actions include changes to the product
design, these will often involve updates to the software
architecture, detailed design or fault trees. Thus, the overall
process is iterative.
Evaluation of the actions taken can be in the form of
review (e.g., design reviews or code reviews),
analyses/simulation (e.g., timing analysis), tests (e.g.,
integration testing or software stress testing), or other
activities that confirm the action mitigates the failure mode
and that the action does not introduce any new failure causes
or failure modes. Evaluation confirms the recommended
action has been completed and is effective.
5 CONCLUSION
When conducted as part of a rigorous design process,
SFMEA is a valuable tool for identifying, prioritizing, and
mitigating potential design problems. It is important to
consider SFMEA to be part of an iterative process that starts
early in design and development and continues throughout the
product lifecycle. This process consists of three main phases:
• SFMEA pre-work
• Software failure modes and effects analysis
• SFMEA follow-up
It is necessary to include each of these phases, as each builds
upon the previous to enhance the effectiveness of the SFMEA.
REFERENCES
1. MIL-STD-1629A, Procedures for Performing a Failure
Mode, Effects and Criticality Analysis, 24 November 1980.
2. Isaksen, U., Bowen, J.P. and Nissanke, N., “System and
Software Safety in Critical Systems,” Technical Report
RUCS/97/TR/062/A, University of Reading, UK, 1997.
3. Office of Science and Engineering Laboratories (OSEL),
2011 Annual Report.
4. IEC 62304:2006, Medical Device Software - Software
Life Cycle Processes.
5. Reifer, D.J., “Software Failure Modes and Effects
Analysis,” IEEE Transactions on Reliability, vol. R-28,
no. 3, pp. 247-249, Aug. 1979.
6. Goddard, P.L., “Software FMEA techniques,” Proceedings
of the Annual Reliability and Maintainability Symposium,
pp. 118-123, 2000.
7. Ozarin, N. and Siracusa, M., “A process for failure modes
and effects analysis of computer software,” Proceedings of
the Annual Reliability and Maintainability Symposium,
pp. 365-370, 2003.
8. Sozer, H., Tekinerdogan, B. and Aksit, M., “Extending
failure modes and effects analysis approach for reliability
analysis at the software architecture design level,” in
Architecting Dependable Systems IV, Lecture Notes in
Computer Science, vol. 4615, pp. 409-433, 2007.
9. IEC 60300-3-9:1995, Dependability management - Part
3: Application guide - Section 9: Risk analysis of
technological systems.
10. MIL-HDBK-338B, Electronic Reliability Design
Handbook, 1 October 1998.
BIOGRAPHIES
Jacob J. Stadler
GE Healthcare/Datex Ohmeda
P.O. Box 7550
Madison, WI 53707-7550 USA
Jacob Stadler is a Senior Reliability Engineer for the General
Electric Company's Healthcare division. He has worked in
reliability and safety engineering roles supporting the
development of a wide range of products including life
support equipment, patient monitoring devices, infant
warmers, and diagnostic imaging systems. He is primarily
focused on Design For Reliability (DFR) and the integration
of reliability in the engineering process. He is a senior member
of ASQ and a Certified Reliability Engineer.
Neal J. Seidl
3000 N. Grandview Blvd
T24
Waukesha, WI 53188-1696 USA
Neal Seidl is a Design Controls Manager in GE Healthcare's
Global Quality, Regulatory and Medical organization
responsible for definition and application of company-wide
design controls processes pursuant to ISO 13485, ISO 14971,
FDA Quality System Regulation, and other global medical
device regulations. He has 17 years of experience in systems
engineering and software development including eleven years
of experience in medical device design and risk management.