DEDICATED TO EMBEDDED SOLUTIONS
RELIABILITY IN SUBSEA ELECTRONICS
TECHNIQUES TO OBTAIN HIGH RELIABILITY
STIG-HELGE LARSEN, KARSTEN KLEPPE
DATA RESPONS 2012-10-16
AGENDA
Introduction
Analysis and Design Techniques
Reliability Predictions
FRACAS and Data Processing Techniques
Production and Repair
Testing
Reliability Program Planning
THIS IS DATA RESPONS
We are a full-service, independent technology company and a leading player in the embedded solutions market.
ESTABLISHED: 1986
Listed on the Oslo Stock Exchange (Ticker: DAT)
CERTIFICATIONS: ISO 9001:2008, ISO 14001:2004, OHSAS 18001:2007
EMPLOYEES: 465
CUSTOMISATION
EXTREME CONDITIONS
Humidity
Altitude
Temperature
Vibration
Salt spray
Shock
EMC
CUSTOM SPECIFICATION
Physical size
Interfaces
Functionality
Performance
Power demands
Regulations
Standards
CHOICE OF TECHNOLOGY
Operating systems
Software architecture
Hardware platform
Processor architecture
Memory and storage
Communication & I/O
Display and touch
EXAMPLE: CURRENT SENSOR BOARD
Measuring range: 0.2–1.2 A AC
Accuracy: Better than ± 1.0 %
CAN bus interface
4-20 mA outputs
Qualified according to ISO 13628-6 for Subsea Production Control Systems
Based on Hall effect current sensor
RELIABILITY IN SUBSEA ELECTRONICS
INTRODUCTION
Reliability in Data Respons
Reliability study
IEC 61508
QA system
Reliability
The ability of an item to perform a required function under stated conditions for a specified period of time
Availability
The proportion of time for which the equipment is able to perform its function
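As a minimal sketch of the availability definition above, the proportion can be computed directly from observed uptime and downtime. The second function uses the common steady-state approximation MTBF / (MTBF + MTTR); that formula, the function names, and the example figures are assumptions for illustration and are not taken from the presentation.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Proportion of time the equipment was able to perform its function."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Common approximation MTBF / (MTBF + MTTR); assumed, not from the slides."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: a long repair time (for example, waiting for an
# intervention vessel) lowers availability even when failures are rare.
print(availability(uptime_hours=990.0, downtime_hours=10.0))
print(steady_state_availability(mtbf_hours=100_000.0, mttr_hours=1_000.0))
```

With low accessibility, the repair time term tends to dominate, which is one way to see why subsea equipment needs high reliability rather than fast repair.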
SUBSEA
Characteristics
Relatively low volumes
Need for high reliability
Low accessibility
High cost in case of replacements
KEY POINTS
Techniques to obtain high reliability in electronics
Topic Areas / Relevant Themes / Key Points:
Design Techniques and Analysis
Root Causes of Failures
Failure Reporting and Corrective Actions System
Automated Testing
Accelerated Stress Testing
Reliability Program Plan
ANALYSIS AND DESIGN TECHNIQUES
Start with evaluation of the relationships between different parts of the system
Evaluate different design alternatives
Follow design guidelines
Use design checklists
Arrange design reviews
Perform stress analysis and derating of components
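Stress analysis and derating can be sketched as comparing each part's applied stress with its rated value and checking the ratio against a derating limit. The class name, the part names, and all numeric limits below are hypothetical; real limits would come from the project's own derating guideline.

```python
from dataclasses import dataclass


@dataclass
class PartStress:
    name: str
    applied: float         # applied stress (V, W, A, ...) in the same unit as `rated`
    rated: float           # manufacturer's rated maximum
    derating_limit: float  # maximum allowed stress ratio, e.g. 0.6 (hypothetical)

    @property
    def stress_ratio(self) -> float:
        return self.applied / self.rated

    def within_derating(self) -> bool:
        return self.stress_ratio <= self.derating_limit


# Hypothetical parts and limits, for illustration only.
parts = [
    PartStress("C12 capacitor voltage", applied=12.0, rated=25.0, derating_limit=0.6),
    PartStress("R7 resistor power", applied=0.20, rated=0.25, derating_limit=0.5),
]

violations = [p.name for p in parts if not p.within_derating()]
for p in parts:
    status = "OK" if p.within_derating() else "EXCEEDS DERATING LIMIT"
    print(f"{p.name}: stress ratio {p.stress_ratio:.2f} ({status})")
```

Running such a check over the whole bill of materials gives a simple list of parts to revisit in a design review.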
Failure Mode, Effects and Criticality Analysis (FMECA)
identifies potential failure modes
lists the effects of failures
basis for eliminating mission-critical, single-point failures
Figure: FMECA flow diagram relating hardware design, component data(base), failure modes, failure effects, failure rate and criticality numbers.
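One way to arrive at criticality numbers is the failure mode criticality formula from MIL-STD-1629A, Cm = β · α · λp · t, summed per item. The presentation does not say which method was used, so the sketch below is an assumption; the part, its failure modes, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    effect: str
    mode_ratio: float          # alpha: share of the part's failures in this mode
    effect_probability: float  # beta: probability the mode leads to the stated effect


def mode_criticality(mode: FailureMode, part_failure_rate: float, mission_hours: float) -> float:
    """Cm = beta * alpha * lambda_p * t (MIL-STD-1629A style criticality number)."""
    return mode.effect_probability * mode.mode_ratio * part_failure_rate * mission_hours


def item_criticality(modes, part_failure_rate: float, mission_hours: float) -> float:
    """Cr = sum of Cm over the failure modes of one item."""
    return sum(mode_criticality(m, part_failure_rate, mission_hours) for m in modes)


# Hypothetical example: a regulator with two failure modes.
modes = [
    FailureMode("output short", "loss of board power", mode_ratio=0.5, effect_probability=1.0),
    FailureMode("output drift", "degraded accuracy", mode_ratio=0.5, effect_probability=0.2),
]
failure_rate_per_hour = 2e-6  # hypothetical part failure rate
mission_hours = 10_000.0

for m in modes:
    print(m.description, mode_criticality(m, failure_rate_per_hour, mission_hours))
print("item criticality:", item_criticality(modes, failure_rate_per_hour, mission_hours))
```

Sorting failure modes by Cm highlights the single-point failures that most deserve a design change.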
Failure Mode, Effects and Diagnostic Analysis (FMEDA)
includes diagnostic coverage (the ability of any automatic diagnostics to detect failures)
Figure: FMEDA flow diagram relating hardware design, component data(base), failure modes, failure effects, failure rate and criticality numbers, and diagnostic coverage.
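Diagnostic coverage can be illustrated with the definitions commonly used with IEC 61508: the share of dangerous failures that the automatic diagnostics detect, and the related safe failure fraction. This is a sketch under that assumption; the function names and failure rates are hypothetical.

```python
def diagnostic_coverage(lambda_dd: float, lambda_du: float) -> float:
    """DC = detected dangerous failure rate / total dangerous failure rate."""
    total_dangerous = lambda_dd + lambda_du
    if total_dangerous <= 0:
        raise ValueError("total dangerous failure rate must be positive")
    return lambda_dd / total_dangerous


def safe_failure_fraction(lambda_s: float, lambda_dd: float, lambda_du: float) -> float:
    """SFF = (safe + detected dangerous) / (safe + all dangerous)."""
    total = lambda_s + lambda_dd + lambda_du
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return (lambda_s + lambda_dd) / total


# Hypothetical failure rates in FIT (failures per 1e9 hours).
lambda_safe = 100.0
lambda_dangerous_detected = 90.0
lambda_dangerous_undetected = 10.0

print(diagnostic_coverage(lambda_dangerous_detected, lambda_dangerous_undetected))
print(safe_failure_fraction(lambda_safe, lambda_dangerous_detected, lambda_dangerous_undetected))
```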
FMECA - EXAMPLE OF DA FORM 7611
FMECA - EXAMPLE OF DA FORM 7612
Redundancy
duplicating critical parts
usually in the form of a backup or fail-safe
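The benefit of duplicating critical parts can be sketched with the standard formulas for parallel and k-out-of-n redundancy. The sketch assumes identical units that fail independently, which real systems only approximate (common-cause failures reduce the gain); the unit reliability of 0.9 is a hypothetical figure.

```python
from math import comb


def parallel_reliability(unit_reliability: float, n_units: int) -> float:
    """System works if at least one of n identical, independent units works."""
    return 1.0 - (1.0 - unit_reliability) ** n_units


def k_out_of_n_reliability(unit_reliability: float, k: int, n: int) -> float:
    """System works if at least k of n identical, independent units work."""
    r = unit_reliability
    return sum(comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


single = 0.9  # hypothetical reliability of one unit over the mission
print("single unit:       ", single)
print("duplicated (1oo2): ", parallel_reliability(single, 2))
print("two out of three:  ", k_out_of_n_reliability(single, 2, 3))
```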
Software Development Plan
Describes the software development methodology and techniques, including reviews, coding standards, and testing.
Key aspect of the software reliability program.
The software reliability depends on the number of software faults.
Testing is very important for software:
every individual unit
integration
full system
Design for Test (DFT)
make it easier to implement low level manufacturing tests
Built-In Test (BIT)
to achieve high reliability for a lower cost
Automatic Reset Features
restart on critical events, such as lack of communications or improper software operation
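An automatic reset feature is often built around a watchdog: healthy software must "kick" a timer regularly, and a missed kick triggers a restart. The sketch below models only the timing logic in software; the class name and timeout are hypothetical, and in an embedded design the timer would normally be a hardware watchdog that resets the processor.

```python
import time


class Watchdog:
    """Software model of a watchdog timer (hypothetical, for illustration)."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_kick = clock()

    def kick(self) -> None:
        """Called by healthy software, e.g. after each successful communication cycle."""
        self._last_kick = self._clock()

    def expired(self) -> bool:
        """True when no kick has arrived within the timeout: a reset is due."""
        return self._clock() - self._last_kick > self.timeout_s


# A fake clock keeps the example deterministic and instant.
now = [0.0]
wd = Watchdog(timeout_s=1.0, clock=lambda: now[0])

now[0] = 0.5
healthy = wd.expired()        # still within the timeout

now[0] = 2.0                  # software hung or communications lost
needs_reset = wd.expired()

wd.kick()                     # after the restart the software kicks again
recovered = not wd.expired()
print(healthy, needs_reset, recovered)
```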
Typical Board with Boundary-Scan Components
Source: Corelis
Thermal Analysis
good working temperature for every chip
to achieve the required design for reliability and performance
Electromagnetic Analysis
good electromagnetic compatibility (EMC) design
for correct operation of different equipment in the same electromagnetic environment
Accelerated Testing
using high stresses to get failures quickly
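For temperature stress, the link between test time and field time is often estimated with the Arrhenius model. The presentation does not name a model, so this is an assumed example; the activation energy and temperatures are hypothetical and depend strongly on the failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev: float,
                                  use_temp_c: float,
                                  stress_temp_c: float) -> float:
    """AF = exp(Ea/k * (1/T_use - 1/T_stress)), temperatures in kelvin."""
    t_use_k = use_temp_c + 273.15
    t_stress_k = stress_temp_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


# Hypothetical values: Ea = 0.7 eV, 40 degC in use, 85 degC in the test chamber.
af = arrhenius_acceleration_factor(0.7, use_temp_c=40.0, stress_temp_c=85.0)
test_hours = 1_000.0
print(f"acceleration factor: {af:.1f}")
print(f"{test_hours:.0f} test hours correspond to about {af * test_hours:.0f} field hours")
```

The estimate only holds while the higher stress triggers the same failure mechanisms as normal use.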
Root Cause Analysis (RCA)
to correct or eliminate root causes
a tool of continuous improvement
Reliability Growth Analysis
collecting, modeling, analyzing and interpreting data
to learn how much the reliability of a product has improved
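Reliability growth can be sketched with the Duane model, one commonly used approach in which cumulative MTBF plotted against cumulative test time is close to a straight line on log-log axes. The choice of model, the function name, and the failure times below are assumptions for illustration.

```python
import math


def duane_growth_slope(failure_times_hours):
    """Least-squares slope of log(cumulative MTBF) versus log(cumulative time).

    A slope near 0 means no growth; a larger positive slope means failures
    are becoming rarer as corrective actions take effect.
    """
    times = list(failure_times_hours)
    if len(times) < 2 or any(t <= 0 for t in times) or times != sorted(times):
        raise ValueError("need at least two positive, increasing failure times")
    xs = [math.log(t) for t in times]
    # Cumulative MTBF at the i-th failure is elapsed time / number of failures so far.
    ys = [math.log(t / (i + 1)) for i, t in enumerate(times)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("failure times must not all be equal")
    return sxy / sxx


# Hypothetical cumulative test hours at which failures were observed.
improving = [1.0, 4.0, 9.0, 16.0, 25.0]    # gaps between failures keep growing
steady = [10.0, 20.0, 30.0, 40.0, 50.0]    # constant failure rate
print(duane_growth_slope(improving))
print(duane_growth_slope(steady))
```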
RELIABILITY PREDICTIONS
A quick reliability analysis for the designed system is needed
MTBF is often used as a measure for reliability
Restricted to operation under stated conditions
Important to use a relevant prediction calculation procedure
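A quick prediction is often made by adding component failure rates and converting the sum to an MTBF. The sketch below assumes a series system (any part failure fails the board) and a constant failure rate, which are simplifications; the part names and FIT values are hypothetical and do not come from any handbook.

```python
import math

HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 1e9 device hours


def series_failure_rate_fit(part_rates_fit) -> float:
    """Series system: the board failure rate is the sum of the part rates."""
    return sum(part_rates_fit.values())


def mtbf_hours(failure_rate_fit: float) -> float:
    return HOURS_PER_FIT_UNIT / failure_rate_fit


def reliability(mission_hours: float, mtbf: float) -> float:
    """R(t) = exp(-t / MTBF), valid only for a constant failure rate."""
    return math.exp(-mission_hours / mtbf)


# Hypothetical part failure rates in FIT.
parts_fit = {"power supply": 300.0, "CPU module": 120.0, "connectors": 45.0, "sensor": 35.0}

board_fit = series_failure_rate_fit(parts_fit)
board_mtbf = mtbf_hours(board_fit)
print(f"board failure rate: {board_fit:.0f} FIT, MTBF: {board_mtbf:.0f} h")
print(f"probability of surviving one MTBF: {reliability(board_mtbf, board_mtbf):.3f}")
```

Under these assumptions the probability of surviving for one full MTBF is exp(-1), roughly 37 %, so an MTBF figure should not be read as an expected lifetime.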
Excerpt from the reliability analysis checklist in MIL-HDBK-217
Factors that affect the MTBF figures from vendors
Prediction methods
Predefined conditions
Quality level of components
The source and assumptions for the base failure rate of each component type
The vendors’ assumptions need to be understood.
MTBF – an indicator of reliability
What is the use of reliability predictions?
assessment of whether reliability goals (e.g. MTBF) can be reached
identification of potential design weaknesses
evaluation of alternative designs and life-cycle costs
the provision of data for system reliability and availability analysis
FRACAS & DATA PROCESSING TECHNIQUES
FRACAS
FRACAS: Failure Reporting And Corrective Action System
Pareto chart: To highlight the most important among a (typically large) set of factors.
The most frequent fault causes will vary from item to item.
“No fault found” and “Root cause unknown” will often amount to a large part of all cases.
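The data behind a Pareto chart can be sketched as a count per fault cause, sorted by frequency, with a running cumulative percentage. The function name and the fault records below are hypothetical; a real FRACAS database would supply the cause for each reported failure.

```python
from collections import Counter


def pareto_table(causes):
    """Return (cause, count, cumulative percent) rows, most frequent cause first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        rows.append((cause, count, 100.0 * cumulative / total))
    return rows


# Hypothetical failure reports, one root-cause category per report.
reports = [
    "power circuit", "no fault found", "power circuit", "connector",
    "no fault found", "power circuit", "firmware", "no fault found",
    "power circuit", "connector",
]

table = pareto_table(reports)
for cause, count, cum_pct in table:
    print(f"{cause:<16} {count:>3} {cum_pct:6.1f} %")
```

The cumulative column shows how few categories account for most of the reports, which is where corrective actions pay off first.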
DATA ANALYSIS: PARETO CHART
DATA ANALYSIS: NO FAULT FOUND
Some possible reasons for no fault found (NFF):
a rare failure that is hard to recreate (e.g. failure under special conditions)
the failure is intermittent (e.g. a loose connection)
there has never been a fault on the item
DATA ANALYSIS: INTERMITTENT FAILURES
Intermittent Failures:
The system performs incorrectly only under certain conditions, but not others.
Can cause the same system failure if reinstalled, and can therefore generate high costs.
DATA ANALYSIS: PARETO CHART
Example – summarized
The following categories in particular need attention:
1. Power circuit
2. PCB production / assembly
3. Input/output circuit
4. Firmware
5. Connectors or internal cables
Also often relevant for some items:
6. Secondary storage / external memory (disk)
7. Mechanical damage
8. Batteries
9. Software
10. CPU module
11. Others – for instance
short circuit
internal memory (RAM) fault
defective fan
errors in procedure
design fault
PRODUCTION AND REPAIR
Some relevant topics:
Errors during production tests and field errors will correlate
Follow-up of suppliers
Production batch volume for electronics
Saving test data so that analysis is easy
ISO 20815 standard – Production assurance and reliability management
IPC-A-610 - Acceptability of Electronic Assemblies
IPC J-STD-001 - Requirements for Soldered Electrical and Electronic Assemblies
IPC product classes:
CLASS 1 - General Electronic Products
CLASS 2 - Dedicated Service Electronic Products
CLASS 3 - High Performance Electronic Products
Rework
poses a risk to reliability, so there should be requirements on the maximum allowed rework
should be substantiated and documented for each serial number
IPC-7711/7721 is the IPC standard for rework, modification and repair
HANDLING ELECTRONIC ASSEMBLIES
Electrostatic discharge (ESD) can occur with no visible signs of damage.
Two simple principles of electrostatic safe handling are:
1. Only handle sensitive components in an ESD Protected Area (EPA).
2. Protect sensitive devices outside the EPA using ESD protective packaging.
TESTING
AUTOMATED TESTING
Why automated testing?
human errors can be minimized
more thorough testing
enable monitoring of variations in test results
do several tests very quickly and find potential points of failure
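Monitoring variations in test results can be sketched as a simple control-limit check: limits are derived from a baseline of known-good measurements, and later results outside them are flagged. The three-sigma rule, the function names, and the measurement values are assumptions for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas: float = 3.0):
    """Lower and upper limits at mean +/- sigmas * sample standard deviation."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_limits(values, limits):
    """Indices of measurements that fall outside the control limits."""
    lower, upper = limits
    return [i for i, v in enumerate(values) if not lower <= v <= upper]


# Hypothetical supply-rail measurements in volts from an automated test station.
baseline_volts = [5.00, 5.02, 4.98, 5.01, 4.99]
limits = control_limits(baseline_volts)

new_batch_volts = [5.00, 5.20, 4.99]
flagged = out_of_limits(new_batch_volts, limits)
print("limits:", limits)
print("flagged unit indices:", flagged)
```

A unit can pass its fixed test limit and still be flagged here, which is the kind of drift that manual testing rarely reveals.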
Automatic Optical Inspection (AOI)
takes time to set up correctly
Automated X-Ray Inspection (AXI)
in many ways similar to AOI except that it can look through IC packages
Example from Axiomtek
In-Circuit Test (ICT)
often limited when the test pins cannot access contact points on the board
Manufacturing Defect Analyzer (MDA)
does not check the operation of ICs
ICT example from RNS International
JTAG Boundary Scan
widely used
allows much of a board to be tested with only minimal access
its standard is IEEE 1149.1
boundary-scan integrated circuits (ICs) are connected serially on a board
Typical Board with Boundary-Scan Components
Source: Corelis
Functional Automatic Test System
use equipment for testing the function of a circuit
Example of a software-defined test system from National Instruments
Built-In Test (BIT)
good accessibility to the hardware
often less-expensive tests
Loop back test
connecting transmitter and receiver on the same board
Some form of external tests will usually be required in addition to self-diagnostics
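The loop back test mentioned above can be sketched as sending a known pattern on the transmitter and checking that the receiver on the same board returns it unchanged. The port classes below are hypothetical stand-ins for a real driver, so the example runs without hardware.

```python
class LoopbackPort:
    """Stand-in for a serial port whose transmitter is wired to its own receiver."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data: bytes) -> int:
        self._buffer.extend(data)
        return len(data)

    def read(self, size: int) -> bytes:
        received = bytes(self._buffer[:size])
        del self._buffer[:size]
        return received


class StuckLowPort(LoopbackPort):
    """Hypothetical faulty link: the receiver only ever sees zero bytes."""

    def read(self, size: int) -> bytes:
        return bytes(len(super().read(size)))


def loopback_test(port, pattern: bytes = bytes(range(256))) -> bool:
    """Send a pattern covering every byte value and compare what comes back."""
    port.write(pattern)
    return port.read(len(pattern)) == pattern


good_result = loopback_test(LoopbackPort())
bad_result = loopback_test(StuckLowPort())
print("healthy port passes:", good_result)
print("stuck port passes:  ", bad_result)
```

A loop back test exercises the on-board transmitter and receiver but not the external cable or connector, which is one reason external tests are still needed.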
For testing of external interfaces using a standard protocol, a software tool can be purchased for testing and data logging
By analyzing data from testing, production areas that need attention and improvement can be pinpointed.
STRESS TESTING - ISO 13628 PART 6
ISO 13628 part 6 for subsea production control systems:
Qualification and EMC (electromagnetic compatibility):
Shock
Vibration
Temperature
EMC tests
ESS (Environmental Stress Screening) during production:
Random vibration
Thermal cycling
Burn-in
Final functional test
BATHTUB CURVE
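The bathtub curve can be sketched as the sum of three failure-rate contributions: a decreasing early-failure term, a constant random-failure term, and an increasing wear-out term. Modelling the first and last with Weibull hazards is one common choice, assumed here; all parameter values are hypothetical.

```python
def weibull_hazard(t_hours: float, shape: float, scale_hours: float) -> float:
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale_hours) * (t_hours / scale_hours) ** (shape - 1.0)


def bathtub_hazard(t_hours: float) -> float:
    """Failure rate per hour as early failures + random failures + wear-out."""
    early = weibull_hazard(t_hours, shape=0.5, scale_hours=1_000.0)      # decreasing
    random_failures = 1e-5                                               # constant
    wear_out = weibull_hazard(t_hours, shape=5.0, scale_hours=100_000.0)  # increasing
    return early + random_failures + wear_out


for t in (10.0, 100.0, 20_000.0, 200_000.0):
    print(f"t = {t:>9.0f} h  failure rate = {bathtub_hazard(t):.2e} per hour")
```

Burn-in and environmental stress screening aim to consume the steep early part of this curve in the factory, so delivered units start closer to the flat middle section.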
HALT - HIGHLY ACCELERATED LIFE TESTING
HALT
to provoke, within a relatively short period of time, failures commonly seen after long-term use
take corrective measures: either changes to the design or changes in the production process
Typical tests are:
Cold Step Test
Hot Step Test
Rapid Temperature Cycling Test (e.g. 60°C/minute ramp-rate)
Stepped Vibration (random) Test
Combined Environment Stress
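A step test such as the cold or hot step test can be sketched as a list of setpoints with a dwell time at each level, stepping until the unit stops working. The function names, step size, dwell time, and the simulated failure threshold are all hypothetical; an actual HALT is run in a chamber with the unit monitored at every step.

```python
def step_profile(start_c: int, step_c: int, limit_c: int, dwell_min: int):
    """Setpoints from start to limit (inclusive) as (temperature, dwell) pairs."""
    if step_c == 0:
        raise ValueError("step must be non-zero")
    if (limit_c - start_c) * step_c < 0:
        raise ValueError("step must move from start toward the limit")
    profile = []
    level = start_c
    while (step_c > 0 and level <= limit_c) or (step_c < 0 and level >= limit_c):
        profile.append((level, dwell_min))
        level += step_c
    return profile


def last_passing_level(profile, unit_passes):
    """Step through the profile and return the last setpoint the unit survived."""
    last_ok = None
    for level, _dwell in profile:
        if not unit_passes(level):
            break
        last_ok = level
    return last_ok


# Hypothetical cold step test: 20 degC down to -60 degC in 10 degC steps.
cold_steps = step_profile(start_c=20, step_c=-10, limit_c=-60, dwell_min=10)


def simulated_unit(temp_c: int) -> bool:
    """Stand-in for a functional test; pretends the unit fails below -45 degC."""
    return temp_c > -45


cold_limit = last_passing_level(cold_steps, simulated_unit)
print("cold operating limit found at", cold_limit, "degC")
```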
HASS - HIGHLY ACCELERATED STRESS SCREENING
HASS
production equivalent of HALT
to find manufacturing/production process induced defects
Source: Turin Networks
Common screen varieties
RELIABILITY PROGRAM PLAN
Reliability Program Plan
includes the required activities, methods, analyses, tools, and test strategies for the system
important to reach the required reliability