Reliability of Software Intensive Systems
by
Gerrit Muller
Buskerud University College
e-mail:
gaudisite@gmail.com
www.gaudisite.nl
Abstract
The amount of software in many systems increases exponentially. This increase
impacts the reliability of these systems. The source code contains many
hidden faults. These hidden faults can turn into errors during
the system life cycle, due to changes in the system itself or in the context of the
system.
We will discuss current trends and potential directions for future solutions to
this growing reliability problem.
Distribution
This article or presentation is written as part of the Gaudí project. The Gaudí project philosophy is to improve by obtaining frequent feedback. Frequent feedback is pursued by an open creation process. This document is published as intermediate or nearly mature version to get feedback. Further distribution is allowed as long as the document remains complete and unchanged.
March 6, 2013
status: draft
version: 0.2
Figure of Content
Introduction
+ ESI
+ Speaker
Exploration
+ trends
+ consequences for reliability
+ potential solutions
Research Projects
+ Trader
+ Tangram
+ Ideals
+ Boderc
conclusion
Reliability of Software Intensive Systems
2 Gerrit Muller
version: 0.2
March 6, 2013 RSIScontent
Embedded Systems Institute (ESI)
Mission
To advance industrial innovation and academic excellence in
embedded systems engineering (ESE).
Vision
ESI and its partners create and apply world-class ESE methods.
7 Founders:
Industry (Philips, ASML, Océ)
Universities (Twente, Delft, Eindhoven)
Knowledge Institutes (TNO)
television
printer
waferstepper
cardio X-ray system
GSM
ESI Research Agenda
Embedded Systems Engineering
evolvability
performance
reliability
in relation to all other system qualities:
security, power, cost, size, et cetera
Industry as Laboratory
research
industry
apply new engineering methods
hypothesis
evaluate
observe results
improve
challenging problems
application playground
source of inspiration
Who is Gerrit Muller?
CT
OSS
NM
X-ray
MR
EasyVision
manager system
engineering
Gaudí
1980
1990
2000
Philips Medical
ASML
Philips
NatLab
ESI
industrial experience
research
time pressure
cost constraints
pragmatics
products
sales
reflection
evidence
exposure
lots of people
education
Buskerud
Gaudi Systems Architecting
www.gaudisite.nl
this lecture:
www.gaudisite.nl/ReliabilityOfSoftwareIntensiveSystemsSlides.
Exploration of Reliability
Trends in Embedded Systems
How to survive in innovative domains?
fast moving market
fast moving technology
complex value chains
increased integration
Increased Team Size
Figure: required team size (logarithmic axis: 10, 100, 1000) versus year (1995 to 2010), showing the historic trend and our challenge.
Number of Faults Proportional to Code Size
Figure: man-years and LOC (lines of code) per product versus year (1990 to 2005); axis ticks 10 man-years / 100 kloc, 100 / 1 Mloc, 1000 / 10 Mloc. A second axis shows the typical amount of errors per product (ticks 1000, 10k).
Based on average 3 errors/kloc
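The proportionality can be made concrete with a little arithmetic. The sketch below simply multiplies code size by the slide's average of 3 errors/kloc; the function name and the three sample sizes (taken from the axis labels) are illustrative choices, not part of the original material.

```python
ERRORS_PER_KLOC = 3  # average quoted on the slide


def expected_errors(kloc, errors_per_kloc=ERRORS_PER_KLOC):
    """Estimate the number of hidden faults in a code base of `kloc` thousand lines."""
    return kloc * errors_per_kloc


# Code sizes taken from the axis labels of the figure.
for label, kloc in [("100 kloc", 100), ("1 Mloc", 1_000), ("10 Mloc", 10_000)]:
    print(f"{label}: about {expected_errors(kloc)} errors")
```

Under this assumption a tenfold growth in code size means a tenfold growth in hidden faults, unless the error density itself is reduced.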
The Hard Reset Syndrome: Power Down Needed!
Hard Reset Required:
Cell Phone
Television
PC
Beamer
Car
Coffee Machine
DVD player
Airbus
Pilot announces a flight delay,
due to computer problems.
A complete reset is required.
The flight entertainment system
also shows a reset:
a complete Linux boot.
This reboot hangs:
server xxx not found
How to Make SW Intensive Systems Reliable
reduce functionality
reduce code
more procedures
more/longer testing
improve way-of-working
more formality
more automation
shared understanding
improve robustness
improve detection
tolerant interfaces
robust patterns
improve recovery
more feedback
improve reliability
within cost&time
without performance penalty
with increasing requirements
increasing #suppliers, #sites, et cetera
more standardization
early integration
Reliability Research
Example Trader Project
Trader: Television Related Architecture and Design to Enhance Reliability
Overview
Objective: Develop method & tools to ensure reliable consumer electronics products
Research agenda: System Reliability
Domain: Digital TV
CIP: Philips Semiconductors
Partners: Philips CE, Tass, and Research
DTI, IMEC, TUD, TU/e, UL, UT, ESI
Timeline: 9/2004 – 9/2008, 22 Fte
Poor reliability has severe business impact
Industrial Relevance
• Customer expectation of TV reliability is high
– Little tolerance for technical problems
• 100% fault-free design is not achievable
• High volume market implies high risks if reliability problems occur
– Low product margin leaves no buffer for service costs
– Service center costs multiplied by number of complaints
– Market share reduction likely, i.e. customers buy another brand
• (On) Time to market is critical
– Missing fixed shipping gates costs millions of dollars
Trader Domain Trends
TV complexity increase follows the PC world
from “Display movies over antenna” to “Display anything over everything”
Connectivity: Cameras, JPEG, flash cards, HDD, MP3, Web browser, Ethernet, USB, etc.
Broadcast: ATSC, DVB, ISDB, Analog Terrestrial, Satellite, Cable
User Perceived Reliability
• Objective
– Determine the user-perceived severity of a product failure mode
• Methodology
– Create a model considering relevant factors
  • User-perceived loss-of-functionality
  • User-perceived reproducibility
  • Failure-frequency
  • Work-around difficulty
  • User-group characteristics
  • Failure characteristics
  • ….
– Validate model
– Evaluate and suggest system failure recovery strategies
  • The recovery strategy must not annoy the user even more!
Aspects (all depends on user)        1     2     3 (JPEG)
Function importance                  4     4     5
Frequency                            4     1     5
Reproducibility                      3     4     5
Solvability                          3     2     5
Loss of Function / time / Behavior   4     5     0
Total: ……                            576   160   0
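The three totals in the scoring table (576, 160, 0) equal the product of the five aspect scores in each column. The slides do not state the formula, so the multiplication below is an inference from the numbers, and the function name is hypothetical. A minimal sketch:

```python
from math import prod

# Aspect order: function importance, frequency, reproducibility,
# solvability, loss of function / time / behavior.
# Scores are copied from the slide's table, one list per column.
CASES = {
    "1": [4, 4, 3, 3, 4],
    "2": [4, 1, 4, 2, 5],
    "3 (JPEG)": [5, 5, 5, 5, 0],
}


def perceived_severity(scores):
    """Combine aspect scores by multiplication (assumed rule, inferred from the totals)."""
    return prod(scores)


for name, scores in CASES.items():
    print(name, perceived_severity(scores))
```

With a product, a single aspect scored 0 drives the total to 0, which is what the third column shows.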
Increasing Code Size in Televisions
Figure: code size in televisions (1 kB, 64 kB, 2 MB) versus year (1965, 1979, 1990, 2000), with Moore's law as reference.
From: COPA tutorial, Rob van Ommering
Research: System Awareness to Improve Reliability
monitor
customer expectations
system failure model
awareness
system
correction
user input
system output
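One possible reading of these diagram labels is a monitor that observes user input and system output, compares the output with what a model of expected behavior predicts, and requests a correction when the two differ. The sketch below follows that reading only; the function names, the channel example and the recovery action are all hypothetical and do not come from the Trader project.

```python
def make_monitor(expected_output, correct):
    """Build a checker for one observed (user_input, system_output) pair.

    expected_output: model of what the user expects for a given input (assumed).
    correct: recovery action, called only when the output deviates (assumed).
    """
    def check(user_input, system_output):
        expected = expected_output(user_input)
        if system_output == expected:
            return system_output  # behavior matches the model, nothing to do
        return correct(user_input, system_output, expected)
    return check


# Hypothetical example: a TV should show the channel the user selected.
def model(key):
    return f"channel {key}"


def recover(key, seen, expected):
    return expected  # e.g. retune to the expected channel


monitor = make_monitor(model, recover)
print(monitor("5", "channel 5"))
print(monitor("5", "black screen"))
```

The hard part in practice is the model itself: it has to capture customer expectations well enough that a mismatch really indicates a failure.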
Research: Code Analysis to Improve Reliability
Expose product weaknesses at design time
• Source code analysis
– Identify hotspots in code
– Consider impact on user-visible behavior: prioritize warnings
• Software architecture reliability analysis
– Techniques to identify failure-prone components
– Evaluation of architectural alternatives and trade-offs
Figure: Koala component map
Quality Degradation Caused by Shit Propagation
needed code
repair code
needed code
bad code
new needed code
code not relevant for new function
new bad code
copy, paste, modify
bad code
Example of Shit Propagation
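def relocate(values, capacity):
    """Return a copy of values in a new list with room for capacity items."""
    return values + [0] * (capacity - len(values))


class Old:
    """Original class: capacity doubles when the array is full."""

    def __init__(self, startCapacity=4):  # default of 4 is an assumed value
        self.capacity = startCapacity
        self.values = [0] * self.capacity
        self.size = 0

    def insert(self, val):
        if self.size >= self.capacity:  # grow before writing past the end
            self.capacity *= 2
            self.values = relocate(self.values, self.capacity)
        self.values[self.size] = val
        self.size += 1


# copy, paste (and modify)
class New:
    """Copy of Old: capacity grows by one, so every insert relocates."""

    def __init__(self):
        self.capacity = 1
        self.values = [0] * self.capacity
        self.size = 0

    def insert(self, val):
        self.values[self.size] = val
        self.size += 1
        self.capacity += 1  # bad code: one relocation per insert
        self.values = relocate(self.values, self.capacity)


# copy, paste
class DoubleNew:
    """Copy of New: the bad insert now also slows down insertBlock."""

    def __init__(self):
        self.capacity = 1
        self.values = [0] * self.capacity
        self.size = 0

    def insert(self, val):
        self.values[self.size] = val
        self.size += 1
        self.capacity += 1  # the same bad code, propagated by copying
        self.values = relocate(self.values, self.capacity)

    def insertBlock(self, v, length):
        for i in range(length):
            self.insert(v[i])  # relocates once per element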
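To see why the copied growth rule matters, the sketch below counts how many elements are copied by relocations under the two policies: doubling the capacity versus growing it by one. It is a simplified model written for this illustration (it relocates only when the array is full, and the function name is hypothetical), not the slide's code.

```python
def copies_when_growing(n_inserts, grow):
    """Count elements copied by relocations while inserting n_inserts values.

    grow maps the current capacity to the next capacity.
    """
    capacity, size, copied = 1, 0, 0
    for _ in range(n_inserts):
        if size == capacity:
            capacity = grow(capacity)
            copied += size  # a relocation copies every stored element
        size += 1
    return copied


doubling = copies_when_growing(1000, lambda c: c * 2)
plus_one = copies_when_growing(1000, lambda c: c + 1)
print(doubling, plus_one)  # 1023 499500
```

For 1000 inserts, doubling copies about as many elements as it stores, while growing by one copies roughly half a million. Every class that inherits the pattern by copy and paste inherits this cost as well.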
Example Tangram Project
Integration & Test
Moore’s Law
Figure: line width in nm (250, 180, 130, 100, 70, 50; logarithmic axis from 10 to 1000) versus year (1997, 1999, 2001, 2003, 2005, 2007).
Challenge: Exponential Increase
performance
feedback and adjustments
hw and sw components
imaging
overlay
productivity
complexity
reliability
robustness
innovation
4 Views on a Waferstepper
subsystems: laser (light source), illuminator (beam shaping), lens (projection), reticle stage (positioning), wafer stage (positioning), measurement (alignment, levelling), reticle handler (input/output), wafer handler (input/output), C&T (contamination, temperature), system control (coordination); flows: light, reticles, wafers
control hierarchy: system control (coordination) over laser, illuminator, lens, measurement, C&T, reticle stage, reticle handler, wafer stage, wafer handler; vertical motion and horizontal motion per stage; links: ethernet, VME, VME
kinematic: dynamic exposure through slit; wafer, reticle, slit; 250 mm/s; v_x and v_y against t: expose, step, expose
physics/optics: laser, illuminator, reticle, lens, aerial image, wafer, sensor; parameters: pulse-freq, bw, wavelength, .., uniformity, NA, aberrations, transmission
Research: Test Strategy
Balancing functionality, quality and time/cost
(over 20 % reduction of integration & test time)
S \ T          Test 0  Test t1  Test t2  Test t3  Test t4  Test t5  P
Fault state 1  1       1        0        0        1        0        10%
Fault state 2  1       0        1        0        1        1        10%
Fault state 3  1       0        0        1        0        1        10%
Fault state 4  1       0        0        0        1        0        10%
Fault state 5  1       0        0        0        0        1        10%
C              3       1        1        1        2        2
Dynamic simulation of integration & test approach:
1. Create model (modules, interfaces, faults, tests)
2. Execute model at any time
3. Balancing based on time, cost and remaining risk.
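The balancing step can be illustrated with the numbers from the fault state / test matrix on this slide. The sketch assumes that a 1 means the test covers that fault state, that P is the probability of each fault state and that C is the cost of each test. These readings are inferred from the table, and the exhaustive search is a stand-in chosen for brevity, not the Tangram method itself.

```python
from itertools import combinations

FAULT_PROB = [10, 10, 10, 10, 10]  # column P, in percent per fault state

# Per test: (coverage of fault states 1..5, cost C), as read from the slide.
TESTS = {
    "Test 0": ([1, 1, 1, 1, 1], 3),
    "t1": ([1, 0, 0, 0, 0], 1),
    "t2": ([0, 1, 0, 0, 0], 1),
    "t3": ([0, 0, 1, 0, 0], 1),
    "t4": ([1, 1, 0, 1, 0], 2),
    "t5": ([0, 1, 1, 0, 1], 2),
}


def evaluate(plan):
    """Return (cost, remaining risk in percent) of executing the tests in plan."""
    covered = [any(TESTS[t][0][s] for t in plan) for s in range(len(FAULT_PROB))]
    risk = sum(p for p, hit in zip(FAULT_PROB, covered) if not hit)
    cost = sum(TESTS[t][1] for t in plan)
    return cost, risk


def best_plan(budget):
    """Search all test subsets for the lowest remaining risk within the cost budget."""
    best = None
    for k in range(len(TESTS) + 1):
        for plan in combinations(TESTS, k):
            cost, risk = evaluate(plan)
            if cost <= budget and (best is None or (risk, cost) < best[:2]):
                best = (risk, cost, plan)
    return best


print(best_plan(2))  # (20, 2, ('t4',))
print(best_plan(3))  # (0, 3, ('Test 0',))
```

With a budget of 2 the best plan leaves 20% risk; one more unit of cost removes it entirely. Trade-offs of this kind are what a model of modules, interfaces, faults and tests makes visible before integration starts.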
Example Ideals Project
code size
refactoring
reduce by
cross cutting concerns
Example Boderc Project
31x5E
2050
2090
Boderc Goal
Boderc goal = a specific, multi-disciplinary methodology, based on modeling, to predict and analyze, discuss, document, and communicate system performance within industrial constraints and restricted design space
system performance: throughput, quality, power, computing, response time
industrial constraints: people, process, project duration, and cost
Shared Understanding by Modeling
Figure: pyramid of models by number of details (10^0 to 10^7), across the levels system, multi-disciplinary and mono-disciplinary.
Model types: back-of-the-envelope, formula based, executable
Many Models Needed to Understand System
small, simple, goal-driven models
shorter cycle time, fewer cycles
shorter product creation lead time
Models named in the figure:
6. Kinematic modeling
7. Thermo modeling
8. Control architecture
9. Virtual printer models
10. Stepper motors
Figure: models 1 to 11 (and 6') positioned by number of details (10^0 to 10^7); screenshot labels: GUI, ‘Life’ Animation, vario
Coverage of Reliability by ESI Projects
reduce functionality
reduce code
more procedures
more/longer testing
improve way-of-working
more formality
more automation
shared understanding
improve robustness
improve detection
tolerant interfaces
robust patterns
improve recovery
more feedback
improve reliability
within cost&time
without performance penalty
with increasing requirements
increasing #suppliers, #sites, et cetera
more standardization
early integration
Project labels placed on the figure: Boderc (1x), Ideals (1x), Tangram (3x), Trader (3x)
Towards a Conclusion, Some more Trends
Applications depend on chain of systems
users
Network Providers
Service Providers
Content Providers
Home Server
infotainment
appliance
watch video
browse photos
calendar
and much more...
Interoperability: systems get connected at all levels
world: hospital; hospital or other clinical center; clinicians at home; clinicians away; clinical experts; suppliers; patients away; patients at home
hospital: radiology department; HIS; LIS; other clinical departments; PC; security; admin
radiology department: MR scanner; workstation; archive; PC; beamer; RIS; other examination rooms
MR scanner: acquisition; reconstruction; display; host; workstation; storage; printer
Multi dimensional interoperability
applications: clinical analysis, clinical support, administrative, financial, workflow
vendors: Philips, GE, Siemens
releases: R5, R6.2, R7.1
standards: Dicom, HL7, XML
media, networks: DVD+RW, memory stick, memory cards, bluetooth, 11a/b/g, UMTS
languages, cultures: USA, UK, China, India, Japan, Korea, France, Germany, Italy, Mexico
connecting phrases in the figure: based on multiple, delivered by multiple, integrating multiple, in multiple, and multiple, and multiple
Interoperability Trends and Research Challenges
trends
market dynamics
globalization
hype waves
Moore's law
reliability
complexity
heterogeneity
dynamics
#engineers involved
interoperability
emerging behavior
future vs legacy
heterogeneous vendors
dynamics (continually changing)
(partial) solutions
standards
economic interests
latency due to slow process
most fundamental solution
design patterns
in system feedback
human-in-the-loop feedback
semantic understanding
containment
gateway
....
new research challenges!
Conclusion
Figure: man-years and LOC (lines of code) per product versus year (1990 to 2005); axis ticks 10 man-years / 100 kloc, 100 / 1 Mloc, 1000 / 10 Mloc. A second axis shows the typical amount of errors per product (ticks 1000, 10k).
Based on average 3 errors/kloc
less code
fewer errors/kloc
reduce impact of hidden faults on system reliability