T4
Thursday, Dec 7, 2000
How safe is your software?
Johan Hedberg, SP
Lars Strandén, SP
SP Swedish National Testing and Research Institute
johan.hedberg@sp.se
PALBUS
• Swedish joint research project between Industry, University and Research Institutes
• How to design dependable distributed control systems
• How to validate/verify systems to reach dependability
• Present useful methods for validation/verification
Background
• Trend with increased use of distributed control systems in safety critical applications
• Important to achieve a certain level of dependability in such systems
• Increased industry interest concerning dependability
Background, cont.
• Started in August 1999 and will end in April 2001
Application domain
• System
• Distributed control system
• Safety
• Embedded system
• Real-time
System overview
[Figure: Overview of the distributed system. Four nodes (node1 to node4), each with its own input and output parameters (inparameter_nodeN, outparameter_nodeN), perform adjustment of format, filtering and calculation, and connect through a protocol generator / protocol degenerator to a common communication bus.]
Definitions & standards
• According to Laprie's definitions
• Dependability
- availability
- reliability
- safety
- security
• Relevant standards...
[Figure: the chain FAILURE -> FAULT -> ERROR -> FAILURE -> FAULT ...]
Fault: adjudged or hypothesized cause; activation (internal) or occurrence (external)
Error: system internal effect
Failure: user perceived effect; deviation of delivered service from compliance to system specification
Fault: fault, bug, defect, mistake, etc.
New types of errors
• Node error
• Bus error
• Timing error
• Data consistency error
• Initialization & restart error
• Babbling idiot error
• Configuration error
[Figure: four nodes, Node A, Node B, Node C and Node D]
New types of errors, cont.
• New error types require new fault detection and fault handling methods
• The distribution of computing gives new ways to detect and handle faults
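As an illustration of the last point, a minimal sketch of one detection method that distribution makes possible: every node listens on the bus and flags peers whose periodic messages stop arriving. The class name, the 50 ms timeout and the node names are assumptions made for this example, not something taken from the PALBUS project.

```python
from dataclasses import dataclass, field


@dataclass
class NodeSupervisor:
    """Records when each peer node was last heard on the bus."""
    timeout_ms: int  # assumed longest acceptable silence of a healthy node
    last_seen_ms: dict = field(default_factory=dict)

    def message_received(self, node: str, now_ms: int) -> None:
        # Any valid message from a node counts as a sign of life.
        self.last_seen_ms[node] = now_ms

    def silent_nodes(self, now_ms: int) -> list:
        # Nodes that have been silent longer than the timeout are suspected faulty.
        return sorted(
            node
            for node, seen in self.last_seen_ms.items()
            if now_ms - seen > self.timeout_ms
        )


supervisor = NodeSupervisor(timeout_ms=50)
for name in ("node1", "node2", "node3"):
    supervisor.message_received(name, now_ms=0)

# node2 stops sending; the other two keep reporting.
supervisor.message_received("node1", now_ms=40)
supervisor.message_received("node3", now_ms=40)

suspects = supervisor.silent_nodes(now_ms=80)
print(suspects)
```

A timeout of this kind only detects a node that has gone silent. A babbling idiot, which sends too much rather than too little, needs a different check, for example a limit on how often a node may transmit.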
Time triggered/event triggered
• Bus access techniques
• Delay
• Jitter
• Scheduling
• Fault detection and fault handling
• Acceptance
• How do the protocols handle errors related to distributed systems?
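Delay and jitter can be made concrete with a small calculation. The sketch below assumes one common definition (delay as receive time minus send time, jitter as the spread between the largest and smallest delay); the timestamps are invented for illustration and are not measurements from the project.

```python
def delay_and_jitter(send_times_us, receive_times_us):
    """Return (worst-case delay, jitter) in microseconds.

    Assumed definitions: delay = receive time - send time,
    jitter = largest delay - smallest delay.
    """
    delays = [rx - tx for tx, rx in zip(send_times_us, receive_times_us)]
    return max(delays), max(delays) - min(delays)


# Hypothetical timestamps for four messages sent 1000 us apart.
send = [0, 1000, 2000, 3000]
fixed_slot_receive = [200, 1200, 2200, 3200]   # message always waits the same time
contended_receive = [150, 1400, 2150, 3600]    # waiting time depends on bus load

fixed_delay, fixed_jitter = delay_and_jitter(send, fixed_slot_receive)
contended_delay, contended_jitter = delay_and_jitter(send, contended_receive)
print(fixed_delay, fixed_jitter)
print(contended_delay, contended_jitter)
```

With these invented numbers the first series has a constant delay and therefore no jitter, while the second has both a larger worst-case delay and a nonzero jitter.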
Bus access techniques
[Figure: Each node's sending timeslot in a time triggered protocol. Along an axis of increasing time, the slots repeat in the order node1, node2, node3, node4.]
Bit wise arbitration mechanism:
[Figure: node 1, node 2 and node 3 on a communication bus, sending 001101, 001011 and 001010 respectively.]
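The two access techniques can be sketched in a few lines. The example assumes a CAN-style bus on which a 0 bit is dominant, so the lowest identifier wins arbitration, and a round-robin slot order for the time triggered case. The function names and the 250 microsecond slot length are made up for illustration; the three bit patterns are the ones shown for node 1 to node 3.

```python
def slot_owner(time_us, nodes, slot_us):
    """Time triggered access: the sender is determined by the clock alone."""
    return nodes[(time_us // slot_us) % len(nodes)]


def arbitrate(identifiers):
    """Bitwise arbitration: return the node that keeps the bus.

    identifiers maps node name -> identifier bit string, most significant
    bit first, all of equal length. Assumption: 0 is dominant (as in CAN),
    so a node sending 1 while the bus shows 0 backs off.
    """
    contenders = set(identifiers)
    length = len(next(iter(identifiers.values())))
    for bit in range(length):
        # The bus carries the dominant level among all bits being sent.
        bus_level = min(identifiers[node][bit] for node in contenders)
        contenders = {n for n in contenders if identifiers[n][bit] == bus_level}
    if len(contenders) != 1:
        raise ValueError("identifiers must be unique")
    return contenders.pop()


nodes = ["node1", "node2", "node3", "node4"]
owners = [slot_owner(t, nodes, slot_us=250) for t in (0, 600, 1000)]

winner = arbitrate({"node1": "001101", "node2": "001011", "node3": "001010"})
print(owners, winner)
```

Under the dominant-0 assumption node1 drops out at the fourth bit and node2 at the last bit, leaving node3 with the bus.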
Design principles for dependable systems
• Focus on design principles specific for distributed control systems
• Utilize the protocol optimally to reach as high dependability as possible
• How to improve the
Validation & verification methods
• Focused on methods related to the ”distribution”
• Divided into the following groups:
- formal methods
- analysis
- test
Fault tree applied to distributed control systems
[Figure: fault tree. Top event: Safety related failure in the system. Events shown: Failure in software (Implementation of software does not fulfill the specification; Specification has not considered all possible risks), Failure in hardware (EMC, humidity, temperature, vibration), and Failures related to the distribution of nodes (system level, application level, communication controller level, bus level).]
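A fault tree of this kind can be evaluated mechanically. The sketch below is only an illustration: the gate types are not recoverable from the slide, so every gate is assumed to be an OR gate, and the grouping of basic events under software, hardware and distribution is likewise an assumption. The class and helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A fault tree node: a basic event or a gate over child events."""
    name: str
    gate: str = "basic"  # "basic", "or" or "and"
    children: list = field(default_factory=list)

    def occurs(self, active):
        # active is the set of basic event names assumed to have happened.
        if self.gate == "basic":
            return self.name in active
        results = [child.occurs(active) for child in self.children]
        return any(results) if self.gate == "or" else all(results)


def or_gate(name, *basic_names):
    # Helper for this example: an OR gate over basic events.
    return Event(name, "or", [Event(n) for n in basic_names])


# Assumed structure: every gate is an OR gate.
top = Event("Safety related failure in the system", "or", [
    or_gate("Failure in software",
            "Implementation does not fulfill the specification",
            "Specification has not considered all possible risks"),
    or_gate("Failure in hardware", "EMC", "Humidity", "Temperature", "Vibration"),
    or_gate("Failures related to the distribution of nodes",
            "System level", "Application level",
            "Communication controller level", "Bus level"),
])

bus_fault = top.occurs({"Bus level"})
no_fault = top.occurs(set())
print(bus_fault, no_fault)
```

With only OR gates any single basic event reaches the top event; redundancy in a real design would show up as AND gates, which the same class also handles.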
Structure of a distributed control system
• What do we mean by ”system aspects”?
• All levels must be considered to reach dependability
[Figure: the levels System, Application, Bus and Communication controller]
Prototype system
• CAN based system
• Available for all participants in the project
• Used by the participants to be able to analyze and test ideas and methods developed in the project
Applying analysis & test methods
• Application dependent
• ”Practical” implementation of analysis methods described in the project
• Quality of developed methods
• New methods
Trends
• Increased system complexity
• More frequent use of COTS
• Less embedded systems
Conclusions
• Distributed control systems give new possibilities to supervise the behaviour of the application
• Decide, as early as possible, if a distributed architecture should be used in the development phase
• Present validation/verification methods to handle software related errors in distributed control systems
• Indicate a certain level of dependability in software by
PALBUS information
PALBUS results are available at the following address:
Thursday 7 December 2000 T4
How safe is your software? Johan Hedberg
Johan Hedberg received his Master of Science in Electrical Engineering in 1999 from Chalmers University of Technology in Gothenburg, Sweden. His thesis work, ”Implementation of a Distributed Control Application Based on the TTP/C Architecture”, was performed at Volvo Technological Development. At SP he has continued to work with distributed systems in a research project financed by NUTEK (The Swedish National Board for Industrial and Technical Development). The purpose of this project is to find methods for evaluating the dependability of distributed control systems. He works at the SP section Software & Safety.
Lars Strandén received his Master of Engineering Physics in 1976 from University of Technology in Uppsala, Sweden. He has worked with real-time embedded systems and large radar applications written in Ada at Ericsson Microwave Systems. He worked as a technical specialist in software development methods and received his Licentiate of Engineering in 1998 from Chalmers University of Technology in Gothenburg, Sweden. After that he worked for two years as a computer system consultant and now works at the SP section Software & Safety. His interests focus on dependable software in real-time embedded applications, and he is currently involved in a research project concerning