2.3 Diagnosability
2.3.3 Diagnosability Checking in Distributed DES
A distributed Discrete Event System, or DDES, is a system made of a set of communicating components, each of which can be represented as an LTS, and which share a set of events with one another. Under the assumption of a global observation of the system, the author of [Pencolé 2004] proposed the first approach to checking diagnosability of Distributed Discrete Event Systems. He considered communications to be correct events that are not observable.
Formally speaking, he defined a DDES as a set of $m$ local models $T_i$, $1 \le i \le m$, sharing synchronous communication events, where a local model is defined as follows:
Definition 8. A local Labeled Transition System (lLTS) is a tuple $T_i = \langle Q_i, \Sigma_i, \delta_i, q_{0i} \rangle$ where:
• $Q_i$ is a finite set of states,
• $\Sigma_i = \Sigma_{io} \cup \Sigma_{iu} \cup \Sigma_{if} \cup \Sigma_{ic}$ is a finite set of events occurring in $T_i$,
• $\delta_i \subseteq Q_i \times \Sigma_i \times Q_i$ is the transition relation,
• $q_{0i}$ is the initial state,
with $\Sigma_{io}$ a finite set of observable correct local events, $\Sigma_{iu}$ a finite set of unobservable correct local events, $\Sigma_{if}$ a finite set of unobservable faulty local events and $\Sigma_{ic}$ a finite set of communication events, the only ones to be shared with at least one other local model of a neighboring component of $T_i$.
Fig. 2.3 Part of the Twin Plant built on synchronizing the pre-diagnoser in figure 2.2 with itself. In red the ambiguous state cycle witnessing non-diagnosability.
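As an illustration, the local model of Definition 8 can be mirrored by a small data structure. This is only a minimal sketch: the class name `LocalLTS`, its field names and the example component are hypothetical and are not taken from [Pencolé 2004]. It also assumes that the four event classes are pairwise disjoint, which the definition suggests but does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalLTS:
    """Local model T_i = <Q_i, Sigma_i, delta_i, q0_i> (names are illustrative)."""
    states: frozenset          # Q_i
    observable: frozenset      # Sigma_io: observable correct local events
    unobservable: frozenset    # Sigma_iu: unobservable correct local events
    faults: frozenset          # Sigma_if: unobservable faulty local events
    communications: frozenset  # Sigma_ic: events shared with neighbors
    transitions: frozenset     # delta_i: set of (source, event, target)
    initial: str               # q0_i

    @property
    def events(self) -> frozenset:
        """Sigma_i, the union of the four event classes."""
        return (self.observable | self.unobservable
                | self.faults | self.communications)

    def is_well_formed(self) -> bool:
        classes = [self.observable, self.unobservable,
                   self.faults, self.communications]
        # Assumption: the event classes are pairwise disjoint, which holds
        # exactly when their sizes add up to the size of their union.
        disjoint = sum(len(c) for c in classes) == len(self.events)
        # Every transition must stay inside Q_i x Sigma_i x Q_i.
        closed = all(s in self.states and e in self.events and t in self.states
                     for (s, e, t) in self.transitions)
        return disjoint and closed and self.initial in self.states


# A made-up component (not the one of Figure 2.4).
component = LocalLTS(
    states=frozenset({"x0", "x1", "x2"}),
    observable=frozenset({"o1"}),
    unobservable=frozenset({"u1"}),
    faults=frozenset({"f1"}),
    communications=frozenset({"c1"}),
    transitions=frozenset({("x0", "f1", "x1"), ("x1", "c1", "x2"),
                           ("x0", "u1", "x2"), ("x2", "o1", "x2")}),
    initial="x0",
)
```

Keeping the four event classes as separate fields, rather than a single alphabet with tags, makes the later steps (synchronizing on communication events, projecting on observable events) simple set operations.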
Figure 2.4 depicts a system with three components, that share only the communication events {c1, c2}.
Assumption 5. (Global observation) The system is globally observed.
This means that the observations in the system are globally ordered among the different components of a distributed system.
Assumption 6. (Synchronous communication) The communication events between the different components are synchronous.
In fact, studying asynchronous communication is out of the scope of this thesis.
Fig. 2.4 Distributed DES with three components, that share only the events {c1, c2}
Assumption 7. (Communication correctness) The communication events between the different components are correct.
Notice that assuming communication events to be correct is not a restriction but a matter of modeling: if some communication event may be faulty, then the communication channel involved simply has to be modeled as a new component in its own right, containing at least one faulty local event.
Under the above assumptions, the diagnosability problem in DDES is the one given by Definition 4: verify whether the studied faults are diagnosable in the global system model (or a sub-part of it). The global model is the product of the local models synchronized on the communication events, on which delay closure with respect to these communication events is then applied: $T = \complement_{\Sigma_c}\bigl(\|_{\Sigma_c} T_i\bigr)$, where $\Sigma_c = \bigcup_i \Sigma_{ic}$, $\Sigma_o = \bigcup_i \Sigma_{io}$, $\Sigma_u = \bigcup_i \Sigma_{iu}$ and $\Sigma_f = \bigcup_i \Sigma_{if}$. However, one wants this verification to be carried out incrementally, starting at the level of the components, without first building the global model.
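To make the synchronization step concrete, the following minimal sketch computes the reachable part of a product of local models synchronized on a set of communication events. It covers only the product, not the delay closure that is applied afterwards. The dictionary layout (`init`, `events`, `trans`) and the function name are hypothetical choices made for this example.

```python
from itertools import product as cartesian


def synchronized_product(components, sync_events):
    """Reachable part of the product of local models.

    Each component is a dict with keys "init", "events" and "trans",
    where "trans" is a set of (source, event, target) triples.
    An event in sync_events is taken jointly by every component that has
    it in its alphabet; any other event moves a single component.
    The delay closure is NOT applied here.
    """
    init = tuple(c["init"] for c in components)
    all_events = set().union(*(c["events"] for c in components))
    states, transitions, frontier = {init}, set(), [init]
    while frontier:
        state = frontier.pop()
        for event in all_events:
            owners = [i for i, c in enumerate(components)
                      if event in c["events"]]
            if event in sync_events:
                groups = [owners]               # all owners move together
            else:
                groups = [[i] for i in owners]  # each owner moves alone
            for group in groups:
                # Local successors of every participant for this event.
                choices = [[t for (s, e, t) in components[i]["trans"]
                            if s == state[i] and e == event]
                           for i in group]
                # If one participant cannot move, no joint move exists.
                for combo in cartesian(*choices):
                    nxt = list(state)
                    for i, target in zip(group, combo):
                        nxt[i] = target
                    nxt = tuple(nxt)
                    transitions.add((state, event, nxt))
                    if nxt not in states:
                        states.add(nxt)
                        frontier.append(nxt)
    return states, transitions, init


# Two made-up components sharing the communication event "c".
comp_a = {"init": "a0", "events": {"o1", "c"},
          "trans": {("a0", "o1", "a1"), ("a1", "c", "a2")}}
comp_b = {"init": "b0", "events": {"c", "o2"},
          "trans": {("b0", "c", "b1"), ("b1", "o2", "b2")}}
prod_states, prod_trans, prod_init = synchronized_product(
    [comp_a, comp_b], {"c"})
```

In this example the second component cannot take `c` before the first one has produced `o1`, so the product contains a single run through four global states. The size of such products in general is what motivates the incremental approaches discussed next.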
The author of [Pencolé 2004] introduced an incremental diagnosability test which avoids building the Twin Plant of the whole global system when this is not needed. One starts by building a local Twin Plant for the faulty component in order to test the existence of a local critical path. If no such path exists, the system is known to be diagnosable. If such a path exists, one builds the local Twin Checkers of the neighboring components, i.e., those components which share communication events with the faulty one. A local Twin Checker is a structure similar to a local Twin Plant, i.e., each path in it represents a pair of behaviors with the same observations, except that it contains no fault information since it is constructed from non-faulty components. After constructing the local Twin Checkers, one tries to resolve the ambiguity resulting from the existence of a critical path in the local Twin Plant by synchronizing this local Twin Plant with the local Twin Checker of one neighbor on their communication events. In other words, one tries to distinguish the faulty path from the correct one by exploiting the observable events of the neighboring components.
Thus, the occurrences of observable events that are consistent with the occurrences of the communication events may resolve the ambiguity. The process is repeated until the diagnosability question is answered, which in the worst case happens only once the whole system has been visited. Another important contribution of this work is the deletion of the unambiguous parts after each synchronization on the communication events, in order to reduce the amount of information transferred to the next check (if needed).
Figure 2.5 depicts the local pre-diagnoser of the first component (on the left) of the system depicted in Figure 2.4, and Figure 2.6 shows part of its local Twin Plant. The latter displays a local critical path, proving that the fault f1 is not locally diagnosable in the first component.
Fig. 2.5 The local pre-diagnoser of the first component (on the left) of the system depicted in the figure 2.4
Fig. 2.6 Part of the local Twin Plant based on the local pre-diagnoser depicted in the figure 2.5
The size of the considered parts of each local Twin Plant (or Twin Checker), also called a local verifier, is reduced in the work of [Schumann & Pencolé 2007], where the authors describe the diagnosability problem as a distributed search problem. The global behavior is thus determined from the local Twin Plants without computing any part of the global Twin Plant. They propagate the fault information from the faulty component to the other components through a computable set of possibly non-diagnosable states in the local Twin Plants of the different components, depending on the connectivity between these components and the faulty one. As a result, every state that is not possibly non-diagnosable cannot contribute to non-diagnosability and is therefore deleted from the local Twin Plant, which leads to a reduced local Twin Plant. The diagnosability of a fault is then decided on the set of distributed Twin Plants: the fault is diagnosable iff none of the reduced Twin Plants contains an observable possibly non-diagnosable cycle (OPNC).
A reduced Twin Plant is first obtained from the Twin Plant of each component in the system. Reduced Twin Plants are then synchronized pairwise and incrementally in order to remove the remaining OPNCs and prove diagnosability if possible. Otherwise, the approach gives a synthetic view of the non-diagnosability by returning all indistinguishable behaviors in the system, so that all possible critical pairs in the system can be deduced from the non-diagnosable states. This approach also adapts to the available resources: it can stop when it runs out of memory and return the current set of Twin Plants with OPNCs. This set contains all possible reasons for a potential non-diagnosability, and it also shows that no set of original components that participated in one of the current reduced Twin Plants is sufficient to diagnose the fault.
The work by [Ye & Dague 2010] optimized the construction of local Twin Plants by exploiting the fact that one distinguishes two behaviors (faulty and normal) and synchronizes at two levels (observations first and communications later). The authors improved the construction of the Twin Plants proposed by [Pencolé 2004] by exploiting the different origins of the communication events (left and right copies) at the observation synchronization level, assigning them directly to the two behaviors studied (left copy to the faulty behavior and right copy to the normal one). This helps to delete redundant information, and thus to abstract the amount of information transferred to the next steps if the diagnosability question is not yet answered.
Online/Offline Diagnosability Checking and Complexity
As we said before, the main problem when verifying diagnosability is state explosion. This verification is usually done offline, in that the Twin Plant is first constructed and a critical path is then searched for in it. Some recent approaches use Petri nets [Liu et al. 2014] to perform the verification on the fly while constructing the Twin Plant. Later, [Boussif et al. 2015] proposed building a hybrid diagnoser for verifying diagnosability by combining enumerative and symbolic representations, relying on a symbolic observer graph [Haddad et al. 2004], in order to build a deterministic diagnoser on which on-the-fly techniques can be used to reduce the time and memory required by diagnosability verification. However, to the best of our knowledge, approaches that use the non-deterministic pre-diagnoser still have to be run offline.
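The offline search step can be illustrated on an already constructed Twin Plant. The sketch below makes a simplifying assumption: a critical path is taken to be a path from the initial state that reaches a cycle made only of ambiguous states and containing at least one observable event. The function names and the graph encoding (a set of `(source, event, target)` triples) are hypothetical, and the check is written for readability rather than efficiency.

```python
from collections import deque


def reachable(start, edges, allowed=None):
    """States reachable from start, optionally staying inside allowed."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for (src, _, dst) in edges:
            if src == state and dst not in seen and (
                    allowed is None or dst in allowed):
                seen.add(dst)
                queue.append(dst)
    return seen


def has_critical_path(initial, edges, ambiguous, observable):
    """True iff a reachable cycle of ambiguous states has an observable edge.

    Assumption: this is the simplified notion of critical path described
    in the lead-in, not a definition quoted from the literature.
    """
    # Ambiguous states that can actually be reached from the initial state.
    live = reachable(initial, edges) & set(ambiguous)
    for (src, event, dst) in edges:
        if event in observable and src in live and dst in live:
            # The observable edge lies on a cycle iff src can be reached
            # back from dst without leaving the ambiguous states.
            if src in reachable(dst, edges, allowed=live):
                return True
    return False


# Made-up Twin Plant fragment: p1 and p2 form an ambiguous cycle.
tp_edges = {("p0", "a", "p1"), ("p1", "b", "p2"), ("p2", "u", "p1")}
verdict = has_critical_path("p0", tp_edges, {"p1", "p2"}, {"a", "b"})
```

Here `verdict` is `True` because the cycle through `p1` and `p2` contains the observable event `b`. Because the graph must be complete before the search starts, this is an offline check in the sense used above.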
The complexity of the Twin Plant approach proposed in [Jiang et al. 2001] is polynomial of the 4th degree in the number of states. This follows directly from its definition: the number of states in the pre-diagnoser is bounded by $|Q| \times 2^{|\Sigma_f|}$, so the number of states in the Twin Plant is bounded by $|Q|^2 \times 2^{2|\Sigma_f|}$, which gives a search space of $|Q|^4 \times 2^{4|\Sigma_f|} \times |\Sigma_o|$. Thus finding a critical path in the Twin Plant is polynomial in the number of system states and exponential in the number of faults. Therefore, we consider one fault at a time when checking diagnosability of the system, as mentioned in Assumption 3.
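A small numeric sketch shows how these bounds grow. The function name and the example sizes are arbitrary and serve only to illustrate the formulas above.

```python
def twin_plant_bounds(n_states, n_faults, n_observable):
    """Upper bounds from the text, for |Q|, |Sigma_f| and |Sigma_o|."""
    pre_diagnoser = n_states * 2 ** n_faults   # |Q| * 2^|Sigma_f|
    twin_plant = pre_diagnoser ** 2            # |Q|^2 * 2^(2|Sigma_f|)
    search_space = twin_plant ** 2 * n_observable  # |Q|^4 * 2^(4|Sigma_f|) * |Sigma_o|
    return pre_diagnoser, twin_plant, search_space


# Arbitrary example: 10 states, 3 observable events, 1 fault versus 2 faults.
one_fault = twin_plant_bounds(10, 1, 3)
two_faults = twin_plant_bounds(10, 2, 3)
```

Each additional fault multiplies the search-space bound by $2^4 = 16$, which illustrates why faults are handled one at a time.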
The worst case when checking diagnosability arises when the studied system is actually diagnosable. One then has to prove the nonexistence of a counter-example witnessing non-diagnosability, i.e., all possibilities need to be tested, just as when proving the nonexistence of a plan in a planning problem. In this case some approximations are usually used to avoid exploring the whole search space, but we do not consider such approximations in this thesis.
Testing diagnosability was proved to be NLOGSPACE-hard for enumerative representations, and PSPACE-hard for succinct (symbolic) representations [Rintanen 2007]. However, when using succinct representations, one can apply more abstract reasoning by using modern efficient tools such as BDD (Binary Decision Diagram) tools, model checkers and SAT solvers. Indeed, the reduction of the diagnosability problem to a path finding problem by [Jiang et al. 2001] made it possible to translate the problem into a satisfiability problem, as is done for planning problems [Kautz & Selman 1992]. The authors of [Rintanen & Grastien 2007] formulated the diagnosability problem (in its Twin Plant version) as a SAT problem, assuming a centralized DES with simple fault events. The work in this thesis can be considered as an extension and improvement of what they have done. We will review the succinct transition systems used in their work in subsection 2.4.2.