The various types of faults that can occur in a VLSI system can be classified as either soft (transient) or permanent (hardware) faults. Transient faults are induced by temporary environmental conditions, such as cosmic rays and electromagnetic interference. Permanent faults are the result of irreversible device and circuit changes. A new system is proposed that improves routing by lowering the hardware overhead as the circuit size increases and by reducing the hardware left unutilized for fault recovery. Fig. 1 shows endocrine cellular communication.
Contributions: This paper shows that it is indeed possible to adapt fault-tolerant distributed algorithms to the particular needs of VLSI implementations. More specifically: (i) We adapt the simple variant of Srikanth & Toueg’s consistent broadcasting, introduced in earlier work, to the peculiarities of VLSI hardware implementations, namely, inherent fine-grained parallelism and very limited resources. Our major modifications are the enforcement of some atomic actions (interlocking) via implicit handshaking, and the replacement of k-bit messages by anonymous rising or falling signal transitions (zero-bit messages). (ii) We provide a fault-tolerant distributed tick generation algorithm (TS-Alg), which tolerates up to f Byzantine faulty instances in a system containing n ≥ 3f + 2 TS-Algs. Examples of Byzantine failures are spurious clock transitions or early timing failures that are perceived inconsistently at different TS-Algs. (iii) We prove that the resulting algorithm is correct, and derive bounds for its performance metrics, such as worst-case precision and minimal/maximal clock frequency. Since our “system-level proof” rests only upon some simple properties of certain digital logic blocks, which can be easily verified by means of standard design tools, we can guarantee the correctness of any system of n ≥ 3f + 2 correctly implemented TS-Algs. (iv) We provide some details of our synthesizable VHDL implementation of the algorithm, and demonstrate the feasibility of our approach by means of some measurement results obtained from an FPGA prototype system. These results will hence allow us to implement our DARTS clock generation scheme in a prototype SoC ASIC.
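The resilience condition n ≥ 3f + 2 stated above can be turned into a small sizing aid. The following is only an arithmetic sketch in Python; the function names `min_instances` and `max_byzantine_faults` are illustrative and do not come from the paper.

```python
def min_instances(f: int) -> int:
    """Smallest number n of TS-Alg instances satisfying n >= 3f + 2."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 2


def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 2 still holds for a system of n instances."""
    if n < 2:
        raise ValueError("the bound n >= 3f + 2 needs at least 2 instances")
    return (n - 2) // 3


if __name__ == "__main__":
    for n in (2, 5, 8, 11):
        print(n, "instances tolerate up to", max_byzantine_faults(n), "Byzantine faults")
```

Under this bound, tolerating a single Byzantine fault requires five instances, and each additional tolerated fault costs three more.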
Fault tolerance has been a critical feature of reliable spaceborne electronic systems that run in hostile cosmic environments. Researchers are continuously looking for efficient ways to design more reliable fault-tolerant electronic systems. The basic principle in fault-tolerant system design is redundancy. We propose a fault-tolerant approach to reliable microprocessor design. The complexity of VLSI-based digital systems has been increasing continuously. As the level of complexity increases, systems become more susceptible to faults, especially transient faults. Since it is almost impossible to detect such faults, the reliability of VLSI systems can be increased only by built-in fault-tolerant mechanisms that recover from them. High-quality verification and testing is a vital step in the design of a successful microprocessor product. Designers must verify the correctness of large complex systems and ensure that manufactured parts work reliably in varied (and occasionally adverse) operating conditions. If they succeed, users will trust that the processor renders correct results when it is put to a task. If they fail, the design can falter, often with serious repercussions ranging from bad press, to financial damage, to loss of life. There have been a number of high-profile examples of faulty microprocessor designs. Self-repairing digital systems have recently emerged as the most promising alternative for fault-tolerant systems. However, such systems are still impractical in many cases, particularly because of the complex rerouting process that follows cell replacement. They lose efficiency as the circuit size increases, owing to the extra hardware required in addition to the functional circuit and the non-utilization of normally operating hardware for fault recovery.
A switch architecture for concurrent testing and diagnosis of faults in multistage interconnection networks has been proposed in (Minsu Choi, 2003). The compound effect of fault-tolerant operation has been evaluated, and the results show graceful performance enhancement due to fault tolerance. A partially self-checking scheme for combinational circuits with a concurrent error detection facility has been presented in (KShirsagar, 2007). The results show a significant reduction in the area overhead of two-level circuits. An FPGA architecture composed of functional cells has been discussed in (Lala, 2008) to bring out its error correction capabilities. An architecture that enables tolerance of single-bit errors in a functional cell of the FPGA has been presented. Faults arise from unpredictable changes in the components of a logic circuit and can permanently alter the logic function, in the sense that they may lead to deviations from the specified values of logic variables. Thus, an astute fault generator, along with a facility to arrive at the correct state of the variable, is of crucial significance in ensuring that the digital system in use serves its dedicated purpose. The paper is organized into four major sections that cover the formulation procedure, simulation results and experimental elucidation, and end with concluding comments.
the queue length of the ONU, but without considering the packets arriving during the waiting time. It is observed that the packet delay in those DBA schemes is close to 1.5 transmission cycle times. This is because packets arriving during the waiting time cannot be transmitted in the current cycle, even if the ONU is lightly loaded. The result is a longer packet delay, which is unfair to the lightly loaded ONU. To overcome these drawbacks and offer better QoS in the Multi-EPON system, the proposed One-Wait DBA purposely shifts the report time of the ONUs so that the Bridge ONU obtains more up-to-date buffer occupancy information from each ONU. This point is further illustrated in Fig. 6. The ONU i uploads its REPORT message between
Selecting optimal paths for efficient interprocess communication is essential in parallel processing systems. In such a system, optimal routing is possible if every processor knows the status of all other processors. Any component of a system may suffer from a hardware or software fault. If the system cannot handle such faults, it is unreliable and inefficient.
Alternative: SFB 15 "PUT" and SFB 14 "GET" in the fault-tolerant system: As an alternative, use two SFB 15 "PUT" blocks over two standard connections. First call the first block. If there was no error message when the block executed, the transfer is assumed to have been successful. If there was an error message, the data transfer is repeated via the second block. If a connection cancelation is detected later, the data is also transferred again to exclude possible information losses. You can use the same method with an SFB 14 "GET". If possible, use the mechanisms of S7 communication for communication.
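The fallback order described above (first block, then second block on error) can be sketched independently of the PLC. The following Python fragment is not STEP 7 code and does not call SFB 15; `redundant_put` and the sender callables are hypothetical stand-ins that return True when no error is reported.

```python
from typing import Callable, Sequence


def redundant_put(data: bytes, senders: Sequence[Callable[[bytes], bool]]) -> int:
    """Try each connection in order and return the index of the first one
    that reports no error. Each sender is a hypothetical stand-in for one
    PUT block on one standard connection."""
    for index, send in enumerate(senders):
        if send(data):
            return index  # transfer assumed successful, stop here
    raise ConnectionError("all connections reported an error")


if __name__ == "__main__":
    def broken_connection(data: bytes) -> bool:
        return False  # simulates an error message from the first block

    def healthy_connection(data: bytes) -> bool:
        return True

    used = redundant_put(b"\x01\x02", [broken_connection, healthy_connection])
    print("data delivered via connection", used)
```

If a connection cancelation is detected after a transfer was assumed successful, the same function can simply be called again with the same data, which mirrors the repeated transfer described in the text.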
MRFDI is tested for fault diagnosis of actuator faults, instrument errors and environmental uncertainty. Results reveal that MRFDI correlates variances in actuator feedback and aircraft parameters for robust fault diagnosis. FTFC is developed by incorporating a twin-actuator configuration and is embedded with MRFDI. MRFDI supplies FTFC with the protocol for allocating control between the primary and secondary actuators. FTFC successfully allocates control to the secondary actuator when a fault occurs in an actuator. The proposed fault diagnosis methodology demonstrates the ability to classify the different fault sources existing in the aircraft. Future work is to implement system identification on a prototype aircraft and test the functionality of MRFDI and FTFC.
Some of these schemes modify the low-level design and implementation of the circuits to prevent soft errors from occurring. Other techniques work at a higher level of abstraction by adding redundancy that can detect and correct errors. The protection of digital filters has been widely studied. For example, fault-tolerant implementations based on the use of residue number systems or arithmetic codes have been proposed. The use of reduced-precision replicas or word-level protection has also been studied. Another option for error correction is to use two different implementations in parallel. Each of these techniques focuses on the protection of a single filter. Error coding is used for fault-tolerant management in computer memory, magnetic and optical data storage media, satellite communications, network communications, wireless systems, and practically any other kind of digital communication.
Multilevel inverters have a large number of power devices, and any device failure may cause abnormal operation of the electrical drive and require shutdown of the inverter and the whole system to avoid further serious damage. However, in some critical industrial processes with high standstill costs and safety concerns, high reliability and survivability of the drive system are very important. Therefore, fault-tolerant operation of multilevel inverters has drawn considerable interest in recent years, and several researchers have addressed fault-tolerance issues for the popular multilevel topologies, such as neutral-point-clamped (NPC) inverters, flying capacitor inverters, cascaded H-bridge inverters, and generalized inverters.
WSNs. One of the main benefits of residue number systems is that they facilitate the detection and correction of errors because all the digits are independent. However, several other approaches to fault-tolerant wireless sensor networks have been studied, some of which are discussed here. Low Energy Adaptive Clustering Hierarchy (LEACH) is the first and most effective energy-efficient hierarchical clustering algorithm for WSNs that was proposed for reducing power consumption. In LEACH protocols, the sensor nodes are divided into clusters, and a sensor node with higher resources is then selected as the cluster head (CH). The CH organizes all the activities within its cluster. It is also the responsibility of the CH to gather information from the cluster nodes, aggregate it, and remove redundancy among the gathered data in order to reduce the energy required for sending data packets from the CH to another CH or to the base station. In addition, an approach based on a packet-splitting algorithm that uses the Chinese Remainder Theorem (CRT), which is characterized by simple modular division between integers, was proposed. The application has low overhead in computation, communication and storage, and is resistant to DoS attacks. A trade-off between the energy efficiency and the reliability of the CRT-based sending scheme when duty-cycling techniques are considered was explored. This was accomplished with a direct increase in overall complexity and with low overhead. It was also observed that the limited energy consumption requirements and the low complexity of the sensor hardware call for energy-efficient error control and rule out the use of high-complexity codes. Redundant moduli that play no part in determining the dynamic range were presented. These were used in WSNs to reduce data retransmission caused by errors occurring in data packets; the work focused on low-complexity error detection techniques implemented with low data redundancy and efficient energy consumption in wireless sensor nodes using residue number systems.
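To make the role of redundant moduli concrete, the following Python sketch encodes an integer as residues, reconstructs it with the CRT, and flags a corrupted residue when the reconstructed value falls outside the legitimate dynamic range. The moduli (3, 5, 7) with redundant modulus 11 are illustrative choices, not values taken from the cited works, and the code needs Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.

```python
from math import prod

# Illustrative pairwise coprime moduli (assumed, not from the cited works).
INFO_MODULI = (3, 5, 7)      # define the dynamic range 3 * 5 * 7 = 105
REDUNDANT_MODULI = (11,)     # larger than every information modulus
ALL_MODULI = INFO_MODULI + REDUNDANT_MODULI
DYNAMIC_RANGE = prod(INFO_MODULI)


def encode(value: int) -> tuple:
    """Split a value into one residue per modulus (one 'digit' per packet)."""
    if not 0 <= value < DYNAMIC_RANGE:
        raise ValueError("value outside the dynamic range")
    return tuple(value % m for m in ALL_MODULI)


def crt(residues, moduli) -> int:
    """Chinese Remainder Theorem reconstruction for pairwise coprime moduli."""
    total_modulus = prod(moduli)
    result = 0
    for r, m in zip(residues, moduli):
        partial = total_modulus // m
        result += r * partial * pow(partial, -1, m)  # modular inverse
    return result % total_modulus


def decode(residues) -> int:
    """Reconstruct the value; a result outside the dynamic range means
    that at least one residue was corrupted."""
    value = crt(residues, ALL_MODULI)
    if value >= DYNAMIC_RANGE:
        raise ValueError("error detected: value outside the dynamic range")
    return value


if __name__ == "__main__":
    residues = encode(52)
    print(residues, "->", decode(residues))
```

With one redundant modulus that is larger than every information modulus, any single corrupted residue moves the reconstructed value out of the range [0, 105), so it is detected without retransmitting the other residues.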
As feature sizes shrink, faults occurring in on-chip networks become a critical problem. At the same time, many applications require guarantees on both message arrival probability and response time. We address the problem of router failures by designing a fault-tolerant architecture. The proposed architecture is not only able to recover from router failures, but also improves the average response time of the system. In this design, a new component is also developed to avoid adding a port to each router and thereby reduce hardware overhead.
As the number of transistors on a chip increases, the problems associated with deep sub-micron technology will become more pronounced and may lead to link and/or switch failures, which exacerbate reliability issues in on-chip interconnects. This argument strengthens the notion that chips need to be designed with some level of built-in fault tolerance. Two kinds of errors are likely to occur in a NoC: transient and permanent errors. Transient failures can occur on a chip for many reasons: alpha particles emitted by trace uranium and thorium impurities in packages, as well as high-energy neutrons from cosmic radiation, can cause soft errors in semiconductor devices. Similarly, low-energy cosmic neutrons interacting with the isotope boron-10 can cause soft errors. These events, generally called single-event upsets, can affect the storage elements of a chip such as latches, memory and registers. Transient faults, such as Gaussian noise on a channel and alpha-particle strikes on memory and logic elements, can cause one or more bits to be in error but do not cause permanent failures. They are treated by error detection and correction coding techniques and by retransmitting data. Crash or permanent failures can occur due to electromigration of a conductor or a connection failure, permanently halting the operation of some modules. Permanent errors generally do not disappear as time passes, so retransmission does not solve the problem, and dynamic rerouting becomes helpful instead. However, if a switch in a mesh-based NoC is faulty, the core directly connected to that switch is inaccessible and rerouting alone no longer helps. Hence, the fault-tolerant NoC architecture proposed in the literature recovers from switch failures by adding a redundant link between each core and one of its neighboring switches, and by applying modifications to the NoC components in addition to using a rerouting strategy.
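The limits of rerouting can be illustrated with a small Python sketch. It uses a plain breadth-first search over a 2D mesh of switches, which is an assumed stand-in and not the routing algorithm of any cited architecture; the function name `route` and the coordinate scheme are illustrative.

```python
from collections import deque


def route(width, height, src, dst, faulty=frozenset()):
    """Shortest path between two switches of a width x height mesh that
    avoids faulty switches, or None if no such path exists."""
    if src in faulty or dst in faulty:
        return None  # a core attached only to a faulty switch is unreachable
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if inside and nxt not in faulty and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(route(3, 3, (0, 0), (2, 0)))                   # direct path
    print(route(3, 3, (0, 0), (2, 0), faulty={(1, 0)}))  # detour around the fault
    print(route(3, 3, (0, 0), (1, 0), faulty={(1, 0)}))  # destination switch failed
```

The last call returns None: no detour can reach a core whose only switch has failed, which is the case the redundant core-to-switch link is meant to cover.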
In [26–28], various methods of topology modification have been proposed in order to preserve the network robustness against failures. One group of authors proposed a new protocol for topology control in wireless mesh networks of hand-held devices. They selected a dominant set of interconnected nodes in which the routing function is active. This protocol results in the reduction of collisions, overhead, interference and energy consumption. However, the network is assumed to be single-channel, with each node having a simple radio interface with no rate and power adjustment capability. Peng et al. proposed linear network coding based fault-tolerant routing, which allows the source to recover lost packets. By using multi-path routing and random linear network coding, this method improves on the conventional node selection methods. Another topology control method creates a K-connected graph based on channel assignment and routing. In these references, the authors have studied only some of the objectives, which do not include fairness and balancing.
representations of distributed attacks into smaller units that correspond to the distributed events indicating the attacks, then execute and coordinate the detection tasks in the places where the corresponding events were observed. Tao Peng et al. discussed the challenges faced by distributed IDSs using the Cumulative Sum algorithm and a machine learning approach. They proposed a robust scheme that monitors local statistics and then decides when to share information, so that both the communication overhead among the distributed detection systems and the detection delay are minimized. Cansian et al. presented an attack signature model that covers intrusion signature handling and analysis, from storage to manipulation. Using the model, the process of storing and analyzing information about intrusion signatures becomes less difficult. The argument here is that the misuse detection technique works only with the set of known attacks and the testing set, and fails to detect any new attack that has not occurred earlier and is not in the set of known attacks; that is, it cannot detect unknown attacks. Liu Jianxiago and Li Lijunan proposed a model that adopts a decentralized distributed system. This model detects intrusions in real time with flexibility and expansibility.
This paper implements the application on a virtualized embedded system. The application is based on the PSTR task model and uses serial ports. The PSTR model in this application consists of the primary and shadow processes. We build a virtual hardware system and implement the PSTR model processes on it. Application tasks use their serial ports for hardware input, and tasks on different virtual machines communicate with each other. We must solve the problems that occur when we migrate the features of a real system into a virtualized system. We also have to solve the problems that result from concurrent execution. We need to build several additional software modules for virtualizing hardware features. The first is the management module, which is required to supervise the overall system. Second, as a system requirement, we must make simultaneous use of the serial ports possible for the PSTR implementation. Those serial ports on the virtualized system also have to satisfy the performance requirements of the hardware serial ports. Lastly, a communication method between the primary and the shadow
The current research is limited in terms of actual implementation. Accurately simulating the environment of outer space on Earth for testing space hardware is a very difficult challenge. This complicates the process of hardware validation, since hardware performance must be evaluated during each test while performing the function appropriate to that environment, and the cumulative effect of environmental conditions cannot be determined at once. As a result, actual hardware performance data from operating satellites and space hardware is generally considered much more valuable than ground test data, and for most mission-critical hardware it is preferable that it has been previously flight tested, except in the case of dedicated testing or demonstration missions. We are developing a test bed for the entire closed-loop attitude control system and payload with actual nanosatellite reaction wheel hardware on a 3-axis rotating air bearing [12, 13] to validate the system, as shown in Figs. 6 and 7. To overcome faults and noise injected into the hardware, adaptive terminal sliding mode control laws with a fuzzy system, second-order sliding mode control laws with a fuzzy system and third-order sliding mode control laws with a fuzzy system are developed for discrete time and tested on the spherical air bearing system. These control laws can increase the attitude tracking control accuracy without using redundant reaction wheels. The experimental results show the proposed analytical fault-tolerant control laws to be effective.
If the individual processes in a distributed system take checkpoints at unconnected points in their execution, it is difficult to ensure that the system as a whole can recover from errors by making each process resume execution from its latest checkpoint. So an important approach to fault-tolerance is based on structuring the communication between a group of processes into atomic actions [?, ?]. The activity of a group of processes constitutes an atomic action if there are no interactions between that group and the rest of the system for the duration of the activity. This must also ensure of course that there is no error propagation to the rest of the system during the performance of the atomic action. Another method [?, ?, ?, ?, ?] limits the exchange of information between processes to conversation structures. A conversation permits the exchange of information between processes in such a way that processes may independently enter the conversation region but all communicating processes must leave it at the same time after the establishment of their individual checkpoints. A common feature of these methods is the requirement that recovery is performed from a consistent set of checkpoints. Recent work [?, ?] concerns the design of protocols to take a consistent set of checkpoints.
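What makes a set of checkpoints consistent can be shown with a short Python sketch. It uses the common "no orphan message" criterion: no message may be recorded as received before the receiver's checkpoint unless it was also sent before the sender's checkpoint. The event-counter model and the names `Message` and `is_consistent` are assumptions made for illustration, not definitions from the cited protocols.

```python
from typing import Dict, Iterable, NamedTuple


class Message(NamedTuple):
    sender: str
    send_event: int   # sender's local event counter when the message was sent
    receiver: str
    recv_event: int   # receiver's local event counter when it was received


def is_consistent(checkpoints: Dict[str, int], messages: Iterable[Message]) -> bool:
    """checkpoints maps each process to the local event counter at which it
    took its checkpoint. The set is consistent if it contains no orphan
    message, i.e. none received before a checkpoint but sent after one."""
    for msg in messages:
        received_before = msg.recv_event <= checkpoints[msg.receiver]
        sent_before = msg.send_event <= checkpoints[msg.sender]
        if received_before and not sent_before:
            return False
    return True


if __name__ == "__main__":
    log = [Message("P", 3, "Q", 2)]
    print(is_consistent({"P": 4, "Q": 5}, log))  # send and receive both recorded
    print(is_consistent({"P": 2, "Q": 5}, log))  # receive recorded, send lost
```

In the second call, rolling both processes back would leave Q having received a message that P never sent, which is exactly the situation that uncoordinated checkpoints can produce.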
Redundancy is an additional resource supporting parallel computation of the same process. In fault-tolerant design, redundancy helps to increase the level of reliability and safety through consensus. This is normally achieved through a logical mixture of hardware and software, keeping in mind the contradictory requirements of cost and timing constraints. Hardware redundancy is the replication of hardware components within a system. It is commonly used for addressing hardware and operational faults and for supporting various forms of software redundancy. Hardware is replicated in units with independent resources such as processing units, peripheral devices, input/output interfaces, power supplies and clock facilities. The objective of hardware redundancy in a fault-tolerant architecture is to partition the system into fault containment regions (FCRs) such that the non-faulty FCRs can operate correctly in spite of a fault in some other FCR.
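One simple way to reach consensus among replicated units is a majority voter. The Python sketch below is an illustrative example of that idea, not a mechanism described in the text; the function name `majority_vote` and the choice to raise an error when no strict majority exists are assumptions.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the replicated
    units, or raise ValueError if no value has a strict majority."""
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise ValueError("no majority among the replicated outputs")
    return value


if __name__ == "__main__":
    # Three replicated units, one of which delivers a wrong result.
    print(majority_vote([42, 42, 17]))
```

With three replicas the voter masks a fault confined to one unit; when no strict majority exists, it reports the disagreement instead of guessing.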