Design and Analysis of Defect- and Fault-Tolerant Nano-Computing Systems. Debayan Bhaduri. Doctor of Philosophy in Computer Engineering


Debayan Bhaduri

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in

Computer Engineering

Dr. Sandeep K. Shukla, Chair
Dr. Dong S. Ha
Dr. Michael S. Hsiao
Dr. Ira Jacobs
Dr. James E. Shockley
Dr. Paul S. Graham

February 19, 2007
Blacksburg, Virginia

Keywords: Nanotechnology, Reliability analysis techniques, Reliability driven design methodologies, Markovian analysis techniques, probabilistic model checking


(ABSTRACT)

The steady downscaling of CMOS technology has led to the development of devices with nanometer dimensions. Contemporaneously, maturity in technologies such as chemical self-assembly and DNA scaffolding has influenced the rapid development of non-CMOS nanodevices including vertical carbon nanotube (CNT) transistors and molecular switches. One main problem in manufacturing defect-free nanodevices, both CMOS and non-CMOS, is the inherent variability in nanoscale fabrication processes. Compared to current CMOS devices, nanodevices are also more susceptible to signal noise and thermal perturbations.

One approach for developing robust digital systems from such unreliable nanodevices is to introduce defect- and fault-tolerance at the architecture level. Structurally redundant architectures, reconfigurable architectures and architectures that are a hybrid of the previous two have been proposed as potential defect- and fault-tolerant nanoscale architectures. Hence, the design of reliable nanoscale digital systems will require detailed architectural exploration. In this dissertation, we develop probabilistic methodologies and CAD tools to expedite the exploration of defect- and fault-tolerant architectures. These methodologies and tools will provide nanoscale system designers with the capability to carry out trade-off analysis in terms of area, delay, redundancy and reliability.

During execution, the next state of a digital system depends only on the present state, and digital signals propagate in discrete time. Hence, we have used Markov processes to analyze these systems: Discrete Time Markov Chains (DTMCs) have been used to analyze logic architectures and Markov Decision Processes (MDPs) have been used to analyze memory architectures. Since structurally redundant and reconfigurable nanoarchitectures may consist of millions of nanodevices, we have applied state space partitioning techniques and belief propagation to scale these techniques.

We have developed three toolsets based on these Markovian techniques. One of these toolsets has been specifically developed for the architectural exploration of molecular logic systems. The toolset can generate defect maps for isolating defective nanodevices and provide capabilities to organize structurally redundant fault-tolerant architectures with the non-defective devices. Design trade-offs for each of these architectures can be computed in terms of signal delay, area, redundancy and reliability. Another tool called HMAN (Hybrid Memory Analyzer) has been developed for analyzing molecular memory systems. Besides analyzing reliability-redundancy trade-offs using MDPs, HMAN provides a very accurate redundancy-delay trade-off analysis using HSPICE. SETRA (Scalable, Extensible Tool for Reliability Analysis) has been specifically designed for analyzing nanoscale CMOS logic architectures with DTMCs. SETRA also integrates well with current industry-standard CAD tools.

It has been shown that multimodal computational models, rather than the bimodal Boolean computational model that has been used to understand the operation of current electronic devices, capture the operation of emerging nanoscale devices such as vertical CNT transistors. We have extended an existing multimodal computational model based on Markov Random Fields (MRFs) for analyzing structurally redundant and reconfigurable architectures. Hence, this dissertation develops multiple probabilistic methodologies and tools for performing nanoscale architectural exploration. It also looks at different defect- and fault-tolerant architectures and explores different nanotechnologies.

Dedication

Mrs. Bharati Bhaduri

and

Mr. Dwipendra Nath Bhaduri

Acknowledgements

I would like to take this opportunity to thank my advisor Dr. Sandeep K. Shukla for his motivation and support throughout my graduate school. Dr. Shukla has helped me develop analytical and problem-solving skills that have been of immense help in overcoming many of the difficulties that arose during my doctoral work.

I thank Dr. Dong S. Ha, Dr. Michael S. Hsiao, Dr. Ira Jacobs and Dr. James E. Shockley for serving as committee members for this dissertation.

I would also like to thank Dr. Paul S. Graham and Dr. Maya Gokhale of the Los Alamos National Laboratory for their constant support and guidance. My association with Dr. Graham dates back to 2004. Since then, he has mentored me during my internships at the Los Alamos National Laboratory and helped in providing a very strong practical angle to my doctoral work. I regard meeting Dr. Shukla and Dr. Graham as the two most important events during my graduate school.

I would like to acknowledge the support of NSF grant CCR-0340740 and the Los Alamos National Laboratory Scalable Reconfigurable Computing Project, which provided the funding for the work reported in this dissertation.


Patel, Sumit Ahuja, Animesh Patcha and Heather Quinn for being great friends. They have helped me in innumerable ways during my doctoral work.

It would be a long list to mention all the other friends I am indebted to. I gratefully thank all of them.

I would like to thank Mrs. and Dr. Subir Maitra for always having faith in my abilities to make it through graduate school.

The acknowledgement of the long-term support of my parents, Dipen and Bharati Bhaduri, my sister and brother-in-law Debalina and Indranil Roy, and my wife, Soma Bhaduri, may be ritual in a work of this nature, but is nonetheless appropriate and heartfelt.

Contents

Dedication
Acknowledgements
List of Figures
List of Tables

1 Introduction
1.1 From microelectronics to nanoelectronics
1.2 Emerging Technologies and Techniques
1.2.1 Brief Introduction to Emerging Technologies
1.2.2 Techniques to Integrate Nanoscale Devices
1.3 Problems and Challenges
1.4 Main Contributions
1.5 Organization

2 Background and Related Work
2.1 Defect-tolerance through Reconfiguration
2.2 Fault-tolerance through Structural Redundancy
2.3 Crossbar Architecture
2.3.1 Crossbar-based Logic Architectures
2.3.2 Crossbar-based Memory Architectures
2.4 Markov Random Fields
2.5 Hierarchical Logic Mapping Methodology
2.6 Probabilistic Model Checking
2.7 Generalized Reliability Analysis Techniques
2.8.1 NANOLAB
2.8.2 NANOPRISM
2.8.3 Probabilistic Transfer Matrices
2.8.4 SHARPE2000

3 Comparing von Neumann Multiplexing Architectures
3.1 Main Results
3.2 Enhancements to our Automation Framework
3.4 Von Neumann Multiplexing
3.5 Fault Model
3.6 Methodology
3.7 Comparison with PTM-based Methodology
3.8 Experimental Results
3.8.2 Small Gate Failure Probabilities
3.8.3 Large Gate Failure Probabilities
3.8.4 Small Redundancy Factors
3.9 Chip-Level Analysis
3.10 Modeling of Noisy Interconnections
3.10.1 Methodology
3.10.2 Experimental Results
3.10.2.1 Comparison of NAND and MAJ MUX
3.10.2.2 Comparison of NAND MUX architectures with N = 10 and N = 20
3.10.2.3 Comparison of MAJ MUX architectures with N = 10 and N = 20
3.11 Conclusion

4 Analyzing Fault-Tolerant Reconfigurable Architectures
4.3 Reconfigurable Architectures
4.4 Mapping MRF-based Logic Functions to CMOS
4.5 Loopy Belief Propagation and Loop Unrolling
4.6 Fault Model
4.8.1 Reliability Measures of the Axcelerator CLB
4.8.2 Energy distributions at the Output of the ProAsic CLB
4.8.3 Reliability Analysis of ALU
4.8.4 Entropy at the Output of the 32 bit ALU
4.8.5 Reliability Analysis of 16×4 Memory

5.1.1 Probabilistic defect-mapping Mechanism
5.1.2 Hierarchical Redundancy Insertion Methodology
5.1.3 Toolset
5.3 Nanofabric and Fault Models
5.3.1 Nanofabric Model
5.3.2 Fault Model
5.4.1 Test Circuits
5.4.2 Defect-mapping Technique
5.4.3 Hierarchical Redundancy Insertion Methodology
5.4.4 Design Flow
5.5.2 Analyzing Hierarchical Redundancy Insertion Methodology

6 A Hybrid Framework for Design and Analysis of Fault-Tolerant Molecular Memory Architectures
6.1 Main Contributions
6.3 Fault (Defect) and Circuit Model
6.5 Multi-Junction Memory Architecture
6.6.1 Reliability vs. Redundancy vs. Delay
6.6.2 Reliability vs. Redundancy vs. Area
6.6.3 Comparison of Memory Architectures

7.1.1 Organization
7.2 Reliability Analysis with PMC
7.2.1 Similarities with other Methodologies
7.2.2 Scalability Problem
7.3 Scalable Reliability Analysis
7.3.1 Applicability to PRISM
7.4 SETRA: An Overview
7.4.1 SETRA Design Flow
7.4.2 SETRA Implementation Details
7.5.1 Non-redundant Designs
7.5.2 CTMR-based Designs
7.5.3 Designs with TMR at different granularity levels
7.5.5 Chip-level Analysis

8 Comparing SETRA and STARS-C
8.1 Main Contributions
8.3 Library Characterization
8.3.1 Yield Projection Models
8.4 STARS-C: An Overview

9 Summary and Conclusion

References

List of Figures

1.1 Design Philosophies
1.2 Overview of Dissertation Structure

2.1 TMR for sequential and combinational logic
2.2 Different CTMR configurations
2.3 A crossbar with molecular latches built from hysteretic resistors
2.4 A non-redundant molecular crossbar memory [29]
2.5 Redundancy at Module Level (Banking) [30]
2.6 A NAND gate depicted as a MRF
2.7 Three level design hierarchy from [68] © 2004 IEEE
2.8 The set of primitive flows from [68] © 2004 IEEE

3.1 Original von Neumann MUX schemes
3.2 Probability of at least 90% of the outputs being correct for small gate failure probabilities
3.3 Probability of at least 90% of the outputs being correct for large gate failure probabilities
3.4 Probability of at least 90% of the outputs being correct for small R
3.5 Maximum allowed device failure probability for MAJ MUX
3.6 Probability of at least 90% of the outputs being correct in the presence of different noise spikes and gate failure probability of 0.001

4.1 Actel CLBs
4.2 CMOS circuit for a single input inverter
4.4 Translation of a single cycle graph to an unwrapped tree
4.5 Entropy and energy distribution at the outputs of different CLBs
4.6 A circuit diagram of a one bit Arithmetic Logic Unit (ALU)
4.7 Entropy at the output of different architectures for a 32 bit ALU
4.8 A 16×4 memory
4.9 Comparison of two redundant architectural configurations of a memory

5.1 Molecular Nanofabric
5.2 Cover for AR filter design
5.3 Behavioral and structural redundancy
5.4 Design Flow
5.5 Application of our methodology
5.6 Percentage of non-defective PEs for different nanofabric sizes
5.7 Broadcast latency
5.9 Reliability-redundancy-delay trade-offs
5.10 Trade-offs for different redundant configurations for the AR filter

6.1 Circuit Template parameterized by memory size [2] © 2004 IEEE
6.2 HMAN Framework
6.3 2×2 Memory with R = 2
6.4 A 2×2 Memory with R = 4
6.5 Reliability vs. redundancy vs. delay trade-offs for 256×256 memory
6.6 Reliability vs. redundancy vs. area trade-offs for 1024×1024 memory

7.1 DTMC of a NAND gate
7.2 Transition probability matrix for NAND DTMC
7.3 Example circuit
7.4 State space size for two small circuits
7.5 Illustration of the technique
7.7 SETRA design flow
7.8 Snippet of MXML syntax
7.9 Class diagram of SETRA’s circuit structure
7.10 Hierarchical netlists handled by SETRA
7.11 Maximum filter design
7.12 Reliability of CTMR-based design variants of the adder tree
7.13 Reliability of adder tree designs with TMR at different granularity levels
7.14 Reliability of N-MR designs for the adder tree circuit
7.15 Maximum allowed gate failure probability for N-MR technique

8.1 Library characterization
8.2 Transistor failure rates as feature size decreases (MPU, 1/2 pitch, uncontacted poly)
8.3 Overview of STARS-C system
projection models and circuits

List of Tables

1.1 Parameters that have Improved due to Feature Scaling [67]
1.2 Estimated Parameters for Emerging Technologies from [65] © 2002 IEEE

2.1 Logic compatibility function of a NAND gate

3.1 Probability of correctness at a NAND output
3.2 Probability of all NAND MUX outputs being correct
3.3 Comparative analysis of state space and runtime

4.1 Operations of one bit ALU

5.1 Execution time for the design and analysis of systems

figurations
6.2 Comparison of memory architectures

7.1 State space size for multi-bit adders
7.2 Reliability evaluation results

8.1 Summary of reliability results
8.2 Comparison of runtimes

1 Introduction

1.1 From microelectronics to nanoelectronics

For four decades, the rapid pace of improvement in microelectronics has been based on the ability to exponentially decrease the minimum feature sizes used to fabricate integrated circuits. The different parameters associated with integrated circuits that have improved due to such feature scaling are outlined in Table 1.1. The most frequently cited improvement is in integration level, expressed as Moore’s law: the number of components per chip doubles every 24 months.
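As a rough illustration of the doubling rule quoted above, the following sketch projects component counts under an idealized 24-month doubling period. The function name and the starting count of 1,000 components are hypothetical, chosen only for this example; real integration levels have not followed the rule exactly.

```python
def projected_components(initial_count: float, years: float,
                         doubling_months: float = 24.0) -> float:
    """Idealized Moore's-law projection: the count doubles every `doubling_months`."""
    doublings = (years * 12.0) / doubling_months
    return initial_count * 2.0 ** doublings


# Hypothetical starting point of 1,000 components per chip.
for years in (2, 10, 40):
    print(years, projected_components(1_000, years))
```

Under this idealization, four decades correspond to 20 doublings, that is, roughly a million-fold increase in components per chip.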

The scaling of CMOS technology has faced many barriers, but clever engineering solutions and new device architectures have thus far broken through such barriers. However, the size of a silicon atom will be an indisputable barrier in CMOS scaling [92]. Since 2001, the ITRS Roadmap has challenged the practicality of CMOS scaling projections beyond MOSFET channel lengths of 9 nm and has addressed the need for non-CMOS technologies.

Parameter           Example
Integration Level   Components/chip, Moore’s law
Cost                Cost per function
Speed               Microprocessor clock rate, GHz
Power               Laptop or cell phone battery life
Compactness         Small and light-weight products
Functionality       Nonvolatile memory, imager

Table 1.1: Parameters that have Improved due to Feature Scaling [67]

In the recent past, a number of novel non-CMOS nanotechnologies have emerged that have shown potential for enhancing the CMOS platform and promise for developing fundamentally new approaches to information processing. Some of these nanotechnologies have led to the development of novel memory and logic nanodevices. Engineered tunnel barrier memory, polymer memory and molecular memory are some of the novel nanomemories under research, whereas carbon nanotubes (CNTs), molecular and spin devices are some of the non-CMOS logic devices that have matured significantly. These logic devices have high switching speed and low power consumption, and demonstrate good scaling potential. These nanodevices represent charge-based logic and their scaling is limited by the minimum switching energy per binary operation, also called the thermodynamic limit [11]. Beyond this limit, the challenge is to invent and develop nanotechnologies based on something other than electronic charge. Ferromagnetic logic and spin gain devices have been identified as some of the first potential non-charge-based devices.

1.2 Emerging Technologies and Techniques

This section introduces some novel technologies that are being looked at to build nanoscale devices. Also, some of the emerging techniques that can be used to integrate these devices are discussed.

1.2.1 Brief Introduction to Emerging Technologies

Table 1.2 compares CMOS and a set of emerging technologies in terms of speed, size, power consumed and manufacturing cost per device. Such a comparison illustrates that few of the new technologies are directly competitive with scaled CMOS and most are highly complementary. In Table 1.2, T refers to a single delay, CD refers to critical dimension, Energy is the intrinsic operational energy (Joules/operation), and cost is defined as $ per gate.

A number of assumptions are made to estimate the parameters for these non-silicon manufacturing technologies in the absence of firm empirical data. For instance, some of these technologies are particularly effective for certain application areas. Specifically, theoretical quantum computing is applicable to finding the prime factors of large numbers in polynomial time, in considerably less time than any classical algorithm [110]. But quantum computing is much less efficient for other applications. In this case, the time required by a classical device to perform an operation using a classical algorithm is defined as the “effective” time per operation. The different parameters for quantum computing (shown in Table 1.2) are determined by calculating the “effective” time per operation for different algorithms. A similar approach is used for neuromorphic and optical computing.

Technology          T min (s)  T max (s)  CD min (m)  CD max (m)  Energy (J/op)  Cost min ($/gate)  Cost max ($/gate)
Si CMOS             3E-11      1E-6       8E-9        5E-6        4E-18          4E-9               3E-3
RSFQ                1E-12      5E-11      3E-7        1E-6        2E-18          1E-3               1E-2
Plastic             1E-4       1E-3       1E-4        1E-3        1E-24          1E-9               1E-6
Optical (digital)   1E-16      1E-12      2E-7        2E-6        1E-12          1E-3               1E-2
NEMS                1E-7       1E-3       1E-8        1E-7        1E-21          1E-8               1E-5
Neuromorphic        1E-13      1E-4       6E-6        6E-6        3E-25          5E-4               1E-2
Quantum Computing   1E-16      1E-15      1E-8        1E-7        1E-21          1E3                1E5

Table 1.2: Estimated Parameters for Emerging Technologies from [65] © 2002 IEEE

Next, we briefly discuss some of these emerging technologies. [118, 119] discuss a new technology for silicon transistor fabrication on plastic substrates. Such transistors are called thin-film transistor (TFT) devices, wherein the active layer of these devices can be amorphous or polycrystalline silicon. These devices can be used in combination with organic light-emitting devices (OLED) [62, 109] for the development of high-resolution flat-panel displays. Currently, the most common type of flat-panel display is the Liquid Crystal Display (LCD) made on glass substrates. A large percentage of the manufacturing cost of such a display comes from the material cost of the glass panels used for the front and back display surfaces, the driver electronics used to address the display, and display breakage. High-quality TFTs on plastic substrates could eliminate the costs incurred due to the glass and the driver ICs and thus provide highly flexible and cost-effective displays. However, this novel technique is only in its nascency and considerable research is being done to perfect this manufacturing technology.

processing is dependent on light transmission and interaction with solids. An optical logic gate is a switch that controls one light beam with another. The device is considered to be “on” when it transmits light and “off” when it blocks the light. Digital optical computers have certain advantages that stem from the characteristics of light as an information carrier. First, optical information-processing functions can be performed in parallel. Second, optical beams do not interfere with each other. Lastly, optical signals propagate at the speed of light in the medium.

Nanoelectromechanical systems (NEMS) consist of integrated electromechanical actuators of nanoscale dimensions that are driven by electrical energy. These systems have prompted several researchers to look at the possibility of developing fast logic gates, switches, and even computers that are entirely mechanical. Optimistic empirical estimates predict that the switching speed of NEMS logic gates will be around 0.1 ns and that these devices will dissipate less than 10^-21 Joules. The idea of electromechanical actuators and mechanical computers dates back to the 1820s [105], when Charles Babbage designed the first mechanical computer, viewed as the forerunner to the modern computer. In the 1960s, electronic logic gates and integrated circuits vastly outperformed moving elements, as a result of which the idea of mechanical computers was dropped. But with the rapid developments in nanotechnology, it may be possible to manufacture complex molecular-scale mechanical elements that will move on time-scales of a nanosecond or less.

In contrast to other technologies, quantum computing exploits physical phenomena unique to quantum mechanics. At the quantum level, the values of certain observable quantities are restricted to a discrete finite set (quantization). This ensures that each classical bit can be stored as a stable state of the system. The fundamental unit of information in a quantum computer is called a quantum bit or qubit [123]. A qubit can be a 1 or a 0, or it can exist in a superposition that is simultaneously both 1 and 0 or somewhere in between.
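A small numerical sketch may help make the notion of superposition concrete. It assumes the standard textbook representation of a qubit as a pair of complex amplitudes (a, b) normalized so that |a|^2 + |b|^2 = 1, which the text above does not spell out; the helper names are hypothetical.

```python
import math
from typing import Tuple


def normalize(a: complex, b: complex) -> Tuple[complex, complex]:
    """Scale the two amplitudes so that |a|^2 + |b|^2 = 1."""
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if norm == 0:
        raise ValueError("at least one amplitude must be non-zero")
    return a / norm, b / norm


def measurement_probabilities(a: complex, b: complex) -> Tuple[float, float]:
    """Probabilities of reading 0 and 1 when the qubit is measured."""
    a, b = normalize(a, b)
    return abs(a) ** 2, abs(b) ** 2


# Equal superposition of 0 and 1: each outcome is observed with probability 1/2.
p0, p1 = measurement_probabilities(1, 1)
print(p0, p1)
```

A classical bit corresponds to the special cases (1, 0) and (0, 1); every other normalized pair is a superposition "somewhere in between".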

This implies that quantum computers are not limited to only two states as compared to classical digital systems. The massive parallelism intrinsic in quantum computing is due to such superposition of different states [44, 45].

At the extreme end of the spectrum of these developing technologies are neuromorphic systems, which are silicon implementations of sensory and neural systems. The designs of such systems are inspired by neurobiology [113]. This is because the human brain is a classical neuromorphic information processing system and can be considered as a motivation for future technological advancements. The different parameters shown in Table 1.2 for the human brain are approximate estimations. For example, the critical dimension of each neuron is computed by estimating the volume of the brain and the number of neurons. This developing technological area offers exciting possibilities such as sensory systems that can compete with human senses and pattern recognition systems that can run in real time.

1.2.2 Techniques to Integrate Nanoscale Devices

The current techniques such as casting, grinding, milling and lithography used to integrate silicon-based devices may not scale well for nanoscale device integration. This is because a large number of nanodevices may be needed at the architectural level to circumvent reliability issues associated with nanoscale devices built from a small number of atoms [40]. In this subsection, we briefly discuss some of the emerging integration techniques.

Self-assembly is a spontaneous process by which atoms and molecules form organized aggregates or networks, typically by interacting with a solution or gas phase [64]. This technique involves a process known as convergent synthesis [86], which allows the assembly of atomically precise devices. Structures formed by this process are covalently bonded, well-defined and stable.

Positional assembly [85] differs from self-assembly. Positional assembly provides a higher degree of control over the placement of atoms and molecules to form arbitrary stable structures, whereas self-assembly does not. Self-assembly is largely a natural process that only needs an external initiation. Researchers are trying to combine these two processes to perform novel and cost-effective nanoscale device integration.

1.3 Problems and Challenges

Nanoelectronics has advanced appreciably in the recent past and has shown potential for large scales of integration, specifically on the order of a trillion (10^12) devices per square centimeter. But at the same time, some of the characteristics of these nanodevices and their fabrication methods pose prominent limitations to such ultrascale integration. Some of these characteristics are manufacturing defects, unreliable device performance, interconnect limitations and thermal power dissipation [48, 67, 84].

The unreliability of nanoscale devices is a consequence of the inherent variability in fabrication processes and the physical principles that govern their operation. As discussed in Subsection 1.2.2, self-assembly methods may have to be used at dimensions below those at which conventional lithographically defined subtractive processing methods can be used. Since variability and imprecision are inherent in such self-assembly processes, it is estimated that a significant number of devices, up to many percent, may suffer from manufacturing defects. The other source of unreliability is the physical principles of these devices, which cause reduced noise tolerance and higher susceptibility to external influences such as electromagnetic interference, thermal perturbations and terrestrial radiation, resulting in in-service transient faults. For the management of defective and error-prone devices, several defect- and fault-tolerant nanoarchitectures based on techniques such as multiplexing, Triple Modular Redundancy (TMR), Cascaded Triple Modular Redundancy (CTMR) and reconfiguration are being investigated.
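To see why a scheme such as TMR helps only when the underlying devices are already reasonably reliable, the following sketch evaluates the textbook reliability of a majority-voted triple of modules. It assumes independent module failures and a perfect voter, neither of which is claimed in the text above, and the function name is hypothetical.

```python
def tmr_reliability(module_reliability: float) -> float:
    """Probability that at least two of three independent modules are correct.

    Assumes a perfect (never failing) majority voter.
    """
    r = module_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    # P(exactly 3 correct) + P(exactly 2 correct) = r^3 + 3 r^2 (1 - r)
    return 3 * r ** 2 - 2 * r ** 3


for r in (0.99, 0.9, 0.5, 0.4):
    print(r, tmr_reliability(r))
```

Under these assumptions, TMR improves on a single module only when the module reliability exceeds 0.5; below that point the redundant design is less reliable than the non-redundant one.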

The problem of interconnects stems from a number of fundamental challenges: (i) the geometrical challenge of interconnecting devices at nanoscale dimensions at high speed and bandwidth, (ii) interfacing these devices with the macroscopic world, and (iii) the challenge of transforming long-distance communications to short-distance communications for nanodevices that have low drive capabilities. To tackle some of these challenges, fabrication processes for producing aligned wires and highly regular, homogeneous and locally connected parallel architectures have been proposed.

The other challenge is thermal power dissipation, which comes from device switching energy and the energy needed for driving signals. Hence, there is a trade-off between clock speed and device density: clock speeds need to be decreased for high device densities, and densities need to be lowered for high clock speeds. The problem of power dissipation sets a general limit to the operational speed of any charge-based nanodevice. Hence, researchers are looking at charge recovery [70], reversible computing [76] and adiabatic computing [3].
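The density versus clock-speed trade-off can be illustrated with a first-order estimate in which the power dissipated per unit area is the product of device density, switching energy and clock frequency. This simple model (every device switching once per cycle, signal-driving energy ignored) and all numbers below are assumptions made for illustration only.

```python
def max_clock_hz(power_budget_w_per_cm2: float,
                 devices_per_cm2: float,
                 switching_energy_j: float) -> float:
    """Largest clock frequency that keeps density * energy * frequency within budget."""
    if devices_per_cm2 <= 0 or switching_energy_j <= 0:
        raise ValueError("density and switching energy must be positive")
    return power_budget_w_per_cm2 / (devices_per_cm2 * switching_energy_j)


budget = 100.0   # W/cm^2, hypothetical cooling limit
energy = 1e-18   # J per switching event, hypothetical device
for density in (1e9, 1e10, 1e12):
    print(f"{density:.0e} devices/cm^2 -> {max_clock_hz(budget, density, energy):.2e} Hz")
```

In this model, a thousand-fold increase in device density lowers the admissible clock frequency by the same factor for a fixed power budget.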

In this dissertation, we focus on the problem of organizing defective and fault-prone nanodevices. One of the solutions to this problem is the development of defect- and fault-tolerant nanoarchitectures that will aid the reliable integration of such nanodevices, hence allowing the demonstration of their full potential [10]. Recent publications on the development of nanoarchitectures have been more focused on defect-tolerance and to a lesser extent on transient fault-tolerance. We emphasize the need for tackling both manufacturing defects and in-service transient faults and investigate the feasibility of (i) reconfigurable architectures that can be robust to defects, (ii) structural redundancy-based architectures that can tackle transient faults and (iii) hybrid reconfigurable and structurally redundant architectures that can provide robustness to both defects and faults.

Figure 1.1: Design Philosophies

As shown in Figure 1.1, there are two schools of thought for designing nanoarchitectures for specific systems, and the choice between these design philosophies depends on the failure-tolerance threshold of the system being designed. One school of thought [20] considers mapping systems directly onto unreliable nanofabrics (specific organizations of nanodevices) [39, 41, 50] with an adequate redundancy factor (R) that can guarantee tolerance to transient faults. We define R as the ratio of the circuit sizes of the redundant and non-redundant designs. This saves the computational time and cost associated with defect mapping and avoidance techniques, that is, techniques used to identify and avoid defective devices in the nanofabrics (Section 2.1). Note that this design philosophy is only applicable to systems that are not mission-critical and admit low but nonzero failure probabilities. The other school of thought is applicable to mission-critical systems that need a 100% reliability guarantee. Hence, this design philosophy uses defect mapping and defect avoidance techniques on reconfigurable nanofabrics to circumvent manufacturing defects and then applies structural redundancy-based techniques to tackle transient faults [55].
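The definition of the redundancy factor R can be made concrete with a short sketch. It assumes that circuit size is measured as a gate count and uses a hypothetical TMR design (three copies of the circuit plus a small voter) as the redundant variant; both the size metric and the voter size are illustrative assumptions.

```python
def redundancy_factor(redundant_size: int, non_redundant_size: int) -> float:
    """R = size of the redundant design / size of the non-redundant design."""
    if non_redundant_size <= 0:
        raise ValueError("non-redundant size must be positive")
    return redundant_size / non_redundant_size


def tmr_size(gates: int, voter_gates: int) -> int:
    """Gate count of a TMR design: three copies of the circuit plus the voter."""
    return 3 * gates + voter_gates


# Hypothetical 100-gate circuit protected by TMR with a 4-gate majority voter.
print(redundancy_factor(tmr_size(100, 4), 100))
```

With these numbers R is slightly above 3, since the voter adds to the size of the three replicated copies.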

1.4 Main Contributions

In this dissertation, we have developed multiple methodologies, models and CAD tools that expedite and improve the aforementioned nanoarchitecture design philosophies, and aid in analyzing some of the emerging nanoarchitectures and providing guidelines for developing such architectures for specific nanodevices, nanotechnologies and computing systems. Some of the specific contributions are as follows:

1. Development of techniques and tools to analyze structural redundancy-based architectures.

2. Development of techniques and tools to design and analyze hybrid reconfigurable and structural redundancy-based architectures.

3. Development of tools to design and analyze molecular logic architectures.

4. Development of tools to design and analyze molecular memory architectures.

5. Addressing scalability issues of our tools.


7. Comparison of our tools with other state of the art tools.

1. Multiplexing [121] has been identified as one of the most effective techniques for transient fault mitigation [67]. Hence, we select this architecture as a representative example of redundancy-based nanoarchitectures and develop models and methodologies to automate the analysis of these architectures in the presence of both computational and interconnect noise. We extend a probabilistic model checking (PMC) based tool called NANOPRISM (Subsection 2.8.2) and use this framework to (i) compare the reliability of individual NAND and MAJ multiplexing systems in the presence of both gate and wire noise, (ii) compute the device failure thresholds that can be tolerated by multiplexing-based nanochips, and (iii) compare these thresholds with theoretical values to show the advantages and limitations of our methodology.
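The reliability measure reported for multiplexing systems in this dissertation is the probability that at least 90% of the outputs of a bundle are correct. The sketch below computes such a probability under a much simpler assumption than the PMC models: each of the N output wires is taken to be correct independently with the same probability. It is a binomial-tail illustration with a hypothetical function name, not the NANOPRISM methodology.

```python
import math


def prob_at_least_fraction_correct(n_wires: int, p_correct: float,
                                   fraction: float = 0.9) -> float:
    """P(at least `fraction` of n_wires outputs are correct), assuming
    independent wires that are each correct with probability p_correct."""
    # Small tolerance guards against floating-point noise in fraction * n_wires.
    needed = math.ceil(fraction * n_wires - 1e-9)
    return sum(
        math.comb(n_wires, k) * p_correct ** k * (1.0 - p_correct) ** (n_wires - k)
        for k in range(needed, n_wires + 1)
    )


for n in (10, 20):
    print(n, prob_at_least_fraction_correct(n, 0.95))
```

Because it ignores correlations introduced by the permutation and restorative stages of a multiplexing unit, this estimate should be read only as a baseline for interpreting the metric.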

2. There is a need to develop and analyze architectures that are robust to both manufacturing defects and transient faults. To this end, we develop methodologies to analyze reconfigurable architectures that incorporate different redundancy-based techniques such as TMR, CTMR and multiplexing. The main challenges in designing such nanoarchitectures are determining the amount of redundancy and the architectural level at which such redundancy needs to be added. A computational scheme based on Markov Random Fields (MRF) was incorporated in a tool called NANOLAB (Subsection 2.8.1) to compute reliability-redundancy trade-offs of combinational circuits in the face of defects, thermal perturbations and interconnect noise. In this dissertation, we show how this methodology and tool can be extended to design and analyze hybrid reconfigurable architectures for both combinational and sequential systems. We implement a loopy Belief Propagation (BP) technique specifically to design and analyze fault-tolerant sequential nanosystems. Different industry-standard (re)configurable nanofabrics and classical redundancy-based mitigation techniques are used to demonstrate the applicability of our methodology and tool.

3. Chemically self-assembled molecular nanofabrics have been demonstrated as potential reconfigurable nanoarchitectures. Defect mapping methodologies [87, 96] that are used to circumvent permanent faults in these nanofabrics are inadequate to tackle probabilistic defect models and transient faults. Hence, we develop methodologies for molecular nanofabrics using PMC techniques (Section 2.6). We develop (i) a non-deterministic defect map generation scheme that extends a technique proposed by Dwyer et al. [96], (ii) a hierarchical methodology to design structural redundancy-based reconfigurable architectures by enhancing Jacome et al.'s methodology in [68], and (iii) a framework that integrates these techniques. We demonstrate the usefulness of this framework by designing and analyzing signal and image processing molecular systems that tolerate both permanent and transient faults.

4. Likewise, the design of fault-tolerant molecular memory architectures will require intense analysis in terms of achievable performance measures: power dissipation, area, delay and reliability. Hence, we also develop a hybrid automation framework, called HMAN, that aids the design and analysis of such fault-tolerant architectures. HMAN uses PMC and circuit analysis techniques to analyze memory architectures at two different levels of the design abstraction, namely the system and circuit levels, and correlates different performance measures to provide guidelines for designing a robust nanomemory. We also illustrate the application of our framework by analyzing a hierarchical crossbar-based molecular memory.

5. In recent years, other reliability evaluation methodologies, such as probabilistic transfer matrices and probabilistic gate models (Section 2.8), that are similar to our probabilistic model checking based technique have been proposed. Scalability has been a concern in the applicability of these methodologies to the reliability analysis of large nanocircuits. In this dissertation, we develop a general, scalable technique for these reliability evaluation methodologies. Specifically, an algorithm is developed for the model checking based methodology.

6. One of the major problems with the reliability analysis methodologies proposed by us and others is their dissociation from the conventional CAD design flow. For small nanoscale circuits, this dissociation has not been as critical an issue as it is for large nanocircuits. We have developed SETRA (Scalable, Extensible Tool for Reliability Analysis) to bridge this gap between standard CAD tools and the reliability evaluation methodologies.

7. To understand the advantages and limitations of our tools, we compare the analysis results with results obtained from a state-of-the-art combinatorial tool, STARS-C (Scalable Tool for Analyzing Reconfigurable Systems-Circuits), being developed at the Los Alamos National Laboratory. We use SETRA for this comparative study.

1.5 Organization

Figure 1.2 shows the structure of this dissertation, an overview that will help readers focus on topics or chapters that interest them. The dissertation begins with this introductory chapter, which discusses the problem, the solutions that are being proposed to tackle this problem and the solution space that we are focusing on. Chapter 2 discusses several background topics on defect- and fault-tolerant nanoarchitectures, probabilistic methodologies to design and analyze fault-tolerant nanosystems on such architectures and state-of-the-art reliability analysis tools. Chapter 3 discusses a PMC-based methodology that we use to analyze structurally redundant von Neumann MUX systems and also shows how a PRISM-based tool has been enhanced for analyzing MUX architectures. We develop an MRF-based design and analysis framework for industry-based reconfigurable nanoarchitectures and hybrid reconfigurable and redundancy-based nanoarchitectures in Chapter 4. Chapters 5 and 6 discuss probabilistic methodologies to design and analyze fault-tolerant molecular logic and memory, respectively. These chapters address the need for scalable defect mapping and redundancy insertion methodologies, and demonstrate the importance of analyzing architectures at different levels of the design abstraction. Chapter 7 discusses the scalability issue of our CAD tools and proposes a technique and algorithm to scale our PMC-based reliability evaluation methodology. This algorithm is implemented in a tool called SETRA (Scalable, Extensible Tool for Reliability Analysis) that we have developed specifically to integrate our tools with the conventional CAD circuit design flow. We also compare SETRA with a combinatorial tool, STARS-C, in Chapter 8 to highlight the advantages and limitations of our tool. Finally, Chapter 9 summarizes the work presented in this dissertation.

[Diagram: Introduction (Chapter 1); Background (Chapter 2); Architecture (Chapters 3 to 7), covering redundancy-based (Chapter 3), reconfigurable (Chapter 4), hybrid (Chapters 4, 5 and 6) and molecular (Chapters 5 and 6) architectures; Scalability (Chapters 7 and 8), covering SETRA (Chapter 7) and the comparison with STARS-C (Chapter 8).]

Figure 1.2: Overview of Dissertation Structure


Background and Related Work

2.1 Defect-tolerance through Reconfiguration

A computer architecture that can be configured or programmed after fabrication to implement desired computations is said to be reconfigurable. Reconfigurable fabrics are composed of programmable logic elements (CLBs) and interconnects, which can be configured to implement any logic circuit. Defect-tolerance can be achieved by detecting defective components during an initial defect mapping phase and excluding them during actual configuration. It is expected that reconfigurable fabrics made from next generation fabrication processes will go through a post-fabrication defect mapping phase during which these fabrics will be configured for self-diagnosis [59, 87].

While such reconfigurable architectures may aid in avoiding manufacturing defects at the nanoscale, they will not provide tolerance to transient faults. There are two general classes of defect mapping and avoidance techniques: (i) techniques that use test circuits to find the location and number of defects on a reconfigurable nanofabric [50, 87] and (ii) broadcast-based methods that flood test packets through the whole nanofabric to locate non-reachable nodes [96]. The test circuits or packets placed on the fabric during the self-diagnosis phase utilize resources that are available later for normal logic mapping. Although such defect mapping techniques can be performed with massive parallelism, they have been reported to be expensive in terms of cost and time.

2.2 Fault-tolerance through Structural Redundancy

A structurally redundant architecture is one which mitigates the effects of faults in the devices and interconnects that make up the architecture and guarantees a given level of reliability. Redundancy has been used for fault-tolerance both in hardware and software. Our focus is on resource or structural redundancy-based techniques such as Triple Modular Redundancy (TMR), Cascaded Triple Modular Redundancy (CTMR), von Neumann multiplexing and their variations.

The concept of TMR [111] is to have three functionally identical units working in parallel and comparing their outputs with a majority (MAJ) gate to produce the final output. The units could be gates, logic blocks, logic functions or functional units. TMR provides a functionality similar to one of the three parallel units but provides a better probability of working [111]. The MAJ gate is a single failure point and three of these can be used instead of just one to improve the reliability of the system further.
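The benefit of majority voting can be illustrated with a small numerical sketch. It is not taken from [111]; it assumes that the three units fail independently with the same reliability, and that any failure of the single MAJ gate corrupts the output (a simplification). The function names are ours and purely illustrative.

```python
def majority_of_three(r: float) -> float:
    """Probability that at least two of three independent units,
    each correct with probability r, produce the correct value."""
    return 3 * r**2 - 2 * r**3


def tmr_reliability(r_unit: float, r_voter: float = 1.0) -> float:
    """Reliability of a TMR block with a single majority gate.

    Assumption: the block is correct only if the voter works and at
    least two of the three units are correct.
    """
    return r_voter * majority_of_three(r_unit)


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        # Compare an ideal voter with a slightly unreliable one.
        print(r, tmr_reliability(r), tmr_reliability(r, r_voter=0.999))
```

Under these assumptions TMR only helps when the unit reliability exceeds 0.5, and the single voter caps the achievable reliability, which is one way to see why the MAJ gate is described as a single failure point.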


Figure 2.1: TMR for sequential and combinational logic

robust error recovery mechanisms [52]. For example, if the triplicated units in TMR architectures have random logic with sequential elements and any one of the units is in error, there is no mechanism at the architecture level by which such an error can be detected until it manifests at the output of the modular unit. At that point in the execution of the system, the internal states of the erroneous and non-erroneous redundant modules will be inconsistent. [52] discusses the significance of feeding back the voted result to all the voted sequential elements to resynchronize all the redundant modules and avoid error build-up. This implies that a Boolean network needs to be apportioned into combinational and sequential parts and then redundancy may be inserted. This is shown in Figure 2.1 where triple MAJ gates are used along with feedback.

CTMR [111] is similar to TMR, wherein the units working in parallel are TMR units combined with a MAJ gate. Figure 2.2 (a) shows a first-order CTMR configuration where the parallel processing units in each of the three TMR units are NAND gates. Due to the area and latency overheads associated with this technique, the triplicated units in the CTMR with a multi-layer voting scheme are normally functional units or logic blocks, not single gates as shown in Figure 2.2 (a). Since the triplicated functional units or logic blocks may consist of a large number of gates, their failure probability is higher than that of individual gates. Hence, the multi-level CTMR with triple voters (Figure 2.2 (b)) may be used to apportion the system into optimally sized functional units or logic blocks to effectively allow the architecture to withstand more errors across the triplicated units [51].

Figure 2.2: Different CTMR configurations: (a) generic CTMR with multi-layer voting; (b) CTMR with triple voters at a smaller granularity (MG = Majority Gate)
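The cascading idea can be sketched as a simple recursion. This is an illustration under our own assumptions, not the analysis of [111] or [51]: units fail independently, each level uses a single MAJ gate whose failure corrupts that level's output, and we call plain TMR "order 0" so that a first-order CTMR is a TMR of TMR units.

```python
def majority_of_three(r: float) -> float:
    """Probability that at least two of three independent units are correct."""
    return 3 * r**2 - 2 * r**3


def ctmr_reliability(r_unit: float, r_voter: float, order: int) -> float:
    """Reliability of an order-`order` CTMR block.

    Assumed convention: order 0 is plain TMR, and each additional order
    triplicates the previous block and adds one more majority gate.
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    r = r_unit
    for _ in range(order + 1):
        r = r_voter * majority_of_three(r)
    return r


if __name__ == "__main__":
    for order in range(3):
        print(order,
              ctmr_reliability(0.9, 1.0, order),     # ideal voters
              ctmr_reliability(0.9, 0.99, order))    # imperfect voters
```

With ideal voters every additional level improves reliability, whereas with imperfect voters the result saturates near the voter reliability, so extra levels add area and latency without a matching gain. This is consistent with the remark above that the overheads limit how finely CTMR is applied.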

2.3 Crossbar Architecture

In this section, we introduce crossbar architectures that are used for both logic and memory design. The two-dimensional crossbar architecture is a general approach to integrate nanodevices, specifically molecular devices. The crossbar is one of the more dominant nanoarchitectures because: (i) it involves only two sets of aligned and perpendicular wires with switches formed at the junction points of the wires, in other words, it is morphologically simple; (ii) it can be integrated with microscale addressing circuitry; and (iii) it is defect-tolerant to a certain extent. One of the ways to enhance the inherent reliability of this architecture is by increasing the number of rows and columns of the crossbar, hence increasing the number of switches or junction points [117].

2.3.1 Crossbar-based Logic Architectures

Diode-based crossbar architectures are not sufficient for building complete systems since the output signals suffer from degradation, making cascading of many crossbar stages challenging. In this dissertation, we use a crossbar logic architecture that uses molecular latches [29] and a variant of the conventional molecular diode logic. Such an architecture (Figure 2.3) has been used in [29] to build complete systems. Logic values held in the molecular latches are encoded using impedance, an unusual characteristic of hysteretic resistor-based latches.

The probability that a column wire can be used to form a $k$-input ($k$ rows) gate is $(1-p)^k$, if each junction has an independent probability of failure $p$. The probability that at least one column out of $N$ columns will be able to implement the $k$-input gate is $P_{gate}(k, N) = 1 - (1 - (1-p)^k)^N$. In this paper, the crossbar is configured to implement a logic function as a combination of 2-input gates. Thus, the probability of a crossbar composed of $R$ 2-input gates functioning correctly is $P_{circuit} = (P_{gate}(2, N))^R$. Test circuits are configured on such crossbars to deduce the approximate failure probability of the crossbar ($1 - P_{circuit}$) and the individual junction failure probability ($p$), and then the probability of successfully configuring other logic functions is evaluated. In the next subsection we discuss some of the novel molecular crossbar-based memory architectures.
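The two expressions above are straightforward to evaluate numerically. The following sketch simply transcribes them; the function names and the sample values of $p$, $N$ and $R$ are illustrative assumptions and do not come from the cited work.

```python
def p_gate(k: int, n_cols: int, p: float) -> float:
    """P_gate(k, N) = 1 - (1 - (1 - p)^k)^N.

    k: number of gate inputs (rows), n_cols: number of candidate columns N,
    p: independent failure probability of each junction.
    """
    column_usable = (1.0 - p) ** k
    return 1.0 - (1.0 - column_usable) ** n_cols


def p_circuit(r_gates: int, n_cols: int, p: float) -> float:
    """P_circuit = (P_gate(2, N))^R for a circuit of R two-input gates."""
    return p_gate(2, n_cols, p) ** r_gates


if __name__ == "__main__":
    # Illustrative sweep: more columns per gate offset a high junction failure rate.
    for n_cols in (1, 2, 4, 8):
        print(n_cols, p_gate(2, n_cols, 0.1), p_circuit(100, n_cols, 0.1))
```

The sweep shows the trade-off described in the text: adding columns raises $P_{gate}$ quickly, but $P_{circuit}$ still decays with the number of gates $R$ unless $P_{gate}$ is very close to 1.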


Figure 2.3: A crossbar with molecular latches built from hysteretic resistors

2.3.2 Crossbar-based Memory Architectures

The molecular memory in [29] was one of the first crossbar-based memory architectures to be fabricated and tested. Each molecular junction in the crossbar architecture proposed in [29] is used as an active memory cell. This is done by applying voltage to the row and column nanowires (NWs) such that the molecular junctions exhibit hysteretic response and can hence store information. Figure 2.4 shows the rudimentary form of the memory. A $4 \times 4$ memory is realized in an $8 \times 8$ crossbar. The rest of the crossbar is configured as a $4 \times 4$ demultiplexer and a $4 \times 4$ multiplexer as shown in Figure 2.4. These de/multiplexers are used to address the different junctions of the crossbar memory. The demultiplexer controls the column NWs. This functionality is achieved by setting some of the crosspoints to low resistances in the $4 \times 4$ crossbar at the top right corner, and setting the voltages $V_a$ and $V_b$ with different input combinations such that any of the four vertical NWs can be selected. In Figure 2.4, the input combination of 1 and 0 selects the second vertical NW from the top right corner of the $8 \times 8$ crossbar.

Figure 2.4: A non-redundant molecular crossbar memory [29]

Similarly, $V_c$ and $V_d$ are set to select a particular row NW; in this case, the second row of the memory is selected (as highlighted in Figure 2.4). The molecular junctions in the crossbar memory act as switches and resistors in series. When a certain voltage is applied to the selected row and column [29], the switch can be programmed to be in either the closed or the open state, hence the junction can be in either the low or the high resistance state, respectively. Note that while writing a logic value to a single memory bit, it has been observed that junctions on the same row or column may also get accidentally programmed. This is known as the half-select problem [29, 82]. The authors in [29] show that this problem may be solved by using a switching matrix that biases all the unselected wires to half of the voltage applied to the selected row or column. Once a particular molecular junction is programmed, the value it holds can be read out by applying a lower voltage [29]. The two peripheral circuits are used to apply and detect AC signals. AC voltages are applied to the crossbar so that there is a clear threshold for the current readout, hence providing a clear distinction between logic low and high states.

Figure 2.5: Redundancy at Module Level (Banking) [30]

Although [29] analyzes the electrical characteristics of the molecular junctions, the problem of high defect density in molecular crossbars is not addressed in this work. One of the techniques for tackling defects is to introduce spare rows and columns. [30] analyzes this fault-tolerance technique and reports that for large defect rates the number of spares required increases rapidly with the size of the memory. Hence, such a fault-tolerance technique is not viable for ultra-dense nanomemories. [30] also proposes a hierarchical memory architecture commonly termed banking. This memory architecture provides the functionality of a $2^n \times 2^n$ memory by dividing the mesh into $2^{2(n-m)}$ sub-arrays, each of size $2^m \times 2^m$. These crossbar-based sub-modules are called banks and spare sub-modules are provided for fault-tolerance. Hence, a two tier redundancy is provided, i.e., at the basic crossbar architecture and module levels. Peripheral CMOS circuitry is used for providing power, address translation and support logic. Figure 2.5 shows this hybrid architecture. The module table and decoder select the specific module through the module select lines, and the address within that particular bank is selected by the module offset address line. The banks that are crossed out in Figure 2.5 are defective and have to be replaced by any one of the spares.
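The banked organization can be made concrete with a short addressing sketch. It is our own illustration of the arithmetic, not the address translation logic of [30]: we assume that the upper $n-m$ bits of each coordinate select the bank and the lower $m$ bits give the offset within it, and the `module_table` dictionary that redirects defective banks to spares is hypothetical.

```python
def bank_count(n: int, m: int) -> int:
    """Number of 2^m x 2^m banks that make up a 2^n x 2^n memory."""
    if not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n")
    return 2 ** (2 * (n - m))


def split_address(row: int, col: int, n: int, m: int):
    """Split a cell address into (bank index, offset inside the bank)."""
    size = 1 << n
    if not (0 <= row < size and 0 <= col < size):
        raise ValueError("address outside the 2^n x 2^n array")
    mask = (1 << m) - 1
    bank = (row >> m, col >> m)        # upper n-m bits: module select
    offset = (row & mask, col & mask)  # lower m bits: module offset
    return bank, offset


def resolve_bank(bank, module_table):
    """Return the spare recorded for a defective bank, else the bank itself.

    module_table is a hypothetical mapping {defective bank: spare id}.
    """
    return module_table.get(bank, bank)


if __name__ == "__main__":
    n, m = 4, 2                          # a 16 x 16 memory built from 4 x 4 banks
    print(bank_count(n, m))
    bank, offset = split_address(13, 6, n, m)
    print(bank, offset, resolve_bank(bank, {(3, 1): "spare0"}))
```

Keeping the remapping in a table at the module level means a defective bank is replaced as a whole, which matches the second tier of redundancy described above.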

2.4 Markov Random Fields

[7] proposed a probabilistic approach based on Markov Random Fields (MRFs) for analyzing nanocircuits. An MRF is defined as a finite set of random variables, $\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_k\}$. Each variable $\lambda_i$ has a neighborhood, $N_i$, which has variables from $\{\Lambda - \lambda_i\}$. The probability distribution of a given variable depends only on a typically small neighborhood of other variables that is called a clique. By the Hammersley-Clifford theorem [14],

$$P(\lambda_i \mid \{\Lambda - \lambda_i\}) = \frac{1}{Z}\, e^{-\frac{1}{KT} \sum_{c \in C} U_c(\lambda)} \qquad (2.1)$$

The conditional probability in Equation 2.1 is the Gibbs distribution. $Z$ is the normalizing constant and, for a given node $i$, $C$ is the set of cliques. $U_c$ is the clique energy function [7] and depends only on the neighborhood of the node whose energy state probability is being calculated.


The idea of this model of computation is to use such a Gibbs distribution-based technique to characterize the logic functionality of each gate and maximize the probability of being in valid energy configurations at the gate outputs. The logic functionality of each gate is represented by a logic compatibility function which is similar to a truth table. But instead of only considering the valid logic combinations, where validity means that for given logic values at the inputs, the corresponding logic value at the gate output is correct, the logic compatibility function considers the invalid logic operation scenarios as well (output value is invalid for a given logic combination at the inputs). Such a function is used to represent the logic or clique energy for each Boolean function and hence formulate energy-based transformation for the Boolean function.

Due to such a formulation of the logic compatibility function, this model of computation implicitly considers structural defects and eliminates the need for defect mapping and defect avoidance. The reliability of Boolean networks can be evaluated by representing circuits as MRF-based logic gates, applying Belief Propagation to the output energy distributions of each logic gate and computing the probabilities of the signals at the primary outputs. These output distributions are evaluated for specific probability distributions at the primary inputs and signal noise at the interconnects. Note that this model of computation encodes signals over a continuous range, i.e., $\lambda_i$ is a continuous random variable, unlike conventional computational models where signals are encoded as bi-modal random variables (logic low or high).

Let us take a specific NAND gate example to walk through this MRF-based methodology. For a two-input NAND gate, there are three nodes in the assumed MRF: the inputs $x_0$ and $x_1$, and the output $x_2$. Figure 2.6 shows $x_0$ and its neighborhood, since the energy state of this node depends on its neighboring nodes. The edges in Figure 2.6 depict the conditional probabilities with respect to the other input $x_1$ and the output $x_2$ (nodes in the same clique). The operation of the gate is designated by the logic compatibility function $z(x_0, x_1, x_2)$ shown as a truth table in Table 2.1. $z = 1$ when $x_2 = (x_0 \wedge x_1)'$ (valid logic operations). Such a function takes all valid and invalid logic combinations into account so as to formulate the clique energy function for the NAND logic.

Figure 2.6: A NAND gate depicted as an MRF
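A logic compatibility function of this kind can be generated mechanically for any gate. The sketch below is illustrative only (the helper names are ours, and it is not NANOLAB code): it enumerates every input/output combination and marks it with $z = 1$ when the output matches the gate function and $z = 0$ otherwise.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 values."""
    return 1 - (a & b)


def compatibility_table(gate, n_inputs: int):
    """Return rows (x_0, ..., x_{n-1}, output, z) for every combination.

    z is 1 for a valid logic operation (output equals gate(inputs))
    and 0 for an invalid one.
    """
    rows = []
    for inputs in product((0, 1), repeat=n_inputs):
        for output in (0, 1):
            z = 1 if gate(*inputs) == output else 0
            rows.append((*inputs, output, z))
    return rows


if __name__ == "__main__":
    for row in compatibility_table(nand, 2):
        print(row)
```

For the NAND gate this produces the same eight rows as Table 2.1, possibly in a different order, with exactly four valid combinations.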

i   x0  x1  x2  z
0   0   0   1   1
1   0   0   0   0
2   0   1   1   1
3   0   1   0   0
4   1   0   1   1
5   1   0   0   0
6   1   1   0   1
7   1   1   1   0

Table 2.1: Logic compatibility function of a NAND gate

Entropy has been defined in many different yet equivalent ways. In the context of this dissertation, we define entropy as a measure of the disorder of a system. It is considered to have high values when the system under consideration is very disordered. The concept of entropy originated from classical thermodynamics but has found widespread application in dynamical systems theory, communication theory, information theory, etc. Considerable work has been done on statistical thermodynamics [114], which became the inspiration for adopting the word entropy in information theory. Let us consider a random variable $X$, which must take on one of the values $x_1, x_2, \ldots, x_n$ with respective probabilities $p_1, p_2, \ldots, p_n$. Then, the expected degree of uncertainty (randomness) in the system is

$$H(X) = -\sum_{i} p_i \log(p_i) \qquad (2.2)$$

This is the information or algorithmic entropy [43] of the random variable $X$, i.e., the average amount of uncertainty associated with $X$.
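Equation 2.2 can be evaluated directly for a discrete distribution. The following is a minimal sketch; since Equation 2.2 does not fix the base of the logarithm, base 2 (entropy in bits) is assumed here, and terms with $p_i = 0$ are taken to contribute zero.

```python
import math


def entropy(probs, base: float = 2.0) -> float:
    """H(X) = -sum_i p_i log(p_i) for a discrete distribution.

    Assumptions: logarithm base 2 by default, and zero-probability
    outcomes are skipped (their contribution is taken as 0).
    """
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    return -sum(p * math.log(p, base) for p in probs if p > 0)


if __name__ == "__main__":
    # A signal that is almost certainly at one logic level has low entropy;
    # a signal that is equally likely to be low or high has the maximum.
    for dist in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
        print(dist, entropy(dist))
```

A two-valued signal therefore has entropy between 0 (fully determined) and 1 bit (completely uncertain), which is the sense in which entropy is used as a reliability metric.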

It is also worth mentioning that Equation 2.1 relates logic and thermal energy, hence providing a means of measuring the effect of thermal perturbation. The thermal energy $KT$ ($K$ is the Boltzmann constant and $T$ is the temperature in Kelvin) is normalized to the logic or clique energy. For example, $KT = 0.1$ can be interpreted as unit logic energy being ten times the thermal energy. The logic margins of nodes in a Boolean network decrease at higher values of $KT$ and increase at lower values. The logic margin in this case is the difference between the probabilities of occurrence of a logic low and a logic high. Higher logic margins result in a decrease in entropy or uncertainty in computation and hence better reliability of computation. If we consider high thermal perturbations, the reliability of computation is likely to be adversely affected, and if we can keep our systems far from these temperature values, the reliability is likely to improve. The model of computation in [7] thus considers such thermal perturbations and continuous signal noise as sources of errors.

Figure 2.7: Three level design hierarchy (Region, Mapping Unit, Component) from [68] © 2004 IEEE
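The dependence of the logic margin on $KT$ can be reproduced with a deliberately simplified calculation. The sketch below is not the model of [7]: it assumes binary (not continuous) node values, uniformly distributed inputs, and a clique energy equal to the negative of the NAND logic compatibility function, $U = -z(x_0, x_1, x_2)$, so that valid states have lower energy. The output distribution is obtained by summing the Gibbs weights $e^{-U/KT}$ over the inputs.

```python
import math
from itertools import product


def nand_compat(x0: int, x1: int, x2: int) -> int:
    """Logic compatibility function z of a NAND gate (1 = valid state)."""
    return 1 if x2 == 1 - (x0 & x1) else 0


def output_distribution(kt: float):
    """Return (P(x2 = 0), P(x2 = 1)) under the assumed Gibbs model."""
    weights = [0.0, 0.0]
    for x0, x1, x2 in product((0, 1), repeat=3):
        energy = -nand_compat(x0, x1, x2)     # assumption: U = -z
        weights[x2] += math.exp(-energy / kt)
    total = weights[0] + weights[1]           # normalizing constant Z
    return weights[0] / total, weights[1] / total


def logic_margin(kt: float) -> float:
    """Difference between the probabilities of logic high and logic low."""
    p_low, p_high = output_distribution(kt)
    return abs(p_high - p_low)


if __name__ == "__main__":
    for kt in (0.1, 0.25, 0.5, 1.0):
        print(kt, output_distribution(kt), logic_margin(kt))
```

In this toy setting the margin shrinks monotonically as $KT$ grows, in line with the qualitative statement above that higher thermal energy reduces logic margins.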

2.5 Hierarchical Logic Mapping Methodology

[68] proposes a hierarchical approach to map logic onto reconfigurable nanofabrics. The authors show that this approach enhances the scalability of mapping large designs onto dense nanofabrics. This methodology is based on decomposing a nanofabric into a structural hierarchy shown in Figure 2.7, decomposing designs into smaller logic functions and hierarchically mapping these. We have extended this methodology to allow hierarchical insertion of redundancy in molecular nanofabrics with delay and cost constraints. In this section, we discuss the original methodology and show in Chapter 5 how it is extended for analysis of redundancy-based molecular logic architectures.

Figure 2.8: The set of primitive flows from [68] © 2004 IEEE

The methodology in [68] proposes a three level design hierarchy. The lowest tier of the design hierarchy is composed of regions, each comprising eight processing elements (PEs) and the same number of switching elements (SEs). The PEs and SEs can either be crossbars (Figure 2.3) or more complicated nanoBlocks [50]. Logic is configured on the PEs while the SEs are used as interconnects. These regions are the basic configurable blocks (structural primitives) and can be used collectively to form mapping units (MUs). Again, these MUs can be grouped together to form components. [68] also identifies 7 primitive functional flows (behavioral primitives), shown in Figure 2.8, that can be directly mapped to regions. Data flow graphs (DFGs) of different designs are translated to DFGs composed of such flows, also called covers.


behavioral flow is configured on a single limited functionality region. These flows are instantiated on regions if and only if the probability of successful configuration is high. The probability of such successful configuration is estimated by applying Monte Carlo (MC) simulations. In this dissertation, we consider our molecular nanofabric models to have the same structural hierarchy, and represent designs that need to be mapped to these molecular nanofabrics as DFGs composed of these primitive functional flows (Chapter 5).
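To give a flavor of such an estimate, the sketch below runs a Monte Carlo simulation on a toy success criterion. The criterion is hypothetical and is not the one used in [68]: we assume a region of eight PEs in which each PE is defective independently with probability `p_defect`, and a flow that can be configured whenever at least `needed` PEs are defect-free. The closed-form binomial value is included only to check the estimate.

```python
import random
from math import comb


def mc_config_success(p_defect: float, needed: int, n_pes: int = 8,
                      trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability of successful configuration."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Sample one region: count the PEs that came out defect-free.
        working = sum(rng.random() >= p_defect for _ in range(n_pes))
        if working >= needed:
            successes += 1
    return successes / trials


def exact_config_success(p_defect: float, needed: int, n_pes: int = 8) -> float:
    """Binomial tail: probability that at least `needed` PEs are defect-free."""
    q = 1.0 - p_defect
    return sum(comb(n_pes, w) * q**w * p_defect**(n_pes - w)
               for w in range(needed, n_pes + 1))


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2):
        print(p, mc_config_success(p, needed=6), exact_config_success(p, needed=6))
```

For realistic success criteria that depend on which PEs and SEs are defective, no simple closed form is available, which is where a sampling-based estimate becomes useful.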

2.6 Probabilistic Model Checking

Probabilistic model checking (PMC) is an algorithmic procedure for ascertaining whether a given probabilistic system satisfies probabilistic specifications such as "the probability of logical correctness at the output of a logic network must be at least 0.9, given that each gate has a failure probability of 0.001". The nanosystem is usually modeled as a state transition system with probability values attached to the transitions. Examples of such transition systems are discrete time Markov chains (DTMCs), continuous time Markov chains (CTMCs) and Markov decision processes (MDPs). A probabilistic model checker applies algorithmic techniques [93] to analyze the state space and calculate performance measures of the probabilistic model.

The specifications or properties to be verified are typically specified in probabilistic extensions of temporal logic. The two most common temporal logics used for specifying properties of probabilistic systems are probabilistic computation tree logic (PCTL) [22, 58] and continuous stochastic logic (CSL) [5, 8], both extensions of the logic CTL. PCTL is used to specify properties for DTMCs and MDPs and CSL is used for CTMCs. One common feature of the two logics is the probabilistic $P$ operator. For example, the formula $P_{\geq 1}[\Diamond\, terminate]$ states that the system will eventually terminate with probability 1. On the other hand, the formula $P_{\geq 0.95}[\neg repair \; U^{\leq 200} \; terminate]$ asserts that the system should terminate within 200 time steps without requiring any repairs with a probability of 0.95 or greater. In addition to the $P$ operator, CSL also provides the $S$ operator, which helps in specifying steady-state behavior. For instance, $S_{<0.01}[queue\_size = max]$ states that the probability that a queue is full is strictly less than 0.01 in the long run. Further properties can be analyzed by introducing the notion of costs (or, conversely, rewards). If each state of the probabilistic model is assigned a real-valued cost, one can compute properties such as the expected cost to reach specific states, the expected accumulated cost over some time period or the expected cost at a particular time instant. In this dissertation, we investigate the applicability of PMC techniques for analyzing nanoarchitectures by focusing on the PRISM [101] and SMART [112] PMC tools.

PRISM [74, 101] is a probabilistic model checker developed at the University of Birmingham. It supports the analysis of three types of probabilistic models: DTMCs, CTMCs and MDPs. Note that we use DTMCs to model conventional digital circuits since DTMCs are suitable for such modeling [111]. The DTMC model of computation specifies the probability of transitions between states such that the probabilities of performing a transition from any given state sum to 1. Our PMC-based nanoarchitecture modeling methodology is based on (i) modeling architectures as templatized DTMCs with probabilistic assumptions about the occurrence of defects and faults at the gates and interconnects and (ii) using Markovian analysis techniques to evaluate different probabilistic properties of the architectures.
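To make the DTMC semantics and a bounded-until property concrete, the following sketch evaluates a property of the form $P_{\geq 0.95}[\neg repair \; U^{\leq 200} \; terminate]$ on a tiny hand-written chain. It is only an illustration of what such a property means and makes no claim about how PRISM computes it internally; the three-state chain and its probabilities are hypothetical.

```python
def is_stochastic(matrix, tol: float = 1e-12) -> bool:
    """Check that the outgoing probabilities of every state sum to 1."""
    return all(abs(sum(row) - 1.0) <= tol for row in matrix)


def bounded_until(matrix, allowed, target, steps: int):
    """Per-state probability of reaching `target` within `steps` transitions
    while passing only through states in `allowed`."""
    n = len(matrix)
    prob = [1.0 if s in target else 0.0 for s in range(n)]
    for _ in range(steps):
        prob = [
            1.0 if s in target
            else (sum(matrix[s][t] * prob[t] for t in range(n))
                  if s in allowed else 0.0)
            for s in range(n)
        ]
    return prob


# Hypothetical chain: state 0 = working, 1 = repair, 2 = terminate.
P = [
    [0.90, 0.02, 0.08],
    [1.00, 0.00, 0.00],
    [0.00, 0.00, 1.00],
]

if __name__ == "__main__":
    assert is_stochastic(P)
    p = bounded_until(P, allowed={0}, target={2}, steps=200)[0]
    print(p, p >= 0.95)   # the property holds only if p >= 0.95
```

In this example the chain leaves the working state towards termination with probability 0.08 and towards repair with probability 0.02 at each step, so the probability of terminating without any repair stays below the 0.95 bound and the property is not satisfied from state 0.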

The PRISM description language is a high level language based on guarded commands. The basic components of the language are modules and variables. A probabilistic model is constructed as a number of modules which can interact with each other by means of standard process algebraic operations [104]. A module contains a number of variables that represent its state. Its behavior is given by a set of guarded commands of the form:

[]<guard> → <command>;

The guard is a predicate over the variables of the model and the command describes a transition that the module can make if the guard is true (using primed variables to denote the next values of variables). If a transition is probabilistic, then the command is specified as:

[]<guard> → <prob> : <command> + ... + <prob> : <command>;

The Stochastic Model checking Analyzer for Reliability and Timing (SMART) [112] is a tool similar to PRISM with a less expressive low-level input description language. Like PRISM, it can also be used to model and analyze complex probabilistic systems. As in PRISM, the nanoarchitectures can be modeled as DTMCs and different performance measures can be evaluated by analyzing the DTMC state space. The tool implements both numerical solution algorithms and discrete-event simulation techniques, and can integrate different high-level logical and stochastic modeling formalisms such as DTMCs, Petri Nets, etc. We observe that certain complex models are analyzed faster in SMART due to more flexibility in performing manual state space partitioning of the models.


2.7 Generalized Reliability Analysis Techniques

There are two distinct methods that can be used to analyze the reliability of nanoarchitectures. The generalized approach entails the combinatorial modeling of generic defect- and fault-tolerant architectures without considering specific failure distributions of the inputs, gates and interconnects. The probability of correct functioning of each primary output is computed using combinatorial arguments with the assumption that each gate can fail independently. Thus, the reliability of a generic architecture is evaluated using stage-by-stage conditional probability computations. If such an architecture is a part of a larger fault-tolerant architecture, these output distributions can be combinatorially used to determine the reliability of the larger architecture.

For instance, [55] develops a generalized approach to compute the probability of the number of erroneous outputs $k$ being below a threshold $T$, for a structurally redundant fault-tolerant architecture called NAND multiplexing (discussed in Chapter 3). The generalized probabilistic formulation is:

$$P(k \le T) = \int_{-\infty}^{T} \frac{1}{\sqrt{2\pi}\,\sqrt{N\bar{z}(1-\bar{z})}}\; e^{-\frac{1}{2}\left(\frac{k - N\bar{z}}{\sqrt{N\bar{z}(1-\bar{z})}}\right)^{2}} \, dk, \qquad (2.3)$$

where $N$ is the bundle size and $\bar{z}$ is the probability of each NAND output being in error. For more details on Equation 2.3, the readers should refer to [55]. It can be seen that specific values for $N$, $\bar{z}$ and $T$ can be substituted in Equation 2.3 to determine $P(k \le T)$. Thus, from the above example it can be seen how generalized approaches can be used to compute the reliability of architectures by: (i) framing combinatorial arguments for generic architectural configurations, (ii) substituting specific configuration values, and (iii) using such reliability numbers to determine the reliability of larger fault-tolerant systems that contain the analyzed architecture. Equation 2.3 illustrates that generalized combinatorial equations can be complex.
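Substituting specific values into Equation 2.3 amounts to evaluating a normal cumulative distribution with mean $N\bar{z}$ and standard deviation $\sqrt{N\bar{z}(1-\bar{z})}$, which can be written in terms of the error function. The sketch below does exactly that; the function name and the sample values are illustrative and are not taken from [55].

```python
import math


def prob_errors_at_most(threshold: float, bundle_size: int, z_bar: float) -> float:
    """Evaluate Equation 2.3: P(k <= T) for bundle size N and
    per-output error probability z_bar (requires 0 < z_bar < 1)."""
    if not 0.0 < z_bar < 1.0:
        raise ValueError("z_bar must lie strictly between 0 and 1")
    mean = bundle_size * z_bar
    std = math.sqrt(bundle_size * z_bar * (1.0 - z_bar))
    # Normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf((threshold - mean) / (std * math.sqrt(2.0))))


if __name__ == "__main__":
    n, z_bar = 1000, 0.1          # illustrative values
    for t in (80, 100, 120, 140):
        print(t, prob_errors_at_most(t, n, z_bar))
```

As expected from the form of the integrand, the probability is one half when the threshold equals the mean number of erroneous outputs and approaches 1 as the threshold moves several standard deviations above it.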

2.8 Instance-based Reliability Analysis Techniques

The other method to analyze the reliability of nanoarchitectures is instance-based. Instead of forming complex generalized combinatorial arguments for architectures, instances of a particular architecture with specific redundancy factors $R$ ($R$ is the ratio of the redundant and non-redundant circuit sizes), primary input distributions, and gate and interconnect failure probabilities are used to develop probabilistic transition models of the architectures, and reliability lower bounds are computed. In the recent past, we have developed such instance-based methodologies in [16, 17, 19] to analyze nanoarchitectures. We will discuss these methodologies in the next subsections.

Note that the main drawback of these instance-based methodologies is that many specific instances of a generic architecture need to be analyzed to predict the performance trends of that architecture, and computing the reliability of each such instance may be computationally non-trivial. In this dissertation, we have addressed these issues and developed more scalable and computationally less intense instance-based methodologies to analyze the reliability of nanoarchitectures. These methodologies entail state space partitioning and hierarchical modeling techniques.

2.8.1 NANOLAB

NANOLAB [15, 17] is our MATLAB-based reliability analysis tool that uses the probability distribution of signal energy levels and entropy as reliability metrics. It automates the MRF-based methodology discussed briefly in Section 2.4. The existing automation framework consists of a library of functions and a Belief Propagation algorithm that compute signal energy distributions and entropy at the primary and intermediate outputs and interconnects of combinational circuits, given specific signal noise spikes that affect the primary inputs and interconnects of the circuits. These functions work for generic one-, two-, and three-input logic gates and can be extended to handle n-input logic gates. They take as inputs the logic compatibility function [7, 27], the energy distributions at the gate inputs, and the signal noise. The functions return energy distributions as vectors, which indicate the probability of the signals at the gate outputs being at different energy levels. These output probabilities are also computed over different values of KT to analyze the effects of thermal perturbations. Structurally redundant combinational circuits can be analyzed by writing scripts that use these NANOLAB library functions.
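The kind of per-gate computation described above can be sketched as follows. This is a minimal Python illustration, not NANOLAB's MATLAB code. It assumes binary signal states rather than a range of energy levels, a clique energy equal to the negated logic compatibility function, and a Gibbs weighting exp(-U/KT); it models no signal noise and performs no belief propagation across gates. All function names are hypothetical.

```python
from math import exp, log2


def nand_compat(a, b, c):
    """Logic compatibility function: 1 if (a, b, c) is a valid NAND state, else 0."""
    return 1 if c == 1 - (a & b) else 0


def nand_output_distribution(p_a1, p_b1, kt):
    """Return [P(c=0), P(c=1)] at the output of a single NAND gate.

    p_a1, p_b1 : probabilities that inputs a and b are logic 1
    kt         : thermal energy KT (assumed dimensionless here)
    Assumption: clique energy U = -nand_compat(a, b, c), so valid states
    have lower energy and receive the larger Gibbs weight exp(-U / kt).
    """
    p_a = [1 - p_a1, p_a1]
    p_b = [1 - p_b1, p_b1]
    weights = [0.0, 0.0]
    for c in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                energy = -nand_compat(a, b, c)
                weights[c] += p_a[a] * p_b[b] * exp(-energy / kt)
    z = sum(weights)  # normalization constant (partition function)
    return [w / z for w in weights]


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in dist if p > 0)


if __name__ == "__main__":
    # Sweep KT to see how thermal energy flattens the output distribution.
    for kt in (0.1, 0.25, 0.5, 1.0):
        dist = nand_output_distribution(1.0, 1.0, kt)
        print(kt, dist, entropy(dist))
```

Under these assumptions, a low KT concentrates the output probability on the logically correct value, and raising KT pushes the distribution toward uniform and the entropy toward one bit.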

2.8.2 NANOPRISM

NANOPRISM [16] is a tool built on the probabilistic model checker PRISM (Section 2.6) that applies Markovian techniques to automatically evaluate performance measures of nanoarchitectures in the presence of defects and transient faults. It consists of script-based libraries that generate probabilistic models representing specific defect-/fault-tolerant nanoarchitectures for arbitrary logic networks. These libraries also support the modeling of redundancy insertion at different levels of granularity, such as the gate, logic block, logic function, and unit levels [16]. Hence, this tool automates the evaluation of redundancy vs. reliability vs. granularity trade-offs. Using this tool, we have illustrated some anomalous, counter-intuitive design trade-offs that would otherwise be observable only through extensive theoretical analysis. Currently, the tool supports the modeling and analysis of TMR and CTMR fault-mitigation techniques.

Figure 2.9: PTM (left) and ADD (right) representations of a NAND gate
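The flavor of the redundancy vs. reliability trade-off that NANOPRISM evaluates can be illustrated with a small closed-form sketch. This is not a PRISM model and is not how NANOPRISM computes its results. It assumes independent failures, binary outputs where a failure inverts the correct value, and a single majority voter per TMR stage; it also reads CTMR as TMR applied recursively. All names and numeric values are hypothetical.

```python
def majority_correct(r):
    """Probability that at least two of three independent copies are correct."""
    return 3 * r**2 - 2 * r**3


def tmr(r_unit, r_voter):
    """Probability that a TMR stage with an imperfect voter outputs the correct bit.

    Assumption: a faulty voter inverts the majority value, so it turns a
    correct majority into an error and a wrong majority into a correct output.
    """
    m = majority_correct(r_unit)
    return r_voter * m + (1 - r_voter) * (1 - m)


def ctmr(r_unit, r_voter, order):
    """Cascaded TMR: apply TMR recursively 'order' times (order 0 = no redundancy)."""
    r = r_unit
    for _ in range(order):
        r = tmr(r, r_voter)
    return r


if __name__ == "__main__":
    for r_voter in (1.0, 0.99, 0.9):
        print(r_voter, [round(ctmr(0.9, r_voter, n), 4) for n in range(4)])
```

With a perfect voter each added level helps, but when the voter is as unreliable as the unit it protects (both 0.9 here), a TMR stage is less reliable than the bare unit. That is one simple way in which more redundancy can fail to improve reliability.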

2.8.3 Probabilistic Transfer Matrices

Probabilistic transfer matrices (PTMs) have been used in [95] to model defective logic gates using matrix representations, an idea that dates back to [79]. The representation of a NAND gate as a PTM with a failure probability p is shown in Figure 2.9.

The authors of [72] propose a PTM-based framework for computing the output probabilities of combinational circuits. This involves composing gate PTMs according to the logic dependencies of the circuit. Note that this composition technique takes into account signal dependencies between gates by considering the underlying joint probabilities, and it also captures the effects of logical masking.
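The composition of gate PTMs can be sketched for a small circuit. In the PTM literature, gates in series are typically composed by matrix multiplication and gates in parallel by the tensor (Kronecker) product; the example below applies both to two NAND gates feeding a third. It is a pure-Python illustration under the assumption that a gate failure inverts the output with probability p. It does not handle fanout, which needs additional machinery, and the helper names are hypothetical.

```python
def nand_ptm(p):
    """PTM of a NAND gate: rows are inputs 00, 01, 10, 11; columns are outputs 0, 1.

    Entry [i][j] is P(output = j | input = i), with output-flip probability p.
    """
    return [[p, 1 - p], [p, 1 - p], [p, 1 - p], [1 - p, p]]


def matmul(a, b):
    """Serial composition: ordinary matrix product of two PTMs."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a, b):
    """Parallel composition: Kronecker (tensor) product of two PTMs."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def two_level_nand_ptm(p):
    """PTM of NAND(NAND(a, b), NAND(c, d)): a 16 x 2 matrix.

    Row index is the input vector abcd read as a binary number.
    """
    gate = nand_ptm(p)
    first_level = kron(gate, gate)     # 16 x 4: two gates side by side
    return matmul(first_level, gate)   # 16 x 2: feed both outputs into a third gate


if __name__ == "__main__":
    ptm = two_level_nand_ptm(0.1)
    for index, row in enumerate(ptm):
        print(format(index, "04b"), row)
```

Each row of the result gives the output distribution for one specific input vector, which matches the instance-based character of the approach: the numbers depend on the chosen p and on the input vector examined.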

We have categorized this reliability technique as an instance-based approach that computes the reliability of a circuit for specific gate failure probabilities and input vectors. The authors in [72] use Algebraic Decision Diagrams (ADDs) (Figure 2.9) to alleviate a potential memory bottleneck for PTMs representing large circuits. The ADD representations eliminate identical information and thereby compress the PTMs. PTM operations such as probability value extraction are performed on the ADDs.

The PTM representation encompasses all the input combinations of the NAND gate and hence is very similar to logic compatibility functions (Section 2.4) or DTMC models (Section 2.6) for NAND logic. A methodology called probabilistic gate models (PGMs) that is very similar to the PTM and our PMC-based approaches is proposed in [57]. The similarities between PTM, PGM and PMC methodologies are discussed in detail in Chapter 7.

2.8.4 SHARPE2000

SHARPE2000 (Symbolic Hierarchical Automated Reliability and Performance Evaluator) is a toolset that offers a wide range of modeling techniques to carry out fast and accurate instance-based reliability analysis of computing systems. The modeling techniques included in the tool are:

• combinatorial reliability models: reliability block diagrams, fault trees and reliability graphs.

• Markov and semi-Markov models.

• product-form queuing networks.

• generalized stochastic Petri nets.

This tool combines the expressiveness of Markov models with the efficiency of combinatorial models and has been used for software and hardware reliability modeling [61]. SHARPE is also capable of supporting hybrid and hierarchical model composition, which helps in modeling large systems. We believe that this toolset is a potential candidate for analyzing the reliability of nanoarchitectures in the presence of defects and transient faults. We plan to build libraries on top of this modeling infrastructure to facilitate such reliability analysis.
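The idea behind combinatorial models and hierarchical composition can be illustrated with the textbook reliability block diagram formulas. The sketch below is generic Python and does not use SHARPE's input language or interface. It assumes independent component failures, and the function names and reliability values are hypothetical.

```python
from math import prod


def series(*reliabilities):
    """Series block: the system works only if every component works."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """Parallel block: the system works if at least one component works."""
    return 1 - prod(1 - r for r in reliabilities)


if __name__ == "__main__":
    # Hierarchical composition: a duplicated logic block (submodel) in series
    # with a single interconnect stage.
    logic_pair = parallel(0.9, 0.9)
    system = series(logic_pair, 0.99)
    print(logic_pair, system)
```

The submodel result is used as a single block in the enclosing model, which is the sense in which hierarchical composition keeps the analysis of large systems tractable.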