Design of a Serialized Link for On-Chip Global Communication


by Amit Kedia

B.Tech., Indian Institute of Technology, Kharagpur, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

October 2006

© Amit Kedia, 2006


Abstract

On-chip global communication is required for data and control transfers across the various modules on a chip, and it determines the performance of integrated circuits in current technology generations. A particularly difficult challenge at present is the routing complexity and congestion of parallel buses that span large distances on the chip.

This thesis presents the design of an on-chip serialized link to replace a parallel bus.

The serial channel uses a wave-pipelined signaling scheme that provides high data rates to compensate for the loss of parallelism. A serializer-deserializer (SERDES) transceiver is investigated to determine the required number of wires in the serial channel based on interconnect bandwidth and design overhead. Further, the robustness of the SERDES technique is studied in the presence of on-chip variations. A technique for reducing the energy consumption on the serial channel by lowering the average switching activity is also proposed.

Using our design technique, a 32-bit wide parallel bus operating at a frequency of 1 GHz in 90nm CMOS technology can be replaced by a serial channel consisting of 4 interconnects. When the overhead of the SERDES transceiver itself is considered, the number of interconnects required in the serial channel increases to 8. Further, intra-die variation raises the required number of interconnects to 16. This thesis also demonstrates that the energy of a serialized link can be reduced by up to 40% if there is prior knowledge of the data to be transmitted on the bus.


List of Tables

Table 3.1: Electrical Parameters of a Repeated Interconnect Segment
Table 3.2: Analytical Model for Bandwidth of the On-Chip Serial Channel
Table 3.3: Analytical Model for Latency of the On-Chip Serial Channel
Table 3.4: Analytical Model for Average Energy per Bit of the On-Chip Serial Channel
Table 3.5: Analytical Model for Area Occupied by the On-Chip Serial Channel
Table 3.6: Summary of Results for the Two Design Points
Table 3.7: Summary of the Design Technique using the TPA metric
Table 3.8: Design Variables for Our Design Technique
Table 3.9: Summary of the Design Variables and Objectives for the Constant Width Case
Table 3.10: Summary of Design Variables and Objectives for the Variable Width Case
Table 4.1: Specs and Variables for 32Gbps Serial Channel and the Corresponding SERDES Specs
Table 4.2: Design Specs and Variables for 58Gbps Serial Channel and the Corresponding SERDES
Table 4.3: Timing Parameters for the SERDES Circuit Components
Table 4.4: Timing Parameters for the SERDES Operation
Table 4.5: Specs and Variables for 32Gbps Serial Channel and the Corresponding SERDES Specs
Table 4.6: Specs and Variables for 54Gbps Serial Channel and the Corresponding SERDES Specs

List of Figures

Figure 1.1: Impact of Technology Scaling on Gate Delay and Interconnect Delay
Figure 1.2: On-Chip Serial Link
Figure 1.3: Practical Implementation of the On-Chip Serial Link
Figure 2.1: Repeater Insertion for Delay Reduction
Figure 2.2: Generic Block Diagram of an Off-Chip Serial Link
Figure 2.3: Generic Block Diagram of the On-Chip Serial Link
Figure 2.4: Register Based Interconnect Pipelining
Figure 2.5: Wave Pipelining on an Interconnect
Figure 3.1: Block Diagram of the On-Chip Serial Link
Figure 3.2: Single Interconnect in the Serial Channel
Figure 3.3: Minimum Pulsewidth Permitted on the Wave-Pipelined Interconnect
Figure 3.4: Contour Plot of the Bandwidth of the Channel as a Function of m and n
Figure 3.5: Contour Plot of the Average Energy per Bit as a Function of m and n
Figure 3.6: Contour Plot of the Area Occupied by Repeaters as a Function of m and n
Figure 3.7: Contour Plot of the Latency of the Channel as a Function of m and n
Figure 3.8: Summary of the Choice of m and n for Desirable Performance in the Serial Channel
Figure 3.9: Bandwidth vs. Interconnect Spacing
Figure 3.10: Bandwidth vs. Interconnect Width
Figure 3.11: Bandwidth versus Area Occupied by Interconnects in Metal
Figure 3.12: Problem Description
Figure 3.13: Design Flow for the Constant Width Case
Figure 3.14: Summary of the Design Variables (m, n and n_s) for a 32Gbps Target Bandwidth
Figure 3.15: Design Flow for the Variable Width Case
Figure 4.1: Block Diagram for the On-Chip Serial Link (Our Design Aim)
Figure 4.2: 3-to-1 Multiplexer Circuit
Figure 4.3: Delay Element
Figure 4.4: Serializer Circuit
Figure 4.6: Deserializer Circuit
Figure 4.7: Waveform for the Deserializer Operation from HSPICE
Figure 4.8: Design Flow for the On-Chip Serial Link
Figure 4.9: Block Diagram for Example On-Chip Serial Link
Figure 4.10: Timing Waveform for the Example On-Chip Serial Link
Figure 4.11: Spatial Variation Model for Intra-Die Variation
Figure 4.12: Operation of Deserializer for Link1
Figure 4.13: Operation of the Deserializer for Link2
Figure 4.14: Maximum Bits in a Serial Byte as a Function of Variation Factor, k
Figure 4.15: Simulation Setup in HSPICE for Link1
Figure 4.16: Link 1, Timing Waveform at the Deserializer
Figure 4.17: Link 2, Timing Waveform at the Deserializer
Figure 5.1: Switching Activity Increased when Converting from Parallel to Serial
Figure 5.2: Average Switching on the Serial Interconnect Depends on Bit Ordering
Figure 5.3: Bit Ordering Depends on Statistical Characteristics of Data Traces
Figure 5.4: Graph for the Example of Figure 5.2 (a)
Figure 5.5: Percentage Reduction in Switching Activity using Bit Ordering
Figure 5.6: Comparison of the Bit Ordering Technique with SILENT (Instruction Address Bus)
Figure 5.7: Comparison of the Bit Ordering Technique with SILENT (Instruction Bus)


Acknowledgement

The series of chapters that I present here is the result of several months of labor that would have been extremely difficult without the invaluable help that came to me from several quarters. The endeavor was made possible only because of the constant support and unflinching encouragement I received from Dr. Resve Saleh, whom I am very fortunate to have as my supervisor. He was always forthcoming in his critical comments and was ever willing to delve into the intricacies of my work so that I could improve the quality of my study. His guidance throughout this task was absolutely indispensable for its success.

I am also thankful to Dr. Steve Wilton and Dr. Shahriar Mirabbasi for their valuable feedback.

I would also like to extend special thanks to my colleagues Dr. Partha Pande, Dipanjan, Karim, Jeff, XiongFei, Eddy and Laxminarayan, whose insights during discussions on various technical aspects were very useful. I also express my sincere gratitude to Melody and her parents, who made my stay in Vancouver very pleasant and comfortable. I am thankful to my other colleagues in the System-on-Chip research group who supported me through the project in various ways. My other friends, Chhaya, Manisha and Shirley, have also played a vital role in keeping my spirits high while I was striving to accomplish this task well. I am indeed deeply indebted to my family, who are one of my chief enabling factors. They impart to me a strong sense of motivation and dedication to my work.

Finally, I gratefully acknowledge the financial support provided by the Natural Sciences and Engineering Research Council of Canada and PMC-Sierra.

Amit Kedia
University of British Columbia

October 2006


Chapter 1 Introduction

This thesis presents the design and analysis of a serialized link for on-chip global communication. This introductory chapter begins by explaining the problems with the parallel buses used for on-chip global communication and proposes the serial link as a potential solution. The objectives of the thesis are presented and the thesis contributions are summarized. Finally, the organization of the chapters in the rest of the thesis is provided.

1.1 On-Chip Global Communication

On-chip global communication is required for data and control transfers across the various modules on a chip, and it determines the performance of integrated circuits in current technology generations [1]. This is the reverse of the situation many years ago, when performance was determined by the quality of the transistors.

The requirements of on-chip global communication are becoming increasingly difficult to meet as deep submicron effects continue to make global interconnects¹ slower than logic, as seen from Figure 1.1 [1]. The graph shows the relative delays of interconnects (3 cases) and logic gates. As the computational speed of a logic block increases with each newly introduced technology node, the communication between two such blocks, when they are separated by a long distance, suffers due to the degraded electrical properties of the global interconnects. Even with repeater insertion, there is a significant divergence between the logic and interconnect curves.

¹ On-chip global interconnects are responsible for communication, clock distribution and power distribution. In this work we consider only the communication aspect of interconnects.

Figure 1.1: Impact of Technology Scaling on Gate Delay and Interconnect Delay (relative delay vs. process technology node, 250 nm down to 32 nm)

For this reason, significant efforts are being directed towards the analysis and design of on-chip global interconnects [2]. In fact, it can be said that the design of global interconnects is now as much a circuit design problem as it is a routing problem.

1.1.1 On-Chip Parallel Buses: Issues

The physical channels for on-chip global communications are often implemented as buses. A bus consists of a set of wires, one for each bit to be transmitted in parallel. Drivers and receivers are provided at the two ends of each wire. In addition, intermediate repeaters and registers are inserted along the wires depending upon the latency and throughput requirements.

With the growing number of on-chip modules in future system-on-chip (SoC) designs, the number of buses required to connect these modules is increasing [3-7]. Further, the average bus length is also increasing, so buses now span larger distances on the chip. As a consequence, the routing of on-chip buses is becoming complex, primarily due to increased routing congestion.

Parallel buses can occupy large areas, comparable to those of processing elements, and thus pose challenges for wiring-constrained SoC designs. Besides wiring complexity and increased area at the interconnect level, parallel buses also occupy a large area at the substrate level [3]. This is because of the large number and size of the drivers, repeaters and registers inserted along the interconnect.

Other problems with parallel buses are high energy consumption [5], bandwidth limitations and crosstalk-induced delay and noise [3]. Parallel buses require a large number of line drivers and repeaters, leading to increased energy dissipation. Skew and jitter on the parallel bus make receiver synchronization more difficult and thus limit bandwidth.

Crosstalk between adjacent lines in a parallel bus causes data-dependent signal delay, further limiting transmission bandwidth. Moreover, crosstalk-induced noise on a parallel bus threatens reliable communication.


1.1.2 On-Chip Serial Link: Potential Solution

A promising solution to the problems of parallel buses is to replace them with an on-chip serial link [3-7], as shown in Figure 1.2.

Figure 1.2: On-Chip Serial Link (an n_p-bit transmitter and an n_p-bit receiver connected by a serial channel consisting of a single interconnect)

The n_p-bit parallel word to be transmitted across the link is serialized in the transmitter. This serial bit stream is driven onto the communication link and, upon reaching the receiver, is deserialized to reproduce the n_p-bit parallel word. This long-haul serial link can overcome various problems of parallel buses, especially those pertaining to wiring complexity and routability.

Further, serial links are more area efficient at the substrate level because reducing the number of interconnects in the communication channel also reduces the number of line drivers and repeaters.

1.2 Design Issues for On-Chip Serial Link: Research Objective

Although a serial link promises several benefits, it is more complex to design than a parallel bus. Hence, the scope of this research is to address the challenges that arise when shifting from a parallel bus to a serial link for on-chip global communication.


For simplicity, we assume that the sender and receiver operate at the same frequency, although this is not true in general.

To serve as a direct replacement, a single-line serial link must provide the same bandwidth as the parallel bus. If a single interconnect is insufficient, multiple interconnects will be needed. Therefore, any practical implementation of an on-chip serial link will have the architecture shown in Figure 1.3, where n_p bits are serialized and transmitted on n_s lines in the serial channel.
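The mapping of an n_p-bit word onto n_s lines can be illustrated with a minimal Python sketch. It assumes, for illustration only, that n_s divides n_p and that line j carries a contiguous group of k = n_p/n_s bits, sent LSB first, one bit per serial slot; the function names `serialize` and `deserialize` are hypothetical and do not describe the SERDES circuit designed in this thesis.

```python
from typing import List

def serialize(word: int, n_p: int, n_s: int) -> List[List[int]]:
    """Split an n_p-bit word into n_s lanes, each carrying k = n_p // n_s serial bits."""
    if n_p % n_s != 0:
        raise ValueError("this sketch assumes n_s divides n_p")
    k = n_p // n_s  # serialization factor: bits per line per parallel clock cycle
    bits = [(word >> i) & 1 for i in range(n_p)]  # LSB first
    # Lane j transmits bits j*k .. j*k + k - 1, one per serial time slot.
    return [bits[j * k:(j + 1) * k] for j in range(n_s)]

def deserialize(lanes: List[List[int]], n_p: int, n_s: int) -> int:
    """Reassemble the n_p-bit word from the n_s serial lanes (inverse of serialize)."""
    k = n_p // n_s
    word = 0
    for j, lane in enumerate(lanes):
        for t, bit in enumerate(lane):
            word |= bit << (j * k + t)
    return word

# A 32-bit word sent on 4 lines needs 8 serial slots per parallel clock cycle.
lanes = serialize(0xDEADBEEF, n_p=32, n_s=4)
assert deserialize(lanes, n_p=32, n_s=4) == 0xDEADBEEF
```

Because each line must deliver k bits per parallel clock cycle, it has to run k times faster than the parallel bus, which is why a high-throughput signaling scheme is needed on the serial channel.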

Figure 1.3: Practical Implementation of the On-Chip Serial Link (a parallel link and the corresponding serialized link, both of channel length l; the serialized link uses buffered lines between the serializer and deserializer, with the parallel interfaces clocked at period T_clk)


This thesis addresses the design of the different components of an on-chip serialized link, i.e., the serial channel and the SerDes (Serializer/DeSerializer), as shown in Figure 1.3. In general, the key considerations in the design of each component are performance, ease of design, energy consumption, silicon efficiency and robustness. The overall goal is to minimize n_s.

The serial channel must provide the same bandwidth as its parallel counterpart while reducing the wiring area and the number of interconnects in the communication channel. High-throughput signaling schemes need to be investigated to compensate for the loss of data rate due to serialization. Besides wiring area, other performance metrics for the serial channel, e.g., energy consumption and repeater density², need to be optimized or traded off.

The next challenge is the design of the circuits required to interface the parallel input/output to the serial channel. These circuits perform the serialization and deserialization functions. Very few SERDES architectures for on-chip serial links [6-10] have been proposed so far, and further investigation of the suitability and robustness of the various designs is required.

Another concern is the energy efficiency of the on-chip serial link. As already described in [11], serializing a parallel bus may increase the average switching activity and, hence, the energy consumption compared to the parallel bus. Techniques for reducing the average switching activity on the serial link need to be explored.
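A small Python sketch shows how serialization can raise switching activity. It assumes a single serial line (n_s = 1), LSB-first transmission and a made-up trace; the function names are hypothetical. A word that repeats on a parallel bus causes no toggles, while its bits toggle repeatedly once they are sent in series. Comparing raw toggle counts also assumes that every line has a similar capacitance per transition, which is a simplification.

```python
from typing import Sequence

def parallel_transitions(words: Sequence[int]) -> int:
    """Total bit toggles on a parallel bus: Hamming distance between consecutive words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))

def serial_transitions(words: Sequence[int], n_p: int) -> int:
    """Toggles on a single serial line when each word is sent LSB first, back to back."""
    stream = [(w >> i) & 1 for w in words for i in range(n_p)]
    return sum(a != b for a, b in zip(stream, stream[1:]))

trace = [0b1010, 0b1010, 0b1010]  # hypothetical 4-bit trace: the same word repeated
print(parallel_transitions(trace))   # 0: the bus lines never toggle
print(serial_transitions(trace, 4))  # 11: the serial line toggles on every bit boundary
```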

² Repeater density relates to the number of vias inserted on the channel.


1.3 Thesis Contributions

The contributions of this thesis are as follows:

• An analysis of wave-pipelined interconnects is carried out to determine the number of interconnects needed to achieve a target bandwidth.

• An existing SERDES architecture is analyzed from a timing perspective to demonstrate how signaling overhead translates into higher bandwidth requirements. The circuit is also analyzed for robustness in the presence of on-chip variations.

• A novel technique for reducing the energy consumption on the serial link by lowering the average switching activity is proposed. The technique is based on bit ordering and relies on complete statistical information about the data being transmitted on the serial link.
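The idea behind bit ordering can be sketched by brute force: try every order in which the bits of a word could be sent on one serial line and keep the order with the fewest toggles over a sample trace. This is only an illustration under stated assumptions (a single serial line, a short hypothetical trace, and an exhaustive search that is feasible only for small n_p); the technique in this thesis instead relies on the statistical characteristics of the data, and `best_bit_order` is a hypothetical name.

```python
from itertools import permutations
from typing import Sequence, Tuple

def serial_toggles(words: Sequence[int], order: Sequence[int]) -> int:
    """Toggles on one serial line when each word's bits are sent in the given bit order."""
    stream = [(w >> i) & 1 for w in words for i in order]
    return sum(a != b for a, b in zip(stream, stream[1:]))

def best_bit_order(words: Sequence[int], n_p: int) -> Tuple[int, ...]:
    """Exhaustively search all n_p! bit orders (only practical for small n_p)."""
    return min(permutations(range(n_p)), key=lambda order: serial_toggles(words, order))

trace = [0b1010, 0b1010]                   # hypothetical trace
natural = serial_toggles(trace, range(4))  # LSB-first stream 0,1,0,1,0,1,0,1
order = best_bit_order(trace, 4)
print(natural, serial_toggles(trace, order))  # 7 3
```

Grouping bits that tend to have the same value next to each other is what lowers the toggle count here; a practical scheme would need a cheaper search than trying all n_p! orders.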

1.4 Organization of the Thesis

This thesis is structured as follows. The second chapter provides a general introduction to the field and the background material for the thesis. The design of an on-chip serialized link is divided into two parts, with Chapter 3 covering the design of the serialized channel.

Chapter 4 discusses the SERDES transceiver scheme required for interfacing the serialized channel with the parallel input/output. In Chapter 5, a novel technique for reducing the switching activity in the serial link is presented. Chapter 6 summarizes the thesis and describes directions for future work.


Chapter 2 Background

Advancements in System-on-Chip (SoC) methodologies have led to the evolution of on-chip global communication architectures [12]. Up until now, on-chip global communication has been addressed by ad-hoc direct interconnections (point-to-point links) or shared bus structures (single or hierarchical). But with the rising number of on-chip modules in future SoC designs and increasing performance demands, shared bus architectures are quickly reaching their limits. To address this issue, packet-switched network-on-chip (NoC) approaches have been proposed [13].

Whatever the communication architecture, the actual transport of data at the physical layer is performed by an ensemble of on-chip global interconnects collectively known as a parallel bus. In spite of significant developments at the architectural level, the physical level poses challenges, especially with technology scaling and deep-submicron effects. The main problem is the routing complexity of large buses. If this problem could be resolved, power dissipation and process variation would become the priority issues. Any alternative approach must satisfy the throughput and latency requirements while providing proper synchronization of the data from the sender to the receiver. These issues are elaborated in the sections that follow.


2.1 Issues with On-Chip Global Interconnects

2.1.1 Routing Complexity

The average length of interconnections is increasing because the various modules span large distances on the chip [3, 5, 14]. As a consequence, the wiring resources available for global interconnect are decreasing and routing is becoming a major issue. Buses compete for global resources with the clock, the power grid and other global signals. Moreover, global interconnect not only occupies routing area but also consumes silicon area due to the large number and size of the line drivers, repeaters and registers inserted on it. Further, the wider metal pitches, protective shields against coupling and additional wires for differential signaling that are employed to boost the performance of global communication are all area-hungry.

2.1.2 Performance (Throughput vs. Latency)

As technology scales, the resistance-capacitance (RC) delay of global interconnects is getting worse. The traditional method of dealing with this problem is repeater insertion [15], as shown in Figure 2.1.

Figure 2.1: Repeater Insertion for Delay Reduction (an interconnect of length l is split into n repeated segments, each with resistance r(l/n) and capacitance c(l/n) and driven by a repeater of size m with resistance R_0/m and capacitance mC_0)


The interconnect of length l is divided into n equal-length segments, and each segment is driven by a repeater of size m times the minimum-sized inverter.

The Bakoglu optimal repeater insertion technique [15] minimizes the overall wire delay, or latency. It does not try to maximize throughput, since it assumes that only one data wave travels through the interconnect at a time. The optimal parameters for repeater insertion are:

where r and c are the resistance and capacitance of the interconnect per unit length. For the minimum-sized repeater, R_0 is the equivalent resistance and C_0,in and C_0,out are the input and output capacitances, respectively.
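To make the repeater trade-off concrete, the following Python sketch assumes an Elmore-style delay model (our assumption, not necessarily the exact form used in [15]): each of the n segments has delay 0.7(R_0/m)(mC_0,out + cl/n + mC_0,in) + (rl/n)(0.4cl/n + 0.7mC_0,in). Minimizing n times this per-segment delay gives n_opt = sqrt(0.4rcl² / (0.7R_0(C_0,in + C_0,out))) and m_opt = sqrt(R_0c / (rC_0,in)). The parameter values are placeholders, not the 90nm data used later in this thesis, and the function names are hypothetical.

```python
import math

def total_delay(n: float, m: float, r: float, c: float, l: float,
                r0: float, c_in: float, c_out: float) -> float:
    """Delay of a wire of length l split into n segments driven by size-m repeaters
    (assumed Elmore-style model with 0.7 / 0.4 coefficients)."""
    seg = l / n
    driver = 0.7 * (r0 / m) * (m * c_out + c * seg + m * c_in)  # repeater charging its load
    wire = r * seg * (0.4 * c * seg + 0.7 * m * c_in)           # distributed wire RC + next input
    return n * (driver + wire)

def optimal_repeaters(r: float, c: float, l: float,
                      r0: float, c_in: float, c_out: float) -> tuple:
    """Closed-form stationary point of total_delay in n and m (n is not rounded here)."""
    n_opt = math.sqrt(0.4 * r * c * l ** 2 / (0.7 * r0 * (c_in + c_out)))
    m_opt = math.sqrt(r0 * c / (r * c_in))
    return n_opt, m_opt

# Placeholder values in SI units: r in ohm/m, c in F/m, a 10 mm wire.
params = dict(r=5e4, c=2e-10, l=10e-3, r0=1e4, c_in=1e-15, c_out=1e-15)
n_opt, m_opt = optimal_repeaters(**params)
print(n_opt, m_opt, total_delay(n_opt, m_opt, **params))
```

In practice n must be rounded to an integer, and the resulting design minimizes latency only; it says nothing about how closely successive data waves can be spaced on the wire.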

Several variations of the repeater insertion technique, analyzed from various design perspectives, can be found in the literature. All of them attempt to reduce interconnect latency.

Another way of reducing delay is to increase the cross-sectional area of global interconnects by increasing their pitch, but this is undesirable because it leads to a significant increase in the area occupied by the interconnects and to increased power dissipation due to higher capacitance.


If throughput is a primary design goal, then alternative methods must be applied. This motivates our use of interconnect wave pipelining, which in turn requires revisiting the repeater insertion problem. Further details are given in Section 2.3.

2.1.3 Power Dissipation (Energy)

With a large (and growing) number of electronic systems being designed with battery life in mind, minimizing the energy spent on interconnects for on-chip global communication becomes crucial, which necessitates energy-efficient global communication techniques [16].

The energy consumed by interconnects for on-chip global communication accounts for a significant fraction of the total energy consumed in an integrated circuit, and this fraction is expected to grow as technology scales further. This can be attributed to increased interconnect capacitance, in particular lateral and fringing capacitance, as interconnects are placed closer together. Further, the repeaters and flip-flops inserted on global interconnects to improve latency and throughput also consume a lot of energy.

Circuit techniques such as low-swing signaling and bit encoding can be used to address increased energy consumption. Since dynamic power dissipation is proportional to the switching activity, some methods attempt to reduce the number of transitions on the bus [17].
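As one concrete example of a transition-reducing code, the following sketch implements bus-invert coding (Stan and Burleson), a well-known scheme that is not necessarily the method of [17]: if more than half of the data lines would toggle, the inverted word is sent and an extra invert line is asserted. The initial all-zero bus state and the function names are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple

def bus_invert_encode(words: Sequence[int], width: int) -> List[Tuple[int, int]]:
    """Return (coded_word, invert_bit) pairs; the bus is assumed to start at all zeros."""
    mask = (1 << width) - 1
    prev = 0
    coded = []
    for w in words:
        toggles = bin((w ^ prev) & mask).count("1")
        if toggles > width // 2:
            out, inv = (~w) & mask, 1  # sending the complement toggles fewer data lines
        else:
            out, inv = w & mask, 0
        coded.append((out, inv))
        prev = out  # the next comparison is against what is actually on the bus
    return coded

def bus_invert_decode(coded: Sequence[Tuple[int, int]], width: int) -> List[int]:
    """Undo the inversion using the invert bit sent with each word."""
    mask = (1 << width) - 1
    return [(out ^ mask) if inv else out for out, inv in coded]

data = [0x00, 0xFF, 0x00]
enc = bus_invert_encode(data, 8)
print(enc)  # [(0, 0), (0, 1), (0, 0)]: data lines never toggle; the invert line toggles twice
assert bus_invert_decode(enc, 8) == data
```

Without coding, this trace toggles all eight data lines twice (16 transitions); with coding, only the invert line toggles, at the cost of one extra wire.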

Techniques such as adaptive supply voltage links are employed at the system level for energy-efficient on-chip global communication.


2.1.4 On-Chip Variation

Interconnect reliability is another major challenge for on-chip global communication and includes both signaling and manufacturing reliability [18]. Electrical noise due to crosstalk, electromagnetic interference, and radiation-induced charge injection can produce data errors.

Further, there will be perturbations in the electrical characteristics of the interconnects from on-chip variations [19] due to process, supply voltage and temperature effects.

2.2 Off-Chip Serial Link vs. On-Chip Serial Link

Off-chip serial links have been very popular for getting data on and off chips and boards.

Wire speeds range from 1 to 12 Gb/s and payloads from 0.8 to 10 Gb/s. Serial links need fewer pins on the chip, reduce simultaneous switching output (SSO) problems and lower cost.

As a result, the high-speed serial link is the clear choice for off-chip communication. The key mechanism enabling high-data-rate operation in off-chip serial links is self-clocking [20].

The main drawback of off-chip serial links is design complexity [21]. The SERDES transceiver design is complicated partly because of the electrical characteristics of the off-chip channel and partly because of the self-clocking feature. A generic functional diagram of an off-chip serial link [20] is shown in Figure 2.2. Further details of the functioning of the off-chip serial link can be found in [20].


Figure 2.2: Generic Block Diagram of an Off-Chip Serial Link (transmit side: parallel interface, FIFO synchronizer, 8B/10B encoding, multiplexer, de-emphasis, phase-locked loop and transmit driver; serial channel: twisted pair or PCB trace; receive side: matched termination receive driver, receive equalization, clock and data recovery, demultiplexer, 8B/10B decoding, FIFO synchronizer and parallel interface; the legend marks the SERDES transceiver blocks not needed for an on-chip serial link)

To provide the self-clocking property in the serial link, 8b/10b encoding is employed and a complicated clock and data recovery (CDR) circuit is needed. The multiplexing and demultiplexing operations are controlled by a high-speed clock generated by a complex phase-locked loop (PLL) circuit.
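The 8b/10b code sends 10 line bits for every 8 data bits, so 20% of the line rate is coding overhead. The short sketch below (function names hypothetical) relates line rate and payload; it reproduces the 0.8 Gb/s payload at a 1 Gb/s wire speed quoted in this section, and gives 9.6 Gb/s, roughly the quoted 10 Gb/s, at 12 Gb/s.

```python
def payload_gbps(line_rate_gbps: float) -> float:
    """Payload rate of an 8b/10b-coded link: 8 data bits per 10 line bits."""
    return line_rate_gbps * 8 / 10

def line_rate_for_payload(payload: float) -> float:
    """Line rate needed to carry a given payload with 8b/10b coding (25% more line bits)."""
    return payload * 10 / 8

print(payload_gbps(1.0), payload_gbps(12.0))  # 0.8 9.6
```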

The off-chip channel includes backplane traces (PCB lines, twisted pairs, cables, etc.), which behave as RLC transmission lines. Hence, proper impedance matching at the receiver is required to avoid interference due to reflections. Pre-emphasis and receiver equalization are needed to reduce the signal distortion caused by the low-pass frequency response of the transmission lines. FIFO synchronizers are used at both the transmitter and the receiver to synchronize the serial link with the parallel interface.

On the other hand, on-chip serial links are rather simple and do not require many of the functional blocks shown in Figure 2.2. Because of higher resistivity and relatively short length (as compared to off-chip wires), the inductances of on-chip interconnects are low enough to be safely ignored [22]. This eliminates the need for exactly matched termination (receive driver) because large reflections are absent due to low or no inductance. Further, skin effect is not a major issue for on-chip interconnects at this point [23] and thus the need for pre-emphasis and receiver equalization for on-chip serial links is eliminated.

The self-clocking property is not used in some on-chip serial links, so 8b/10b encoding and CDR circuits are not required. Instead of a complex PLL for high-speed clock generation to control the multiplexing and demultiplexing operations, locally generated high-speed clocks using simple ring oscillators are preferred [7]. Some clockless techniques for multiplexing and demultiplexing have also been reported [8, 9].

Figure 2.3 shows a generic block diagram obtained after eliminating various functions present in the off-chip serial link. FIFO synchronizers are still required for on-chip serial links, but their design can be quite complicated and is beyond the scope of this work.


Figure 2.3: Generic Block Diagram of the On-Chip Serial Link (transmitter (SER): parallel interface, FIFO, mux and transmit driver; on-chip serial channel; receiver (DES): simple receive buffer, demux, FIFO and parallel interface)

In general, the design of a SERDES transceiver for an on-chip serial link is expected to be much simpler. However, on-chip variations are important and must be kept in mind while designing such links. Making the link robust against variations introduces complexity into the design, as shown in this thesis.

2.3 High-Throughput Signaling Techniques

Serialization greatly reduces the number of bits transferred per clock cycle, so a serial link can be used only if it still provides a bandwidth equal to that of its parallel counterpart. In addition, the signaling technique should have low latency and low energy/power consumption.

Pipelining can be used to enhance the throughput of global interconnects. The most common and traditional technique is to insert registers that break a long wire into short pipelined stages. This technique is illustrated in Figure 2.4, where a wire is divided into smaller stages with registers inserted in between, enabling higher throughput than when no pipelining is used.


Figure 2.4: Register Based Interconnect Pipelining (the interconnect is divided into n stages separated by clocked D flip-flop registers)

However, the speedup obtained from register insertion is not linear in the number of stages because registers have their own delay [24]. Further, registers are large because they must not only store data but also drive interconnect sections; thus they consume a lot of power and occupy a large area. The most important problem with register-based pipelining is that an additional clock signal has to be routed along the length of the interconnect. This creates more routing problems and further increases power consumption. There are also the usual synchronization issues between the clock and the data. Register-based pipelining is therefore preferable only when throughput is a higher priority than latency, power and area. For designs where high throughput, low latency, low power and small area are given similar importance, traditional pipelining based on register insertion is not favorable.
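The non-linear speedup can be illustrated numerically. The sketch below assumes a repeated wire whose delay scales linearly with length, so n stages each take wire_delay/n plus a fixed register overhead (clock-to-output plus setup); the numbers and the function name are hypothetical.

```python
def register_pipeline_speedup(wire_delay_ps: float, t_reg_ps: float, n_stages: int) -> float:
    """Throughput gain of n register stages over a single registered (unpipelined) wire.

    Cycle time with n stages = wire_delay / n + t_reg (assumed register overhead per stage).
    """
    base = wire_delay_ps + t_reg_ps
    return base / (wire_delay_ps / n_stages + t_reg_ps)

# Hypothetical numbers: 1000 ps of wire delay, 100 ps register overhead per stage.
for n in (1, 2, 4, 10, 100):
    print(n, round(register_pipeline_speedup(1000, 100, n), 2))
# The gain can never exceed (1000 + 100) / 100 = 11, no matter how many stages are added.
```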

Wave pipelining [25] provides a potential alternative that addresses these problems because, ideally, it uses no registers and hence avoids the associated overhead. It significantly reduces clock loading (no clock is needed) while retaining the external functionality and timing of a synchronous circuit. Although we are interested in wave pipelining for throughput enhancement over global interconnects, it is important to first understand wave pipelining in general and then extend the concept to global interconnects.

Wave pipelining was originally developed as a logic circuit technique in the 1960s. It advocates applying a new signal to a combinational block before the previous input reaches its intended destination storage elements. Multiple waveforms corresponding to successive evaluations coexist within the same combinational logic block, providing multiple computational waves. Data are pipelined into circuits without using registers, because wires and transistors not only transmit and compute data but also hold them for a period of time. Wave pipelining has been successfully adopted in regular digital systems, such as DRAM, SRAM and digital multipliers, to achieve high-speed performance. Although it appears simple, wave-pipelined circuits are difficult to design because of the multi-path problem: usually multiple paths exist between the input and output of a circuit. For wave pipelining to work, the delay difference among all the paths must be small to avoid signal racing. Moreover, multiple data paths also exist from the input to some internal nodes, and these paths must also be balanced. Large and complex circuits make balancing multi-path delays difficult, imposing structural and physical constraints that are too stringent to be practical [25].
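A commonly quoted simplified constraint for wave-pipelined logic is that the clock period must cover the spread between the longest and shortest path delays, plus register setup and hold times and a clock-skew allowance, rather than the longest delay itself. The sketch below assumes that simplified form (clock skew counted twice) and compares it with a conventional period of longest delay plus setup; all numbers and function names are hypothetical.

```python
def min_period_conventional(path_delays, t_setup):
    """Ordinary pipelining: one wave at a time, so the longest path sets the period."""
    return max(path_delays) + t_setup

def min_period_wave(path_delays, t_setup, t_hold, clock_skew=0):
    """Simplified wave-pipelining bound: only the delay spread (plus overheads) matters."""
    return (max(path_delays) - min(path_delays)) + t_setup + t_hold + 2 * clock_skew

delays_ps = [900, 950, 1000]  # hypothetical path delays through one logic block
print(min_period_conventional(delays_ps, 30))  # 1030
print(min_period_wave(delays_ps, 30, 20))      # 150: balanced paths allow a much shorter period
```

The sketch also shows why path balancing matters: widening the delay spread raises the wave-pipelined period one-for-one.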

In the case of interconnects, wave pipelining is relatively simple because on-chip global interconnects are fairly simple circuits, usually consisting of a chain of wires and repeaters. Since there is only one path on each interconnect, no delay balancing is needed for internal nodes. The use of such an approach has been proposed in [21, 24, 26-28]. As shown in Figure 2.5, wave pipelining increases the signaling throughput by allowing multiple data pulses to travel along the interconnect at the same time.

[Figure: Waves 1 to 7 travelling simultaneously across the length of the interconnect]

Figure 2.5: Wave Pipelining on an Interconnect

Although wave pipelining on interconnects is easier to design than on logic circuits, the inevitable presence of process variation, supply voltage noise and thermal uncertainty imposes timing constraints. Another problem is that, since no storage elements separate successive waves, wave-pipelined interconnects are highly susceptible to noise injection and hence prone to soft errors. These problems can be mitigated by careful design, so they do not outweigh the benefits that can be derived from this approach.

Chapter 3 On-Chip Serial Channel Design

In this chapter, the key considerations in designing a high-bandwidth, minimal-area on-chip serial channel are presented. Figure 3.1 shows the practical implementation of the serial link replacing a parallel bus, where the $n_p$ bits of the bus are serialized and transmitted on the $n_s$ interconnects of the channel.

[Figure: drivers and buffers on $n_s$ lines spanning a channel of length $l$, with clocks of period $T_{clk}$; the serial channel is the design aim of this chapter.]

Figure 3.1: Block Diagram of the On-Chip Serial Link

The serial channel must provide a bandwidth equal to or greater than that of its parallel counterpart, i.e., $BW_s \geq BW_p$. The bandwidth provided by the parallel bus is given by:

$$BW_p = n_p \times \frac{1}{T_{clk}} \qquad (3.1)$$

where $T_{clk}$ is the clock period.
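As a quick check of the sizing condition $BW_s \geq BW_p$, the sketch below computes the parallel bandwidth from Eqn. (3.1) and the smallest number of serial interconnects $n_s$ whose combined throughput $n_s \times T_{max}$ meets it. The per-interconnect throughput values are assumptions for illustration: 3 Gbps matches the minimum-dimension design points reported later in Table 3.6, while 8 Gbps is hypothetical. The clock is passed as a frequency $f_{clk} = 1/T_{clk}$ to avoid floating-point rounding artifacts.

```python
import math


def parallel_bandwidth(n_p, f_clk):
    """Eqn. (3.1): BW_p = n_p / T_clk = n_p * f_clk, in bits per second."""
    return n_p * f_clk


def serial_wires_needed(n_p, f_clk, t_max):
    """Smallest n_s such that n_s * T_max >= BW_p (SERDES overhead ignored)."""
    return math.ceil(parallel_bandwidth(n_p, f_clk) / t_max)


# 32-bit bus clocked at 1 GHz; per-wire throughputs are illustrative only
for t_max in (3e9, 8e9):
    n_s = serial_wires_needed(32, 1e9, t_max)
    print(f"T_max = {t_max / 1e9:.0f} Gbps -> n_s = {n_s}")
```

This count ignores the SERDES and on-chip variation overheads, which increase the required number of interconnects.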

The signaling scheme used in the channel is the throughput-centric wave-pipelined interconnect based on repeater insertion proposed in [28]. The serial channel is designed in the Metal 5 layer of a 90nm CMOS technology and is assumed to span 10mm across the chip. Analytical models for various performance metrics of the on-chip serial channel based on this signaling technique are developed. The primary performance requirement of the channel (i.e., bandwidth) is a function of the interconnect geometry as well as of the number and size of the repeaters inserted on the interconnect. These factors are characterized and provide a framework for our modified design technique for the serial channel.

3.1 Throughput-Centric Wave-Pipelined Interconnect

When an on-chip parallel bus is replaced by a serial channel, wave pipelining can be used to compensate for the loss of parallelism. In wave pipelining, the maximum throughput is increased by allowing multiple data waves to travel along the interconnect at the same time. This signaling scheme can provide high bandwidth while reducing the wiring area and thus routing congestion, which is the central theme of this research. The advantage of this scheme over more traditional register-based pipelining has already been discussed in Chapter 2. The key observation is that traditional optimal repeater insertion [15] is not optimal from the throughput perspective. In [28], the authors carried out a detailed analysis of the scheme, including the derivation of a closed-form analytical expression for the maximum throughput of wave-pipelined RC interconnects. They further studied the effect of various design parameters on the throughput and other performance metrics of the scheme, and proposed design techniques that optimize different performance objectives for a number of applications.

Our goal in this chapter is to leverage this scheme to design a high-bandwidth, minimal-area serial channel for on-chip global communication.

3.2 Analytical Models for Various Performance Metrics

To facilitate design and analysis, analytical models for various performance metrics of the wave-pipelined scheme are developed in this section. The metrics considered are the bandwidth, the latency, the average energy per bit and the area occupied by the channel.

Figure 3.2 shows one interconnect in the channel, of length $l$ and width $W$. As shown in the figure, each interconnect is shielded from its neighbors by placing it between two coplanar supply rails of minimum width ($W_{min}$). Shielding provides excellent return paths for the high-frequency current and eliminates delay variations due to coupling. The wires in the channel (including the interconnects and the shield wires) are symmetrically spaced, with spacing $S$. Orthogonal routing is assumed in the metal layers above and below the serial channel. The resistance and capacitance per unit length of each interconnect in the channel are $r$ and $c$, respectively. The total capacitance includes the area, lateral and fringing components. We assume that the rise time of the drivers and the length of the interconnects are such that inductance can be safely ignored [22].

[Figure: an interconnect of length $l$ between two shield wires, divided into $n$ segments of length $l/n$; the input repeater and the repeaters driving the 1st, 2nd, ..., $n$th repeated segments are of size $m$, with segment parameters $R_t$, $C_{t,out}$, $R_{seg}$ and $C_{seg}$.]

Figure 3.2: Single Interconnect in the Serial Channel

As shown in Figure 3.2, each interconnect is divided into $n$ equal segments, each driven by a repeater $m$ times the size of a minimum-sized inverter. The parameters of the minimum-sized inverter used for timing analysis are its on-resistance ($R_0$), input capacitance ($C^0_{in}$) and output capacitance ($C^0_{out}$). The number of repeaters inserted on the interconnect is the same as the number of interconnect segments ($n$); hence, the two terms are used interchangeably. The electrical parameters of each repeated interconnect segment are given in Table 3.1.

Table 3.1: Electrical Parameters of a Repeated Interconnect Segment

Parameter | Expression
On-resistance of the repeater | $R_t = R_0/m$
Repeater input capacitance | $C_{t,in} = m\,C^0_{in}$
Repeater output capacitance | $C_{t,out} = m\,C^0_{out}$
Interconnect segment resistance | $R_{seg} = r\,l/n$
Interconnect segment capacitance | $C_{seg} = c\,l/n$

All the interconnects in the channel are symmetric and have identical performance characteristics.
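The per-segment quantities of Table 3.1 follow directly from the minimum-inverter parameters, the per-unit-length wire parameters and the choice of $m$ and $n$. The sketch below is a minimal helper for computing them; the class and function names and all numerical values are hypothetical, not the 90nm parameters used in the thesis.

```python
from dataclasses import dataclass


@dataclass
class MinInverter:
    """Minimum-sized inverter parameters (hypothetical values below)."""
    r0: float      # on-resistance R_0 in ohms
    c0_in: float   # input capacitance C^0_in in farads
    c0_out: float  # output capacitance C^0_out in farads


def segment_parameters(inv, r, c, length, m, n):
    """Electrical parameters of one repeated segment (Table 3.1).

    r and c are per-unit-length resistance (ohm/m) and capacitance (F/m),
    length is the interconnect length l in metres, m is the repeater size
    and n is the number of segments (equal to the number of repeaters).
    """
    return {
        "R_t": inv.r0 / m,
        "C_t_in": m * inv.c0_in,
        "C_t_out": m * inv.c0_out,
        "R_seg": r * length / n,
        "C_seg": c * length / n,
    }


# Hypothetical numbers: 10 mm wire, 0.2 ohm/um and 0.2 fF/um, m = 20, n = 10
inv = MinInverter(r0=10e3, c0_in=1e-15, c0_out=1e-15)
seg = segment_parameters(inv, r=200e3, c=200e-12, length=10e-3, m=20, n=10)
print(seg)
```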

3.2.1 Analytical Model for Bandwidth

The bandwidth of the channel is the maximum throughput of one interconnect multiplied by the number of interconnects in the channel. The maximum throughput is determined by the minimum pulsewidth allowed on the interconnect, $PW_{min}$. $PW_{min}$ depends on the degree of wave pipelining permitted on the interconnect and is bounded to ensure high-quality binary transmission: as shown in Figure 3.3, the input pulses must be wide enough for the output to swing at least from 10% to 90% of the full swing [28].

[Figure: an input pulse of width $PW_{min}$ applied to an interconnect with repeaters inserted, and the corresponding worst-case output pulse, each swinging between V(Low) and V(High)]

Figure 3.3: Minimum Pulsewidth Permitted on the Wave-Pipelined Interconnect

The analytical model for the bandwidth of the on-chip serial channel is summarized in Table 3.2, based on the detailed derivation of the analytical throughput model in [28]. A distributed RC model of the interconnect segment is used to estimate the wire transients.

Table 3.2: Analytical Model for Bandwidth of the On-Chip Serial Channel

Parameter | Expression
Bandwidth | $BW_s = n_s \times T_{max}$
Maximum throughput (one interconnect only) | $T_{max} = 1/PW_{min}$
Minimum pulsewidth | $PW_{min} = \tau_{seg} \ln\left(\frac{k_1}{1 - v_1}\right)$
Sakurai time constant [29] | $\tau_{seg} = R_t C_{t,in} + R_t C_{seg} + R_{seg} C_{t,in} + 0.4 R_{seg} C_{seg}$
Sakurai coefficient [29] | $k_1 = 1.01$
Voltage swing $v_1$ at the output of the first repeated segment | $v_n = 0.9$; for ($i = n$; $i > 1$; $i$--) $v_{i-1} = 1/(2 - v_i)$
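The bandwidth model can be evaluated numerically as sketched below. The sketch assumes the minimum-pulsewidth expression $PW_{min} = \tau_{seg}\ln(k_1/(1 - v_1))$ (a Sakurai single-exponential form), with the segment parameters of Table 3.1 supplied by the caller; the numerical values in the example are hypothetical. The backward recursion for $v_1$ also has a simple closed form: since $1/(1 - v_{i-1}) = 1/(1 - v_i) + 1$ and $1/(1 - v_n) = 10$, it follows that $1/(1 - v_1) = n + 9$.

```python
import math

K1 = 1.01  # Sakurai coefficient k_1 (Table 3.2)


def first_segment_swing(n):
    """v_1 from the backward recursion v_n = 0.9, v_{i-1} = 1 / (2 - v_i)."""
    v = 0.9
    for _ in range(n - 1):  # i runs from n down to 2
        v = 1.0 / (2.0 - v)
    return v


def tau_seg(r_t, c_t_in, r_seg, c_seg):
    """Sakurai time constant of one repeated segment, in seconds."""
    return r_t * c_t_in + r_t * c_seg + r_seg * c_t_in + 0.4 * r_seg * c_seg


def min_pulsewidth(r_t, c_t_in, r_seg, c_seg, n):
    """Assumed form PW_min = tau_seg * ln(k_1 / (1 - v_1))."""
    v1 = first_segment_swing(n)
    return tau_seg(r_t, c_t_in, r_seg, c_seg) * math.log(K1 / (1.0 - v1))


def channel_bandwidth(n_s, r_t, c_t_in, r_seg, c_seg, n):
    """BW_s = n_s * T_max with T_max = 1 / PW_min, in bits per second."""
    return n_s / min_pulsewidth(r_t, c_t_in, r_seg, c_seg, n)


# Hypothetical segment values for m = 20, n = 10 on a 10 mm wire
bw = channel_bandwidth(n_s=1, r_t=500.0, c_t_in=20e-15,
                       r_seg=200.0, c_seg=200e-15, n=10)
print(f"BW_s = {bw / 1e9:.2f} Gbps")
```

Because the logarithmic factor equals $\ln(k_1(n + 9))$, increasing $n$ shortens $\tau_{seg}$ while raising this factor only slowly, which is consistent with the saturation of bandwidth with repeater density discussed in Section 3.3.1.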

3.2.2 Analytical Model for Latency

Latency is the time between a signal entering the channel and exiting it. Here, the signal delay is due to $n$ segments of wires and repeaters. The analytical model for the latency of the on-chip serial channel, shown in Table 3.3, is based on the traditional Elmore delay model [22]. Instead of a distributed RC model for the wire segments, a lumped RC $\pi$-model is assumed for each repeated segment.

Table 3.3: Analytical Model for Latency of the On-Chip Serial Channel

Parameter | Expression
Latency | $Latency = n\left[R_t\left(C_{t,in} + C_{t,out}\right) + R_t C_{seg} + R_{seg} C_{t,in} + 0.5 R_{seg} C_{seg}\right]$

The accuracy of this model has been questioned in [30, 31], and better models have been proposed; for our purposes, however, the model is suitable because of its simplicity and acceptable accuracy.
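The Elmore-based latency is a sum over $n$ identical segments. The sketch below assumes, as in the lumped $\pi$-model, that each segment's wire resistance charges half of its own capacitance plus the input capacitance of the next repeater; the function name and example values are hypothetical.

```python
def channel_latency(n, r_t, c_t_in, c_t_out, r_seg, c_seg):
    """Elmore delay of n repeated segments with a lumped pi-model per segment.

    Per segment: the repeater resistance R_t drives its own output capacitance,
    the wire capacitance and the next repeater's input capacitance; the wire
    resistance R_seg drives half of the wire capacitance plus that input load.
    """
    per_segment = (r_t * (c_t_in + c_t_out)
                   + r_t * c_seg
                   + r_seg * c_t_in
                   + 0.5 * r_seg * c_seg)
    return n * per_segment


# Hypothetical values for m = 20, n = 10 on a 10 mm wire (SI units)
t = channel_latency(n=10, r_t=500.0, c_t_in=20e-15, c_t_out=20e-15,
                    r_seg=200.0, c_seg=200e-15)
print(f"latency = {t * 1e9:.2f} ns")
```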

3.2.3 Analytical Model for Average Energy per Bit

Energy per bit is the energy consumed in transmitting a single bit from the source to the destination on the interconnect. Each data transition forces every repeater on the interconnect to switch once, but a complete charge and discharge cycle (one power event) requires two transitions; the power therefore depends on the data pattern. The average energy per bit is the total energy consumed by the $n_s$ interconnects of the channel divided by the number of bits transmitted on the channel, i.e., the average channel power divided by the aggregate bit rate $f_{clk} \times n_s$. The details are given in Table 3.4. The expressions for the components of the power consumption (i.e., switching, leakage and short-circuit) are based on [32].

A number of key design parameters can be tuned to reduce $E_b$. Reducing the size and the number of repeaters reduces all three components of power; however, it also reduces bandwidth and increases latency. Therefore, better techniques are required to reduce $E_b$ without compromising bandwidth, latency or both. The switching activity factor can also be lowered to reduce the switching and short-circuit power. This is usually done at the architectural level by encoding the data to be transmitted on the channel. A novel technique for reducing the switching activity on the on-chip serial channel is described in Chapter 5.

Table 3.4: Analytical Model for Average Energy per Bit of the On-Chip Serial Channel

Parameter | Expression
Average energy per bit | $E_b = P_{total} / \left[(f_{clk})(n_s)\right]$
Average power consumption on the channel | $P_{total} = P_{switching} + P_{short-circuit} + P_{leakage}$
Average switching power | $P_{switching} = \alpha\,(n_s)\left[(m)(n)\left(C^0_{in} + C^0_{out}\right) + c\,l\right] V_{DD}^2 f_{clk}$
Average short-circuit power | $P_{short-circuit} = \alpha\,(n_s)(m)(n)\,I_{short-circuit}\,V_{DD}$
Average leakage power | $P_{leakage} = (1 - \alpha)\,(n_s)(m)(n)\,I_{leak}\,V_{DD}$
Short-circuit current of a minimum-sized inverter³ | $I_{short-circuit}$
Leakage current of a minimum-sized inverter | $I_{leak}$
Clock frequency | $f_{clk}$
Switching factor (fraction of repeaters switched during an average clock cycle) | $\alpha$
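The energy model can be sketched as a single function. The function and argument names are hypothetical; all quantities are in SI units, and the example uses made-up parameter values with the short-circuit and leakage currents set to zero so that only the switching term remains.

```python
def energy_per_bit(n_s, m, n, alpha, c0_in, c0_out, c, length,
                   i_sc, i_leak, vdd, f_clk):
    """Average energy per bit E_b = P_total / (f_clk * n_s), following Table 3.4."""
    p_switching = (alpha * n_s * (m * n * (c0_in + c0_out) + c * length)
                   * vdd**2 * f_clk)
    p_short_circuit = alpha * n_s * m * n * i_sc * vdd
    p_leakage = (1.0 - alpha) * n_s * m * n * i_leak * vdd
    p_total = p_switching + p_short_circuit + p_leakage
    return p_total / (f_clk * n_s)


# Hypothetical parameters: 1 fF inverter capacitances, 0.2 fF/um wire, 1 V supply
e_b = energy_per_bit(n_s=1, m=7, n=30, alpha=0.5, c0_in=1e-15, c0_out=1e-15,
                     c=200e-12, length=10e-3, i_sc=0.0, i_leak=0.0,
                     vdd=1.0, f_clk=3e9)
print(f"E_b = {e_b * 1e12:.3f} pJ")
```

With both currents set to zero, $E_b$ reduces to $\alpha[(m)(n)(C^0_{in} + C^0_{out}) + c\,l]V_{DD}^2$, independent of $f_{clk}$, which makes the dependence on the product $m \times n$ explicit.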

3.2.4 Analytical Model for Area of the Channel

Most traditional latency-centric repeater insertion techniques that target area reduction [33] model only the area occupied by the repeaters on the silicon substrate. The central theme of our work is reducing the wiring area (the area occupied by the channel in the metal layer) in order to reduce routing congestion. Hence, we formulate expressions for the area occupied by the channel in metal and by the repeaters on the silicon substrate, and summarize them in Table 3.5. Note that the expression for the interconnect area is specific to our serial channel, as shown in Figure 3.2. For channel area minimization, $A_{repeaters}$ and $A_{wire}$ must be considered separately.

3 Short-circuit current actually depends on the shape of the input and output waveforms, which in turn depend on design parameters, i.e., the interconnect parasitics and the size and number of repeaters. For simplicity, we assume that $I_{short-circuit}$ is constant across design parameters.

Table 3.5: Analytical Model for Area Occupied by the On-Chip Serial Channel

Parameter | Expression
Area occupied on silicon by the repeaters | $A_{repeaters} = (n_s)(m \times n)\,A_{inverter,min}$
Area of a minimum-sized inverter | $A_{inverter,min}$
Area occupied by the channel in metal | $A_{wire} = \left[(n_s)W + (n_s)2S + (n_s + 1)W_{min}\right](l)$
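The two area expressions can be evaluated as below. The sketch uses normalized, hypothetical units ($W_{min} = S_{min} = 1$, $l = 1$, $A_{inverter,min} = 1$) purely to compare the terms; the function names are illustrative.

```python
def repeater_area(n_s, m, n, a_inv_min):
    """Silicon area of all repeaters: (n_s)(m x n) A_inverter,min."""
    return n_s * m * n * a_inv_min


def wire_area(n_s, w, s, w_min, length):
    """Metal area of the shielded channel of Figure 3.2.

    n_s signal wires of width W, 2 * n_s spacings of width S and
    n_s + 1 shield wires of width W_min, all of length l.
    """
    return (n_s * w + n_s * 2 * s + (n_s + 1) * w_min) * length


# Normalized units: W_min = S_min = 1, l = 1
print(repeater_area(1, 7, 30, 1.0))        # repeater area for m = 7, n = 30
print(wire_area(4, 1.0, 1.0, 1.0, 1.0))    # four minimum-dimension wires
print(wire_area(4, 2.0, 1.0, 1.0, 1.0))    # width doubled
print(wire_area(4, 1.0, 2.0, 1.0, 1.0))    # spacing doubled
```

When $W_{min} = S_{min}$, doubling $W$ adds $n_s W_{min} l$ of metal, while doubling $S$ adds $2 n_s S_{min} l$, because each signal wire has two spacings but only one width.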

3.3 Performance Metrics vs. Number and Size of Repeaters

In this section, we characterize the performance metrics of the on-chip serial channel as a function of the number of repeaters and the size of each repeater. First, bandwidth is characterized on its own. Then, the other performance metrics are evaluated across repeater insertion choices for a given bandwidth requirement. The analytical models from Section 3.2 are used for this characterization. Since the interconnects in the channel are symmetric, we characterize a single interconnect only. The interconnect is assumed to be 10mm long, and its other dimensions are set to their respective minimums (i.e., $W_{min}$ and $S_{min}$).

3.3.1 Bandwidth

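The equation for the serial bandwidth ($BW_s$) of a single-interconnect channel from Table 3.2 can be written in expanded form as shown below, with $n_s = 1$:

$$BW_s = \left\{\left[R_0 C^0_{in} + \frac{R_0\,c}{m\,(n/l)} + \frac{r\,m\,C^0_{in}}{(n/l)} + \frac{0.4\,r\,c}{(n/l)^2}\right] \ln\left(\frac{k_1}{1 - v_1}\right)\right\}^{-1} \qquad (3.1)$$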

The ratio $n/l$ is the number of repeaters per unit length of the interconnect, i.e., the repeater density. Using Eqn. (3.1), we summarize below how the bandwidth depends on $m$ and $n/l$.

A. Number of repeaters: $BW_s$ increases monotonically with $n/l$, independent of the value of $m$. However, $BW_s$ saturates if $n/l$ is increased beyond a certain value [28].

Since the wire length is fixed in our case, we henceforth refer to $n/l$ simply as $n$.

B. Repeater size: As $m$ increases, $BW_s$ increases, reaches a maximum and then decreases, independent of $n/l$. Thus, there exists an optimum repeater size, obtained by differentiating Eqn. (3.1) with respect to $m$, which maximizes $BW_s$:

$$m_{opt} = \sqrt{\frac{R_0\,c}{r\,C^0_{in}}}$$
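The claim that the optimum repeater size does not depend on $n$ can be checked numerically by sweeping $m$ in Eqn. (3.1) for several values of $n$. The sketch assumes the expanded form of Eqn. (3.1) with the $\ln(k_1/(1 - v_1))$ factor, and uses hypothetical parameter values chosen so that $m_{opt} = 100$.

```python
import math

# Hypothetical technology values (SI units), not the thesis's 90nm data
R0, C0_IN = 10e3, 1e-15             # minimum inverter: on-resistance, input capacitance
R_PER_M, C_PER_M = 200e3, 200e-12   # wire r (ohm/m) and c (F/m)
LENGTH, K1 = 10e-3, 1.01            # 10 mm wire, Sakurai coefficient k_1


def first_segment_swing(n):
    """v_1 from the backward recursion v_n = 0.9, v_{i-1} = 1 / (2 - v_i)."""
    v = 0.9
    for _ in range(n - 1):
        v = 1.0 / (2.0 - v)
    return v


def bandwidth(m, n):
    """Eqn. (3.1) with n_s = 1, using the assumed ln(k_1 / (1 - v_1)) factor."""
    seg = LENGTH / n  # segment length l/n
    tau = (R0 * C0_IN + R0 * C_PER_M * seg / m
           + R_PER_M * seg * m * C0_IN + 0.4 * R_PER_M * C_PER_M * seg**2)
    return 1.0 / (tau * math.log(K1 / (1.0 - first_segment_swing(n))))


m_opt = math.sqrt(R0 * C_PER_M / (R_PER_M * C0_IN))
# Integer grid search over m for several repeater counts n
best = {n: max(range(1, 201), key=lambda m: bandwidth(m, n)) for n in (5, 10, 30)}
print(f"m_opt = {m_opt:.1f}, grid optimum per n = {best}")
```

The grid optimum lands on the same $m$ for every $n$, because $n$ only scales the two $m$-dependent terms of $\tau_{seg}$ by the same factor $l/n$.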

The contour plot in Figure 3.4 was obtained using Eqn. (3.1). Interestingly, the same value of $BW_s$ can be obtained for different combinations of $m$ and $n$. Thus, for a given $BW_s$ requirement, designers can choose either a large number of small repeaters or a small number of large repeaters. The choice must be made by evaluating the other performance metrics of the serial channel and their variation with $m$ and $n$.

[Figure: contours of bandwidth (Gbps) versus repeater size ($m$) and number of repeaters ($n$)]

Figure 3.4: Contour Plot of the Bandwidth of the Channel as a Function of m and n

3.3.2 Other Performance Metrics

Figure 3.5 shows the contour plot of the average energy per bit ($E_b$) with respect to $m$ and $n$. Similarly, the area occupied by repeaters on silicon ($A_{repeaters}$) and the latency are shown in Figure 3.6 and Figure 3.7, respectively. The area occupied by the interconnects in metal ($A_{wire}$) is not shown since it is independent of $m$ and $n$, as seen in Table 3.5.

To analyze the choice of $m$ and $n$, two design points that give the same $BW_s$ of 3 Gbps are considered. The two design points, A and B, are marked in the contour plot of each performance metric. Point A corresponds to a small number of large repeaters, and point B corresponds to a large number of small repeaters. The observations for the two design points are summarized in Table 3.6.


Figure 3.5: Contour Plot of the Average Energy per Bit as a Function of m and n

Figure 3.6: Contour Plot of the Area Occupied by Repeaters as a Function of m and n

[Figure: contours of latency versus repeater size ($m$) and number of repeaters ($n$)]

Figure 3.7: Contour Plot of the Latency of the Channel as a Function of m and n

Table 3.6: Summary of Results for the Two Design Points

Performance metric | Design point A ($m = 51$, $n = 10$) | Design point B ($m = 7$, $n = 30$) | Figure
$BW_s$ (Gbps) | 3 | 3 | n/a
Latency (ns) | 0.86 | 2.06 | Figure 3.7
$E_b$ (pJ) | 1.45 | 1.2 | Figure 3.5
$A_{repeaters}/A_{inverter,min}$ | 570 | 210 | Figure 3.6

From the results in Table 3.6, it can be concluded that, for a given $BW_s$ requirement, a large number of small repeaters is suitable for reducing $E_b$ and $A_{repeaters}$ because, as the analytical models show, both depend on the product of $m$ and $n$. A small number of large repeaters, on the other hand, yields a lower latency. Figure 3.8 provides general design guidelines based on these observations. The proper choice of operating point depends on the desired design tradeoffs.
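One way to read Table 3.6 is in terms of how many bits are simultaneously in flight on the wave-pipelined interconnect, which can be estimated as latency multiplied by bit rate. This estimate is an illustration, not a result of the analysis above; it uses only the latency, energy and bandwidth values reported in Table 3.6, and the helper name is hypothetical.

```python
BIT_RATE = 3e9  # both design points deliver BW_s = 3 Gbps (Table 3.6)

# Latency (s) and energy per bit (J) for the two design points of Table 3.6
DESIGN_POINTS = {
    "A (m=51, n=10)": (0.86e-9, 1.45e-12),
    "B (m=7, n=30)": (2.06e-9, 1.2e-12),
}


def bits_in_flight(latency_s, bit_rate):
    """Approximate number of bits travelling on the wire at the same time."""
    return latency_s * bit_rate


for name, (latency_s, e_b) in DESIGN_POINTS.items():
    print(f"{name}: {bits_in_flight(latency_s, BIT_RATE):.2f} bits in flight, "
          f"E_b = {e_b * 1e12:.2f} pJ")

saving = 1.0 - DESIGN_POINTS["B (m=7, n=30)"][1] / DESIGN_POINTS["A (m=51, n=10)"][1]
print(f"Energy per bit of B relative to A: {saving:.1%} lower")
```

At the same bit rate, design point B keeps more than twice as many bits on the wire as A, reflecting its higher latency.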

[Figure: bandwidth contours (Gbps) versus repeater size ($m$) and number of repeaters ($n$)]

Figure 3.8: Summary of the Choice of m and n for Desirable Performance in the Serial Channel

3.4 Bandwidth vs. Interconnect Width and Spacing

In this section, we vary the interconnect width ($W$) and spacing ($S$) and study their impact on the bandwidth. Since the interconnects in the channel are symmetric, we analyze a single interconnect only. Further, the inserted repeaters are assumed to be of the optimal size ($m_{opt}$) for the given interconnect dimensions⁴; an analysis with fixed repeater sizes across varying interconnect dimensions has been carried out in [28]. This study supports the efficient design of the high-bandwidth, minimal-area on-chip serial channel described later in Section 3.5. The effects of $W$ and $S$ are analyzed separately, and the observations are summarized with brief explanations. Further, increases in $W$ and in $S$ are compared as means of increasing $BW_s$ from a minimal wiring-area perspective.

4 By optimal repeater, we mean the repeater size that maximizes the maximum throughput of the throughput-centric signaling scheme; this size is independent of the number of repeaters inserted on the interconnect.

3.4.1 Interconnect Spacing and Width Variation

A. Interconnect Spacing (S): Let us first consider changing the spacing $S$ between the wires in the channel while keeping $W = W_{min}$. One would expect $BW_s$ to increase as $S$ increases. The plot of $BW_s$ vs. $S$ in Figure 3.9 shows that $BW_s$ does increase initially but saturates for large values of $S$. This is because the total capacitance per unit length ($c$) of the wire initially decreases as $S$ increases⁵ and eventually approaches a fixed value. It is also worth noting that $m_{opt}$ decreases as $S$ increases, which implies reduced $E_b$ and $A_{repeaters}$.

5 For an interconnect of fixed width, the total capacitance decreases as the spacing to the neighboring wires increases, because the coupling capacitance decreases. The resistance of the wire is unaffected by spacing variations.

[Figure: bandwidth versus spacing (as a multiple of $S_{min}$)]

Figure 3.9: Bandwidth vs. Interconnect Spacing

B. Interconnect Width (W): Next, we study how $BW_s$ changes with $W$ while holding $S = S_{min}$. The wire resistance and capacitance per unit length of the interconnect change in opposite directions as $W$ varies. Hence, one might expect $BW_s$ to remain roughly constant, with the decrease in $r$ offset by the increase in $c$. However, the plot of $BW_s$ versus $W$ in Figure 3.10 shows that $BW_s$ increases with $W$. The reason is that the relative increase in $c$ is smaller⁶ than the relative decrease in $r$, so the decrease in $r$ dominates and $BW_s$ increases. Further, as $W$ increases, $m_{opt}$ increases, which results in increased $E_b$ and $A_{repeaters}$.

6 For width variation, the area component of $c$ changes with $W$, while the lateral and fringing components remain approximately constant.

[Figure: bandwidth versus width (as a multiple of $W_{min}$)]

Figure 3.10: Bandwidth vs. Interconnect Width
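The width argument can be illustrated with a simple, assumed wire model in which $r$ scales as $1/W$ and only the area component of $c$ scales with $W$, while the lateral and fringing components stay fixed (footnote 6). The split of $c$ at $W_{min}$ into a 40 pF/m area part and a 160 pF/m remainder, and all other values, are hypothetical; the bandwidth uses the expanded Eqn. (3.1) with its assumed $\ln(k_1/(1 - v_1))$ factor, evaluated at $m = m_{opt}$ for a fixed $n$.

```python
import math

# Hypothetical values (SI units), not the thesis's technology data
R0, C0_IN, K1 = 10e3, 1e-15, 1.01
LENGTH, N_REPEATERS = 10e-3, 10
R_MIN = 200e3                           # r at W = W_min (ohm/m)
C_AREA_MIN, C_OTHER = 40e-12, 160e-12   # area part at W_min; lateral + fringing (F/m)


def wire_rc(width_factor):
    """r falls as 1/W; only the area component of c grows with W."""
    r = R_MIN / width_factor
    c = C_AREA_MIN * width_factor + C_OTHER
    return r, c


def first_segment_swing(n):
    """v_1 from the backward recursion v_n = 0.9, v_{i-1} = 1 / (2 - v_i)."""
    v = 0.9
    for _ in range(n - 1):
        v = 1.0 / (2.0 - v)
    return v


def bandwidth_at_mopt(width_factor, n=N_REPEATERS):
    """Return (m_opt, BW_s) for a wire of width width_factor * W_min."""
    r, c = wire_rc(width_factor)
    m = math.sqrt(R0 * c / (r * C0_IN))  # optimal repeater size
    seg = LENGTH / n
    tau = R0 * C0_IN + R0 * c * seg / m + r * seg * m * C0_IN + 0.4 * r * c * seg**2
    return m, 1.0 / (tau * math.log(K1 / (1.0 - first_segment_swing(n))))


results = [(k, *bandwidth_at_mopt(k)) for k in (1, 2, 4, 8, 16)]
for k, m_opt, bw in results:
    print(f"W = {k:2d} x W_min: m_opt = {m_opt:6.1f}, BW_s = {bw / 1e9:.2f} Gbps")
```

Both trends match the observations above under this model: $BW_s$ rises with $W$ because the product $rc$ falls, and $m_{opt} = \sqrt{R_0 c/(r C^0_{in})}$ grows because the ratio $c/r$ rises.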

3.4.2 Comparison of Spacing and Width Variation

After analyzing the effects of $W$ and $S$ individually, a comparison is necessary to understand the relative benefit of each for increasing $BW_s$. One metric for evaluating the relative benefit is the area occupied by the channel in metal. Figure 3.11 shows $BW_s$ as a function of $A_{wire}$. The solid lines represent the case in which $S$ is varied while keeping $W = W_{min}$, and the dotted lines represent the case in which $W$ is varied while keeping $S = S_{min}$.

From the plot, it can be seen that $BW_s$ increases with wire area in both cases, as explained above. However, the incremental change in $BW_s$ is much larger when $W$ is increased than when $S$ is increased. It can be inferred that spacing the
