• No results found

Interconnection technologies

N/A
N/A
Protected

Academic year: 2021

Share "Interconnection technologies"

Copied!
22
0
0

Loading.... (view fulltext now)

Full text

(1)

Interconnection technologies

Ron Ho

VLSI Research Group

(2)

Acknowledgements

Many contributors to the work described here

> Robert Drost, David Hopkins, Alex Chow, Tarik Ono, Jo Ebergen

> … and the entire Sun Labs VLSI Research Group

(3)

Multicore chips today

Niagara: 8 cores

> 32 threads (4 threads/core), 1 shared FP

> 90nm CMOS, 1.2GHz, 63W, 380mm2 > High-volume production now

Niagara2: 8 (improved) cores

> 64 threads (8 threads/core), 1 FP per core

> 65nm CMOS, 342mm2

> Shipping systems in 2H07

(4)

Multicore chips tomorrow?

How do you scale up to tens or hundreds of cores?

> …Assuming you want to (more total power for high performance)

> …And on a single chip: avoid the costs of chip-to-chip IO

> Bumps are big (180μm), links are power-hungry (20pJ/bit = 20mW/Gbps)

You need to do some combination of:

> Making the cores very small

> But you lose functionality: what kind of programming models do you want?

> Making the die very big

> But you lose the yield/cost battle very quickly

> 5 10 15 20 25 30 35 100 200 300 400 500 600 C o s t pe r s qua re mm

(5)

A different direction…

We can rethink the “single-chip” requirement…

> If we reduce (eliminate) the costs of chip-to-chip communication

> Power, area (bandwidth density), latency, known-good-die

Break a multi-core chip into a many-chip-system

> Smaller chips lead to higher yields and lower cost

> Different chips lead to system adaptability and reconfigurability

> Aggregate systems of chips effectively break the reticle limit

(6)

Proximity I/O

Pioneered by Ivan Sutherland at Sun Labs

The big idea:

Transmit Transmit Receive Receive Chip1 Chip3 Chip2

Source: Drost, Sun, CICC 2003

Not a soldered connection!

(7)

200um 20um

Area bump bonds Proximity I/O pads

180um 20um

Proximity I/O benefits

Bandwidth density 60x-100x greater than balls

Much lower power

> No ESD required

> Use wide parallel links instead of narrow SerDes links

> Very small Tx and Rx circuits

(8)

Proximity I/O challenges

Misalignment is the proverbial monkey’s wrench

> Initial imperfect chip placement

> Dynamic chip movement from vibration or thermal expansion

A robust solution is (at least) two-pronged

> Combination of specialized packaging and custom electronics

> Here, discuss some of the electronic/circuit strategy

X Y

Z

θ

Chip1 Chip2

(9)

Fixing misalignment

Detect misalignment using Vernier-like arrays

> Measure capacitance between chips to sub-fF resolution

Correct in-plane misalignment using data steering

(10)

Dealing with noise

Receivers are differential sense-amplifiers

> “Butterfly” scheme rejects noise (receivers are offset-limited)

> We employ a clock—it uses unclocked receivers with larger pads

(11)

Silicon measurements

144b Proximity I/O datapath, 180nm TSMC chip

> 1.8Gbps per pin for 260Gbps in 0.5mm2 > Measured BER<10-15

> 3pJ/bit at a 1.8V power supply

> > 0.7UI timing margin, > 200mV voltage margin at speed

> Measured sensitivity to chip separation

(12)

Proximity I/O as an enabler

Low-cost chip-to-chip communication

> Off-chip I/O looks like an extension of on-chip wires

> Many chips look like a (big) single chip

> What kinds of interconnect networks should we consider?

Proximity I/O at the overlap

(13)

Proximity I/O-based grids

Another, perhaps more realistic grid

> All big (high-power) chips are all face-up on a cold plate

> Face-down chips merely transmit data

(14)

A red flag?

Such a system will have lots of VLSI wires

> The entire interconnection network consists of on-chip wires

Latency and bandwidth characteristics well-known

> …And so are the energy costs: E=C*V*ΔV

> Cap is about 0.45pF/mm (incl. repeaters), and not really scaling

> Voltage is about 1V, and not really scaling

> Therefore, energy is about 0.45pJ/bit/mm of linear length

> So 100 64b buses at 4GHz, 10% activity over 20mm: 24W!

(15)

Efficient capacitively-driven wires

Reduced power: low voltage swing on wires

> Swing is Vdd/(n+1) without requiring a second power supply

Reduced power: small load seen by driver

> Allows use of a 1/n-th sized driver, saving power and area

Reduced latency: capacitor pre-emphasizes edges

> Also, charge distribution effectively cuts wire delay in half

Rwire, Cwire

Cload

Cc

Cc = (Cwire+Cload)/n where n = 10 - 20

(16)

Pre-emphasis extends bandwidth

The inline capacitor acts as a high-frequency short

> Example of a 14mm minimum width wire (simulated)

> Bandwidth RC-limited to 50MHz

> Capacitor (of 1/20th total capacitance) extends it to 180MHz

-60 -40 -20 0

1e+06 1e+07 1e+08 1e+09 1e+10

Norm alized vol tage at end of wire (dB) -3dB bandwidth improves by 3.3x Full-swing voltage 50mV swing through Cc

(17)

Making a better repeater

Analysis of energy-efficiency shows benefits

> Compare against repeaters for a 10mm wire in a 180nm process

0 2 4 6 8 10 12 0.8 1 1.2 1.4 1.6 1.8 Energy i n pJ Performance in 1/nS Capacitively-driven wire, varying coupling capacitance from Cwire/18 to Cwire/40

One repeater stage

Two repeaters Three repeaters

Repeater curves are

from varying driver sizes 30% faster

10X lower power

(18)

driver long wire

Building pitchfork capacitors

Exploit a “problem” of wires: sidewall capacitance

> Can use more tines or multiple wire layers

Coupling caps ideal for summing junctions

> Use them to build a cheap FIR filter

Cload Cwire Cpar Cc1 Cc2 z-1

(19)

Some costs

Differential wires cost area, power; twisted for noise

Sense-amps have offset and biasing requirements

> Biasing can use refresh, with (n+1) channels for an n-bit bus

Verification requires some care

> Correctness comes from being near, not from being connected!

A B A_b B_b A B_b A_b B

(20)

Silicon measurements

Multiple datapaths on a 180nm TSMC 1.8V chip

> Measured 10x less energy at a 50mV swing (max. savings 18x)

> Measured 4x bandwidth extension from pre-emphasis

> Bit error rates < 10-11 (limited by test time) > 50% UI eye opening on each experiment

14mm, unrepeated, min.width

> RC-limited to 55MHz

> Capacitor pushes it to 200MHz

> Sub-optimal 2-tap FIR (wrong delay)

-20 -15 -10 -5 0 5 10 15 20

Voltage

After capacitor End of wire

End of wire, using FIR filter

1.7 1.7 1.7 1.8 1.8 1.8 1.9 1.9 1.9 00010000 sequence

(21)

Look at new system architectures

A pair of complementary enabling technologies

> Multi-chip grids connected using Proximity I/O and efficient wires

> Chip-to-chip latency, bandwidth, power equal to on-chip wires

> Long on-chip wires can be lower power and higher performance

Multi-chip grids look like a big “virtual” chip

> With (re)configurability, cost, and “reticle-breaking” benefits

A question: how to best interconnect these chips?

(22)

Interconnection technologies

Ron Ho, Ph.D.

References

Related documents

The emergency fuel control caution light is located in the pedestal-mounted caution panel (figure 2-13).. The illumination of the worded segment GOV EMER is a reminder to the pilot

A sequential mixed-methods approach to data collection (Tashakkori and Teddlie 2003; Riazi and Candlin 2014) was used to investigate the research questions. The three phases of data

Directs the Department of Human Services (DHS) to create a statewide coordinated behavioral health crisis response system, including walk-in crisis centers, mobile crisis response

Upon completion of any building, structure, equipment or other work for which a permit has been issued and before same is occupied or used, a final inspection shall be

ROBERTSON® Bits and Screws provide maximum recess penetration and “Cling Fit”. Les vis et embouts ROBERTSON® procurent une pénétration maximale et une

1.2 The resolution consolidates the precepts of Nottinghamshire County Council, Nottinghamshire Police and Crime Commissioner, Nottinghamshire Fire Authority,

To help organizations understand the opportu- nity of information and advanced analytics, MIT Sloan Management Review partnered with the IBM Institute for Business Value to

(3) To identify hospital characteristics that affect 7-day mortality (acute health outcome), 30-day mortality (sub-acute health outcome), and in-hospital