Optimal Technology Mapping and Cell Merger for Asynchronous Threshold Networks

Academic year: 2021

Cheoljoo Jeong, Steven M. Nowick

Department of Computer Science

(2)

Outline

• Introduction
• Background
  – Technology mapping
  – Robust asynchronous threshold networks (NCL)
  – NCL CAD flow
  – Hazard issues: Orphans
• Motivational Examples
  – Robustness & asynchronous technology mapping
• Robust Technology Mapping and Cell Merger Algorithms
• Experimental Results
  – Near-complete DES encryption circuit
• Concluding Remarks

(3)

Problem Definition

• Asynchronous Threshold Networks
  – Robust asynchronous circuit that consists of threshold gates.
  – Threshold gates:
    • Each input has a weight
    • Fires when the weighted sum of inputs reaches or exceeds a threshold value.
• Cell Merger and Technology Mapping Problems
  – Given: unoptimized robust netlist
  – Produce: optimized robust netlist (consisting of library cells)

[Figure: problem overview]
Input:
  • Unoptimized threshold network: robust
  • Async technology library (characterized)
Output of Technology Mapping: optimized mapped network (consists of library cells, preserves robustness)
Output of Cell Merger: optimized cell-merged network (consists of library cells, preserves robustness)

(4)

Related Work

• Technology Mapping for Asynchronous Circuits
  – Siegel et al.: tech map for Burst-mode circuits
  – Cortadella et al.: tech map for QDI control circuits
• Optimization of Asynchronous Threshold Circuits
  – Smith et al.: a few local optimization techniques
  – Theseus Logic: local cell merger

P. Siegel et al. Automatic technology mapping for generalized fundamental-mode asynchronous designs. DAC’93. 1993.

Cortadella et al. Decomposition and technology mapping of speed-independent circuits. IEEE TCAD 1999.

(5)

Summary of Results

• A robust technology mapping algorithm for asynchronous threshold networks is presented:
  – First systematic approach to robust technology mapping
  – Maps across both datapath and control circuits
  – Maps sequential gates with hysteresis
  – Targets either delay or area
  – Integrates into existing asynchronous tool flow of Theseus Logic
• The cell merger problem is formulated and solved:
  – Limited special case of technology mapping
    • Only adjacent cells in the given unoptimized netlist can be merged
• Experimental Results: tech map
  – Average output delay improvements: 20.9%
  – Worst-case circuit delay improvements: 26.3%
  – Average area improvements: 2.7%

(7)

Technology Mapping

• Technology Mapping
  – Task of transforming a technology-independent logic network into a bound network, i.e., an interconnection of library elements

[Figure: Step 1, decomposition; Step 2, partitioning]

[DeMicheli94] G. De Micheli. Synthesis and Optimization of Digital Circuits (1994).

(8)

Technology Mapping (cont.)

[Figure: Step 3, covering. The subject graph (after decomposition and partitioning; from the previous slide) is covered with matches to library cells, giving the mapped circuit.]

(9)

Overview of Null Convention Logic

• NCL (Null Convention Logic)
  – Robust asynchronous design style based on threshold networks
  – Uses delay-insensitive encoding
  – Uses four-phase signaling protocol: alternates evaluate and reset phases
  – Asynchronous threshold gates with hysteresis property

(10)

NCL Asynchronous Commercial CAD Flow (Theseus Logic)

[Figure: CAD flow, top to bottom]
VHDL specification
  ↓ uses synchronous Synopsys tool: front-end
3NCL circuit: abstract multi-valued circuit
  ↓ dual-rail expansion
2NCL circuit: instantiated Boolean circuit (robust, unoptimized)
  ↓ Theseus’s template-based cell merger: only limited local optimizations currently used
Robust NCL circuit

(11)

Basics of NCL Circuits

• 3NCL Circuits: Abstract multi-valued threshold circuit
  – Starting point for NCL synthesis flow
  – 3NCL is a three-valued logic with {0, 1, NULL}
  – 3NCL circuits alternate between DATA and NULL phases
  – During the DATA (Evaluate) phase:
    • outputs have DATA values only after all inputs have DATA values
  – During the NULL (Reset) phase:
    • outputs have NULL values only after all inputs have NULL values.

[Figure: 3NCL OR gate with 3-valued inputs a, b and 3-valued output z]
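The DATA/NULL rules above can be sketched in a few lines of Python. This is only an illustrative model, not the authors' tool: the class name `ThreeNclOr` and the `"N"` marker for NULL are assumptions made for the example. The gate evaluates when both inputs carry DATA, resets when both are NULL, and otherwise holds its previous output.

```python
NULL = "N"  # assumed marker for the third 3NCL value


class ThreeNclOr:
    """Two-input 3NCL OR gate (hypothetical model for illustration)."""

    def __init__(self):
        self.z = NULL  # start in the NULL (reset) state

    def update(self, a, b):
        if a != NULL and b != NULL:
            # All inputs are DATA: the output may take a DATA value.
            self.z = a | b
        elif a == NULL and b == NULL:
            # All inputs are NULL: the output may return to NULL.
            self.z = NULL
        # Mixed DATA/NULL inputs: hold the previous output.
        return self.z


gate = ThreeNclOr()
inputs = [(NULL, NULL), (1, NULL), (1, 0), (NULL, 0), (NULL, NULL)]
trace = [gate.update(a, b) for a, b in inputs]
print(trace)  # ['N', 'N', 1, 1, 'N']
```

The output stays NULL while only one input has DATA, and stays at 1 until both inputs have returned to NULL.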

[Animation, slides 12 to 16: the slide above is repeated while the 3NCL OR gate figure steps through the values below, as read from the figure labels]

Step  one input  other input  output z
1     N          N            N
2     1          N            N
3     1          0            1
4     N          0            1
5     N          N            N

(17)

Basics of NCL Circuits (cont.)

• Delay-Insensitive Encoding
  – Single signal is represented by two wires (0-rail and 1-rail)

Dual-rail expansion of signal a into rails a0 and a1:

a            a0  a1
NULL         0   0
0            1   0
1            0   1
Not allowed  1   1
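The encoding table can be written as a pair of lookup functions. This is a minimal sketch; the function names `encode` and `decode` and the `"N"` marker for NULL are assumptions for illustration.

```python
NULL = "N"  # assumed marker for the NULL value

# Rail pairs are written as (a0, a1), matching the table.
_ENCODE = {NULL: (0, 0), 0: (1, 0), 1: (0, 1)}
_DECODE = {rails: value for value, rails in _ENCODE.items()}


def encode(value):
    """Map a three-valued signal to its (a0, a1) rail pair."""
    return _ENCODE[value]


def decode(a0, a1):
    """Map a rail pair back to NULL, 0 or 1; (1, 1) is rejected."""
    if (a0, a1) == (1, 1):
        raise ValueError("both rails high is not an allowed code")
    return _DECODE[(a0, a1)]


print(encode(1))     # (0, 1)
print(decode(1, 0))  # 0
```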

(18)

Basics of NCL Circuits (cont.)

• Transforming 3NCL to 2NCL Circuits
  – 2NCL circuits = dual-rail implementation of 3NCL circuits
  – Single 3NCL signal: is represented by two wires (0-rail / 1-rail)
  – Single 3NCL gate: expanded into small network of 2NCL gates

[Figure: a 3NCL OR gate (3-valued inputs a, b; 3-valued output z) next to the 2NCL OR network (dual-rail inputs a0, a1, b0, b1; dual-rail outputs z0, z1)]
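To make the expansion concrete, here is a hedged Python sketch of one possible dual-rail OR network. The gate-level network drawn on the slide is not recoverable from the text, so this example substitutes a generic minterm-style expansion built from two-input hysteresis gates; the class names `CElement` and `DualRailOr` are hypothetical.

```python
class CElement:
    """Two-input gate with hysteresis: sets on 1,1 and resets on 0,0."""

    def __init__(self):
        self.out = 0

    def update(self, x, y):
        if x == 1 and y == 1:
            self.out = 1
        elif x == 0 and y == 0:
            self.out = 0
        return self.out  # otherwise hold


class DualRailOr:
    """Assumed minterm-style dual-rail OR, not the slide's exact network."""

    def __init__(self):
        self.m00, self.m01, self.m10, self.m11 = (CElement() for _ in range(4))

    def update(self, a0, a1, b0, b1):
        m00 = self.m00.update(a0, b0)  # a = 0 and b = 0
        m01 = self.m01.update(a0, b1)  # a = 0 and b = 1
        m10 = self.m10.update(a1, b0)  # a = 1 and b = 0
        m11 = self.m11.update(a1, b1)  # a = 1 and b = 1
        return m00, m01 | m10 | m11    # (z0, z1)


gate = DualRailOr()
steps = [(0, 0, 0, 0), (0, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 0), (0, 0, 0, 0)]
trace = [gate.update(*s) for s in steps]
print(trace)  # [(0, 0), (0, 0), (0, 1), (0, 1), (0, 0)]
```

The trace follows a = 1, b = 0 arriving one input at a time and then returning to NULL: the output pair becomes (0, 1), i.e. z = 1, only once both inputs hold DATA, and returns to NULL only after both inputs are NULL.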

(19)

Basics of NCL Circuits (cont.)

• Structure of NCL Circuits: Single Pipeline Stage

[Figure labels: Stage N-1, Stage N, Stage N+1; asynchronous registration; combinational logic; completion detector]

(20)

Basics of NCL Circuits (cont.)

• Pipeline Operation: Four-Phase Signaling Protocol
  – Stages alternate between DATA (Evaluate) and NULL (Reset) phases

[Figure, animated over slides 20 to 32: pipeline stages N-1, N and N+1 built from combinational logic, async registers and completion detectors (CD). The animation steps through a DATA (Evaluate) phase and then a NULL (Reset) phase.]

(33)

Industrial Applications of Null Convention Logic

• Theseus Logic, Inc.: asynchronous startup company
  – Built asynchronous version of Motorola MCORE processor
  – 18+ other chips designed and fabricated using NCL flow
    • Five had over 150,000 transistors
    • Largest one 660,000 transistors.
  – Currently, NCL used in DARPA “CLASS” project (led by Boeing)
    • Major new CAD initiative (with Philips, Theseus, Columbia, UNC, etc.)
    • Developing commercially-viable asynchronous CAD flow

(34)

Hazard Issues

• Delay-Insensitivity (= Delay Model)
  – Assumes arbitrary gate and wire delay
    • circuit operates correctly under all conditions
  – Most robust design style
• “Orphans”: Hazards to Delay-Insensitivity
  – “Ineffective” signal transition sequences (= unobservable paths)
  – Wire orphans: timing requirements on wires at fanout points

(35)

Hazard Issues (cont.)

• Wire Orphan Example:

[Figure, animated over slides 35 to 38: wire orphan example, with 0 values propagating toward the primary outputs]

In NCL flow, wire orphans are not a problem: eliminated by enforcing physical timing constraints

(39)

Hazard Issues (cont.)

• Gate Orphan Example:

[Figure, animated over slides 39 to 42: gate orphan example on a dual-rail network with inputs a0, b0, a1, b1 and outputs z1, z0. 0 values propagate through the network; the final frame marks a transition as "gate orphan! = not observable".]

In NCL flow, gate orphan freedom not guaranteed: must be avoided during logic synthesis

(44)

Motivational Examples: Challenges to Robust Technology Mapping

• Arbitrary decomposition can be dangerous (= non-robust).

[Figure, animated over slides 44 and 45: a 3-input AND gate (inputs a, b, c) decomposed into two 2-input AND gates, with 0 values propagating through the decomposed network]

CONCLUSION: must carefully restrict decomposition to ensure robustness

(46)

Motivational Examples: Challenges to Robust Technology Mapping

• DAG (Directed Acyclic Graph)-based covering can be dangerous.

[Figure: a subject graph (= DAG) and the mapped network, both drawn over signals w, z, a, b, c, d]

CONCLUSION: must avoid arbitrary DAG-covering for robustness

(48)

Cell Merger Algorithm

• The Cell Merger Problem
  – Input: unoptimized robust netlist
  – Output: optimized robust netlist (consists of library cells)
  – Solved as limited special case of technology mapping
    • Only adjacent cells in the given unoptimized netlist can be merged
• Overview of the Proposed Algorithm
  – No decomposition (NEW)
    • Every cell function is a base function
    • Original netlist is subject graph
  – Partitioning (use existing techniques)
  – Pattern graph generation (NEW)
    • New bottom-up approach proposed

(49)

Pattern Graph Generation

• Overview of Pattern Graph Generation
  – Iteratively generates all pattern graphs in single bottom-up approach
    • Generates: two-cell merger, three-cell merger (up to four-cell mergers)
    • Includes all single-cell pattern graphs: may be useful for matching
  – Merges cell functions rather than cells themselves
    • Single cell function may represent multiple cells
• Examples:
  – Two-cell merger: merging AND2 and OR2
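As an illustration of merging cell functions rather than cells, the sketch below composes two functions and identifies the result by its truth table. It is a simplification made for this example: it models only the combinational set functions and ignores hysteresis, and the helper names `merge` and `truth_table` are hypothetical.

```python
from itertools import product


def and2(a, b):
    return a & b


def or2(a, b):
    return a | b


def merge(outer, outer_arity, pin, inner, inner_arity):
    """Feed inner's output into input `pin` of outer.

    Returns the merged function and its arity. The merged function takes
    inner's inputs first, then outer's remaining inputs.
    """
    def merged(*xs):
        inner_out = inner(*xs[:inner_arity])
        rest = list(xs[inner_arity:])
        return outer(*rest[:pin], inner_out, *rest[pin:])

    return merged, outer_arity - 1 + inner_arity


def truth_table(fn, arity):
    """Canonical signature used to recognise equal cell functions."""
    return tuple(fn(*bits) for bits in product((0, 1), repeat=arity))


# Two-cell merger: OR2 feeding input 0 of AND2, i.e. (a + b) * c.
merged_fn, arity = merge(and2, 2, 0, or2, 2)
print(arity, truth_table(merged_fn, arity))  # 3 (0, 0, 0, 1, 0, 1, 0, 1)
```

Comparing truth tables is one simple way to see that a single cell function can stand for several cells; a bottom-up generator would repeat `merge` on its own results to reach three-cell and four-cell mergers.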

(50)

Matching and Covering

• Matching and Covering
  – Uses traditional synchronous approach (with dynamic programming)
  – Targets either area or delay minimization
• Delay Minimization
  – Nonlinear delay model used
    • based on table-lookup
    • uses load binning to handle load-dependent delays
    • alternative approach: use load-independent delay model

[Figure label: matches to library cells]
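The dynamic-programming covering step can be sketched for the area objective on a tree-shaped subject graph. Everything concrete here is an assumption for illustration: the three-cell library, its area values, and the cell name `AO21` are hypothetical, and the cells are treated as plain combinational patterns.

```python
from functools import lru_cache

# Hypothetical library: (cell name, area, pattern). None marks a pattern leaf.
LIBRARY = [
    ("AND2", 2.0, ("AND2", None, None)),
    ("OR2", 2.0, ("OR2", None, None)),
    ("AO21", 3.0, ("OR2", ("AND2", None, None), None)),
]


def match(pattern, node):
    """Return the subject subtrees bound to pattern leaves, or None."""
    if pattern is None:
        return [node]
    if isinstance(node, str) or node[0] != pattern[0]:
        return None
    bound = []
    for sub_pattern, sub_node in zip(pattern[1:], node[1:]):
        sub_bound = match(sub_pattern, sub_node)
        if sub_bound is None:
            return None
        bound.extend(sub_bound)
    return bound


@lru_cache(maxsize=None)
def best_cover(node):
    """Minimum-area cover of the tree rooted at node: (area, cell names)."""
    if isinstance(node, str):  # primary input, nothing to cover
        return 0.0, ()
    best = None
    for name, area, pattern in LIBRARY:
        bound = match(pattern, node)
        if bound is None:
            continue
        total, cells = area, (name,)
        for sub in bound:  # reuse optimal covers of the subtrees
            sub_area, sub_cells = best_cover(sub)
            total += sub_area
            cells += sub_cells
        if best is None or total < best[0]:
            best = (total, cells)
    if best is None:
        raise ValueError(f"no library cell matches {node!r}")
    return best


subject = ("OR2", ("AND2", "a", "b"), ("AND2", "c", "d"))
print(best_cover(subject))  # (5.0, ('AO21', 'AND2'))
```

For delay minimization the same recursion would apply with the cost changed from summed area to the worst arrival time over the bound subtrees plus the cell delay; the load-dependent table lookup mentioned above is not modelled here.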

(51)

Technology Mapping Algorithm

• The Technology Mapping Problem
  – Input: unoptimized robust netlist
  – Output: optimized robust netlist (consists of library cells)
  – More general approach
    • Allows more general mapping to library cells
• Overview of the Proposed Algorithm
  – Gate-orphan-free decomposition (NEW)
    • Decompose original netlist into gate-orphan-free netlist
  – Partitioning (use existing techniques)
  – Pattern graph generation (NEW)
    • Use new positive monotonic finite basis (= AND2/OR2)
      – vs. synchronous approaches (= NAND2/INV)
    • Must include some complex irreducible nodes
  – Matching and covering (use existing techniques)

(52)

Gate-Orphan-Free Decomposition

• Basic Idea
  – Decompose each node using simple base functions
    • new monotonic basis proposed = AND2/OR2
    • safe as long as no gate orphans introduced
  – If no such guarantee:
    • node not decomposed: the node function is registered as new complex base function (irreducible)
• Overview of Gate-Orphan-Free Decomposition
  – AND2 and OR2 gates: not decomposed = primitive base functions
  – Large OR cells: safely decomposed into network of OR2 cells
  – Large AND cells: not decomposed (decomposition unsafe)
  – Existing (Theseus) local cell merger optimizations undone
    • through reverse lookup table
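The decomposition rules listed above can be sketched on a small expression tree. The tuple representation and the function name `decompose` are assumptions for this example, and the balanced OR2 tree is one arbitrary choice of "network of OR2 cells"; the reverse-lookup step that undoes earlier cell mergers is not modelled.

```python
def decompose(node):
    """Apply the rules above to a netlist tree.

    A node is a leaf name (str) or a tuple (op, child, child, ...),
    with op either "AND" or "OR".
    """
    if isinstance(node, str):
        return node
    op, *children = node
    children = [decompose(child) for child in children]
    if op == "OR" and len(children) > 2:
        # Large OR: rebuild as a tree of two-input ORs.
        while len(children) > 1:
            paired = [("OR", children[i], children[i + 1])
                      for i in range(0, len(children) - 1, 2)]
            if len(children) % 2:
                paired.append(children[-1])
            children = paired
        return children[0]
    # AND2/OR2 are primitive; a larger AND is kept whole as an
    # irreducible complex base function.
    return (op, *children)


print(decompose(("OR", "a", "b", "c", "d")))
# ('OR', ('OR', 'a', 'b'), ('OR', 'c', 'd'))
print(decompose(("AND", "a", "b", "c")))
# ('AND', 'a', 'b', 'c')
```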

(53)

Pattern Graph Generation

• After decomposition and partitioning, subject graphs consist of:
  – simple base functions (AND2/OR2)
  – complex base functions (irreducible)
• Overview of Pattern Graph Generation:
  – Library cell functions: decomposed using simple finite basis.
  – If library cell function = complex base function, then:

(56)

Experimental Results (cont.)

                                   Area-optimized circuit   Delay-optimized circuit
Name                 #i/#o/#g      Area      Delay          Area      Delay
des-r01              352/64/1079   100.0%    100.2%         101.9%    87.4%
des-r02              11/4/10       95.7%     95.3%          100.0%    73.8%
des-r03              590/298/2186  99.9%     100.0%         104.1%    83.5%
des-r04              590/306/2196  99.9%     100.0%         102.1%    83.6%
des-r05              3/2/4         75.3%     80.5%          83.4%     57.5%
Average improvement                99.9%     99.9%          103.2%    83.7%

(57)

Experimental Results

                                   Area-optimized circuit   Delay-optimized circuit
Name                 #i/#o/#g      Area      Delay          Area      Delay
des-r01              352/64/1079   95.5%     100.2%         110.7%    73.3%
des-r02              11/4/10       95.7%     95.3%          155.9%    71.0%
des-r03              590/298/2186  97.8%     100.3%         122.6%    79.7%
des-r04              590/306/2196  97.8%     100.3%         122.3%    80.0%
des-r05              3/2/4         75.0%     80.5%          83.3%     57.5%
Average improvement                97.3%     100.2%         120.4%    79.1%

(58)

Experimental Results (cont.)

• Delay Optimization
  – Avg. improvement for worst-case output delay
    • Considers single worst-case path for each individual output
    • Cell Merger: 16.3%
    • Tech map: 20.9%
  – Avg. improvement for worst-case circuit delay
    • Considers the single worst-case path in entire circuit
    • Cell Merger: 20.1%
    • Tech map: 26.3%
• Area Optimization
  – Cell Merger: 0.1% avg. improvement
  – Tech map: 2.7% avg. improvement
• Analysis of results
  – Due to its local nature, Theseus optimizer left more room for improvement w.r.t. delay than w.r.t. area
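The output-delay and area improvement figures appear to be 100% minus the relative values in the "Average improvement" rows of the two results tables. A short check, assuming the first table reports cell merger and the second reports technology mapping (the slides do not label them in the extracted text):

```python
def improvement(relative_percent):
    """Improvement in percent, given a result relative to the original (100 = unchanged)."""
    return round(100.0 - relative_percent, 1)


# Average-row values: delay from the delay-optimized circuit,
# area from the area-optimized circuit.
averages = {
    "cell merger delay": 83.7,
    "cell merger area": 99.9,
    "tech map delay": 79.1,
    "tech map area": 97.3,
}

for label, relative in averages.items():
    print(f"{label}: {improvement(relative)}%")
# cell merger delay: 16.3%
# cell merger area: 0.1%
# tech map delay: 20.9%
# tech map area: 2.7%
```

The worst-case circuit delay figures (20.1% and 26.3%) do not correspond to a column in those tables, so they cannot be checked the same way.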

(59)

Concluding Remarks

• Conclusions
  – Robust technology mapping and cell merger algorithms for asynchronous threshold circuits proposed
  – Fully-automated in software tool
  – Avg. improvements for tech map
    • Delay: 20.9%
    • Area: 2.7%
  – Worst-case circuit paths: delay improvements of 26.3%
  – Our method to be included in DARPA CLASS async tool flow
• Future Work
  – Careful robust extension of DAG-covering algorithms
  – Support hybrid cost functions (e.g. area and delay)

(60)
(61)

Asynchronous Circuits

• Synchronous vs. Asynchronous Systems

Synchronous System:
  – global clock
  – entire system operates at fixed rate
  – centralized control

Asynchronous System:
  – no global clock
  – components operate at varying rates
  – distributed control (communicate locally via handshaking)

[Figure labels: clock (synchronous system); handshaking interfaces (asynchronous system)]

(62)

Asynchronous Circuits (cont.)

• Industrial Applications of Asynchronous Circuits
  – Sun:
    • UltraSPARC uses async FIFOs for memory interface
  – Theseus Logic:
    • asynchronous Motorola MCORE processor
  – Philips:
    • Feb. 2006 press release: asynchronous ARM processor commercially available by ARM Ltd.

(63)

Basics of NCL Circuits (cont.)

• Asynchronous Threshold Gates with Hysteresis Property
  – Threshold gates with sequential SET/RESET functionality
  – SET function:
    • Each input $x_i$ has weight $w_i$
    • Gate has threshold value $T$
    • If weighted sum $\sum_i w_i x_i \ge T$, gate fires
  – RESET function:
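A small Python model makes the SET condition concrete. The SET rule follows the slide; the RESET rule is cut off in the source, so the code assumes the convention usually described for NCL gates (the output returns to 0 only when every input is 0) and labels it as such. The class name `ThresholdGate` is hypothetical.

```python
class ThresholdGate:
    """Weighted threshold gate with hysteresis (illustrative model)."""

    def __init__(self, weights, threshold):
        self.weights = list(weights)
        self.threshold = threshold
        self.out = 0

    def update(self, inputs):
        total = sum(w * x for w, x in zip(self.weights, inputs))
        if total >= self.threshold:
            # SET: weighted sum reaches the threshold T, gate fires.
            self.out = 1
        elif not any(inputs):
            # RESET (assumed convention, not stated in the source):
            # all inputs are 0.
            self.out = 0
        # Otherwise hold the previous output (hysteresis).
        return self.out


# Unit weights, threshold 2, three inputs: a 2-of-3 gate.
gate = ThresholdGate([1, 1, 1], 2)
trace = [gate.update(x) for x in ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0])]
print(trace)  # [0, 1, 1, 0]
```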

(64)

Asynchronous Circuits

• Benefits of Asynchronous Circuits
  – Robustness to process variation
  – Mitigates: timing closure problem
  – Low power consumption, low EMI
  – Modularity
• Challenges of Asynchronous Circuits
  – Lack of CAD tools
  – Robust design is required: hazard-freedom
  – Area overhead
