Optimal Technology Mapping and Cell Merger
for Asynchronous Threshold Networks
Cheoljoo Jeong Steven M. Nowick
Department of Computer Science
Outline
• Introduction
• Background
– Technology mapping
– Robust asynchronous threshold networks (NCL)
– NCL CAD flow
– Hazard issues: Orphans
• Motivational Examples
– Robustness & asynchronous technology mapping
• Robust Technology Mapping and Cell Merger Algorithms
• Experimental Results
– Near-complete DES encryption circuit
Problem Definition
• Asynchronous Threshold Networks
– Robust asynchronous circuit that consists of threshold gates.
– Threshold gates:
• Each input has a weight
• Fires when the weighted sum of inputs reaches or exceeds a threshold value.
• Cell Merger and Technology Mapping Problems
– Given: unoptimized robust netlist
– Produce: optimized robust netlist (consisting of library cells)
[Figure: problem overview]
Input:
• Unoptimized threshold network: robust
• Async technology library (characterized)
Technology Mapping → Output: optimized mapped network (consists of library cells, preserves robustness)
Cell Merger → Output: optimized cell-merged network (consists of library cells, preserves robustness)
Related Work
• Technology Mapping for Asynchronous Circuits
– Siegel et al.: tech map for Burst-mode circuits
– Cortadella et al.: tech map for QDI control circuits
• Optimization of Asynchronous Threshold Circuits
– Smith et al.: a few local optimization techniques
– Theseus Logic: local cell merger
P. Siegel et al. Automatic technology mapping for generalized fundamental-mode asynchronous designs. DAC 1993.
Cortadella et al. Decomposition and technology mapping of speed-independent circuits. IEEE TCAD 1999.
Summary of Results
• A robust technology mapping algorithm for asynchronous threshold networks is presented:
– First systematic approach to robust technology mapping
– Maps across both datapath and control circuits
– Maps sequential gates with hysteresis
– Targets either delay or area
– Integrates into existing asynchronous tool flow of Theseus Logic
• The cell merger problem is formulated and solved:
– Limited special case of technology mapping
• Only adjacent cells in the given unoptimized netlist can be merged
• Experimental Results: tech map
– Average output delay improvements: 20.9%
– Worst-case circuit delay improvements: 26.3%
– Average area improvements: 2.7%
Technology Mapping
[DeMicheli94] G. De Micheli. Synthesis and Optimization of Digital Circuits (1994).
• Technology Mapping
– Task of transforming a technology-independent logic network into a bound network, i.e. an interconnection of library elements
[Figure: Step 1, decomposition; Step 2, partitioning]
Technology Mapping (cont.)
[Figure: Step 3, covering. The subject graph (after decomposition and partitioning; from prev. slide) is matched to library cells, giving the mapped circuit]
Overview of Null Convention Logic
• NCL (Null Convention Logic)
– Robust asynchronous design style based on threshold networks
– Uses delay-insensitive encoding
– Uses four-phase signaling protocol: alternates evaluate and reset phases
– Asynchronous threshold gates with hysteresis property
NCL Asynchronous Commercial CAD Flow
(Theseus Logic)
[Figure: CAD flow]
1. VHDL specification
2. 3NCL circuit: abstract multi-valued circuit (front-end uses synchronous Synopsys tool)
3. 2NCL circuit: instantiated Boolean circuit (robust, unoptimized), obtained by dual-rail expansion
4. Robust NCL circuit: produced by Theseus’s template-based cell merger (only limited local optimizations currently used)
Basics of NCL Circuits
• 3NCL Circuits: Abstract multi-valued threshold circuit
– Starting point for NCL synthesis flow
– 3NCL is a three-valued logic with {0, 1, NULL}
– 3NCL circuits alternate between DATA and NULL phases
– During the DATA (Evaluate) phase:
• outputs have DATA values only after all inputs have DATA values
– During the NULL (Reset) phase:
• outputs have NULL values only after all inputs have NULL values.
[Figure, animated: 3NCL OR gate with 3-valued inputs a, b and 3-valued output z. Frames (inputs → output): N, N → N; 1, N → N; 1, 0 → 1; N, 0 → 1; N, N → N]
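The DATA/NULL rules for a 3NCL gate can be sketched in a few lines of Python. This is only an illustrative model, not part of the NCL flow: the class name `ThreeNclOr` and the `"N"` marker for NULL are assumptions made for the example.

```python
NULL = "N"  # stand-in for the 3NCL NULL value; 0 and 1 are the DATA values


class ThreeNclOr:
    """Three-valued OR gate that holds its output between complete input sets."""

    def __init__(self):
        self.z = NULL

    def step(self, a, b):
        if a != NULL and b != NULL:
            self.z = a | b      # all inputs DATA: output takes its DATA value
        elif a == NULL and b == NULL:
            self.z = NULL       # all inputs NULL: output returns to NULL
        # mixed DATA/NULL inputs: output keeps its previous value
        return self.z


gate = ThreeNclOr()
sequence = [(NULL, NULL), (1, NULL), (1, 0), (NULL, 0), (NULL, NULL)]
trace = [gate.step(a, b) for a, b in sequence]
```

The output only changes once both inputs agree on the phase, which is the behavior the two phase rules describe.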
Basics of NCL Circuits (cont.)
• Delay-Insensitive Encoding
– Single signal is represented by two wires (0-rail and 1-rail)
[Table: dual-rail expansion of signal a]
a            a0  a1
NULL         0   0
0            1   0
1            0   1
Not allowed  1   1
Basics of NCL Circuits (cont.)
• Transforming 3NCL to 2NCL Circuits
– 2NCL circuits = dual-rail implementation of 3NCL circuits
– Single 3NCL signal: represented by two wires (0-rail / 1-rail)
– Single 3NCL gate: expanded into small network of 2NCL gates
[Figure: 3NCL OR gate (3-valued inputs a, b; 3-valued output z) and the corresponding 2NCL OR network (dual-rail inputs a0, a1, b0, b1; dual-rail outputs z0, z1)]
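A minimal sketch of the dual-rail encoding and of one possible OR expansion follows. The exact gate network on the slide is not recoverable from the text, so the minterm-style expansion in `or_dual_rail` is an assumption, and all function names are hypothetical. Hysteresis is left out: the function only gives the combinational view during the DATA phase.

```python
NULL = None  # stand-in for the 3NCL NULL value


def encode(value):
    """Map a 3NCL value to its (0-rail, 1-rail) pair."""
    return {NULL: (0, 0), 0: (1, 0), 1: (0, 1)}[value]


def decode(rail0, rail1):
    """Map a rail pair back to a 3NCL value; (1, 1) is not a legal codeword."""
    if (rail0, rail1) == (1, 1):
        raise ValueError("(1, 1) is not a legal dual-rail codeword")
    return {(0, 0): NULL, (1, 0): 0, (0, 1): 1}[(rail0, rail1)]


def or_dual_rail(a, b):
    """Assumed minterm-style expansion: every product term waits for both inputs."""
    a0, a1 = a
    b0, b1 = b
    z0 = a0 & b0
    z1 = (a1 & b0) | (a0 & b1) | (a1 & b1)
    return z0, z1


# Full DATA inputs give the Boolean OR; a NULL input keeps the output NULL.
results = {(a, b): decode(*or_dual_rail(encode(a), encode(b)))
           for a in (NULL, 0, 1) for b in (NULL, 0, 1)}
```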
Basics of NCL Circuits (cont.)
• Structure of NCL Circuits: Single Pipeline Stage
[Figure: stages N-1, N, N+1; a stage consists of asynchronous registration, combinational logic, and a completion detector]
Basics of NCL Circuits (cont.)
• Pipeline Operation: Four-Phase Signaling Protocol
– Stages alternate between DATA (Evaluate) and NULL (Reset) phases
[Figure, animated: stages N-1, N, N+1, each with an async register, combinational logic, and a completion detector (CD); the animation steps through a DATA (Evaluate) phase followed by a NULL (Reset) phase]
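The role of the completion detector in the DATA/NULL alternation can be sketched as below. This is a behavioral model under stated assumptions: the class `CompletionDetector` and the phase strings are hypothetical, and the polarity of the acknowledge wire sent back to the previous stage is deliberately not modeled.

```python
def is_data(pair):
    """A dual-rail pair carries DATA when exactly one rail is high."""
    return pair in ((1, 0), (0, 1))


def is_null(pair):
    return pair == (0, 0)


class CompletionDetector:
    """Reports DATA or NULL for a whole dual-rail word, holding state in between."""

    def __init__(self):
        self.phase = "NULL"

    def observe(self, word):
        if all(is_data(p) for p in word):
            self.phase = "DATA"     # every signal has arrived: evaluate phase done
        elif all(is_null(p) for p in word):
            self.phase = "NULL"     # every signal has reset: reset phase done
        return self.phase


cd = CompletionDetector()
phases = [cd.observe(word) for word in [
    [(0, 0), (0, 0)],   # NULL wavefront
    [(1, 0), (0, 0)],   # partial DATA, detector still reports NULL
    [(1, 0), (0, 1)],   # complete DATA
    [(0, 0), (0, 1)],   # partial NULL, detector still reports DATA
    [(0, 0), (0, 0)],   # complete NULL
]]
```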
Industrial Applications of Null Convention Logic
• Theseus Logic, Inc.: asynchronous startup company
– Built asynchronous version of Motorola MCORE processor
– 18+ other chips designed and fabricated using NCL flow
• Five had over 150,000 transistors
• Largest one 660,000 transistors.
– Currently, NCL used in DARPA “CLASS” project (led by Boeing)
• Major new CAD initiative (with Philips, Theseus, Columbia, UNC, etc.)
• Developing commercially-viable asynchronous CAD flow
Hazard Issues
• Delay-Insensitivity (= Delay Model)
– Assumes arbitrary gate and wire delay
• circuit operates correctly under all conditions
– Most robust design style
• “Orphans”: Hazards to Delay-Insensitivity
– “Ineffective” signal transition sequences (= unobservable paths)
– Wire orphans: timing requirements on wires at fanout points
Hazard Issues (cont.)
• Wire Orphan Example:
[Figure, animated: wire orphan example in a network driving primary outputs]
In NCL flow, wire orphans are not a problem: eliminated by enforcing physical timing constraints
Hazard Issues (cont.)
• Gate Orphan Example:
[Figure, animated: gate orphan example on a dual-rail network with inputs a0, b0, a1, b1 and outputs z1, z0; the marked transition is a gate orphan (= not observable)]
In NCL flow, gate orphan freedom not guaranteed: must be avoided during logic synthesis
Motivational Examples:
Challenges to Robust Technology Mapping
• Arbitrary decomposition can be dangerous (= non-robust).
[Figure, animated: 3-input AND gate (inputs a, b, c) decomposed into two 2-input AND gates]
CONCLUSION: must carefully restrict decomposition to ensure robustness
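Why the AND decomposition is risky can be illustrated with a small check. This is a simplified sketch, not the authors' analysis: the netlist format, the function names, and the chosen stimulus are assumptions, and a gate is treated as observable only if a chain of firing gates connects it to a primary output.

```python
def and_gate(*xs):
    return int(all(xs))


def simulate(netlist, inputs):
    """Evaluate every gate once; the netlist dict must be in topological order."""
    values = dict(inputs)
    for gate, (fn, fanins) in netlist.items():
        values[gate] = fn(*(values[f] for f in fanins))
    return values


def gate_orphans(netlist, outputs, inputs):
    """Gates that fire without any chain of firing gates to a primary output."""
    values = simulate(netlist, inputs)
    observable = {g for g in outputs if values[g]}
    changed = True
    while changed:
        changed = False
        for gate, (_, fanins) in netlist.items():
            if gate not in observable:
                continue
            for f in fanins:
                if f in netlist and values[f] and f not in observable:
                    observable.add(f)
                    changed = True
    return sorted(g for g in netlist if values[g] and g not in observable)


# Hypothetical netlists: one AND3 cell versus two chained AND2 cells.
single = {"z": (and_gate, ["a", "b", "c"])}
decomposed = {"t": (and_gate, ["a", "b"]), "z": (and_gate, ["t", "c"])}

# Wires a and b rise, wire c stays low in this DATA phase.
stimulus = {"a": 1, "b": 1, "c": 0}
orphans_single = gate_orphans(single, ["z"], stimulus)
orphans_decomposed = gate_orphans(decomposed, ["z"], stimulus)
```

In the decomposed version the internal gate `t` fires but nothing downstream reflects it, which is the kind of unobservable transition the slide warns about.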
Motivational Examples:
Challenges to Robust Technology Mapping
• DAG (Directed Acyclic Graph)-based covering can be dangerous.
[Figure: subject graph (= DAG) and mapped network, each with inputs a, b, c, d and outputs w, z]
CONCLUSION: must avoid arbitrary DAG-covering for robustness
Cell Merger Algorithm
• The Cell Merger Problem
– Input: unoptimized robust netlist
– Output: optimized robust netlist (consists of library cells)
– Solved as limited special case of technology mapping
• Only adjacent cells in the given unoptimized netlist can be merged
• Overview of the Proposed Algorithm
– No decomposition (NEW)
• Every cell function is a base function
• Original netlist is subject graph
– Partitioning (use existing techniques)
– Pattern graph generation (NEW)
• New bottom-up approach proposed
Pattern Graph Generation
• Overview of Pattern Graph Generation
– Iteratively generates all pattern graphs in single bottom-up approach
• Generates: two-cell merger, three-cell merger (up to four-cell mergers)
• Includes all single-cell pattern graphs: may be useful for matching
– Merges cell functions rather than cells themselves
• Single cell function may represent multiple cells
• Examples:
Two-cell merger: merging AND2 and OR2
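Merging cell functions rather than cells can be sketched by composing Boolean functions and indexing the results by truth table. The helper names (`merge`, `truth_table`, `patterns`) are hypothetical, and only the two-cell step over AND2 and OR2 is shown; it is a sketch of the idea, not the authors' generator.

```python
from itertools import product


def and2(x, y):
    return x & y


def or2(x, y):
    return x | y


def merge(outer, inner):
    """Feed a 2-input cell function into one input of another: a 3-input function."""
    def merged(a, b, c):
        return outer(inner(a, b), c)
    return merged


def truth_table(fn, n):
    return tuple(fn(*bits) for bits in product((0, 1), repeat=n))


cells = {"AND2": and2, "OR2": or2}

# Index merged functions by truth table, so cells with equal functions share an entry.
patterns = {}
for (outer_name, outer), (inner_name, inner) in product(cells.items(), repeat=2):
    table = truth_table(merge(outer, inner), 3)
    patterns.setdefault(table, []).append(f"{outer_name}({inner_name}(a,b),c)")
```

Repeating the same step on the merged functions would give three-cell and four-cell candidates in the same bottom-up fashion.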
Matching and Covering
• Matching and Covering
– Uses traditional synchronous approach (with dynamic programming)
– Targets either area or delay minimization
• Delay Minimization
– Nonlinear delay model used
• based on table-lookup
• uses load binning to handle load-dependent delays
• alternative approach: use load-independent delay model
[Figure: matches to library cells]
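The dynamic-programming covering step can be sketched for the area objective on a tree-shaped subject graph. This is a textbook-style tree covering sketch under stated assumptions: the library entries, the cell name `AO21`, and the area values are hypothetical, and patterns are matched without input permutation.

```python
from functools import lru_cache

# Hypothetical library: (cell name, area, pattern); "_" marks a pattern leaf.
LIBRARY = [
    ("AND2", 2.0, ("AND2", "_", "_")),
    ("OR2", 2.0, ("OR2", "_", "_")),
    ("AO21", 3.0, ("OR2", ("AND2", "_", "_"), "_")),
]


def match(pattern, node):
    """Return the subject subtrees bound to the pattern leaves, or None."""
    if pattern == "_":
        return [node]
    if isinstance(node, str) or node[0] != pattern[0]:
        return None
    bound = []
    for p_child, n_child in zip(pattern[1:], node[1:]):
        sub = match(p_child, n_child)
        if sub is None:
            return None
        bound += sub
    return bound


@lru_cache(maxsize=None)
def best_cover(node):
    """Minimum-area cover of the tree rooted at node: (area, cells used)."""
    if isinstance(node, str):           # primary input: nothing to cover
        return 0.0, ()
    best = None
    for name, area, pattern in LIBRARY:
        leaves = match(pattern, node)
        if leaves is None:
            continue
        total, cells = area, (name,)
        for leaf in leaves:
            sub_area, sub_cells = best_cover(leaf)
            total += sub_area
            cells += sub_cells
        if best is None or total < best[0]:
            best = (total, cells)
    return best


subject = ("OR2", ("AND2", "a", "b"), ("OR2", "c", "d"))
area, cells = best_cover(subject)
```

A delay objective would keep the same recursion but combine the best arrival times of the pattern leaves with a cell delay, for example one looked up from a load-binned table.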
Technology Mapping Algorithm
• The Technology Mapping Problem
– Input: unoptimized robust netlist
– Output: optimized robust netlist (consists of library cells)
– More general approach
• Allows more general mapping to library cells
• Overview of the Proposed Algorithm
– Gate-orphan-free decomposition (NEW)
• Decompose original netlist into gate-orphan-free netlist
– Partitioning (use existing techniques)
– Pattern graph generation (NEW)
• Use new positive monotonic finite basis (= AND2/OR2)
– vs. synchronous approaches (= NAND2/INV)
• Must include some complex irreducible nodes
– Matching and covering (use existing techniques)
Gate-Orphan-Free Decomposition
• Basic Idea
– Decompose each node using simple base functions
• new monotonic basis proposed = AND2/OR2
• safe as long as no gate orphans introduced
– If no such guarantee:
• node not decomposed: the node function is registered as new complex base function (irreducible)
• Overview of Gate-Orphan-Free Decomposition
– AND2 and OR2 gates: not decomposed = primitive base functions
– Large OR cells: safely decomposed into network of OR2 cells
– Large AND cells: not decomposed (decomposition unsafe)
– Existing (Theseus) local cell merger optimizations undone
• through reverse lookup table
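The decomposition rules above can be sketched as a recursive rewrite. The node format `(op, fanins)` and the choice of a balanced OR2 tree are assumptions made for illustration; the slides do not say which tree shape is used, and the reverse lookup step is not modeled.

```python
def decompose(node):
    """Rewrite wide OR nodes into OR2 trees; leave every AND node intact.

    A node is (op, [fanins]); a fanin is either a signal name or another node.
    """
    op, fanins = node
    fanins = [decompose(f) if isinstance(f, tuple) else f for f in fanins]
    if op == "OR" and len(fanins) > 2:
        # Pair fanins level by level into a balanced tree of OR2 nodes.
        while len(fanins) > 1:
            paired = [("OR", [fanins[i], fanins[i + 1]])
                      for i in range(0, len(fanins) - 1, 2)]
            if len(fanins) % 2:
                paired.append(fanins[-1])
            fanins = paired
        return fanins[0]
    # AND2/OR2 are primitive; wider ANDs stay whole as complex base functions.
    return (op, fanins)


wide_or = decompose(("OR", ["a", "b", "c", "d"]))
wide_and = decompose(("AND", ["a", "b", "c"]))
```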
Pattern Graph Generation
• After decomposition and partitioning, subject graphs consist of:
– simple base functions (AND2/OR2)
– complex base functions (irreducible)
• Overview of Pattern Graph Generation:
– Library cell functions: decomposed using simple finite basis.
– If library cell function = complex base function, then:
Experimental Results
[Table: averages correspond to the cell merger figures (16.3% delay, 0.1% area)]
                         Area-optimized circuit   Delay-optimized circuit
Name      #i/#o/#g       Area      Delay          Area      Delay
des-r01   352/64/1079    100.0%    100.2%         101.9%    87.4%
des-r02   11/4/10        95.7%     95.3%          100.0%    73.8%
des-r03   590/298/2186   99.9%     100.0%         104.1%    83.5%
des-r04   590/306/2196   99.9%     100.0%         102.1%    83.6%
des-r05   3/2/4          75.3%     80.5%          83.4%     57.5%
Average improvement      99.9%     99.9%          103.2%    83.7%
Experimental Results (cont.)
[Table: averages correspond to the tech map figures (20.9% delay, 2.7% area)]
                         Area-optimized circuit   Delay-optimized circuit
Name      #i/#o/#g       Area      Delay          Area      Delay
des-r01   352/64/1079    95.5%     100.2%         110.7%    73.3%
des-r02   11/4/10        95.7%     95.3%          155.9%    71.0%
des-r03   590/298/2186   97.8%     100.3%         122.6%    79.7%
des-r04   590/306/2196   97.8%     100.3%         122.3%    80.0%
des-r05   3/2/4          75.0%     80.5%          83.3%     57.5%
Average improvement      97.3%     100.2%         120.4%    79.1%
Experimental Results (cont.)
• Delay Optimization
– Avg. improvement for worst-case output delay
• Considers single worst-case path for each individual output
• Cell Merger: 16.3%
• Tech map: 20.9%
– Avg. improvement for worst-case circuit delay
• Considers the single worst-case path in entire circuit
• Cell Merger: 20.1%
• Tech map: 26.3%
• Area Optimization
– Cell Merger: 0.1% avg. improvement
– Tech map: 2.7% avg. improvement
• Analysis of results
– Due to its local nature, Theseus optimizer left more room for improvement w.r.t. delay than w.r.t. area
Concluding Remarks
• Conclusions
– Robust technology mapping and cell merger algorithms for asynchronous threshold circuits proposed
– Fully automated in software tool
– Avg. improvements for tech map
• Delay: 20.9%
• Area: 2.7%
– Worst-case circuit paths: delay improvements of 26.3%
– Our method to be included in DARPA CLASS async tool flow
• Future Work
– Careful robust extension of DAG-covering algorithms
– Support hybrid cost functions (e.g. area and delay)
Asynchronous Circuits
• Synchronous vs. Asynchronous Systems
Synchronous System                     Asynchronous System
global clock                           no global clock
entire system operates at fixed rate   components operate at varying rates
centralized control                    distributed control (communicate locally via handshaking)
[Figure: synchronous system driven by a clock; asynchronous system with handshaking interfaces]
Asynchronous Circuits (cont.)
• Industrial Applications of Asynchronous Circuits
– Sun:
• UltraSPARC uses async FIFOs for memory interface
– Theseus Logic:
• asynchronous Motorola MCORE processor
– Philips:
• Feb. 2006 press release: asynchronous ARM processor commercially available by ARM Ltd.
Basics of NCL Circuits (cont.)
• Asynchronous Threshold Gates with Hysteresis Property
– Threshold gates with sequential SET/RESET functionality
– SET function:
• Each input $x_i$ has weight $w_i$
• Gate has threshold value $T$
• If weighted sum $\sum_i w_i x_i \ge T$, gate fires
– RESET function:
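The SET rule can be sketched as a small state-holding model. The RESET condition is not given in the text above, so the sketch assumes the gate resets only when all inputs are low; that assumption, the class name `ThresholdGate`, and the 2-of-3 example are not taken from the slides.

```python
class ThresholdGate:
    """Weighted threshold gate that holds its output between SET and RESET."""

    def __init__(self, weights, threshold):
        self.weights = weights
        self.threshold = threshold
        self.output = 0

    def step(self, inputs):
        weighted_sum = sum(w * x for w, x in zip(self.weights, inputs))
        if weighted_sum >= self.threshold:
            self.output = 1     # SET: weighted sum reaches the threshold
        elif not any(inputs):
            self.output = 0     # RESET (assumed condition): all inputs low
        # otherwise the gate holds its previous output (hysteresis)
        return self.output


# Example: a 2-of-3 gate with unit weights.
gate = ThresholdGate(weights=[1, 1, 1], threshold=2)
outputs = [gate.step(x) for x in [(1, 0, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)]]
```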
Asynchronous Circuits
• Benefits of Asynchronous Circuits
– Robustness to process variation
– Mitigates: timing closure problem
– Low power consumption, low EMI
– Modularity
• Challenges of Asynchronous Circuits
– Lack of CAD tools
– Robust design is required: hazard-freedom
– Area overhead