
Demystifying Data-Driven and Pausible Clocking Schemes



Robert Mullins

Computer Architecture Group

Computer Laboratory, University of Cambridge

ASYNC 2007, 13th IEEE International Symposium on

(2)

System-Timing: Emerging Challenges

• Current shift is from complex monolithic designs to networks of energy-efficient cores

• Distinct block and system-level timing challenges

• Network-level timing

– Physically distributed

– Activity may be sparse

– Interconnect delay and power are significant

– Significant variations in temperature, supply voltage and process parameters

Higher-level control, timing and scheduling are naturally event-driven

(3)

Combining Local and Global Approaches to Timing

• Synchronization-free approaches

• Coping with metastability

– Timing-Safe

• Allocate a fixed period of time for metastability to resolve, e.g. two flip-flop synchronizer

– Value-Safe

• Wait for metastability to resolve, e.g. clock stretching or pausing techniques

• Clock is generated locally

• Value-safe ideas are less well understood and avoided by industry
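The trade-off behind the timing-safe option can be made concrete with the textbook synchronizer reliability estimate, MTBF = exp(t_r / tau) / (T_w * f_clk * f_data). The sketch below is not taken from the talk: the function name, the parameter names and all numeric values are illustrative assumptions. It only shows that a timing-safe scheme buys reliability by spending a fixed resolution time, whereas a value-safe scheme waits for resolution and so has no MTBF figure at all.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Textbook MTBF estimate (in seconds) for a timing-safe synchronizer.

    t_resolve: fixed time allowed for metastability to resolve (s)
    tau:       regeneration time constant of the flip-flop (s)
    t_window:  width of the metastability window (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous data transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Assumed, purely illustrative device parameters.
tau = 20e-12
t_window = 30e-12
f_clk = 1e9
f_data = 100e6

# Allowing one clock period versus two clock periods for resolution.
one_cycle = synchronizer_mtbf(1e-9, tau, t_window, f_clk, f_data)
two_cycles = synchronizer_mtbf(2e-9, tau, t_window, f_clk, f_data)

# Each extra period multiplies the MTBF by exp(period / tau),
# but also adds a full period of synchronization latency.
print(two_cycles / one_cycle)
```

The exponential term is why a two flip-flop synchronizer is usually considered adequate, and also why its latency cost cannot be removed without reducing reliability.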

(4)

Advantages of a value-safe approach

• Efficiency

– Synchronization delay is minimized

– Opportunities for optimization

• Robustness

– Inherently robust, no trade-off against performance

– Only way to guarantee data is never lost, no MTBF

– Could still have functional failures if we are delayed too long, i.e. performance requirements are not met

• Transparency

– Synchronous block is unaffected by clocking wrapper

– Less true for traditional synchronization and clock-gating approaches

• Simplicity and modularity

(5)

Adding an asynchronous interface to a clock generator

[Figure, built up over four slides: a clock generator with an asynchronous interface. First variant, labels: C, Req, Ack, CLOCK. Second variant, labels: C, Req, Grant, MUTEX, CLOCK.]

(9)

Input register driven by a pausible clock

(10)

[Figure: the two clock generator templates side by side. Data-Driven Clock, labels: C, Req, Ack, CLOCK. Pausible Clock, labels: C, Req, Grant, MUTEX, CLOCK.]

Data-Driven Clock

- May need to add a mechanism to ensure block receives enough clock edges, e.g. to flush pipeline

Pausible Clock

- Need to add an explicit sleep mechanism if we want to halt clock generator during periods of inactivity

Helps classify and understand existing techniques. In reality, the design space is a continuum
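The need to ensure a block receives enough clock edges to flush its pipeline can be illustrated with a small behavioural model. This is a minimal sketch under stated assumptions, not the mechanism from the paper: `run_pipeline`, its parameters and the rule "one clock edge per arriving input" are hypothetical simplifications of a data-driven clock.

```python
from collections import deque

def run_pipeline(inputs, depth, extra_edges=0):
    """Clock a `depth`-stage pipeline once per input item.

    Assumption: a data-driven clock produces exactly one rising edge per
    arriving item, plus `extra_edges` additional edges supplied by some
    flush mechanism. Returns the items that leave the pipeline.
    """
    stages = deque([None] * depth, maxlen=depth)
    outputs = []

    def clock_edge(new_item):
        leaving = stages[-1]          # contents of the final stage
        stages.appendleft(new_item)   # shift; maxlen drops the final stage
        if leaving is not None:
            outputs.append(leaving)

    for item in inputs:
        clock_edge(item)
    for _ in range(extra_edges):
        clock_edge(None)              # bubble inserted by the flush edges
    return outputs

# Two items enter a 3-stage pipeline and then the inputs go quiet.
stuck = run_pipeline(["a", "b"], depth=3)
flushed = run_pipeline(["a", "b"], depth=3, extra_edges=3)
print(stuck, flushed)
```

Without the extra edges both items remain inside the pipeline, which is the situation the flush mechanism is meant to prevent. A free-running pausible clock does not have this problem, but keeps toggling when idle unless a sleep mechanism is added.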

(11)

Stretchable Clocks

A type of data-driven clock

1. Rising clock edge is generated

2. Stretch signal may be asserted (synchronously) in response to clk+

3. Low-phase of clock is stretched until some operation has completed and stretch signal is removed
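The three steps can be sketched as a calculation of rising-edge times. The model below is an assumption-laden illustration: `stretchable_clock_edges` is a hypothetical helper, the operation is assumed to start at the rising edge, and the stretch is assumed to be released the moment the operation completes.

```python
def stretchable_clock_edges(period, op_times):
    """Return the rising-edge times of a stretchable clock.

    period:   nominal clock period
    op_times: op_times[i] is the duration of the operation started by
              rising edge i (0 means the stretch signal is not asserted)

    Assumption: the next rising edge is generated once the nominal period
    has elapsed AND the stretch signal has been removed.
    """
    edges = [0.0]
    for op_time in op_times:
        # The low phase is extended only when the operation outlasts
        # the nominal period; short operations cost nothing.
        edges.append(edges[-1] + max(period, op_time))
    return edges

# Nominal period of 10 time units; the second cycle needs 25 units.
edges = stretchable_clock_edges(10, [0, 25, 5])
print(edges)
```

Only the cycle with the long operation is lengthened. The cycles on either side keep the nominal period, which is why the synchronous block is largely unaware of the stretching.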

(12)

Stretchable Clocks

[Figure, built up over five slides: stretchable clock generator. Labels: C, Ack, Req, CLOCK, Stretch.]

(17)

Input Ports

• Arbitrated Inputs

– At most one input can be served per cycle

• Synchronised Inputs

– Cannot proceed until multiple inputs are ready

• Sampled Inputs

– Can progress with a variable number of data inputs (or none)

• Also need to choose the event that triggers sampling of inputs

• Paper provides implementation details for each input port type for pausible and data-driven clock generators
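The difference between the three input port types can be expressed as three small selection policies. This is a behavioural sketch only: the function names are hypothetical, `ready` is an assumed list of per-input ready flags, and the fixed-priority choice in `arbitrated` stands in for a real arbiter. Metastability is not modelled.

```python
def arbitrated(ready):
    """Serve at most one ready input per cycle.

    Assumption: lowest index wins; a real design would use an arbiter.
    """
    for index, is_ready in enumerate(ready):
        if is_ready:
            return [index]
    return []

def synchronised(ready):
    """Proceed only when every input is ready.

    Returns None when the clock must wait for the remaining inputs.
    """
    if all(ready):
        return list(range(len(ready)))
    return None

def sampled(ready):
    """Accept whichever inputs are ready this cycle, possibly none."""
    return [index for index, is_ready in enumerate(ready) if is_ready]

ready = [False, True, True]
print(arbitrated(ready), synchronised(ready), sampled(ready))
```

With the same ready flags the three policies accept one input, no inputs (the clock waits) and two inputs respectively.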

(18)

Output Ports

• Scheduled

– Ensure data is output on a particular clock cycle, stall until data is consumed

• Registered

– Addition of an output register allows next computation to proceed while data is consumed

• Polled

– Sample output port ready signal and take appropriate action. Clock period is only ever extended to allow metastability to resolve, not because output is
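The benefit of a registered output port over a scheduled one can be sketched with a simple timing model. Everything here is an assumption made for illustration: the function names, the fixed `compute` and `consume` times, and the rule that a registered port only stalls when its single output register is still occupied. Polled ports are not modelled.

```python
def scheduled_total_time(n_items, compute, consume):
    """Scheduled port: the block stalls until each result is consumed."""
    return n_items * (compute + consume)

def registered_total_time(n_items, compute, consume):
    """Registered port: the next computation overlaps with consumption.

    Assumption: one output register; a result can be written only after
    the previous result has been consumed.
    """
    written = 0    # time the latest result entered the output register
    consumed = 0   # time the latest result left the output register
    for _ in range(n_items):
        # The result is ready `compute` after the previous write, but must
        # wait if the register is still occupied.
        written = max(written + compute, consumed)
        consumed = written + consume
    return consumed

print(scheduled_total_time(3, 4, 3), registered_total_time(3, 4, 3))
```

When consumption is faster than computation the registered port hides the consumer entirely. When it is slower, the register fills and the block is stalled again, only one item later.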

(19)

A GALS Wrapper Example

• Free-running clock

• Asynchronous input

– We know nothing about when data will arrive

– For simplicity, let's assume we can always accept new data

• Registered output feeding asynchronous FIFO

(20)

A GALS Wrapper Example: Step 1.

Local clock generator with

(21)

A GALS Wrapper Example: Step 2.

Pausible Clock Template

(22)

A GALS Wrapper Example: Step 3.

Provide registered output port support (stretchable clock template)

(23)
(24)

Data-Driven Clocking for On-Chip Networks

• Why is global synchrony limiting for on-chip networks?

– Reconfigurable networks, adaptive low-voltage interconnect drivers, irregular topologies, ...

• Problem with traditional synchronization techniques

– Latency (could easily double best-case latency; our routers are single-cycle, support VCs, < 30 FO4)

• Problems with fully-asynchronous implementations

– Latency (for the router designs we have examined)

– More difficult to speculate? Scheduling is expensive?

(25)

Data-Driven Clocking for On-Chip Routers

• Router should be clocked when one or more inputs are valid (or flits are buffered)

• Elevator analogy…

– Free running (paternoster) elevator

• Chain of open compartments

• Must synchronise before you jump on!

– Traditional elevator (data-driven clock)

• Wait for someone to arrive

• Close doors, decide who is in and who is out

• Metastability issue again (potentially painful!)

(26)

Data-Driven Clock with Sampled Inputs

[Figure: Local Clock Generator Template, annotated as follows]

- Sample inputs when at least one input is ready (and clock is low)

- Assert Lock (Close Lift Doors)

- Each input is either admitted or locked out
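The lock step can be sketched as a split of the inputs by arrival time. This is an idealised illustration, not the circuit from the talk: `next_lock_time`, `sample_inputs` and the `lock_delay` parameter are hypothetical, and the model ignores metastability, which in hardware is resolved before an input is declared admitted or locked out.

```python
def next_lock_time(arrival_times, clock_low_from, lock_delay):
    """Time at which Lock is asserted, or None if no input is pending.

    Assumption: Lock is asserted `lock_delay` after the first input is
    ready, but never before the clock has gone low.
    """
    pending = [t for t in arrival_times if t is not None]
    if not pending:
        return None  # nobody is waiting, so the clock is not triggered
    return max(min(pending), clock_low_from) + lock_delay

def sample_inputs(arrival_times, lock_time):
    """Split inputs into those admitted and those locked out."""
    admitted = [i for i, t in enumerate(arrival_times)
                if t is not None and t <= lock_time]
    locked_out = [i for i, t in enumerate(arrival_times)
                  if t is not None and t > lock_time]
    return admitted, locked_out

# Input 1 is idle; input 3 arrives after the doors have closed.
arrivals = [3.0, None, 3.4, 9.0]
lock_time = next_lock_time(arrivals, clock_low_from=2.0, lock_delay=0.5)
admitted, locked_out = sample_inputs(arrivals, lock_time)
print(lock_time, admitted, locked_out)
```

Inputs that are locked out are not lost. They simply wait for the next clock cycle, as a late passenger waits for the next lift.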

(27)

Clock Tree Insertion Delays

• Delay from root to leaf of clock tree can be considerable (certainly non-zero!)

• If every clock cycle is the same, this clock insertion delay is not normally an issue

• If we stretch the clock the insertion delay must be considered in our timing analysis (also true for clock gating in synchronous world)

• Not difficult to handle, but can increase time required to admit new data
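Why a stretched clock exposes the insertion delay can be sketched with a little arithmetic. The model below is an assumption, not an analysis from the paper: the helper names are hypothetical, the clock is taken to have a fixed nominal period, and the stretch request is assumed to reach the clock generator instantly.

```python
def leaf_edge_time(root_edge_time, insertion_delay):
    """A rising edge launched at the root reaches the leaf registers later."""
    return root_edge_time + insertion_delay

def committed_edges(insertion_delay, period):
    """Edges already launched at the root when an earlier edge reaches a leaf.

    Assumption: edges are launched every `period` until a stretch is applied.
    """
    return int(insertion_delay // period)

def earliest_stretchable_edge(k, insertion_delay, period):
    """First root edge that a stretch requested after leaf edge k can delay."""
    return k + committed_edges(insertion_delay, period) + 1

# Insertion delay shorter than a period: only the very next edge is affected.
print(earliest_stretchable_edge(0, 0.25, 1.0))
# Insertion delay of 2.5 periods: two further edges are already in the tree.
print(earliest_stretchable_edge(0, 2.5, 1.0))
```

In the sub-period case the cost is only extra time before new data can be admitted. Once the delay spans several periods, edges already travelling down the tree cannot be recalled, so the cycle on which data is admitted has to be tracked explicitly.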

(28)

Clock Tree Insertion Delays

[Figure annotation: "Can place logic here"]

(29)

Clock Tree Insertion Delays

• How do we handle multi-cycle insertion delays?

• In practice, we would want to avoid very large synchronous blocks

• Need to ensure we admit data on the correct clock cycle

• Cannot cheat and promote data!

We simply remember on which clock cycle data has been scheduled to be

(30)

Summary

• Value-safe techniques are simple and robust

– Powerful framework for composing synchronous sub-systems

– Build efficient event-driven global communication and scheduling infrastructure?

– Scope for supporting low-power techniques? (self-timed power-gating, DVFS support, timing-speculation…)

• Scope for exploiting event-driven scheduling and clocking at system-level

• Synchronization costs are low enough to prompt use in on-chip network applications

• More in the paper, which aims to be a useful survey and hopefully fills some gaps too
