
2.1 Asynchronous Architectures

2.1.1 Pipelined Architectures

Pipelining is a common technique used in both synchronous and asynchronous design to improve the throughput of a design. In pipelining, computation is fragmented into multiple portions that can be performed independently; each portion is given its own dedicated hardware for storage and computation. Because each stage in the pipeline has its own dedicated set of resources, it can operate on a different instance of a problem than its neighbor, much like an assembly line.

In this way, multiple instances of a problem are computed at once, improving overall throughput, although the latency of any single problem instance may increase due to overheads in the pipelining process. Aggressive, fine-grained pipelining can result in very high performance circuits since many problems are being solved concurrently. However, pipelining comes at a cost in area, particularly the increased storage needed to hold the data associated with each problem instance, as well as the area of each dedicated function unit.
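The tradeoff between throughput and per-instance latency can be made concrete with a small idealized model. The sketch below is not taken from this thesis: it assumes equal stage delays and a fixed per-stage overhead, and the function names and numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def unpipelined(n_items, total_delay):
    """One block of logic handles each problem instance from start to finish."""
    latency = total_delay               # time for a single instance
    total_time = n_items * total_delay  # instances are processed back to back
    return latency, total_time


def pipelined(n_items, total_delay, n_stages, overhead):
    """Logic is split evenly into n_stages; each stage adds a fixed overhead
    (an assumed stand-in for latching and handshaking costs)."""
    stage_delay = total_delay / n_stages + overhead
    latency = n_stages * stage_delay    # one instance still visits every stage
    # After the first result emerges, one more result emerges per stage delay.
    total_time = latency + (n_items - 1) * stage_delay
    return latency, total_time


if __name__ == "__main__":
    # Assumed example: 12 time units of logic, 4 stages, 1 unit of overhead.
    print(unpipelined(100, 12))      # (12, 1200)
    print(pipelined(100, 12, 4, 1))  # (16.0, 412.0)
```

In this assumed example the latency of a single instance rises from 12 to 16 time units, while the time to finish 100 instances falls from 1200 to 412.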

2.1.1.1 Pipeline Stages and Styles

In hardware, asynchronous pipelines consist of several pipeline stages that communicate via request-acknowledge handshaking signals (Figure 2.1). Typically, a stage initiates computation when it receives new data and a request from its left neighbor. Once data has been accepted (latched), the left neighbor is acknowledged. The stage may then perform operations on the data and forward the results along with a new request to its right neighbor. This behavior is unlike that of a synchronous approach (Figure 2.2), in which signals are received from a global clock to latch data.

Figure 2.1: Simple asynchronous pipeline

Figure 2.2: Synchronous vs. asynchronous communication (labels: clock, handshaking interface)
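The stage behavior described above (accept data when empty, acknowledge the left neighbor, then forward the result to the right) can be sketched as a token-level simulation. This is a simplified behavioral model under stated assumptions, not a circuit description: each stage holds at most one item, `None` marks an empty stage, time advances in abstract steps, and the name `run_pipeline` is hypothetical.

```python
def run_pipeline(funcs, inputs):
    """Push inputs through a linear pipeline with one function per stage.

    stages[i] holds the latched result of stage i, or None when empty.
    Stage functions are assumed never to return None.
    """
    stages = [None] * len(funcs)
    pending = list(inputs)
    outputs = []
    while pending or any(s is not None for s in stages):
        # The environment on the right accepts the last stage's result.
        if stages[-1] is not None:
            outputs.append(stages[-1])
            stages[-1] = None
        # Visit stages right to left so a token moves at most one stage per step.
        for i in range(len(stages) - 1, 0, -1):
            if stages[i] is None and stages[i - 1] is not None:
                stages[i] = funcs[i](stages[i - 1])  # latch, then compute
                stages[i - 1] = None                 # acknowledge frees the left neighbor
        # The environment on the left offers new data when stage 0 is empty.
        if pending and stages[0] is None:
            stages[0] = funcs[0](pending.pop(0))
    return outputs


if __name__ == "__main__":
    funcs = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_pipeline(funcs, [1, 2, 3]))  # [1, 3, 5]
```

No global clock appears in the model: a transfer happens only when the sender has data and the receiver is empty, which is the local condition the handshake enforces.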

Several techniques exist to transmit data between stages. Two-phase (Sutherland, 1989) and four-phase (Williams, 1991) protocols are used to signal arrival and acceptance of data. In two-phase handshaking, a transition on a request line indicates that new data is available, and a transition on an acknowledge line indicates that the new data has been latched. A four-phase handshake is level-based rather than transition-based. A legal four-phase scenario is as follows: the request goes high to indicate new data, the acknowledge goes high to indicate that the data has been latched, the request then resets to zero, and soon after the acknowledge resets to zero. Other variants are possible depending on the meaning attached to each event. For example, the acknowledge resetting may instead indicate that data has been latched. The exact implementation is up to the designer.
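The difference between the two protocols is easiest to see as a trace of wire events. The following sketch is illustrative only: it models the single four-phase scenario described above, assumes both wires start at zero, and uses hypothetical function names.

```python
def two_phase(n_items):
    """Transition signaling: every toggle of req or ack carries meaning."""
    req = ack = 0
    events = []
    for _ in range(n_items):
        req ^= 1                    # any transition on req: new data available
        events.append(("req", req))
        ack ^= 1                    # any transition on ack: data latched
        events.append(("ack", ack))
    return events


def four_phase(n_items):
    """Level signaling: both wires return to zero after every transfer."""
    events = []
    for _ in range(n_items):
        events += [
            ("req", 1),  # new data available
            ("ack", 1),  # data latched
            ("req", 0),  # request resets
            ("ack", 0),  # acknowledge resets
        ]
    return events


if __name__ == "__main__":
    print(len(two_phase(3)), len(four_phase(3)))  # 6 12
    print(two_phase(2) == four_phase(1))          # True
```

In this model a two-phase transfer costs two wire transitions per data item and a four-phase transfer costs four. The wire activity of two two-phase transfers is identical to that of one four-phase transfer; only the interpretation of the events differs.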

Aside from the variety in handshake protocols, data can also be encoded in several ways. Bundled data (Sutherland, 1989) is a common approach in which a single wire is dedicated to each bit, and the data is bundled with a control signal whose matched delay corresponds to the computation time of the logic between the stages. Dual-rail encoding (Williams, 1991) is a different paradigm in which two wires are associated with each bit of data; some combination of signals on the two wires indicates that computation has completed. This type of encoding is more robust to timing variation, but incurs an additional area penalty due to completion detection.

Several different pipeline styles exist, from GasP (Sutherland and Fairbanks, 2001) to MOUSETRAP (Singh and Nowick, 2001) to Sutherland's micro-pipelines (Sutherland, 1989) to high-capacity pipelines (Singh and Nowick, 2007). The work presented in this thesis targets two-phase, bundled-data pipelines, but is amenable to other styles as well.
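To make the dual-rail idea concrete, the sketch below encodes each bit on a (true rail, false rail) pair and detects completion by checking that every pair holds a valid value. The specific convention used here, with (1, 0) for logic 1, (0, 1) for logic 0, and (0, 0) as an empty spacer, is a common one but is an assumption; the text above does not fix a particular code, and the function names are hypothetical.

```python
SPACER = (0, 0)  # assumed "no data yet" value for one bit


def dual_rail_encode(bits):
    """Map each bit to a (true_rail, false_rail) pair."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(word):
    """Completion detection: every bit has exactly one rail high."""
    return all(t != f for t, f in word)


def dual_rail_decode(word):
    """Recover the bits once the whole word is valid."""
    if not is_complete(word):
        raise ValueError("word is not yet valid on every bit")
    return [t for t, _ in word]


if __name__ == "__main__":
    word = dual_rail_encode([1, 0, 1])
    print(word)                              # [(1, 0), (0, 1), (1, 0)]
    print(is_complete(word))                 # True
    print(is_complete([(1, 0), SPACER]))     # False
    print(dual_rail_decode(word))            # [1, 0, 1]
```

The `is_complete` check stands in for the completion-detection logic that accounts for the area penalty noted above. A bundled-data design omits this check and instead relies on the matched delay being long enough for the logic to settle.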

2.1.1.2 Data-flow Pipelines

In this thesis, I will refer to data-flow pipelines as pipelines in which each individual piece of data travels through the architecture without any unnecessary synchronization. That is, each stage will consist solely of data belonging to one “variable”, and will travel along a channel until it synchronizes with another piece of data only as needed to perform a computation.

This type of pipeline can have a highly complex topology; rather than a flat, linear flow of data, there may be many paths forking and joining throughout the full pipeline. To achieve high performance with this type of pipeline, slack-matching is often needed to match the buffering on one path to that of a parallel path; this will be described in Section 2.1.3. Data-flow pipelines are the specific target of the synthesis approach proposed in Chapter 5.
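The effect of mismatched buffering on a fork and join can be illustrated with a deliberately simple token model. This sketch is not the slack-matching method of Section 2.1.3: it assumes unit-delay stages that each hold one token, a fork that must write both paths at once, and a join that consumes one token from each path; the function names are hypothetical.

```python
def advance(path):
    """Move each token one stage to the right if the next stage is empty."""
    for i in range(len(path) - 1, 0, -1):
        if path[i] is None and path[i - 1] is not None:
            path[i], path[i - 1] = path[i - 1], None


def steps_to_join(n_tokens, len_a, len_b):
    """Count steps until n_tokens have passed a fork, two paths, and a join."""
    a = [None] * len_a
    b = [None] * len_b
    next_token = joined = steps = 0
    while joined < n_tokens:
        steps += 1
        # Join: fires only when both paths present a token at their last stage.
        if a[-1] is not None and b[-1] is not None:
            a[-1] = b[-1] = None
            joined += 1
        advance(a)
        advance(b)
        # Fork: stalls unless the first stage of both paths is empty.
        if next_token < n_tokens and a[0] is None and b[0] is None:
            a[0] = b[0] = next_token
            next_token += 1
    return steps


if __name__ == "__main__":
    print(steps_to_join(10, 1, 3))  # 31: the short path fills up and stalls the fork
    print(steps_to_join(10, 3, 3))  # 13: extra buffers on the short path remove the stall
```

In this model the short path has nowhere to hold a second token while the long path is still working, so the fork stalls. Adding two buffer stages to the short path lets a new token enter on every step.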

2.1.1.3 Data-driven Pipelines

Data-driven pipelines are proposed in Chapter 3 as an alternative to data-flow pipelines. In these pipelines, slack-matching is performed explicitly by construction; large blocks of data are synchronized at once and referred to as the “context” of an individual problem. As data accumulates, the size of each synchronized buffer increases; as data is consumed and is no longer needed in the pipeline, it is dropped from future synchronized buffers. This type of pipeline consists of large, linear blocks with minimal fork and join constructs, in contrast to data-flow pipelines that have lightweight buffers and complex topologies.

While data-flow pipelines are more efficient, data-driven pipelines are found in examples such as common pipelined processors, which may have several stages (e.g., fetch, decode, execute). Data is not passed directly from source to destination, but typically goes through every stage even if a stage does not operate on the data.
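The way a data-driven context grows as values are produced and shrinks as they are consumed can be sketched with a simple liveness calculation. This is an illustration under assumptions, not the construction of Chapter 3: each stage is described only by the variables it defines and uses, and the names `stage_contexts`, `stages`, and `outputs` are hypothetical.

```python
def stage_contexts(stages, outputs):
    """Return the variables latched at the output of each stage.

    stages:  list of (defined_vars, used_vars) pairs in pipeline order.
    outputs: variables the environment reads after the last stage.
    """
    contexts = []
    live = set()
    for i, (defs, _uses) in enumerate(stages):
        live |= set(defs)                 # newly produced data joins the context
        needed_later = set(outputs)
        for _, later_uses in stages[i + 1:]:
            needed_later |= set(later_uses)
        live &= needed_later              # data no longer needed is dropped
        contexts.append(sorted(live))
    return contexts


if __name__ == "__main__":
    # Hypothetical four-stage computation.
    stages = [
        (["a", "b"], []),          # stage 0 reads inputs a and b
        (["c"], ["a", "b"]),       # stage 1 computes c from a and b
        (["d"], ["c", "a"]),       # stage 2 computes d from c and a
        (["e"], ["d"]),            # stage 3 computes e from d
    ]
    print(stage_contexts(stages, ["e"]))
    # [['a', 'b'], ['a', 'c'], ['d'], ['e']]
```

Here `a` is carried through stage 1 even though that stage does not produce it, because stage 2 still needs it. This mirrors the way data in a pipelined processor passes through stages that do not operate on it.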
