IMPROVING THE PERFORMANCE LEVEL OF THE PACKET BASED COMMUNICATION OF NETWORK BY USING 16*16 MULTI CHANNEL ROUTERS


Volume 5, Issue 6, June 2016


C. Suganya¹, A. Bashilabanu²

¹Assistant Professor, Department of ECE, Bharath Niketan Engineering College, India

²PG Student, Department of ECE, Bharath Niketan Engineering College, India

Abstract

As feature sizes continue to shrink and integration density increases, interconnections have become a dominating factor in determining the overall quality of a chip. Because of its limited scalability, a system bus can support only a limited number of functional units and therefore cannot meet the requirements of current System-on-Chip (SoC) implementations. Network-on-Chip (NoC) architectures have been proposed as an alternative that addresses these problems by using a packet-based communication network.

Buffers are one of the major resources used by routers in virtual channel flow control. The main contribution of this work is to find the best solution for a dynamic buffer architecture. This buffer structure should avoid the Head-of-Line (HoL) blocking problem, increase buffer utilization compared to previous designs, and decrease arbitration time. The work also explores and analyzes shared-queue router architectures that maximize buffer utilization to boost network throughput, and proposes a router architecture that allows input packets to bypass the shared queues, reducing zero-load packet latency.

Index Terms: multichannel routers, NoC, Storage Unit, Shared Buffer, IC, SoC, SB, QoS.

I.INTRODUCTION

Intricate designs with a billion transistors, a million gates, and thousands of circuits on a single IC chip pose innumerable challenges to IC designers. The most successful IC designers overcome all such challenges to provide correct, functional, and reliable operation of the ICs. One of the big attractions of integrated electronics is the reduced cost. The cost advantage continues to increase as technology evolves toward the production of larger and larger circuit functions on a single semiconductor substrate. Gordon E. Moore described the cramming of more components onto integrated circuits in 1965, an observation now well known as Moore’s law (the number of transistors on a single chip doubles every 18 to 24 months). This trend was projected to hold true through 2010 and beyond by the International Technology Roadmap for Semiconductors. To keep pace with such intricate levels of integration, design engineers have come up with a new design methodology called System-on-Chip (SoC).

An SoC is an integrated circuit that implements all functions of a complete electronic system. An SoC consists of cores and an interconnection architecture connecting these cores. A NoC is used for the interconnection architecture because it provides a high level of parallelism compared to point-to-point interconnect wires or shared buses. As technology scales down, the gate delay decreases but the wire delay increases in relative terms, and this global wire delay becomes the main factor deciding the overall performance. Difficult timing closure, caused by long global wire delay, becomes the main problem among many design issues. Many VLSI designers are trying to solve this long global wire delay problem through buffer insertion. In addition, many current SoCs use a system bus to connect several functional units. Slave units must comply with the system bus protocol in order to be synchronized with the master unit. However, these SoC system buses can support only a limited number of functional units and thus face scaling problems in heterogeneous MPSoCs (Multi Processor System-on-Chips) or large-scale CMPs (Chip Multi Processors). Even though a multiple-bus structure with bridges, or a bus matrix structure, could serve as alternatives, these solutions still do not scale well and have the disadvantage of high power consumption. To solve these long global wire delay and scalability issues, many studies have suggested the use of a packet-based communication network, known as Network-on-Chip (NoC).


A.Routers

In a NoC, a router sends packets from a source to a destination router through several intermediate nodes. If the head of a packet is blocked during data transmission, the router cannot transfer the packet any further. Various routing techniques are used to remove this blocking problem. Routers route the data according to a chosen protocol. Their job is to deliver messages from their source to their designated destination.

The packet is routed through networks depending on a routing strategy. The routing algorithms could be one of the following two strategies. Deterministic routing such as XY routing is when the routes between given pairs of nodes are pre-programmed and thus follow the same path between two nodes. Adaptive routing is when the path taken by a packet may depend on other packets, and each router should know network traffic status in order to avoid a congested region in advance.
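As a minimal sketch of the deterministic case, the following Python function implements dimension-ordered XY routing on a 2-D mesh. The coordinate convention `(x, y)` and the function name `xy_route` are assumptions made for illustration; they do not come from the router design described in this paper.

```python
def xy_route(src, dst):
    """Return the nodes visited from src to dst, moving along X first, then Y.

    Nodes are (x, y) tuples on a 2-D mesh (an assumed convention).
    """
    x, y = src
    path = [(x, y)]
    # Step along the X dimension until the column matches the destination.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then step along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the path depends only on the source and destination, two packets between the same pair of nodes always take the same route, which is the property the text attributes to deterministic routing.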

B.Switching Technique

Switching mechanisms determine how network resources are allocated for data transmission when the input channel is connected to the output channel selected by the routing algorithm. There are typically four popular switching techniques: store-and-forward, virtual cut-through, wormhole switching, and circuit switching. The first three techniques are categorized as packet-switching methods.

In a store-and-forward switching method, the entire packet has to be stored in the buffer when a packet arrives at an intermediate router.

Fig.B. Store And Forward Switching

After a packet arrives, it can be forwarded to a neighboring node that has enough free buffering space to store the entire packet. This switching technique requires buffering space at least as large as the largest packet, which increases the on-chip area. In addition to the area cost, it can cause large latency because a packet cannot traverse to the next node until the whole packet has been stored.
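The latency penalty can be illustrated with a common idealized zero-load model. This sketch assumes one flit crosses a link per cycle, a fixed per-hop router delay, and no contention; the function names and the parameter `router_delay` are hypothetical and are not taken from the measurements in this paper.

```python
def store_and_forward_cycles(hops, packet_flits, router_delay=1):
    # Every hop must receive the whole packet before forwarding it,
    # so the packet length is paid once per hop.
    return hops * (router_delay + packet_flits)


def cut_through_cycles(hops, packet_flits, router_delay=1):
    # Only the head flit pays the per-hop delay; the remaining flits
    # stream behind it, so the packet length is paid once in total.
    return hops * router_delay + packet_flits


print(store_and_forward_cycles(hops=4, packet_flits=8))  # 36
print(cut_through_cycles(hops=4, packet_flits=8))        # 12
```

Under these assumptions the store-and-forward latency grows with the product of hop count and packet length, while cut-through style switching grows with their sum.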

Fig.B.Virtual Cut-Through Switching Technique

C.Routing Protocol

A routing protocol specifies how routers communicate with each other to disseminate the information that allows them to select routes between any two nodes on a network. In general, a routing protocol can be either deterministic or adaptive. Deterministic routing, such as XY routing, is when the routes between given pairs of nodes are pre-programmed, so packets always follow the same path between two nodes. This can cause congested regions in the network and poor utilization of the network capacity. On the other hand, adaptive routing is when the path taken by a packet may depend on other packets, in order to improve performance and fault tolerance. In an adaptive scheme, each router should know the network traffic status in order to avoid a congested region in advance. In addition, modules that need heavy intercommunication should be placed close to each other to minimize congestion. Adaptive routing can deliver higher performance than deterministic routing in a deadlock-free network.

However, higher performance requires a higher number of virtual channels, and a higher number of virtual channels can cause long latency because of the added design complexity. Therefore, if network traffic is not heavy and in-order packet delivery is required, deterministic routing can be selected.

D.Flow Mechanism

A message is a contiguous group of bits that is delivered from a source node to a destination node. A packet is the basic unit of routing, and each packet is divided into flits. A flit (flow control digit) is the basic unit of bandwidth and storage allocation. Flits do not contain any routing or sequence information and therefore have to follow the route of the whole packet. A packet is composed of a head flit, body flits (data flits), and a tail flit. A head flit allocates channel state for a packet, and a tail flit de-allocates it. The typical flit size is between 16 and 512 bits. A phit (physical transfer digit) is the unit that can be transferred across a channel in a single clock cycle. The typical phit size ranges from 1 to 64 bits.
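The packet structure described above can be sketched as follows. The tuple layout `(kind, payload)`, the placement of the destination in the head flit, the empty tail flit, and the 4-byte (32-bit) flit size are all assumptions chosen for illustration rather than the format used by the routers in this paper.

```python
def packetize(message: bytes, dest, flit_bytes: int = 4):
    """Split a message into one head flit, body flits, and one tail flit."""
    if flit_bytes <= 0:
        raise ValueError("flit_bytes must be positive")
    # The head flit carries the destination and opens the channel state.
    flits = [("head", dest)]
    # Body flits carry the payload in flit-sized chunks.
    for i in range(0, len(message), flit_bytes):
        flits.append(("body", message[i:i + flit_bytes]))
    # The tail flit closes (de-allocates) the channel state.
    flits.append(("tail", None))
    return flits


flits = packetize(b"abcdefghij", dest=(2, 1))
for kind, payload in flits:
    print(kind, payload)
```

A 10-byte message with 4-byte flits yields three body flits, so the packet is five flits long in this sketch.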

Fig.D. Unit Of Resource Allocation

Flow control can be examined in the same way as the switching technique. The role of the flow control mechanism is to decide which data is serviced first when a physical channel has many data items to transfer. In the store-and-forward and virtual cut-through switching methods, flow control is performed at the packet level, which means that an entire packet is stored in buffers and forwarded to a neighboring router.

II.VIRTUAL CHANNEL ROUTER

The deadlock problem in a wormhole (WH) router can be solved by a Virtual Channel router. The concept of virtual channels was introduced to provide deadlock-free routing in wormhole switching networks. This method splits one physical channel into several virtual channels.

A.Virtual Channels

For real-time streaming data, circuit switching supports a reserved, point-to-point connection between a source node and a target node. Circuit switching has two phases: circuit establishment and message transmission.

Before message transmission, a physical path from the source to the destination is reserved.

Fig.A. Concept Of Virtual Channels

When the header flit arrives at the destination node, an acknowledgement (ACK) flit is sent back to the source node. As soon as the source node receives the ACK signal, it transmits the entire message at the full bandwidth of the path. The circuit is released by the destination node or by a tail flit. Even though circuit switching has the overhead of the circuit connection and release phases, it remains worthwhile when a data stream is large enough to amortize that overhead. Since most Network-on-Chip systems have limited buffering space and low latency requirements, the wormhole switching method with virtual channels is the most suitable switching method.

B.VC Router Architecture

In this VC router design, an input buffer has multiple queues in parallel, each of which is called a VC. This allows packets from different queues to bypass each other and advance to the crossbar stage instead of being blocked by a packet at the head of the queue (however, all queues at one input port can still be blocked if none of them wins SA or if all the corresponding output VC queues are full). Because an input port now has multiple VC queues, each packet has to choose a VC of its next router’s input port before arbitrating for the output switch.

Fig.B. Five Stage Virtual Channel Router

Virtual channels share the same physical channel, but these virtual channels are logically separated with different input and output buffers.

C.Buffering in packet switch

In a crossbar switch architecture, buffering is necessary to store packets because the packets arriving at nodes are unscheduled and must be multiplexed according to control information. Three buffering cases arise in a NoC router.

The first buffering condition is that an output port can receive only one packet at a time, so a packet must wait when two packets arrive for the same output port at the same time. The second is that the next stage of the network is blocked, so the packet in the previous stage cannot be routed into the next router. Finally, a packet has to wait for arbitration to obtain its route path in the current router, so the current router must store the packet in a buffer. Accordingly, buffer space can be located in three places: the Output Queue, the Input Queue, and the Central Shared Queue.

Output Queue: In this buffer architecture, output queues can be used if the output buffers are large enough to accept all input packets and the switch fabric runs at least N times faster than the input lines in an N by N switch. However, since such a high-speed switch fabric is currently not available, and each output queue must have as many input ports as there are input lines, the output queue buffer architecture leads to a large logic delay.

Input Queue: Input buffers require only one input port in a packet switch because only one packet can arrive at a time. The architecture can therefore keep performance high even with many input ports, which is why many researchers use the input queue buffer architecture. However, the input queue buffer architecture has the Head-of-Line (HoL) blocking problem: while the packet at the head of the queue waits for its output port, a packet behind it cannot proceed even if its own output port is idle. HoL blocking significantly reduces throughput in a NoC.

Fig.C. Head of Line Blocking

Shared Central Queue: All input ports and output ports can access the shared central buffer. For example, if there are N input ports and N output ports, the central buffer needs at least 2N ports to serve them all. As N increases, the memory access time also increases, which lowers performance, and this large access time is incurred on every packet transmission. In addition to being difficult to implement, the shared central buffer therefore degrades performance because of its large access time.
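The HoL blocking effect described for the input queue can be shown with a small sketch that compares a single FIFO with parallel VC queues at one input port. Each queued packet is represented only by the output port it requests; the port names and both function names are hypothetical, and the model ignores the switch allocation step that later picks one winner per cycle.

```python
from collections import deque


def ready_single_fifo(queue, free_outputs):
    """Packets that can leave a single FIFO input queue this cycle."""
    if queue and queue[0] in free_outputs:
        return [queue[0]]
    # The head is blocked, so every packet behind it waits as well.
    return []


def ready_virtual_channels(vc_queues, free_outputs):
    """Head packets of each VC queue whose requested output is free."""
    return [q[0] for q in vc_queues if q and q[0] in free_outputs]


free = {"N"}  # output "E" is busy, output "N" is idle

# Single FIFO: the packet for "N" is stuck behind the packet for "E".
print(ready_single_fifo(deque(["E", "N"]), free))             # []

# Two VC queues: the packet for "N" can bypass the blocked one.
print(ready_virtual_channels([deque(["E"]), deque(["N"])], free))  # ['N']
```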


D.Challenges in router design

When an output channel of an upstream router is not sending packets, the corresponding input port of its downstream router is idle while other input ports may be busy. This situation frequently happens for nonrandom deterministic traffic patterns after multitask applications are mapped onto a many-core platform. Under these traffic patterns, not all input ports of the routers have packets to process at run time. At many routers in the platform, a few input ports receive packets all the time while the others are often empty. In this situation, it would clearly be desirable for idle queues to share their storage capacity with the busy queues of other input ports. This workload sharing would allow more packets to advance rather than being stalled at upstream routers and hence should improve the network throughput.
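The benefit of letting idle queues lend storage to busy ones can be sketched with a simple capacity count. This is an illustrative model only, not the shared-queue architecture of this paper: it assumes each port has a fixed private depth, that a shared pool has the same total capacity, and it ignores port bandwidth and memory access time. The function names and the example traffic are hypothetical.

```python
def accepted_private(arrivals, depth):
    """Flits accepted when each input port only uses its own queue."""
    return sum(min(a, depth) for a in arrivals)


def accepted_shared(arrivals, depth):
    """Flits accepted when all ports draw from one pool of equal total size."""
    return min(sum(arrivals), depth * len(arrivals))


# Five input ports, four slots each; traffic is concentrated on two ports.
arrivals = [10, 0, 0, 2, 0]
print(accepted_private(arrivals, depth=4))  # 6
print(accepted_shared(arrivals, depth=4))   # 12
```

With skewed traffic the private queues leave most of their slots empty, while the shared pool absorbs the burst; with uniform traffic the two counts coincide.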

III.MULTI CHANNEL ROUTER

In contrast to typical macro networks, which represent general platforms for a large spectrum of applications, most NoCs are developed for one small set of applications. Consequently, the designer has a good understanding of the traffic characteristics and can avoid congestion by wisely mapping the IPs and allocating the routing paths. A desirable feature of a routing algorithm is its freedom from deadlock and livelock. All deterministic routing algorithms are livelock-free. Freedom from deadlock is especially critical for NoCs. Indeed, implementing a mechanism which automatically detects and recovers from deadlock may not be affordable in terms of silicon resources; it also may lead to unpredictable delays. Since the traffic characteristics vary significantly across different applications, it is necessary to reallocate the routing paths when the NoC platform is used for different applications.

A.Structured Wiring

NoCs have been improving on-chip wiring in two distinct ways. First, the packetization paradigm makes it easy to implement communication serialization. A typical on-chip bus requires around 100 to 200 wires: 32 or 64 bits of write data, 32 or 64 bits of read data, 32 bits of address, plus control signals. On the other hand, a NoC sends packets, and can do so by splitting them over multiple cycles in flits. Therefore, it does not, in principle, have constraints over how many wires need to be deployed in parallel. By deploying highly serialized links, routing can be simplified, while area and crosstalk can be minimized. In practice, a lower bound is set by performance needs.

Published NoC research shows that some implementations have gone for a fixed flit width and packet structure, whereby the number of wires is much more manageable than in buses, e.g. 32; some others even allow for complete flexibility, letting designers choose their favorite performance/ wiring tradeoff. A second contribution of NoCs to a tidier wiring implementation is through wire segmentation. As NoC wires are laid point-to-point, as opposed to being multipoint nets in buses, it is possible to optimize NoC topologies to constrain maximum wire lengths.

Fig.A. Multi Channel Router Structure

This is done either by choosing highly regular topologies, e.g. meshes, or by suitable NoC topology synthesis. Furthermore, as shown in Section 3, links can be explicitly segmented to further break critical paths. This is simpler on a NoC than on a bus, where most specifications implicitly assume single-cycle communication among masters and slaves. The multi-channel router design is organized in four rows and four columns of blocks, each block containing a different core cell; the signals for the other 15 core cells are transmitted from the core0 cell.

B.Routability

Bus-based architectures have been extended with components such as crossbars, as in Multilayer AHB [33], whereby fully connected data lanes allow for parallel communication among a plurality of masters and slaves.

Crossbars are successful at providing non-blocking access and minimizing arbitration delays. Unfortunately, if the inputs and outputs of the crossbars are 100- to 200-wires wide as in buses, crossbars may exhibit serious physical wire routability issues.

Fig.B. Routability Structure Multi Channel Router

Due to this, commercial tools often constrain the maximum crossbar size to 8x8 or less. NoCs permit wire serialization, largely obviating the issue. Figure 2, based on , shows that NoC switches of radix 10x10 can be efficiently designed, and even much larger switches are still feasible, though at an area and frequency cost. Alternatively, smaller NoC routers can be chosen, completely solving routability concerns.

C.Synchronization Schemes

To tackle the increasing challenges of global clock distribution in large chips, including the power cost and variability concerns, a variety of Globally Asynchronous Locally Synchronous (GALS) chip design paradigms have been proposed. NoCs offer a natural backbone for the implementation of such approaches. This is because packet-switching networks (i) are distributed, (ii) natively provide ways to tackle heterogeneity, including in timing, and (iii) natively decouple transaction injection and transaction transport times. Among others, fully asynchronous communication and pausible clocking have been proposed and demonstrated. By incorporating all necessary timing adaptation features natively in the on-chip communication framework, designs can converge more quickly and easily, strengthening the “plug & play” view of system composability.

[Figure labels: Core 0 to Core 15]

D.Chip Level Router

A multicore system in which processors communicate through a 2-D mesh network of routers is shown in the figure below. Each router has five ports that connect to four neighboring routers and its local processor. A Network Interface (NI) sits between a processor and its router to transform processor messages into packets to be transferred on the network and vice versa.

Fig.D. Chip Multiprocessors Interconnected By Network of Mc Router

Generally, a NoC router has five input and output ports, one for the local Processing Element (PE) and one for each of the four directions: North, South, West, and East. Each router also has five components: Routing Computation (RC) Unit, Virtual Channel Allocator (VA), Switch Allocator (SA), Flit Buffers (BUF), and Crossbar, as can be seen in Figure 3.8 above. When the header flit arrives at the internal flit buffer, the RC unit directs the incoming flits to one of the physical channels.

The Virtual Channel Allocation unit receives the credit information from the neighboring routers, arbitrates among all the header flits that request the same VC, and then selects one of them according to the arbitration policy. This header flit thereby sets up the path along which the following data and tail flits traverse the route. In a typical router, each input port has an input buffer for temporarily storing packets in case the output channel is busy. This buffer can be a single queue, as in a Wormhole (WH) router, or multiple queues in parallel, as in Virtual Channel (VC) routers. These buffers consume a significant portion of router area and power, which can be more than 60% of the whole router.

Bufferless routers remove buffers from the router and hence save considerable area. However, their performance becomes poor when packet injection rates are high. Because they have no buffers, previous bufferless router designs drop and retransmit packets, or deflect them, once network contention occurs, which can consume even more energy per packet than a router with buffers. The transmitting router sends control information to the receiving router, and the receiving router may update the VC ID in its internal buffer with this control information. In addition, the proposed router architecture has simple control circuitry, so it dissipates less energy per packet than VC routers, and it achieves higher throughput by letting queues share workloads when the network load becomes heavy. The Switch Allocation (SA) unit arbitrates among the waiting flits in all VCs that request the crossbar and allows only one flit to obtain crossbar permission.

The SA operation is based on the VA stage since the flit data in the buffer comes from the previous router in the route.
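The credit information exchanged between neighboring routers can be illustrated with a generic credit-based flow control sketch. This is a textbook-style model written under stated assumptions, not the credit logic simulated in this paper: the class name `CreditedLink`, its methods, and the buffer depth of two slots are all hypothetical.

```python
class CreditedLink:
    """Upstream sender plus downstream buffer joined by a credit counter."""

    def __init__(self, buffer_slots):
        # Credits are the free downstream slots the sender believes exist.
        self.credits = buffer_slots
        self.downstream = []

    def send(self, flit):
        """Send a flit if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def drain_one(self):
        """Downstream forwards one flit and returns a credit upstream."""
        if not self.downstream:
            return None
        flit = self.downstream.pop(0)
        self.credits += 1
        return flit


link = CreditedLink(buffer_slots=2)
print(link.send("a"), link.send("b"), link.send("c"))  # True True False
print(link.drain_one())                                 # a
print(link.send("c"))                                   # True
```

The sender never overruns the downstream buffer because it stops as soon as its credit count reaches zero and resumes only when a credit comes back.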

IV.EXISTING WORK

Router architectures have been designed and developed for WH and VC routers. In these schemes, single-queue and multi-queue methods are used for storing and transferring packets, and the work focuses mainly on communication performance. Deadlock is avoided by using 2x2 routers, and wasted time is avoided by using 4x4 routers.

DRAWBACKS:

Blocking problems occur during packet transfer, and congestion arises in buffer utilisation. The transfer performance is therefore poor.

V.PROPOSED WORK:

To overcome the limitations of the existing methods, multiple channels are introduced by using 16x16 routers, raising the performance level above that of the earlier schemes.

VI.REAL TIME APPLICATION

NoCs are a key enabling technology for the provision of many additional services, ranging from different Quality of Service (QoS) levels to fault tolerance. Apart from global communications, the other major challenge facing designers now is high power dissipation. Power dissipation issues have grown to such importance that they now directly constrain attainable performance. Additionally, technology trends suggest that with further technology scaling, communication power will demand an increasing proportion of the already limited system power budgets. For NoCs, it is therefore now important to understand any performance benefits they can deliver in the context of the power costs they demand. Networks enable the use of fault-tolerant wiring and protocols. The network handles both pre-scheduled and dynamic traffic.

A. Example On-Chip Interconnection Network With Routers

To give a flavor for on-chip interconnection networks this section sketches the design of a simple network.

Consider a 12mm x 12mm chip in 0.1µm CMOS technology with a 0.5µm minimum wire pitch. As shown in Figure 4.1 below, we divide this chip into 16 tiles of 3mm x 3mm each. A system is composed by placing client logic (e.g., processors, DSPs, peripheral controllers, memory subsystems, etc.) into the tiles. The client logic blocks communicate with one another only over the network. There are no top-level connections other than the network wires.

Fig.A. Partitioning The Die Into Network Logic

The network logic occupies a small amount of area between the tiles and consumes a portion of the top two metal layers for network interconnect. This baseline network uses a 2-dimensional folded torus topology with the nodes 0-3 in each row cyclically connected in the order 0,2,3,1. I/O pads may connect directly to adjacent tiles or may be addressed as special clients of the network.
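The cyclic order 0,2,3,1 can be examined with a short sketch. It assumes that the node numbers 0 to 3 also give each tile's physical position along the row, measured in tile pitches; that interpretation, and the helper names, are assumptions made for illustration.

```python
FOLDED_ORDER = [0, 2, 3, 1]    # cyclic order given in the text
UNFOLDED_ORDER = [0, 1, 2, 3]  # plain ring, for comparison


def ring_neighbors(order):
    """Map each node to its (previous, next) neighbor in the cyclic order."""
    n = len(order)
    return {node: (order[(i - 1) % n], order[(i + 1) % n])
            for i, node in enumerate(order)}


def longest_link(order):
    """Longest physical link, in tile pitches, including the wrap-around."""
    n = len(order)
    return max(abs(order[i] - order[(i + 1) % n]) for i in range(n))


print(ring_neighbors(FOLDED_ORDER))
print(longest_link(FOLDED_ORDER))    # 2
print(longest_link(UNFOLDED_ORDER))  # 3
```

Under this assumption, folding limits every link to at most two tile pitches, whereas a plain ring needs one wrap-around link spanning the whole row.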

VII.RESULTS AND DISCUSSIONS

A. VIRTUAL CHANNEL PARALLEL QUEUE

Fig.A.Vc Router Parallel Queue Data Transmission

The Virtual Channel router architecture has been designed with a parallel queue structure. Once the enable signal is asserted and one buffer is busy holding data, the parallel buffer that is free takes the incoming data, and once the crossbar is free the router initiates the data transmission.

B. Core 0 Flit Signal Transmission Vc Router

Fig.B. Core0 Flit Signal Transmission For A Vc Router

In the Virtual Channel router, the buffer block is designed with clock, rd_ack, rst and enable as control signals, s_in as the data input signal, valid as the output control signal, and s_out, an 8-bit signal, as the data output. A write operation is performed when the en signal goes high: the data on s_in is transferred to the s_out output signal. Once all the data has been transferred, the valid signal goes high.

C.Core 0 Credit Signal Generation Vc Router

Fig.C. Core0 Credit Signal Transmission For Vc Router

The Virtual Channel router design has four core blocks: core0, core1, core2 and core3. Information is transmitted from core0 to core1, core2 and core3. Once the enable signal is asserted, the input data is transmitted, which in turn triggers the credit signal generation. The simulated results above show the core0 credit signal generation for the VC router.

D.Core0 To Core3 Credit Transmission Vc Router

Fig. D. Core0 To Core3 Signal Gen Transmission Vc Router

The Virtual Channel router design has four core blocks: core0, core1, core2 and core3. Information is transmitted from core0 to core1, core2 and core3. Once the enable signal is asserted, the input data is transmitted, which in turn triggers the credit signal generation. The simulated results above show the credit signals generated from core0 to core3 for the virtual router design.

E.Core0 To Core1 Credit Transmission Multichannel Router

Fig E. Core0 To Core1 Signal Gen Transmission Multi Channel Router

The Virtual Channel router design has four core blocks: core0, core1, core2 and core3. Information is transmitted from core0 to core1. Once the enable signal is asserted, the input data is transmitted, which in turn triggers the credit signal generation. The simulated results above show the credit signals generated from core0 to core1 for the virtual router design.

F.Core0 To Core2 Credit Transmission Multi Channel Router

Fig F. Core0 To Core2 Signal Gen Transmission Multi Channel Router

Information is transmitted from core0 to core2, and the credit signal generation for the MC router is shown. Once the enable signal is asserted, the input data is transmitted, which in turn triggers the credit signal generation. The simulated results above show the credit signals generated from core0 to core2 for the virtual router design.

G.Core0 To Core4 Credit Transmission Multi Channel Router

Fig.G. Core0 To Core4 Signal Gen Transmission Multi Channel Router

F.Core0 To Core5 Credit Transmission Multi Channel Router

Fig F. Core0 To Core5 Signal Gen Transmission Multi Channel Router

G. Core0 To Core6 Credit Transmission Multi Channel Router

Fig G. Core0 To Core6 Signal Gen Transmission Multi Channel Router

H.Core0 To Core10 Credit Transmission Multi Channel Router

Fig H. Core0 To Core10 Signal Gen Transmission Multi Channel Router

I.Core0 To Core14 Credit Transmission Multi Channel Router

Fig I. Core0 To Core14 Signal Gen Transmission Multi Channel Router


J.Core0 To Core15 Credit Transmission Multi Channel Router

Fig J. Core0 To Core15 Signal Gen Transmission Multi Channel Router

I.Core0 To All 15 One By One Credit Transmission Multi Channel Router

Fig I. Core0 To All 15 One By One Signal Gen Transmission Multi Channel Router

Information is transmitted from core0 to all 15 other cores, one by one, and the credit signal generation for the MC router is shown. Once the enable signal is asserted, the input data is transmitted, which in turn triggers the credit signal generation. The simulated results above show the credit signals generated as core0 transmits to all 15 cores one by one for the virtual router design.

TABLE A. LATENCY RESULTS COMPARISON

| Traffic Pattern | VC Router Latency (Cycles) | MC Router Latency (Cycles) |
|---|---|---|
| Buffer Write | 12 | 8 |
| Core0 to Core1 | 28 | 22 |
| Core0 local to router | 15 | 12 |
| Core0 to core3 | 38 | 32 |
| Credit Path | 45 | 34 |

Table 6.1 above compares the latency of the VC router and the MC router; the MC router achieves lower latency than the VC router for every traffic pattern listed. The latency is counted as the number of clock cycles from the cycle in which the enable signal is triggered to the cycle in which the data valid signal appears, and it determines the response speed of the router. We can therefore conclude that the multi-channel router performs better than the VC router.
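The size of the improvement can be read directly from the table. The following sketch computes the cycles saved and the relative reduction for each row, using only the latency figures reported above; the helper name `reduction_percent` is introduced here for illustration.

```python
# (VC router cycles, MC router cycles) as reported in the latency table.
LATENCY = {
    "Buffer Write": (12, 8),
    "Core0 to Core1": (28, 22),
    "Core0 local to router": (15, 12),
    "Core0 to core3": (38, 32),
    "Credit Path": (45, 34),
}


def reduction_percent(vc_cycles, mc_cycles):
    """Relative latency reduction of the MC router versus the VC router."""
    return 100.0 * (vc_cycles - mc_cycles) / vc_cycles


for pattern, (vc, mc) in LATENCY.items():
    saved = vc - mc
    print(f"{pattern}: {saved} cycles saved "
          f"({reduction_percent(vc, mc):.1f}% lower)")
```

The relative reduction ranges from about 16% (Core0 to core3) to about 33% (Buffer Write).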

VIII.CONCLUSION

Implementing the 16x16 router architecture maximizes buffer utilisation and speeds up path identification, which improves time efficiency and yields high throughput. It also handles heavy loads better than 4x4 routers. In rare cases, the HoL blocking problem can be avoided, which reduces zero-load packet latency, and traffic congestion can therefore be avoided. The implementation was carried out using Icarus software.

IX.FUTURE ENHANCEMENT

To store and transfer more packets, the design can be extended to a 25x25 router for better performance.
