
A.2 Communication between IP components

A.2.6 Constraint

In the current NoC, out-of-order communication is not supported between an M_S/Sys and an S_S/Sys. The system designer therefore has to ensure that an M_S/Sys does not send another request packet to the same S_S/Sys while the response packet to a previous request packet of the same type is still outstanding. This constraint does not apply to communication between two M_S/Sys if both are able to handle out-of-order communication.
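The constraint can be enforced in software on the master side. The following is a minimal sketch, assuming that a request is identified by the destination node and its packet type and that PCK_RR and PCK_WR are the responses to PCK_R and PCK_W. The class and method names are hypothetical and do not come from the NoC implementation.

```python
class OutstandingRequestTracker:
    """Per-master bookkeeping of requests that still await a response.

    Hypothetical helper: the key is (slave node id, request type), where the
    request type is "PCK_R" or "PCK_W".
    """

    def __init__(self):
        self._pending = set()

    def can_send(self, slave_id, req_type):
        """True if no request of this type to this slave is outstanding."""
        return (slave_id, req_type) not in self._pending

    def mark_sent(self, slave_id, req_type):
        """Record a request; refuse it if it would violate the constraint."""
        if not self.can_send(slave_id, req_type):
            raise RuntimeError(
                f"{req_type} to slave {slave_id} is still awaiting its response"
            )
        self._pending.add((slave_id, req_type))

    def mark_response(self, slave_id, req_type):
        """Clear the entry once PCK_RR (for PCK_R) or PCK_WR (for PCK_W) arrives."""
        self._pending.discard((slave_id, req_type))
```

A master would call `mark_sent` before writing a request to NI_IN and `mark_response` in the interrupt handler that reads the response from NI_OUT. Requests of a different type, or to a different S_S/Sys, are not blocked.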

Table A.3: Transmission of a PCK_R

| Step | Direction | Description |
|---|---|---|
| 1 | M_S/Sys → NI | M_S/Sys writes packet routing header, operation code, remote read address and data size to register NI_IN. NI creates a PCK_R and sends it. |
| - | NoC | Packet is transferred over NoC and reaches destination node. |
| 2 | NI → S_S/Sys | NI notifies S_S/Sys via interrupt. |
| 3 | S_S/Sys ← NI | S_S/Sys reads the first flit(s) for getting packet routing header, operation code, read address and data size from NI_OUT. The following information is extracted from the packet header: payload size, priority, src/dst node. |
| 4 | S_S/Sys | S_S/Sys stores src/dst, payload size, priority information for creating PCK_RR later. |

Table A.4: Transmission of a PCK_RR

| Step | Direction | Description |
|---|---|---|
| 1 | S_S/Sys → NI | S_S/Sys writes packet routing header, operation code and the first data bytes to register NI_IN. The header information is obtained from the corresponding PCK_R. NI obtains the knowledge of data payload size. |
| 2 | S_S/Sys → NI | S_S/Sys writes data repeatedly to register NI_IN. In parallel, NI creates and sends the packet. |
| - | NoC | Packet is transferred over NoC and reaches destination node. |
| 3 | NI → M_S/Sys | NI notifies M_S/Sys via interrupt. |
| 4 | M_S/Sys ← NI | M_S/Sys reads the first flit(s) for getting packet routing header, operation code and the first data bytes from NI_OUT. The following information is extracted from the packet header: payload size, src/dst node. |
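The read transaction of Tables A.3 and A.4 can be sketched as a pair of functions. This is an illustrative model only: packets are plain dictionaries, the field names are hypothetical, the slave memory is a `bytearray`, and the response is assumed to travel back with source and destination swapped and the priority of the request.

```python
def make_pck_r(src, dst, priority, address, size):
    """Fields the master writes to NI_IN in step 1 of Table A.3."""
    return {
        "header": {"src": src, "dst": dst, "priority": priority},
        "opcode": "PCK_R",
        "address": address,
        "size": size,
    }


def serve_read(pck_r, memory):
    """Slave side: keep the request header, then build the matching PCK_RR."""
    hdr = pck_r["header"]
    start = pck_r["address"]
    data = bytes(memory[start:start + pck_r["size"]])
    return {
        "header": {
            "src": hdr["dst"],          # assumed: the response travels back
            "dst": hdr["src"],          # to the requesting node
            "priority": hdr["priority"],
            "payload_size": len(data),
        },
        "opcode": "PCK_RR",
        "data": data,
    }
```

The sketch leaves out flit segmentation, the interrupt notification and the NI registers themselves. It only shows which information from the PCK_R is reused for the PCK_RR.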

Table A.5: Transmission of a PCK_W

| Step | Direction | Description |
|---|---|---|
| 1 | M_S/Sys → NI | M_S/Sys writes packet routing header, operation code, remote write address and the first data bytes to register NI_IN. NI obtains the knowledge of data payload size. |
| 2 | M_S/Sys → NI | M_S/Sys writes the remaining payload data to NI_IN. In parallel, NI creates and sends the packet. |
| - | NoC | Packet is transferred over NoC and reaches destination node. |
| 3 | NI → S_S/Sys | NI notifies S_S/Sys via interrupt. |
| 4 | S_S/Sys ← NI | S_S/Sys reads the first flit(s) for getting packet routing header, operation code, write address and the first data bytes from NI_OUT. The following information is extracted from the packet header: payload size, priority, src/dst node. |
| 5 | S_S/Sys ← NI | S_S/Sys reads the remaining payload data from NI_OUT and writes the data to target address. |
| 6 | S_S/Sys | S_S/Sys stores src/dst, priority information for creating PCK_WR later. |

Table A.6: Transmission of a PCK_WR

| Step | Direction | Description |
|---|---|---|
| 1 | S_S/Sys → NI | S_S/Sys writes packet routing header and the operation code to register NI_IN. The header information is obtained from the corresponding PCK_W. NI creates a PCK_WR and sends it. |
| - | NoC | Packet is transferred over NoC and reaches destination node. |
| 2 | NI → M_S/Sys | NI notifies M_S/Sys via interrupt. |
| 3 | M_S/Sys ← NI | M_S/Sys reads flit(s) for getting packet routing header and operation code from NI_OUT. The following information is extracted from the packet header: src/dst node. |
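The write transaction of Tables A.5 and A.6 differs from the read case in that the payload is streamed through NI_IN word by word. The sketch below models that sequence of register writes and the slave-side handling. The register word width of 4 bytes, the tuple encoding of a register write and all function names are assumptions made for illustration.

```python
def pck_w_register_writes(header, address, data, word_bytes=4):
    """Sequence of values a master would write to NI_IN for one PCK_W.

    word_bytes is an assumed register width, not a documented value.
    """
    writes = [("header", header), ("opcode", "PCK_W"), ("address", address)]
    for offset in range(0, len(data), word_bytes):
        writes.append(("data", data[offset:offset + word_bytes]))
    return writes


def serve_write(writes, memory):
    """Slave side: collect the payload, store it, and build the PCK_WR."""
    fields = {}
    payload = bytearray()
    for kind, value in writes:
        if kind == "data":
            payload += value
        else:
            fields[kind] = value
    start = fields["address"]
    memory[start:start + len(payload)] = payload  # write to the target address
    hdr = fields["header"]
    # PCK_WR carries only routing header and operation code (Table A.6).
    return {
        "header": {"src": hdr["dst"], "dst": hdr["src"],
                   "priority": hdr["priority"]},
        "opcode": "PCK_WR",
    }
```

In this model the acknowledgement has no payload, which mirrors step 1 of Table A.6, where only the routing header and the operation code are written to NI_IN.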

Table A.7: Transmission of a PCK_S

| Step | Direction | Description |
|---|---|---|
| 1 | M_S/Sys1 → NI | M_S/Sys1 writes packet routing header, operation code and synchronization value to register NI_IN. |
| - | NoC | Packet is transferred over NoC and reaches destination node. |
| 2 | NI → M_S/Sys2 | NI notifies M_S/Sys2 via interrupt. |
| 3 | M_S/Sys2 ← NI | M_S/Sys2 reads flit(s) for getting packet routing header, operation code and synchronization value from NI_OUT. The following information is extracted from the packet header: payload size, src/dst node. |
| 4 | M_S/Sys2 | M_S/Sys2 might store src/dst, synchronization value information for creating another PCK_S as response later. |
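Steps 3 and 4 of Table A.7 can be illustrated with a small receiver-side sketch. The dictionary layout, the `mailbox` that maps a sender node to its last synchronization value, and both function names are hypothetical. The optional reply is assumed to be a PCK_S sent back to the original sender.

```python
def receive_pck_s(pck_s, mailbox):
    """Store the sender and its synchronization value (Table A.7, step 4)."""
    hdr = pck_s["header"]
    mailbox[hdr["src"]] = pck_s["sync_value"]


def reply_pck_s(pck_s, sync_value):
    """Build another PCK_S as a response, addressed to the original sender."""
    hdr = pck_s["header"]
    return {
        "header": {"src": hdr["dst"], "dst": hdr["src"]},
        "opcode": "PCK_S",
        "sync_value": sync_value,
    }
```

Because both endpoints are master subsystems, no response is required by the protocol. Whether `reply_pck_s` is called at all is up to the application.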

Acronyms

ACE AXI Coherency Extension

ACO Ant Colony Optimization

ADL Architecture Description Language

AHB Advanced High-Performance Bus

AMBA Advanced Microcontroller Bus Architecture

API Application Programming Interface

ASIC Application-Specific Integrated Circuit

ASIP Application-Specific Instruction-set Processor

AT Area-Time product

ATE Area-Time-Energy product

AXI Advanced Extensible Interface

BER Bit Error Rate

CCMU Cache Coherence Management Unit

CGRA Coarse-Grained Reconfigurable Architecture

CKF Compiler-Known Function

CP Constraint Programming

CP Cyclic Prefix

CSDF Cyclo-static Synchronous Data Flow

DFG Data Flow Graph

DMA Direct Memory Access

DSP Digital Signal Processor

DT Data flit of a NoC packet

DVFS Dynamic Voltage-Frequency Scaling

DW Data Width

FCFS First-Come First-Served

FDD Frequency Division Duplex

FFT Fast Fourier Transform

FIFO First In, First Out

flit flow control unit

FPGA Field-Programmable Gate Array

fps Frames per Second

FSM Finite State Machine

GA Genetic Algorithm

GE equivalent gate count in units of two-input drive-one NAND gate

GPP General Purpose Processor

GPU Graphics Processing Unit

H-ARQ Hybrid Automatic Repeat reQuest

HD Header flit of a NoC packet

HDT Header-Data-Tail flit of a NoC packet

HOSK Hardware Operating System Kernel

HW-RTOS HardWare Real Time Operating System

IDCT Inverse Discrete Cosine Transform

ILP Integer Linear Programming

IQT Inverse Quantization

ISS Institute for Integrated Signal Processing Systems at the RWTH Aachen University

ISS Instruction-Set Simulator

ITRS International Technology Roadmap for Semiconductors

LDPC Low-Density Parity-Check

LISA Language for Instruction Set Architecture

LLR Log-Likelihood Ratio

LNA Low-Noise Amplifier

LRU Least Recently Used

LTE Long Term Evolution

LTRISC a template RISC processor provided by Synopsys Processor Designer

MAC Medium Access Control

MIMO Multiple-Input Multiple-Output

MMSE Minimum Mean Square Error

MoC Model of Computation

MP Matching Pursuit

MPSoC Multi-Processor System-on-Chip

M_S/Sys Master subsystem

NI Network Interface

NoC Network-on-Chip

OFDM Orthogonal Frequency-Division Multiplexing

OMAP Open Multimedia Application Platform

OS Operating System

PC Personal Computer

PC Program Counter

PE Processing Element

PHY PHYsical layer

PSP Processor-Support Package

QAM Quadrature Amplitude Modulation

QoS Quality of Service

QRD QR Decomposition

RB Resource Block

RF Radio Frequency

RISC Reduced Instruction Set Computer

RR Round-Robin

RTL Register-Transfer Level

S/Sys subsystem

SA Simulated Annealing

SDFG Synchronous Data Flow Graph

SDF Synchronous Data Flow

SD Sphere-Decoding

SIMD Single Instruction Multiple Data

SMP Symmetric Multiprocessing

SPE Synergistic Processing Element

SQRD Sorted QR Decomposition

S_S/Sys Slave subsystem

SSRAM Synchronous Static Random Access Memory

TD Turbo Decoding

TI Texas Instruments

TL Tail flit of a NoC packet

TL Transaction-Level

TS Tabu Search

TTI Transmission Time Interval

VC Virtual Channel

VLIW Very Long Instruction Word

VPU Virtual Processing Unit

List of Figures

1.1 ITRS 2012: Predictions for number of transistors per chip at production and clock frequency

1.2 Qualitative comparison between flexibility, performance and power consumption of different PE types (adapted from [23])

2.1 An exemplary system connected by a bus

2.2 An exemplary system connected by a NoC, with 8 PE nodes and 8 memory nodes

3.1 Basic concept of OSIP-based systems

3.2 OSIP software layer

3.3 Hardware integration of OSIP-based systems

3.4 Design flow for OSIP development

3.5 An exemplary scheduling hierarchy

3.6 Instruction-level profiling of C-implementation

3.7 Pipeline structure of the OSIP Core

3.8 Structure of AGU for OSIP_DT

3.9 Load/Store of OSIP_DT and normal data

3.10 Update OSIP_DT

3.11 Hardware-supported node comparison

3.12 Compare nodes and Compare nodes & Continue

3.13 Control of the OSIP state at the PFE-Stage

3.14 Execution cycles of “hot-spot” commands and their occurrence frequency in percentage

3.15 Power profile of OSIP

4.1 Setup of baseline system

4.2 Task graph and OSIP configuration of the synthetic application

4.3 Task graph and OSIP configuration of H.264

4.4 Synthetic application: OSIP efficiency analysis in systems with an idealized communication architecture

4.5 Synthetic application: Energy consumption ratio between LT-OSIP and OSIP

4.6 H.264: OSIP efficiency analysis in systems with an idealized communication architecture

4.7 H.264: Energy consumption ratio between LT-OSIP and OSIP

4.8 H.264: Impact of the communication architecture in OSIP-based systems

4.9 H.264: Composition of the execution time from PE’s view

4.10 H.264: Impact of the communication architecture on the OSIP state

4.11 Cache system

4.12 An example of preventing data race conditions for write buffers by using spinlocks

4.13 H.264: Frame rate at different optimization levels

4.14 H.264: Composition of the execution time at different optimization levels of the communication architecture in the 11-PE-system

4.15 Synthetic application: Execution time at different optimization levels

4.16 H.264: Joint impact of OSIP and the communication architecture

4.17 Synthetic application: Joint impact of OSIP and the communication architecture in best case scenario

4.18 Synthetic application: Joint impact of OSIP and the communication architecture in worst case scenario

4.19 Synthetic application: Joint impact of OSIP and the communication architecture in average case scenario

5.1 Impact of the spinlock acquisition order

5.2 Basic spinlock control flow

5.3 Enhanced spinlock control flow

5.4 Flow of granting a spinlock request

5.5 Block diagram of triggering the OSIP core for spinlock reservation

5.6 Synthetic application: Performance improvement based on spinlock reservation

5.7 H.264: Performance improvement based on spinlock reservation

5.8 Synthetic application: Comparison of performance improvement using spinlock reservation between OSIP-, LT-OSIP- and UT-OSIP-based systems

5.9 H.264: Comparison of performance improvement using spinlock reservation between OSIP-, LT-OSIP- and UT-OSIP-based systems

6.1 OSIP subsystem for NoC integration

6.2 An exemplary NoC-based system with integrated OSIP

6.3 2D mesh-like NoC topology

6.4 Structure of multi-flit packets and single-flit packets

6.5 Router block diagram

6.6 NI block diagram

6.7 Power profile: Contribution of different power groups

6.8 Block diagram of a MIMO-OFDM doubly iterative receiver

6.9 Radio frame structure of LTE (FDD mode, normal CP)

6.10 Pilot pattern of 2×2 MIMO system

6.11 CSDF of 2×2 digital MIMO-OFDM receiver

6.12 Node assignment in the NoC-based system

6.13 Structure of VPU subsystem

6.14 Block diagram of OSIP subsystem

6.15 Throughputs and latencies of MIMO-OFDM receiver with different NoC configurations

6.16 Structure of VPU subsystem with DMA

6.17 Two execution flows of VPU subsystems

6.18 Improvement of system performance with enhancements in the NoC-based communication architecture

6.19 Joint impact of OSIP and NoC-based communication architecture on system performance

6.20 OSIP busy state at different NoC-based communication architectures

6.21 An example of issuing a command to OSIP based on fine-grained communications between PE and proxy

6.22 Examples of multi-OSIP-systems in different organizations

List of Tables

2.1 List of task management references in the literature

3.1 Synthesis results of OSIP and LT-OSIP

3.2 Power consumption of OSIP and LT-OSIP

6.1 Power consumption of NoC

6.2 System parameters for achieving a throughput of 150 Mbit/s in a 2×2 MIMO system

6.3 Selected implementations of algorithmic kernels

6.4 Definition of VPUs

A.1 Packet Types

A.2 Payload structure of packet types

A.3 Transmission of a PCK_R

A.4 Transmission of a PCK_RR

A.5 Transmission of a PCK_W

A.6 Transmission of a PCK_WR

A.7 Transmission of a PCK_S

[1] “Open Virtual Platforms.” Online: http://www.ovpworld.org

[2] 3GPP, “3GPP Release 8.” Online: http://www.3gpp.org/specifications/releases/72-release-8

[3] B. Ackland, A. Anesko, D. Brinthaupt, S. Daubert, A. Kalavade, J. Knobloch, E. Micca, M. Moturi, C. Nicol, J. O’Neill, J. Othmer, E. Sackinger, K. Singh, J. Sweet, C. Terman, and J. Williams, “A Single-Chip, 1.6-Billion, 16-b MAC/s Multiprocessor DSP,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 412–424, March 2000.

[4] W. Ahmed, M. Shafique, L. Bauer, and J. Henkel, “Adaptive Resource Management for Simultaneous Multitasking in Mixed-Grained Reconfigurable Multi-Core Processors,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Taipei, Taiwan, Oct 2011, pp. 365–374.

[5] M. Al Faruque, R. Krist, and J. Henkel, “ADAM: Run-Time Agent-Based Distributed Application Mapping for on-chip Communication,” in Proceedings of Design Automation Conference (DAC), Anaheim, CA, USA, 2008, pp. 760–765.

[6] M. Amos, Theoretical and Experimental DNA Computation, ser. Natural Computing Series. Springer, June 2005, vol. XIII, ISBN 978-3-540-28131-3.

[7] T. E. Anderson, “The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors,” IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 1, pp. 6–16, Jan. 1990.

[8] A. Andriahantenaina, H. Charlery, A. Greiner, L. Mortiez, and C. Zeferino, “SPIN: A Scalable, Packet Switched, On-Chip Micro-Network,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2003, pp. 70–73.

[9] A. Andriahantenaina and A. Greiner, “Micro-network for SoC: Implementation of a 32-port SPIN network,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2003, pp. 1128–1129.

[10] ARM, “AMBA System Architecture.” Online: http://www.arm.com/products/system-ip/amba-specifications

[11] ARM, “ARM MPCore Technology.” Online: http://www.arm.com/products/processors/cortex-a/index.php

[12] ARM, “ARM SoC Designer.” Online: https://developer.arm.com/products/system-design/cycle-models/arm-soc-designer

[13] ARM, “ARM926EJ-S Processor.” Online: http://www.arm.com/products/processors/classic/arm9/arm926.php

[14] O. Arnold and G. Fettweis, “On the Impact of Dynamic Task Scheduling in Heterogeneous MPSoCs,” in Proceedings of International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), Samos, Greece, 2011, pp. 17–24.

[15] O. Arnold, B. Noethen, and G. Fettweis, “Instruction Set Architecture Extensions for a Dynamic Task Scheduling Unit,” in Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Amherst, USA, 2012, pp. 249–254.

[16] Arteris, “FlexNoC Interconnect IP.” Online: http://www.arteris.com/flexnoc

[17] G. Ascia, V. Catania, and M. Palesi, “Multi-objective Mapping for Mesh-based NoC Architectures,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Stockholm, Sweden, Sept 2004, pp. 182–187.

[18] E. Beigné, F. Clermidy, P. Vivet, A. Clouard, and M. Renaudin, “An Asynchronous NOC Architecture Providing Low Latency Service and Its Multi-Level Design Framework (ASYNC),” in Proceedings of IEEE International Symposium on Asynchronous Circuits and Systems, New York City, USA, 2005, pp. 54–63.

[19] L. Benini and G. De Micheli, “Networks on Chips: a New SoC Paradigm,” Computer, vol. 35, no. 1, pp. 70–78, 2002.

[20] E. Biscondi, T. Flanagan, F. Fruth, Z. Lin, and F. Moerman, “Maximizing Multicore Efficiency with Navigator Runtime,” White Paper, 2012. Online: www.ti.com/lit/wp/spry190/spry190.pdf

[21] T. Bjerregaard, “The MANGO Clockless Network-on-Chip: Concepts and Implementation,” Ph.D. dissertation, Informatics and Mathematical Modelling, Technical University of Denmark (DTU), 2005. Online: http://www2.imm.dtu.dk/pubdb/p.php?4025

[22] T. Bjerregaard and J. Sparso, “A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2005, pp. 1226–1231.

[23] H. Blume, H. Hubert, H. Feldkamper, and T. Noll, “Model-based Exploration of the Design Space for Heterogeneous Systems on Chip,” in Proceedings of IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), San Jose, CA, USA, 2002, pp. 29–40.

[24] M. Bohr, R. Chau, T. Ghani, and K. Mistry, “The High-k Solution,” IEEE Spectrum, vol. 44, no. 10, pp. 29–35, Oct 2007.

[25] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, “QNoC: QoS Architecture and Design Process for Network on Chip,” Journal of Systems Architecture, vol. 50, pp. 105–128, 2004.

[26] A. Bonfietti, L. Benini, M. Lombardi, and M. Milano, “An Efficient and Complete Approach for Throughput-maximal SDF Allocation and Scheduling on Multi-Core Platforms,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition

[27] H. Boyapati and R. V. R. Kumar, “A Comparison of DSP, ASIC, and RISC DSP Based Implementations of Multiple Access in LTE,” in Proceedings of International Symposium on Communications, Control and Signal Processing (ISCCSP), Limassol, Cyprus, 2010, pp. 1–5.

[28] Cadence Design Systems, Inc., “Tensilica Customizable Processors.” Online: https://ip.cadence.com/ipportfolio/tensilica-ip

[29] G. Castilhos, M. Mandelli, G. Madalozzo, and F. Moraes, “Distributed Resource Management in NoC-based MPSoCs with Dynamic Cluster Sizes,” in Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Natal, Brazil, 2013, pp. 153–158.

[30] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping Concurrent Dataflow Applications to Heterogeneous MPSoCs,” IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 527–545, 2013.

[31] J. Castrillon, D. Zhang, T. Kempf, B. Vanthournout, R. Leupers, and G. Ascheid, “Task Management in MPSoCs: An ASIP Approach,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2009, pp. 587–594.

[32] J. Castrillon, A. Tretter, R. Leupers, and G. Ascheid, “Communication-aware Mapping of KPN Applications Onto Heterogeneous MPSoCs,” in Proceedings of Design Automation Conference (DAC), San Francisco, CA, USA, 2012, pp. 1266–1271.

[33] J. Ceng, W. Sheng, J. Castrillon, A. Stulova, R. Leupers, G. Ascheid, and H. Meyr, “A High-level Virtual Platform for Early MPSoC Software Development,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Grenoble, France, 2009, pp. 11–20.

[34] S. Chandra, F. Regazzoni, and M. Lajolo, “Hardware/Software Partitioning of Operating Systems: A Behavioral Synthesis Approach,” in Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI), Philadelphia, PA, USA, 2006, pp. 324–329.

[35] W. Che and K. S. Chatha, “Unrolling and Retiming of Stream Applications Onto Embedded Multicore Processors,” in Proceedings of Design Automation Conference (DAC), San Francisco, CA, USA, 2012, pp. 1272–1277.

[36] G. Chen, F. Li, S. Son, and M. Kandemir, “Application Mapping for Chip Multiprocessors,” in Proceedings of Design Automation Conference (DAC), Anaheim, CA, USA, June 2008, pp. 620–625.

[37] L. Chen, T. Marconi, and T. Mitra, “Online Scheduling for Multi-Core Shared Reconfigurable Fabric,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Dresden, Germany, March 2012, pp. 582–585.

[38] X. Chen, Z. Lu, A. Jantsch, and S. Chen, “Handling Shared Variable Synchronization in Multi-Core Network-on-Chips with Distributed Memory,” in Proceedings of the IEEE International SOC Conference (SOCC), Indianapolis, Indiana, USA, 2010, pp. 467–472.

[39] X. Chen, A. Minwegen, Y. Hassan, D. Kammler, S. Li, T. Kempf, A. Chattopadhyay, and G. Ascheid, “FLEXDET: Flexible, Efficient Multi-Mode MIMO Detection Using Reconfigurable ASIP,” in IEEE Annual International Symposium on Proceedings of Field-Programmable Custom Computing Machines (FCCM), Toronto, Ontario, Canada, April

[40] J. Choi, H. Oh, S. Kim, and S. Ha, “Executing Synchronous Dataflow Graphs on a SPM-based Multicore Architecture,” in Proceedings of Design Automation Conference (DAC), San Francisco, CA, USA, 2012, pp. 664–671.

[41] C.-L. Chou and R. Marculescu, “Incremental Run-time Application Mapping for Homogeneous NoCs with Multiple Voltage Levels,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Salzburg, Austria, 2007, pp. 161–166.

[42] C.-L. Chou and R. Marculescu, “User-Aware Dynamic Task Allocation in Networks-on-Chip,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2008, pp. 1232–1237.

[43] C.-L. Chou and R. Marculescu, “FARM: Fault-Aware Resource Management in NoC-based Multiprocessor Platforms,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Grenoble, France, March 2011, pp. 1–6.

[44] M. Coppola, R. Locatelli, G. Maruccia, L. Pieralisi, and A. Scandurra, “Spidergon: A novel on-chip communication network,” in Proceedings of International Symposium on System-on-Chip (SoC), Tampere, Finland, 2004, p. 15.

[45] M. Coppola, M. D. Grammatikakis, R. Locatelli, G. Maruccia, and L. Pieralisi, Design of Cost-Efficient Interconnect Processing Units: Spidergon STNoC (System-on-Chip Design and Technologies), F. Mafie, Ed. CRC Press, 2009.

[46] A. Coskun, T. Rosing, and K. Gross, “Temperature Management in Multiprocessor SoCs Using Online Learning,” in Proceedings of Design Automation Conference (DAC), Anaheim, CA, USA, June 2008, pp. 890–893.

[47] A. Coskun, T. Rosing, and K. Gross, “Utilizing Predictors for Efficient Thermal Management in Multiprocessor SoCs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 10, pp. 1503–1516, Oct 2009.

[48] T. Craig, “Building FIFO and Priority-Queuing Spin Locks from Atomic Swap,” University of Washington, Department of Computer Science, Tech. Rep. TR 93-02-02, 1993. Online: ftp://ftp.cs.washington.edu/tr/1993/02/UW-CSE-93-02-02.pdf

[49] M. Dall’Osso, G. Biccari, L. Giovannini, D. Bertozzi, and L. Benini, “×pipes: A Latency Insensitive Parameterized Network-on-chip Architecture For Multi-Processor SoCs,” in Proceedings of International Conference on Computer Design (ICCD), San Jose, CA, USA, 2003, pp. 536–539.

[50] W. J. Dally, “Virtual-Channel Flow Control,” IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 2, pp. 194–205, 1992.

[51] W. J. Dally and C. L. Seitz, “Deadlock-Free Message Routing in Multiprocessor Interconnection Networks,” IEEE Transactions on Computers, vol. C-36, no. 5, pp. 547–553, 1987.

[52] A. Das, A. Kumar, and B. Veeravalli, “Reliability-Driven Task Mapping for Lifetime Extension of Networks-on-Chip Based Multiprocessor Systems,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Grenoble, France, March