A.2 Communication between IP components
A.2.6 Constraint
In the current NoC, out-of-order communication is not supported between an M_S/Sys and an S_S/Sys. The system designer therefore has to make sure that an M_S/Sys does not send another request packet to the same S_S/Sys as long as the response packet to a previous request packet of the same type has not been received. This constraint does not apply to the communication between two M_S/Sys, provided that both are able to handle out-of-order communication.
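The constraint can be illustrated with a minimal sketch of master-side bookkeeping. It assumes the request/response pairing PCK_R/PCK_RR and PCK_W/PCK_WR from Tables A.3 to A.6; the class name, method names and node numbers are hypothetical and are not part of the NoC implementation.

```python
class OutstandingRequestTracker:
    """Hypothetical master-side bookkeeping: at most one pending
    request per (slave node, request type)."""

    # Request type -> matching response type (pairing taken from Tables A.3 to A.6).
    RESPONSE_OF = {"PCK_R": "PCK_RR", "PCK_W": "PCK_WR"}

    def __init__(self):
        self._pending = set()  # entries are (slave_node, request_type)

    def can_send(self, slave_node, request_type):
        """True if no response of this type is outstanding for this slave."""
        return (slave_node, request_type) not in self._pending

    def on_send(self, slave_node, request_type):
        """Record a request; refuse it if the constraint would be violated."""
        if not self.can_send(slave_node, request_type):
            raise RuntimeError("response to previous request still outstanding")
        self._pending.add((slave_node, request_type))

    def on_response(self, slave_node, response_type):
        """Clear the pending entry that the received response belongs to."""
        for request_type, expected in self.RESPONSE_OF.items():
            if expected == response_type:
                self._pending.discard((slave_node, request_type))


tracker = OutstandingRequestTracker()
tracker.on_send(3, "PCK_R")                            # read request to slave node 3
blocked_same_type = not tracker.can_send(3, "PCK_R")   # second read must wait
other_type_ok = tracker.can_send(3, "PCK_W")           # different request type
other_slave_ok = tracker.can_send(5, "PCK_R")          # different slave
tracker.on_response(3, "PCK_RR")                       # read response arrives
free_again = tracker.can_send(3, "PCK_R")
```

The sketch follows the wording of the constraint literally: it blocks per slave and per request type, so a write request to the same slave is still permitted while a read response is outstanding.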
Table A.3: Transmission of a PCK_R
Step | Direction | Description
1 | M_S/Sys → NI | M_S/Sys writes packet routing header, operation code, remote read address and data size to register NI_IN. NI creates a PCK_R and sends it.
- | NoC | Packet is transferred over NoC and reaches destination node.
2 | NI → S_S/Sys | NI notifies S_S/Sys via interrupt.
3 | S_S/Sys ← NI | S_S/Sys reads the first flit(s) for getting packet routing header, operation code, read address and data size from NI_OUT. The following information is extracted from the packet header: payload size, priority, src/dst node.
4 | S_S/Sys | S_S/Sys stores src/dst, payload size, priority information for creating PCK_RR later.
Table A.4: Transmission of a PCK_RR
Step | Direction | Description
1 | S_S/Sys → NI | S_S/Sys writes packet routing header, operation code and the first data bytes to register NI_IN. The header information is obtained from the corresponding PCK_R. NI obtains the knowledge of data payload size.
2 | S_S/Sys → NI | S_S/Sys writes data repeatedly to register NI_IN. In parallel, NI creates and sends the packet.
- | NoC | Packet is transferred over NoC and reaches destination node.
3 | NI → M_S/Sys | NI notifies M_S/Sys via interrupt.
4 | M_S/Sys ← NI | M_S/Sys reads the first flit(s) for getting packet routing header, operation code and the first data bytes from NI_OUT. The following information is extracted from the packet header: payload size, src/dst node.
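The read transaction of Tables A.3 and A.4 can be sketched at the behavioral level as follows. This is only an illustration under stated assumptions: the field names, the byte-addressable `memory` model and the function `serve_read` are hypothetical, and flits, the NI registers and interrupts are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    """Behavioral stand-in for a PCK_R (field names are assumptions)."""
    src: int        # node of the requesting M_S/Sys
    dst: int        # node of the addressed S_S/Sys
    priority: int
    address: int    # remote read address
    size: int       # requested data size, assumed to be in bytes


@dataclass(frozen=True)
class ReadResponse:
    """Behavioral stand-in for a PCK_RR."""
    src: int
    dst: int
    priority: int
    data: bytes


def serve_read(request, memory):
    """Slave side: Table A.3 steps 3-4, followed by Table A.4 steps 1-2."""
    # Table A.3, step 4: keep the header information needed for the response.
    saved_src, saved_dst, saved_priority = request.src, request.dst, request.priority
    data = bytes(memory[request.address:request.address + request.size])
    # The response travels back, so source and destination are swapped.
    return ReadResponse(src=saved_dst, dst=saved_src,
                        priority=saved_priority, data=data)


memory = bytearray(range(16))                       # toy slave memory
request = ReadRequest(src=0, dst=3, priority=1, address=4, size=4)
response = serve_read(request, memory)
```

In this toy run the response carries the four bytes stored at addresses 4 to 7 and is routed from node 3 back to node 0.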
Table A.5: Transmission of a PCK_W
Step | Direction | Description
1 | M_S/Sys → NI | M_S/Sys writes packet routing header, operation code, remote write address and the first data bytes to register NI_IN. NI obtains the knowledge of data payload size.
2 | M_S/Sys → NI | M_S/Sys writes the remaining payload data to NI_IN. In parallel, NI creates and sends the packet.
- | NoC | Packet is transferred over NoC and reaches destination node.
3 | NI → S_S/Sys | NI notifies S_S/Sys via interrupt.
4 | S_S/Sys ← NI | S_S/Sys reads the first flit(s) for getting packet routing header, operation code, write address and the first data bytes from NI_OUT. The following information is extracted from the packet header: payload size, priority, src/dst node.
5 | S_S/Sys ← NI | S_S/Sys reads the remaining payload data from NI_OUT and writes the data to target address.
6 | S_S/Sys | S_S/Sys stores src/dst, priority information for creating PCK_WR later.
Table A.6: Transmission of a PCK_WR
Step | Direction | Description
1 | S_S/Sys → NI | S_S/Sys writes packet routing header and the operation code to register NI_IN. The header information is obtained from the corresponding PCK_W. NI creates a PCK_WR and sends it.
- | NoC | Packet is transferred over NoC and reaches destination node.
2 | NI → M_S/Sys | NI notifies M_S/Sys via interrupt.
3 | M_S/Sys ← NI | M_S/Sys reads flit(s) for getting packet routing header and operation code from NI_OUT. The following information is extracted from the packet header: src/dst node.
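The write transaction of Tables A.5 and A.6 can be sketched in the same behavioral style. Again, the field names, the `memory` model and the function `serve_write` are assumptions made for illustration; the sketch does not model flits, NI registers or interrupts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteRequest:
    """Behavioral stand-in for a PCK_W (field names are assumptions)."""
    src: int        # node of the requesting M_S/Sys
    dst: int        # node of the addressed S_S/Sys
    priority: int
    address: int    # remote write address
    data: bytes     # payload data


@dataclass(frozen=True)
class WriteResponse:
    """Behavioral stand-in for a PCK_WR: header information only, no data."""
    src: int
    dst: int
    priority: int


def serve_write(request, memory):
    """Slave side: Table A.5 steps 4-6, followed by Table A.6 step 1."""
    end = request.address + len(request.data)
    if end > len(memory):
        raise ValueError("write exceeds the modeled memory")
    # Table A.5, step 5: write the payload to the target address.
    memory[request.address:end] = request.data
    # Table A.5, step 6 and Table A.6, step 1: reuse the stored header
    # information with source and destination swapped.
    return WriteResponse(src=request.dst, dst=request.src,
                         priority=request.priority)


memory = bytearray(8)                               # toy slave memory
request = WriteRequest(src=0, dst=3, priority=1, address=2, data=b"\xaa\xbb")
response = serve_write(request, memory)
```

Only after the master has received this response may it issue the next PCK_W to the same slave, which is the constraint stated in Section A.2.6.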
Table A.7: Transmission of a PCK_S
Step | Direction | Description
1 | M_S/Sys1 → NI | M_S/Sys1 writes packet routing header, operation code and synchronization value to register NI_IN.
- | NoC | Packet is transferred over NoC and reaches destination node.
2 | NI → M_S/Sys2 | NI notifies M_S/Sys2 via interrupt.
3 | M_S/Sys2 ← NI | M_S/Sys2 reads flit(s) for getting packet routing header, operation code and synchronization value from NI_OUT. The following information is extracted from the packet header: payload size, src/dst node.
4 | M_S/Sys2 | M_S/Sys2 might store src/dst, synchronization value information for creating another PCK_S as response later.
Acronyms
ACE AXI Coherency Extension
ACO Ant Colony Optimization
ADL Architecture Description Language
AHB Advanced High-Performance Bus
AMBA Advanced Microcontroller Bus Architecture
API Application Programming Interface
ASIC Application-Specific Integrated Circuit
ASIP Application-Specific Instruction-set Processor
AT Area-Time product
ATE Area-Time-Energy product
AXI Advanced Extensible Interface
BER Bit Error Rate
CCMU Cache Coherence Management Unit
CGRA Coarse-Grained Reconfigurable Architecture
CKF Compiler-Known Function
CP Constraint Programming
CP Cyclic Prefix
CSDF Cyclo-static Synchronous Data Flow
DFG Data Flow Graph
DMA Direct Memory Access
DSP Digital Signal Processor
DT Data flit of a NoC packet
DVFS Dynamic Voltage-Frequency Scaling
DW Data Width
FCFS First-Come First-Served
FDD Frequency Division Duplex
FFT Fast Fourier Transformation
FIFO First In, First Out
flit flow control unit
FPGA Field-Programmable Gate Array
fps Frames per Second
FSM Finite State Machine
GA Genetic Algorithm
GE equivalent gate count in units of two-input drive-one NAND gate
GPP General Purpose Processor
GPU Graphics Processing Unit
H-ARQ Hybrid Automatic Repeat reQuest
HD Header flit of a NoC packet
HDT Header-Data-Tail flit of a NoC packet
HOSK Hardware Operating System Kernel
HW-RTOS HardWare Real Time Operating System
IDCT Inverse Discrete Cosine Transform
ILP Integer Linear Programming
IQT Inverse Quantization
ISS Institute for Integrated Signal Processing Systems at the RWTH Aachen University
ISS Instruction-Set Simulator
ITRS International Technology Roadmap for Semiconductors
LDPC Low-Density Parity-Check
LISA Language for Instruction Set Architecture
LLR Log-Likelihood Ratio
LNA Low-Noise Amplifier
LRU Least Recently Used
LTE Long Term Evolution
LTRISC a template RISC processor provided by Synopsys Processor Designer
MAC Medium Access Control
MIMO Multiple-Input Multiple-Output
MMSE Minimum Mean Square Error
MoC Model of Computation
MP Matching Pursuit
MPSoC Multi-Processor System-on-Chip
M_S/Sys Master subsystem
NI Network Interface
NoC Network-on-Chip
OFDM Orthogonal Frequency-Division Multiplexing
OMAP Open Multimedia Application Platform
OS Operating System
PC Personal Computer
PC Program Counter
PE Processing Element
PHY PHYsical layer
PSP Processor-Support Package
QAM Quadrature Amplitude Modulation
QoS Quality of Service
QRD QR Decomposition
RB Resource Block
RF Radio Frequency
RISC Reduced Instruction Set Computer
RR Round-Robin
RTL Register-Transfer Level
S/Sys subsystem
SA Simulated Annealing
SDFG Synchronous Data Flow Graph
SDF Synchronous Data Flow
SD Sphere-Decoding
SIMD Single Instruction Multiple Data
SMP Symmetric Multiprocessing
SPE Synergistic Processing Element
SQRD Sorted QR Decomposition
S_S/Sys Slave subsystem
SSRAM Synchronous Static Random Access Memory
TD Turbo Decoding
TI Texas Instruments
TL Tail flit of a NoC packet
TL Transaction-Level
TS Tabu Search
TTI Transmission Time Interval
VC Virtual Channel
VLIW Very Long Instruction Word
VPU Virtual Processing Unit
1.1 ITRS 2012: Predictions for number of transistors per chip at production and clock frequency
1.2 Qualitative comparison between flexibility, performance and power consumption of different PE types (adapted from [23])
2.1 An exemplary system connected by a bus
2.2 An exemplary system connected by a NoC, with 8 PE nodes and 8 memory nodes
3.1 Basic concept of OSIP-based systems
3.2 OSIP software layer
3.3 Hardware integration of OSIP-based systems
3.4 Design flow for OSIP development
3.5 An exemplary scheduling hierarchy
3.6 Instruction-level profiling of C-implementation
3.7 Pipeline structure of the OSIP Core
3.8 Structure of AGU for OSIP_DT
3.9 Load/Store of OSIP_DT and normal data
3.10 Update OSIP_DT
3.11 Hardware-supported node comparison
3.12 Compare nodes and Compare nodes & Continue
3.13 Control of the OSIP state at the PFE-Stage
3.14 Execution cycles of “hot-spot” commands and their occurrence frequency in percentage
3.15 Power profile of OSIP
4.1 Setup of baseline system
4.2 Task graph and OSIP configuration of the synthetic application
4.3 Task graph and OSIP configuration of H.264
4.4 Synthetic application: OSIP efficiency analysis in systems with an idealized communication architecture
4.5 Synthetic application: Energy consumption ratio between LT-OSIP and OSIP
4.6 H.264: OSIP efficiency analysis in systems with an idealized communication architecture
4.7 H.264: Energy consumption ratio between LT-OSIP and OSIP
4.8 H.264: Impact of the communication architecture in OSIP-based systems
4.9 H.264: Composition of the execution time from PE’s view
4.10 H.264: Impact of the communication architecture on the OSIP state
4.11 Cache system
4.12 An example of preventing data race conditions for write buffers by using spinlocks
4.13 H.264: Frame rate at different optimization levels
4.14 H.264: Composition of the execution time at different optimization levels of the communication architecture in the 11-PE-system
4.15 Synthetic application: Execution time at different optimization levels
4.16 H.264: Joint impact of OSIP and the communication architecture
4.17 Synthetic application: Joint impact of OSIP and the communication architecture in best case scenario
4.18 Synthetic application: Joint impact of OSIP and the communication architecture in worst case scenario
4.19 Synthetic application: Joint impact of OSIP and the communication architecture in average case scenario
5.1 Impact of the spinlock acquisition order
5.2 Basic spinlock control flow
5.3 Enhanced spinlock control flow
5.4 Flow of granting a spinlock request
5.5 Block diagram of triggering the OSIP core for spinlock reservation
5.6 Synthetic application: Performance improvement based on spinlock reservation
5.7 H.264: Performance improvement based on spinlock reservation
5.8 Synthetic application: Comparison of performance improvement using spinlock reservation between OSIP-, LT-OSIP- and UT-OSIP-based systems
5.9 H.264: Comparison of performance improvement using spinlock reservation between OSIP-, LT-OSIP- and UT-OSIP-based systems
6.1 OSIP subsystem for NoC integration
6.2 An exemplary NoC-based system with integrated OSIP
6.3 2D mesh-like NoC topology
6.4 Structure of multi-flit packets and single-flit packets
6.5 Router block diagram
6.6 NI block diagram
6.7 Power profile: Contribution of different power groups
6.8 Block diagram of a MIMO-OFDM doubly iterative receiver
6.9 Radio frame structure of LTE (FDD mode, normal CP)
6.10 Pilot pattern of 2×2 MIMO system
6.11 CSDF of 2×2 digital MIMO-OFDM receiver
6.12 Node assignment in the NoC-based system
6.13 Structure of VPU subsystem
6.14 Block diagram of OSIP subsystem
6.15 Throughputs and latencies of MIMO-OFDM receiver with different NoC configurations
6.16 Structure of VPU subsystem with DMA
6.17 Two execution flows of VPU subsystems
6.18 Improvement of system performance with enhancements in the NoC-based communication architecture
6.19 Joint impact of OSIP and NoC-based communication architecture on system performance
6.20 OSIP busy state at different NoC-based communication architectures
6.21 An example of issuing a command to OSIP based on fine-grained communications between PE and proxy
6.22 Examples of multi-OSIP-systems in different organizations
2.1 List of task management references in the literature
3.1 Synthesis results of OSIP and LT-OSIP
3.2 Power consumption of OSIP and LT-OSIP
6.1 Power consumption of NoC
6.2 System parameters for achieving a throughput of 150 Mbit/s in a 2×2 MIMO system
6.3 Selected implementations of algorithmic kernels
6.4 Definition of VPUs
A.1 Packet Types
A.2 Payload structure of packet types
A.3 Transmission of a PCK_R
A.4 Transmission of a PCK_RR
A.5 Transmission of a PCK_W
A.6 Transmission of a PCK_WR
A.7 Transmission of a PCK_S
[1] “Open Virtual Platforms.” [Online]. Available: http://www.ovpworld.org
[2] 3GPP, “3GPP Release 8.” [Online]. Available: http://www.3gpp.org/specifications/releases/72-release-8
[3] B. Ackland, A. Anesko, D. Brinthaupt, S. Daubert, A. Kalavade, J. Knobloch, E. Micca, M. Moturi, C. Nicol, J. O’Neill, J. Othmer, E. Sackinger, K. Singh, J. Sweet, C. Terman, and J. Williams, “A Single-Chip, 1.6-Billion, 16-b MAC/s Multiprocessor DSP,” IEEE Journal of Solid-State Circuits, vol. 35, no. 3, pp. 412–424, March 2000.
[4] W. Ahmed, M. Shafique, L. Bauer, and J. Henkel, “Adaptive Resource Management for Simultaneous Multitasking in Mixed-Grained Reconfigurable Multi-Core Processors,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Taipei, Taiwan, Oct 2011, pp. 365–374.
[5] M. Al Faruque, R. Krist, and J. Henkel, “ADAM: Run-Time Agent-Based Distributed Application Mapping for on-chip Communication,” in Proceedings of Design Automation Conference (DAC), Anaheim, CA, USA, 2008, pp. 760–765.
[6] M. Amos, Theoretical and Experimental DNA Computation, ser. Natural Computing Series. Springer, June 2005, vol. XIII, ISBN 978-3-540-28131-3.
[7] T. E. Anderson, “The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors,” IEEE Transactions on Parallel and Distributed Systems, vol. 1, no. 1, pp. 6–16, Jan. 1990.
[8] A. Andriahantenaina, H. Charlery, A. Greiner, L. Mortiez, and C. Zeferino, “SPIN: A Scalable, Packet Switched, On-Chip Micro-Network,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2003, pp. 70–73.
[9] A. Andriahantenaina and A. Greiner, “Micro-network for SoC: Implementation of a 32-port SPIN network,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2003, pp. 1128–1129.
[10] ARM, “AMBA System Architecture.” Online: http://www.arm.com/products/system-ip/amba-specifications
[11] ARM, “ARM MPCore Technology.” Online: http://www.arm.com/products/processors/cortex-a/index.php
[12] ARM, “ARM SoC Designer.” Online: https://developer.arm.com/products/system-design/cycle-models/arm-soc-designer
[13] ARM, “ARM926EJ-S Processor.” Online: http://www.arm.com/products/processors/classic/arm9/arm926.php
[14] O. Arnold and G. Fettweis, “On the Impact of Dynamic Task Scheduling in Heterogeneous MPSoCs,” in Proceedings of International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), Samos, Greece, 2011, pp. 17–24.
[15] O. Arnold, B. Noethen, and G. Fettweis, “Instruction Set Architecture Extensions for a Dynamic Task Scheduling Unit,” in Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Amherst, USA, 2012, pp. 249–254.
[16] Arteris, “FlexNoC Interconnect IP.” Online: http://www.arteris.com/flexnoc
[17] G. Ascia, V. Catania, and M. Palesi, “Multi-objective Mapping for Mesh-based NoC Architectures,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Stockholm, Sweden, Sept 2004, pp. 182–187.
[18] E. Beigné, F. Clermidy, P. Vivet, A. Clouard, and M. Renaudin, “An Asynchronous NOC Architecture Providing Low Latency Service and Its Multi-Level Design Framework (ASYNC),” in Proceedings of IEEE International Symposium on Asynchronous Circuits and Systems, New York City, USA, 2005, pp. 54–63.
[19] L. Benini and G. De Micheli, “Networks on Chips: a New SoC Paradigm,” Computer, vol. 35, no. 1, pp. 70–78, 2002.
[20] E. Biscondi, T. Flanagan, F. Fruth, Z. Lin, and F. Moerman, “Maximizing Multicore Efficiency with Navigator Runtime,” White Paper, 2012. Online: www.ti.com/lit/wp/spry190/spry190.pdf
[21] T. Bjerregaard, “The MANGO Clockless Network-on-Chip: Concepts and Implementation,” Ph.D. dissertation, Informatics and Mathematical Modelling, Technical University of Denmark (DTU), 2005. Online: http://www2.imm.dtu.dk/pubdb/p.php?4025
[22] T. Bjerregaard and J. Sparso, “A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2005, pp. 1226–1231.
[23] H. Blume, H. Hubert, H. Feldkamper, and T. Noll, “Model-based Exploration of the Design Space for Heterogeneous Systems on Chip,” in Proceedings of IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), San Jose, CA, USA, 2002, pp. 29–40.
[24] M. Bohr, R. Chau, T. Ghani, and K. Mistry, “The High-k Solution,” IEEE Spectrum, vol. 44, no. 10, pp. 29–35, Oct 2007.
[25] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny, “QNoC: QoS Architecture and Design Process for Network on Chip,” Journal of Systems Architecture, vol. 50, pp. 105–128, 2004.
[26] A. Bonfietti, L. Benini, M. Lombardi, and M. Milano, “An Efficient and Complete Approach for Throughput-maximal SDF Allocation and Scheduling on Multi-Core Platforms,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition
[27] H. Boyapati and R. V. R. Kumar, “A Comparison of DSP, ASIC, and RISC DSP Based Implementations of Multiple Access in LTE,” in Proceedings of International Symposium on Communications, Control and Signal Processing (ISCCSP), Limassol, Cyprus, 2010, pp. 1–5.
[28] Cadence Design Systems, Inc., “Tensilica Customizable Processors.” Online: https://ip.cadence.com/ipportfolio/tensilica-ip
[29] G. Castilhos, M. Mandelli, G. Madalozzo, and F. Moraes, “Distributed Resource Management in NoC-based MPSoCs with Dynamic Cluster Sizes,” in Proceedings of IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Natal, Brazil, 2013, pp. 153–158.
[30] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping Concurrent Dataflow Ap- plications to Heterogeneous MPSoCs,” IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 527–545, 2013.
[31] J. Castrillon, D. Zhang, T. Kempf, B. Vanthournout, R. Leupers, and G. Ascheid, “Task Management in MPSoCs: An ASIP Approach,” in Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Jose, CA, USA, 2009, pp. 587–594.
[32] J. Castrillon, A. Tretter, R. Leupers, and G. Ascheid, “Communication-aware Mapping of KPN Applications Onto Heterogeneous MPSoCs,” in Proceedings of Design Automation Conference (DAC), San Francisco, CA, USA, 2012, pp. 1266–1271.
[33] J. Ceng, W. Sheng, J. Castrillon, A. Stulova, R. Leupers, G. Ascheid, and H. Meyr, “A High-level Virtual Platform for Early MPSoC Software Development,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Grenoble, France, 2009, pp. 11–20.
[34] S. Chandra, F. Regazzoni, and M. Lajolo, “Hardware/Software Partitioning of Operating Systems: A Behavioral Synthesis Approach,” in Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI), Philadelphia, PA, USA, 2006, pp. 324–329.
[35] W. Che and K. S. Chatha, “Unrolling and Retiming of Stream Applications Onto Embedded Multicore Processors,” in Proceedings of Design Automation Conference (DAC), San Francisco, CA, USA, 2012, pp. 1272–1277.
[36] G. Chen, F. Li, S. Son, and M. Kandemir, “Application Mapping for Chip Multiprocessors,” in Proceedings of Design Automation Conference (DAC), Anaheim, CA, USA, June 2008, pp. 620–625.
[37] L. Chen, T. Marconi, and T. Mitra, “Online Scheduling for Multi-Core Shared Reconfigurable Fabric,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Dresden, Germany, March 2012, pp. 582–585.
[38] X. Chen, Z. Lu, A. Jantsch, and S. Chen, “Handling Shared Variable Synchronization in Multi-Core Network-on-Chips with Distributed Memory,” in Proceedings of the IEEE International SOC Conference (SOCC), Indianapolis, Indiana, USA, 2010, pp. 467–472.
[39] X. Chen, A. Minwegen, Y. Hassan, D. Kammler, S. Li, T. Kempf, A. Chattopadhyay, and G. Ascheid, “FLEXDET: Flexible, Efficient Multi-Mode MIMO Detection Using Reconfigurable ASIP,” in IEEE Annual International Symposium on Proceedings of Field-Programmable Custom Computing Machines (FCCM), Toronto, Ontario, Canada, April
[40] J. Choi, H. Oh, S. Kim, and S. Ha, “Executing Synchronous Dataflow Graphs on a SPM- based Multicore Architecture,” in Proceedings of Design Automation Conference (DAC), San Francisco, CA, USA, 2012, pp. 664–671.
[41] C.-L. Chou and R. Marculescu, “Incremental Run-time Application Mapping for Homogeneous NoCs with Multiple Voltage Levels,” in Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Salzburg, Austria, 2007, pp. 161–166.
[42] C.-L. Chou and R. Marculescu, “User-Aware Dynamic Task Allocation in Networks-on-Chip,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Munich, Germany, 2008, pp. 1232–1237.
[43] C.-L. Chou and R. Marculescu, “FARM: Fault-Aware Resource Management in NoC-based Multiprocessor Platforms,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Grenoble, France, March 2011, pp. 1–6.
[44] M. Coppola, R. Locatelli, G. Maruccia, L. Pieralisi, and A. Scandurra, “Spidergon: A novel on-chip communication network,” in Proceedings of International Symposium on System-on-Chip (SoC), Tampere, Finland, 2004, p. 15.
[45] M. Coppola, M. D. Grammatikakis, R. Locatelli, G. Maruccia, and L. Pieralisi, Design of Cost-Efficient Interconnect Processing Units: Spidergon STNoC (System-on-Chip Design and Technologies), F. Mafie, Ed. CRC Press, 2009.
[46] A. Coskun, T. Rosing, and K. Gross, “Temperature Management in Multiprocessor SoCs Using Online Learning,” in Proceedings of Design Automation Conference (DAC), Anaheim, CA, USA, June 2008, pp. 890–893.
[47] A. Coskun, T. Rosing, and K. Gross, “Utilizing Predictors for Efficient Thermal Management in Multiprocessor SoCs,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 10, pp. 1503–1516, Oct 2009.
[48] T. Craig, “Building FIFO and Priority-Queuing Spin Locks from Atomic Swap,” University of Washington, Department of Computer Science, Tech. Rep. TR 93-02-02, 1993. Online: ftp://ftp.cs.washington.edu/tr/1993/02/UW-CSE-93-02-02.pdf
[49] M. Dall’Osso, G. Biccari, L. Giovannini, D. Bertozzi, and L. Benini, “×pipes: A Latency Insensitive Parameterized Network-on-chip Architecture For Multi-Processor SoCs,” in Proceedings of International Conference on Computer Design (ICCD), San Jose, CA, USA, 2003, pp. 536–539.
[50] W. J. Dally, “Virtual-Channel Flow Control,” IEEE Transactions on Parallel and Distributed Systems, vol. 3, no. 2, pp. 194–205, 1992.
[51] W. J. Dally and C. L. Seitz, “Deadlock-Free Message Routing in Multiprocessor Interconnection Networks,” IEEE Transactions on Computers, vol. C-36, no. 5, pp. 547–553, 1987.
[52] A. Das, A. Kumar, and B. Veeravalli, “Reliability-Driven Task Mapping for Lifetime Extension of Networks-on-Chip Based Multiprocessor Systems,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), Grenoble, France, March