Top PDF Performance and Energy Trade-offs for 3D IC NoC Interconnects and Architectures

Performance and Energy Trade-offs for 3D IC NoC Interconnects and Architectures

Figure 4-14: Wireless Comparison with 64 bits/flit Non-Uniform Traffic Energy per Message without Waiting
4.2.2 Latency
The average latency for the non-uniform traffic patterns is shown in Figure 4-15 and Figure 4-16. The inductive coupling sparse mesh performs very well: it is only slightly behind the fastest TSV results, outperforms the slower TSV networks that use 32 bits per flit, and stays competitive with the slower TSV networks that use 64 bits per flit. The energy increase for typical workloads may not be worth the performance increase compared to the other inductive coupling networks. Where a wireless interconnect is essential, such as when a liquid cooling layer is implemented, the sparse 3D mesh could be the best option for keeping vertical performance similar to that of the rest of the chip using TSVs. The capacitive coupling mesh does not perform as well. It consumes considerably more energy, and its latency is significantly higher than that of all of the inductive coupling networks. The high vertical transmission time, as illustrated by Table 4-2 and Table 4-3, is the main contributor to the excess latency compared to the other networks. On average, the inductive coupling ring network is only

Reliability-Performance Trade-offs in Photonic NoC Architectures

2.1.1 Micro ring resonator
Micro ring resonators (MRRs) are used for modulating, filtering, and routing the light waves on the PNoC. MRRs should be small, capable of modulating light signals at high speed, and low in energy consumption. Today, MRRs are as small as 4 µm in diameter, with a free spectral range of 6.92 THz [7]. These MRRs can modulate a light signal at a speed of 12.5 Gb/s. Adiabatic micro ring modulators meet the requirements of PNoC architectures better than the older Mach-Zehnder modulator (MZM) [7]. This is because the adiabatic MRR has lower power consumption and lower resistance than the MZM. The adiabatic transition from a wide, multimode contact to a narrow, single-mode contact eliminates unwanted spatial modes. The single-mode coupling and lower resistance of the adiabatic MRRs increase the speed of operation. Light is coupled onto the MRR only when its wavelength corresponds to the resonant frequency of the MRR. The resonant frequency of an MRR can be changed by applying heat to it. The heat is applied to the MRR by local heaters. We assume a single heater element per MRR in the PNoCs, enabling thermal tuning.

Design Trade-offs for reliable On-Chip Wireless Interconnects in NoC Platforms

The particular form of CDMA used in the WiNoC decreases the effective data transmission bandwidth per channel, because each bit is encoded into a codeword consisting of several chips before transmission. However, it is shown in [4] that the same aggregate wireless bandwidth, when distributed over multiple links, improves the performance of the wireless NoC compared to a single high-bandwidth link because the network is better connected. The adopted Walsh codes have as many orthogonal codewords as there are chips. For instance, a set of Walsh codes with eight chips has eight orthogonal codewords. However, only seven of them are balanced, with an equal number of high and low chips, which is required for the simple digital correlator in the CDMA receiver. This implies that seven wireless channels can operate simultaneously.
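The codeword count in the excerpt above can be checked with a minimal sketch. It assumes the Walsh codes are generated by the standard Sylvester (Hadamard) construction with chips written as +1 (high) and -1 (low); the excerpt does not say how the codes are built, and the function names `walsh_codes` and `is_balanced` are illustrative only.

```python
def walsh_codes(n_chips):
    """Return n_chips Walsh codewords of length n_chips, chips as +1/-1.

    Assumption: Sylvester's construction, H_2n = [[H, H], [H, -H]].
    """
    if n_chips < 1 or n_chips & (n_chips - 1):
        raise ValueError("n_chips must be a power of two")
    codes = [[1]]
    while len(codes) < n_chips:
        upper = [row + row for row in codes]
        lower = [row + [-chip for chip in row] for row in codes]
        codes = upper + lower
    return codes


def is_balanced(codeword):
    """A codeword is balanced when it has as many high chips as low chips."""
    return sum(codeword) == 0


codes = walsh_codes(8)

# Every pair of distinct codewords has a zero inner product (orthogonality).
orthogonal = all(
    sum(a * b for a, b in zip(codes[i], codes[j])) == 0
    for i in range(len(codes))
    for j in range(i + 1, len(codes))
)

balanced = [code for code in codes if is_balanced(code)]
print(len(codes), orthogonal, len(balanced))  # prints: 8 True 7
```

Under this construction the single unbalanced codeword is the all-high one, which leaves seven codewords usable as simultaneous channels.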

Evaluation of temperature-performance trade-offs in wireless network-on-chip architectures

Chapter 2 Related Work
Conventional NoCs use multi-hop, packet-switched communication. NoCs have been shown to perform better when long-range wired links are inserted following the principles of small-world graphs [23]. A comprehensive survey of various WiNoC architectures and their design principles is presented in [24]. Notable examples include the design of a WiNoC based on CMOS ultra wideband (UWB) [19], the 2D concentrated mesh-based WCube architecture using sub-THz wireless links [20], and the inter-router wireless scalable express channel for NoC (iWISE) architecture [25]. Possibilities for creating novel architectures aided by on-chip wireless communication have been explored in [11] and [8]. These two works proposed hierarchical and hybrid WiNoC architectures using long-range wireless shortcuts. The whole system is partitioned into multiple small clusters of neighboring cores called subnets. In the upper level of the network, the subnets are connected via wireline and wireless links. In both designs the subnets are connected in a basic regular structure such as a mesh or a ring; in the second level of the hierarchy, long-range wireless shortcuts are placed on top of that structure. It is also shown that a WiNoC whose network architecture follows power-law based small-world connectivity [26] is more robust in the presence of wireless link failures than its hierarchical counterpart [23]. Though there have been several investigations of the performance and associated design trade-offs of various WiNoC architectures, analysis of their thermal profiles has not received much attention. The work in [27] shows that incorporating dynamic voltage and frequency scaling (DVFS) in a WiNoC can improve the thermal profile of a multicore chip.

An efficient 2D router architecture for extending the performance of inhomogeneous 3D NoC-based multi-core architectures

I. INTRODUCTION
Recently, three-dimensional Network-on-Chip (3D NoC) has been proposed to meet the communication demands of modern multi-core architecture design. However, 3D ICs have alignment issues along with low yield and high temperature dissipation, which affect the reliability of the implemented on-chip cores. Specifically, 3D routers have larger area and power consumption than a 2D router with a similar architecture. Moreover, Through Silicon Via (TSV), which has been accepted as a viable inter-layer wiring technique, has a complex and expensive manufacturing process [1]. To optimize the performance and manufacturing cost of 3D NoCs with minimal distortion to the modularity, inhomogeneous architectures have been proposed that combine 2D and 3D routers in 3D NoCs [2]–[4]. Several inhomogeneous 3D architectures focusing on different NoC router architectures, minimal hop-count between 2D and 3D routers in each layer, and uniform distribution of 2D and 3D routers have been proposed [5]. However, due to the limited number of 3D routers and vertical links, inhomogeneous 3D NoCs have a performance trade-off. While inhomogeneous 3D NoCs promise to resolve the poor scalability and performance issues of conventional 2D NoCs, the multi-hop communication among the long wired 2D routers is still a performance bottleneck. Our goal is to mitigate the performance reduction in inhomogeneous 3D NoCs by proposing an efficient router architecture that accounts for the manufacturing cost in terms of area and power consumption.

Assessing Data Deduplication Trade-offs from an Energy and Performance Perspective

Campina Grande, PB, Brazil raquel@dsc.ufcg.edu.br
Abstract: The energy costs of running computer systems are a growing concern: for large data centers, recent estimates put these costs higher than the cost of hardware itself. As a consequence, energy efficiency has become a pervasive theme for designing, deploying, and operating computer systems. This paper evaluates the energy trade-offs brought by data deduplication in distributed storage systems. Depending on the workload, deduplication can enable a lower storage footprint, reduce the I/O pressure on the storage system, and reduce network traffic, at the cost of increased computational overhead. From an energy perspective, data deduplication enables a trade-off between the energy consumed for additional computation and the energy saved by lower storage and network load. The main point our experiments and model bring home is the following:

Performance and Energy Aware Wavelength Allocation on Ring-Based WDM 3D Optical NoC

V. CONCLUSION
Wavelength allocation is a critical issue for the BER and energy performance of WDM ONoC. In this paper, we demonstrate that allocating multiple parallel optical signals to support communication on an ONoC can improve the execution of the task graph representing an application. However, WDM introduces crosstalk between simultaneous communications located on the waveguide. This crosstalk depends on the distance between the optical signals that travel the waveguide at the same time. The consequence is a reduction in the SNR of the communications that depends on the wavelength selection. Therefore, we propose an approach enabling the concurrent optimization of WDM ONoC. A set of the most promising solutions in a design space is obtained. As a result, the most energy-efficient solution is obtained when each communication is performed on 1 wavelength. Moreover, the optimized time tends to the minimal execution time as the number of wavelengths grows. From the designer's point of view, a trade-off is then possible to respect the application constraints in terms of energy or timing performance. Future work will evaluate the performance for different task mappings. Since task mapping allows communications to be moved in space and in time, system performance, including throughput, BER, and bit energy, can be improved further.

Temperature Evaluation of NoC Architectures and Dynamically Reconfigurable NoC

Chapter 1 Introduction
The increase in computational complexity has led to an ever-increasing demand for fast and powerful computers. Such computers are often used in fields like astrophysics, weather forecasting, bioinformatics, and oil and gas exploration, as well as in high-end consumer electronics. In accordance with Moore's law, the number of transistors packed in a single chip is increasing at a massive scale. However, clock speed cannot grow at the same rate as transistor count, because power dissipation rises steeply with clock frequency. To satisfy the demand for computational power, chip designers need to come up with new and innovative techniques to build powerful computers. This led to the advent of the multi-core chip era. As the number of cores on a chip increases, interconnects start playing an important role in the performance of the system. A lot of research has gone into making on-chip interconnection architectures better and more reliable. However, the potential of wireless interconnects for designing reconfigurable NoCs that mitigate dynamic thermal issues is yet to be fully exploited.

High Performance On-Chip Interconnects Design for Future Many-Core Architectures

asymmetric, mainly between many compute cores and a few memory controllers (MCs). Thus the MCs often become hot spots [4], leading to skewed usage of NoC resources such as wires and buffers. Specifically, heavy reply traffic from MCs to cores potentially causes a network bottleneck, degrading the overall system performance. Therefore, when we design a bandwidth-efficient NoC, the asymmetry of its on-chip traffic must be considered. In prior work [4, 5, 37], the on-chip network is partitioned into two independent, equally divided (logical or physical) subnetworks for different types of packets to avoid cyclic dependencies that might cause protocol deadlocks. Because the asymmetric traffic in GPGPUs is skewed heavily towards reply packets, however, such partitioning can lead to imbalanced use of the NoC resources given to each subnetwork. Thus, it fails to maximize the system throughput, particularly for memory-bound applications requiring a high network bandwidth to accommodate many data requests. Throughput-effectiveness is a crucial metric for improving the overall performance in throughput-oriented architectures, so designing a high-bandwidth NoC in GPGPUs is of primary importance. In the GPGPU domain, this is the first study evaluating and analyzing the mutual impacts of different MC placements and routing algorithms on system-wide performance. We observe that the interference from disparate types of GPGPU traffic can be avoided by adopting the bottom MC placement with proper routing algorithms, obviating the need for physically partitioned networks.

Efficient Collective Communication for Multi-core NOC Interconnects

Chapter 1 Introduction
1.1 Network-On-Chip Architectures
System architecture has been constantly evolving to meet computing needs. Initially, the clock frequency of uni-processor architectures was scaled to make systems faster. However, the combined pressures of increased power consumption and diminishing performance returns led to the adoption of multi-core processor architectures [14]. Currently, multi-core architectures are widely used in both general-purpose computing chips and application-specific Systems-on-Chip (SoC). These multi-core architectures mainly use bus or point-to-point interconnects for information exchange between the cores. This approach has kept the system design simple, but has resulted in overheads due to increased contention over the interconnect. In systems with fewer cores, this overhead is small and is offset by the improved performance of using multiple cores.

Towards the practical design of performance-aware resilient wireless NoC architectures

2 Department of Computer Science and Engineering, The Chinese University of Hong Kong, HK SAR 3 Department of Electrical and Electronic Engineering, UCL, London, UK 4 ECS, Faculty of Physical Sciences and Engineering, University of Southampton, UK
Abstract: Recently, an improved surface wave-enabled communication fabric has been proposed to solve the reliability issues of emerging hybrid wired-wireless Network-on-Chip (WiNoC) architectures. This provides a promising solution to the performance and scalability demands of the fast-paced technological growth towards exascale and Big-Data processing on future System-on-Chip (SoC) design. However, WiNoCs trade off optimized performance for cost by restricting the number of area- and power-hungry wireless nodes. Consequently, in this paper, we propose a low-latency adaptive router with a low-complexity single-cycle bypassing mechanism to alleviate the performance degradation due to the slow wired routers in such emerging hybrid NoCs. The proposed router is able to redistribute traffic in the network to reduce average packet latency under both low and high traffic conditions. As a second contribution, the paper presents an experimental evaluation of a practically implemented surface wave communication fabric. By reducing the latency between the wired nodes and wireless nodes, the proposed router can improve performance efficiency in terms of average packet delay by an average of 50% in WiNoCs.

Challenge: Resolving Data Center Power Bill Disputes: The Energy-Performance Trade-offs of Consolidation

Figure 2: CPU performance bounds of quorum-102.
consumption of the server is the sum of the individual components’ consumption, as experimentally shown in [26]. Therefore, collecting activity patterns of VMs is key to estimating the energy behavior of a modeled real system under VM consolidation and live migration strategy operation. As we discussed in § 3.2, to attain a reliable estimate of energy requirements for data center servers we need to obtain usage information for individual components. Indeed, we can access such information through the Xen hypervisor, which maintains fine-grained accounting of the usage statistics for all computational resources, i.e., CPU, disk I/O, memory I/O, and network I/O. However, it is important and challenging to calibrate those statistics so that they refer to the load of a real system and not to the virtual load of the VM. This, though, is guaranteed through the resource utilization emulation model, discussed in § 4.1. Furthermore, such statistics can be emulated and used as input to a utilization-based energy estimation model. As in [26], such a model can be built using measurements from real server-grade machines, whereas emulation can be suitably used to estimate the load.
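A utilization-based estimate of the kind described in the excerpt above might look like the following minimal sketch. It assumes a linear model in which server power is an idle term plus one term per component (CPU, disk I/O, memory I/O, network I/O); the excerpt states only that server consumption is the sum of the components' consumption, so the linear form, the coefficient values, and the names `IDLE_POWER_W`, `DYNAMIC_POWER_W`, `estimate_power`, and `estimate_energy` are all hypothetical placeholders rather than measured data.

```python
# Hypothetical coefficients: watts drawn by each component at 100% utilization.
# Real values would have to come from measurements on server-grade machines.
IDLE_POWER_W = 100.0
DYNAMIC_POWER_W = {"cpu": 80.0, "disk": 10.0, "memory": 20.0, "network": 5.0}


def estimate_power(utilization):
    """Estimate server power in watts from per-component utilization in [0, 1].

    Assumed model: idle power plus the sum of per-component contributions.
    """
    total = IDLE_POWER_W
    for name, level in utilization.items():
        if name not in DYNAMIC_POWER_W:
            raise KeyError(f"unknown component: {name}")
        if not 0.0 <= level <= 1.0:
            raise ValueError(f"utilization of {name} must be within [0, 1]")
        total += DYNAMIC_POWER_W[name] * level
    return total


def estimate_energy(samples, interval_s):
    """Estimate energy in joules over evenly spaced utilization samples."""
    return sum(estimate_power(sample) * interval_s for sample in samples)


trace = [
    {"cpu": 0.5, "disk": 0.0, "memory": 0.5, "network": 0.0},
    {"cpu": 1.0, "disk": 1.0, "memory": 1.0, "network": 1.0},
]
print(estimate_power(trace[0]))               # 150.0 W with the placeholder values
print(estimate_energy(trace, interval_s=10))  # 3650.0 J over two 10 s samples
```

The utilization samples would correspond to calibrated statistics, such as those collected through the hypervisor, rather than raw per-VM figures.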

Energy-efficiency and Performance in Communication Networks: Analyzing Energy-Performance Trade-offs in Communication Networks and their Implications on Future Network Structure and Management

During the course of this study, both user feedback mechanisms were implemented to check their usability and benefit. For the permanent notification, it was decided to visualize the current status of the network with filled bars, similar to a signal strength graph. When the notification is opened, the verbal description of the current service class and an exemplary use case, as given in Table 5.3, are displayed. Further, the remaining time to the next predicted quality class change and the predicted quality class are indicated. An example of such a notification is given in Figure 5.11a. The derived service quality map for the selected route is shown in Figure 5.11b. Beginning from the departure station, the predicted network quality is indicated in the left graph, with a legend to the right describing the quality classes and a usage scenario. The current location on the route is indicated by graying out the route segments already passed. The time scale on the left of the graph lets the user estimate the time within each service class, and thus plan how to spend the time on the train most efficiently. This feedback is expected to greatly improve user experience on trains by communicating the current quality of the network and thus avoiding disappointments and failed attempts to connect to the network. Thus, users may spend their time on trains more efficiently by planning their tasks accordingly. As a side effect, the energy consumption of the cellular modem can be reduced by not activating it during periods of poor coverage. On the other hand, knowing which route segments show poor coverage, mobile operators can expand their network accordingly, thus gradually eliminating the need to plan ahead.

Efficiency and flexibility trade-offs for soft input soft output sphere decoding architectures

With advances in technology scaling, hardware implementations of more complex receiver algorithms and standards become feasible at affordable cost. These advanced algorithms can be used to improve the communication performance provided to the user. Two main aspects of communication performance are generally experienced by the user: throughput and, more implicitly, error rates. In scenarios such as voice communications, error rates are experienced quite intuitively through the quality of service. In data transmission scenarios, throughput and error rates can be combined into a single user experience of the achievable error-free throughput, called goodput. The nominal throughput can be increased by improved communication standards, for instance by a higher bandwidth, a higher modulation order, or the use of MIMO technology. The goodput is significantly influenced by the receiver implementation. Analog frontends and digital baseband implementations determine which goodput is achievable under which channel conditions. Sophisticated receiver algorithms may trade off area and energy efficiency against error rates or achievable data rates. All this calls for digital components supporting the scaled throughput as well as analog components with the required accuracy and range. In particular, the recently standardized MIMO modes of WLAN, HSPA, and 3GPP-LTE pose a serious challenge in designing efficient MIMO demapping circuits.
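The relation between nominal throughput, error rate, and goodput can be illustrated with a small sketch. It assumes the simplest possible model, in which every erroneous frame is lost outright so that goodput equals nominal throughput times (1 - frame error rate); the excerpt does not define goodput numerically, and the function name `goodput` and the two example receivers are hypothetical.

```python
def goodput(nominal_throughput_bps, frame_error_rate):
    """Error-free throughput in bit/s under a simple frame-loss model.

    Assumption: a frame received in error contributes nothing, and
    retransmission and protocol overhead are ignored.
    """
    if not 0.0 <= frame_error_rate <= 1.0:
        raise ValueError("frame_error_rate must be within [0, 1]")
    return nominal_throughput_bps * (1.0 - frame_error_rate)


# Two hypothetical receivers under the same channel conditions:
# a slower mode with few errors and a faster mode with many errors.
robust = goodput(100e6, 0.10)
aggressive = goodput(150e6, 0.50)
print(robust, aggressive)
```

With these placeholder numbers the nominally slower mode delivers the higher goodput, which is the kind of effect a receiver implementation can have on the user-visible data rate.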

Sustainable energy: trade-offs and synergies between energy security, competitiveness, and environment

Netherlands Environmental Assessment Agency
1. Introduction
There are concerns about Europe’s economic performance in a globalizing world. In response, the EU has adopted its Lisbon Agenda for becoming the most competitive economic region in the world. Another concern is that increased oil and gas prices due to geopolitical instability may endanger Europe's economic performance. This calls for sound energy policy. Such policy should be linked to environmental issues such as climate change and air quality. The recent Green Paper on a European strategy for sustainable, competitive and secure energy by the European Commission (EC, 2006) has put energy policy high on the EU agenda.

Analysis of Trade Offs Between Performance and Energy Consumption Due to Execution of Software on Multiple Platforms and Hardware

Whilst the performance and the breadth of application of computers is increasing, so too is our awareness of the cost and scarcity of the energy required to power them, as well as the materials needed to make them in the first place. However, because computing developments can enable individuals and businesses to adopt greener lifestyles and work styles, in terms of the environmental debate computing is definitely both part of the problem and part of the solution. The computing industry is more prepared and far more competent than almost any other industry when it comes to facing and responding to rapid change. Environmentally it is not a good thing that most PCs -- especially in companies -- have typically entered a landfill after only a few years in service. However, this reality does at least mean that a widespread mindset already exists for both adapting to and paying money for new computer hardware on a regular basis. Hence, whereas it took decades to get more energy efficient cars on the roads, it will hopefully only take a matter of years to reach a state of affairs where most computers are using far less power than they needlessly waste today.

Trade-offs between economic and environmental performance of an autonomous hybrid energy system using micro hydro

1. Introduction
Owing to unprecedented levels of socio-economic and environmental costs associated with large-scale hydroelectricity generation projects, there is a growing interest in local mini/micro hydropower in several countries [1]. Micro hydro has been identified as one of the most affordable renewable energy solutions for rural electrification at a multitude of viable low-head sites in isolated areas throughout the world [2,3]. The global technical potential of small hydropower is estimated at 150-200 GWe; only about 20% of this potential has been exploited to date [4]. However, 100% reliance on such ‘run-of-the-river’ facilities may not be viable for off-grid applications owing to their seasonal dependence on the stream flow [5]. Specifically, tropical countries face the issue of intermittent power supply from micro hydro during the dry season, which is expected to be further aggravated by dwindling stream flow owing to global climate change [6,7] and by rapidly increasing rural/local electricity demands [8,9]. Conventionally, the seasonal shortfall in electricity generation from micro hydro has been met by Diesel generator backup [10]. With Diesel fuel prices (DFP) projected to remain low for the next 10-15 years, according to the OPEC Forecasting of Crude Oil Price [11], there would be a tendency to operate Diesel Gensets for longer hours to meet the increased electricity demand in the foreseeable future. However, the majority of such sites are located in ecologically sensitive areas in developing countries (e.g. highlands and national parks), and any further increase in the use of standalone Diesel Gensets as backup to meet such unmet loads would expose the flora and fauna of the region and the local population to potentially detrimental pollutant emissions.
Thus, while on the one hand micro hydro has shown credible performance in offering a local remedy for current local energy demands [12–16], its potential for meeting the growing rural electricity demand sustainably, mainly in the context of the availability of cheaper Diesel over the short-to-near-term future, is questionable, since simply scaling up the current practice is going to be unsustainable.

Exploring Trade-offs of Compiler Optimizations to Enable Performance Portability for

Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia a@expx.org
Abstract. The performance portability problem manifests itself on architectures with deep memory hierarchies, in particular as a result of insufficient spatial locality support in compiler infrastructures. A polyhedral optimization approach can target spatial locality, but faces a number of challenges, such as ambiguity in compatibility with other optimizations, a lack of polyhedral-ready benchmarks, and the effects of non-uniformity in real-world systems with multi-level memory. Complementing prior research on selecting optimizations, this paper focuses on experimental characterization of loop tiling and vectorization using the full proxy application. The presented approach makes the portability of performance provable for target architectures with deep memory hierarchies. To this end, large-scale ccNUMA macronodes are considered as experimental prototypes for hypothetical HPC designs of a capacity-bandwidth type, capable of imposing singular challenges for performance portability.

MISSION POSSIBLE ENERGY TRADE-OFFS

Provide each student with a copy of the Student Guide and Facts about the Energy Sources and explain the activity to the class. Make sure the students understand that this activity is an exercise to explore trade-offs and the need for multiple energy sources. The activity includes a limited number of variables and is not intended to reflect the realities of the global or national economies. In addition, explain that some energy sources, such as solar and wind, do not produce consistent amounts of electricity all the time, so their total capacity must be increased when choosing these sources. Note also that several energy sources (wind, hydropower, and waste-to-energy) include a limited number of facilities that can be built because of geographical or fuel limitations.
