From the HW viewpoint, an important alternative for MPSoC prototyping and validation is HW emulation. In industry, one of the most complete sets of statistics is provided by Palladium II, which can accommodate very complex systems (i.e., up to 256 Mgates). However, its main disadvantages are its operating frequency (circa 1.6 MHz) and its cost (around $1 million). ASIC Integrator is much faster for MPSoC architectural exploration, but its major drawback is its limitation to at most five ARM-based cores and only AMBA interconnection mechanisms. The same limitation to proprietary cores for exploration occurs with Heron. Finally, other relevant industrial emulation approaches are System Explore and Zebu-XL, both based on multi-FPGA emulation operating in the MHz range. They can be used to validate IPs, but are not flexible enough for fast MPSoC design exploration or detailed statistics extraction. In the academic world, the most complete emulation platform to date for exploring MPSoC alternatives is TC4SOC. It uses a proprietary 32-bit VLIW core and enables exploration of interconnection mechanisms and different protocols by using an FPGA to reconfigure the network interfaces. Another work proposes a HW/SW emulation framework that enables the exploration of different Network-on-Chip (NoC) interconnection mechanisms. However, these platforms do not allow designers to exhaustively extract statistics or explore the other two architectural levels we propose, namely the memory hierarchy and the processing cores.
Long-distance data communication over multi-hop wireline paths in conventional Networks-on-Chip (NoCs) causes high energy consumption and performance degradation. Many emerging interconnect technologies, such as 3D integration, photonic, Radio Frequency (RF), and wireless interconnects, have been envisioned to alleviate the issues of a metal/dielectric interconnect system. Most computing platforms, from embedded systems to server blades, comprise multiple Systems-on-Chip (SoCs). Traditionally, these multichip platforms are interconnected using metal traces over a substrate such as a Printed Circuit Board (PCB). Communication in multichip platforms involves data transfer between internal nets and the peripheral I/O ports of the chips, as well as across the PCB traces. This multi-hop communication leads to higher energy consumption, decreased data bandwidth, and increased message latency. To satisfy the increasing demand for high-speed, low-power interconnects, a THz Wireless NoC (WiNoC) with high-speed direct links between distant cores is desirable.
The majority of embedded systems are deployed in real-time applications. Among other factors, the real-time performance of multiprocessor computers relies on the predictability of interprocessor communication. For an MPSoC, deterministic behaviour of the interconnection network has to be guaranteed. This requirement is hard to meet with conventional packet routing as it takes place in direct, i.e. static, networks. In static networks, adaptive multi-hop routing together with packet prioritization induces an undesirable indeterminism in network latency. The formation of hotspots due to excessive data traffic in router nodes also precludes predictability. We therefore propose a new paradigm for MPSoCs, which makes use of multistage interconnection networks (MINs) as a network on chip.
Speed is a key attribute of present-day systems, whether electronic or mechanical, and system designers have long sought to increase it. Designers face many constraints: speed alone is not enough, since cost, power, size, and complexity must not grow beyond certain limits. For a microprocessor or microcontroller, speed is the main consideration; no one wants a slower processor, and the speed of any embedded system depends on the speed of its processor. Early processor designers followed the approach of increasing the number of data lines: the first microprocessors were 4-bit, followed by 8-, 16-, and 32-bit designs and so on. This approach soon saturated, however, because as the number of lines grows to a large extent, complexity and power dissipation increase due to the added line capacitance. Similarly, the concept of pipelining saturated beyond a certain level of design complexity. Designers therefore moved to a new architecture: the multi-core architecture, in which similar microcontrollers are connected in parallel to a common shared bus. The speed of such a multi-microcontroller architecture thus depends on the speed of the interconnection network, which is precisely a network-on-chip (NoC). The individual microcontroller buses must therefore be synchronized to the common shared bus by means of an arbitration technique, which requires tri-state buffers/latches.
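To make the arbitration idea concrete, a shared-bus arbiter can be sketched in a few lines. The following round-robin scheme is only an illustrative software model; the function name and cycle-level behaviour are assumptions, not taken from the text:

```python
def round_robin_arbiter(requests, last_granted):
    """Grant the shared bus to one of N requesting masters.

    requests:     list of bools; requests[i] is True if master i wants the bus.
    last_granted: index of the master granted in the previous cycle.
    Returns the index of the granted master, or None if nobody requests.
    """
    n = len(requests)
    for offset in range(1, n + 1):      # search starting just after the last winner
        candidate = (last_granted + offset) % n
        if requests[candidate]:
            return candidate
    return None                         # no requests: the bus stays idle

# Master 0 was served last and master 1 is idle, so master 2 wins:
winner = round_robin_arbiter([True, False, True], 0)   # -> 2
```

In hardware this priority rotation would gate the tri-state buffers so that only the granted master drives the shared bus in a given cycle.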
As the number of elements and transactions among cores of Multiprocessor SoCs (MPSoCs) and Multi-core Processors (MCPs) increases, the reliability and performance of the system become key design and implementation issues for large-scale systems. Faults may be either transient or permanent, and both faults and packet congestion in the interconnection network reduce the reliability and performance of a NoC-based system. A fault-tolerant, high-performance network system provides continuous operation in the presence of faults or congestion.
Abstract—To meet the demands of growing computation-intensive applications and the need for low-power, high-performance systems, the number of computing resources on a single chip has increased enormously, as current VLSI technology can support such extensive integration of transistors. This paper presents adaptive routing selection strategies suitable for networks-on-chip (NoCs). The main prototype presented in this paper uses the west-first routing algorithm to make routing decisions at runtime during application execution. Messages in the NoC are switched with a wormhole cut-through switching method, in which different messages can be interleaved at flit level on the same communication link without using virtual channels. Hence, the head-of-line blocking problem can be solved effectively and efficiently.
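The west-first decision rule can be sketched as a routing function. This is the generic textbook formulation of the west-first turn model on a 2-D mesh, not the paper's exact prototype; the coordinate convention and names are assumptions:

```python
def west_first_route(cur, dst):
    """Permitted output directions at node `cur` for a packet headed to `dst`
    on a 2-D mesh under the west-first turn model: turns *into* the west
    direction are forbidden, so all westward hops must be taken first;
    afterwards the packet may route adaptively among the remaining directions."""
    cx, cy = cur
    dx, dy = dst
    if dx < cx:
        return {"W"}                 # destination lies to the west: no adaptivity
    dirs = set()
    if dx > cx:
        dirs.add("E")
    if dy > cy:
        dirs.add("N")
    if dy < cy:
        dirs.add("S")
    return dirs or {"LOCAL"}         # arrived: deliver to the local core
```

A packet at (1, 1) bound for (3, 0) may adaptively pick east or south, while one bound for (0, 3) is forced west first; pruning these turns is what guarantees deadlock freedom without virtual channels.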
It will allow wireless transfer of audio and video data at up to 5 gigabits per second, ten times the current maximum wireless transfer rate, at one-tenth the cost. National Information and Communications Technology Research Centre (NICTA) researchers have chosen to develop this technology in the 57-64 GHz unlicensed frequency band, as the millimeter-wave range of the spectrum makes possible a high level of on-chip component integration as well as the integration of very small, high-gain arrays. The available 7 GHz of spectrum results in very high data rates, up to 5 Gbps.
As previously noted, using smaller cores as a memory traffic reduction technique has a low level of effectiveness in aiding CMP core scaling, because the die area freed up for additional cache space has a −α exponent damping effect on memory traffic reduction. However, since smaller cores are slower, the memory traffic will be stretched over a longer time period, allowing them to fit the bandwidth envelope more easily, but at a direct cost to performance. We classify cache compression, 3D-stacked cache, unused data filtering, and sectored caches as techniques with a medium level of effectiveness. While these techniques also increase the effective cache capacity, compared to using smaller cores, they have the potential to increase the amount of cache per core much more drastically (up to 5×). Therefore, they provide a moderate benefit to CMP core scaling. Sectored caches directly impact memory traffic requirements by reducing the amount of data that must be fetched on-chip. However, since the fraction of memory traffic that can be eliminated is modest (40% in our realistic assumptions), it only has a medium level of effectiveness. The most effective techniques for CMP core scaling are DRAM caches, link compression, smaller cache lines, and cache+link compression. Interestingly, although DRAM caches only affect memory traffic indirectly, their huge 8× density improvement over SRAM allows very effective core scaling.
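The −α damping effect described above can be illustrated with the common power-law miss-rate model (miss rate ∝ C^−α). The value α = 0.5 below is an assumed illustrative constant, not a figure from the text:

```python
def relative_memory_traffic(cache_scale, alpha=0.5):
    """Off-chip memory traffic relative to the baseline when cache capacity
    is scaled by `cache_scale`, under the power-law miss-rate model
    miss_rate ~ C**(-alpha). alpha = 0.5 is an assumed illustrative value."""
    return cache_scale ** (-alpha)

# Doubling the cache removes only ~29% of traffic (2**-0.5 ~ 0.71),
# and even a 5x capacity boost still leaves ~45% (5**-0.5 ~ 0.45).
```

This is why freeing die area for cache helps only sublinearly, while a 5× effective-capacity technique such as compression or 3D stacking yields a noticeably larger, though still damped, traffic reduction.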
Communication between chips can occur through conventional, RF, optical, and 3D interconnects. Metal wires are used to interconnect blocks in SoCs and other chips; they not only increase heat dissipation but also impose restrictions on signals through noise and interference. Wireless communication between two blocks on the same substrate requires an on-chip antenna. Using gigahertz frequencies results in millimeter-scale antennas of very small size. Transceivers use electromagnetic waves at radio frequencies. Various types of antennas are used for different purposes, and the on-chip antenna is one configuration among them.
This section gives an overview of the experimental setup of the proposed system model and evaluates its performance in detail. The wireless multichip architecture is a hybrid network with both wired and wireless interconnects. The system is considered to have 64 cores per chip, and the number of chips in the system is varied from one to a maximum of four for this work's experiments, yielding systems of 64, 128, 192, and 256 cores. Every core in the multichip system is integrated with a NoC switch, and the switches within each independent chip are interconnected using an intrachip NoC architecture with two different topologies, as explained in Chapter 3. Wireless Interfaces (WIs) are integrated into exactly 4 switches in each of these chips in order to realize interchip wireless communication. The on-chip zig-zag antennas considered in this work for these WIs provide a bandwidth of 16 GHz around a center frequency of 60 GHz. The CDMA-based wireless transceivers achieve a total data rate of 6 Gbps across all channels. The total power dissipation of this transceiver, which is the combined power consumption of all its components, including the CDMA encoder/decoder, BPSK modulator, LNA, mixer, ADC, and VGA, is 20.6272 mW.
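From the quoted transceiver figures, an energy-per-bit estimate follows directly; the helper below is just a sketch of this arithmetic:

```python
def energy_per_bit(power_watts, data_rate_bps):
    """Transceiver energy efficiency in joules per bit."""
    return power_watts / data_rate_bps

# 20.6272 mW total power at an aggregate 6 Gbps:
picojoules = energy_per_bit(20.6272e-3, 6e9) * 1e12   # ~3.44 pJ/bit
```

Around 3.4 pJ/bit for a single-hop wireless shortcut is the quantity one would compare against the accumulated per-hop energy of the equivalent multi-hop wired path.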
Three flex parts are needed for the flex power module. The first is used as the interconnection between the gate driver and the MOSFET, and has a double-layer, double-access feature. The gate driver bare die is flipped and attached to the top side of this layer, along with a 0.1 µF bypass capacitor and a gate resistor in an 0805 standard package. The power MOSFET die is attached to the bottom side of the layer. The gate signal generated by a signal generator is amplified by the gate driver and applied to the MOSFET gate through vias. Copper traces are extended and solderable pads are placed at both ends of the part in order to make the electrical connection with the PCB mother board. Figure 2-4 shows the actual circuit.
Finally, a few comments on the nature of the design challenges faced when designing a NoC. The packet-switched interconnect, consisting of routers and links, simply transports packets from source to destination. Here, the challenges are mostly hardware design issues such as the area, speed, and power consumption of the routers and links, as well as the choice between mesochronous, plesiochronous, or asynchronous circuit design solutions. The design of the network adapters, on the other hand, is heavily influenced by system-level issues such as the programming model and the memory hierarchy. The memory model is typically distributed, non-coherent shared memory. In order to hide the latency associated with accessing memory in a remote node, DMA controllers are often used to transfer blocks of data. This functionality may be provided by the network adapter or by the processor node – the boundary between the two is not clear-cut.
CNT bumps have been demonstrated by several groups as potential off-chip interconnects [4-6]. Soga et al. have shown the bumps' good mechanical flexibility and a low bundle resistance of 2.3 Ω (for a 100-μm diameter bump). Hermann et al. demonstrated a reliable electrical flip-chip interconnect using CNT bumps over 2,000 temperature cycles. CNT bumps for practical uses, such as high-power amplifier applications, have also been demonstrated. In all the works mentioned, the CNT bumps were grown using the chemical vapor deposition (CVD) approach. During CVD growth, vertical alignment is achieved by the van der Waals forces between the walls of the CNTs, resulting in tubes that are not exactly aligned. The poor alignment introduces bends, reduces the mean free path (m.f.p.), and increases the resistance of the CNTs. Plasma-enhanced CVD (PECVD) is able to resolve this issue by introducing an electric field to achieve alignment, while also allowing a lower growth temperature.
scheme supporting the runtime path arrangement occurs in the setup phase. Restricting the routing function for deadlock-free data transfer in virtual circuits with a priority approach may lead to throughput degradation in packet-switched NoCs. Moreover, the implementation of queuing buffers in packet-switched routers dramatically increases the cost in terms of required area and power consumption. The circuit-switching approach is favored for providing hard guaranteed throughput due to its attractive QoS property once a circuit is set up. After this setup, end-to-end data can be pipelined in order at the full rate of the dedicated links with low delay, no data jitter, and in a lossless manner (i.e., without data dropping), since there are no collisions among the data streams. Importantly, without queuing buffers and complex routing/arbitration logic, a circuit-switched router results in a low-cost (i.e., area, power) design suitable for the limited on-chip budget. However, the path-setup scheme used in circuit-switched NoCs requires careful consideration to ensure proper functioning (i.e., deadlock and livelock freedom) and low-latency setup with minimal hardware overhead. Moreover, a dynamic and distributed path setup in the circuit-switched NoC is also mandatory to ensure system flexibility and scalability in the dynamic management (allocation) of the guaranteed communication circuits. This work advocates a guaranteed-throughput implementation with the circuit-switching approach, due to the compact implementation of routers suitable for the on-chip environment and the intrinsic hard QoS property after a circuit has been set up. A novel, practical pipelined circuit-switched switch design is proposed, termed the backtracking wave-pipelined switch (or BW switch), to support on-chip hard guaranteed-throughput applications.
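The backtracking path-setup idea can be illustrated in software as a depth-first probe that reserves the first free path it completes and retreats from dead ends. This is a conceptual sketch over an abstract graph of free links, not the BW switch's actual hardware protocol:

```python
def setup_circuit(links, src, dst):
    """Depth-first (backtracking) path setup: a setup probe advances over
    free links, retreats from dead ends, and returns the first complete
    path from src to dst, or None if every route is blocked.

    links: adjacency lists of currently free links, e.g. {0: [1, 2], ...}
    """
    path, visited = [src], {src}

    def probe(node):
        if node == dst:
            return True
        for nxt in links.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                if probe(nxt):
                    return True
                path.pop()           # dead end: backtrack and release the hop
        return False

    return path if probe(src) else None
```

Once such a probe succeeds, the reserved links form the dedicated circuit over which data can then stream at full rate without arbitration or buffering.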
The cost of a solar home system is dominated by the batteries and panel, as shown in Figure 1. Individuals who cannot afford to invest in a full SHS could instead install a cheaper controller and converter, connecting their dwelling to a network and allowing them to purchase excess electricity from other connected systems. The quality of service may not be as good as that of a SHS, as there may be times when all systems of the network are unwilling to sell, but this first step may be an effective method to spread the cost and facilitate quicker, more cost-effective electrification for those who cannot afford to purchase a complete SHS. In a location with many SHSs it may make sense for new customers not to buy a SHS at all and simply purchase an interconnection. This could also be used to connect new load centers without energy access to minigrids with excess energy. For example, small enterprises that lack the capital to invest in all of the components of an off-grid system could utilize this pooled energy resource and unlock business opportunities within local communities, while also providing income-generating potential for prosumers through the sale of energy to these small businesses.
This paper proposed an SoC-based AES cryptographic system that uses minimum area and yet provides adequately high throughput. The SubByte and Inverse SubByte units were implemented with LUT-based logic which, although it has a fixed access time and cannot be pipelined, reduces the critical path that would have been incurred by a combinational-logic-based implementation. Furthermore, a low-critical-path, shared MixColumn/Inverse MixColumn unit was implemented. A common unit, ComMult, is proposed that combines the multiplication for both MixColumn and Inverse MixColumn in a compact structure with a low critical path. The proposed key generator utilizes one 32-bit XOR in the computation of the round keys. In every cycle, a 32-bit word is computed and appended to a shift register. The use of shift registers requires less logic than would be needed for, say, a finite state register. The proposed AES crypto core yielded a maximum frequency of 621.12 MHz, which translates to average throughputs of 7.15 Gbps, 6.06 Gbps, and 5.25 Gbps for AES-128, AES-192, and AES-256, respectively. The total area was 16.1K gates with 9.216 KB of memory.
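The reported throughput figures are consistent with an iterative core taking roughly one clock cycle per round plus one extra cycle per 128-bit block. The sketch below is a back-of-the-envelope check under that assumed cycle count, not the paper's actual timing model:

```python
def aes_throughput_gbps(freq_mhz, rounds, extra_cycles=1):
    """Throughput of an iterative AES core that processes one 128-bit block
    every (rounds + extra_cycles) clock cycles. The per-block cycle count
    is an assumption for illustration, not the paper's reported timing."""
    cycles_per_block = rounds + extra_cycles
    return freq_mhz * 1e6 * 128 / cycles_per_block / 1e9

# At 621.12 MHz: AES-128 (10 rounds) -> ~7.23 Gbps and AES-256 (14 rounds)
# -> ~5.30 Gbps, in the same range as the 7.15 / 5.25 Gbps reported above.
```

The small gap between these estimates and the reported averages would be absorbed by per-block overheads (e.g., loading and key handling) not captured in this crude cycle model.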
Abstract: India and Sri Lanka are facing a growing gap between power demand and supply. India's power system is very large, its North Eastern and Eastern regions have surplus power, and the neighboring countries also have significant hydro potential. This surplus energy could support power exports from India to Sri Lanka. The proposed transmission interconnection between Madurai and Tuticorin in Tamil Nadu, India, and Anuradhapura and Puttalam in Sri Lanka optimizes the installed capacity by utilizing the diversity in peak demand, sharing spinning reserves, and optimizing the overall generation mix. This paper reviews studies conducted in the past and carries out a pre-feasibility study for the interconnection, selecting the most viable interconnection scheme on the basis of the present energy scenario of both countries. This interconnection scheme offers a solution to the energy deficit: instead of constructing more generating plants separately, the two countries can share part of their resources and generation via transmission, with proper scheduling of load through the load curve.
The exponential growth in computing performance quickly led to more sophisticated computing platforms. This rapid growth increased the demand for faster computing performance; every new enhancement in processors leads to greater performance demands. Moore's Law predicts that the number of transistors on a computer microprocessor will double every two years or so, providing regular leaps in computing power. Over more than four decades, this has driven the impressive growth in computer speed and accessibility. Lately, however, Moore's Law has begun to falter, which has driven the emergence of multi-core processors.
Integrating a multi-core chip drives production yields down, and such chips are more difficult to manage thermally than lower-density single-chip designs. Intel has partially countered this first problem by creating its quad-core designs from two dual-core dies with a unified cache, so that any two working dual-core dies can be used, as opposed to producing four cores on a single die and requiring all four to work to yield a quad-core. From an architectural point of view, single-CPU designs may ultimately make better use of the silicon surface area than multiprocessing cores, so a development commitment to this architecture may carry the risk of obsolescence. Finally, raw processing power is not the only constraint on system performance. Two processing cores sharing the same system bus and memory bandwidth limit the real-world performance advantage. If a single core is close to being memory-bandwidth limited, going to dual-core might give only a 30% to 70% improvement. If memory bandwidth is not a problem, a 90% improvement can be expected. It would even be possible for an application that used two CPUs to run faster on a single dual-core chip if communication between the CPUs was the limiting factor, which would count as more than a 100% improvement.
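The 30-90% range quoted above can be reproduced qualitatively with a crude model in which the compute-bound part of execution halves across two cores while the memory-bandwidth-bound fraction is serialized on the shared bus. The model and its parameter are illustrative assumptions, not the source's measurement methodology:

```python
def dual_core_speedup(mem_bound_fraction):
    """Crude estimate of dual-core speedup with a shared memory bus: the
    compute-bound part of execution halves across two cores, while the
    memory-bandwidth-bound fraction `m` of execution time is serialized
    on the shared bus. Purely illustrative."""
    m = mem_bound_fraction
    return 1.0 / ((1.0 - m) / 2.0 + m)

# m = 0.5 gives ~1.33x (a ~33% improvement); m = 0.05 gives ~1.90x (~90%).
```

Varying `m` between near-zero and one half sweeps the estimated improvement across roughly the 30-90% band the paragraph describes.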
Architectural design follows, and here the system is partitioned into computational blocks with interconnecting communication channels; these channels can be simple connections such as those used in buses, or can be modules of any complexity, buffering data and performing data manipulation or transformations. Formally separating computation from communication allows each to be modeled separately, enabling extremely large and complex designs to be partitioned between different teams and enhancing the chances of correct operation when the different sections are merged. A key step in this process is the formal specification of the interfaces between modules. Another significant advantage of this approach is that software development can commence in parallel with hardware development; previously, when the interconnections were not formally modeled, software development could not proceed significantly until the hardware was defined at the Register Transfer Level.
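The channel-based separation of computation and communication can be sketched as a minimal buffered channel abstraction; the class name and buffer depth are illustrative assumptions, not a specific methodology from the text:

```python
import queue

class Channel:
    """Minimal buffered communication channel. Computational blocks interact
    only through read()/write(), so the channel's internals can later be
    replaced by a bus model, a NoC, or a transformation stage without
    touching the blocks themselves."""

    def __init__(self, depth=4):
        self._fifo = queue.Queue(maxsize=depth)

    def write(self, token):
        self._fifo.put(token)       # blocks the producer when the buffer is full

    def read(self):
        return self._fifo.get()     # blocks the consumer when the buffer is empty
```

Because the interface is fixed while the implementation is free, two teams can develop a producer block and a consumer block against `Channel` independently and merge them later, which is exactly the benefit the paragraph attributes to formal interface specification.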