Optical interconnect (OI) is fast becoming one of the leading technologies for meeting the capacity requirements of emerging heterogeneous and bandwidth-intensive short-reach communication links. Bandwidth scaling has so far been sustained by the mature technologies of fiber ribbons or individual fibers, but these may not be enough to bridge the huge disparity between the optoelectronic bandwidth required and that available in the forthcoming era of big data and high-speed internet traffic. To keep pace with fast-growing internet traffic and the commensurate increase in data rate, researchers are compelled to explore novel means of signal transmission for next-generation exa-bandwidth OIs. An OI configuration based on multicore fiber (MCF) shows promise for coping with the cable-size limitation in data centers, terabit switches, core routers, and digital cross-connect systems, which necessitate high-fiber-count, high-density cable. Space division multiplexing (SDM) employing MCF has sparked tremendous interest among researchers as a leading candidate for future exaflop (10^18) high
an optical vortex field in a single-mode multicore fiber array. The derivations employed the beam propagation method and transparent boundary conditions to obtain the numerical solutions. The investigations explained the highly dynamical nature of the propagating vortex in the proposed structure. Different parameters that could have an effect on the creation of new vortices were examined, such as the number of cores and the size of the structure. It is found that vortices are very responsive to variations of the array parameters, either increasing or decreasing the number of vortex pairs, given that the topological charge is conserved.
Abstract: Toward the next-generation exa-scale short-reach optical interconnects (OIs) supporting large-capacity data transmission, a compact computer-compatible 8-core heterogeneous trench-assisted multicore fiber (TA-MCF) is proposed, in which cores are arranged in a rectangular array. To analyze the crosstalk (XT) between adjacent cores of the TA-MCF OI, a rigorous full-vectorial H-field finite element method (FEM) and coupled power theory are applied. The impact of various trench design parameters on the mode-coupling coefficient C_mn and the coupling length L_c is discussed in detail. An accurate
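For two identical coupled cores, coupled-mode theory relates the coupling length to the mode-coupling coefficient by L_c = π/(2·C_mn), the distance over which power transfers completely from one core to the other. A minimal sketch (the coefficient value below is illustrative, not taken from the paper):

```python
import math

def coupling_length(c_mn):
    """Coupling length for complete power transfer between two
    identical coupled cores: L_c = pi / (2 * C_mn)."""
    return math.pi / (2.0 * c_mn)

# Illustrative coupling coefficient (1/m); real values come from the FEM.
c_mn = 0.05
print(coupling_length(c_mn))  # ~31.4 m
```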
To obtain the mode-coupling coefficient between adjacent cores, the electric field distribution of the MCF is first calculated using the commercial finite element analysis software FemSIM. The key issue of inter-core crosstalk in an MCF interconnect transmission system is evaluated in the absence of nonlinear noise, as MCF is less sensitive to fiber nonlinearities. Crosstalk in any core 'A' is defined as the ratio of the crosstalk power leaking into 'A' from an adjacent core to the signal power guided in core 'A'. Crosstalk among cores in different rows is minimal due to the larger separation and is ignored. The distributed crosstalk power between neighboring cores (see Fig. 1) over a length of 100 m is calculated by coupled power theory. First, the power coupling coefficient is obtained for correlation length d_c and then integrated over
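Under coupled power theory with an exponential autocorrelation of the random perturbations, the power coupling coefficient and the resulting mean crosstalk over a span can be sketched as follows (all numeric values are illustrative placeholders; the real C_mn comes from the FEM field solution):

```python
import math

def power_coupling_coeff(c_mn, d_c, delta_beta):
    """Power coupling coefficient for an exponential autocorrelation of
    random perturbations with correlation length d_c:
    h = 2*C^2*d_c / (1 + (delta_beta*d_c)^2)."""
    return 2.0 * c_mn**2 * d_c / (1.0 + (delta_beta * d_c) ** 2)

def mean_crosstalk_db(h, length_m):
    """Statistical mean crosstalk accumulated over length_m metres,
    XT = tanh(h*L), expressed in dB."""
    return 10.0 * math.log10(math.tanh(h * length_m))

# Illustrative numbers only: C_mn in 1/m, d_c in m, delta_beta in 1/m.
h = power_coupling_coeff(c_mn=0.01, d_c=0.05, delta_beta=1.0e3)
print(mean_crosstalk_db(h, 100.0))
```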
Owing to the continuous growth of cloud services, the proliferation of smart devices, baseband centralization in radio access networks, and high-speed broadband penetration, data centers have experienced explosive increases in capacity demand. It is a challenge to meet these requirements with conventional copper interconnects because of the inevitable increases in energy consumption and number of interconnections that follow. To cope with these ever-growing bandwidth requirements, optical interconnect (OI) is fast becoming a viable solution for bandwidth-intensive short-reach communications because of its low propagation loss and high data transfer density. Although parallel spatial paths in the form of fiber ribbons or individual fibers have been widely used, they are costly, bulky, and unmanageable, and may not be enough to meet the rapidly growing capacity demand of future high-performance short-reach OI systems. To relax the data starvation of futuristic data centers, core routers, digital cross-connect systems, and on-chip integrated photonic systems, the OI configuration based on multicore fiber (MCF) has sparked tremendous interest among researchers as a potential and effective solution. An MCF is a single strand of glass fiber containing a multitude of single-mode cores at different positions in the aggregated fiber cross-section, which effectively reduces the fiber volume and cable complexity in short-reach OI applications. Space division multiplexing (SDM) technology using MCF is a powerful candidate to overcome the capacity crunch foreseen for the near future in short-reach OI communication systems.
Fig. 2 is a schematic of the experiment to apply bending and measure the induced differential strain. The bending test rig comprised a cantilever beam formed by a brass tube of 1.6 mm outer diameter. The FBGs in the MCF were recoated using a 410 µm mold and inserted into a PTFE tube, which was then inserted into the brass cantilever beam. The PTFE tubing was used to ensure a well-toleranced fit in the brass tube, thus ensuring the fiber took the shape of the cantilever beam when deflected. The FBGs were located 3.5 ± 1 mm from the fixed end of the cantilever beam. The cantilever was displaced by two orthogonal motorized micrometers at a distance of 199 mm from the fixed end. This configuration allows small curvatures to be applied independently along two axes; however, this arrangement does not produce constant strain along the length of the fiber. For small deflections of the cantilever beam, the curvature at a distance z from the fixed end of the cantilever is given by
We have looked at a class of interesting data mining algorithms and shown efficient parallel implementations with speedups greater than 7.5 on large "production" problems on an eight-core system. We do not believe parallel studies of these algorithms have been discussed previously. Although the parallelism used familiar data-parallel techniques, it was not trivial to get good multicore performance in the face of the memory, cache, and fluctuation overheads discussed here. We are currently tackling applications with millions of data points (PubChem, for example, has 18 million chemical compounds), each with thousands of properties (dimension D in equations 1-6); the data deluge will only increase the challenge! Simple approaches like K-means and the basic EM approach often find local minima. Here we are developing a suite of robust data mining algorithms that can be applied to large problems and that use techniques like annealing to mitigate the local-minima issue.
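The data-parallel pattern described above, with each core reducing its own slice of the points followed by a global combine, can be sketched for one K-means iteration (a minimal illustration, not the authors' implementation):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sums(points, centers):
    """Assignment step on one data chunk: returns per-center
    (sum_vector, count) pairs -- the local reduction of a
    data-parallel K-means iteration."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        best = min(range(k), key=lambda c: sum((p[d] - centers[c][d]) ** 2
                                               for d in range(dim)))
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return sums, counts

def kmeans_step(points, centers, workers=4):
    """One data-parallel K-means iteration: chunk the points,
    reduce the partial sums, and return the updated centers."""
    chunk = max(1, len(points) // workers)
    chunks = [points[i:i + chunk] for i in range(0, len(points), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: partial_sums(c, centers), chunks))
    k, dim = len(centers), len(centers[0])
    new = []
    for c in range(k):
        total = [sum(r[0][c][d] for r in results) for d in range(dim)]
        n = sum(r[1][c] for r in results)
        new.append([t / n for t in total] if n else list(centers[c]))
    return new

pts = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]]
print(kmeans_step(pts, [[0.0, 0.0], [4.0, 4.0]]))
```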
available string matching algorithms are optimized for uniprocessor computer architectures. They either cannot be executed efficiently by multicore processors, which have severe limits on fast memory, or cannot fully exploit the high degree of parallelism that multicore offers. On the other hand, multicore processors are difficult to program for high performance. Currently, multicore programmers use hand-tuned, manual resource-mapping approaches, which is not efficient for bioengineering application developers. An effective design methodology is therefore required to explore string matching algorithms optimized for multicore.
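One common way to expose multicore parallelism in exact string matching is to split the text into chunks that overlap by the pattern length minus one, so matches straddling a chunk boundary are not lost. A minimal sketch (illustrative, not one of the optimized algorithms the text calls for):

```python
from concurrent.futures import ThreadPoolExecutor

def find_in_chunk(text, pattern, start, end):
    """Scan text[start:end) for pattern, extending the search window by
    len(pattern)-1 so boundary-straddling matches are still found."""
    hits, i = [], start
    limit = min(end + len(pattern) - 1, len(text))
    while True:
        i = text.find(pattern, i, limit)
        if i == -1 or i >= end:   # matches starting in the next chunk
            break                 # are reported by that chunk's worker
        hits.append(i)
        i += 1
    return hits

def parallel_find(text, pattern, workers=4):
    """Data-parallel exact matching: search each chunk on its own
    thread, then merge the match positions in order."""
    chunk = max(1, (len(text) + workers - 1) // workers)
    spans = [(i, min(i + chunk, len(text)))
             for i in range(0, len(text), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: find_in_chunk(text, pattern, *s), spans)
    return [h for r in results for h in r]

print(parallel_find("acgtacgtacgt", "acgt"))  # [0, 4, 8]
```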
the improved architectures and increasing clock speeds of the last 15 years have allowed dramatic performance increases within a well-established fixed (sequential) programming paradigm [2-4]. Understanding the data deluge is an important problem in all areas of computing, from eScience to the commodity computing, such as home PCs, that is the main driver of the semiconductor industry. Thus we posit that it is important to look at data analysis and data mining and derive efficient multicore implementations. We would like these to be relevant for both eScience and commodity applications. The former could involve data from high-throughput instruments used in the life sciences. The latter includes the analysis of environmental and surveillance monitors or of data fetched from the Internet that could characterize a user's interests. The RMS (Recognition, Mining, Synthesis) analysis from Intel [5, 6] identified data mining and gaming as critical applications for multicore chips. Scientific data is likely to be so voluminous that we need any implementation to work well on clusters of multicore chips, preferably with the same programming model for inter-chip and intra-chip parallelism. On the other hand, commodity applications might well not need cluster implementations but would probably prefer thread-based runtimes involving managed code such as Java or C#. In most cases the data is likely to be distributed, so good Grid compatibility is an important requirement. High-performance (scientific) computing has never had very sophisticated programming environments, as the field is not large enough to support a major commercial software activity. Multicore could change the situation because of its broad importance, but we then need a scientific-computing programming model based on one applicable to commodity systems.
The trends discussed in the introduction motivate SALSA (Service Aggregated Linked Sequential Activities) at the Community Grids Laboratory. SALSA is exploring a set of data mining applications implemented in parallel on multicore systems. This is implemented in managed code (C#) with parallel synchronization from CCR (Concurrency and Coordination Runtime), developed at Microsoft Research [13, 14]. CCR supports both MPI-style synchronization and the dynamic threading essential in many concurrent commodity applications. Further, there is a service model, DSS (Decentralized Software Services), built on top of CCR. CCR is a possible choice of runtime to bridge scientific and commodity applications, as it supports the key concurrency primitives used in both. SALSA proposes that one build applications as a suite of services [8, 9] rather than as traditional subroutine or class libraries. The service model allows one to support integration within grid, cluster, and inter-chip environments. Thus SALSA is exploring a possible future application (data mining) on multicore chips using a programming model that could be used across a broad set of computer configurations and could be the basis of a programming model that links scientific computing to commodity applications. We note that we program in a low-level style, with the user responsible for explicit synchronization in the fashion familiar from MPI. There certainly could be general or domain-specific higher-level environments, such as variants of automatic compilation, OpenMP, PGAS, or even the new languages from DARPA's HPCS program [6, 16]. Our work can still be relevant, as it uses a runtime that is a natural target for such advanced high-level environments.
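The MPI-style rendezvous that CCR supports can be illustrated with a plain barrier over dynamic threads (a sketch of the synchronization pattern only, not the CCR API):

```python
import threading

# MPI-style synchronization on dynamic threads: each worker computes a
# partial result, then all meet at a barrier before the reduction is read.
N = 4
partial = [0] * N
barrier = threading.Barrier(N + 1)  # N workers plus the main thread

def worker(rank):
    partial[rank] = rank * rank     # stand-in for a real computation
    barrier.wait()                  # rendezvous, as at an MPI barrier

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
barrier.wait()                      # main thread joins the rendezvous;
print(sum(partial))                 # partial[] is now fully written -> 14
for t in threads:
    t.join()
```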
Application developers today need to produce code which is error-free and whose performance is optimized for a plethora of devices. The performance of application code is studied, for example, by analyzing performance data obtained by executing the application with a tracing tool. Developers typically have favorite tools which they prefer to use, but unfortunately target devices are based on different computing platforms with different performance probes, which makes it difficult to use the same tool across multicore platforms. The Universal Tracing Interface for Multicore Processors (UTIMP) aims to provide an unchanging tracing interface enabling developers to perform the required tracing tasks through UTIMP, utilizing their favorite tool when possible, on different multicore platforms.
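The idea of one unchanging tracing interface over platform-specific probes can be sketched as a simple facade (all class and method names here are hypothetical illustrations, not the real UTIMP API):

```python
import time

class TraceBackend:
    """Hypothetical platform-specific probe; each target platform
    would supply its own implementation."""
    def emit(self, event, payload):
        raise NotImplementedError

class ListBackend(TraceBackend):
    """Stand-in backend that just records events in memory."""
    def __init__(self):
        self.events = []
    def emit(self, event, payload):
        self.events.append((event, payload))

class Tracer:
    """The unchanging interface: tools call begin/end regardless of
    which backend the target multicore platform provides."""
    def __init__(self, backend):
        self.backend = backend
        self._starts = {}
    def begin(self, name):
        self._starts[name] = time.perf_counter()
    def end(self, name):
        elapsed = time.perf_counter() - self._starts.pop(name)
        self.backend.emit("span", (name, elapsed))

backend = ListBackend()
tracer = Tracer(backend)
tracer.begin("fft")
sum(i * i for i in range(10000))   # traced workload
tracer.end("fft")
print(backend.events[0][1][0])     # the recorded span name: fft
```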
We need to understand how to broaden the success in parallelizing science and engineering to the much larger range of applications that could or should exploit multicore chips. In fact, this broader set of applications must get good speedups on multicore chips if Moore's law is to continue as we move from the single-CPU architecture and clock-speed improvements that drove past exponential performance increases to performance improvement driven by increasing cores per chip. We will focus on "scalable" parallelism, where a given application can get good performance on 16-32 cores or more. On general principles, backed up by experience from science and engineering, lessons from a small number of cores are only a good harbinger for scaling to larger systems if they are backed up with a good model of the parallel execution. So in this section we analyze lessons from science and engineering on scalable parallelism: how one can get speedup that is essentially proportional to the number of cores as one scales to 4, 16, 128, and more cores. These lessons include, in particular, a methodology for measuring speedup and for identifying and measuring bottlenecks, which is intrinsically important and should be examined and extended for general multicore applications. Note that multicore offers not just an implementation platform for science and engineering but an opportunity for improved software environments sustained by a broad application base. Thus there are clear mutual benefits in incorporating lessons and technologies developed from scalable parallel science and engineering applications into broader commodity software environments.
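A minimal version of the speedup-measurement methodology compares measured speedup and efficiency against an Amdahl's-law model to expose the serial bottleneck fraction (the timings below are illustrative, not measured data):

```python
def speedup(t1, tn):
    """Measured speedup from serial time t1 and n-core time tn."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Parallel efficiency: speedup divided by the core count."""
    return speedup(t1, tn) / n

def amdahl(n, serial_fraction):
    """Amdahl's-law model of speedup on n cores when a fixed fraction
    of the run time is inherently serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

# Illustrative timings in seconds; comparing the measured curve against
# the model suggests how large the serial bottleneck fraction is.
t1 = 100.0
for n, tn in [(4, 26.5), (16, 7.75), (32, 4.63)]:
    print(n, round(speedup(t1, tn), 2), round(efficiency(t1, tn, n), 2),
          round(amdahl(n, 0.01), 2))
```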
Fault tolerance is also becoming a major concern when designing a NoC. As the integration of large numbers of components is pushed to smaller scales, a number of communication reliability issues arise; crosstalk, power-supply noise, and electromagnetic and inter-symbol interference are examples. Moreover, fabrication faults may appear in the form of defective core nodes, wires, or switches. In some cases, even though some regions of the chip are defective, the remaining chip area may be fully functional. From a NoC point of view, the presence of fabrication defects can turn an initially regular topology into an irregular one. In an off-chip network the defective components can simply be replaced, while for a multicore chip, if the routing layer cannot handle the faults, the chip is discarded. This issue is known as yield and correlates strongly with manufacturing cost.
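Whether a chip with fabrication faults remains usable can be checked by testing reachability on the degraded mesh, for example with a breadth-first search over the surviving routers (a toy sketch, not a production fault-tolerant routing algorithm):

```python
from collections import deque

def reachable(width, height, faulty, src):
    """BFS over a 2D-mesh NoC with defective routers removed: returns
    the set of nodes the source can still reach, a quick check of
    whether the fault-degraded (now irregular) topology is usable."""
    if src in faulty:
        return set()
    seen, queue = {src}, deque([src])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in faulty and (nx, ny) not in seen):
                seen.add((nx, ny))
                queue.append((nx, ny))
    return seen

# A 4x4 mesh with two defective routers: 14 of 16 nodes still connect,
# so the chip may be salvageable if routing can steer around the faults.
faults = {(1, 1), (2, 2)}
print(len(reachable(4, 4, faults, (0, 0))))  # → 14
```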
Multi-core algorithm design is an integral concept of the project. The anticipated efficiency of present multi-core algorithms is undoubtedly imperative to the potential outcome of the project. The research of Liu et al. (2010) into the performance of multicore hardware systems establishes valuable conclusions. Liu et al. (2010) tested the decrease in processing time relative to the number of hardware processors utilised. Throughout the trials, the algorithm used was Adaptive Differential Pulse Code Modulation (ADPCM). Yatsuzuka et al. (1998) outline that ADPCM has widespread usage in public telephone networks for reducing the bandwidth required for both telephone conversations and internet traffic. The results of Liu et al. determined that for large volumes of data, the performance increase approached the number of cores. This conclusion is understandable: whilst it is acknowledged that there are processing overheads in the creation of threads and the assignment of tasks (Silberschatz, Galvin & Gagne 2009), these overheads become small when compared to a large overall processing time. When the data is small, the algorithm is not as efficient.
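The amortization argument can be made concrete with a simple overhead model: per-thread creation and task-assignment costs are fixed, so speedup approaches the core count only as the work grows (illustrative numbers, not Liu et al.'s measurements):

```python
def parallel_time(work, cores, overhead_per_thread):
    """Simple model: perfectly divisible work plus a fixed per-thread
    cost for thread creation and task assignment."""
    return work / cores + overhead_per_thread

def speedup(work, cores, overhead_per_thread):
    return work / parallel_time(work, cores, overhead_per_thread)

# With 8 cores and 5 ms of per-thread overhead, speedup approaches 8
# only once the work dwarfs the overhead -- matching the reported trend.
for work_ms in (10.0, 100.0, 10000.0):
    print(work_ms, round(speedup(work_ms, 8, 5.0), 2))
```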
There are many important trends influencing scientific computing. One is the growing adoption of the eScience paradigm, which emphasizes the growing importance of distributed resources and collaboration. Another is the data deluge, with new instruments, sensors, and the Internet driving an exponential increase of data and the associated data-information-knowledge-wisdom pipeline, which itself derives more bytes to worry about, as in the results of simulations. Multicore chips are challenging because they require concurrency to exploit Moore's law, whereas the improved architectures and increasing clock speeds of the last 15 years have allowed dramatic performance increases within a well-established fixed (sequential) programming paradigm [2-4].
A novel 8-channel demultiplexer (demux) device based on multicore photonic crystal fiber (PCF) structures operating in the C-band (1530-1565 nm) has been demonstrated. The PCF demux design is based on replacing some air-hole areas with lithium niobate and silicon nitride materials along the PCF axis, together with appropriate optimization of the PCF structure. The beam propagation method (BPM), combined with Matlab code, was used to model the demux device and to optimize the geometrical parameters of the PCF structure. Simulation results show that 8 channels can be demultiplexed after light propagation of 5 cm, with large bandwidth (4.03-4.69 nm) and crosstalk of -16.88 to -15.93 dB. Thus, the proposed device has great potential for integration in dense wavelength division multiplexing (DWDM) technology for increasing performance in networking systems.
Abstract: Sensitive cell detection by magnetic resonance imaging (MRI) is an important tool for the development of cell therapies. However, clinically approved contrast agents that allow single-cell detection are currently not available. We therefore compared very small iron oxide nanoparticles (VSOP) and new multicore carboxymethyl-dextran-coated iron oxide nanoparticles (multicore particles, MCP), designed by our department for magnetic particle imaging (MPI), with the discontinued Resovist® regarding their suitability for detection of single mesenchymal stem
Parallel multicore processing aims to produce the same results using multiple processors, which ultimately increases CPU utilization. In this study, the spirit of the data-parallelism method was utilized to create a parallel particle swarm optimization (PPSO) algorithm. The purpose of applying parallel processing to particle swarm optimization goes beyond merely being a hardware accelerator; rather, a distributed formulation is developed which gives better solutions with reduced overall computation. It is difficult to find an algorithm which is efficient and effective for all types of problems. Our research has indicated that the performance of PPSO can be highly dependent on the level of correlation between parameters and on the nature of the communication strategy. In PMPSO the particles update their velocities and positions using the following equation, which denotes the k-th iteration with n cores. The mathematical form of the parallel particle swarm
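Since the excerpt's own update equation is cut off, here is a sketch using the textbook PSO velocity and position update (hypothetical coefficients; the authors' parallel formulation may differ):

```python
import random

def pso_update(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Textbook PSO update for one particle (not the excerpt's own
    k-th-iteration formula, which is not reproduced there):
      v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    new_v = [w * vi + c1 * rng.random() * (pb - xi)
             + c2 * rng.random() * (gb - xi)
             for xi, vi, pb, gb in zip(x, v, pbest, gbest)]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v

# Data-parallel flavour: each core would apply pso_update to its own
# slice of the swarm, then the cores exchange gbest between iterations.
random.seed(1)
x, v = [0.0, 0.0], [0.1, -0.1]
x, v = pso_update(x, v, pbest=[1.0, 1.0], gbest=[2.0, 2.0])
print(x)
```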
The main advantage of multicore systems is that raw performance increases can come from increasing the number of cores rather than the frequency, which translates into slower growth in power consumption. General-purpose multicores are becoming necessary even in the realm of digital signal processing (DSP), where in the past one general-purpose control core worked alongside many special-purpose application-specific integrated circuits (ASICs) as part of a "system on chip." This is primarily due to the variety of applications and performance required from these chips, which has driven the need for more general-purpose processors. Recent examples include software-defined radio (SDR) base stations, or cell phone processors that are required to support numerous codecs and applications, all with different characteristics, requiring a general programmable multicore.
A processor is the component that dominates the energy consumption of each node; thus, most techniques are designed to reduce the energy consumption of the processor [14, 15, 16]. Most current processors implement dynamic voltage and frequency scaling regulators to optimally adjust the voltage/frequency at run-time according to the workload behavior. One paper focuses on the dynamic control of the voltage regulators in a chip multicore processor platform; results from detailed simulations based on realistic experimental setups demonstrate up to 9% total energy saving. Another proposes minimum-energy voltage scheduling on a single multi-core processor with software-controlled dynamic voltage scaling. A further hardware approach to improving power management for multi-core systems is the implementation of multiple voltage islands. Different frequency scaling schemes that minimize the total energy consumption have been investigated in the past. The problem is even more critical in real-time application scheduling with pre-specified task deadlines. The energy consumption depends heavily on the characteristics of the applications: some are computationally intensive, others are data intensive, while others are hybrids of both. Energy-aware programming models, addressing various types of workloads and architectures, are required to develop energy-efficient applications [21, 22]. Programming approaches based on optimal utilization of shared on-chip hardware resources, e.g. cache memory, memory bandwidth, threads, or cores, for improved energy efficiency are becoming popular because of their simple implementation. In heterogeneous multicore processors that integrate CPU cores and data-parallel accelerators (GPUs), last-level cache memory can be managed to minimize the energy consumption. The dynamic assignment of thread-to-core can be particularly beneficial in saving energy [24, 25].
These approaches are particularly important in the case of modern multi-core building blocks that offer many options for controlling the execution parameters, e.g., the number of cores, system frequency, and core binding. Energy can thus be saved at all levels of large-scale architectures, even dynamically, depending on the characteristics of the application program.
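The frequency/voltage trade-off these techniques exploit can be illustrated with the usual dynamic-power model, E ≈ C·V²·cycles with V roughly proportional to f (toy constants, purely illustrative):

```python
def energy(freq_ghz, work_cycles, c_eff=1.0, k_volt=1.0):
    """Toy DVFS model: supply voltage scales roughly with frequency, so
    dynamic energy E = C * V^2 * cycles ~ C * (k*f)^2 * cycles.
    Constants are illustrative, not measured values."""
    volts = k_volt * freq_ghz
    return c_eff * volts ** 2 * work_cycles

def runtime(freq_ghz, work_cycles):
    """Run time in ns for work_cycles cycles at freq_ghz GHz."""
    return work_cycles / freq_ghz

# Halving the frequency quarters the dynamic energy but doubles the run
# time -- the trade-off a DVFS scheduler balances against task deadlines.
for f in (2.0, 1.0):
    print(f, energy(f, 1e9), runtime(f, 1e9))
```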