The first “Many-Core” Architectures

Many-Task Computing on Many-Core Architectures

... a result, existing vendors must spend extra time and effort to modify or rewrite parts of their codebase to take advantage of the new capabilities provided by General Purpose GPUs (GPGPUs). Besides that, barely rewriting ...

14

Architectural Support and Compiler Optimization for Many-Core Architectures.

... The discrete GPGPUs work as coprocessors attached to CPUs through PCI Express bus as shown in Figure 2.1. Instead of operating on the system memory directly, each GPGPU has own off-chip memory system. If the GPGPU needs ...

140

Vectorizing unstructured mesh computations for many core architectures

... 6.2. Baseline performance First, we briefly present and discuss the performance of the updated back-ends, described in Section 5, on Airfoil and Volna. This will serve as a baseline for analyzing the performance ...

21

On the acceleration of wavefront applications using distributed many-core architectures

... TABLE II. COMPILER CONFIGURATIONS. VI. WORKSTATION PERFORMANCE The first set of experiments investigate the performance of a single workstation executing the LU benchmark in both single- and double-precision. ...

16

On the acceleration of wavefront applications using distributed many core architectures

... TABLE II. COMPILER CONFIGURATIONS. VI. WORKSTATION PERFORMANCE The first set of experiments investigate the performance of a single workstation executing the LU benchmark in both single- and double-precision. ...

16

Numerical Reproducibility for the Parallel Reduction on Multi- and Many-Core Architectures

... It is also used to compute the lower part of the mantissa of x that is added to the digit at the position k − 1 within the superaccumulator. When the result of the accumulation exceeds the range of the superaccumulator ...

17

Coarray-based Load Balancing on Heterogeneous and Many-Core Architectures

... platforms. First of all, we analyze the performance of MT and MO on individual devices (without inter-process communication) using all the cores available on each ...

28

Programming models for many-core architectures: a co-design approach

... The first language in the figure is C99 [25]. The language exhibits an imperative or structured paradigm, and the abstract machine belongs to the class of register machines. Because the language is ...

193

Fast Parallel Set Similarity Joins on Many-core Architectures

... a core operation for text data integration, cleaning, and ... The first algorithm, called gSSJoin, does not rely on any filtering scheme and, thus, exhibits much better robustness to variations of threshold ...

16

Many-Core Architectures: Hardware-Software Optimization and Modeling Techniques

... from the external memory latency. Another important observation made by the authors is that: the combination of prefetching and software caching can further improve the benefits obtained with only the software cache. The ...

184

CU2CL: A CUDA-to-OpenCL Translator for Multi- and Many-core Architectures

... Our project seeks to produce a tool that can be rapidly adopted by the CUDA and OpenCL communities. While numerous frameworks and tools for source-to-source translation exist [16]–[18], we chose to explore a number of ...

9

Modeling Algorithm Performance on Highly-threaded Many-core Architectures

... First of all, I would like to express my sincere gratitude to the two great advisors: Dr. Roger Chamberlain and Dr. Kunal Agrawal. Roger is the most patient and friendly person I have ever met. Since picked up ...

169

High Performance On-Chip Interconnects Design for Future Many-Core Architectures

... between many compute cores and a few memory controllers ...accommodate many data ...throughput-oriented architectures, thus designing a high bandwidth NoC in GPGPUs is of primary ...the first ...

133

Simulating Nonlinear Neutrino Oscillations on Next-Generation Many-Core Architectures

... The first implemented dump mode is responsible for taking a snapshot of the neutrino flavor states by writing them to a file: without any further processing, the raw values of the neutrinos’ wavefunctions are ...

292

Hyperspectral image classification using parallel autoencoding diabolo networks on multi-core and many-core architectures

... The first solution has a lower parallelization effort when compared to the latter, but is not capable of achieving the same level of performance. Moreover, the serial implementation also allows us to perform a ...

15

Adaptive Optimization of Sparse Matrix Vector Multiplication on Emerging Many Core Architectures

... Our study builds upon large-scale experiments involved over 9,500 distinct profiling runs performed on 956 sparse datasets and five mainstream SpMV representations. We show that the best sparse matrix representation ...

10

Initial condition for efficient mapping of level set algorithms on many core architectures

... A case study on GPU The evolution process is divided into two steps. The first one is the planner step and the second is the evolution step. The planner creates the so-called plan. It contains the position offsets ...

11

Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures

... Event data from the full CMS simulation suite is translated into a reduced form that can be processed by our standalone tracking code. Seeds used are from the first iteration of the CMS iterative tracker [1]. ...

6

HICFD – Highly Efficient Implementation of CFD Codes for HPC Many-Core Architectures

... The first code, TRACE, uses a structured grid. Direct indexing is used in loops, and array indices are linearly transformed loop indices. We extracted four different computation kernels from the code and ...

12
