scalable high-performance environment

Evaluating and Enabling Scalable High Performance Computing Workloads on Commercial Clouds

The area of building infrastructure and support systems for urgent computing focuses on how best to take advantage of existing computational resources for urgent computing. Generally this is accomplished through some type of priority system that allows urgent computing tasks to be completed in a timely manner. One such system designed to address the urgent computing problem is the Special Priority and Urgent Computing Environment (SPRUCE) [11]. SPRUCE uses a novel token-based authorization system to facilitate and track urgent computation sessions by approved users. It is designed to work closely with existing resource providers and supercomputing centers, so that each resource provider retains full control over the policies governing which resources and parts of their system can be used for urgent computation. In SPRUCE, a user or group of users is issued a specially generated "token" that gives the jobs they submit elevated "priority": their jobs execute first and more quickly than those of a standard system user. By submitting at a higher priority than standard users, SPRUCE users can also preempt or cancel standard users' jobs. However, these preemption and priority settings vary, because SPRUCE does not provide standard rules; they are defined by the individual resource providers. This can lead to inconsistencies between SPRUCE sites and to confusion among researchers submitting urgent computing jobs, since the processing delay can vary widely between sites.
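To make the token-and-priority mechanism concrete, below is a minimal sketch of a scheduler in which jobs carrying a valid urgency token outrank standard submissions. All names here (the UrgentScheduler class, the token registry, the two-level priority) are hypothetical illustrations rather than SPRUCE's actual interface, and preemption is omitted since SPRUCE leaves that policy to each resource provider.

```python
# Minimal sketch of a token-based priority queue in the spirit of SPRUCE.
# All names (UrgentScheduler, the token registry) are hypothetical; the
# real priority and preemption policies are defined per resource provider.
import heapq
import itertools

VALID_TOKENS = {"SPRUCE-2024-DEMO"}  # hypothetical token registry

class UrgentScheduler:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # FIFO tie-break within a priority

    def submit(self, job_name, token=None):
        # Priority 0 = urgent (valid token), 1 = standard user.
        priority = 0 if token in VALID_TOKENS else 1
        heapq.heappush(self._heap, (priority, next(self._counter), job_name))

    def next_job(self):
        if not self._heap:
            return None
        _, _, job_name = heapq.heappop(self._heap)
        return job_name

sched = UrgentScheduler()
sched.submit("climate-forecast")                            # standard job
sched.submit("tornado-warning", token="SPRUCE-2024-DEMO")   # urgent job
assert sched.next_job() == "tornado-warning"                # urgent runs first
```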

GloMoSim: A Scalable Network Simulation Environment

Detailed, high-fidelity models of large networks represent a significant challenge for the networking community. As the military moves towards deploying a digital communication infrastructure, it is imperative that the performance of the communication devices be thoroughly studied prior to deployment, to understand the limits of the network and its ability to handle diverse traffic under stringent operating conditions. This paper presented a simulation library called GloMoSim whose goal is to support accurate performance prediction of large-scale network models using parallel execution on a diverse set of parallel computers. The library has already been used to simulate networks with thousands of wireless nodes and provides a rich set of models for both existing and novel protocols at multiple layers of the protocol stack. It has been used to undertake numerous performance studies of alternative protocols, both at UCLA and at other organizations. It is available for download at http://pcl.cs.ucla.edu/projects/domains/glomosim.html

Investigation Analysis On Data Prefetching And Mapreduce Techniques For User Query Processing

But prefetching was not carried out at an early stage to minimize job completion time. A scalable pipeline of components built on the Spark engine for large-scale data processing was presented in [3]. Its main aim was to collect data from dataset access logs, organize them into weekly snapshots, and apply predictive techniques to forecast which datasets would be accessed. But latency was not minimized because prefetching was not carried out. An envisioned future large-scale computing architecture was adapted in [4] for batch processing of big data applications in the MapReduce model, but that architecture did not minimize computational complexity. A big data processing framework was designed in [5] to join climate and health data and to find correlations between the climate parameters, but prefetching was not performed in this framework. A novel intermediate data partition scheme was designed in [6] to reduce the network traffic cost of a MapReduce job: every aggregator reduced the merged traffic from multiple map tasks by addressing the aggregator placement issue. However, the data partitioning was not carried out efficiently by this scheme. High-level MapReduce query languages were built on MR in [7] to convert queries into executable native MR jobs, but they did not minimize complexity. A new A* algorithm introduced in [8] reduced the Map and Reduce tasks required for running path computation on the Hadoop MapReduce framework. The designed framework enhanced feasibility and reliability, and the A* algorithm minimized computation time; however, the MapReduce tasks themselves were not performed by the A* algorithm. A novel approach was designed in [9] to improve metadata management performance for Hadoop in a multitenant environment based on a prefetching mechanism. However, the map-reduce function was not employed in the multitenant environment. A new efficient pattern mining algorithm was introduced in [10] using the MapReduce framework and the Hadoop open-source implementation for big data, with a maximal AprioriMR algorithm designed for mining condensed frequent patterns. But the execution time was not minimized by this algorithm. A Secured MapReduce (SMR) layer was designed in [11] between the HDFS and MR layers to improve security and privacy. The designed model provided privacy and security through B…
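Several of the cited works are faulted above for not prefetching. The sketch below illustrates the general idea under discussion, not any one paper's design: fetching the next data block in the background while the current block is processed, so that I/O latency is hidden and job completion time drops. load_block and process_block are hypothetical stand-ins for a distributed-file-system read and a map task.

```python
# Hedged sketch of prefetching: overlap the fetch of the next data block
# with processing of the current one. load_block and process_block are
# hypothetical stand-ins for an HDFS read and a map task.
from concurrent.futures import ThreadPoolExecutor
import time

def load_block(block_id):          # stand-in for a remote/HDFS read
    time.sleep(0.1)
    return f"data-{block_id}"

def process_block(data):           # stand-in for a map task
    time.sleep(0.1)
    return len(data)

def run_with_prefetch(block_ids):
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_block, block_ids[0])
        for nxt in block_ids[1:] + [None]:
            data = future.result()                     # wait for current block
            if nxt is not None:
                future = pool.submit(load_block, nxt)  # prefetch next block
            results.append(process_block(data))        # overlaps with the fetch
    return results

print(run_with_prefetch([0, 1, 2, 3]))  # ~0.5 s instead of ~0.8 s serial
```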

A Cloud-Assisted Internet of Things Framework for Pervasive Healthcare in Smart City Environment

However, pervasive healthcare applications generate a vast amount of sensor data that must be managed properly for further analysis and processing [3]. Efficiently managing the large volume of monitored data from various IoT devices, in terms of processing, storage, and analysis, is a key issue for large-scale adoption in pervasive healthcare services. Since IoT devices are limited in memory, energy, computation, and communication capabilities, they require a powerful and scalable high-performance computing and massive storage infrastructure for real-time processing and storage of the data, as well as for online and offline analysis of the processed information in context, using inherently complex models to extract knowledge about the health condition of patients.

Scalable parallel evolutionary optimisation based on high performance computing

Correctness verification attracted little attention in existing work when most EAs were designed to run on conventional serial computing facilities, because it can be achieved simply by directly comparing the outputs of different implementations of the same algorithm. Things get complicated, however, when EAs are moved to a GPU-based environment, because differences in the outputs are generally caused by the unpredictable execution order of the parallel processes. In this section, the conventional PSO algorithm is implemented in both the sequential and the GPU-based ways to showcase the difficulty, and failure, of correctness verification in the GPU-based environment compared with the CPU-based environment. In addition, four programming languages (Matlab, Python, C/C++, and CUDA) are used to establish four versions of PSO to obtain more objective results. Four test functions, namely Sphere, Ackley, Griewank, and Rastrigin, are used with D = 10, where D is the problem dimension. Solution quality is measured by the mean FEVs [88] over 30 independent runs repeated with different random seeds. The configuration of these PSO implementations follows the standard PSO [241], and the maximal number of fitness evaluations is set to D × 10^4 as the stopping criterion. It is well known that the Random Number Generators (RNGs) influence the outputs of EAs; thus, two different RNG configurations are used in the example, listed as follows:
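For reference, here is a minimal sketch of the experiment just described: a seeded global-best PSO on the Sphere function with D = 10 and a budget of D × 10^4 fitness evaluations, with the final fitness averaged over independent seeded runs. The swarm size, inertia and acceleration coefficients, and search bounds are common PSO defaults assumed for illustration, not necessarily the exact configuration of [241].

```python
# Minimal seeded global-best PSO on Sphere, D = 10, budget D * 10^4 FEs.
# Parameters are common PSO defaults, assumed for illustration.
import numpy as np

def sphere(x):
    return float(np.sum(x * x))

def pso(seed, dim=10, swarm=20, budget=None):
    budget = budget or dim * 10_000       # stopping criterion: D * 10^4 FEs
    rng = np.random.default_rng(seed)     # the RNG is the reproducibility knob
    lo, hi = -5.0, 5.0                    # illustrative search bounds
    x = rng.uniform(lo, hi, (swarm, dim))
    v = np.zeros((swarm, dim))
    pbest, pbest_f = x.copy(), np.array([sphere(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()
    evals = swarm
    w, c1, c2 = 0.7298, 1.4962, 1.4962    # common constriction-style values
    while evals < budget:
        r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([sphere(p) for p in x])
        evals += swarm
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return pbest_f.min()

# Mean final fitness over independent seeded runs (30 in the text; fewer
# here to keep the sketch fast).
print(np.mean([pso(seed) for seed in range(5)]))
```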

Research on High-performance and Scalable Data Access in Parallel Big Data Computing

In order to address the above challenges, we firstly propose a scalable locality-aware middleware (SLAM), which allows scientific analysis applications to benefit from data-locality exploitation with the use of HDFS, while also maintaining the flexibility and efficiency of the MPI programming model. SLAM aims to enable parallel processes to achieve high I/O performance in the environment of data-intensive computing and it consists of three components: (1) a data-centric scheduler (DC-scheduler), which transforms a compute-centric mapping into a data-centric one so that a computational process always accesses data from a local or nearby computation node, (2) a data location-aware monitor to support the DC-scheduler by obtaining the physical data distribution in the underlying file system, and (3) a virtual I/O translation layer to enable computational processes to execute conventional I/O operations on distributed file systems. SLAM can benefit not only parallel programs that call our DC-scheduler to optimize data access during development, but also existing programs in which the original process-to-file assignments could be intercepted and re-assigned so as to achieve maximum efficiency on a parallel system.
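The following sketch illustrates the data-centric mapping idea behind the DC-scheduler under stated assumptions: given the chunk replica locations that the location-aware monitor would report, each chunk is assigned to a process running on a node that holds a replica, with a simple load balance among candidates. The function name, data layout, and greedy policy are illustrative, not the paper's actual algorithm.

```python
# Hedged sketch of a data-centric assignment: map file chunks to processes
# that can read them locally. chunk_locations stands in for what the data
# location-aware monitor would report (e.g., HDFS block locations).

def data_centric_assign(chunk_locations, process_nodes):
    """chunk_locations: {chunk_id: set of nodes holding a replica}
    process_nodes: {rank: node the process runs on}
    Returns {rank: [chunk_ids]}, preferring local reads."""
    assignment = {rank: [] for rank in process_nodes}
    loads = {rank: 0 for rank in process_nodes}
    for chunk, nodes in chunk_locations.items():
        local = [r for r, n in process_nodes.items() if n in nodes]
        candidates = local or list(process_nodes)  # fall back to remote read
        rank = min(candidates, key=loads.get)      # balance load among them
        assignment[rank].append(chunk)
        loads[rank] += 1
    return assignment

chunks = {"b0": {"node1"}, "b1": {"node2"}, "b2": {"node1", "node2"}}
procs = {0: "node1", 1: "node2"}
print(data_centric_assign(chunks, procs))
# {0: ['b0', 'b2'], 1: ['b1']} -- every read is node-local
```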

A Minimal Linux Environment for High Performance Computing Systems

Sandia Labs has produced a long lineage of HPC systems with scalable runtime components. Many of these systems have used an LWK. Cplant [4], however, implemented the same runtime components on a commodity cluster system using a typical Linux OS. The runtime components that we will discuss perform the same basic function whether implemented for use with an LWK or a commodity Linux kernel. The components of the runtime system important for this discussion are named pct and yod. Pct is the runtime component that is executed during system initialization on each compute node; the pct process remains persistent for the life of the node. Yod is the command used to launch the HPC application. In the most basic usage of yod, an application is executed on the system by using the yod command combined with a flag that specifies the list of nodes the HPC application should be launched on. Yod communicates with the pct processes on all of the nodes on the list and distributes the application to each of the nodes. In our implementation the application executable is effectively copied, although in a scalable fanout fashion, to each node's file-system. Recall from the discussion in Section 4 (Light-os: In Brief) that each node's file-system is a memory-resident file-system. Once the fanout process is complete the pct starts the application. After the application is started, pct essentially gets out of the way to allow the application the maximum amount of node resources possible.
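As an illustration of the scalable fanout mentioned above, the sketch below computes the send schedule of a tree-structured broadcast: every node that already holds the binary forwards it to one more node per round, so coverage doubles each round and N nodes are reached in O(log N) rounds instead of N-1 sequential sends from the head node. This is a generic fanout sketch, not the actual yod/pct protocol.

```python
# Sketch of a doubling fanout copy: the (sender, receiver) schedule below
# would drive the real transport. Generic illustration, not yod/pct itself.

def fanout_rounds(nodes):
    """Return the (sender, receiver) pairs for each round of a doubling
    fanout that starts from nodes[0] and covers all other nodes."""
    have = [nodes[0]]            # nodes already holding the executable
    pending = list(nodes[1:])    # nodes still waiting for it
    rounds = []
    while pending:
        sends = []
        for src in have[:]:      # holders at the start of this round
            if not pending:
                break
            dst = pending.pop(0)
            sends.append((src, dst))
            have.append(dst)
        rounds.append(sends)
    return rounds

for i, sends in enumerate(fanout_rounds([f"n{k}" for k in range(8)])):
    print(f"round {i}: {sends}")
# 8 nodes are covered in 3 rounds instead of 7 sequential sends.
```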

Scalable Data Analysis in High Performance Computing

…demonstrates the first Map-Reduce implementation of DBSCAN. The core idea of this approach is the same as Xu's first parallelization attempt, namely to parallelize individual neighborhood queries, this time in the form of Map tasks. He et al. [19] present another implementation of parallel DBSCAN based on the Map-Reduce paradigm. They are the first to introduce the notion of a cell-based preprocessing step in order to perform a fully distributed clustering without the need to replicate the entire dataset or to communicate in between. Finally, Patwary et al. [25] have published work showing a parallel DBSCAN that scales up to hundreds of cores. Their main contribution is a quick merging algorithm based on a disjoint-set data structure. However, they either need to fit the entire dataset into main memory or need a manual preprocessing step that splits the data within a distributed computing environment.
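To make the merging step concrete, here is a minimal disjoint-set (union-find) structure of the kind Patwary et al.'s approach relies on: each worker clusters its own partition, and pairs of points found to be density-reachable across partition borders are unioned, so global cluster representatives fall out of find(). The example data and edges are illustrative.

```python
# Minimal disjoint-set (union-find) sketch for merging locally computed
# DBSCAN clusters; the points and edges below are illustrative.

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):                  # with path compression
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

# Two partitions clustered locally: {0,1,2} and {3,4,5}. A cross-border
# density-reachable pair (2, 3) merges them into one global cluster.
ds = DisjointSet(6)
for a, b in [(0, 1), (1, 2), (3, 4), (4, 5)]:    # local cluster edges
    ds.union(a, b)
ds.union(2, 3)                                   # cross-partition merge
print({p: ds.find(p) for p in range(6)})         # one representative for all
```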

A High Performance of 3D Object Reconstruction in a Cloud Environment

A private cloud is set up using Eucalyptus, an open-source software stack that implements an Infrastructure-as-a-Service (IaaS) cloud [27]. The IaaS infrastructure allows the end user to flexibly execute distributed scientific applications over the allocated VMs. Eucalyptus is used as the cloud management layer because of its compatibility with commercial cloud products such as Amazon EC2 and S3 [27]. This compatibility makes it possible to run a scientific application on a private cloud using Eucalyptus and on a public cloud using Amazon without modification to the execution framework or the distributed application [7]. The experiments are conducted on a private cloud consisting of four physical machines, each with a 2.2 GHz Intel Core i7 processor and 32 GB of memory. The virtualization layer is based on the Xen hypervisor version 4.3, and the VMs are deployed using Eucalyptus version 4. Each node runs the Ubuntu 12 operating system. OpenMPI version 1.5 is used with Open-MX version 1.4 as an optimised architecture for MPI message passing, over a Gigabit Ethernet network fabric; Open-MX is used to improve the communication performance of MPI [28].

Parallel and Distributed GIS for Processing Geo data: An Overview

A Geographic Information System (GIS) is a collection of applications whose tasks include gathering geographic data (in collaboration with other systems), storing and processing spatio-temporal data (geo-data), and sharing the derived geographic knowledge with users and other applications. Some of the most important routine applications of GIS are spatial analysis and digital elevation model (DEM) analysis, such as line-of-sight and slope computations, watershed and viewshed analysis, etc. GIS has become quite an important tool for the geospatial sciences and has gone beyond the typical tasks of mapping to performing complex spatio-temporal analysis and operations. The number of users relying upon Decision Support Systems (DSS) built upon GIS has increased as a result of the availability of very high resolution satellite imagery and the integration of spatial data and analyses with GIS packages, which now satisfy the needs of many users rather than serving only specialized operations. Moreover, Global Positioning System (GPS) receivers in a range of mobile devices and sensors, delivering updates at very short intervals, have led to a geo-information explosion. Parallel and distributed computing systems are now essential for computing over such huge amounts of data and delivering faster results. The focus of development should thus shift from traditional GIS to parallel and distributed GIS: traditional GIS systems have become quite mature and saturated, while technologies such as MPI (Message Passing Interface) and GPGPUs (General-Purpose Graphics Processing Units) can be readily utilized for faster geo-data processing. The performance improvement from recently developed technologies such as CUDA (Nvidia GPUs), OpenCL (ATI GPUs), and Intel's Xeon Phi co-processors could be as much as tenfold, if not more, compared to traditional Geographic Information Systems.

Vormetric and PCI Compliance in AWS A COALFIRE WHITE PAPER

Vormetric (@Vormetric) is the industry leader in data security solutions that span physical, virtual and cloud environments. Data is the new currency, and Vormetric helps over 1300 customers, including 17 of the Fortune 25 and many of the world's most security-conscious government organizations, to meet compliance requirements and protect what matters (their sensitive data) from both internal and external threats. The company's scalable Vormetric Data Security Platform protects any file, any database and any application, anywhere it resides, with a high-performance, market-leading data security platform that incorporates application-transparent encryption, privileged user access controls, automation and security intelligence. For more information, please visit: www.vormetric.com.

SAP Customer Success Story High Tech Alternative Logic. Alternative Logic: Creating an Innovative Remote Workforce Management Platform

FieldLogic is an Alternative Logic hosted software platform that allows organizations to manage their remote workforces effectively and efficiently. Remote workers use mobile devices to log their hours, locations, movements, and job status. FieldLogic enables organizations to access up-to-the-minute progress reports through any Internet connection. It also enables customer service teams to accurately update customers and manage their expectations. The result: operations run more smoothly, more efficiently, and more cost-effectively. FieldLogic also reduces the risk of error or fraud. It improves communications between field teams and central offices and provides organizations with clear audit trails. The software is flexible and scalable and can be customized for businesses regardless of size or industry sector.

Software and Performance Engineering of the Community Atmosphere Model

A. Mirin and W. Sawyer, A Scalable Implementation of a Finite-Volume Dynamical Core in the Community Atmosphere Model, International Journal for High Performance Computer Applications, 19(3), August 2005, pp. 203-212.

W. Putman, S-J. Lin, and B-W. Shen, Cross-Platform Performance of a Portable…

A high performance computational environment for UHTS studies

The chance to use the high-accuracy information from next-generation sequencing technologies for a large number of samples is a revolutionary step for molecular biomedicine. It opens the possibility of investigating transcription events and gene dynamics from a numerical point of view. On very large sets of experiments, statistical measures and mathematical models can be applied to describe biological events. Tasks of interest could include the identification of signatures of disease in different individuals or the validation of models that can describe widespread features across different populations. The relatively small number of studies can be a constraint on this type of investigation. Fortunately, as in the microarray and mass spectrometry cases, public repositories are growing to store data from next-generation studies produced in laboratories around the world. A repository can be used to access a large number of samples from experiments with different individuals, populations, and sequencing platforms.

Scalable High Performance Data Analytics: Harp and Harp-DAAL: Indiana University

Motivation for faster and bigger problems
• Machine learning (ML) needs high performance
– Big data and big models
– Iterative algorithms are fundamental in learning a non-trivial model
– M…

Information Federation in Grid Information Services

…services
• Strategies for high-performance, scalable in-memory storage
• Strategies for efficient distribution, replica-content placement, and consistency enforcement by utilizing pub-sub base…

SmartCities Public Final Report

To design and implement a centrally managed, high-performance, scalable engine for data movement, data transformation and interface management across all the scheme applications provider…

Towards a scalable environment for traffic information services

We elicited the need to analyse traffic data to identify patterns and shifting trends in traffic congestion, in order for a provider of traffic information services to offer new services. To facilitate this, we studied parts of the enterprise architecture and identified elements that can be added to acquire scalability. These are a more scalable data platform suitable for storing the large volumes of data that need to be analysed, and a processing framework capable of performing these analytics. We created a prototype with these elements to evaluate its scalability and to support the development of a new information service in the form of a dashboard. A prototype dashboard was also made, to demonstrate the possibilities the scalable environment offers. Although the prototype only had limited visualizations, it confirmed that our proposed scalable architecture can provide the desired information services that show patterns and trends. These services can be used by the customers of Simacan and other providers of traffic information. They can be used by traffic news outlets, for example to provide news about growth or decline in the amount of congestion. Information about the regular patterns of congestion can be useful for any logistics party involved with the planning of transportation, and information about changes in congestion can be used by any party in the area of road management, to support decision making concerning changes to the road network.
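As an illustration of the kind of pattern analysis such a platform supports, the sketch below aggregates raw speed measurements into a per-segment, per-weekday, per-hour congestion profile from which recurring patterns and shifting trends can be read. The record layout and the 50 km/h congestion threshold are assumptions for illustration, not Simacan's actual schema.

```python
# Hedged sketch: aggregate speed measurements into a weekday/hour congestion
# profile. Record layout and the 50 km/h threshold are illustrative.
from collections import defaultdict
from datetime import datetime

records = [  # (road segment, timestamp, measured speed in km/h)
    ("A28-12", "2016-03-07T08:15", 31.0),
    ("A28-12", "2016-03-07T13:10", 92.0),
    ("A28-12", "2016-03-14T08:20", 28.5),
]

profile = defaultdict(list)
for segment, ts, speed in records:
    t = datetime.fromisoformat(ts)
    congested = speed < 50.0                     # illustrative threshold
    profile[(segment, t.weekday(), t.hour)].append(congested)

for key, flags in sorted(profile.items()):
    share = sum(flags) / len(flags)
    print(key, f"congested in {share:.0%} of measurements")
# ('A28-12', 0, 8) is congested in 100% of Monday-08h samples: a recurring
# pattern a dashboard could visualize.
```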

CIF21 DIBBs: Middleware and High Performance Analytics Libraries for Scalable Data Science

• MIDAS: Integrating Middleware – from project
• SPIDAL (Scalable Parallel Interoperable Data Analytics Library): Scalable Analytics for Biomolecular Simulations, Network and Computational…

Paas Providers And Their Offerings

Heroku is a PaaS platform based on an abstract computing environment known as dynos, which are Unix-style virtualized containers. Dynos run processes in isolated environments and allow users to run apps inside these containers [14]. Heroku takes care of the other things required to run apps, including container orchestration, load balancing, logging, configuration, failover, security, and many others. It has a powerful ecosystem for deploying and running modern apps. Heroku is a polyglot PaaS platform, allowing developers to build, deploy, run, and scale applications across multiple languages: it supports Ruby, Scala, Clojure, Python, PHP, Go, etc. Heroku is well integrated with Git for deploying applications; using Git, a single command can push an application to the remote Heroku repository. There are many other ways of deploying applications, such as GitHub integration, Dropbox Sync, or the Heroku API to build and release apps [19]. Heroku combines an application's source code with its dependencies, such as packages, modules, and libraries, that must be available in the runtime environment. It compiles the source code, resolves the dependencies, and bundles all of these resources into a structure called a slug. The slug contains all the resources required to run an application.
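A rough sketch of the slug idea follows: collect the application's source and its declared dependencies into one archive that a dyno can unpack and run. Heroku's real buildpack machinery does much more (compilation, caching, language detection); the paths and requirements handling here are illustrative assumptions.

```python
# Hedged sketch of slug assembly: bundle application source plus its
# declared dependencies into one archive. Illustrative only; not Heroku's
# actual buildpack pipeline.
import tarfile
from pathlib import Path

def build_slug(app_dir, slug_path="slug.tar.gz"):
    app = Path(app_dir)
    with tarfile.open(slug_path, "w:gz") as slug:
        for src in app.rglob("*.py"):            # application source code
            slug.add(src, arcname=str(src.relative_to(app)))
        reqs = app / "requirements.txt"          # declared dependencies
        if reqs.exists():
            slug.add(reqs, arcname="requirements.txt")
    return slug_path

# A dyno would then unpack the slug and start the process declared in the
# Procfile, e.g. "web: python app.py".
```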
