Measurement is a required process in high-performance networks for efficient quality-of-service (QoS) provisioning and service verification. Active measurement is an attractive approach because the measurement traffic injected into the network can be controlled and the measurement tasks can be distributed throughout the network. However, measurement tasks executed in shared parts of a network may contend for resources such as computational power, memory, and link bandwidth. This contention could jeopardize measurement accuracy and affect network services; such contention for limited resources defines a conflict between measurement tasks. Furthermore, we consider two sets of measurement tasks: those used to monitor network state periodically, called periodic tasks, and those issued as needed for casual measurements, called on-demand measurement tasks. In this paper, we propose a novel scheduling scheme, called ascending order of the sum of clique number and degree of tasks, to resolve resource contention among both periodic and on-demand measurement tasks from a graph-coloring perspective. The scheme selects tasks in ascending order of the sum of the clique number and the conflict degree of each task in a conflict graph, and allows concurrent execution of multiple measurement tasks for high resource utilization. The scheme decreases the average waiting time of all tasks in periodic measurement task scheduling. For on-demand measurement tasks, the proposed scheme minimizes the waiting time of inserted on-demand tasks while keeping time-space utilization high; in other words, the total time spent finishing all the tasks is shortened. We evaluate the proposed scheme under different measurement task assignment scenarios through computer simulations and compare its performance with other schemes that also allow concurrent task execution. The simulation results show that the proposed scheme produces effective contention resolution and low execution delays.
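The selection rule above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the conflict graph, task names, and slot-packing helper are invented for the example, and the clique computation is brute force (fine for toy graphs, NP-hard in general).

```python
from itertools import combinations

# Hypothetical conflict graph: vertices are measurement tasks; an edge
# means two tasks contend for a shared resource and must not overlap.
conflicts = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": set(),
}

def clique_number_containing(task, graph):
    """Size of the largest clique that contains `task` (brute force)."""
    nbrs = sorted(graph[task])
    for k in range(len(nbrs), 0, -1):
        for combo in combinations(nbrs, k):
            # combo + task form a clique if every pair of neighbors is adjacent
            if all(v in graph[u] for u, v in combinations(combo, 2)):
                return k + 1
    return 1

def schedule(graph):
    """Visit tasks in ascending order of (clique number + conflict degree),
    packing each into the first slot whose tasks it does not conflict with.
    Tasks sharing a slot can execute concurrently."""
    key = {t: clique_number_containing(t, graph) + len(graph[t]) for t in graph}
    order = sorted(graph, key=lambda t: (key[t], t))
    slots = []  # each slot is a set of mutually non-conflicting tasks
    for t in order:
        for slot in slots:
            if not any(u in graph[t] for u in slot):
                slot.add(t)
                break
        else:
            slots.append({t})
    return slots
```

On this toy graph, tasks E, D, and A share one slot while B and C each need their own, so five tasks finish in three slots.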
PanDA can benefit from pilot capabilities in several ways. Pilots remove the need to package a workload into a batch submission script, simplifying the deployment requirements. Further, pilots enable the concurrent and sequential execution of a number of tasks until the available walltime is exhausted: concurrent because a pilot holds multiple nodes of an HPC machine, enabling the execution of multiple tasks at the same time; sequential because when a task completes, another task can be executed on the freed resources. In this way, tasks can be late-bound to an active pilot, depending on current and remaining availability. This is important because, in principle, ATLAS would not have to bind a specific portion of tasks to an HPC machine in advance; it could bind tasks only when the HPC resources become available. This would bring further coherence to the ATLAS software stack, as pilots and late binding are already used for grid resources.
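The concurrent-plus-sequential pattern can be sketched as follows. This is a simplifying abstraction, not PanDA's actual pilot: the slot model and walltime accounting in abstract time units are assumptions made for illustration.

```python
def pilot_run(tasks, n_slots, walltime):
    """Late-bind tasks to a pilot holding `n_slots` node slots.
    `tasks` is a list of (name, duration); each task is bound to the
    slot that frees up earliest and skipped if it would exceed walltime."""
    free_at = [0] * n_slots  # time at which each slot becomes free
    executed = []
    for name, duration in tasks:
        i = min(range(n_slots), key=free_at.__getitem__)
        if free_at[i] + duration <= walltime:
            free_at[i] += duration  # sequential reuse of the freed slot
            executed.append(name)
    return executed
```

With two slots and a walltime of 8, three tasks of duration 5 yield two executed tasks; the third stays unbound, available for another pilot with remaining availability.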
System software uses priorities to favor higher-priority processes over lower-priority ones, to improve the response time of selected processes, and to allow soft real-time systems to meet their given constraints. In Section II, we illustrated with an example how the ability to prioritize a task can be useful in GPU systems. Here, we evaluate the speedup of a prioritized application compared with non-prioritized execution. We assign high priority to one application per workload, while the others share an equal lower priority. The token scheduler is set up so that the high-priority application gets all the SMs, which means that the other applications fairly share the SMs only when the high-priority application is not running in the execution engine. In this experiment, we evaluate two additional schedulers. Priority Queues (FCFS-PQ) is the FCFS scheduler with one active context in the execution unit and a different priority assigned to each context. In this scheme, when the running kernel finishes its execution, a kernel from the context with the highest priority is scheduled to run on the GPU. This scheduling scheme could be implemented relatively easily on a multi-queue GPU architecture like Kepler. Preemptive Priority Queues (FCFS-PPQ) is also an FCFS scheduler, but with multiple active contexts and preemptive kernel execution, implemented on top of the extensions proposed in the paper. As soon as a high-priority kernel is launched, the scheduler starts draining the SMs that execute lower-priority kernels. Once the high-priority kernel has executed, the lower-priority kernels can continue their execution. Figure 10 shows the speedup of prioritized applications under the different scheduling schemes. As expected from the descriptions of the schedulers, all of them provide an improvement over FCFS-S on average.
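FCFS-PQ's selection rule can be sketched as follows. The class and queue layout are assumptions made for illustration, not the simulator's code: one FIFO queue per priority level, and the idle GPU takes the next kernel from the highest-priority non-empty queue.

```python
from collections import deque

class FcfsPQ:
    """Sketch of FCFS with priority queues: FCFS within a priority
    level, strict priority across levels, one active context at a time."""

    def __init__(self):
        self.queues = {}  # priority -> deque of (context, kernel)

    def submit(self, priority, context, kernel):
        self.queues.setdefault(priority, deque()).append((context, kernel))

    def next_kernel(self):
        # Scan priorities from highest to lowest; FCFS within a level.
        for priority in sorted(self.queues, reverse=True):
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None  # nothing pending; the GPU stays idle
```

Note that, as in the text, selection happens only when the running kernel finishes: there is no preemption, which is exactly what FCFS-PPQ adds.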
The simplest scheme with priorities (FCFS-PQ) provides a significant improvement for eight concurrent processes, because only that case has a relatively large number of in-flight kernels waiting in the queue. The token scheduler provides an improvement even with a small number of applications because of its ability to preempt low-priority kernels once a high-priority kernel is issued. However, FCFS-PPQ in general provides the biggest speedup in turnaround time (up to 3x for eight processes) because it has the same ability to preempt the low priority
The third basic strategy for resolving a conflict is to circumvent it by choosing non-conflicting (compatible) methods for carrying out tasks. For example, two tasks A and B might each require the gaze resource to acquire important and urgently needed information from spatially distant sources. Because both tasks are important, shedding one is very undesirable; and because both are urgent, delaying one is not possible. In this case, the best option is to find compatible methods for the tasks and thereby enable their concurrent execution. For instance, task A may also be achievable by retrieving the information from memory (perhaps with some risk that the information has become obsolete); switching to the memory-based method for A resolves the conflict. To resolve (or prevent) a task conflict by circumvention, mechanisms for selecting between alternative methods for achieving a task should be sensitive to potential resource conflicts (Freed and Remington, 1997).
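The circumvention idea can be sketched as follows. The task table, method lists, and resource names are hypothetical; method order encodes preference, so A's gaze-based method is tried before its riskier memory-based fallback, mirroring the example above.

```python
from itertools import product

# Each task lists alternative methods; each method names the one
# resource it needs. Order encodes preference (preferred method first).
tasks = {
    "A": [{"resource": "gaze"}, {"resource": "memory"}],  # memory = fallback
    "B": [{"resource": "gaze"}],
}

def compatible_assignment(tasks):
    """Pick one method per task so no resource is claimed twice,
    preferring earlier (more preferred) methods; None if impossible."""
    names = list(tasks)
    for choice in product(*(tasks[n] for n in names)):
        resources = [m["resource"] for m in choice]
        if len(resources) == len(set(resources)):
            return dict(zip(names, choice))
    return None
```

On this example the preferred pair (gaze, gaze) collides, so the search shifts task A to its memory-based method, which is exactly the resolution described in the text.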
Abstract: Multiple sclerosis (MS) is a disease that heavily affects postural control, predisposing patients to accidental falls and fall-related injuries, with a relevant burden on the patients themselves, their families, and health care systems. Clinical scales aimed at assessing balance are easy to administer in the daily clinical setting, but suffer from several limitations, including variable execution, subjective judgment in the scoring system, poor performance in identifying patients at higher risk of falls, and statistical concerns mainly related to the distribution of their scores. Today we are able to objectively and reliably assess postural control not only with laboratory-grade standard force platforms, but also with low-cost systems based on commercial devices that provide acceptable comparability to gold-standard equipment. The sensitivity of measurements derived from force platforms is such that we can detect balance abnormalities even in minimally impaired patients and accurately predict the risk of future accidental falls. By manipulating sensory inputs (dynamic posturography) or by adding a concurrent cognitive task (dual-task paradigm) to the standard postural assessment, we can unmask postural control deficits even in patients at a first demyelinating event or in those with a radiologically isolated syndrome. Studies on neuroanatomical correlates support the multifactorial etiology of postural control deficit in MS, with balance impairment being correlated with the cerebellum, the spinal cord, and higher-order processing networks according to different studies. Postural control deficit can be managed by means of rehabilitation, which is the most important way to improve balance in patients with MS, but there are also suggestions of a beneficial effect of some pharmacologic interventions.
On the other hand, it is worth paying attention to some drugs currently used to manage other symptoms in the daily clinical setting, because they can further impair the postural control of patients with MS.
Rogers et al. detect scenarios in which the L1 data cache is thrashed. They monitor the cache lines to detect warps that have lost intra-warp locality because other warps evicted data those warps would have used. A scoring system increases the score of such warps, and warps below a certain score are not selected by the scheduler, thereby reducing the number of active warps. Kayiran et al. monitor the amount of time spent waiting for memory; the number of threads is reduced if this time exceeds an empirically found threshold. Lee et al. find the optimal amount of TLP by launching the maximum number of warps initially and then using a greedy scheduling policy. After the first thread block completes, the optimal number of blocks is estimated from the number of instructions completed until then. Awatramani et al. detect the optimal thread block count by comparing pipeline stalls at different block counts; they launch half the maximum number of warps and then use history information from previous block counts to guide the scheduler. Lee et al. identify a problem similar to the tail effect mentioned in Sect. 3.5.5 for workloads with varying warp execution times. They throttle warp execution by assigning each warp a time slice proportional to its execution time under the RR scheduler; the tail effect is reduced by giving a larger time slice to longer-running warps.
ABSTRACT: In a Distributed Computing System (DCS), the application software is divided into small tasks, and the proper mapping of these tasks among processors is one of the important parameters that determine the efficient utilization of the available processor capacity. In this paper, tasks are allocated in a DCS in such a way that the load on each processing node is almost balanced. Further, we develop an effective algorithm for creating clusters of tasks using both the inter-task communication cost and the capacity of each processor, allocating n clusters to n processors (where the number of clusters equals the number of processors) with different capacities in a given distributed computing system. The proposed method clusters together those tasks whose inter-task communication is maximum.
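A greedy version of the clustering step can be sketched as follows. The cost table and the pairwise merge rule are illustrative assumptions, not the paper's algorithm: at each step, the two clusters with the largest total inter-task communication cost are merged, so heavily communicating tasks end up on the same processor.

```python
def cluster_tasks(comm, n_clusters):
    """Greedily merge clusters until exactly n_clusters remain.
    `comm` maps task pairs (i, j) to their communication cost."""
    tasks = {t for pair in comm for t in pair}
    clusters = [{t} for t in sorted(tasks)]

    def cost(a, b):
        # Total communication crossing the boundary between clusters a and b.
        return sum(comm.get((i, j), 0) + comm.get((j, i), 0)
                   for i in a for j in b)

    while len(clusters) > n_clusters:
        a, b = max(((x, y) for x in clusters for y in clusters if x is not y),
                   key=lambda p: cost(*p))
        clusters.remove(b)
        a |= b  # merging removes the costliest inter-cluster traffic
    return clusters
```

With four tasks where tasks 1 and 2 communicate heavily (cost 10) and tasks 3 and 4 moderately (cost 8), asking for two clusters pairs them up accordingly; assigning the resulting clusters to processors by capacity is a separate step.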
Mixed criticality. In mixed-criticality scheduling (MCS) theory, tasks are characterized by several different WCET parameters denoting different estimates of the true WCET value, these different estimates being made at different levels of assurance. The scheduling objective is then to validate the correct execution of each task at a level of assurance that is consistent with the criticality level assigned to that task: tasks assigned greater criticality must be shown to execute correctly when more conservative WCET estimates are assumed, while less critical tasks need to have their correctness demonstrated only when less conservative WCET estimates are assumed.
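The dual-estimate validation can be illustrated with a deliberately simplified utilization test. The task set, its numbers, and the single-processor bound are invented for this sketch; real MCS analyses (e.g., response-time based) are considerably more involved.

```python
# Each task: (name, period, wcet_lo, wcet_hi, criticality).
# HI-criticality tasks carry both a less conservative (LO) and a more
# conservative (HI) WCET estimate; LO tasks need only the LO estimate.
taskset = [
    ("control", 10, 2, 4, "HI"),
    ("logging", 20, 5, 5, "LO"),
]

def utilization_check(tasks):
    """Naive single-processor feasibility test: all tasks must fit
    under LO estimates, and HI tasks alone must also fit under their
    more conservative HI estimates."""
    u_lo = sum(c_lo / period for _, period, c_lo, _, _ in tasks)
    u_hi = sum(c_hi / period for _, period, _, c_hi, crit in tasks
               if crit == "HI")
    return u_lo <= 1.0 and u_hi <= 1.0, u_lo, u_hi
```

Here the LO-mode utilization is 2/10 + 5/20 = 0.45 and the HI-mode utilization is 4/10 = 0.4, so the sketch deems the set feasible at both assurance levels.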
Scheduling job executions within a distributed environment has always received a lot of attention from both academic and industrial researchers. The academic community has mostly focused on Grid computing, whose nature facilitates collaboration among many organisations for sharing computing resources. Each proposed piece of research aims to optimise one or more criteria regarding the performance of an application on the grid. Ranganathan and Foster proposed task assignment and data replication approaches to reduce the geographical distance between data and computation. Beaumont et al. aimed to maximise the number of tasks executed concurrently while ensuring fair resource sharing between applications. Benoit et al. took another approach, minimising application runtime while sharing resources between applications.
Automated negotiation by software agents is a key enabling technology for agent-mediated e-commerce. To this end, this paper considers an important class of such negotiations, namely those in which an agent engages in multiple concurrent bilateral negotiations for a good or service. In particular, we consider the situation in which a buyer agent is looking for a single service provider from a number of available ones in its environment. By bargaining simultaneously with these providers and interleaving the partial agreements that it makes with them, a buyer can reach good deals in an efficient manner. However, a key problem in such encounters is managing commitments, since an agent may want to make intermediate deals (so that it has a definite agreement) with other agents before it gets to finalize a deal at the end of the encounter. To do this effectively, however, the agents need to have a flexible model of commitments that they can reason about in order to determine when to commit and to decommit. This paper provides and evaluates such a commitment model and integrates it into a concurrent negotiation model.
Users initially submit their workflows, with a deadline and replication factor, in an abstract data structure format to the Preprocessor Module (PM). The PM discovers the services required for the submitted tasks and divides the tasks between computation services and storage services. It also generates a threshold that helps prioritize the tasks and a heuristic metric that indicates the replication count. The RSM then replicates the tasks based on the priority and replication count, and allocates the particular services to these tasks in the cloud environment. After mapping, ResEM sends the tasks to the servers and starts a timer based on the expected execution time. If ResEM receives successful output from the server within the expiry time, it activates all the other tasks that depend on that task. If it fails to receive successful output from the server, ResEM waits for the other replicas; if all replicas fail as well, it resubmits the task.
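The replica fallback logic in the last few sentences can be sketched as follows. The callable interface, the function names, and the timeout handling are assumptions made for illustration, not the system's actual API.

```python
def run_with_replicas(replicas, expected_time, resubmit):
    """Try each replica in turn; a replica is a callable that takes the
    timeout and returns (success, output). The first success returns its
    output (at which point dependent tasks could be activated); if every
    replica fails, the task is resubmitted."""
    for replica in replicas:
        success, output = replica(timeout=expected_time)
        if success:
            return output
    return resubmit()  # all replicas failed within their expiry times
```

For example, with one failing and one succeeding replica the successful output is returned; with only failing replicas the resubmission path runs.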
Because of the popularity of Python, there have been many efforts to improve the performance of the language. Some specialize in machine learning, while others support a wider range of numerical computations in general. NumPy provides excellent support for numerical computations on CPUs within a single node. Theano provides a syntax similar to NumPy's; however, it supports multiple architectures as the backend. Theano uses a symbolic representation to enable a range of optimizations through its compiler. PyTorch makes heavy use of GPUs for high-performance execution of deep learning algorithms. Numba is a JIT compiler that speeds up Python code through decorators; it uses the LLVM compiler to compile and optimize the decorated parts of the Python code. Numba relies on other libraries, like Dask, to support distributed computation. Dask is a distributed parallel computation library implemented purely in Python, with support for both local and distributed execution of Python code. Dask works tightly with NumPy and Pandas data objects. The main limitation of Dask is that its scheduler has a per-task overhead in the range of a few hundred microseconds, which limits its scaling beyond a few thousand cores. Google's TensorFlow is a symbolic math library with support for parallel and distributed execution on many architectures, and it provides many optimizations for operations widely used in machine learning. TensorFlow is a library for dataflow programming, a programming paradigm not natively supported by Python and therefore not widely used.
for whatever reason they deem appropriate, by simply paying a decommitment fee to the other partner. However, our model is different in a number of important ways. First, the original LVC model only covers a two-person game; we have extended this to cover the multiple providers found in our target environment. Second, we do not just reason about decommitment; we also deliberate about when and how to make a commitment. Third, LVC requires the agents to have information about the actual and alternative options of their opponents in order to calculate the Nash equilibrium decommitment threshold. This assumption is unrealistic in practical scenarios and is not required in our model. Finally, unlike LVC (which typically assumes a fixed penalty for decommitting, regardless of the stage of the process at which the commitment is broken), our model takes the cost of ongoing commitment into account by introducing variable-penalty contracts. Again, this is more realistic for most real-world settings.
Abstract—The domain of high-assurance distributed systems has focused greatly on the areas of fault tolerance and dependability. As a result, the paradigm of service-oriented architectures (SOA) has commonly been applied to realize the significant benefits of loose coupling and dynamic binding. However, there has been limited research addressing the issues of managing real-time constraints in SOAs, which are by their very nature dynamic. Although the paradigm itself is derived from fundamental principles of dependability, these same principles appear not to be applied when considering the timed dimension of quality of service. As a result, the current state-of-the-art in SOA research only addresses soft real-time constraints and does not seek to provide concrete guarantees about a system's performance. When a distributed system is deployed, we do not sufficiently understand the emergent behavior that will occur. This paper therefore proposes an approach that probabilistically monitors system state within a given workflow's execution window. Utilizing a real distributed system, we experiment with services from the computer vision domain, with clear real-time constraints, evaluating the performance of each system component. Our approach successfully models the likelihood of a service meeting various levels of QoS, providing the basis for a more dynamic and intelligent approach to real-time service orientation.
The kidney showed multiple moderate interstitial infiltrations of lymphoid cells intermingled with heterophils, macrophages and giant cells, and multiple instances of moderate linear tubular degeneration and necroses. Moreover, congestion and haemorrhages were frequently found in the gastrointestinal tract, liver and kidney. Rod-shaped bacilli, either discrete or in colonies, were found in or around the above lesions. No significant lesions except congestion were seen in other organs. Most notably, a varying number of PAS-positive, round-to-ovoid amoebic trophozoites were identified in the above lesions of the gastrointestinal tract, liver and kidney (Figures 1C and 2D). The trophozoites exhibited vacuolated cytoplasm surrounded by a thick eosinophilic wall and a single, centrally-to-eccentrically located, round-to-ovoid nucleus (Figures 1C and 2D). These morphological features were suggestive of the genus Entamoeba.
Condor takes advantage of idle and lightly used workstations around a network. As in our system, each machine in the network can run a task from another user. This task runs in the background so as not to disturb the workstation's user, who remains free to destroy any remotely executing job. At the beginning of execution, the Condor system assigns one machine within Condor to be the master of all other workers. This master farms out all jobs to the other machines; results are then returned to the master and resolved. The most recent version of Condor has several ways of executing a parallel application: using MPI, PVM, or master-worker. The MPI environment, as described before, requires an understanding of how to write an MPI program, and the same applies to PVM programs.
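The master-worker pattern that Condor automates across workstations can be sketched in miniature with a local thread pool. This is purely illustrative: Condor distributes the work over networked machines and handles placement and recovery, whereas this sketch only shows the farm-out/collect shape of the computation.

```python
from concurrent.futures import ThreadPoolExecutor

def master(jobs, n_workers, work):
    """Farm each job out to a worker and gather the results in
    submission order, as a Condor-style master does with its workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() distributes jobs to idle workers and preserves order,
        # standing in for the master collecting and resolving results.
        return list(pool.map(work, jobs))
```

For example, `master([1, 2, 3], 2, lambda x: x * x)` farms three jobs over two workers and resolves the results in order.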
involve both the phonological loop and at least some of the capacity of the visual cache. If memory for the letters is then combined with another task that involves visual memory or visual processing, not all of the capacity of the visual component would be available. So, there would be a small overall reduction in dual-task compared with single-task performance. This effect could be mitigated by using auditory presentation of the verbal memoranda, coupled with a visually presented, nonverbal task. This is the approach used in the experiment that we report here. However, a task designed to test visual working memory might gain support from some verbal storage or processing (e.g., names of shapes or colors, or spatial orientations), even if the main load is on a specific visual component of working memory, so there may still be a small dual-task cost even when using different input modalities. Previous studies providing evidence for domain-specific components of working memory almost invariably show such a cost, interpreted as above. However, most striking is that the dual-task cost, even if statistically robust, tends to be small compared with the residual levels of performance on each task when they are performed concurrently (e.g., Cocchini et al., 2002; Duff & Logie, 2001; Logie et al., 1990, 2004). Very much larger dual-task costs are observed when both tasks are chosen to rely primarily on the same component of working memory (e.g., Logie et al., 1990).
Several works have explored the performance degradation problem of co-running tasks. A survey focuses on approaches that address the shared-resource contention problem of task scheduling on Chip Multi-core Processors (CMPs). Performance and energy models have been built to analyze and predict the performance impact. One reference studies the impact of L2 cache sharing on concurrent threads; another proposes an interference model that considers the time-variant inter-dependency among different levels of resource interference to predict application QoS. Another decomposes parallel runtime into compute, synchronization, and steal time, and uses the runtime breakdown to measure program progress and identify execution inefficiency under interference (in a virtual machine environment). Another reveals that the cross-application interference problem is related to the amount of simultaneous access to several shared resources; based on this discovery, it proposes a multivariate and quantitative model able to predict the cross-application interference level from a set of features, for example, the amount of concurrent accesses to the SLLC, DRAM, and virtual network, and the similarity between the amounts of those accesses in virtual environments. A further reference predicts the execution time of an application workload for a hypothetical change of configuration in the number of CPU cores of the hosting VM. Another gains insight into the principle of enriching the capability of existing approaches to predicting the performance of multi-core systems. Finally, one develops an efficient ELM based on the Spark framework (SELM), which includes three parallel subalgorithms, for big data classification.
In most HRC scenarios, it is insufficient to plan a single sequence of actions and blindly execute it at run-time. Having a human in the loop results in dynamic and partially unknown environments to which the robot needs to adapt. Therefore, the robot has to exhibit adaptive task planning. The team members typically act in real time; therefore, they must make decisions on the fly. Researchers in the field have focused on improving different aspects of HRC. They have set different constraints when performing task planning and action selection to improve a specific aspect of the collaboration: for example, favoring actions that avoid plans where the human spends a lot of his/her time being idle, plans that reduce the cognitive and physical load on the human, and plans that cannot be misinterpreted by the human.
In Experiment 1, providing a concurrent task for execution during a phase of S–R learning led to enhanced rather than impaired learning of the visuomotor association, as indexed by mean RT to targets on the succeeding discrimination task. This result was contrary to the prediction that dual-task completion would siphon off resources from learning the association and therefore lead to performance deficits on discrimination. Interpretation of this counterintuitive finding may be aided by considering that a simple task that was largely devoid of cognitive evaluation was used; it is therefore plausible that minimal attentional engagement may be all that is required to perform this task adequately. If so, the dual task may have raised the level of attention devoted to both tasks rather than redirecting resources from one task to another. This suggestion is supported by the fact that in both the motor and verbal conditions, the concurrent task that was intended to generate a greater attentional load (fast tapping and random number generation, respectively) led to shorter RTs than did the low-attention dual tasks (slow tapping and nonsense utterance). This pattern was observed in the discrimination task but not in the training task, suggesting a more complex relationship between the variables. The addition of a dual task increased attentional allocation to both ongoing tasks, and this increase was uniform for both tasks. It is possible that a function similar to the Yerkes–Dodson curve (Yerkes & Dodson, 1908) determines performance in this case: tasks that have very low (S–R learning alone) or very high (not tested here) resource demands may result in poor learning, whereas intermediate levels of demand (the dual-task conditions used in Experiment 1) may lead to improved performance. Further testing will be required to evaluate this tentative hypothesis.