The Kilo TM design introduces new hardware units and features to enable transactional memory. The execution stage of transactions takes place in the SIMT cores. The load/store units are responsible for maintaining the read- and write-logs of the transactions being executed. The SIMT stack is extended to handle transactional code blocks. The commit stage takes place in the Commit Units, which implement the ring buffers, bloom-filter-based signatures, and value-based validation. The Commit Units are located in the memory partitions. The entire address space is divided into disjoint sets, each of which is managed by one Commit Unit. As such, the implementation is a distributed algorithm that involves multiple Commit Units, together with a protocol that aggregates and broadcasts validation results between the Commit Units and SIMT cores. At the beginning of a commit, the transaction log walkers transfer the logs to the Commit Units. When validation results are computed, they are sent back to the originating SIMT cores.
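The partitioning of the address space across Commit Units, and the aggregation of their pass/fail replies, can be sketched as follows. This is a minimal Python model of the idea; the unit count, the modulo mapping, and all function names are illustrative assumptions, not Kilo TM's actual hardware.

```python
# Illustrative model: disjoint address-space partitioning across
# commit units and aggregation of per-unit validation outcomes.

NUM_COMMIT_UNITS = 4

def commit_unit_for(addr):
    # Each commit unit manages a disjoint, interleaved slice of the
    # address space (here: a simple modulo on the word address).
    return addr % NUM_COMMIT_UNITS

def split_log(log):
    # Route each (address, value) log entry to the commit unit that
    # owns that address; this models the log walkers' transfer step.
    per_unit = {u: [] for u in range(NUM_COMMIT_UNITS)}
    for addr, value in log:
        per_unit[commit_unit_for(addr)].append((addr, value))
    return per_unit

def aggregate(outcomes):
    # A transaction may publish its write-log only if every commit
    # unit reports "pass" for its share of the read-set.
    return all(outcomes)
```

Each SIMT core would send `split_log(read_log)` entries to the corresponding units and commit only when `aggregate` over the replies is true.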
CC implementations using cloning apply the following concept: a leader thread starts the transaction with an arbitrary OCC algorithm based on deferred updates. The checkpoint created by the leader thread in the _ITM_beginTransaction function is considered unmodified until the leader leaves the transaction. Because of possible modifications, any helper thread first has to create a clean copy of the checkpoint before it can enter it. The helper thread has to use its own separate stack and, following the deferred-update concept, log all writes into the stack of the leader (e.g., via pointers in previous stack frames). When the helper successfully commits a transaction, it can return control to the leader through another checkpoint created during commit. This end checkpoint also has to contain the contents of the stack used by the transaction context in the helper thread. When the leader enters the checkpoint using the cloning facility, it also receives the modified content from the helper thread. Modifications to the leader's stack outside of the transaction's stack frames are transparently written to the correct location during commit by the helper thread, because the references received during the start of the transaction (via registers, local variables, etc.) still point to the original location.
In this section, we present KILO Transactional Memory (KILO TM), a TM system scalable to 1000s of concurrent transactions. KILO TM does not leverage a cache coherence protocol for conflict detection among running transactions. Instead, each transaction performs word-level, value-based conflict detection against committed transactions by comparing the saved values of its read-set against the values in memory upon its completion [42, 19]. A changed value indicates a conflict. This mechanism offers weak isolation. Each transaction buffers its saved read-set values and memory writes in a read-log and a write-log (as address-value pairs) in local memory (lazy version management). When a transaction finishes executing, it sends its read- and write-log to a set of commit units for conflict detection (validation), each of which replies with the outcome (pass/fail) back to the transaction at the core. Each commit unit validates a subset of the transaction's read-set. If all commit units report no conflict detected, the transaction permits the commit units to publish the write-log to memory. To improve commit parallelism for non-conflicting transactions, transactions speculatively validate against committed transactions in parallel, leveraging the deeply pipelined memory subsystem of the GPU. The commit units use an address-based conflict detection mechanism to detect conflicts among these transactions (we call these hazards to distinguish them from the conflicts detected via value comparison). A hazard is re-
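The value-based detection and lazy versioning described above can be sketched in a few lines. This is an illustrative Python model; the `Txn` class and the dictionary-backed memory are assumptions for exposition, not the KILO TM hardware pipeline.

```python
# Sketch of word-level, value-based conflict detection with lazy
# version management: reads log the observed value, writes are
# buffered, and commit compares logged values against memory.
memory = {}

class Txn:
    def __init__(self):
        self.read_log = {}    # addr -> value observed at read time
        self.write_log = {}   # addr -> buffered new value

    def read(self, addr):
        if addr in self.write_log:            # read-own-write
            return self.write_log[addr]
        v = memory.get(addr, 0)
        self.read_log.setdefault(addr, v)     # log first-observed value
        return v

    def write(self, addr, value):
        self.write_log[addr] = value          # buffered (lazy versioning)

    def try_commit(self):
        # Validation: a changed value indicates a conflict with a
        # committed transaction.
        for addr, seen in self.read_log.items():
            if memory.get(addr, 0) != seen:
                return False
        memory.update(self.write_log)         # publish the write-log
        return True
```

Two transactions that both increment the same word illustrate the mechanism: whichever commits second sees a changed value and fails validation.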
The whole-program protocols necessary for TM are most easily implemented in some combination of the compiler (particularly for read and/or write barriers) and the runtime system (including hardware), because we can localize the implementation of the protocols. Put another way, the difficulty of implementation does not increase with the size of the source program. In theory, transactional memory can improve performance by increasing parallelism (due to optimistic concurrency), but in practice we may pay a moderate performance cost for software-engineering benefits.

There remain situations where trading space for time is a bad performance decision, or where heap-allocated data lifetime follows an idiom not closely approximated by reachability. Language features such as weak pointers allow reachable memory to be reclaimed, but using such features correctly is best left to experts or to easily recognized situations such as a software cache. Recognizing that GC may not always be appropriate, languages can complement it with support for other idioms. In the extreme, programmers can code manual memory management on top of garbage collection, destroying the advantages of garbage collection. More efficient implementations (e.g., using a free list) are straightforward extensions. A programmer
We use a conventional epoch-based system for memory management, based on that described by Fraser. This mechanism ensures that a location is not deallocated by one thread while it is being accessed transactionally by another thread. Epoch-based reclamation works well in our setting; however, it would be straightforward to use alternatives such as tracing garbage collection or lock-free schemes [21, 25]. Our baseline STM implementation uses the TL2 algorithm, time-base extension, and hash-based write sets. The approach is typical of C/C++ STM systems [4, 7, 10, 28, 30]. It performs well for our workloads, matching the performance reported by Dragojević et al. BaseTM provides opacity, guaranteeing that a running transaction sees a consistent view of the heap. BaseTM provides weak isolation, meaning that it does not detect conflicts between concurrent transactional and non-transactional accesses to the same location (if strong isolation is needed, many implementation techniques exist). BaseTM does not provide privatization safety (barriers such as those of Marathe et al. could be added, if needed).
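The core of an epoch-based reclamation scheme can be sketched as below. This is a single-threaded Python simulation of the classic three-epoch idea; the limbo-list layout and all names are illustrative, not the paper's implementation.

```python
# Sketch of epoch-based reclamation: threads record the global epoch
# on entry to a critical region; retired nodes wait in per-epoch
# "limbo" lists until no thread can still hold a reference to them.
GLOBAL_EPOCH = [0]
thread_epochs = {}               # thread id -> epoch on entry (None = quiescent)
limbo = {0: [], 1: [], 2: []}    # retired nodes, indexed by epoch mod 3

def enter_critical(tid):
    thread_epochs[tid] = GLOBAL_EPOCH[0]

def exit_critical(tid):
    thread_epochs[tid] = None

def retire(node):
    # Unlinked but possibly still referenced by a concurrent reader.
    limbo[GLOBAL_EPOCH[0] % 3].append(node)

def try_advance():
    # The epoch may advance only when every active thread has observed
    # the current epoch; the limbo slot being recycled then holds only
    # nodes retired two epochs ago, which no thread can still access.
    e = GLOBAL_EPOCH[0]
    if all(ep is None or ep == e for ep in thread_epochs.values()):
        GLOBAL_EPOCH[0] = e + 1
        freed = limbo[(e + 1) % 3]
        limbo[(e + 1) % 3] = []
        return freed             # safe to deallocate
    return []
```

A node retired in epoch e becomes reclaimable after two further epoch advances, which is what provides the guarantee stated above: no location is freed while another thread may still be accessing it.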
Modern STMs [5, 17, 23] use automatic instrumentation. Java annotations are used to mark methods as atomic. The instrumentation engine then handles all code inside atomic methods and modifies it to run as transactions. This conversion does not need the source code and can be done offline or online. Instrumentation allows using external libraries, i.e., code inside a transaction can call methods from an external library, which may modify program data. In ByteSTM, code that is reachable from within a transaction is compiled to native code with transactional support. Classes/packages that will be accessed transactionally are input to the VM by specifying them on the command line. Each memory operation in these classes is then translated by first checking the thread's mode: if the mode is transactional, the thread runs transactionally; otherwise, it runs regularly. Although performing such a check on every memory load/store operation increases overhead, our results show significant throughput improvement over competitor STMs (see Section 3).
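The per-access mode check can be sketched as follows. ByteSTM operates on JVM bytecode, so this minimal Python analogue, its function names, and its dictionary-backed memory are purely illustrative.

```python
# Sketch of a per-access mode check: every store in a transactionally
# reachable class first tests whether the current thread is running a
# transaction, and takes the buffered or the regular path accordingly.
import threading

_mode = threading.local()
memory = {}

def in_txn():
    return getattr(_mode, "transactional", False)

def begin_txn():
    _mode.transactional = True
    _mode.write_set = {}

def store(addr, value):
    if in_txn():
        _mode.write_set[addr] = value   # transactional: buffer the write
    else:
        memory[addr] = value            # regular (non-transactional) path

def commit_txn():
    memory.update(_mode.write_set)      # publish buffered writes
    _mode.transactional = False
```

The branch in `store` is the check the text refers to: it executes on every instrumented memory operation, which is cheap per access but nonzero in aggregate.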
We found that using a complex data structure to represent read-sets and write-sets affects performance. Given the simplified raw memory abstraction used in ByteSTM, we decided to use simple arrays of primitive data types. This decision is based on two reasons. First, array access is very fast and has access locality, resulting in better cache usage. Second, with primitive data types, there is no need to allocate a new object for each element in the read/write set. (Recall that an array of objects is allocated as an array of references in Java, and each object needs to be allocated separately. Hence, there is a large overhead for allocating memory for each array element.) Even if object pooling is used, the memory will not be contiguous, since each object is allocated independently in the heap.
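The flat-array layout can be illustrated with Python's `array` module standing in for Java's primitive arrays; the class, its capacity, and its method names below are assumptions for exposition, not ByteSTM's actual code.

```python
# Sketch of a write-set held in parallel arrays of primitives:
# contiguous storage, no per-entry heap object.
from array import array

class WriteSet:
    def __init__(self, capacity=1024):
        self.addrs = array('q', [0] * capacity)   # contiguous addresses
        self.values = array('q', [0] * capacity)  # contiguous values
        self.size = 0

    def append(self, addr, value):
        self.addrs[self.size] = addr
        self.values[self.size] = value
        self.size += 1

    def lookup(self, addr):
        # Linear scan from the newest entry so a re-written address
        # returns its latest buffered value (read-own-write).
        for i in range(self.size - 1, -1, -1):
            if self.addrs[i] == addr:
                return self.values[i]
        return None
```

Because both arrays are contiguous blocks of machine words, appends and scans walk memory sequentially, which is the cache-locality argument made above.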
We can extract several conclusions from the statistics tables. Looking at the abort rate columns, we can see that the abort rates considering both implicit and explicit transactions are almost equal to the abort rates of explicit transactions alone. This means that the abort rate of implicit (or forced) transactions is negligible, reaching its maximum in kmeans-high with 0.91% (32 processors). This is the first indicator that the always-in-transaction approach we are using does not introduce noticeable overheads. The second indicator confirms this: the percentage of parallel-section time dedicated to commit processing (including both implicit and explicit commits) is very small, less than 0.14%, for all the tests. We achieve this low overhead at commit time by not writing back data updates during the process; the data remains in private caches until it is requested (i.e., accessed by another processor), and we instead send messages to the directory to mark and own the lines, which is much cheaper than memory updates since we avoid paying the main-memory latency for each write done to the same node.
1) TinySTM: Our prompt transaction revalidation architecture has been integrated with the open-source TinySTM package, although its design principles are general. In fact, it embeds a skeleton that enables a thread currently running a transaction to transparently change its execution flow at a fine-grain period, in order to launch an arbitrary user-space callback routine, a revalidation routine in our case. TinySTM manages transactions by relying on a global version clock (gvc), a global shared counter atomically incremented whenever a thread commits a transaction that updates shared data. A data object is a memory word, and each word address is associated with its own metadata consisting of (A) a lock bit and (B) a timestamp, both kept in a single entry of a hash array that is manipulated atomically (also called the lock array). When a transaction commits, the updated gvc value becomes the new timestamp of the written word. Upon (re)starting a transaction, a thread stores the current value of the gvc into a local variable called the transaction start-timestamp (tst). Upon a write operation, the target address and the value to be stored are both added to the transaction write set. Read operations on shared objects previously updated by the same transaction are served by picking values from the transaction write set. Read operations performed on shared objects outside the write set instead sample the timestamp and the lock bit of the shared object in order to check that (A) the timestamp is less than or equal to the tst of the reading transaction, and (B) the object is not currently locked. If both checks succeed, it means that no concurrent transaction has modified the object in the
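The gvc/tst read check can be sketched as follows. This is an illustrative Python model of the TL2/TinySTM-style protocol described above: commit-time lock acquisition and read-set revalidation are elided, and all names are assumptions, not TinySTM's C API.

```python
# Sketch of global-version-clock timestamps: reads outside the write
# set check the word's lock bit and timestamp against the reader's
# start-timestamp (tst); commits bump the gvc and restamp the words.
gvc = [0]
locks = {}        # addr -> (lock_bit, timestamp); models the lock array
memory = {}

def begin():
    # tst samples the gvc at (re)start.
    return {"tst": gvc[0], "write_set": {}}

def txn_read(txn, addr):
    if addr in txn["write_set"]:            # served from the write set
        return txn["write_set"][addr]
    locked, ts = locks.get(addr, (False, 0))
    if locked or ts > txn["tst"]:           # checks (B) and (A)
        raise RuntimeError("abort: inconsistent read")
    return memory.get(addr, 0)

def txn_write(txn, addr, value):
    txn["write_set"][addr] = value          # address and value logged

def txn_commit(txn):
    # (Lock acquisition and revalidation elided in this sketch.)
    gvc[0] += 1
    for addr, value in txn["write_set"].items():
        memory[addr] = value
        locks[addr] = (False, gvc[0])       # new timestamp = updated gvc
```

A transaction that started before a concurrent commit sees a word timestamp greater than its tst and aborts, which is exactly the consistency condition stated in checks (A) and (B).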
VTM. VTM implements the eager conflict detection / lazy version management algorithm presented in Figure 2.7 on page 23. VTM tracks overflowed transactional state using a shared data structure mapped into the virtual address space (called the XADT). Entries in the XADT are allocated when blocks overflow the cache. Much like UTM's xstate, VTM's XADT uses linked lists and supports accessing all entries for a specific virtual memory block or all entries for a specific transaction. XADT operations include concurrently adding an entry on overflow, looking up an entry for a block, committing a transaction, aborting a transaction, and saving state on context switches. Each transactional load or store miss checks for conflicting transactional accesses before it completes. As VTM uses lazy version management, it buffers speculative updates in the XADT itself, propagating these updates only when a transaction commits. Because VTM operates on virtual rather than physical addresses, it supports paging of transactional data with no extra effort. However, this choice also complicates the task of context-switching a transaction, as discussed below.
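An overflow table with both access paths (by block and by transaction) can be sketched as below. This hedged Python model only illustrates the dual indexing; the entry layout and method names are assumptions, not VTM's actual XADT format.

```python
# Sketch of an XADT-like overflow table: every buffered update is
# reachable both via its virtual-memory block (for conflict lookups)
# and via its owning transaction (for commit/abort).
from collections import defaultdict

class XADT:
    def __init__(self):
        self.by_block = defaultdict(list)  # block -> entries
        self.by_txn = defaultdict(list)    # txn id -> entries

    def add(self, txn_id, block, value):
        entry = {"txn": txn_id, "block": block, "value": value}
        self.by_block[block].append(entry)
        self.by_txn[txn_id].append(entry)

    def lookup_block(self, block):
        # Used by a load/store miss to find conflicting accesses.
        return list(self.by_block[block])

    def abort(self, txn_id):
        # Lazy versioning: an abort just discards buffered updates.
        for entry in self.by_txn.pop(txn_id, []):
            self.by_block[entry["block"]].remove(entry)
```

Commit would walk `by_txn[txn_id]` in the same way, propagating each buffered value to memory instead of discarding it.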
Abstract—Transactional memory (TM) is a promising lock-free technique that offers a high-level abstract parallel programming model for future chip multiprocessor (CMP) systems. Moreover, it adopts the popular, well-established paradigm of transactions, thus providing a general and flexible way of allowing programs to atomically read and modify disparate memory locations as a single operation. In this paper, we propose a general and executable specification model for an abstract TM with validation for various correctness conditions of concurrent transactions. This model is constructed within a flexible transition framework that allows a TM model to be tested with animation. Interval Temporal Logic (ITL) and its programming-language subset AnaTempura are used to build, execute, and validate this model. To demonstrate this work, we selected a queue example to be executed and illustrated with animation.
The practical solutions discussed in this chapter address the challenges of interactions between transactions and OS code, but have performance limitations. At the end of the day, the overall overhead of the OS infrastructure depends on the characteristics of the applications that run on it and the frequency at which they invoke OS services. In this experiment, we evaluate the ATLAS system with a set of benchmarks: 5 STAMP applications and 3 SPLASH and SPLASH-2 applications. STAMP is a unique benchmark suite written from scratch in order to evaluate the effectiveness of coarse-grain transactions. Vacation models a 3-tier server system powered by an in-memory database; kmeans is an algorithm that clusters objects into k partitions based on some attributes; genome performs gene sequencing; yada produces guaranteed-quality meshes for applications such as graphics rendering; and labyrinth implements Lee's maze-routing algorithm, which is commonly used in layout. On the other hand, SPLASH and SPLASH-2 are parallel benchmark suites that are widely used in evaluating parallel computer systems. Radix is a parallel sorting algorithm; mp3d simulates rarefied hypersonic flow; and ocean simulates eddy currents in an ocean basin. The applications were originally coded with locks. We produced transactional versions by replacing locked regions with transactions. Any code between two lock regions also executes as an implicit transaction.
the so-called GPH estimator, originally proposed by Geweke and Porter-Hudak (1983). It essentially consists of a least-squares regression of the log-periodogram versus log λ, using only small frequencies. It is well known that the GPH estimator does not have ideal finite-sample properties, due to the dependence of the periodogram ordinates at low frequencies (Künsch, 1986; Hurvich and Beltrão, 1993; Robinson, 1995) and a large bias caused by the relative curvature of f_u at
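For reference, the standard formulation of the GPH regression runs over the first m Fourier frequencies (m is the bandwidth, growing slowly with the sample size n):

```latex
\log I(\lambda_j) \;=\; \beta_0 + \beta_1 \log\!\bigl(4\sin^2(\lambda_j/2)\bigr) + \varepsilon_j,
\qquad \lambda_j = \frac{2\pi j}{n}, \quad j = 1,\dots,m,
```

where I(λ_j) is the periodogram, and the memory parameter is estimated as the negated slope, d̂_GPH = -β̂_1. For small λ, log(4 sin²(λ/2)) ≈ 2 log λ, which is why the procedure is described as regressing the log-periodogram on log λ at small frequencies.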
Significant investment and advances have been made within the past decade in anti-spam research, but even with all this work, the spam problem appears to be getting worse. One reason is that existing spam filtering techniques have been directed toward detection-based filtering, such as content-based and network-based filtering. Content filtering is based upon a set of static rules generated by analyzing the content of previously detected spam, whereas network-based filtering, such as DNS blacklists, lists sources that have previously been detected sending spam. Spammers have proven their ability to overcome such techniques by modifying the content of their spam messages so that they bypass existing content filters, or by creating a fresh supply of spam zombies to send spam for the purpose of bypassing network filtering techniques. With industry metrics indicating that approximately 80% of all spam comes from spam zombies, there is a need to improve existing spam filtering techniques to detect a wide range of spam from spam zombies.
data) into a unique transaction. However, according to Chung et al.'s recent work, the smaller the transaction size, the greater the bookkeeping overhead involved in the creation and committing of transactions. To reduce this overhead, one possible strategy would be to create a transaction for every basic block of the original program. To further increase the transaction size without introducing too many conflicts, Chung et al. proposed two techniques: putting traces of frequently executed basic blocks into transactions, and dynamically merging transactions. Having large transactions has its own share of problems, particularly if code for performing synchronization fully fits within an individual transaction. To see why, let us consider the counter-based barrier shown in Fig. 2(a), which is fully enclosed within a transaction. Assume that processor 1 reaches the barrier first and thus waits for the counter to be 2 after incrementing it (transaction T1). When transaction T2 is subsequently started (when processor 2 reaches the barrier), it conflicts with T1, since it tries to acquire the same lock that was earlier acquired by T1. If the EE policy is followed, T2, being the requester, is forced to abort, and this situation keeps repeating since T2 is always the requester; thus a livelock arises. If the EL policy is instead used, T2, being the requester, aborts T1; however, T1 then becomes the requester and consequently aborts T2. In this situation, the transactions keep aborting each other repeatedly and cause a livelock. With lazy conflict detection (LL), neither of the processors can see the updated value of the counter, and hence they run into a livelock. Thus, irrespective of the policy followed, putting the barrier entirely into a transaction causes a livelock. A livelock can still arise for some policies even if we put only parts of the barrier code into a transaction.
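The livelock under the requester-aborts (EE) case can be illustrated with a toy simulation. This Python model is a deliberate simplification, with two fixed transactions and an abstract "owner" of the counter's cache line; it is not a real HTM, only a demonstration that no bounded number of retries lets the barrier complete.

```python
# Toy model of the EE livelock at a transactional counter barrier:
# the first arriver speculatively increments the counter and spins on
# counter == 2 inside its transaction; every later arriver is the
# requester, conflicts, and aborts. No transaction ever commits.
def simulate_ee_barrier(max_rounds=100):
    counter = 0               # committed value; never updated below
    owner = None              # txn currently holding the counter's line
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        for txn in ("T1", "T2"):
            if owner is None:
                owner = txn   # reads and increments counter speculatively
            elif owner != txn:
                # Eager conflict detection: the requester aborts and
                # retries; the owner keeps spinning on counter == 2.
                pass
        # The owner's increment is speculative and the requester's is
        # aborted, so the committed counter never reaches 2.
        if counter == 2:
            return rounds     # barrier passed (unreachable under EE)
    return None               # no progress within the bound: livelock
```

Running the model for any bound returns `None`, mirroring the argument above that the requester-always-aborts policy can never let both increments become visible.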
Data mining can be defined as an activity that extracts new, nontrivial information contained in large databases. Traditional data mining techniques have focused largely on detecting statistical correlations between the items that appear most frequently in transaction databases. Also termed frequent itemset mining, these techniques were based on the rationale that itemsets which appear more frequently must be of more importance to the user from a business perspective. In this thesis we throw light upon an emerging area called utility mining, which considers not only the frequency of the itemsets but also the utility associated with them. The term utility refers to the importance or usefulness of the appearance of an itemset in transactions, quantified in terms such as profit, sales, or other user preferences. In high utility itemset mining, the objective is to identify itemsets whose utility values are above a given utility threshold. High utility itemset mining algorithms such as Two-Phase and UP-Growth have been proposed, but they require long execution times and use much memory. The new method is a memory-efficient technique for mining high utility itemsets from transactional databases; it requires less memory space and execution time than existing algorithms.
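A toy illustration of the utility computation follows, using a common definition from the high-utility mining literature: the utility of an itemset sums quantity times unit profit over every transaction containing the itemset. The profit table, transaction database, and threshold are made-up values for exposition only.

```python
# Toy high-utility itemset computation over a tiny transaction DB.
profit = {"A": 5, "B": 2, "C": 1}     # external utility (unit profit)

# Each transaction maps item -> purchased quantity (internal utility).
db = [
    {"A": 1, "B": 2},
    {"B": 4, "C": 3},
    {"A": 2, "B": 1, "C": 1},
]

def utility(itemset):
    # Sum quantity * unit profit over transactions containing itemset.
    total = 0
    for txn in db:
        if all(item in txn for item in itemset):
            total += sum(txn[item] * profit[item] for item in itemset)
    return total

def high_utility_itemsets(candidates, threshold):
    # Keep only candidates whose utility meets the threshold.
    return [x for x in candidates if utility(x) >= threshold]
```

Note that utility, unlike support, is not anti-monotone: a superset can have higher utility than its subsets, which is why algorithms such as Two-Phase need overestimates to prune the search space.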
Furthermore, recent task schedulers, e.g., Intel TBB, use a fixed number of worker threads, equal to the number of hardware contexts. This is a standard technique to avoid over-commitment of CPU resources. The fixed concurrency level, however, is only suited for CPU-intensive tasks that rarely block. We show that tasks in a DBMS often block due to synchronization, especially in heavily contending OLTP workloads. Thus, the fixed concurrency level can result in under-utilization of CPU resources in DBMS workloads. In this paper, we show how the task scheduler can detect the inactivity periods of tasks and dynamically adapt its concurrency level. Our scheduler gives the OS control of additional workers when needed to saturate CPU resources. Contributions. We apply task scheduling to a commercial main-memory DBMS. Our experiments show the benefits of using task scheduling for scaling up a main-memory DBMS over modern multi-socket multi-core processors, to efficiently evaluate highly concurrent analytical and transactional workloads. Our main contributions are:
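The adapt-on-block idea can be sketched as follows. This is a minimal Python model; the replacement-worker policy and all names are assumptions for illustration, not the commercial scheduler's implementation.

```python
# Sketch of dynamic concurrency adaptation: when a worker blocks on
# synchronization, release an extra worker so the number of runnable
# workers stays matched to the hardware contexts.
def target_workers(hw_contexts, blocked_workers):
    # One runnable worker per hardware context: the fixed level plus
    # one replacement for each currently blocked worker.
    return hw_contexts + blocked_workers

class AdaptiveScheduler:
    def __init__(self, hw_contexts):
        self.hw = hw_contexts
        self.blocked = 0
        self.active = hw_contexts      # starts at the fixed level

    def on_worker_block(self):
        self.blocked += 1
        if self.active - self.blocked < self.hw:
            self.active += 1           # hand an extra worker to the OS

    def on_worker_unblock(self):
        self.blocked -= 1              # the surplus worker can be parked
```

With a fixed level, each blocked worker would leave a hardware context idle; here the runnable count (`active - blocked`) is held at the context count instead.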
very poisonous, and may even be fatal if ingested. It has a brown spore print, unlike shiitake, which has a white spore print. It is highly unlikely that Galerina or other look-alike mushrooms will grow on logs managed for shiitake. Galerina is only found on well-decayed wood, which would exclude at least one- and two-year post-inoculation shiitake logs, and probably older ones. Whereas shiitake is a primary decomposer that colonizes “clean” substrate (not yet colonized by other fungi), Galerina is a secondary decomposer that requires the substrate to be already partially broken down by other fungi. If Galerina does appear after a log has been fully utilized, decomposed, or discarded, the grower will be able to distinguish it from the shiitake he or she has been observing for the past several years. Being familiar with the appearance of shiitake by this time, the grower will easily note the differences between shiitake and Galerina in size (shiitakes are larger), cap ornamentation (absent in Galerina), the annulus or ring under the cap (usually present in Galerina), and of course the spore print if any doubt remains.
The traces were collected by running three benchmark applications, RB-Tree, SkipList, and List, that were originally used for evaluating DSTM2 and later adopted in a number of performance evaluation studies of (S)TM systems [1, 2]. These applications perform repeated insertion, removal, and search operations on a randomly chosen integer in a set of integers. The set of integers is implemented either as a sorted singly-linked list, a skip list, or a red-black tree. We configured the benchmark to initialize the set of integers with 128 values, and allowed it to store up to a maximum of 256 values. Finally, we configured the benchmark not to generate any read-only transactions (i.e., searches). This was done in compliance with the operating mode of the protocol, where read-only transactions can be executed locally at each single replicated process, without the need for propagation via the atomic broadcast (read-only transactions do not alter the state of the replicated transactional memory system). By only considering update transactions in our study, we can therefore precisely assess the impact of the atomic broadcast latency on the performance of a replicated STM, as well as the performance gains achievable by means of the proposed optimistic approach. The table below reports the average transaction execution time observed for the three benchmarks via the aforementioned tracing scheme: