Query execution in column-oriented database systems

On the data warehousing benchmark, we found that the complete C-Store system performed an order of magnitude faster than an alternative approach to building column-stores.


Query Execution in Column-Oriented Database Systems. Daniel J. Abadi

Starting around the 1990s, however, businesses started to use their databases to ask more detailed analytical queries. For example, the bank might want to analyze all of the data to find associations between customer attributes and heightened loan risks. Or it might want to search through the data to find customers who should receive VIP treatment. Thus, on top of using databases to automate their business processes, businesses started to want to use databases to help with decision making and planning. However, these new uses for databases posed two problems. First, these analytical queries tended to be longer-running queries, and the shorter transactional write queries would have to block until the analytical queries finished (to avoid different queries reading an inconsistent database state). Second, these analytical queries did not generally process the same data as the transactional queries, since both operational and historical data (from perhaps multiple applications within the enterprise) are relevant for decision making. Thus, businesses tended to create two databases rather than a single one; the transactional queries would go to the transactional database and the analytical queries would go to what are now called data warehouses. This practice of creating a separate data warehouse for analytical queries is becoming increasingly common; today data warehouses comprise $3.98 billion [65] of the $14.6 billion database market [53] (27%), and the segment is growing at 10.3% annually [65].

The Design and Implementation of Modern Column-Oriented Database Systems

Traditional query execution uses a tuple-at-a-time, pull-based, iterator approach in which each operator gets the next input tuple by calling the next() method of the operators of its children in the operator tree. In contrast, MonetDB works by performing simple operations column-at-a-time. In this way, MonetDB aimed at mimicking the success of scientific computation programs in extracting efficiency from modern CPUs, by expressing its calculations typically in tight loops over fixed-width and dense arrays, i.e., columns. Such code is well supported by compiler technology to extract maximum performance from CPUs through techniques such as strength reduction (replacing an operation with an equivalent less costly operation), array blocking (grouping subsets of an array to increase cache locality), and loop pipelining (mapping loops into optimized pipeline executions). The MonetDB column-at-a-time primitives not only get much more work done in fewer instructions, primarily thanks to eliminating tuple-at-a-time iterator function calls, but their instructions also run more efficiently on modern CPUs. That is, MonetDB query plans provide the CPU more in-flight instructions, keep the pipelines full and the branch misprediction and CPU cache miss rates low, and also automatically (through the compiler) allow the database system to profit from SIMD instructions.
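The contrast described above can be sketched in a few lines. This is a toy illustration in Python, not MonetDB's actual C implementation: a pull-based next() iterator pays a function call per tuple, while a column-at-a-time primitive is one tight loop over a dense array.

```python
# Toy contrast: tuple-at-a-time iterator vs. column-at-a-time primitive.
# Illustrative sketch only; MonetDB's primitives are compiled C loops.

class ColumnScan:
    """Pull-based iterator: one tuple per next() call."""
    def __init__(self, values):
        self._it = iter(values)
    def next(self):
        return next(self._it, None)

def select_gt_tuple_at_a_time(scan, threshold):
    out = []
    while (v := scan.next()) is not None:   # one function call per tuple
        if v > threshold:
            out.append(v)
    return out

def select_gt_column_at_a_time(column, threshold):
    # One tight loop over a dense array: no per-tuple call overhead, and
    # the kind of code a compiler can pipeline and auto-vectorize in C.
    return [v for v in column if v > threshold]

prices = [3, 9, 1, 12, 7]
assert select_gt_tuple_at_a_time(ColumnScan(prices), 5) == \
       select_gt_column_at_a_time(prices, 5) == [9, 12, 7]
```

Both produce the same result; the difference is that the second form does the selection in a single loop over the column rather than one next() call per tuple.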


Gaining the Performance Edge Using a Column-Oriented Database Management System

One of the biggest pain points in very large data warehousing systems is backup/recovery and high availability. By virtue of their data compression, columnar databases require less time for backup and recovery. When partitioned database units are added to the picture, tables can be backed up and subsequently recovered independently, simplifying this relatively complex problem. There are other factors to consider as well. Column-oriented databases are relatively simple, yet as new features and capabilities are added, there is a greater need for managed administration tools. The simplicity of the design may also lead to limitations in the types of data that can be incorporated into the database; seek out systems that do not restrict the use of unstructured data or XML. Some systems' structural constraints may force the user to employ the same keys across the entire system, while others provide more flexibility, both in key use and in tabular vs. the more traditional star schemas used for data warehouses.

Practical yet Provably Secure: Complex Database Query Execution over Encrypted Data

output by the first encryption scheme with a stronger encryption scheme providing more security but less functionality, and finally hiding all properties by the application of a semantically secure encryption scheme. This adjustable encryption is applied to each database column individually, and the resulting "ciphertext onion" for each cell is then stored in the database. In case of a query that is not supported by the currently stored ciphertext, one "onion layer" is peeled off by the server; that is, the client sends the decryption key to the server, enabling the server to decrypt the current ciphertext and unveil the underlying ciphertext with more properties preserved. The decryption process can be repeated until the query is supported or the minimal protection level allowed by a given policy is reached. Additional functionality for query processing on the server side, and increased performance for ciphertext decryption on the client side, can be supported by storing multiple ciphertexts of the same plaintext, supporting different functionalities, in parallel in the database. Obviously, the security level for the plaintext is only as high as that provided by the weakest encryption scheme.
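The onion mechanics can be sketched as follows. This is a deliberately insecure toy: the XOR keystream stands in for real encryption layers (real adjustable encryption uses schemes such as deterministic and randomized AES), and the keys and value are invented.

```python
# Toy sketch of "onion" (adjustable) encryption layers. NOT secure:
# the XOR keystream is a stand-in for real layered encryption schemes.
import hashlib

def _xor_layer(data: bytes, key: bytes) -> bytes:
    # Keystream derived from the key; XOR is its own inverse, so the same
    # function both applies and removes a layer.
    stream = hashlib.sha256(key).digest()
    return bytes(b ^ stream[i % len(stream)] for i, b in enumerate(data))

def wrap_onion(plaintext: bytes, keys: list[bytes]) -> bytes:
    # Innermost layer first; each outer layer hides more properties.
    ct = plaintext
    for key in keys:
        ct = _xor_layer(ct, key)
    return ct

def peel_layer(ciphertext: bytes, key: bytes) -> bytes:
    # The client sends `key` to the server, which strips the outermost
    # layer, unveiling a ciphertext preserving more properties.
    return _xor_layer(ciphertext, key)

keys = [b"equality-layer", b"order-layer", b"random-layer"]  # inner -> outer
onion = wrap_onion(b"salary=52000", keys)
# Peel outer layers until the stored ciphertext supports the query at hand:
inner = peel_layer(peel_layer(onion, keys[2]), keys[1])
assert inner == _xor_layer(b"salary=52000", keys[0])
```

The server never sees the plaintext unless every layer is peeled; each peel trades some security for more query functionality, exactly the adjustment described above.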

Implementation of multidimensional databases in column-oriented NoSQL systems

Other studies investigate the process of transforming relational databases into a NoSQL logical model (see Fig. 1). In [14], the author proposed an approach for transforming a relational database into a column-oriented NoSQL database. In [18], the author studies "denormalizing" data into schema-free databases. However, these approaches never consider the conceptual model of data warehouses. They are limited to the logical level, i.e. transforming a relational model into a column-oriented model. More specifically, the fact/dimension duality requires guaranteeing a number of constraints usually handled by relational integrity constraints, and these constraints cannot be considered when using the logical level as the starting point.

Native Language OLAP Query Execution

Perhaps the most notable of the language-centric approaches is Microsoft's LINQ extensions for its .NET family of languages (C# and VisualBasic) [BRK + 08]. Syntactically, LINQ resembles embedded SQL in that, for more complex queries at least, the standard SELECT-FROM-WHERE format is employed (for better or for worse). While LINQ has been quite popular with developers, it has been subsumed under the new ADO.NET model [AMM07]. The overarching theme of ADO.NET is the Entity Framework (EF), a comprehensive attempt to pull back the abstraction level of development projects from the object-oriented logical level to the entity-focused conceptual level. In other words, use of EF and its Entity Data Model makes it possible, in theory, to program directly against user-level concepts. Source code, possibly written with LINQ, is then parsed into an internal command tree, which can subsequently be used to generate optimized SQL. While the move towards greater abstraction is quite appealing, initial reaction has been mixed, with many developers concerned about the design and development complexity associated with the EF. Db4o (Database for objects) is another database language that allows use of the native programming language to query the database [NGD + 08]. It is an embeddable open source object database for Java and .NET developers. In .NET, LINQ support is fully integrated in db4o. Although db4o offers nice language-integrated queries, it suffers from some drawbacks, a notable one being the difficulty of overcoming its slow performance when retrieving many objects.

Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System

These requirements are addressed by our linear algebra engine, which has a multi-layered architecture. In particular, the core contributions of this thesis are: firstly, we show that the columnar storage layer of an in-memory DBMS yields an easy adoption of efficient sparse matrix data types. Furthermore, we present how linear algebra operations, in particular multiplications, are physically implemented based on the presented matrix formats. In the logical layer of the engine, we show that the execution of linear algebra expressions significantly benefits from different techniques that are inspired by database technology. In a novel way, we implemented several of these optimization strategies in Lapeg's optimizer (SpMachO), which generates an optimal execution plan for sparse matrix chain multiplications, and outperforms linear algebra runtimes like R or Matlab by orders of magnitude. An important component of the optimizer is the advanced density estimation method (SpProdest) for an efficient prediction of the matrix density of intermediate results. Moreover, we present the adaptive matrix data type AT Matrix, which obviates the requirement for scientists to select appropriate matrix representations, and provides transparent matrix operations. AT Matrix is a topology-aware data structure internally consisting of sparse and dense matrix tiles of variable size. We show that the performance of matrix multiplications benefits from cache locality and runtime optimization by dynamic tile conversions (ATmult). The tiled substructure improves the saturation of the different sockets of a multi-core main-memory platform, reaching up to a speed-up of 10x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation: we propose a matrix manipulation API that includes insertions and
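The chain-multiplication ordering problem that an optimizer like SpMachO addresses can be sketched with a classic dynamic program. The FLOP cost model and the product-density estimate below are naive placeholders for the thesis's actual techniques (notably SpProdest); only the search over multiplication orders is the real idea, and the example matrices are invented.

```python
# Simplified sketch of density-aware matrix-chain ordering. The FLOP cost
# and product-density estimates are crude stand-ins (the thesis uses
# SpProdest); the dynamic-programming search over orders is the key idea.
from functools import lru_cache

def solve(dims, dens):
    """dims[i] x dims[i+1] is the shape of matrix A_{i+1}; dens[i] its density."""
    n = len(dens)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Returns (cost, estimated density, plan) for multiplying A_i..A_j.
        if i == j:
            return 0.0, dens[i], f"A{i + 1}"
        winner = None
        for k in range(i, j):
            lc, ld, lp = best(i, k)
            rc, rd, rp = best(k + 1, j)
            # Sparse-aware FLOP estimate: dense cost scaled by densities.
            flops = dims[i] * dims[k + 1] * dims[j + 1] * ld * rd
            # Crude density estimate for the product matrix.
            d = min(1.0, ld * rd * dims[k + 1])
            if winner is None or lc + rc + flops < winner[0]:
                winner = (lc + rc + flops, d, f"({lp} x {rp})")
        return winner

    return best(0, n - 1)

# A1 is 100x500 and 1% dense, A2 is 500x20 and 50% dense, A3 is 20x800 and 90% dense:
cost, density, plan = solve((100, 500, 20, 800), (0.01, 0.5, 0.9))
assert plan == "((A1 x A2) x A3)"   # multiplying the sparse pair first wins
```

Note how the density terms, not just the dimensions, decide the order: the cheap sparse product is scheduled first even though a density-blind optimizer might choose differently.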

A Storage and Access Architecture for Efficient Query Processing in Spatial Database Systems

Abstract: Due to the high complexity of objects and queries and also due to extremely large data volumes, geographic database systems impose stringent requirements on their storage and access architecture with respect to efficient query processing. Performance improving concepts such as spatial storage and access structures, approximations, object decompositions and multi-phase query processing have been suggested and analyzed as single building blocks. In this paper, we describe a storage and access architecture which is composed from the above building blocks in a modular fashion. Additionally, we incorporate into our architecture a new ingredient, the scene organization, for efficiently supporting set-oriented access of large-area region queries. An experimental performance comparison demonstrates that the concept of scene organization leads to considerable performance improvements for large-area region queries by a factor of up to 150.

EXPLORING QUERY PROCESSING USING SBA IN OBJECT ORIENTED DATABASE

Query processing is the sequence of actions that takes as input a query formulated in the user language and delivers as result the data asked for. Query processing involves query transformation and query execution. Query transformation is the mapping of queries and query results back and forth through the different levels of the DBMS. Query execution is the actual data retrieval according to some access plan. An important task in query processing is query optimization. Query optimization techniques are dependent upon the query model and language. For example, a functional query language lends itself to functional optimization which is quite different from the algebraic, cost-based optimization techniques employed in relational as well as a number of object-

Data Processing on Database Management Systems with Fuzzy Query

The fuzzy set theory, proposed by L.A. Zadeh, aims at processing indefinite and vague information. In other words, the concept of fuzziness refers to the state of ambiguity which stems from a lack of certainty. Fuzzy logic and the fuzzy set theory play an important role in displaying vague knowledge, and almost all of our expressions in daily language contain fuzziness (cold-hot, rich-poor, short-long, etc.) [10-13]. Ambiguity plays an important role in human thinking style, especially in communication, inference, and in identifying and abstracting figures; and the importance of the fuzzy theory appears at this point. When we wish to transform the user interfaces which enable us to communicate with machines into a human-oriented style, the fuzzy theory becomes an effective tool at our hands [14].
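A minimal sketch of how a fuzzy predicate differs from a crisp one; the membership breakpoints and the example readings below are invented for illustration.

```python
# A minimal illustration of Zadeh's fuzzy sets: instead of a crisp predicate
# (temperature > 25 -> "hot"), a membership function assigns each value a
# degree in [0, 1]. The breakpoints here are arbitrary choices.

def mu_hot(temp_c: float) -> float:
    """Degree to which a temperature is 'hot' (linear ramp from 20 to 35 C)."""
    if temp_c <= 20.0:
        return 0.0
    if temp_c >= 35.0:
        return 1.0
    return (temp_c - 20.0) / 15.0

# A fuzzy query such as "SELECT ... WHERE temperature IS hot" can rank rows
# by membership degree rather than filtering with a hard cutoff:
readings = {"Oslo": 12.0, "Rome": 29.0, "Dubai": 41.0}
ranked = sorted(readings, key=lambda c: mu_hot(readings[c]), reverse=True)
assert ranked == ["Dubai", "Rome", "Oslo"]
assert mu_hot(27.5) == 0.5
```

The ranking behavior is what distinguishes a fuzzy query system: borderline rows receive partial membership instead of being discarded by a crisp boundary.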

Statistics on Query Expressions in Relational Database Management Systems

The idea of building statistics over non-base tables first appears in [AGPR99]. This reference introduces join synopses, which are pre-computed samples of a small set of distinguished joins. Joins must be defined between foreign and primary keys, and therefore a single sample for each table is enough to provide approximate answers for a large number of queries. The idea is to conceptually materialize the extended table obtained by applying all foreign-key joins, and then take a uniform sample over this result. Reference [GLR00] extends this approach by introducing the concept of icicles, which are a new class of samples that tune themselves to a dynamic workload. Intuitively, the probability of a tuple being present in an icicle is proportional to its importance for answering queries in the workload. In general, once samples are obtained, input queries can be rewritten to use the samples instead of the corresponding base tables. Therefore, by operating on the sample domain, we obtain (approximate) answers using just a fraction of the original execution time. In contrast, SITs as introduced in this thesis can be defined over arbitrary query expressions and require significantly fewer resources than samples. The reason is that we are interested in cardinality estimation during optimization rather than providing approximate answers to user queries.
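The join-synopsis idea can be sketched as follows; the toy tables, sample size, and query are invented for illustration and are not from [AGPR99].

```python
# Toy version of a join synopsis: conceptually materialize the foreign-key
# join, keep a uniform sample of it, and answer an aggregate approximately
# by scaling. Tables, sample size, and query are made up for illustration.
import random

random.seed(42)
customers = {1: "gold", 2: "silver", 3: "gold"}            # cust_id -> tier
orders = [(oid, random.choice(list(customers)))            # oid, cust_id
          for oid in range(10_000)]

# Extended table: every order joined (on the foreign key) with its customer.
joined = [(oid, cid, customers[cid]) for oid, cid in orders]
# The join synopsis is a uniform sample over that result.
synopsis = random.sample(joined, k=500)

# "How many orders were placed by gold customers?" -- rewrite the query
# against the synopsis and scale the count up by the sampling fraction.
scale = len(joined) / len(synopsis)
approx = scale * sum(1 for _, _, tier in synopsis if tier == "gold")
exact = sum(1 for _, _, tier in joined if tier == "gold")
# approx lands close to exact while scanning only 5% of the joined rows.
```

Because the sample is taken over the pre-joined result, a single synopsis answers any aggregate that filters on customer attributes of orders, which is the point made in the excerpt.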

Graph database management systems: storage, management and query processing

(b) collecting the intermediate results and checking them against the results at depth 2, and (c) expanding the follows relationship to depth 2 and removing the friends at depth 1. Method (b) performed the best. Methods (a) and (b) resulted in different execution plans, although with a similar number of total database accesses. It was not clear why Method (c) failed to return a result in a reasonable time. As such, some queries had to be rephrased in order to achieve gains in performance. Ideally a query optimizer in Cypher should be converting a query plan to a consistent set of primitives at the back end. With every new release, Cypher is being improved with a lot of emphasis on cost-based optimizers to cater for this. While the expressiveness is a great advantage in Cypher, an optimizer must take care in converting it to an efficient plan based on the cost of alternate traversal plans. A good speedup can be achieved by specifying parameters, because it allows Cypher to cache the execution plans.

Vectorization vs. compilation in query execution

For database architects seeking a way to increase the computational performance of a database engine, there might seem to be a choice between vectorizing the expression engine and introducing expression compilation. Vectorization is a form of block-oriented processing, and if a system already has an operator API that is tuple-at-a-time, there will be many changes needed beyond expression calculation, notably in all query operators as well as in the storage layer. If high computational performance is the goal, such deep changes cannot be avoided, as we have shown that if one keeps adhering to a tuple-at-a-time operator API, expression compilation alone provides only marginal improvement. Our main message is that one does not need to choose between compilation and vectorization, as we show that the best results are obtained if the two are combined. As to what this combining entails, we have shown that "loop-compilation" techniques as proposed recently can be inferior to plain vectorization, due to the latter's better (i) SIMD alignment, (ii) ability to avoid branch mispredictions, and (iii) parallel memory accesses. Thus, in such cases, compilation is better split into multiple loops, materializing intermediate vectorized results. Also, we have signaled cases where an interpreted (but vectorized) evaluation strategy provides optimization opportunities which are very hard to obtain with compilation, like dynamic selection of a predicate evaluation method or predicate evaluation order.
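The "multiple loops with materialized intermediate vectors" strategy can be sketched as follows. This is pure Python with toy data, assuming an expression (a + b) * c; a real engine would run compiled per-primitive loops over vectors of roughly a thousand values.

```python
# Sketch of block-oriented evaluation of (a + b) * c: one tight loop per
# primitive, with the intermediate result materialized per block, rather
# than a single fused tuple-at-a-time loop. Toy data, pure Python.

VECTOR_SIZE = 4  # tiny for illustration; real systems use ~1000 values

def vec_add(x, y):          # primitive 1: one tight loop
    return [xi + yi for xi, yi in zip(x, y)]

def vec_mul(x, y):          # primitive 2: another tight loop
    return [xi * yi for xi, yi in zip(x, y)]

def eval_vectorized(a, b, c):
    out = []
    for i in range(0, len(a), VECTOR_SIZE):     # block-at-a-time
        s = slice(i, i + VECTOR_SIZE)
        tmp = vec_add(a[s], b[s])               # materialized intermediate
        out.extend(vec_mul(tmp, c[s]))
    return out

a, b, c = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [2, 2, 2, 2, 2]
assert eval_vectorized(a, b, c) == [(x + y) * z for x, y, z in zip(a, b, c)]
```

Keeping each primitive a separate loop over a small, cache-resident block is what enables the SIMD alignment and branch-free execution the excerpt attributes to vectorization.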

An integrated concurrency control in object-oriented database systems.

lock, m-name is a method invoked, and B1, B2, ..., Bn are break points encountered during the method execution. In [Malt, 1993], the lock format has the following form: [trans-name, m-name], where trans-name and m-name have the same meaning as in the lock table of the proposed work.


WIQ: Work-Intensive Query Scheduling for In-Memory Database Systems

Cloud computing offers the ability to access large pools of hardware resources at low economic cost. In combination with the parallelization capabilities of multicore servers, this has promoted in recent years the migration of computationally-intensive applications from internal data centers to the cloud. For example, enterprises are increasingly using cloud infrastructures to access or host data analytics and business intelligence services [1]. The workload of these services often consists of long batch jobs processing very large data sets recording sales and customer information. Since longer execution times correspond to increased costs for the service provider, hosted services are expected to deliver maximum performance from their software stack. For this reason, in-memory databases have recently emerged as a mainstream technology to maximize the responsiveness of data-intensive applications [2]. By manipulating large data sets directly in memory, and therefore omitting time-consuming disk operations, in-memory databases execute data-intensive operations in a fraction of the time of traditional databases.

Reducing Execution Time of Distributed SELECT Query in Heterogeneous Distributed Database using Genetic Algorithm

Many algorithms have been implemented for reducing the response time of a running query, such as dynamic programming algorithms and genetic algorithms. Reza Ghaemi, Amin Milani Fard, Hamid Tabatabaee, and Mahdi Sadeghizadeh [13] presented a paper on evolutionary query optimization for heterogeneous distributed database systems. They proposed an evolutionary query optimization technique for distributed heterogeneous systems using a multi-agent architecture and a genetic algorithm approach. They concentrated on join order for optimizing the query, whereas our work concentrates on finding an optimized execution plan, in combination with the data site, rather than on query optimization alone.
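The genetic-algorithm approach to join ordering can be sketched roughly as follows; the table sizes, flat selectivity, and all GA parameters are invented, and the cost model is a simplification rather than the cited paper's.

```python
# Minimal genetic-algorithm sketch for choosing a join order. Table sizes,
# the flat selectivity, and GA parameters are invented for illustration.
import random

random.seed(7)
SIZES = {"A": 10_000, "B": 500, "C": 2_000, "D": 50}   # hypothetical tables
SELECTIVITY = 0.001                                    # flat join selectivity

def plan_cost(order):
    """Cost of a left-deep join plan: sum of intermediate-result sizes."""
    rows, cost = SIZES[order[0]], 0.0
    for table in order[1:]:
        rows = rows * SIZES[table] * SELECTIVITY
        cost += rows
    return cost

def crossover(p1, p2):
    """Order crossover: keep a prefix of p1, fill the rest in p2's order."""
    cut = random.randrange(1, len(p1))
    head = p1[:cut]
    return head + [t for t in p2 if t not in head]

def mutate(order):
    """Swap two positions in the join order."""
    i, j = random.sample(range(len(order)), 2)
    order[i], order[j] = order[j], order[i]

tables = list(SIZES)
pop = [random.sample(tables, len(tables)) for _ in range(20)]
initial_best = min(pop, key=plan_cost)
for _ in range(30):                                    # generations
    pop.sort(key=plan_cost)                            # elitist selection
    survivors = pop[:10]
    children = [crossover(*random.sample(survivors, 2)) for _ in range(10)]
    for child in children:
        if random.random() < 0.2:
            mutate(child)
    pop = survivors + children

best = min(pop, key=plan_cost)
# Elitism guarantees the evolved plan is no worse than the initial best:
assert plan_cost(best) <= plan_cost(initial_best)
```

The chromosome here is a permutation of tables, so crossover must preserve permutation validity; that is why an order crossover is used instead of a naive splice.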

A Novel Approach of Query Optimization for Distributed Database Systems

We concentrate on two of these factors, response time and total execution cost, though it is fairly easy to extend these to include other factors, assuming they can be easily estimated. Since we assume that the only information we have about the costs of operations is through the interface to the bidders, the optimization problem has to be restated as optimizing over the cost information exported by the bidders. Before describing the adaptations of the known query optimization algorithms to take into account the high cost of optimization, we will discuss two important issues that affect the optimization cost in this framework significantly.

The Implementation of Column-Oriented Database in Postgresql for Improving Performance of Queries

PostgreSql is the world's most advanced object-relational database management system [7]. It is free and open-source software. It is developed by the PostgreSql Global Development Group, consisting of a handful of volunteers employed and supervised by companies such as Red Hat and EnterpriseDB. PostgreSql is available for almost all operating systems: Linux (all recent distributions), Windows, UNIX, Mac OS X, FreeBSD, OpenBSD, Solaris, and all other Unix-like systems. It works on a majority of architectures: x86, x86-64, IA64, PowerPC, Sparc, Alpha, ARM, MIPS, PA-RISC, VAX, M32R [7]. MySQL and PostgreSql both compete strongly in the field of relational databases, since they both have advanced functionality, comparable performance and speed, and, most importantly, are open-source. PostgreSql, which uses a client/server model, can be broken up into three large subsystems [7]:
