• No results found

Instruction Retentiveness to increase Query Performance through Cache Management with Column Store

N/A
N/A
Protected

Academic year: 2021

Share "Instruction Retentiveness to increase Query Performance through Cache Management with Column Store"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

Instruction Retentiveness to increase Query

Performance through Cache Management – with

Column Store

Tejaswini Apte

Devi Ahilya VishwaVidyalay, Indore,India Email: [email protected]

Maya Ingle and A. K. Goyal Devi Ahilya VishwaVidyalay, Indore, India

Email: {[email protected], [email protected]}

Abstract-Cache management has strong impact on database query performance. Literature survey has shown, tuning the database cache may improve performance significantly. Materialization of intermediates in database cache plays vital role in improving the response time. We propose an enhancement to a column store database structure by building additional intelligence in the form of an adaptive and efficient query cache, called KPCS (Keep Pool Column Store). The proposed algorithm is evaluated against the TPC-H schema to demonstrate effectiveness in terms of significant performance gains.

Index Terms-TPC-H, column store

I. INTRODUCTION

The need of good response time of DSS queries has given birth to materialization and reuse of intermediates before eviction from cache. The materialization of intermediates can be avoided in tuple-at-a-time pipelined execution model thereby in minimizing the opportunities for shared and reused computation. To overcome with this limitation, materialized view or query-result-set-caches may be used by commercial database systems. Operator-at-a-time execution paradigm promotes the exploitation of intermediates to speed up queries [1], [2], [3]. Materialized views represent common sub-queries, in which materialization improves subsequent time of processing. Recycle of intermediates is useful in applications for parameterized queries, with minimum eviction rate.

As, it has been observed that frequent eviction from cache affects the hit ratio badly [4], [5]. However, it is possible to improve cache hit rate by introducing Keep-Pool in column store database architecture. Keeping the intermediate in the cache improves response time and throughput problems and as a whole it refines caching methodology used in column store database system. Experiments have been conducted against the

Manuscript received January 07, 2013; revised April 20, 2013.

a-time database systems [MonetDB] suitable for read only workload [6], [7].

To improve the response time of DSS queries, keeping the intermediate in cache plays vital role. Section 2 reveals the related techniques for maintaining the intermediates. Description of our hypothesis with its architecture, policies and description of KPCS is presented in Section 3. Maintenance of architecture has great impact on performance. Section 4 presents the factors affecting the maintenance of Keep-Pool. Section 5, presents the experiment analysis on MonetDB against TPC-H schema, before result analysis and conclusion in Section 6.

II. RELATED WORK

Keeping the intermediates enhances the generic idea of storing and reusing expensive computations. Keeping the results is also the fundamental idea of materialized views and query caching [8], [9]. To improve the system performance materialized views have defined by the DBA [10]. The dynamic materialized views refreshes as soon as data changes. Traditionally, view or intermediate matching is integrated with the query optimization. By the eviction strategy the system adapts the pool content to maximal benefit of the current workload [11].

III. KEEP-POOL ARCHITECTURE (KPA)

The architecture of the Keep-Pool is targeted to SQL queries over read-only database. Each query plan is produced in isolation, i.e. without knowledge of workload itself using query optimizer techniques. Thresults of operation from previous queries are not taken into account at optimization time. Instead, matching of instruction, followed by decisions to reuse it, is performed at run time.

A. Design Instructions for Keep-Pool

The first design step is to identify instructions to the Keep-Pool, performed during query optimization by the

(2)

Keep-Pool optimizer. It inspects the plan and marks instructions to be controlled by Keep-Pool. The cheap instructions are not kept in the Keep-Pool. The overhead of administration outweighs the expected gain. Keep-Pool threads access the columns of SQL query and propagates the property through query plan. Typically, the threads involve selections, joins and other primary relational operations. The Keep-Pool adapts continuously to the workload without DBA intervention and incurs minimum start-up and maintenance cost. The design contains three dimensions as Query matching, execution cost and pool administration.

B. Keep-Pool Policies

To improve the cache performance, Keep-Pool acts as a buffer for storing instruction, parameters and results. The instructions are stored with three parameters namely: Reuse statistics, CPU time to compute, and Result size. The admission policy determines the candidate instruction for Keep-Pool, thus preventing the eviction rate. The instructions are evicted from Keep-Pool to manage the memory resource, to prevent overflow, resulted in the cache miss for costlier instruction. Entries are evicted from the pool when the base relation for that instruction changes.

C. Proposed Algorithm-KPCS

In this section, we present the flow of KPCS with informal and formal description.

D. Informal Description

KPCS is broadly designed with three functions: Query (TPC-H-Q) is a built in function that accepts TPC-H query and returns a valid instruction as output, as a valid instruction is required for query execution. Match (I) is also a built in function, designed to check the presence of an instruction in Keep-Pool. The function keepEntry(I) is developed to check the instruction availability in Keep-Pool. keepPool(I) The candidate instruction is admitted to the Keep-Pool by said procedure. The built-in function freeSpace(KP) maintains the heap of available space. The focus of keepExit(I) is to track the execution frequency and evict instructions from Keep-Pool.

E. Formal Description – KPCS

/* It will keep the frequently used instruction in the memory in order to reuse it for improving the

performance */

/* Main program begins */ void main() { Instruction I; I=Query (TPC-H); Call keepPool(I); } /* End of main */ Procedure keepPool(Instruction I)

/* This procedure keeps candidate instruction to Keep-Pool */ { /* Variable Declaration */ Heap kp; int frequency; /* Allocating 10% of 1GB RAM */ kp=malloc((1024)3*10/100); if marked(I) then

if not keepEntry(I) then execute I; kp=kp-sizeof(I); call keepExit(I); else execute I; }

/* End of Procedure keepPool */

Boolean function keepEntry(Instruction I)

/* This function checks for instruction availability in the keep pool, if available execute the same from Keep-Pool */

{

/* Variable Declaration */ Boolean M;

/* Function call to check instruction in Keep-Pool*/ M=Match(I) ; if M then Execute I; return true; else return False; end function; }

/* End of function keepEntry */ Procedure keepExit(Instruction I)

/* This procedure checks the execution frequency then evicts instruction from Keep-Pool */

{

/*Variable Declaration */ Heap kp;

int threshold=40;

/*Checking the available space*/ if admission(I) then

if sizeof(I)<freeSpace(kp) then if marked(I) then

if execute(marked(I))>=threshold then

/* Clean cache by flushing least recently used instruction*/ cleanInstructionCache (sizeof(I)); add I to kp; end if; end if; end if; end procedure; }

(3)

IV. MAINTENANCE OF KEEP-POOL

Normally, it has been found out that the database server is overloaded while checking the instructions and for testing the candidature for eviction. The process is supported by the frequency of execution of an instruction. The factors affecting the capture rate of an instruction for eviction are: Instruction execution frequency and Percentage of performance gain.

The performance gain is measured by:

 Resource cost the system has spent to compute results.

 The reuse weight (number of reuses).

When resource limit is reached, candidates are to be evicted from Keep-Pool. Resource limit can be measured in terms of memory or number of entries in the Keep-Pool.

V. TPC-HEVALUATION

To understand the Keep-Pool mechanism better and its effect on performance, we have conducted the experiments with TPC-H decision support benchmark [12]. KPCS has been executed on 10GB database and the measurements have been taken against the hot cache. For this benchmark we executed 10 instances of each query generated with the TPC-H query generator. To illustrate the Keep-Pool mechanics better, the admission policy is keep-all and there are no resource limitations, thus no cache policy interferes with the results. Queries have been analyzed with respect to commonalities, benefited from reusing intermediates. As the number of commonality increases, the hit ratio improves, with

degradation in resource utilization. Result may vary with respect to commonality in queries. We have observed the time improvements and memory consumption after each query as shown in Table I, which is driven by commonality in queries. The targeted queries for these experiments are simple, join and aggregated queries. The Fig. 1 shows the execution time of individual queries, which is directly proportional to hit ratio. To estimate the query commonalities we assume that the same query is executed with different parameters, where the parameter generation follows the TPC-H specification. Query Q 9 contains substantial intra-query commonality where a large part of the sub-query is shared with the main query. The common part includes a selection, a 2-way join, and an arithmetic computation over projected attributes. Since the query overlap is negligible, the time improvement and the hit ratio are stable for all instances. The cache policy has a different influence over different types of queries. The small candidate instructions are always put in the pool and reused for current query execution approximately always, hence prevent eviction. The eviction policy, evicts instructions based on frequency of usage in Keep-Pool. Obstacle for performance improvements are huge complex queries. The huge complex queries which cannot fit in the Keep-Pool result in eviction of small instructions, which directly lead to higher number of Keep-Pool misses and performance degradation. A manual inspection of the execution traces of queries demonstrating high time savings, queries Q4, Q7,and Q21, confirmed that the reused instructions have lowered the execution cost drastically. For other queries, such as Q17, time savings are minimal despite the high percentage of instructions reused.

TABLE I. TIME AND MEMORY UTILIZATION WITH KEEP-POOL TPC-H

Queries Initial execution time (in ms) KPCS execution time (in ms) Initial memory consumption (in kb) KPCS memory consumption (in kb) 2 306.015 79.59 59,176 57,278 3 900 459.335 61,172 60,164 4 3100 566.817 61,428 60,252 5 1700 889.573 65,964 63,460 6 614.468 332.213 66,072 64,692 7 11800 975.637 65,912 64,784 8 2400 233.075 70,428 69,040 9 3500 782.187 80,608 78,160 10 4000 2000 1,09,736 1,06,980 12 1100 760.684 1,12,928 1,08,632 16 857.605 589.215 1,12,288 1,09,840 17 404.548 320.771 1,12,784 1,11,120 19 3500 2000 1,18,064 1,13,836 21 6700 900 1,21,761 1,19,200

VI. RESULT ANALYSIS AND CONCLUSION

In this paper we have described a database architecture augmented with Keep-Pool instructions. The architecture is implemented as an extension to the MonetDB system.

The Keep-Pool policies respect the inter-operator dependencies, which leads to effective reuse of large threads in template based query sessions. The policies cover an LRU scheme and resource cost. The challenges are, to decide the factors for the retention period and

(4)

eviction strategy, for optimal utilization of resources. To evaluate the cache policies, we have used 14 TPC-H queries as shown in Table I. The unlimited space strategy has been used to measure the total resources needed. We have considered two major areas for resource consumption namely; memory consumption and instruction matching time. The experiments have been performed on machine running RHEL 5 on a 2.4 GHz Intel processor and 1GB of RAM. As observed, the average improved time with Keep-Pool becomes approximately four times better than the previous architecture. The architecture is affected by the amount of I/O and processing bandwidth available in the system, with selectivity and number of columns accessed as parameters as shown in Fig. 1. The extensive experimentation based on a full-edged implementation shows that the MonetDB software architecture is well suited to be extended with a targeted optimization goal. The results obtained indicate several areas for further exploration. Within the context of MonetDB, it seems worth exploring sub-summation relationships through join paths and the opportunities offered by recognition of query classes. Another direction of work is refining and developing admission and cache policies that respect the semantic dependencies of instructions, as well as automatic accommodation of policies most appropriate for the current workload.

Figure 1. KPCS versus Initial time consumption

REFERENCES

[1] Y. Kotidis and N. Roussopoulos, “A case for dynamic view management,” ACM Trans. Database Syst, vol. 26, no. 4, pp. 388-423, 2001.

[2] J. Zhou, P. A. Larson, J. Goldstein, and L. Ding, “Dynamic materialized views,” in Proc. IEEE 23rd International Conference

on Data Engineering, 2007, pp. 526-535.

[3] MONETDB: The Column-store Pioneer. (2008). [Online]. Available: http://monetdb.cwi.nl/

[4] R. Cornacchia, S. Heman, M. Zukowski, A. P. de Vries, and P. A. Boncz, “Flexible and efficient IR using array databases,” VLDB J., vol. 17, no. 1, pp. 151-168, 2008.

[5] T. Phan and W. S. Li, “Dynamic materialization of query views for data warehouse workloads,” in Proc. IEEE 24th International

Conference on Data Engineering, 2008, pp. 436-445.

[6] J. Zhou, P.A. Larson, J. C. Freytag, and W. Lehner, “Efficient exploitation of similar subexpressions for query processing,” in

Proc. SIGMOD Conference, 2007, pp. 533-544.

[7] G. Luo, “Partial materialized views,” in Proc. IEEE 23rd

International Conference on Data Engineering, 2007, pp. 756-765.

[8] N. Bruno and S. Chaudhuri, “Physical design refinement: The 'Merge-Reduce' approach,” ACM Trans. Database Syst., vol. 32, no. 4, 2007.

[9] S. Agrawal, S. Chaudhuri, and V. R. Narasayya, “Automated selection of materialized views and indexes in SQL databases,” in

Proc. 26th International Conference on Very Large Databases,

2000, pp. 496-505.

[10] C.-H. Choi, J. X. Yu, and H. Lu, “Dynamic materialized view management based on predicates,” in Proc.5th Asia-Pacific Web

Conference, 2003, pp. 583-594.

[11] M. Ivanova, M. L. Kersten, and N. Nes, “Self-organizing strategies for a column-store database,” in Proc. 11th International Conference on Extending Database Technology:

Advances in Database Technology, 2008, pp. 157-168.

[12] Transaction Processing Performance Council. TPC Benchmark H,

Revision 2.6.2, 2008.

[13] P. A. Boncz, M. L. Kersten, and S. Manegold, “Breaking the memory wall in Monet DB,” Commun.ACM, vol. 51, no. 12, 2008. [14] C. Bornhovd, M. Altinel, C. Mohan, H. Pirahesh, and B. Reinwald,

“Adaptive database caching with DBCACHE,” IEEE Data Eng. Bull., vol. 27, no. 2, pp. 11-18, 2004.

[15] C.-M. Chen and N. Roussopoulos. “The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching,” in Proc. 4th International

Conference on Extending Database Technology Cambridge, 1994,

pp. 323-336.

[16] J. Goldstein and P. A. Larson, “Optimizing queries using materialized views: A practical, scalable solution,” in Proc.ACM

SIGMOD International Conference on Management of Data, 2001,

pp. 331-342.

[17] G. Graefe, “Volcano-an extensible and parallel query evaluation system,” IEEE Trans. Knowl. Data Eng., vol. 6, no. 1, pp. 120-135, 1994.

[18] M. Ivanova, N. Nes, R. Gon_calves, and M. L. Kersten. “MonetDB/SQL meets skyserver: The challenges of a scientific database,” in Proc. 19th International Conference on Scientific

and Statistical Database Management, Canada, July 2007.

[19] P. A. Larson, J. Goldstein, and J. Zhou. “MTCache: Transparent Mid-Tier database caching in SQL server,” in Proc. 20th

International Conference on Data Engineering, 2004, pp. 177-188.

[20] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. “Efficient and extensible algorithms for multi query optimization,” in Proc.ACM

SIGMOD International Conference on Management of Data, 2000,

pp. 249-260.

[21] H. Mistry, P. Roy, S. Sudarshan, and K. Ramamritham. “Materialized view selection and maintenance using multi-query optimization,” in Proc. SIGMOD Conference, 2001, pp. 307-318. [22] S. M. Kent, “Sloan digital sky survey,” in Science with

Astronomical Near-Infrared Sky Surveys, 1994, pp. 27-30.

[23] K. L. Tan, S. T. Goh, and B. C. Ooi. “Cache-on-Demand: Recycling with certainty,” in Proc. 17th International Conference

on Data Engineering, 2001, pp. 633-640.

Maya Ingle did her Ph.D in Computer Science from DAU, Indore (M.P.) , M.Tech (CS) from IIT, Kharagpur, INDIA, Post Graduate Diploma in Automatic Computation, University of Roorkee, INDIA, M.Sc. (Statistics) from DAU, Indore (M.P.) INDIA.

She is presently working as PROFESSOR, School of Computer Science and Information Technology, DAU, Indore (M.P.) INDIA. She has over 100 research papers published in various International / National Journals and Conferences. Her areas of interest include Software Engineering, Statistical, Natural Language Processing, Usability Engineering, Agile computing, Natural Language Processing, Object Oriented Software Engineering. interest include Software Engineering, Statistical Natural Language Processing, Usability Engineering, Agile computing, Natural Language Processing, Object Oriented Software Engineering.

Dr. Maya Ingle has a lifetime membership of ISTE, CSI, IETE. She has received best teaching Award by All India Association of Information Technology in Nov 2007.

(5)

Tejaswini Apte received her M.C.A (CSE) from Banasthali Vidyapith,Rajasthan. She is pursuing research in the area of Machine Learning from DAU, Indore,(M.P.),INDIA.

Presently she is working as an ASSISTANT PROFESSOR in Symbiosis Institute of Computer Studies and Research at Pune. She has 4 papers published in I nternational/National Journals and Conferences. Her areas of interest include Databases, Data Warehouse and Query Optimization.

Mrs. Tejaswini Apte has a professional membership of ACM

Arvind Goyal did his Ph.D in Physics from Bhopal University, Bhopal (M.P.), M.C.M from DAU, Indore, M.Sc.(Maths) from U.P.

He is presently working as SENIOR PROGRAMMER, School of Computer Science and Information Technology, DAU, Indore (M.P.). He has over 10 research papers published in various International/National Journals and Conferences. His areas of interest include databases and data structure.

Dr. Arvind Goyal has lifetime membership of All India Association of Education Technology.

References

Related documents

To test whether immediate changes in CO2 concentration affect the food uptake, 30 copepods from pre-incubation at control seawater pH were placed in three 1 l bottles containing

画像所見 MRI 検査 : 膀胱後壁左側に約 1. 3 cm 大の腫瘤性病 変を認めた.明らかな筋層浸潤を疑う所見は認められ なかった( Fig. 2a , b ). IVP :

In light of the above discussion, the overarching goal of this research was to conduct a comprehensive systematic literature review of the following aspects in the context of SIDS:

Increased productivity, the mitigation and potential elimination of environmental and human related hazards, and the lower cost of inputs are valid objectives

The Company believes that the approximately $2.5 billion of proceeds of the common stock and senior secured notes offering — in addition to the covenant relief provided under

Specifically, this must match the data interface on the Email Security appliance that you specified for the Primary server in the Deliver Messages Via section of the Spam

Monks in the inner courtyard play long copper trumpets called 'zang dung' and clash cymbals called rolmo during Mani Rimdu festival in Chiwang Monastery , a gift of Trulshig