• No results found

Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries

N/A
N/A
Protected

Academic year: 2020

Share "Bitmap Index an Efficient Approach to Improve Performance of Data Warehouse Queries"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 6, Issue 1, January 2016)

164

Bitmap Index an Efficient Approach to Improve Performance

of Data Warehouse Queries

Kale Sarika Prakash

1

, P. M. Joe Prathap

2

1Research Scholar, Department of Computer Science and Engineering, St. Peters Institute of Higher Education and Research,

St. Peters University, Chennai, India

2Associate Professor, Department of Information Technology, RMD College of Engineering, Chennai, India

Abstract— In today’s era, due to involvement of advanced

and scientific computing in all field of engineering and sciences the processing of application generates huge number of parameters related to that application. Data warehouse collects all such huge information related to same application from various external data sources. Decision support system work on this platform created by Data warehouse and help decision making process of organization. As the Data warehouse is huge and complex , effectively retrieving information from such a huge database is the challenge in front of the researchers. The nature of the queries to be fired on Data warehouse are Iceberg query, aggregation query, equality query and membership query. As data warehouse is huge ,directly executing query on such a huge dataset is not efficient way .In this case first up all we have to create index on Data warehouse and then we have to execute query on it .Four indexing types are mainly used in database system such as B Tree, R Tree ,Hash Index and Bitmap Index. Out of this first three are traditional techniques which are suitable for Online Transaction Processing system (OLTP) whereas Bitmap indexing is suitable for Online Analytical processing (OLAP) system. The performance of Data warehouse is depends upon the time required to execute query on it. Improving the performance of query in Data warehouse and decision support system is challenge in front of the researchers. In this paper we studied different indexing strategies available and through our experimental result we come to the conclusion that indexing technique base on Bitmap representation is the appropriate choice for Data warehouse query processing, which help to improve the performance of Data warehouse query.

Keywords— Data warehouse(DW) ,Iceberg query (IBQ),

Online transaction processing (OLTP), Online Analytical processing(OLAP),Bitmap Index(BI)

I. INTRODUCTION

Due to involvement of modern and scientific techniques in engineering and science application ,data generated by it is huge and complex. Data warehouse(DW) size and complexity is increased day by day. Data warehouse (DW) is a huge collection of dataset which is integrated, subject oriented ,time-variant and Non volatile [ 1].

Data warehouse(DW) collects the data from various external sources that data may be in different format or structure. Data warehouse brings it in common platform and provide it to Decision support System(DSS) ,which helps in decision making process of organization. Generally the query fired on warehouse is Iceberg query, aggregation query, equality query and membership query. Iceberg query contain aggregation function followed by having clause with some threshold condition. These queries work on huge database but actually answer to the query is from some selected record only. It completely depends upon where clause and threshold condition present in query. To extract such a small information efficiently from very huge dataset is the challenge in front of researchers. Various researchers has worked on to improve efficiency of Iceberg query using Bitmap indexing Technique [2,3,4,5,6].

Generally all the queries to be execute on DW are complicated and interactive in nature . Such a complicated queries require several minutes or hours or days to execute because huge amount of data to be scanned from DW. After scanning complete huge dataset as per the requirement of query the answer is generated .In this case scanning whole database and along with it processing the same is time consuming process. To improve efficiency of Iceberg query or any DW query is the challenging task.

Various researchers has proposed different strategies to improve the performance of DW query. These strategies are summary table generation and indexing[1,8,9].

(2)

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 6, Issue 1, January 2016)

165 Accessing complete database, retrieving and aggregating actual data as per requirement of query reduces the DW query performance. For every such a new query follow all these steps is a time consuming approach[7].

Indexing is generally used in database system to increase the performance of database query. Index is created on particular columns of database, it helps in searching the tuples quickly. There are different types of indexing like B tree, Bitmap indexing etc. The B Tree data structure is widely used in relational database system. But it represents sorted data and perform different operations like insertion,deletion ,modification and retrieval of tuples. As this data structure works on sorted data it is not suitable for handling huge or large database like data warehouse .It is suitable for OLTP System. However Data warehouse works on read only mode, bitmap indexing takes advantage of this property of Data warehouse [1,8,9].

Modern data warehouses stores high dimensional data generated from different scientific, engineering applications. Due to this high dimensionality and huge data set time required for query processing is increases drastically. As DW plays vital role in decision making process of organization ,it generally required very small data from huge data set. Bitmap indexing is suitable for DW due to this two reasons: First due to the properties of DW like huge data base, read only and updates are not frequent. Second the nature of query to be executed on DW i.e. Extraction of small information from huge set.

Bitmap Indexing is common technique for processing complex ,multidimensional and adhoc queries, different commercial DBMS products introduces bitmap indexing in it like Sybase, IBM, Red Bricks and Oracle[10] .

In this paper we are focusing on the suitability of Bitmap indexing for DW query. Our experimental results shows that BI is suitable for Data warehouse (DW),DSS and OLAP System.Remaining paper is organized in following fashion. Section II focus on Bitmap Indexing, section III Comparison between B Tree and Bitmap indexing and in Section IV we have present the experimental results to measure the performance of Bitmap Indexing ,Finally section V is conclusion of our study is presented.

II. BITMAP INDEXING TECHNIQUE

Bitmap index was implemented first time in Model 204 DBMS[10]. It consists of b number of bitmap vector which is the number of unique value present in respective column. The b is the cardinality value of attribute column. Bit map vector is the combination of binary number zero’s and one’s. In bit map vector v bit value = 1 if it associate with respective bitmap else it is zero.

Table I shows simple relation and table II shows the bitmap vector created on Position column . The cardinality of bitmap vector on position attribute is ={ First,second,Third}.

TABLE I BASIC RELATION

Id Name Position Age Location

1 Sudha First 10 Pune

2 Subodh First 20 Mumbai

3 Radha Third 15 Nasik

4 Megha Second 20 Aurangabad

5 Priya Third 12 Pune

6 Anu Second 23 Pune

7 Ram First 22 Aurangabad

[image:2.612.338.546.183.659.2]

8 Raj Third 33 Mumbai

TABLE II

BITMAP INDEX ON POSITION COLUMN

We can execute queries like Iceberg Query (IBQ), equality queries, membership queries and aggregation queries on bit map vector. To generate result of IBQ we have to make use of OR and XOR operation.

Id First Second Third

1 1 0 0

2 1 0 0

3 0 0 1

4 0 1 0

5 0 0 1

6 0 1 0

7 1 0 0

(3)

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 6, Issue 1, January 2016)

166 To generate result of an equality queries the value mentioned in predicate condition is read into memory. Whereas in case of membership queries if more than one bitmap vector is present then Bitwise OR operation is performed n-1 time on complete bitmap vector, where n is the size of bitmap vector[11].

Bit map Index uses all distinct value of the attribute to be index as key value to each index. It then generates the bitmap for each Key value. Further for query processing this bitmap index values are used and results are generated by using different Boolean operation like AND,OR and XOR [10].The size of bitmap index is relatively small in case of low cardinality attributes like Gender ={ male,female }.In case of values generated after every transaction is more the cardinality will be very high. In such a type of situations the bitmap index is designed with Bins [12].In this case attribute values are partition into number of bins and generates bitmap vectors to represent bins rather distinct keys. Due to this approach binning reduces storage space but increases access time in case of boundary value accessing. If simple bitmap is build on high cardinality attribute then it requires more memory space and query processing time will be increased[11].

Most of the researchers have worked to overcome this

problem.They developed this two strategies:1)

Implementation of different Bitmap Index compression

algorithm 2)Algorithms for Bitmap Index

extension[13].Under first method two types are developed World Aligned Hybrid(WAH)[13,14] and Run length Huffman(RLH) .Both this techniques significantly reduces the storage space for bitmap index. But direct query processing is not possible on such a compressed bitmap. So before executing it must decompressed first and then query processing will perform on the same, which is again a time consuming process. Whereas in second approach each bitmap index is encoded with number which extends the representation format but space required to store is small and data access time is also get reduced [7].As storage space and data access problem is overcome by above research, bitmap indexing(BI) approach is well suitable for high cardinality attributes .

Generally scientific queries involved with

multidimensional query processing. As it is huge data and is used for analysis purpose it is not frequently updated almost till the end of analysis work it remains same. The nature of such a data is read only, so once bitmap index is created it remain fixed for all type OLAP query on respective attribute. Initial cost of bitmap creation is involved but after first iteration time required to execute query will get reduced drastically.

In our results we observe that if we execute the query repeatedly on same BI then time required to execute get reduced. Through our experimental result we noticed that Bitmap Indexing is suitable choice for DW query processing.

III. COMPARISON BETWEEN BTREE AND BITMAP

INDEXING

Different Database Management System like

[image:3.612.307.581.339.713.2]

Oracle,IBM Sybase etc. uses B Tree as a normal indexing for relational databases. They have included Bitmap Indexing strategies also. But selection of indexing techniques as per the need of application is important factor. This helps to get query result fast. Table IV shows the difference between B Tree and BI.

TABLE III

DIFFERENCE BETWEEN B TREE AND BI

Sr.No. B Tree Index Bitmap Index(BI)

1. Balanced tree structure is used in B-Tree for efficient record retrieval.

Bitmap uses matrix structure with zero and one bit value.

2. B Tree Indexing does not

index NULL Values NULL values are indexed in Bitmap Indexing 3. Update,delete and insert

operation does not affect the structure of B Tree.Based on value balanced tree get modified.

Update,delete and insert operation completely affect the structure of Bitmap index.After any operation new bitmap index has to prepare.

4. B-Tree index updates on Key values has relateviely inexpensive.

Bitmap Index update, delete has more expensive.

5. In this case optimizer are not directly work on boolean operations.

Optimizers answers queries containg AND,OR or XOR operation.

6. B tree is suitable for Online Transaction Processing Sytem where databases are updated frequently.

It is suitable for Data warehouse and decision support system where database is not updated frequently.

7. Performance of B Tree Degrades for nested queries

Bitmap Index performance is better than B tree when queries are nested and complicated.

8. It is suitable relational databases where frequent database updates are necessary.

It is suitable for Static databases where updates are not so frequent..

9. Compare to Bitmap indexing it is slow as it is not directly work on boolean operations.

Bitmap indexing is fast as it work on boolean operations which are directly

(4)

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 6, Issue 1, January 2016)

167 IV. EXPERIMENT TO MEASURE PERFORMANCE OF

INDEXING

To measure the performance of Indexing strategy we perform the experiment on databases with 5 lack ,10 lack and 15 lack tuples. On this variety of databases we execute the queries of different complexity. Initially we consider simple query with only one condition. We execute it on all three databases.In all these query performance in terms of time required to execute is recorded. Same query is executed seven to eight times and results are recorded for analysis.Same procedure is repeated for all the queries. We perform the experimentation on Oracle 11g database . Our experimentation is based on following three cases.

Case1: Simple Query with one condition only.

Input Query: Select products,states,count(*) from sales group by products,states HAVING count(*) >= 100

This query is executed on three different databases like sales1 = 5 lack tuples ,sales =10 lack tuples and sales2 = 15 lack tuples. The time required to execute this query on all databases is summarized in Table V and graphical representation of the same is shown in Figure 1.

In all these cases performance of Bitmap indexing is good compare to B tree and without using indexing strategy. The performance of Bitmap indexing is excellent in case of large database. The time is directly reduces from 1000 m sec to 184 msec,145 msec,121 msec after each iteration time in bitmap indexing get reduced repeatedly. This is the important feature which proves applicability of Bitmap indexing in Data warehouse query processing.

Case2: Query with two condition.

Input Query: select products, states, count(*) from sales2 where states='Bihar' group by products, states Having count(*) >= 100

This query is executed on three different databases like sales1 = 5 lack tuples ,sales =10 lack tuples and sales2 = 15 lack tuples. The time required to execute this query on all databases is summarized in Table VI and graphical representation of the same is shown in Figure 2. The performance of Bitmap indexing is excellent in case of large database. The time is directly reduces from 1000 m sec to 105 msec,22 msec. Compare to case 1 even though the complexity of query increases then also time required to execute query get reduced.From this observation we can say that database size and query complexity does not affect the performance of Bitmap Indexing rather the performance is improved.

Case 3: Query with three condition,

Input Query: select products, states, count(*) from sales2 where states='Bihar' AND products='garlic' group by products, states Having count(*) >= 100

This query is executed on three different databases like sales1 = 5 lack tuples ,sales =10 lack tuples and sales2 = 15 lack tuples. The time required to execute this query on all databases is summarized in Table VII and graphical representation of the same is shown in Figure 3.

The performance of Bitmap indexing is excellent in case of large database. The time is directly reduces from 1000 msec to 24 msec. Compare to case 1 and case 2 even though the complexity of query increases then also time required to execute query get reduced. This final observation is very useful to prove suitability of Bitmap indexing for data warehouse query processing.

[image:4.612.321.585.332.639.2]

FIG I CASE1: QUERY WITH ONE CONDITION

(5)

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 6, Issue 1, January 2016)

[image:5.612.47.306.118.279.2]

168 FIG III CASE1: QUERY WITH TWO CONDITION

In all above case1,case2,case3 the performance of Bitmap indexing is better in terms of time required to execute the query. After analyzing this result we have following observations:

1.The performance of BI is better in all

case1,case2,case3 on different size of databases. 2.On large size database( i.e. sales2) with all cases i.e

case1,case2,case3 the performance of BI is good and comparatively better in case of large size database. 3.We noticed that after every iteration time required to

execute the query get reduced.

4.Even though the complexity of query increases then also in case of BI time required to execute query is decreases.

V. CONCLUSION

In this paper suitability of Bitmap Indexing for increasing speed of DW query is proved. A critical issue for DW system is ability to answer queries quickly. Bitmap Indexing strategy is proved to be time efficient for answering Data warehouse queries by performing fast Boolean operation on index level instead of executing queries on main database. Bitmap indexing is suitable for data warehouse because of static and read only nature of Data warehouse. Also the data warehouse is not frequently updated. Because of all this properties of Data Warehouse BI is suitable for it. In this paper we perform analytical and experimental study of Bitmap Indexing and B Tree Indexing strategies. We compare the results of BI with B Tree index with and without using indexing strategy. From our experimental result we come to the conclusion that Bitmap indexing performance is good compare to B Tree and without using indexing strategy.

We noticed that Bitmap indexing performance is better even though the database size and query complexity increase. Thus we can say that Bitmap indexing is suitable approach for data warehouse queries like Iceberg query, membership query and equality query.

REFERENCES

[1] S.Chaudhari and U.Dayal, “An Overview of Data Warehousing and OLAP Technology”,ACM SIGMOD RECORD ,Vol.26,pp.65-74,1997

[2] Bin He,Hui-l Hsiao,Ziyang Liu,Yu Huang and Yi Chen,” Efficient Iceberg Query Evaluation Using Compressed Bitmap Index”,IEEE Transaction on knowledge and data engineering,Vol.24,No.9, September 2012 .

[3] C.V.Guru Rao, V. Shankar,”Efficient Iceberg Query Evaluation Using Compressed Bitmap Index by Deferring Bitwise- XOR Operations “ ,978-1-4673-4529-3/12/$31.00c 2012 IEEE.

[4] C.V.Guru Rao, V. Shankar, “Computing Iceberg Queries Efficiently Using Bitmap Index Positions” ,DOI: 10.1190/ICHCI- IEEE.2013.6887811 Publication Year: 2013 ,Page(s): 1 – 6. [5] V.Chandra Shekhar Rao,P. Sammulal,” Efficient Iceberg Query

Evaluation Using Set Representation”, IEEE India Conference(INDICON) 2014.

[6] Vuppu.Shankar, C.V.Guru Rao,”Cache based evaluation of iceberg

queries”,IEEE ICCCT 2014,

2014,DOI:10.1109/ICCCT.2014.7066694

[7] Amorntep Keawpibal,Niwan Wattanakitrungroj and Sirirut Vanichayobon,”Enhanced Encoded Bitmap Index for Equality Query”,IEEE ,Publication Year 2012, Pg.no.-293-298.

[8] P.O’Neil and D.Quass,”Improved Query Performance with Variant Indexes”,Proceeding 1997 ACM SIGMOD International conference on Management of Data,pp.38-49,1997.

[9] C.Y.Chan and Y.E.Ioannidis,”Bitmap Index Design and Evaluation”,ACM SIGMOD International conference on Management of Data,pp.355-366,1988

[10] P.O’Neil.Model 204 Architecture and Performance,2nd International

Workshop in High Performance Transaction System,USA,1987,Springer-Verlag.

[11] Weahason Weahama,Sirirut Vanichayobon and Jarin Manfuekphan,”Using Data Clustering to optimize Scatter Bitmap Index for Membership Queries”,IEEE International Conference on Computer and Automation Engineering 2009.

[12] K.Stockinger,K.Wu, and A.Shoshani,” Evaluation Strategies for bitmap indices with Binning”,International Conference on Database and Expert System Applications(DEXA),September 2004,Springer-Verlag

[13] M.Stabno and R.Wrembel,”RLH:Bitmap Compression Technique Based on Run-length and Huffman Encoding”,Information Systems,Vol 34,400-414,2009.

Figure

TABLE II BITMAP INDEX ON POSITION COLUMN
TABLE  III DIFFERENCE BETWEEN B TREE AND BI
FIG II CASE1:  QUERY WITH TWO CONDITION
FIG III CASE1:  QUERY WITH TWO CONDITION

References

Related documents

equipment storage systems, third party logistics, sorting systems, engineering consulting, picking systems, delivery systems, information equipment and software, industrial

Separate C-curves were indicated for upper and lower bainite*, and this is * Throughout this study upper bainite is defined as an aggregate of bainitic ferrite and high-carbon

Type of childcare Age covered Ofsted registered Offer free early education Opening hours Costs: average from.. Childminders Birth to 16 years Yes Some Network Childminders

In this paper, closed-form formulae for the Kirchhoff index and resistance distances of the Cayley graphs over finite abelian groups are derived in terms of Laplacian eigenvalues

Next, we detail the seven different types of data provided by this dataset: (1) word similarity datasets (benchmarks) in text-based CSV fi le format; (2) pre-trained word embedding fi

This function enables reading out parameter adjustment data as well as writing parameter adjustment data into the sensor via the display and adjustment module. A description of

“Cuando hablamos de una economía del conocimiento, estamos hablando de “Cuando hablamos de una economía del conocimiento, estamos hablando de como mejorar procesos o sistemas para