Conclusion - Heuristics for Selecting Robust Database Structures with Dynamic Query Patterns AB

We began this paper by emphasizing that the success of a company increasingly depends on timely information (internal or external) being available to the right person at the right time. Achieving such a “right time/right place” duet depends directly on database performance. A database system has been a core component that supports modern business system such as enterprise resource planning (ERP) system that integrates and supports all enterprise processes including product designing and engineering, manufacturing, and other business functions to achieve highest efficiency and effectiveness of operations. The task is complicated by these dynamic activities that have become an integral part of a firm’s operations. Our purpose here is to set forth database design procedures that identify robust database structures that perform well across diverse and uncertain queries, i.e., situations such as that just described.

Delivering “right time/right place” accurate data is a challenge. It is that challenge that our work begins to address by developing and implementing a new “query-driven” heuristics for database design that seeks to identify database structures that perform robustly in dynamic settings. Our heuristics begins with detailed analysis of relationships between diverse query sets and the performance of different database structures. These relationships are then used in a series of steps that identify robust database structures that maintain high performance levels for a wide range of query sets.

Adapting Wiederhold’s six-step process model [44], we can summarize our procedures as follows:

1) Specify the desired behavior of an object.

Design a database system that matches query-processing need and provides timely query responses to assist crucial decision-making by explicitly considering queries and query properties.

2) Select reasonable building blocks and tools.

Develop different query classification schemes and database structure grouping methodologies.

3) Use the model to evaluate which combinations produce the desired result.

Set guidelines to select the best query classification approach (Section IV) and apply correlation analysis to group database structures (Section V).

4) Select the most effective combination of building blocks and tools.

Identify "selectivity factor" approach as the most reasonable approach to classify queries into different complexity levels and identify "0.95 correlation level among database structure using the selectivity factor classification" method as the most reasonable way to group database structures.

5) Build it as well as possible.

Identify and build robust efficient database structure(s) for a database system that can timely process queries with a wide range of complexity.

6) Observe and monitor its behavior.

Continuously observe and monitor query changes and database performance to seek improvement and modifications.

We are performing measurement experiments that serve as inputs into our analysis and identification of robust database structures. It is important to point out that we can also use this experimental, measure-based approach to identify which views to materialize. As described in [46], the VM problem is:

given a set of known queries;

given a set of possible views to materialize;

given well defined processing costs and maintenance costs; select the views to materialize to minimize total costs.

Instead, because of our focus on highly dynamic environment, we don’t start with a given set of queries and well-defined costs. We look at a much wider universe of queries and directly conduct cost measurements through extensive ex ante experiments. Similarly to [46] we look at materializing views that include JOINS. In the approach we have begun here we (1) attempt to look at query complexity classes that span the possible set of queries that pertain to an

information context; (2) consider all feasible materializations of table JOINS in that information context; and, (3) instead of assuming cost structures for the processing of queries, we begin by directly measuring them experimentally.

While we have emphasized the detailed measuring of alternative processing costs, we also realize the importance of delving further to incorporate costs of maintenance and restructuring. In fact, our ongoing work includes the formal modeling and analysis of such critical cost relationships beyond assumption of [22] and [45] that the cost of answering a query can be represented by the number of rows present in the table that used to construct the query.

The results in this paper provide the initial steps necessary to develop a dynamic database system capable of effectively accommodating significant query set shifts. This suggests the next research question on which we are currently working, that is, can we develop intelligent learning tools that identify imp ortant shifts in query sets and identify opportunities where a firm can gain significantly by altering the maintained data structure (i.e., changing the maintained database design)? A dynamic database system should be able to detect changes in complexity of query sets and adopt a new database structure that is most efficient for the new query-processing needs.

Alternatively, would maintaining a set of parallel database structures be preferable? Would there be net gains from assigning arriving queries to the best of the maintained structures? In either case, analysis of options would include query-driven analysis as presented here. Choice of implementation mechanism would depend on several factors, including implementation context (web, intranet, etc.) and implementation/operating costs. Our current research involves the development of a dynamic database restructuring approach that includes and relies upon formal modeling of the critical cost relationships.

References:

[1] S. Agrawal, S. Chaudhuri, and V. Narasayya. "Automated Selection of Materialized Views and Indexes for SQL Databases." Proceedings of the 26th International Conference on Very Large Databases, Cairo, Egypt, 2000.

[2] R. Avnur and J. M. Hellerstein. "Eddies: Continuously Adaptive Query Processing." Working paper, University of California, Berkeley, 2000.

[3] D. Bai, T. Carpenter, and J. Mulvey, "Making a Case for Robust Optimization Models,"

Management Science, Vol. 43, No. 7, 1997.

[4] R. Barquin and H. Edelstein, Planning and Designing The Data Warehouse, edited, Prentice Hall, 1997.

[5] M. Blasgen and K. Eswaran, "On the Evaluation of Queries in a Relational Database System," IBM Systems Journal, Vol. 16, No. 1, 1976.

[6] D. Botzer and O. Etzion. "Optimization of Materialization Strategies for Derived Data Elements." IEEE Transactions on Knowledge and Data Engineering. 8, 260–272, 1996. [7] R. Chaturvedi and A. K. Choubey, "Scheduling the Allocation of Data Fragments in a

Distributed Database Environment: A Machine Learning Approach," IEEE Transactions on Engineering Management, Vol. 41, No. 2, pp. 194-207, 1994.

[8] S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim. "Optimizing Queries with Materialized Views." Proceedings of the 11th International Conference on Data

Engineering, 1995.

[9] S. Chaudhuri, U. Dayal, and W. Ganti. "Database Technology for Decision Support Systems." IEEE Computer, 34, 48–55, 2001.

[10] A. Chen. Improving Database Performance in a Changing Environment With Uncertain and Dynamic Information Demand: An Intelligent Database System Approach. Doctoral Dissertation, University of Connecticut, 1999.

[11] A. Chen, P. Goes, and J. Marsden. "A Query-Driven Approach to the Design and

Management of Flexible Database Systems." Journal of Management Information Systems,

19:3, 121–154, 2002.

[12] P. Chen, "The Entity Relationship Model - Toward a Unified View of Data," ACM Transactions on Database Systems, Vol. 1, No. 1, 1976.

[13] E. Codd, "A Relational Model for Large Shared Data Banks," Communications of the ACM, Vol. 13, No. 6, 1970.

[14] E. Codd, Further Normalization of the Data Base Relational Model, in Rustin, 1972. [15] G. Dodd, "Elements of Data Management Systems", ACM Computing Surveys, Vol. 1, No.

2, 1969.

[16] R. A. Feinauer, A Mechanism for Natural Language Database, Doctoral Dissertation, University of Cincinnati, 1985.

[17] P. B. Goes, R. D. Gopal, and A. N. K. Chen, “Query Evaluation Management Design and Prototype Implementation,” Decision Support Systems, Vol. 19, pp. 23-42, 1997.

[18] G. Graefe. "Query Evaluation Techniques for Large Databases." ACM Computing Surveys,

25, 73–170, 1993.

[19] J. Grant, J. Gryz, J. Minker, and L. Raschid. "Logic-Based Query Optimization for Object Databases." IEEE Transactions on Knowledge and Data Engineering. 12, 529–547, 2000. [20] S. Greene, S. Devlin, P. Cannata, and L. Gomez. "No IFs, ANDs, or ORs: A Study of

Database Querying." International Journal of Man-Machine Studies,32, 303-326, 1990. [21] E. Hanson. "A Performance Analysis of View Materialization Strategy." Proceedings of

ACM SIGMOD International Conference of Management of Data, San Francisco, 1987. [22] V. Harinarayan, A. Rajaraman, and J. Ullman, "Implementing Data Cubes Efficiently,"

Proceedings of the ACM SIGMOD International Conference of Management of Data, Canada, 1996.

[23] J. M. Hellerstein, M. J. Franklin, S. Chandrasekaran, A. Deshpande, K. Hildrum, S. Madden, V. Raman, and M. A. Shah. "Adaptive Query Processing: Technology in Evolution." Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 23, 7–18, 2000.

[24] W. H. Inmon, Building the Data Warehouse, John Wiley, New York, 1996.

[25] Z. G. Ives, A. Y. Levy, D. S. Weld, D. Florescu, and M. Friedman. "Adaptive Query Processing for Internet Applications." Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 23, 19–26, 2000.

[26] M. Jarke and J. Koch. "Query Optimization in Database Systems." ACM Computing Surveys, 16, 111–152, 1984.

[27] K. Jarvelin, “A Methodology for User Charge Estimation in Numeric Online Databanks, Part I,” Journal of Information Science, Vol. 14, pp. 3-16, 1988.

[28] K. Jarvelin, “A Methodology for User Charge Estimation in Numeric Online Databanks, Part II,” Journal of Information Science, Vol. 14, pp. 77-92, 1988.

[29] W. Kent, "Limitations of Record-Based Information Model," ACM Transactions on Database Systems, Vol. 4, No. 1, 1979.

[30] J. King, Query Optimization by Semantic Reasoning, Doctoral Dissertation, Stanford University, 1981.

[31] M. Laguna, "Applying Robust Optimization to Capacity Expansion of One Location in Telecommunications with Demand Uncertainty," Management Science, Vol. 44, No. 11, 1998.

[32] Microsoft Corporation. "Query Optimization Techniques: Contrasting Various Optimizer Implementations with Microsoft SQL Server."

Microsoft SQL Server (General) Technical Articles, 1992, http://msdn.microsoft.com/archive/default.asp?url=/archive/en- us/dnarsqlsg/html/msdn_qryoptim.asp

[33] P. Mishra and M. H. Eich. "Join processing in relational databases." ACM Computing Surveys, 24 (1), 63–113, 1992.

[34] J. Mulvey, R. Vanderbei, and S. Zenios, "Robust Optimization of Large-Scale Systems,"

Operations Research, Vol. 43, No. 2, 1995.

[35] P. Ng, "Further Analysis of the Entity-Relationship Approach to Database Design," IEEE Transactions on Software Engineering, Vol. 7, No. 1, 1981.

[36] D. A. Schneider and D. J. DeWitt. "A Performance Evaluation of Four Parallel Join

Algorithms in a Shared-nothing Multiporcessor Environment." SIGMOD Record, 110–121, 1989.

[37] D. E. Shasha. Database Tuning: A Principled Approach, Prentice Hall, 1992.

[38] D. E. Shasha. "Tuning Databases For High Performance." ACM Computing Surveys, 28, 113–115, 1996.

[39] M. Stillger, G. Lohman, V. Markl, and M. Kandil. LEO – DB2's LEarning Optimizer. IBM Corporation, 2001.

[40] V. C. Storey and R. Goldstein. "A Methodology for Creating User Views in Database Design." ACM Transactions on Database Systems, 13, 305–338, 1988.

[41] T. J. Teorey and J. P. Fry, Design of Database Structures, Prentice-Hall Inc., Englewood Cliffs, NJ, 1982.

[42] J. C. Thomas and J. D. Gould, “A Psychological Study of Query by Example,”

[43] S. Tillett and J. Schwartz. Delta syncs data, ops. InternetWeek, June 25, 2001. [44] G. Wiederhold and R. Elmasri, The Structural Model for Database Design, in ER

Conference, 1979.

[45] G. Wiederhold, File Organization for Database Design, McGraw-Hill, New York, 1987. [46] J. Yang, K. Karlapalem, and Q. Li, "Algorithms for Materialized View Design in Data

Warehousing Environment," Proceedings of the 23rd VLDB Conference, Athens, Greece, pp. 136-145, 1997.

In document Heuristics for Selecting Robust Database Structures with Dynamic Query Patterns ABSTRACT (Page 35-41)