A FEEDBACK CONTROL SOLUTION IN
IMPROVING DATABASE DRIVER
CACHING
RAVI KUMAR GULLAPALLI
Technical Expert Hewlett-Packard, Bangalore, India
Email: [email protected]
DR.CHELLIAH MUTHUSAMY
Academic Relations Head Yahoo India, Bangalore, India
Email:[email protected]
DR.A.VINAYA BABU
Director, Admissions JNTUH, Hyderabad, India
Email:[email protected]
RAJ NARAYAN MARNDI
Technical Specialist Hewlett-Packard, Bangalore, India
Email: [email protected]
Abstract:
Conventional cache replacement algorithms such as LFU and LRU have been widely explored for improving Web caching and operating system page caching. Although some research exists on improving the caching algorithms of database drivers, little of it applies Time-Series based feedback control. LFU, LRU, and some improved algorithms make better use of the database driver’s cache, but they do not cache effectively in applications where statements accumulate a high frequency count and are never used afterwards. Such statements remain in the cache without being accessed. We propose a modified LFU open loop control and a Time-Series feedback control that address this problem. The cache hit ratio of these techniques is evaluated and compared with that of conventional LFU. We observed that Time-Series feedback control achieves a better cache hit ratio than LFU.
Keywords: feedback-control, JDBC Driver, Statement caching, Pattern matching, LFU
1. Introduction
objective is to cache the statements that are repeatedly accessed. The LRU and LFU algorithms, and improvements to these replacement techniques, have been explored, but studies on improving them for database driver caching are limited. In this paper, we investigate improving database driver caching mechanisms using techniques based on control systems. We propose both open loop and feedback loop controllers and evaluate their performance against LFU.
Statement caching avoids repeated cursor creation, parsing, and statement creation. These operations are typically time-consuming and, when statements are not cached, lead to low performance of database applications. Algorithms such as LRU and LFU are typically used for caching statements. LRU does not consider the frequency of a statement, a shortcoming that LFU addresses. A trivial approach based on access patterns is to retain the statements that accumulate a high frequency.
Some improved techniques, based on either pattern matching or control systems theory, have been implemented for web proxy caching. The access patterns of the objects in the cache can be derived by determining the popularity ratings of the documents using LFU-k [2], or by computing weights of the cache objects based on their retrieval cost as discussed in [3]. Access patterns are also determined by evaluating how pairs of objects in the cache are accessed [4]. A similar improvement is discussed in [5], where the access patterns are determined by considering the request sequence of the objects. A different approach is proposed in [6]: an adaptive replacement technique that maintains two lists. The first list contains the objects that have been accessed at least once recently, and the second list contains the objects that have been accessed at least twice recently. The dynamic access patterns of the objects in the cache are determined using a sliding-window-based approach in [7], but there the variability of document size is an important factor because the work focuses on web caching. In our approach, the object size in the cache is not a factor. Although our solution also associates weights with the statements in the cache based on their access patterns, the heuristic used to compute the weights is much simpler, given the dynamics of a database driver cache compared with those of web cache systems. Additionally, our approach is implemented in phases: it generates an access history that is used by the Triple Exponential model [8], which can read trends in the history data, in combination with a feedback control mechanism.
In [9], the proxy cache contents are classified and a heuristic algorithm is used to allocate cache space. A similar, improved version is proposed in [10], where a dual control framework and online parameter identification are discussed to improve web caching systems. In [11], adaptive control is implemented for web cache systems in the presence of system dynamic uncertainties and environmental noise. Although these feedback control solutions are effective for improving caching mechanisms, they are more appropriate for web-based environments, which involve more uncertainty and unpredictability. Hence we propose a simpler, less resource-intensive feedback control solution that is suitable for a database driver cache, addresses the bottlenecks of LFU discussed above, and improves caching.
2. Modified LFU
We propose a new approach, named “Open loop control”, that modifies the LFU replacement strategy to consider the dynamic access patterns of the statements present in the cache. The Modified LFU controller implements a Rank Update Algorithm. Every statement has a rank associated with it, which is a function of its weight. The procedure for calculating the weight of a statement is described below.
The entire time duration (T) of the controller is divided into multiple fixed-size windows (Wn). Each window is divided into ‘n’ equal intervals. Weights are assigned to a statement based on its occurrence in each interval. If a statement is accessed during an interval, its weight is increased by a pre-fixed weight factor; if it is not accessed, its weight is reduced by another pre-fixed weight factor. In our experiments, we divided each window into 4 intervals; 0.25 is added to the weight if the statement is accessed, and 0.05 is subtracted if it is not. The Rank Update algorithm is executed at the end of each window.
Count (C): This signifies the number of times a statement is accessed over a given period of time.
caching is decided using its weight, unlike in LFU, where only the frequency of occurrence of a statement is considered. Each interval is used to increase or decrease the weight of the statements in the cache. For example, if a statement that is present in the cache is accessed in only 2 of the 4 intervals within a window, then its weight is 0.4 (2 x 0.25 – 2 x 0.05) after that window. Whenever a statement is accessed, it is checked whether the statement is already present in the cache. If it is present, its weight is incremented by 0.25 for that interval in the window; if it is not present, the statement with the lowest rank is replaced with the new statement. The weights of all statements are calculated likewise. The weights are computed after every interval, and the weight of any statement that was not accessed in that interval is reduced by 0.05. The weight of a statement always satisfies 0 ≤ W ≤ 1.
Rank (R): The rank is computed from the weight (W) of each statement.
R = cW
where R is the rank of a given statement in the cache and c is a proportionality constant.
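The weight and rank computation described above can be sketched in Java as follows. This is a minimal illustration rather than the implementation used in the experiments: the class and method names are hypothetical, and clamping the weight to [0, 1] after every interval is an assumption about how the bound 0 ≤ W ≤ 1 is enforced.

```java
// Sketch of the per-window weight update and the rank R = cW.
// Names are illustrative; per-interval clamping to [0, 1] is an assumption.
final class StatementWeight {
    static final double ACCESS_INCREMENT = 0.25; // added when accessed in an interval
    static final double IDLE_DECREMENT = 0.05;   // subtracted when not accessed

    // Applies one update per interval and keeps the weight within [0, 1].
    static double updateWeight(double weight, boolean[] accessedInInterval) {
        for (boolean accessed : accessedInInterval) {
            weight += accessed ? ACCESS_INCREMENT : -IDLE_DECREMENT;
            weight = Math.max(0.0, Math.min(1.0, weight));
        }
        return weight;
    }

    // R = cW, where c is the proportionality constant.
    static double rank(double c, double weight) {
        return c * weight;
    }

    public static void main(String[] args) {
        // Statement accessed in 2 of the 4 intervals of a window.
        boolean[] window = {true, false, true, false};
        double w = updateWeight(0.0, window);
        System.out.println("weight = " + w + ", rank = " + rank(1.0, w));
    }
}
```

Starting from a weight of 0, a statement accessed in the first and third intervals ends the window with a weight of about 0.4, which matches the worked example in the text.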
Figure: 1 Modified LFU Open Loop Controller.
2.1. Rank Update Algorithm
i) Define the time for which the Rank Update algorithm is to be executed as ‘T’.
ii) The time period T is divided into ‘m’ windows. Each window is divided into ‘n’ intervals.
iii) When a new statement arrives, it is considered for weight computation in the subsequent intervals of the current window. Only at the end of the window period is it determined whether it needs to be in the cache.
iv) At the end of the window period, the following steps are taken:
• For every interval in the window, for each statement in the cache as well as for each new statement accessed, the weight is increased by 0.25 if the statement was accessed in that interval; otherwise it is decreased by 0.05.
• Statements in the cache that have less weight than the new statements are replaced in the cache by those new statements.
Figure: 2 Modified LFU Rank Update Windows
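The end-of-window replacement step of the Rank Update Algorithm can be sketched as below. This is a hedged illustration: the class name, the method signature, and the choice to try candidates in order of decreasing weight are assumptions, and the weights are taken as already updated for every interval of the window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the end-of-window replacement step (hypothetical names).
final class RankUpdateSketch {
    // Returns the cache contents (statement -> weight) after new statements
    // have displaced cached statements that carry less weight.
    // Assumes capacity >= 1.
    static Map<String, Double> endOfWindow(Map<String, Double> cached,
                                           Map<String, Double> candidates,
                                           int capacity) {
        Map<String, Double> result = new HashMap<>(cached);
        List<Map.Entry<String, Double>> byWeightDesc = new ArrayList<>(candidates.entrySet());
        byWeightDesc.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        for (Map.Entry<String, Double> candidate : byWeightDesc) {
            if (result.size() < capacity) {          // free slot: just admit
                result.put(candidate.getKey(), candidate.getValue());
                continue;
            }
            String weakest = null;                   // cached statement with the lowest weight
            for (Map.Entry<String, Double> e : result.entrySet()) {
                if (weakest == null || e.getValue() < result.get(weakest)) {
                    weakest = e.getKey();
                }
            }
            if (result.get(weakest) >= candidate.getValue()) {
                break;                               // remaining candidates are no heavier
            }
            result.remove(weakest);
            result.put(candidate.getKey(), candidate.getValue());
        }
        return result;
    }
}
```

For example, with cached weights A=0.8, B=0.1, C=0.3, a capacity of 3, and new statements X=0.5 and Y=0.2, the sketch replaces B with X and leaves Y out of the cache.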
3. JDBC Driver Cache Feedback control
A Triple Exponential Smoothing feedback controller is implemented to tune the controlling inputs to the cache. The controller uses the rank history supplied by the Modified LFU after a time duration ‘T’, chosen so that enough rank history is available. This helps in determining the behavioral pattern of the statements. The controller predicts the ranks of all the statements in the cache and computes the average rank prediction accuracy. The ranks of the statements are defined in the previous section.
Figure: 3 JDBC Driver Cache Feedback Control.
For each window, the average rank prediction accuracy (Ravgp) is computed; it is a control input to the cache. The cache hit ratio is directly proportional to the number of successful hits, which in turn depends on the accuracy of the rank prediction or, inversely, on the rank prediction error. Based on this relation, the Triple Exponential controller predicts the rank of a statement by considering the rank prediction error rather than the cache hit ratio error. Triple Exponential Smoothing uses its model to improve the prediction accuracy based on the rank prediction error. Considering these dependencies, the cache is modeled as shown below.
$$h(t+1) = a\,h(t) + b\,R_{avgp}(t) \qquad (1)$$

$$R_{avgp} = \frac{1}{m}\sum_{i=1}^{m} R_{ip} \qquad (2)$$
where
Rip = rank prediction accuracy of statement i
Ravgp = average of the rank prediction accuracies
m = maximum number of statements in the cache
h(t) = cache hit ratio of the current window
h(t + 1) = cache hit ratio of the next window
a and b are coefficients that are tuned to improve the cache hit ratio.
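Equations (1) and (2) translate directly into code. The sketch below is illustrative only: the class and method names are hypothetical, and because the text does not specify the scale of the accuracies or of the hit ratio, the input values shown are made up for the example. The coefficients a=0.5 and b=0.74 are the tuned values reported in Section 4.2.

```java
// Sketch of the cache model in equations (1) and (2); names are hypothetical.
final class CacheModelSketch {
    // Equation (2): mean of the per-statement rank prediction accuracies.
    static double averageRankAccuracy(double[] rankAccuracy) {
        double sum = 0.0;
        for (double r : rankAccuracy) {
            sum += r;
        }
        return sum / rankAccuracy.length;
    }

    // Equation (1): h(t + 1) = a * h(t) + b * Ravgp(t).
    static double nextHitRatio(double a, double b, double currentHitRatio, double avgRankAccuracy) {
        return a * currentHitRatio + b * avgRankAccuracy;
    }

    public static void main(String[] args) {
        double ravgp = averageRankAccuracy(new double[]{0.8, 0.6, 1.0}); // made-up accuracies
        double next = nextHitRatio(0.5, 0.74, 0.5, ravgp);               // made-up h(t) = 0.5
        System.out.println("Ravgp = " + ravgp + ", h(t+1) = " + next);
    }
}
```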
The replacement strategy is decided based on the rank predictions. The trends and seasonality in statement access are handled by the Triple Exponential Smoothing controller. The Rank Prediction algorithm is executed in each loop using the rank history, and the cache hit ratio is computed. Whenever a new statement occurs, the most recently predicted rank list is used to identify the statement with the lowest rank, which is replaced with the new statement. It is important to determine the optimal values of a and b so as to achieve a high cache hit ratio. These values are determined from training data. Initially, the value of b is kept constant and a is varied to find the best cache hit ratio. Similarly, keeping a fixed, the value of b is varied.
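To make the prediction step concrete, the following is a hand-written sketch of a one-step-ahead forecast with triple exponential (Holt-Winters) smoothing applied to the rank history of one statement. It is not the Open Forecast API and not necessarily the variant used in the experiments: the additive seasonal form, the simple initialization from the first two seasons, the `period` parameter, and the assignment of α to the level, β to the trend, and γ to the seasonal component are all assumptions.

```java
// Sketch of additive Holt-Winters (triple exponential) smoothing.
// Hypothetical helper; the seasonal form and initialization are assumptions.
final class TripleExponentialSketch {
    // Returns the forecast for the value that follows the last entry of history.
    // history must hold at least two full seasons (2 * period values).
    static double forecastNext(double[] history, int period,
                               double alpha, double beta, double gamma) {
        if (period < 1 || history.length < 2 * period) {
            throw new IllegalArgumentException("need at least two full seasons of history");
        }
        // Initial level and trend from the means of the first two seasons.
        double firstMean = 0.0;
        double secondMean = 0.0;
        for (int i = 0; i < period; i++) {
            firstMean += history[i] / period;
            secondMean += history[period + i] / period;
        }
        double level = firstMean;
        double trend = (secondMean - firstMean) / period;
        double[] seasonal = new double[period];
        for (int i = 0; i < period; i++) {
            seasonal[i] = history[i] - firstMean;
        }
        // Smooth level, trend, and seasonal components over the rest of the history.
        for (int t = period; t < history.length; t++) {
            int s = t % period;
            double previousLevel = level;
            level = alpha * (history[t] - seasonal[s]) + (1 - alpha) * (previousLevel + trend);
            trend = beta * (level - previousLevel) + (1 - beta) * trend;
            seasonal[s] = gamma * (history[t] - level) + (1 - gamma) * seasonal[s];
        }
        return level + trend + seasonal[history.length % period];
    }
}
```

A call such as `TripleExponentialSketch.forecastNext(rankHistory, 4, 0.25, 0.50, 0.75)` would predict the rank for the next window from a per-window rank history; the period of 4 in this call is a placeholder, since the text does not state the seasonal length.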
3.1. Rank Prediction Algorithm
The following steps make up the Rank Prediction algorithm, which is based on the Triple Exponential Smoothing predictor:
i) Get the rank history of all statements from the Modified LFU.
ii) In each window, find the number of times each statement is accessed.
iii) For each statement:
• Compute the actual rank for the current window
• Predict the rank using Triple Exponential Smoothing
• Compute the rank prediction accuracy and error
iv) Compute the average rank prediction accuracy
v) Compute the actual cache hit ratio for the next window
vi) Go to the next window and repeat from step ii) to step v)
4. Implementation and Performance Analysis
The Modified LFU open loop control and the Triple Exponential feedback controller are implemented using Java and the Open Forecast API [12]. We simulated results for about 100 SQL statements accessed randomly. In the simulation, a set of statements is accessed heavily at the start so that these statements gain a high count and remain in the cache for a long time. Our implementation addresses this problem in both the Modified LFU and the Triple Exponential feedback control techniques. The cache hit ratios of these techniques are compared with each other and with traditional LFU. We have not attempted to compare our solution with other advanced caching techniques, as they are used less in JDBC drivers than in web caching solutions.
We fixed the JDBC driver cache size at 32 statements. The results show the performance of the Modified LFU controller and the feedback controller. A simulated benchmark Java application was used to perform these experiments. The application has about 100 SQL statements and keeps generating up to ‘m’ different statements in each window. In our experiments the value of m is 10. In the first few windows, however, a set of statements is generated continuously so that they gain a high count. The experiments are conducted with the Modified LFU control and with feedback control. The graphs in Figure 4 and Figure 5 show the cache hit ratio in percent (Y-axis) against the number of windows (X-axis).
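The baseline behavior that the simulation is designed to provoke, where statements accessed heavily at the start keep their high count and occupy the cache without being used, can be reproduced with a small conventional LFU cache. The sketch below is a simplified illustration with hypothetical names; it is not the benchmark application or the driver code used in the experiments.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a conventional LFU statement cache with hit ratio tracking.
// Hypothetical class; assumes capacity >= 1.
final class LfuBaselineSketch {
    private final int capacity;
    private final Map<String, Integer> counts = new HashMap<>();
    private int hits;
    private int accesses;

    LfuBaselineSketch(int capacity) {
        this.capacity = capacity;
    }

    // Records one access and returns true on a cache hit.
    boolean access(String sql) {
        accesses++;
        Integer count = counts.get(sql);
        if (count != null) {
            counts.put(sql, count + 1);
            hits++;
            return true;
        }
        if (counts.size() >= capacity) {
            String victim = null; // statement with the lowest access count
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (victim == null || e.getValue() < counts.get(victim)) {
                    victim = e.getKey();
                }
            }
            counts.remove(victim);
        }
        counts.put(sql, 1);
        return false;
    }

    boolean contains(String sql) {
        return counts.containsKey(sql);
    }

    double hitRatio() {
        return accesses == 0 ? 0.0 : (double) hits / accesses;
    }
}
```

With a capacity of 2, accessing statement A five times and B four times, and then alternating between two new statements C and D, leaves A in the cache indefinitely while C and D evict each other and never score a hit.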
4.1. Cache Hit Ratio comparison of Modified LFU and LFU
The graph in Fig. 4 compares the cache hit ratio of LFU and Modified LFU, which we ran for about 500 windows. Results are shown for about 30 windows only. The average cache hit ratio is about 55% for Modified LFU and about 45% for LFU when run against the benchmark application generating the same statements.
Figure: 4 Cache Hit Ratio Modified LFU vs. LFU
4.2. Cache Hit Ratio comparison of Feedback Controller with Modified LFU and LFU
Before running the experiments with feedback control, the constants a and b from equation (1) were tuned for the best performance, giving a=0.5, b=0.74. The following illustrates that the cache hit ratio with feedback control is significantly better than with the other two techniques. The trend factors of the Triple Exponential model were set to α=0.25, β=0.50, γ=0.75. The cache hit ratio is shown against the number of iterations.
Figure 5 illustrates the cache hit ratio of the feedback control in comparison with the Modified LFU and LFU. The cache hit ratio has an initial settling time before it stabilizes. We observe that within fewer than 10 windows our solution starts improving the cache hit ratio. This indicates that our solution needs little time to stabilize at a higher cache hit ratio. The average cache hit ratio after the settling time is around 80%. Implementing the Triple Exponential algorithm as a feedback controller makes it possible to consider the usage pattern of the statements, their trends, and their seasonality factors.
5. Conclusion and Future Work
In the Rank Update algorithm, we assumed that there will be up to 10 occurrences of different statements whose weights are modified within an interval. Our experiments are currently limited to the Triple Exponential Smoothing controller. We want to determine the most appropriate controller for the JDBC driver for the benchmark applications that we used. Additionally, we want to perform steady-state and stability analysis of our proposed solution. We also want to consider the miss ratio, analyze the pattern of the statements that miss, and improve the feedback controller to reduce the cache miss ratio. We also want to investigate the possibility of a second-level cache to store statements that have a relatively low frequency but are used occasionally, to improve the cache hit ratio. As part of our future work, we want to refine dynamically the constants chosen to compute the cache hit ratio for any kind of data set. Our results are currently based on a simulation environment; we plan to run the solution against popular open source databases such as PostgreSQL [13] in a 3-tier environment with any application server.
Acknowledgments
We thank Dr. AVN Krishna, Principal, PJMSCET Hyderabad, for his valuable review comments.
References
[1] JDBC Specification : http://www.oracle.com/technetwork/java/overview-141217.html
[2] Vladimir V. Prischepa : An Efficient Web Caching Algorithm based on LFU-K replacement policy, Spring Young Researcher’s Colloquium on Database and Information Systems, Russia, 2004
[3] A. Radhika Sarma and R. Govindarajan : An Efficient Web Cache Replacement Policy, In the Proc. of the 9th Intl. Symp. on High Performance Computing (HiPC-03), Hyderabad, India, 2003
[4] Ronny Lempel, Shlomo Moran : A Simple Yet Robust Caching Algorithm Based on Dynamic Access Patterns, “Competitive Caching of Query results in Search Engines”, ACM Proceedings, Volume 324, Issue 2-3 (September 2004), Pages: 253 – 271, 2004
[5] Gopal Pandurangan, Wojciech Szpankowski : A Universal Online Caching Algorithm Based on Pattern Matching, IEEE Information Theory, ISIT 2005
[6] Nimrod Megiddo, Dharmendra S. Modha : Outperforming LRU with an Adaptive Replacement Cache Algorithm, IEEE Computer, Volume: 37, pp.58-65. Issue: 4, 2004
[7] Wen-Chi Hou, Suli Wang : Size-Adjusted Sliding Window LFU - A New Web Caching scheme, SpringerLink, Volume 2113/2001, 2001
[8] Triple Exponential Smoothing : http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc435.htm
[9] Ying Lu, Avneesh Saxena, Tarek E Abdelzaher : Differentiated Caching Services; A Control-Theoretical Approach, Distributed Computing Systems, 2001. 21st International Conference, pp.615-622, 2001
[10] Keqiang Wu, David J. Lilja, Haowei Bai : The Applicability of Adaptive Control Theory to QoS Design: Limitations and Solutions, Parallel and Distributed Processing Symposium, 2005. Proceedings. 19th IEEE International, pp. 8, 4-8, 2005
[11] Ying Lu, Tarek Abdelzaher, Gang Tao : Direct Adaptive Control of A Web Cache System, American Control Conference, 2003. Proceedings of the 2003, pp.1625-1630, 2003
[12] Open Forecast API : http://www.stevengould.org/software/openforecast/index.shtml