6.3 Experiments
6.3.6 Experimental results
For each of the three scenarios, the simulation was run 10 times. Each run processed 9000 requests over 3000 possible data entities. All data displayed below are averages over the 10 runs.
Table 6.3 shows the results of the execution of scenario 1 (no shared storage). Data are provided for each cache instance.
The rows of the table are:
Replacements: percentage of data evicted by the replacement method
Requests: ratio (percentage) between the number of requests (get + retrieve) received by the cache and the total number of requests processed by the system
Individual HR: percentage of the requests that produced a cache hit (individual hit rate)
Participation: participation of the cache in the total hit rate, i.e. ratio between the number of local hits and the total number of hits (percentage)
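As an illustration, the row definitions above can be written as simple ratios over per-cache counters. This is a minimal sketch, not the simulator's code: the class `CacheCounters`, its field names and the function `table_rows` are hypothetical. It assumes that the system total is the sum of the per-cache counters and that the Replacements row is each cache's share of all evictions.

```python
from dataclasses import dataclass


@dataclass
class CacheCounters:
    """Raw counters for one cache instance (hypothetical field names)."""
    requests: int    # get + retrieve requests received by this cache
    local_hits: int  # requests answered from this cache's own storage
    evictions: int   # entries evicted by the replacement method


def pct(part, whole):
    """Percentage of `part` in `whole`, 0.0 when `whole` is zero."""
    return 100.0 * part / whole if whole else 0.0


def table_rows(caches):
    """Compute the four percentage rows for each cache instance."""
    # Assumption: system-wide totals are the sums of the per-cache counters.
    total_requests = sum(c.requests for c in caches)
    total_hits = sum(c.local_hits for c in caches)
    total_evictions = sum(c.evictions for c in caches)
    return [
        {
            # Assumption: share of all evictions performed in the system.
            "replacements": pct(c.evictions, total_evictions),
            "requests": pct(c.requests, total_requests),
            "individual_hr": pct(c.local_hits, c.requests),
            "participation": pct(c.local_hits, total_hits),
        }
        for c in caches
    ]


# Toy counters, for illustration only (not measured data).
rows = table_rows([CacheCounters(100, 50, 10), CacheCounters(300, 150, 30)])
```

Note that Individual HR is relative to the requests of one cache, whereas Requests and Participation are relative to system-wide totals, which is why a cache can have a reasonable individual hit rate and still contribute little to the global one.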
Several points should be noted about these results:
• the activity of the GCS instances varies widely. This is an expected consequence of the Poisson distributions that are used (locality patterns)
• some GCS instances (e.g. 1, 2) show very low activity, both in terms of number of processed requests and number of replacements
• four GCSs (4, 5, 6, 7) show a high replacement rate, i.e. they spend much of their time managing their storage space
Cache                  1     2     3     4     5     6     7     8     9    10
Replacements (%)     0.1   3.7  10.1  14.7  17.6  16.6  15.1  11.6   7.0   3.2
Requests (%)         0.7   3.4   8.8  14.4  18.1  18.2  15.0  10.6   6.8   3.6
Individual HR (%)   12.3  34.7  58.2  67.9  70.3  70.6  68.5  62.7  52.5  37.1
Participation (%)    0.1   1.8   8.0  15.3  20.0  20.2  16.2  10.4   5.5   2.1
Table 6.3: Experimental results by cache in the base scenario. Average data for 9000 requests executed 10 times.
A clear consequence of these remarks is that some load balancing procedure should be implemented in order to optimize the global performance of the system. A system in which 40% of the participating entities assume 70% of the work is definitely not well balanced.
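The imbalance can be checked directly from the Requests and Participation rows of Table 6.3. The short sketch below sums the shares of the four busiest caches; the helper name `top_share` is introduced here for illustration only.

```python
# Per-cache percentages copied from Table 6.3 (caches 1 to 10).
requests = [0.7, 3.4, 8.8, 14.4, 18.1, 18.2, 15.0, 10.6, 6.8, 3.6]
participation = [0.1, 1.8, 8.0, 15.3, 20.0, 20.2, 16.2, 10.4, 5.5, 2.1]


def top_share(values, k):
    """Sum of the k largest per-cache shares (in percent)."""
    return sum(sorted(values, reverse=True)[:k])


# Four caches out of ten, i.e. 40% of the participating entities.
busiest_requests = top_share(requests, 4)
busiest_hits = top_share(participation, 4)
print(f"requests: {busiest_requests:.1f}%, hits: {busiest_hits:.1f}%")
```

With these values, caches 4 to 7 receive about 65.7% of the requests and produce about 71.7% of the hits, which is consistent with the 40% / 70% figure quoted above.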
This argues in favour of a global coordination of the cache instances. Such coordination supposes that:
• the system is able to monitor all the participating cache instances: cf. table 6.3
• the system is able to change the behaviour of the participating cache instances
• the system implements some load balancing and performance improvement heuristics.
This third issue is definitely out of the scope of this thesis. It has been addressed in our team from two different points of view:
• from a semantic and collaborative point of view (see David Coquil’s PhD [35]): data usually show semantic correlations that replacement heuristics and cache collaboration protocols can use to optimize the hit rate;
• from the grid infrastructure point of view (see Julien Gossa’s PhD [66]): GCS instances should be placed and cached data should be placed/duplicated/migrated according to the actual operational conditions (e.g., network bandwidth and latency, CPU load, ...).
We refer interested readers to these theses and to the extensive bibliography they propose. In our simulations, as explained before, we implemented two basic heuristics, the so-called scenarios 2 and 3.
Implementing cooperation and optimization heuristics supposes, as noted above, that the system is able to monitor and modify the behaviour of the participating cache instances. For instance, scenarios 2 and 3 require that the coordinator can ask GCSs to reserve storage space for remote data.
We therefore ran two additional simulations to illustrate these two scenarios. In these simulations, the coordinator invokes the SetStorage() configuration operation (see appendix D.3.4) to change the storage capacity according to scenario 2 (reduction of the capacity of all the instances by 10%) and scenario 3 (reduction of the capacity of the five least active instances). As noted before, both scenarios reduce the global storage capacity by 10%. This reserved 10% is “frozen” and not used for storing data. We then measured the cost of this storage preemption in terms of hit rate and number of replacements.
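The two preemption policies can be sketched as follows. This is an illustration under stated assumptions, not the simulator's implementation: the function names are hypothetical, all instances are given the same arbitrary capacity, activity is taken from the Requests row of Table 6.3, and in the adapted policy the reserved volume is assumed to be split evenly among the five selected instances. The signature of SetStorage() is not reproduced here; the computed capacities stand for the values the coordinator would pass to it.

```python
def uniform_preemption(capacities, fraction=0.10):
    """Scenario 2: every instance gives up the same fraction of its storage."""
    return [c * (1.0 - fraction) for c in capacities]


def adapted_preemption(capacities, activity, fraction=0.10, n_least_active=5):
    """Scenario 3: only the least active instances give up storage.

    Assumption: the globally reserved volume is split evenly among them.
    """
    reserved_total = fraction * sum(capacities)
    share = reserved_total / n_least_active
    # Indices of the instances with the lowest activity.
    by_activity = sorted(range(len(capacities)), key=lambda i: activity[i])
    chosen = by_activity[:n_least_active]
    new_capacities = list(capacities)
    for i in chosen:
        if share > capacities[i]:
            raise ValueError(f"instance {i + 1} cannot give up {share} units")
        new_capacities[i] = capacities[i] - share
    return new_capacities


# Hypothetical setup: 10 instances of 100 storage units each.
capacities = [100.0] * 10
# Requests row of Table 6.3, used here as the activity measure (assumption).
activity = [0.7, 3.4, 8.8, 14.4, 18.1, 18.2, 15.0, 10.6, 6.8, 3.6]

scenario2 = uniform_preemption(capacities)
scenario3 = adapted_preemption(capacities, activity)
```

Under these assumptions both policies freeze the same total volume; they differ only in which instances lose capacity, the adapted policy leaving the busiest caches untouched.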
Table 6.4 shows the performance results (average data + standard deviation) measured by these simulations.
Several points can be noted:
• as expected, reducing the storage capacity of some cache instances reduces the global hit rate and increases the number of replacements
• this extra cost is much higher in scenario 2 than in scenario 3; this is also an expected result: adapting the reduction to the load of each GCS is more effective than applying a uniform reduction
• in scenario 3, the extra cost is borne entirely by the five least active caches
• in terms of hit rate, this extra cost is 3.9% for scenario 2 and 0.4% for scenario 3
In other words, these results show that one can preempt 10% of the storage space for remote data at a cost of 0.4% in hit rate. For a grid site and GCS instance administrator, such a cost is definitely affordable.
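These percentages can be recomputed from the averages of Table 6.4. The sketch below assumes that the cost is the relative decrease in successful hits with respect to the scenario without shared storage, a reading that reproduces the 3.9% and 0.4% figures; the dictionary layout and helper name are chosen for illustration only.

```python
# Average values copied from Table 6.4 (9000 requests, 10 runs).
results = {
    "no shared storage": {"replacements": 612.8, "hits": 5746.9},
    "uniform preemption": {"replacements": 771.5, "hits": 5523.3},
    "adapted preemption": {"replacements": 682.0, "hits": 5725.0},
}


def relative_change(value, baseline):
    """Change of `value` with respect to `baseline`, in percent."""
    return 100.0 * (value - baseline) / baseline


base = results["no shared storage"]
costs = {}
for name in ("uniform preemption", "adapted preemption"):
    r = results[name]
    costs[name] = {
        # Positive number = fewer successful hits than the baseline.
        "hit_loss": -relative_change(r["hits"], base["hits"]),
        "extra_replacements": relative_change(
            r["replacements"], base["replacements"]
        ),
    }
    print(
        f"{name}: hits -{costs[name]['hit_loss']:.1f}%, "
        f"replacements +{costs[name]['extra_replacements']:.1f}%"
    )
```

The same computation shows the effect on replacements: roughly 25.9% more replacements with uniform preemption, against roughly 11.3% with adapted preemption.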
This preempted 10% of the total storage space is available to the system coordinator to optimize the performance of the system.
As noted above, defining optimization heuristics is not the subject of this thesis. We have mentioned research work (within or outside our team) that specifically addresses this issue. From this work, some basic recommendations can be made:
• optimization strategies should be based on the monitoring and identification of access patterns: when data are requested from a very distant site, duplication can be very efficient [67]; when data are correlated, prefetching techniques can greatly improve the hit rate [35]
Scenario           No shared storage     Uniform storage       Adapted storage
                                         preemption            preemption
Value              average    std dev    average    std dev    average    std dev
Replacements       612.8      29.6       771.5      33.45      682        23.05
Successful hits    5746.9     48.13      5523.3     92.84      5725       89.79
Table 6.4: Preemption of storage space. Performance for 9000 requests executed 10 times.
• the focus must be put on overloaded cache instances; the preempted storage space must be used to reduce the load of the most active caches by redistributing some cached data (using remote data storage, data migration or data duplication)
• grids are very dynamic and heterogeneous platforms: data redistributions and GCS reconfigurations must be decided with respect to the actual operational conditions (network bandwidth and latency, CPU load, ...) [67]