• No results found

Retrieving Local Popular Items to Cache in the Infrastructure

4.9 Implementation Details

4.9.3 Retrieving Local Popular Items to Cache in the Infrastructure

Initially described in Section4.5, each T h y m e - I n f r a instance periodically checks its localPopularityInformation table, which contains references to each object downloaded by its local clients as well as the number of retrievals, decides the most popular items for each world, sends download requests to those items and then periodically batch-inserts the received items in theLocal Popularity Cache. This process of Retrieving Local Popular Items to Cache (RLPIC) in the infrastructure is composed of two different timers to coordinate the periodic mechanisms: popular decision and batch-insert timers (Fig.

4.17).

Popular Decision Timer (PDT). After the arrival of the firstcell data message, the sys-tem will schedule this timer for repeated fixed-rated executions, with regular intervals between them, after a specified delay (this value is configurable by the administrator, see Section4.9.9). When this timer fires, the process will always increment the global RLPIC counter but will purposely skip the first execution in order to give the system and its clients enough time to create meaningful and considerable data before checking the pop-ularity of the published items to offload to the infrastructure. On further executions, the T h y m e - I n f r anode will go through itslocalPopularityInformation table, determine the top most popular data among its local clients (Section4.9.3.1) and broadcast a download message to each individual items’ replicas. This RLPIC counter represents the current popular decision cycle and is mainly used to guarantee that the items requested within one popular decision cycle arrive before the next one. To check for this, when the infras-tructure node sends a local download request as part of this process, it also associates to that request the current counter. For instance, if the current RFLPIC counter is 10 and one of the requested items arrive at the infrastructure with the counter also at 10, that item will be accepted and saved to be batch-inserted in the near future. On the other hand, if the current RLPIC value is 12 and an incoming item has 10 in its associated

cycle counter, meaning that the replica took too long to process the infrastructure request due to being busy or resourcethrottled (for instance), it will be ignored by the T h y m e -I n f r ainstance since its popularity index could have changed between cycles 10-12 and no longer considered popular and worthy of cache space.

Batch-Insert Timer. After the execution of each Popular Decision Timer’s cycle, another timer, the Batch-Insert timer, will be scheduled to fire in a predetermined amount of regular-intervaled executions (values also configured by the administrator - Section4.9.9) until the next PDT activation. This timer has the goal of batch-inserting already received items that were requested by the previous process into the Local Popularity Cache and trigger the execution of the cache’sAdaptive Multipart Caching algorithm, as explained in Section4.7.2. We believe that this batch-insert approach offers the best benefits both in terms of user experience and performance. The other two considered alternatives were:

1. Inserting individual items into the cache as they arrive at the infrastructure, which would make entries more readily available for theAP’s clients but would have a considerable impact on performance since theAMC would have to be executed for every single insertion and, depending on its properties, individual puts could have minimal to no changes on the outcome of Equation4.1;

2. Inserting all received items right before the next execution of the Popular Decision cycle which would have a lower impact on performance since theAMC would be executed only once per cycle but the content would take longer to be available to the users, degrading the user experience linearly with respect to the PDT’s interval delay.

Finally, the scheduling of all the Batch-Insert Timer’s intervals, between popularity decisions, takes into account what we call a “Processing Slack”, γ, at the very end with the goal of giving enough time for the T h y m e - I n f r a instance to finalize the insertion process to theLocal Popularity Cache before starting the next cycle. Thus, avoiding sce-narios where a specific item is about to be inserted into the cache and an overlapping Popularity Decision process determines that the exact same item is not yet cached and still considered popular and proceeds to ask the replica again for the data, resulting in an unnecessary duplicate request.

4.9.3.1 Determining the Top Most Popular Data to Offload to the Infrastructure In order to choose the most relevant items to offload to the infrastructure during every popularity decision cycle to be cached in the Local Popularity Cache for easier access, each T h y m e - I n f r a instance executes a custom filtering algorithm to establish a ranking order of popularity. Among all of the state-of-the-art projects developed in this area that were studied during the context of this dissertation (Chapter2), we decided to focus on approaches that were computationally simple and fast while still offering relevant

Batch-insert received items

(B-I RI)

B-I RI B-I RI

increment execution counter choose most popular items send download message to chosen items’ replicas

B-I RI B-I RI

B-I RI

γ γ

Popular Decision Timer Batch-Insert Timer γ = Processing Slack

Figure 4.17: Timer definition for the process of retrieving local popular items to cache in the infrastructure.

results. The reasoning behind this is due to the fact that this process will need to be executed periodically and we do not wish to overwhelm the infrastructure node with complex computations. Furthermore, due to the periodicity of this decision process, we are able to use simpler algorithms, which would tend to have lower accuracy as a tradeoff, since we are expecting to eventually converge to the best items. Thus, considering its characteristics and positive results, we implemented a simplified approach of [31] by eradicating the authors’ use of computationally intensive file clustering algorithms and incorporating T h y m e - I n f r a-specific features and steps. Our vision for the popularity decision algorithm is composed of the following stages:

1. We start by splitting the items from thelocalPopularityInformation table into two distinct groups: active files (files that have been accessed/downloaded recently) and stale files (files not accessed in the most recent time period). In [31], the authors use the exact time, right down to the milliseconds, of when an item was downloaded in order to check whether it is within the active files’ time window. In T h y m e - I n f r a, we use a coarser granularity approach to time, in order to achieve a better (lower) space complexity during the execution of this process: whenever a T h y m e - I n f r a node receives a cell data message, it will set the current RLPIC counter value to the items that were marked as downloaded in the message as their latest download timestamp. Therefore, the infrastructure node will consider active files as the ones that have been accessed in the past activewindow(value configured by the developed - see Section4.9.9) popular decision cycles, thus passing the following condition:

0 ≤ RLP IC − itemdownloadTsactivewindow

where itemdownloadTsis the last RLPIC counter associated with the item;

2. Secondly, we take the set of all active files, divide them by world and within each world’s container we will sort the items further by popularity;

Figure 4.18: Caffeine’s Window TinyLFU scheme. Adapted from [52].

3. Then, for each world container we calculate the sum of the popularity of all the inner items while also calculating the total amount of downloads throughout every world. With these values obtained, we will then compute the weight of each world based on its contribution to the total amount of downloads executed within theBSS;

4. Finally, we will use the upper bound of the number of items that should be consid-ered popular during each cycle (Section 4.9.9), and then choose the top N items from each world, with regards to their calculated weights, until that limit is reached.

By the end of the execution of this algorithm, T h y m e - I n f r a will possess the identi-fiers of the most popular items from this cycle and is now ready to communicate with its local replicas to download and offload them to the infrastructure. Further iterations of this algorithm will skip items that are considered popular but are already cached within theLocal Popularity Cache.