Cache-Oblivious Model

1999; Frigo, Leiserson, Prokop, Ramachandran

ROLF FAGERBERG

Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark

Model Definition

The memory system of contemporary computers consists of a hierarchy of memory levels, with each level acting as a cache for the next; a typical hierarchy may consist of registers, level 1 cache, level 2 cache, level 3 cache, main memory, and disk (Fig. 1). One characteristic of the hierarchy is that the memory levels get larger and slower the further they are from the processor, with the access time increasing most dramatically between main memory and disk. Another characteristic is that data is moved between levels in blocks.

As a consequence of the differences in access time between the levels, the cost of a memory access depends heavily on the lowest memory level currently holding the element accessed. Hence, the memory access pattern of an algorithm has a major influence on its practical running time. Unfortunately, the RAM model (Fig. 2) traditionally used to design and analyze algorithms is not capable of capturing this, as it assumes that all memory accesses take equal time.

Cache-Oblivious Model, Figure 1 The memory hierarchy

Cache-Oblivious Model, Figure 2 The RAM model

Cache-Oblivious Model, Figure 3 The I/O-model

To better account for the effects of the memory hierarchy, a number of computational models have been proposed. The simplest and most successful is the two-level I/O-model introduced by Aggarwal and Vitter [2] (Fig. 3). In this model a two-level memory hierarchy is assumed, consisting of a fast memory of size $M$ and a slower memory of infinite size, with data transferred between the levels in blocks of $B$ consecutive elements. Computation can only be performed on data in the fast memory, and algorithms are assumed to have complete control over transfers of blocks between the two levels. Such a block transfer is called a memory transfer. The complexity measure is the number of memory transfers performed. The strength of the I/O-model is that it captures part of the memory hierarchy, while being sufficiently simple to make design and analysis of algorithms feasible. Over the last two decades, a large body of results for the I/O-model has been produced, covering most areas of algorithmics. For an overview, see the surveys [3,24,26,27].

More elaborate models of multi-level memory have been proposed (see e.g. [26] for an overview), but these models have been less successful than the I/O-model, mainly because their complexity makes analysis of algorithms harder. All these models, including the I/O-model, assume that the characteristics of the memory hierarchy (the level and block sizes) are known.

In 1999 the cache-oblivious model (Fig. 4) was introduced by Frigo et al. [22]. A cache-oblivious algorithm is an algorithm formulated in the RAM model but analyzed in the I/O-model, with the analysis required to hold for any block size $B$ and memory size $M$. Memory transfers are assumed to take place automatically by an optimal off-line cache replacement strategy.

The crux of the cache-oblivious model is that because the I/O-model analysis holds for any block and memory size, it holds for all levels of a multi-level memory hierarchy (see [22,25] for detailed versions of this statement). Put differently, by optimizing an algorithm to one unknown level of the memory hierarchy, it is optimized to all levels simultaneously. Thus, the cache-oblivious model elegantly generalizes the I/O-model to a multi-level memory model by one simple measure: the algorithm is not allowed to know the values of $B$ and $M$. The challenge, of course, is to develop algorithms having good memory transfer analyses under these conditions.

Cache-Oblivious Model, Figure 4 The cache-oblivious model
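As an illustration of an algorithm that never refers to $B$ or $M$, the following is a minimal Python sketch of recursive matrix transposition, one of the matrix operations for which cache-oblivious algorithms are known [22]. The function name `transpose`, its signature, and the cutoff `base` are assumptions made for this sketch; `base` is a fixed constant, not a tuned cache parameter. Python lists of lists are not stored contiguously, so the code shows the recursive structure only, not actual cache behavior.

```python
def transpose(src, dst, r0, r1, c0, c1, base=4):
    """Write the transpose of src[r0:r1][c0:c1] into dst.

    The larger dimension is halved recursively, so at some recursion
    depth the subproblem fits in fast memory, whatever its size is.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small subproblem: transpose directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                dst[j][i] = src[i][j]
    elif rows >= cols:
        mid = (r0 + r1) // 2
        transpose(src, dst, r0, mid, c0, c1, base)
        transpose(src, dst, mid, r1, c0, c1, base)
    else:
        mid = (c0 + c1) // 2
        transpose(src, dst, r0, r1, c0, mid, base)
        transpose(src, dst, r0, r1, mid, c1, base)


# Usage: transpose a 5 x 7 matrix into a 7 x 5 matrix.
src = [[i * 7 + j for j in range(7)] for i in range(5)]
dst = [[None] * 5 for _ in range(7)]
transpose(src, dst, 0, 5, 0, 7)
```

The only tunable value is the recursion cutoff, which affects constant factors but not the asymptotic number of memory transfers.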

Besides capturing the entire memory hierarchy in a conceptually simple way, the cache-oblivious model has other benefits. Algorithms developed in the model do not rely on knowing the parameters of the memory hierarchy, which is an asset when developing programs to be run on diverse or unknown architectures (e.g. software libraries or programs for heterogeneous distributed computing such as grid computing and projects like SETI@home). Even on a single, known architecture, the memory parameters available to a computational process may be non-constant if several processes compete for the same memory resources. Since cache-oblivious algorithms are optimized for all parameter values, they have the potential to adapt more gracefully to these changes. Also, the same code will adapt to varying input sizes that force different memory levels to be in use. Finally, cache-oblivious algorithms automatically optimize the use of the translation lookaside buffers of the CPU (a cache holding recently accessed parts of the page table used for virtual memory), which may be seen as a second memory hierarchy parallel to the one described above.

Possible weak points of the cache-oblivious model are the assumption of optimal off-line cache replacement, and the lack of modeling of the limited associativity of many of the levels of the hierarchy. The first point is mitigated by the fact that normally, the provided analysis of a proposed cache-oblivious algorithm will work just as well assuming a Least-Recently-Used cache replacement policy, which is closer to actual replacement strategies of computers. The second point is also a weak point of most other memory models.
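To experiment with how an access pattern behaves under Least-Recently-Used replacement, one can count block loads for a trace of element addresses. The following is a minimal sketch, assuming a fully associative fast memory holding `memory_size // block_size` blocks and counting loads only (write-backs are ignored); the function name `count_transfers` is hypothetical.

```python
from collections import OrderedDict


def count_transfers(addresses, block_size, memory_size):
    """Count block loads for an address trace under LRU replacement."""
    capacity = memory_size // block_size  # blocks that fit in fast memory
    if capacity < 1:
        raise ValueError("memory_size must be at least block_size")
    cache = OrderedDict()  # keys are block numbers, ordered oldest to newest use
    transfers = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
        else:
            transfers += 1  # block must be loaded from slow memory
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used block
            cache[block] = True
    return transfers


# Usage: two passes over 32 elements with B = 8.
trace = list(range(32)) * 2
fits = count_transfers(trace, block_size=8, memory_size=64)      # all 4 blocks fit
thrashes = count_transfers(trace, block_size=8, memory_size=16)  # only 2 blocks fit
```

In the usage example the second pass is free when the four blocks fit in fast memory, while with room for only two blocks every block is loaded again.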

Key Results

This section surveys a number of the known results in the cache-oblivious model. Other surveys available include [5,14,20,24].

First of all, note that scanning an array of $N$ elements takes $O(N/B)$ memory transfers for any values of $B$ and $M$, and hence is an optimal cache-oblivious algorithm. Thus, standard RAM algorithms based on scanning may already have a good analysis in the cache-oblivious model. For instance, the classic deterministic selection algorithm has complexity $O(N/B)$ [20].
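The scanning bound can be checked by counting the blocks that a contiguous array overlaps. A minimal sketch, assuming elements are addressed by consecutive integers; the function name `scan_transfers` and the parameter `start` (the address of the first element) are hypothetical.

```python
def scan_transfers(n, block_size, start=0):
    """Number of distinct blocks touched when scanning n consecutive elements."""
    if n == 0:
        return 0
    first_block = start // block_size
    last_block = (start + n - 1) // block_size
    return last_block - first_block + 1


# Usage: the same scan evaluated for several block sizes.
counts = {b: scan_transfers(10**6, b, start=5) for b in (8, 64, 1024)}
```

For any block size and any alignment the count is at most $\lceil N/B \rceil + 1$, which is why the scan needs no knowledge of $B$.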

For sorting, a fundamental fact in the I/O-model is that comparison-based sorting of $N$ elements takes $\Theta(\mathrm{Sort}(N))$ memory transfers [2], where $\mathrm{Sort}(N) = \frac{N}{B} \log_{M/B} \frac{N}{M}$. Also in the cache-oblivious model, sorting can be carried out in $\Theta(\mathrm{Sort}(N))$ memory transfers, if one makes the so-called tall cache assumption $M \ge B^{1+\varepsilon}$ [15,22]. Such an assumption has been shown to be necessary [16], which proves a separation in power between cache-oblivious algorithms and algorithms in the I/O-model (where this assumption is not needed for the sorting bound).
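To get a feel for the magnitude of the sorting bound, it can be evaluated numerically. The sketch below ignores constant factors and assumes $N > M > B$ so that the logarithm is positive and well defined; the function names `sort_bound` and `is_tall_cache` are hypothetical.

```python
import math


def sort_bound(n, m, b):
    """Evaluate (N/B) * log_{M/B}(N/M), ignoring constant factors."""
    if not (n > m > b >= 1):
        raise ValueError("the sketch assumes N > M > B >= 1")
    return (n / b) * math.log(n / m, m / b)


def is_tall_cache(m, b, eps):
    """Check the tall cache assumption M >= B^(1 + eps)."""
    return m >= b ** (1 + eps)


# Usage: N = 2^30 elements, M = 2^20, B = 2^10.
transfers = sort_bound(2**30, 2**20, 2**10)
tall = is_tall_cache(2**20, 2**10, eps=1.0)
```

With these illustrative parameters the logarithmic factor equals 1, so the bound is about $N/B = 2^{20}$ transfers, far fewer than the $N = 2^{30}$ element accesses a RAM-model analysis would count.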

For searching, B-trees have cost $O(\log_B N)$, which is optimal in the I/O-model for comparison-based searching. This cost is also attainable in the cache-oblivious model, as shown for the static case in [25] and for the dynamic case in [13]. A number of later variants of cache-oblivious search trees have appeared. Also for searching, a separation between cache-oblivious algorithms and algorithms in the I/O-model has been shown [12], in the sense that the constants attainable in the $O(\log_B N)$ bound are provably different.
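One well-known way to lay out a static search tree so that searches cost $O(\log_B N)$ transfers for every $B$ is the recursive layout commonly called the van Emde Boas layout. It is not named in the text above and is introduced here only as an illustration. The following minimal sketch assumes a complete binary tree whose nodes are numbered as in a binary heap (root 1, children $2i$ and $2i+1$); the function name `veb_order` is hypothetical.

```python
def veb_order(root, height):
    """Return the heap-numbered nodes of a complete binary tree in van Emde Boas order.

    The tree is cut at roughly half its height: the top subtree is laid out
    first, followed by each bottom subtree from left to right, recursively.
    """
    if height == 1:
        return [root]
    top_h = height // 2
    bottom_h = height - top_h
    order = veb_order(root, top_h)
    # Roots of the bottom subtrees are the descendants of `root` at depth top_h.
    first = root << top_h
    for r in range(first, first + (1 << top_h)):
        order.extend(veb_order(r, bottom_h))
    return order


# Usage: storage order for a tree of height 4 (15 nodes).
layout = veb_order(1, 4)
```

The intuition is that at some level of the recursion every subtree fits in a constant number of blocks, so a root-to-leaf path crosses only $O(\log N / \log B)$ such subtrees, without the layout ever referring to $B$.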

Permuting in the I/O-model has complexity $\Theta(\min\{\mathrm{Sort}(N), N\})$, assuming that elements are indivisible [2]. It has been proven [16] that this asymptotic complexity cannot be attained in the cache-oblivious model; hence, also for this problem, a separation exists.
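A small worked instance may help show when each term of the minimum applies. The parameter values are chosen purely for illustration, constant factors are ignored, and $\mathrm{Sort}(N) = \frac{N}{B}\log_{M/B}\frac{N}{M}$ as defined for sorting.

```latex
\[
B = 2,\; M = 8,\; N = 512:\quad
\mathrm{Sort}(N) = \frac{512}{2}\,\log_{4}\frac{512}{8} = 256 \cdot 3 = 768 \;>\; N = 512,
\]
\[
B = 2^{10},\; M = 2^{20},\; N = 2^{30}:\quad
\mathrm{Sort}(N) = \frac{2^{30}}{2^{10}}\,\log_{2^{10}}\frac{2^{30}}{2^{20}} = 2^{20} \cdot 1 = 2^{20} \;<\; N = 2^{30}.
\]
```

With very small blocks, moving elements one at a time is cheaper than sorting; for realistic block sizes the sorting term is the smaller one.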

Cache-oblivious priority queues supporting operations in $O\left(\frac{1}{B} \log_{M/B} \frac{N}{M}\right)$ memory transfers amortized have been given.

Currently known cache-oblivious algorithms also include algorithms for problems in computational geometry [1,6,7,8,10,15], for graph problems [4,17,18,23], for scanning dynamic sets [9], for layout of static trees [11], for search problems on multi-sets [21], for dynamic programming [19], for partial persistence [10], for matrix operations [22], and for the Fast Fourier Transform [22].

Applications

The cache-oblivious model is a means for design and analysis of algorithms that use the memory hierarchy of computers efficiently.

Experimental Results

Cache-oblivious algorithms have been evaluated empirically in a number of areas, including sorting, searching, matrix algorithms [22], and dynamic programming [19]. The overall conclusion of these investigations is that cache-oblivious methods often outperform RAM algorithms, but not always by as much as algorithms tuned to the specific memory hierarchy and problem size do. On the other hand, cache-oblivious algorithms seem to perform well on all levels of the memory hierarchy, and to be more robust to changing problem sizes.

Cross References

Cache-Oblivious B-Tree

Cache-Oblivious Sorting

I/O-model

Recommended Reading

1. Agarwal, P.K., Arge, L., Danner, A., Holland-Minkley, B.: Cache-oblivious data structures for orthogonal range searching. In: Proc. 19th ACM Symposium on Computational Geometry, pp. 237–245. ACM, New York (2003)

2. Aggarwal, A., Vitter, J.S.: The Input/Output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)

3. Arge, L.: External memory data structures. In: Abello, J., Pardalos, P.M., Resende, M.G.C. (eds.) Handbook of Massive Data Sets, pp. 313–358. Kluwer Academic Publishers, Boston (2002)

4. Arge, L., Bender, M.A., Demaine, E.D., Holland-Minkley, B., Munro, J.I.: Cache-oblivious priority queue and graph algorithm applications. In: Proc. 34th Annual ACM Symposium on Theory of Computing, pp. 268–276. ACM, New York (2002)

5. Arge, L., Brodal, G.S., Fagerberg, R.: Cache-oblivious data structures. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications. CRC Press, Boca Raton (2005)

6. Arge, L., Brodal, G.S., Fagerberg, R., Laustsen, M.: Cache-oblivious planar orthogonal range searching and counting. In: Proc. 21st Annual ACM Symposium on Computational Geometry, pp. 160–169. ACM, New York (2005)

7. Arge, L., de Berg, M., Haverkort, H.J.: Cache-oblivious R-trees. In: Symposium on Computational Geometry, pp. 170–179. ACM, New York (2005)

8. Arge, L., Zeh, N.: Simple and semi-dynamic structures for cache-oblivious planar orthogonal range searching. In: Symposium on Computational Geometry, pp. 158–166. ACM, New York (2006)

9. Bender, M., Cole, R., Demaine, E., Farach-Colton, M.: Scanning and traversing: Maintaining data for traversals in a memory hierarchy. In: Proc. 10th Annual European Symposium on Algorithms. LNCS, vol. 2461, pp. 139–151. Springer, Berlin (2002)

10. Bender, M., Cole, R., Raman, R.: Exponential structures for cache-oblivious algorithms. In: Proc. 29th International Colloquium on Automata, Languages, and Programming. LNCS, vol. 2380, pp. 195–207. Springer, Berlin (2002)

11. Bender, M., Demaine, E., Farach-Colton, M.: Efficient tree layout in a multilevel memory hierarchy. In: Proc. 10th Annual European Symposium on Algorithms. LNCS, vol. 2461, pp. 165–173. Springer, Berlin (2002). Full version at http://arxiv.org/abs/cs/0211010

12. Bender, M.A., Brodal, G.S., Fagerberg, R., Ge, D., He, S., Hu, H., Iacono, J., López-Ortiz, A.: The cost of cache-oblivious searching. In: Proc. 44th Annual IEEE Symposium on Foundations of Computer Science, pp. 271–282. IEEE Computer Society Press, Los Alamitos (2003)

13. Bender, M.A., Demaine, E.D., Farach-Colton, M.: Cache-oblivious B-trees. In: 41st Annual Symposium on Foundations of Computer Science, pp. 399–409. IEEE Computer Society Press, Los Alamitos (2000)

14. Brodal, G.S.: Cache-oblivious algorithms and data structures. In: Proc. 9th Scandinavian Workshop on Algorithm Theory. LNCS, vol. 3111, pp. 3–13. Springer, Berlin (2004)

15. Brodal, G.S., Fagerberg, R.: Cache oblivious distribution sweeping. In: Proc. 29th International Colloquium on Automata, Languages, and Programming. LNCS, vol. 2380, pp. 426–438. Springer, Berlin (2002)

16. Brodal, G.S., Fagerberg, R.: On the limits of cache-obliviousness. In: Proc. 35th Annual ACM Symposium on Theory of Computing, pp. 307–315. ACM, New York (2003)

17. Brodal, G.S., Fagerberg, R., Meyer, U., Zeh, N.: Cache-oblivious data structures and algorithms for undirected breadth-first search and shortest paths. In: Proc. 9th Scandinavian Workshop on Algorithm Theory. LNCS, vol. 3111, pp. 480–492. Springer, Berlin (2004)

18. Chowdhury, R.A., Ramachandran, V.: Cache-oblivious shortest paths in graphs using buffer heap. In: Proc. 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures. ACM, New York (2004)

19. Chowdhury, R.A., Ramachandran, V.: Cache-oblivious dynamic programming. In: Proc. 17th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 591–600. ACM-SIAM, New York (2006)

20. Demaine, E.D.: Cache-oblivious algorithms and data structures. In: Proc. EFF summer school on massive data sets, LNCS. Springer, Berlin. To appear. Online version at http://theory.csail.mit.edu/edemaine/papers/BRICS2002/

21. Farzan, A., Ferragina, P., Franceschini, G., Munro, J.I.: Cache-oblivious comparison-based algorithms on multisets. In: Proc. 13th Annual European Symposium on Algorithms. LNCS, vol. 3669, pp. 305–316. Springer, Berlin (2005)

22. Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache oblivious algorithms. In: 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 285–298. IEEE Computer Society Press, Los Alamitos (1999)

23. Jampala, H., Zeh, N.: Cache-oblivious planar shortest paths. In: Proc. 32nd International Colloquium on Automata, Languages, and Programming. LNCS, vol. 3580, pp. 563–575. Springer, Berlin (2005)

24. Meyer, U., Sanders, P., Sibeyn, J.F. (eds.): Algorithms for Memory Hierarchies. LNCS, vol. 2625. Springer, Berlin (2003)

25. Prokop, H.: Cache-oblivious algorithms. Master’s thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science (1999)

26. Vitter, J.S.: External memory algorithms and data structures: Dealing with MASSIVE data. ACM Comput. Surv. 33(2), 209–271 (2001)

27. Vitter, J.S.: Geometric and spatial data structures in external memory. In: Mehta, D., Sahni, S. (eds.) Handbook on Data Structures and Applications. CRC Press, Boca Raton (2005)
