12.6. Cache Memory


(also refer PPT)

Analysis of a large number of typical programs has shown that the references to memory in any given interval of time tend to be confined within a few localized areas of memory. This phenomenon is known as the property of locality of reference.

If the active portions of the program and data are placed in a fast memory, the average memory access time can be reduced, thus reducing the total execution time of the program. Such a fast memory is referred to as a cache memory.

It is placed between the CPU and main memory. The basic operation of the cache is as follows. When the CPU needs to access memory, the cache is examined first. If the word is found in the cache, it is read from the cache. If the word is not found, it is fetched from main memory.

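The performance of cache memory is frequently measured in terms of a quantity called the hit ratio. When the CPU refers to memory and finds the word in the cache, it is said to produce a hit. If the word is not found in the cache, it is in main memory and the reference counts as a miss. The hit ratio is the number of hits divided by the total number of CPU references to memory (hits plus misses).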
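As a rough illustration of the hit ratio and its effect on average access time, here is a minimal sketch. The function names and the timing figures (100 ns cache, 1000 ns main memory) are assumptions chosen for the example, and the sketch assumes that a miss costs a cache probe plus a main memory access.

```python
def hit_ratio(hits, misses):
    """Fraction of memory references that were satisfied by the cache."""
    return hits / (hits + misses)


def average_access_time(hits, misses, cache_ns, memory_ns):
    """Average time per reference.

    Assumption of this sketch: a miss pays for the cache probe and then
    for the main memory access.
    """
    total_ns = hits * cache_ns + misses * (cache_ns + memory_ns)
    return total_ns / (hits + misses)


# Hypothetical counts: 900 hits and 100 misses out of 1000 references.
h = hit_ratio(900, 100)
t = average_access_time(900, 100, cache_ns=100, memory_ns=1000)
print(h, t)
```

Under these assumed figures, a hit ratio of 0.9 brings the average access time much closer to the cache speed than to the main memory speed.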

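The transformation of data from main memory to cache memory is referred to as a mapping process. Three types of mapping procedures are of practical interest in the organization of cache memory: associative mapping, direct mapping, and set-associative mapping. The first two are described below.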

12.6.1. Associative Mapping

Cache memory is one of the primary uses for associative memory. When performing a memory access, we must be able to determine instantly whether a copy of a given main memory address is in the cache, or the cache will be useless.

An associative cache stores both the main memory address and a copy of its content in each word. Hence, if main memory addresses are 32 bits, and each address contains one byte, then each word of the associative cache must be 40 bits wide.

The key register in an associative cache is permanently fixed to mask off the data, so that only the address part of each word is compared to the argument. This allows the cache to immediately determine whether a given main memory address is currently cached.
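The masked comparison can be sketched in software as follows. This is only a model under the 32-bit address, one-byte data assumption used above: real associative memory compares all words in parallel, whereas the loop here just reproduces the result. The names `KEY`, `make_word`, and `lookup` are hypothetical.

```python
ADDR_BITS = 32
DATA_BITS = 8

# Key register: ones over the address field, zeros over the data field.
KEY = ((1 << ADDR_BITS) - 1) << DATA_BITS
DATA_MASK = (1 << DATA_BITS) - 1


def make_word(address, data):
    """Pack a 32-bit address and one data byte into a 40-bit cache word."""
    return (address << DATA_BITS) | (data & DATA_MASK)


def lookup(cache_words, address):
    """Return the cached byte for `address`, or None on a miss."""
    argument = address << DATA_BITS
    for word in cache_words:
        # Only the bits selected by the key take part in the comparison.
        if (word & KEY) == (argument & KEY):
            return word & DATA_MASK
    return None


cache = [make_word(0x00001000, 0x5A), make_word(0xFFFFFFF0, 0x07)]
print(lookup(cache, 0x00001000))  # address is cached
print(lookup(cache, 0x00002000))  # address is not cached
```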


12.6.2. Direct Mapping

Since associative memory is expensive, there is motivation to find ways of implementing a cache with normal RAM. One method is using direct mapping.

In direct mapping, the main memory address is divided into a tag (high bits) and index (low bits). The index is used as the address for the cache RAM.

Figure 12-12 depicts a direct mapping for the 32k x 12 memory from the previous example.

Note that a given main memory address is always cached at the same address in the cache RAM.

Also, we cannot cache two addresses from main memory that have the same index, since they would collide at the index address in the cache. This tends not to be a problem, though, since main memory addresses with the same index are far from each other, and variables in a subprogram that are likely to be cached together are usually grouped together in memory. For example, with a 9-bit index, addresses with the same index would be 2^9 = 512 addresses away.
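The tag/index split and the collision between addresses that share an index can be sketched as below. The sketch assumes a 15-bit address (32k words) with a 9-bit index, so the tag is 6 bits. The class name, the memory contents, and the use of `None` for an unused cache word are illustrative assumptions.

```python
INDEX_BITS = 9                         # 512-word cache RAM
INDEX_MASK = (1 << INDEX_BITS) - 1


def split(address):
    """Split a main memory address into (tag, index)."""
    return address >> INDEX_BITS, address & INDEX_MASK


class DirectMappedCache:
    def __init__(self):
        # Each cache word holds (tag, data); None marks an unused word.
        self.words = [None] * (1 << INDEX_BITS)

    def read(self, address, main_memory):
        """Return (data, hit) for a read of `address`."""
        tag, index = split(address)
        entry = self.words[index]
        if entry is not None and entry[0] == tag:
            return entry[1], True
        data = main_memory[address]        # miss: fetch from main memory
        self.words[index] = (tag, data)    # replaces whatever was there
        return data, False


main_memory = [0] * (32 * 1024)
main_memory[0] = 0x123                     # hypothetical contents
main_memory[512] = 0x456                   # same index as address 0

cache = DirectMappedCache()
results = [cache.read(a, main_memory) for a in (0, 0, 512, 0)]
print(results)
```

Addresses 0 and 512 share index 0, so reading one evicts the other. The final read of address 0 therefore misses even though it was cached earlier.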


12.6.3. Writing to Cache

An important aspect of cache organization is concerned with memory write requests.

12.6.3.1. Write-through

In write-through, when data is written to memory, a copy is stored in both the cache and the main memory. This method ensures that the cache contents are always consistent with main memory. The drawback is that it may cause delays in program execution. The CPU need not always wait for main memory to complete a write operation before executing the next instruction, but it will have to wait before it can access memory again.

12.6.3.2. Write-back

In write-back, data are initially only written to the cache. This is also known as a lazy write. Each location written is then tagged as dirty. When the data at this location are evicted from the cache, they are then written back to main memory. This system improves write performance significantly, but also adds complexity to the cache hardware, which must now keep track of which locations are dirty or clean.
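The difference between the two write policies can be sketched as follows. This is a simplified model, not a description of any particular hardware: the class names are hypothetical, main memory is a plain dictionary, and eviction is triggered by hand rather than by a replacement policy.

```python
class WriteThroughCache:
    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}

    def write(self, address, value):
        self.lines[address] = value
        self.main_memory[address] = value    # memory updated on every write


class WriteBackCache:
    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}
        self.dirty = set()                   # addresses not yet written back

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)              # main memory is now stale

    def evict(self, address):
        value = self.lines.pop(address)
        if address in self.dirty:            # only dirty data is written back
            self.main_memory[address] = value
            self.dirty.discard(address)


memory_a = {0x10: 0}
WriteThroughCache(memory_a).write(0x10, 42)

memory_b = {0x10: 0}
wb = WriteBackCache(memory_b)
wb.write(0x10, 42)
stale_value = memory_b[0x10]                 # still the old value
wb.evict(0x10)
print(memory_a[0x10], stale_value, memory_b[0x10])
```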

12.6.4. Cache Initialization

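The cache is initialized in two situations:

1) When power is applied to the computer

2) When main memory is loaded with a complete set of programs from auxiliary memory

After initialization the cache is considered to be empty, but in effect it still contains invalid data. A valid bit is therefore included with each word in the cache to indicate whether or not the word contains valid data.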


12.7. Virtual Memory

Virtual memory is a concept used in some large computer systems that permits the user to construct programs as though a large memory space were available. Each address that is referenced by the CPU goes through an address mapping from the so-called virtual address to a physical address in main memory.

A virtual memory system provides a mechanism for translating program-generated addresses into correct main memory locations. An address used by a programmer will be called a virtual address, and the set of such addresses the address space. An address in memory is called a location or physical address. The set of such locations is called the memory space.

12.7.1. Address Translation

The instruction codes of memory-reference instructions contain virtual addresses, not physical addresses. Virtual addresses must be translated to physical addresses by hardware in order to read or write data in physical memory.

Virtual memory is divided into fixed-size pages, and physical memory is divided into frames of the same size. In order to translate virtual addresses to physical addresses, we must associate each page with a frame, and have an efficient way to convert a page number to a frame number. One way is to use an array where the page number is used as an index to the array, and each element of the array contains a frame number and an indicator to tell whether the page is in RAM or not. Such an array is called a page table.

The page table can be stored in main memory (in an area not used for virtual memory!) or in a separate memory unit. In either case, translating the virtual address to a physical address requires an added memory access, so virtual memory systems have a performance cost compared with real memory access.

One advantage of using a separate memory unit is that it would be much smaller, and therefore faster than main memory. If the page size is 4k, then the page table will be 1/4096 the size of virtual memory, and probably about 1/1000 the size of main memory. By using a smaller, higher-quality (possibly static) RAM for the page table, the cost of address translation can be greatly reduced.

Suppose a virtual memory system uses 32-bit virtual addresses and 31-bit physical addresses, and a page size of 4k. We then have a 4 gigabyte virtual address space, and a 2 gigabyte RAM maximum (memory space). There are up to 2^20 (1 binary million) pages and half a binary million frames in this system.

The page table would consist of 2^20 32-bit words, each containing a 19-bit frame number and a presence bit, which indicates whether or not the page is in RAM.

Page    Frame   P
00000   7f0fa   1
00001   78787   0
00002   59300   1
...
fffff   dfd99   0
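The sizes quoted for this example can be checked with a few lines of arithmetic. The variable names are assumptions introduced for this sketch. The numbers follow from 32-bit virtual addresses, 31-bit physical addresses, a 4k page, and one 32-bit word per page table entry.

```python
VIRTUAL_ADDR_BITS = 32
PHYSICAL_ADDR_BITS = 31
PAGE_SIZE = 4 * 1024                                  # bytes per page

line_bits = PAGE_SIZE.bit_length() - 1                # bits to address a byte in a page
num_pages = 2 ** (VIRTUAL_ADDR_BITS - line_bits)      # entries in the page table
num_frames = 2 ** (PHYSICAL_ADDR_BITS - line_bits)
frame_number_bits = PHYSICAL_ADDR_BITS - line_bits
table_bytes = num_pages * 4                           # one 32-bit word per page

print(line_bits, num_pages, num_frames, frame_number_bits, table_bytes)
```

With these figures the page table alone occupies 4 binary megabytes, which is 1/1024 of the 4 gigabyte virtual address space.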


Each virtual address requested by the CPU is divided into a 20-bit page number and a 12-bit line number (byte within the 4k page).

Virtual address
+--------------------------+----------------+
|       page number        |      line      |
+--------------------------+----------------+
 31                      12 11             0

The page number is used as an index to the page table. If the presence bit is one for this entry, the page number is replaced by the 19-bit frame number to determine the physical memory location corresponding to the virtual address.

Physical address
+--------------------------+----------------+
|       frame number       |      line      |
+--------------------------+----------------+
 30                      12 11             0

If the presence bit is 0, a page fault has occurred, and the operating system performs a swap. This loads the page into RAM, and updates the page table.
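The translation steps can be sketched as follows, using the first three entries of the example page table. The names `translate` and `PageFault` are hypothetical. In a real system the lookup is done by hardware, and the fault is handled by the operating system rather than by a Python exception.

```python
PAGE_BITS = 12                          # 4k pages: 12-bit line number
LINE_MASK = (1 << PAGE_BITS) - 1


class PageFault(Exception):
    """Raised when the referenced page is not present in RAM."""


# page number -> (frame number, presence bit), taken from the example table
page_table = {
    0x00000: (0x7F0FA, 1),
    0x00001: (0x78787, 0),
    0x00002: (0x59300, 1),
}


def translate(virtual_address):
    """Map a 32-bit virtual address to a 31-bit physical address."""
    page = virtual_address >> PAGE_BITS
    line = virtual_address & LINE_MASK
    frame, present = page_table[page]
    if not present:
        raise PageFault(page)           # the OS would now swap the page in
    return (frame << PAGE_BITS) | line  # page number replaced by frame number


print(hex(translate(0x00000ABC)))       # page 0 is present
try:
    translate(0x00001000)               # page 1 has presence bit 0
except PageFault as fault:
    print("page fault on page", fault.args[0])
```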

In theory, the page table could indicate the location of a page in swap when the presence bit is zero. In practice, the operating system can use a more sophisticated approach for allocating and managing swap space.

Note that when a page is not present in RAM, the corresponding entry in the page table is unused. On systems where virtual memory is much larger than physical memory, this can mean that most of the page table is unused at any given moment, which is not an efficient use of memory.


An associative page table can alleviate this problem, since it only requires an entry for each frame, not each page.

+----------+----------+
|   page   |   line   |   Argument
+----------+----------+
+----------+----------+
|   1..1   |   0..0   |   Key
+----------+----------+
+----------+----------+
|          |          |
|   ...    |          |
|          |          |
+----------+----------+
    page       frame

Since associative memory is much more expensive than ordinary RAM, this solution may or may not be more cost effective. It depends on the ratio of virtual address space to physical memory space.
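A software model of the associative page table might look like the sketch below. It keeps one (page, frame) entry per occupied frame and searches on the page number only. The entry values and the name `associative_translate` are assumptions, and the loop stands in for a comparison that associative hardware would perform on all entries at once.

```python
PAGE_BITS = 12
LINE_MASK = (1 << PAGE_BITS) - 1

# One entry per occupied frame: (page number, frame number).
# Pages that are not in RAM have no entry at all.
associative_table = [
    (0x00000, 0x7F0FA),
    (0x00002, 0x59300),
]


def associative_translate(table, virtual_address):
    """Return the physical address, or None if the page is not in RAM."""
    page = virtual_address >> PAGE_BITS
    line = virtual_address & LINE_MASK
    for entry_page, frame in table:
        if entry_page == page:          # match on the page field only
            return (frame << PAGE_BITS) | line
    return None                         # no match: page fault


print(associative_translate(associative_table, 0x00002010))
print(associative_translate(associative_table, 0x00001000))
```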

12.7.2. Page Replacement

If a page fault occurs and RAM is already full, the operating system must decide which page to swap out in order to make room for the page to be swapped in.

This decision can have a critical impact on performance. If the page swapped out is addressed shortly afterward, this will generate another costly page fault.

The ideal algorithm for page replacement is to swap out the page that will not be needed for the longest time. Unfortunately, this algorithm requires knowledge of the future, which is not currently available.

A very simple algorithm is FIFO (first-in, first-out). This algorithm simply swaps out the page that has been in memory the longest. It has the advantage of being easy to implement. Unfortunately, it lacks intelligence, since there is no relation between when a page is swapped in and when it will be accessed next.

Another fairly simple policy is LRU (least recently used). This algorithm keeps track of the last access time to a page in RAM. The one with the oldest time stamp is then swapped out, on the assumption that the program has stopped using it. There is no guarantee that it won't be the next page addressed, but statistically, it works much better than FIFO.
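The two policies can be compared by counting page faults on a reference string. The sketch below is a simulation only. The function names, the reference string, and the choice of three frames are assumptions made for the example, and a real system would approximate LRU in hardware rather than keep an ordered dictionary.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, num_frames):
    """Count page faults when the page resident longest is swapped out."""
    queue = deque()                     # pages in order of arrival
    resident = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue                    # hit: FIFO order is not affected
        faults += 1
        if len(queue) == num_frames:
            resident.discard(queue.popleft())
        queue.append(page)
        resident.add(page)
    return faults


def lru_faults(refs, num_frames):
    """Count page faults when the least recently used page is swapped out."""
    frames = OrderedDict()              # oldest access first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)    # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = True
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3), lru_faults(refs, 3))   # 15 and 12 faults
```

On this particular reference string LRU produces fewer faults than FIFO, but the gap depends on the access pattern.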


12.8. Memory Management Hardware

A memory management system is a collection of hardware and software procedures for managing the various programs residing in memory.

The memory management software is part of an overall operating system available in many computers.

The basic components of a memory management unit are:

1. A facility for dynamic storage relocation that maps logical memory references into physical memory addresses.

2. A provision for sharing common programs stored in memory by different users.

3. Protection of information against unauthorized access between users and preventing users from changing operating system functions.

Programs and data are divided into logical parts called segments. A segment is a set of logically related instructions or data elements associated with a given name.

The address generated by a segmented program is called a logical address. This is similar to a virtual address except that logical address space is associated with variable-length segments rather than fixed-length pages.

The logical address is partitioned into three fields. The segment field specifies a segment number. The page field specifies the page within the segment and the word field gives the specific word within the page.
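Splitting a logical address into its three fields can be sketched as below. The field widths (4-bit segment, 8-bit page, 8-bit word) are assumptions chosen for illustration and are not given in these notes. The function name `split_logical` is likewise hypothetical.

```python
# Assumed field widths, for illustration only.
SEGMENT_BITS, PAGE_BITS, WORD_BITS = 4, 8, 8


def split_logical(address):
    """Split a logical address into (segment, page, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    page = (address >> WORD_BITS) & ((1 << PAGE_BITS) - 1)
    segment = address >> (WORD_BITS + PAGE_BITS)
    return segment, page, word


segment, page, word = split_logical(0x6027E)
print(hex(segment), hex(page), hex(word))
```

With these assumed widths, address 6027E (hex) refers to word 7E of page 02 in segment 6.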