Bruce Jacob
1.6.2 Caching the Process Address Space
A process operates in its own little world; this is the virtual machine paradigm, illustrated in Fig. 1.19. Each running process generates addresses for loads and stores as if it has the entire machine to itself, as if the computer offers an extremely large amount of memory and no other processes are executing or consuming resources. This makes the job of the programmer and compiler much easier, because no details of the hardware or memory organization are necessary to build a program.
The operating system divides the process address space into equal-sized portions for ease of management; these divisions are called virtual pages. A page is usually a multiple of the unit of transfer that hard disks use, and in most operating systems ranges from several kilobytes to several dozen kilobytes. A page is never fragmented; if any data in a virtual page are in physical memory then all the data in that page are, and if any of the data in a virtual page are nonexistent or being held on disk then all the data are. When the word page is used in a verb form, it means to allow a section of memory to be virtual, that is, to
[Figure 1.19: two views. "Process A’s view of the world" shows A’s address space holding instructions, data, and stack. "Reality" shows physical memory shared among the address spaces of A, B, C, and D.]
FIGURE 1.19 The virtual machine paradigm. A process operates in its own virtual environment, unaware that other processes are executing and contending for the same limited resources. The operating system views each process address space as a collection of pages that can be cached in physical memory, or left in backing store.
allow it to move freely between physical memory and disk. This allows the physical memory to be used more efficiently: When a region of memory has not been used recently, the space can be freed up for more active pages, and pages that have been migrated to disk are brought back in as soon as they are needed again.
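The migration of idle pages to disk can be sketched with a toy model. This is only an illustration under stated assumptions: the class `PagedMemory`, its fields, and the least-recently-used eviction choice are hypothetical and are not taken from the text, which says only that regions not used recently can be freed.

```python
from collections import OrderedDict


class PagedMemory:
    """Toy model: a few physical frames caching a larger set of virtual pages."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()  # page -> contents, least recently used first
        self.disk = {}                 # pages that have been migrated out of memory
        self.page_ins = 0              # how many times a page was brought into memory

    def touch(self, page):
        """Reference a page, bringing it into memory if it is not resident."""
        if page in self.resident:
            self.resident.move_to_end(page)  # now the most recently used page
            return self.resident[page]
        if len(self.resident) >= self.num_frames:
            # Memory is full: free a frame by migrating the idle page to disk.
            victim, contents = self.resident.popitem(last=False)
            self.disk[victim] = contents
        self.page_ins += 1
        # Bring the page back from disk, or create it on first use.
        self.resident[page] = self.disk.pop(page, f"contents of page {page}")
        return self.resident[page]


mem = PagedMemory(num_frames=2)
for page in [0, 1, 0, 2, 1]:
    mem.touch(page)
```

With two frames, the reference to page 2 pushes page 1 to disk, and the later reference to page 1 brings it back at the expense of page 0.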
How is this done? The ultimate home for the process’s address space is backing store, usually a disk drive; this is where the process’s instructions and data come from and where all of its permanent changes go to. Every hardware memory structure between the CPU and the backing store is a cache for the instructions and data in the process’s address space. This includes main memory, which is really nothing more than a cache for a process’s virtual address space. A cache operates on the principle that a small, fast storage device can hold the most important data found on a larger, slower storage device, effectively making the slower device look fast. The large storage area in this case is the process address space, which can be many gigabytes in size. Everything in the address space initially comes from the program file stored on disk or is created on demand and defined to be zero. Figure 1.20 illustrates:
[Figure 1.20: two views. "Ideal physical model" shows the CPU, an L1/L2/etc. cache hierarchy, and the address space. "Reality" shows the CPU, an L1/L2/etc. cache hierarchy, main memory, and a virtual memory system backed by backing store and dynamically allocated data space.]
FIGURE 1.20 Caching the process address space. In the first view, a process is shown referencing locations in its address space. Note that all loads, stores, and fetches use virtual names for objects, and many of the requests can be satisfied by a cache hierarchy. The second view shows that the address space is not a linear object stored on some device, but is instead scattered across hard drives and dynamically allocated when necessary.
There really is no linear array of data that houses the process address space. The illusion of one is manufactured by the operating system through the virtual memory mechanism.
When a program first begins executing, the operating system copies a small portion of the process address space from the program file stored on disk into main memory. This typically includes the first page of instructions in the program and possibly a small amount of data that the program needs at startup. Then, as more instructions or data are needed, the operating system brings in pages from the process’s address space on demand. This process, called demand paging, is depicted in Fig. 1.21.
In step 1 of the figure, the operating system initializes a process address space and loads the first page of instructions into physical memory. The operating system then sets the hardware program counter to the first instruction in the program, which sets the process running. Assuming that one of the first few instructions references the initialized data area, the uninitialized data area, or the (so far nonexistent) stack, the operating system will have to bring in a page of data from the program file or create an uninitialized-data page or stack page and link it into the process address space. This is shown in steps 2 and 3 of the figure. When a process references an item in its address space that is not currently in physical memory, the reference causes a page fault, and the operating system loads the necessary pages from backing store into main memory. Clearly, the term demand paging refers to the fact that pages are allocated or brought into physical memory on demand. Step N of the figure shows a process that has been executing for some time, as it has several pages of data in its stack area and several pages in its data area that were not there when the process began executing. All of these pages were dynamically allocated by the operating system as the process needed or asked for them.
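The start-up sequence described above can be sketched as a small simulation. It is a minimal sketch under assumptions, not an operating system's actual fault handler: the `AddressSpace` class, the `layout` dictionary, the region names, and the 4096-byte page size are all hypothetical choices made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class AddressSpace:
    """Toy process address space whose pages are brought in on first reference."""

    def __init__(self, program_file, layout):
        self.program_file = program_file  # page number -> bytes held in the program file
        self.layout = layout              # page number -> "text", "data", "bss", or "stack"
        self.resident = {}                # pages currently in physical memory
        self.faults = []                  # (page, kind) for every page fault taken

    def _bring_in(self, page):
        kind = self.layout.get(page)
        if kind is None:
            raise MemoryError(f"page {page} is not part of the address space")
        if kind in ("text", "data"):
            # Instructions and initialized data are copied from the program file.
            self.resident[page] = bytearray(self.program_file[page])
        else:
            # Uninitialized data and stack pages are created on demand as zeros.
            self.resident[page] = bytearray(PAGE_SIZE)
        return kind

    def start(self, first_page):
        """Step 1: load the first page of instructions before the process runs."""
        self._bring_in(first_page)

    def read(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in self.resident:
            # Page fault: the referenced page is not in physical memory yet.
            kind = self._bring_in(page)
            self.faults.append((page, kind))
        return self.resident[page][offset]


program_file = {0: bytes([0x90]) * PAGE_SIZE, 1: bytes([7]) * PAGE_SIZE}
layout = {0: "text", 1: "data", 2: "bss", 255: "stack"}

proc = AddressSpace(program_file, layout)
proc.start(0)                               # step 1: first page of instructions
first_data = proc.read(1 * PAGE_SIZE + 8)   # step 2: fault on a data page
first_stack = proc.read(255 * PAGE_SIZE)    # step 3: fault on a stack page
```

After these three steps only pages 0, 1, and 255 are resident; the uninitialized-data page 2 is never touched, so it is never created.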
As has been pointed out before, the process is unaware of the operating system activity that moves pages in and out of main memory on its behalf. It typically does not know whether or not any given page is memory-resident or where it is located if it is memory-resident. Figure 1.19 at the beginning of the section illustrates this by showing a process address space from two points of view. The first point of
[Figure 1.21: four panels (Step 1, Step 2, Step 3, Step N), each showing the text, data, and stack regions of the virtual space alongside the physical space.]
FIGURE 1.21 Demand paging at process start-up. In step 1, the operating system loads the first page of the process’s instructions into physical memory, and sets the program counter to the first instruction in the program. This first instruction references a location in the process’s data area, so in step 2 the operating system brings the corresponding data page into physical memory. The next instruction references a location on the process’s stack, so in step 3 the operating system has allocated a stack page for the process and placed it into the process address space and main memory. Succeeding instructions reference more locations in the stack area, jump to instructions that lie outside of the initial page of instructions, and allocate extra data storage area on the heap. In step N (many steps later), these pages have been brought into main memory.
view is from the process itself; in most operating systems a process sees its address space as a contiguous span of memory locations from minimum to maximum. Somewhere in the address space are the program’s instructions, or text; somewhere else is the program’s data. Most operating systems also create a stack area, a heap area, and possibly one or more dynamically loaded libraries containing system-supplied utilities such as input/output routines or networking functions. The advantage of the virtual machine paradigm is that these regions can be arranged in physical memory in whatever way is most convenient, rather than having to be fitted together like the pieces of a puzzle, as would be the case without address translation.
The second point of view in the figure is from the operating system. In reality, the process address space is not a large contiguous segment in physical memory but is partially cached by physical memory. Portions of the process address space are scattered about physical memory and are likely not to be contiguous at all. The process is unaware of where in the system any particular portion of its address space is being held; some portions can be on disk (for example, the portions of the program that have not been used yet), some can be in main memory, and some can be in hardware caches. The operating system maintains a map for each address space so that, for every virtual page in the address space, it can tell where in memory or on disk the page can be found. As the figure suggests, the virtual machine paradigm allows each process to behave as if it owns the entire machine; each process is protected from all others and does not even know that other processes exist. For example, a process cannot spoof the identity of another process, and the resource-management mechanisms that the operating system implements to support the illusion that each process owns all physical resources mean that no process may dominate system resources. One of the many benefits of this organization is that it makes facilities such as multitasking very easy to implement, because process protection, resource sharing, and a clean division of process identity are provided as side effects of the virtual machine paradigm by definition.
The mapping information that tells the location of pages in memory or on disk is organized into page tables, which are collections of page table entries (PTEs). Virtual addresses (shown in Fig. 1.22) are mapped at the granularity of pages; at its simplest, virtual memory is then a mapping of virtual page numbers (VPNs) to page frame numbers (PFNs), shown in Fig. 1.23. “Frame” in this context means “slot”: physical memory is divided into frames that hold pages. The page table holds one PTE for every mapped virtual page; an individual PTE indicates whether its virtual page is in memory, on disk, or not allocated yet. The logical PTE therefore contains the VPN and either the page’s location in memory (a PFN), or its location on disk (a disk block number). Depending on the organization, some of this information is redundant; actual implementations do not necessarily require both the VPN and the PFN. Later developments in virtual memory added such things as page-level protections; a modern PTE usually contains protection information as well, such as whether the page contains executable code, whether it can be modified, and if so by whom.
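The VPN-to-PFN mapping can be sketched as follows, using 32-bit addresses and 4 kbyte pages. Everything named here is an assumption made for illustration: the `PTE` fields, the `PageFault` exception, the dictionary-based page table, and the example page numbers do not describe any particular hardware or operating system. Because the dictionary is keyed by VPN, the PTE itself does not store the VPN, which is one instance of the redundancy mentioned above.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SHIFT = 12                        # 4 kbyte pages: the low 12 bits are the offset
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


@dataclass
class PTE:
    """Hypothetical page table entry."""
    pfn: Optional[int] = None          # page frame number, if the page is in memory
    disk_block: Optional[int] = None   # disk block number, if the page is on disk
    writable: bool = False
    executable: bool = False


class PageFault(Exception):
    """Raised when the referenced page is not in physical memory."""


def split(vaddr):
    """Divide a virtual address into (virtual page number, page offset)."""
    return vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK


def translate(page_table, vaddr, write=False):
    """Map a virtual address to a physical address through the page table."""
    vpn, offset = split(vaddr)
    pte = page_table.get(vpn)
    if pte is None:
        raise PageFault(f"VPN {vpn:05X} is not allocated")
    if pte.pfn is None:
        raise PageFault(f"VPN {vpn:05X} is on disk at block {pte.disk_block}")
    if write and not pte.writable:
        raise PermissionError(f"VPN {vpn:05X} is not writable")
    # The offset is unchanged; only the page number is replaced by the frame number.
    return (pte.pfn << PAGE_SHIFT) | offset


# Hypothetical page table with one resident code page and one page held on disk.
page_table = {
    0x00003: PTE(pfn=0x4A007, executable=True),
    0x00004: PTE(disk_block=912),
}
physical = translate(page_table, 0x00003ABC)
```

Here virtual address `0x00003ABC` lies in VPN `0x00003` at offset `0xABC`, so it translates to `0x4A007ABC`; a reference to VPN `0x00004` would raise `PageFault` instead.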
The mapping is a function; any virtual page can have only one location. However, the inverse map is not necessarily a function; it is possible and sometimes advantageous to have several virtual pages mapped to the same page frame (to share memory between processes or threads, or to allow different views of data with different protections, for example). Shared memory is one of the more commonly used features of page tables. It is a mechanism whereby two address spaces that are protected from each other are allowed to intersect at points, still retaining protection over the nonintersecting regions. Several processes sharing portions of their address spaces are pictured in Fig. 1.24. The shared memory mechanism only opens up a pre-defined portion of a process’s address space; the rest
[Figure 1.22: a virtual address divided into a 20-bit virtual page number (VPN) and a 12-bit page offset.]
FIGURE 1.22 Virtual addresses. A virtual address is divided into two components: the virtual page number and the page offset. The virtual page number identifies the page’s location within the address space. The page offset identifies a byte’s location within the page. Bit widths are shown for a 32-bit address and a 4 kbyte page size.
of the address space is still protected, and even the shared portion is only unprotected for those processes sharing the memory. For instance, in the figure, the region of A’s address space that is shared with process B is unprotected from whatever actions B might want to take, but it is safe from the actions of any other processes. Shared memory is therefore useful as a simple, secure means for
[Figure 1.23: a virtual space whose pages are labeled with five-digit hexadecimal virtual page numbers (00000 upward, through FFFFF) beside a physical space whose frames are labeled with page frame numbers.]
FIGURE 1.23 Page numbers (for 32-bit virtual addresses). Every page in an address space is given a virtual page number (VPN). Every page in physical memory is given a physical page number, called a page frame number (PFN).
[Figure 1.24: the overlapping address spaces of processes A, B, C, and D, with regions shared by A & B, by B & C, by C & D, by B & D, and by B & C & D.]
FIGURE 1.24 Shared memory. Shared memory allows processes to overlap portions of their address space while retaining protection for the nonintersecting regions; this is a simple and effective method for inter-process communication. Pictured are four process address spaces that have overlapped. The darker regions are shared by more than one process, while the lightest regions are still protected from other processes.
inter-process communication. Shared memory also reduces requirements for physical memory; for example, in most operating systems, the text regions of processes are shared whenever multiple instances of a single program are run, or when multiple instances of a common library are used in different programs.
The mechanism works by ensuring that shared pages map to the same physical page; this is done by simply placing the same page frame number in the page tables of two processes sharing a page. A simple example is shown in Fig. 1.25. Here, two very small address spaces are shown overlapping at several places, and one address space overlaps with itself; two of its virtual pages map to the same physical page. This is not just a contrived example; many operating systems allow this, and it is useful, for example, in the implementation of user-level threads.
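Placing the same page frame number in two page tables can be sketched as follows. This is an illustrative model only: the tables `table_a` and `table_b`, the frame numbers, and the helper functions are hypothetical, and a page table is reduced to a plain VPN-to-PFN dictionary.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

physical_memory = {}  # page frame number -> contents of that frame


def frame(pfn):
    """Return the frame for a PFN, creating a zero-filled one on first use."""
    return physical_memory.setdefault(pfn, bytearray(PAGE_SIZE))


def store(page_table, vaddr, value):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    frame(page_table[vpn])[offset] = value


def load(page_table, vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return frame(page_table[vpn])[offset]


# Each page table maps virtual page numbers to page frame numbers.
table_a = {0: 10, 1: 11, 2: 11}   # A's pages 1 and 2 both map to frame 11
table_b = {0: 20, 5: 11}          # B's page 5 shares frame 11 with A

store(table_a, 1 * PAGE_SIZE + 4, 42)          # A writes through its page 1
seen_by_b = load(table_b, 5 * PAGE_SIZE + 4)   # B reads the same byte via page 5
seen_by_alias = load(table_a, 2 * PAGE_SIZE + 4)  # A reads it via its second mapping

store(table_b, 0 * PAGE_SIZE + 4, 99)          # B writes to a page it does not share
private_in_a = load(table_a, 0 * PAGE_SIZE + 4)   # A's page 0 is unaffected
```

The write through A's page 1 is visible through B's page 5 and through A's own page 2, because all three entries name frame 11, while the unshared pages remain isolated from each other.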