Systems Programming

Lecture 6: Memory Mapping

Stefano Zacchiroli


Laboratoire PPS, Université Paris Diderot 2013–2014

URL http://upsilon.cc/zack/teaching/1314/progsyst/

License Creative Commons Attribution-ShareAlike 3.0 Unported License

http://creativecommons.org/licenses/by-sa/3.0/

Outline

1 Memory mapping

2 Anonymous mapping

3 File mapping

4 mmap memory management


Virtual memory — redux

With virtual memory management (VMM) the OS adds an indirection layer between virtual memory pages and physical memory frames. The address space of a process is made of virtual memory pages, decoupling it from direct access to physical memory.

We have seen the main ingredients of VMM:

1. virtual pages, which form the processes’ virtual address space
2. physical frames
3. the page table, which maps pages of the resident set to frames

When a page p ∉ resident set is accessed, the VMM swaps it in from disk, swapping out a (likely) unneeded page.

[Figure: TLPI, Figure 6-2]

Backing store

Definition (datum, backing store)

In memory cache arrangements:

- a datum is an entry of the memory we want to access, passing through the cache
- the backing store is the (slow) memory from which a datum can be retrieved when it cannot be found in the (fast) cache, i.e. when a cache miss happens

On UNIX VMM, the ultimate backing store of virtual memory pages is usually the set of non-resident pages available on disk (pages that have been swapped out in the past, or never swapped in).

Memory mapping

A UNIX memory mapping is a virtual memory area that has an extra backing store layer, which points to an external page store.

Processes can manipulate their memory mappings: request new mappings, resize or delete existing ones, flush them to their backing store, etc.

Alternative intuition

memory mappings are dynamically allocated memory with peculiar read/write rules

Memory mapping types

Memory mappings can be of two different types, depending on the type of backing store they rely upon.

1. an anonymous mapping maps a memory region to a fresh “fake” memory area filled with 0
   - backing store = zeroed memory area
2. a file mapping maps a memory region to a file region
   - backing store = file
   - as long as the mapping is established, the content of the file can be read from or written to using direct memory access (“as if they were variables”)

Page sharing

Thanks to VMM, different processes can share mapped pages. More precisely: mapped pages in different processes can refer to physical memory pages that have the same backing store.

That can happen for 2 reasons:

1. through fork, as memory mappings are inherited by children
   - only between related processes
2. when multiple processes map the same region of a file
   - between any processes

Shared vs private mappings

With mapped pages in common, the involved processes might see changes performed by others to common pages, depending on whether the mapping is:

private: modifications are not visible to other processes. Pages are initially the same, but modifications are not shared, as happens with copy-on-write memory after fork. Private mappings are also known as copy-on-write mappings.

shared: modifications to common pages are visible to all involved processes, i.e. pages are not copied-on-write.

Memory mapping zoo

Summarizing:

1. memory mappings can have “zero” blocks or files as backing store
2. memory mappings can be private or shared

4 different memory mapping flavors ensue:

    visibility / store   file mapping            anon. mapping
    private              private file mapping    private anon. mapping
    shared               shared file mapping     shared anon. mapping

Each of them is useful for a range of different use cases.

mmap

The mmap syscall is used to establish memory mappings in the address space of the calling process:

    #include <sys/mman.h>

    void *mmap(void *addr, size_t length, int prot, int flags,
               int fd, off_t offset);

Returns: starting address of the mapping if OK, MAP_FAILED on error

The mapping specification is given by:

- length, which specifies the length of the desired mapping
- flags, a bit mask of flags that include:

      MAP_PRIVATE     request a private mapping
      MAP_SHARED      request a shared mapping
      MAP_ANONYMOUS   request an anonymous mapping

  - MAP_ANONYMOUS gives an anonymous mapping
    - fd must be -1 for anonymous mappings
  - exactly one of MAP_PRIVATE, MAP_SHARED must be specified

mmap (cont.)

addr gives an address hint about where, in the process address space, the new mapping should be placed. It is just a hint and it is seldom used. To not provide one, pass NULL.

For file mappings, the mapped file region is given by:

- fd: file descriptor pointing to the desired backing file
- offset: absolute offset pointing to the beginning of the file region that should be mapped
  - the end is given implicitly by length
  - to map the entire file, use offset == 0 and length == stat().st_size

mmap (cont.)

The desired memory protection for the requested mapping must be given via prot, which is a bitwise OR of:

    PROT_READ    pages may be read
    PROT_WRITE   pages may be written
    PROT_EXEC    pages may be executed
    PROT_NONE    pages may not be accessed at all

Either PROT_NONE or a combination of the others must be given.

What can PROT_NONE be used for?

PROT_NONE use case: put memory fences around memory areas that we do not want to be trespassed inadvertently.
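The memory fence idea can be sketched as follows. This is only an illustration, not code from the course: guarded_alloc is a hypothetical helper, and it relies on mprotect (a real syscall, not covered in these slides) to open up the inner pages of a range that was first mapped with PROT_NONE.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reserve npages usable pages surrounded by one
 * inaccessible (PROT_NONE) guard page on each side. Returns a pointer to
 * the first usable byte, or NULL on error. */
void *guarded_alloc(size_t npages)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || npages == 0)
        return NULL;
    size_t page = (size_t) ps;
    size_t total = (npages + 2) * page;

    /* Reserve the whole range with no access rights at all. */
    char *base = mmap(NULL, total, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Open up only the inner pages; the first and last stay PROT_NONE. */
    if (mprotect(base + page, npages * page, PROT_READ | PROT_WRITE) < 0) {
        munmap(base, total);
        return NULL;
    }
    return base + page;
}
```

Any access just before or just after the returned area hits a PROT_NONE page, so a buffer overrun is reported as a memory fault instead of silently corrupting a neighbouring mapping.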


Private anonymous mapping (#1)

Effects:

- each request of a new private anonymous mapping gives a fresh memory area that shares no pages with other mappings
- obtained memory is initialized to zero
- child processes will inherit private anonymous mappings, but copy-on-write will ensure that changes remain process-local
  - and that page copies are minimized

Use case

- allocation of initialized memory, similar to calloc
  - in fact, malloc implementations often use mmap to allocate large memory chunks

Note: on several UNIX-es, anonymous mappings (both private and shared) can alternatively be obtained by file mapping /dev/zero.

mmap allocation: example (library)

    #include <errno.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include "helpers.h"

    struct list {
        int val;
        struct list *next;
    };

    static struct list *list_bot;   /* first cell of the mapped area */
    static struct list *list_top;   /* next free cell */
    static long list_siz;           /* capacity, in cells */

    /* map a zero-filled area large enough for len cells */
    int list_init(long len) {
        list_top = (struct list *)
            mmap(NULL, len * sizeof(struct list),
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (list_top == MAP_FAILED)
            return -1;
        list_bot = list_top;
        list_siz = len;
        printf("list_init: top=%p, len=%ld\n", (void *) list_top, len);
        return 0;
    }

    /* hand out the next free cell */
    struct list *list_alloc1(void) {
        long siz = list_top - list_bot;

        if (siz >= list_siz) {
            errno = ENOMEM;
            return NULL;
        }
        list_top->next = NULL;
        printf("allocated %p (length: %ld)\n", (void *) list_top, siz + 1);
        return list_top++;
    }

    struct list *list_free(void) {
        /* left as an exercise */
        return NULL;
    }

    struct list *list_add(struct list *l, int val) {
        struct list *elt;

        if ((elt = list_alloc1()) == NULL)
            return NULL;
        elt->val = val;
        elt->next = l;
        return elt;
    }

    void list_visit(const char *label, struct list *l) {
        printf("[%s] visit list: ", label);
        while (l != NULL) {
            printf("%d ", l->val);
            l = l->next;
        }
        printf("\n");
    }
    /* mmap-list.h */

mmap allocation: example

    #include <stdlib.h>
    #include "mmap-list.h"

    int main(void) {
        struct list *l = NULL;
        pid_t pid;

        if (list_init(1000) < 0)
            err_sys("list_init error");
        if ((l = list_add(l, 42)) == NULL
            || (l = list_add(l, 17)) == NULL
            || (l = list_add(l, 13)) == NULL)
            err_sys("list_add");
        list_visit("common", l);

        if ((pid = fork()) < 0)
            err_sys("fork error");
        if (pid > 0) {  /* parent */
            l = list_add(l, 7);
            list_visit("parent", l);
        } else {        /* child */
            l = list_add(l, 5);
            list_visit("child", l);
        }
        exit(EXIT_SUCCESS);
    }
    /* mmap-list.c */

mmap allocation: example (cont.)

Demo

Notes:

- proof of concept example of ad hoc memory management
  - . . . the importance of being libc!
- copy-on-write ensures that processes do not see each other’s changes
- virtual memory addresses are preserved through fork

Shared anonymous mapping (#2)

Effects:

- as with private anonymous mappings: each request gives a fresh area of initialized memory
- the key difference is that now pages are not copied-on-write
  - if the virtual pages become shared among multiple processes, the underlying physical memory frames become shared as well
  - note: in this case, the only way for this to happen is via fork inheritance

Use case

- Interprocess communication
  - data-transfer (not byte stream)
  - among related processes
  - with process persistence

mmap-based IPC: example

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include "helpers.h"

    int main(void) {
        int *addr;

        addr = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED)
            err_sys("mmap error");
        *addr = 42;

        switch (fork()) {
        case -1:
            err_sys("fork error");
            break;
        case 0:     /* child */
            printf("child: %d\n", *addr);
            (*addr)++;
            break;
        default:    /* parent */
            if (wait(NULL) == -1)
                err_sys("wait error");
            printf("parent: %d\n", *addr);
        }
        exit(EXIT_SUCCESS);
    }
    /* End of mmap-ipc.c */

mmap-based IPC: example (cont.)

Demo

Notes:

- the mapping is inherited through fork
- it is used as a data transfer facility from child to parent
- as with all shared memory solutions, synchronization is mandatory
  - in this case it is achieved via wait


mmap-ing files: example

    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include "helpers.h"

    int main(int argc, char **argv) {
        int fd;
        struct stat finfo;
        void *buf;

        if (argc != 2)
            err_quit("Usage: mmap-cat FILE");
        if ((fd = open(argv[1], O_RDONLY)) < 0)
            err_sys("open error");
        if (fstat(fd, &finfo) < 0)
            err_sys("fstat error");

        buf = mmap(NULL, finfo.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (buf == MAP_FAILED)
            err_sys("mmap error");
        if (write(STDOUT_FILENO, buf, finfo.st_size) != finfo.st_size)
            err_sys("write error");
        exit(EXIT_SUCCESS);
    }
    /* mmap-cat.c */

mmap-ing files: example (cont.)

Demo

Notes:

- buf is an ordinary memory buffer
- which, thanks to mmap, simply ends up having a file region as its backing store

File mappings

File mapping recipe

1. fd = open(/* .. */);
2. addr = mmap(/* .. */, fd, /* .. */);

Once the file mapping is established, access to the (mapped region of the) underlying file does not need to happen via fd anymore. However, the FD might still be useful for other actions that can still be performed only via FDs:

- changing file size
- file locking
- fsync / fdatasync
- . . .

Thanks to file pervasiveness, we can use file mapping on device files, e.g.: disk device files, /dev/mem, . . . (not all devices support it)

File mapping and memory layout

All kinds of memory mappings are placed in the large memory area in between the heap and stack segments.

The return value of mmap points to the start of the created memory mapping (i.e. its lowest address) and the mapping grows above it (i.e. towards higher addresses).

For file mappings, the mapped file region starts at offset and is length bytes long. The mapped region in memory will have the same size (modulo alignment issues. . . ).

[Figure: APUE, Figure 14.31]

File mapping and memory protection

The requested memory protection (prot, flags) must be compatible with the file descriptor permissions (O_RDONLY, etc.).

- FD must always be open for reading
- if PROT_WRITE and MAP_SHARED are given, the file must be open for writing
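The compatibility rules above can be probed with a small experiment. The sketch below is not part of the course material: try_map is a hypothetical helper that opens a file with the given flags, attempts the mapping, and reports the errno value of a failed mmap. On Linux, and per POSIX, a shared writable mapping of a descriptor opened O_RDONLY is expected to fail with EACCES.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: try to map the first len bytes of path, opened
 * with open_flags, using the given prot and flags. Returns 0 if mmap
 * succeeds, or the errno value it failed with. */
int try_map(const char *path, int open_flags, size_t len, int prot, int flags)
{
    int fd = open(path, open_flags);
    if (fd < 0)
        return errno;

    int rc = 0;
    void *addr = mmap(NULL, len, prot, flags, fd, 0);
    if (addr == MAP_FAILED)
        rc = errno;
    else
        munmap(addr, len);

    close(fd);
    return rc;
}
```

Note that PROT_WRITE together with MAP_PRIVATE on a read-only descriptor is fine: the changes are copy-on-write and never reach the file.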

Private file mapping (#3)

Effects:

- the content of the mapping is initialized from file by reading the corresponding file region
- subsequent modifications are handled via copy-on-write
  - they are invisible to other processes
  - they are not saved to the backing file

Use cases

1. Initializing process segments from the corresponding sections of a binary executable
   - initialization of the text segment from program instructions
   - initialization of the data segment from binary data

   Either way, we don’t want runtime changes to be saved to disk. This use case is implemented in the dynamic loader/linker and rarely needed in other programs.

2. Simplifying program input logic
   - instead of a big loop at startup time to read input, one mmap call and you’re done
   - runtime changes won’t be saved back, though

Private file mapping: example (segment init.)

A real-life example can be found in glibc’s ELF dynamic loading code.

Demo

Private file mapping: example (input logic)

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include "helpers.h"

    int main(int argc, char **argv) {
        int fd;
        struct stat finfo;
        void *fmap;
        char *match;

        if (argc != 3)
            err_quit("Usage: grep-substring STRING FILE");
        if ((fd = open(argv[2], O_RDONLY)) < 0)
            err_sys("open error");
        if (fstat(fd, &finfo) < 0)
            err_sys("fstat error");

        /* "input" all file at once */
        fmap = mmap(NULL, finfo.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (fmap == MAP_FAILED)
            err_sys("mmap error");
        /* caveat: strstr expects a NUL-terminated string; the mapping
         * only provides one when the file size is not a multiple of the
         * page size (the rest of the last page reads as 0) */
        match = strstr((char *) fmap, argv[1]);
        printf("string%s found\n", match == NULL ? " not" : "");
        exit(match == NULL ? EXIT_FAILURE : EXIT_SUCCESS);
    }
    /* grep-substring.c */

Private file mapping: example (input logic) (cont.)

Demo

Notes:

- example similar to mmap-cat.c, but here we actually profit from the memory interface
- thanks to the byte array abstraction we can easily look for a substring with strstr without risking that our target gets split across multiple BUFSIZ chunks
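One caveat with strstr on a mapping is that it relies on a terminating NUL byte, which a mapped file does not necessarily contain. A length-bounded search avoids the assumption. The following is a minimal sketch; find_bytes is a hypothetical helper, not part of the course code, built only on the standard memchr and memcmp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the first occurrence of needle (nlen bytes) in
 * hay (hlen bytes). Neither buffer needs to be NUL-terminated. Returns a
 * pointer to the match, or NULL if there is none. */
const char *find_bytes(const char *hay, size_t hlen,
                       const char *needle, size_t nlen)
{
    if (nlen == 0)
        return hay;
    if (nlen > hlen)
        return NULL;

    const char *p = hay;
    const char *last = hay + (hlen - nlen);   /* last possible start */
    while (p <= last) {
        /* jump to the next byte equal to the first byte of needle */
        p = memchr(p, needle[0], (size_t) (last - p) + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, needle, nlen) == 0)
            return p;
        p++;
    }
    return NULL;
}
```

With a file mapping, hay would be the address returned by mmap and hlen the st_size obtained from fstat.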

Shared file mapping (#4)

Effects:

1. processes mapping the same region of a file share memory frames
   - more precisely: they have virtual memory pages that map to the same physical memory frames
2. additionally, the involved frames have the same mapped file as ultimate backing store
   - i.e. modifications to the (shared) frames are saved to the mapped file on disk

Note: with shared file mappings you can’t have one effect (e.g. sharing memory) without the other (e.g. file system persistence).

[Figure: TLPI, Figure 49-2. The page tables of process A and process B each hold entries for the mapped region; the mapped pages live in physical memory and correspond to the mapped region of the open file, with I/O managed by the kernel.]

Shared file mapping (#4) (cont.)

Use cases

1. memory-mapped I/O, as an alternative to read/write
   - as in the case of private file mapping
   - here it works for both reading and writing data
2. interprocess communication, with the following characteristics:
   - data-transfer
   - among unrelated processes (no longer only related ones)
   - with filesystem persistence (no longer process persistence)

Memory-mapped I/O

Given that:

1. memory content is initialized from file
2. changes to memory are reflected to file

we can perform I/O by simply changing memory bytes.

Access to file mappings is less intuitive than sequential read/write operations:

- the mental model is that of working on your data as a huge byte array (which is what memory is, after all)
- a best practice to follow is that of defining struct-s that correspond to elements stored in the mapping, and copying them around with memcpy & co
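The struct-plus-memcpy practice could look like the sketch below. All names here (struct record, record_put, record_get, records_map) are made up for illustration; the layout of fixed-size records stored back to back is an assumption, not something prescribed by the course.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record type stored back to back in the mapped file. */
struct record {
    long id;
    char name[24];
};

/* Copy rec into slot idx of a mapping that starts at base. */
void record_put(void *base, size_t idx, const struct record *rec)
{
    memcpy((char *) base + idx * sizeof(*rec), rec, sizeof(*rec));
}

/* Copy slot idx of the mapping into rec. */
void record_get(const void *base, size_t idx, struct record *rec)
{
    memcpy(rec, (const char *) base + idx * sizeof(*rec), sizeof(*rec));
}

/* Create (or truncate) path, size it for nrec records and map it shared.
 * Returns the mapping, or MAP_FAILED on error. */
void *records_map(const char *path, size_t nrec)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return MAP_FAILED;

    void *base = MAP_FAILED;
    if (ftruncate(fd, (off_t) (nrec * sizeof(struct record))) == 0)
        base = mmap(NULL, nrec * sizeof(struct record),
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after close */
    return base;
}
```

Copying whole structs in and out keeps every access within a known slot and sidesteps alignment worries about where a record happens to sit in the mapping.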

Memory-mapped I/O: example

Consider again the need of distributing unique sequential identifiers. We would like to implement it without a central server.

To that end, we will use memory-mapped I/O, and store the current “server” state to a file that is mmap-ed by “clients”.

Memory-mapped I/O: example (protocol)

    #include <string.h>
    #include <time.h>

    #define DB_FILE   "counter.data"
    #define MAGIC     "42"
    #define MAGIC_SIZ sizeof(MAGIC)

    struct glob_id {
        char magic[3];  /* magic string "42\0" */
        time_t ts;      /* last modification timestamp */
        long val;       /* global counter value */
    };

Memory-mapped I/O: example (library)

    /* return 0 if the magic string matches, -1 otherwise */
    int glob_id_verify_magic(int fd, struct glob_id *id) {
        (void) fd;      /* unused */
        return strncmp(id->magic, MAGIC, MAGIC_SIZ) == 0 ? 0 : -1;
    }

    void glob_id_write(struct glob_id *id, long val) {
        memcpy(id->magic, MAGIC, MAGIC_SIZ);
        id->ts = time(NULL);
        id->val = val;
    }
    /* mmap-uid-common.h */

Memory-mapped I/O: example (DB init/reset)

    #include "mmap-uid-common.h"

    int main(void) {
        int fd;
        struct stat finfo;
        struct glob_id *id;

        if ((fd = open(DB_FILE, O_RDWR | O_CREAT | O_TRUNC,
                       S_IRUSR | S_IWUSR)) < 0)
            err_sys("open error");
        if (ftruncate(fd, sizeof(struct glob_id)) < 0)
            err_sys("ftruncate error");
        if (fstat(fd, &finfo) < 0)
            err_sys("fstat error");

        id = (struct glob_id *) mmap(NULL, finfo.st_size,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (id == MAP_FAILED)
            err_sys("mmap error");
        glob_id_write(id, (long) 0);
        exit(EXIT_SUCCESS);
    }
    /* mmap-uid-init.c */

Memory-mapped I/O: example (client)

    #include "mmap-uid-common.h"

    int main(void) {
        int fd;
        struct stat finfo;
        struct glob_id *id;

        if ((fd = open(DB_FILE, O_RDWR)) < 0)
            err_sys("open error");
        if (fstat(fd, &finfo) < 0)
            err_sys("fstat error");

        id = (struct glob_id *) mmap(NULL, finfo.st_size,
                PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (id == MAP_FAILED)
            err_sys("mmap error");

        printf("checking magic number...\n");
        if (glob_id_verify_magic(fd, id) < 0) {
            printf("invalid magic number: abort.\n");
            exit(EXIT_FAILURE);
        }
        printf("got id: %ld\n", id->val);
        glob_id_write(id, id->val + 1);
        exit(EXIT_SUCCESS);
    }
    /* mmap-uid-get.c */

Memory-mapped I/O: example (cont.)

Demo

Notes:

- glob_id includes a magic string for verification purposes
- the file might be larger than you expected, due to padding for memory alignment reasons
- we keep the FD around
  - to resize the file upon opening (no, lseek is not enough to change file size)
- I/O logic is trivial (in fact: non-existent)
- any bug?
  - serious synchronization problem when concurrent clients try to get their identifiers
  - come back next week. . .

Memory-mapped I/O: advantages

- performance gain: 1 memory copy
  - with read/write I/O each action involves 2 memory copies: 1 between user-space and kernel buffers + 1 between kernel buffers and the I/O device
  - with memory-mapped I/O only the 2nd copy remains
  - flash exercise: how many copies for standard I/O?
- performance gain: no context switch
  - no syscall and no context switch is involved in accessing mapped memory
  - page faults are possible, though
- reduced memory usage
  - we avoid user-space buffers, using kernel-space buffers directly → less memory needed
  - if the memory mapped region is shared, we use only one set of buffers for all processes
- seeking is simplified
  - no need of explicit lseek, just pointer manipulation

Memory-mapped I/O: disadvantages

- memory garbage
  - the size of mapped regions is a multiple of system page size
  - mapping regions which are way smaller than that can result in a significant waste of memory
- memory mapping must fit in the process address space
  - on 32-bit systems, a large number of mappings of various sizes might result in memory fragmentation
  - it then becomes harder to find contiguous (virtual) space to grant large memory mappings
  - the problem is substantially diminished on 64-bit systems
- there is kernel overhead in maintaining mappings
  - for small mappings, the overhead can dominate the advantages
  - memory mapped I/O is best used with large files and random access


munmap

The converse action of mmap, unmapping, is performed by munmap:

    int munmap(void *addr, size_t length);

Returns: 0 if OK, -1 otherwise

The memory area between the addresses addr and addr+length will be unmapped as a result of munmap. Accessing it after a successful munmap will (very likely) result in a segmentation fault. Usually, an entire mapping is unmapped, e.g.:

    if ((addr = mmap(NULL, length, /* ... */)) == MAP_FAILED)
        err_sys("mmap error");

    /* access memory mapped region via addr */

    if (munmap(addr, length) < 0)
        err_sys("munmap error");

munmap on region (sub-)multiples

. . . but munmap does not force you to unmap entire regions:

1. unmapping a region that contains no mapped page will have no effect and return success
2. unmapping a region that spans several mappings will unmap all contained mappings
   - and ignore non mapped areas, as per previous point
3. unmapping only part of an existing mapping will
   - either reduce mapping size, if the unmapped part is close to one edge of the existing mapping;
   - or split it in two, if the unmapped part is in the middle of the existing mapping
     - i.e. you can use munmap to “punch holes” into existing mappings
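Hole punching can be tried out with a three-page anonymous mapping. The sketch below is illustrative only: punch_hole_demo and page_is_mapped are hypothetical names, and the mapped/unmapped probe assumes that msync fails with ENOMEM on pages that are not mapped, as POSIX describes.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if the page starting at addr is currently mapped, 0 otherwise.
 * Assumption: msync fails with ENOMEM on unmapped pages. */
int page_is_mapped(void *addr, size_t page)
{
    return !(msync(addr, page, MS_ASYNC) < 0 && errno == ENOMEM);
}

/* Map three anonymous pages, unmap the middle one, and check that the two
 * outer pages survive as separate mappings. Returns 0 if all checks pass. */
int punch_hole_demo(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    size_t page = (size_t) ps;

    char *base = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    if (munmap(base + page, page) < 0)      /* punch the hole */
        return -1;

    base[0] = 'a';                          /* first page still usable */
    base[2 * page] = 'z';                   /* third page still usable */

    int ok = page_is_mapped(base, page)
          && !page_is_mapped(base + page, page)
          && page_is_mapped(base + 2 * page, page);

    /* Unmapping the whole original range works: the hole is ignored. */
    if (munmap(base, 3 * page) < 0)
        return -1;
    return ok ? 0 : -1;
}
```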

Memory alignment

The notion of memory alignment refers to the relation between memory addresses as used by programs and memory addresses as used by the hardware.

A variable that is located at a memory address which is a multiple of the variable size is said to be naturally aligned, e.g.:

- a 32 bit (4 bytes) variable is naturally aligned if it is located at an address which is a multiple of 4
- a 64 bit (8 bytes) variable is naturally aligned if it is located at an address which is a multiple of 8

All memory allocated “properly” (i.e. via the POSIX APIs through functions like malloc, calloc, mmap, . . . ) is naturally aligned. The risks of unaligned memory access depend on the hardware:

- it might result in traps, and hence in signals that kill the process
- it might work fine, but incur performance penalties

Portable applications should avoid unaligned memory access.
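Natural alignment is easy to check by looking at the numeric value of an address. A minimal sketch, where is_aligned is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return 1 if p is located at an address that is a multiple of size. */
int is_aligned(const void *p, size_t size)
{
    return size != 0 && (uintptr_t) p % size == 0;
}
```

For instance, a long obtained from malloc satisfies is_aligned(p, sizeof(long)), while the address one byte further does not satisfy is_aligned for a 4-byte variable.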

posix_memalign

The notion of alignment is more general than natural alignment. We might want to allocate memory aligned to chunks larger than the variable size.

POSIX.1d standardized the posix_memalign function to allocate heap memory aligned to arbitrary boundaries:

    #include <stdlib.h>

    int posix_memalign(void **memptr, size_t alignment, size_t size);

Returns: 0 if OK; EINVAL or ENOMEM on error

- we request an allocation of size bytes. . .
- . . . aligned to a memory address that is a multiple of alignment
  - alignment must be a power of 2 and a multiple of sizeof(void *)
- on success, memptr will be filled with a pointer to freshly allocated memory; otherwise the return value is either EINVAL (conditions on alignment not respected) or ENOMEM (not enough memory to satisfy request)
  - note: the above are return values, errno is not set
- allocated memory should be freed with free

posix_memalign: example

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include "helpers.h"

    int main(int argc, char **argv) {
        int rc;
        void *mem;

        if (argc < 3)
            err_quit("Usage: memalign SIZE ALIGNMENT");
        if ((rc = posix_memalign(&mem,
                        atoi(argv[2]), atoi(argv[1]))) != 0) {
            printf("posix_memalign error: %s\n", strerror(rc));
            exit(EXIT_FAILURE);
        }
        printf("address: %ld (%p)\n", (long) mem, mem);
        exit(EXIT_SUCCESS);
    }
    /* memalign.c */

posix_memalign: example (cont.)

Demo

Page alignment

As we have seen, the size of memory pages defines the granularity at which virtual memory operates.

A memory page is the smallest chunk of memory that can have distinct behavior from other chunks.

- swap-in / swap-out is defined at page granularity
- ditto for memory permission, backing store, etc.

mmap operates at page granularity:

- each mapping is composed of a discrete number of pages
- new mappings are returned page aligned

Determining page size

There are various ways to determine the page size.

A non-portable way of determining page size at compile-time is to rely on implementation-specific constants, such as Linux’s PAGE_SIZE, which is defined in <asm/page.h>.

As it might change between compilation and execution, it is better to determine the page size at runtime. To determine that and many other limits at runtime, POSIX offers sysconf:

    #include <unistd.h>

    long sysconf(int name);

Returns: the requested limit if name is valid; -1 otherwise

The sysconf name to determine page size is _SC_PAGESIZE, which is measured in bytes.

Determining page size: example

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
        exit(EXIT_SUCCESS);
    }
    /* pagesize.c */

On a Linux x86, 64 bits system:

    $ ./pagesize
    page size: 4096 bytes
    $

mmap and page alignment

For maximum portability, mmap’s arguments addr and offset must be page aligned. [1]

Exercise

Note that offset is the file offset. Contrary to addr it does not refer directly to memory. Why should it be page aligned then? Why can’t we map unaligned file regions to aligned memory regions?

In the most common case, the requirement is trivial to satisfy:

    addr == NULL     no address hint, do as you please
    offset == 0      map from the beginning of the file

[1] SUSv3 mandates page-alignment. SUSv4 states that implementations can decide whether it’s mandatory or not. The bottom line is the same: for maximum portability, be page-aligned.
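When the file region of interest does not start at a page boundary, a common workaround is to round the offset down to the previous page boundary and skip the extra bytes in memory. The sketch below illustrates the idea; struct region, region_map and region_unmap are hypothetical names introduced here, not part of the course code.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a mapping whose file offset was rounded
 * down to a page boundary. */
struct region {
    void *base;      /* what mmap returned (page aligned) */
    size_t maplen;   /* length passed to mmap, needed by munmap */
    char *data;      /* first byte the caller actually asked for */
};

/* Map, read-only, len bytes of the file at path starting at the (possibly
 * unaligned) offset. Returns 0 on success, -1 on error. */
int region_map(const char *path, off_t offset, size_t len, struct region *r)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || offset < 0 || len == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    off_t delta = offset % ps;          /* distance from the page start */
    r->maplen = len + (size_t) delta;
    r->base = mmap(NULL, r->maplen, PROT_READ, MAP_PRIVATE,
                   fd, offset - delta); /* page-aligned file offset */
    close(fd);
    if (r->base == MAP_FAILED)
        return -1;
    r->data = (char *) r->base + delta;
    return 0;
}

/* Release a region obtained from region_map. */
int region_unmap(struct region *r)
{
    return munmap(r->base, r->maplen);
}
```

The caller works with data, while base and maplen are kept only so that the whole mapping can be released later.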

File mapping and page alignment

Interactions among page alignment and mapping or file sizes might be tricky. Two cases deserve special attention.

1. requested mapping size < page size
   - as mappings are made of entire pages, the size of the mapping is rounded up to the next multiple of page size
   - access beyond the actual mapping boundary will result in SIGSEGV, killing the process by default

File mapping and page alignment (cont.)

2. mapping extends beyond EOF
   - due to an explicit request or to mapping size rounding, a mapping might extend past end of file
   - the remainder of the page is accessible, initialized to 0, and shared with other processes (for MAP_SHARED)
   - changes past EOF will not be written back to file
     - . . . until the file size changes by other means
   - if more entire pages are included in the mapping past EOF, accessing them will result in SIGBUS (as a warning)
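The rounding involved in both cases is plain integer arithmetic. A minimal sketch, with hypothetical helper names round_up_to_page and mapping_footprint:

```c
#include <stddef.h>
#include <unistd.h>

/* Round len up to the next multiple of page (page must be non-zero). */
size_t round_up_to_page(size_t len, size_t page)
{
    return (len + page - 1) / page * page;
}

/* Number of bytes of address space actually taken by a mapping of len
 * bytes on the running system, or 0 if the page size is unavailable. */
size_t mapping_footprint(size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? round_up_to_page(len, (size_t) ps) : 0;
}
```

With 4096-byte pages, a 100-byte mapping therefore occupies 4096 bytes, and a 4097-byte mapping occupies 8192.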

Memory synchronization

As an IPC facility, we use file mapping for filesystem-persistent data transfer. To that end, we need to be concerned about:

- interprocess synchronization
  - when are page changes made visible to other processes?

  easy: given that processes have virtual pages that point to the same frames, changes are immediately visible to all involved processes

- memory-file synchronization
  - when are modified pages written to file?
  - when are pages read from file?

  these questions are particularly relevant for applications that mix memory-mapped with read/write I/O

msync

The msync syscall can be used to control synchronization between memory pages and the underlying mapped file:

    int msync(void *addr, size_t length, int flags);

Returns: 0 if OK, -1 otherwise

- addr and length identify (part of) the mapping we want to sync
- flags is a bitwise OR of:

      MS_SYNC         request synchronous file write
      MS_ASYNC        request asynchronous file write
      MS_INVALIDATE   invalidate cached copies of mapped data

- addr must be page aligned; length will be rounded up

Memory → file synchronization

We can use msync to perform memory → file synchronization, i.e. to flush changes from the mapped memory to the underlying file.

- doing so ensures that applications read-ing the file will see changes performed on the memory mapped region

The degree of persistence guarantees offered by msync varies:

- with synchronous writes (MS_SYNC), msync will return only after the involved pages have been written to disk
  - i.e. the memory region is synced with disk
- with asynchronous writes (MS_ASYNC), msync ensures that subsequent read-s on the file will return fresh data, but only schedules disk writes without waiting for them to happen
  - i.e. the memory region is synced with kernel buffer cache

Intuition

    msync(..., MS_SYNC) ≈ msync(..., MS_ASYNC) + fdatasync(fd)
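A memory → file flush could be sketched as below. The function name write_and_sync is hypothetical, and the choice to resize the file to exactly len bytes is an assumption made to keep the example short.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Make the file at path hold exactly the len bytes of data, writing them
 * through a shared mapping and flushing synchronously.
 * Returns 0 on success, -1 on error. */
int write_and_sync(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int rc = -1;
    /* The file must be at least as long as the region we write to. */
    if (len > 0 && ftruncate(fd, (off_t) len) == 0) {
        char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (addr != MAP_FAILED) {
            memcpy(addr, data, len);    /* "I/O" is a plain memory copy */
            /* mmap returns a page-aligned address, as msync requires;
             * MS_SYNC blocks until the pages have been written out. */
            if (msync(addr, len, MS_SYNC) == 0)
                rc = 0;
            if (munmap(addr, len) < 0)
                rc = -1;
        }
    }
    close(fd);
    return rc;
}
```

Replacing MS_SYNC with MS_ASYNC would only schedule the write, which is enough for other processes that read the file but gives weaker persistence guarantees.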

File → memory synchronization

Regarding the initial loading, the behavior is implementation dependent.

The only (obvious) guarantee is that loading from file to memory will happen in between the mmap call and the 1st memory access.

- portable applications should not rely on any specific load timing
- on Linux, page loading is lazy and will happen at first page access, with page-by-page granularity

Subsequent loading (e.g. a process writes to a mapped file: when will the change be visible to processes mapping it?) can be controlled using msync’s MS_INVALIDATE flag.

- access to pages invalidated with MS_INVALIDATE will trigger a page reload from the mapped file

Unified virtual memory system

Several UNIX implementations provide Unified Virtual Memory (UVM) (sub)systems. With UVM, memory mappings and the kernel buffer cache share physical memory pages.

Therefore the implementation guarantees that the views of a file

1. as a memory-mapped region
2. and as a file accessed via I/O syscalls

are always coherent.

With UVM, MS_INVALIDATE is useless and the only useful use of msync is to flush data to disk, to ensure filesystem persistence. Whether a system offers UVM or not is an implementation-specific “detail”.

Linux implements UVM