Linux Lab 2008 - Workshop 4
Richard A. Hayden
Introduction
Part 4: Using the ramdisk as a swap device
Quite a technically-motivated part
Aim is to make some changes to allow your
driver to be used more effectively as a swap
device
Each 4096 byte chunk on your device will hold
Swap device
Nothing to stop use of device now for swapping
mkswap srd: gets the size of your device and writes some accounting info to the first block
swapon srd: makes the device available for swapping
Check everything is as expected (free command):
             total       used       free ...
Mem:        524288     245760     278528 ...
-/+ buffers/cache:     210062     314226 ...
Assumptions
From now on, device will be used only for swap. Introduces some simplifying assumptions:
All requests will be for complete 4096 (PAGE_SIZE)-byte
chunks, i.e. req->current_nr_sectors is always 8, and req->sector will always be a multiple of 8
Due to the above, ramdiskcheck and using the device with
a filesystem will no longer work
Thus you no longer appear to need the buffer page, but keep
it around — required for the memory allocator
Compressed ramdisk will be quite slow — can’t co-exist with
the OOM-killer, comment out the code in mm/vmscan.c
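The first assumption can be sketched as a small check that turns a request into a 4096-byte slot index. This is a user-space illustration only: the names srd_slot_for_request and the SRD_* constants are hypothetical, and in the driver the two arguments would come from req->sector and req->current_nr_sectors.

```c
#include <assert.h>

#define SRD_SECTOR_SIZE      512UL
#define SRD_PAGE_SIZE        4096UL
#define SRD_SECTORS_PER_PAGE (SRD_PAGE_SIZE / SRD_SECTOR_SIZE)

/* The slides rely on exactly 8 sectors per page-sized chunk. */
static_assert(SRD_SECTORS_PER_PAGE == 8, "expected 8 sectors per 4096-byte chunk");

/*
 * Hypothetical helper: returns the index of the 4096-byte slot that a
 * request covers, or -1 if the request breaks the swap-only assumptions
 * (not exactly one page long, or not aligned to a page boundary).
 */
static long srd_slot_for_request(unsigned long sector, unsigned long nr_sectors)
{
    if (nr_sectors != SRD_SECTORS_PER_PAGE)
        return -1;
    if (sector % SRD_SECTORS_PER_PAGE != 0)
        return -1;
    return (long)(sector / SRD_SECTORS_PER_PAGE);
}
```

A request that fails this check is one the swap-only driver was never meant to see, which is why ramdiskcheck and filesystem use stop working.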
Kernel configuration tool
Currently, your ramdisk compiles into the kernel
unconditionally
Best practice is to make it an optional component, as
with other drivers
Configuration stored in /usr/src/linux/.config. Format of each line is CONFIG_SOMEOPTION=X, where X is normally y, n or m, but can take integer values too
make menuconfig from kernel root directory presents a menu-based tool for selecting options: try it but don't save any changes (yet)
Adding configuration options
You need to edit drivers/block/Config.in to define new configuration options for your driver
Two options:
Boolean option (y or n) to say if included
Integer to specify amount of memory it uses
Running make menuconfig should now show your options: modify them (and only them), check .config is updated appropriately
May be told you need to run make dep again before
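A possible shape for the two entries in drivers/block/Config.in is sketched below. The option names CONFIG_BLK_DEV_SRD and CONFIG_BLK_DEV_SRD_SIZE, the prompt strings, the default of 4096 and the choice of kilobytes as the unit are all assumptions made for illustration; pick whatever names and units your driver actually uses.

```
bool 'Simple ramdisk (srd) support' CONFIG_BLK_DEV_SRD
if [ "$CONFIG_BLK_DEV_SRD" = "y" ]; then
   int '  srd size in kilobytes' CONFIG_BLK_DEV_SRD_SIZE 4096
fi
```

Wrapping the integer in the if block means the size prompt is only offered once the driver itself has been selected.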
Using them
Configuration options available in both Makefiles and C files
Modify Makefile appropriately to use the first option for
conditional compilation
Modify srd.c appropriately so that your driver takes its size
from the second option. Values are available as constants with the same name
Soon you will modify external kernel files to support your
driver; always wrap in the following style of conditional compilation block:
#ifdef CONFIG_SOMEOPTION
// Code to compile when CONFIG_SOMEOPTION enabled
#else
// Code to compile otherwise
#endif
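As a rough sketch of taking the size from the second option, the fragment below derives the device size and slot count from a configuration constant. The name CONFIG_BLK_DEV_SRD_SIZE, its unit (kilobytes) and the fallback value are assumptions; in a real build the constant is supplied by the kernel configuration, and the fallback here exists only so the fragment compiles on its own.

```c
#include <assert.h>

/* Hypothetical option name; normally defined by the kernel configuration. */
#ifndef CONFIG_BLK_DEV_SRD_SIZE
#define CONFIG_BLK_DEV_SRD_SIZE 4096   /* assumed unit: kilobytes */
#endif

#define SRD_PAGE_SIZE 4096UL

/* A swap device must be a whole number of 4096-byte chunks. */
static_assert((CONFIG_BLK_DEV_SRD_SIZE * 1024UL) % SRD_PAGE_SIZE == 0,
              "srd size must be a multiple of 4096 bytes");

/* Device size in bytes, taken from the configuration constant. */
static unsigned long srd_size_bytes(void)
{
    return CONFIG_BLK_DEV_SRD_SIZE * 1024UL;
}

/* Number of 4096-byte slots the device holds. */
static unsigned long srd_nr_slots(void)
{
    return srd_size_bytes() / SRD_PAGE_SIZE;
}
```

On the Makefile side, 2.4 driver Makefiles usually list objects in the form obj-$(CONFIG_SOMEOPTION) += file.o, so the object is only built when the option is set.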
Overstating your size
Linux VM sub-system demands that swap devices declare a
fixed size (a multiple of 4096-bytes)
Problematic for our device as ‘capacity’ changes dynamically
as a result of how successful compression is
Cannot easily modify VM sub-system to cope with such
dynamic devices, so we hack
We overstate our device's size (make sure ioctl does too)
The average compression ratio would probably be the best value for this, though it might vary depending on data in memory
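One way to compute the overstated size is sketched below. The helper name srd_advertised_bytes and the idea of expressing the exaggeration factor as a percentage (100 meaning no exaggeration) are assumptions for illustration; the only requirement taken from the slides is that the result stays a multiple of 4096 bytes.

```c
#include <assert.h>

#define SRD_PAGE_SIZE 4096UL

/*
 * Hypothetical helper: scales the real capacity by factor_percent
 * (100 = report the true size, 250 = report 2.5 times the true size)
 * and rounds down to a whole number of 4096-byte slots.
 */
static unsigned long srd_advertised_bytes(unsigned long actual_bytes,
                                          unsigned long factor_percent)
{
    unsigned long pages;

    assert(factor_percent >= 100);   /* never understate the device */
    pages = (actual_bytes / SRD_PAGE_SIZE) * factor_percent / 100;
    return pages * SRD_PAGE_SIZE;
}
```

Whatever value this produces is also the value the size ioctl should report, so that mkswap and the driver agree.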
What if we run out?
Currently, driver will fail — you will get I/O
requests you can’t honour
In the future, when we add memory allocation,
perhaps we’ll gain some space, but what if not
enough to honour exaggeration?
Need a way to tell kernel to try a different page
if device runs out of actual memory
Need to understand what happens in low memory situations
Low memory situations
shrink_cache() is invoked
shrink_cache() loops through all pages in inactive list
If the page is not mapped into any process’ VM, checks if it is
dirty
If it is dirty, clears the dirty flag and submits page to backing
device (could become dirty again during I/O if another process writes to it)
If page is not dirty, can reclaim the memory
If a threshold of mapped pages is reached during
shrink_cache(), invokes swap_out()
Unmaps pages from processes’ VMs, replacing physical page
Simple solution
So a simple solution, if we can’t honour a write request,
is not to do any I/O, but pretend we did and just set the
dirty flag again
This is not an error, so we still set the uptodate parameter to 1 when we call end_that_request_first()
This way, kernel thinks device is behaving correctly, but
that some other process dirtied the page — gives us the
semantics we want
Of course, there are better solutions if VM knew about
How to do it
The struct page * for the page corresponding to a request can be found at req->bh->b_page
Use the SetPageDirty() macro
For now, until you have your memory
allocator:
I/O within actual size: do as normal
I/O within exaggerated size, but not actual size: use the page dirty hack
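The two cases above can be sketched as a small decision function. This is a user-space model, not driver code: struct srd_limits, enum srd_action and srd_classify are hypothetical names. Treating a read in the exaggerated region as a failure is also an assumption, made because nothing was ever stored there.

```c
#include <assert.h>

/* What the request handler should do with one page-sized request. */
enum srd_action {
    SRD_DO_IO,         /* slot is backed by real memory: copy data as normal */
    SRD_REDIRTY_PAGE,  /* write we cannot honour: re-dirty page, report success */
    SRD_FAIL           /* outside the device, or a read of a slot never stored */
};

/* Hypothetical description of the real and the overstated capacity. */
struct srd_limits {
    unsigned long actual_slots;      /* slots backed by memory */
    unsigned long advertised_slots;  /* slots reported to the kernel */
};

static enum srd_action srd_classify(const struct srd_limits *lim,
                                    unsigned long sector, int is_write)
{
    unsigned long slot = sector / 8;   /* 8 sectors per 4096-byte slot */

    assert(lim->actual_slots <= lim->advertised_slots);
    if (slot >= lim->advertised_slots)
        return SRD_FAIL;
    if (slot < lim->actual_slots)
        return SRD_DO_IO;
    /* In the driver this branch would call SetPageDirty(req->bh->b_page)
     * and still complete the request with uptodate set to 1. */
    return is_write ? SRD_REDIRTY_PAGE : SRD_FAIL;
}
```

Keeping the decision separate from the data copy should make it easier to replace the fixed actual_slots limit once the memory allocator exists.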
Freeing swap slots
Normal swap devices do not know whether a
slot on the swap device is occupied or free,
they simply respond to I/O and let kernel
manage it
Doesn’t work for us. Due to our dynamic
nature, we will need to know when a slot is
freed so we can free the corresponding memory
(when we have the memory allocator at least)
Again, this is because VM doesn’t know about
our internal structure and again, we have to
make a minor hack
Making the VM tell us
Modify swap_entry_free(), called every time a swap slot is
released by one of its referrers
Decrements p->swap_map[offset] each time (p is a pointer to a
struct swap_info_struct)
When it reaches 0, no one is referring to that swap slot anymore, so
kernel updates its table to mark it as free
This is where you need to insert a call to a function within your
driver so it is also informed about this event
You won’t fill in this callback function till the next exercise, but test
with a printk()
Obviously, your callback needs external linkage and you need a
header file srd.h declaring it, which should be included into the appropriate kernel source file
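The hook described above can be modelled in user space as follows. This is a toy model of the reference counting only, not the kernel's swap_entry_free(): the array swap_map stands in for p->swap_map, and srd_slot_freed, model_swap_entry_free, CONFIG_BLK_DEV_SRD and the counters are hypothetical names. In the kernel the callback would be declared in srd.h and would start life as a printk().

```c
#include <assert.h>

/* Hypothetical option name; normally set by the kernel configuration. */
#define CONFIG_BLK_DEV_SRD 1

#define NR_SLOTS 16

/* Toy stand-in for p->swap_map: one reference count per swap slot. */
static unsigned short swap_map[NR_SLOTS];

/* Bookkeeping so the effect of the callback can be observed. */
static unsigned long srd_freed_count;
static unsigned long srd_last_freed;

/* Hypothetical driver callback with external linkage. */
void srd_slot_freed(unsigned long offset)
{
    srd_last_freed = offset;
    srd_freed_count++;
}

/* Drops one reference to a slot and returns the remaining count. */
static int model_swap_entry_free(unsigned long offset)
{
    assert(offset < NR_SLOTS && swap_map[offset] > 0);
    if (--swap_map[offset] == 0) {
#ifdef CONFIG_BLK_DEV_SRD
        /* Last reference gone: tell the driver the slot is free. */
        srd_slot_freed(offset);
#endif
    }
    return swap_map[offset];
}
```

The callback fires only on the transition to zero, so a slot shared by several referrers is reported to the driver exactly once.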
Assessment/testing
Test as a swap device, but until you do the
memory allocator, do not exaggerate size (set
factor to 1)
Test utility is swapcheck, execute swapoff; mkswap srd; swapon srd beforehand
Swapcheck modes
swapcheck has two modes:
Assessment mode: two parameters; device node and ‘y’ or ‘n’, ‘n’ for milestone 3
Runs for four minutes, processing large amounts of data; fails if it detects any corruption or crashes in any way, otherwise passes
Manual mode: two parameters; amount of memory to use and ‘y’ or ‘n’, ‘n’ for milestone 3
Runs indefinitely, processing the specified amount of data; fails if it detects any corruption or crashes in any way, otherwise passes
The ‘y’ and ‘n’ simply let swapcheck know if you have a memory