Abstract
The /proc file system (the process file system), also known as a pseudo-filesystem, is used as an interface to kernel data structures. It does not exist on disk: neither the /proc directory nor its subdirectories and files occupy any storage, and the kernel generates their contents when they are read. Most of the files in this special directory are read-only, but some expose kernel variables that can be changed. It is these files that we will talk about in this chapter of the book.
It is important to note that the /proc filesystem is structured in a hierarchy. Most of the entries in the /proc directory are decimal numbers, each corresponding to the ID of a process running on the system.
These entries are themselves subdirectories, and the files contained within each subdirectory give access to the state of that process. Have you ever wondered where you can see all the processes that the kernel handles and manages in the background of your system?
The answer is the /proc filesystem directory of Linux.
But the /proc filesystem doesn't handle only the process IDs of the system; it is also responsible for providing access to the state of the rest of the system. This information covers the CPU, devices, IDE, SCSI, interrupts, I/O ports, memory, modules, partitions, PCI information and much more. Just take a quick look inside your /proc filesystem directory to get an idea of the available features controlled by the kernel through the /proc filesystem. We can read the contents of these files to find out what processor, PCI devices, network cards, kernel version, partitions, etc. we have on our system.
As we said before, not all features available in the /proc filesystem are customizable; most are managed by the kernel, which does a good job with them, and should not require any modification. Some can, and need to be, changed and customized to better fit your system resources and to increase security. It is those customizable features, related to the performance and security of the Linux system under the /proc filesystem, that we will explain and customize in this chapter.
This is possible with the /etc/sysctl.conf file, which contains values that change the default parameters of customizable features in the /proc filesystem. To recap, sysctl.conf is the configuration file read by sysctl(8), an interface that allows you to make changes to a running Linux system. We use sysctl.conf to talk to the kernel and say, for example: hey, I need more power on the virtual memory, please change your value to this value.
Throughout this chapter, we'll often use it to customize our /proc filesystem on Linux to make better use of the resources, power and security of our particular machine. Remember that everyone has a different computer with different hardware and settings, and this is why changing some default customizable values in the /proc directory can make a difference in security and speed.
In this chapter, we will talk about the customizable parameters available under the /proc/sys directory, since most changeable parameters are located under this directory. We will talk about virtual memory, the file system, and TCP/IP stack security and performance.
What is sysctl?
sysctl is an interface that allows you to make changes to a running Linux system. It serves two functions: to read and to modify system settings.
• To view all readable variables, use the following command:
[root@deep /]# sysctl -a
• To read a particular variable, for example, fs.file-max, use the following command:
[root@deep /]# sysctl fs.file-max
fs.file-max = 8192
• To set a particular variable, for example, fs.file-max, use the following command:
[root@deep /]# sysctl -w fs.file-max=5536
fs.file-max = 5536
Settings of sysctl variables are usually strings, numbers, or booleans (a boolean being 1 for yes or 0 for no). If you set or change a variable manually with the sysctl command as shown above, your changes will not persist across the next reboot of the system. For this reason, we will show you further down in this chapter how to make your changes permanent across reboots of the server by using the /etc/sysctl.conf file.
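Each sysctl variable corresponds to a file under /proc/sys: the dots in the variable name become directory separators, so fs.file-max is the file /proc/sys/fs/file-max. The following is a minimal sketch of that mapping. The function name sysctl_to_path is our own invention for illustration and is not part of the sysctl tool.

```shell
# Hypothetical helper (not part of sysctl): map a sysctl variable name
# to the file that backs it under /proc/sys.
sysctl_to_path() {
    # Replace every dot in the variable name with a slash.
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_to_path fs.file-max      # prints /proc/sys/fs/file-max
sysctl_to_path vm.bdflush       # prints /proc/sys/vm/bdflush
```

Reading the resulting file with cat should show the same value that the sysctl command reports for the variable.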
/proc/sys/vm: The virtual memory subsystem of Linux
All parameters described in this chapter reside under the /proc/sys/vm directory of the server and can be used to tune the operation of the virtual memory (VM) subsystem of the Linux kernel.
Be very careful when attempting this. You can optimize your system, but you can also cause it to crash. Since every system is different, you'll probably want some control over this piece of the system.
Finally, these are advanced settings; if you don't understand them, then don't play in this area or try to use all the examples below on your system. Remember that all systems are different and require different settings and customizations. The majority of the following hacks will work fine on a server with 512MB of RAM or more, or with a minimum of 256MB of RAM. Below this amount of memory, nothing is guaranteed and the default settings will be just fine for you.
Next I’ll show you parameters that can be optimized. All suggestions I make in this section are valid for all kinds of servers. The only difference depends on the amount of RAM your machine has and this is where the settings will change.
The above figure shows a snapshot of /proc/sys/vm directory on an OpenNA Linux & Red Hat Linux system running kernel version 2.4. Please note that this picture may look different on your system.
Process file system management 0 CHAPTER 7
The bdflush parameters:
The bdflush file is closely related to the operation of the virtual memory (VM) subsystem of the Linux kernel and has a little influence on disk usage. The file /proc/sys/vm/bdflush controls the operation of the bdflush kernel daemon. We generally tune this file to improve file system performance.
By changing some values from the defaults shown below, the system seems more responsive; e.g. it waits a little longer to write to disk and thus avoids some disk access contention. The bdflush file currently contains 9 integer values, of which 4 are actually used by the kernel. Only the first, fifth, sixth and seventh parameters are used by the kernel for bdflush setup; the other parameters are unused (dummy) fields.
Parameter 1 (nfract):
The bdflush parameter 1 governs the maximum share of the buffer cache that dirty buffers may occupy, expressed as a percentage. Dirty means that the contents of the buffer still have to be written to disk (as opposed to a clean buffer, which can just be forgotten about). Setting this to a high value means that Linux can delay disk writes for a long time, but it also means that it will have to do a lot of I/O (Input/Output) at once when memory becomes short. A low value will spread out disk I/O more evenly at the cost of more frequent I/O operations. The default value is 40%, the minimum is 0%, and the maximum is 100%. We improve the default value here.
Parameter 2 (dummy1):
This parameter is unused by the system, so we don't need to change its default value.
Parameter 3 (dummy2):
This parameter is unused by the system, so we don't need to change its default value.
Parameter 4 (dummy3):
This parameter is unused by the system, so we don't need to change its default value.
Parameter 5 (interval):
The bdflush parameter 5 specifies the delay between kupdate wake-ups, that is, how often kupdate wakes and flushes dirty buffers. The value is expressed in jiffies (clock ticks); the number of jiffies per second (HZ) is normally 100, so a value of x*HZ jiffies is x seconds. The default value is 5 seconds, the minimum is 0 seconds, and the maximum is 600 seconds. We keep the default value here.
Parameter 6 (age_buffer):
The bdflush parameter 6 governs the maximum time Linux waits before writing out a dirty buffer to disk. The value is in jiffies. The default value is 30 seconds, the minimum is 1 second, and the maximum is 6,000 seconds. We keep the default value here.
Parameter 7 (nfract_sync):
The bdflush parameter 7 governs the percentage of buffer cache that is dirty before bdflush activates synchronously. This can be viewed as the hard limit before bdflush forces buffers to disk. The default is 60%, the minimum is 0%, and the maximum is 100%. We improve the default value here.
Parameter 8 (dummy4):
This parameter is unused by the system, so we don't need to change its default value.
Parameter 9 (dummy5):
This parameter is unused by the system, so we don't need to change its default value.
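The nine fields described above are stored as one space-separated string, which is hard to read at a glance. The sketch below labels the four fields the kernel uses and converts the two jiffy values to seconds. The function name describe_bdflush is hypothetical, and the conversion assumes HZ=100 as stated above; other architectures may use a different tick rate.

```shell
# Hypothetical helper: label the four bdflush fields used by the 2.4 kernel.
# Assumes HZ=100 (100 jiffies per second); adjust hz for other platforms.
describe_bdflush() {
    # Split the nine space-separated values into positional parameters.
    set -- $1
    hz=100
    echo "nfract=$1%"
    echo "interval=$(( $5 / hz ))s"
    echo "age_buffer=$(( $6 / hz ))s"
    echo "nfract_sync=$7%"
}

# The kernel default string yields 40%, 5s, 30s and 60%.
describe_bdflush "40 64 64 256 500 3000 60 0 0"
```

On a 2.4 system the live values could be passed in with describe_bdflush "$(cat /proc/sys/vm/bdflush)".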
The default kernel setup for the bdflush parameters is:
"40 64 64 256 500 3000 60 0 0"
The default setup for the bdflush parameters under OpenNA Linux is:
"60 64 64 256 500 3000 80 0 0"
The default setup for the bdflush parameters under Red Hat Linux is:
"30 64 64 256 500 3000 60 0 0"
Step 1
To change the values of bdflush, type the following command on your terminal:
• Edit the sysctl.conf file (vi /etc/sysctl.conf) and add the following line:
# Improve file system performance
vm.bdflush = 60 64 64 256 500 3000 80 0 0
Step 2
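You must restart your network for the change to take effect, because the network startup script re-reads the /etc/sysctl.conf file (this is the "Setting network parameters" step in the output). The command to restart the network is the following: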
• To restart all network devices manually on your system, use the following command:
[root@deep /]# /etc/init.d/network restart
Setting network parameters [OK]
Bringing up interface lo [OK]
Bringing up interface eth0 [OK]
Bringing up interface eth1 [OK]
NOTE: There is another way to update the entry without restarting the network by using the following command in your terminal screen:
[root@deep /]# sysctl -w vm.bdflush="60 64 64 256 500 3000 80 0 0"
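Before restarting anything, it can help to confirm what value the configuration file actually assigns to a variable. The following is a minimal sketch of such a check. The function name conf_value is hypothetical, and the demonstration works on a scratch file rather than on the real /etc/sysctl.conf.

```shell
# Hypothetical helper: print the value assigned to a key in a
# sysctl.conf-style file ("key = value" lines, "#" starts a comment).
# If the key appears more than once, the last assignment is printed,
# since later lines override earlier ones when the file is applied in order.
conf_value() {
    awk -F= -v key="$1" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim spaces around the key
            if (name == key) {
                value = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                found = value
            }
        }
        END { if (found != "") print found }
    ' "${2:-/etc/sysctl.conf}"
}

# Demonstration on a scratch copy so that /etc/sysctl.conf is not touched.
tmp=$(mktemp)
printf '%s\n' '# Improve file system performance' \
              'vm.bdflush = 60 64 64 256 500 3000 80 0 0' > "$tmp"
conf_value vm.bdflush "$tmp"    # prints 60 64 64 256 500 3000 80 0 0
rm -f "$tmp"
```

Called with only a key, the function reads /etc/sysctl.conf.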
The kswapd parameter:
The kswapd file is related to the kernel swapout daemon, kswapd, which frees memory on the system when it gets fragmented or full. Its task is to keep the memory management system operating efficiently. The file /proc/sys/vm/kswapd lets you tune this daemon. Since every system is different, you'll probably want some control over this piece of the system.
There are three parameters to tune in this file and two of them (tries_base and swap_cluster) have the largest influence on system performance. The kswapd file can be used to tune the operation of the virtual memory (VM) subsystem of the Linux kernel.
Parameter 1 (tries_base):
The kswapd parameter 1 specifies the maximum number of pages kswapd tries to free in one round. Usually this number will be divided by 4 or 8, so it isn't as big as it looks. Increase this number to cause swap to be released faster, and increase overall swap throughput. The default value is 512 pages. We keep the default value here.
Parameter 2 (tries_min):
The kswapd parameter 2 specifies the minimum number of pages kswapd tries to free each time it is called. Basically it's just there to make sure that kswapd frees some pages even when it's being called with minimum priority. The default value is 32 pages. We keep the default value here.
Parameter 3 (swap_cluster):
The kswapd parameter 3 specifies the number of pages kswapd writes in one iteration. You want this large to increase performance so that kswapd does its I/O in large chunks and the disk doesn't have to seek often, but you don't want it to be too large since that would flood the request queue. The default value is 8 pages. We improve the default value here.
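To get a feel for what a swap_cluster value means in terms of I/O size, the page count can be converted to kilobytes. The sketch below does this under the assumption of a 4096-byte page, which is typical for i386 but not universal. The function name cluster_kb is hypothetical.

```shell
# Hypothetical helper: size in KB of one kswapd write for a given swap_cluster.
# The page size defaults to 4096 bytes (an assumption, typical for i386);
# pass the output of "getconf PAGESIZE" as the second argument to use the real value.
cluster_kb() {
    pages=$1
    page_bytes=${2:-4096}
    echo $(( pages * page_bytes / 1024 ))
}

cluster_kb 8     # default of 8 pages: prints 32
cluster_kb 32    # 32 pages: prints 128
```

With these assumptions, raising swap_cluster from 8 to 32 moves each write from 32 KB to 128 KB.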
The default kernel setup for the kswapd parameters is:
"512 32 8"
The default setup for the kswapd parameters under OpenNA Linux is:
"512 32 32"
The default setup for the kswapd parameters under Red Hat Linux is:
"512 32 8"
Step 1
To change the values of kswapd, type the following command on your terminal:
• Edit the sysctl.conf file (vi /etc/sysctl.conf) and add the following lines:
# Increase swap bandwidth system performance
vm.kswapd = 512 32 32
Step 2
You must restart your network for the change to take effect. The command to restart the network is the following:
• To restart all network devices manually on your system, use the following command:
[root@deep /]# /etc/init.d/network restart
Setting network parameters [OK]
Bringing up interface lo [OK]
Bringing up interface eth0 [OK]
Bringing up interface eth1 [OK]
NOTE: There is another way to update the entry without restarting the network by using the following command into your terminal screen:
[root@deep /]# sysctl -w vm.kswapd="512 32 32"
The overcommit_memory parameter:
The overcommit_memory parameter is simply a flag that enables memory overcommitment. It controls whether the kernel checks that enough memory is available before it lets a process allocate a new virtual mapping. When this flag is 0, the kernel checks before each malloc() to see if there's enough memory left. If the flag is 1, the system pretends there's always enough memory and doesn't make the check. This feature can be very useful ONLY on big servers with a lot of physical memory available (>= 2GB) because there are a lot of programs that malloc() huge amounts of memory "just-in-case" and don't use much of it.
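The rule of thumb above can be written down as a small check. This is only a sketch of this chapter's guideline, not a kernel recommendation. The function name suggest_overcommit is hypothetical, and its argument is the total RAM in kB as shown on the MemTotal line of /proc/meminfo.

```shell
# Hypothetical helper: apply the rule of thumb from the text, which suggests
# vm.overcommit_memory=1 only on servers with 2 GB of RAM or more.
# Argument: total RAM in kB (the MemTotal figure from /proc/meminfo).
suggest_overcommit() {
    if [ "$1" -ge 2097152 ]; then   # 2 GB = 2 * 1024 * 1024 kB
        echo 1
    else
        echo 0
    fi
}

suggest_overcommit 524288     # 512 MB: prints 0
suggest_overcommit 4194304    # 4 GB:   prints 1
```

On a live system the argument could come from awk '/^MemTotal:/ {print $2}' /proc/meminfo. Keep in mind that MemTotal is usually a little lower than the installed RAM, so a machine with exactly 2 GB installed may fall under the threshold.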
The default kernel setup for the overcommit_memory parameter is:
"0"
The default setup for the overcommit_memory parameter under OpenNA Linux is:
"0"
The default setup for the overcommit_memory parameter under Red Hat Linux is:
"0"
Step 1
To change the value of overcommit_memory, type the following command on your terminal:
• Edit the sysctl.conf file (vi /etc/sysctl.conf) and add the following lines:
# Enables/Disables memory overcommitment
vm.overcommit_memory = 0
Step 2
You must restart your network for the change to take effect. The command to restart the network is the following:
• To restart all network devices manually on your system, use the following command:
[root@deep /]# /etc/init.d/network restart
Setting network parameters [OK]
Bringing up interface lo [OK]
Bringing up interface eth0 [OK]
Bringing up interface eth1 [OK]
WARNING: Only change the default value from 0 to 1 on systems with more than 2GB of RAM. Recall that on small systems the value must be set to 0 (vm.overcommit_memory = 0).
There is another way to update the entry without restarting the network by using the following command into your terminal screen:
[root@deep /]# sysctl -w vm.overcommit_memory=0
The page-cluster parameter:
The Linux virtual memory subsystem avoids excessive disk seeks by reading multiple pages on a page fault. The number of pages it reads is highly dependent on the amount of memory in your machine. The number of pages the kernel reads in at once is equal to 2 ^ page-cluster. Values above 2 ^ 5 don't make much sense for swap because we only cluster swap data in 32-page groups. The page-cluster parameter is used to tune the operation of the virtual memory (VM) subsystem of the Linux kernel.
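Since the parameter is an exponent rather than a page count, it is easy to misjudge its effect. The following sketch simply evaluates 2 ^ page-cluster for a few values. The function name pages_read is hypothetical.

```shell
# Hypothetical helper: number of pages read at once for a given
# page-cluster value, computed as 2 ^ page-cluster with a bit shift.
pages_read() {
    echo $(( 1 << $1 ))
}

for n in 3 4 5; do
    echo "page-cluster=$n reads $(pages_read "$n") pages"
done
# prints 8, 16 and 32 pages respectively
```

A value of 5 therefore corresponds to the 32-page swap clustering limit mentioned above.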
The default kernel setup for the page-cluster parameter is:
"3"
The default setup for the page-cluster parameter under OpenNA Linux is:
"5"
The default setup for the page-cluster parameter under Red Hat Linux is:
"4"
Step 1
To change the value of page-cluster, type the following command on your terminal:
• Edit the sysctl.conf file (vi /etc/sysctl.conf) and add the following lines:
# Increase number of pages kernel reads in at once
vm.page-cluster = 5
Step 2
You must restart your network for the change to take effect. The command to restart the network is the following:
• To restart all network devices manually on your system, use the following command:
[root@deep /]# /etc/init.d/network restart
Setting network parameters [OK]
Bringing up interface lo [OK]
Bringing up interface eth0 [OK]
Bringing up interface eth1 [OK]
NOTE: There is another way to update the entry without restarting the network by using the following command into your terminal screen:
[root@deep /]# sysctl -w vm.page-cluster=5
The pagetable_cache parameter:
The kernel keeps a number of page tables in a per-processor cache (this helps a lot on SMP systems). The cache size for each processor will be between the low and the high value. On SMP systems it is used so that the system can do fast pagetable allocations without having to acquire the kernel memory lock.
For large systems, the settings are probably OK. For normal systems they won't hurt a bit. For small systems (<16MB RAM) and on a low-memory, single CPU system it might be advantageous to set both values to 0 so you don't waste the memory.
The default kernel setup for the pagetable_cache parameters is:
"25 50"
The default setup for the pagetable_cache parameters under OpenNA Linux is:
"25 50"
The default setup for the pagetable_cache parameters under Red Hat Linux is:
"25 50"
Step 1
To change the values of pagetable_cache, type the following command on your terminal:
• Edit the sysctl.conf file (vi /etc/sysctl.conf) and add the following lines:
# Tune the number of page tables kept in a per-processor cache
vm.pagetable_cache = 25 50
Step 2
You must restart your network for the change to take effect. The command to restart the network is the following:
• To restart all network devices manually on your system, use the following command:
[root@deep /]# /etc/init.d/network restart
Setting network parameters [OK]
Bringing up interface lo [OK]
Bringing up interface eth0 [OK]
Bringing up interface eth1 [OK]
WARNING: Only change these values on systems with multiple processors (SMP) or on small systems (single processor) with less than 16MB of RAM. Recall that on small systems both values must be set to 0 (vm.pagetable_cache = 0 0).
There is another way to update the entry without restarting the network by using the following command into your terminal screen:
[root@deep /]# sysctl -w vm.pagetable_cache="25 50"
/proc/sys/fs: The file system data of Linux
All parameters described later in this chapter reside under the /proc/sys/fs directory of the server and can be used to tune and monitor miscellaneous things in the operation of the Linux kernel. Be very careful when attempting this. You can optimize your system, but you can also cause it to crash. Since every system is different, you'll probably want some control over these pieces of the system.
Finally, these are advanced settings; if you don't understand them, then don't play in this area or try to use all the examples below on your system. Remember that all systems are different and require different settings and customizations.
Below I show you only parameters that can be optimized for the system. All suggestions I enumerate in this section are valid for every kind of server. The only difference depends on the amount of RAM your machine has, and this is where the settings will change.
The above figure shows a snapshot of /proc/sys/fs directory on an OpenNA Linux & Red Hat Linux system running kernel version 2.4. Please note that this picture may look different on your system.
The file-max & file-nr parameters:
The file-max and file-nr files work together on Linux: we use the file-max parameter to set the maximum number of file handles that the Linux kernel will allocate, and the file-nr file to get information about the number of allocated file handles, the number of used file handles and the maximum number of file handles presently on the system. A large-scale production server