A.5.1 Hugepage Configuration
Both 1 GB and 2 MB hugepage sizes are supported, but 1 GB hugepage support can only be enabled on the Linux kernel boot command line. The following boot command line enables and allocates both 1 GB and 2 MB hugepage memory:
default_hugepagesz=1GB hugepagesz=1GB hugepages=16 hugepagesz=2M hugepages=2048
In this case, the default hugepage size is 1 GB; 16 GB of memory is allocated as sixteen 1 GB hugepages and 4 GB as 2048 2 MB hugepages. Hugepage memory allocated on the boot line is divided across Non-Uniform Memory Access (NUMA) nodes: on a system with two NUMA nodes, half the memory comes from each node. In this example, on a two-socket system, eight 1 GB hugepages are allocated on NUMA node 0 and another eight on NUMA node 1, and 1024 2 MB pages are allocated on NUMA node 0 and another 1024 on NUMA node 1.
When multiple physical processors are installed in the system, the memory attached to each processor forms a NUMA node. When the general hugepage controls are used, the kernel attempts to distribute the hugepage pool over all allowed NUMA nodes. If a NUMA node has insufficient memory, it is silently skipped and its share of the pages is allocated on the other NUMA nodes.
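The arithmetic in the boot-line example can be checked with a small POSIX shell sketch. It assumes each hugepages=N follows the hugepagesz= it applies to and that each pool is split evenly across NUMA nodes, as described above; size_to_mb and summarize_hugepages are hypothetical helpers, not system commands.

```shell
#!/bin/sh
# Sketch: summarize hugepage pools requested on a kernel boot line.
# Assumptions (hypothetical helpers, not system tools):
#  - each hugepages=N follows the hugepagesz= it applies to;
#  - each pool is split evenly across NUMA nodes (integer division).

# Convert a size such as 1G, 1GB, 2M or 2048K to MB (K sizes round down).
size_to_mb() {
    case "$1" in
        *[Gg]|*[Gg][Bb]) echo $(( ${1%%[Gg]*} * 1024 )) ;;
        *[Mm]|*[Mm][Bb]) echo "${1%%[Mm]*}" ;;
        *[Kk]|*[Kk][Bb]) echo $(( ${1%%[Kk]*} / 1024 )) ;;
        *) echo 0 ;;
    esac
}

# Usage: summarize_hugepages NODES ARG...
# Prints "<size> <pages> <total MB> <pages per node>" for each pool.
summarize_hugepages() {
    nodes=$1
    shift
    size=""
    for arg in "$@"; do
        case "$arg" in
            hugepagesz=*) size=${arg#hugepagesz=} ;;
            hugepages=*)
                pages=${arg#hugepages=}
                mb=$(( $(size_to_mb "$size") * pages ))
                echo "$size $pages ${mb}MB $(( pages / nodes ))/node"
                ;;
        esac
    done
}

# The boot line from the text, on a two-node system.
summarize_hugepages 2 default_hugepagesz=1GB hugepagesz=1GB hugepages=16 \
    hugepagesz=2M hugepages=2048
```

For the example boot line this prints `1GB 16 16384MB 8/node` and `2M 2048 4096MB 1024/node`, matching the 16 GB and 4 GB totals above. The even split is a simplification; the skip-and-redistribute behaviour described above is not modelled.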
A.5.1.1 Dynamically Allocating 2 MB Hugepage Memory for NUMA Nodes
2 MB hugepages are configured per NUMA node, with a separate page count for each node. The following shows the page counts on a system with two NUMA nodes:
# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
1024
# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
1024
The pool currently holds 2048 2 MB pages, half on each NUMA node. To change NUMA node 0 to 2048 2 MB pages, for example, write the new count to the nr_hugepages file for NUMA node 0:
# echo 2048 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
Reviewing the NUMA node information:
# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
2048
# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
1024
Notes:
1. Hugepage memory may not be added, or may be only partly added, if insufficient free memory is available. Add hugepage memory before the system becomes heavily loaded with processes and/or VMs.
2. Linux allocates memory from the NUMA node on which the task is running, unless that node has insufficient memory and another NUMA node has memory available. Controlling which NUMA node the task runs on, and ensuring that node has sufficient memory, therefore controls which NUMA node the hugepage memory is allocated on.
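The per-node read and write steps above can be wrapped in two small shell functions. This is a sketch: show_node_hugepages, set_node_hugepages and the SYSFS_ROOT override (which defaults to /sys and exists only so the functions can be tried against a scratch directory) are hypothetical names. Writing requires root, and because the kernel may allocate fewer pages than requested, the write is followed by a read-back.

```shell
#!/bin/sh
# Sketch: inspect and change per-NUMA-node hugepage counts through sysfs.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

# Print "<node> <count>" for every NUMA node, for hugepage size SIZE_KB (default 2048).
show_node_hugepages() {
    size_kb=${1:-2048}
    root=${SYSFS_ROOT:-/sys}
    for f in "$root"/devices/system/node/node*/hugepages/hugepages-"$size_kb"kB/nr_hugepages; do
        [ -r "$f" ] || continue          # glob did not match: no NUMA sysfs entries
        node=${f#"$root"/devices/system/node/}
        node=${node%%/*}
        echo "$node $(cat "$f")"
    done
}

# Request COUNT pages of SIZE_KB (default 2048) on NODE (e.g. node0), then
# print the count the kernel actually allocated; warn if it differs.
set_node_hugepages() {
    node=$1
    count=$2
    size_kb=${3:-2048}
    f="${SYSFS_ROOT:-/sys}/devices/system/node/$node/hugepages/hugepages-${size_kb}kB/nr_hugepages"
    echo "$count" > "$f" || return 1
    got=$(cat "$f")
    if [ "$got" -ne "$count" ]; then
        echo "warning: $node has $got pages, $count requested" >&2
    fi
    echo "$got"
}

# Example (as root): set_node_hugepages node0 2048 && show_node_hugepages 2048
```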
A.5.2 Isolating the CPU Core
Logical CPU cores can be removed from the Linux task scheduler so that they are reserved for real-time processing tasks. Real-time processing does not work as expected if every CPU core is assigned a core-level real-time task. Because Linux handles interrupts on CPU 0 by default, it is recommended to always leave CPU 0 for general Linux system processing. Otherwise, all interrupt processing must be moved to a non-real-time core before the real-time task is started.
If hyperthreading is enabled, both logical cores of each hyperthreaded pair should be in the isolated core list. Full non-hyperthreaded core performance can be achieved by leaving the sibling hyperthreaded core idle.
On a two-socket system with E5-2697 (14-core) CPUs installed, the following boot command-line argument isolates physical cores 1 to 12 of the first socket together with their hyperthreaded cores. Cores 0 and 13, with their hyperthreaded cores, remain available to the Linux host system, as do all cores of the second CPU socket.
isolcpus=1,2,3,4,5,6,7,8,9,10,11,12,28,29,30,31,32,33,34,35,36,37,38,39
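Because hyperthread sibling numbering depends on how the platform enumerates CPUs, the isolcpus list can be derived from the kernel's topology files rather than typed by hand. The sketch below reads /sys/devices/system/cpu/cpuN/topology/thread_siblings_list for each requested core and prints a sorted, comma-separated list; expand_cpu_list, isolcpus_with_siblings and the SYSFS_ROOT override (default /sys, for testing only) are hypothetical. If a CPU has no topology file, only that CPU is kept.

```shell
#!/bin/sh
# Sketch: build an isolcpus= value that includes hyperthread siblings.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

# Expand a kernel CPU list such as "1-3,28" into one CPU number per line.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        [ -n "$lo" ] || continue
        i=$lo
        while [ "$i" -le "${hi:-$lo}" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Print the given CPUs plus their siblings as a sorted comma-separated list.
isolcpus_with_siblings() {
    root=${SYSFS_ROOT:-/sys}
    for cpu in "$@"; do
        f="$root/devices/system/cpu/cpu$cpu/topology/thread_siblings_list"
        if [ -r "$f" ]; then
            expand_cpu_list "$(cat "$f")"
        else
            echo "$cpu"                  # no topology info: keep the CPU alone
        fi
    done | sort -n -u | paste -s -d, -
}

# Example: echo "isolcpus=$(isolcpus_with_siblings 1 2 3 4 5 6 7 8 9 10 11 12)"
```

Comparing this output with a hand-written list is a quick way to catch a list that isolates the wrong siblings.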
The Linux taskset command can be used to start a program on a target CPU core, or to move an already running task or thread to the target CPU core.
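Both taskset uses can be sketched as follows. ./rt_app is a hypothetical program and pin_all_threads a hypothetical helper; taskset -c takes a CPU list, and -p applies it to an existing PID. Since taskset -p acts on a single task ID, the helper walks /proc/PID/task to pin every thread of a process (util-linux taskset also offers -a, --all-tasks, for the same purpose).

```shell
#!/bin/sh
# Sketch: placing tasks on isolated cores with taskset (util-linux).

# Start a (hypothetical) program directly on isolated core 2:
#   taskset -c 2 ./rt_app

# Move every thread of a running process PID to the CPU list CPUS (e.g. "3").
pin_all_threads() {
    pid=$1
    cpus=$2
    for task in /proc/"$pid"/task/*; do
        [ -d "$task" ] || return 1       # no such process
        tid=${task##*/}
        taskset -c -p "$cpus" "$tid" > /dev/null || return 1
    done
}

# Example: pin_all_threads 1234 3 && taskset -c -p 1234
```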
this:
isolcpus=1,2,3,4,5,6,7,8,9,10,11,12,28,29,37,38,39,40,41,42,43,44,45,46
A.5.3 Configuring the IOMMU
The IOMMU is needed for PCI passthrough and SR-IOV, which pass a hardware PCI device to a VM. Enabling the IOMMU adds system overhead, as well as security, for all physical I/O devices in the system. When the IOMMU is not needed, disable it by removing the IOMMU arguments from the Linux boot command line and rebooting. To enable the IOMMU, add the following arguments to the Linux boot command line:
iommu=pt intel_iommu=on
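Without iommu=pt, host I/O runs much slower than with it, because iommu=pt puts devices used by the host in passthrough (identity-mapped) mode, sparing them the IOMMU translation overhead.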
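After rebooting with these arguments, a populated /sys/kernel/iommu_groups directory is a quick sign that the IOMMU is active. The sketch below lists each group and its PCI devices; list_iommu_groups and the SYSFS_ROOT override (default /sys, for testing only) are hypothetical. Devices that share a group generally have to be assigned to a VM together, so the listing also helps when planning passthrough.

```shell
#!/bin/sh
# Sketch: show IOMMU groups and the PCI devices in each one.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
list_iommu_groups() {
    root=${SYSFS_ROOT:-/sys}
    found=0
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        [ -e "$dev" ] || continue        # glob did not match anything
        group=${dev#"$root"/kernel/iommu_groups/}
        group=${group%%/*}
        echo "group $group: ${dev##*/}"
        found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "no IOMMU groups found (is the IOMMU enabled?)"
    fi
}

list_iommu_groups
```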
A.5.4 Editing the Default Grub Configuration
To make the configuration persistent, add the hugepage, CPU isolation and IOMMU arguments to GRUB_CMDLINE_LINUX in the default configuration file /etc/default/grub, then regenerate the grub configuration file used for Linux boot:
# vim /etc/default/grub        # edit the file
. . .
GRUB_CMDLINE_LINUX="default_hugepagesz=1GB hugepagesz=1GB hugepages=16 \
hugepagesz=2M hugepages=2048 isolcpus=1,2,3,4,5,6,7,8,9,10,11,12,28,29,30,31,\
32,33,34,35,36,37,38,39 iommu=pt intel_iommu=on ..."
. . .
After editing, the grub.cfg boot file needs to be regenerated:
# grub2-mkconfig -o /boot/grub2/grub.cfg
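The edit can also be scripted. The sketch below appends arguments to the existing GRUB_CMDLINE_LINUX line and keeps a backup copy; add_grub_args and the GRUB_FILE override are hypothetical, it relies on GNU sed's -i option, and it assumes GRUB_CMDLINE_LINUX is a single double-quoted line. grub2-mkconfig must still be run afterwards.

```shell
#!/bin/sh
# Sketch: append kernel arguments to GRUB_CMDLINE_LINUX (requires GNU sed -i).
# GRUB_FILE is a hypothetical override; it defaults to /etc/default/grub.
add_grub_args() {
    file=${GRUB_FILE:-/etc/default/grub}
    cp "$file" "$file.bak" || return 1            # keep a backup copy
    # Insert the new arguments before the closing quote of the existing value.
    sed -i "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"\$|GRUB_CMDLINE_LINUX=\"\1 $*\"|" "$file"
}

# Example (as root):
#   add_grub_args iommu=pt intel_iommu=on
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```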
A.5.5 Verifying Kernel Boot Configuration
The system must be rebooted for the kernel changes to take effect. After the reboot, check the kernel command line by looking at the kernel messages with dmesg:
# dmesg |grep command
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.18.5-201.fc21.x86_64 root=UUID=cc17be46-707c-414a-8c16-11ecda5905be ro default_hugepagesz=1GB hugepagesz=1GB hugepages=16 hugepagesz=2M hugepages=2048 isolcpus=1,2,3,4,5,6,7,8,9,10,11,12,28,29,30,31,32,33,34,35,36,37,38,39 rhgb quiet
dmesg outputs only the most recent system messages, which are kept in a limited reserved memory buffer. If the kernel boot command line is no longer present there, search the full log file /var/log/messages.
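The running kernel's command line is also exposed in /proc/cmdline, which is not subject to the message-buffer limit. The sketch below checks that each expected argument is present as a whole word; verify_cmdline and the CMDLINE_FILE override (default /proc/cmdline, for testing only) are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that expected arguments are on the running kernel's command line.
# CMDLINE_FILE is a hypothetical override for testing; it defaults to /proc/cmdline.
verify_cmdline() {
    cmdline=$(cat "${CMDLINE_FILE:-/proc/cmdline}") || return 2
    status=0
    for want in "$@"; do
        case " $cmdline " in
            *" $want "*) ;;                          # exact word found
            *) echo "missing: $want"; status=1 ;;
        esac
    done
    return $status
}

# Example:
#   verify_cmdline default_hugepagesz=1GB hugepages=16 iommu=pt intel_iommu=on \
#       && echo "boot arguments OK"
```

Matching whole words means that, for example, hugepages=1 is not mistaken for hugepages=16.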
The memory hugepage information also can be checked:
# cat /proc/meminfo
. . .
HugePages_Total: 16
HugePages_Free: 16
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 1048576 kB
. . .
/proc/meminfo shows the hugepage count only for the default hugepage size. When both 1 GB and 2 MB hugepages are configured with 1 GB as the default, only the 1 GB hugepage count is shown.
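Counts for every configured size, including non-default sizes, are available under /sys/kernel/mm/hugepages/hugepages-<size>kB/. The sketch below prints the total and free page counts for each size; show_hugepage_pools and the SYSFS_ROOT override (default /sys, for testing only) are hypothetical.

```shell
#!/bin/sh
# Sketch: report every hugepage pool, not only the default size shown in /proc/meminfo.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
show_hugepage_pools() {
    root=${SYSFS_ROOT:-/sys}
    for d in "$root"/kernel/mm/hugepages/hugepages-*; do
        [ -d "$d" ] || continue          # glob did not match: no hugetlb support
        size=${d##*/hugepages-}
        echo "$size total=$(cat "$d/nr_hugepages") free=$(cat "$d/free_hugepages")"
    done
}

show_hugepage_pools
```

With the boot line used in this section, both a 1048576kB (1 GB) and a 2048kB (2 MB) pool would be expected in the output.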