
3 Root Causes And Solutions

3.2 CPU Related Root Causes and Solutions

This section covers the troubleshooting of high CPU consumption on the system.

Constantly high CPU consumption leads to a considerably slower system, because no additional requests can be processed. From an end-user perspective, the application behaves slowly, is unresponsive, or can even seem to hang.

Note that high CPU utilization is actually desired behavior for SAP HANA, so it is nothing to worry about unless the CPU becomes the bottleneck. SAP HANA is optimized to consume all available memory and CPU. More concretely, the software parallelizes queries as much as possible in order to provide optimal performance. So CPU usage near 100% during query execution does not necessarily mean that there is an issue, and it does not automatically indicate a performance problem.

3.2.1 Indicators of CPU Related Issues

CPU-related issues are indicated by alerts or can be seen in views in the SAP HANA studio.

The following alerts may indicate CPU resource problems:

● Host CPU Usage (Alert 5)

● Most recent savepoint operation (Alert 28)

● Savepoint duration (Alert 54)

You notice very high CPU consumption on your SAP HANA database from one of the following:

● Alert 5 (Host CPU usage) is raised for current or past CPU usage

● The CPU usage displayed on the Overview tab

● The Load graph, which shows current or past high CPU consumption

3.2.2 Analysis of CPU Related Issues

The following section describes how to analyze high CPU consumption using the tools in the SAP HANA studio.

When analyzing high CPU consumption, you need to distinguish between the CPU resources consumed by SAP HANA itself and those consumed by other, non-SAP HANA processes on the host. While the CPU consumption of SAP HANA is addressed here in detail, the CPU consumption of other processes running on the same host is not covered. Such situations are often caused by additional programs running concurrently on the SAP HANA appliance, such as anti-virus and backup software. For more information, see SAP Note 1730928.

A good starting point for the analysis is the Overview tab in the SAP HANA studio. It contains a section that displays SAP HANA CPU usage versus total CPU usage, which includes all processes on the host, and keeps track of the maximum CPU usage that occurred since the last restart of SAP HANA. If SAP HANA CPU usage is low while total CPU usage is high, the issue is most likely related to a non-SAP HANA process.
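To cross-check the Overview tab comparison from a shell on the host, the following sketch sums the %CPU values that ps reports for processes whose names start with hdb and for all processes. Treating the hdb prefix as identifying SAP HANA server processes (such as hdbindexserver and hdbnameserver) is an assumption, and ps averages %CPU over each process's lifetime, so the result is only a coarse indicator, not a replacement for the Overview tab or the Load graph. The function name hana_vs_total_cpu is hypothetical.

```shell
#!/bin/sh
# Sketch: compare the CPU used by SAP HANA processes with the CPU used by all
# processes on the host.
# ASSUMPTION: SAP HANA server process names start with "hdb".
# ps reports %CPU averaged over each process lifetime, so the result is coarse.

# Reads "<pcpu> <comm>" lines on stdin and prints "hana=<sum> total=<sum>".
hana_vs_total_cpu() {
  awk '
    { total += $1 }
    $2 ~ /^hdb/ { hana += $1 }
    END { printf "hana=%.1f total=%.1f\n", hana, total }
  '
}
```

Usage: ps -eo pcpu=,comm= | hana_vs_total_cpu. If the hana value is small compared with the total, look at the non-SAP HANA processes first.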

To find out what is happening in more detail, open the Performance > Threads tab (see Thread Monitoring). To prepare it for CPU time analysis, perform the following steps:

● To switch on resource tracking, open the Configuration tab and, in the resource_tracking section of the global.ini file, set the following parameters to on:

○ cpu_time_measurement_mode

○ enable_tracking

● Display the CPU Time column by using the Configure Viewer button on the outer right side of the Threads tab.
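The same two parameters can also be set in SQL with ALTER SYSTEM ALTER CONFIGURATION, the statement this guide uses for daemon.ini in Bind Processes to CPUs. The sketch below only prints the statements so that you can review them before running them in the SQL console. The helper name resource_tracking_sql is hypothetical, and choosing the SYSTEM layer and WITH RECONFIGURE is an assumption about how you want the change applied.

```shell
#!/bin/sh
# Sketch: print the SQL that switches on resource tracking in global.ini.
# resource_tracking_sql is a hypothetical helper name; review the output
# before running it in the SQL console.
resource_tracking_sql() {
  local param
  for param in enable_tracking cpu_time_measurement_mode; do
    printf "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM') SET ('resource_tracking', '%s') = 'on' WITH RECONFIGURE;\n" "$param"
  done
}
```

Printing instead of executing keeps the sketch independent of any particular SQL client.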

The Thread Monitor shows the CPU time of each thread running in SAP HANA in microseconds. A high CPU time for related threads indicates that an operation is causing the increased CPU consumption.

Figure 4: Thread Monitor Showing CPU Time

To identify expensive statements causing high resource consumption, turn on the Expensive Statements trace and specify a reasonable runtime threshold (see Expensive Statements Trace). If possible, add further restrictive criteria such as database user or application user to narrow down the amount of information traced.

Note that the CPU time for each statement is shown in the column CPU_TIME if resource_tracking is activated.

Another tool for analyzing high CPU consumption is the Kernel Profiler (see Kernel Profiler). Setting a maximum duration or memory limit for profiling is good practice and should be done whenever appropriate values can be estimated.

To capture the current state of the system for later analysis, you can use a Full System Info Dump. However, collecting a Full System Info Dump consumes resources itself and may therefore worsen the situation. To collect one, open Diagnosis Files > Diagnosis Information and choose Collect (SQL Procedure) if the system is up and accepting SQL commands, or Collect (Python Script) if it is not.

Related Information

SAP Note 1730928

Thread Monitoring [page 134]

Expensive Statements Trace [page 144]

Kernel Profiler [page 167]

3.2.3 Resolving CPU Related Issues

The first priority in resolving CPU-related issues is to return the system to a normal operating state, which may complicate identifying the root cause.

Issue resolution should aim to bring the system back to a stable state by stopping the operation that causes the high CPU consumption. However, once the situation is resolved, it might no longer be possible to find the actual root cause. Therefore, consider recording the state of the system under high load for later analysis by collecting a Full System Info Dump (see Analysis of CPU Related Issues).

You can stop the operation causing the high CPU consumption in the Thread Monitor (see Thread Monitoring). The columns Client Host, Client IP, Client PID, and Application User identify the user who triggered the operation. To resolve the situation, contact this user and clarify which actions they are currently performing:

Figure 5: Identify Application User

As soon as this is clarified and you agree on resolving the situation, two options are available:

● On the client side, end the process calling the affected threads

● Cancel the operation that is related to the affected threads. To do so, right-click on the thread in the Threads tab and choose Cancel Operations.

For further analysis of the root cause, open a ticket with SAP HANA Development Support and attach the Full System Info Dump, if available.

Related Information

Analysis of CPU Related Issues [page 36]

Thread Monitoring [page 134]

3.2.4 Retrospective Analysis of CPU Related Issues

Several options are available for analyzing the root cause of an issue after it has been resolved.

A retrospective analysis of high CPU consumption should start by checking the Load graph and the Alerts tab.

Using the alert time or the Load graph, determine the time frame of the high CPU consumption. If you cannot determine the time frame because the issue happened too long ago, check the following statistics server table, which contains up to 30 days of historical host resource information:

HOST_RESOURCE_UTILIZATION_STATISTICS (_SYS_STATISTICS schema)

With this information, search through the trace files of the responsible process. Make sure to choose the correct host when SAP HANA runs in a scale-out landscape. The trace files indicate which threads or queries were running during the affected time frame.
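Once the time frame is known, the relevant part of a large trace file can be extracted with a small filter such as the following sketch. It assumes that each trace entry begins with a line whose second and third whitespace-separated fields hold the date and time (YYYY-MM-DD HH:MM:SS.ffffff) and treats all other lines as continuations of the previous entry; verify this against your own trace files before relying on it. The function name trace_window is hypothetical.

```shell
#!/bin/sh
# Sketch: print trace entries whose timestamp lies in the window [FROM, TO].
# ASSUMPTION: an entry starts with a line whose 2nd and 3rd fields are
# "YYYY-MM-DD HH:MM:SS.ffffff"; any other line is treated as a continuation
# of the preceding entry.
# Usage: trace_window 'YYYY-MM-DD HH:MM:SS' 'YYYY-MM-DD HH:MM:SS' [FILE...]
trace_window() {
  local from="$1" to="$2"
  shift 2
  awk -v from="$from" -v to="$to" '
    # Do not carry the keep/skip decision over from the previous file.
    FNR == 1 { keep = 0 }
    $2 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      # Compare at second precision; the format sorts correctly as a string.
      ts = $2 " " substr($3, 1, 8)
      keep = (ts >= from && ts <= to)
    }
    keep
  ' "$@"
}
```

For example, run trace_window '2016-06-15 10:00:00' '2016-06-15 10:30:00' followed by the trace files of the responsible process, in the trace directory of the affected host. Plain string comparison is sufficient here because the assumed timestamp format sorts lexicographically.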

If the phenomenon recurs because of scheduled batch jobs or data loading processes, turn on the Expensive Statements trace during that time to record all involved statements (see Expensive Statements Trace).

Furthermore, check for concurrently running background jobs such as backups and delta merges, which may cause a resource shortage when run in parallel. Historical information about such background jobs can be obtained from the following system views:

● M_BACKUP_CATALOG

● M_DELTA_MERGE_STATISTICS

● A longer history can be found in the statistics server table HOST_DELTA_MERGE_STATISTICS (_SYS_STATISTICS schema).

Related Information

Expensive Statements Trace [page 144]

M_BACKUP_CATALOG

M_DELTA_MERGE_STATISTICS

HOST_DELTA_MERGE_STATISTICS

3.2.5 Controlling Parallelism of SQL Statement Execution

There are two subsystems, SqlExecutor and JobExecutor, that control the parallelism of statement execution.

Caution

Altering the settings described here should only be seen as a last resort, once traditional tuning techniques such as remodeling, repartitioning, and query tuning have been fully exploited. Changing the parallelism settings requires a deep understanding of the actual workload and can severely affect overall system behavior, so make sure you understand the consequences before you make any changes.

On systems with highly concurrent workload, too much parallelism of single statements may lead to sub-optimal performance. The parameters below allow you to adjust the CPU contention in the system.

The two subsystems work as follows:

● SqlExecutor

These threads handle incoming client requests and execute simple statements. For each statement execution, an SqlExecutor thread from a thread pool processes the statement. For simple (OLTP-like) statements against the column store, as well as for most statements against the row store, this is the only type of thread involved.

● JobExecutor

The JobExecutor is a job dispatching subsystem. Almost all remaining parallel tasks are dispatched to the JobExecutor and its associated JobWorker threads.

For both SqlExecutor and JobExecutor, a separate limit can be set for the maximum number of threads. This can be used to keep some room for OLTP workload on a system where OLAP workload would otherwise consume all CPU resources.

Caution

Lowering the values of these parameters can have a drastic effect on the parallel processing of the servers and reduce the performance of the overall system.

Table 2: SqlExecutor Parameters

INI File | Section | Parameter | Default | Description
indexserver.ini | sql | sql_executors | 0 (number of available threads) | Defines the number of parallel (SQL) queries that can be processed by the system. The default value (0) corresponds to the number of hyperthreads in the system. As each thread allocates a particular amount of main memory for its stack, reducing the value of this parameter can help to reduce the memory footprint.
indexserver.ini | sql | max_sql_executo | 0 (disabled) | Sets the maximum number of threads that can be used.

Table 3: JobExecutor Parameters

INI File | Section | Parameter | Default | Description
global.ini / indexserver.ini | execution | max_concurrency | Number of available threads | Sets the maximum number of threads that can be used.

JobExecutor settings affect not only OLAP workload but also other SAP HANA subsystems (for example, memory garbage collection and savepoint writes). The JobExecutor also executes operations such as table updates and backups that were delegated by the SqlExecutor. JobExecutor settings are soft limits, meaning that the JobExecutor can "loan" threads, if available, and then falls back to the maximum number of threads when done.

Tip

In a system that supports multitenant database containers, a reasonable default value for the max_concurrency parameter is the number of cores divided by the number of tenant databases. Do not specify a value of 0.
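The rule of thumb in the tip can be expressed as a small calculation. The following sketch (the helper name suggest_max_concurrency is hypothetical) divides the number of cores by the number of tenant databases, rounds down, and never returns less than 1, since the tip advises against a value of 0.

```shell
#!/bin/sh
# Sketch of the tip's rule of thumb: max_concurrency = cores / tenant databases.
# suggest_max_concurrency is a hypothetical helper; the result is rounded down
# and never below 1, because a value of 0 should not be specified.
suggest_max_concurrency() {
  local cores="$1" tenants="$2" value
  if [ "$tenants" -lt 1 ]; then
    echo "suggest_max_concurrency: tenant count must be at least 1" >&2
    return 1
  fi
  value=$(( cores / tenants ))
  if [ "$value" -lt 1 ]; then
    value=1
  fi
  echo "$value"
}
```

For example, suggest_max_concurrency 32 4 prints 8. The tip does not say whether "cores" means physical cores or logical cores (hyperthreads); nproc reports the logical CPUs available to the current process, so decide which count fits your system before passing it in.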

3.2.6 Bind Processes to CPUs

For better workload management in a system that supports multitenant database containers, you can bind SAP HANA processes to logical cores of the hardware, for example to partition the CPU resources of the system by tenant database.

Prerequisites

● You have access to the operating system of the SAP HANA instance and are able to read the directories and files mentioned below.

● You have the privilege INIFILE ADMIN.

Context

Tip

Instead of binding processes to CPUs as described here, you can achieve similar performance gains by changing the parameter [execution] max_concurrency in the indexserver.ini configuration file.

This is more convenient to do and does not require the system to be offline. For more information, see Managing Resources in Multiple-Container Systems.

If the physical hardware on a host needs to be shared with other processes, it may be useful to assign a set of cores to an SAP HANA process. This is achieved by assigning an affinity to logical cores. Note that the logical cores used in the configuration do not have a fixed association with the physical cores on the hardware. The commands listed below can be used to discover this mapping. The information about the cores provided by sysfs is used to restrict the cores available to SAP HANA in the daemon.ini file. Sysfs is a virtual file system that provides access to information about the hardware. General information on sysfs is available in the Linux kernel documentation.

For Xen and VMware, users in the VM guest system see what is configured in the VM host. The quality of the reported information therefore depends on the configuration of the VM guest, and SAP cannot give any performance guarantees in this case.

For Linux containers (like Docker) /proc/cpuinfo reports more cores than are actually available. As Linux containers use the same mechanism to limit CPU resources per container as SAP does for the affinity, the affinity for the container also applies to the complete SAP HANA system. This means that if the Linux container is configured to use logical cores 0-4, then only those will be available to SAP HANA processes.
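Because /proc/cpuinfo can be misleading inside a container, it can help to read the Cpus_allowed_list field that the Linux kernel reports in /proc/<pid>/status, which lists the logical cores a process may actually run on. The sketch below (allowed_cpus is a hypothetical helper name) prints that field. It reflects CPU restrictions applied through affinity or cpusets, but not limits expressed purely as a CPU time quota.

```shell
#!/bin/sh
# Sketch: print the logical cores a process may run on, as reported by the
# Linux kernel in /proc/<pid>/status. Without an argument, the awk process
# itself is inspected, which inherits the CPU restrictions of the caller.
allowed_cpus() {
  local pid="${1:-self}"
  awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$pid/status"
}
```

For example, allowed_cpus "$(pgrep -o hdbnameserver)" shows the cores the oldest hdbnameserver process may use, which is also a way to confirm an affinity setting once the process has been restarted. With several processes of the same name, such as one indexserver per tenant database, pass the process ID explicitly.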

The following steps analyze the topology of sockets and cores. You can use this information to define the desired affinity setting for SAP HANA processes.

Procedure

1. Get cores available for scheduling:

cat /sys/devices/system/cpu/present

Typical output: 0-31

In this example, the system exposes 32 logical cores (CPUs). For each of them there is a directory /sys/devices/system/cpu/cpu#, where # denotes the core ID in the range 0-31.

2. Get the sibling cores, that is, the cores on the same socket:

cat /sys/devices/system/cpu/cpu0/topology/core_siblings_list

Typical output: 0-7,16-23

3. Get all logical cores assigned to the same physical core (hyperthreading):

cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

Typical output: 0,16

4. Get the socket of a specific core:

cat /sys/devices/system/cpu/cpu0/topology/physical_package_id

Typical output: 0

5. Restrict CPU usage of SAP HANA processes to certain CPUs.

In the daemon.ini file, the relevant sections are nameserver, indexserver, compileserver, preprocessor, and xsengine. In each of these sections, the parameter affinity can be set to a list such as c1,c2,c3,c4-c5, where c1 and so on are logical cores.

With sched_setaffinity (similar to numactl), the Linux OS makes sure that all threads of the process, however many there are, run only on the specified logical cores.
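Steps 2 to 4 can be combined into a single pass over sysfs that prints one line per logical core, which is also a convenient way to record the assignment in advance. The sketch below reads the same sysfs files as the commands in the procedure; the function name print_cpu_topology and the SYSFS_CPU override (used only to point the function at a copy of the tree, for example for testing) are hypothetical.

```shell
#!/bin/sh
# Sketch: run steps 2-4 for every logical core and print one line per core:
#   <cpu> <socket> <thread_siblings_list> <core_siblings_list>
# SYSFS_CPU is a hypothetical override pointing at a copy of
# /sys/devices/system/cpu; by default the live sysfs tree is read.
print_cpu_topology() {
  local root="${SYSFS_CPU:-/sys/devices/system/cpu}" dir topo
  for dir in "$root"/cpu[0-9]*; do
    topo="$dir/topology"
    # Skip CPUs that expose no topology information, and the unexpanded
    # pattern if no cpu directory matched.
    [ -r "$topo/physical_package_id" ] || continue
    printf '%s %s %s %s\n' "${dir##*/cpu}" \
      "$(cat "$topo/physical_package_id")" \
      "$(cat "$topo/thread_siblings_list")" \
      "$(cat "$topo/core_siblings_list")"
  done | sort -n -k1,1
}
```

On a system matching the typical outputs above, the line for core 0 would read 0 0 0,16 0-7,16-23: socket 0, hyperthread sibling 16, and socket siblings 0-7,16-23.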

Results

After following these steps you have the information required to assign an affinity to the logical cores. The affinity setting only takes effect after a restart of the affected SAP HANA processes.
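With the topology known, the affinity value for a whole socket can be derived rather than typed by hand. The following sketch collects every logical core whose physical_package_id matches a given socket and joins consecutive IDs into ranges in the c1,c2,c4-c5 format used in daemon.ini. The helper names socket_affinity and compress_cpu_list, and the SYSFS_CPU override, are hypothetical.

```shell
#!/bin/sh
# Sketch: derive an affinity value such as "8-15,24-31" for all logical cores
# of one socket, in the c1,c2,c4-c5 format used in daemon.ini.
# socket_affinity, compress_cpu_list and the SYSFS_CPU override are
# hypothetical names.

# Reads CPU ids (one per line, any order) and joins consecutive ids into ranges.
compress_cpu_list() {
  sort -n -u | awk '
    function range(a, b) { return (a == b) ? a : a "-" b }
    NR == 1 { start = prev = $1; next }
    $1 == prev + 1 { prev = $1; next }
    { out = out sep range(start, prev); sep = ","; start = prev = $1 }
    END { if (NR > 0) print out sep range(start, prev) }
  '
}

# Prints the affinity value for all logical cores whose physical_package_id
# equals the given socket number.
socket_affinity() {
  local socket="$1" root="${SYSFS_CPU:-/sys/devices/system/cpu}" dir
  for dir in "$root"/cpu[0-9]*; do
    [ -r "$dir/topology/physical_package_id" ] || continue
    if [ "$(cat "$dir/topology/physical_package_id")" = "$socket" ]; then
      echo "${dir##*/cpu}"
    fi
  done | compress_cpu_list
}
```

On a host whose topology matches the typical outputs in the procedure, socket_affinity 1 would print 8-15,24-31, the value used in the indexserver example. The output only prepares the value; it still has to be set in daemon.ini and takes effect after the affected processes are restarted.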

Example affinity settings:

● To restrict the nameserver to the two logical cores of the first physical core of socket 0 (derived in step 3), use the following affinity setting:

Example

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('nameserver', 'affinity') = '0,16';

● To restrict the preprocessor and the compileserver to all remaining cores (that is, all except 0 and 16) on socket 0 use the following affinity setting (derived in steps 2 and 3):

Example

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('preprocessor', 'affinity') = '1-7,17-23';

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('compileserver', 'affinity') = '1-7,17-23';

● To restrict the indexserver to all cores on socket 1 use the following affinity setting (derived in steps 1 and 2):

Example

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('indexserver', 'affinity') = '8-15,24-31';

● To set the affinity for two tenant databases, DB1 and DB2, in a multitenant database container setup:

Example

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('indexserver.DB1', 'affinity') = '1-7,17-23';

ALTER SYSTEM ALTER CONFIGURATION ('daemon.ini', 'SYSTEM') SET ('indexserver.DB2', 'affinity') = '9-15,25-31';

Note

Even on systems with 32 logical cores and two sockets, the assignment of logical cores to physical CPUs and sockets can differ, so it is important to determine the assignment in advance.

In document SAP HANA Troubleshooting and Performance Analysis Guide En (Page 35-43)