2.4 Finding a core dump
Old core dumps accumulate over time and can fill up file systems.
From time to time you will want to find the core dumps, examine them, and remove them to save space. This is particularly useful if you are administering a multiuser system, where most core dumps are created by other users' applications and you will not know about all of them.
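As a rough illustration of such a sweep, the following sketch uses the standard find command rather than any AIX-specific tool. The function name list_cores is hypothetical, and the name patterns (core and core.*) are an assumption that can also match unrelated files such as core.c, so review the list before removing anything.

```sh
#!/bin/sh
# list_cores DIR (hypothetical helper): print regular files under DIR
# that are named "core" or "core.*".
# -xdev keeps find on the file system that holds DIR, so other mounted
# file systems (including network mounts) below DIR are not walked.
list_cores() {
    find "${1:-.}" -xdev -type f \( -name core -o -name 'core.*' \) -print
}

# Example use (not run here): list_cores /home
```

Piping the result to ls -l or du -k shows how much space each candidate occupies before you decide what to delete.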
To find the core file corresponding to an error log entry, use the corepath command located in the /usr/samples/findcore directory. This facility is part of the bos.sysmgt.serv_aid fileset.
When an application core dumps, a core file is placed in the current directory where the program runs. Unfortunately, we are not able to determine the path name of the dump at dump time. The corepath package uses error notification to detect that a core dump has happened. It then finds the core file, if possible, and then logs an operator message with the dump file's path name. This is particularly useful for system administrators who want to look at the dumps, but were not the ones operating the system when the dump occurred.
The following is a list of commands used to find core dump files:
/usr/samples/findcore/corepath The shell script that finds the dump.
/usr/samples/findcore/getvfsname A binary program used by corepath to get the file system containing the dump.
To run corepath automatically, follow these steps:
1. You must be root to start it.
2. Put the error notification object into ODM. Copy the error notification object listed below into a separate file, then run odmadd <file> to add the object to the errnotify object class. Note that the file should begin with the line errnotify:.
To stop error notification from searching for core dumps, issue the following command:
# odmdelete -o errnotify -q 'en_name = corepath'
Error notification ODM Object
errnotify:
en_name = corepath
en_persistenceflg = 1
en_label = CORE_DUMP
en_method = "/usr/samples/findcore/corepath $1 root"
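The two setup steps can be sketched as a small script. This is a minimal sketch, not the procedure shipped with corepath: the function names and the default stanza path /tmp/corepath.add are hypothetical, and the stanza text is copied from the listing above without further validation. The here-document delimiter is quoted so that the shell leaves the $1 in en_method untouched.

```sh
#!/bin/sh
# write_corepath_stanza FILE (hypothetical helper): write the errnotify
# stanza to FILE. The file must begin with the line "errnotify:".
write_corepath_stanza() {
    cat > "$1" <<'EOF'
errnotify:
        en_name = corepath
        en_persistenceflg = 1
        en_label = CORE_DUMP
        en_method = "/usr/samples/findcore/corepath $1 root"
EOF
}

# install_corepath_notify [FILE] (hypothetical helper): add the stanza
# to the errnotify object class. Refuses to run unless the caller is
# root and the odmadd command is available.
install_corepath_notify() {
    stanza=${1:-/tmp/corepath.add}   # assumed scratch location
    [ "$(id -u)" -eq 0 ] || { echo "must be root" >&2; return 1; }
    command -v odmadd >/dev/null 2>&1 ||
        { echo "odmadd not found" >&2; return 1; }
    write_corepath_stanza "$stanza" && odmadd "$stanza"
}
```

Keeping the stanza in its own function makes it easy to inspect the generated file before anything is added to ODM.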
Note: Currently we only search journaled file systems (JFS). There are a couple of lines to comment out in corepath if you want to search any type of file system. Be aware that this may cause unwanted network traffic and may take hours to find the dump.
AIX 5L Version 5.1 has changed the way it names the core file used for a core dump. In earlier AIX releases, a core file was always named core. If more than one application dumped or the same application dumped more than once, you always lost the earlier core file. Beginning with AIX 5L Version 5.1, each core file can be uniquely named, so no core file will be overwritten with a new one. This feature helps debugging and tracing application failures.
By default, a new core file is named core. To enable unique naming, set the CORE_NAMING environment variable to yes. After you set the CORE_NAMING variable, new core file names have the format core.pid.ddhhmmss, where pid is the process ID, dd is the day of the month, hh is the hour, mm is the minute, and ss is the second.
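To make the naming format concrete, the following sketch sets the variable and splits a core.pid.ddhhmmss name back into its fields. The function name parse_core_name and its output layout are hypothetical; only the variable name and the name format come from the text above.

```sh
#!/bin/sh
# Enable unique core file names for programs started from this shell.
CORE_NAMING=yes
export CORE_NAMING

# parse_core_name NAME (hypothetical helper): split core.pid.ddhhmmss
# into its fields. Returns 1 if NAME does not follow that format.
parse_core_name() {
    name=${1##*/}                         # strip any leading directory
    rest=${name#core.}
    [ "$rest" != "$name" ] || return 1    # must start with "core."
    pid=${rest%%.*}
    stamp=${rest#*.}
    [ "$stamp" != "$rest" ] || return 1   # must contain a second dot
    case $pid in ''|*[!0-9]*) return 1 ;; esac
    case $stamp in *[!0-9]*) return 1 ;; esac
    [ "${#stamp}" -eq 8 ] || return 1     # ddhhmmss is eight digits
    dd=${stamp%??????}                    # first two digits
    rest=${stamp#??}
    hh=${rest%????}
    rest=${rest#??}
    mm=${rest%??}
    ss=${rest#??}
    echo "pid=$pid day=$dd time=$hh:$mm:$ss"
}

parse_core_name core.16928.24200405
# prints: pid=16928 day=24 time=20:04:05
```

The sketch uses only parameter expansion, so it does not depend on substring syntax that some shells lack.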
AIX 5L Version 5.1 also introduces a facility for gathering core files. This enhancement automates the core collection process and packages the collected files into a single archive. The archive contains all the information needed to analyze the core on any machine.
The snapcore command gathers a core file, program, and libraries used by the program and compresses the information into a pax file. The file can then be downloaded to disk or tape, or transmitted to a remote system. The information gathered with the snapcore command allows you to identify and resolve problems with an application.
To collect all the information you might need to debug and analyze the problem, you can use the snapcore command, as shown in the following steps:
1. Change to the directory where the core dump file is located:
# ls -l
total 43
-rw-r--r-- 1 root staff 72 Aug 22 11:01 .profile
-rw------- 1 root staff 446 Aug 22 18:01 .sh_history
-rw-r--r-- 1 root system 20987 Aug 24 15:04 core.16928.24200405
2. Run the snapcore command to collect all needed files:
# snapcore -d /tmp/coredir core.16928.24200405
Core file "core.16928.24200405" created by "telnet"
pass1() in progress ....
Calculating space required .
Total space required is 8258 kbytes ..
Checking for available space ...
Available space is 10292 kbytes
pass1 complete.
pass2() in progress ....
Collecting fileset information .
Collecting error report of CORE_DUMP errors ..
Creating readme file ..
Creating archive file ...
Compressing archive file ....
pass2 completed.
Snapcore completed successfully. Archive created in /tmp/coredir.
Note: The expected value of the CORE_NAMING variable is yes, but any non-null value has the same effect. If CORE_NAMING is unset, core files keep the default name core.
The snapcore command gathers all the information and creates a new compressed pax archive in the /tmp/coredir directory. If you do not specify a directory with the -d flag, the archive is stored in the /tmp/snapcore directory. The new archive file is named snapcore_<$pid>.pax.Z.
# ls -l /tmp/coredir
total 5720
-rw-r--r-- 1 root system 2925093 Aug 24 15:07 snapcore_3580.pax.Z
3. To check the content of the pax archive, use the following command:
# uncompress -c snapcore_3580.pax.Z | pax
core.16928.24200405
README
lslpp.out
If you want to extract the files from the archive, type the following commands:
# uncompress -c snapcore_3580.pax.Z | pax -r
# ls -l
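The steps above can be sketched as one wrapper. This is a sketch under stated assumptions, not part of snapcore itself: the function names are hypothetical, the output directory should be given as an absolute path, and the newest snapcore_*.pax.Z file in that directory is assumed to be the archive just created.

```sh
#!/bin/sh
# newest_snapcore DIR (hypothetical helper): print the most recently
# modified snapcore archive in DIR, or nothing if there is none.
newest_snapcore() {
    ls -t "${1:-/tmp/snapcore}"/snapcore_*.pax.Z 2>/dev/null | head -n 1
}

# collect_core COREFILE [DIR] (hypothetical helper): run snapcore from
# the directory that holds COREFILE, then list the resulting archive.
# Return codes: 2 = no core file, 3 = snapcore missing,
#               4 = snapcore failed, 5 = no archive found.
collect_core() {
    core=$1
    dir=${2:-/tmp/snapcore}          # default named in the text above
    [ -f "$core" ] || { echo "no such core file: $core" >&2; return 2; }
    command -v snapcore >/dev/null 2>&1 ||
        { echo "snapcore not found" >&2; return 3; }
    ( cd "$(dirname "$core")" &&
      snapcore -d "$dir" "$(basename "$core")" ) || return 4
    archive=$(newest_snapcore "$dir")
    [ -n "$archive" ] || return 5
    echo "archive: $archive"
    uncompress -c "$archive" | pax   # list contents without extracting
}

# Example use (not run here):
#   collect_core /home/user/core.16928.24200405 /tmp/coredir
```

Selecting the archive by modification time avoids having to know the process ID that snapcore puts in the file name.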
The check_core utility is used by the snapcore command to gather all information about the core dump. This is a small C program and is located in the /usr/lib/ras directory.
Change to the directory where the core dump file is located and run the check_core utility against the core dump file. You will receive a list containing the program that caused the core dump and the libraries used by it:
# /usr/lib/ras/check_core core.16928.24200405
/usr/lib/libc.a
/usr/lib/libcrypt.a
/usr/lib/libbsd.a
/usr/lib/libbind.a
/usr/lib/libi18n.a
/usr/lib/libiconv.a
/usr/lib/libcur.a
/usr/lib/libauthm.a
/usr/lib/libodm.a
/usr/lib/libcfg.a
/usr/lib/nls/loc/en_US
telnet
From the information above, we know the name of the program that caused the dump. You can also find it in the core dump entry in the error log. To produce this example, we started a telnet session to another system and then ran the kill -11 pid_telnet command, where signal 11 is SEGV (segmentation violation).
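If you want to pick the program name out of the check_core listing in a script, a filter along the following lines may help. It assumes, based only on the sample output above, that the program name is the last line and that every earlier line is a library or locale file; the function name summarize_check_core is hypothetical.

```sh
#!/bin/sh
# summarize_check_core (hypothetical helper): read check_core output on
# standard input and report the program and its dependencies.
# Assumption: the last non-empty line names the program.
summarize_check_core() {
    awk 'NF { n++; line[n] = $0 }
         END {
             if (n == 0) exit 1          # no input at all
             print "program: " line[n]
             print "dependencies: " (n - 1)
             for (i = 1; i < n; i++) print "  " line[i]
         }'
}

# Example use (not run here):
#   /usr/lib/ras/check_core core.16928.24200405 | summarize_check_core
```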