Shows the system status and messages.
p (alias: proc) [*/slot/symb/eaddr]
Displays the process table.
u (alias: user) [-?][slot/symb/eaddr]
Displays u_area.
th (alias: thread) [*/slot/symb/eaddr/-w ?]
Displays the thread table.
mst [slot] [[-a] symb/eaddr]
Displays the mstsave area for the specified thread.
f (alias: stack) [+x/-x][th {slot/eaddr}]
Displays all stack frames for specified thread.
h (alias: ?) [topic]
Lists all subcommands.
Provides information about subcommands of kdb.
errpt
Displays error log messages.
stat subcommand
The stat subcommand reports key information about a dump: the dump code, the panic string, the time of the crash, the version and release of the operating system, the name of the machine that crashed, and how long the machine had been running since the last crash or power off of the system. Example 4-15 shows sample output from this subcommand.
Example 4-15 The stat subcommand
(0)> stat
SYSTEM_CONFIGURATION:
CHRP_SMP_PCI POWER_PC POWER_604 machine with 1 cpu(s) (32-bit registers)
SYSTEM STATUS:
sysname... AIX nodename.. aix5l
release... 1 version... 5
machine... 004232044C00
nid... 4232044C
time of crash: Wed Jul 11 07:39:56 2001
age of system: 27 min., 36 sec.
xmalloc debug: disabled
CRASH INFORMATION:
CPU 0 CSA 00891EB0 at time of crash, error code for LEDs: 30000000
pvthread+000100 STACK:
[01AC0644]ethchandd:ech_to+000188 (2FAE0000 [??])
[01AC0550]ethchandd:ech_to+000094 (??)
[000D7180]watchdog+0000C4 ()
[0001B880]sys_timer+0004E8 (??)
[0001BD00]clock+000134 (??)
[0001CE04]i_softmod+000264 ()
[00057A5C]flih_603_patch+0000CC (??, ??)
[000579A0]flih_603_patch+000010 (??, ??)
____ Exception (2FF3B400) ____
iar : 00026BCC msr : 00009032 cr : 22002080 lr : 00026BF0
ctr : 00026B54 xer : 00000000 mq : 00000000
r0 : 00000000 r1 : 2FF3B328 r2 : 0052B810 r3 : 00000000
r4 : 01945000 r5 : 00000004 r6 : 01945800 r7 : 00000001
r8 : 00000000 r9 : 00000000 r10 : 00000000 r11 : 00000000
r12 : 00000000 r13 : DEADBEEF r14 : DEADBEEF r15 : DEADBEEF
r16 : DEADBEEF r17 : DEADBEEF r18 : DEADBEEF r19 : DEADBEEF
r20 : DEADBEEF r21 : DEADBEEF r22 : DEADBEEF r23 : DEADBEEF
r24 : DEADBEEF r25 : DEADBEEF r26 : DEADBEEF r27 : 01945984
r28 : 00180170 r29 : 000034E0 r30 : 0015C48C r31 : 01945000
s0 : 00000000 s1 : 00004010 s2 : 00004C13 s3 : 00004411
s4 : 007FFFFF s5 : 007FFFFF s6 : 007FFFFF s7 : 007FFFFF
s8 : 007FFFFF s9 : 007FFFFF s10 : 007FFFFF s11 : 007FFFFF
s12 : 007FFFFF s13 : 007FFFFF s14 : 00001004 s15 : 007FFFFF
prev 00000000 kjmpbuf 00000000 stackfix 00000000 intpri 0B curid 00000204 sralloc F1EF0000 ioalloc 00000000 backt 00 flags 00 tid 00000000 excp_type 00000000
fpscr 00000000 fpeu 00 fpinfo 00 fpscrx 00000000 o_iar 00000000 o_toc 00000000 o_arg1 00000000
The stat subcommand should always be the first command run when examining a system crash.
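As a rough illustration of collecting the headline facts from stat output, the following Python sketch scans saved output for three of the fields shown in Example 4-15. It assumes the output has been captured to text with one field per line. The function name crash_summary and the dictionary keys are invented for this sketch and are not part of kdb.

```python
import re

# Patterns for three fields that appear in the stat output of Example 4-15.
# The key names are arbitrary labels chosen for this sketch.
FIELDS = {
    "time_of_crash": re.compile(r"time of crash:\s*(.+)"),
    "age_of_system": re.compile(r"age of system:\s*(.+)"),
    "led_code": re.compile(r"error code for LEDs:\s*([0-9A-Fa-f]+)"),
}


def crash_summary(stat_output):
    """Return a dict holding whichever of the known fields are present."""
    summary = {}
    for name, pattern in FIELDS.items():
        match = pattern.search(stat_output)
        if match:
            summary[name] = match.group(1).strip()
    return summary


# Lines copied from Example 4-15 (assumed to be one field per line).
sample = """\
time of crash: Wed Jul 11 07:39:56 2001
age of system: 27 min., 36 sec.
CPU 0 CSA 00891EB0 at time of crash, error code for LEDs: 30000000
"""

for key, value in crash_summary(sample).items():
    print(key, "=", value)
```

Fields that are missing from the captured text are simply left out of the result, so the same function can be run against partial output.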
f subcommand
The f subcommand displays a kernel stack traceback.
The traceback shows what was happening in the kernel when the crash occurred: the history of function calls and the interrupt processing that was going on in the system. If the crash occurred during interrupt processing, this is the command to use. The subcommand traces the linked list of mstsave areas (Figure 4-7). Taken together, the mstsave areas record the history of interrupt processing in the system.
Figure 4-7 Machine state save area
The machine state save area, or MST, contains a saved image of the machine’s process context. The process context includes the general purpose and floating point registers, the special purpose registers, and other information necessary to restart a thread when it is dispatched. Example 4-16 on page 98 shows a stack traceback.
[Figure 4-7 diagram: MST areas for the System Dump, High Priority Interrupt, Low Priority Interrupt, and Base Interrupt Level, linked by "previous" pointers; the diagram also labels the uthread.]
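The chain of save areas that the f subcommand follows can be pictured as a singly linked list. The Python sketch below is only a toy model of that idea. The class name Mst, its two fields, and the order of the levels are assumptions made for illustration; the real mstsave structure holds the full register context and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mst:
    """Toy stand-in for one mstsave area (hypothetical, heavily simplified)."""
    level: str                    # label for the interrupt level, for display only
    prev: Optional["Mst"] = None  # link to the previously saved context


def walk(mst):
    """Follow the prev links from the newest save area back to the oldest."""
    levels = []
    while mst is not None:
        levels.append(mst.level)
        mst = mst.prev
    return levels


# Build a chain using the labels from Figure 4-7 (the ordering is assumed).
base = Mst("Base Interrupt Level")
low = Mst("Low Priority Interrupt", prev=base)
high = Mst("High Priority Interrupt", prev=low)
dump = Mst("System Dump", prev=high)

for level in walk(dump):
    print(level)
```

Each step of the loop corresponds to one level of interrupted work, which is why a traceback can contain several separate stack listings.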
Example 4-16 Stack traceback
(0)> f
pvthread+000100 STACK:
[01AC0644]ethchandd:ech_to+000188 (2FAE0000 [??])
[01AC0550]ethchandd:ech_to+000094 (??)
[000D7180]watchdog+0000C4 ()
[0001B880]sys_timer+0004E8 (??)
[0001BD00]clock+000134 (??)
[0001CE04]i_softmod+000264 ()
[00057A5C]flih_603_patch+0000CC (??, ??)
[000579A0]flih_603_patch+000010 (??, ??)
____ Exception (2FF3B400) ____
[00026BCC]waitproc+000078 ()
In this example, there are two levels of stack traceback.
When reading a stack traceback, remember that the first entry listed is the most recently running function, which was called by the function below it, which in turn was called by the function below that, and so on. In the upper traceback in our example, sys_timer called watchdog, which called ech_to in the ethchandd module, and the exception occurred in ech_to. You would have to examine the code of ech_to to find the cause of this exception.
In any case, the traceback points to the ethchandd module as the source of the problem.
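The reading order described above can be made concrete with a short Python sketch that parses traceback lines and prints the functions from caller to callee. The regular expression is an assumption based only on the line format visible in Example 4-16, and the helper names parse_frames and call_chain are invented for this sketch.

```python
import re

# One frame looks like "[01AC0644]ethchandd:ech_to+000188 (2FAE0000 [??])":
# a bracketed hex address, a symbol, a plus sign, and a hex offset.
FRAME_RE = re.compile(r"\[([0-9A-Fa-f]+)\]([^\s(]+)\+([0-9A-Fa-f]+)")


def parse_frames(text):
    """Return (address, symbol, offset) tuples in the order they were printed."""
    return [(int(addr, 16), sym, int(off, 16))
            for addr, sym, off in FRAME_RE.findall(text)]


def call_chain(text):
    """Reverse the printed order so that each caller precedes its callee."""
    return [sym for _, sym, _ in reversed(parse_frames(text))]


# Frames copied from the upper part of Example 4-16.
trace = """
[01AC0644]ethchandd:ech_to+000188 (2FAE0000 [??])
[01AC0550]ethchandd:ech_to+000094 (??)
[000D7180]watchdog+0000C4 ()
[0001B880]sys_timer+0004E8 (??)
[0001BD00]clock+000134 (??)
"""

print(" -> ".join(call_chain(trace)))
# prints: clock -> sys_timer -> watchdog -> ethchandd:ech_to -> ethchandd:ech_to
```

The first tuple returned by parse_frames carries the address 01AC0644, which is the value to pass to the lke subcommand.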
Make sure the failing module is at the latest version. Problems are frequently resolved in later versions of software. You can use the lke subcommand in kdb and the lslpp -w command to find the fileset that contains the specific module.
Refer to “Finding addresses in kernel extensions” on page 101 for more information. You can get the latest fileset information from the Internet at:
http://techsupport.service.ibm.com/server/support
Use the lke subcommand with an argument of the address listed in the stack traceback. The address is displayed in brackets before the name of the module.
Example 4-17 shows the lke subcommand listing the loaded extension that contains the address.
Example 4-17 List loaded extensions
(0)> lke 01AC0644
ADDRESS FILE FILESIZE FLAGS MODULE NAME
1 0567DC00 01ABFB00 00008A7C 00000262 ethchandd32/usr/lib/drivers/ethchandd
le_flags... TEXT DATAINTEXT DATA DATAEXISTS
le_next... 058FE580 le_fp... 00000000
le_filename.... 0567DC58 le_file... 01ABFB00
le_filesize.... 00008A7C le_data... 01AC3D20
le_tid... 00000000 le_datasize.... 0000485C
le_usecount.... 00000001 le_loadcount... 00000001
le_ndepend... 00000001 le_maxdepend... 00000001
One of the fields listed by the lke subcommand is the name of the module, which includes its path (here, /usr/lib/drivers/ethchandd). You can then pass that path to the lslpp -w command to determine the fileset that contains the module.
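A minimal Python sketch of this last step follows. It assumes that the MODULE NAME column is the last field of the lke summary row and that the path starts at the first slash in that field, which matches Example 4-17 but is not a documented format. The helper names are invented for this sketch, and the script only builds the command string; it does not run lslpp.

```python
import shlex


def module_path(lke_row):
    """Return the path part of the MODULE NAME field, or None if there is none."""
    name = lke_row.split()[-1]   # assumed: the module name is the last field
    slash = name.find("/")       # assumed: the path starts at the first slash
    return name[slash:] if slash != -1 else None


def lslpp_command(lke_row):
    """Build the lslpp -w command line for the module named in the row."""
    path = module_path(lke_row)
    return None if path is None else "lslpp -w " + shlex.quote(path)


# Summary row copied from Example 4-17.
row = "1 0567DC00 01ABFB00 00008A7C 00000262 ethchandd32/usr/lib/drivers/ethchandd"
print(lslpp_command(row))
# prints: lslpp -w /usr/lib/drivers/ethchandd
```

Run the printed command on the AIX system itself to find the owning fileset.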
p subcommand
The p subcommand (alias: proc) displays entries in the process table. The process table is made up of entries of type struct proc, one per active process. Entries in the process table are pinned so that they are always resident in physical memory.
Each entry holds the information that the kernel needs about a process even while the process is swapped out, so that the process can be made to run again later.
Example 4-18 shows the displayed process table.
Example 4-18 Display process table
(0)> p
SLOT NAME STATE PID PPID PGRP UID ADSPACE CL #THS
pvproc+000400 2*wait ACTIVE 00204 00000 00000 00000 00004C13 0 0001
NAME... wait
STATE... stat :07 .... xstat :0000
FLAGS... flag :00000303 LOAD NOSWAP FIXPRI KPROC
... flag2 :00000002 WAITPROC
... atomic :00000000
LINKS... child :00000000
... siblings :E2000200 <pvproc+000200>
... uidinfo :0019A588