/****************************************************************************/
/* Document     : UNIX command examples, mainly based on Solaris, AIX, HP   */
/*                and of course, also Linux.                                */
/* Doc. Version : 102                                                       */
/* File         : unix.txt                                                  */
/* Purpose      : some useful examples for the Oracle, DB2, SQLServer DBA   */
/* Date         : 10-03-2008                                                */
/* Compiled by  : Albert van der Sel                                        */
/* Best use     : Use find/search in your editor to find a string, command, */
/*                or any identifier                                         */
/****************************************************************************/
#####################################
SECTION 1. COMMANDS AND ARCHITECTURE:
#####################################

==========================
1. HOW TO GET SYSTEM INFO:
==========================

1.1 Short version:
==================
See section 1.2 for more detailed commands and options.

Memory:
-------
AIX:      bootinfo -r
          lsattr -E -lmem0
          /usr/sbin/lsattr -E -l sys0 -a realmem
          or use a tool such as "topas" or "nmon" (these are utilities)
Linux:    cat /proc/meminfo
          /usr/sbin/dmesg | grep "Physical"
          free                              (the free command)
HP:       /usr/sam/lbin/getmem
          grep MemTotal /proc/meminfo
          /etc/dmesg | grep -i phys
          wc -c /dev/mem
          or use a tool such as "glance", for example by entering "glance -m" at the prompt (is a utility)
Solaris:  /usr/sbin/prtconf | grep "Memory size"
Tru64:    /bin/vmstat -P | grep "Total Physical Memory"

Swap:
-----
AIX:      /usr/sbin/lsps -a
HP:       /usr/sbin/swapinfo -a
Solaris:  /usr/sbin/swap -l
Linux:    /sbin/swapon -s
          cat /proc/swaps
          cat /proc/meminfo

OS version:
-----------
HP:       uname -a
Linux:    cat /proc/version
Solaris:  uname -a
Tru64:    /usr/sbin/sizer -v
AIX:      oslevel -r
          lslpp -h bos.rte

AIX firmware:
lsmcode -c              display the system firmware level and service processor
lsmcode -r -d scraid0   display the adapter microcode levels for a RAID adapter scraid0
lsmcode -A              display the microcode level for all supported devices
prtconf                 shows many settings, including memory, firmware, serial# etc.

cpu:
----
HP:       ioscan -kfnC processor
          getconf CPU_VERSION
          getconf CPU_CHIP_TYPE
          model
AIX:      prtconf | grep proc
          pmcycles -m
          lsattr -El procx                  (x is 0,2, etc..)
          lscfg | grep proc
Linux:    cat /proc/cpuinfo
Solaris:  psrinfo -v
          prtconf
Notes about lpars:

For AIX:
The uname -L command identifies a partition on a system with multiple LPARs. The LPAR id
can be useful for writing shell scripts that customize system settings such as IP address or hostname.
The output of the command looks like:
# uname -L
1 lpar01
The output of uname -L varies by maintenance level. For consistent output across maintenance levels,
add the -s flag. The partition number can then be assigned to a variable such as "lpar_number"
and the partition name to "lpar_name".

For HP-UX:
Use commands like "parstatus" or "getconf PARTITION_IDENT" to get npar information.
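
For the AIX case, the assignment of the partition number and name described in the uname -L note can be sketched as follows. This is only a minimal sketch: it assumes the output has the form "<number> <name>" as in the sample "1 lpar01", the helper name parse_lpar is made up for this illustration, and the sample line is hard-coded so the snippet also runs on a non-AIX machine.

```shell
# parse_lpar is a hypothetical helper: it splits one line of the form
# "<number> <name>" (the sample output shown in the note) into two variables.
parse_lpar() {
    lpar_number=${1%% *}    # text before the first blank
    lpar_name=${1#* }       # text after the first blank
}

# On AIX you would pass the real output:  parse_lpar "$(uname -Ls)"
parse_lpar "1 lpar01"
echo "lpar_number=$lpar_number lpar_name=$lpar_name"
```

In ksh, where the last stage of a pipeline runs in the current shell, "uname -Ls | read lpar_number lpar_name" is a shorter form; the function above avoids relying on that behaviour.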
patches:
--------
AIX:      Is a certain fix (APAR) installed?
          instfix -ik APAR_number
          instfix -a -ivk APAR_number
          To determine your platform firmware level, at the command prompt, type:
          lscfg -vp | grep -p Platform
          The last six digits of the ROM level represent the platform firmware date in the format YYMMDD.
HP:       /usr/sbin/swlist -l patch
          swlist | grep patch
Linux:    rpm -qa
Solaris:  showrev -p
          pkginfo -i package_name
Tru64:    /usr/sbin/dupatch -track -type kit
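
The AIX remark about the ROM level (last six digits = firmware date as YYMMDD) can be illustrated with a small sketch. The helper name rom_level_date and the sample string "XX040716" are made up for this illustration; a real ROM level string from lscfg will look different, and only the "last six digits" rule is taken from the text.

```shell
# rom_level_date is a hypothetical helper; it prints the last six
# characters of its argument as YY-MM-DD.
rom_level_date() {
    last6=${1#"${1%??????}"}    # keep only the last six characters
    yy=${last6%????}            # first two of those six
    mmdd=${last6#??}            # remaining four
    echo "$yy-${mmdd%??}-${mmdd#??}"
}

# "XX040716" is a made-up ROM level, used only to show the mechanics.
rom_level_date "XX040716"
```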
Netcards:
---------
AIX:      lsdev -Cc adapter
          lsdev -Cc adapter | grep ent
          lsdev -Cc if
          lsattr -E -l ent1
          ifconfig -a
Solaris:  prtconf -D / prtconf -pv / prtconf | grep "card"
          prtdiag | grep "card"
          svcs -x
          ifconfig -a (up plumb)
1.2 More Detail:
================

1.2.1 Show memory in Solaris:
=============================

prtconf:
--------
Use this command to obtain detailed system information about your Sun Solaris installation:
# prtconf -v
It displays the size of the system memory and reports information about peripheral devices.

Use this command to see the amount of memory:
# /usr/sbin/prtconf | grep "Mem"

sysdef -i reports on several system resource limits. Other parameters can be checked on a running system
using adb -k:
# adb -k /dev/ksyms /dev/mem
parameter-name/D
^D (to exit)
1.2.2 Show memory in AIX:
=========================

>> Show Total memory:
---------------------
# bootinfo -r
# lsattr -El sys0 -a realmem
# prtconf                      (you can grep it on memory)

>> Show Details of memory:
--------------------------
You can have a more detailed and comprehensive look at AIX memory by using "vmstat -v" and "vmo -L" or "vmo -a":

For example:

# vmstat -v
   524288 memory pages
   493252 lruable pages
    67384 free pages
        7 memory pools
   131820 pinned pages
     80.0 maxpin percentage
     20.0 minperm percentage
     80.0 maxperm percentage
     25.4 numperm percentage
   125727 file pages
      0.0 compressed percentage
        0 compressed pages
     25.4 numclient percentage
     80.0 maxclient percentage
   125575 client pages
        0 remote pageouts scheduled
    14557 pending disk I/Os blocked with no pbuf
  6526890 paging space I/Os blocked with no psbuf
    18631 filesystem I/Os blocked with no fsbuf
        0 client filesystem I/Os blocked with no fsbuf
        0 Virtualized Partition Memory Page Faults
     0.00 Time resolving virtualized partition memory page faults

The vmo command produces a lot of output. In the following example only a small fraction of the output is shown:

# vmo -L
..
lrubucket     128K    128K    128K    64K            4KB pages    D
--------------------------------------------------------------------
maxclient%    80      80      80      1      100     % memory     D
     maxperm%
     minperm%
--------------------------------------------------------------------
maxfree       1088    1088    1088    8      200K    4KB pages    D
     minfree
     memory_frames
--------------------------------------------------------------------
maxperm       394596          394596                              S
--------------------------------------------------------------------
maxperm%      80      80      80      1      100     % memory     D
     minperm%
     maxclient%
--------------------------------------------------------------------
maxpin        424179          424179                              S
..
..
>> To look further at virtual memory usage and what is driving it, you can use a combination of:

# ipcs -bm              (shared memory)
# lsps -a               (paging)
# vmo -a or vmo -L      (virtual memory options)
# svmon -G              (basic memory allocations)
# svmon -U              (virtual memory usage by user)

To print out the memory usage statistics for the users root and steve, taking into account only working segments, type:
# svmon -U root steve -w

To print out the top 10 users of the paging space, type:
# svmon -U -g -t 10

To print out the memory usage statistics for the user steve, including the list of the process identifiers, type:
# svmon -U steve -l
# svmon -U emcdm -l
Note: sysdumpdev -e
Although the sysdumpdev command is used to show or alter the dump device for a system dump,
you can also use it to show how much real memory is used. The command
# sysdumpdev -e
provides an estimated dump size, taking into account the memory (not paging space) currently
in use by the system.

Note: the rmss command:
The rmss (Reduced-Memory System Simulator) command is used to ascertain the effects of reducing the amount
of available memory on a system without the need to physically remove memory from the system. It is useful
for system sizing, as you can install more memory than is required and then use rmss to reduce it.
Using other performance tools, the effects of the reduced memory can be monitored. The rmss command has
the ability to run a command multiple times using different simulated memory sizes and produce statistics
for all of those memory sizes.
The rmss command resides in /usr/bin and is part of the bos.perf.tools fileset, which is installable
from the AIX base installation media.

Syntax:
  rmss -p -c <MB> -r

Options:
  -p       Print the current value
  -c MB    Change to MB size (in Mbytes)
  -r       Restore all memory to use

Example: find out how much memory you have online
# rmss -p
Example: change available memory to 256 Mbytes
# rmss -c 256
Example: undo the above
# rmss -r

Warning:
- rmss can damage performance very seriously.
- Don't go below 25% of the machine's memory.
- Never forget to finish with rmss -r.
1.2.3 Show memory in Linux:
===========================

# /usr/sbin/dmesg | grep "Physical:"
# cat /proc/meminfo

The ipcs, vmstat, iostat and similar commands are, of course, more or less the same
in Linux as they are in Solaris or AIX.
1.2.4 Show aioservers in AIX:
=============================

# lsattr -El aio0
autoconfig available STATE to be configured at system restart True
fastpath   enable    State of fast path                       True
kprocprio  39        Server PRIORITY                          True
maxreqs    4096      Maximum number of REQUESTS               True
maxservers 10        MAXIMUM number of servers per cpu        True
minservers 1         MINIMUM number of servers                True

# pstat -a | grep -c aios
20

# ps -k | grep aioserver
  331962 - 0:15 aioserver
  352478 - 0:14 aioserver
  450644 - 0:12 aioserver
  454908 - 0:10 aioserver
  565292 - 0:11 aioserver
  569378 - 0:10 aioserver
  581660 - 0:11 aioserver
  585758 - 0:17 aioserver
  589856 - 0:12 aioserver
  593954 - 0:15 aioserver
  598052 - 0:17 aioserver
  602150 - 0:12 aioserver
  606248 - 0:13 aioserver
  827642 - 0:14 aioserver
  991288 - 0:14 aioserver
  995388 - 0:11 aioserver
 1007616 - 0:12 aioserver
 1011766 - 0:13 aioserver
 1028096 - 0:13 aioserver
 1032212 - 0:13 aioserver

What are aioservers in AIX5?:

With IO on filesystems, for example if a database is involved, you may try to tune the number
of aioservers (asynchronous IO).
AIX 5L supports asynchronous I/O (AIO) for database files created both on file system partitions and on raw devices.
AIO on raw devices is implemented fully into the AIX kernel, and does not require database processes
to service the AIO requests. When using AIO on file systems, the kernel database processes (aioserver)
control each request from the time a request is taken off the queue until it completes. The kernel database
FastPath disabled. By default,
FastPath is enabled. The number of aioserver servers determines the number of AIO requests that can be executed
in the system concurrently, so it is important to tune the number of aioserver processes when using file systems
to store Oracle Database data files.
- Use one of the following commands to set the number of servers. This applies only when using asynchronous I/O
  on file systems rather than raw devices:
# smit aio
# chdev -P -l aio0 -a maxservers='128' -a minservers='20'

- To set asynchronous IO to Available:
# chdev -l aio0 -P -a autoconfig=available

You need to restart the server:
# shutdown -Fr
1.2.5 aio on Linux distros:
===========================

On some Linux distros, Oracle 9i/10g supports asynchronous I/O, but it is disabled by default because
some Linux distributions do not ship libaio by default. For Solaris, the following configuration is not required
- skip down to the section on enabling asynchronous I/O.
On Linux, the Oracle binary needs to be relinked to enable asynchronous I/O. The first thing to do is shut down
the Oracle server. After Oracle has shut down, do the following steps to relink the binary:

su - oracle
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk async_on
make -f ins_rdbms.mk ioracle
1.2.6 The ipcs and ipcrm commands:
==================================

The "ipcs" command is really a "listing" command. But if you need to intervene in memory structures, for example
to "clear" or remove a shared memory segment because a faulty or crashed application left semaphores,
memory identifiers, or queues in place, you can use the "ipcrm" command to remove those structures.

Example ipcrm command usage:
----------------------------
Suppose an application crashed, and it cannot be started again. The following might help,
if you happen to know which IPC identifier it used.
Suppose the app used 47500 as the IPC key. Convert this decimal number to hex, which in this example is B98C.
Now do the following:
# ipcs -bm | grep -i B98C
This might give you, for example, the shared memory identifier "50855977". Now clear the segment:
# ipcrm -m 50855977
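
The decimal-to-hex step in this example does not have to be done by hand. A minimal sketch using the standard printf utility follows; the helper name key_to_hex is made up for this illustration.

```shell
# key_to_hex is a hypothetical helper: decimal IPC key -> upper-case hex.
key_to_hex() {
    printf '%X\n' "$1"
}

key_to_hex 47500        # prints B98C

# On the target system the lookup could then be done case-insensitively,
# because ipcs may print the key in lower case:
#   ipcs -bm | grep -i "$(key_to_hex 47500)"
```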
It may also be that a semaphore and/or queue is still "left over". In that case you might also try commands
like the following:

# ipcs -q
# ipcs -s
# ipcrm -s 2228248      (remove semaphore)
# ipcrm -q 5111883      (remove queue)

Note: in some cases the "slibclean" command can be used to clear unused modules in kernel and library memory.
Just give the following command as root:
# slibclean
Other Example:
--------------
Suppose you run the following command to remove a shared memory segment and you get this error:
# ipcrm -m 65537
ipcrm: 0515-020 shmid(65537) was not found.

However, if you run the ipcs command, you still see the segment there:
# ipcs | grep 65537
m 65537 0x00000000 DCrw--- root system

If you look carefully, you will notice the "D" in the fourth column. The "D" means:
D   The associated shared memory segment has been removed. It disappears when the last process attached
    to the segment detaches it.

So, to clear the shared memory segment, find the process which is still associated with the segment:
# ps -ef | grep process_owner
where process_owner is the name of the owner using the shared segment.

Now kill the process found with the ps command above:
# kill -9 pid

Running another ipcs command will show that the shared memory segment no longer exists:
# ipcs | grep 65537

Example:
# ipcrm -m 65537
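
Spotting segments in the "D" state can be automated with a small filter. This is a sketch only: the helper name removed_segments is made up, and it assumes the four-column layout of the sample line in this example (type, id, key, mode). The ipcs column layout differs between platforms, so check it on your own system first.

```shell
# removed_segments is a hypothetical helper. It reads ipcs-style lines on
# stdin and prints the id (2nd column) of every shared memory entry
# (1st column "m") whose mode (4th column) starts with "D".
removed_segments() {
    awk '$1 == "m" && $4 ~ /^D/ { print $2 }'
}

# The sample line from the example; on a live system: ipcs | removed_segments
printf '%s\n' 'm 65537 0x00000000 DCrw--- root system' | removed_segments
```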
1.2.7 Show patches, version, systeminfo:
========================================

Solaris:
========

showrev:
--------
# showrev
Displays system summary information.
# showrev -p
Reports which patches are installed.

sysdef and dmesg:
-----------------
The following commands also display configuration information:
# sysdef
# dmesg

versions:
---------
==> To check your Solaris version:
# uname -a or uname -m
# cat /etc/release
# isainfo -v
==> To check your AIX version:
# oslevel
# oslevel -r            tells you which maintenance level you have.

>> To find the known recommended maintenance levels:
# oslevel -rq

>> To find all filesets lower than a certain maintenance level:
# oslevel -rl 5200-06

>> To find all filesets higher than a certain maintenance level:
# oslevel -rg 5200-05

>> To list the known service packs, type:
# oslevel -q -s
Known Service Packs
-------------------
5300-05-04
5300-05-03
5300-05-02
5300-05-01
5300-05-00
5300-04-CSP
5300-04-03
5300-04-02
5300-04-01
5300-03-CSP

>> How can I determine which fileset updates are missing from a particular AIX level?
To determine which fileset updates are missing from 5300-04, for example, run the following command:
# oslevel -rl 5300-04

>> What SP (Service Pack) is installed on my system?
To see which SP is currently installed on the system, run the oslevel -s command. Sample output for an
AIX 5L Version 5.3 system with TL4 and SP2 installed would be:
# oslevel -s
5300-04-02

>> Is a CSP (Concluding Service Pack) installed on my system?
To see if a CSP is currently installed on the system, run the oslevel -s command. Sample output for an
AIX 5L Version 5.3 system with TL3 and CSP installed would be:
# oslevel -s
5300-03-CSP
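
The oslevel -s strings in these samples have three dash-separated parts: base level, technology level (TL) and service pack (SP or CSP). A script that needs the parts separately could split them as in the following sketch; the helper name split_oslevel is made up for this illustration, and it assumes exactly the three-part form shown in the samples.

```shell
# split_oslevel is a hypothetical helper for strings such as 5300-04-02.
split_oslevel() {
    base=${1%%-*}       # 5300
    rest=${1#*-}        # 04-02
    tl=${rest%%-*}      # 04
    sp=${rest#*-}       # 02, or CSP
    echo "base=$base TL=$tl SP=$sp"
}

# Sample values from the text; on AIX: split_oslevel "$(oslevel -s)"
split_oslevel 5300-04-02
split_oslevel 5300-03-CSP
```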
==> To check your HP machine:
# model
9000/800/rp7410

==> Machine info on AIX:
How do I find out the chip type, system name, node name, model number etc.?

The uname command provides details about your system.
uname -p    Displays the chip type of the system. For example, powerpc.
uname -r    Displays the release number of the operating system.
uname -s    Displays the system name. For example, AIX.
uname -n    Displays the name of the node.
uname -a    Displays the system name, nodename, version, machine id.
uname -M    Displays the system model name. For example, IBM, 7046-B50.
uname -v    Displays the operating system version.
uname -m    Displays the machine ID number of the hardware running the system.
uname -u    Displays the system ID number.

Architecture:
-------------
To see if you have a CHRP machine, log into the machine as the root user, and run the following command:
# lscfg | grep Architecture
or use:
# lscfg -pl sysplanar0 | more

The bootinfo -p command also shows the architecture of the pSeries, RS/6000:
# bootinfo -p
chrp
1.2.8 Check whether you have a 32 bit or 64 bit version:
========================================================

- Solaris:

# isainfo -vk

If /usr/bin/isainfo cannot be found, then the OS only supports 32-bit process address spaces.
(Solaris 7 was the first version that could run 64-bit binaries on certain SPARC-based systems.)
So a ksh-based test might look something like:

if [ -x /usr/bin/isainfo ]; then
  bits=`/usr/bin/isainfo -b`
else
  bits=32
fi

- AIX:

Command: /bin/lslpp -l bos.64bit     ...to see if bos.64bit is installed & committed.
-or-     /bin/locale64               ...error message if on 32bit machine such as:
         Could not load program /bin/locale64:
         Cannot run a 64-bit program on a 32-bit machine.

Or use:

# bootinfo -K     displays the current kernel wordsize of "32" or "64"
# bootinfo -y     tells if hardware is 64-bit capable
# bootinfo -p     If it returns the string 32 it is only capable of running the
                  32-bit kernel. If it returns the string chrp the machine is capable
                  of running the 64-bit kernel or the 32-bit kernel.

Or use:

# /usr/bin/getconf HARDWARE_BITMODE
This command should return the following output:
64
Note:
HOW TO CHANGE KERNEL MODE OF IBM AIX 5L (5.1)

AIX 5L has pre-configured kernels. These are listed below for Power processors:

/usr/lib/boot/unix_up    32 bit uni-processor kernel
/usr/lib/boot/unix_mp    32 bit multi-processor kernel
/usr/lib/boot/unix_64    64 bit multi-processor kernel

Switching between kernel modes means using different kernels. This is simply done by pointing the location
that is referenced by the system to these kernels. Use symbolic links for this purpose.
During boot, the AIX system runs the kernel in the following locations:

/unix
/usr/lib/boot/unix
The base operating system 64-bit runtime fileset is bos.64bit. Installing bos.64bit also installs
the /etc/methods/cfg64 file. The /etc/methods/cfg64 file provides the option of enabling or disabling
the 64-bit environment via SMIT, which updates the /etc/inittab file with the load64bit line.
(Simply adding the load64bit line does not enable the 64-bit environment). The command lslpp -l bos.64bit reveals if this fileset is installed. The bos.64bit fileset
is on the AIX media; however, installing the bos.64bit fileset does not ensure that you will be able
to run 64-bit software. If the bos.64bit fileset is installed on 32-bit hardware, you should be able
to compile 64-bit software, but you cannot run 64-bit programs on 32-bit hardware.
The syscalls64 extension must be loaded in order to run a 64-bit executable. This is done from
the load64bit entry in the inittab file. You must load the syscalls64 extension even when running
a 64-bit kernel on 64-bit hardware.
To determine if the 64-bit kernel extension is loaded, enter the following at the command line:
# genkex | grep 64
Information similar to the following displays:
149bf58 a3ec /usr/lib/drivers/syscalls64.ext
To change the kernel mode, follow the steps below:

1. Create symbolic links from /unix and /usr/lib/boot/unix to the location of the desired kernel.
2. Create a boot image.
3. Reboot AIX.

The detailed actions to change the kernel mode are listed below.

To change to 32 bit uni-processor mode:
# ln -sf /usr/lib/boot/unix_up /unix
# ln -sf /usr/lib/boot/unix_up /usr/lib/boot/unix
# bosboot -ad /dev/ipldevice
# shutdown -r

To change to 32 bit multi-processor mode:
# ln -sf /usr/lib/boot/unix_mp /unix
# ln -sf /usr/lib/boot/unix_mp /usr/lib/boot/unix
# bosboot -ad /dev/ipldevice
# shutdown -r

To change to 64 bit multi-processor mode:
# ln -sf /usr/lib/boot/unix_64 /unix
# ln -sf /usr/lib/boot/unix_64 /usr/lib/boot/unix
# bosboot -ad /dev/ipldevice
# shutdown -r
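
The three command sequences for switching the kernel differ only in the kernel suffix (up, mp or 64). The following sketch makes that pattern explicit. The helper name kernel_switch_cmds is made up for this illustration, and the function deliberately only prints the commands instead of running them, so it is safe to try on any machine.

```shell
# kernel_switch_cmds is a hypothetical helper. It only PRINTS the commands
# for the requested kernel (up, mp or 64); it does not run them.
kernel_switch_cmds() {
    case $1 in
        up|mp|64) ;;
        *) echo "usage: kernel_switch_cmds up|mp|64" >&2; return 1 ;;
    esac
    kernel=/usr/lib/boot/unix_$1
    echo "ln -sf $kernel /unix"
    echo "ln -sf $kernel /usr/lib/boot/unix"
    echo "bosboot -ad /dev/ipldevice"
    echo "shutdown -r"
}

kernel_switch_cmds 64
```

Review the printed commands before running them as root on the AIX system itself.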
IMPORTANT NOTE:
If you are changing the kernel mode to 32-bit and you will run 9.2 on this server, the following line
should be included in /etc/inittab:

load64bit:2:wait:/etc/methods/cfg64 >/dev/console 2>&1 # Enable 64-bit execs

This allows 64-bit applications to run on the 32-bit kernel. Note that this line is also mandatory
if you are using the 64-bit kernel.
In AIX 5.2, the 32-bit kernel is installed by default. The 64-bit kernel, along with JFS2
(enhanced journaled file system), can be enabled at installation time.
Checking if other unixes are in 32 or 64 mode:
----------------------------------------------
- Digital UNIX/Tru64: This OS is only available in 64bit form.
- HP-UX (available in 64bit starting with HP-UX 11.0):
  Command: /bin/getconf KERNEL_BITS     ...returns either 32 or 64
- SGI: This OS is only available in 64bit form.
- The remaining supported UNIX platforms are only available in 32bit form.
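
The per-platform checks from this section can be combined into one wrapper, sketched below. The function name os_bits is made up for this illustration. The AIX, HP-UX and Solaris branches use the commands given in this section; the fallback "getconf LONG_BIT" is an assumption that is not from this document, and it reports the word size of the userland rather than of the kernel.

```shell
# os_bits is a hypothetical wrapper around the per-platform commands
# listed in this section.
os_bits() {
    case $(uname -s) in
        AIX)   bootinfo -K ;;                       # current kernel word size
        HP-UX) /bin/getconf KERNEL_BITS ;;
        SunOS) if [ -x /usr/bin/isainfo ]; then
                   /usr/bin/isainfo -b
               else
                   echo 32
               fi ;;
        *)     getconf LONG_BIT ;;                  # assumption, see lead-in
    esac
}

echo "This system reports $(os_bits) bits"
```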
scinstall:
----------
# scinstall -pv
Displays Sun Cluster software release and package version information.
1.2.9 Info about CPUs:
======================

Solaris:
--------
# psrinfo -v
Shows the number of processors and their status.
# psrinfo -v | grep "Status of processor" | wc -l
Shows the number of cpu's.

Linux:
------
# cat /proc/cpuinfo
# cat /proc/cpuinfo | grep processor | wc -l

Especially with Linux, the /proc directory contains special "files" that either extract information from
or send information to the kernel.

HP-UX:
------
# ioscan -kfnC processor
# /usr/sbin/ioscan -kf | grep processor
# grep processor /var/adm/syslog/syslog.log
# /usr/contrib/bin/machinfo      (Itanium)

There are several ways, such as:
1. sam -> performance monitor -> processor
2. print_manifest (if ignite-ux installed)
3. machinfo (11.23 HP versions)
4. ioscan -fnC processor
5. echo "processor_count/D" | adb /stand/vmunix /dev/kmem
6. top command to get cpu count
The "getconf" command can give you a lot of interesting info. The parameters are: ARG_MAX _BC_BASE_MAX BC_DIM_MAX
BS_SCALE_MAX BC_STRING_MAX CHARCLASS_NAME_MAX CHAR_BIT CHAR_MAX CHAR_MIN
CHILD_MAX CLK_TCK COLL_WEIGHTS_MAX CPU_CHIP_TYPE CS_MACHINE_IDENT CS_PARTITION_IDENT CS_PATH CS_MACHINE_SERIAL EXPR_NEST_MAX HW_CPU_SUPP_BITS HW_32_64_CAPABLE INT_MAX
INT_MIN KERNEL_BITS LINE_MAX LONG_BIT LONG_MAX LONG_MIN
MACHINE_IDENT MACHINE_MODEL MACHINE_SERIAL MB_LEN_MAX NGROUPS_MAX NL_ARGMAX NL_LANGMAX NL_MSGMAX NL_NMAX NL_SETMAX NL_TEXTMAX NZERO OPEN_MAX PARTITION_IDENT PATH
_POSIX_ARG_MAX _POSIX_JOB_CONTROL _POSIX_NGROUPS_MAX _POSIX_OPEN_MAX _POSIX_SAVED_IDS _POSIX_SSIZE_MAX _POSIX_STREAM_MAX _POSIX_TZNAME_MAX _POSIX_VERSION POSIX_ARG_MAX POSIX_CHILD_MAX POSIX_JOB_CONTROL POSIX_LINK_MAX POSIX_MAX_CANON POSIX_MAX_INPUT POSIX_NAME_MAX POSIX_NGROUPS_MAX POSIX_OPEN_MAX POSIX_PATH_MAX POSIX_PIPE_BUF POSIX_SAVED_IDS POSIX_SSIZE_MAX POSIX_STREAM_MAX POSIX_TZNAME_MAX POSIX_VERSION POSIX2_BC_BASE_MAX POSIX2_BC_DIM_MAX POSIX2_BC_SCALE_MAX POSIX2_BC_STRING_MAX POSIX2_C_BIND POSIX2_C_DEV POSIX2_C_VERSION POSIX2_CHAR_TERM POSIX_CHILD_MAX POSIX2_COLL_WEIGHTS_MAX POSIX2_EXPR_NEST_MAX POSIX2_FORT_DEV POSIX2_FORT_RUN POSIX2_LINE_MAX POSIX2_LOCALEDEF POSIX2_RE_DUP_MAX POSIX2_SW_DEV POSIX2_UPE POSIX2_VERSION SC_PASS_MAX SC_XOPEN_VERSION SCHAR_MAX SCHAR_MIN SHRT_MAX SHRT_MIN SSIZE_MAX Example:
# getconf CPU_VERSION

Sample function in a shell script:

get_cpu_version()
{
  # Map the numeric CPU_VERSION code to a readable name.
  case `getconf CPU_VERSION` in
    # ???) echo "Itanium[TM] 2" ;;
    768) echo "Itanium[TM] 1" ;;
    532) echo "PA-RISC 2.0" ;;
    529) echo "PA-RISC 1.2" ;;
    528) echo "PA-RISC 1.1" ;;
    523) echo "PA-RISC 1.0" ;;
    *) return 1 ;;
  esac
  return 0
}

AIX:
----
# pmcycles -m
Cpu 0 runs at 1656 MHz
Cpu 1 runs at 1656 MHz
Cpu 2 runs at 1656 MHz
Cpu 3 runs at 1656 MHz
# lscfg | grep proc

More cpu information on AIX:

# lsattr -El procx          (where x is the number of the cpu)
type      powerPC_POWER5 Processor type  False
frequency 165600000      Processor speed False
..
..
where False means that the value cannot be changed through an AIX command.

To view CPU scheduler tunable parameters, use the schedo command:
# schedo -a

In AIX 5L on Power5, you can switch between Simultaneous Multithreading (SMT) and Single Threading (ST)
with the smtctl command, as follows:

# smtctl -m off       will set SMT mode to disabled
# smtctl -m on        will set SMT mode to enabled
# smtctl -W boot      makes SMT effective on next boot
# smtctl -W now       changes SMT now, but will not persist across reboots

When you want to keep the setting across reboots, you must use the bosboot command in order to create a new boot image.
1.2.10 Other stuff:
===================

runlevel:
---------
To show the init runlevel:
# who -r

Top users:
----------
To get a quick impression about the top 10 users in the system at this time:
# ps auxw | sort -r +3 | head -10       Shows top 10 memory usage by process
# ps auxw | sort -r +2 | head -10       Shows top 10 CPU usage by process
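
The "+3" and "+2" sort keys in the top-users commands are the old notation for "start at field 4" and "start at field 3". A sketch of the same idea with the newer -k notation and a numeric sort follows. The helper name top_by_column and the sample data are made up for this illustration, and it assumes the usual "ps aux" layout with %CPU in column 3 and %MEM in column 4.

```shell
# top_by_column is a hypothetical helper. It reads "ps aux"-style text on
# stdin, drops the header line, sorts numerically (descending) on column $1
# and prints the first $2 lines.
top_by_column() {
    sed 1d | sort -rn -k "$1,$1" | head -n "$2"
}

# Made-up sample data; on a live system use:  ps auxw | top_by_column 4 10
printf '%s\n' \
    'USER PID %CPU %MEM COMMAND' \
    'root 1 0.5 2.0 init' \
    'ora 2 9.0 1.0 oracle' \
    'db2 3 1.0 7.5 db2sysc' | top_by_column 4 2
```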
shared memory:
--------------
To check shared memory segment, semaphore array, and message queue limits, issue the ipcs -l command.
# ipcs -l

The following tools are available for monitoring the performance of your UNIX-based system.

pfiles:
-------
/usr/proc/bin/pfiles
Useful if you are having problems caused by files not getting closed.

lsof:
-----
This utility lists open files for running UNIX processes, like pfiles. However, lsof gives more
useful information than pfiles. You can find lsof at ftp://vic.cc.purdue.edu/pub/tools/unix/lsof/.

Example of lsof usage:

You can see CIO (concurrent IO) in the FILE-FLAG column if you run lsof +fg, e.g.:

tarunx01:/home/abielewi:# /p570build/LSOF/lsof-4.76/usr/local/bin/lsof +fg /baanprd/oradat
COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME
oracle 434222 oracle 16u VREG R,W,CIO,DSYN,LG;CX 39,1 6701056 866 /baanprd/oradat (/dev/bprdoradat)
oracle 434222 oracle 17u VREG R,W,CIO,DSYN,LG;CX 39,1 6701056 867 /baanprd/oradat (/dev/bprdoradat)
oracle 442384 oracle 15u VREG R,W,CIO,DSYN,LG;CX 39,1 1174413312 875 /baanprd/oradat (/dev/bprdoradat)
oracle 442384 oracle 16u VREG R,W,CIO,DSYN,LG;CX 39,1 734011392 877 /baanprd/oradat (/dev/bprdoradat)
oracle 450814 oracle 15u VREG R,W,CIO,DSYN,LG;CX 39,1 1174413312 875 /baanprd/oradat (/dev/bprdoradat)
oracle 450814 oracle 16u VREG R,W,CIO,DSYN,LG;CX 39,1 1814044672 876 /baanprd/oradat (/dev/bprdoradat)
oracle 487666 oracle 15u VREG R,W,CIO,DSYN,LG;CX 39,1 1174413312 875 /baanprd/oradat (/dev/bprdoradat)

You should also see O_CIO in your file open calls if you run truss, e.g.:
open("/opt/oracle/rcat/oradat/redo01.log", O_RDWR|O_CIO|O_DSYNC|O_LARGEFILE) = 18

VMSTAT SOLARIS:
---------------
# vmstat
This command is ideal for monitoring paging rate, which can be found under the page in (pi) and page out (po) columns.
Other important columns are the amount of allocated virtual storage (avm) and free virtual storage (fre).
These two are the AIX column names; the Solaris output below shows swap and free instead.
This command is useful for determining if something is suspended or just taking a long time.

Example:

 kthr      memory             page              disk           faults        cpu
 r b w   swap    free    re   mf   pi po fr de sr m0 m1 m3 m4   in   sy   cs us sy id
 0 0 0 2163152 1716720  157  141 1179  1  1  0  0  0  0  0  0  680 1737  855 10  3 87
 0 0 0 2119080 1729352    0    1    0  0  0  0  0  0  0  1  0  345  658  346  1  1 98
 0 0 0 2118960 1729232    0  167    0  0  0  0  0  0  0  0  0  402 1710  812  4  2 94
 0 0 0 2112992 1723264    0 1261    0  0  0  0  0  0  0  0  0 1026 5253 1848 10  5 85
 0 0 0 2112088 1722352    0  248    0  0  0  0  0  0  0  0  0  505 2822 1177  5  2 92
 0 0 0 2116288 1726544    4   80    0  0  0  0  0  0  0  0  0  817 4015 1530  6  4 90
 0 0 0 2117744 1727960    4    2   30  0  0  0  0  0  0  0  0  473 1421  640  2  2 97
procs/r:     Run queue length.
procs/b:     Processes blocked while waiting for I/O.
procs/w:     Idle processes which have been swapped.
memory/swap: Free, unreserved swap space (Kb).
memory/free: Free memory (Kb). (Note that this will shrink until it reaches lotsfree, at which point
             the page scanner is started. See "Paging" for more details.)
page/re:     Pages reclaimed from the free list. (If a page on the free list still contains data needed
             for a new request, it can be remapped.)
page/mf:     Minor faults (page in memory, but not mapped). (If the page is still in memory, a minor fault
             remaps the page. It is comparable to the vflts value reported by sar -p.)
page/pi:     Paged in from swap (Kb/s). (When a page is brought back from the swap device, the process
             will stop execution and wait. This may affect performance.)
page/po:     Paged out to swap (Kb/s). (The page has been written and freed. This can be the result of
             activity by the pageout scanner, a file close, or fsflush.)
page/fr:     Freed or destroyed (Kb/s). (This column reports the activity of the page scanner.)
page/de:     Freed after writes (Kb/s). (These pages have been freed due to a pageout.)
page/sr:     Scan rate (pages). Note that this number is not reported as a "rate," but as a total number
             of pages scanned.
disk/s#:     Disk activity for disk # (I/O's per second).
faults/in:   Interrupts (per second).
faults/sy:   System calls (per second).
faults/cs:   Context switches (per second).
cpu/us:      User CPU time (%).
cpu/sy:      Kernel CPU time (%).
cpu/id:      Idle + I/O wait CPU time (%).
When analyzing vmstat output, there are several metrics to which you should pay attention. For example,
keep an eye on the CPU run queue column. The run queue should never exceed the number of CPUs on the server.
If you do notice the run queue exceeding the number of CPUs, it's a good indication that your server
has a CPU bottleneck.
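
The run queue rule of thumb can be checked mechanically against captured vmstat output. The following is a sketch only: the helper name runq_over and the sample lines are made up for this illustration, and it assumes the run queue is the first column (r), as in the outputs shown in this section.

```shell
# runq_over is a hypothetical helper. It reads vmstat output on stdin and
# counts the samples whose first column (r, the run queue) is greater than
# the number of CPUs given as $1. Header lines are skipped because their
# first field is not a number.
runq_over() {
    awk -v ncpu="$1" '$1 ~ /^[0-9]+$/ && $1 + 0 > ncpu + 0 { n++ } END { print n + 0 }'
}

# Made-up samples for a 4-CPU machine; live use:  vmstat 5 12 | runq_over 4
printf '%s\n' ' r b w' ' 0 0 0' ' 5 0 0' ' 3 1 0' | runq_over 4
```

A single sample above the CPU count is not alarming by itself; a count that stays high over many intervals is what points to a bottleneck.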
To get an idea of the RAM usage on your server, watch the page in (pi) and page out (po) columns
of vmstat's output. By tracking common virtual memory operations such as page outs, you can infer
the times that the Oracle database is performing a lot of work. Even though UNIX page ins must correlate
with the vmstat's refresh rate to accurately predict RAM swapping, plotting page ins can tell you
when the server is having spikes of RAM usage.
Once captured, it's very easy to take the information about server performance directly from the
Oracle tables and plot them in a trend graph. Rather than using an expensive statistical package
such as SAS, you can use Microsoft Excel. Copy and paste the data from the tables into Excel.
After that, you can use the Chart Wizard to create a line chart that will help you view server
usage information and discover trends.
# VMSTAT AIX:
-------------
This is virtually equal to the usage of vmstat under Solaris.
vmstat can be used to give multiple statistics on the system. For CPU-specific work, try the following command:
# vmstat -t 1 3
This will take 3 samples, 1 second apart, with timestamps (-t). You can, of course, change the parameters
as you like. The output is shown below.

kthr     memory             page              faults        cpu        time
----- ----------- ------------------------ ------------ ----------- --------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa hr mi se
 0  0 45483   221   0   0   0   0    1   0 224  326 362 24  7 69  0 15:10:22
 0  0 45483   220   0   0   0   0    0   0 159   83  53  1  1 98  0 15:10:23
 2  0 45483   220   0   0   0   0    0   0 145  115  46  0  9 90  1 15:10:24
In this output some of the things to watch for are:

"avm", which is Active Virtual Memory.
Ideally, under normal conditions, the largest avm value should in general be smaller than the amount of RAM.
If avm is smaller than RAM, and excessive paging still occurs, that could be due to RAM being filled
with file pages.
avm x 4K = number of bytes
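
As a worked example of this conversion, the avm value 45483 from the sample output corresponds to 45483 x 4 KB = 181932 KB, which is roughly 177 MB. A minimal sketch in shell arithmetic follows; the helper name pages_to_mb is made up for this illustration, and it assumes the 4 KB page size stated in the formula.

```shell
# pages_to_mb is a hypothetical helper: 4 KB pages -> whole megabytes
# (integer division, so the result is rounded down).
pages_to_mb() {
    echo $(( $1 * 4 / 1024 ))
}

pages_to_mb 45483       # the avm value from the sample output; prints 177
```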
If columns r (run queue) and b (blocked) start going up, especially above 10, this usually is an indication
that you have too many processes competing for CPU.
If cs (context switches) goes very high compared to the number of processes, then you may need to tune
the system with vmtune.
In the cpu section, us (user time) indicates the time is being spent in programs. Assuming Java is
In the cpu section, if sys (system time) is higher than expected, and you still have id (idle) time left,
this may indicate lock contention. Check the tprof for lock related calls in the kernel time. You may want
to try multiple instances of the JVM. It may also be possible to find deadlocks in a javacore file.
In the cpu section, if wa (I/O wait) is high, this may indicate a disk bottleneck, and you should use
iostat and other tools to look at the disk usage.
Non-zero values in the pi, po (page in/out) columns may indicate that you are paging and need more memory.
It may be possible that you have the stack size set too high for some of your JVM instances.
It could also mean that you have allocated a heap larger than the amount of memory on the system. Of course,
you may also have other applications using memory, or file pages may be taking up too much of the memory.

Other example:
--------------
# vmstat 1

System configuration: lcpu=2 mem=3920MB

kthr     memory              page              faults        cpu
----- ------------- ------------------------ ------------ -----------
 r  b    avm    fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 229367 332745   0   0   0   0    0   0   3  198  69  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   3   33  66  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   2   33  68  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0  80  306 100  0  1 97  1
 0  0 229367 332745   0   0   0   0    0   0   1   20  68  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   2   36  64  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   2   33  66  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   2   21  66  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   1  237  64  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   2   19  66  0  0 99  0
 0  0 229367 332745   0   0   0   0    0   0   6   37  76  0  0 99  0
The most important fields to look at here are:
r   -- The average number of runnable kernel threads over whatever sampling interval you have chosen.
b   -- The average number of kernel threads that are in the virtual memory waiting queue over your sampling interval.
       r should always be higher than b; if it is not, it usually means you have a CPU bottleneck.
fre -- The size of your memory free list. Do not worry so much if the amount is really small. More importantly,
       determine if there is any paging going on if this amount is small.
pi  -- Pages paged in from paging space.
po  -- Pages paged out to paging space.
CPU section: us sy id wa

Let's look at the last section, which also comes up in most other CPU monitoring tools, albeit with different headings:

us -- user time
sy -- system time
id -- idle time
wa -- waiting on I/O

# IOSTAT:
---------
This command is useful for monitoring I/O activities. You can use the read and write rate to estimate the
amount of time required for certain SQL operations (if they are the only activity on the system).
This command is also useful for determining if something is suspended or just taking a long time.
Basic syntax is: iostat <options> interval count

options  - let you specify the device class for which information is needed, such as disk, cpu or terminal
           (-d, -c, -t or -tdc). The -x option gives the extended statistics.
interval - is the time period in seconds between two samples. "iostat 4" will give data at 4-second intervals.
count    - is the number of samples to take. "iostat 4 5" will give data at 4-second intervals, 5 times.

Example:

$ iostat -xtc 5 2
                     extended disk statistics            tty         cpu
disk    r/s   w/s   Kr/s   Kw/s  wait  actv  svc_t  %w  %b   tin tout  us sy wt id
sd0     2.6   3.0   20.7   22.7   0.1   0.2   59.2   6  19     0   84   3 85 11  0
sd1     4.2   1.0   33.5    8.0   0.0   0.2   47.2   2  23
sd2     0.0   0.0    0.0    0.0   0.0   0.0    0.0   0   0
sd3    10.2   1.6   51.4   12.8   0.1   0.3   31.2   3  31

disk   name of the disk
r/s    reads per second
w/s    writes per second
Kr/s   kilobytes read per second
Kw/s   kilobytes written per second
wait   average number of transactions waiting for service (Q length)
actv   average number of transactions actively being serviced
       (removed from the queue but not yet completed)
%w     percent of time there are transactions waiting for service (queue non-empty)

The values to look for in the iostat output are:

- Reads/writes per second (r/s, w/s)
- Percentage busy (%b)
- Service time (svc_t)

If a disk shows consistently high reads/writes, the percentage busy (%b) of the disk
is greater than 5 percent, and the average service time (svc_t) is greater than 30 milliseconds,
then action needs to be taken.
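
The two numeric thresholds above can be applied to the iostat output with a small filter. This is a sketch under assumptions: it expects the Solaris "iostat -x" column order shown in the example (svc_t in field 8, %b in field 10), and the function name busy_disks is invented here.

```shell
# busy_disks: read "iostat -x" style lines on stdin and print the disks that
# exceed both thresholds from the text: %b > 5 and svc_t > 30 ms.
# Assumption: Solaris column order
#   disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
busy_disks() {
    awk -v max_busy=5 -v max_svct=30 '
        NF >= 10 && $2 ~ /^[0-9.]+$/ && $10 > max_busy && $8 > max_svct {
            printf "%s busy=%s%% svc_t=%sms\n", $1, $10, $8
        }'
}

# Typical use (not run here):  iostat -x 5 2 | busy_disks
```

A disk should show up in several consecutive samples before you act on it, because the first iostat report is an average since boot. Whether reads/writes are "consistently high" still has to be judged by eye.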

# netstat
This command lets you know the network traffic on each node, and the number of error packets encountered.
It is useful for isolating network problems.

Example:
To find out all listening services, you can use the command

# netstat -a -f inet

1.2.11 Some other utilities for Solaris:
========================================

# top

For example:

load averages: 0.66, 0.54, 0.56 11:14:48
187 processes: 185 sleeping, 2 on cpu
CPU states: % idle, % user, % kernel, % iowait, % swap
Memory: 4096M real, 1984M free, 1902M swap in use, 2038M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 2795 oraclown   1  59    0  265M  226M sleep   0:13  4.38% oracle
 2294 root      11  59    0 8616K 7672K sleep  10:54  3.94% bpbkar
13907 oraclown  11  59    0  271M  218M cpu2    4:02  2.23% oracle
14138 oraclown  12  59    0  270M  230M sleep   9:03  1.76% oracle
 2797 oraclown   1  59    0  189M  151M sleep   0:01  0.96% oracle
 2787 oraclown  11  59    0  191M  153M sleep   0:06  0.69% oracle
 2799 oraclown   1  59    0  190M  151M sleep   0:02  0.45% oracle
 2743 oraclown  11  59    0  191M  155M sleep   0:25  0.35% oracle
 2011 oraclown  11  59    0  191M  149M sleep   2:50  0.27% oracle
 2007 oraclown  11  59    0  191M  149M sleep   2:22  0.26% oracle
 2009 oraclown  11  59    0  191M  149M sleep   1:54  0.20% oracle
 2804 oraclown   1  51    0 1760K 1296K cpu2    0:00  0.19% top
 2013 oraclown  11  59    0  191M  148M sleep   0:36  0.14% oracle
 2035 oraclown  11  59    0  191M  149M sleep   2:44  0.13% oracle
  114 root      10  59    0 5016K 4176K sleep  23:34  0.05% picld

Process ID
This column shows the process ID (pid) of each process. The process ID is a positive number,
usually less than 65536. It is used for identification during the life of the process.
Once a process has exited or been killed, the process ID can be reused.

Username
This column shows the name of the user who owns the process. The kernel stores this information
as a uid, and top uses an appropriate table (/etc/passwd, NIS, or NIS+) to translate this uid into a name.

Threads
This column displays the number of threads for the current process. This column is present only
in the Solaris 2 port of top.
For Solaris, this number is actually the number of lightweight processes (lwps) created by the
threads package to handle the threads. Depending on current resource utilization, there may not
be one lwp for every thread. Thus this number is actually less than or equal to the total number
of threads created by the process.

Nice
This column reflects the "nice" setting of each process. A process's nice is inherited from its parent.
Most user processes run at a nice of 0, indicating normal priority. Users have the option of starting
a process with a positive nice value to allow the system to reduce the priority given to that process.
This is normally done for long-running cpu-bound jobs to keep them from interfering with
interactive processes. The Unix command "nice" controls setting this value. Only root can set
a nice value lower than the current value. Nice values can be negative. On most systems they range from -20 to 20.
The nice value influences the priority value calculated by the Unix scheduler.

Size
This column shows the total amount of memory allocated by each process. This is virtual memory
and is the sum total of the process's text area (program space), data area, and dynamically
allocated area (or "break"). When a process allocates additional memory with the system call "brk",
this value will increase. This is done indirectly by the C library function "malloc".
The number in this column does not reflect the amount of physical memory currently in use by the process.
Resident Memory
This column reflects the amount of physical memory currently allocated to each process.
This is also known as the "resident set size" or RSS. A process can have a large amount
of virtual memory allocated (as indicated by the SIZE column) but still be using very little physical memory.

State
This column reflects the last observed state of each process. State names vary from system to system.
These states are analogous to those that appear in the process states line: the second line of the display.
The more common state names are listed below.

cpu   - Assigned to a CPU and currently running
run   - Currently able to run
sleep - Awaiting an external event, such as input from a device
stop  - Stopped by a signal, as with control Z
swap  - Virtual address space swapped out to disk
zomb  - Exited, but parent has not called "wait" to receive the exit status

CPU Time
This column displays the accumulated CPU time for each process. This is the amount of time
that any cpu in the system has spent actually running this process. The standard format shows
two digits indicating minutes, a colon, then two digits indicating seconds.
For example, the display "15:32" indicates fifteen minutes and thirty-two seconds. When a time value is greater than or equal to 1000 minutes, it is displayed as hours with the suffix H.
For example, the display "127.4H" indicates 127 hours plus four tenths of an hour (24 minutes).
When the number of hours exceeds 999.9, the "H" suffix is dropped so that the display
continues to fit in the column.

CPU Percentage
This column shows the percentage of the cpu that each process is currently consuming.
By default, top will sort this column of the output.
Some versions of Unix will track cpu percentages in the kernel, as the figure is used in the calculation
of a process's priority. On those versions, top will use the figure as calculated by the kernel.
Other versions of Unix do not perform this calculation, and top must determine the percentage explicitly
by monitoring the changes in cpu time.
On most multiprocessor machines, the number displayed in this column is a percentage of the total
available cpu capacity. Therefore, a single threaded process running on a four processor system will never
use more than 25% of the available cpu cycles.

Command
This column displays the name of the executable image that each process is running.
In most cases this is the base name of the file that was invoked with the most recent kernel "exec" call.
On most systems, this name is maintained separately from the zeroth argument. A program that changes
its zeroth argument will not affect the output of this column.

# modinfo
The modinfo command displays information about the modules currently loaded in the kernel.

The /etc/system file:
Available for the Solaris Operating Environment, the /etc/system file contains definitions for kernel configuration limits
such as the maximum number of users allowed on the system at a time, the maximum number of processes per user,
and the inter-process communication (IPC) limits on size and number of resources. These limits are important because
they affect DB2 performance on a Solaris Operating Environment machine. See the Quick Beginnings information
for further details.

# more /etc/path_to_inst
To see the mapping between physical device names and the abbreviated kernel instance names,
view the /etc/path_to_inst file.

# uptime
uptime - show how long the system has been up

/export/home/oraclown>uptime
11:32am up 4:19, 1 user, load average: 0.40, 1.17, 0.90
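
For scripting, the three load averages can be cut out of the uptime line. A minimal sketch, assuming the line ends with "load average: a, b, c" as in the output above; the function name load_averages is hypothetical.

```shell
# load_averages: read one line of uptime output on stdin and print the
# 1-, 5- and 15-minute load averages separated by spaces.
# Assumption: the line ends with "load average: a, b, c" (the variant
# "load averages:" is accepted as well).
load_averages() {
    sed -n 's/.*load averages\{0,1\}: *//p' | tr -d ','
}

# Typical use (not run here):  uptime | load_averages
```

The output can then be split with "set --" or read into three variables and compared against the number of CPUs.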

1.2.12 Wellknown tools for AIX:
===============================

1. commands:
------------
CPU       Memory Subsystem   I/O Subsystem   Network Subsystem
-------   ----------------   -------------   -----------------
vmstat    vmstat             iostat          netstat
iostat    lsps               vmstat          ifconfig
ps        svmon              lsps            tcpdump
sar       filemon            filemon
tprof     ipcs               lvmstat

nmon and topas can be used to monitor those subsystems in general.

2. topas:
---------
topas is a useful graphical interface that will give you immediate results of what is going on in the system.
When you run it without any command-line arguments, the screen looks like this:

Topas Monitor for host:    aix4prt               EVENTS/QUEUES    FILE/TTY
Mon Apr 16 16:16:50 2001   Interval:  2          Cswitch    5984  Readch     4864
                                                 Syscall   15776  Writech   34280
Kernel   63.1  |##################          |    Reads         8  Rawin         0
User     36.8  |##########                  |    Writes     2469  Ttyout        0
Wait      0.0  |                            |    Forks         0  Igets         0
Idle      0.0  |                            |    Execs         0  Namei         4
                                                 Runqueue   11.5  Dirblk        0
Network   KBPS  I-Pack  O-Pack   KB-In  KB-Out   Waitqueue   0.0
lo0      213.9  2154.2  2153.7   107.0   106.9
tr0       34.7    16.9    34.4     0.9    33.8   PAGING           MEMORY
                                                 Faults     3862  Real,MB    1023
Disk     Busy%    KBPS     TPS KB-Read KB-Writ   Steals     1580  % Comp     27.0
hdisk0     0.0     0.0     0.0     0.0     0.0   PgspIn        0  % Noncomp  73.9
                                                 PgspOut       0  % Client    0.5
Name          PID  CPU%  PgSp  Owner             PageIn        0
java        16684  83.6  35.1  root              PageOut       0  PAGING SPACE
java        12192  12.7  86.2  root              Sios          0  Size,MB     512
lrud         1032   2.7   0.0  root                               % Used      1.2
aixterm     19502   0.5   0.7  root              NFS (calls/sec)  % Free     98.7
topas        6908   0.5   0.8  root              ServerV2      0
ksh         18148   0.0   0.7  root              ClientV2      0  Press:
gil          1806   0.0   0.0  root              ServerV3      0  "h" for help

The information on the bottom left side shows the most active processes; here, java is consuming 83.6% of CPU.
The middle right area shows the total physical memory (1 GB in this case) and Paging space (512 MB),
as well as the amount being used. So you get an excellent overview of what the system is doing
in a single screen, and then you can select the areas to concentrate on based on the information shown here.

Note: about waits:
------------------
Don't get caught up in this whole wait I/O thing. A single-cpu system with 1 I/O outstanding and no other
runnable threads (i.e. idle) will have 100% wait I/O. There was a big discussion a couple of years ago on
removing the kernel tick, as it has confused many techs.
So, if you have only 1 or a few cpus, then you are going to have high wait I/O figures; it does not necessarily
mean your disk subsystem is slow.

3. trace:
---------
trace captures a sequential flow of time-stamped system events. The trace is a valuable tool for observing
system and application execution. While many of the other tools provide high level statistics such as
CPU and I/O utilization, the trace facility helps expand the information as to where the events happened,
which process is responsible, when the events took place, and how they are affecting the system.
Two post processing tools that can extract information from the trace are utld (in AIX 4) and curt
(in AIX 5). These provide statistics on CPU utilization and process/thread activity. The third post
processing tool is splat, which stands for Simple Performance Lock Analysis Tool. This tool is used to analyze
lock activity recorded in the trace.

4. nmon:
--------
nmon is a free software tool that gives much of the same information as topas, but saves the information
to a file in Lotus 123 and Excel format. The download site is http://www.ibm.com/developerworks/eserver/articles/analyze_aix/.
The information that is collected includes CPU, disk, network, adapter statistics, kernel counters,
memory and the "top" process information.

5. tprof:
---------
tprof is one of the AIX legacy tools that provides a detailed profile of CPU usage for every
AIX process ID and name. It has been completely rewritten for AIX 5.2, and the example below uses
the AIX 5.1 syntax. You should refer to AIX 5.2 Performance Tools update: Part 3 for the new syntax.
The simplest way to invoke this command is to use:

# tprof -kse -x "sleep 10"
# tprof -ske -x "sleep 30"

At the end of ten seconds, or 30 seconds, a new file __prof.all, or sleep.prof, is generated that contains
information about what commands are using CPU on the system. Searching for FREQ, the information looks something
like the example below:

Process   FREQ   Total  Kernel    User  Shared   Other
=======   ====   =====  ======    ====  ======   =====
oracle     244   10635    3515    6897     223       0
java       247    3970     617       0    2062    1291
wait        16    1515    1515       0       0       0
...
=======   ====   =====  ======    ====  ======   =====
Total     1060   19577    7947    7252    3087    1291

This example shows that over half the CPU time is associated with the oracle application and that Java
is using about 3970/19577 or 1/5 of the CPU. The wait usually means idle time, but can also include
the I/O wait portion of the CPU usage.
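
The shares quoted above (3970/19577 and so on) can be computed for every row of the FREQ summary. This is a sketch only: it assumes the seven-column layout and the closing "Total" row from the example, and the function name cpu_share is made up.

```shell
# cpu_share: read a tprof FREQ summary on stdin and print the share of the
# Total ticks that each process accounts for, as a percentage.
# Assumption: columns are "Process FREQ Total Kernel User Shared Other"
# and the summary ends with a "Total" row.
cpu_share() {
    awk '
        $1 == "Total" { grand = $3; next }
        NF == 7 && $3 ~ /^[0-9]+$/ { name[++n] = $1; ticks[n] = $3 }
        END {
            if (grand > 0)
                for (i = 1; i <= n; i++)
                    printf "%-10s %5.1f%%\n", name[i], 100 * ticks[i] / grand
        }'
}

# Typical use (not run here):  grep -p FREQ __prof.all | cpu_share
# (grep -p is the AIX paragraph option; on other systems cut the section out by hand)
```

For the example table this gives roughly 54% for oracle and 20% for java, which matches the "over half" and "1/5" figures in the text.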

svmon:
------
The svmon command captures a snapshot of the current state of memory. Use it with the -G switch to get
global statistics for the whole system.
svmon is the most useful tool at your disposal when monitoring a Java process, especially native heap.
The article "When segments collide" gives examples of how to use svmon -P <pid> -m to monitor the
native heap of a Java process on AIX. But there is another variation, svmon -P <pid> -m -r, that is very
effective in identifying native heap fragmentation. The -r switch prints the address range in use, so it gives
a more accurate view of how much of each segment is in use. As an example, look at the partially edited output below:

  Pid Command    Inuse    Pin   Pgsp  Virtual 64-bit Mthrd LPage
10556 java      681613   2316   2461   501080      N     Y     N

 Vsid  Esid Type Description                     LPage  Inuse  Pin  Pgsp  Virtual
22ac4     9 mmap mapped to sid b1475                 -      0    0     -        -
21047     8 mmap mapped to sid 30fe5                 -      0    0     -        -
126a2     a mmap mapped to sid 91072                 -      0    0     -        -
7908c     7 mmap mapped to sid 6bced                 -      0    0     -        -
b2ad6     b mmap mapped to sid b1035                 -      0    0     -        -
b1475     - work                                     -  65536    0   282    65536
30fe5     - work                                     -  65536    0   285    65536
91072     - work                                     -  65536    0    54    65536
6bced     - work                                     -  65536    0   261    65536
b1035     - work                                     -  45054    0     0    45054
                 Addr Range: 0..45055
e0f9f     5 work shmat/mmap                          -  48284    0     3    48284
19100     3 work shmat/mmap                          -  46997    0   463    47210
c965a     4 work shmat/mmap                          -  46835    0   281    46953
7910c     6 work shmat/mmap                          -  37070    0     0    37070
                 Addr Range: 0..50453
e801d     d work shared library text                 -   9172    0     0     9220
                 Addr Range: 0..30861
a0fb7     f work shared library data                 -    105    0     1      106
                 Addr Range: 0..2521
21127     2 work process private                     -     50    2     1       51
                 Addr Range: 65300..65535
a8535     1 pers code,/dev/q109waslv:81938           -     11    0     -        -
                 Addr Range: 0..11

Other example:

# svmon -G -i 2 5    # sample five times at two second intervals

        memory                in use               pin             pg space
 size   inuse   free   pin    work   pers   clnt   work  pers  clnt    size   inuse
16384   16250    134  2006   10675   2939   2636   2006     0     0   40960   12674
16384   16250    134  2006   10675   2939   2636   2006     0     0   40960   12674
16384   16250    134  2006   10675   2939   2636   2006     0     0   40960   12674
16384   16250    134  2006   10675   2939   2636   2006     0     0   40960   12674
16384   16250    134  2006   10675   2939   2636   2006     0     0   40960   12674

In this example, there are 16384 pages of total size of memory. Multiply this number by 4096
(the page size in bytes) to get the total real memory size in bytes.
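
The page arithmetic for the svmon -G figures can be wrapped in a small helper. A sketch, assuming 4096-byte pages as stated in the text; the function name pages_to_mb is hypothetical.

```shell
# pages_to_mb: convert a count of 4096-byte pages (as reported by svmon -G)
# to megabytes, rounded to one decimal.
# Assumption: 4 KB pages, as in the example above.
pages_to_mb() {
    awk -v pages="$1" 'BEGIN { printf "%.1f\n", pages * 4096 / 1048576 }'
}

pages_to_mb 16384    # memory size  -> 64.0
pages_to_mb 16250    # memory inuse -> 63.5
pages_to_mb 134      # memory free  -> 0.5
```

So the machine in the example has 64 MB of real memory, of which only about half a megabyte is on the free list.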

filemon:
--------
filemon can be used to identify the files that are being used most actively. This tool gives a very
comprehensive view of file access, and can be useful for drilling down once vmstat/iostat confirm disk
to be a bottleneck.

Example:

# filemon -o /tmp/filemon.log; sleep 60; trcstop

The generated log file is quite large. Some sections that may be useful are:

Most Active Files

#MBs   #opns   #rds   #wrs  file          volume:inode
25.7      83   6589      0  unix          /dev/hd2:147514
16.3       1   4175      0  vxe102        /dev/mailv1:581
16.3       1      0   4173  .vxe102.pop   /dev/poboxv:62
15.8       1      1   4044  tst1          /dev/mailt1:904
 8.3    2117   2327      0  passwd        /dev/hd4:8205
 3.2     182    810      1  services      /dev/hd4:8652
...

Detailed File Stats

FILE: /var/spool/mail/v/vxe102  volume: /dev/mailv1 (/var/spool2/mail/v)  inode: 581
opens:                1
total bytes xfrd:     17100800
reads:                4175 (0 errs)
  read sizes (bytes): avg 4096.0 min 4096  max 4096   sdev 0.0
  read times (msec):  avg 0.543  min 0.011 max 78.060 sdev 2.753
...

curt:
-----
curt Command

Purpose
The CPU Utilization Reporting Tool (curt) command converts an AIX trace file into a number of statistics related
to CPU utilization and either process, thread or pthread activity. These statistics ease the tracking of
specific application activity. curt works with both uniprocessor and multiprocessor AIX Version 4 and AIX Version 5
traces.

Syntax
curt -i inputfile [-o outputfile] [-n gennamesfile] [-m trcnmfile] [-a pidnamefile] [-f timestamp]
     [-l timestamp] [-ehpstP]

Description
The curt command takes an AIX trace file as input and produces a number of statistics related to
processor (CPU) utilization and process/thread/pthread activity. It will work with both uniprocessor and
multiprocessor AIX traces if the processor clocks are properly synchronized.

1.2.13 Not so well known tools for AIX: the proc tools:
=======================================================

-- proctree
Displays the process tree containing the specified process IDs or users. To display the ancestors
and all the children of process 21166, enter:

# proctree 21166
11238    /usr/sbin/srcmstr
   21166    /usr/sbin/rsct/bin/IBM.AuditRMd

To display the ancestors and children of process 21166, including children of process 0, enter:

# proctree -a 21166
1    /etc/init
   11238    /usr/sbin/srcmstr
      21166    /usr/sbin/rsct/bin/IBM.AuditRMd

-- procstack
Displays the hexadecimal addresses and symbolic names for each of the stack frames of the current thread
in processes. To display the current stack of process 15052, enter:

# procstack 15052
15052 : /usr/sbin/snmpd
d025ab80 select (?, ?, ?, ?, ?) + 90
100015f4 main (?, ?, ?) + 1814
10000128 __start () + 8c

Currently, procstack displays garbage or wrong information for the top stack frame, and possibly for the
second top stack frame. Sometimes it will erroneously display "No frames found on the stack," and sometimes
it will display: deadbeef ???????? (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ...) The fix for this problem had not
been released at the writing of this article. When the fix becomes available, you need to download the
APAR IY48543 for 5.2. For AIX 5.3 it all should work OK.

-- procmap
Displays a process address map. To display the address space of process 13204, enter:

# procmap 13204
13204 : /usr/sbin/biod 6
10000000      3K  read/exec    biod
20000910      0K  read/write   biod
d0083100     79K  read/exec    /usr/lib/libiconv.a
20013bf0     41K  read/write   /usr/lib/libiconv.a
d007a100     34K  read/exec    /usr/lib/libi18n.a
20011378      4K  read/write   /usr/lib/libi18n.a
d0074000     11K  read/exec    /usr/lib/nls/loc/en_US
d0077130      8K  read/write   /usr/lib/nls/loc/en_US
d00730f8      2K  read/exec    /usr/lib/libcrypt.a
f03c7508      0K  read/write   /usr/lib/libcrypt.a
d01d4e20   1997K  read/exec    /usr/lib/libc.a
f0337e90    570K  read/write   /usr/lib/libc.a

-- procldd
Displays a list of libraries loaded by a process. To display the list of dynamic libraries loaded by
process 11928, enter:

# procldd 11928
11928 : -sh
/usr/lib/nls/loc/en_US
/usr/lib/libcrypt.a
/usr/lib/libc.a

-- procflags
Displays a process's tracing flags, and the pending and holding signals. To display the tracing flags of
process 28138, enter:

# procflags 28138
28138 : /usr/sbin/rsct/bin/IBM.HostRMd
data model = _ILP32  flags = PR_FORK
/64763: flags = PR_ASLEEP | PR_NOREGS
/66315: flags = PR_ASLEEP | PR_NOREGS
/60641: flags = PR_ASLEEP | PR_NOREGS
/66827: flags = PR_ASLEEP | PR_NOREGS
/7515:  flags = PR_ASLEEP | PR_NOREGS
/70439: flags = PR_ASLEEP | PR_NOREGS
/66061: flags = PR_ASLEEP | PR_NOREGS
/69149: flags = PR_ASLEEP | PR_NOREGS

-- procsig
Lists the signal actions for a process. To list all the signal actions defined for process 30552, enter:

# procsig 30552
30552 : -ksh
HUP   caught
INT   caught
QUIT  caught
ILL   caught
TRAP  caught
ABRT  caught
EMT   caught
FPE   caught
KILL  default RESTART
BUS   caught

-- proccred
Prints a process' credentials. To display the credentials of process 25632, enter:
# proccred 25632
25632: e/r/suid=0 e/r/sgid=0

-- procfiles
Prints a list of open file descriptors. To display status and control information on the file descriptors
opened by process 20138, enter:

# procfiles -n 20138
20138 : /usr/sbin/rsct/bin/IBM.CSMAgentRMd
  Current rlimit: 2147483647 file descriptors
   0: S_IFCHR mode:00 dev:10,4 ino:4178 uid:0 gid:0 rdev:2,2
      O_RDWR name:/dev/null
   2: S_IFREG mode:0311 dev:10,6 ino:250 uid:0 gid:0 rdev:0,0
      O_RDWR size:0 name:/var/ct/IBM.CSMAgentRM.stderr
   4: S_IFREG mode:0200 dev:10,6 ino:255 uid:0 gid:0 rdev:0,0

-- procwdx
Prints the current working directory for a process. To display the current working directory
of process 11928, enter: # procwdx 11928
11928 : /home/guest

-- procstop
Stops a process. To stop process 7500 on the PR_REQUESTED event, enter:

# procstop 7500

-- procrun
Restarts a process. To restart process 30192 that was stopped on the PR_REQUESTED event, enter:

# procrun 30192

-- procwait
Waits for all of the specified processes to terminate. To wait for process 12942 to exit and display
the status, enter:

# procwait -v 12942

1.2.14 Other monitoring:
========================

Nagios: open source Monitoring for most unix systems:
-----------------------------------------------------
Nagios is an open source host, service and network monitoring program. Latest versions: 2.5 (stable)

Overview
Nagios is a host and service monitor designed to inform you of network problems before your clients,
end-users or managers do. It has been designed to run under the Linux operating system, but works fine
under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify
using external "plugins" which return status information to Nagios. When problems are encountered,
the daemon can send notifications out to administrative contacts in a variety of different ways
(email, instant message, SMS, etc.). Current status information, historical logs, and reports can all
be accessed via a web browser.

System Requirements
The only requirement of running Nagios is a machine running Linux (or UNIX variant) and a C compiler.
You will probably also want to have TCP/IP configured, as most service checks will be performed over the network.
You are not required to use the CGIs included with Nagios. However, if you do decide to use them,
you will need to have the following software installed...
- A web server (preferably Apache)
- Thomas Boutell's gd library version 1.6.3 or higher (required by the statusmap and trends CGIs)

rstat: Monitoring Machine Utilization with rstat:
-------------------------------------------------
rstat stands for Remote System Statistics service.
Ports exist for most unixes, like Linux, Solaris, AIX etc.

-- rstat on Linux, Solaris:
The rstat client gets and prints statistics from any machine running the rpc.rstatd daemon,
its server-side counterpart. The rpc.rstatd daemon has been used for many years by tools such as Sun's perfmeter
and the rup command. The rstat program is simply a new client for an old daemon. The fact that the rpc.rstatd daemon
is already installed and running on most Solaris and Linux machines is a huge advantage over other tools
that require the installation of custom agents.
The rstat client compiles and runs on Solaris and Linux as well and can get statistics from any machine running
a current rpc.rstatd daemon, such as Solaris, Linux, AIX, and OpenBSD. The rpc.rstatd daemon is started
from /etc/inetd.conf on Solaris. It is similar to vmstat, but has some advantages over vmstat:
You can get statistics without logging in to the remote machine, including over the Internet.
It includes a timestamp.
The output can be plotted directly by gnuplot.
The fact that it runs remotely means that you can use a single central machine to monitor the performance
of many remote machines. It also has a disadvantage in that it does not give the useful scan rate measurement
of memory shortage, the sr column in vmstat. rstat will not work across most firewalls because it relies on
port 111, the RPC port, which is usually blocked by firewalls.
To use rstat, simply give it the name or IP address of the machine you wish to monitor. Remember that rpc.rstatd
must be running on that machine. The rup command is extremely useful here because with no arguments,
it simply prints out a list of all machines on the local network that are running the rstatd daemon.
If a machine is not listed, you may have to start rstatd manually. To start rpc.rstatd under Red Hat Linux, run
the following as root:

# /etc/rc.d/init.d/rstatd start

On Solaris, first try running the rstat client because inetd is often already configured to automatically
start rpc.rstatd on request. If the client fails with the error "RPC: Program not registered,"
make sure you have this line in your /etc/inet/inetd.conf and kill -HUP your inetd process to get it to
re-read inetd.conf, as follows:

rstatd/2-4 tli rpc/datagram_v wait root /usr/lib/netsvc/rstat/rpc.rstatd rpc.rstatd

Then you can monitor that machine like this:

% rstat enkidu

This command will give you a one-second average and then it will exit. If you want to continuously monitor,
give an interval in seconds on the command line. Here's an example of one line of output every two seconds:

% rstat enkidu 2
2001 07 10 10 36 28   0   0   1  98    0     0    7     2     0    0  61  0.0
2001 07 10 10 36 30   0   0   0 100    0     0    0     2     0    0  15  0.0
2001 07 10 10 36 32   0   0   0 100    0     0    0     2     0    0  15  0.0
2001 07 10 10 36 34   0   0   0 100    0     5   10     2     0    0  19  0.0
2001 07 10 10 36 36   0   0   0 100    0     0   46     2     0    0 108  0.0
^C

To get a usage message, the output format, the version number, and where to go for updates, just type rstat
with no parameters:

% rstat
usage: rstat machine [interval]
output:
yyyy mm dd hh mm ss usr wio sys idl pgin pgout intr ipkts opkts coll  cs load
docs and src at http://patrick.net/software/rstat/rstat.html

Notice that the column headings line up with the output data.
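
Because every rstat line has a fixed set of columns, it is easy to post-process. The sketch below assumes the 18-column layout from the usage message (usr, wio and sys in fields 7 to 9); the function name rstat_busy is invented for this example.

```shell
# rstat_busy: read rstat output lines on stdin and print a timestamp plus the
# non-idle CPU percentage (usr + wio + sys) for each sample.
# Assumption: the 18-column layout
#   yyyy mm dd hh mm ss usr wio sys idl pgin pgout intr ipkts opkts coll cs load
rstat_busy() {
    awk 'NF == 18 && $1 ~ /^[0-9][0-9][0-9][0-9]$/ {
        printf "%s-%s-%s %s:%s:%s busy=%d%%\n", $1, $2, $3, $4, $5, $6, $7 + $8 + $9
    }'
}

# Typical use (not run here):  rstat enkidu 2 | rstat_busy
```

Lines that do not have exactly 18 fields, such as the usage text, are ignored.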

-- AIX:
In order to get rstat working on AIX, you may need to configure rstatd. As root:

1. Edit /etc/inetd.conf
   Uncomment or add an entry for rstatd, e.g.
   rstatd sunrpc_udp udp wait root /usr/sbin/rpc.rstatd rstatd 100001 1-3
2. Edit /etc/services
   Uncomment or add an entry for rstatd, e.g.
   rstatd 100001/udp
3. Refresh services
   refresh -s inetd
4. Start rstatd
   /usr/sbin/rpc.rstatd

==================================
2. NFS and Mount command examples:
==================================
========
We will discuss the most important features of NFS, by showing how it is implemented on
Solaris, Redhat and SuSE Linux. Most of this applies to HP-UX and AIX as well.

2.1.1 NFS and Redhat Linux:
---------------------------
Linux uses a combination of kernel-level support and continuously running daemon processes to provide
NFS file sharing. However, NFS support must be enabled in the Linux kernel for this to function.
NFS uses Remote Procedure Calls (RPC) to route requests between clients and servers, meaning that the
portmap service must be enabled and active at the proper runlevels for NFS communication to occur.
Working with portmap, various other processes ensure that a particular NFS connection is allowed and may
proceed without error:

rpc.mountd   The running process that receives the mount request from an NFS client and checks to see
             if it matches with a currently exported file system.
rpc.nfsd     The process that implements the user-level part of the NFS service. It works with the Linux kernel
             to meet the dynamic demands of NFS clients, such as providing additional server threads for
             NFS clients to use.
rpc.lockd    A daemon that is not necessary with modern kernels. NFS file locking is now done by the kernel.
             It is included with the nfs-utils package for users of older kernels that do not include this
             functionality by default.
rpc.statd    Implements the Network Status Monitor (NSM) RPC protocol. This provides reboot notification
             when an NFS server is restarted without being gracefully brought down.
rpc.rquotad  An RPC server that provides user quota information for remote users.

Not all of these programs are required for NFS service. The only services that must be enabled are rpc.mountd,
rpc.nfsd, and portmap. The other daemons provide additional functionality and should only be used if your server
environment requires them.
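
Whether the required services are registered can be checked from the output of "rpcinfo -p". The filter below is a sketch under assumptions: it expects the usual listing whose last column is the service name, with rpc.nfsd registered as "nfs" and rpc.mountd as "mountd"; the function name nfs_missing is hypothetical.

```shell
# nfs_missing: read "rpcinfo -p" output on stdin and print the names of the
# required NFS services that are NOT registered (prints nothing if all are).
# Assumption: the last column of each line is the service name, and the
# three required services appear as portmapper, nfs and mountd.
nfs_missing() {
    awk '
        { seen[$NF] = 1 }
        END {
            n = split("portmapper nfs mountd", need, " ")
            for (i = 1; i <= n; i++)
                if (!(need[i] in seen)) print need[i]
        }'
}

# Typical use (not run here):  rpcinfo -p nfsserver | nfs_missing
```

An empty result means all three services answered; any name printed points to the daemon that still has to be started.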
NFS version 2 uses the User Datagram Protocol (UDP) to provide a stateless network connection between
the client and server. NFS version 3 can use UDP or TCP running over IP. The stateless UDP connection
minimizes network traffic, as the NFS server sends the client a cookie after the client is authorized
to access the shared volume. This cookie is a random value stored on the server's side and is passed