Monitoring tools
Below are my reference card notes on the most useful monitoring tools available on AIX and their most useful parameters.
If you see mistakes please correct them.
If you have other favourites please add them.
The Tips and techniques page is the place for discussions on tuning approach and ideas.
vmstat
Virtual Memory Management Stats but also includes CPU and other useful stuff
| Syntax |
vmstat <seconds> <count> |
| Options |
seconds |
Time between outputs |
| |
count |
number of outputs |
| Examples |
vmstat 10 20 |
20 lines output with 10 seconds between each |
| Output |
Warning: |
ignore the first line (average since reboot) |
| |
r |
number of processes on run queue |
| |
b |
number of processes on blocked queue = awaiting resources or I/O |
| |
avm |
active virtual memory pages in page space |
| |
fre |
real memory pages on the free list |
| |
re |
Page reclaims, free but claimed before reused |
| |
pi |
paged in (per second) |
| |
po |
paged out (per second) |
| |
fr |
pages freed (page replacement) (per second) |
| |
sr |
pages per second scanned for replacement |
| |
cy |
complete scans of page table |
| |
in |
device interrupts per second |
| |
sy |
system calls per second |
| |
cs |
CPU context switches per second |
| |
us |
User CPU time percentage |
| |
sys |
System CPU time percentage |
| |
id |
CPU idle percentage (nothing to do) |
| |
wa |
CPU waiting for pending local Disk i/o |
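As a rough illustration of the warning above about the first line, here is a minimal sketch that averages the id column while skipping that first line. The function name vmstat_avg_idle is made up for this example, and it assumes the 17-column layout listed in the Output table (id in field 16); some AIX levels print extra columns, so check the header on your system first.

```shell
# Average the CPU idle column (id) from vmstat output read on stdin.
# Assumption: id is field 16, as in the column list above.
vmstat_avg_idle() {
    awk '
        $1 ~ /^[0-9]+$/ {          # data lines start with a number (the r column)
            n++
            if (n == 1) next       # skip the first line (average since reboot)
            sum += $16; rows++
        }
        END { if (rows) printf "%.1f\n", sum / rows; else print "no data" }
    '
}

# Typical use on a live system:
#   vmstat 10 20 | vmstat_avg_idle
```

Header and separator lines are ignored because their first field is not a number.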
iostat
Disk I/O statistics
| Syntax |
iostat <seconds> <count> |
| Options |
seconds |
Time between outputs |
| |
count |
number of outputs |
| Examples |
iostat 10 20 |
20 lines output with 10 seconds between each |
| Output |
Warning: |
ignore the first line (average since reboot) |
| |
%tm_act |
Percentage of time active |
| |
Kbps |
K bytes per second transferred |
| |
tps |
Transfers per second |
| |
msps |
Millisecond per seek (if available) |
| |
Kb_read |
Total K bytes read (likewise for write) |
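A minimal sketch of how the %tm_act column might be used to spot busy disks. The function name busy_disks and the default threshold of 50 are assumptions for illustration, as is the layout (disk name in field 1, %tm_act in field 2, disk names starting with hdisk). It does not skip the first report, so remember the warning above when reading the first block.

```shell
# Print disks whose %tm_act exceeds a threshold (default 50).
# Assumption: data lines look like "hdiskN %tm_act Kbps tps Kb_read Kb_wrtn".
busy_disks() {
    awk -v limit="${1:-50}" '
        $1 ~ /^hdisk/ && $2 + 0 > limit + 0 { print $1, $2 }
    '
}

# Typical use on a live system:
#   iostat 10 20 | busy_disks 70
```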
ps
Process State
| Syntax |
ps -l -f -e -uuser -t ttyno -p pid -k -o xxx |
| |
ps aux |
| Options |
-l |
long listing |
| |
-f |
full listing |
| |
-u user |
list only user's processes (-u fred) |
| |
-e |
every user's processes |
| |
-t ttyno |
processes attached to tty (-t 03) |
| |
-p pid |
list the process number N |
| |
-k |
Include kernel processes (normally hidden) |
| |
-o xxx |
Lets you choose the columns, for example: -o tid,pid,user,class,pcpu,pmem,args |
| |
aux |
BSD flavour (note no -) |
| Examples |
ps -f |
List your shell's (sub) processes in detail |
| |
ps -fu oracle |
List all processes for user oracle |
| |
ps -ef |
List all processes |
| |
ps -el |
As above but other details |
| |
ps -fp 23456 |
Just list process 23456 |
| |
ps -o tid,pid,args |
List threadID, processID and arguments |
| Output |
PID/PPID |
Process IDentity & Parent Process IDentity |
| |
S |
State= Running Sleeping Waiting Zombie Terminating Kernel Intermediate X=growing |
| |
UID/USER |
User IDentity/User name |
| |
C |
CPU recent use value (part of priority) |
| |
STIME |
Start time of process |
| |
PRI |
Priority (higher means less priority) |
| |
NI |
NIce value (part of priority) default 20 |
| |
ADDR |
ADDRess of stack (segment no) |
| |
SZ |
SiZe of process in 1K pages |
| |
CMD |
COMmanD the user typed (-f for more) |
| |
WCHAN |
Event awaited for (kernel address) |
| |
TTY |
Terminal the process is connected to (- = none) |
| |
TIME |
Minutes and Seconds of CPU time |
| |
SSIZ |
Size of kernel stack |
| |
PGIN |
number of pages paged in |
| |
SIZE |
Virtual size of data section in 1K's |
| |
RSS |
Real memory (resident set) size of process 1K's |
| |
LIM |
Soft limit on memory (see setrlimit) xx=none |
| |
TSIZ |
Size of text (shared text program) image |
| |
TRS |
Size of resident set (real memory) of text |
| |
%CPU |
Percentage of CPU used since started |
| |
%MEM |
Percentage of real memory used |
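The -o option combines well with sort to find the heaviest processes. This is a minimal sketch: the function name top_cpu is invented for the example, and it assumes the CPU percentage is the second column, as it is with -o pid,pcpu,args.

```shell
# Show the N processes with the highest value in column 2 (default N=5).
# Reads ps output on stdin; the first line is assumed to be the header.
top_cpu() {
    # Drop the header, sort numerically on column 2 (descending), keep N lines.
    sed 1d | sort -k2,2nr | head -n "${1:-5}"
}

# Typical use on a live system:
#   ps -e -o pid,pcpu,args | top_cpu 10
```

The same function works for memory if pmem is placed in the second column instead of pcpu.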
nfsstat
Network File Systems Stats
| Syntax |
nfsstat -m -z |
| Options |
-m |
Display NFS mount point stats |
| |
-z |
Zeros NFS stats |
| Examples |
nfsstat |
Display all NFS stats |
| |
nfsstat -m |
Display stats about the mount points |
| Output |
|
Too many columns to cover here but labels are helpful if you know NFS |
netstat
Network statistics
| Syntax |
netstat -i -n -r -p -m |
| Examples |
netstat -in |
Interface stats |
| |
netstat -rn |
Routing stats |
| |
netstat -p tcp |
Protocol stats (also try ip, icmp, igmp, udp) |
| |
netstat -m |
Memory buffer stats used for packets inside AIX |
| |
netstat -D |
Packets received, transmitted and dropped stats |
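A minimal sketch for picking out interfaces with errors from the interface stats. The function name if_errors is made up, and it assumes the last five columns of netstat -in are Ipkts, Ierrs, Opkts, Oerrs and Coll, so it counts columns from the right because some rows have no Address field.

```shell
# List interfaces with non-zero input or output errors.
# Assumption: columns end with "... Ipkts Ierrs Opkts Oerrs Coll".
if_errors() {
    awk 'NR > 1 && NF >= 5 && ($(NF-3) + 0 > 0 || $(NF-1) + 0 > 0) {
        print $1, "Ierrs=" $(NF-3), "Oerrs=" $(NF-1)
    }'
}

# Typical use on a live system:
#   netstat -in | if_errors
```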
wlmstat
Workload Manager Stats
| Syntax |
wlmstat -c -m -b -S -v [seconds [count]] |
| Options |
-b -c -m |
List only c=cpu m=memory b=disks (yes b, not d) |
| |
-S |
List Super Class level only |
| |
-v |
Verbose output (more detailed) |
| |
seconds |
Time between outputs |
| |
count |
number of outputs |
| Examples |
wlmstat 3 100 |
Basic stats every 3 seconds for 100 times |
| |
wlmstat -v 60 |
Full details once a minute for ever |
| |
wlmstat -Sv 9 |
As above but Superclass only and every 9 seconds |
| Output |
Class |
Name of the Class |
| |
CPU,MEM,DKIO |
Percentages |
| |
tr |
Tier number of class |
| |
i |
Inheritance 0=no 1=yes |
| |
#pr |
number of processes in class |
| |
sha |
Shares (- = -1) |
| |
min |
Minimum Limit as a percentage |
| |
smx |
Soft maximum limit as a percentage |
| |
hmx |
Hard maximum limit as a percentage |
| |
des |
Desired percentage calculated by WLM |
| |
npg |
number of memory pages in class |
Hint Try to have nothing in the Default Class.
ncheck
Inode check
| Syntax |
ncheck [-a][-i inodenumber...] [-s] [filesystem] |
| Options |
-a |
all including . and .. |
| |
-i inode |
find the file(s) with these inode no. |
| |
-s |
list special and set UID files |
| Examples |
ncheck -a / |
List all files in / |
| |
ncheck -i 2194 /tmp |
find name for inode 2194 in /tmp |
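To get an inode number to feed to ncheck -i, or to cross-check its answer, ls -i and find -inum can be used. This is only a sketch with invented function names (inode_of, names_for_inode); it is not how ncheck works internally, and find walks the directory tree rather than reading the filesystem directly.

```shell
# Print the inode number of a file.
inode_of() {
    ls -i "$1" | awk '{ print $1 }'
}

# Print the path name(s) under directory $2 that have inode number $1.
# -xdev keeps find inside one filesystem, since inode numbers are per filesystem.
names_for_inode() {
    find "$2" -xdev -inum "$1" -print
}

# Typical use:
#   names_for_inode "$(inode_of /tmp/xyz)" /tmp
```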
netpmon
Network (and lots more) Monitor. It uses trace, so it is for the root user only and can hit performance.
| Syntax |
netpmon -o file -Tn -P -v -Oreport-type |
| Options |
-o outputfile |
put the output to file not stdout |
| |
-T n |
Set output buffer size (default 64000) |
| |
-P |
Force monitor process into pinned memory |
| |
-v |
Verbose (default only top 20 processes) |
| |
-O |
cpu, dd(device driver), so(socket), nfs, all |
| Examples |
netpmon -O all -o net.out |
| |
do network or general workload here ... |
| |
finish with: trcstop |
| Output |
There is lots of information gathered in one report. |
filemon
File I/O monitor. It uses trace, so it is for the root user only and can hit performance.
| Syntax |
filemon -i file -o file -d -Tn -P -v -O levels |
| Examples |
filemon -O all -o file.out |
| |
do disk I/O work load here... |
| |
finish with: trcstop |
| Output |
#MBs |
total number of Mbytes transferred during run |
| |
#opns |
number of times the file was opened |
| |
#rpgs |
number of 4K page reads |
| |
#wpgs |
number of 4K pages written |
| |
#wrs |
number of write calls |
| |
persistent |
paged from file system |
| |
working |
paged from paging space |
| |
util |
percentage busy |
| |
KB/s |
average data transfer rate |
svmon
System Virtual Memory Monitor. It uses trace, so it is for the root user only and can hit performance.
| Syntax |
svmon -G -Pnsa pid... -Pnsa[upg][count] -S sid... -i seconds count |
| Options |
-G |
Global report |
| |
-P[nsa] pid... |
Process report n=non-sys s=system a=both |
| |
-S[nsa][upg][x] |
Segment report as above + u=real-mem p=pinned g=paging x=top x items |
| |
-S sid... |
Segment report on particular segments |
| |
-i secs count |
Repeat report every secs seconds, count times |
| |
-D sid... |
Detailed report |
| Examples |
svmon -G |
Global / General stats |
| |
svmon -Pa 215 |
Process report for process 215 |
| |
svmon -Ssu 10 |
Top ten system segments in real memory order |
| |
svmon -D 340d |
Detailed report on a particular segment |
| Output |
size |
in pages (4096) |
| |
inuse |
in-use |
| |
free |
not in use (includes rmss pages) |
| |
pin |
pinned (locked by app.) |
| |
work |
pages in working segments |
| |
pers |
pages in persistent segments |
| |
clnt |
pages in client segments |
| |
pg space |
paging space |
Note: pages can be in more than one process
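Since svmon reports sizes in 4096-byte pages, a small conversion helps when comparing against figures quoted in Mbytes. This is a minimal sketch; the function name pages_to_mb is made up, and it assumes the 4096-byte page size given in the Output table (256 pages per Mbyte).

```shell
# Convert a count of 4096-byte pages to whole Mbytes (256 pages = 1 MB).
pages_to_mb() {
    echo $(( $1 / 256 ))
}

# Example: a size of 262144 pages
#   pages_to_mb 262144
```

The result is truncated to whole Mbytes because shell arithmetic is integer only.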
ipcs
Interprocess Comms (shared memory, queue & semaphore) stats
| Syntax |
ipcs -a |
| Examples |
ipcs |
Regular report |
| |
ipcs -a |
Full report = more columns |
| Output |
T |
Type m=memory, q=queue, s=semaphore |
| |
ID, KEY |
What the programmer uses to access the IPC |
| |
CPID, LPID |
Process that created/last attached |
| |
CBYTES |
Bytes currently in message queue |
| |
QBYTES |
Maximum number of bytes allowed in message queue |
| |
QNUM |
number of messages held |
| |
NATTCH |
Processes attached to this shared memory |
| |
SEGSZ |
Size of shared memory (segment) |
| |
NSEMS |
Number of Semaphores |
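The T column makes it easy to count how many IPC objects of each type exist. A minimal sketch follows, assuming each object line starts with a single letter m, q or s as described above; the function name ipcs_counts is invented for the example.

```shell
# Count shared memory segments, message queues and semaphore sets
# from ipcs output read on stdin, using the T column (field 1).
ipcs_counts() {
    awk '
        $1 == "m" { m++ }
        $1 == "q" { q++ }
        $1 == "s" { s++ }
        END { printf "shm=%d msgq=%d sem=%d\n", m, q, s }
    '
}

# Typical use on a live system:
#   ipcs | ipcs_counts
```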
lvmstat
Logical Volume Stats
| Syntax |
lvmstat -v vgname -l lvname -e -d [seconds [count]] |
| Options |
-v vgname |
Volume group to track |
| |
-l lvname |
Logical volume to track |
| |
-e |
Enable |
| |
-d |
Disable |
| |
seconds |
Between output |
| |
count |
Number of outputs |
| Examples |
lvmstat -v rootvg -e |
Enable rootvg stats (use -d to disable later) |
| |
lvmstat -v rootvg |
Monitor all of volume group |
| |
lvmstat -l lv05 |
Monitor just one logical volume in more detail |
| Output |
iocnt |
number of io |
| |
Kb_read |
KBytes read (same for write) |
| |
Kbps |
Kbytes per second |
| |
mirror# |
Which copy of a mirror |
fileplace
Placement of a file in the filesystem
| Syntax |
fileplace -l -p -v filename |
| Options |
-l |
Logical layout in filesystem |
| |
-p |
Physical layout on disk(s) |
| |
-v |
Verbose (good) |
| Example |
fileplace -lv /tmp/xyz |
Logical layout |
| Example |
fileplace -pv /db/data.idx |
Disk layout |
rmss
Reduced Memory System Simulator
| Syntax |
rmss -p -c <MB> -r |
| Options |
-p |
Print the current value |
| |
-c MB |
Change memory size to MB (in Mbytes) |
| |
-r |
Restore all memory to use |
| Example |
rmss -p |
find out how much memory you have online |
| Example |
rmss -c 32 |
Change available memory to 32 Mbytes |
| Example |
rmss -r |
Undo the above |
Warning:
- rmss can damage performance very seriously
- Don't go below 25% of the machine's memory
- Never forget to finish with rmss -r
rmss to determine the real memory use
To test the pressure on memory
- Reduce memory by 5% with rmss -c MB
- Immediately run rmss -r to release the rmss locked memory
- This memory goes on the free list and will be the next memory allocated on demand
- Watch free memory being used with vmstat or nmon
If the free memory is used up in
- seconds - the machine is probably short on memory
- minutes - memory is about right
- hours or days - there is spare memory; can you tune to use more of it, for example by increasing RDBMS disk caches or Webspace
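The sizes used in the steps above are simple arithmetic on the figure reported by rmss -p. A minimal sketch, with invented function names (rmss_target_mb, rmss_floor_mb) and whole-Mbyte integer arithmetic:

```shell
# Memory size (MB) to pass to rmss -c for a given percentage reduction.
# $1 = current real memory in MB, $2 = percentage to remove (default 5)
rmss_target_mb() {
    echo $(( $1 * (100 - ${2:-5}) / 100 ))
}

# Smallest size (MB) allowed by the 25% rule in the warning above.
rmss_floor_mb() {
    echo $(( $1 / 4 ))
}

# Example for a machine with 4096 MB:
#   rmss -c "$(rmss_target_mb 4096)"    # reduce by 5%
#   rmss -r                             # release it again straight away
```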
truss
Tracks process system calls (AIX5+)
| Syntax: |
simple |
truss mycmd |
| Syntax: |
detailed |
truss -a -f -c -p pid -o file |
| Options |
-a |
Display parameter strings |
| |
-f |
Follow child processes |
| |
-c |
Counts system calls - displays when process stops |
| |
-p pid |
Track a running process with PID pid |
| |
-o file |
Output the results to a file (allows interaction with cmd) |
| Examples |
truss -a -p 23456 |
Track process 23456 |
| Output |
lots |
Each system call name and parameters |
sar
System activity reporter
| Syntax |
Immediate: |
sar -A [-P ALL] interval number |
| |
Collect: |
sar -A -o savefile interval number >/dev/null |
| |
Report: |
sar -A -f savefile -i secs -s HH[:MM[:SS]] -e HH[:MM[:SS]] |
| Options |
-A |
All stats to be collected/reported |
| |
-o savefile |
Collect stats to binary file |
| |
-f savefile |
Report stats from binary file |
| |
-i secs |
Report at seconds interval from binary file |
| |
-s and -e |
Report stats only between these times |
| Examples |
sar 10 100 |
Report now at 10 second intervals |
| |
sar -A -o fred 10 6 |
Collect data into fred |
| |
sar -P ALL 1 30 |
Show individual CPUs |
| |
sar -A -f fred |
Report on the data |
| |
sar -A -f x -s 10:30 -e 10:45 |
Report on 15 minutes from 10:30 a.m. |
| |
sar -A -f fred -i60 |
Report 1 min. interval -not 10 secs as collected |
| Column |
output |
comments |
| CPU |
%usr %sys |
Percent of time in user / kernel mode |
| |
%wio %idle |
Percent of time waiting for disk io/idle |
| Buffer Cache |
bread/s bwrit/s lread/s lwrit/s |
Block I/O per second; logical I/O per second (hopefully cached) |
| |
pread/s pwrit/s |
Raw disk I/O (not buffer cached) |
| |
%rcache %wcache |
Percentage hit on cache |
| Kernel |
exec/s fork/s sread/s swrite/s r/wchar/s scall/s |
Calls per second of these system calls. sread/swrite are the read/write system calls (cache, raw, tty or network); scall is the total of all system calls |
| |
msg/s sema/s |
IPC for messages and semaphores |
| |
kexit/s ksched/s kproc-ov/s |
Process exits, process switches and process-overload (hit proc thresholds) |
| |
runq-sz |
Avg. number of processes on run queue |
| |
%runocc |
Percentage of time with processes on run queue |
| |
swap-sz |
Avg. number of processes waiting for page in |
| |
%swap-occ |
Percentage of time with processes on swap queue |
| |
cycles/s |
number of page replacement scans of all pages (per second) |
| |
faults/s |
number of page faults (might not need I/O) |
| |
slots |
number of free pages on paging spaces |
| |
odio/s |
number of non-paging disk I/O per second |
| |
file-ov, proc-ov |
number of times these tables overflow per sec |
| |
file-sz inode-sz proc-sz |
Entries in the tables |
| |
pswch/s |
Process switches per second |
| |
canch/s outch/s rawch/s |
Characters per second on terminal lines |
| |
rcvin/s xmtin/s |
Receive and transmit interrupts per second |
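A minimal sketch for picking the busy intervals out of a CPU report. The function name sar_low_idle and the default threshold of 20 are assumptions for illustration. It finds the %idle column from the header rather than hard-coding a position, and it only looks at lines that start with an HH:MM:SS timestamp, so the Average line is ignored.

```shell
# Print the time and %idle of every interval where %idle is below a limit.
# Reads a sar CPU report on stdin; default limit is 20.
sar_low_idle() {
    awk -v limit="${1:-20}" '
        /%idle/ {                                  # header line: locate the column
            for (i = 1; i <= NF; i++) if ($i == "%idle") col = i
            next
        }
        col && $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $col + 0 < limit + 0 {
            print $1, $col
        }
    '
}

# Typical use:
#   sar 10 100 | sar_low_idle 10
#   sar -f fred | sar_low_idle
```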