Wednesday 13 March 2013

KDB Commands



KDB:

The KDB kernel debugger and the kdb command are useful for debugging device drivers, kernel extensions, and the kernel itself. Although they appear similar, the KDB kernel debugger and the kdb command are two separate tools:

KDB KERNEL DEBUGGER:
It is integrated into the kernel and allows full control of the system while a debugging session is in progress. 


KDB COMMAND:
It is implemented as an ordinary user-space program and can be used for analyzing the following:

1. A running system: When used to analyze a running system, the kdb command opens the /dev/pmem special file, which allows direct access to the system's physical memory. The kdb command performs its own address translation internally using the same algorithms as the KDB kernel debugger.

2. A system dump file produced by a previously crashed system: A system dump contains certain critical data structures. Only the memory belonging to the process that was running on the processor that created the dump image can be included in the dump file. When you work with a system dump, any subcommands that modify memory are not valid because the system dump is merely a snapshot of the real memory in a system.

When you are analyzing a system dump file, the kdb command must be started with arguments that specify the location of the dump file and the kernel file:
# kdb /var/adm/ras/vmcore.0 /unix
(The kernel file is used by the kdb command to resolve symbol names from the dump file.)
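
The dump invocation above can be wrapped so that it fails early when either file is unreadable. This is only a sketch: the function name kdb_dump and the KDB override variable are made up for illustration, and the default paths are simply the ones from the example above.

```shell
# Hypothetical wrapper (not part of AIX): start kdb on a dump only if both
# the dump file and the kernel file are readable.
kdb_dump() {
    dump=${1:-/var/adm/ras/vmcore.0}   # dump file, default taken from the example
    unix=${2:-/unix}                   # kernel file used to resolve symbol names
    for f in "$dump" "$unix"; do
        if [ ! -r "$f" ]; then
            echo "kdb_dump: cannot read $f" >&2
            return 1
        fi
    done
    # KDB is an assumed override for testing; it defaults to the real kdb command.
    "${KDB:-kdb}" "$dump" "$unix"
}
```

Usage: kdb_dump /var/adm/ras/vmcore.0 /unix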

------------------------------------

A very valuable benefit of kdb is that a device setting stored in the ODM (lsattr ...) can be compared with the real-time value used in the running kernel!

------------------------------------

KDB SUBCOMMANDS:

help display context                             lists subcommands with the context "display"
p -?                                             list of parameters for the p subcommand and a brief description
! <command>                                      shell escape (provides a convenient way to run UNIX commands without leaving kdb)
hi                                               print history
lke                                              list loaded extensions
pvol -M <major> -m <minor>                       display physical volume info
stat                                             system status info
status                                           processor status
e                                                exit from kdb
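
Subcommands do not have to be typed at the kdb prompt. They can also be fed on standard input, which is the pattern used in the rest of this post. A minimal sketch of a helper for that follows; the name kdb_run and the KDB override variable are assumptions for illustration, not something kdb provides.

```shell
# Hypothetical helper: feed one or more kdb subcommands on standard input,
# one per line; kdb runs them and exits at end of input.
# KDB can be overridden (assumed here for testing); it defaults to kdb.
kdb_run() {
    printf '%s\n' "$@" | "${KDB:-kdb}"
}
```

Usage: kdb_run stat status, or kdb_run "lke -s 59" | grep le_data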

------------------------------------

echo vfcs fcs0 | kdb | grep num_cmd_elems        shows num_cmd_elems in hex on VIO client with NPIV (compare with odm: lsattr -El fcs0)
                                                 (if you change num_cmd_elems with chdev, you can check in kdb if it really has been changed)
echo scsidisk hdisk0 | kdb | grep queue_depth    shows real-time value in hex of queue_depth of given disk
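
Because kdb prints these values in hex while lsattr prints them in decimal, a small conversion helper makes the comparison less error-prone. This is a sketch only; the function names hex2dec and same_value are made up for illustration.

```shell
# Convert a hex string (with or without a leading 0x) to decimal.
hex2dec() {
    h=${1#0x}
    h=${h#0X}
    echo $(( 0x$h ))
}

# Compare a decimal ODM value with a hex kernel value.
# Prints "match" or "MISMATCH" with both values in decimal.
same_value() {
    odm=$1
    kernel=$(hex2dec "$2")
    if [ "$odm" -eq "$kernel" ]; then
        echo "match: $odm"
    else
        echo "MISMATCH: odm=$odm kernel=$kernel"
    fi
}
```

Usage: same_value "$(lsattr -El hdisk0 -a queue_depth -F value)" <hex value seen in kdb>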

------------------------------------

Check VSCSI adapter mapping:
(run this on vio client, not on vio server)

root@bb_lpar: / # echo "cvai" | kdb | grep vscsi                              <--cvai is a kdb subcommand
read vscsi_scsi_ptrs OK, ptr = 0xF1000000C01A83C0
vscsi0     0x000007 0x0000000000 0x0                aix-vios1->vhost2         <--shows which vhost is used on which vio server for this client
vscsi1     0x000007 0x0000000000 0x0                aix-vios1->vhost1
vscsi2     0x000007 0x0000000000 0x0                aix-vios2->vhost2
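
To get just the adapter mapping out of the cvai output, the lines can be filtered with awk. A minimal sketch, assuming the last column always has the form "<vios>-><vhost>" as in the sample above; the function name vscsi_map is made up.

```shell
# Reads "cvai" output on stdin and prints: <client adapter> <VIOS> <vhost>
# Assumes the last column looks like "aix-vios1->vhost2", as in the sample.
vscsi_map() {
    awk '$1 ~ /^vscsi[0-9]+$/ && $NF ~ /->/ {
        split($NF, a, "->")
        print $1, a[1], a[2]
    }'
}
```

Usage: echo cvai | kdb | vscsi_map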


Check NPIV adapter mapping:
(run this on vio client, not on vio server)

root@bb_lpar: / # echo "vfcs" | kdb                                            <--vfcs is a kdb subcommand
...
NAME      ADDRESS             STATE   HOST      HOST_ADAP  OPENED NUM_ACTIVE
fcs0      0xF1000A000033A000  0x0008  aix-vios1 vfchost8  0x01    0x0000       <--shows which vfchost is used on vio server for this client
fcs1      0xF1000A0000338000  0x0008  aix-vios2 vfchost6  0x01    0x0000
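
The same idea works for the vfcs table. This sketch assumes the column order shown in the sample above (NAME first, HOST fourth, HOST_ADAP fifth); the function name vfcs_map is made up.

```shell
# Reads "vfcs" output on stdin and prints: <client adapter> <VIOS> <vfchost>
# Column positions (NAME=1, HOST=4, HOST_ADAP=5) are taken from the sample.
vfcs_map() {
    awk '$1 ~ /^fcs[0-9]+$/ { print $1, $4, $5 }'
}
```

Usage: echo vfcs | kdb | vfcs_map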

------------------------------------

Check physical FC adapter setting (not in virtual environment):
(dyntrk, fc_err_recov, num_cmd_elems) 

These are the settings we would like to verify:
----------
root@bb_lpar: / # lsattr -El fscsi0| egrep 'dyntrk|fc_err_recov'
dyntrk       yes       Dynamic Tracking of FC Devices        True
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True

root@bb_lpar: / # lsattr -El fcs0| grep num_cmd_elems
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
----------

Verifying the settings from kernel:
1. root@bb_lpar: / # echo efscsi fscsi0 | kdb | grep efscsi_ddi
    struct efscsi_ddi ddi = 0xF1000A06007FA080                            <--this hex value will be used in the next step


2. root@bb_lpar: / # echo dd 0xF1000A06007FA080+20 2 | kdb                <--append "+20 2" to the hex value from step 1
...                                                                       (+20 is a hex offset from that address, 2 is the number of doublewords to display)
F1000A06007FA0A0: 0101020202010200 000000B400000028  ...............(     <--on the specified locations you can decode the numbers there
                          FFDD     NNNNNNNN

FF = fc_err_recov: (we have "02" in this example here, which is fast_fail)
01 = delayed_fail
02 = fast_fail

DD = dyntrk: (we have "01" in this example here, which means "yes")
00 = disabled (no)
01 = enabled (yes)

NNNNNNNN = num_cmd_elems: (we have "B4" in this example here, but some calculation is still needed)
1. change to decimal value: 000000B4 --> 180
2. add 20 to the decimal number: 180 + 20 = 200
(you must always add "20" to the decimal value you get)
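
The decoding rules above can be put into a small function. This is a sketch that assumes exactly the byte positions and the "+20" rule described in this post; the function name decode_fscsi_words is made up, and the two arguments are the two doublewords printed by dd.

```shell
# Decode the two doublewords shown by "dd <efscsi_ddi address>+20 2".
# Byte positions and the "+20" correction follow the description above.
decode_fscsi_words() {
    w1=$1
    w2=$2
    ff=$(printf '%s' "$w1" | cut -c9-10)    # fc_err_recov byte
    dd=$(printf '%s' "$w1" | cut -c11-12)   # dyntrk byte
    nn=$(printf '%s' "$w2" | cut -c1-8)     # num_cmd_elems (hex, minus 20)
    case $ff in
        01) recov=delayed_fail ;;
        02) recov=fast_fail ;;
        *)  recov="unknown($ff)" ;;
    esac
    case $dd in
        00) dyn=no ;;
        01) dyn=yes ;;
        *)  dyn="unknown($dd)" ;;
    esac
    elems=$(( 0x$nn + 20 ))
    printf 'fc_err_recov=%s dyntrk=%s num_cmd_elems=%s\n' "$recov" "$dyn" "$elems"
}
```

Usage: decode_fscsi_words 0101020202010200 000000B400000028
For the example output above this prints fc_err_recov=fast_fail dyntrk=yes num_cmd_elems=200, which matches the lsattr values.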

------------------------------------  

Volume group and lv info:


The volgrp subcommand displays information about a volume group and its logical volumes.
The volgrp structure addresses are registered in the devsw table in the DSDPTR field.
(devsw: displays miscellaneous kernel data structures)

root@bb_lpar: /dev # echo devsw | kdb | grep dsdptr | grep -v 00000000
   dsdptr:    F1000A0600751800                                                 <--this will be used for "volgrp" command
   dsdptr:    05A50280
   dsdptr:    F1000A0600751400

root@bb_lpar: /dev # echo volgrp F1000A0600751800| kdb                         <--displays info about given volgrp
...
VOLGRP............. F1000A0600751800
vg_eyec............ 4C564D766F6C6772 (LVMvolgr)
vg_name............ rootvg
vg_ras_name........ rootvg
vg_id.............. 00080E820000D900000001335FBB8276
vg_lock.......... @ F1000A0600751868    vg_lock............ 0000000000000000
major_num.......... 0000000A            flags.............. 00040001
snapshot_copy...... 0000                partshift.......... 0012  (128M)
ltg_shift.......... 0001  (256K)        open_count......... 000A
max_lvs............ 0100                max_pvs............ 0020
....
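
The two steps above can be chained to list every volume group found through the devsw table. This is a sketch under two assumptions: a dsdptr belongs to a volume group only if the volgrp output shows the LVMvolgr eye-catcher, and the output format is the one shown above. The function names are made up. Note that the filter drops only all-zero pointers, which is slightly narrower than the grep -v 00000000 used above.

```shell
# Print the non-zero dsdptr values from "devsw" output read on stdin.
dsdptrs() {
    awk '$1 == "dsdptr:" && $2 !~ /^0+$/ { print $2 }'
}

# Print the volume group name from "volgrp <address>" output read on stdin,
# but only if the LVMvolgr eye-catcher was seen first.
vg_name() {
    awk '/^vg_eyec/ && /LVMvolgr/ { ok = 1 }
         /^vg_name/ && ok        { print $2 }'
}

# Sketch of the full loop; it calls kdb, so it only works on AIX.
list_vgs() {
    echo devsw | kdb | dsdptrs | while read -r ptr; do
        name=$(echo "volgrp $ptr" | kdb | vg_name)
        [ -n "$name" ] && echo "$ptr $name"
    done
    return 0
}
```

Usage: list_vgs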

------------------------------------

Check hcheck_interval value of a disk:

1. root@bb_lpar: / # echo lke | kdb | grep pcm
 59 F1000000A063D200 05A60000 00030000 02080242 /usr/lib/drivers/aixdiskpcmke      <--this shows the slot number we will use (here 59)

2. root@bb_lpar: / # echo "lke -s 59" | kdb | grep le_data
  le_data........ 0000000005A80000   le_datasize.... 0000000000002828              <--this shows le_data value
                                                                                   (we will use this in the adevq subcommand)

3. root@bb_lpar: / # kdb
(0)> adevq
Unable to find <pcm_info>
Enter the pcm_info address (in hex): 0000000005A80000                              <--the above value is given here
NAME      ADDR               STATE MACHINE  ACTIVE_IO                              <--then we will see the list of hdisks
hdisk1    0xF1000A0600740400 0x0       0x       0                                  <--choose the address of a disk and run adevq against it
NAME      ADDR               STATE MACHINE  ACTIVE_IO
hdisk2    0xF1000A0600740E00 0x0       0x       0
NAME      ADDR               STATE MACHINE  ACTIVE_IO
hdisk3    0xF1000A0600741800 0x0       0x       0
NAME      ADDR               STATE MACHINE  ACTIVE_IO
hdisk0    0xF1000A0600742200 0x0       0x       0


4. (0)> adevq 0xF1000A0600740400 | grep hcheck                                     <--this shows the address of hcheck, which we will use next
    hcheck_t &hcheck = 0xF1000A0600740470


5. (0)> ahcheck 0xF1000A0600740470 | grep interval
    uint interval = 0x0                                                            <--this shows hcheck_interval value in hex (we have 0)
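
The values that are passed from one step to the next can be picked out of the kdb output with a few small filters. This is a sketch that assumes the output formats shown in the steps above; the function names are made up.

```shell
# Slot number of the aixdiskpcmke kernel extension, from "lke" output on stdin.
pcm_slot() {
    awk '/aixdiskpcmke/ { print $1; exit }'
}

# le_data address from "lke -s <slot>" output on stdin.
le_data_addr() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^le_data\.+$/) { print $(i + 1); exit }
    }'
}

# hcheck interval in decimal, from "ahcheck <address>" output on stdin.
hcheck_interval() {
    awk '/uint interval =/ { print $NF; exit }' | {
        read -r hex
        [ -n "$hex" ] && echo $(( $hex ))
    }
}
```

Usage: echo lke | kdb | pcm_slot, then echo "lke -s 59" | kdb | le_data_addr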

------------------------------------

Check for a process which is using a specific network port:


1. root@bb_lpar: / # netstat -Aan | grep 22                                        <--check for address of the port
f1000e000330ebb8 tcp4       0      0  *.22   *.*    LISTEN


2. root@bb_lpar: / # kdb
(0)> sockinfo f1000e000330ebb8 tcpcb | grep pvproc                                 <--feed the address to the sockinfo subcommand (grep for pvproc)
pvproc+016000   88*sshd     ACTIVE 058000E 03A00A2 000000083846E480   0 0001

3. (0)> hcal 058000E                                                               <--calculate decimal value (this is the pid of the process)
Value hexa: 0058000E          Value decimal: 5767182

(0)> e                                                                             <--exit from kdb

4. root@bb_lpar: / # ps -fp 5767182                                                <--shows the process of a given pid
     UID      PID     PPID   C    STIME    TTY  TIME CMD
    root  5767182  3801250   0   May 09      -  0:00 /usr/sbin/sshd
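
The hex-to-decimal step can also be done outside kdb. A minimal sketch, assuming the PID is the fourth column of the pvproc line, in hex, as in the sample output above; the function name sock_pids is made up.

```shell
# Reads "sockinfo <address> tcpcb" output on stdin and prints the decimal PID
# found on each pvproc line (4th column, hex, as in the sample above).
sock_pids() {
    awk '$1 ~ /^pvproc/ { print $4 }' | while read -r hex; do
        echo $(( 0x$hex ))
    done
}
```

Usage: echo "sockinfo f1000e000330ebb8 tcpcb" | kdb | sock_pids
The result can then be passed to ps -fp as in step 4.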

