Tuesday 19 March 2013

A nice clean dump


What? A dump?

AIX, like many other operating systems, provides the capability of making a system dump. This has nothing to do with the Windows Recycle Bin or a Trash folder, a kind of holding place in case you want to undelete something. ("Undelete"? Who invented that word?) This IBM technote on Managing System Devices explains that a system dump "automatically copies selected areas of kernel data to the primary dump device."
Care factor zero?
Now "automatically copying selected areas of kernel data" may not be right up there on the scale of life-changing news, so let's put it in layman's terms: you need to store a system dump in the event of a catastrophic operating system failure.
First the good news ...
You almost certainly already have a system dump device in place. Use the command sysdumpdev -l to list the primary dump device. If you prefer to avoid the command line, you can use smitty dump.
sysdumpdev -l
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     FALSE
always allow dump    FALSE
dump compression     OFF
In AIX 5.x the default dump device was /dev/hd6, except for servers or LPARs which had more than 4GB of memory at installation time. In that case the default dump device was /dev/lg_dumplv. That's probably what you see as the primary dump device now.
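
If you want to check these settings from a script rather than by eye, the listing is easy to pick apart. The following is only a rough sketch in Python: it works on the sample output shown above rather than calling sysdumpdev itself, and the function name parse_sysdumpdev_listing is made up for illustration. It assumes every line is a label followed by a single value, which holds for the sample but may not hold for every AIX level.

```python
# Sample listing copied from the article; on a live AIX system this text
# would come from running `sysdumpdev -l`.
SAMPLE_LISTING = """\
primary              /dev/lg_dumplv
secondary            /dev/sysdumpnull
copy directory       /var/adm/ras
forced copy flag     FALSE
always allow dump    FALSE
dump compression     OFF
"""


def parse_sysdumpdev_listing(text):
    """Map each setting label to its value (hypothetical helper).

    Assumes the value is the last whitespace-separated token on the line
    and everything before it is the label.
    """
    settings = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            # Skip blank lines or anything that is not "label value".
            continue
        label, value = parts
        settings[label.strip()] = value
    return settings


settings = parse_sysdumpdev_listing(SAMPLE_LISTING)
print("Primary dump device:", settings["primary"])
```

Splitting from the right keeps multi-word labels such as "copy directory" intact without having to hard-code column positions.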

... and now the bad news

bad news #1: Your dump device may be too small
 
Your dump device may not be big enough to capture a system dump. You can check the error report or run the dumpcheck command, which should be scheduled in cron to run once a day anyway. You'll probably need to specify the full path, which is /usr/lib/ras/dumpcheck. If nothing appears in the error report (errpt command), then at least one of your dump devices is large enough for that crash you're just waiting for. No news is good news. Aren't you glad you made enough space for rootvg?
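
The comparison behind "big enough" is simple arithmetic: the size of the dump logical volume against the estimated size of a dump. Here is a minimal sketch of that sum in Python. It is not how dumpcheck is implemented; the function name and all the figures are invented for illustration. On a real system the estimate could be taken from sysdumpdev -e, and the number of logical partitions and the partition size from lslv.

```python
def dump_device_is_large_enough(estimated_dump_bytes, logical_partitions, pp_size_mb):
    """Return True if the dump logical volume can hold the estimated dump.

    Hypothetical helper: the device size is taken to be the number of
    logical partitions multiplied by the partition size in megabytes.
    """
    device_bytes = logical_partitions * pp_size_mb * 1024 * 1024
    return device_bytes >= estimated_dump_bytes


# Made-up example: a 400 MB estimated dump against two candidate devices.
estimated = 400 * 1024 * 1024
roomy = dump_device_is_large_enough(estimated, logical_partitions=8, pp_size_mb=64)
cramped = dump_device_is_large_enough(estimated, logical_partitions=4, pp_size_mb=64)
print("8 x 64 MB device large enough:", roomy)
print("4 x 64 MB device large enough:", cramped)
```

The estimate moves with the workload, so a device that passes today may not pass after the system has grown. That is one reason to let the daily check keep running.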

bad news #2: dump won't prevent first crash
 
Having a system dump device is not going to prevent that catastrophic operating system failure. At least, it won't the first time you crash. What the dump does give you is something you can send off to your second-level support to find out what caused the failure, so that you can hopefully prevent it from happening again.

bad news #3 (don't think I'm enjoying this): you may need a secondary dump device

If your primary dump device becomes unavailable, usually because the disk it sits on is lost somehow, you may need to have a secondary dump device. You probably don't have one. If it's set to /dev/sysdumpnull then you're completely dependent on your primary dump device.
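
If you look after more than a handful of LPARs, you could flag the ones with no real secondary device automatically. The sketch below, in Python, scans a saved sysdumpdev -l listing for the secondary line and treats /dev/sysdumpnull as "nothing configured". The function name and the device /dev/dumplv2 are hypothetical, and the listings are sample text rather than live command output.

```python
NULL_DUMP_DEVICE = "/dev/sysdumpnull"


def has_real_secondary(listing):
    """Return True if the listing names a secondary dump device other than
    /dev/sysdumpnull (hypothetical helper for illustration)."""
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "secondary":
            return fields[-1] != NULL_DUMP_DEVICE
    # No secondary line found at all: treat as not configured.
    return False


# Listing as shown in the article: no usable secondary device.
without_secondary = "primary /dev/lg_dumplv\nsecondary /dev/sysdumpnull\n"
# Hypothetical listing after a secondary logical volume has been set up.
with_secondary = "primary /dev/lg_dumplv\nsecondary /dev/dumplv2\n"

print("Article listing has a real secondary:", has_real_secondary(without_secondary))
print("Hypothetical listing has a real secondary:", has_real_secondary(with_secondary))
```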

It's easy to set up a dump device: create a logical volume (smitty lv) of type sysdump and then use the command sysdumpdev -s to set it as your secondary device. The smitty dump menus give lots of helpful information.

Clean up this dump!

For a step-by-step explanation of how to manage system dumps, have a look at that IBM technote which was mentioned above.

If you need to configure or increase the size of a dump device, it's easy to do. Hopefully it will never be used, but if that system crash does happen, you'll be glad you have something which will help diagnose the root cause.
