System Crashes in AIX 4.x


Contents

About this document
Collecting the system dump
Submitting the testcase
Forcing a system dump

About this document

This document describes the steps necessary for collecting and sending system dumps in AIX Versions 4.x. The section "Collecting the system dump" discusses dump collection, and "Submitting the testcase" describes the various methods of sending testcases. The section "Forcing a system dump" explains how to force a dump if the initial collection procedure does not produce a valid one.


Collecting the system dump

For best results, have a tape drive connected to your system before proceeding with the following steps.

  1. Determine if your system has an LED display and proceed accordingly.

    System with LED display

    1. Turn the key to Service mode.
    2. Wait for 888 to flash.
    3. Press Reset repeatedly and write down the LED code displayed after each press, until 888 displays again. AIX support personnel will ask for these LED codes during phone contact.
    4. Power off the system and move to step 2.

    System with LED or non-LED display without key switch or reset button, AIX 4.1.4 and later.

    1. If there is no disk activity, press Ctrl Alt 1 (on the number pad).
    2. Wait for disk activity to stop.
    3. Power off the system and move to step 2.

  2. If you have not already done so, connect a tape drive to the system. Power the system on.

    Unless the default dump configuration has been modified with the sysdumpdev command, the dump will be copied to /var/adm/ras/vmcore.x when the system is powered on.

    NOTE: If /var is too small to hold the dump, the system will prompt the user to copy the dump to external media such as tape or pre-formatted diskettes. If a tape drive is not connected, the system will prompt for diskettes. Using diskettes is NOT a recommended method of collecting a dump. If you are unable to save the dump, IBM will not be able to determine what caused your system to crash.

  3. If the dump to /var/adm/ras fails, answer Yes when prompted to collect the dump to external media. Log in as root and then issue the command:
        sysdumpdev -L 
    

    Check the sysdumpdev -L output for a valid time-stamp. If the time-stamp matches the approximate time of the system crash, go to the next section. If the time-stamp does not match, or if the sysdumpdev command produces no output, a valid dump was not captured; see "Forcing a system dump".
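
The time-stamp check above can be scripted. The following is a minimal sketch, not part of the original procedure: the function name dump_timestamp is hypothetical, and it assumes that the sysdumpdev -L output reports the time-stamp on a line beginning with "Date/Time:". Adjust the pattern if your level of AIX prints it differently.

```shell
# Read `sysdumpdev -L` output on standard input and print the time-stamp
# of the most recent dump. Returns non-zero when no time-stamp line is
# found, which this document treats as "a valid dump was not captured".
# ASSUMPTION: the time-stamp line begins with "Date/Time:".
dump_timestamp() {
    awk '
        /^Date\/Time:/ {
            sub(/^Date\/Time:[ \t]*/, "")   # keep only the time-stamp text
            print
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (as root):
#   sysdumpdev -L 2>&1 | dump_timestamp || echo "No valid dump was captured"
```

The printed time-stamp still has to be compared by eye with the approximate time of the crash; the sketch only detects the case where no time-stamp is reported at all.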


Submitting the testcase

Decide which method to use to send the testcase.

Overnight/US Mail

To send a testcase via overnight mail, do the following:
  1. Place a blank tape in the tape drive. This step is applicable for users who copied the dump to tape or diskettes as well as users who copied the dump to a file in /var/adm/ras.

    If you are sending the dump on tape, acceptable media are 8mm, 4mm or 1/4" QIC tape.

  2. Issue the command:
        /usr/sbin/snap -gfkD -o /dev/rmt#
    

    This copies the unix file image and the error log needed for a complete dump analysis.

    NOTE: In the previous instructions, replace /dev/rmt# with the device name for your tape drive (usually /dev/rmt0).

  3. Verify that the medium contains at least the files needed to analyze a system crash or hang. List the contents of the medium with:
        tar -tvf /dev/rmt# 
    
    or for 4.3.3.0-05 Maintenance level and later:
         pax -vf /dev/rmt#
    

    In the output, there should be three lines similar to the following:

    -rw-r--r--   1 user    group      36243456 Feb 21 10:05 ./dump/dump.Z 
    -rw-r--r--   1 user    group           176 Feb 20 15:13 ./dump/dump.snap 
    -rw-r--r--   1 user    group        933536 Feb 20 15:13 ./dump/unix.Z 
    

    NOTE: The file dump.Z may be just dump, and the file unix.Z may be just unix. These files should also be greater than 0 in size. In this case, dump.Z = 36243456 bytes and unix.Z = 933536 bytes.

  4. Ship the testcase per the following shipping instructions.
    1. Label the media.
          <customer name>
          <pmr #, branch #>
          tape block_size = <xxx>
          <the command used to copy the information to tape>
      

      NOTE: The tape block size can be obtained with the command:

          lsattr -El rmt# 
      

      The copy command is generally snap.

      IMPORTANT: If the person who is sending the testcase is NOT the person who reported the problem, be sure to include the name of the person who reported it. If the proper information is not on the package, then process delays will occur.

    2. Send the media to:

      IBM Corp.
      Attn: AIX Testcase Dept.
      0422A044 11400 Burnet Road
      Austin, TX 78758-3494
      Extension 3-4100

      NOTE: If you specify Saturday delivery, you must first make special arrangements with an AIX Support specialist. Otherwise, there could be a delay of several days.
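
The listing check in step 3 above can be automated before the tape is shipped. The following is a minimal sketch under stated assumptions: the function name check_dump_listing is hypothetical, and it assumes the listing has the same column layout as the sample output in this document (size in the fifth column, file name in the last column).

```shell
# Read a `tar -tvf` style listing on standard input and confirm that it
# contains dump (or dump.Z), dump.snap, and unix (or unix.Z), and that
# the dump and unix files are greater than 0 in size.
# ASSUMPTION: columns are laid out as in the sample listing in this document.
check_dump_listing() {
    awk '
        {
            name = $NF        # file name is the last field
            size = $5 + 0     # size in bytes is the fifth field
            if (name ~ /\/dump(\.Z)?$/ && size > 0) have_dump = 1
            if (name ~ /\/dump\.snap$/)             have_snap = 1
            if (name ~ /\/unix(\.Z)?$/ && size > 0) have_unix = 1
        }
        END {
            if (!have_dump) print "missing or empty: dump"
            if (!have_snap) print "missing: dump.snap"
            if (!have_unix) print "missing or empty: unix"
            exit ((have_dump && have_snap && have_unix) ? 0 : 1)
        }
    '
}

# Usage (replace /dev/rmt0 with the device name for your tape drive):
#   tar -tvf /dev/rmt0 | check_dump_listing && echo "Testcase looks complete"
```

The same function can be fed the output of the zcat pipeline used in the ftp instructions, since it only inspects the listing text.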

Internet (ftp)

If you are sending the testcase by ftp, proceed according to how you collected the dump:
  1. If you copied the dump to tape, copy the dump information from the tape to the hard disk. Otherwise, go to step 2.
    1. Find the size of the dump with the command:
          sysdumpdev -L 
      
    2. To see the amount of free space available in /tmp, run the command:
          df /tmp 
      

      If space is not available to contain the dump, then increase the size of /tmp.

    3. Collect relevant system dump information, such as the unix file image and error logs, by running:
          snap -gfkD 
      
    4. Copy the dump from the tape to the hard disk:
          cd /tmp/ibmsupt/dump 
          tar -xf /dev/rmt#
      
      or for 4.3.3.0-05 Maintenance level and later:
          pax -rvf /dev/rmt#
      
    5. Create a compressed tar or pax image of the dump and the snap output:
          snap -c 
      
    6. Verify that the tar or pax file contains at least the files needed to analyze a system crash or hang. List its contents with:
          zcat /tmp/ibmsupt/snap.tar.Z | tar -tvf -
      
      or
          zcat /tmp/ibmsupt/snap.tar.Z | pax -v
      

      In the output, there should be three lines similar to the following:

      -rw-r--r--   1 user    group      36243456 Feb 21 10:05 ./dump/dump.Z 
      -rw-r--r--   1 user    group           176 Feb 20 15:13 ./dump/dump.snap
      -rw-r--r--   1 user    group        933536 Feb 20 15:13 ./dump/unix.Z 
      

      If so, go to step 3.

    NOTE: The file dump.Z may be just dump, and the file unix.Z may be just unix. These files should also be greater than 0 in size. In this case, dump.Z = 36243456 bytes and unix.Z = 933536 bytes.

  2. Verify that the compressed file contains at least the files needed to analyze a system crash or hang. List its contents with:
        zcat /tmp/ibmsupt/snap.tar.Z | tar -tvf -
    
    or
        zcat /tmp/ibmsupt/snap.tar.Z | pax -v
    

    In the output, there should be three lines similar to the following:

    -rw-r--r--   1 user    group      36243456 Feb 20 10:05 ./dump/dump.Z 
    -rw-r--r--   1 user    group           176 Feb 20 15:13 ./dump/dump.snap 
    -rw-r--r--   1 user    group        933536 Feb 20 15:13 ./dump/unix.Z 
    

    Go to step 3 below.

    NOTE: The file dump.Z may be just dump, and the file unix.Z may be just unix. These files should also be greater than 0 in size. In this case, dump.Z = 36243456 bytes and unix.Z = 933536 bytes.

  3. Name the tarred and compressed file after your problem report number and branch office number. Sample file names follow:

    1x234.001.tar.Z    (problem_report_#.branch_office_#.tar.Z)
    1x234.001.pax.Z    (problem_report_#.branch_office_#.pax.Z)

    If the ftp fails, the file will need to be renamed for the next attempt. Use the format 1x234.001.2nd.tar.Z or 1x234.001.2nd.pax.Z.

    Follow these steps to ftp the file to the testcase repository:

    1. Enter ftp testcase.software.ibm.com.
    2. At the login prompt enter anonymous.
    3. At the password prompt enter your complete email address following the format customer@company.com.
    4. Change to binary transfer mode. Enter bin.
    5. Change into the /aix/toibm directory. Enter cd /aix/toibm.
    6. Use the put command to place the file. For example, put 1x234.001.tar.Z.
    7. Enter quit.
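
The naming convention and the ftp steps above can be combined into a small script. The following is a minimal sketch, not an official tool: the function names testcase_name and ftp_upload_commands are hypothetical, and the sketch assumes an ftp client that accepts -n (no automatic login) and reads its commands from standard input. It uses the two-argument form of put so that the local file does not have to be renamed first.

```shell
# Build the upload name: problem_report_#.branch_office_#[.2nd].<format>.Z
#   $1 = problem report number      $2 = branch office number
#   $3 = archive format (tar or pax)
#   $4 = "retry" when a previous ftp attempt failed (optional)
testcase_name() {
    if [ "${4:-}" = "retry" ]; then
        printf '%s.%s.2nd.%s.Z\n' "$1" "$2" "$3"
    else
        printf '%s.%s.%s.Z\n' "$1" "$2" "$3"
    fi
}

# Print the ftp commands for the upload, one per line.
#   $1 = complete e-mail address (used as the anonymous password)
#   $2 = local file to send         $3 = name to give it on the server
ftp_upload_commands() {
    printf 'user anonymous %s\n' "$1"
    printf 'bin\n'
    printf 'cd /aix/toibm\n'
    printf 'put %s %s\n' "$2" "$3"
    printf 'quit\n'
}

# Usage:
#   name=$(testcase_name 1x234 001 tar)
#   ftp_upload_commands customer@company.com /tmp/ibmsupt/snap.tar.Z "$name" |
#       ftp -n testcase.software.ibm.com
```

Generating the commands as text keeps the sketch easy to inspect before anything is sent: run ftp_upload_commands on its own first to review the session it would produce.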


Forcing a system dump

If the system does not respond to mouse or keypad input, then it is in a hung state.

The system is also hung if the user cannot telnet, rlogin, or ping to it. Another indication is that the system responds to ping but is otherwise unavailable.

It is likely that the system will hang again. The following steps prepare the system so that a dump can be forced if the hang recurs.

Preparing for a forced dump

NOTE: In AIX Version 4.1.4 or later, a system dump can be forced without a key switch. The system needs to be initially configured to use this method, which you can do through SMIT by following the fastpath.

Run the following command:

    smit dump 

Change the attribute Always Allow System Dump to TRUE.

When the system hangs again, proceed according to the type of system:

System with key switch and LED display

  1. Turn the key to Service mode. Wait for a moment, then press Reset.
  2. The LED sequence will be 0c2-0c4 or 0c2-0c0.

System with LED or non-LED display without a key switch or reset button, AIX 4.1.4 and later

  1. If there is no disk activity, press Ctrl Alt 1 (on the number pad).
  2. Wait for disk activity to stop.

If a hang occurs after completing either of the previous sequences, power off the system and go to step 2 of "Collecting the system dump" at the beginning of this document.


[ Doc Ref: 90605202114646     Publish Date: Dec. 03, 2001]