About Filesystem Serviceability Enhancements in AIX Versions 4 and 5


Contents

About this document
    Related documentation
Serviceability enhancements
New filesystem error log entries

About this document

This document discusses serviceability enhancements made to the base filesystem services in AIX since its initial release. Information in this document is applicable to AIX Versions 4.x and 5.x.

Related documentation

For more in-depth coverage of this subject, the following IBM publications are recommended:

The product documentation library is also available at the following URL:
http://www.rs6000.ibm.com/resource/aix_resource/Pubs/index.html


Serviceability enhancements

To make filesystem-related problems easier to troubleshoot, several enhancements have been made to the base AIX operating system through the following APARs.

AIX RELEASE      APAR
------------------------------------
4.1.5            IX83878
4.2.1            IX82819 and IX84402
4.3.2            IX86362

The enhancements are evident in three areas.

First, there is a new error code, ECORRUPT (return code 89). This value is returned to an application from a filesystem system call when non-fatal (for example, directory) filesystem corruption is detected.

Second, new filesystem error log entries have been added. There are two classes of new entries.

  1. Informational. These do not of themselves constitute an error condition. They are meant to alert the administrator to possible problems; they may require attention to prevent a problem from occurring.
  2. Diagnostic. These indicate that a problem has occurred. They are intended to pinpoint the resource on which the problem occurred and the kind of problem it was. A service representative may need to be contacted to pursue these when they occur.

When an application receives the ECORRUPT return code, examine the system error log for more information about what may have caused the corruption, such as a hardware problem or, possibly, downlevel microcode or drivers for the underlying storage subsystem. Unmount the filesystem that returned the code, and run fsck against it to try to resolve the corruption.

In AIX 4.1.5 and 4.3.2, the message returned by perror() for ECORRUPT is "Reserved errno was encountered". In AIX 4.2.1 and in all releases after AIX 4.3.2, the message returned by perror() for ECORRUPT is "Invalid filesystem control data detected".
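
As a minimal sketch of how an application might react to ECORRUPT, the following Python fragment wraps a file read and returns the recovery advice given above when the call fails with errno 89. The function name read_file_checked and its opener parameter are hypothetical and exist only for illustration. The value 89 is taken from this document; the fallback is used because Python's errno module is not guaranteed to define ECORRUPT.

```python
import errno

# ECORRUPT is AIX-specific. 89 is the value stated in this document;
# fall back to it if the errno module does not export the name.
ECORRUPT = getattr(errno, "ECORRUPT", 89)

ADVICE = ("Filesystem corruption detected: check the system error log, "
          "then unmount the filesystem and run fsck against it.")


def read_file_checked(path, opener=open):
    """Read a file; return (data, advice).

    advice is None on success. If the read fails with ECORRUPT,
    data is None and advice carries the recovery hint. Any other
    error is re-raised unchanged. `opener` is a hypothetical hook
    so the behavior can be exercised without a corrupt filesystem.
    """
    try:
        with opener(path, "rb") as handle:
            return handle.read(), None
    except OSError as exc:
        if exc.errno == ECORRUPT:
            return None, ADVICE
        raise
```

The caller decides what to do with the advice string. The sketch deliberately does not attempt any repair itself, since fsck must be run by an administrator on an unmounted filesystem.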

Third, the system dump facility has been enhanced to include extensive filesystem dump information only when a filesystem-related crash has been detected. In the case of a filesystem-related dump, more information will be present in the dump to help diagnose the cause. If the crash is not filesystem related, the amount of filesystem-related information present in the dump will be reduced. This will generally result in smaller dumps.

When you receive one of the new error log entries in your error report, refer to the following list for an explanation of each new error type and possible actions to take.


New filesystem error log entries

Error Description: SPECFS_DDINTPRI                   Error ID: AE26DD07
This error is logged when a call to a device driver returns with system interrupts disabled. A system crash will accompany this. The owner of the device driver should investigate the cause.

Error Description: JFS_USER_HARDLINK            Error ID: 5ECE4A58
This error is logged when a hard link is created to or removed from a directory. This error is intended as a warning since such operations can result in filesystem corruption or strange filesystem behavior.

Only an entity with superuser privileges can perform these kinds of hard links.

Error Description: JFS_USER_WRITEMOUNT      Error ID: D73189F6
This error is logged when a process writes directly to a logical volume while a filesystem on that logical volume is mounted.

Unmount the filesystem and run fsck to ensure filesystem integrity. fsck can also be run with the -d option, specifying a block (as indicated in the full error entry), to determine which file, if any, may have been affected.

Error Description: JFS_FS_FULL                               Error ID: 369D049B
This error is logged when no free space exists within a filesystem. This error is logged only once per filesystem mount or filesystem extension.

Error Description: JFS_FS_FRAGMENTED            Error ID: 5DFED6F1
This error is logged when insufficient contiguous free space exists to fulfill an allocation request within a filesystem. This error should apply only to filesystems created with a fragment size of less than 4 KB and to large-file-enabled filesystems.

Run the defragfs command on the affected filesystem to defragment the filesystem free space. This error is logged once per filesystem mount or filesystem extension.

Error Description: JFS_FS_NOINODES                   Error ID: 8988389F
This error is logged when all of the inodes allocated to a filesystem are exhausted. To resolve the problem, remove inactive files from the filesystem or relocate them to another filesystem. Alternatively, increase the filesystem size to supply more inodes. If more inodes are needed without increasing the size of the filesystem, the filesystem may need to be recreated with a smaller NBPI value, which increases the number of inodes for the filesystem.
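
The relationship between filesystem size, NBPI (number of bytes per inode), and inode count can be illustrated with a short calculation. This is an approximation that assumes one inode per NBPI bytes and ignores any rounding the filesystem applies. The function names and the default list of candidate NBPI values are assumptions made for illustration; the values actually accepted depend on the AIX release.

```python
def estimated_inodes(fs_bytes, nbpi):
    """Approximate inode count: one inode per `nbpi` bytes of filesystem."""
    return fs_bytes // nbpi


def largest_nbpi_for(fs_bytes, wanted_inodes,
                     candidates=(512, 1024, 2048, 4096, 8192, 16384)):
    """Return the largest candidate NBPI that still yields at least
    `wanted_inodes`, or None if no candidate is small enough.

    The candidate list is an assumption, not a statement of what a
    given AIX release accepts.
    """
    for nbpi in sorted(candidates, reverse=True):
        if estimated_inodes(fs_bytes, nbpi) >= wanted_inodes:
            return nbpi
    return None


one_gib = 1024 ** 3
print(estimated_inodes(one_gib, 4096))      # 262144
print(estimated_inodes(one_gib, 1024))      # 1048576
print(largest_nbpi_for(one_gib, 500000))    # 2048
```

Under these assumptions, a quarter of the NBPI value gives four times the inodes in the same space, which is why recreating the filesystem with a smaller NBPI relieves inode exhaustion without growing the filesystem.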

Error Description: JFS_KERNHEAP_LOW             Error ID: 83F4B3CB
This error is logged if a kernel memory request is unsuccessful. This error will be logged no more than once per 24-hour period of system uptime. Contact an appropriate service representative.

Error Description: JFS_KERNHEAP_DELAY        Error ID: 7975092C
This error is logged if a kernel memory request had a previous failure but succeeded on a retry. This error will be logged no more than once per 24-hour period of system uptime. This error is merely a warning, but may result in a JFS_KERNHEAP_LOW error if the condition continues. Contact an appropriate service representative.

Error Description: JFS_LOG_WRAP                        Error ID: 061675CF
This error is logged if transactions written to a JFS log device caused the beginning of the log to be overwritten. The system will have crashed as a result. The solution is either to move filesystems that use that log to other JFS log devices to attempt to distribute the amount of JFS log activity, or to increase the size of the affected JFS log. Use the major/minor numbers in the error report to identify the log device in the /dev directory.

NOTE: To increase the size of a JFS log device, all filesystems that use the log must be unmounted. The logical volume on which the log device resides can then be extended with extendlv to the appropriate size (up to a maximum of 256 MB). Afterwards, run logform on the log device to format it so that it uses all of the space in the logical volume.
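
The steps in the note above can be sketched as a small planning helper that only builds the command strings and does not run anything. It assumes that extendlv is given the number of additional logical partitions to add, and that the caller already knows the current number of partitions and the partition size. The function name, its parameters, and the device and filesystem names in the usage lines are hypothetical.

```python
MAX_LOG_MB = 256  # upper limit for a JFS log, as stated in this document


def plan_log_extension(log_lv, filesystems, current_lps, pp_size_mb, target_mb):
    """Return the commands, in order, to grow a JFS log toward target_mb.

    The target is rounded up to whole partitions but never beyond the
    256 MB limit. An empty list means no extension is possible or needed.
    """
    wanted_lps = -(-target_mb // pp_size_mb)   # ceiling division
    max_lps = MAX_LOG_MB // pp_size_mb         # partitions that fit in the cap
    extra_lps = min(wanted_lps, max_lps) - current_lps
    if extra_lps <= 0:
        return []
    commands = ["umount " + fs for fs in filesystems]  # every user of the log
    commands.append("extendlv %s %d" % (log_lv, extra_lps))
    commands.append("logform /dev/%s" % log_lv)
    return commands


# Hypothetical example: a one-partition log on 32 MB partitions, grown to 128 MB.
for command in plan_log_extension("loglv00", ["/data"], 1, 32, 128):
    print(command)
```

Keeping the helper free of side effects lets an administrator review the plan before running it by hand, which suits a procedure that reformats the log device.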

Error Description: JFS_LOG_WAIT                          Error ID: CF71B5B3
This error is logged if transactions written to a JFS log device are reaching a threshold that may result in a JFS_LOG_WRAP condition. This entry, which is intended as a warning, is logged if this condition occurs more than 10 times in a one-hour period. Use the solution given for JFS_LOG_WRAP to alleviate or eliminate the condition.
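
To make the "more than 10 times in a one-hour period" rule concrete, the following sketch counts occurrences in a sliding one-hour window and reports when the count exceeds the limit. This document does not describe how the kernel actually tracks the condition, so the sliding window, the class name, and the method name are all assumptions for illustration only.

```python
from collections import deque


class ThresholdLogger:
    """Illustrative counter: signal when more than `limit` events fall
    within the last `window_seconds`. Not the kernel's implementation."""

    def __init__(self, limit=10, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self.events = deque()

    def record(self, now):
        """Record one occurrence at time `now` (in seconds).

        Returns True if this occurrence pushes the count inside the
        window above the limit, that is, if a warning would be logged.
        """
        self.events.append(now)
        # Discard occurrences that have aged out of the window.
        while now - self.events[0] >= self.window_seconds:
            self.events.popleft()
        return len(self.events) > self.limit
```

With the default settings, ten occurrences within an hour produce no warning and the eleventh does.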

Error Description: JFS_LOG_WRITE_ERR             Error ID: 902CE5A8
This error is logged when an input/output error occurs on the underlying device on which the JFS log exists. The error could indicate a hardware problem, possibly a missing disk, or backlevel microcode or drivers. Use the major/minor number given in the error report entry to identify the log device as listed in the /dev directory. Also, examine the error report for any disk or LVM errors.
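
Matching a major/minor number pair to an entry in the /dev directory can be sketched as follows. The function name find_devices is hypothetical. The sketch simply scans a directory for block and character special files whose device numbers match, and a single pair may match more than one entry.

```python
import os
import stat


def find_devices(major, minor, dev_dir="/dev"):
    """Return (path, kind) for each special file in dev_dir whose
    major/minor numbers match; kind is "b" (block) or "c" (character)."""
    matches = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        try:
            info = os.stat(path)
        except OSError:
            continue  # unreadable or dangling entry; skip it
        if stat.S_ISBLK(info.st_mode):
            kind = "b"
        elif stat.S_ISCHR(info.st_mode):
            kind = "c"
        else:
            continue  # not a device special file
        if os.major(info.st_rdev) == major and os.minor(info.st_rdev) == minor:
            matches.append((path, kind))
    return matches
```

The major and minor values come from the error report entry. Subdirectories of /dev are not searched in this sketch.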

Error Description: JFS_LOG_EXCEPTION              Error ID: 00A1E866
Ultimately, this error is much the same as the JFS_LOG_WRITE_ERR entry. Follow the same solution steps.

Error Description: JFS_COMP_CORRUPTION      Error ID: 4B6DA1F5
This error is logged when the decompression code for a compressed filesystem fails due to corrupted file data. This error is logged once per filesystem mount, and is likely to be due to a hardware problem. Ensure that the latest drivers and microcode are installed for the underlying storage device, and check the error report for any errors related to the storage subsystem.

Error Description: JFS_META_CORRUPTION      Error ID: 684A365B
This error is logged when filesystem corruption is detected. Examine the system error report for any errors on the storage subsystem. Ensure that the latest drivers and microcode are installed for the storage subsystem components. Unmount the affected filesystem and run fsck against it. If a system crash occurred when this error was logged, the dump should be examined to help determine the cause. Contact your appropriate service representative if you need further assistance.

Error Description: JFS_META_EXCEPTION         Error ID: 0EC00096
This error is logged when filesystem corruption is detected. Examine the system error report for any errors on the storage subsystem. Ensure that the latest drivers and microcode are installed for the storage subsystem components. Unmount the affected filesystem and run fsck against it. If a system crash occurred when this error was logged, the dump should be examined to help determine the cause. Contact your appropriate service representative if you need further assistance.

Error Description: JFS_META_WRITE_ERR        Error ID: D2A1B43E
This error is logged when filesystem corruption is detected. Examine the system error report for any errors on the storage subsystem. Ensure that the latest drivers and microcode are installed for the storage subsystem components. Unmount the affected filesystem and run fsck against it. If a system crash occurred when this error was logged, the dump should be examined to help determine the cause. Contact your appropriate service representative if you need further assistance.

Error Description: JFS_FSCK_REQUIRED            Error ID: CD546B25
This error is logged when filesystem corruption is detected. This will normally accompany another error entry such as JFS_KERNHEAP_LOW, JFS_LOG_WRITE_ERR, JFS_META_CORRUPTION, JFS_META_EXCEPTION, or JFS_META_WRITE_ERR. Unmount the affected filesystem and run fsck against it.
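
The error IDs listed in this section can be collected into a lookup table for scanning an error report. The sketch below assumes that each report line begins with the error identifier, as in the summary output of the errpt command. The function name is hypothetical, and the sample lines are invented for illustration rather than taken from a real system.

```python
# Error ID to label, copied from the entries in this document.
FS_ERROR_IDS = {
    "AE26DD07": "SPECFS_DDINTPRI",
    "5ECE4A58": "JFS_USER_HARDLINK",
    "D73189F6": "JFS_USER_WRITEMOUNT",
    "369D049B": "JFS_FS_FULL",
    "5DFED6F1": "JFS_FS_FRAGMENTED",
    "8988389F": "JFS_FS_NOINODES",
    "83F4B3CB": "JFS_KERNHEAP_LOW",
    "7975092C": "JFS_KERNHEAP_DELAY",
    "061675CF": "JFS_LOG_WRAP",
    "CF71B5B3": "JFS_LOG_WAIT",
    "902CE5A8": "JFS_LOG_WRITE_ERR",
    "00A1E866": "JFS_LOG_EXCEPTION",
    "4B6DA1F5": "JFS_COMP_CORRUPTION",
    "684A365B": "JFS_META_CORRUPTION",
    "0EC00096": "JFS_META_EXCEPTION",
    "D2A1B43E": "JFS_META_WRITE_ERR",
    "CD546B25": "JFS_FSCK_REQUIRED",
}


def match_report_lines(lines):
    """Return (error_id, label) for each line whose first field is one
    of the filesystem error IDs above; other lines are ignored."""
    hits = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        error_id = fields[0].upper()
        label = FS_ERROR_IDS.get(error_id)
        if label is not None:
            hits.append((error_id, label))
    return hits


# Invented sample input; only the first column matters to the sketch.
sample = [
    "IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION",
    "369D049B   0101000001 X X EXAMPLE        sample text",
    "FFFFFFFF   0101000001 X X EXAMPLE        unrelated entry",
]
print(match_report_lines(sample))
```

Once an entry is matched, the label can be used to find the corresponding explanation and suggested action in the list above.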




[ Doc Ref: 91064783515772     Publish Date: Jun. 26, 2001]