Repairing Corrupt File Systems or File System Log Devices


Contents

About this document
About the fsck and logform commands
Recovering superblock errors
Correcting primary file systems and log
Fixes for the fsck and logform commands

About this document

This document covers the use of the fsck and logform commands to repair inconsistencies or corruption in file systems and associated log devices. This document applies to all versions of AIX.

About the fsck and logform commands

The fsck command is used to detect and repair inconsistencies in the journaled file system (JFS) and the enhanced journaled file system (JFS2) structure. It repairs only the file system structure itself; it is not intended to correct corrupt data inside files. The basic syntax is:
	fsck [options] /dev/<LVname> 
OR
	fsck [options] /<fsname> 
The file system MUST be unmounted before fsck is run against it. The fsck command requires the file system to stay unchanged while it performs its checks; users or applications writing to the file system during a check can cause fsck to report corruption that does not exist. For this reason, errors that fsck reports against a mounted file system may not be relevant.
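A minimal sketch of enforcing this rule in a script. It assumes the AIX mount output layout, in which a locally mounted file system shows its device in the first column and its mount point in the second; fs_is_mounted and fsck_if_unmounted are hypothetical helper names, not AIX commands.

```shell
#!/bin/sh
# Sketch: refuse to run fsck while the target is mounted.
# ASSUMPTION: in AIX `mount` output, a local file system has its device in
# field 1 and its mount point in field 2. NFS mounts list a node name first
# and are not handled by this check.

# fs_is_mounted TARGET: exit status 0 if TARGET (/dev/<LVname> or a mount
# point) appears as a locally mounted file system.
fs_is_mounted() {
    mount | awk -v t="$1" '$1 == t || $2 == t { found = 1 }
                           END { exit (found ? 0 : 1) }'
}

# fsck_if_unmounted TARGET [fsck options...]: run fsck only when unmounted.
fsck_if_unmounted() {
    target=$1
    shift
    if fs_is_mounted "$target"; then
        echo "$target is mounted; run umount $target first" >&2
        return 1
    fi
    fsck "$@" "$target"
}
```

For example, fsck_if_unmounted /dev/lv00 -p runs fsck -p /dev/lv00 only when /dev/lv00 is not mounted.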

Various options for fsck exist:

-f    performs a fast check; file systems that were unmounted cleanly are skipped
-p    fixes minor problems without interaction from the user
-y    gives permission to correct every problem found
-n    indicates not to correct any problems

See the man page for fsck for more information. If no file system is named, fsck checks every file system in /etc/filesystems whose check attribute is set to true.

The -y flag saves time, but use it with care. If fsck cannot read a block because a disk is missing, for example, it asks whether to clear the block. If the disk is missing only because an adapter has failed, answering yes, or running fsck -y so that every question is answered yes, can remove data that would have been recoverable after the adapter was repaired.
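One way to follow this advice is to look before repairing: run fsck with -n, which reports problems without changing anything, and only then decide whether to repair. The following is a sketch of that approach; fsck_review_then_repair is a hypothetical helper, and it treats any nonzero exit status from fsck -n as meaning problems were found rather than interpreting specific codes.

```shell
#!/bin/sh
# Sketch: run fsck -n first (report only, change nothing) and repair only
# after a person has read the report. Assumes TARGET is already unmounted.
# Any nonzero exit status from fsck -n is treated as "problems found".

fsck_review_then_repair() {
    target=$1
    status=0
    report=$(fsck -n "$target" 2>&1) || status=$?
    printf '%s\n' "$report"
    if [ "$status" -eq 0 ]; then
        echo "$target: fsck -n reported no problems"
        return 0
    fi
    echo "$target: fsck -n exited with status $status" >&2
    # Do not answer yes blindly: if a disk is missing because of an adapter
    # failure, the blocks fsck wants to clear may be recoverable later.
    printf 'Run interactive fsck on %s now? [y/N] ' "$target" >&2
    read -r answer || answer=""
    case $answer in
        y|Y) fsck "$target" ;;   # interactive fsck still asks before each fix
        *)   echo "$target: left unchanged" >&2
             return 1 ;;
    esac
}
```

Running plain fsck after the review, rather than fsck -y, keeps a person in the loop for each individual fix.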

Examples of fsck use:

	fsck /dev/lv00 
	fsck -y /data 
	fsck -p /dev/lv00 
The logform command formats a logical volume for use as a log device. The log stores transactional information about changes to file system metadata; after a crash, these records are replayed (by logredo) to return the metadata to a consistent state. The logform command is destructive; it wipes out all data in the logical volume. Accidentally running it against a file system logical volume destroys all data in that file system. The logform command should only be run on CLOSED logical volumes. If a log device is open because a mounted file system uses it, unmount that file system before running logform against the log device.

NOTE: There is no procedure for reformatting inline logs used in enhanced journaled file systems (JFS2).

Run the following to ensure that the log device is closed; its LV STATE column should read closed (for example, closed/syncd):

	lsvg -l <VGname>
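To pick out the log devices from that listing, a short filter can help. This sketch assumes the usual lsvg -l column order (LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT) and that log devices have the type jfslog or jfs2log; list_log_lvs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print name, type and LV STATE of every log LV in a volume group.
# ASSUMPTION: `lsvg -l` prints one LV per line with the columns
# LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT, so field 2 is the
# type and field 6 the state. Header lines never match the type test.

list_log_lvs() {
    lsvg -l "$1" | awk '$2 == "jfslog" || $2 == "jfs2log" { print $1, $2, $6 }'
}
```

For example, list_log_lvs datavg might print loglv00 jfslog closed/syncd for a closed JFS log; a state beginning with open means a mounted file system is still using it.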
Here are some examples of messages you might receive that would indicate a corrupt log device:
failure replaying log 
media is not formatted or format is not correct 
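When such output has been captured from fsck or mount, a simple search for these two messages can flag a likely log problem. This is a sketch only: log_looks_corrupt is a hypothetical helper, and it matches the message text above literally, so different wording would not be caught.

```shell
#!/bin/sh
# Sketch: read captured fsck or mount output on standard input and succeed
# (exit status 0) if it contains either message that indicates a corrupt
# log device. Matching is literal; other wording would not be caught.

log_looks_corrupt() {
    grep -q -e 'failure replaying log' \
            -e 'media is not formatted or format is not correct'
}
```

For example: mount /data 2>&1 | log_looks_corrupt && echo 'log device may need logform'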
Examples of logform use:
	logform /dev/loglv00 
	logform: destroy /dev/loglv00 (y)?y 
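Because logform wipes whatever logical volume it is given, a guard that checks the volume first can prevent the accident described above. This sketch refuses to run unless lsvg -l reports the logical volume with type jfslog or jfs2log and a state beginning with closed. safe_logform is a hypothetical helper, and the column positions (TYPE in field 2, LV STATE in field 6) are assumptions about the lsvg -l layout. logform itself still prompts before destroying the log.

```shell
#!/bin/sh
# Sketch: only call logform on a CLOSED log device.
# ASSUMPTIONS: `lsvg -l` layout (field 1 = LV NAME, field 2 = TYPE,
# field 6 = LV STATE) and a device given as /dev/<LVname>.

safe_logform() {
    vg=$1
    dev=$2
    lv=${dev#/dev/}
    line=$(lsvg -l "$vg" | awk -v lv="$lv" '$1 == lv { print $2, $6 }')
    if [ -z "$line" ]; then
        echo "$lv not found in volume group $vg" >&2
        return 1
    fi
    set -- $line                 # $1 = TYPE, $2 = LV STATE
    case $1 in
        jfslog|jfs2log) ;;
        *) echo "$lv has type $1, not a log device; refusing" >&2
           return 1 ;;
    esac
    case $2 in
        closed*) ;;
        *) echo "$lv is $2; unmount the file systems that use it first" >&2
           return 1 ;;
    esac
    logform "$dev"               # still prompts: destroy /dev/<LVname> (y)?
}
```

For example, safe_logform datavg /dev/loglv00 runs logform only if loglv00 is a closed log device in datavg.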

Recovering superblock errors

If you receive one of the following errors from the fsck or mount commands, the problem may be a corrupted superblock.
fsck: Not an AIX3 file system 
fsck: Not an AIXV3 file system 
fsck: Not an AIX4 file system 
fsck: Not an AIXV4 file system 
fsck: Not a recognized file system type 
0506-342 The superblock is dirty.  Run a full fsck to fix.
mount: invalid argument 
The backup superblock can be copied over the primary superblock via one of these commands:

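All versions (JFS; this copies 4 KB block 31, which holds the backup superblock, over block 1, which holds the primary superblock):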

	dd count=1 bs=4k skip=31 seek=1 if=/dev/lv00 of=/dev/lv00 
For AIX 4.x and AIX 5.x only (fsck -p verifies the backup superblock and copies it over a corrupt primary superblock):
	fsck -p /dev/lv00 
After the backup superblock has been copied, check the integrity of the file system by issuing:
	fsck /dev/lv00 
In many cases, copying the backup superblock to the primary superblock will recover the file system. If this does not work, you will have to recreate the file system and restore the data from a backup.
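The dd command copies one 4 KB block: skip=31 reads from byte offset 31 * 4096 = 126976 (0x1F000), where the JFS backup superblock lives, and seek=1 writes at byte offset 4096 (0x1000), the JFS primary superblock. The sketch below prints the command for a given file system type instead of running it. The jfs2 offsets (primary at 0x8000, block 8; backup at 0xF000, block 15) are an assumption taken from the Linux JFS sources, not from this document, so confirm them for your AIX level before writing to a device; sb_restore_cmd is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: build (but do not run) the dd command that copies the backup
# superblock over the primary one. All offsets are in 4 KB blocks.
#   jfs : backup block 31 (0x1F000), primary block 1 (0x1000), as above
#   jfs2: ASSUMPTION from Linux JFS sources: backup block 15 (0xF000),
#         primary block 8 (0x8000); verify before use.

sb_restore_cmd() {
    fstype=$1
    dev=$2
    case $fstype in
        jfs)  skip=31; seek=1 ;;
        jfs2) skip=15; seek=8 ;;
        *) echo "unknown file system type: $fstype" >&2
           return 1 ;;
    esac
    echo "dd count=1 bs=4k skip=$skip seek=$seek if=$dev of=$dev"
}
```

The TYPE column of lsvg -l shows whether a logical volume holds jfs or jfs2; review the printed command before running it.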

Correcting primary file systems and log

Under normal circumstances, /, /usr, /tmp, and /var cannot be unmounted, so they cannot be checked or fixed, and /dev/hd8 (the primary rootvg log device) cannot be closed. This can only be done in maintenance mode.

  1. Boot the machine into maintenance mode, access the rootvg volume group, and start a shell prior to mounting the file systems. If you need assistance with this, contact your AIX support center.

  2. If your system is at AIX 3.2.4 or 3.2.5, set and export the ODMDIR variable with:
    	ODMDIR=/etc/objrepos; export ODMDIR
    
  3. Run the following commands to fsck the primary file systems (/, /usr, /tmp, /var, and /home):
    	fsck /dev/hd4
    	fsck /dev/hd2
    	fsck /dev/hd3
    	fsck /dev/hd9var
    	fsck /dev/hd1
    
    Other fsck options as outlined previously can also be used, where appropriate.

  4. Format the default jfslog for the rootvg JFS file systems with:
    	logform /dev/hd8 
    
    Answer y when asked if you want to destroy the log.

  5. Type exit to exit from the shell. The primary file systems will automatically mount.

  6. Shut down and reboot with the key in the Normal position:
    	sync; sync; sync; shutdown -Fr 
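Step 3 can be run as a loop from the maintenance shell so that a failure on one file system is not lost in the scroll. This is a sketch: check_primary_fs is a hypothetical helper, fsck runs without -y so it still asks before each fix, and any nonzero exit status is reported as needing attention. Run logform on /dev/hd8 (step 4) afterwards as described.

```shell
#!/bin/sh
# Sketch of step 3: fsck each primary rootvg file system in turn and
# summarize which ones returned a nonzero exit status.
# hd4 = /, hd2 = /usr, hd3 = /tmp, hd9var = /var, hd1 = /home

check_primary_fs() {
    failed=""
    for lv in hd4 hd2 hd3 hd9var hd1; do
        echo "=== fsck /dev/$lv"
        if ! fsck "/dev/$lv"; then      # interactive: no -y
            failed="$failed $lv"
        fi
    done
    if [ -n "$failed" ]; then
        echo "fsck reported problems on:$failed" >&2
        return 1
    fi
    echo "fsck exited 0 for every primary file system"
}
```

Rerun fsck on any logical volume named in the summary before continuing with step 4.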
    

You can also run fsck on any user-created file systems in rootvg, if needed. This can typically be done in normal mode.
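To find those user-created file systems, the rootvg listing can be filtered to jfs and jfs2 logical volumes other than the five primary ones from step 3. This sketch assumes the lsvg -l column layout (LV NAME in field 1, TYPE in field 2, MOUNT POINT in field 7); list_user_fs is a hypothetical helper, and later AIX levels add more system file systems (for example /opt) that you would add to the skip list.

```shell
#!/bin/sh
# Sketch: print "device mount-point" for each jfs/jfs2 LV in rootvg that is
# not one of the primary file systems checked in maintenance mode.
# ASSUMPTION: `lsvg -l` column layout; extend the skip list as needed.

list_user_fs() {
    lsvg -l rootvg | awk '
        ($2 == "jfs" || $2 == "jfs2") &&
        $1 !~ /^(hd4|hd2|hd3|hd9var|hd1)$/ { print "/dev/" $1, $7 }'
}
```

Each printed pair is a device and its mount point: unmount the mount point, then run fsck on the device.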


Fixes for the fsck and logform commands

AIX VERSION   APAR      DESCRIPTION
4.1.4         IX54927   logredo fails trying to open log
4.1.4         IX55250   fsck does not fix inodes with corrupted ACLs
4.1.4         IX56526   fsck may fail to replay log
4.1.4         IX60754   Do not allow logform to format past 256MB of loglv
4.2T          IX68182   fsck coredumps while correcting errors
4.2           IX76061   defragfs and fsck incorrectly report bad bit map
4.2.1         IX77541   fsck should patch up allocations in wmap
4.3.1         IX78066   fsck should patch up allocations in wmap
4.3.x         IY09173   fsck does not correct file corruption which it finds
4.3.3         IY19799   freeiblk() backt fault on mount inode
4.3.3         IY13624   system crashes in unlockl
4.3.3         IY19778   jfserrlogging code should correctly handle multi-seg .indirect
4.3.3         IY15765   kernel & fsck report incor errors for almost full big file fs

To check whether a particular APAR is installed on AIX Version 4 or later, run instfix -ik with the APAR number, for example:
	instfix -ik IX99999
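To check several of the APARs in the table at once, instfix -ik can be run in a loop. This sketch assumes that instfix reports a fully installed fix with a line containing "All filesets for <APAR> were found."; any other output is reported as not installed. check_apars is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: report whether each APAR given as an argument is installed.
# ASSUMPTION: a fully installed fix produces the line
# "All filesets for <APAR> were found." (possibly indented).

check_apars() {
    for apar in "$@"; do
        if instfix -ik "$apar" 2>/dev/null |
           grep -q "All filesets for $apar were found"; then
            echo "$apar installed"
        else
            echo "$apar NOT installed"
        fi
    done
}
```

For example, check_apars IY19799 IY13624 IY19778 IY15765 on an AIX 4.3.3 system.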


[ Doc Ref: 90605204914686     Publish Date: Jan. 11, 2002]