Cyclic Redundancy Code FileCheck Lab Report
Compute CRC-32s for Files, a Directory, or a Volume
(Also see CRC Calculator Lab Report for how to compute CRC-16/CRC-32 of a character string)
[Screenshots: ScreenFileCheck.jpg, ScreenFileCheckVerify.jpg]

Purpose
The purpose of this project is to show how to compute CRC-32s for one or more files and to form a "metaCRC" based on an ordered sequence of files for a directory or volume.  A "scan" operation saves the observed CRC values to a file, so the integrity of the scanned files can be checked at any later time in a "verify" operation.

Background
For QA/QC purposes, knowing that a file, a directory or even a volume is exactly the same as another is very useful.  After "burning" several CD-Rs I discovered that some of the files were not being written reliably.  I wanted to find a way to verify whether the CD "burn" had exactly the same contents as the original disk copy. 

For example, I discovered a CD-R with 646 directories, 13,421 files and 509,314,783 bytes had 12 bad files!  As long as I could identify which files were "bad," and verify the "bad" files could be safely ignored,  I could then "accept" the CD as a valid backup copy even with bad files.   This Lab Report is about a FileCheck utility that can be used to automate this verification of CD-Rs or other media.

Materials and Equipment

Software Requirements
Windows 95/98
Delphi 3/4/5 (to recompile)

Hardware Requirements
VGA display

Procedure

  1. Click on the FileCheck.EXE icon and view the "Scan" TabSheet.
  2. If necessary, choose unique names for the  FileList and Verify output files.
  3. Select the Volume, Directory or File of interest using the DriveComboBox, DirectoryListBox, and/or FileListBox controls.
  4. Press the corresponding Scan button and watch any messages in the Log area.
  5. If desired, press the Print button in the Log area to document the results of the scan.
  6. Select the "Verify" TabSheet.
  7. Specify a CRC Verify file and press the Verify button.
  8. Any mismatches will be displayed in the message area.
  9. If desired, the "verify" results can be printed by pressing the Print button.

Discussion
"Scan" and "Verify" are the two main functions of this program.   A volume, directory or file can be scanned with resulting data written to a File List   (for viewing by a human much like a DOS DIR list) and a Verify File.

An example "Log" for  a successful volume scan appears as follows:

Volume e:\
Directories = 696, Files = 9,225, Bytes = 249,435,921, Meta CRC32 = C42592A7
Scan time = 375.8 sec (7 Sep 1999 21:51:21)

A File List disk file created during a scan is an expanded version of the information shown by the DOS "DIR" command and is intended to be human readable.

Sample FileList disk file

FileCheck: e:\ 09/07/1999 21:45
Label = EFG-DELL-E VolSer = D7D4B00A

Date       Time     Attrib   Bytes CRC-32   Filename
---------- -------- ------ ------- -------- --------

e:
10/03/1997 15:44:30 -D----                   bc5
08/28/1999 13:34:12 -D----                   bde32sdk
10/03/1997 15:54:20 -D----                   bp
10/11/1997 10:14:06 -D----                   Comm
10/12/1997 22:56:08 -D----                   radiation
07/27/1997 16:35:42 -D-SH-                   RECYCLED
------------ --------
                                   00000000 0 files

e:\bc5
07/26/1997 19:13:46 A----- 122,337 F0B7FB58 BC5RMV.LOG
10/03/1997 15:44:32 -D----                   BGI
10/03/1997 15:44:32 -D----                   BIN
03/25/1997 05:02:00 A-----  32,768 A364DCED cgclean.exe
03/25/1997 05:02:00 A-----   9,057 C835AFF0 CGREADME.TXT
03/25/1997 05:02:00 A-----  49,152 E01A87CD cleanini.exe
10/03/1997 15:45:04 -D----                   DOC
10/03/1997 15:45:18 -D----                   EXAMPLES
10/03/1997 15:48:14 -D----                   EXPERT
10/03/1997 15:48:18 -D----                   HELP
10/03/1997 15:49:02 -D----                   INCLUDE

...

e:\radiation\WIN-BUG
01/11/1992 09:20:10 A-----   9,006 3A88EC5C PEEKPOKE.EXE
01/11/1992 09:20:10 A-----      91 A01D7133 READTHIS
------------ --------
                             9,097 454BDB4E 2 files

e:\RECYCLED
09/02/1999 17:55:32 A---H-      65 74221298 desktop.ini
------------ --------
                                65 4A7D57B9 1 files

Summary of e:\
Directories = 696
Files = 9,225
Bytes = 249,435,921
Meta CRC-32 = C42592A7

A Verify disk file created during a scan is an ASCII text file.   This file is intended for processing by the "Verify" operation, or other computer programs.  

Sample Verify disk file

V                    Label = EFG-DELL-E VolSer = D7D4B00A
P                    e:
D         0 00000000 e: 0
F    122337 F0B7FB58 e:\bc5\BC5RMV.LOG
F     32768 A364DCED e:\bc5\cgclean.exe
F      9057 C835AFF0 e:\bc5\CGREADME.TXT
F     49152 E01A87CD e:\bc5\cleanini.exe
F     21950 B7A958C4 e:\bc5\INSTALL.TXT
F      9982 E14F0501 e:\bc5\README.TXT
F     57101 F918A9C9 e:\bc5\regsrvr.exe
F     11014 9333C2EF e:\bc5\uninst.ini
F     49152 96AEC4BF e:\bc5\UNPAQ.EXE
F    114688 F6576E76 e:\bc5\unreg.exe
F     21792 0FD64C87 e:\bc5\unreg.ini
D    498993 12F585E9 e:\bc5 11
F      6332 4F4D7A95 e:\bc5\BGI\att.bgi
F     49630 F413139D e:\bc5\BGI\bgidemo.c
F     23016 62949AEE e:\bc5\BGI\bgidemo.ide
F     12208 6D728AAF e:\bc5\BGI\bgiobj.exe

...

F      9006 3A88EC5C e:\radiation\WIN-BUG\PEEKPOKE.EXE
F        91 A01D7133 e:\radiation\WIN-BUG\READTHIS
D      9097 454BDB4E e:\radiation\WIN-BUG 2
F        65 74221298 e:\RECYCLED\desktop.ini
D        65 4A7D57B9 e:\RECYCLED 1
S 249435921 C42592A7 9225 696
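
Because the Verify file is plain ASCII, another program can read it with very little code.  The following is a minimal sketch in Python (not part of FileCheck, which is written in Delphi).  The field meanings are inferred from the sample above and from the Summary in the File List: F lines hold bytes, CRC-32 and path; D lines hold bytes, MetaCRC, path and a trailing file count; the S line holds bytes, MetaCRC, files and directories.  The function name parse_verify_line is hypothetical.

```python
def parse_verify_line(line):
    """Split one Verify-file record into a dict (layout inferred from the sample)."""
    line = line.rstrip()
    kind = line[:1]
    if kind in ("V", "P"):
        # Volume information or starting path: free text after the type letter.
        return {"type": kind, "text": line[1:].strip()}
    if kind == "F":
        # F <bytes> <crc> <path>; splitting only 3 times keeps spaces in the path.
        _, size, crc, path = line.split(None, 3)
        return {"type": "F", "bytes": int(size), "crc": int(crc, 16), "path": path}
    if kind == "D":
        # D <bytes> <metacrc> <path> <file count>
        _, size, crc, rest = line.split(None, 3)
        path, count = rest.rsplit(None, 1)
        return {"type": "D", "bytes": int(size), "crc": int(crc, 16),
                "path": path, "files": int(count)}
    if kind == "S":
        # S <bytes> <metacrc> <files> <directories>
        _, size, crc, files, dirs = line.split()
        return {"type": "S", "bytes": int(size), "crc": int(crc, 16),
                "files": int(files), "directories": int(dirs)}
    raise ValueError("unrecognized record: " + line)


if __name__ == "__main__":
    print(parse_verify_line(r"F    122337 F0B7FB58 e:\bc5\BC5RMV.LOG"))
    print(parse_verify_line("S 249435921 C42592A7 9225 696"))
```

Dispatching on the first character keeps the reader tolerant of the varying amount of padding between fields.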

Editing this file may cause the Verify operation to report erroneous results, but sometimes editing this file is the quickest way to compare files one-by-one that are moved to a new location.  Consider a directory scan.  The first line of the Verify file, which is normally named Verify.CRC, is a Path:

P                    c:\data
F      2975 F0B7FB58 c:\data\set1.dat
...

If this directory was moved to e:\Monthly\Backup\data, simply edit the Verify.CRC file to replace "c:\data" with the new location "e:\Monthly\Backup\data".

P                    e:\Monthly\Backup\data
F      2975 F0B7FB58 e:\Monthly\Backup\data\set1.dat
...
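
For a large Verify.CRC file, the same edit can be scripted instead of done by hand.  The Python sketch below is an illustration only, not a FileCheck feature.  It assumes the record layout seen in the samples (P, F and D lines carry a path; D lines end with a file count), and the function name relocate is hypothetical.  Only paths that equal the old directory or lie beneath it are rewritten, so a sibling such as c:\database is left alone.

```python
import re

# P lines: type letter, padding, path.  F/D lines: type, bytes, 8-digit hex CRC, path.
RECORD = re.compile(r"^([FD]\s+\d+\s+[0-9A-Fa-f]{8}\s|P\s+)(.*)$")


def relocate(line, old, new):
    """Return the Verify-file line with the leading directory 'old' replaced by 'new'."""
    m = RECORD.match(line)
    if not m:
        return line                      # V, S and unrecognized lines pass through
    head, rest = m.group(1), m.group(2)
    path, tail = rest, ""
    if line[0] == "D" and " " in rest:
        # D lines end with a file count that is not part of the path.
        path, _, count = rest.rpartition(" ")
        tail = " " + count
    lowered = path.lower()               # Windows paths compare case-insensitively
    if lowered == old.lower() or lowered.startswith(old.lower() + "\\"):
        path = new + path[len(old):]
    return head + path + tail


if __name__ == "__main__":
    for text in (r"P                    c:\data",
                 r"F      2975 F0B7FB58 c:\data\set1.dat"):
        print(relocate(text, r"c:\data", r"e:\Monthly\Backup\data"))
```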

[Future:  A possible future Verify option would be to specify a path that would be used instead of the one specified on the "P" line in the Verify.CRC file.]

Look at this page for various I/O errors that can occur while running FileCheck.

At a later time, the information stored in the Verify File can be verified to see that all CRCs match the original values.  A Print button allows printing the Scan or Verify operations for documentation purposes.

A volume "scan" is much like the scan of  the root directory of a volume, except that the volume label and volume serial number are stored as part of the information about a volume.  A volume "scan" always implies that all subdirectories should be scanned.  The Subdirs Checkbox allows one to specify whether subdirectories should be scanned in a Directory "scan."

If multiple instances of FileCheck are run, be sure that unique File List and Verify files are specified.  If you blank either of the fields for these files, the corresponding file is not created.

The BitBtnScanClick method is called for a "click" on any of the Scan buttons.  The Tag value of each button is used to determine whether the scan is for the volume in the TDriveCombobox, the directory in the TDirectoryListBox, or the file in the TFileListBox.  A further helper routine, ScanDirectoryTarget, is called for processing a volume or directory scan.

The BitBtnVerifyFileClick method is called for the "verify" operation.  Many of the variables used for scanning are replicated within this routine so that (in theory) a scan and a verify could run simultaneously without interfering with each other.

See the CRC Calculator Lab Report for how to compute the CRC-16/CRC-32 of a character string, including source code for a CalcFileCRC32 procedure from the CRC32.PAS unit.
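
For readers without that report at hand, a lookup-table CRC-32 can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the code in CRC32.PAS: it assumes the standard reflected polynomial $EDB88320 used by PKZIP, and the names make_table, calc_crc32 and crc32_of are hypothetical.  The running value starts at $FFFFFFFF and is inverted at the end, and the update step can be called repeatedly on successive buffers.

```python
def make_table():
    """Build the 256-entry lookup table for the reflected polynomial 0xEDB88320."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def calc_crc32(data, crc):
    """Update a running CRC with a buffer of bytes (no initial or final inversion)."""
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc


def crc32_of(data):
    """CRC-32 of a whole buffer: start at all ones, invert the result."""
    return calc_crc32(data, 0xFFFFFFFF) ^ 0xFFFFFFFF


if __name__ == "__main__":
    print("%08X" % crc32_of(b"123456789"))   # prints CBF43926, the standard check value
```

Keeping the inversion outside the update step is what lets a file be processed one buffer at a time.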

Two versions of CalcFileCRC32 are available.  The StreamIO conditional compilation variable selects I/O using Streams or the older BlockRead routine.  Since I have observed that BlockRead is still faster than Stream.LoadFromFile, the default setting is NoStreamIO.

Here are the two possible ways the CRC32 of a file is computed using the CalcCRC32 procedure:

CalcFileCRC32 using a TMemoryStream
(Compile with StreamIO set.  See CRC32.PAS for details.)

// The CRC-32 value calculated here matches the one from the PKZIP
// program.  Use MemoryStream to read file in binary mode.
PROCEDURE CalcFileCRC32 (FromName: STRING; VAR CRCvalue: DWORD;
    VAR TotalBytes: TInteger8;
    VAR error: WORD);
  VAR
    Stream: TMemoryStream;
BEGIN
  error := 0;
  CRCValue := $FFFFFFFF;
  Stream := TMemoryStream.Create;
  TRY
    TRY
      Stream.LoadFromFile(FromName);
      IF   Stream.Size > 0
      THEN CalcCRC32 (Stream.Memory, Stream.Size, CRCvalue)
    EXCEPT
      ON E: EReadError DO
        error := 1   // arbitrarily set this for now
    END;

    CRCvalue := NOT CRCvalue;
    TotalBytes := Stream.Size
  FINALLY
    Stream.Free
  END
END {CalcFileCRC32};

An error code of 1 is returned from this procedure when an EReadError exception is encountered, since the exception Message string did not have any additional useful information.  (See IOResult values below with BlockRead.)

CalcFileCRC32 using BlockRead
(Compile with NoStreamIO set.  See CRC32.PAS for details.)

// The CRC-32 value calculated here matches the one from the PKZIP program.
// Use BlockRead to read file in binary mode.
PROCEDURE CalcFileCRC32 (FromName: STRING; VAR CRCvalue: DWORD;
    VAR TotalBytes: TInteger8;
    VAR error: WORD);
  CONST
    BufferSize  = 32768;

  TYPE
    BufferIndex =  0..BufferSize-1;
    TBuffer     =  ARRAY[BufferIndex] OF BYTE;
    pBuffer     =  ^TBuffer;

  VAR
    BytesRead:  INTEGER;
    FromFile :  FILE;
    IOBuffer :  pBuffer;
BEGIN
  New(IOBuffer);
  TRY
    FileMode := 0; {Turbo default is 2 for R/W; 0 is for R/O}
    CRCValue := $FFFFFFFF;
    ASSIGN (FromFile,FromName);
    {$I-} RESET (FromFile,1); {$I+}
    error := IOResult;
    IF       error = 0
    THEN BEGIN
      TotalBytes := 0;

      REPEAT
        {$I-}
        BlockRead (FromFile, IOBuffer^, BufferSize, BytesRead);
        {$I+}
        error := IOResult;
        IF       (error = 0) AND (BytesRead > 0)
        THEN BEGIN
          CalcCRC32 (IOBuffer, BytesRead, CRCvalue);
          TotalBytes := TotalBytes + BytesRead; // can't use INC with COMP
        END
      UNTIL (BytesRead = 0) OR (error > 0);

      CLOSE (FromFile)
    END;
    CRCvalue := NOT CRCvalue
  FINALLY
    Dispose(IOBuffer)
  END
END {CalcFileCRC32};

The most likely error values returned by this routine are as follows:

Error  Brief Description
30     ERROR_READ_FAULT.  The system cannot read from the specified device.
31     ERROR_GEN_FAILURE.  A device attached to the system is not functioning.
32     ERROR_SHARING_VIOLATION.  The process cannot access the file because it is being used by another process.
       This is likely to happen if you try to scan the Windows swap file, e.g.,
       Error Code 32 reading file c:\WINDOWS\WIN386.SWP

Whenever a read error occurs, an error message is displayed in the log and the CRC is assigned a value of $00000000.
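
The buffered approach and the error convention can be sketched in Python as follows.  This is an illustration rather than a translation of CRC32.PAS: it relies on Python's zlib.crc32, which computes the same CRC-32 as PKZIP, and the function name calc_file_crc32 is hypothetical.  The error value returned here is the operating system error number that Python reports, which will generally not be the same number as the IOResult codes in the table above.

```python
import zlib

BUFFER_SIZE = 32768   # same buffer size as the BlockRead listing


def calc_file_crc32(path):
    """Return (crc, total_bytes, error); error is 0 on success."""
    crc = 0
    total = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(BUFFER_SIZE)
                if not chunk:
                    break
                # zlib.crc32 accepts the running value, so the file never
                # has to be held in memory all at once.
                crc = zlib.crc32(chunk, crc)
                total += len(chunk)
    except OSError as e:
        # On any open or read failure the CRC is reported as 00000000.
        return 0, total, e.errno or 1
    return crc & 0xFFFFFFFF, total, 0
```

Returning the error alongside the CRC, instead of raising, lets a scan log the problem and continue with the next file.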

A CRC-32 value can be computed for each file in a directory.  The CRC of an ordered list of files in a directory could be directly computed, but maintaining the information about the computation is somewhat of a pain.  So instead of a "true" directory CRC, a "MetaCRC" is computed for a well-ordered list of files in a directory.  This MetaCRC is simply a CRC of the file CRCs.

A Directory MetaCRC is a CRC of the file CRCs in a directory, which are processed in alphabetical order.  Each of the file CRCs is converted to an 8-byte hex string for computing the Directory MetaCRC.  (This facilitates a similar computation on machines of a different endianness.  That is, CRCing the list of file hex CRCs will give the same result on either a PC with little endian words, or a UNIX workstation with big endian words.)

The Volume MetaCRC is a CRC of the Directory MetaCRCs taken in alphabetical order.
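
The MetaCRC idea can be sketched in Python as shown below.  Several details are assumptions rather than facts about FileCheck: that the hex strings use uppercase digits, that one running CRC is carried across all the strings, and that lowercasing the names is an adequate stand-in for the case-insensitive comparison.  The result is therefore not guaranteed to reproduce the MetaCRC values in the samples.  The function name directory_meta_crc is hypothetical.

```python
import zlib


def directory_meta_crc(file_crcs):
    """file_crcs: iterable of (filename, crc32) pairs for one directory."""
    # A fixed, case-insensitive order makes the result independent of the
    # order in which the file system happens to return the names.
    ordered = sorted(file_crcs, key=lambda item: item[0].lower())
    meta = 0
    for _, crc in ordered:
        # CRC the 8-character hex text, not the raw 4-byte integer, so the
        # answer does not depend on the byte order of the machine.
        meta = zlib.crc32(("%08X" % crc).encode("ascii"), meta)
    return meta


if __name__ == "__main__":
    files = [("READTHIS", 0xA01D7133), ("PEEKPOKE.EXE", 0x3A88EC5C)]
    print("%08X" % directory_meta_crc(files))
```

With no files at all the function returns 0, which is consistent with the 00000000 shown for the directory with 0 files in the sample File List.  A Volume MetaCRC could be formed the same way from (directory name, Directory MetaCRC) pairs.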

[Erratum:  In the original version of FileCheck the Directory and Volume MetaCRCs were computed using a statement like this:

CalcCRC32 (@CRCValueHex[1], SizeOf(CRCValueHex), CRCTemp);

 Unfortunately, the SizeOf function should have been the string Length function -- SizeOf returned "4" as the length of the string pointer, while Length returned "8", which was the correct number of bytes in a hex character string of a 4-byte integer value.  The correction was made in the April 2001 version, labeled Version 1.01.  Thanks to Miroslav Vancl for bringing this error to my attention.  efg, 1 April 2001.]
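
The practical effect of this kind of error is easy to demonstrate.  The Python sketch below is only an analogy to the Delphi statement, using zlib.crc32 and a hypothetical helper named meta_step: when only the first 4 characters of the hex string are processed, two different file CRCs that share their leading 4 hex digits contribute identically to the MetaCRC.  The second CRC value is made up for the demonstration.

```python
import zlib


def meta_step(crc_hex, nbytes):
    """CRC of the first nbytes characters of a hex CRC string."""
    return zlib.crc32(crc_hex[:nbytes].encode("ascii"))


first = "3A88EC5C"    # a file CRC from the sample listing
second = "3A880000"   # made-up CRC sharing the same leading 4 hex digits

# Only 4 bytes processed (the SizeOf case): the difference is invisible.
print(meta_step(first, 4) == meta_step(second, 4))   # True

# All 8 bytes processed (the Length case): the difference is detected.
print(meta_step(first, 8) == meta_step(second, 8))   # False
```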

The FileListLibrary.PAS unit provides a ScanDirectory procedure for a generic way to process a hierarchy of directories and files.    Two callback routines are parameters to ScanDirectory to process each file, and to process the beginning and end of a directory.  The routines ProcessDirectory and ProcessFile in ScreenFileCheck.PAS are the routines used as parameters to ScanDirectory.

To define a well-ordered list of files in a directory, a third parameter is a routine that is used to compare file names within a directory.  The OrderByFilename function in ScreenFileCheck.PAS uses StrIComp to compare filenames in a case insensitive way.

A global variable in the FileListLibrary unit, ContinueScan, allows an external routine to stop the processing of directories and files (intended to be set by a "Cancel" button).
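
The callback design described above can be sketched in Python.  This is not a translation of FileListLibrary.PAS; the parameter names and the Boolean "beginning/end" argument are hypothetical.  The visiting order (files of a directory, then the end of that directory, then its subdirectories) follows the order of records in the sample Verify file.

```python
import os

# Stand-in for the ContinueScan global; a Cancel handler would set it to False.
continue_scan = True


def scan_directory(path, process_file, process_directory, order_key, subdirs=True):
    """Visit files, then subdirectories, in the order defined by order_key."""
    names = sorted(os.listdir(path), key=order_key)
    process_directory(path, True)             # beginning of directory
    for name in names:
        full = os.path.join(path, name)
        if continue_scan and os.path.isfile(full):
            process_file(full)
    process_directory(path, False)            # end of directory
    if subdirs:
        for name in names:
            full = os.path.join(path, name)
            if continue_scan and os.path.isdir(full):
                scan_directory(full, process_file, process_directory,
                               order_key, subdirs)


if __name__ == "__main__":
    scan_directory(".",
                   process_file=lambda p: print("F", p),
                   process_directory=lambda p, begin: print("D", p, "begin" if begin else "end"),
                   order_key=str.lower,       # case-insensitive ordering
                   subdirs=False)
```

Passing the ordering function as a parameter keeps the traversal generic: the caller decides what "well-ordered" means.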

The Dbt_h.PAS file is a partial translation of DBT.H, which was adapted from "Notification of CD-ROM insertion and removal," http://www.undu.com/Articles/980221b.htm.  The WmDeviceChange message is used to detect a change in CD-ROMs so the drive and directory controls can be refreshed.  (Setting a Debug compilation conditional enables additional log comments when this message is received.)

The Refresh button on the Scan TabSheet forces an update of the TDriveComboBox, which may be necessary on some devices that do not generate a WmDeviceChange, such as ZIP drives.  Calling the BuildList methods of both the TDriveComboBox and the TDirectoryListBox updates these controls.

Unfortunately, the BuildList methods of both the TDriveComboBox and the TDirectoryListBox are protected methods.  Creating full new controls derived from these classes just to call the protected BuildList method is somewhat of a pain.  To get around this limitation, empty derived classes were defined:

type
  // Trick to call protected method of TDriveCombobox
  TMyDriveComboBox = CLASS(TDriveComboBox)
  END;

  // Trick to call protected method of TDirectoryListbox
  TMyDirectoryListBox = CLASS(TDirectoryListbox)
  END;

These new derived classes were only used to typecast the original values and call the "protected" methods in the WmDeviceChange routine and the following:

procedure TFormFileList.SpeedButtonRefreshClick(Sender: TObject);
  VAR SaveDrive: CHAR;
begin
  SaveDrive := DriveComboBox.Drive;
  TMyDriveComboBox(DriveComboBox).BuildList;
  DriveComboBox.Drive := SaveDrive;

  TMyDirectoryListBox(DirectoryListBox).BuildList;
end;

Any change in a file will most likely result in a different CRC value.  Requiring both the number of bytes and the CRC value to stay the same is an even stricter test.  The "verify" operation for each file checks that the file's size and CRC-32 are the same.  The "verify" operation for a directory checks that the directory has the same number of files, bytes and MetaCRC value.  Likewise, a volume match looks for the same number of directories, files, bytes and MetaCRC value.

A ScanDetails Radiobox is partially implemented but is hidden in the current implementation.  This allows the CRC file to only contain directory information instead of file-by-file details.  (The "Scan" functionality of this feature works, but the "Verify" functionality doesn't work correctly when "Directories" is chosen instead of "Files.")

The Verify operation reads a Verify.CRC file created in the Scan phase.  The number of lines in this file is used as the measure of progress in the progress bar.  A TTokens class is used to parse the tokens in the Verify.CRC file.
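
The per-file part of a verify pass can be sketched in Python as follows.  This is a simplified illustration, not the BitBtnVerifyFileClick logic: it checks only the F records, assumes the record layout seen in the sample Verify file, and uses hypothetical names (file_crc32, verify, report, on_progress).

```python
import os
import zlib


def file_crc32(path):
    """CRC-32 of a file, read in 32 KB pieces."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(32768), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def verify(lines, report=print, on_progress=None):
    """Check every F record in a list of lines; return the number of mismatches."""
    mismatches = 0
    for i, line in enumerate(lines, 1):
        if line.startswith("F"):
            _, size, crc, path = line.rstrip().split(None, 3)
            try:
                # Both the byte count and the CRC-32 must match.
                ok = (os.path.getsize(path) == int(size)
                      and file_crc32(path) == int(crc, 16))
            except OSError:
                ok = False                   # unreadable files count as mismatches
            if not ok:
                mismatches += 1
                report("Mismatch: " + path)
        if on_progress:
            on_progress(i, len(lines))       # lines processed so far, total lines
    return mismatches
```

Counting lines rather than bytes keeps the progress measure cheap, at the cost of uneven movement when file sizes vary widely.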

So far, the process of simply attempting to read each file on a CD-R has identified the "bad" files -- files that cannot be opened and read.  CRC mismatches have not yet been observed on the same CD-R over time. 

One side effect of the process of verifying every byte on a CD was to identify a virus (using McAfee VirusScan) that was stored on several of my CD backups.

Conclusions
FileCheck is a handy utility to verify a copy of a file, directory or even a volume (within acceptable probabilities).


Keywords
cyclic redundancy check, CRC-32, Lookup Table, MetaCRC, CalcCRC32, CalcFileCrc32, Stream I/O, TMemoryStream, BlockRead, WmDeviceChange message, DBT.H, FindFirst/FindNext/FindClose, TSearchRec, TStringList, Sort, StrIComp, Int64, Comp, IntToHex, FormatFloat, FormatDateTime, Format, GetVolumeInformation, Volume Serial Number, Volume Label, TTabSheet, TDriveComboBox, TDirectoryListBox, TFileListBox, procedure variables, calling protected methods, tokens

Files (only for noncommercial use)
Delphi 3/4/5 Source Code and EXE (195 KB):  FileCheck.ZIP


Updated 15 Dec 2002


since 6 Sep 1999