SigmaMD5
A program to check message digests

by Keith Merrington, © February 2006

Prologue

I wrote this program because I had problems installing eCS and needed a method of checking the integrity of my CDs. I had purchased an electronic version of eCS, burnt the two ISO images onto two CDs but they did not work when it came to the installation!

I started the usual round of problem elimination, the first of which was to check the data integrity and validity of my boot CD. Now I had unzipped both ISO images without a problem, but I was unsure if the boot CD was okay. On the CD I noticed a file called MD5SUM.TXT. In this file a MD5 checksum was given for each file. I found a program called md5sum which would generate checksums on a single file, but to do this for every file on the CD was a task I didn't fancy doing. So again I started to look around for other programs that could handle multiple files. A quick scan on Hobbes didn't give me the solution I was looking for, although there were a number of MD5 check programs available.

Looking further afield, I found a program called MD5summer, but this was a Windows program. Nevertheless, it did allow me to check my boot CD and determine that a number of files did not match their checksums. After checking with Mensys, it turned out that the files were correct but a mistake had been made with some of the supplied checksums!

By the way, the reason my install failed turned out to be a CD drive that was a bit finicky with some CDs.

Now not being a Windows fan but quite liking the way md5summer worked, I thought I would write a PM program using a similar interface.

I should mention that although I had purchased Visual Age C++ 3.0 many years ago, at about the time "Merlin" was released, I had never written any programs for OS/2 at all. My programming experience with C was limited to some simple programs written while studying UNIX back in the 70's. I managed to get started thanks to a book called "OS/2 Presentation Manager Programming," by Charles Petzold, that I had purchased at the same time I bought Visual Age. If you want to try your hand at PM programming, then Petzold's book is to my mind an excellent one to get started with, and I would recommend it to everyone.

Using md5summer as the template for my program helped simplify the basic design. What I wanted was to read in a file containing a list of filenames with their associated MD5 checksums (such as MD5SUM.TXT) and quickly determine if the files were correct. I won't bore you with all the programming mistakes I made before I eventually had something that worked and could check my CD on OS/2.

MD5

(See http://www.ietf.org/rfc/rfc1321.txt)

During my search to find an algorithm for MD5, I discovered that there had been at least two earlier versions called MD2 and MD4. I really don't know if there was ever a MD1 or MD3, although there is something called Metadata3 which is sometimes called MD3. As with all checksums, CRCs, etc., the MD5 algorithm is just an arithmetic manipulation of the input data to reduce it to a single value. A CRC is typically 16 or 32 bits in length, but for MD4 and MD5 the value is 128 bits. The manipulation is executed on a message of arbitrary length, resulting in a 128-bit checksum or digest; hence the name Message Digest and the abbreviation MD. The method of creating this message digest was developed by Professor Ronald L. Rivest of MIT, as were the versions MD2 and MD4. Although the structures of these algorithms are somewhat similar, the design of MD2 is quite different from that of MD4 and MD5, as it was optimized for 8-bit machines, whereas MD4 and MD5 were designed for 32-bit machines. MD5 was developed to avoid some of the weaknesses found in MD4, which means that it provides a better check with less chance of being compromised. In essence, MD5 is a reliable way to verify data integrity against accidental corruption, and with 128 bits it is far less likely to miss an error than a 16- or 32-bit CRC. Collision attacks on MD5 were, however, published in 2004, so it should not be relied upon to detect deliberate tampering.
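To make the idea of a 128-bit digest concrete, here is a minimal Python sketch that computes the MD5 checksum of a stream in chunks, so that a large file never has to be held in memory at once. It uses the standard hashlib module and is only an illustration, not SigmaMD5's actual source; the function names and the chunk size are assumptions.

```python
import hashlib


def md5_of_stream(stream, chunk_size=65536):
    """Return the MD5 digest of a binary stream as 32 hexadecimal characters."""
    digest = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)  # read a bounded piece at a time
        if not chunk:                    # empty bytes object means end of data
            break
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Convenience wrapper (hypothetical name): digest of the file at 'path'."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)
```

For the three-byte message "abc", RFC 1321 lists the digest 900150983cd24fb0d6963f7d28e17f72: 32 hexadecimal characters, that is, 128 bits.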

The program

The current version of my program, 1.02, comes as five files zipped together as SigmaMD5-1-2.zip. They are:

readme.txt: the general information file.
SigmaMD5.exe: the executable file.
SigmaMD5.hlp: the help file.
SigmaMD5.dll: a DLL with executable code for large file handling.
SigmaMD5.MD5: the MD5 checksum file of the above files.

Installation

As I wanted to concentrate my efforts on the program, I have not written an installation script or install program. You just have to unzip the zip file to a directory, preferably one referenced by the PATH statement in CONFIG.SYS. The help file may be moved to a directory referenced by the HELP statement, and the DLL to a directory referenced by the LIBPATH statement (this was one of the suggestions I received from a user). To make life easy, it is worthwhile associating all files with the filename extension .MD5 with SigmaMD5. Also associating these files with the E editor (or the editor of your choice) allows either program to be selected via Open As when using the right mouse button on the file object. The easiest way to make this association is to use Assoedit, written by Henk Kelder: select Associations > Associations by file Filter, add the filter type *.MD5, then use the Add button in the association window to type in the location of SigmaMD5.exe, and repeat this operation for the editor.

Specifications

The current version is 1.0. The first public version released on Hobbes was 0.73 and a special version 0.902 was included with the eComStation 1.2 media refresh. The current version differs from the earlier versions by including large file support and file filters. It has the following specifications:

Memory usage is about 1 MiB plus the size of the list file to be read/written as this is loaded directly into memory.

Function

This program has two main functions: 1) checking and 2) generating MD5 checksums, in both cases using a file containing a list of filenames together with their corresponding MD5 checksums. The format of this 'list' file is standardized and is compatible with the output of the Linux program GNU MD5Sum. Basically this file has two line types: comment lines and MD5 checksum lines. A comment line either starts with a hash (#) followed by the comment, or is blank. A checksum line starts with the 128-bit MD5 checksum as a 32-character hexadecimal string starting in column 1. After the checksum there is a space and then an asterisk (*). After the asterisk comes the pathname of the file from which the checksum was generated. As the directory separators in Linux are forward slashes, I internally convert them to backslashes and display them as such in my program. They are, however, converted back to forward slashes when the MD5 list file is written, to maintain compatibility. The filename extension of the list file is generally .txt or .md5. My program always writes this list file with the extension .MD5.
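The list-file format described above can be sketched in a few lines of Python. This is an illustrative reading of the format, not the program's own parser; the function names and the example pathname are hypothetical.

```python
import re

# 32 hex characters in column 1, one space, an asterisk, then the pathname.
_CHECKSUM_LINE = re.compile(r"^([0-9a-fA-F]{32}) \*(.+)$")


def parse_list_line(line):
    """Return (checksum, pathname) for a checksum line, or None for a comment.

    Blank lines and lines starting with '#' are comments. Forward slashes
    are converted to backslashes for internal use and display.
    """
    text = line.rstrip("\r\n")
    if not text.strip() or text.startswith("#"):
        return None
    match = _CHECKSUM_LINE.match(text)
    if match is None:
        raise ValueError("not a checksum line: %r" % text)
    checksum, pathname = match.groups()
    return checksum.lower(), pathname.replace("/", "\\")


def format_list_line(checksum, pathname):
    """Write a line back out with forward slashes for compatibility."""
    portable = pathname.replace("\\", "/")
    return f"{checksum} *{portable}"
```

Reading a line and writing it out again should reproduce the original text, which is what keeps the saved file compatible with other tools that use the same format.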

Checking

The simplest method to check a file or list of files is to double-click on the appropriate MD5 file, assuming that the file type has been associated with SigmaMD5. To check that everything functions correctly, it is easiest to double-click on SigmaMD5.MD5. As soon as this is done, SigmaMD5 is loaded and the contents of the MD5 file are read into memory using a separate thread. The first lines, excluding any comment lines, are displayed in the listing area of the window. Each line consists of first a "LED," then the pathname of the file being processed, and lastly the MD5 checksum. The LED can be white (waiting to be processed), yellow (currently being processed), green (processed with a correct checksum), or red (a checksum or file error occurred).

Fig. 1: Main window

Almost immediately the checksum for each file is calculated and compared against the checksum in the list file. If it is correct, the 'LED' belonging to that line is set to green; if it is incorrect, it is set to red. While each file is being processed, its filename, path and file size are displayed in the status area, as are the current file number and the total number of files to be processed. This information is supplemented by two progress bars showing the progress of the current file and the total progress of the complete batch. I have tried to give a true indication of how far the batch has progressed by first calculating the total size of all the files in the list file and then showing the number of bytes read as a percentage of the total number of bytes. In practice, I discovered that the time taken to calculate the total size could become excessive, so I only do this calculation if there are fewer than 200 files; otherwise the progress is simply the number of files completed as a percentage of the total number of files.
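The two ways of measuring batch progress can be summarised in a short Python sketch. The 200-file threshold comes from the text; the function name, the argument names and the integer rounding are assumptions made for illustration.

```python
BYTE_PROGRESS_FILE_LIMIT = 200  # below this, progress is measured in bytes


def batch_progress_percent(files_done, total_files, bytes_done, total_bytes):
    """Return the batch progress as a whole percentage from 0 to 100."""
    if total_files <= 0:
        return 0
    if total_files < BYTE_PROGRESS_FILE_LIMIT and total_bytes > 0:
        # Small batch: total size is known, so report bytes read so far.
        return min(100, bytes_done * 100 // total_bytes)
    # Large batch: skip the size scan and count completed files instead.
    return min(100, files_done * 100 // total_files)
```

With byte-based progress, one large file among many small ones no longer makes the bar appear to stall, which is the "true indication" the paragraph above is after.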

This method of checking, although the simplest, relies on the checksum file being located at the same start location or root as the files to be checked. This is most often the case, as with the eCS 1.2 boot disk, but is not always so! By definition, the file name in the MD5 list file is only the pathname, so no drive letter and no root indicator ("\") is present. This allows complete flexibility as to where the file-set and the MD5 list file are placed. If the start of the file-set is not at the same position as the MD5 list file, then its position must be given.

This is probably best explained by an example. Suppose I have a copy of the eCS CD on the hard disk, at say location E:\images\ver12, but the MD5 list file is at f:\ecs\md5\ver12.md5. First we start SigmaMD5 and then using either the file menu open, or the Check button, the standard OS/2 Open dialogue is displayed and I can select the MD5 file.

Fig. 2: MD5 File Open Dialogue

After successfully selecting the list file, a new dialogue is opened. In this dialogue the start directory of the file-set is given.

Fig. 3: Selecting the start directory

Navigating is either by selecting the directory shown or by using one of the pushbuttons. In our example we would first select the drive, and then go through the directory tree until we had selected E:\images\ver12. The location of each file is thus specified by the pathname in the MD5 list file prefixed with the path and drive as specified in the Select Start Directory dialogue. In many instances SigmaMD5 is not started via an association but directly. In those cases where the MD5 file is at the root of the file-set, simply pressing the Set to MD5 Dir button does all the work for us by pointing immediately to the same directory as that in which the MD5 list file was selected. Once the dialogue has been acknowledged by using the Okay button, the list file is read in and the checksums are again checked.
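How each file's location is built from the start directory and the pathname in the list file can be sketched as follows. This is a Python illustration using the standard ntpath module, which handles drive letters and backslashes on any host system; the function names and the pathname os2/boot/config.sys are hypothetical.

```python
import ntpath  # drive-letter and backslash path handling, usable on any OS


def resolve_in_file_set(start_dir, list_pathname):
    """Prefix a pathname from the list file with the chosen start directory."""
    relative = list_pathname.replace("/", "\\").lstrip("\\")
    return ntpath.join(start_dir, relative)


def md5_dir(list_file):
    """Start directory chosen by a 'Set to MD5 Dir' style shortcut:
    the directory that holds the list file itself."""
    return ntpath.dirname(list_file)
```

With the example above, a start directory of E:\images\ver12 and a list entry os2/boot/config.sys would resolve to E:\images\ver12\os2\boot\config.sys, wherever the list file itself is stored.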

Creating a MD5 file

To create a MD5 list file, the root directory and the files for the file-set need to be specified. After selecting create (using the file menu or the Create pushbutton) the root directory can be selected in a dialogue similar to Fig 3, and then the Create MD5 File dialogue is displayed. On the first page of this two page dialogue, the files to be included can be selected. Files can be selected in the 'file window' and copied to the selection window either individually or in groups. Selection is by highlighting the required files and then using the Add button, double-click or [Enter] key. Complete directories together with their subdirectories and files can similarly be selected and copied to the selection window. Since the recursive selection of files can be time consuming, I implemented this function in a separate thread which also allowed me to implement a cancel function such that while files are being transferred to the selection window, the Finish button becomes a Stop button.

It is easy when selecting individual files and then directories to inadvertently select the same file more than once. To prevent this I added a function in the form of a Dup File Inhibit checkbox. I made this an option, as checking that a file has not already been selected can consume significant processor power when a large number of files are involved. It may not always be apparent that the program is active, for example when selecting a large number of files with the Dup File Inhibit function enabled, or when the selection of files has been affected by filters. I therefore also incorporated an activity indicator in the form of an expanding line of blocks at the top of the dialogue.
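The article does not say how the duplicate check is implemented. As an illustration only, the Python sketch below keeps a set of names already selected, so that each check is a single lookup rather than a scan of the whole selection. The function name and the case-insensitive comparison are assumptions.

```python
def add_to_selection(selection, seen, candidates, inhibit_duplicates=True):
    """Append candidate paths to 'selection'; return how many were added.

    'seen' is a set of lower-cased paths already in the selection. Comparing
    names case-insensitively is an assumption made for this sketch.
    """
    added = 0
    for path in candidates:
        key = path.lower()
        if inhibit_duplicates and key in seen:
            continue  # already selected, skip it
        seen.add(key)
        selection.append(path)
        added += 1
    return added
```

A set lookup costs roughly the same however many files are already selected, whereas comparing each new file against a plain list grows with the size of the list.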

Fig. 4: Create File Selection

There are filters to limit which files are selected. Filters are by date, file attribute, as well as include and exclude lists. These filters are specified on Page 2 and can be turned on and off individually. The filters selected are remembered using an INI file between executions of SigmaMD5. A precautionary message is flashed on page 1 if any filters are set active which may be the case from a previous session.

To enter data or use a filter, it must first be activated. The checkbox to the right of the Inclusive Date filter enables this function when checked and positions the cursor to the start of the inclusive date entry field. OS/2 keeps a record of three dates and times for each file: Creation date, Last access date, and Last Write date. I assume that most people would choose when the file was last written so that is how I implemented it.

As I have been frustrated by entry fields in the past, I have tried to make these simple to use by allowing the original data to be overwritten (type over) and by checking the input on the fly. The cursor is also moved automatically to the next field without having to use either the [Tab] key or the mouse. When a day, month or year is entered, no error message is given (unless a character or symbol is entered instead of a number) until either another filter is selected or the dialogue is being closed. If an invalid day, month or year is entered, the value in the corresponding field is highlighted in red. If it is a combination error, such as the 31st of September or the 29th of February in a non-leap year, then all the associated fields are shown in orange, since it is unclear whether the day, the month, or the year is incorrect. I also programmed a minimum and a maximum date. The minimum is January 1st 1980, since an earlier date cannot be represented in a PC. I set the maximum to the 31st of December 2099, which I think might be a reasonable age for OS/2 to retire. When an inclusive date filter is active, only those files having a 'last written date' within this period are selectable on the file selection page (page 1).
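The two kinds of date error described above, a single field out of range versus a combination that does not exist, can be told apart as in the following Python sketch. The red and orange highlighting and the 1980 to 2099 limits come from the text; the function name and the return format are assumptions.

```python
import calendar

MIN_YEAR, MAX_YEAR = 1980, 2099  # limits given in the article


def check_date(day, month, year):
    """Return (colour, fields): which entry fields to highlight and how.

    'red'    - the listed fields are individually out of range
    'orange' - each field is plausible but the combination does not exist
    'ok'     - the date is valid; no fields are highlighted
    """
    bad = set()
    if not 1 <= day <= 31:
        bad.add("day")
    if not 1 <= month <= 12:
        bad.add("month")
    if not MIN_YEAR <= year <= MAX_YEAR:
        bad.add("year")
    if bad:
        return "red", bad
    # monthrange gives (weekday of the 1st, number of days in the month)
    if day > calendar.monthrange(year, month)[1]:
        return "orange", {"day", "month", "year"}
    return "ok", set()
```

The 31st of September is reported as a combination error for any year, while the 29th of February is valid in 2004 but a combination error in 2005.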

File attributes can also be selected as Don't Care, On or Off. As the normal symbols for a tristate checkbox are to my mind ambiguous, I looked for another solution. Since I could not find a way of changing the symbols shown on a checkbox, I decided to use a normal push button sized to look like a checkbox, which allowed me to choose the symbols myself. I use a blank to represent Don't Care, + for On, and - for Off, as this most closely follows the attribute command and seemed clearer to me. Most programs normally display only files without the hidden or system attribute. I opted to set all attributes to Don't Care by default, as to my mind a filter always removes something but never adds it.
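A three-state attribute filter of this kind might be modelled as in the Python sketch below. The three states and their symbols come from the text; the class and function names, the order in which a button press cycles through the states, and the attribute names used in the example are assumptions.

```python
from enum import Enum


class AttrFilter(Enum):
    DONT_CARE = " "  # blank button face
    ON = "+"
    OFF = "-"


def next_state(state):
    """State after one button press (the cycle order is an assumption)."""
    order = [AttrFilter.DONT_CARE, AttrFilter.ON, AttrFilter.OFF]
    return order[(order.index(state) + 1) % len(order)]


def passes_attribute_filter(file_attributes, filters):
    """True if a file with the given set of attributes passes every filter."""
    for name, state in filters.items():
        present = name in file_attributes
        if state is AttrFilter.ON and not present:
            return False  # attribute required but missing
        if state is AttrFilter.OFF and present:
            return False  # attribute forbidden but set
    return True
```

With every filter left at Don't Care, every file passes, which matches the principle that a filter only ever removes files and never adds them.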

Fig. 5: File filters

The include and exclude filters, I think, speak for themselves. Since most errors occur due to faulty input, I try to check that each exclude filter falls within at least one of the include filters; this check can be switched off by clearing the Filter Cross Check box. Since a filter cannot be a filename by definition, I added the Exclude File by Name function so that files such as swapper.dat could easily be excluded.
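How include filters, exclude filters and exclusion by name might combine is sketched below in Python. The wildcard matching uses the standard fnmatch module, whose rules are not necessarily identical to OS/2 wildcard matching; the function name and the order in which the tests are applied are assumptions, and the cross check between filters is not shown.

```python
from fnmatch import fnmatchcase


def is_selected(filename, includes, excludes, excluded_names=()):
    """Decide whether a file passes the include and exclude filters.

    Names are compared case-insensitively by lower-casing both sides.
    An empty include list is treated as 'include everything'.
    """
    name = filename.lower()
    if name in {excluded.lower() for excluded in excluded_names}:
        return False  # excluded explicitly by name
    if includes and not any(fnmatchcase(name, p.lower()) for p in includes):
        return False  # matches none of the include filters
    return not any(fnmatchcase(name, p.lower()) for p in excludes)
```

For instance, with an include filter of *.* and swapper.dat in the list of names to exclude, SWAPPER.DAT is rejected while other files still pass.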

Once the required files have been selected, pressing the Finish button (shown as Stop during file selection) on page 1, closes the Create dialogue and starts the generation of the MD5 checksums. The format in the display area is exactly the same as when checking a MD5 list file, and again this is done in a separate thread. When complete, the list file can be saved to a file (via the standard File Save dialogue) and is given the MD5 file extension.

Menu and Options

The menu is divided into three main parts: File, Options and Help. Under the File menu there are Open and Create, which perform the same functions as the Check and Create pushbuttons, as well as Save, Save and Quit, and Exit. The last part of the File menu contains up to four pathnames. These are the names of the last four files opened and/or saved. They are ordered chronologically with the most recent at the top. I added this option so that I could quickly select a file again (which I found handy when I had incorrectly selected the file-set root).

Fig. 6: File history

Three font sizes can be selected using the font sub options Small, Medium and Large which allow for different screen resolutions.

In the options menu a default or user specified log can be started, stopped or deleted. The log records the actions and results of creating or checking a list file. It records the date and time that the log is (re-)started, whether creating or checking a file, the name of the list file, and its content. If checking, the results of the check per file plus any error messages or user actions are recorded. If logging is switched on, the size of the log file can grow quite rapidly so I implemented a warning system as a sub option in the log menu.

To aid readability of the checksum I added a separator function. When selected, all this does is add a hyphen after every 4th hexadecimal character. Leaving this on when saving a file does, however, make the file incompatible with the GNU MD5Sum output!
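The separator function amounts to very little code, as this Python sketch shows. The function names are hypothetical; the reverse function illustrates how a separated checksum could be turned back into the plain 32-character form.

```python
def add_separators(checksum, group=4):
    """Insert a hyphen after every 'group' hexadecimal characters."""
    pieces = [checksum[i:i + group] for i in range(0, len(checksum), group)]
    return "-".join(pieces)


def strip_separators(text):
    """Remove the hyphens again to recover the plain checksum."""
    return text.replace("-", "")
```

The checksum d41d8cd98f00b204e9800998ecf8427e would then be displayed as d41d-8cd9-8f00-b204-e980-0998-ecf8-427e.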

Restore window size is a quick way to restore the window to its default size and position.

Sort on Error moves all errors found during checking or generation to the top of the list making it easy to see where errors occurred.
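Moving the errors to the top without otherwise disturbing the list is a natural job for a stable sort. The Python sketch below is an illustration only; it represents each entry as a hypothetical (LED colour, pathname) pair, with red marking an error as in the main window.

```python
def sort_on_error(entries):
    """Return the entries with all red (error) lines first.

    Python's sort is stable, so the errors keep their relative order, as do
    the remaining entries. The key is False for red entries, and False sorts
    before True.
    """
    return sorted(entries, key=lambda entry: entry[0] != "red")
```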

The last option is a registration option. The program will work for 31 days after which registration is necessary. Registration is completely free and without obligation. I added this to get an idea of how many people used my program so I would know if it was worthwhile developing it further. Until now the response has been limited!

Miscellaneous

One problem I ran into was long file and pathnames. It was quite possible for a file or pathname to be too long to be displayed fully. I chose a truncation scheme such that in the main window the pathname is truncated from the left-hand side, preserving the filename as much as possible. I indicate that the name has been truncated by starting the pathname with two dots. In the status window the pathname is truncated from the right, as is the filename. This means that in most cases there is sufficient information on the screen to reconstitute the file name.
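The two truncation schemes can be illustrated with a short Python sketch. Widths are counted in characters here, which is an assumption (a PM program would more likely measure in pixels), and the function names are hypothetical.

```python
def truncate_left(pathname, width):
    """Main-window style: drop characters from the left and mark the cut
    with two leading dots, so the filename at the end survives.
    Assumes width is at least 2."""
    if len(pathname) <= width:
        return pathname
    keep = max(width - 2, 0)  # room left after the two dots
    tail = pathname[-keep:] if keep else ""
    return ".." + tail


def truncate_right(text, width):
    """Status-window style: simply cut off whatever does not fit."""
    return text[:width]
```

Between the two views, the end of the pathname is visible in the main window and the beginning in the status area.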

If the window is resized to a width less than the default width, then I truncate only the MD5 checksum, this time from the right. I also change the colour of the checksum from blue to dark blue and add three dots to the end of the checksum. I decided to limit resizing to a minimum size both horizontally and vertically so that there is always a minimum of information shown.

Depending on the position of the main window, I noticed that a message or dialogue window could end up partially off screen. Sometimes this meant that it was not possible to close the window, as the button to do so was also off screen. I now check and center all sub windows and messages so that they are as fully visible as the screen resolution allows.
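One way to keep a dialogue on screen is to center it over its parent and then clamp the result to the screen edges. The Python sketch below shows the arithmetic only; the article does not say whether centering is relative to the parent window or to the screen, so centering over the parent is an assumption, as are the function name and the coordinate convention.

```python
def place_dialog(parent, dialog_size, screen_size):
    """Return the (x, y) origin for a dialogue window.

    parent      - (x, y, width, height) of the parent window
    dialog_size - (width, height) of the dialogue
    screen_size - (width, height) of the screen
    """
    parent_x, parent_y, parent_w, parent_h = parent
    dialog_w, dialog_h = dialog_size
    screen_w, screen_h = screen_size
    # Center over the parent first.
    x = parent_x + (parent_w - dialog_w) // 2
    y = parent_y + (parent_h - dialog_h) // 2
    # Then pull the window back inside the screen. If the dialogue is larger
    # than the screen, the origin ends up at 0 so as much as possible shows.
    x = max(0, min(x, screen_w - dialog_w))
    y = max(0, min(y, screen_h - dialog_h))
    return x, y
```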

You may wonder why I have a DLL for large files. I found out that if I included the code in the executable file and the OS did not support large files—as is the case with Warp 4 up to fixpack 12—then a system error occurs when loading the executable and the load is aborted. I now first check if the OS version can handle large files and only then load my DLL.

Epilogue

I would like to thank Martin Vieregg for his program Hypermake which I used to generate the help file for SigmaMD5.

As I have said in my help file, let me know if you like my program; like any other programmer, I also need encouragement. Should you find a bug or have an idea to improve this program, then please let me know.

Formatting: Christian Hennecke
Editing: James Moe
References

SigmaMD5
Developer: Keith Merrington
Price: Freeware

MD5summer, a Windows program: http://www.md5summer.org
OS/2 Presentation Manager Programming, by Charles Petzold. ISBN 1-56276-123-4
The MD5 Message-Digest Algorithm: http://www.ietf.org/rfc/rfc1321.txt
Professor Ronald L. Rivest: http://theory.lcs.mit.edu/~rivest/homepage.html
Assoed21 by Henk Kelder: http://hobbes.nmsu.edu/pub/os2/util/system/assoed21.zip
Hypermake by Martin Vieregg: http://www.hypermake.com