a GNU program that computes MD5 sums for arbitrary files (read md5 hash function to understand what we are talking about). Present in many Linux distributions, it conforms to RFC 1321. Checksums are expressed as sequences of 32 hexadecimal digits (the digits 0-9 and the letters a-f), which means that they can be transmitted through email without encoding.

The program can run in two modes. The first (default) mode is checksum generation. A practical application:

bash$ ls -l
-rw-rw-r-- 1 baffo baffo 92738 Aug 3 2000 bff30.jpg
-rw-rw-r-- 1 baffo baffo 92738 Aug 3 2000 swizzler30.jpg

Feh! Two files with the same size and the same date. Maybe they contain exactly the same data. Let us see:

bash$ md5sum swizzler30.jpg bff30.jpg
3a015e3c1d18fe3420730248a86f80ba swizzler30.jpg
3a015e3c1d18fe3420730248a86f80ba bff30.jpg

Since they have the same checksum, we can safely assume that they contain the same data: the odds of two different files sharing an MD5 sum by accident are astronomically small.
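
If you want certainty rather than a safe assumption, a byte-by-byte comparison settles it. What follows is only a minimal sketch: the function name same_file is made up for this example, and cmp is the standard comparison utility that normally ships alongside md5sum.

```shell
# Hypothetical helper: report whether two files have identical contents.
same_file() {
    # Reading from stdin makes md5sum print "-" instead of the file name,
    # so the two output lines can be compared directly.
    sum_a=$(md5sum < "$1")
    sum_b=$(md5sum < "$2")

    # Matching checksums are a strong hint; cmp -s confirms byte by byte.
    if [ "$sum_a" = "$sum_b" ] && cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

Called as same_file swizzler30.jpg bff30.jpg, it should print identical for the two files above, assuming they really do hold the same bytes.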

You can save the generated checksums in a file:

bash$ md5sum abd* >checksum.md5
bash$ more checksum.md5
99dd4f09e5b304ea54e3686ebfd9d704 abd01x01.jpg
a138c024277840b2aac023e31c728a40 abd01x02.jpg
095d0a22445c3e6173bc0bb7c147c9e5 abd01x08.jpg
d0aeeda5b0677d53c24c64d08c33dc10 abd01x11.jpg
8493ac7ae3004aea9dd1ff69ef824d85 abd01x14.jpg
d00795108b2f10111cfed69afe671442 abd04x06.jpg
a3c992bec356b961728b8d5c05770aef abd04x13.jpg

You will later be able to run the program in its second mode, with the --check or -c switch and the saved file as argument. This will tell you whether any of the files has changed, even by a single bit.

bash$ md5sum -c checksum.md5
abd01x01.jpg: OK
abd01x02.jpg: OK
abd01x08.jpg: OK
abd01x11.jpg: OK
abd01x14.jpg: OK
abd04x06.jpg: OK
abd04x13.jpg: OK
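
The check mode is also handy in scripts, because md5sum -c exits with a non-zero status when any file fails verification. Here is a rough sketch of that idea, run in a scratch directory with made-up file names (one.txt, two.txt); it relies on the --quiet option of GNU md5sum, which suppresses the OK lines so that only problems are printed.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

printf 'first file\n'  > "$workdir/one.txt"
printf 'second file\n' > "$workdir/two.txt"

# Record the checksums of both files.
md5sum "$workdir/one.txt" "$workdir/two.txt" > "$workdir/checksum.md5"

# Change one of the files after the checksums were recorded.
printf 'second file, modified\n' > "$workdir/two.txt"

# Verify: the exit status tells us whether everything still matches.
if md5sum --quiet -c "$workdir/checksum.md5"; then
    result="all files intact"
else
    result="at least one file changed"
fi
echo "$result"
```

Because two.txt was modified after its checksum was saved, the else branch is the one that runs here.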

The basic property of the MD5 hash is precisely that a minute change of the input file will produce a wildly different result.
It is very difficult (read: a computational nightmare) to generate a file that has a given MD5 hash. This holds true even though there are known MD5 collisions, which you can read about in ariels' md5 hash function: a collision is a pair of different inputs built to share a hash, which is a different problem from matching a hash that was fixed in advance.
This is why you can also use the md5sum program to check whether a file has been transferred without any tampering or corruption.
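
The sensitivity to small changes is easy to observe for yourself. In this small sketch the two input strings are arbitrary choices of mine; they differ only in their last character ("d" versus "e"), which amounts to a single bit.

```shell
# Hash two strings that differ by one bit in the final character.
# cut keeps only the checksum and drops the "-" placeholder for stdin.
sum_a=$(printf 'hello world' | md5sum | cut -d ' ' -f 1)
sum_b=$(printf 'hello worle' | md5sum | cut -d ' ' -f 1)

echo "$sum_a"
echo "$sum_b"
```

The two lines printed should bear no visible resemblance to each other, even though the inputs are nearly identical.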

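The fact that MD5 hashing has collisions should not come as a surprise, of course. After all it is a hash function: it maps inputs of any length onto a fixed 128-bit value, so some distinct inputs must share the same hash.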
