Freedup always searches for files of identical size and compares them byte-by-byte. The only exception is the "extra styles", where tags (see the next chapter for details) are intentionally skipped.
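The basic search can be illustrated with a minimal Python sketch. It is not freedup's own code; the function names same_bytes and candidates_by_size and the chunk size are assumptions made for this example. Files are grouped by size first, and only files in the same group are worth comparing byte-by-byte.

```python
import os
from collections import defaultdict

CHUNK = 64 * 1024  # read size per step; an arbitrary choice for this sketch


def same_bytes(path_a, path_b):
    """Return True if both files have identical content, compared chunk by chunk."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            block_a = file_a.read(CHUNK)
            block_b = file_b.read(CHUNK)
            if block_a != block_b:
                return False  # first differing chunk ends the comparison
            if not block_a:
                return True  # both files ended at the same point


def candidates_by_size(paths):
    """Group paths by file size; only groups with two or more members can hold duplicates."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.getsize(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

The comparison stops at the first differing chunk, so files that differ early are rejected cheaply, while files that differ only near the end have to be read almost completely.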
In principle, freedup only knows about linking. The worst that can happen is therefore that two different files get linked. Many precautions have been taken during development, but I have to emphasize that this risk exists. Only in interactive mode may you also delete files, and then only in a two-step process.
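To make "linking" concrete, here is a hedged Python sketch of how a duplicate can be replaced by a hard link. It only illustrates the general technique on a POSIX file system and does not claim to be the procedure freedup uses; the function name and the temporary suffix are hypothetical.

```python
import os


def replace_with_hardlink(keep, duplicate):
    """Make `duplicate` a hard link to `keep` without a moment where it is missing."""
    if os.path.samefile(keep, duplicate):
        return  # already the same inode, nothing to do
    temp_name = duplicate + ".link-tmp"  # hypothetical temporary name
    os.link(keep, temp_name)  # new directory entry pointing at keep's inode
    try:
        os.replace(temp_name, duplicate)  # atomically swap it in for the duplicate
    except OSError:
        os.unlink(temp_name)  # clean up and leave the duplicate untouched
        raise
```

After the call both names refer to the same data, so the old content of the duplicate is gone. This is exactly why linking two files that actually differ is the risk described above.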
Hash functions should speed up freedup, since they avoid repeated comparisons of files that have been scanned before (and might differ only in the last characters). Freedup is slowed down, however, if files of the same size are very likely to be identical; in that case you should switch the hash function off. There is an internal hash function that allows some interesting speed enhancements (see below). External hash functions are mainly of interest for checking the internal one for correctness.
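The trade-off can be seen in a small Python sketch. It uses SHA-1 from Python's hashlib purely as a stand-in; it is not freedup's internal hash function, and the names file_digest and group_by_digest are assumptions for this example.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk=64 * 1024):
    """Hash a file's content once so that later comparisons can reuse the result."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def group_by_digest(paths):
    """Bucket same-size files by digest; files in different buckets must differ."""
    buckets = defaultdict(list)
    for path in paths:
        buckets[file_digest(path)].append(path)
    return list(buckets.values())
```

With n files of equal size that all differ, pairwise comparison may need up to n(n-1)/2 file comparisons, whereas hashing reads each file once and separates them by digest. If the files are all identical, the hash pass is an extra full read on top of the byte-by-byte comparison, which is the slowdown mentioned above.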
Before files are compared byte-by-byte, you can apply restrictions, such as requiring that they are owned by the same user or group, have the same permissions, or meet whatever other conditions the options allow.
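Such restrictions amount to a cheap metadata check that runs before any file content is read. The following Python sketch shows the idea; the keyword arguments same_user, same_group and same_mode are hypothetical names for this example and are not freedup's command-line options.

```python
import os
import stat


def passes_restrictions(path_a, path_b, same_user=False, same_group=False, same_mode=False):
    """Check the requested metadata restrictions before any content is compared."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    if same_user and stat_a.st_uid != stat_b.st_uid:
        return False
    if same_group and stat_a.st_gid != stat_b.st_gid:
        return False
    # S_IMODE strips the file-type bits and keeps only the permission bits.
    if same_mode and stat.S_IMODE(stat_a.st_mode) != stat.S_IMODE(stat_b.st_mode):
        return False
    return True
```

A pair that fails this check is never opened, so restrictions also reduce the amount of data that has to be read.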
Freedup
The extra style concept was introduced in version 1.1 because I wanted files to be linked although they differed. I am talking about mp3 files whose tags showed minor variations. At first I considered retagging all files, but I would have had to either remove all tags or complete all of them (n.b. ID3v1 tags are at the end and ID3v2 tags at the beginning of an mp3 file; both are optional).
The extra style is meant to compare only the essential file content, i.e. the MPEG-encoded sound data in the case of mp3 files. Currently the following rules are established:
Please note that these styles change freedup's behaviour according to the file contents. They change the size of the compared contents, but this does not affect the options that relate to the files themselves, such as ownership or file names.
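How a style narrows the compared range can be sketched in Python for the mp3 case. This is an illustration based on the published ID3 tag layouts, not a copy of freedup's mp3 style; the function names are assumptions, and the sketch ignores the optional ID3v2.4 footer.

```python
def audio_range(data):
    """Return (start, end) of the bytes that remain after skipping ID3v2 and ID3v1 tags."""
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flag byte, 4 size bytes (syncsafe).
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for byte in data[6:10]:
            size = (size << 7) | (byte & 0x7F)  # only 7 bits per byte carry size
        start = min(10 + size, end)
    # ID3v1 tag: fixed 128-byte block at the very end, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return start, end


def same_audio(data_a, data_b):
    """Compare two mp3 byte strings while ignoring their tags."""
    start_a, end_a = audio_range(data_a)
    start_b, end_b = audio_range(data_b)
    return data_a[start_a:end_a] == data_b[start_b:end_b]
```

Two files of different total size can therefore count as equal, because only the range between the tags is compared.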
If you would like to contribute, this is quite simple. There are source files for each style. Start with a copy of my.c and my.h. Rename the functions, fill in your way of determining the irrelevant bytes at the start and at the end, as well as a way to find size and magic. Add a matching line to the extra[] table in auto.c, then compile, test, and submit it to me.