Name

samearchive-lite - find duplicate files, while keeping archives intact

Synopsis

samearchive-lite [-g size] [-m size] [-S sep] [-0qVv] dir1 dir2 [...]

Description

samearchive-lite reads a list of paths from standard input and outputs the duplicate files. The program is written for the special case where each directory acts as an archive or backup. The output contains only filename pairs that have the same relative path from the archive base.
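The notion of "same relative path from the archive base" can be illustrated with a short shell sketch. This is not the program's own code; the function name counterpart and the directory names are hypothetical and serve only to show which file in a second archive would be paired with a given input path.

```shell
# Map a path under one archive base to the same relative path under
# another base. Illustration only; not how samearchive-lite is implemented.
counterpart() {  # usage: counterpart PATH BASE OTHER_BASE
    rel=${1#"$2"/}                 # strip the leading "BASE/" from PATH
    printf '%s/%s\n' "$3" "$rel"   # re-root the relative part under OTHER_BASE
}

# Hypothetical archive names.
counterpart system-arch1/etc/hosts system-arch1 system-arch2
```

Only pairs formed this way are candidates; two identical files stored under different relative paths are not reported.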

This version uses a lot less memory than samefile and is faster, but finds only a partial set of the duplicate files. Roughly speaking, it does 80% of the job in 50% of the time while using 10% of the resources compared to samearchive.

For each filename pair with duplicate contents, a line consisting of six fields is output: the size in bytes, the two filenames, the character "=" if the two files are on the same device or "X" otherwise, and the link counts of the two files. The output is sorted in reverse order by size as the primary key and by user-defined field(s) as the secondary key.
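Because every output line has the same six fields, the output is easy to post-process. The following is a minimal sketch, assuming the default tab separator and the field order described above; the function name sum_same_device and the sample lines are hypothetical and are not real program output.

```shell
# Sum the sizes of duplicate pairs that are on the same device
# (field 4 is "="). Assumes tab-separated fields in the documented order:
# size, file1, file2, device flag, link count 1, link count 2.
sum_same_device() {
    awk -F '\t' '$4 == "=" { total += $1 } END { print total + 0 }'
}

# Hypothetical sample lines standing in for real output.
printf '4096\tarch1/etc/hosts\tarch2/etc/hosts\t=\t1\t1\n1024\tarch1/etc/motd\tarch2/etc/motd\tX\t1\t1\n' |
    sum_same_device
```

If filenames may contain tab characters, a different separator would have to be used on both sides, in the program's output and in the awk field separator.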

Options

-0
Indicates that the input list of file names is NUL terminated, for example as generated by implementations of find(1) that support the -print0 option. Without this option, the file names are assumed to be newline terminated.
-g size
Compare only files with size greater than size bytes. Default is 0.
-m size
Compare only files with size less than or equal to size bytes. Default is 0, which indicates there is no limit.
-q
Keeps the information you receive during processing to a minimum (verbose level 0).
-S sep
Use string sep as the output field separator; defaults to a tab character. Useful if filenames contain tab characters and the output must be processed by another program, say awk(1).
-V
Print the version information and exit.
-v
Increases the amount of information you receive while running samearchive-lite. At level 0 you will see only error messages. At level 1 you will see warning messages indicating that samearchive-lite couldn’t do something. Defaults to verbose level 1.
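
Taken literally, the descriptions of -g and -m combine into a single size test: strictly greater than the -g value, and less than or equal to the -m value unless that value is 0. The sketch below only restates that wording as a shell predicate; the function name size_selected is hypothetical and the code is not taken from the program.

```shell
# Succeeds when a file of SIZE bytes would be compared, following the
# documented wording: -g is a strict lower bound, -m is an inclusive
# upper bound, and -m 0 means no upper limit.
size_selected() {  # usage: size_selected SIZE G M
    [ "$1" -gt "$2" ] && { [ "$3" -eq 0 ] || [ "$1" -le "$3" ]; }
}

size_selected 4096 0 0    && echo "4096 bytes: compared with the defaults"
size_selected 0 0 0       || echo "0 bytes: not compared with the defaults"
size_selected 4096 0 4096 && echo "4096 bytes: compared with -m 4096"
```

Under this reading, empty files are never compared with the default -g value of 0.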

Examples

Find all duplicate files within the system archives that live in the current working directory:

% find system-arch1 | samearchive-lite system-arch1 system-arch*
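
A variant of the example above for filenames that may contain newlines is to pass a NUL-terminated list together with the -0 option. This is a sketch under assumptions: it relies on a find(1) that supports -print0, the helper name list_archive is hypothetical, and the command is guarded so that it runs only where the tool and the example directory exist.

```shell
# Emit every path under an archive base as a NUL-terminated list,
# which stays unambiguous even when a filename contains a newline.
list_archive() {  # usage: list_archive BASE
    find "$1" -print0
}

# Guarded invocation; the system-arch* directory names follow the
# example above and are assumed to exist in the current directory.
if command -v samearchive-lite >/dev/null 2>&1 && [ -d system-arch1 ]; then
    list_archive system-arch1 | samearchive-lite -0 system-arch1 system-arch*
fi
```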

Diagnostics

inaccessible: <path> This is probably due to a "permission denied" error on files or directories within the given path for which you have no read permission.
unreadable: <path> The file could be opened for reading, yet reading it failed. You shouldn’t encounter such warnings, but if you receive more than a few, this could very well be due to a failing hard disk.

Skipped line path because it didn’t start with %s. This indicates that the path was skipped because it didn’t start with the first argument given on the command line.

See Also

samearchive(1), samefile(1), samelink(1), find(1), ls(1)

Notes

Input filenames must not have leading or trailing white space unless the white space is part of the filename.

Author

Alex de Kruijff
