Name
samearchive-lite - find duplicate files, while keeping archives intact
Synopsis
samearchive-lite [-g size] [-m size] [-S sep] [-0qVv] dir1 dir2 [...]
Description
samearchive-lite reads a list of paths from stdin and outputs the duplicate files. This program is written for the special case where each directory acts as an archive or backup. The output contains only filename pairs that have the same relative path from the archive base.
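To illustrate what "the same relative path from the archive base" means, here is a minimal shell sketch. The function name candidate_path and the directory names are hypothetical, and this is not how samearchive-lite is implemented; it only shows which path in a second archive would be paired with a given input path.

```shell
#!/bin/sh
# Hypothetical helper: map a path under one archive base to the path
# with the same relative name under another archive base.
candidate_path() {
    base1=$1 base2=$2 path=$3
    case $path in
        "$base1"/*)
            # Strip the first base and keep the relative part.
            rel=${path#"$base1"/}
            printf '%s/%s\n' "$base2" "$rel"
            ;;
        *)
            # A path outside the first base has no counterpart.
            return 1
            ;;
    esac
}

# Made-up archive names, for illustration only.
candidate_path system-arch1 system-arch2 system-arch1/etc/hosts
```

Under these assumptions, system-arch1/etc/hosts is paired with system-arch2/etc/hosts, while a path that does not start with the first base yields no pair.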
This version uses a lot less memory than samefile and is faster, but finds only a partial set of the duplicate files. As a rough guide, it does 80% of the job in 50% of the time while using 10% of the resources, compared to samearchive.
For each filename pair with duplicate contents, a line consisting of six fields is output: the size in bytes, the two filenames, the character "=" if the two files are on the same device or "X" otherwise, and the link counts of the two files. The output is sorted in reverse order by size as the primary key and by one or more user-defined fields as the secondary key.
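The six-field output can be post-processed with standard tools. The following sketch assumes the default tab separator and the field order described above; the function name same_device_pairs and the sample lines are made up for illustration and are not real program output.

```shell
#!/bin/sh
# Hypothetical helper: read samearchive-lite style output on stdin
# (tab-separated: size, file1, file2, '=' or 'X', links1, links2)
# and report the pairs that live on the same device.
same_device_pairs() {
    awk -F '\t' '
        NF == 6 && $4 == "=" {
            printf "%s <-> %s (%d bytes)\n", $2, $3, $1
            total += $1
        }
        END { printf "total: %d bytes\n", total }
    '
}

# Sample input invented for this example.
printf '2048\ta1/etc/hosts\ta2/etc/hosts\t=\t1\t1\n512\ta1/etc/motd\ta2/etc/motd\tX\t1\t1\n' |
    same_device_pairs
```

If filenames may contain tab characters, a different separator chosen with -S would need to be passed to awk through -F instead.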
Options
- -0
- Indicates that the input list of file names is NUL terminated, for example as generated by implementations of find(1) that support the -print0 option. Without this option, the file names are assumed to be newline terminated.
- -g size
- Compare only files with size greater than size bytes. Default is 0.
- -m size
- Compare only files with size less than or equal to size bytes. Default is 0, which indicates that there is no limit.
- -q
- Keeps the information printed during processing to a minimum. (Verbose level 0)
- -S sep
- Use string sep as the output field separator; defaults to a tab character. Useful if filenames contain tab characters and the output must be processed by another program, say awk(1).
- -V
- Print the version information and exit.
- -v
- Increases the amount of information printed while running samearchive-lite. At level 0 you will see only error messages. At level 1 you will see warning messages indicating that samearchive-lite couldn’t do something. Defaults to verbose level 1.
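The reason for the -0 option can be shown with a small sketch: a filename may itself contain a newline, so only NUL termination separates names unambiguously. The helper names and the two filenames below are hypothetical, and the code only counts terminators; it does not run samearchive-lite.

```shell
#!/bin/sh
# Count the terminators in a file list read from stdin, once for
# newline and once for NUL.
count_newline_records() { tr -cd '\n' | wc -c | tr -d ' '; }
count_nul_records()     { tr -cd '\000' | wc -c | tr -d ' '; }

# Two made-up filenames in NUL-terminated form:
# "a<newline>b" and "c".
printf 'a\nb\000c\000' | count_newline_records
printf 'a\nb\000c\000' | count_nul_records
```

The first command prints 1 and the second prints 2. Only the NUL count matches the two filenames, because the newline belongs to the first name rather than ending it.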
Examples
Find all duplicate files within the system archives that live within the current working directory:
% find system-arch1 | samearchive-lite system-arch1 system-arch*
Diagnostics
inaccessible: <path> This is probably due to a ’permission denied’
error on files or directories within the given path for which you have
no read permission.
unreadable: <path> The file could be opened for reading, yet reading it
failed. You shouldn’t encounter such warnings, but if you do, and receive
more than a few, this could very well be due to a failing hard disk.
Skipped line path because it didn’t start with %s. This indicates that the path was skipped because it didn’t start with the first argument given on the command line.
See Also
samearchive(1) samefile(1) samelink(1) find(1) ls(1)
Notes
Input filenames must not have leading or trailing white space unless the white space is part of the filename.
Author
Alex de Kruijff