samechflags change file flags
samechmod change file modes
samechown change file owner and group
samecp copies the first file of a pair of duplicate files
samedelay delays line output until the files are no longer in use
sameln links duplicate files together
samemv moves the first file of a pair of duplicate files
samechflags [-g size] [-m size] [-S sep] [-dHmqVvw] flags
samechmod [-g size] [-m size] [-S sep] [-dHmqVvw] mode
samechown [-g size] [-m size] [-S sep] [-dHmqVvw] owner[:group]
samecp [-A | -At | -L | -Z | -Zt] [-g size] [-m size] [-o filename | -p command] [-S sep] [-HlmsqVvw] destination [source]
samedelay [-S sep] [-HqVv]
sameln [-A | -At | -L | -Z | -Zt] [-g size] [-m size] [-o filename | -p command] [-S sep] [-HmsqVvw]
samemv [-A | -At | -L | -Z | -Zt] [-g size] [-m size] [-o filename | -p command] [-S sep] [-HmqVvw] destination [source]
These programs read the samefile output (each line indicating two duplicate files) from stdin and perform some action. The naming convention is as follows: all programs start with the prefix ’same’ followed by the name of the equivalent (system) program or, if this doesn’t exist, a descriptive name.
Some programs, like sameln, create a temporary backup in order to prevent the loss of data. As of version 1.9 they postfix the path with ’.programname.number’ (e.g. .sameln.0). Earlier versions used a different name. It might still be wise to back up your data. On FreeBSD 7.1 or newer you could make a backup using the following command:
cp -Rpl sources backup-dir/
Later you can use rsync to sync the files back:
rsync -aHW backup-dir/sources .
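A hard-link backup like the one above protects file names rather than file data: if a program removes or replaces a file, the backup name still reaches the old contents, whereas an in-place overwrite would change both. A minimal sketch of that behaviour, assuming a cp that supports -l (FreeBSD 7.1 or newer, or GNU cp); the directory and file names are made up for illustration:

```shell
# Sketch only: the paths below are hypothetical; cp must support -l.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir sources backup-dir
printf 'payload\n' > sources/a.txt

# Hard-link copy of the tree: directories are created, files are linked.
cp -Rpl sources backup-dir/

# Both names point at the same inode, so the link count is now 2.
links=$(ls -l sources/a.txt | awk '{ print $2 }')

# Simulate the loss of the original name; the backup name still works.
rm sources/a.txt
restored=$(cat backup-dir/sources/a.txt)
echo "links=$links restored=$restored"
```

Because no file data is copied, such a backup is cheap, but it must live on the same file system as the sources.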
- -A
- Links to the filename that comes first in alphabetical order. (default)
- -At
- Links to the filename that was created at the earliest date.
- -d
- Work on the parent directory of the file.
- -g size
- Don’t process files with a size less than or equal to size bytes. Default is 0.
- -H
- Print human friendly statistics when at verbose level 2.
- -L
- Links to the filename with the most hard links.
- -l
- Create a hard link instead. (implied for sameln)
- -m size
- Don’t process files with a size greater than size bytes. Default is 0, which indicates there is no limit.
- -o file
- Match lines that are consumed are written to the file instead of being silently discarded.
- -p command
- Match lines that are consumed are piped through the command instead of being silently discarded.
For example: to prevent accidental deletion of my backup I set a flag on the entire backup. As a consequence, matches within the backup can’t be linked together. One can work around this by first piping the match lines through samechflags twice, once with the option -d, so that the flag is removed from the files and from their parent directories. This, however, leaves the files linked but stripped of their flags. Setting the flags again afterwards is what the option -p exists for.
# find . -type f -size +0 | samefile -i | samechflags -d noschg \
      | samechflags noschg | sameln -p "samedelay | \
      samechflags -d schg | samechflags schg > /dev/null" | \
      samedelay | samechflags -d schg | samechflags schg
The command above will find all matches among the files that are larger than zero bytes, remove the flags, hard link the files together and then set the flags again. Even if the files didn’t have the schg flag they will end up with it, unless the files do not have identical contents. Since sameln splits the match lines into those it was able to link and those it was not, there are two extra instances of samechflags.
# pstree -s same
-+= 00100 root csh
 |--= 12301 root find . -type f -size +0
 |--- 12302 root samefile -i
 |--- 12303 root samechflags -d noschg
 |--- 12304 root samechflags noschg
 |-+- 12305 root sameln -p samedelay | samechflags -d schg | s...
 | \-+- 12309 root sh -c samedelay | samechflags -d schg | sam...
 |   |--- 12310 root samedelay
 |   |--- 12311 root samechflags -d schg
 |   \--- 12312 root samechflags schg
 |--- 12306 root samedelay
 |--- 12307 root samechflags -d schg
 \--- 12308 root samechflags schg
The processes (12307, 12308, 12311, 12312) that set the schg flag can prevent sameln (12305) from hard linking the two files. To eliminate this problem samedelay delays the line output until the new line and the old line have totally different files on them.
- -q
- This option keeps the information you receive during processing to a minimum. (Verbose level 0)
- -S sep
- Use string sep as the output field separator; defaults to a tab character. Useful if filenames contain tab characters and the output must be processed by another program, say awk(1).
- -s
- Create a symbolic link.
- -V
- Print the version information and exit.
- -v
- This option increases the amount of information you receive while running sameln. At level 0 you will just see the error messages. At level 1 you will see warning messages indicating that sameln couldn’t do something. At level 2 you will also receive information about the stages that sameln enters and some statistics when sameln finishes. Defaults to verbose level 1.
- -w
- Don’t check the file contents before linking two files together. The consequences of this option are that you run the risk of losing data, because you trust the input list (which may be wrong, e.g. due to bugs), and that you reduce the time needed to process the input list. The gain is expected to be low when sameln gets its input through a pipe, because the files will most likely be in the cache of the operating system.
- -Z
- Links to the filename that comes last in alphabetical order.
- -Zt
- Links to the filename that was created at the latest date.
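The effect of the -S option on downstream parsing can be illustrated without running the tools at all. The sketch below is only an illustration: the match line is hand-written, and its three-field layout (size, first file, second file) is a simplifying assumption, so check samefile(1) for the real field layout. It shows why a file name containing a tab breaks tab-based splitting in awk, and how a separator such as ’;’ (as it would be requested with -S ’;’) avoids that:

```shell
# Sketch only: the match line below is hand-written and its layout
# (size;first file;second file) is an assumption, not real tool output.
tab=$(printf '\t')
line="1024;dir/a${tab}b.txt;dir/copy.txt"

# Splitting on tabs cuts the first file name in two.
by_tab=$(printf '%s\n' "$line" | awk -F '\t' '{ print NF }')

# Splitting on the custom separator keeps all three fields intact.
by_sep=$(printf '%s\n' "$line" | awk -F ';' '{ print NF }')
second=$(printf '%s\n' "$line" | awk -F ';' '{ print $3 }')
echo "tab-split: $by_tab fields, sep-split: $by_sep fields, second file: $second"
```

Any separator works as long as it cannot occur in the file names being processed.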
Link all duplicate files in the current working directory:
% ls | samefile -i | sameln
Link all duplicate files in my HOME directory and subdirectories and also tell me if there are hard links:
% find $HOME -type f -size +0 | samefile -i | sameln
Remove all duplicate files in my HOME directory and subdirectories:
% find $HOME -type f -size +0 | samefile -ir | samerm
Link all duplicate files in the /usr directory tree that are bigger than 10000 bytes and write the filename pairs that couldn’t be processed to /tmp/usr. (That one is for the sysadmin folks; you may want to ’amp’ this command, i.e. put it in the background with the ampersand &, because it takes a few minutes.)
% find /usr -type f -size +0 | samefile -g 10000 | sameln > /tmp/usr
Link all duplicate files in reverse order, but first clear the file flags of the files and their parent directories and set them again afterwards:
% find /path/to/backups -type f -size +0 | samefile -iZ | \
      samechflags -d noschg | samechflags noschg | sameln -Zp "samedelay | \
      samechflags -d schg | samechflags schg > /dev/null" | samedelay | \
      samechflags -d schg | samechflags schg
<number>...
The number of lines processed so far.
failed to create the backup path
This is probably due to a ’permission denied’ error on files or directories within the given path for which you have no read permission. This error is given in order to prevent the loss of files.
failed to remove path
This is probably due to a ’permission denied’ error on files or directories within the given path for which you have no read permission.
failed to link path -> path
This is probably due to a ’permission denied’ error on files or directories within the given path for which you have no read permission. Before version 1.1 this would mean you would have lost a file, but from version 1.2 a backup file is created.
failed to delete the backup path
This is probably due to a ’permission denied’ error on files or directories within the given path for which you have no read permission. The relink of the original files was successful, but you do need to clean up the backup manually.
free vnodes dropped below threshold
This is normal. The program goes to sleep in order to alleviate stress on the system until the number of free vnodes has risen above the threshold. The first time this happens you may want to check top for any processes in the ’vlruwk’ state. If there are any, you might like to read the BUGS section.
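The backup-related diagnostics above suggest an order of operations: back up, link, then delete the backup. The following is a rough sketch of such a sequence, not the actual implementation of sameln; the function name relink is hypothetical, and the backup suffix follows the ’.programname.number’ convention described earlier:

```shell
# Sketch only: relink is a hypothetical helper, not sameln's actual code.
relink() {
    keep=$1
    dup=$2
    backup="$dup.sameln.0"

    # Only link files whose contents are identical.
    cmp -s "$keep" "$dup" || return 1

    # Step 1: move the duplicate aside so no data is lost if linking fails.
    if ! mv "$dup" "$backup"; then
        echo "failed to create the backup $backup" >&2
        return 1
    fi

    # Step 2: create the hard link; on failure put the original back.
    if ! ln "$keep" "$dup"; then
        echo "failed to link $keep -> $dup" >&2
        mv "$backup" "$dup"
        return 1
    fi

    # Step 3: drop the backup; on failure it must be cleaned up by hand.
    if ! rm "$backup"; then
        echo "failed to delete the backup $backup" >&2
        return 1
    fi
}

# Usage on two identical scratch files.
workdir=$(mktemp -d) || exit 1
printf 'same\n' > "$workdir/a"
printf 'same\n' > "$workdir/b"
relink "$workdir/a" "$workdir/b"
status=$?
links=$(ls -l "$workdir/a" | awk '{ print $2 }')
echo "status=$status links=$links"
```

Moving the duplicate aside before linking means that a failure at any step leaves either the original file or its backup on disk.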
Bugs
Your computer may become so slow that it appears to freeze. This is not a bug but is caused by the operating system running out of resources. If you’re on FreeBSD you might ’cure’ this by raising kern.maxvnodes by 10 000. Other solutions include removing the option -w, so the process will spend more time on each file, or piping the output of samefile directly into the program instead of reading from a large file.
Alex de Kruijff