Finding duplicate files and moving ONE copy to another drive, deleting all other copies
I am actually trying to do two different things.

First, I want to make a copy (onto my travel HDD) of all my video files, photos, and documents, excluding any duplicates, so that only one copy of each file ends up on the travel HDD. The tool would have to recurse through many levels of folders (NTFS file system).

Second, on a different system, I want to transfer one copy of all those files while deleting any duplicates from the source, leaving one original in the source and one copy on the travel drive. These are two different systems, which is why I have two similar requests.

I would like to be able to limit the operation to particular file types (either by content comparison or by .xyz extension), and duplicates should be detected by hash comparison.

In short, I need a duplicate-file finder combined with an automated transfer/copy onto other media, preferably all in one step. Is there such a tool out there, or how would I do this from the command line?
Tags: linux, ubuntu, files

asked 2 days ago by Marc (new contributor) · edited 2 days ago by Rui F Ribeiro
I don't know of anything since Picasa that has this kind of functionality. (And AFAIK Picasa was Windows anyway.) Iterate over your saved files, building a list of checksums (and pathnames). Each time you add a file, append its data to this list. Then, to check whether you have a duplicate file, generate its checksum and search for it in your list.
– roaima, 2 days ago
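A minimal shell sketch of that approach (the media directory and list filename below are placeholders, not anything from the comment):

#!/bin/bash
# Build (or extend) a list of "checksum  path" lines for every file
# under the media directory; both paths are placeholders.
list="$HOME/checksums.txt"
find "$HOME/media" -type f -exec sha256sum {} + >> "$list"

# To check whether a new file is a duplicate, compute its checksum
# and look for it in the list.
sum=$(sha256sum "newfile.jpg" | awk '{print $1}')
if grep -q "^$sum " "$list"; then
    echo "duplicate: newfile.jpg"
else
    echo "new file: newfile.jpg"
fi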
1 Answer
One idea would be to use a combination of fdupes and rsync:

- Create a staging area of all files considered for the transfer, using rsync.
- Delete all duplicates except one in the staging area (only).
- Transfer the remaining files in the staging area to their destination, again using rsync.
To do this, we need three locations:

- Originals: a directory path in $origdir.
- Staging area: a directory path in $stagingdir.
- Destination: a local or remote path in $destdir.
First, create the staging area (this assumes that the staging area does not already exist, or if it does, that it only contains things that should be transferred):
rsync --archive --verbose --link-dest="$origdir" \
    --include="*.jpg" --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"
This would copy all files whose names end in .jpg to the staging area by creating hard links to their original locations. Only the space needed for the directory structure is used; the file data is not duplicated (unless $stagingdir and $origdir are on different partitions, in which case hard links are impossible and the data is copied instead). To add other filename patterns, add more --include options (before the --exclude).
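For example, to stage common video, photo, and document types in one pass, the command might grow to something like this (the extension list is illustrative, not exhaustive):

rsync --archive --verbose --link-dest="$origdir" \
    --include="*.mp4" --include="*.avi" --include="*.mkv" \
    --include="*.jpg" --include="*.png" \
    --include="*.pdf" --include="*.odt" \
    --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"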
Then run fdupes over $stagingdir:
fdupes --delete --recurse "$stagingdir"
This will interactively ask you for confirmation before removing anything. There is also a --noprompt option that removes the files without confirmation; please read the fdupes manual carefully before using it. The files under $origdir are not affected by deleting files from the staging area: removing one hard link to a file leaves the other links, and the file data, intact.
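One way to convince yourself of this is to count the files under $origdir before and after the fdupes run; the number should not change:

find "$origdir" -type f | wc -l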
Then delete empty directories from the staging directory (this is a bonus step that just cleans things up a bit):
find "$stagingdir" -type d -empty -delete -print
This would go through the entire staging area and delete any empty directory. Any deleted directory would be printed after successful deletion.
And finally transfer the non-duplicates:
rsync --archive --verbose "$stagingdir/" "$destdir"
This process retains the original directory structure for the files that match the patterns used in the first rsync and that are still in place after fdupes has removed the duplicates.
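Since the question asks for everything in one step, the four commands could be wrapped in a small script. This is only a sketch: the argument handling is my own assumption, and --noprompt makes fdupes delete without asking, so test it on sample data first.

#!/bin/bash
# Usage: dedupe-transfer.sh ORIGDIR STAGINGDIR DESTDIR
set -eu
origdir=$1
stagingdir=$2
destdir=$3

# 1. Hard-link the matching files into the staging area.
rsync --archive --verbose --link-dest="$origdir" \
    --include="*.jpg" --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"

# 2. Remove all but one copy of each set of duplicates (non-interactive).
fdupes --delete --noprompt --recurse "$stagingdir"

# 3. Drop any directories left empty by the deduplication.
find "$stagingdir" -type d -empty -delete -print

# 4. Transfer the remaining unique files to the destination.
rsync --archive --verbose "$stagingdir/" "$destdir"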
answered 2 days ago by Kusalananda · edited 2 days ago