Finding duplicate files and moving ONE copy to another drive, deleting all other copies

I am actually trying to do two different things:

First, I want to make a copy (onto my travel HDD) of all the video files, photos and documents, excluding any duplicates, so that there is only one copy of each file on the travel HDD. This would have to be able to see into many levels of folders (NTFS file system).
Second, I want to transfer one copy of all those files while deleting any duplicates from the source, leaving one original in the source and the copy on the travel drive. These are for two different systems, which is why there are two similar requests.

I would like to be able to limit this to particular file types (either by content/encoding comparison or by .xyz extension), and I want the files to be hash-checked for duplication.

My needs combine the duplicate-file-finding function with an automated transfer/copy onto another medium, preferably all in one step.

Is there such a tool out there, or how would I use the command line to do this?

linux ubuntu files

asked 2 days ago by Marc, edited 2 days ago by Rui F Ribeiro

  • I don't know of anything since Picasa that has this kind of functionality. (And AFAIK Picasa was Windows anyway.) Iterate over your saved files building a list of checksums (and pathnames). Each time you add a file, append its data to this file. Then, to check if you have a duplicate file, generate its checksum and search for that in your list.
    – roaima
    2 days ago
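
A minimal sketch of the checksum-index approach this comment describes, assuming sha256sum is available; the saved-files path and the hashes.txt index file are illustrative placeholders:

# Build (or extend) an index of checksums for everything already saved.
# /path/to/saved and hashes.txt are assumptions -- adjust to your layout.
find /path/to/saved -type f -exec sha256sum {} + >> hashes.txt

# To test whether a new file is a duplicate, hash it and look the hash
# up in the index (compare on the hash field only, since paths differ).
hash=$(sha256sum newfile.jpg | awk '{print $1}')
if grep -q "^$hash " hashes.txt; then
    echo "duplicate"
else
    echo "not seen before"
fi
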
1 Answer

One idea would be to use a combination of fdupes and rsync.




  1. Create a staging area containing all files considered for the transfer, using rsync.

  2. Delete all duplicates except one, in the staging area only.

  3. Transfer the remaining files in the staging area to their destination, again using rsync.


To do this, we need three locations:




  1. Originals, a directory path in $origdir.

  2. Staging area, a directory path in $stagingdir.

  3. Destination, a local or remote path in $destdir.


First, create the staging area (this assumes that the staging area does not already exist, or if it does, that it only contains things that should be transferred):



rsync --archive --verbose --link-dest="$origdir" \
    --include="*.jpg" --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"


This would copy all files whose names end in .jpg to the staging area by means of creating hard links from their original locations. Only the space to create the directory structure would be needed and the file data would not be duplicated (unless $stagingdir and $origdir were located on two different partitions). To add other filename patterns, add more --include options (before the --exclude).
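
To verify that the staging step created hard links rather than full copies, one could compare inode numbers: identical numbers mean the two names point at the same data (the example path below is hypothetical):

# Print the inode number of an original and its staged counterpart.
ls -i "$origdir/photos/example.jpg" "$stagingdir/photos/example.jpg"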



Then run fdupes over $stagingdir:



fdupes --delete --recurse "$stagingdir"


This will interactively ask you for confirmation before removing anything. There is also a --noprompt option that would remove the files without confirmation. Please read the fdupes manual carefully. The files under $origdir would not be affected by deleting files from the staging area, since each staged file is just an extra hard link: removing it only decrements the link count, and the original name still points at the data.
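
For unattended runs, the non-interactive variant mentioned above would look like the following; fdupes then keeps the first file in each set of duplicates and deletes the rest without asking, so test on a copy of the data first:

fdupes --recurse --delete --noprompt "$stagingdir"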



Then delete empty directories from the staging directory (this is a bonus step that just cleans things up a bit):



find "$stagingdir" -type d -empty -delete -print


This would go through the entire staging area and delete any empty directory. Any deleted directory would be printed after successful deletion.



And finally transfer the non-duplicates:



rsync --archive --verbose "$stagingdir/" "$destdir"


This process would retain the original directory structure for the files that match the patterns used in the first rsync and that are still in place after fdupes has removed the duplicates.
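
As a recap, the whole pipeline could be wrapped in a small script. This is only a sketch; the three example paths are assumptions to adapt to your setup:

#!/bin/sh
# Example locations (assumptions -- adjust to your setup).
origdir="$HOME/media"        # source tree
stagingdir="$HOME/staging"   # staging area, ideally on the same partition
destdir="/mnt/travel"        # mounted travel drive

# 1. Hard-link matching files into the staging area.
rsync --archive --verbose --link-dest="$origdir" \
    --include="*.jpg" --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"

# 2. Remove duplicates from the staging area only (interactive).
fdupes --delete --recurse "$stagingdir"

# 3. Delete directories left empty by the previous step.
find "$stagingdir" -type d -empty -delete -print

# 4. Transfer the remaining unique files to the destination.
rsync --archive --verbose "$stagingdir/" "$destdir"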

answered 2 days ago by Kusalananda, edited 2 days ago