Finding duplicate files and moving ONE copy to another drive, deleting all other copies











I am actually trying to do two different things.

First, I want to make a copy (onto my travel HDD) of all my video files, photos, and documents, excluding any duplicates, so that there is only one copy of each file on the travel HDD. The tool would have to recurse into many levels of folders (NTFS file system).

Second, I want to transfer one copy of all those files while deleting any duplicates from the source, leaving one original in the source and the copy on the travel drive. This is for two different systems, which is why I have two similar requests.

I would like to be able to limit the operation to particular file types (either by detecting the file type from its content or by its .xyz extension), and the files should be hash-checked to detect duplicates.

My needs combine duplicate-file finding with an automated transfer/copy onto another medium, preferably all in one step.

Is there such a tool out there, or how would I do this from the command line?










Tags: linux ubuntu files






asked 2 days ago by Marc · edited 2 days ago by Rui F Ribeiro




I don't know of anything since Picasa that has this kind of functionality. (And AFAIK Picasa was Windows anyway.) Iterate over your saved files building a list of checksums (and pathnames). Each time you add a file, append its data to this file. Then, to check if you have a duplicate file, generate its checksum and search for that in your list.
– roaima
2 days ago
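
A minimal sketch of that checksum-index approach, assuming sha256sum is available and using a plain-text index file (all paths and names here are illustrative):

# Build an index of checksums and pathnames for everything saved so far.
find "$HOME/saved" -type f -exec sha256sum {} + > checksums.txt

# To test whether a new file is a duplicate, compute its checksum and
# look it up in the index (match on the hash field only, since the
# pathname will differ).
sum=$(sha256sum "newfile.jpg" | awk '{print $1}')
if grep -q "^$sum " checksums.txt; then
    echo "duplicate"
else
    echo "new file"
    sha256sum "newfile.jpg" >> checksums.txt   # append it to the index
fi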


















1 Answer
One idea would be to use a combination of fdupes and rsync.




  1. Create a staging area containing all files considered for the transfer, using rsync.

  2. Delete all but one copy of each set of duplicates, in the staging area only.

  3. Transfer the remaining files in the staging area to their destination, again using rsync.


To do this, we need three locations:




  1. Originals, a directory path in $origdir.

  2. Staging area, a directory path in $stagingdir.

  3. Destination, a local or remote path in $destdir.


First, create the staging area (this assumes that the staging area does not already exist, or if it does, that it only contains things that should be transferred):



rsync --archive --verbose --link-dest="$origdir" \
    --include="*.jpg" --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"


This would copy all files whose names end in .jpg to the staging area by creating hard links to them from their original locations. Only the space needed for the directory structure is used; the file data itself is not duplicated (unless $stagingdir and $origdir are located on two different partitions, in which case real copies are made). To add other filename patterns, add more --include options (before the --exclude), as in the sketch below.
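
For example, a variant that also stages some video and document types (the extensions here are only illustrative; substitute whichever ones you need):

rsync --archive --verbose --link-dest="$origdir" \
    --include="*.jpg" --include="*.png" --include="*.mp4" \
    --include="*.pdf" --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"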



Then run fdupes over $stagingdir:



fdupes --delete --recurse "$stagingdir"


This will interactively ask you for confirmation before removing anything. There is also a --noprompt option that removes the files without confirmation; please read the fdupes manual carefully before using it. The files under $origdir are not affected by deleting files from the staging area, since each staged file is only an extra hard link to the same data.
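
For a non-interactive run that keeps the first file found in each set of duplicates and deletes the rest (a sketch; try it on test data before trusting it):

fdupes --recurse --delete --noprompt "$stagingdir"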



Then delete empty directories from the staging directory (this is a bonus step that just cleans things up a bit):



find "$stagingdir" -type d -empty -delete -print


This would go through the entire staging area and delete any empty directory. Any deleted directory would be printed after successful deletion.



And finally transfer the non-duplicates:



rsync --archive --verbose "$stagingdir/" "$destdir"


This process retains the original directory structure for the files that match the patterns used in the first rsync and that are still in place after fdupes has removed the duplicates.
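
For convenience, the whole procedure could be wrapped in a small script. This is only a sketch: the three paths are placeholders you would replace, it assumes rsync and fdupes are installed, and it runs fdupes non-interactively, so test it on sample data first.

#!/bin/sh
# Sketch: stage, de-duplicate, transfer.

origdir="$HOME/data"            # source tree (placeholder)
stagingdir="$HOME/staging"      # staging area, same filesystem as $origdir
destdir="/mnt/travel/backup"    # mounted travel drive (placeholder)

# 1. Stage matching files as hard links (no extra space for file data).
rsync --archive --verbose --link-dest="$origdir" \
    --include="*.jpg" --include="*/" --exclude="*" \
    "$origdir/" "$stagingdir"

# 2. Remove duplicates in the staging area only, keeping one per set.
fdupes --recurse --delete --noprompt "$stagingdir"

# 3. Remove any directories left empty by the de-duplication.
find "$stagingdir" -type d -empty -delete -print

# 4. Transfer the de-duplicated files to the destination.
rsync --archive --verbose "$stagingdir/" "$destdir"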






answered 2 days ago by Kusalananda · edited 2 days ago





















