Parallelise rsync using GNU Parallel

Question (10 votes), asked Mar 13 '15 at 6:51 by Mandar Shinde:
I have been using an rsync script to synchronize data on one host with data on another host. The data consists of numerous small files that add up to almost 1.2 TB.



In order to sync those files, I have been using the rsync command as follows:



rsync -avzm --stats --human-readable --include-from proj.lst /data/projects REMOTEHOST:/data/


The contents of proj.lst are as follows:



+ proj1
+ proj1/*
+ proj1/*/*
+ proj1/*/*/*.tar
+ proj1/*/*/*.pdf
+ proj2
+ proj2/*
+ proj2/*/*
+ proj2/*/*/*.tar
+ proj2/*/*/*.pdf
...
...
...
- *


As a test, I picked two of those projects (8.5 GB of data) and executed the command above. Being a sequential process, it took 14 minutes 58 seconds to complete. So, for 1.2 TB of data it would take several hours.



If I could run multiple rsync processes in parallel (using &, xargs or parallel), it would save time.



I tried the command below with parallel (after cd-ing to the source directory), and it took 12 minutes 37 seconds to execute:



parallel --will-cite -j 5 rsync -avzm --stats --human-readable {} REMOTEHOST:/data/ ::: .


This should have taken one fifth of the time, but it didn't. I think I'm going wrong somewhere.
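For reference, a minimal sketch (not part of the original attempt; the proj* glob is just an assumed layout) of why -j 5 has nothing to work on here: ::: . supplies exactly one argument, so parallel only ever starts a single rsync job. parallel's --dry-run prints the command lines instead of running them:

# With ::: . parallel receives a single argument ('.'), so only one job runs:
parallel --will-cite --dry-run -j 5 rsync -avzm {} REMOTEHOST:/data/ ::: .

# To fan out, give parallel one argument per job, e.g. one per project directory
# (the proj* glob is hypothetical):
ls -d proj*/ | parallel --will-cite --dry-run -j 5 rsync -avzm {} REMOTEHOST:/data/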



How can I run multiple rsync processes in order to reduce the execution time?

Tags: linux, rhel, rsync, gnu-parallel
  • Are you limited by network bandwidth? Disk iops? Disk bandwidth? – Ole Tange, Mar 13 '15 at 7:25
  • If possible, we would want to use 50% of total bandwidth. But, parallelising multiple rsyncs is our first priority. – Mandar Shinde, Mar 13 '15 at 7:32
  • Can you let us know your: Network bandwidth, disk iops, disk bandwidth, and the bandwidth actually used? – Ole Tange, Mar 13 '15 at 7:41
  • In fact, I do not know about the above parameters. For the time being, we can neglect the optimization part. Multiple rsyncs in parallel is the primary focus now. – Mandar Shinde, Mar 13 '15 at 7:47
  • No point in going parallel if the limitation isn't the CPU. It can/will even make matters worse (conflicting disk arm movements on source or target disk). – xenoid, Nov 22 at 15:55
6 Answers
Accepted answer (11 votes) – Mandar Shinde, answered Apr 11 '15 at 13:53:

The following steps did the job for me:




  1. Run rsync with the --dry-run option first, in order to get the list of files that would be affected.


rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log




  2. Then I fed the contents of /tmp/transfer.log to parallel in order to run 5 rsyncs in parallel, as follows:


cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log



Here, the --relative option (link) ensures that the directory structure of the affected files, at the source and at the destination, remains the same (inside the /data/ directory), so the command must be run in the source folder (in this example, /data/projects).
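Taken together, a minimal sketch of these two steps as one script (the paths, options, and -j 5 job count are the ones used above; note the caveat in the comments below about non-file lines in the transfer log):

#!/bin/sh
# Sketch of the two steps above; assumed to be run from the source folder as described.
# 1. Dry run: collect the list of files that would be transferred.
rsync -avzm --stats --safe-links --ignore-existing --dry-run \
    --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log

# 2. Feed that list to parallel; --relative keeps the directory structure intact.
parallel --will-cite -j 5 \
    rsync -avzm --relative --stats --safe-links --ignore-existing \
        --human-readable {} REMOTE-HOST:/data/ < /tmp/transfer.log > result.log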
  • That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them: rm backups.*; split -l 3000 backup.list backups.; ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/ (see the sketch after these comments). – Sandip Bhattacharya, Nov 17 '16 at 21:22
  • How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done and created directory /data/. – Mike D, Sep 19 '17 at 16:42
  • On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args on the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them. – Cheetah, Oct 12 '17 at 5:31
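A minimal sketch of the split + --files-from approach from the first comment above (the 3000-line chunk size and -j 5 job count are from that comment; the paths are placeholders):

#!/bin/sh
# Split the file list into 3000-line chunks and run one rsync per chunk, 5 at a time.
# backup.list is assumed to hold one path per line, relative to /LOCAL/PARENT/PATH/.
rm -f backups.*
split -l 3000 backup.list backups.

ls backups.* | parallel --line-buffer --verbose -j 5 \
    rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/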








Answer (7 votes) – Mikhail, answered Apr 10 '17 at 3:28:
I would strongly discourage anybody from using the accepted answer; a better solution is to crawl the top-level directory and launch a proportional number of rsync operations.



I have a large ZFS volume and my source was a CIFS mount. Both are linked with 10G, and in some benchmarks can saturate the link. Performance was evaluated using zpool iostat 1.



The source drive was mounted like:



mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0


Using a single rsync process:



rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod


the I/O meter reads:



StoragePod  30.0T   144T      0  1.61K      0   130M
StoragePod  30.0T   144T      0  1.61K      0   130M
StoragePod  30.0T   144T      0  1.62K      0   130M


In synthetic benchmarks (CrystalDiskMark), sequential-write performance approaches 900 MB/s, which means the link is saturated. 130 MB/s is not very good, and it is the difference between waiting a weekend and waiting two weeks.



So, I built the file list and tried to run the sync again (I have a 64-core machine):



cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log


and it had the same performance!



StoragePod  29.9T   144T      0  1.63K      0   130M
StoragePod  29.9T   144T      0  1.62K      0   130M
StoragePod  29.9T   144T      0  1.56K      0   129M


As an alternative, I simply ran rsync on the root folders:



rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell


This actually boosted performance:



StoragePod  30.1T   144T     13  3.66K   112K   343M
StoragePod  30.1T   144T     24  5.11K   184K   469M
StoragePod  30.1T   144T     25  4.30K   196K   373M


In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the directories and parallelise over those. Alternatively, pass a file list to rsync. But don't create a new rsync instance for each file.
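A minimal sketch of that idea, one rsync per top-level directory (the source and destination paths are the ones from this answer; the -j 16 job count and the find invocation are assumptions):

#!/bin/sh
# Launch one rsync per immediate subdirectory of the source, 16 at a time.
SRC=/mnt/Datahoarder_Mount/Mikhail
DST=/StoragePod

find "$SRC" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
    parallel --will-cite -j 16 rsync -h -r -P -t "$SRC/{}/" "$DST/{}/"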
Answer (6 votes) – Julien Palard, answered May 25 '16 at 14:15:
I personally use this simple one:

ls -1 | parallel rsync -a {} /destination/directory/

This is only useful when you have more than a few non-nearly-empty directories; otherwise you'll end up with almost every rsync terminating and the last one doing all the work alone.
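A slightly more defensive variant of the same idea (a sketch; the explicit -j 8 job count and the use of find instead of ls are assumptions, not part of the original answer):

# Same idea, but with NUL-separated directory names and an explicit job count,
# so unusual file names and the default job count are not a concern.
find . -mindepth 1 -maxdepth 1 -type d -print0 |
    parallel -0 -j 8 rsync -a {} /destination/directory/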
Answer (4 votes) – Ole Tange, answered Mar 13 '15 at 7:25:
A tested way to do the parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync

rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.

The following will start one rsync per big file in src-dir to dest-dir on the server fooserver:

cd src-dir; find . -type f -size +100000 | \
  parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; \
    rsync -s -Havessh {} fooserver:/dest-dir/{}

The directories created may end up with wrong permissions and smaller files are not transferred. To fix those, run rsync a final time:

rsync -Havessh src-dir/ fooserver:/dest-dir/

If you are unable to push data, but need to pull it, and the files are called digits.png (e.g. 000000.png), you might be able to do:

seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/






  • Any other alternative in order to avoid find? – Mandar Shinde, Mar 13 '15 at 7:34
  • Limit the -maxdepth of find. – Ole Tange, Mar 17 '15 at 9:20
  • If I use the --dry-run option in rsync, I would have a list of files that would be transferred. Can I provide that file list to parallel in order to parallelise the process? – Mandar Shinde, Apr 10 '15 at 3:47
  • cat files | parallel -v ssh fooserver mkdir -p /dest-dir/{//}\; rsync -s -Havessh {} fooserver:/dest-dir/{} – Ole Tange, Apr 10 '15 at 5:51
  • Can you please explain the mkdir -p /dest-dir/{//}; part? Especially the {//} thing is a bit confusing (see the sketch after these comments). – Mandar Shinde, Apr 10 '15 at 9:49
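On the {//} question above: in GNU parallel, {} is the whole input line and {//} is the input line with its last component removed (its dirname), so mkdir -p /dest-dir/{//} recreates a file's parent directory on the remote before the file itself is rsynced. A minimal illustration (the file name is made up); --dry-run makes parallel print the composed command instead of running it:

# Input line:  ./proj1/sub/archive.tar
#   {}   ->    ./proj1/sub/archive.tar
#   {//} ->    ./proj1/sub
echo ./proj1/sub/archive.tar |
    parallel --dry-run ssh fooserver mkdir -p /dest-dir/{//}\; \
        rsync -s -Havessh {} fooserver:/dest-dir/{}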
Answer (0 votes) – ingopingo, answered Apr 10 '17 at 6:37:
For multi-destination syncs, I am using:

parallel rsync -avi /path/to/source ::: host1: host2: host3:

Hint: all ssh connections are established with public keys in ~/.ssh/authorized_keys.
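A small variation of the same command (a sketch; --tag is not part of the original answer) that prefixes each output line with the destination it came from, which helps when transfers to several hosts interleave:

parallel --tag rsync -avi /path/to/source ::: host1: host2: host3: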






Answer (0 votes) – Sebastjanas (new contributor):
I always google for parallel rsync, as I always forget the full command, but no solution worked the way I wanted: either it involved multiple steps or required installing parallel. I ended up using this one-liner to sync multiple folders:

find dir/ -type d|xargs -P 5 -I % sh -c 'rsync -a --delete --bwlimit=50000 $(echo dir/%/ host:/dir/%/)'

-P 5 is the number of processes you want to spawn; use 0 for unlimited (obviously not recommended).

--bwlimit to avoid using all bandwidth.

-I % is the argument provided by find (the directory found in dir/).

$(echo dir/%/ host:/dir/%/) prints the source and destination directories, which rsync reads as its arguments; % is replaced by xargs with the directory name found by find.

Let's assume I have two directories in /home: dir1 and dir2. I run find /home -type d|xargs -P 5 -I % sh -c 'rsync -a --delete --bwlimit=50000 $(echo /home/%/ host:/home/%/)'. So rsync will run as two processes (two, because /home has two directories) with the following arguments:

rsync -a --delete --bwlimit=50000 /home/dir1/ host:/home/dir1/
rsync -a --delete --bwlimit=50000 /home/dir2/ host:/home/dir2/
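Note that find prints the full path it finds (including dir/ itself and any nested subdirectories), not just the name, so the prefixes in the one-liner can double up. A minimal sketch of a variant that feeds only the top-level directory names (the depth limits and -printf format are assumptions, not part of the original answer):

# One rsync per top-level directory under /home, five at a time,
# substituting the bare directory name into both the source and the destination.
find /home -mindepth 1 -maxdepth 1 -type d -printf '%f\n' |
    xargs -P 5 -I % rsync -a --delete --bwlimit=50000 /home/%/ host:/home/%/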





  • OK, can you explain $(echo dir/%/ host:/dir/%/) now? Please do not respond in comments; edit your answer to make it clearer and more complete. – Scott, Nov 22 at 16:16











        Your Answer








        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "106"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        convertImagesToLinks: false,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: null,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














         

        draft saved


        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f189878%2fparallelise-rsync-using-gnu-parallel%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        6 Answers
        6






        active

        oldest

        votes








        6 Answers
        6






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes








        up vote
        11
        down vote



        accepted










        Following steps did the job for me:




        1. Run the rsync --dry-run first in order to get the list of files those would be affected.


        rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log




        1. I fed the output of cat transfer.log to parallel in order to run 5 rsyncs in parallel, as follows:


        cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log



        Here, --relative option (link) ensured that the directory structure for the affected files, at the source and destination, remains the same (inside /data/ directory), so the command must be run in the source folder (in example, /data/projects).






        share|improve this answer



















        • 4




          That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them. rm backups.* split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
          – Sandip Bhattacharya
          Nov 17 '16 at 21:22












        • How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done created directory /data/.
          – Mike D
          Sep 19 '17 at 16:42






        • 1




          On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them.
          – Cheetah
          Oct 12 '17 at 5:31















        up vote
        11
        down vote



        accepted










        Following steps did the job for me:




        1. Run the rsync --dry-run first in order to get the list of files those would be affected.


        rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log




        1. I fed the output of cat transfer.log to parallel in order to run 5 rsyncs in parallel, as follows:


        cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log



        Here, --relative option (link) ensured that the directory structure for the affected files, at the source and destination, remains the same (inside /data/ directory), so the command must be run in the source folder (in example, /data/projects).






        share|improve this answer



















        • 4




          That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them. rm backups.* split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
          – Sandip Bhattacharya
          Nov 17 '16 at 21:22












        • How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done created directory /data/.
          – Mike D
          Sep 19 '17 at 16:42






        • 1




          On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them.
          – Cheetah
          Oct 12 '17 at 5:31













        up vote
        11
        down vote



        accepted







        up vote
        11
        down vote



        accepted






        Following steps did the job for me:




        1. Run the rsync --dry-run first in order to get the list of files those would be affected.


        rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log




        1. I fed the output of cat transfer.log to parallel in order to run 5 rsyncs in parallel, as follows:


        cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log



        Here, --relative option (link) ensured that the directory structure for the affected files, at the source and destination, remains the same (inside /data/ directory), so the command must be run in the source folder (in example, /data/projects).






        share|improve this answer














        Following steps did the job for me:




        1. Run the rsync --dry-run first in order to get the list of files those would be affected.


        rsync -avzm --stats --safe-links --ignore-existing --dry-run --human-readable /data/projects REMOTE-HOST:/data/ > /tmp/transfer.log




        1. I fed the output of cat transfer.log to parallel in order to run 5 rsyncs in parallel, as follows:


        cat /tmp/transfer.log | parallel --will-cite -j 5 rsync -avzm --relative --stats --safe-links --ignore-existing --human-readable {} REMOTE-HOST:/data/ > result.log



        Here, --relative option (link) ensured that the directory structure for the affected files, at the source and destination, remains the same (inside /data/ directory), so the command must be run in the source folder (in example, /data/projects).







        share|improve this answer














        share|improve this answer



        share|improve this answer








        edited Apr 13 '17 at 12:36









        Community

        1




        1










        answered Apr 11 '15 at 13:53









        Mandar Shinde

        1,40782747




        1,40782747








        • 4




          That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them. rm backups.* split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
          – Sandip Bhattacharya
          Nov 17 '16 at 21:22












        • How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done created directory /data/.
          – Mike D
          Sep 19 '17 at 16:42






        • 1




          On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them.
          – Cheetah
          Oct 12 '17 at 5:31














        • 4




          That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them. rm backups.* split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
          – Sandip Bhattacharya
          Nov 17 '16 at 21:22












        • How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done created directory /data/.
          – Mike D
          Sep 19 '17 at 16:42






        • 1




          On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them.
          – Cheetah
          Oct 12 '17 at 5:31








        4




        4




        That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them. rm backups.* split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
        – Sandip Bhattacharya
        Nov 17 '16 at 21:22






        That would do an rsync per file. It would probably be more efficient to split up the whole file list using split and feed those filenames to parallel. Then use rsync's --files-from to get the filenames out of each file and sync them. rm backups.* split -l 3000 backup.list backups. ls backups.* | parallel --line-buffer --verbose -j 5 rsync --progress -av --files-from {} /LOCAL/PARENT/PATH/ REMOTE_HOST:REMOTE_PATH/
        – Sandip Bhattacharya
        Nov 17 '16 at 21:22














        How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done created directory /data/.
        – Mike D
        Sep 19 '17 at 16:42




        How does the second rsync command handle the lines in result.log that are not files? i.e. receiving file list ... done created directory /data/.
        – Mike D
        Sep 19 '17 at 16:42




        1




        1




        On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them.
        – Cheetah
        Oct 12 '17 at 5:31




        On newer versions of rsync (3.1.0+), you can use --info=name in place of -v, and you'll get just the names of the files and directories. You may want to use --protect-args to the 'inner' transferring rsync too if any files might have spaces or shell metacharacters in them.
        – Cheetah
        Oct 12 '17 at 5:31












        up vote
        7
        down vote













        I would strongly discourage anybody from using the accepted answer, a better solution is to crawl the top level directory and launch a proportional number of rync operations.



        I have a large zfs volume and my source was was a cifs mount. Both are linked with 10G, and in some benchmarks can saturate the link. Performance was evaluated using zpool iostat 1.



        The source drive was mounted like:



        mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0


        Using a single rsync process:



        rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod


        the io meter reads:



        StoragePod  30.0T   144T      0  1.61K      0   130M
        StoragePod 30.0T 144T 0 1.61K 0 130M
        StoragePod 30.0T 144T 0 1.62K 0 130M


        This in synthetic benchmarks (crystal disk), performance for sequential write approaches 900 MB/s which means the link is saturated. 130MB/s is not very good, and the difference between waiting a weekend and two weeks.



        So, I built the file list and tried to run the sync again (I have a 64 core machine):



        cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log


        and it had the same performance!



        StoragePod  29.9T   144T      0  1.63K      0   130M
        StoragePod 29.9T 144T 0 1.62K 0 130M
        StoragePod 29.9T 144T 0 1.56K 0 129M


        As an alternative I simply ran rsync on the root folders:



        rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
        rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
        rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
        rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell


        This actually boosted performance:



        StoragePod  30.1T   144T     13  3.66K   112K   343M
        StoragePod 30.1T 144T 24 5.11K 184K 469M
        StoragePod 30.1T 144T 25 4.30K 196K 373M


        In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the directories and parallel that. Alternatively, pass a file list to rsync. But don't create new instances for each file.






        share|improve this answer

























          up vote
          7
          down vote













          I would strongly discourage anybody from using the accepted answer, a better solution is to crawl the top level directory and launch a proportional number of rync operations.



          I have a large zfs volume and my source was was a cifs mount. Both are linked with 10G, and in some benchmarks can saturate the link. Performance was evaluated using zpool iostat 1.



          The source drive was mounted like:



          mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0


          Using a single rsync process:



          rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod


          the io meter reads:



          StoragePod  30.0T   144T      0  1.61K      0   130M
          StoragePod 30.0T 144T 0 1.61K 0 130M
          StoragePod 30.0T 144T 0 1.62K 0 130M


          This in synthetic benchmarks (crystal disk), performance for sequential write approaches 900 MB/s which means the link is saturated. 130MB/s is not very good, and the difference between waiting a weekend and two weeks.



          So, I built the file list and tried to run the sync again (I have a 64 core machine):



          cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log


          and it had the same performance!



          StoragePod  29.9T   144T      0  1.63K      0   130M
          StoragePod 29.9T 144T 0 1.62K 0 130M
          StoragePod 29.9T 144T 0 1.56K 0 129M


          As an alternative I simply ran rsync on the root folders:



          rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
          rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
          rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
          rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell


          This actually boosted performance:



          StoragePod  30.1T   144T     13  3.66K   112K   343M
          StoragePod 30.1T 144T 24 5.11K 184K 469M
          StoragePod 30.1T 144T 25 4.30K 196K 373M


          In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the directories and parallel that. Alternatively, pass a file list to rsync. But don't create new instances for each file.






          share|improve this answer























            up vote
            7
            down vote










            up vote
            7
            down vote









            I would strongly discourage anybody from using the accepted answer, a better solution is to crawl the top level directory and launch a proportional number of rync operations.



            I have a large zfs volume and my source was was a cifs mount. Both are linked with 10G, and in some benchmarks can saturate the link. Performance was evaluated using zpool iostat 1.



            The source drive was mounted like:



            mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0


            Using a single rsync process:



            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod


            the io meter reads:



            StoragePod  30.0T   144T      0  1.61K      0   130M
            StoragePod 30.0T 144T 0 1.61K 0 130M
            StoragePod 30.0T 144T 0 1.62K 0 130M


            This in synthetic benchmarks (crystal disk), performance for sequential write approaches 900 MB/s which means the link is saturated. 130MB/s is not very good, and the difference between waiting a weekend and two weeks.



            So, I built the file list and tried to run the sync again (I have a 64 core machine):



            cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log


            and it had the same performance!



            StoragePod  29.9T   144T      0  1.63K      0   130M
            StoragePod 29.9T 144T 0 1.62K 0 130M
            StoragePod 29.9T 144T 0 1.56K 0 129M


            As an alternative I simply ran rsync on the root folders:



            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell


            This actually boosted performance:



            StoragePod  30.1T   144T     13  3.66K   112K   343M
            StoragePod 30.1T 144T 24 5.11K 184K 469M
            StoragePod 30.1T 144T 25 4.30K 196K 373M


            In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the directories and parallel that. Alternatively, pass a file list to rsync. But don't create new instances for each file.






            share|improve this answer












            I would strongly discourage anybody from using the accepted answer, a better solution is to crawl the top level directory and launch a proportional number of rync operations.



            I have a large zfs volume and my source was was a cifs mount. Both are linked with 10G, and in some benchmarks can saturate the link. Performance was evaluated using zpool iostat 1.



            The source drive was mounted like:



            mount -t cifs -o username=,password= //static_ip/70tb /mnt/Datahoarder_Mount/ -o vers=3.0


            Using a single rsync process:



            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/ /StoragePod


            the io meter reads:



            StoragePod  30.0T   144T      0  1.61K      0   130M
            StoragePod 30.0T 144T 0 1.61K 0 130M
            StoragePod 30.0T 144T 0 1.62K 0 130M


            This in synthetic benchmarks (crystal disk), performance for sequential write approaches 900 MB/s which means the link is saturated. 130MB/s is not very good, and the difference between waiting a weekend and two weeks.



            So, I built the file list and tried to run the sync again (I have a 64 core machine):



            cat /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount.log | parallel --will-cite -j 16 rsync -avzm --relative --stats --safe-links --size-only --human-readable {} /StoragePod/ > /home/misha/Desktop/rsync_logs_syncs/Datahoarder_Mount_result.log


            and it had the same performance!



            StoragePod  29.9T   144T      0  1.63K      0   130M
            StoragePod 29.9T 144T 0 1.62K 0 130M
            StoragePod 29.9T 144T 0 1.56K 0 129M


            As an alternative I simply ran rsync on the root folders:



            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/Marcello_zinc_bone /StoragePod/Marcello_zinc_bone
            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/fibroblast_growth /StoragePod/fibroblast_growth
            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/QDIC /StoragePod/QDIC
            rsync -h -v -r -P -t /mnt/Datahoarder_Mount/Mikhail/sexy_dps_cell /StoragePod/sexy_dps_cell


            This actually boosted performance:



            StoragePod  30.1T   144T     13  3.66K   112K   343M
            StoragePod 30.1T 144T 24 5.11K 184K 469M
            StoragePod 30.1T 144T 25 4.30K 196K 373M


            In conclusion, as @Sandip Bhattacharya brought up, write a small script to get the directories and parallel that. Alternatively, pass a file list to rsync. But don't create new instances for each file.







            share|improve this answer












            share|improve this answer



            share|improve this answer










            answered Apr 10 '17 at 3:28









            Mikhail

            17013




            17013






















                up vote
                6
                down vote













                I personally use this simple one:



                ls -1 | parallel rsync -a {} /destination/directory/


                Which only is usefull when you have more than a few non-near-empty directories, else you'll end up having almost every rsync terminating and the last one doing all the job alone.






                share|improve this answer

























                  up vote
                  6
                  down vote













                  I personally use this simple one:



                  ls -1 | parallel rsync -a {} /destination/directory/


                  Which only is usefull when you have more than a few non-near-empty directories, else you'll end up having almost every rsync terminating and the last one doing all the job alone.






                  share|improve this answer























                    up vote
                    6
                    down vote










                    up vote
                    6
                    down vote









                    I personally use this simple one:



                    ls -1 | parallel rsync -a {} /destination/directory/


                    Which only is usefull when you have more than a few non-near-empty directories, else you'll end up having almost every rsync terminating and the last one doing all the job alone.






                    share|improve this answer












                    I personally use this simple one:



                    ls -1 | parallel rsync -a {} /destination/directory/


                    Which only is usefull when you have more than a few non-near-empty directories, else you'll end up having almost every rsync terminating and the last one doing all the job alone.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered May 25 '16 at 14:15









                    Julien Palard

                    26635




                    26635






















                        up vote
                        4
                        down vote













                        A tested way to do the parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync




                        rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.



                        The following will start one rsync per big file in src-dir to dest-dir
                        on the server fooserver:



                        cd src-dir; find . -type f -size +100000 | 
                        parallel -v ssh fooserver mkdir -p /dest-dir/{//};
                        rsync -s -Havessh {} fooserver:/dest-dir/{}


                        The directories created may end up with wrong permissions and smaller files are not being transferred. To fix those run rsync a final time:



                        rsync -Havessh src-dir/ fooserver:/dest-dir/ 


                        If you are unable to
                        push data, but need to pull them and the files are called digits.png
                        (e.g. 000000.png) you might be able to do:



                        seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/






                        share|improve this answer























                        • Any other alternative in order to avoid find?
                          – Mandar Shinde
                          Mar 13 '15 at 7:34






                        • 1




                          Limit the -maxdepth of find.
                          – Ole Tange
                          Mar 17 '15 at 9:20










                        • If I use --dry-run option in rsync, I would have a list of files that would be transferred. Can I provide that file list to parallel in order to parallelise the process?
                          – Mandar Shinde
                          Apr 10 '15 at 3:47






                        • 1




                          cat files | parallel -v ssh fooserver mkdir -p /dest-dir/{//}; rsync -s -Havessh {} fooserver:/dest-dir/{}
                          – Ole Tange
                          Apr 10 '15 at 5:51










                        • Can you please explain the mkdir -p /dest-dir/{//}; part? Especially the {//} thing is a bit confusing.
                          – Mandar Shinde
                          Apr 10 '15 at 9:49















                        up vote
                        4
                        down vote













                        A tested way to do the parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync




                        rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.



                        The following will start one rsync per big file in src-dir to dest-dir
                        on the server fooserver:



                        cd src-dir; find . -type f -size +100000 | 
                        parallel -v ssh fooserver mkdir -p /dest-dir/{//};
                        rsync -s -Havessh {} fooserver:/dest-dir/{}


                        The directories created may end up with wrong permissions and smaller files are not being transferred. To fix those run rsync a final time:



                        rsync -Havessh src-dir/ fooserver:/dest-dir/ 


                        If you are unable to
                        push data, but need to pull them and the files are called digits.png
                        (e.g. 000000.png) you might be able to do:



                        seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/






                        share|improve this answer























                        • Any other alternative in order to avoid find?
                          – Mandar Shinde
                          Mar 13 '15 at 7:34






                        • 1




                          Limit the -maxdepth of find.
                          – Ole Tange
                          Mar 17 '15 at 9:20










                        • If I use --dry-run option in rsync, I would have a list of files that would be transferred. Can I provide that file list to parallel in order to parallelise the process?
                          – Mandar Shinde
                          Apr 10 '15 at 3:47






                        • 1




                          cat files | parallel -v ssh fooserver mkdir -p /dest-dir/{//}; rsync -s -Havessh {} fooserver:/dest-dir/{}
                          – Ole Tange
                          Apr 10 '15 at 5:51










                        • Can you please explain the mkdir -p /dest-dir/{//}; part? Especially the {//} thing is a bit confusing.
                          – Mandar Shinde
                          Apr 10 '15 at 9:49













                        up vote
                        4
                        down vote










                        up vote
                        4
                        down vote









                        A tested way to do the parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync




                        rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.



                        The following will start one rsync per big file in src-dir to dest-dir
                        on the server fooserver:



                        cd src-dir; find . -type f -size +100000 | 
                        parallel -v ssh fooserver mkdir -p /dest-dir/{//};
                        rsync -s -Havessh {} fooserver:/dest-dir/{}


                        The directories created may end up with wrong permissions and smaller files are not being transferred. To fix those run rsync a final time:



                        rsync -Havessh src-dir/ fooserver:/dest-dir/ 


                        If you are unable to
                        push data, but need to pull them and the files are called digits.png
                        (e.g. 000000.png) you might be able to do:



                        seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/






                        share|improve this answer














                        A tested way to do the parallelized rsync is: http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Parallelizing-rsync




                        rsync is a great tool, but sometimes it will not fill up the available bandwidth. This is often a problem when copying several big files over high speed connections.



                        The following will start one rsync per big file in src-dir to dest-dir
                        on the server fooserver:



                        cd src-dir; find . -type f -size +100000 | 
                        parallel -v ssh fooserver mkdir -p /dest-dir/{//};
                        rsync -s -Havessh {} fooserver:/dest-dir/{}


                        The directories created may end up with wrong permissions and smaller files are not being transferred. To fix those run rsync a final time:



                        rsync -Havessh src-dir/ fooserver:/dest-dir/ 


                        If you are unable to
                        push data, but need to pull them and the files are called digits.png
                        (e.g. 000000.png) you might be able to do:



                        seq -w 0 99 | parallel rsync -Havessh fooserver:src/*{}.png destdir/







                        share|improve this answer














                        share|improve this answer



                        share|improve this answer








                        edited Dec 11 '17 at 7:04









                        Ryan Long

                        1032




                        1032










                        answered Mar 13 '15 at 7:25









                        Ole Tange

                        11.8k1448105




                        11.8k1448105












                        • Any other alternative in order to avoid find?
                          – Mandar Shinde
                          Mar 13 '15 at 7:34






                        • 1




                          Limit the -maxdepth of find.
                          – Ole Tange
                          Mar 17 '15 at 9:20










                        • If I use --dry-run option in rsync, I would have a list of files that would be transferred. Can I provide that file list to parallel in order to parallelise the process?
                          – Mandar Shinde
                          Apr 10 '15 at 3:47






                        • 1




                          cat files | parallel -v ssh fooserver mkdir -p /dest-dir/{//}; rsync -s -Havessh {} fooserver:/dest-dir/{}
                          – Ole Tange
                          Apr 10 '15 at 5:51










                        • Can you please explain the mkdir -p /dest-dir/{//}; part? Especially the {//} thing is a bit confusing.
                          – Mandar Shinde
                          Apr 10 '15 at 9:49
                        up vote
                        0
                        down vote

                        For multi-destination syncs, I am using

                        parallel rsync -avi /path/to/source ::: host1: host2: host3:

                        Hint: All ssh connections are established with public keys in ~/.ssh/authorized_keys.

                        share|improve this answer

                        answered Apr 10 '17 at 6:37 by ingopingo
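
                        For readers less familiar with parallel's ::: syntax: every argument after ::: becomes one job and is appended to the command template, so the one-liner above fans out into one rsync per destination (host1:, host2: and host3: are placeholder rsync destinations). It is roughly equivalent to running the following three commands, up to one job per CPU core at a time by default:

                        rsync -avi /path/to/source host1:
                        rsync -avi /path/to/source host2:
                        rsync -avi /path/to/source host3:
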
                                up vote
                                0
                                down vote

                                I always google for "parallel rsync" because I keep forgetting the full command, but none of the solutions worked the way I wanted - they either involve multiple steps or require installing parallel. I ended up using this one-liner to sync multiple folders:

                                find dir/ -type d | xargs -P 5 -I % sh -c 'rsync -a --delete --bwlimit=50000 $(echo dir/%/ host:/dir/%/)'

                                -P 5 is the number of processes to spawn - use 0 for unlimited (obviously not recommended).

                                --bwlimit limits bandwidth so the transfers do not saturate the link.

                                -I % is the placeholder substituted by xargs with each directory found by find in dir/.

                                $(echo dir/%/ host:/dir/%/) prints the source and destination directories, which are read by rsync as its arguments; % is replaced by xargs with the directory name found by find.

                                Let's assume I have two directories in /home: dir1 and dir2. I run find /home -type d | xargs -P 5 -I % sh -c 'rsync -a --delete --bwlimit=50000 $(echo /home/%/ host:/home/%/)'. rsync will then run as two processes (two, because /home has two directories) with the following arguments:

                                rsync -a --delete --bwlimit=50000 /home/dir1/ host:/home/dir1/
                                rsync -a --delete --bwlimit=50000 /home/dir2/ host:/home/dir2/

                                share|improve this answer

                                answered Nov 22 at 15:43 by Sebastjanas (new contributor)

                                • OK, can you explain $(echo dir/%/ host:/dir/%/) now?   Please do not respond in comments; edit your answer to make it clearer and more complete.
                                  – Scott
                                  Nov 22 at 16:16
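
                                Editor's note, since the comment above was never addressed in the post: % is substituted by xargs before sh -c runs, and as far as I can tell the $(echo source dest) wrapper only makes the shell word-split the echoed line into rsync's two arguments - writing the two paths directly appears to work just as well. Be aware, too, that find prints full paths (/home, /home/dir1, ...) rather than bare names like dir1, so the expansion shown in the answer assumes the output has already been reduced to top-level directory names. A sketch that makes those assumptions explicit, using GNU find/xargs and the placeholder host from the answer:

                                find /home -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | xargs -P 5 -I % rsync -a --delete --bwlimit=50000 /home/%/ host:/home/%/

                                -printf '%f\n' prints just the directory name, and -mindepth 1 -maxdepth 1 restricts the run to the immediate children of /home, so each of dir1 and dir2 gets its own rsync process.
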