Downloading nested pdf files with wget












I am trying to download dozens of PDF files located on pages linked from here:



http://machineknittingetc.com/passap.html?limit=all



Each PDF is referred to by a URL ending with /downloadable/download/sample/sample_id/[some three digit number]/.



I have tried these:



wget -r -l2 -A.pdf http://machineknittingetc.com/passap.html?limit=all
wget -r -l2 -np http://machineknittingetc.com/passap.html?limit=all -A "*.pdf"
wget -r -l2 -np http://machineknittingetc.com/passap.html?limit=all -A "*.###"


It doesn't get the PDFs.



Does it have something to do with the server not being indexed to allow me to access the URLs like a file hierarchy? Is there a way to make it work?










linux wget






edited Jan 2 '17 at 8:05 by Kusalananda
asked Jan 2 '17 at 7:47 by Kallaste












That is much better, thank you.
– Kallaste
Jun 9 '17 at 20:02




3 Answers
@rajaganesh87, you are guessing at the ID numbers in the download links, and your code does not fetch the actual links that the base page http://machineknittingetc.com/passap.html?limit=all leads to, or the PDF files behind them.

The problem is that you are being blocked by the site's robots.txt file, and that you are using a leading dot (.) in

    -A .pdf

Try the command below, which I tested and which works:

    wget -np -nd -r -l2 -A pdf -e robots=off 'http://machineknittingetc.com/passap.html?limit=all'

Hope this helps.
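
For readers unfamiliar with the short flags, here is the same command spelled with wget's long options, wrapped in a small dry-run helper. This is only a sketch: the function name `crawl` and the variable `url` are made up for illustration and are not part of the answer.

```shell
#!/bin/sh
# Hypothetical helper, not from the answer: the same options in long form.
url='http://machineknittingetc.com/passap.html?limit=all'

# --no-parent          (-np)  never ascend above the starting path
# --no-directories     (-nd)  save every file into the current directory
# --recursive          (-r)   follow links found in downloaded pages
# --level=2            (-l2)  follow links at most two hops deep
# --accept=pdf         (-A)   keep only files whose names end in "pdf"
# --execute=robots=off (-e)   ignore the site's robots.txt
crawl() {
    # "$@" is the command to run: `wget` for real, `echo wget` for a dry run.
    "$@" --no-parent --no-directories --recursive --level=2 \
         --accept=pdf --execute=robots=off "$url"
}

# Dry run: print the command line instead of contacting the server.
crawl echo wget
```

Running `crawl wget` instead of `crawl echo wget` performs the actual download.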







answered May 26 '17 at 8:19 by Jason Swartz

























Does this work for you?

    #!/bin/bash
    # Try every zero-padded sample ID from 000 to 175.
    for i in {000..175}
    do
        wget "http://machineknittingetc.com/downloadable/download/sample/sample_id/$i"
    done
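
Before running a loop like this against the server, it may help to print the URLs it would request. The following is a minimal POSIX sh sketch of that idea. The function name `list_urls` is hypothetical, and the three-digit zero padding is an assumption carried over from the `{000..175}` range above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints URLs, downloads nothing.
base='http://machineknittingetc.com/downloadable/download/sample/sample_id'

# Print one URL per ID in the inclusive range $1..$2.
# Pass plain decimal numbers (7, not 007); padding is added here.
list_urls() {
    i=$1
    while [ "$i" -le "$2" ]; do
        printf '%s/%03d\n' "$base" "$i"
        i=$((i + 1))
    done
}

# Show the first three URLs of the range.
list_urls 0 2
```

Once the list looks right, it can be saved with `list_urls 0 175 > urls.txt` and passed to `wget -i urls.txt`.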






answered Jan 2 '17 at 8:51 by rajaganesh87












  • Yes, thanks! But it gets a lot more than the links on that page. Apparently the downloadable subpath has a lot of files. I will look for a range for the files I want (hopefully they are not randomly numbered) and see if I can alter it.
    – Kallaste
    Jan 2 '17 at 9:21

  • I really should have thought of that.
    – Kallaste
    Jan 2 '17 at 9:23

  • No, it seems the files I want are not numbered predictably. Some are consecutive but then they start to jump around. Short of passing it a list of each file path, this will not work. Is there no way to do it with the wget filters?
    – Kallaste
    Jan 2 '17 at 9:37

  • @Kallaste In that case, get the HTML using wget, grep it for the document numbers, and download again from that list.
    – rajaganesh87
    Jan 4 '17 at 11:26


















@rajaganesh87 Do you have a bash script that uses wget to download the PDF textbooks for school, college and high school from this JSP page: http://cnp.com.tn/CNP1/web/french/biblio/man-eleves.jsp
Thank you for your help.







answered 1 hour ago by Karim Bn Abdlaziz




  • If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context. - From Review
    – Jeff Schaller
    59 mins ago

















