Downloading nested PDF files with wget

I am trying to download dozens of PDF files located on pages linked from here:

http://machineknittingetc.com/passap.html?limit=all

Each PDF is referred to by a URL ending with /downloadable/download/sample/sample_id/[some three-digit number]/.

I have tried these:

wget -r -l2 -A.pdf http://machineknittingetc.com/passap.html?limit=all

wget -r -l2 -np http://machineknittingetc.com/passap.html?limit=all -A "*.pdf"

wget -r -l2 -np http://machineknittingetc.com/passap.html?limit=all -A "*.###"

None of these commands gets the PDFs.

Does it have something to do with the server not being indexed to allow me to access the URLs like a file hierarchy? Is there a way to make it work?

edited Jan 2 '17 at 8:05

Kusalananda


asked Jan 2 '17 at 7:47

Kallaste

That is much better, thank you.
– Kallaste
Jun 9 '17 at 20:02

Tags: linux, wget


3 Answers

@rajaganesh87
You are guessing at the directory link numbers, and your code does not work for the actual links reached from the base link http://machineknittingetc.com/passap.html?limit=all
or for the (.pdf) files that correspond to it.

The problem is that you're being blocked by the robots.txt file, and that you're using the dot (.) in

    -A .pdf

Try the command below, which I tested and which works.

    wget -np -nd -r -l2 -A pdf -e robots=off "http://machineknittingetc.com/passap.html?limit=all"

Hope this helps.
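If you want to check the robots.txt claim before turning robots handling off, something like the following might help. It is only a rough sketch: `disallow_rules_matching` is a helper name I made up, it ignores `User-agent` sections, `Allow` lines and wildcards, and I am assuming the site serves its rules at /robots.txt.

```shell
#!/bin/bash
# Hypothetical helper (not part of wget): read robots.txt text on stdin and
# print every Disallow rule whose path is a prefix of the path given as $1.
# Simplification: User-agent grouping, Allow lines and wildcards are ignored.
disallow_rules_matching() {
    local path=$1 rule
    sed -n 's/^[Dd]isallow:[[:space:]]*//p' | tr -d '\r' |
    while IFS= read -r rule; do
        # An empty Disallow line blocks nothing, so skip it.
        [ -n "$rule" ] || continue
        case $path in
            "$rule"*) printf '%s\n' "$rule" ;;
        esac
    done
}

# Possible usage (needs network access, so it is left commented out):
#   wget -q -O - "http://machineknittingetc.com/robots.txt" |
#       disallow_rules_matching "/downloadable/download/sample/sample_id/123/"
```

If this prints a rule, a recursive wget run that honours robots.txt would skip those URLs, which would be consistent with needing `-e robots=off`.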

answered May 26 '17 at 8:19

Jason Swartz


Does this work for you?

#!/bin/bash
# Try every zero-padded sample ID from 000 to 175.
for i in {000..175}
do
    wget "http://machineknittingetc.com/downloadable/download/sample/sample_id/$i"
done

answered Jan 2 '17 at 8:51

rajaganesh87


Yes, thanks! But it gets a lot more than the links on that page. Apparently the downloadable subpath has a lot of files. I will look for a range for the files I want (hopefully they are not randomly numbered) and see if I can alter it.
– Kallaste
Jan 2 '17 at 9:21

I really should have thought of that.
– Kallaste
Jan 2 '17 at 9:23

No, it seems the files I want are not numbered predictably. Some are consecutive but then they start to jump around. Short of passing it a list of each file path, this will not work. Is there no way to do it with the wget filters?
– Kallaste
Jan 2 '17 at 9:37

@Kallaste In that case, fetch the HTML with wget, grep it for the document numbers, and then download from that list.
– rajaganesh87
Jan 4 '17 at 11:26
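The fetch-then-grep idea from the comment above could be sketched roughly like this. It assumes the download links appear in the HTML as absolute URLs of the form described in the question; `extract_sample_urls` and `urls.txt` are names I picked for illustration. If the links only appear on the product pages linked from the listing, the same function would have to be run over each of those pages instead.

```shell
#!/bin/bash
# Hypothetical helper: read HTML on stdin and print the unique URLs that look
# like .../downloadable/download/sample/sample_id/<digits>/ (one per line).
# Assumption: the links are absolute http(s) URLs.
extract_sample_urls() {
    grep -oE 'https?://[^"[:space:]<>]*/downloadable/download/sample/sample_id/[0-9]+/?' |
        sort -u
}

# Possible usage (needs network access, so it is left commented out):
#   wget -q -O - "http://machineknittingetc.com/passap.html?limit=all" |
#       extract_sample_urls > urls.txt
#   wget --content-disposition -i urls.txt
```

`-i` makes wget read the URLs from the file, and `--content-disposition` asks it to use the filename the server suggests, if the server sends one, rather than the bare number from the URL.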


@rajaganesh87 Do you have a bash script that uses wget to download the PDF books for school, college and high school from the JSP page http://cnp.com.tn/CNP1/web/french/biblio/man-eleves.jsp?
Thank you for your help.

answered 1 hour ago

Karim Bn Abdlaziz

New contributor

If you have a new question, please ask it by clicking the Ask Question button. Include a link to this question if it helps provide context. - From Review
– Jeff Schaller
59 mins ago
