Browser-like Reader Mode with text-only output












Background: Reader Mode, as seen in Safari and other browsers, uses heuristics to extract the main content of article-based web pages and displays it in a very readable font.

All navigation, headers, footers, and other fluff are removed. The mode only works with "articles", i.e. pages where there is a "main content" such as a news article, scientific paper, etc.

The question: Is there an open source implementation of this for terminals (i.e. text-only)? Or, alternatively, another way to accomplish the same thing?

Example: This article from The New York Times should output like so:



$ utility --reader-mode https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

SEND US YOUR IDEAS FOR WHAT TO DO DURING THE POLAR VORTEX. WE
WANT TO HEAR FROM YOU.

It’s so cold in much of the Midwest today that you could get
frostbite within five minutes once you step outside. If you’re
living through it indoors, give us your tips.

A commuter during an extremely light morning rush hour in Chicago
on Wednesday. Businesses and schools have closed as the city
copes with record low temperatures.

Across the Midwest, where wind chills were minus 51 in
Minneapolis and minus 45 in Chicago, the risks of going outside
on Wednesday were dire. So, many people simply didn’t bother,
while others took a chance to briefly experience the coldest
weather in a generation.

Whether you’re an adventurer or a hibernator, tell us your
recommendations for staying warm and busy. What are you cooking
or binge-watching? What board games are you playing? If you’re
venturing outside, what are you doing to stay safe? (Experts warn
that even a short time in the extreme cold can be very
dangerous.) How many layers of clothing are you wearing, and
which special hats and gloves are necessary? Send us your photos
and your stories.









  • determining the "main content" seems to me to be a tricky problem to solve

    – Jeff Schaller
    15 hours ago

  • Yes. This is "solved" in best effort with various implementations of "Reader Mode" using heuristics. So it would have to be a text-only port of that, or something similar. google.com/search?q=reader+mode+source+code

    – forthrin
    15 hours ago
















terminal browser






asked 20 hours ago, edited 12 mins ago – forthrin


















2 Answers














The comment about "navigation content" is addressed by the -nolist option, e.g.,



lynx -nolist -dump www.google.com > file.txt


which shows no links, etc:



$ lynx -nolist -dump www.google.com > file.txt
$ cat file.txt

Search Images Maps Play YouTube News Gmail Drive More »
Web History | Settings | Sign in

Google

_______________________________________________________
Google Search I'm Feeling Lucky Advanced search
Language tools

Advertising Programs Business Solutions +Google About
Google

© 2019 - Privacy - Terms


w3m gives something similar, without the option:



$ w3m -dump https://www.google.com
Search Images Maps Play YouTube News Gmail Drive More >>
Web History | Settings | Sign in

Google

[ ] Advanced
searchLanguage
[Google Search][I'm Feeling Lucky] tools

Advertising ProgramsBusiness Solutions+GoogleAbout Google

(C) 2019 - Privacy - Terms


links2 output looks much like w3m's (noting the missing space before About):



$ links2 -dump www.google.com                                          
Search Images Maps Play YouTube News Gmail Drive More >>========(97,1) 31% ==
Web History | Settings | Sign in
Google

__________________________________________________________ Advanced
[ Google Search ] [ I'm Feeling Lucky ] searchLanguage
tools

Advertising ProgramsBusiness Solutions+GoogleAbout Google

(c) 2019 - Privacy - Terms

$ links2 -dump www.google.com >file.txt
$ cat file.txt
Search Images Maps Play YouTube News Gmail Drive More >>
Web History | Settings | Sign in
Google

__________________________________________________________ Advanced
[ Google Search ] [ I'm Feeling Lucky ] searchLanguage
tools

Advertising ProgramsBusiness Solutions+GoogleAbout Google

(c) 2019 - Privacy - Terms


(Oddly enough, links2 also prints progress when the dump goes directly to the terminal, which is not a good feature.) elinks apparently only dumps the format with "navigation content" (ymmv).



From further comments, it turns out that OP is interested in something which could render the contents of a given division on the page. Comparing the sizes of the source and dump for that page gives some clues:




      Size Buffer name          Contents
   ------- -------------------- ----------------------------------------------------------------------------------------
0#  267624 [!lynx -source ht-1] !lynx -source https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html
1     5475 [!lynx -dump -nolis] !lynx -dump -nolist https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html


The listing shows that the dump is about 2% of the size of the source (5475 versus 267624). Most of the page is non-informational, and the text browsers do show the informational part. But the division requested is in a two-line chunk that looks like this (only the beginning; the first line actually has 62265 characters):



<div id="app"><div class="css-v89234 e3w10z60"><div><div><div class="css-13lpfd6 e1nre7570"><header class="css-1bymuyk e1>
<script>window.__preloadedData = {"initialState":{"Article:QXJ0aWNsZTpueXQ6Ly9hcnRpY2xlLzBhODc0MTcxLWM0MjEtNWRjOS1hN2IzLW>


The first line holds the article text (plus a lot of markup), and offhand, looking at the second line, that's probably the script which the GUI browsers detect to show the article. None of the above-mentioned text browsers has a feature for showing just a given <div>...</div>, or for interpreting a script in that manner. These articles mention the absence of a standard URI for reader mode in several GUI browsers:




  • Web Reading Mode: The non-standard rendering mode

  • Web Reading Mode: A bad reading experience
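
Since none of the text browsers above can show just one <div>...</div>, here is a rough sketch of doing that step outside the browser, using only Python's standard library. It is not a feature of lynx, w3m, links2 or elinks. The names DivText and div_text are made up for this illustration, and it assumes the markup is well nested (every non-void tag inside the target is closed), which real pages do not always guarantee.

```python
from html.parser import HTMLParser
import textwrap

# Elements that never have a closing tag; they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose contents are never shown as text.
SKIP = {"script", "style"}
# Elements that start a new paragraph in the text output.
BLOCK = {"p", "div", "section", "article", "header", "figcaption",
         "h1", "h2", "h3", "h4", "h5", "h6", "li"}


class DivText(HTMLParser):
    """Collect the text found inside the element with a given id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0      # 0 = outside the target, 1 = directly inside it
        self.skip = 0       # >0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.depth:
            self.depth += 1
            if tag in SKIP:
                self.skip += 1
            elif tag in BLOCK:
                self.chunks.append("\n\n")
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID or not self.depth:
            return
        self.depth -= 1
        if tag in SKIP and self.skip:
            self.skip -= 1
        elif tag in BLOCK:
            self.chunks.append("\n\n")

    def handle_data(self, data):
        if self.depth and not self.skip:
            self.chunks.append(data)


def div_text(html, target_id, width=65):
    """Return the text of the element with id=target_id, wrapped to `width` columns."""
    parser = DivText(target_id)
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    # Normalise whitespace inside each paragraph and drop empty paragraphs.
    paragraphs = [" ".join(p.split()) for p in raw.split("\n\n")]
    return "\n\n".join(textwrap.fill(p, width) for p in paragraphs if p)
```

The page source would come from something like lynx -source, read into a string and passed as div_text(source, "app"). On the NYT page that would still include whatever non-article text sits inside that division, so this only narrows the output; it does not decide what the "main content" is.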






  • Thanks for sharing! I've updated the question a bit to point out that the target for such use is article pages, where the objective is to extract the article itself. Have a look at the posting again and see if you can help.

    – forthrin
    9 hours ago



















Does this satisfy your requirement? (From https://stackoverflow.com/questions/12422289/bash-command-to-convert-html-page-to-a-text-file )



lynx --dump www.google.com > file.txt





– VBB (new contributor)

  • Nope. This dumps a ton of navigation links (e.g. with lynx -dump https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html). A solution should strip away all navigation and fluff, and leave ONLY the MAIN CONTENT, like Reader Mode in a browser does.

    – forthrin
    15 hours ago
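
To make the requirement in this comment concrete, here is a very crude sketch of the "strip navigation and fluff" idea, using only Python's standard library. It is not the heuristic that Safari or any other browser's Reader Mode uses. The names ArticleText and reader_mode, the list of container tags treated as fluff, and the 40-character cutoff for short paragraphs are all assumptions made for this illustration. <header> is deliberately not skipped, because on pages like the NYT one the headline sits inside a <header> element.

```python
from html.parser import HTMLParser
import textwrap

# Containers assumed to hold navigation or other fluff (an assumption, not a standard).
BOILERPLATE = {"nav", "footer", "aside", "form", "script", "style", "noscript"}
# Only headlines and paragraphs are kept.
KEEP = {"h1", "p"}


class ArticleText(HTMLParser):
    """Collect <h1> and <p> text that is not inside a boilerplate container."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # >0 while inside a boilerplate container
        self.current = None   # tag of the block being collected, or None
        self.buffer = []
        self.blocks = []      # list of (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.flush()      # an unclosed <p> ends where the next block starts
            self.current = tag

    def handle_endtag(self, tag):
        if tag in BOILERPLATE and self.skip_depth:
            self.skip_depth -= 1
        elif tag == self.current:
            self.flush()

    def handle_data(self, data):
        if self.current and not self.skip_depth:
            self.buffer.append(data)

    def flush(self):
        text = " ".join("".join(self.buffer).split())
        if self.current and text:
            self.blocks.append((self.current, text))
        self.current, self.buffer = None, []


def reader_mode(html, width=65, min_chars=40):
    """Return headline(s) in upper case plus the longer paragraphs, wrapped to `width`."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    parser.flush()
    out = []
    for tag, text in parser.blocks:
        if tag == "h1":
            out.append(textwrap.fill(text.upper(), width))
        elif len(text) >= min_chars:   # short paragraphs are usually buttons or bylines
            out.append(textwrap.fill(text, width))
    return "\n\n".join(out)
```

Real implementations score candidate containers by things like text length and link density instead of relying on tag names, so expect this to fail on many sites. It only shows the general shape of the approach and produces output in the 65-column layout of the example in the question.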













answered 10 hours ago, edited 8 hours ago – Thomas Dickey (author of the lynx -nolist answer above)













  • Thanks for sharing! I've updated the question a bit to point out that the target for such use is article pages, where the objective is to extract the article itself. Have a look at the posting again and see if you can help.

    – forthrin
    9 hours ago
































-1














Does this satisfy your requirement? (From https://stackoverflow.com/questions/12422289/bash-command-to-convert-html-page-to-a-text-file)



lynx --dump www.google.com > file.txt





















  • 1





    Nope. This dumps a ton of navigation links, eg. lynx -dump https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html A solution should strip away all navigation and fluff, and leave ONLY the MAIN CONTENT, like Reader Mode in a browser does.

    – forthrin
    15 hours ago



























answered 18 hours ago









VBB



























