Processing a math paper written in TeX to generate a summary

I'm interested in writing a program that automatically processes math papers written in TeX in order to generate summaries. I have partly finished one in Python, but it does not work very well: people use so many different names for theorems, lemmas, etc. that they can't simply be captured by matching \begin{theorem}. I believe part of the problem is that Python cannot natively process the LaTeX language. Is there any way to make TeX selectively skip certain text?
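One way to cope with author-chosen environment names is to read the \newtheorem declarations in the preamble first and only then look for the environments they define. The following is a minimal sketch of that idea using only Python's standard library; the function names and the sample document are made up for illustration. It assumes environments are declared with \newtheorem (not, say, through another package's declaration command), that environments of the same name are not nested, and that nothing relevant is commented out.

```python
import re

# Matches \newtheorem{env}{Title}, \newtheorem*{env}{Title} and
# \newtheorem{env}[shared_counter]{Title}.
NEWTHEOREM = re.compile(
    r"\\newtheorem\*?\s*\{(?P<env>[^{}]+)\}\s*(?:\[[^\]]*\])?\s*\{(?P<title>[^{}]+)\}"
)


def declared_theorem_envs(tex):
    """Map each environment name declared with \\newtheorem to its printed title."""
    return {m.group("env"): m.group("title") for m in NEWTHEOREM.finditer(tex)}


def extract_envs(tex, names):
    """Return (name, body) pairs for the given environments, in document order.

    Assumes environments of the same name are not nested inside each other.
    """
    if not names:
        return []
    alternatives = "|".join(re.escape(n) for n in names)
    pattern = re.compile(
        r"\\begin\{(?P<name>" + alternatives + r")\}(?P<body>.*?)\\end\{(?P=name)\}",
        re.DOTALL,
    )
    return [(m.group("name"), m.group("body").strip()) for m in pattern.finditer(tex)]


# Hypothetical input, invented for this example.
sample = r"""
\newtheorem{thm}{Theorem}
\newtheorem{lem}[thm]{Lemma}
\begin{document}
Some prose to ignore.
\begin{thm} Every bounded monotone sequence converges. \end{thm}
More prose.
\begin{lem} $a \le b$ implies $a + c \le b + c$. \end{lem}
\end{document}
"""

envs = declared_theorem_envs(sample)
for name, body in extract_envs(sample, envs):
    print(f"{envs[name]}: {body}")
```

Because the environment names come from the paper itself, a paper that writes \begin{thm} and one that writes \begin{theorem} are handled by the same code, and the printed title ("Theorem", "Lemma") can be used to label each item in the summary.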

Here is what I want to retain in a summary:

Title, authors, abstract.

New commands and other settings.

Sections, subsections and subsubsections.

Theorems, lemmas, definitions, corollaries, conjectures, notations, examples, exercises and propositions, all beginning with \begin{something} and ending with \end{something}.

Bibliography.

Basically, most of the text needs to be ignored. My current approach is to let a lexer and a parser written in Python pick out what needs to be retained.
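For the title, authors and sectioning commands in the list above, a plain regular expression is not quite enough, because the argument may itself contain braces (for example math such as \mathbb{R}). A rough sketch that pairs a regex for the command name with a small brace counter might look like this. The function names and the sample input are hypothetical, and the sketch ignores escaped braces (\{ and \}) and commented-out lines.

```python
import re


def braced_argument(tex, start):
    """Return the text of the brace group opening at tex[start], handling nesting."""
    if tex[start] != "{":
        raise ValueError("expected an opening brace")
    depth = 0
    for i in range(start, len(tex)):
        if tex[i] == "{":
            depth += 1
        elif tex[i] == "}":
            depth -= 1
            if depth == 0:
                return tex[start + 1 : i]
    raise ValueError("unbalanced braces")


# Command name, optional star, optional [short title], then the opening brace.
COMMAND = re.compile(
    r"\\(title|author|section|subsection|subsubsection)\*?\s*(?:\[[^\]]*\])?\s*\{"
)


def outline(tex):
    """List (command, argument) pairs for title, author and sectioning commands."""
    items = []
    for m in COMMAND.finditer(tex):
        # m.end() - 1 is the index of the opening brace matched by the regex.
        items.append((m.group(1), braced_argument(tex, m.end() - 1)))
    return items


# Hypothetical input, invented for this example.
sample = r"""
\title{On $\mathbb{R}$-trees}
\author{A. Author}
\begin{document}
\section{Introduction}
Text.
\subsection*{Notation}
\end{document}
"""

for command, argument in outline(sample):
    print(command, "->", argument)
```

The abstract and the bibliography are environments rather than commands, so they would be picked up by environment matching instead.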

asked Oct 15 at 3:52

Ying Zhou

687

It would be helpful if you composed a fully compilable MWE, including \documentclass and the appropriate packages, that sets up the problem and illustrates exactly what you want ignored. If your definition is everything of the form \begin{something}...\end{something}, I assume you want to exclude something=document. Do you want to exclude lists, etc.?
– Peter Grill
Oct 21 at 0:32

...I agree. Can you provide a sample input LaTeX file and the expected output (summary) that you want from that input?
– Werner
Oct 21 at 1:11

I don't think this is something for which you are going to get a complete solution at the start. It needs to be built up over time. The best way I can think of to get started is to provide a very small document with one or two environments you want extracted along with one or two that you want ignored.
– Peter Grill
Oct 23 at 18:01

I assume you mean LaTeX and not TeX, since the latter doesn't have enough structure to actually process. But why not catch all \begin{something}s and then figure out which "something"s you want to catch and which you want to ignore?
– Teepeemm
Oct 26 at 2:13
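The suggestion in the last comment (collect every environment name first, decide afterwards) can be sketched in a few lines of standard-library Python. The names below, including the deny-list, are hypothetical; which environments belong on that list depends on the papers being processed, and the sketch does not skip commented-out lines.

```python
import re
from collections import Counter

BEGIN = re.compile(r"\\begin\{([^{}]+)\}")


def environment_counts(tex):
    """Count how often each environment name is opened in the source."""
    return Counter(BEGIN.findall(tex))


# Hypothetical deny-list; adjust it after inspecting real papers.
IGNORED = {"document", "equation", "align", "itemize", "enumerate", "figure", "proof"}


def candidate_environments(tex):
    """Environment names worth reviewing, most frequent first."""
    counts = environment_counts(tex)
    return [(name, n) for name, n in counts.most_common() if name not in IGNORED]


# Hypothetical input, invented for this example.
sample = r"""
\begin{document}
\begin{theorem} A \end{theorem}
\begin{proof} B \end{proof}
\begin{theorem} C \end{theorem}
\begin{conj} D \end{conj}
\end{document}
"""

print(candidate_environments(sample))
```

Running this over a collection of papers gives a frequency table of environment names, which makes it easier to decide which ones to keep than guessing the names in advance.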

programming

1 Answer

I think I have a short Python script written using TexSoup that can do at least 4. For now I'm satisfied. The next step is to convert a math paper in TeX into plain text that preserves the mathematical content but drops typesetting information unrelated to the math.

answered 17 mins ago

Ying Zhou

687
