Processing a math paper written in TeX to generate a summary
I'm interested in writing a program that automatically processes math papers written in TeX in order to generate summaries. I have partly finished one in Python, but it does not work very well: people use so many different names for theorems, lemmas, etc. that they can't simply be captured by matching \begin{theorem}. I believe part of the problem is that Python cannot natively process the LaTeX language. Is there any way for TeX to selectively skip processing certain text?

Here is what I want to retain in a summary:
1. Title, authors, abstract.
2. New commands and other settings.
3. Sections, subsections and subsubsections.
4. Theorems, lemmas, definitions, corollaries, conjectures, notations, examples, exercises and propositions, all beginning with \begin{something} and ending with \end{something}.
5. Bibliography.

Basically, most of the text needs to be ignored somehow. My current approach is to have a lexer and a parser written in Python spot what needs to be retained.
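
For the title, author and sectioning entries in the list above, a plain regular expression is not quite enough, because arguments may contain nested or escaped braces. The following is a minimal sketch using only the Python standard library, not the asker's actual program; the helper names read_braced and outline are hypothetical, and the sketch assumes the source has no TeX comments (%) containing stray braces.

```python
import re

def read_braced(text, start):
    """Return (content, end_index) for the brace group opening at text[start]."""
    if text[start] != "{":
        raise ValueError("expected '{' at position %d" % start)
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character after it, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i], i + 1
        i += 1
    raise ValueError("unbalanced braces")

# Command name, optional star, optional [short title], then the opening brace.
HEADING = re.compile(
    r"\\(title|author|section|subsection|subsubsection)\*?\s*(?:\[[^\]]*\])?\s*\{"
)

def outline(tex):
    """List (command, argument) pairs for title, author and sectioning commands."""
    items = []
    for match in HEADING.finditer(tex):
        content, _ = read_braced(tex, match.end() - 1)
        items.append((match.group(1), content.strip()))
    return items

if __name__ == "__main__":
    sample = r"\title{On $\{0,1\}$-matrices} \section{Introduction} text"
    for command, argument in outline(sample):
        print(command, "->", argument)
```

The same brace reader could be reused for \newcommand definitions, whose bodies are also brace groups, but that is left out of this sketch.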
programming
It would be helpful if you composed a fully compilable MWE, including \documentclass and the appropriate packages, that sets up the problem and illustrates exactly what you want ignored. If your definition is everything of the form \begin{something}...\end{something}, I assume you want to exclude something=document. Do you want to exclude lists, etc.?
– Peter Grill
Oct 21 at 0:32
...I agree. Can you provide an input LaTeX file and the expected output (summary) that you want from that input?
– Werner
Oct 21 at 1:11
I don't think this is something you are going to get a complete solution for at the start. It needs to be built up over time. The best way I can think of to get started is to provide a very small document with one or two environments you want extracted, along with one or two that you want ignored.
– Peter Grill
Oct 23 at 18:01
I assume you mean LaTeX and not TeX, since the latter doesn't have enough structure to actually process. But why not catch all \begin{something}s and then figure out which "somethings" you want to catch and which you want to ignore?
– Teepeemm
Oct 26 at 2:13
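
The suggestion to catch every \begin{...} first and decide afterwards can be sketched as follows. This is only an illustration with the standard library: it tallies all environment names, then keeps those that the paper itself declares with \newtheorem, which is one way to cope with authors choosing their own names (thm, lem, and so on). The function names are hypothetical, and the sketch assumes the declarations sit in the same file rather than in a separate style file.

```python
import re
from collections import Counter

# Any \begin{name}.
BEGIN = re.compile(r"\\begin\{([^{}]+)\}")

# \newtheorem{name}{Title}[within] or \newtheorem{name}[counter]{Title},
# with an optional star as provided by amsthm.
NEWTHEOREM = re.compile(
    r"\\newtheorem\*?\s*\{([^{}]+)\}\s*(?:\[[^\]]*\])?\s*\{([^{}]+)\}"
)

def environment_counts(tex):
    """Count how often each environment name is opened."""
    return Counter(BEGIN.findall(tex))

def declared_theorem_envs(tex):
    """Map each environment declared via \\newtheorem to its printed title."""
    return {name: title for name, title in NEWTHEOREM.findall(tex)}

def theorem_like_names(tex):
    """Names that are both used in the text and declared via \\newtheorem."""
    declared = declared_theorem_envs(tex)
    return [name for name in environment_counts(tex) if name in declared]

if __name__ == "__main__":
    sample = r"""
    \newtheorem{thm}{Theorem}[section]
    \newtheorem{lem}[thm]{Lemma}
    \begin{document}
    \begin{thm} A \end{thm}
    \begin{proof} B \end{proof}
    \begin{lem} C \end{lem}
    \end{document}
    """
    print(environment_counts(sample))
    print(theorem_like_names(sample))
```

Environments such as document, proof or itemize show up in the tally but are filtered out because no \newtheorem declares them; a hand-maintained allow list could be added for names defined elsewhere.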
asked Oct 15 at 3:52
Ying Zhou
1 Answer
I think I have a short Python script, written using TexSoup, that can do at least the fourth item on my list (extracting the theorem-like environments). For now I'm satisfied. The next step is converting a math paper in TeX into plain text that preserves the mathematical content but drops typesetting information unrelated to the math.
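
The TexSoup script itself is not shown, so the following is not that script. It is a standard-library stand-in that sketches the same task, pulling out the bodies of theorem-like environments, under the assumption that the environment names are already known and that an environment is never nested inside another of the same name. The function name extract_environments is hypothetical.

```python
import re

def extract_environments(tex, names):
    """Return (name, body) pairs for every environment whose name is in names."""
    if not names:
        return []
    alternation = "|".join(re.escape(name) for name in names)
    # \1 forces \end{...} to carry the same name as the matching \begin{...}.
    pattern = re.compile(
        r"\\begin\{(%s)\}(.*?)\\end\{\1\}" % alternation,
        re.DOTALL,
    )
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(tex)]

if __name__ == "__main__":
    sample = (
        r"\begin{thm}[Euler] $e^{i\pi}+1=0$ \end{thm} "
        r"Prose to ignore. "
        r"\begin{proof} Omitted. \end{proof} "
        r"\begin{lem} A helper statement. \end{lem}"
    )
    for name, body in extract_environments(sample, ["thm", "lem"]):
        print(name, "->", body)
```

The non-greedy match stops at the first \end with the same name, which is why same-name nesting is excluded above; a real parser such as TexSoup avoids that restriction.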
answered 17 mins ago
Ying Zhou