Processing a math paper written in TeX to generate a summary























I'm interested in writing a program that automatically processes math papers written in TeX in order to generate summaries. I have partly finished one in Python, but it does not work very well: people use so many different names for theorems, lemmas, etc. that they can't simply be captured by matching \begin{theorem}. I believe part of the problem is that Python cannot natively process the LaTeX language. Is there any way to have TeX itself selectively skip certain text?



Here is what I want to retain in a summary:




  1. Title, authors, abstract.


  2. New commands and other settings.


  3. Sections, subsections and subsubsections.


  4. Theorems, lemmas, definitions, corollaries, conjectures, notations, examples, exercises and propositions, each beginning with \begin{something} and ending with \end{something}.


  5. Bibliography.
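Items 1 and 3 in the list above are commands with a braced argument, so they can be picked out without a full parser. The following is only a rough sketch using the Python standard library; the names `read_braced`, `HEADING` and `outline` are made up for illustration. It assumes each command is followed directly by its braced argument (optionally after a `*` or a `[short title]`), and it does not strip `%` comments or follow `\input` files. The abstract is an environment, so it would be handled like item 4 instead.

```python
import re

def read_braced(text, start):
    """Return the contents of the brace group that opens at text[start].

    Nested groups are tracked by depth; a backslash-escaped character
    (for example \\{ or \\}) is skipped and never counted as a delimiter.
    """
    if text[start] != "{":
        raise ValueError("expected an opening brace")
    depth, i = 0, start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
        i += 1
    raise ValueError("unbalanced braces")

# Command name, optional star, optional [short form], then the opening brace.
HEADING = re.compile(
    r"\\(title|author|section|subsection|subsubsection)"
    r"\*?\s*(?:\[[^\]]*\])?\s*\{"
)

def outline(tex):
    """Return (command, argument) pairs in document order."""
    return [(m.group(1), read_braced(tex, m.end() - 1))
            for m in HEADING.finditer(tex)]
```

Counting brace depth rather than matching up to the first `}` is what lets a title such as `\section{Main \emph{results}}` come back whole.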



In other words, most of the text should be ignored. My current approach is to have a lexer and a parser, written in Python, pick out what needs to be retained.
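One way to cope with authors choosing their own environment names is to read the names from the paper's own `\newtheorem` declarations instead of hard-coding `theorem`. This is a minimal sketch under several assumptions: the declarations use the standard `\newtheorem{name}{Title}` or `\newtheorem{name}[counter]{Title}` forms, environments of the same name are not nested, and `%` comments have not been handled. The function names are hypothetical.

```python
import re

# Matches \newtheorem{thm}{Theorem}, \newtheorem*{rem}{Remark}
# and \newtheorem{lem}[thm]{Lemma}.
NEWTHEOREM = re.compile(
    r"\\newtheorem\*?\s*\{([^{}]+)\}\s*(?:\[[^\]]*\])?\s*\{([^{}]+)\}"
)

def declared_environments(tex):
    """Map each environment name declared via newtheorem to its printed title."""
    return {m.group(1): m.group(2) for m in NEWTHEOREM.finditer(tex)}

def extract_environments(tex, names):
    """Return (name, body) pairs for every environment whose name is in `names`.

    The non-greedy match stops at the first matching end tag, so an
    environment nested inside another of the same name is not supported.
    """
    names = sorted(names)
    if not names:
        return []
    alternatives = "|".join(re.escape(n) for n in names)
    pattern = re.compile(
        r"\\begin\{(" + alternatives + r")\}(.*?)\\end\{\1\}", re.DOTALL
    )
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(tex)]
```

For example, `extract_environments(tex, set(declared_environments(tex)) | {"abstract"})` would also pick up the abstract, and the title stored for each name ("Theorem", "Lemma", ...) can be used to label the extracted bodies in the summary.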




























  • It would be helpful if you composed a fully compilable MWE, including \documentclass and the appropriate packages, that sets up the problem and illustrates exactly what you want ignored. If your definition is everything of the form \begin{something}...\end{something}, I assume you want to exclude something=document. Do you want to exclude lists, etc.?
    – Peter Grill
    Oct 21 at 0:32






  • ...I agree. Can you provide a sample input LaTeX file and the expected output (summary) that you want from that input?
    – Werner
    Oct 21 at 1:11










  • I don't think this is something for which you are going to get a complete solution at the start. It needs to be built up over time. The best way I can think of to get started is to provide a very small document with one or two environments you want extracted, along with one or two that you want ignored.
    – Peter Grill
    Oct 23 at 18:01










  • I assume you mean LaTeX and not TeX, since the latter doesn't have enough structure to actually process. But why not catch every \begin{something}, and then figure out which "somethings" you want to catch and which you want to ignore?
    – Teepeemm
    Oct 26 at 2:13
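The catch-everything-then-filter idea from the last comment could start with a census of environment names, so the keep and ignore lists can be decided by looking at real papers. This is a small sketch using only the standard library; `environment_census` and `candidates` are illustrative names, and `%` comments and verbatim blocks are not treated specially.

```python
import re
from collections import Counter

BEGIN = re.compile(r"\\begin\{([^{}]+)\}")

def environment_census(tex):
    """Count how often each environment name is opened in the source."""
    return Counter(BEGIN.findall(tex))

def candidates(census, ignore):
    """Return the environment names that are not on the ignore list, sorted."""
    return sorted(name for name in census if name not in ignore)
```

Running the census over a handful of papers and printing `census.most_common()` gives a quick view of which names an ignore list (for instance `document`, `proof`, `equation`, `itemize`) would need to cover.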






















programming
















asked Oct 15 at 3:52









Ying Zhou



















1 Answer













I think I have a short Python script, written using TexSoup, that handles at least item 4. For now I'm satisfied. The next step is converting a math paper in TeX into plain text that preserves the mathematical content but drops typesetting information unrelated to the math.





































        answered 17 mins ago









        Ying Zhou






























