Automatically convert math written in .tex into a canonical form that retains all the math information











I believe what I'm going to describe can be done accurately. However, I'm not sure how hard it will be.



My motivation is to standardize the mathematical content of most existing math papers so that it can be searched and fed into future automated theorem provers. Since extracting math from PDFs with complete accuracy through machine learning is currently impossible, we have to rely on the .tex sources. To do this accurately, we need to convert a math paper written in .tex into some other ASCII form that is canonical: each distinct mathematical symbol should have one and only one representation. LaTeX itself does not satisfy this condition, because it retains a lot of typesetting information that is mathematically irrelevant (such as whether $n\to\infty$ should be set below or to the right of $\lim$) and because papers contain many user-defined macros.
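
To make the two obstacles concrete, here is a minimal Python sketch of the kind of normalization I have in mind. It is only a toy under strong assumptions: it handles zero-argument `\newcommand` definitions with at most one level of nested braces, and it treats `\limits` and `\nolimits` as the only placement-only commands. The function names `expand_simple_macros` and `normalize` are made up for illustration; a real tool would need a proper TeX parser.

```python
import re

# Matches \newcommand{\name}{body}, where body has at most one level of nested braces.
NEWCOMMAND = re.compile(
    r"\\newcommand\{\\([A-Za-z]+)\}\{((?:[^{}]|\{[^{}]*\})*)\}"
)


def expand_simple_macros(tex: str) -> str:
    # Collect the zero-argument macro definitions, then remove them from the text.
    macros = dict(NEWCOMMAND.findall(tex))
    body = NEWCOMMAND.sub("", tex)
    for name, replacement in macros.items():
        # The lookahead stops a macro named R from matching inside \Rightarrow.
        # A function is used as the replacement so backslashes are kept literally.
        body = re.sub(
            r"\\" + name + r"(?![A-Za-z])",
            lambda _match, rep=replacement: rep,
            body,
        )
    return body


def normalize(tex: str) -> str:
    tex = expand_simple_macros(tex)
    # \limits and \nolimits only change where the subscript is drawn.
    tex = re.sub(r"\\(?:no)?limits(?![A-Za-z])", "", tex)
    # Whitespace differences are not mathematically meaningful here.
    return re.sub(r"\s+", " ", tex).strip()


a = r"\newcommand{\R}{\mathbb{R}} \lim\limits_{n\to\infty} a_n \in \R"
b = r"\lim_{n\to\infty} a_n \in \mathbb{R}"
print(normalize(a) == normalize(b))  # True
```

Both inputs reduce to the same string, which is the "one and only one representation" property I am after, at least for this very narrow class of differences.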



What do I need to know in order to develop a tool that can do such conversions? Do I need to modify a TeX engine so that it produces the canonical representation of the math instead of a .dvi or .pdf file?










  • I am not an expert on this, but I think a good start is to learn the difference between presentation MathML and content MathML and to understand why the difference exists. It sounds like what you need to do is to convert your formulas to content MathML, a task that is basically impossible without the active involvement of the author, since (La)TeX is basically all about presentation. …
    – Harald Hanche-Olsen
    22 mins ago






  • … The example you mention is relatively minor compared to issues such as: does $x^k$ mean x to the k-th power (most likely in most cases), or does it mean the k-th component of the vector x (most likely in a differential geometry setting)?
    – Harald Hanche-Olsen
    22 mins ago










  • What you are describing sounds more like a research problem than a question for TeX.SX.
    – Henri Menke
    21 mins ago










  • @HaraldHanche-Olsen You are right. The problem is indeed even more complicated than I thought. Whether some style (e.g. bold, italic, etc.) is mathematically significant is strongly context-dependent. For example, some people use bold letters to refer to vectors while others prefer arrows above letters. Should we merge the two forms in this context? Yes, it is complicated.
    – Ying Zhou
    14 mins ago
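
The $x^k$ ambiguity raised in the comments above can be made concrete with a small sketch. It uses Python's standard `xml.etree.ElementTree` to build one presentation MathML tree and two candidate content MathML trees for the same source. The helper names are hypothetical, and the choice of `<selector/>` with `type="vector"` for the component reading is an assumption about how one might encode it, not something stated in the thread.

```python
import xml.etree.ElementTree as ET


def presentation_superscript(base: str, script: str) -> ET.Element:
    # Presentation MathML: only says "script is drawn as a superscript of base".
    msup = ET.Element("msup")
    ET.SubElement(msup, "mi").text = base
    ET.SubElement(msup, "mi").text = script
    return msup


def content_power(base: str, exponent: str) -> ET.Element:
    # Content MathML reading 1: base raised to the power exponent.
    node = ET.Element("apply")
    ET.SubElement(node, "power")
    ET.SubElement(node, "ci").text = base
    ET.SubElement(node, "ci").text = exponent
    return node


def content_component(vector: str, index: str) -> ET.Element:
    # Content MathML reading 2 (assumed encoding): the index-th component of a vector.
    node = ET.Element("apply")
    ET.SubElement(node, "selector")
    ET.SubElement(node, "ci", {"type": "vector"}).text = vector
    ET.SubElement(node, "ci").text = index
    return node


for tree in (
    presentation_superscript("x", "k"),
    content_power("x", "k"),
    content_component("x", "k"),
):
    print(ET.tostring(tree, encoding="unicode"))
```

One presentation tree corresponds to two different content trees, so a converter working from the .tex source alone has to guess which reading the author intended.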















math-mode amsmath






asked 33 mins ago









Ying Zhou
