Automatically convert math written in .tex into a canonical form that retains all the math information
I believe what I am about to describe can be done accurately; however, I am not sure how hard it will be.
My motivation is to standardize the mathematical content of most existing math papers so that it becomes searchable and can be fed into future automated theorem provers. Since machine learning cannot currently process PDFs with complete accuracy, we have to rely on the .tex sources. To do this accurately, we need to convert a math paper written in .tex into some other ASCII-based form that is unique: each distinct mathematical symbol should have one and only one representation. LaTeX itself does not satisfy this condition, because it retains a lot of typesetting information that is mathematically irrelevant, such as whether $n\to\infty$ should be placed below or to the right of $\lim$, and because there are a lot of user-defined macros.
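To make the two obstacles above concrete, here is a minimal Python sketch that strips a few presentation-only commands and expands argument-free user macros. The function names (`collect_macros`, `normalize`) and the short list of presentation-only commands are assumptions made for illustration; a real tool would need a proper TeX parser rather than regular expressions.

```python
import re
from typing import Dict

# Commands that affect layout only. This short list is an illustrative
# assumption, not an exhaustive inventory of presentation-only commands.
PRESENTATION_ONLY = [r"\limits", r"\nolimits", r"\displaystyle", r"\textstyle"]

# Matches argument-free definitions such as \newcommand{\R}{\mathbb{R}}.
# Bodies may contain at most one level of nested braces.
NEWCOMMAND = re.compile(
    r"\\newcommand\{(\\[A-Za-z]+)\}\{((?:[^{}]|\{[^{}]*\})*)\}"
)


def collect_macros(preamble: str) -> Dict[str, str]:
    """Return a mapping from macro name to replacement text."""
    return {name: body for name, body in NEWCOMMAND.findall(preamble)}


def normalize(math: str, macros: Dict[str, str]) -> str:
    """Expand user macros, drop layout-only commands, collapse whitespace."""
    for name, body in macros.items():
        # (?![A-Za-z]) keeps \R from matching the start of \Re.
        pattern = re.escape(name) + r"(?![A-Za-z])"
        # A function replacement avoids backslash-escape processing of body.
        math = re.sub(pattern, lambda _m, b=body: b, math)
    for cmd in PRESENTATION_ONLY:
        math = re.sub(re.escape(cmd) + r"(?![A-Za-z])", "", math)
    return re.sub(r"\s+", " ", math).strip()
```

With the preamble `\newcommand{\R}{\mathbb{R}}`, both `\lim\limits_{n\to\infty} a_n \in \R` and `\lim\nolimits_{n\to\infty} a_n \in \R` normalize to the same string, which is the uniqueness property asked for. The sketch does not handle macros with arguments, `\def`, or redefinitions.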
What do I need to know in order to develop a tool that can perform such conversions? Do I need to modify a TeX engine so that it produces a canonical representation of the math instead of a .dvi or .pdf file?
math-mode amsmath
I am not an expert on this, but I think a good start is to learn the difference between presentation MathML and content MathML and to understand why the difference exists. It sounds like what you need to do is to convert your formulas to content MathML, a task that is basically impossible without the active involvement of the author, since (La)TeX is basically all about presentation. …
– Harald Hanche-Olsen
22 mins ago
… The example you mention is relatively minor compared to issues such as: does $x^k$ mean x to the k-th power (most likely in most cases), or does it mean the k-th component of the vector x (most likely in a differential geometry setting)?
– Harald Hanche-Olsen
22 mins ago
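The distinction drawn in the two comments above can be sketched in Python with the standard `xml.etree.ElementTree` module. Presentation MathML has a single encoding for `$x^k$` (`msup`), while Content MathML needs to be told which meaning is intended (`power` or `selector`). The helper names and the `reading` parameter are hypothetical; they stand in for information that, as noted above, has to come from the author or from context.

```python
import xml.etree.ElementTree as ET


def presentation_superscript(base: str, sup: str) -> ET.Element:
    """Presentation MathML: records only that `sup` is set as a superscript."""
    msup = ET.Element("msup")
    ET.SubElement(msup, "mi").text = base
    ET.SubElement(msup, "mi").text = sup
    return msup


def content_superscript(base: str, sup: str, reading: str) -> ET.Element:
    """Content MathML: `reading` must say what the superscript means.

    `reading` is a hypothetical parameter: "power" for exponentiation,
    "component" for the k-th component of a vector.
    """
    operators = {"power": "power", "component": "selector"}
    node = ET.Element("apply")
    ET.SubElement(node, operators[reading])  # KeyError on unknown readings
    ET.SubElement(node, "ci").text = base
    ET.SubElement(node, "ci").text = sup
    return node


def to_xml(element: ET.Element) -> str:
    """Serialize an element tree to a string."""
    return ET.tostring(element, encoding="unicode")
```

Calling `content_superscript("x", "k", "power")` and `content_superscript("x", "k", "component")` gives two different trees for the same LaTeX input, whereas `presentation_superscript("x", "k")` cannot tell them apart. Nothing in the .tex source alone selects the `reading` argument.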
What you are describing sounds more like a research problem than a question for TeX.SX.
– Henri Menke
21 mins ago
@HaraldHanche-Olsen You are right. The problem is even more complicated than I thought. Whether a given style (e.g. bold, italic, etc.) is mathematically significant is strongly context-dependent. For example, some people use bold letters to denote vectors, while others prefer arrows above letters. Should we merge the two forms in this context? Yes, it is complicated.
– Ying Zhou
14 mins ago
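One way to picture the bold-versus-arrow question is a per-paper convention that states which styles denote vectors, and a rewrite that merges only those styles. The sketch below is a hedged illustration: the `vec(...)` output token, the `VECTOR_STYLES` table, and the idea that the convention is supplied by hand are all assumptions, not an established format.

```python
import re
from typing import Set

# Hypothetical table of notations that *may* denote a vector.
# Only single-letter arguments are recognized in this sketch.
VECTOR_STYLES = {
    "bold": re.compile(r"\\(?:mathbf|boldsymbol|bm)\{([A-Za-z])\}"),
    "arrow": re.compile(r"\\vec\{([A-Za-z])\}"),
}


def canonicalize_vectors(math: str, conventions: Set[str]) -> str:
    """Rewrite the styles listed in `conventions` to one token, vec(x).

    `conventions` is the per-paper choice of which styles mean "vector";
    styles that are not listed are left untouched, because bold may carry
    a different meaning (or none) in another paper.
    """
    for style in conventions:
        math = VECTOR_STYLES[style].sub(r"vec(\1)", math)
    return math
```

With `conventions={"bold", "arrow"}`, `\mathbf{v} + \vec{w}` becomes `vec(v) + vec(w)`; with `conventions={"arrow"}`, `\mathbf{v}` is left as written. The hard part, deciding the convention for a given paper, stays outside the code.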
asked 33 mins ago
Ying Zhou