Identify math expressions

closed account (SECMoG1T)
Is there any library I could use to identify math expressions within general text? For example: "let X and Y be random variables such that X ~ Norm(X^2, Y)"

In the statement above, the expressions of interest are:
- X
- Y
- X ~ Norm(X^2,Y)

How would I identify them using code?

Any suggestions will be appreciated.
No. Think about it: if you're allowing single-letter variables to count as expressions, what's stopping the algorithm from identifying the article "a" as an expression?

What are you trying to do?
closed account (SECMoG1T)
I'm trying to automate the formatting of such expressions as LaTeX code within documents (I'm using C++ for the program itself). If I could just identify the expressions, the problem would be solved, because the formatting part is easy.

what's stopping the algorithm from identifying the article "a" as an expression?


Well, yes, I get your point. Suppose I exclude such simple cases, so that a typical expression looks something like this:


- \integral_{-\infinity}^{\infinity}\ratio{34x^-e}{x \pi} dx
- z ~ \binomial(1,2)
- \Rand(x,y) \subset \Range{1,...,10}
- f_{X,Y}(x,y)
- \sqrt{2}{32}


(Note: _ means subscript, while ^ means superscript or exponent.)

Would there be a possible solution?
That's more like it. If you have a well-defined grammar you reduce the problem significantly, compared to just finding "math expressions".

A simple search yields no results for me, so I think you'll have to write something yourself. For starters, you'll need a parser that parses a subset of the LaTeX grammar; in particular, it must not accept single words as LaTeX.
I still don't know how you'll handle the "z ~ \binomial(1,2)" case.
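The "must not accept single words" requirement can be sketched in C++. This is only a heuristic stand-in for a real parser, not a LaTeX grammar: the hypothetical function looksLikeMath accepts a candidate string only if it contains a backslash command, a _ or ^ marker, or a ~, and all of its brackets are balanced. Plain words such as "a" or "let" contain none of those markers, so they are rejected (and so are bare X and Y, matching the exclusion of simple cases above).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if every (, [ and { in s is closed in the right order.
bool bracketsBalanced(const std::string& s) {
    std::string stack;  // open brackets seen so far, innermost last
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            stack.push_back(c);
        } else if (c == ')' || c == ']' || c == '}') {
            char open = c == ')' ? '(' : (c == ']' ? '[' : '{');
            if (stack.empty() || stack.back() != open) return false;
            stack.pop_back();
        }
    }
    return stack.empty();
}

// Heuristic check, NOT a LaTeX parser: a candidate counts as math only if it
// contains a backslash command (\pi, \sqrt, ...), a subscript/superscript
// marker (_ or ^), or a ~, and its brackets are balanced.
bool looksLikeMath(const std::string& candidate) {
    bool hasMarker = false;
    for (std::size_t i = 0; i < candidate.size(); ++i) {
        char c = candidate[i];
        if (c == '_' || c == '^' || c == '~') hasMarker = true;
        if (c == '\\' && i + 1 < candidate.size() &&
            std::isalpha(static_cast<unsigned char>(candidate[i + 1])))
            hasMarker = true;
    }
    return hasMarker && bracketsBalanced(candidate);
}
```

How the candidate strings are cut out of the surrounding text is left open here. Splitting on whitespace, for instance, would break "z ~ \binomial(1,2)" into three separate pieces, which is part of why that case is awkward.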

I have to ask, though: if the user is already writing the LaTeX markup, is it asking so much more that they also write some markup to identify it, like [latex][/latex]? This is how basically all software that supports embedding LaTeX math works.
closed account (SECMoG1T)
I'll try to set up a parser, but I can tell it will be massive, because there are numerous special cases that will need to be handled individually.

z ~ \binomial(1,2)

will have to be a special case.



 if the user is already writing the LaTeX markup, is it asking so much more that they also write some markup to identify it, like [latex][/latex]?


That would be awesome, but these documents come from thousands of sources (ranging from a single line to hundreds of lines). The problem is that placing these markup tags isn't explicitly required: some users will place them and some won't. So we have to inspect every document, place the omitted tags, and then submit the document to an automated pool I created, which completes the rest.
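Since some documents already carry the tags and others don't, one way to avoid tagging an expression twice is to scan only the text outside existing tag pairs. A minimal sketch, assuming the tags are literally [latex] and [/latex] (the form suggested earlier in the thread) and are never nested; untaggedRegions is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns [begin, end) byte ranges of text that lie OUTSIDE any
// [latex]...[/latex] pair, i.e. the only regions that still need scanning.
// Assumption: tags are spelled exactly "[latex]" / "[/latex]" and not nested.
// An opening tag with no matching close protects the rest of the text.
std::vector<std::pair<std::size_t, std::size_t>> untaggedRegions(const std::string& text) {
    const std::string open = "[latex]";
    const std::string close = "[/latex]";
    std::vector<std::pair<std::size_t, std::size_t>> regions;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t start = text.find(open, pos);
        if (start == std::string::npos) {
            regions.emplace_back(pos, text.size());  // no more tags: rest is untagged
            break;
        }
        if (start > pos) regions.emplace_back(pos, start);
        std::size_t end = text.find(close, start + open.size());
        if (end == std::string::npos) break;  // unclosed tag: treat the rest as tagged
        pos = end + close.size();             // resume scanning after the close tag
    }
    return regions;
}
```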

Furthermore, my position on the whole issue doesn't allow me to say anything; my suggestions won't get anywhere.

My holy grail: if I could magically identify the start and end of an expression, I could place the tags there with code and my problem would be over.
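Once start and end offsets are known, the wrapping step really is the easy part. A sketch with a hypothetical Span struct and wrapSpans function, assuming non-overlapping byte offsets and the [latex]...[/latex] tag form:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A detected expression: byte offsets [begin, end) into the document.
struct Span {
    std::size_t begin;
    std::size_t end;
};

// Wraps each span in [latex]...[/latex]. Spans are applied from the last to
// the first, so inserting tags never shifts the offsets of spans that have not
// been processed yet. Assumes the spans do not overlap.
std::string wrapSpans(std::string text, std::vector<Span> spans) {
    std::sort(spans.begin(), spans.end(),
              [](const Span& a, const Span& b) { return a.begin > b.begin; });
    for (const Span& s : spans) {
        assert(s.begin <= s.end && s.end <= text.size());
        text.insert(s.end, "[/latex]");  // close tag first so s.begin stays valid
        text.insert(s.begin, "[latex]");
    }
    return text;
}
```

Inserting from the last span backwards is the design choice that matters: each insertion only shifts text to its right, so the offsets of earlier spans stay valid without any bookkeeping.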
"z ~ \binomial(1,2)" case.

You almost never see ~ in normal text. I would at least trap all of those and then do some sort of analysis on what is around the ~ character to see if you can identify the expression.

I mean, you have to start somewhere, and what I have seen so far is a token ~ word(tokens) format... a regex, maybe? Once you have found that bit, work backwards and forwards some distance (maybe between periods in the document that are not part of a number?) to look for stray bits of the same phrase, like your X and Y...
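One way to sketch that regex idea is with std::regex. The pattern, the hypothetical findTildeExpressions and isSentenceEnd functions, and the rule that a period ends a sentence unless it sits between two digits or belongs to an ellipsis are all assumptions for illustration:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// A period counts as a sentence end unless it is a decimal point (digit on
// both sides, as in 3.5) or part of an ellipsis (as in 1,...,10).
bool isSentenceEnd(const std::string& text, std::size_t i) {
    if (text[i] != '.') return false;
    char before = i > 0 ? text[i - 1] : ' ';
    char after = i + 1 < text.size() ? text[i + 1] : ' ';
    bool decimalPoint = std::isdigit(static_cast<unsigned char>(before)) &&
                        std::isdigit(static_cast<unsigned char>(after));
    bool ellipsis = before == '.' || after == '.';
    return !decimalPoint && !ellipsis;
}

struct Candidate {
    std::string match;          // the "token ~ word(tokens)" hit itself
    std::size_t sentenceBegin;  // [sentenceBegin, sentenceEnd) is the window
    std::size_t sentenceEnd;    // in which to look for stray X, Y, ...
};

std::vector<Candidate> findTildeExpressions(const std::string& text) {
    // token, optional spaces, ~, optional spaces, optional backslash, a word,
    // then a parenthesised argument list with no nested parentheses.
    static const std::regex pattern(R"([A-Za-z]\w*\s*~\s*\\?[A-Za-z]+\([^()]*\))");
    std::vector<Candidate> out;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), pattern);
         it != std::sregex_iterator(); ++it) {
        std::size_t begin = static_cast<std::size_t>(it->position());
        std::size_t end = begin + static_cast<std::size_t>(it->length());
        std::size_t sBegin = begin;  // walk backwards to the previous sentence end
        while (sBegin > 0 && !isSentenceEnd(text, sBegin - 1)) --sBegin;
        std::size_t sEnd = end;      // walk forwards to the next sentence end
        while (sEnd < text.size() && !isSentenceEnd(text, sEnd)) ++sEnd;
        out.push_back({it->str(), sBegin, sEnd});
    }
    return out;
}
```

This is deliberately rough: an argument with nested parentheses such as Norm(f(x), 1) is not matched by [^()]*, and a sentence that really does end in "..." is not recognised as ending there.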

Maybe "let" should be a trigger word, as well as "such that", any of the weird math symbols from proof lingo, "therefore", "iff" or "if and only if". You know the phrases... and yes, there are a fair number of them.
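A trigger-word pass could look like the following sketch. The phrase list would be the examples from this post, the function names are hypothetical, and the matching is ASCII-only, case-insensitive, whole-word search:

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Lower-cases ASCII letters so "Let" and "let" compare equal.
std::string toLower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Returns the sorted byte offsets at which any trigger phrase occurs as a
// whole word, so "let" fires on "Let X" but not inside "letter".
std::vector<std::size_t> findTriggers(const std::string& text,
                                      const std::vector<std::string>& triggers) {
    const std::string lower = toLower(text);
    auto isWordChar = [&](std::size_t i) {
        return std::isalnum(static_cast<unsigned char>(lower[i])) != 0;
    };
    std::vector<std::size_t> hits;
    for (const std::string& t : triggers) {
        const std::string needle = toLower(t);
        for (std::size_t pos = lower.find(needle); pos != std::string::npos;
             pos = lower.find(needle, pos + 1)) {
            std::size_t end = pos + needle.size();
            bool startOk = pos == 0 || !isWordChar(pos - 1);
            bool endOk = end == lower.size() || !isWordChar(end);
            if (startOk && endOk) hits.push_back(pos);
        }
    }
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

Each hit only marks a place to start looking; the expressions themselves still have to be located in the surrounding text.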
closed account (SECMoG1T)
Thanks, @Helios and @Jonnin. I'll try to apply all your suggestions and see which ones work for me.