Quick Hack: Converting MathML to LaTeX

Furkan Kalkan - Jul 2 '19 - Dev Community

Recently, I needed to convert MathML in article metadata from SCOAP3 to LaTeX. Most institutional repositories escape XML entities, so the MathML doesn't render correctly. I tried the Wiris API, but it is very slow and returns errors for most long formulas.
Finally, I found Yaroshevich's XSL stylesheet, which works without problems.
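
If the metadata arrives with the MathML escaped (for example `&lt;math&gt;` instead of `<math>`), it has to be unescaped before a regex can find the formulas. Here is a minimal sketch using only the standard library. The helper name `extract_mathml` and the sample string are made up for illustration, and whether your source escapes its MathML this way is an assumption you should check against real records.

```python
import html
import re

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def extract_mathml(text):
    """Return the MathML fragments in text, unescaped and with a namespace."""
    # Turn &lt;math&gt;...&lt;/math&gt; back into real markup.
    unescaped = html.unescape(text)
    # DOTALL lets a formula span several lines.
    fragments = re.findall(r"<math\b.*?</math>", unescaped, flags=re.DOTALL)
    result = []
    for fragment in fragments:
        opening_tag = fragment.split(">", 1)[0]
        # Add the MathML namespace only when the opening tag lacks one.
        if "xmlns=" not in opening_tag:
            fragment = fragment.replace("<math", f'<math xmlns="{MATHML_NS}"', 1)
        result.append(fragment)
    return result


# Hypothetical sample, not taken from SCOAP3.
sample = "Mass of &lt;math&gt;&lt;mi&gt;x&lt;/mi&gt;&lt;/math&gt; is measured."
fragments = extract_mathml(sample)
print(fragments)
```

Each fragment returned this way is a namespaced XML string that can be parsed and handed to an XSLT transform.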

Example Python code:

import re

import lxml.etree as ET


def to_latex(text):
    # Remove TeX codes ($$...$$) from the text.
    text = re.sub(r"(\$\$.*?\$\$)", " ", text)

    # Load and compile the stylesheet once, not once per formula.
    xslt = ET.parse("mmltex/mmltex.xsl")
    transform = ET.XSLT(xslt)

    # Find MathML codes and replace them with their LaTeX representations.
    mml_codes = re.findall(r"(<math.*?<\/math>)", text)
    for mml_code in mml_codes:
        # Add the MathML namespace. Required.
        mml_ns = mml_code.replace('<math>', '<math xmlns="http://www.w3.org/1998/Math/MathML">')
        mml_dom = ET.fromstring(mml_ns)
        # Convert the transform result, not the input DOM, to a string.
        latex_code = str(transform(mml_dom))
        text = text.replace(mml_code, latex_code)
    return text
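
To try the transform step without downloading the mmltex files, here is a small sketch with a toy stylesheet. `TOY_XSLT` is my own stand-in and handles only `mi`, `mn` and `msup`. It is not Yaroshevich's stylesheet and is only meant to show how `ET.XSLT` is applied to a namespaced MathML element.

```python
import lxml.etree as ET

# Toy stylesheet (an assumption for illustration, NOT mmltex.xsl).
# It matches elements in the MathML namespace and emits plain text.
TOY_XSLT = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://www.w3.org/1998/Math/MathML">
  <xsl:output method="text"/>
  <xsl:template match="m:math">$<xsl:apply-templates/>$</xsl:template>
  <xsl:template match="m:msup"><xsl:apply-templates select="*[1]"/>^{<xsl:apply-templates select="*[2]"/>}</xsl:template>
  <xsl:template match="m:mi|m:mn"><xsl:value-of select="."/></xsl:template>
</xsl:stylesheet>"""

# Compile the stylesheet into a callable transform.
transform = ET.XSLT(ET.fromstring(TOY_XSLT))

# The namespace on <math> is what lets the m:* templates match.
mml = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "</math>"
)

# Calling the transform returns a result tree; str() gives its text output.
latex = str(transform(ET.fromstring(mml)))
print(latex)
```

If the namespace is left off the `<math>` element, none of the `m:` templates match, which is why the function above adds it before parsing.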