
MathDOM - Content MathML in Python


MathDOM is a set of Python 2.4 modules (using PyXML or lxml, and pyparsing) that import mathematical terms as a Content MathML DOM. It currently parses MathML and literal infix terms into a DOM or lxml document and writes out MathML and literal infix/prefix/postfix/Python terms. The DOM elements are enhanced by domain-specific methods that make using the DOM a little easier. Input parsers and output converters are easily extensible.
Newer versions make it easier to port code between the PyXML and lxml versions. They also extend the lxml version with an XSLT-based output filter for Presentation MathML and RelaxNG-based document validation. The PyXML version supports neither of these.

What is it good for?

You can call it the shortest path between different term representations and a Content MathML DOM. Ever noticed the annoying differences between terms in different programming languages? Build your application around MathDOM and stop caring about the term representation that users prefer or that your machine can execute. If you need a different representation, add a converter, but don't change the model of your application. Literal terms are connected through an intermediate AST step that makes writing converters for C/Fortran/SQL/yourfavourite easier.
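To illustrate why an intermediate AST makes converters cheap to write, here is a minimal sketch. It assumes a hypothetical AST made of nested tuples (operator, left, right) with numbers or names as leaves; this is not MathDOM's actual AST format, and the function names are invented for the illustration. Each output format is then just one small recursive function over the same tree.

```python
# Hypothetical AST: a node is either a leaf (non-negative number or name)
# or a tuple (operator, left, right). This is NOT MathDOM's internal format.

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}


def to_prefix(node):
    """Serialize the tree in prefix notation."""
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    return "%s %s %s" % (op, to_prefix(left), to_prefix(right))


def to_postfix(node):
    """Serialize the tree in postfix notation."""
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    return "%s %s %s" % (to_postfix(left), to_postfix(right), op)


def to_python(node, min_prec=0):
    """Serialize the tree as a Python expression, adding parentheses
    only where operator precedence or associativity requires them."""
    if not isinstance(node, tuple):
        return str(node)
    op, left, right = node
    prec = PRECEDENCE[op]
    if op == "^":
        # Power is right-associative and spelled '**' in Python.
        text = "%s ** %s" % (to_python(left, prec + 1), to_python(right, prec))
    else:
        # The other operators are left-associative.
        text = "%s %s %s" % (to_python(left, prec), op, to_python(right, prec + 1))
    return "(%s)" % text if prec < min_prec else text


# The term 2^x - 4*5/6 as a tree
term = ("-", ("^", 2, "x"), ("/", ("*", 4, 5), 6))
print(to_prefix(term))
print(to_postfix(term))
print(to_python(term))
```

A converter for another target language would follow the same pattern: one more function over the tree, with no change to the model.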

A quick example

>>> from mathml.lmathdom import MathDOM # use lxml implementation
>>> doc = MathDOM.fromString("+2^x+4*-5i/6","infix_term") # parse infix term
>>> for apply_tag in doc.xpath(u'//math:apply[math:plus]'): # replace '+' with '-'
...    apply_tag.set_operator(u'minus')
>>> [ n.value() for n in doc.xpath(u'//math:cn') ] # find numbers
[2, 4, Complex(0-5j), 6]
>>> from mathml.utils import pyterm # register Python term builder
>>> doc.serialize("python") # serialize to Python term
u'2 ** x - 4 * (-5j) / 6'

Simple, isn't it?
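For comparison, the operator replacement and the number lookup from the example above can be sketched on raw Content MathML with nothing but the standard library's ElementTree. This is only an illustration of what happens at the XML level, not how MathDOM is implemented; the sample document and the variable names are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
NS = {"math": MATHML_NS}

# Content MathML for the term 2 + 4*x (sample input for this sketch)
SOURCE = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><plus/><cn>2</cn>
    <apply><times/><cn>4</cn><ci>x</ci></apply>
  </apply>
</math>"""

root = ET.fromstring(SOURCE)

# Replace '+' with '-': find each <apply> that has a <plus/> child
# and rename that child to <minus/>.
for apply_tag in root.findall(".//math:apply[math:plus]", NS):
    apply_tag.find("math:plus", NS).tag = "{%s}minus" % MATHML_NS

# Find numbers: collect the content of all <cn> elements in document order.
numbers = [int(cn.text) for cn in root.findall(".//math:cn", NS)]
print(numbers)
```

The namespace handling and tag juggling shown here is the kind of detail that methods such as set_operator() and value() in the example above hide.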

Why PyXML and lxml?

While basic XML support is part of the Python distribution, PyXML extends it to a largely DOM Level-2 compliant API written in (almost) pure Python that is available in CPython and Jython. This implementation does, however, have the disadvantage of being rather slow and very, very un-pythonic. Apart from DOM, SAX and XPath, it supports none of the other important XML specifications.
Enter lxml. Based on libxml2, lxml supports basically all major XML technologies (like XPath, XSLT, XInclude, etc.) and combines them with the Pythonic ElementTree API. Through XSLT and RelaxNG, it can support Presentation MathML export and Content MathML validation.
So, why both? The lxml implementation is newer and takes the lead in the further evolution of MathDOM. Still, the PyXML implementation is written in pure Python and can therefore be used where lxml/libxml2 is not available. The MathDOM package is easily split up into independent packages depending on lxml or PyXML only.


MathML is an XML language for representing mathematics. Content MathML is the part of that specification that focuses on the semantics rather than the visual presentation of mathematical expressions. MathML is widely supported in mathematical software as well as web browsers and provides a convenient layer for the semantic exchange of mathematics. Note that Content MathML support in MathDOM is not complete, as the primary focus is on term representation.
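Because Content MathML encodes what a term means rather than how it looks, a program can work with it directly. The following minimal sketch evaluates a small Content MathML document using only the standard library. It handles just cn, ci and apply with five arithmetic operators, and the evaluate() function is a hypothetical helper written for this illustration, not part of MathDOM.

```python
import operator
import xml.etree.ElementTree as ET
from functools import reduce

NS = "{http://www.w3.org/1998/Math/MathML}"

# Only a small subset of Content MathML operators is covered here.
OPERATORS = {
    "plus": operator.add,
    "minus": operator.sub,
    "times": operator.mul,
    "divide": operator.truediv,
    "power": operator.pow,
}


def evaluate(element, variables):
    """Evaluate a Content MathML element, looking up <ci> names in `variables`."""
    tag = element.tag.replace(NS, "")
    if tag == "math":
        return evaluate(element[0], variables)
    if tag == "cn":
        return float(element.text)
    if tag == "ci":
        return variables[element.text.strip()]
    if tag == "apply":
        # The first child of <apply> names the operator, the rest are operands.
        name = element[0].tag.replace(NS, "")
        args = [evaluate(child, variables) for child in element[1:]]
        if name == "minus" and len(args) == 1:
            return -args[0]  # unary minus
        return reduce(OPERATORS[name], args)
    raise ValueError("unsupported element: %s" % tag)


# Content MathML for the term 2^x - 3
CONTENT = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><minus/>
    <apply><power/><cn>2</cn><ci>x</ci></apply>
    <cn>3</cn>
  </apply>
</math>"""

result = evaluate(ET.fromstring(CONTENT), {"x": 3})
print(result)
```

The same term written in Presentation MathML would only describe superscripts and symbols, which is why the content form is the better basis for term conversion.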

© Stefan Behnel, 2005