The unicode problem


1 What to do when converting XML to [La]TeX?

Input XML documents are, in principle, `richer' than TeX or LaTeX, in that they allow arbitrary Unicode characters as well as XML's tree structure. Provided the intended target document does not include glyphs outside TeX's capabilities, the same document may be encoded in XML in many ways. Which is the `right' one, or the best one?

The other problem is that (La)TeX has a strange legacy encoding of characters, in which different fonts may even hold different characters at the same position. In any case, the characters you want never all seem to be available. This seems to be a major problem without a solution, despite many people spending much time on it.

There is also a minor issue with the characters that are special to TeX, which must be encoded correctly. These include $ _ { } # & < > % ~ ' " ^ \ [ ]. This issue seems to be resolved.
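As a rough illustration of what encoding these characters correctly can look like, here is a minimal Python sketch. It is not taken from the files here: the table TEX_TEXT_ESCAPES and the function escape_tex_text are hypothetical names, and the mapping assumes text mode only. The characters ' " [ ] are passed through unchanged, on the assumption that they only cause trouble in particular contexts (for example, [ directly after a command that takes an optional argument).

```python
# Hypothetical text-mode escape table (an assumption, not the original files' mapping).
# The replacement commands are standard LaTeX2e text commands.
TEX_TEXT_ESCAPES = {
    "\\": r"\textbackslash{}",
    "$": r"\$",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "#": r"\#",
    "&": r"\&",
    "%": r"\%",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
}


def escape_tex_text(s):
    """Escape TeX special characters in a string destined for text mode.

    Working one character at a time means the backslashes and braces
    produced by an earlier replacement are never escaped a second time.
    """
    return "".join(TEX_TEXT_ESCAPES.get(ch, ch) for ch in s)


print(escape_tex_text("50% of $x_1 & y"))
```

Doing the replacement in a single pass, rather than with a chain of string replacements, avoids having to worry about the order in which \ { and } are handled.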

I am not going to spend much more time on this. The files here should convert a minimal amount of XHTML+MathML into LaTeX. If it doesn't work, try something else.

2 An incomplete technical discussion

The choice to be made here is technical and somewhat difficult. It is necessary to prioritise the options. Are your priorities...

The options, at present, seem to be...

I couldn't get TeXML working (wrong version of Python or of a library), and in any case this route seems flawed, as it requires knowing which mode TeX is in (maths or text?) at compile time. Using real Unicode thus requires more modern TeX software. ConTeXt looks good, but... We need to stick with safe 7-bit characters for now.
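To make the mode problem concrete, here is a small Python sketch of a 7-bit fallback. Everything in it is an assumption for illustration: the table SEVEN_BIT, the function to_seven_bit and the math_mode flag are hypothetical, the table holds only three sample characters, and the caller is assumed to know the mode already (for instance because the converter is inside a MathML math element).

```python
# Hypothetical table: character -> (text-mode form, math-mode form).
# Only three sample entries; a real table would need to be far larger.
SEVEN_BIT = {
    "\u00e9": (r"\'{e}", r"\acute{e}"),             # e with acute accent
    "\u03b1": (r"\ensuremath{\alpha}", r"\alpha "),  # Greek small alpha
    "\u00d7": (r"\ensuremath{\times}", r"\times "),  # multiplication sign
}


def to_seven_bit(s, math_mode=False):
    """Replace non-ASCII characters with 7-bit LaTeX commands.

    The caller must say which mode the output will land in; the same
    character needs a different command in maths and in text.
    """
    out = []
    for ch in s:
        if ord(ch) < 128:
            out.append(ch)  # already 7-bit, keep as is
        elif ch in SEVEN_BIT:
            text_form, math_form = SEVEN_BIT[ch]
            out.append(math_form if math_mode else text_form)
        else:
            # Fail loudly rather than emit a character TeX may not handle.
            raise ValueError(f"no 7-bit LaTeX form known for U+{ord(ch):04X}")
    return "".join(out)


print(to_seven_bit("caf\u00e9"))
print(to_seven_bit("2\u00d73", math_mode=True))
```

The trailing space in the maths forms stops a command such as \times from running into a following letter; TeX ignores spaces in maths mode. Raising an error for unmapped characters is a design choice made for this sketch, on the grounds that a silent gap in the output is harder to find later.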

Ughhh... I give up!

This page is copyright. Web page design and creation by GLOSS.