GLOSS: a manifesto

Glosing is a full glorious thing certain,
For letter slayeth, as we clerkes sayn.

from The Summoner's Tale,
by Geoffrey Chaucer

A discerning friend of mine, said Don Quixote, was of opinion that no one ought to waste labour in glossing verses; and the reason he gave was that the gloss can never come up to the text, and that often or most frequently it wanders away from the meaning and purpose aimed at in the glossed lines; and besides, that the laws of the gloss were too strict, as they did not allow interrogations, nor said he, nor I say, nor turning verbs into nouns, or altering the construction, not to speak of other restrictions and limitations that fetter gloss-writers, as you no doubt know.

from Don Quixote,
by Miguel de Cervantes,
Translated by John Ormsby

This isn't the place to extol the virtues of writing well-marked-up text in XML rather than in hybrid text formats. The main obstacle to doing so is that XML is verbose, slow and difficult to author directly in a text editor, and can be hard to read and maintain once it is written. Furthermore, many existing documents, even those with a certain amount of structure or mark-up, require considerable work to convert to well-marked-up XML.

Enter GLOSS (for Gloss Linguistic Or Semantic Structure), a program written to convert plain text files to XML with mark-up added automatically. It is a general-purpose tool that extracts structural information from a text file and writes well-formed XML as output. The glossing follows rules that are themselves expressed in XML, in a form somewhat similar to an XSL stylesheet.
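As a rough illustration of rule-driven glossing, the following Python sketch reads a small rule document written in XML and uses it to wrap lines of plain text in elements. The rule vocabulary (`rules`, `rule`, `match`, `element`), the input conventions and the function names are all invented for this sketch; they are not GLOSS's actual rule syntax, which is not shown here.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule document: element and attribute names are assumptions
# made for this sketch, not the real GLOSS rule format.
RULES_XML = r"""
<rules>
  <rule match="^= (.+)$" element="title"/>
  <rule match="^\* (.+)$" element="item"/>
  <rule match="^(.+)$" element="para"/>
</rules>
"""

def load_rules(rules_xml):
    """Parse the rule document into (compiled regex, element name) pairs."""
    root = ET.fromstring(rules_xml)
    return [(re.compile(r.get("match")), r.get("element"))
            for r in root.findall("rule")]

def gloss_text(text, rules, root_name="document"):
    """Wrap each non-blank line in the element named by the first matching rule."""
    root = ET.Element(root_name)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        for pattern, element in rules:
            m = pattern.match(line)
            if m:
                # ElementTree escapes characters such as '&' on output,
                # so the result stays well-formed.
                ET.SubElement(root, element).text = m.group(1)
                break
    return ET.tostring(root, encoding="unicode")

SAMPLE = "= Notes\n* first point\n* second & third\nPlain closing line."
print(gloss_text(SAMPLE, load_rules(RULES_XML)))
```

The point of the design is that the author types only the light conventions on the left of each rule, while the tags, the escaping and the well-formedness are supplied by the tool.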

GLOSS may be used as an input device for new documents, to remove much of the tedium of entering XML tags in an ordinary text editor, or as a tool to convert legacy documents to XML.

The original intended application is authoring technical material, especially material involving mathematics. This area has long been well served by TeX and LaTeX, though these systems are designed specifically for output on paper, not for web pages, computer algebra systems or other output media.

Interestingly, a typical LaTeX user (someone who writes a good deal of technical or mathematical text) is generally not at all worried about the lack of a what-you-see-is-what-you-get facility (though some systems do provide this to a certain extent) and has learnt the commands well enough that typing them into a text editor is much faster than working in a mouse- or menu-driven equation editor. Moreover, our typical user much appreciates the additional control offered by the typesetting commands and the superior quality of the output. Unfortunately, mark-up in LaTeX is somewhat ad hoc and not always very rich. There are particular problems with mathematics itself, which conventional tools generally convert rather poorly to MathML. Direct conversions from LaTeX to XML are therefore difficult and rely on a certain amount of guesswork on the part of the converting software. It would be better to write more structured source, and convert to LaTeX where necessary.

My experiments have convinced me that none of the freely available XML editors or conversion tools can provide the productivity already attainable with a text editor and LaTeX. That on its own would prevent XML being used as a medium of choice by most authors of technical documents. On the other hand, LaTeX input isn't always perfect, and some XML tools (XSLT in particular) provide a superior means of transforming documents or enhancing mark-up. So an XML-based system for authoring technical documents, using text files as raw input and offering productivity similar to or better than that of LaTeX (for trained or self-trained users), should be feasible. The output would also be more flexible, with richer mark-up (in the area of mathematics, for example) and more possibilities for exporting to other software.

The solution I propose with this software is that authors write structured plain text in a text editor and convert it to the required XML, HTML or LaTeX using GLOSS and XSL transformations. GLOSS itself is intended to be highly configurable and the input format can be designed with care to be easy to write. Its output is not specific to mathematics or to web pages, but can be used to create any XML document. And repetitive tasks can be automated in the glossing process to minimize typing.
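The one-source, many-outputs idea can be sketched as follows. The workflow described above uses XSL transformations for this step; here two small Python functions stand in for the stylesheets purely to keep the example self-contained, and the intermediate vocabulary (`document`, `title`, `para`) is an assumption made for the sketch.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical glossed XML; the element names are invented for this sketch.
SOURCE = "<document><title>Notes</title><para>Cost is 5% of total.</para></document>"

# A few characters that need escaping in LaTeX text (not an exhaustive list).
LATEX_SPECIALS = {"%": r"\%", "&": r"\&", "#": r"\#", "_": r"\_", "$": r"\$"}

def latex_escape(text):
    """Escape a small set of LaTeX special characters."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

def to_html(root):
    """Render each child element as an HTML heading or paragraph."""
    tags = {"title": "h1", "para": "p"}
    lines = []
    for child in root:
        tag = tags.get(child.tag, "p")
        lines.append(f"<{tag}>{escape(child.text or '')}</{tag}>")
    return "\n".join(lines)

def to_latex(root):
    """Render the same elements as a LaTeX section heading and paragraphs."""
    parts = []
    for child in root:
        text = latex_escape(child.text or "")
        parts.append("\\section{" + text + "}" if child.tag == "title" else text)
    return "\n\n".join(parts)

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_latex(root))
```

Each target handles its own escaping, so the author never has to think about `\%` or `&amp;` while writing the source text.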

However, the text-to-XML transformations possible with GLOSS go further: in principle GLOSS can be configured to perform many other text-to-XML conversions, including conversions from some extant text formats. It can be used, for example, as a documentation system. It is possible to use GLOSS, as I have here, for documentation and as a sort of literate programming tool, for example by stripping comments and quoting code to document the main modularvocab DTD.
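The literate-programming use can be pictured with a short Python sketch that separates comments from code and emits both as XML. It assumes line comments introduced by a `#` prefix, and the output element names (`listing`, `doc`, `code`) are invented for the example; it does not reproduce how GLOSS itself processes the modularvocab DTD.

```python
import xml.etree.ElementTree as ET

def document_source(source, comment_prefix="#"):
    """Group consecutive comment lines into <doc> and code lines into <code>.

    Blank lines are skipped; the element names are assumptions of this sketch.
    """
    root = ET.Element("listing")
    current_kind, buffer = None, []

    def flush():
        # Emit the lines collected so far as one element of the current kind.
        if buffer:
            ET.SubElement(root, current_kind).text = "\n".join(buffer)
            buffer.clear()

    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(comment_prefix):
            # Strip the comment marker so only the prose remains.
            kind, content = "doc", stripped[len(comment_prefix):].strip()
        else:
            # Quote code verbatim, keeping its indentation.
            kind, content = "code", line.rstrip()
        if kind != current_kind:
            flush()
            current_kind = kind
        buffer.append(content)
    flush()
    return ET.tostring(root, encoding="unicode")

SAMPLE = "# Add two numbers.\n# Inputs are ints.\ndef add(a, b):\n    return a + b\n"
print(document_source(SAMPLE))
```

A stylesheet could then render the `doc` elements as running text and the `code` elements as quoted listings.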

This page is copyright. Web page design and creation by GLOSS.