The GLOSS-xml system

1 Introduction

The GLOSS-xml system enables pure XML to be authored conveniently using GLOSS. By "pure" I mean that the system provides facilities for authoring generic XML, but no additional facilities for specific applications. It should be possible to author any XML for any application using the system. GLOSS-xml is also the base for most other GLOSS applications.

Using GLOSS-xml, the author writes a plain text file (typically named filename.xml.gloss), and GLOSS processes it with the modular vocabularies xml.mv and xml.modes (available from http://gloss.bham.ac.uk/mv/xml) to produce the target XML file filename.xml. The command to do this is

gloss filename.xml.gloss

or

gloss -mv xml.mv -in filename.xml.gloss -out filename.xml

This page is a brief user's guide to writing text to be converted to XML using this system. As most GLOSS applications (e.g. HTML, MathML, ...) are based on this XML application, it is well worth reading for most uses of GLOSS.

(Of course you don't have to use this encoding at all. You can design your own and write your own MV for it. That is the subject of other pages here.)

2 Overview

GLOSS converts a plain text file into tokens, which are read and converted into a structured XML document. Obviously the eventual XML structure must be encoded in the original file in some way. XML is very verbose and takes time to type. The GLOSS-xml system allows text files to be written rapidly and then converted to the required XML, while still indicating the required structure. Much of the design of the system centres round what is quick to type on an ordinary keyboard.

Presenting structure in a traditional programming language style using begin...end blocks, or even {...}, tends to be quite slow to type and can obscure the structure being presented. The idea of XML structure naturally suggests a syntax based on indentation, similar to that used with the Python programming language (http://www.python.org/), but using braces {...} for finer control where necessary. Thus the first lesson to learn is to use and read indentation.

Names define XML tags, and end tags are added automatically by GLOSS-xml. One of the goals of GLOSS-xml is that the author should only have to type each word once. Because GLOSS-xml input is comparatively terse, there is always scope for errors. Some of the most common errors are guarded against, but reasonable care is needed. XML tools are available, or can be written, to check the output, and it is recommended that you use them. (A simple XML validator is provided as part of the system.)

3 XML elements

XML elements are represented by their names without the opening < and closing >. Normally indentation is used to define which tags are inside which. Thus

apple
  orange
  pear
    kumquat
    guava
  plum

is valid GLOSS-xml text producing

<apple>
  <orange/>
  <pear><kumquat/><guava/></pear>
  <plum/>
</apple>

(I have added a certain amount of indentation to the XML output in this and other examples to help you read it. In practice the output would not be indented or even split into separate lines.)
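The indentation rule can be pictured as a stack of open elements. Each new line closes every open element indented at least as far as itself, and then becomes a child of whatever element is left on top of the stack. The Python sketch below illustrates only that rule, under simplifying assumptions: one name per line, spaces only, and no text, attributes or braces. The function name indent_to_xml is hypothetical, and this is not GLOSS's own implementation.

```python
def indent_to_xml(src: str) -> str:
    """Hypothetical sketch: one element name per line, nesting given by indentation."""
    root = ["", []]          # dummy root node; each node is [name, children]
    stack = [(-1, root)]     # (indentation, node) for every element still open
    for line in src.splitlines():
        name = line.strip()
        if not name:
            continue                                   # blank lines carry no structure
        indent = len(line) - len(line.lstrip(" "))     # assumes spaces, not tabs
        # Close every open element indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        node = [name, []]
        stack[-1][1][1].append(node)                   # child of the innermost open element
        stack.append((indent, node))

    def render(node):
        name, children = node
        if not children:
            return f"<{name}/>"                        # empty element, as in the examples
        return f"<{name}>" + "".join(render(c) for c in children) + f"</{name}>"

    return "".join(render(child) for child in root[1])


example = """\
apple
  orange
  pear
    kumquat
    guava
  plum
"""
print(indent_to_xml(example))
# <apple><orange/><pear><kumquat/><guava/></pear><plum/></apple>
```

Like GLOSS itself, the sketch emits no whitespace between tags.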

Elements that follow other elements on the same line are always sub-elements. They always have lower precedence than elements that begin a line, even when the line starts with whitespace. Thus

apple
  orange pear
    kumquat guava
  plum peach
          cherry
  strawberry

is valid GLOSS-xml text producing

<apple>
  <orange><pear/><kumquat><guava/></kumquat></orange>
  <plum><peach/><cherry/></plum>
  <strawberry/>
</apple>
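The same stack idea extends to the precedence rule. Only elements that begin a line are kept on the stack, so a later line can never attach to an element that followed another on the same line. The page does not say what happens with three or more names on one line, so this sketch assumes that each further name nests inside the one before it. The function gloss_names_to_xml is hypothetical, and braces, text and attributes are not handled.

```python
def gloss_names_to_xml(src: str) -> str:
    """Hypothetical sketch: indentation nesting plus same-line sub-elements."""
    root = ["", []]          # dummy root node; each node is [name, children]
    stack = [(-1, root)]     # only elements that BEGIN a line are ever pushed here
    for line in src.splitlines():
        names = line.split()
        if not names:
            continue
        indent = len(line) - len(line.lstrip(" "))     # assumes spaces, not tabs
        while stack[-1][0] >= indent:
            stack.pop()
        head = [names[0], []]
        stack[-1][1][1].append(head)
        stack.append((indent, head))
        # Assumption: further names on the line nest inside one another. They are
        # never pushed, so they are effectively closed at the end of the line.
        parent = head
        for name in names[1:]:
            child = [name, []]
            parent[1].append(child)
            parent = child

    def render(node):
        name, children = node
        if not children:
            return f"<{name}/>"
        return f"<{name}>" + "".join(render(c) for c in children) + f"</{name}>"

    return "".join(render(child) for child in root[1])


example = """\
apple
  orange pear
    kumquat guava
  plum peach
          cherry
  strawberry
"""
print(gloss_names_to_xml(example))
# <apple><orange><pear/><kumquat><guava/></kumquat></orange><plum><peach/><cherry/></plum><strawberry/></apple>
```

Note how cherry, although indented much further than peach, becomes a child of plum, because peach does not begin a line.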

Further control is exercised with the { and } symbols. The rule of thumb is that an XML group (i.e., a tag together with all its descendants) cannot cross a { or } boundary. Thus { enables a group of tags all to be descendants of the previous tag when the indentation rules would otherwise break them up, and } forces a group of descendants to be closed. For example,

apple { { orange pear } 
  kumquat {
  guava 
  plum 
  }
  peach 
    cherry 
strawberry
}

yields

<apple>
  <orange><pear/></orange>
  <kumquat><guava/><plum/></kumquat>
  <peach><cherry/></peach>
  <strawberry/>
</apple>

as you should be able to check shortly when you are using the system.

Although the XML in these cases isn't too awful to type, using GLOSS saves real typing time even in these simple examples. When I type XML I spend a lot of my time with a finger hovering over the shift key for the < and > characters. I also occasionally type the wrong close tag, and these errors have to be corrected manually. With GLOSS, neither of these problems arises.

4 Basic text

We now add text to our discussion. Text is input between square brackets [ ... ]. Square brackets are chosen because they are generally easy to type without resorting to the shift key, and because they appear rather infrequently in normal text. In text mode, indentation does not define structure. (In fact, line breaks and white space are significant and are copied to the output in the way you might expect.) Thus

apple [Here is some text involving oranges, pears, 
kumquats, guavas, plums, peaches, cherries and a strawberry.]

gives

<apple>Here is some text involving oranges, pears,
kumquats, guavas, plums, peaches, cherries and a strawberry.</apple>

Obviously, inside text, ] has a special meaning: it ends the text. We will see in a moment that [ has a special meaning too. The characters { and } are also forbidden within text. A stray { or } in text usually means that a closing ] was forgotten, so this rule provides a useful check on the input. Therefore the characters [, ], {, } cannot be entered directly into text, but must be escaped by preceding them with a \, and the same applies to the \ character itself. For example,

apple [Here is some text involving \\oranges, \[pears\], 
and \{kumquats\}.]

gives

<apple>Here is some text involving \oranges, [pears],
and {kumquats}.</apple>
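The escape rules can be sketched as a small scanner over the text between [ and ]. This is a hypothetical illustration (decode_text is not part of GLOSS). It makes two assumptions that the page does not settle. First, a backslash before any other character is treated as an error. Second, a bare [ is simply rejected rather than given the special meaning mentioned above.

```python
def decode_text(body: str) -> str:
    """Hypothetical sketch: decode the text between [ and ] using the escape rules."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # \[ \] \{ \} and \\ stand for the literal second character.
            if i + 1 < len(body) and body[i + 1] in "[]{}\\":
                out.append(body[i + 1])
                i += 2
                continue
            # Assumption: any other use of \ is an error in this sketch.
            raise ValueError(f"unsupported escape at position {i}")
        if ch in "[]{}":
            # A bare { or } usually means a closing ] was forgotten; a bare [ is
            # also rejected here because this sketch does not implement its meaning.
            raise ValueError(f"unescaped {ch!r} at position {i}; is a ] missing?")
        out.append(ch)
        i += 1
    return "".join(out)


print(decode_text(r"involving \\oranges, \[pears\], and \{kumquats\}."))
# involving \oranges, [pears], and {kumquats}.
```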

The indentation of a piece of text is the column of its opening [ character. Thus

apple
  pear
    orange [lemon]
    [peach]
  [cherry]

gives

<apple><pear><orange>lemon</orange>peach</pear>cherry</apple>

The space before the [ putting the system into text mode is optional. So orange[lemon] would have been perfectly OK in the above example. You may prefer to have this space, or prefer to omit it: it's up to you.
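The column rule for text fits the same stack picture. A line that starts with [ is placed by the column of that [. A [ after names on a line puts the text inside the last of those names. Text is a leaf, so it is never pushed on the stack. The sketch makes several simplifying assumptions: each piece of text fits on one line, contains no escapes, and is not XML-escaped. Further names on a line are also assumed to nest inside one another. The function gloss_to_xml is hypothetical.

```python
def gloss_to_xml(src: str) -> str:
    """Hypothetical sketch: element names plus single-line [text] placed by column."""
    root = ["", []]          # element nodes are [name, children]; text nodes are str
    stack = [(-1, root)]     # (indentation, element) for elements that begin a line
    for line in src.splitlines():
        if not line.strip():
            continue
        if "[" in line:
            start = line.index("[")
            end = line.rindex("]")          # assumes the text closes on the same line
            names = line[:start].split()
            text = line[start + 1:end]
        else:
            start, names, text = None, line.split(), None
        # Text alone on a line takes the column of its "["; otherwise use the first name.
        indent = len(line) - len(line.lstrip(" ")) if names else start
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if names:
            head = [names[0], []]
            parent[1].append(head)
            stack.append((indent, head))
            parent = head
            for name in names[1:]:          # assumption: same-line names nest
                child = [name, []]
                parent[1].append(child)
                parent = child
        if text is not None:
            parent[1].append(text)          # text is a leaf: never pushed on the stack

    def render(node):
        if isinstance(node, str):
            return node
        name, children = node
        if not children:
            return f"<{name}/>"
        return f"<{name}>" + "".join(render(c) for c in children) + f"</{name}>"

    return "".join(render(child) for child in root[1])


example = """\
apple
  pear
    orange [lemon]
    [peach]
  [cherry]
"""
print(gloss_to_xml(example))
# <apple><pear><orange>lemon</orange>peach</pear>cherry</apple>
```

Writing orange[lemon] without the space gives the same result, because only the names before the [ matter.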

You may know that some characters, such as & < > " and possibly ', are not allowed in some places in XML unless they are "escaped" as entity references beginning with &, such as &amp;. GLOSS does this escaping for you automatically, so you don't have to worry about it. Thus

p [You can use & < > " and ' without having to worry.]

gives

<p>You can use &amp; &lt; &gt; &quot; and ' without having to worry.</p>

The ' character could have been escaped as &apos;, but it is not, because many browsers that are not fully XML-compliant (such as MS-IE) do not display &apos; correctly (sigh). GLOSS never escapes ', and it never uses an unescaped ' in text or attributes in a way that would make the XML invalid.
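The same escaping can be reproduced with Python's standard xml.sax.saxutils.escape. That function replaces &, < and > and accepts an extra mapping for further characters. This illustrates the behaviour described above and is not GLOSS's own code. The name escape_text is hypothetical.

```python
from xml.sax.saxutils import escape


def escape_text(text: str) -> str:
    """Escape &, <, > and " for XML output, deliberately leaving ' untouched."""
    # escape() always handles & first, then > and <, so nothing is escaped twice;
    # the extra mapping adds the double quote. The apostrophe is not in the map.
    return escape(text, {'"': "&quot;"})


print(escape_text('You can use & < > " and \' without having to worry.'))
# You can use &amp; &lt; &gt; &quot; and ' without having to worry.
```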

5 Attributes

Attributes are normally entered using a special attribute tag @name. They may have textual content, or may simply be present. XML requires every attribute to have a value, so an attribute that is merely present is given its own name as its value, following the XHTML convention (as in checked="checked"). Otherwise, attributes behave just like element names. They must appear before the non-attribute content of an element. GLOSS-xml does not object to a repeated attribute: the first occurrence takes precedence and later ones are discarded. Thus,

apple @colour[green]
  cherry 
    @location[basket]
    @colour[red]
    @taste[nice]    
    @plentiful
    @colour[cherry-colour]
    [Life is a basket of cherries.]
  pear @taste[good]
    @shape[pear-shaped]
    [makes perry when fermented]
    orange @colour[orange]
    lemon @taste[sour]
    peach @rotten
    warning [life can go pear-shaped]

gives

<apple colour="green">
  <cherry location="basket" colour="red" taste="nice" plentiful="plentiful">
    Life is a basket of cherries.
  </cherry>
  <pear taste="good" shape="pear-shaped">
    makes perry when fermented
    <orange colour="orange"/>
    <lemon taste="sour"/>
    <peach rotten="rotten"/>
    <warning>life can go pear-shaped</warning>
  </pear>
</apple>
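The two attribute rules are that the first occurrence wins and that a bare attribute takes its own name as its value. They can be sketched as follows. The helpers merge_attributes and start_tag are hypothetical, and attributes are given as (name, value) pairs, with None standing for a bare @name.

```python
from typing import Dict, List, Optional, Tuple
from xml.sax.saxutils import escape


def merge_attributes(attrs: List[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Keep the first value given for each attribute name, in order of appearance."""
    merged: Dict[str, str] = {}
    for name, value in attrs:
        if name in merged:
            continue  # a repeated attribute is ignored: the first one wins
        # A bare @name has no value, so it takes its own name (plentiful="plentiful").
        merged[name] = name if value is None else value
    return merged


def start_tag(element: str, attrs: List[Tuple[str, Optional[str]]]) -> str:
    """Render an opening tag with double-quoted, escaped attribute values."""
    parts = [element]
    for name, value in merge_attributes(attrs).items():
        quoted = escape(value, {'"': "&quot;"})
        parts.append(f'{name}="{quoted}"')
    return "<" + " ".join(parts) + ">"


cherry_attrs = [("location", "basket"), ("colour", "red"), ("taste", "nice"),
                ("plentiful", None), ("colour", "cherry-colour")]
print(start_tag("cherry", cherry_attrs))
# <cherry location="basket" colour="red" taste="nice" plentiful="plentiful">
```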

The rule that tokens of the form @name correspond to XML attributes is part of GLOSS-xml and the xml.mv modular vocabulary, not of GLOSS itself (although, suggestively, GLOSS calls this kind of token attr). It would be easy to write MVs that make such tags XML elements, or that allow custom attributes to be defined without the @. Thus the same input text can be processed in different ways by different MVs, and this is a potential strength of GLOSS. However, further discussion of this is beyond the scope of this page.

6 Trying it out

The simplest way to try out the examples on this page is to run the command

gloss -silent -mv http://gloss.bham.ac.uk/lib/xml.mv

and type your input directly. When finished, press the key(s) for end-of-file on the operating system you are using. (On Unix it is control-D at the beginning of a line; on MS-Windows it is control-Z followed by Enter.) Alternatively, prepare a file called testfile.xml.gloss containing your input in a text editor ("notepad" is good enough for simple tests), and run

gloss testfile.xml.gloss

and inspect the output in testfile.xml. See exceptions.html for a list of error messages produced by the GLOSS system, and a brief explanation of each.

This page is copyright. Web page design and creation by GLOSS.