The GLOSS-xml system enables pure XML to be authored conveniently using GLOSS. By pure I mean that the system provides facilities to author generic XML but no additional facilities for specific applications. It should be possible to author any XML for any application using the system. GLOSS-xml is also the base for most other GLOSS applications.
Using GLOSS-xml, the author writes a plain text file (typically named filename.xml.gloss) and the modular vocabularies xml.mv and xml.modes (to be found in http://gloss.bham.ac.uk/mv/xml) are used to process the file to the target XML file filename.xml. The command to do this is
    gloss filename.xml.gloss
or
    gloss -mv xml.mv -in filename.xml.gloss -out filename.xml
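To make the file-naming convention concrete, here is a minimal Python sketch that builds the longer form of the command from a source file name. It is not part of GLOSS: the function name explicit_gloss_command is hypothetical, and the only things taken from this page are the -mv, -in and -out options and the convention that filename.xml.gloss is processed to filename.xml.

```python
from pathlib import Path


def explicit_gloss_command(source: str, vocabulary: str = "xml.mv") -> list[str]:
    """Build the argument list for the long form of the gloss command.

    Assumption: the target name is the source name with the final
    ".gloss" suffix removed (filename.xml.gloss -> filename.xml).
    """
    path = Path(source)
    if path.suffix != ".gloss":
        raise ValueError(f"expected a .gloss file, got {source!r}")
    target = path.with_suffix("")  # drops only the last suffix
    return ["gloss", "-mv", vocabulary, "-in", str(path), "-out", str(target)]


if __name__ == "__main__":
    # Print the command rather than run it, so the sketch works
    # even where gloss itself is not installed.
    print(" ".join(explicit_gloss_command("filename.xml.gloss")))
```

The list form is convenient if you later want to hand the command to subprocess.run, but that step is left out here.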
This page is a brief user's guide to writing text to be converted to XML using this system. As most GLOSS applications (e.g. HTML, MathML, ...) are based on this XML application, it is well worth reading for many kinds of usage of GLOSS.
(Of course you don't have to use this encoding at all. You can design your own and write your own MV for it. That is the subject of other pages here.)
GLOSS converts a plain text file into tokens which are read and converted into a structured XML document. Obviously the eventual xml structure must be encoded in the original file in some way. XML is very verbose and takes some time to type. The GLOSS-xml system allows text files to be written rapidly and then converted to the required XML, but still indicating the required structure. Much of the design of the system centres round what is quick to type on an ordinary keyboard.
Presenting structure in a traditional programming language style using begin...end blocks, or even {...}, tends
to be quite slow to type and can obscure the structure being
presented. The idea of XML structure naturally suggests a syntax
based on indentation, similar to that used with the
Python programming language (http://www.python.org/),
but using braces {...} for finer control where necessary. Thus
the first lesson to learn is to use and read indentation.
Names define XML tags, and end tags are added automatically by GLOSS-xml. One of the goals of GLOSS-xml is that the author should only have to type each word once. Because GLOSS-xml input is comparatively terse, errors are always possible. Some of the most common errors are guarded against, but reasonable care is needed. XML tools are available or can be written to check the output and it is recommended that you use them. (There is a simple XML validator provided as part of the system.)
XML elements are represented by their names without the opening < and closing >. Normally indentation is used to define which tags are inside which. Thus
    apple
      orange
      pear
        kumquat
        guava
      plum
is valid GLOSS-xml text producing
    <apple>
      <orange/>
      <pear><kumquat/><guava/></pear>
      <plum/>
    </apple>
(I have added a certain amount of indentation to the XML output in this and other examples to help you read it. In practice the output would not be indented or even split into separate lines.)
Elements on the same line following other elements are always sub-elements and always have lower precedence than elements which begin a line (even after whitespace). Thus
    apple
      orange pear
        kumquat guava
      plum peach
        cherry
      strawberry
is valid GLOSS-xml text producing
    <apple>
      <orange><pear/><kumquat><guava/></kumquat></orange>
      <plum><peach/><cherry/></plum>
      <strawberry/>
    </apple>
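The two indentation rules described so far can be imitated in a few lines of Python. This is only a sketch of the rules as stated on this page, not the way GLOSS itself works: the function names are hypothetical, only element names are handled (no text, attributes or braces), and I assume that several names on one line form a chain in which each is a sub-element of the one before it. The line layout of the sample input is likewise an assumption.

```python
def indented_names_to_xml(source: str) -> str:
    """Turn indented element names into XML, one rule at a time.

    Rule 1: a name that begins a line is a child of the nearest earlier
            line-starting name with smaller indentation.
    Rule 2: names following on the same line are sub-elements and are
            closed as soon as the next line begins.
    """
    root = {"name": None, "children": []}
    # Only line-starting elements are kept on the stack, so same-line
    # followers can never adopt elements from later lines.
    stack = [(-1, root)]
    for line in source.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        first = None
        for name in line.split():
            node = {"name": name, "children": []}
            parent["children"].append(node)
            if first is None:
                first = node
            parent = node  # the next name on this line nests inside
        stack.append((indent, first))
    return "".join(render(child) for child in root["children"])


def render(node: dict) -> str:
    """Serialise a node, using an empty-element tag when it has no children."""
    inner = "".join(render(child) for child in node["children"])
    name = node["name"]
    return f"<{name}>{inner}</{name}>" if inner else f"<{name}/>"


SAMPLE = """\
apple
  orange pear
    kumquat guava
  plum peach
    cherry
  strawberry
"""

if __name__ == "__main__":
    print(indented_names_to_xml(SAMPLE))
```

Keeping only line-starting names on the stack is what gives them precedence: kumquat begins a line, so it attaches to orange rather than to pear.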
Further control is exercised with the { and } symbols. The rule of thumb is that an XML group (i.e., a tag together with all its descendants) cannot cross { ... }. Thus { enables a group of tags all to be descendants of the previous tag when otherwise indentation rules would break them up; and } forces a group of descendants to be closed. For example,
apple { { orange pear } kumquat { guava plum } peach cherry strawberry }
yields
    <apple>
      <orange><pear/></orange>
      <kumquat><guava/><plum/></kumquat>
      <peach><cherry/></peach>
      <strawberry/>
    </apple>
as you should be able to check shortly when you are using the system.
Although the XML in these cases isn't too awful to type, there is some real saving in typing time in using GLOSS even in these simple examples. When I type XML I spend a lot of my time with a finger hovering over the shift key for the < and > characters. I also occasionally make mistakes in typing the wrong close-tag, and these errors have to be corrected manually. With GLOSS, neither of these problems occurs.
We now add text to our discussion. Text is input between square brackets [ ... ]. Square brackets are chosen because they are generally easy to type without resorting to the shift key and because they appear rather infrequently in normal text. In text mode indentation is ignored. (In fact, carriage returns and white space are significant and are added to the output in the way you might expect.) Thus
apple [Here is some text involving oranges, pears, kumquats, guavas, plums, peaches, cherries and a strawberry.]
gives
<apple>Here is some text involving oranges, pears, kumquats, guavas, plums, peaches, cherries and a strawberry.</apple>
Obviously, inside text, ] has a special meaning as ending the text. We will see in a moment that [ has a special meaning too. It is useful to forbid { and } within text as well, since a bare { or } in text usually means that a closing ] was forgotten, and this rule adds a useful check that the input is OK. Therefore the characters [, ], {, } cannot be entered directly into text but have to be escaped by being preceded with a \; the same applies to the \ character too. For example,
apple [Here is some text involving \\oranges, \[pears\], and \{kumquats\}.]
gives
<apple>Here is some text involving \oranges, [pears], and {kumquats}.</apple>
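If you generate GLOSS-xml input from a program, the escaping rule is easy to apply mechanically. The following Python sketch is my own illustration, not a GLOSS facility, and both function names are hypothetical: one function adds the backslashes that text mode requires and the other removes them again.

```python
# The five characters that must be preceded by a backslash inside [ ... ].
SPECIAL = set("\\[]{}")


def escape_gloss_text(text: str) -> str:
    """Prefix each special character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def unescape_gloss_text(escaped: str) -> str:
    """Undo escape_gloss_text: a backslash makes the next character literal."""
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")  # take the escaped character as-is
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    plain = r"Here is some text involving \oranges, [pears], and {kumquats}."
    print("apple [" + escape_gloss_text(plain) + "]")
```

Applying unescape_gloss_text to the escaped string gives back the original text, which is a handy check when testing a generator.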
The indentation of the text is the indentation of the first [ character. Thus
    apple
      pear
        orange [lemon]
        [peach]
      [cherry]
gives
<apple><pear><orange>lemon</orange>peach</pear>cherry</apple>
The space before the [ putting the system into text mode is optional. So orange[lemon] would have been perfectly OK in the above example. You may prefer to have this space, or prefer to omit it: it's up to you.
You may know that some characters, such as & < > " and possibly ' are disallowed in XML in some cases unless they are "escaped" as special combinations involving &, such as &amp;. GLOSS does this escaping for you automatically so you don't have to worry about it. Thus
p [You can use & < > " and ' without having to worry.]
gives
<p>You can use &amp; &lt; &gt; &quot; and ' without having to worry.</p>
The ' character, while it could have been escaped to &apos;, is not escaped, because many browsers that are not fully XML-compliant (such as MS-IE) do not display &apos; correctly (sigh). GLOSS always avoids escaping ' and never uses an unescaped ' in text or attributes in a non-valid way.
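The escaping policy just described (escape & < > and ", leave ' alone) can be reproduced with the Python standard library. This is a sketch of the behaviour as described on this page, not GLOSS's own code, and the function name escape_text_like_gloss is hypothetical. It uses xml.sax.saxutils.escape, which always handles & < > and accepts a dictionary of extra replacements.

```python
from xml.sax.saxutils import escape


def escape_text_like_gloss(text: str) -> str:
    """Escape & < > and the double quote, but leave the apostrophe as it is."""
    return escape(text, {'"': "&quot;"})


if __name__ == "__main__":
    print("<p>" + escape_text_like_gloss(
        "You can use & < > \" and ' without having to worry.") + "</p>")
```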
Attributes are normally entered using a special attribute tag @name. They may have textual content, or may simply be present (which according to XML rules means their content in XML should be the same as their name). Otherwise, attributes behave just like element names. They must appear before non-attribute content of an element, but GLOSS-xml doesn't mind about a repeated attribute. The first one takes precedence. Thus,
apple @colour[green] cherry @location[basket] @colour[red] @taste[nice] @plentiful @colour[cherry-colour] [Life is a basket of cherries.] pear @taste[good] @shape[pear-shaped] [makes perry when fermented] orange @colour[orange] lemon @taste[sour] peach @rotten warning [life can go pear-shaped]
gives
    <apple colour="green">
      <cherry location="basket" colour="red" taste="nice" plentiful="plentiful">
        Life is a basket of cherries.
      </cherry>
      <pear taste="good" shape="pear-shaped">
        makes perry when fermented
        <orange colour="orange"/>
        <lemon taste="sour"/>
        <peach rotten="rotten"/>
        <warning>life can go pear-shaped</warning>
      </pear>
    </apple>
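The two attribute rules used in this example (a bare @name takes its own name as its value, and the first of any repeated attribute wins) can be sketched in Python as follows. This illustrates the rules as stated on this page and is not taken from xml.mv; the function name render_attributes and the list-of-pairs input format are assumptions made for the example.

```python
from typing import Optional
from xml.sax.saxutils import escape


def render_attributes(pairs: list[tuple[str, Optional[str]]]) -> str:
    """Render (name, value) pairs as an XML attribute string.

    A value of None stands for a bare @name with no text content.
    """
    chosen: dict[str, str] = {}
    for name, value in pairs:
        if name in chosen:
            continue  # a repeated attribute is ignored: the first wins
        chosen[name] = name if value is None else value
    parts = []
    for name, value in chosen.items():
        quoted = escape(value, {'"': "&quot;"})
        parts.append(f'{name}="{quoted}"')
    return " ".join(parts)


if __name__ == "__main__":
    cherry = [
        ("location", "basket"),
        ("colour", "red"),
        ("taste", "nice"),
        ("plentiful", None),
        ("colour", "cherry-colour"),
    ]
    print("<cherry " + render_attributes(cherry) + "/>")
```

A dictionary is used because Python dictionaries keep insertion order, so the attributes come out in the order in which they were first written.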
The rule that tokens of the type @name correspond to the notion attribute is part of GLOSS-xml and the xml.mv modular vocabulary, but not part of GLOSS itself. (But the name GLOSS uses for this kind of token is, somewhat suggestively, attr.)
It would be easy to write MVs that make such tags
XML elements, or which allow custom attributes to be defined
without the @. Thus the same input text can be processed
in different ways by different MVs, and this is a potential
strength of GLOSS: that the same text files can be easily
processed in different ways. However, further discussion
of this is out of the scope of this page.
The simplest way to try out these examples on this page is to run the command
gloss -silent -mv http://gloss.bham.ac.uk/lib/xml.mv
and type your input directly. When finished press the key(s) for end-of-file on the operating system you are using. (In Unix it is control-D at the beginning of a line. In MS-Windows it is control-Z followed by enter.) Alternatively prepare a file called testfile.xml.gloss in a text editor ("notepad" would be good enough for simple tests) with your input in it and run
gloss testfile.xml.gloss
and inspect the output in testfile.xml. See exceptions.html for a list of error messages produced by the GLOSS system, and a brief explanation of each.
This page is copyright. Web page design and creation by GLOSS.