Writing Modular Vocabulary files

This document is the first in a group of pages comprising an introduction to writing Modular Vocabulary (MV) files. Basic knowledge of XML is assumed throughout.

1 Introduction

Modular vocabulary files are the main stylesheets, programs, or configuration files that control a glossing operation. They are XML documents matching the DTD at http://gloss.bham.ac.uk/dtd/mv/modularvocab.dtd, which is documented in technical fashion on the Modularvocab DTD page. There may be several additional tools available to help you write such files, including GLOSS itself in its mv/modes incarnation. The present document is concerned with creating MV files from scratch and with the use of the various MV elements defined in the DTD.

2 The glossing process

The GLOSS system works in essentially two stages. First, a text file is scanned and an intermediate XML infoset is created in memory. Then this XML is transformed as necessary to produce the required textual output. The intermediate XML is marked up with tags from the xmlrepresentation section of GLOSS that define the required output structure. (The full structure of an XML file cannot be defined fully by its own XML infoset, owing to the design of XML and the fact that an XML infoset does not uniquely determine its textual representation. See http://www.w3.org/TR/xml-infoset/ for more on these issues.)
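
The remark that an infoset does not determine its textual form can be checked with a small Python sketch. The element name `doc` and the attribute `kind` are made up for illustration and are not GLOSS or xmlrepresentation names; the sketch relies only on `canonicalize` from the standard library's `xml.etree.ElementTree` (Python 3.8 or later).

```python
import xml.etree.ElementTree as ET

# Two different textual spellings of the same XML information:
# the attribute quoting and the empty-element syntax differ.
text_a = '<doc kind="note"/>'
text_b = "<doc kind='note'></doc>"

# Canonicalization (C14N 2.0) maps both to one normalized serialization.
canon_a = ET.canonicalize(text_a)
canon_b = ET.canonicalize(text_b)

print(text_a == text_b)    # compares the raw texts, which differ
print(canon_a == canon_b)  # compares the canonical forms, which agree
```

This is why a separate layer such as the xmlrepresentation tags is needed when the exact output text matters: the in-memory structure alone leaves those choices open.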

The first stage described above, the creation of the intermediate XML infoset, is controlled by the MV file, and that is what is discussed here and in pages linked from here.

The usual prefix for the MV tags is "mv:" and the prefix for the xmlrepresentation tags is "xr:", though these may be changed to avoid namespace clashes. We will use these prefixes throughout this document, and you should replace them with whatever is appropriate when different prefixes are being used. The DTD does not make any provision for use of MV tags without any prefix.
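
As a rough illustration of why the prefix can be changed, the Python sketch below parses the same root element under two prefixes and compares the resulting names. The namespace URI in `MV_NS` is an assumption and should be replaced by whatever your MV files declare, and the prefix `voc` is hypothetical. The sketch only shows how a namespace-aware parser sees the element; a DTD matches literal prefixed names, and how GLOSS arranges for the DTD to follow a changed prefix is not covered here.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; use whatever URI your MV files actually declare.
MV_NS = "http://gloss.bham.ac.uk/xmlns/modularvocab"

with_mv = f'<mv:modularvocab xmlns:mv="{MV_NS}"/>'
with_voc = f'<voc:modularvocab xmlns:voc="{MV_NS}"/>'  # "voc" is a hypothetical prefix

tag_mv = ET.fromstring(with_mv).tag
tag_voc = ET.fromstring(with_voc).tag

# Both resolve to the same namespace-qualified name of the form "{URI}modularvocab".
print(tag_mv)
print(tag_mv == tag_voc)
```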

3 Basic structure of an MV file

An MV document controls the scanning of text and the creation of the intermediate infoset. Its top-level structure is typically as follows.

<?xml version="1.0"?>

<!DOCTYPE mv:modularvocab 
  SYSTEM "http://gloss.bham.ac.uk/dtd/mv/modularvocab.dtd" [
    ...
]>

<mv:modularvocab xmlns:mv="http://gloss.bham.ac.uk/xmlns/modularvocab">

  <mv:mode name="..." accept="..." ... >
     ...
  </mv:mode>

  <mv:mode name="..." accept="..." ... >
     ...
  </mv:mode>

  ...

</mv:modularvocab>

As well as mv:mode elements, parameter initializations and prefix definitions may occur as children of the mv:modularvocab root. See The mv:modularvocab Element or the Modularvocab DTD for details. I am deliberately keeping things simple here.

4 Modes

The main part of an MV, then, is a set of named modes wrapped in a <mv:modularvocab> element. A DTD is usually read in to validate the MV document.

A mode's job is to continually scan for tokens and then perform certain actions (such as building the XML infoset in memory) depending on what tokens have been received. The type of tokens allowed by a mode is given in the accept="..." attribute, and the token scanner behaves differently depending on what type of tokens are expected or allowed. One of the possible actions is that more tokens are scanned in using a different mode, so modes are similar to subroutines in conventional programming languages. The normal way to exit a mode is by executing a <mv:return/> element, though there are other ways a mode may finish, including the slightly different <mv:abort/> command. The MV writer may also choose to limit the number of immediate child tokens a mode may receive, and to control the way indentation of the source is handled and what happens if no acceptable token of the required type is available. See the links given below and the documentation elsewhere for more information.
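
The subroutine analogy can be made concrete with a toy Python sketch. It is not GLOSS's implementation and uses no MV syntax: the function `run_mode`, the action names "call", "return" and "emit", and the mode name `group` are all invented for illustration. Each mode consumes tokens from a shared stream, may hand control to another mode, and gives control back when its return action fires.

```python
def run_mode(name, tokens, table, output):
    """Consume tokens in mode `name` until a "return" action fires or input ends."""
    for token in tokens:
        # Tokens with no entry in the mode's table are simply recorded.
        action, arg = table[name].get(token, ("emit", None))
        if action == "call":
            # The nested mode reads from the same token stream, like a subroutine call.
            run_mode(arg, tokens, table, output)
        elif action == "return":
            return
        else:
            output.append((name, token))


# Hypothetical mode table: "(" enters the mode "group", ")" returns from it.
table = {
    "main": {"(": ("call", "group")},
    "group": {")": ("return", None)},
}

output = []
tokens = iter("a ( b c ) d".split())  # an iterator, so nested modes share position
run_mode("main", tokens, table, output)
print(output)
```

The printed list pairs each recorded token with the mode that was active when it was read, which shows control passing into the nested mode and back out again.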

5 main

The scanning always starts in a special initial mode called main. This mode must be present in all MV documents. Usually it simply calls another mode.
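
Since every MV document needs a main mode, a quick pre-flight check can be sketched in Python. The helper names `mode_names` and `has_main`, the second mode name `body`, and the namespace URI in `MV_NS` are assumptions for illustration; the accept="..." values are left elided as in the skeleton above, and the element that calls another mode is shown only as a comment because its syntax is not covered on this page.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; use whatever URI your MV files actually declare.
MV_NS = "http://gloss.bham.ac.uk/xmlns/modularvocab"


def mode_names(mv_text):
    """Return the name attribute of every mv:mode child of the root element."""
    root = ET.fromstring(mv_text)
    return [mode.get("name") for mode in root.findall(f"{{{MV_NS}}}mode")]


def has_main(mv_text):
    """Report whether the document defines the required initial mode, main."""
    return "main" in mode_names(mv_text)


sample = f"""<mv:modularvocab xmlns:mv="{MV_NS}">
  <mv:mode name="main" accept="...">
    <!-- a call to another mode would normally go here -->
    <mv:return/>
  </mv:mode>
  <mv:mode name="body" accept="...">
    <mv:return/>
  </mv:mode>
</mv:modularvocab>"""

print(mode_names(sample))
print(has_main(sample))
```

A check like this only confirms that a mode named main is present; it does not validate the document against the DTD.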

This page is copyright. Web page design and creation by GLOSS.