12 Content of mv:default and mv:match modes
The elements mv:default and mv:match contain a list of commands or other XML data to be added to the intermediate XML infoset at the current position.
A mode describes the current execution pattern of the glosser, and
at all times the glosser is operating in
some particular mode.
Normally, the execution pattern is to scan for one of the acceptable
tokens and process it according to a table of actions defined in the
content of the <mv:mode> element. When a token has been
processed, the input is scanned for the next token. This can carry on
indefinitely but is usually concluded when a <mv:return/>
command is executed.
The basic structure of a mode is as follows.
<mv:mode name="..." accept="..." ...>
  <mv:match type="..." value="...">
    <!-- actions -->
  </mv:match>
  <mv:match ...>
    ...
  </mv:match>
  ...
</mv:mode>
Here, the list of <mv:match> elements specifies the action to be taken for each token type or value. This is, once again, a simplified picture: as well as <mv:match> elements, a mode may contain <mv:include> elements and a <mv:default> element.
The accept attribute lists a number of allowed token types,
in a definite order of priority, for the scanner to look for.
The allowed token types are documented fully
in the tokens DTD documentation; those that appear on this page
include elt, attr, uc, label, cref, char, string, int, fp and hex.
More information on accept="..." can be found below.
When a token has been obtained, the mode looks for the first match available in its list that matches to the token and passes control to the contents of this mv:match element. More information on matches can be found below.
As mentioned, the accept attribute to <mv:mode> defines a list of token types and punctuation that should be scanned for when in this mode. The value of the accept attribute is a list of token types or punctuation combinations, separated by |. The restrictions are: (1) the first character of the value may not be $ and (2) the punctuation sequence cannot contain a |. (Values of accept starting with $ are reserved for possible future expansion.)
The available token types are:
Blank space and comments will be skipped unless uc is given as one of the types in accept="...". Thus the accept attribute is important for driving the token scanner correctly. The presence of the label type also alters the way tokenization takes place; it is more specialised (and more experimental) and is not described here.
For example, accept="elt|attr|[[|]]|ns" indicates that the tokenizer should accept elt tokens, attr tokens, punc tokens with value [[, punc tokens with value ]], or non-space unicode characters, and look for such tokens in this order. Since uc is not specified, whitespace and comments in the input should be skipped.
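The splitting rules for accept="..." can be sketched in Python. This is only an illustration of the rules stated above, not the GLOSS implementation. The function names are hypothetical, and the set KNOWN_TYPES is an assumption built from the type names that happen to appear on this page; the authoritative list is in the tokens DTD documentation.

```python
# Assumption: only the type names mentioned on this page; the real list
# is defined by the tokens DTD documentation.
KNOWN_TYPES = {"elt", "attr", "uc", "ns", "label", "cref",
               "char", "string", "int", "fp", "hex"}


def parse_accept(value):
    """Split an accept="..." value into (kind, text) pairs in priority order."""
    if value.startswith("$"):
        # Values starting with $ are reserved for possible future expansion.
        raise ValueError("accept values starting with $ are reserved")
    items = []
    for item in value.split("|"):
        if item in KNOWN_TYPES:
            items.append(("type", item))
        else:
            # Anything else is a punctuation sequence; "" is the empty one.
            items.append(("punc", item))
    return items


def skips_blank_space(value):
    """Blank space and comments are skipped unless uc is listed."""
    return ("type", "uc") not in parse_accept(value)
```

With this sketch, parse_accept("elt|attr|[[|]]|ns") yields the five items of the example in the stated order, and skips_blank_space reports that blank space is skipped because uc is absent.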
The token types are documented more fully in the
Token Types DTD.
Note that ns
is not strictly a special token type,
but a version of uc that allows the tokenizer to skip space
characters. Thus, in the above example, if a ns is matched
the token is actually returned as a token of type
uc. There isn't a distinct ns token type. Also, the
punc type is not listed above as it is not itself specified in
accept
but rather as a user-defined punctuation sequence.
If this sequence is found it is returned by the tokenizer as a punc
token. The empty punctuation sequence (usually given at the end of
a list of token types as in accept="...|") is allowed and turns out to
be rather useful, as we shall see. Thus accept="" means to skip spaces
and comments and only accept the empty punctuation sequence.
If there is no available token of a type listed in the mode's
accept="..." the tokenizer returns the null
token to the
mode and control is passed back (as if via <mv:return/>) to
the parent mode.
The use-indentation attribute has allowed values true and false.
Every token has a so-called depth. If the token is the first token to be
read on its line, its depth is its indentation, that is, the column number
of its first character; otherwise its depth is 10000000 plus that column
number. (Presence of the experimental label token type in the accept
attribute changes these rules. This is not discussed here.)
A mode has a parent token and looks for child tokens. If
use-indentation="true" then the depth of every child token must be greater
than the depth of the parent token. If no such token can be found then
the mode is automatically closed, as if <mv:return/> had been executed.
To ignore the depth of tokens, set use-indentation="false". The default
is true.
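The depth rule and the indentation check can be sketched as follows. This is a minimal Python illustration of the rules as described here, ignoring the label token type; the function names are hypothetical and are not part of GLOSS.

```python
# Offset added to the column of any token that is not first on its line.
LATER_ON_LINE_OFFSET = 10_000_000


def token_depth(column, first_on_line):
    """Depth of a token, given its starting column and position on the line."""
    return column if first_on_line else LATER_ON_LINE_OFFSET + column


def child_allowed(parent_depth, child_depth, use_indentation=True):
    """True if a token may be a child of the parent under use-indentation."""
    if not use_indentation:
        return True  # use-indentation="false": depth is ignored
    return child_depth > parent_depth
```

One consequence of the large offset is that a token following its parent on the same line always counts as deeper than a parent that starts the line, whereas a token starting a later line must be indented further than the parent.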
A mode counts its children as it is executed. Normally any number of children is allowed, but if children="..." is specified then this sets the maximum number of children allowed. If a token is read that would exceed this number, then the new token is put back onto the input and control is passed back to the parent mode, as if an <mv:abort/> command had been executed.
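The counting behaviour might be pictured with a small Python sketch. The class ChildCounter and its method are hypothetical names used only for illustration; the sketch does not model the children-total, children-adjust or children-remaining attributes.

```python
class ChildCounter:
    """Counts the children of a mode against an optional maximum."""

    def __init__(self, max_children=None):
        # None models the normal case: any number of children is allowed.
        self.max_children = max_children
        self.count = 0

    def admit(self):
        """Count one more child if permitted.

        A False result stands for the case where the token is put back
        onto the input and control returns to the parent mode.
        """
        if self.max_children is not None and self.count >= self.max_children:
            return False
        self.count += 1
        return True
```

For instance, a counter built with max_children=2 admits two children and rejects the third, leaving its count at 2.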
See also the discussion of the children-total="...", children-adjust="...", and children-remaining="..." attributes of mv:match below.
A mode typically contains a list of mv:match elements (and may also contain <mv:default> and <mv:include> elements). Matches are tried in the order they are presented in the mode.
There are two groups of attributes to <mv:match>. The first group defines when a token that has been read matches the specification, so that these particular actions should take place. The second group can be used to adjust the number of child tokens permitted in this mode. These two groups are discussed below.
If a match does take place and there are not too many child tokens, then control is passed to the contents of the <mv:match> element, and the commands there are executed, usually resulting in data being added to the intermediate XML infoset. The commands in the content of the match node can also specify that further grandchild tokens should be processed in another mode, in which case control will be passed back to this mode after a <mv:return/> or <mv:abort/> command, when the number of child tokens is exceeded, or if the indentation rules so dictate.
It is an error (giving the GlossException "Illegal token")
if a token of a type listed in the mode's accept="..."
can be found, but there is no corresponding <mv:match> for that
token.
mv:match takes a mandatory attribute type="..." specifying the type of the matching token. The value of this attribute is an item from the above list, except that ns is not a valid type (use uc instead) and punctuation sequences should be specified with type="punc".
To further specify the match, any of the following attributes may also be added. Every attribute from the following list that is given must agree with the token if a match is to be made. If none of these is given then a match is made on the basis of the token type only.
data="...". The data value is usually the token data as entered, sometimes without leading or trailing delimiters such as ' or ".
prefix="...". The prefix value is usually the part of the fullname before the leading :, or "" if there is no such prefix.
localname="...". The localname value is usually the part of the fullname after the leading :, or the whole of fullname if there is no such prefix.
fullname="...". The fullname value is usually the name part of the data, the name of an element or attribute, or the hexadecimal code point for a unicode character, character entity or char.
value="...". The value is usually the value part of the data, the unicode character for a character reference, character entity or char, or the data part of a string, int, fp, hex, etc.
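The way these attributes combine can be sketched in Python. This is an illustration under assumptions, not the GLOSS implementation: tokens and match specifications are modelled as plain dictionaries, the function names are hypothetical, and only plain (non-$) attribute values are handled.

```python
MATCH_ATTRS = ("data", "prefix", "localname", "fullname", "value")


def split_fullname(fullname):
    """Split a fullname at its first ':' into (prefix, localname)."""
    prefix, sep, local = fullname.partition(":")
    if sep:
        return prefix, local
    return "", fullname  # no prefix: localname is the whole fullname


def matches(spec, token):
    """True if the type agrees and every attribute given in spec agrees."""
    if spec["type"] != token["type"]:
        return False
    return all(spec[a] == token.get(a) for a in MATCH_ATTRS if a in spec)


def first_match(specs, token):
    """Matches are tried in order; the first one that agrees is used."""
    for spec in specs:
        if matches(spec, token):
            return spec
    return None
```

A specification giving only type matches every token of that type, so in this sketch a more specific match needs to be listed before a more general one of the same type.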
To specify a set of alternatives or ranges of values an extended syntax is available for the above attributes. To use it give $ as the first character in data="$...", prefix="$...", localname="$...", fullname="$...", value="$...". Then separate items with ||, as in localname="$apple||orange||banana". In the case of data and value and the types cref, char, uc, fp, int, hex, items separated by || can also be ranges, with start and end separated by --, as in value="$A--Z||a--z||0--9".
To use the characters |, -, \ in the matched data, escape them with \ as in data="$'||£||$||\-||\|".
These extended matches are experimental. TO DO: some things don't work, e.g., data="$'||£||\|||\-||\|" splits in the wrong place; also more documentation on extended matches is needed.
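One possible reading of the extended syntax is a single left-to-right scan in which \ escapes the next character, || separates items and -- separates the two ends of a range. The Python sketch below implements that reading only; it is not the GLOSS parser, the function names are hypothetical, and treating ranges as inclusive ranges of single characters is an assumption.

```python
def parse_extended(value):
    """Parse an attribute value into alternatives.

    Literals are returned as strings and ranges as (start, end) tuples.
    A value without a leading $ is a single literal.
    """
    if not value.startswith("$"):
        return [value]
    s = value[1:]
    items, parts, cur = [], [], []
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            cur.append(s[i + 1])          # escaped character, taken literally
            i += 2
        elif s.startswith("||", i):       # end of one alternative
            parts.append("".join(cur))
            items.append(parts)
            parts, cur = [], []
            i += 2
        elif s.startswith("--", i):       # range separator
            parts.append("".join(cur))
            cur = []
            i += 2
        else:
            cur.append(s[i])
            i += 1
    parts.append("".join(cur))
    items.append(parts)
    result = []
    for p in items:
        if len(p) == 1:
            result.append(p[0])
        elif len(p) == 2:
            result.append((p[0], p[1]))
        else:
            raise ValueError("more than one -- in a single item")
    return result


def matches_any(alternatives, text):
    """True if text equals a literal or lies in a single-character range."""
    for alt in alternatives:
        if isinstance(alt, tuple):
            if len(text) == 1 and alt[0] <= text <= alt[1]:
                return True
        elif alt == text:
            return True
    return False
```

Under this reading, value="$A--Z||a--z||0--9" becomes three ranges, and the escaped example data="$'||£||$||\-||\|" becomes the five literals ', £, $, - and |.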
The three children-YYY="..." attributes modify the allowed number of child tokens after the token has been read but before the number of children has been checked and before any of the match commands are executed.
These attributes are as follows.
In all cases, the value of the attribute is a base 10 integer string.
The mv:default element is like mv:match except that a match is guaranteed. mv:default does not take type, value, etc., attributes, but does take the children-total="XXX", children-remaining="XXX", and children-adjust="XXX" attributes with the same meaning.
As well as <mv:match> elements, the mode may contain
<mv:include mode="modename"/> elements, meaning: act as if
all the content of the mode named "modename" were included here
at this point. The attribute mode="modename"
is fully interpolated, so modename can be evaluated from
parameter values. However, for compatibility with future
versions of GLOSS it is recommended that this feature be used only
for parameters whose value is known at set-up time and not for
parameters that change value during execution. (This is for reasons of
efficiency, as in future versions modes may be hashed at the set-up stage.)
The mv:include element is empty, and has one mandatory attribute mode="modename" for the mode to be included. Note that the included mode's accept and other attributes are all ignored, so the include mechanism allows a set of modes to be used in two or more different contexts and ways.
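A mode ends when:
a <mv:return/> command is executed;
an <mv:abort/> command is executed, in which case the last token read is un-got;
the nulltoken is read since no token of one of the correct types given in accept="..." is available;
a token is read whose depth is not greater than the parent token's depth and the attribute use-indentation="true" is present or implied in the <mv:mode>, in which case the last token read is un-got (the default for use-indentation is true);
a token is read that would exceed the permitted number of children, in which case the last token read is un-got.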
The depth of a token is usually the column number at which it starts (column numbers are counted from 0) provided it is the first token on that line, or the column number plus 10000000 if it is not the first token on that line. (This works in all cases except where labels are present. The depth of a label is the depth of the first non-label token following it, and for the sake of this computation labels are considered as whitespace. See the Tokenizer module for more details.)