Linguistic Classes {#estling}
=======================

# Introduction {#eslingintro}

EST offers a comprehensive integrated system for handling and storing linguistic information of all types. This system is based on the *Heterogeneous Relation Graph* (HRG) formalism. There is a basic hierarchy of 5 main classes in this part of the library:

- **Val**: The EST_Val class holds single atomic entities, such as numbers or strings. An EST_Val can be thought of as a single variable which can hold a value of a variety of types. It can hold one of 3 built-in types (`int`, `float`, EST_String), plus an "other" type which is user-defined. In that way, a *val* can represent a list, waveform, track or any other type.
- **Items**: The EST_Item class represents basic linguistic entities such as words, phonemes, syllables, phrases etc. It is basically a wrapper around EST_Features.
- **Relations**: A relation is used to represent the structural relationship between groups of items. A single relation would hold all the phones for an utterance or all the words for an utterance. Relations can take a number of different structural types: it is common to use a list for storing words and phones, a tree for syntactic and prosodic structure and a multi-linear structure for non-hierarchical linguistic information such as an autosegmental tone diagram.
- **Utterance**: An Utterance structure contains all the relations for an utterance.
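
The idea of a single variable that can hold one of several types, as described for **Val** above, can be illustrated with a small self-contained C++17 sketch. This is not the EST_Val interface: the alias `Val` and the helper `val_type` are hypothetical names, `std::string` stands in for EST_String, and the user-defined "other" type is left out.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical stand-in for a Val-like container. This is NOT EST_Val:
// std::string replaces EST_String and the "other" type is omitted.
using Val = std::variant<int, float, std::string>;

// Report which kind of value is currently held.
std::string val_type(const Val &v) {
    return std::visit([](const auto &x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int>) return "int";
        else if constexpr (std::is_same_v<T, float>) return "float";
        else return "string";
    }, v);
}
```

The same variable can be reassigned to a value of a different type at any time; `val_type` then reports the new type.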

## Relations {#estlingrelations}

Relations store ordered collections of linguistic items. The most common basic types are the list and the tree. \ref figure-6-1 "Figure 6-1" shows two examples of relations. \ref figure-6-1 "Figure 6-1a" shows a word relation of the sentence "this is an example". This relation is a simple linear list. \ref figure-6-1 "Figure 6-1b" shows a syntax tree of the same sentence. Both figures are composed of three components: items, nodes and arcs. The items contain the linguistic information (in this case the name of the word or syntactic category), whereas the nodes and arcs define the relationship between the items.

\image html relations.svg "Figure 6-1: Relations"
\image latex relations.eps "Relations" width=7cm

In the list case, the arcs occur in complementary pairs named `next` and `previous`, allowing forward and backward traversal. In the tree case, the arcs occur in complementary pairs named `parent` and `daughter1`, `daughter2` etc. All arcs in an HRG occur in named complementary pairs: by convention `next` and `previous` are used for lists, and `parent`, `daughter` etc. for trees.

Formally, relations can be seen as either directed graphs, where arcs occur in complementary named pairs, or as acyclic graphs, in which an arc has two complementary names. A node may have any number of arcs. The names of the arcs in the outgoing direction must all be unique, whereas the names of the arcs in the incoming direction do not have to be unique. (For example, in the tree, a node can have two incoming arcs named `parent` for traversal from its daughters, but the arcs to those daughters must have distinct names such as `left daughter` and `right daughter`.)
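
The naming rule for arcs can be made concrete with a minimal C++ sketch. It is an illustration under stated assumptions, not EST code: `Node` and `link` are hypothetical names, and each node simply stores its outgoing arcs in a map keyed by arc name, which makes outgoing names unique by construction.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical graph node; not an EST class.
struct Node {
    std::string label;
    std::map<std::string, Node *> arcs;  // outgoing arcs, keyed by unique name
};

// Join a and b with a complementary pair of named arcs:
// a --forward--> b and b --backward--> a.
// Returns false and changes nothing if either outgoing name is already taken.
bool link(Node &a, const std::string &forward,
          Node &b, const std::string &backward) {
    if (a.arcs.count(forward) || b.arcs.count(backward)) return false;
    a.arcs[forward] = &b;
    b.arcs[backward] = &a;
    return true;
}
```

Linking one parent to two daughters with `link(np, "daughter1", det, "parent")` and `link(np, "daughter2", noun, "parent")` gives the parent two incoming arcs that are both named `parent`, while its two outgoing arcs carry distinct names.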

In these simple examples, the use of both nodes and items is clearly redundant, as all nodes and items have a one-to-one relationship. However, this is often not the case: although a node can be linked to only one item, an item can be linked to many nodes, so long as each is in a different relation. \ref estling-figure-6-2 "Figure 6-2" shows a structure representing the combination of \ref figure-6-1 "Figure 6-1a" and \ref figure-6-1 "Figure 6-1b". The syntax and word relations are the same, but the nodes of the word list and the terminal nodes of the syntax tree point to the same items. This is the key feature of the HRG system. Of course it would be possible to keep the information separate as in \ref figure-6-1 "Figure 6-1", but this is redundant because these items really do represent the same word. As will be seen below, items can be very complex structures and it would be wasteful to duplicate the information. More importantly, information in items is often changed in the synthesis process, and with separate copies there is a chance of the syntax and word items becoming inconsistent. A list of named backward links to nodes is kept in each item, so that given an item, any node in any relation linked to that item can be easily found.

\anchor estling-figure-6-2
\image html relations_2.svg "Figure 6-2: Multiple relation types"
\image latex relations_2.eps "Multiple relation types" width=7cm

In simple examples such as this, it is possible to imagine a situation where the word relation is defined to be a traversal of the terminal nodes in the syntax relation. However, in the general case, more complex intersecting relations are often required and such simple approaches will fail.
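
The sharing of one item between several relations, together with the named backward links from an item to its nodes, can be sketched as follows. The types `Item` and `Node` and the function `attach` are hypothetical and are not the EST classes; features are reduced to a string map for brevity.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Node;

// Hypothetical item: feature content plus named backward links,
// at most one node per relation.
struct Item {
    std::map<std::string, std::string> features;
    std::map<std::string, Node *> nodes;  // relation name -> node
};

// Hypothetical node: links to exactly one item.
struct Node {
    Item *item = nullptr;
};

// Link a node to an item under the given relation name.
// Returns false if the item already has a node in that relation.
bool attach(Item &item, const std::string &relation, Node &node) {
    if (item.nodes.count(relation)) return false;
    item.nodes[relation] = &node;
    node.item = &item;
    return true;
}
```

Because the word node and the syntax node point at the same `Item`, a feature changed through one relation is immediately visible through the other, so the two views cannot drift apart.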

## Items and Features {#estlingitemsandfeatures}

Items are attribute-value lists (AVLs) which contain linguistic information. All "atomic" linguistic entities such as words, syllables and phones are represented by items.

In the simple sense, attributes are named linguistic properties such as part-of-speech, place of articulation, duration etc., and values are strings, enumerated sets (that is, a value from a fixed list of possibilities), floating point numbers and integers. For instance, this is a typical word item:

\f[
\left [
\begin{array}{ll}
\mbox{POS} & \mbox{\emph{Noun}} \\
\mbox{NAME} & \mbox{\emph{example}} \\
\mbox{FOCUS} & \mbox{+} \\
\end{array} \right ]
\f]

and this is a typical phoneme item:

\f[
\left [
\begin{array}{ll}
\mbox{NAME} & \mbox{\emph{sh}} \\
\mbox{PLACE OF ARTICULATION} & \mbox{\emph{palatal}} \\
\mbox{MANNER} & \mbox{\emph{fricative}} \\
\mbox{VOICE} & \mbox{--} \\
\mbox{DURATION} & 0.234 \\
\end{array} \right ]
\f]

Items are not directly typed, in that there is nothing in the AVL itself to say if it is a phoneme, syllable or word. Rather, typing comes from the fact that it is linked to a node in a relation. Items in the word relation are words, items in the syntax relation are syntactic constituents and items in both have both types.

Values need not be atomic types. Often it is unattractive to store all the information in an item as a single flat attribute-value list. For example, in a traditional binary phonological feature system such as SPE \cite spe, descriptions of segments are typically given as:

\f[
\left [
\begin{array}{ll}
\mbox{NAME} & \mbox{\emph{d}} \\
\mbox{CORONAL} & + \\
\mbox{ANTERIOR} & + \\
\mbox{VOICE} & + \\
\mbox{CONTINUANT} & \mbox{--} \\
\mbox{SONORANT} & \mbox{--} \\
\end{array} \right ]
\f]

However, it can be useful to provide a named hierarchy on the features:

\f[
\left [
\begin{array}{ll}
\mbox{NAME} & \mbox{\emph{d}} \\
\mbox{PLACE OF ARTICULATION \boxed{1} } &
\left [ \begin{array}{ll}
\mbox{CORONAL} & \mbox{\emph{+}} \\
\mbox{ANTERIOR} & \mbox{\emph{+}} \\
\end{array} \right ] \\
\mbox{VOICE} & \mbox{\emph{+}} \\
\mbox{CONTINUANT} & \mbox{\emph{--}} \\
\mbox{SONORANT} & \mbox{\emph{--}} \\
\end{array} \right ]
\f]

This is useful in specifying the consequences of traditional phonological rules such as syllable-final nasal assimilation. In some situations, the place of articulation of a nasal is assimilated to the place of articulation of the following stop (e.g. "ten" + "bags" -> "tembags"). If the feature bundle mechanism is used, an additional mechanism (indicated by the reference \f$\boxed{1}\f$) allows a value in one item to be an AVL in a different item. In the above example the fact that the place of articulation is the same for the nasal and stop is shown by using a reference for the place of articulation of the nasal, which refers to the place of articulation of the stop. The use of the feature bundle to represent place of articulation allows this operation to be done with a single reference rather than by two separate references to CORONAL and ANTERIOR.

\f[
\left [
\begin{array}{ll}
\mbox{NAME} & \mbox{\emph{d}} \\
\mbox{PLACE OF ARTICULATION } & \mbox{\boxed{1}}\\
\mbox{VOICE} & \mbox{\emph{+}} \\
\mbox{CONTINUANT} & \mbox{\emph{--}} \\
\mbox{SONORANT} & \mbox{\emph{--}} \\
\end{array} \right ]
\f]
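
One way to picture the reference mechanism is as a shared pointer to a feature bundle. The sketch below is an assumption made for illustration, not the way EST stores features: `Bundle`, `Segment` and `assimilate_place` are hypothetical names, and the "+"/"-" values given to the segments in the usage note are illustrative.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// A feature bundle: attribute names mapped to "+" or "-" (illustrative only).
using Bundle = std::map<std::string, std::string>;

// Hypothetical segment type; not an EST class.
struct Segment {
    std::string name;
    std::shared_ptr<Bundle> place;  // PLACE OF ARTICULATION, possibly shared
};

// Make the nasal's place of articulation a reference to the stop's bundle,
// so a single assignment covers every feature inside the bundle.
void assimilate_place(Segment &nasal, const Segment &stop) {
    nasal.place = stop.place;
}
```

After `assimilate_place(n, b)` both segments refer to the same bundle, so CORONAL and ANTERIOR never need to be copied one by one, and a later change to the stop's bundle is seen by the nasal as well.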

Often feature bundles are simply used as a tidying mechanism. For instance, it is usually not necessary to access phonological feature and timing information at the same time, so a neater mechanism for storing this on a phone would be:

\f[
\left [
\begin{array}{ll}
\mbox{NAME} & \mbox{\emph{d}} \\
\mbox{PHONOLOGICAL FEATURES} & \left [
\begin{array}{ll}
\mbox{PLACE OF ARTICULATION } & \left [
\begin{array}{ll}
\mbox{CORONAL} & \mbox{\emph{+}} \\
\mbox{ANTERIOR} & \mbox{\emph{+}} \\
\end{array} \right ] \\
\mbox{VOICE} & \mbox{\emph{+}} \\
\mbox{CONTINUANT} & \mbox{\emph{--}} \\
\mbox{SONORANT} & \mbox{\emph{--}} \\
\end{array} \right ] \\
\mbox{TIMING} & \left [
\begin{array}{ll}
\mbox{START} & 0.412 \\
\mbox{END} & 0.489 \\
\mbox{DURATION} & 0.077 \\
\end{array} \right ] \\
\end{array} \right ]
\f]
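
A nested attribute-value list of this kind can be modelled as a recursive map. The following C++17 sketch is only an illustration: `Features`, `Value` and `lookup` are hypothetical names, and the dotted path syntax (for example `TIMING.END`) is an assumption of this sketch rather than something stated in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <variant>

struct Features;

// A value is a string, a number, or a nested feature bundle.
using Value = std::variant<std::string, double, std::shared_ptr<Features>>;

// Hypothetical nested attribute-value list; not the EST_Features class.
struct Features {
    std::map<std::string, Value> entries;
};

// Follow a dotted path through nested bundles.
// Returns nullptr if any step of the path is missing or is not a bundle.
const Value *lookup(const Features &f, const std::string &path) {
    const Features *cur = &f;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = path.find('.', start);
        std::string key = path.substr(
            start, dot == std::string::npos ? std::string::npos : dot - start);
        auto it = cur->entries.find(key);
        if (it == cur->entries.end()) return nullptr;
        if (dot == std::string::npos) return &it->second;
        auto *child = std::get_if<std::shared_ptr<Features>>(&it->second);
        if (!child || !*child) return nullptr;
        cur = child->get();
        start = dot + 1;
    }
}
```

With the phone above stored this way, timing and phonological information live in separate bundles and each can be reached without touching the other.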

The last important feature mechanism is function values, where a function rather than an atomic value is the value of an attribute. Function values are discussed in the section on timing, as they are particularly relevant to that issue.
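
As a rough illustration of what a function value could look like, the C++17 sketch below stores either a number or a function under an attribute name and evaluates the function on access. Everything here is an assumption made for illustration: `Item`, `FeatureFunction` and `read_feature` are hypothetical names, and deriving DURATION from START and END is just one plausible use, not a description of how EST handles timing.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <variant>

struct Item;

// A function value computes its result from the item it belongs to.
using FeatureFunction = std::function<double(const Item &)>;
using Value = std::variant<double, FeatureFunction>;

// Hypothetical item holding numeric or function-valued features.
struct Item {
    std::map<std::string, Value> features;
};

// Read a numeric feature, evaluating a function value on demand.
// Throws std::out_of_range if the feature does not exist.
double read_feature(const Item &item, const std::string &name) {
    const Value &v = item.features.at(name);
    if (auto *fn = std::get_if<FeatureFunction>(&v)) return (*fn)(item);
    return std::get<double>(v);
}
```

If DURATION is set to a function that returns END minus START, its value stays consistent when END is later changed, which is the attraction of computing a value rather than storing it.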

# Classes {#estlingclassessection}

# Functions {#estlingfunctions}

This is a sub-library of functions for creating, traversing and accessing relations which are trees:

## Tree traversal functions

\ref treetraversalfunctions

## Tree building functions

\ref treebuildfunctions

\subpage ling-example