This document compiles information from material
published on the WWW by Hans Uszkoreit, Annie Zaenen and Juan Carlos Ruiz Antón.
GFU:
A unification-based grammar development system
Unification Grammars
- The so-called constraint-based grammar formalisms are a very advanced and
widespread class of linguistic formalisms.
- They are also often subsumed under the term unification grammars.
- Constraint-based grammars should not be confused with constraint
grammars
- Unification grammars go beyond many earlier representation languages in
that they have a clean denotational semantics
- They permit the encoding of grammatical knowledge independently of any specific
processing algorithm
- These formalisms are currently used in a large number of systems
- Among the most widely used constraint-based grammar models are:
- Functional Unification Grammar (FUG) [Kay84]
- Head-Driven Phrase-Structure Grammar (HPSG) [PS94]
- Lexical Functional Grammar (LFG) [Bre82]
- Categorial Unification Grammar (CUG) [HKM87,Kar89,Usz86]
- For these or similar grammar models, powerful formalisms have been
designed and implemented that are usually employed for both grammar development
and linguistic processing, e.g., LFG [Bre82],
PATR [SURT83], ALE [Car92a], STUF [BKU88], ALEP [AAB91], GFU [Ruiz91], CLE [Als92], TDL [KS94], TFS [EZ90]
- One essential ingredient of all these formalisms is the complex formal
description of grammatical units (words, phrases, sentences) by means of sets
of attribute-value pairs, so-called feature terms
- Feature terms can be nested, i.e., values can be atomic symbols or feature terms.
- Feature terms can be underspecified.
- Feature terms may contain equality statements expressed by variables or
coreference markers
- The formalisms share a uniform operation for the merging and checking of
grammatical information, which is commonly referred to as unification
- The formalisms differ in other aspects:
- Some of them are restricted to feature terms with simple unification
(PATR)
- Others employ more powerful data types such as disjunctive terms, functional constraints, or sets
- Most formalisms combine phrase-structure rules or other
mechanisms for building trees with the feature-term component of the language
(LFG, TAG, TDL)
- A few formalisms incorporate the phrase-structure information into the
feature terms (HPSG, TFS)
- Some frameworks use inheritance type systems (HPSG, TFS, TDL, ALE).
Classes of feature terms belong to types.
- The types are partially ordered in a tree or in a (semi-)lattice.
- The type hierarchy determines for every type from which other types
attributes and values are inherited,
- which attributes are allowed and needed for a well-formed feature term of
the type,
- which types of values these attributes need,
- and with which other types the type can be conjoined by means of
unification.
- If the feature system allows complex features, i.e., attribute-value pairs in
which values may again be feature terms, this recursion can be constrained by
recursive type definitions. In fact, all of grammatical recursion can be
elegantly captured by such recursive types. In the extreme, the entire
linguistic derivation (parsing, generation) can be construed as type deduction
(HPSG, TFS).
- The strength of unification grammar formalisms lies in
the advantages they offer for grammar engineering.
- Another important advantage is the reusability of
grammars
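The merging and checking of feature terms described in the list above can be illustrated with a minimal Python sketch. It is not taken from any of the formalisms named here: feature terms are modelled as nested dictionaries with string atoms, a missing attribute stands for underspecification, and coreference markers, disjunction and types are deliberately left out. The names unify and UnificationFailure, as well as the attributes cat, agr, num and per, are assumptions chosen for illustration.

```python
import copy


class UnificationFailure(Exception):
    """Raised when two feature terms carry incompatible information."""


def unify(a, b):
    """Merge two feature terms, or fail if their information clashes.

    A feature term is either an atomic symbol (a string) or a dict that
    maps attribute names to feature terms. Coreference is not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = copy.deepcopy(a)
        for attr, b_value in b.items():
            if attr in result:
                # Both terms constrain this attribute: values must unify.
                result[attr] = unify(result[attr], b_value)
            else:
                # Only b constrains this attribute: add the information.
                result[attr] = copy.deepcopy(b_value)
        return result
    if a == b:
        # Two identical atomic symbols.
        return a
    raise UnificationFailure(f"cannot unify {a!r} with {b!r}")


# Hypothetical example: an underspecified noun phrase meets the
# agreement requirements that a verb places on its subject.
noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
subject_requirement = {"agr": {"num": "sg", "per": "3"}}

print(unify(noun_phrase, subject_requirement))
# {'cat': 'NP', 'agr': {'num': 'sg', 'per': '3'}}
```

Unifying a singular with a plural agreement value, for instance unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}), raises UnificationFailure in this sketch, which corresponds to the checking side of the operation.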
Advantages for grammar engineering
Experience has proven that large grammars can be specified, but that their
development is extremely labour-intensive. Currently no methods exist for
efficient distributed grammar engineering. This constitutes a serious
bottleneck in the development of language technology products. The hope is
that the new class of declarative formalisms will greatly
facilitate linguistic engineering and thus speed up the entire development
cycle. There are indications that seem to support this expectation. For some
sizable grammars written in unification grammar formalisms, the development
time was four years or less (TUG, CLE, TDL), whereas the development of large
annotated phrase structure grammars took 8 to 12 years.
Reusability of Grammars
Another important issue in grammar engineering is the reusability of
grammars. The more a grammar is committed to a certain
processing model, the lower the chances that it can be adapted to other
processing models or new application areas. Although scientists are still far
from converging on a uniform representation format, the declarative formulation
of grammar greatly facilitates porting of such grammars from one formalism to
another. Recent experiments in grammar porting seem to
bear out these expectations.
It is mainly because of their expected advantages for grammar engineering
that several unification formalisms have been developed or are currently used
in industrial laboratories. Almost all ongoing European Union-funded language
technology projects involving grammar development have adopted unification
grammar formalisms.
Constraint Grammar
The aim of the Constraint Grammar (CG) formalism, developed by Fred
Karlsson (from the Research Unit for Computational Linguistics at the University
of Helsinki), is to analyse real text, that is, to be used with raw text. In
addition, CG supports parsing based on morphology. A very important
part of the analysis through CG is morphological disambiguation, namely the
treatment of ambiguous output from morphological analysis using constraints
based on linguistic knowledge.
This formalism has been applied to the grammar of
Basque by the IXA Group at the University of the Basque Country.
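The idea of morphological disambiguation by constraints can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the CG rule language itself: each token carries the set of readings proposed by a morphological analyser, and a constraint is a function that names the readings to discard in a given context. The Token class, the tags DET, N and V, the constraint no_verb_after_determiner and the English example are all hypothetical. The sketch also assumes that the last remaining reading of a token is never discarded.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Token:
    form: str
    readings: Set[str]  # candidate tags from morphological analysis


# A constraint looks at the sentence and a position and returns the
# readings that should be discarded at that position.
Constraint = Callable[[List[Token], int], Set[str]]


def no_verb_after_determiner(sentence: List[Token], i: int) -> Set[str]:
    """Hypothetical constraint: no verb reading right after a sure determiner."""
    if i > 0 and sentence[i - 1].readings == {"DET"}:
        return {"V"}
    return set()


def disambiguate(sentence: List[Token], constraints: List[Constraint]) -> List[Token]:
    """Apply the constraints repeatedly until no reading can be removed."""
    changed = True
    while changed:
        changed = False
        for i, token in enumerate(sentence):
            for constraint in constraints:
                to_remove = constraint(sentence, i) & token.readings
                remaining = token.readings - to_remove
                # Assumption: always keep at least one reading per token.
                if to_remove and remaining:
                    token.readings = remaining
                    changed = True
    return sentence


sentence = [Token("the", {"DET"}), Token("run", {"N", "V"})]
disambiguate(sentence, [no_verb_after_determiner])
print(sentence[1].readings)
# {'N'}
```

Each successful application removes at least one reading, so the loop terminates; tokens that stay ambiguous simply keep several readings.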
References
- Survey of the State of the
Art in Human Language Technology. This online book, available
on the Internet, surveys the state of the art of human language technology.
The book consists of thirteen chapters written by 97 different authors.
Editorial Board: Ronald A. Cole, Editor in Chief; Joseph Mariani; Hans
Uszkoreit; Annie Zaenen; Victor Zue. Contents: Spoken Language Input,
Written Language Input, Language Analysis and Understanding, Language
Generation, Spoken Output Technologies, Discourse and Dialogue, Document
Processing, Multilinguality, Multimodality, Transmission and Storage,
Mathematical Methods, Language Resources, Evaluation.
- La Lingüística
Computacional, web page of Juan Carlos Ruiz Antón, Universitat
Jaume I.