Hans Uszkoreit & Annie Zaenen
Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, Germany
and Universität des Saarlandes, Saarbrücken, Germany
Rank Xerox Research Centre, Grenoble, France
Constraint-based grammar formalisms, often also subsumed under the term unification grammars, form a very advanced and widespread class of linguistic formalisms. They go beyond many earlier representation languages in that they have a clean denotational semantics that permits the encoding of grammatical knowledge independently of any specific processing algorithm. Since these formalisms are currently used in a large number of systems, we provide a brief overview of their main characteristics.
Among the most widely used constraint-based grammar models are Functional Unification Grammar (FUG) [Kay84], Head-Driven Phrase-Structure Grammar (HPSG) [PS94], Lexical Functional Grammar (LFG) [Bre82], Categorial Unification Grammar (CUG) [HKM87,Kar89,Usz86], and Tree Adjoining Grammar (TAG) [JS92]. For these or similar grammar models, powerful formalisms have been designed and implemented that are usually employed for both grammar development and linguistic processing, e.g., LFG [Bre82], PATR [SURT83], ALE [Car92a], STUF [BKU88], ALEP [AAB91], CLE [Als92], TDL [KS94], and TFS [EZ90].
One essential ingredient of all these formalisms is the complex formal description of grammatical units (words, phrases, sentences) by means of sets of attribute-value pairs, so-called feature terms. These feature terms can be nested, i.e., values can be atomic symbols or feature terms. Feature terms can be underspecified. They may contain equality statements expressed by variables or coreference markers. The formalisms share a uniform operation for the merging and checking of grammatical information, which is commonly referred to as unification.
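The merging-and-checking behaviour of unification can be illustrated with a minimal sketch. It assumes feature terms are encoded as nested Python dictionaries with strings as atomic values, and it leaves out coreference markers and variables entirely; the function name `unify` and the attribute names `cat`, `agr`, `num`, and `per` are illustrative choices, not taken from any of the formalisms named above.

```python
def unify(a, b):
    """Unify two feature terms; return the merged term, or None on a clash.

    A feature term is either an atomic symbol (str) or a dict that maps
    attribute names to feature terms. A missing attribute means the term
    is underspecified for that attribute. Coreference is not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy, so the inputs are not modified
        for attr, b_val in b.items():
            if attr in result:
                merged = unify(result[attr], b_val)
                if merged is None:
                    return None  # conflicting values under the same attribute
                result[attr] = merged
            else:
                result[attr] = b_val  # information only present in b is added
        return result
    # At least one side is atomic: the terms unify only if they are identical.
    return a if a == b else None


noun = {"cat": "N", "agr": {"num": "sg"}}
third_person = {"agr": {"per": "3"}}
plural = {"agr": {"num": "pl"}}

merged = unify(noun, third_person)  # compatible: information is combined
clash = unify(noun, plural)         # incompatible: "sg" vs "pl"
```

Here `merged` carries the information of both inputs, while `clash` is `None` because the two terms assign different atomic values to the same path. Real implementations typically use graph structures rather than plain dictionaries so that coreferences can be shared.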
The formalisms differ in other aspects. Some of them are restricted to feature terms with simple unification (PATR). Others employ more powerful data types such as disjunctive terms, functional constraints, or sets. Most formalisms combine phrase-structure rules or other mechanisms for building trees with the feature-term component of the language (LFG, TAG, TDL). A few formalisms incorporate the phrase-structure information into the feature terms (HPSG, TFS).
Some frameworks use inheritance type systems (HPSG, TFS, TDL, ALE). Classes of feature terms belong to types. The types are partially ordered in a tree or in a (semi-)lattice. For every type, the type hierarchy determines from which other types attributes and values are inherited, which attributes are allowed and required for a well-formed feature term of the type, which types of values these attributes take, and with which other types the type can be conjoined by means of unification.
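As a rough illustration of how such a hierarchy can drive both attribute inheritance and type unification, the following sketch assumes a tiny, invented hierarchy given as a table of immediate supertypes. The type names (`sign`, `word`, `phrase`, `verbal`, `verb`), the attribute names, and the helper functions are hypothetical and do not reproduce the type system of any particular framework.

```python
# Hypothetical hierarchy: each type maps to its immediate supertypes.
PARENTS = {
    "top": [],
    "sign": ["top"],
    "word": ["sign"],
    "phrase": ["sign"],
    "verbal": ["sign"],
    "verb": ["word", "verbal"],  # multiple inheritance
}

# Attributes declared directly on a type; subtypes inherit them.
DECLARED = {"sign": {"phon"}, "word": {"morph"}, "verbal": {"vform"}}


def supertypes(t):
    """Return t together with all types above it in the hierarchy."""
    seen, stack = {t}, [t]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def attributes_of(t):
    """Collect the attributes that t declares or inherits."""
    attrs = set()
    for s in supertypes(t):
        attrs |= DECLARED.get(s, set())
    return attrs


def unify_types(a, b):
    """Return the most general common subtype of a and b, or None."""
    common = [t for t in PARENTS if {a, b} <= supertypes(t)]
    # Keep only candidates that have no other candidate above them.
    maximal = [t for t in common
               if not any(u != t and u in supertypes(t) for u in common)]
    return maximal[0] if len(maximal) == 1 else None
```

With this table, `attributes_of("verb")` yields the attributes inherited from `sign`, `word`, and `verbal`; `unify_types("word", "verbal")` yields `"verb"`; and `unify_types("word", "phrase")` yields `None` because the two types have no common subtype. Treating the absence of a unique most general subtype as failure is a simplifying assumption of this sketch.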
If the feature system allows complex features, i.e., attribute-value pairs in which values may themselves be feature terms, this recursion can be constrained by recursive type definitions. In fact, all of grammatical recursion can be elegantly captured by such recursive types. In the extreme, the entire linguistic derivation (parsing, generation) can be construed as type deduction (HPSG, TFS).
The strength of unification grammar formalisms lies in the advantages they offer for grammar engineering. Experience has shown that large grammars can be specified, but that their development is extremely labour-intensive. Currently no methods exist for efficient distributed grammar engineering. This constitutes a serious bottleneck in the development of language technology products. The hope is that the new class of declarative formalisms will greatly facilitate linguistic engineering and thus speed up the entire development cycle. There are indications that seem to support this expectation. For some sizable grammars written in unification grammar formalisms, the development time was four years or less (TUG, CLE, TDL), whereas the development of large annotated phrase structure grammars had taken 8 to 12 years.
Another important issue in grammar engineering is the reusability of grammars. The more a grammar is committed to a certain processing model, the smaller the chances that it can be adapted to other processing models or new application areas. Although scientists are still far from converging on a uniform representation format, the declarative formulation of grammars greatly facilitates porting them from one formalism to another. Recent experiments in grammar porting seem to bear out these expectations.
It is mainly because of their expected advantages for grammar engineering that several unification formalisms have been developed or are currently used in industrial laboratories. Almost all ongoing European Union-funded language technology projects involving grammar development have adopted unification grammar formalisms.