This document compiles information from material published on the WWW by Hans Uszkoreit, Annie Zaenen and Juan Carlos Ruiz Antón.

A unification based grammar development system

Unification Grammars

Advantages for grammar engineering

Experience has shown that large grammars can be specified, but that their development is extremely labour-intensive. Currently no methods exist for efficient distributed grammar engineering. This constitutes a serious bottleneck in the development of language technology products. The hope is that the new class of declarative formalisms will greatly facilitate linguistic engineering and thus speed up the entire development cycle. There are indications that seem to support this expectation. For some sizable grammars written in unification grammar formalisms (TUG, CLE, TDL), the development time was four years or less, whereas the development of large annotated phrase structure grammars took 8 to 12 years.

Reusability of Grammars

Another important issue in grammar engineering is the reusability of grammars. The more a grammar is committed to a certain processing model, the lower the chances that it can be adapted to other processing models or new application areas. Although scientists are still far from converging on a uniform representation format, the declarative formulation of grammars greatly facilitates porting them from one formalism to another. Recent experiments in grammar porting seem to bear out these expectations.

It is mainly because of their expected advantages for grammar engineering that several unification formalisms have been developed or are currently used in industrial laboratories. Almost all ongoing European Union-funded language technology projects involving grammar development have adopted unification grammar formalisms.

Constraint Grammar

The aim of the Constraint Grammar (CG) formalism, developed by Fred Karlsson (from the Research Unit for Computational Linguistics at the University of Helsinki), is to analyse real text, that is, to be used with raw, unrestricted text. Parsing in CG is based on morphology. A very important part of the analysis is morphological disambiguation, namely resolving the ambiguous output of morphological analysis by means of constraints based on linguistic knowledge.

This formalism has been applied to the grammar of Basque by the IXA Group at the University of the Basque Country.
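
The disambiguation step described above can be illustrated with a small sketch. It is not the actual CG rule syntax or Karlsson's implementation; the `Token` class, the `remove_if_prev` function, the tag names and the single rule are all hypothetical and chosen only for illustration. The sketch assumes that a morphological analyser has already assigned each word a list of candidate readings, and that a constraint may discard a reading in a given context but never the last one that remains.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A word form with the candidate readings from morphological analysis."""
    form: str
    readings: list  # hypothetical tag strings such as "DET", "N", "V"


def remove_if_prev(tokens, target, prev_tag):
    """Discard the `target` reading of a token when the preceding token
    is unambiguously `prev_tag`. The last remaining reading is never removed."""
    for i in range(1, len(tokens)):
        prev, cur = tokens[i - 1], tokens[i]
        prev_is_unambiguous = prev.readings == [prev_tag]
        if prev_is_unambiguous and target in cur.readings and len(cur.readings) > 1:
            cur.readings = [r for r in cur.readings if r != target]
    return tokens


# Ambiguous output of a (hypothetical) morphological analyser:
# "walk" may be a noun or a verb.
sentence = [
    Token("the", ["DET"]),
    Token("walk", ["N", "V"]),
    Token("ended", ["V"]),
]

# Constraint: a word directly after an unambiguous determiner is not a verb.
remove_if_prev(sentence, target="V", prev_tag="DET")

result = [(t.form, t.readings) for t in sentence]
print(result)
```

In this example the verb reading of "walk" is removed because the preceding word can only be a determiner, while "ended" keeps its single reading. A realistic constraint grammar would apply many such constraints, each referring to a wider context than the immediately preceding word.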