GLOSIX Part 1-4.
This section provides recommendations for the representation of linguistic annotation within GLOSIX environments. By linguistic annotation, we mean information derived from primary data (text or speech), usually resulting from linguistic analyses such as tokenization, morpho-syntactic tagging, or prosody tagging.
All linguistic annotation intended for interchange should follow the MULTEXT/EAGLES Corpus Encoding Standard (CES). The CES is a Text Encoding Initiative (TEI)-based application of SGML (ISO 8879:1986, Information Processing--Text and Office Systems--Standard Generalized Markup Language).
At present, the CES provides the following:
Encodings for other types of linguistic annotation, such as prosody, are under development.
The CES follows the GLOSIX recommendations for character sets (GLOSIX Part 1.1. Characters).
Import/export formats for tools should follow the SGML-based recommendations from the MULTEXT/EAGLES Corpus Encoding Standard (CES) described above.
Other formats will be defined in the future.
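As an illustration of what annotation derived from primary data can look like, the following minimal Python sketch tokenizes a short text and writes the result as SGML-style markup that points back into the primary data by character offset. It is only a sketch under stated assumptions: the element names (`s`, `tok`, `orth`), the `from`/`to` attributes, and the helper functions `tokenize` and `to_markup` are hypothetical and chosen for readability. They are not taken from the CES DTDs, which should be consulted for the actual element set.

```python
import re
from xml.sax.saxutils import escape


def tokenize(text):
    """Return (start, end, form) triples, one per word or punctuation mark."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def to_markup(tokens):
    """Serialize tokens as SGML-style markup.

    Element and attribute names are illustrative only, not the CES DTD.
    """
    lines = ["<s>"]
    for start, end, form in tokens:
        # Offsets refer back to the primary data; the form is escaped so
        # that markup characters in the text cannot break the output.
        lines.append(
            f'<tok from="{start}" to="{end}"><orth>{escape(form)}</orth></tok>'
        )
    lines.append("</s>")
    return "\n".join(lines)


primary = "Dogs bark."
tokens = tokenize(primary)
print(to_markup(tokens))
```

Keeping the offsets alongside each token means the annotation can be interpreted without modifying the primary data. A tool that imports or exports such annotation would replace the hypothetical element names with those defined by the applicable CES DTD.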