MULTEXT/EAGLES - Document LSD 1. Annex 1. Version 1.0. Last modified 7 May 1996.

Annex 1 - Relevant standards

Character sets
Transliteration
Language and country codes
Document encoding
Software

We list here the main existing or emerging standards that are relevant to the work of the subgroup. This is a first selection, which needs further work.

Some of the comments are quoted from the "Standards FAQ" posted monthly to the USENET groups comp.protocols.iso, comp.std.misc and comp.std.internat and archived on many FAQ servers such as:

<URL:ftp://ftp.inria.fr/faq/comp.std.internat/Standards_FAQ>

Character sets

ISO 646:1991: ISO 7-bit coded character set for information interchange
ISO 8859 (Parts 1-10): 8-bit single-byte coded graphic character sets

ISO 8859 consists of several 8-bit ASCII extensions:
ISO 8859-1:1987--Part 1: Latin Alphabet No. 1
ISO 8859-2:1987--Part 2: Latin Alphabet No. 2
ISO 8859-3:1988--Part 3: Latin Alphabet No. 3
ISO 8859-4:1988--Part 4: Latin Alphabet No. 4
ISO 8859-5:1988--Part 5: Latin/Cyrillic Alphabet
ISO 8859-6:1987--Part 6: Latin/Arabic Alphabet
ISO 8859-7:1987--Part 7: Latin/Greek Alphabet
ISO 8859-8:1988--Part 8: Latin/Hebrew Alphabet
ISO 8859-9:1989--Part 9: Latin Alphabet No. 5
ISO 8859-10:1992--Part 10: Latin Alphabet No. 6
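
Because each part is a separate 8-bit extension of ASCII, the same byte value stands for a different character depending on which part is assumed. The following is a minimal sketch in Python. The codec names (`iso8859_1`, `iso8859_5`, and so on) are those of the Python standard library, not names defined by ISO 8859, and `decode_under` is a hypothetical helper introduced only for illustration.

```python
# Decode one byte under several parts of ISO 8859.
# Codec names are Python's own and are used here only for illustration.
RAW = b"\xe9"

def decode_under(parts, raw=RAW):
    """Return {part number: decoded text} for each ISO 8859 part given."""
    return {part: raw.decode(f"iso8859_{part}") for part in parts}

decoded = decode_under([1, 5, 7])
for part, text in decoded.items():
    print(f"ISO 8859-{part}: U+{ord(text):04X} {text}")

# The lower half (0x00-0x7F) is shared with 7-bit ASCII in every part.
ascii_shared = all(b"A".decode(f"iso8859_{p}") == "A" for p in range(1, 11))
```

A text file encoded in one of these parts therefore needs an external indication of which part applies, since the bytes alone do not say.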

ISO 8859-1, "Latin Alphabet No. 1", is widely implemented and may already be seen as the de facto standard replacement for ASCII. ISO 8859-1 is also the preferred encoding on the Internet.

The ISO 8859 series is the set of character sets currently recommended by the EAGLES Text Representation subgroup, until ISO 10646/Unicode is finalized and implemented.
See also "Text of the Final Draft of the Revised ISO/IEC 8859-1 with line numbers" at

<URL:ftp://dkuug.dk/i18n/iso8859-1.jvw>
ISO/IEC 6429:1992: Information processing -- Control functions for 7-bit and 8-bit coded character sets
Specifies control functions, including the ASCII control codes. Subsets of these are also known as VT100/VT320/ANSI escape sequences. Some sequences allow for character set and language switching (see the document "Language Coding Using ISO/IEC 6429").

"We propose that work is started to prepare a European standard for the coding of the languages used in a text. This coding may be provided within the framework of the control functions of ISO 6429, or within the Standard Generalized Markup Language (SGML, ISO 8879), or both. " (From <URL:http://www.stonehand.com/unicode/standard/tc304.html>)
Unicode: "The Unicode Worldwide Character Standard is a character coding system designed to support the interchange, processing, and display of the written texts of the diverse languages of the modern world. In addition, it supports classical and historical texts of many written languages.

In its current version, the Unicode standard contains 34,168 distinct coded characters derived from 24 supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.

Some modern written languages are not yet supported or only partially supported due to a need for further research into the encoding needs of certain scripts."
ISO/IEC 10646:1992: Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
ISO/IEC 10646-1:1993 Part 1: Architecture and Basic Multilingual Plane
Unicode and ISO 10646 were merged in 1992. Both include the characters of ISO 8859, with ISO 8859-1 occupying the first 256 code positions.
"At the moment they are technically aligned, but UNICODE has published some more documentation on a "character set model" that is used in the design of the system, which is not in the ISO standard.
At the moment it is a 16-bit character set, but will likely extend to 32 bits. ISO plans to allocate 32-bit characters in the next study period; it remains to be seen if the UNICODE consortium will follow."

[Harald.T.Alvestrand@uninett.no, Mon Jan 23 13:16:41 1995, "The fight about Unicode in IETF"]
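
For Part 1 of ISO 8859, the relationship to Unicode and ISO 10646 is positional: each ISO 8859-1 byte value equals the code position of the same character in the larger set. The following sketch checks this in Python. The `latin-1` and `utf-16-be` codec names are Python's, and `utf-16-be` is used here only as a convenient way to show a two-octet form of a character in the Basic Multilingual Plane.

```python
# Check that ISO 8859-1 lines up with the first 256 Unicode code positions.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Each decoded character's code position equals the original byte value.
same_positions = all(ord(ch) == b for ch, b in zip(text, all_bytes))

# In a 16-bit form each such character takes two octets (big-endian here).
two_octets = "\u00e9".encode("utf-16-be")

print(same_positions, two_octets.hex())
```

The other parts of ISO 8859 do not share this property: their characters are present in the larger set, but at code positions that differ from their byte values.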

Transliteration

ISO 9:1986: Transliteration of Slavic Cyrillic characters into Latin characters
ISO 233:1984: Transliteration of Arabic characters into Latin characters
ISO 233-2:1993: Transliteration of Arabic characters into Latin characters--Part 2: Simplified transliteration
ISO 259:1984: Transliteration of Hebrew characters into Latin characters
ISO/R 843:1968: Transliteration of Greek characters into Latin characters
ISO 3602:1989: Romanization of Japanese (kana script)
ISO 7098:1991: Romanization of Chinese

Language and country codes

ISO 639:1988: Code for the representation of names of languages

Provides two-letter codes for about 140 languages and is intended primarily for use in terminology, lexicography and linguistics.

The list is available online at

<URL:http://www.stonehand.com/unicode/standard/iso639.html>
ISO 639-2:1995: Code for the representation of names of languages--Alpha-3 code

"Three-letter codes for the representation of names of languages for information interchange", developed by a Joint Working Group of ISO TC37/SC2 and TC46/SC2. Covers a wider range of the world's languages than ISO 639.

The list is available online at

<URL:http://www.stonehand.com/unicode/standard/cd639-2.html>
Ethnologue's language list: "a comprehensive listing of the world languages along with three letter unique language identifiers may be found in the Ethnologue, Languages of the World, 12th Edition, Editor Barbara F. Grimes, 1992, Summer Institute of Linguistics, Dallas, Texas. Approximately 6800 languages are described in this text which includes linguistic maps of geographical regions." The list of codes is available online at

<URL:http://www.stonehand.com/unicode/standard/ethn12.htm>
ISO 3166:1993: Codes for the representation of names of countries

This standard defines a 2-letter, a 3-letter and a numeric code for each country, e.g. US/USA/840 = United States, DE/DEU/276 = Germany, GB/GBR/826 = United Kingdom, FR/FRA/250 = France. The 2-letter codes are well known on the Internet as top-level domain names. The 3-letter versions are often used at international sports events.
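
The three code forms identify the same country, so a lookup keyed on any of them should return the same record. The following is a small sketch in Python that uses only the four entries quoted above. The `Country` type and the `find_country` helper are hypothetical names chosen for illustration, and the table is not the full ISO 3166 list.

```python
from typing import NamedTuple, Optional

class Country(NamedTuple):
    alpha2: str
    alpha3: str
    numeric: int
    name: str

# Only the four entries quoted in the text; not the complete standard.
COUNTRIES = [
    Country("US", "USA", 840, "United States"),
    Country("DE", "DEU", 276, "Germany"),
    Country("GB", "GBR", 826, "United Kingdom"),
    Country("FR", "FRA", 250, "France"),
]

def find_country(code) -> Optional[Country]:
    """Look up a country by its 2-letter, 3-letter or numeric code."""
    key = str(code).upper()
    for country in COUNTRIES:
        if key in (country.alpha2, country.alpha3, f"{country.numeric:03d}"):
            return country
    return None

print(find_country("de"), find_country("GBR"), find_country(250))
```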
Internet-Draft of HTML 3.0 [LANG attribute]: The current Internet-Draft of HTML 3.0 (29-Mar-95) provides a LANG attribute, whose value is composed from the two-letter language code from ISO 639, optionally followed by a period and a two-letter country code from ISO 3166, e.g. "en.uk" for the variation of English spoken in the United Kingdom.

<URL:http://www.hpl.hp.co.uk/people/dsr/html/CoverPage.html>
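
The LANG value format described for the HTML 3.0 draft (two letters, optionally a period and two more letters) can be checked with a short pattern. This is a sketch only: `parse_lang` is a hypothetical helper, lower-casing the result is an assumption made here, and the code checks only the shape of the value, not whether the codes appear in ISO 639 or ISO 3166.

```python
import re
from typing import Optional, Tuple

# Two-letter language code, optionally a period and a two-letter country code.
LANG_RE = re.compile(r"([A-Za-z]{2})(?:\.([A-Za-z]{2}))?")

def parse_lang(value: str) -> Tuple[str, Optional[str]]:
    """Split a LANG value into (language, country or None)."""
    match = LANG_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid LANG value: {value!r}")
    language, country = match.group(1), match.group(2)
    # Lower-casing is an assumption of this sketch, not taken from the draft.
    return language.lower(), (country.lower() if country else None)

print(parse_lang("en.uk"), parse_lang("fr"))
```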
Internet RFC 1766: Alvestrand, H. (1995) Tags for the Identification of Languages

"This document describes a language tag for use in cases where it is desired to indicate the language used in an information object."

<URL:ftp://ftp.inria.fr/rfc/rfc17xx/rfc1766.Z>
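
RFC 1766 tags consist of a primary tag followed by optional subtags, each made of one to eight letters and separated by hyphens, and they are compared without regard to case. The following is a minimal sketch of a syntax check in Python. `split_language_tag` is a hypothetical helper, and the code does not verify that the tags are registered or that they correspond to ISO 639 or ISO 3166 codes.

```python
import re
from typing import List, Tuple

# Primary tag and optional subtags: 1 to 8 letters each, joined by hyphens.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")

def split_language_tag(tag: str) -> Tuple[str, List[str]]:
    """Return (primary tag, list of subtags), lower-cased for comparison."""
    if TAG_RE.fullmatch(tag) is None:
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    primary, *subtags = tag.lower().split("-")
    return primary, subtags

print(split_language_tag("en-US"), split_language_tag("fr"))
```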

Document encoding

ISO 8879:1986: Information Processing--Text and Office Systems--Standard Generalized Markup Language (SGML)
ISO/IEC DIS 13673:1993: Information Technology -- Text and Office Systems -- Conformance Testing for Standard Generalized Markup Language (SGML) Systems
TEI P3:1994: Sperberg-McQueen, C.M., Burnard, L. (Eds.) (1994) Guidelines for Electronic Text Encoding and Interchange, Text Encoding Initiative, Chicago and Oxford. Available online at

<URL:http://etext.virginia.edu/TEI.html>
EAGLES Corpus Encoding DRAFT: EAGLES Document EAG--CSG/IR--T2.1, version of October 1994
Corpus Encoding Standard - Draft proposal

<URL:http://www.ilc.pi.cnr.it/EAGLES/encoding/encoding.html>
ISO/IEC DIS 10744:1992: Hypermedia/Time-based Document Structuring Language (HyTime)
ISO 12083: Standardized SGML document type definitions for books, articles with tables, formulae, etc.
Internet RFC 1866: Hypertext Markup Language (HTML) 2.0
ISO 8601:1988: Representation of dates and times.

"This standard defines a lot of details of the calendar. E.g. the ISO definition of the week numbers is that the first day (day number 1) of a week is Monday and that the first week in a year (week number 1) is the week that includes the first Thursday in January, i.e. the first week that has at least four days in January. Other definitions are, e.g., that hours of a day are counted from 0 to 24 and that the international notation of dates is the Bigendian format year-month-day, e.g. 1993-04-17 and that for time is e.g. 20:36:04 (hh:mm:ss). There are also string formats for computer applications specified that have to represent date and time in files and protocol packets. (See

<URL:ftp://ftp.uni-erlangen.de/pub/doc/ISO/ISO8601.ps.Z>

for a very detailed summary.)"
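
The week-numbering rule and the date and time notations quoted above can be tried out with Python's standard `datetime` module, which implements ISO week numbering in `date.isocalendar()`. This is a sketch for illustration; `week_of_first_thursday` is a hypothetical helper written to mirror the wording of the rule, not part of the standard or of the library.

```python
from datetime import date, time

# The example date and time from the text, in year-month-day and hh:mm:ss form.
d = date(1993, 4, 17)
iso_date = d.isoformat()
iso_time = time(20, 36, 4).isoformat()

# ISO year, week number, and weekday (Monday is 1, Sunday is 7).
iso_year, iso_week, iso_weekday = d.isocalendar()

def week_of_first_thursday(year: int) -> int:
    """ISO week number of the week containing the first Thursday in January."""
    for day in range(1, 8):
        candidate = date(year, 1, day)
        if candidate.isoweekday() == 4:  # Thursday
            return candidate.isocalendar()[1]
    raise AssertionError("every 7-day span contains a Thursday")

print(iso_date, iso_time, iso_year, iso_week, iso_weekday)
print(week_of_first_thursday(1993))
```

One consequence of the rule is that the first days of January can belong to the last week of the previous year: 1 January 1993 was a Friday, so it falls in week 53 of 1992.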
ISO 4217: Codes for the representation of currencies and funds
ITU-T/CCITT Recommendation E.123: Notation for international telephone numbers (a '+' followed by the country code, followed by a space, ...).

Software

ISO/IEC 9075:1992

Information technology--Database languages--SQL

ANSI X3.159-1989

American National Standards Institute, American National Standard for Information Systems -- Programming Language -- C.

ISO 9899:1990

The C programming language

ISO 9127:1988

User documentation and cover information for consumer software packages

X/Open Publications

Series of technical manuals addressing portability issues, including (See "X/Open Publications Catalog"):

Guides

X/Open Guides provide information that will be useful in the evaluation, procurement, development and / or management of open systems, covering best practice based on the experience of X/Open members, associate members and other experts from both the developer and user communities.
X/Open Specifications (including CAE and Developers' Specifications)

These are the long life specifications which form the basis for conformant and branded X/Open-compliant systems.
X/Open Portability Guide (XPG3)

This is the formal set of specifications first published in 1989 and used as the basis for X/Open's test and verification programme. The seven volumes plus the Overview have been re-printed and published by X/Open, and are presented in line with our future publications architecture.
XPG4

XPG4 is the latest set of specifications, verification and branding programme. The parent volume "X/Open Systems and Branded Products: XPG4" describes the whole edifice and cross-refers to over 15 specifications.

IEEE 1003

POSIX

Series of standards:

IEEE 1003.0 POSIX Guide
IEEE 1003.1 System Application Program Interface (also ISO 9945-1)
IEEE 1003.2 Shell and utilities
etc.