MULTEXT/EAGLES - Document LSD 1. Annex 1. Version 1.0. Last modified 7 May 1996.




Annex 1 - Relevant standards







We list here the main existing or emerging standards that are relevant to the work of the subgroup. This is a first selection, which needs further work.

Some of the comments are quoted from the "Standards FAQ" posted monthly to the USENET groups comp.protocols.iso, comp.std.misc and comp.std.internat and archived on many FAQ servers such as:

<URL:ftp://ftp.inria.fr/faq/comp.std.internat/Standards_FAQ>


Character sets

ISO 646:1991

ISO 7-bit coded character set for information interchange

ISO 8859 (Parts 1-10)

8-bit single-byte coded graphic character sets

ISO 8859 consists of several 8-bit ASCII extensions:
ISO 8859-1:1987--Part 1: Latin Alphabet No. 1
ISO 8859-2:1987--Part 2: Latin Alphabet No. 2
ISO 8859-3:1988--Part 3: Latin Alphabet No. 3
ISO 8859-4:1988--Part 4: Latin Alphabet No. 4
ISO 8859-5:1988--Part 5: Latin/Cyrillic Alphabet
ISO 8859-6:1987--Part 6: Latin/Arabic Alphabet
ISO 8859-7:1987--Part 7: Latin/Greek Alphabet
ISO 8859-8:1988--Part 8: Latin/Hebrew Alphabet
ISO 8859-9:1989--Part 9: Latin Alphabet No. 5
ISO 8859-10:1992--Part 10: Latin Alphabet No. 6

ISO 8859-1, the "Latin Alphabet No. 1", has become widely implemented and may already be seen as the de facto standard ASCII replacement. ISO 8859-1 is also the preferred encoding on the Internet.

The ISO 8859 series is the set of character sets currently recommended by the EAGLES Text Representation subgroup, until ISO 10646/Unicode is finalized and implemented.
See also "Text of the Final Draft of the Revised ISO/IEC 8859-1 with line numbers" at

<URL:ftp://dkuug.dk/i18n/iso8859-1.jvw>

ISO/IEC 6429:1992

Information processing -- Control functions for 7-bit and 8-bit coded character sets
Defines control functions and their coded representations, including the ASCII control codes. Subsets of these are also known as VT100/VT320/ANSI escape sequences. Some sequences allow for character set and language switching (see the document "Language Coding Using ISO/IEC 6429").
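
As an illustration of the escape sequences mentioned above, the following Python sketch builds a Select Graphic Rendition (SGR) control sequence of the kind familiar from VT100/ANSI terminals. The helper names `sgr` and `emphasize` are hypothetical, and whether a given terminal honours the sequence is an assumption.

```python
ESC = "\x1b"      # the ESCAPE character (hexadecimal 1B)
CSI = ESC + "["   # 7-bit form of the Control Sequence Introducer


def sgr(*params: int) -> str:
    """Build an SGR (Select Graphic Rendition) control sequence.

    Hypothetical helper: parameters are joined with ';' and the
    sequence is terminated by the final byte 'm'.
    """
    return CSI + ";".join(str(p) for p in params) + "m"


BOLD = sgr(1)    # parameter 1: bold or increased intensity
RESET = sgr(0)   # parameter 0: default rendition


def emphasize(text: str) -> str:
    """Wrap text so that a terminal supporting SGR shows it in bold."""
    return BOLD + text + RESET


if __name__ == "__main__":
    print(emphasize("ISO/IEC 6429"))
```

The sketch covers rendition only; the sequences for character set and language switching are not shown.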

"We propose that work is started to prepare a European standard for the coding of the languages used in a text. This coding may be provided within the framework of the control functions of ISO 6429, or within the Standard Generalized Markup Language (SGML, ISO 8879), or both. " (From <URL:http://www.stonehand.com/unicode/standard/tc304.html>)

Unicode

"The Unicode Worldwide Character Standard is a character coding system designed to support the interchange, processing, and display of the written texts of the diverse languages of the modern world. In addition, it supports classical and historical texts of many written languages.

In its current version, the Unicode standard contains 34,168 distinct coded characters derived from 24 supported scripts. These characters cover the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.

Some modern written languages are not yet supported or only partially supported due to a need for further research into the encoding needs of certain scripts."

ISO/IEC 10646:1992

Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
ISO/IEC 10646-1:1993 Part 1: Architecture and Basic Multilingual Plane
Unicode and ISO 10646 were merged in 1992, and contain ISO 8859 as a subset.
"At the moment they are technically aligned, but UNICODE has published some more documentation on a "character set model" that is used in the design of the system, which is not in the ISO standard.
At the moment it is a 16-bit character set, but will likely extend to 32 bits. ISO plans to allocate 32-bit characters in the next study period; it remains to be seen if the UNICODE consortium will follow."


[Harald.T.Alvestrand@uninett.no, Mon Jan 23 13:16:41 1995, "The fight about Unicode in IETF"]
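
The relation between ISO 8859 and the merged Unicode/ISO 10646 code can be checked in a few lines of Python. This is a minimal sketch that assumes Python's built-in `latin-1` and `iso8859-2` codecs are faithful stand-ins for the standards; the function name is illustrative only.

```python
def latin1_matches_unicode() -> bool:
    """Return True if each ISO 8859-1 byte decodes to the code point of equal value."""
    data = bytes(range(256))
    text = data.decode("latin-1")
    return all(ord(ch) == b for ch, b in zip(text, data))


# "é" is byte E9 in ISO 8859-1 and code point U+00E9 in Unicode.
e_acute = "é".encode("latin-1")

# The other parts are included by repertoire, not by position:
# byte A3 in ISO 8859-2 is "Ł", which sits at U+0141.
l_stroke = b"\xa3".decode("iso8859-2")

if __name__ == "__main__":
    print(latin1_matches_unicode(), e_acute, hex(ord(l_stroke)))
```

Only ISO 8859-1 coincides position by position with the first 256 code points; text in the other parts needs a conversion table.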

Transliteration

ISO 9:1986

Transliteration of Slavic Cyrillic characters into Latin characters

ISO 233:1984

Transliteration of Arabic characters into Latin characters

ISO 233-2:1993

Transliteration of Arabic characters into Latin characters--Part 2: simplified transliteration

ISO 259:1984

Transliteration of Hebrew characters into Latin characters

ISO R 843:1968

Transliteration of Greek characters into Latin characters

ISO 3602:1989

Romanization of Japanese (kana script)

ISO 7098:1991

Romanization of Chinese


Language and country codes

ISO 639:1988

Code for the representation of names of languages

Provides two-letter codes for about 140 languages and is intended primarily for use in terminology, lexicography and linguistics.

The list is available online at

<URL:http://www.stonehand.com/unicode/standard/iso639.html>

ISO 639-2:1995

Code for the representation of names of languages--Alpha-3 code

"Three-letter codes for the representation of names of languages for information interchange", developed by a Joint Working Group of ISO TC37/SC2 and TC46/SC2. Covers a wider range of the world's languages than ISO 639.

The list is available online at

<URL:http://www.stonehand.com/unicode/standard/cd639-2.html>

Ethnologue's language list

There is also the Ethnologue: "A comprehensive listing of the world languages along with three letter unique language identifiers may be found in the Ethnologue, Languages of the World, 12th Edition, Editor Barbara F. Grimes, 1992, Summer Institute of Linguistics, Dallas, Texas. Approximately 6800 languages are described in this text which includes linguistic maps of geographical regions." The list of codes is available online at

<URL:http://www.stonehand.com/unicode/standard/ethn12.htm>

ISO 3166:1993

Codes for the representation of names of countries

This standard defines a 2-letter, a 3-letter and a numeric code for each country, e.g. US/USA/840 = United States, DE/DEU/276 = Germany, GB/GBR/826 = United Kingdom, FR/FRA/250 = France. The 2-letter codes are well known on the Internet as top-level domain names. The 3-letter versions are often used at international sports events.
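
The three parallel codes lend themselves to a simple lookup table. The Python sketch below holds only the four entries quoted above; the names `Country`, `COUNTRIES` and `find_country` are hypothetical, and a real application would load the complete list from the standard.

```python
from typing import NamedTuple, Optional


class Country(NamedTuple):
    alpha2: str    # 2-letter code
    alpha3: str    # 3-letter code
    numeric: str   # numeric code, kept as a string
    name: str


# Only the four examples given in the text, not the full standard.
COUNTRIES = [
    Country("US", "USA", "840", "United States"),
    Country("DE", "DEU", "276", "Germany"),
    Country("GB", "GBR", "826", "United Kingdom"),
    Country("FR", "FRA", "250", "France"),
]


def find_country(code: str) -> Optional[Country]:
    """Look a country up by any of its three codes, ignoring case."""
    key = code.strip().upper()
    for country in COUNTRIES:
        if key in (country.alpha2, country.alpha3, country.numeric):
            return country
    return None


if __name__ == "__main__":
    print(find_country("deu"))
```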

Internet-Draft of HTML 3.0 [LANG attribute]

The current Internet-Draft of HTML 3.0 (29-Mar-95) provides a LANG attribute, whose value is composed of the two-letter language code from ISO 639, optionally followed by a period and a two-letter country code from ISO 3166, e.g. "en.uk" for the variety of English spoken in the United Kingdom.

<URL:http://www.hpl.hp.co.uk/people/dsr/html/CoverPage.html>
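
A value of this form can be split into its two components as follows. This is a sketch based only on the description given here (two letters, optionally a period and two more letters), not on the full text of the draft; the function name `parse_lang_attribute` is hypothetical, and the codes are not checked against ISO 639 or ISO 3166.

```python
from typing import Optional, Tuple


def _is_two_letters(part: str) -> bool:
    """True for exactly two ASCII letters."""
    return len(part) == 2 and part.isascii() and part.isalpha()


def parse_lang_attribute(value: str) -> Tuple[str, Optional[str]]:
    """Split a LANG value such as "en.uk" into (language, country).

    The country is None when the value holds a language code only.
    """
    language, period, country = value.strip().partition(".")
    if not _is_two_letters(language):
        raise ValueError(f"bad language code in {value!r}")
    if not period:
        return language.lower(), None
    if not _is_two_letters(country):
        raise ValueError(f"bad country code in {value!r}")
    return language.lower(), country.lower()


if __name__ == "__main__":
    print(parse_lang_attribute("en.uk"))
```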

Internet RFC 1766

Alvestrand, H. (1995) Tags for the Identification of Languages

"This document describes a language tag for use in cases where it is desired to indicate the language used in an information object."

<URL:ftp://ftp.inria.fr/rfc/rfc17xx/rfc1766.Z>
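
A tag of this kind can be checked for well-formedness with a short pattern. The sketch below assumes the tag grammar of RFC 1766, namely a primary tag and optional subtags of one to eight letters each, separated by hyphens. The function name is hypothetical, and no check is made that the parts are registered or drawn from ISO 639 or ISO 3166.

```python
import re
from typing import List

# One to eight letters, then any number of "-subtag" parts of the same shape.
_TAG_PATTERN = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def split_language_tag(tag: str) -> List[str]:
    """Return the lower-cased parts of a syntactically valid language tag.

    Tags are compared without regard to case, so the parts are
    normalised to lower case. Raises ValueError on malformed input.
    """
    if not _TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"not a well-formed language tag: {tag!r}")
    return tag.lower().split("-")


if __name__ == "__main__":
    print(split_language_tag("en-US"))
```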



Document encoding

ISO 8879:1986

Information Processing--Text and Office Systems--Standard Generalized Markup Language (SGML)

ISO/IEC DIS 13673:1993

Information Technology -- Text and Office Systems -- Conformance Testing for Standard Generalized Markup Language (SGML) Systems

TEI P3:1994

Sperberg-McQueen, C.M., Burnard, L. (Eds.) (1994) Guidelines for Electronic Text Encoding and Interchange, Text Encoding Initiative, Chicago and Oxford. Available online at

<URL:http://etext.virginia.edu/TEI.html>

EAGLES Corpus Encoding DRAFT

EAGLES DOCUMENT EAG--CSG/IR--T2.1. Version of October 1994
Corpus Encoding Standard - Draft proposal

<URL:http://www.ilc.pi.cnr.it/EAGLES/encoding/encoding.html>

ISO/IEC DIS 10744:1992

Hypermedia/Time-based Document Structuring Language (HyTime)

ISO 12083

Standardized SGML document type definitions for books, articles with tables, formulae, etc.

Internet: RFC 1866

Hypertext markup language (HTML)

ISO 8601:1988

Representation of dates and times.

"This standard defines a lot of details of the calendar. E.g. the ISO definition of the week numbers is that the first day (day number 1) of a week is Monday and that the first week in a year (week number 1) is the week that includes the first Thursday in January, i.e. the first week that has at least four days in January. Other definitions are, e.g., that hours of a day are counted from 0 to 24 and that the international notation of dates is the Bigendian format year-month-day, e.g. 1993-04-17 and that for time is e.g. 20:36:04 (hh:mm:ss). There are also string formats for computer applications specified that have to represent date and time in files and protocol packets. (See

<URL:ftp://ftp.uni-erlangen.de/pub/doc/ISO/ISO8601.ps.Z>

for a very detailed summary.)"
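
The date, time and week-number conventions quoted above can be tried out with Python's standard `datetime` module, whose `isoformat` and `isocalendar` methods follow the ISO 8601 rules. This is a minimal sketch using the date and time from the quotation; the helper name `iso_week` is illustrative.

```python
from datetime import date, time
from typing import Tuple

d = date(1993, 4, 17)
t = time(20, 36, 4)

iso_date = d.isoformat()   # "1993-04-17" (year-month-day)
iso_time = t.isoformat()   # "20:36:04" (hh:mm:ss)


def iso_week(day: date) -> Tuple[int, int, int]:
    """Return (ISO year, week number, weekday), with Monday as weekday 1."""
    year, week, weekday = day.isocalendar()
    return year, week, weekday


if __name__ == "__main__":
    print(iso_date, iso_time)
    print(iso_week(d))                  # (1993, 15, 6): a Saturday in week 15
    print(iso_week(date(1993, 1, 1)))   # (1992, 53, 5)
```

The second call shows the effect of the first-Thursday rule: 1 January 1993 was a Friday, so it still belongs to the last week of 1992.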

ISO 4217

Codes for the representation of currencies and funds

ITU-T/CCITT Recommendation E.123

Notation for international telephone numbers (a '+' followed by the country code, followed by a space, ...).



Software

ISO/IEC 9075:1992

Information technology--Database languages--SQL

ANSI X3.159-1989

American National Standards Institute, American National Standard for Information Systems -- Programming Language -- C.

ISO 9899:1990

The C programming language

ISO 9127:1988

User documentation and cover information for consumer software packages

X/Open Publications

Series of technical manuals addressing portability issues (see the "X/Open Publications Catalog").



IEEE 1003

POSIX

Series of standards:

IEEE 1003.0 Posix Guide
IEEE 1003.1 System Application Program Interface (also ISO 9945-1)
IEEE 1003.2 Shell and utilities
etc.

Copyright (c) Centre National de la Recherche Scientifique, 1995.