MULTEXT/EAGLES -
Document LSD 1. Annex 1. Version 1.0. Last modified 7 May 1996.
We list here the main existing or emerging standards that are relevant to the
work of the subgroup. This is a first selection, which needs further work.
Some of the comments are quoted from the "Standards FAQ"
posted monthly to the USENET groups comp.protocols.iso,
comp.std.misc and comp.std.internat and archived on many FAQ servers such as:
<URL:ftp://ftp.inria.fr/faq/comp.std.internat/Standards_FAQ>
- ISO 646:1991
-
ISO 7-bit coded character set for information interchange
- ISO 8859 (Parts 1-10)
-
8-bit single-byte coded graphic character sets
ISO 8859 consists of several 8-bit ASCII extensions:
ISO 8859-1:1987--Part 1: Latin Alphabet No. 1
ISO 8859-2:1987--Part 2: Latin Alphabet No. 2
ISO 8859-3:1988--Part 3: Latin Alphabet No. 3
ISO 8859-4:1988--Part 4: Latin Alphabet No. 4
ISO 8859-5:1988--Part 5: Latin/Cyrillic Alphabet
ISO 8859-6:1987--Part 6: Latin/Arabic Alphabet
ISO 8859-7:1987--Part 7: Latin/Greek Alphabet
ISO 8859-8:1988--Part 8: Latin/Hebrew Alphabet
ISO 8859-9:1989--Part 9: Latin Alphabet No. 5
ISO 8859-10:1992--Part 10: Latin Alphabet No. 6
ISO 8859-1, "Latin Alphabet No. 1", has become widely implemented and may
already be seen as the de facto standard ASCII replacement. ISO 8859-1 is also
the preferred encoding on the Internet.
The ISO 8859 series is the set of character sets currently recommended by the
EAGLES Text Representation subgroup, until ISO 10646/Unicode is finalized and
implemented.
See also "Text of the Final Draft of the Revised ISO/IEC 8859-1 with line numbers" at
<URL:ftp://dkuug.dk/i18n/iso8859-1.jvw>
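As a rough illustration of how the parts of ISO 8859 extend ASCII differently, the following Python sketch decodes the same octet under Part 1 and Part 2. The codec names `iso8859_1` and `iso8859_2` are those of Python's standard library, not names defined by the standard, and the sample octet is chosen here purely for illustration.

```python
# Sketch: one octet, two ISO 8859 parts. Codec names are Python's, not ISO's.
SAMPLE = b"\xb3"  # an octet from the upper (non-ASCII) half, chosen for illustration


def decode_with(part_codec: str, data: bytes) -> str:
    """Decode octets using the named ISO 8859 codec."""
    return data.decode(part_codec)


latin1 = decode_with("iso8859_1", SAMPLE)  # SUPERSCRIPT THREE in Part 1
latin2 = decode_with("iso8859_2", SAMPLE)  # LATIN SMALL LETTER L WITH STROKE in Part 2

# The lower half (0x00-0x7F) is plain ASCII in every part.
ascii_text = b"EAGLES"
same_in_both = decode_with("iso8859_1", ascii_text) == decode_with("iso8859_2", ascii_text)
```

The point of the sketch is that text in the upper half of the code table cannot be interpreted without knowing which part of ISO 8859 was used.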
- ISO/IEC 6429:1992
-
Information processing -- Control functions for 7-bit and 8-bit
coded character sets
Defines control functions for coded character sets, including the ASCII control
codes. Subsets of these are also known as VT100/VT320/ANSI escape sequences.
Some sequences allow for character set and language switching (see the document
"Language Coding Using ISO/IEC 6429").
"We propose that work is started to prepare a European standard for the coding
of the languages used in a text. This coding may be provided within the
framework of the control functions of ISO 6429, or within the Standard
Generalized Markup Language (SGML, ISO 8879), or both." (From
<URL:http://www.stonehand.com/unicode/standard/tc304.html>)
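As an illustration of the kind of control function this entry refers to, here is a minimal Python sketch that builds an SGR (Select Graphic Rendition) sequence, one of the control functions of ISO/IEC 6429. The helper name `sgr` is hypothetical, and whether a given terminal honours the sequence is an assumption. The sketch does not show language switching.

```python
ESC = "\x1b"      # ESCAPE control character
CSI = ESC + "["   # Control Sequence Introducer (7-bit form)


def sgr(*params: int) -> str:
    """Build an SGR control sequence: CSI, parameters separated by ';', final 'm'."""
    return CSI + ";".join(str(p) for p in params) + "m"


bold_on = sgr(1)   # parameter 1: bold / increased intensity
reset = sgr(0)     # parameter 0: default rendition
demo = bold_on + "ISO/IEC 6429" + reset
```

Printing `demo` on a terminal that implements these sequences should show the text in bold.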
- Unicode
-
"The Unicode Worldwide Character Standard is a character coding
system designed to support the interchange, processing, and display of the
written texts of the diverse languages of the modern world. In addition, it
supports classical and historical texts of many written languages.
In its current version, the Unicode standard contains 34,168 distinct coded
characters derived from 24 supported scripts. These characters cover the
principal written languages of the Americas, Europe, the Middle East, Africa,
India, Asia, and Pacifica.
Some modern written languages are not yet supported or only partially supported
due to a need for further research into the encoding needs of certain
scripts."
- ISO/IEC 10646:1992
-
Information technology -- Universal Multiple-Octet Coded
Character Set (UCS)
ISO/IEC 10646-1:1993 Part 1: Architecture and Basic Multilingual Plane
Unicode and ISO 10646 were merged in 1992, and contain ISO 8859 as a subset.
"At the moment they are technically aligned, but UNICODE has published some
more documentation on a "character set model" that is used in the design of
the system, which is not in the ISO standard.
At the moment it is a 16-bit character set, but will likely extend to 32 bits. ISO plans to allocate 32-bit characters in the next study period; it remains
to be seen if the UNICODE consortium will follow."
[Harald.T.Alvestrand@uninett.no, Mon Jan 23 13:16:41 1995, "The fight about Unicode in IETF"]
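The subset relationship can be checked directly for ISO 8859-1, whose 256 octets coincide with the first 256 code positions of the universal set. The Python sketch below is an illustration only: it covers Part 1 alone, it relies on Python's `chr`/`ord`, which work in Unicode code points, and the sample string is made up.

```python
# Sketch: ISO 8859-1 octets coincide with the first 256 UCS/Unicode code positions.
def latin1_matches_ucs() -> bool:
    """Check that every Latin-1 octet decodes to the character with the same number."""
    return all(bytes([octet]).decode("iso8859_1") == chr(octet) for octet in range(256))


def fits_in_16_bits(text: str) -> bool:
    """True if every character lies in the 16-bit Basic Multilingual Plane."""
    return all(ord(ch) <= 0xFFFF for ch in text)


subset_ok = latin1_matches_ucs()
bmp_ok = fits_in_16_bits("Grüße, Ελλάδα, Россия")
```

The second helper reflects the 16-bit view described in the quotation: Latin, Greek and Cyrillic text all fit in the Basic Multilingual Plane.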
- ISO 9:1986
-
Transliteration of Slavic Cyrillic characters into Latin
characters
- ISO 233:1984
-
Transliteration of Arabic characters into Latin characters
- ISO 233-2:1993
-
Transliteration of Arabic characters into Latin characters--Part
2: Simplified transliteration
- ISO 259:1984
-
Transliteration of Hebrew characters into Latin
characters
- ISO R 843:1968
-
Transliteration of Greek characters into Latin
characters
- ISO 3602:1989
-
Romanization of Japanese (kana script)
- ISO 7098:1991
-
Romanization of Chinese
- ISO 639:1988
-
Code for the representation of names of languages
Provides two-letter codes for about 140 languages and is intended primarily for
use in terminology, lexicography and linguistics.
The list is available online at
<URL:http://www.stonehand.com/unicode/standard/iso639.html>
- ISO 639-2:1995
-
Code for the representation of names of languages--Alpha-3
code
"Three-letter codes for the representation of names of languages for information
interchange", developed by a Joint Working Group of ISO TC37/SC2 and TC46/SC2.
Covers a wider range of the world's languages than ISO 639.
The list is available online at
<URL:http://www.stonehand.com/unicode/standard/cd639-2.html>
- Ethnologue's language list
-
"A comprehensive listing of the world languages along
with three letter unique language identifiers may be found in the Ethnologue,
Languages of the World, 12th Edition, Editor Barbara F. Grimes, 1992,
Summer Institute of Linguistics, Dallas, Texas. Approximately 6800 languages
are described in this text which includes linguistic maps of geographical
regions." The list of codes is available online at
<URL:http://www.stonehand.com/unicode/standard/ethn12.htm>
- ISO 3166:1993
-
Codes for the representation of names of countries
This standard defines a 2-letter, a 3-letter and a numeric code for each
country (e.g. US/USA/840=United States, DE/DEU/276=Germany,
GB/GBR/826=United Kingdom, FR/FRA/250=France, ...). The 2-letter codes are well
known on the Internet as top-level domain names. The 3-letter versions are
often used at international sports events.
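The three parallel codes can be pictured as one record per country. The following Python sketch holds only the four entries quoted above; the type `Country` and the function `lookup` are hypothetical names introduced for illustration, and the full list must be taken from the standard itself.

```python
from typing import NamedTuple, Optional


class Country(NamedTuple):
    alpha2: str
    alpha3: str
    numeric: int
    name: str


# Only the four entries quoted in the text; the full list is in the standard.
COUNTRIES = [
    Country("US", "USA", 840, "United States"),
    Country("DE", "DEU", 276, "Germany"),
    Country("GB", "GBR", 826, "United Kingdom"),
    Country("FR", "FRA", 250, "France"),
]


def lookup(code: str) -> Optional[Country]:
    """Find a country by its 2-letter, 3-letter, or numeric code (given as text)."""
    key = code.strip().upper()
    for c in COUNTRIES:
        if key in (c.alpha2, c.alpha3, f"{c.numeric:03d}"):
            return c
    return None
```

For example, `lookup("de")`, `lookup("DEU")` and `lookup("276")` all return the record for Germany, while an unknown code returns `None`.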
- Internet-Draft of HTML 3.0 [LANG attribute]
-
The current Internet-Draft of HTML 3.0 (29-Mar-95) provides a LANG attribute,
whose value is composed of the two-letter language code from ISO 639, optionally
followed by a period and a two-letter country code from ISO 3166, e.g. "en.uk"
for the variation of English spoken in the United Kingdom.
<URL:http://www.hpl.hp.co.uk/people/dsr/html/CoverPage.html>
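A LANG value of the form described in this entry can be split with a small pattern. The Python sketch below is an assumption-laden illustration: `parse_lang` is a hypothetical helper, it checks only the shape of the value (two letters, optionally a period and two more letters), and it does not verify that the codes actually appear in ISO 639 or ISO 3166.

```python
import re
from typing import Optional, Tuple

# Two letters, optionally a period and two more letters, as the draft describes.
LANG_PATTERN = re.compile(r"^([A-Za-z]{2})(?:\.([A-Za-z]{2}))?$")


def parse_lang(value: str) -> Tuple[str, Optional[str]]:
    """Split a LANG value into (language, country); country is None if absent."""
    match = LANG_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"not a LANG value of the form 'll' or 'll.cc': {value!r}")
    language, country = match.group(1).lower(), match.group(2)
    return language, country.lower() if country else None
```

With this sketch, `parse_lang("en.uk")` yields the pair `("en", "uk")` and `parse_lang("fr")` yields `("fr", None)`.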
- Internet RFC 1766
-
Alvestrand, H. (1995) Tags for the Identification of Languages
"This document describes a language tag for use in cases where it is
desired to indicate the language used in an information object."
<URL:ftp://ftp.inria.fr/rfc/rfc17xx/rfc1766.Z>
- ISO 8879:1986
-
Information Processing--Text and Office Systems--Standard
Generalized Markup Language (SGML)
- ISO/IEC DIS 13673:1993
-
Information Technology -- Text and Office Systems -- Conformance
Testing for Standard Generalized Markup Language (SGML)
Systems
- TEI P3:1994
-
Sperberg-McQueen, C.M., Burnard, L. (Eds.) (1994) Guidelines for
Electronic Text Encoding and Interchange, Text Encoding Initiative, Chicago and
Oxford. Available online at
<URL:http://etext.virginia.edu/TEI.html>
- EAGLES Corpus Encoding DRAFT
-
EAGLES DOCUMENT EAG--CSG/IR--T2.1, Version of October 1994
Corpus Encoding Standard - Draft proposal
<URL:http://www.ilc.pi.cnr.it/EAGLES/encoding/encoding.html>
- ISO/IEC DIS 10744:1992
-
Hypermedia/Time-based Document Structuring Language
(HyTime)
- ISO 12083
-
Standardized SGML document type definitions for books, articles
with tables, formulae, etc.
- Internet: RFC 1866
-
Hypertext Markup Language (HTML) 2.0
- ISO 8601:1988
-
Representation of dates and times.
"This standard defines a lot of details of the calendar. E.g. the ISO
definition of the week numbers is that the first day (day number 1) of a week
is Monday and that the first week in a year (week number 1) is the week that
includes the first Thursday in January, i.e. the first week that has at least
four days in January. Other definitions are, e.g., that hours of a day are
counted from 0 to 24 and that the international notation of dates is the
big-endian format year-month-day, e.g. 1993-04-17, and that for time is e.g.
20:36:04 (hh:mm:ss). There are also string formats specified for computer
applications that have to represent date and time in files and protocol packets.
(See
<URL:ftp://ftp.uni-erlangen.de/pub/doc/ISO/ISO8601.ps.Z>
for a very detailed summary.)"
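The week-numbering rule quoted above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: `iso_week` is a hypothetical helper that applies the "first Thursday" rule by hand, and the date and time formatting relies on the `isoformat` methods of Python's standard `datetime` module.

```python
from datetime import date, time, timedelta
from typing import Tuple


def iso_week(d: date) -> Tuple[int, int]:
    """Return (ISO year, ISO week number) using the 'first Thursday' rule."""
    # weekday(): Monday is 0, so Thursday is 3. A week belongs to the year
    # that contains its Thursday.
    thursday = d + timedelta(days=3 - d.weekday())
    week = (thursday.timetuple().tm_yday - 1) // 7 + 1
    return thursday.year, week


sample = date(1993, 4, 17)
year, week = iso_week(sample)
date_text = sample.isoformat()            # year-month-day
time_text = time(20, 36, 4).isoformat()   # hh:mm:ss
```

Note that a date early in January can belong to the last week of the previous year: 1 January 1993 was a Friday, so the sketch assigns it to week 53 of 1992.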
- ISO 4217
-
Codes for the representation of currencies and
funds
- ITU-T/CCITT Recommendation E.123
-
Notation for international telephone numbers (a '+' followed by the
country code, followed by a space, ...).
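A minimal Python sketch of this notation follows. It is an illustration only: `format_international` is a hypothetical helper, the sample number is made up, and how the digits after the country code are grouped is left to the caller, since the entry above does not spell that part out.

```python
from typing import Sequence


def format_international(country_code: str, groups: Sequence[str]) -> str:
    """Join a country code and digit groups as '+CC group group ...'."""
    parts = [country_code, *groups]
    if not all(p.isdigit() for p in parts):
        raise ValueError("country code and groups must contain digits only")
    return "+" + " ".join(parts)


# Made-up number, used only to show the shape of the result.
example = format_international("33", ["1", "23", "45", "67", "89"])
```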
- ISO/IEC 9075:1992
-
Information technology--Database languages--SQL
- ANSI X3.159-1989
-
American National Standards Institute, American National Standard for
Information Systems -- Programming Language -- C.
- ISO 9899:1990
-
The C programming language
- ISO 9127:1988
-
User documentation and cover information for consumer software
packages
- X/Open Publications
-
Series of technical manuals addressing portability
issues (see the "X/Open Publications Catalog"), including:
- Guides
X/Open Guides provide information that will be useful in the evaluation, procurement, development and / or management of open systems, covering best practice based on the experience of X/Open members, associate members and other experts from both the developer and user communities.
- X/Open Specifications (including CAE and Developers' Specifications)
These are the long life specifications which form the basis for conformant and branded X/Open-compliant systems.
- X/Open Portability Guide (XPG3)
This is the formal set of specifications first published in 1989 and used as the basis for X/Open's test and verification programme. The seven volumes plus the Overview have been re-printed and published by X/Open, and are presented in line with our future publications architecture.
- XPG4
XPG4 is the latest set of specifications, verification and branding programme. The parent volume "X/Open Systems and Branded Products: XPG4" describes the whole edifice and cross-refers to over 15 specifications.
- IEEE 1003
-
POSIX
Series of standards:
IEEE 1003.0 Posix Guide
IEEE 1003.1 System Application Program Interface (also ISO 9945-1)
IEEE 1003.2 Shell and utilities
etc.
Copyright (c) Centre National de la Recherche Scientifique, 1995.