The overall objective of HLT is to support e-business in a global context and to
promote a human centred infostructure ensuring
equal access and usage opportunities for all. This is to be achieved by
developing multilingual technologies and demonstrating exemplary applications
providing features and functions that are critical for the realisation of a
truly user friendly Information Society. Projects address generic and applied
RTD from a multi- and cross-lingual perspective, and undertake to demonstrate
how language-specific solutions can be transferred to and adapted for other
languages. While the elements of the three initial HLT action lines - Multilinguality,
Natural Interactivity and Crosslingual Information
Management - are still present, there has been periodic re-assessment and tuning
of them to emerging trends and changes in the surrounding economic, social,
and technological environment. The trials and best practice in multilingual e-service
and e-commerce action line was introduced in the IST 2000 work programme
(IST2000) to stimulate new forms of partnership between technology providers,
system integrators and users through trials and best practice actions
addressing end-to-end multi-language platforms and solutions for e-service and
e-commerce. The fifth IST call for proposals covered this action line.
Language Technologies and the information society (Presentation of Action Line,
by the EC: cached)
A natural language is one that evolved along with a culture of human native
speakers who use the language for general-purpose communication. Languages
like English are natural languages, while languages like Esperanto are called
constructed languages, having been deliberately created for a specific purpose.
Some people view NLG as the opposite of natural language understanding.
The difference can be put this way: whereas in natural language understanding
the system needs to disambiguate the input sentence to produce the machine
representation language, in NLG the system needs to take decisions about how
to put a concept into words.
Wikipedia, the free encyclopaedia.
Computational linguistics (CL) is
a discipline between linguistics and computer science which is concerned with
the computational aspects of the human language faculty. It belongs to the
cognitive sciences and overlaps with the field of artificial
intelligence (AI), a branch of computer science aiming
at computational models of human cognition. Computational linguistics has
applied and theoretical components.
Theoretical CL takes
up issues in theoretical linguistics and cognitive science.
It deals with formal theories about the linguistic knowledge that a human
needs for generating and understanding language. Today these theories have
reached a degree of complexity that can only be managed by employing computers.
Computational linguists develop formal models simulating aspects of the human
language faculty and implement them as computer programmes. These programmes
constitute the basis for the evaluation and further development of the
theories. In addition to linguistic theories, findings from cognitive
psychology play a major role in simulating linguistic competence.
Within psychology, it is mainly the area of psycholinguistics that
examines the cognitive processes constituting human language use. The
relevance of computational modelling for psycholinguistic research is
reflected in the emergence of a new subdiscipline: computational psycholinguistics.
Applied CL focusses
on the practical outcome of modelling human language use. The methods,
techniques, tools and applications in this area are often subsumed under the
term language engineering or (human) language technology.
Although existing CL systems are far from achieving human ability, they have
numerous possible applications. The goal is to create software products that
have some knowledge of human language. Such products are going to change our
lives. They are urgently needed for improving human-machine interaction since
the main obstacle in the interaction between human
and computer is a communication problem. Today's computers do not understand
our language, while computer languages are difficult to learn and do not
correspond to the structure of human thought. Even if the language the machine
understands and its domain of discourse are very restricted, the use of human
language can increase the acceptance of software and the productivity of its
users. (What is Computational Linguistics?, by Hans Uszkoreit: cached)
The development and convergence of computer and telecommunication technologies has
led to a revolution in the way that we work, communicate with each other, buy
goods and use services, and even the way we entertain and educate ourselves.
One of the results of this revolution is that large volumes of information will
increasingly be held in a form which is more natural for human users than the
strictly formatted, structured data typical of computer systems of the past.
Information presented in visual images, as sound, and in natural language,
either as text or speech, will become the norm.
We all deal with computer systems and services, either directly or indirectly,
every day of our lives. This is the information age and we are a society in
which information is vital to economic, social, and political success as well
as to our quality of life.
The changes of the last two decades may have seemed revolutionary but, in reality,
we are only on the threshold of this new age. There are still many new ways in
which the application of telematics and the use of
language technology will benefit our way of life, from interactive
entertainment to lifelong learning.
While these changes will bring great benefits, it is important that we anticipate
difficulties which may arise, and develop ways to overcome them. Language
Engineering can help solve such problems.
Multimedia content and services, interpersonal communication, cross-border
trade and product documentation are all inherently bound to language
and culture. Advances in computerised analysis, understanding and
generation of written and spoken language are going to revolutionise human-computer
interaction and technology mediated person-to-person communication.
The Human Language Technologies action line aims to further strengthen this
area. The focus will be on three major challenges presented by key drivers of the
Information Society - specifically, the globalisation of economy and society,
high-bandwidth digital communication and the World Wide Web - for which human
language technologies play a central role:
- multilingual access to information and communication systems, at all stages
of the information cycle, including content generation and maintenance in
multiple languages, content and software localisation, automated translation
and interpretation, and computer assisted language training;
- natural interactivity and accessibility of digital services through
multimodal dialogues, understanding of messages and communicative acts,
unconstrained language input-output and keyboard-less operation;
- active digital content for an optimal use and acquisition by all, through
personalised language assistants supporting deep information analysis,
knowledge extraction and summarisation, meaning classification and metadata
generation.
The importance for Europe, particularly in the information age, of capitalising on
the wealth represented by its linguistic and cultural diversity, while
overcoming the inherent inefficiencies associated with it, has repeatedly been
stated at various institutional and extra-institutional levels. In particular
the relevance of linguistic and cultural aspects of the Information Society
was highlighted at the G7 conference on The Information Society and
Development, which emphasised the fact that information technologies have a
tremendous potential to preserve and exploit cultural and linguistic
diversity. The Information Society Forum has made similar points.
HLTCentral.org, the central resource of European HLT developments, is seeking
sponsors for the continued operation of the web site in 2004 and beyond. A
variety of sponsorship, advertising and content options are available; this is
the current situation of HLTCentral.org.
What are the main techniques used in Language Engineering?
Language Engineering comprises a set of techniques and language resources. The former
are implemented in computer software and the latter are a repository of
knowledge which can be accessed by computer software.
The human voice is as unique to an individual as a fingerprint. This makes it
possible to identify a speaker and to use this identification as the basis for
verifying that the individual is entitled to access a service or a resource.
The types of problems which have to be overcome are, for example, recognising
that the speech is not recorded, selecting the voice through noise (either in
the environment or the transfer medium), and identifying reliably despite
temporary changes (such as those caused by illness).
The sound of speech is received by a computer in analogue wave forms which are
analysed to identify the units of sound (called phonemes) which make up words.
Statistical models of phonemes and words are used to recognise discrete or
continuous speech input. The production of quality statistical models requires
extensive training samples (corpora) and vast quantities of speech have been
collected, and continue to be collected, for this purpose.
There are a number of significant problems to be overcome if speech is to become a
commonly used medium for dealing with a computer. The first of these is the
ability to recognise continuous speech rather than speech which is
deliberately delivered by the speaker as a series of discrete words separated
by a pause. The next is to recognise any speaker, avoiding the need to train
the system to recognise the speech of a particular individual. There is also
the serious problem of the noise which can interfere with recognition, either
from the environment in which the speaker uses the system or through noise
introduced by the transmission medium, the telephone line, for example. Noise
reduction, signal enhancement and key word spotting can be used to allow
accurate and robust recognition in noisy environments or over
telecommunication networks. Finally, there is the problem of dealing with
accents and dialects, and with language spoken, as it often is, ungrammatically.
Recognition of written or printed language requires that a symbolic representation of the
language is derived from its spatial form of graphical marks. For most
languages this means recognising and transforming characters. There are two
cases of character recognition: optical character recognition (OCR) of printed
text, and intelligent character recognition (ICR) of handwriting. Recognition
of characters from a single printed font family can achieve a very high degree
of accuracy.
Problems arise when the font is unknown or very decorative, or when the
quality of the print is poor. In these difficult cases, and in the case of
handwriting, good results can only be achieved by using ICR. This involves
word recognition techniques which use language models, such as lexicons or
statistical information about word sequences.
Document image analysis is closely associated with character recognition but involves
the analysis of the document to determine firstly its make-up in terms of
graphics, photographs, separating lines and text, and then the structure of
the text to identify headings, sub-headings, captions etc. in order to be able
to process the text effectively.
The understanding of language is obviously fundamental to many applications.
However, perfect understanding is not always a requirement. In fact, gaining a
partial understanding is often a very useful preliminary step in the process
because it makes it possible to be intelligently selective about taking the
depth of understanding to further levels.
Shallow or partial analysis of texts is used to obtain a robust initial classification
of unrestricted texts efficiently. This initial analysis can then be used, for
example, to focus on 'interesting' parts of a text for a deeper semantic
analysis which determines the content of the text within a limited domain. It
can also be used, in conjunction with statistical and linguistic knowledge, to
identify linguistic features of unknown words automatically, which can then be
added to the system's knowledge.
Semantic models are used to represent the meaning of language in terms of concepts and
relationships between them. A semantic model can be used, for example, to map
an information request to an underlying meaning which is independent of the
actual terminology or language in which the query was expressed. This supports
multi-lingual access to information without a need to be familiar with the
actual terminology or structuring used to index the information.
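The idea of mapping surface queries onto a language-independent meaning can be sketched very simply. In this toy illustration every concept name and term mapping is invented; a real semantic model would be far richer:

```python
# Toy illustration: surface queries in different languages map to the
# same language-independent concept. All names below are invented.
TERM_TO_CONCEPT = {
    "train timetable": "TIMETABLE_QUERY",
    "horaire des trains": "TIMETABLE_QUERY",   # French
    "ticket price": "FARE_QUERY",
    "prix du billet": "FARE_QUERY",            # French
}

def interpret(query: str) -> str:
    """Map a surface query to its underlying concept, if known."""
    return TERM_TO_CONCEPT.get(query.lower().strip(), "UNKNOWN")

print(interpret("Horaire des trains"))  # TIMETABLE_QUERY
print(interpret("ticket price"))        # FARE_QUERY
```

Because retrieval then operates on concepts rather than surface terms, the same index serves users of both languages.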
The combination of analysis and generation with a semantic model allows texts
to be translated. At the current stage of development, applications where this
can be achieved need to be limited in vocabulary and concepts so that adequate Language
Engineering resources can be applied. Templates for document structure, as
well as common phrases with variable parts, can be used to aid generation of a
high quality text.
The semantic representation of a text can be used as the basis for generating
language. An interpretation of basic data or the
underlying meaning of a sentence or phrase can be mapped into a surface string
in a selected fashion; either in a chosen language or according to stylistic
specifications by a text planning system.
Speech is generated from filled templates, by playing 'canned' recordings or
by concatenating units of speech (phonemes, words) together. Generated speech has
to account for aspects such as intensity, duration and stress in order to
produce a continuous and natural response.
Spoken dialogue can be established by combining speech recognition with simple generation,
either from concatenation of stored human speech components or synthesising
speech using rules.
A library of speech recognisers and generators, together with a graphical tool
for structuring their application, allows someone who is neither a speech
expert nor a computer programmer to design a structured dialogue which can be
used, for example, in automated handling of telephone calls.
Language resources are essential components of Language Engineering. They are one of
the main ways of representing the knowledge of language, which is used for the
analytical work leading to recognition and understanding.
The work of producing and maintaining language resources is a huge task. Resources
are produced, according to standard formats and protocols to enable access, in
many EU languages, by research laboratories and public institutions. Many of
these resources are being made available through the European Language
Resources Association (ELRA).
A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.
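A lexicon entry can be pictured as a structured record. The fields and entries below are invented for illustration; real lexicons hold hundreds of thousands of far richer entries:

```python
# A sketch of lexicon entries as simple attribute structures
# (illustrative only; field names are not from any particular system).
lexicon = {
    "cats": {
        "lemma": "cat",
        "pos": "noun",
        "morphology": {"number": "plural"},
        "senses": ["feline animal"],
    },
    "run": {
        "lemma": "run",
        "pos": "verb",
        "morphology": {"form": "base"},
        "senses": ["move quickly on foot", "operate (a machine)"],
    },
}

def lookup(word: str, feature: str):
    """Return one feature of a word's entry, or None if unknown."""
    entry = lexicon.get(word)
    return entry.get(feature) if entry else None

print(lookup("cats", "lemma"))  # cat
print(lookup("run", "pos"))     # verb
```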
There are a number of special cases which are usually researched and produced
separately from general purpose lexicons:
Dictionaries of proper names are essential to effective understanding of
language, at least so that they can be recognised within their context as
places, objects, or persons, or maybe animals. They take on a special
significance in many applications, however, where the name is key
to the application such as in a voice operated navigation system, a holiday
reservations system, or railway timetable information system, based on
automated telephone call handling.
In today's complex technological environment there are a host of terminologies
which need to be recorded, structured and made available for language enhanced
applications. Many of the most cost-effective applications of Language
Engineering, such as multi-lingual technical document management and machine
translation, depend on the availability of the appropriate terminology banks.
A wordnet describes the relationships between
words; for example, synonyms, antonyms, collective nouns, and so on. These can
be invaluable in such applications as information retrieval, translator
workbenches and intelligent office automation facilities for authoring.
A grammar describes the structure of a language at different levels: word (morphological
grammar), phrase, sentence, etc. A grammar can deal with structure both in
terms of surface (syntax) and meaning (semantics and discourse).
A corpus is a body of language, either text or speech, which provides the
basis for analysis and for training language models. There are national
corpora of hundreds of millions of words but there are also
corpora which are constructed for particular purposes. For example, a corpus
could comprise recordings of car drivers speaking to a simulation of a control
system, which recognises spoken commands, which is then used to help establish
the user requirements for a voice operated control system for the market.
The diagram below depicts the chain of activities which are involved in Language Engineering, from research to the delivery of language-enabled and language enhanced products and services to end-users. The process of research and development leads to the development of techniques, the production of resources, and the development of standards. These are the basic building blocks.
The basic processes of Language Engineering are shown in the diagram below. These
are broadly concerned with:
[Diagram: a Language Enabled System]
Authoring tools: facilities provided in conjunction with word processing to aid the
author of documents, typically including an on-line dictionary and thesaurus,
spell-, grammar-, and style-checking, and facilities for structuring,
integrating and linking documents.
A stemmer for English, for example, should identify the string
"cats" (and possibly "catlike", "catty" etc.) as
based on the root "cat", and "stemmer", "stemming",
"stemmed" as based on "stem".
English stemmers are fairly trivial (with only occasional problems, such as "dries"
being the third-person singular present form of the verb "dry",
"axes" being the plural of "ax"
as well as "axis"); but stemmers become harder to design as the
morphology, orthography, and character encoding of the target language become
more complex. For example, an Italian stemmer is more complex than an English
one (because of more possible verb inflections), a Russian one is more complex
(more possible noun declensions), a Hebrew one is even more complex (a hairy
writing system), and so on.
Stemmers are common elements in query systems, since a user who runs a query
on "daffodils" probably cares about documents that contain the word
"daffodil" (without the s).
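A minimal suffix-stripping stemmer along these lines might look as follows. The rule list is a toy, far cruder than, say, the Porter stemmer, which uses ordered rules with conditions on the remaining stem:

```python
# Ordered suffix-stripping rules: first match wins.
RULES = [
    ("ies", "y"),    # dries -> dry
    ("sses", "ss"),  # classes -> class
    ("ing", ""),     # stemming -> stemm (naive; Porter also tidies doubles)
    ("ed", ""),      # stemmed -> stemm
    ("s", ""),       # cats -> cat
]

def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix, replacement in RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word

print(stem("cats"))   # cat
print(stem("dries"))  # dry
```

Note that such a stemmer still mishandles cases like "axes", which would need the lexical knowledge discussed above.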
Domain: usually applied to the area of application of the language enabled
software, e.g. banking, insurance, travel, etc.; the significance in Language
Engineering is that the vocabulary of an application is restricted, so the
language resource requirements are effectively limited by limiting the domain.
Translator's workbench: a software system providing a working environment for a human
translator, which offers a range of aids such as on-line dictionaries,
thesauri, translation memories, etc.
Shallow parser: software which parses language to a point where a rudimentary
level of understanding can be realised; this is often used in order to
identify passages of text which can then be analysed in further depth to
fulfil a particular objective.
Comments about the state-of-the-art need to be made in the context of specific
applications which reflect the constraints on the task.
Moreover, different technologies are sometimes appropriate for different tasks.
For example, when the vocabulary is small, the entire word can be modeled
as a single unit. Such an approach is not practical for large vocabularies,
where word models must be built up from subword units, such as phonemes.
Performance is typically quoted as a word error rate, defined as

  WER = 100% x (S + I + D) / N

where N is the total number of words in the test set, and S, I, and D are the
total number of substitutions, insertions, and deletions, respectively.
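A word error rate of this form can be computed with a standard edit-distance dynamic program; this is a minimal sketch, not any particular evaluation tool:

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + I + D) / N, via edit-distance dynamic programming."""
    n, m = len(reference), len(hypothesis)
    # d[i][j]: minimum edits turning reference[:i] into hypothesis[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                      # i deletions
    for j in range(m + 1):
        d[0][j] = j                      # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitution or match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[n][m] / n

# One inserted word against three reference words: WER = 1/3.
print(word_error_rate("the cat sat".split(), "the cat sat on".split()))
```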
The past decade has witnessed significant progress in speech recognition
technology. Word error rates continue to drop by a factor of 2 every two years.
Substantial progress has been made in the basic technology, leading to the
lowering of barriers to speaker independence, continuous speech, and large
vocabularies. There are several factors that have contributed to this rapid
progress. First, there is the coming of age of the HMM. HMM is powerful in
that, with the availability of training data, the parameters of the model can
be trained automatically to give optimal performance.
Second, much effort has gone into the development of large speech corpora for system
development, training, and testing. Some of these corpora are designed for
acoustic phonetic research, while others are highly task specific. Nowadays,
it is not uncommon to have tens of thousands of sentences available for system
training and testing. These corpora permit researchers to quantify the
acoustic cues important for phonetic contrasts and to determine parameters of
the recognizers in a statistically meaningful way. While many of these corpora
(e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected
under the sponsorship of the U.S. Defense Advanced
Research Projects Agency (DARPA) to spur human language technology development
among its contractors, they have nevertheless gained world-wide acceptance
(e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which
to evaluate speech recognition.
Third, progress has been brought about by the establishment of standards for
performance evaluation. Only a decade ago, researchers trained and tested
their systems using locally collected data, and had not been very careful in
delineating training and testing sets. As a result, it was very difficult to
compare performance across systems, and a system's performance typically
degraded when it was presented with previously unseen data. The recent
availability of a large body of data in the public domain, coupled with the
specification of evaluation standards, has resulted in uniform documentation
of test results, thus contributing to greater reliability in monitoring
progress (corpus development activities and evaluation methodologies are
summarized in chapters 12 and 13 respectively).
Finally, advances in computer technology have also indirectly influenced our progress.
The availability of fast computers with inexpensive mass storage capabilities
has enabled researchers to run many large scale experiments in a short amount
of time. This means that the elapsed time between an idea and its
implementation and evaluation is greatly reduced. In fact, speech recognition
systems with reasonable performance can now run in real time using high-end
workstations without additional hardware---a feat unimaginable only a few
years ago.
One of the most popular, and potentially most useful, tasks with low
perplexity (PP=11) is the recognition of digits. For American English,
speaker-independent
recognition of digit strings spoken continuously and restricted to telephone
bandwidth can achieve an error rate of 0.3% when the string length is known.
One of the best known moderate-perplexity tasks is the 1,000-word so-called
Resource Management (RM) task, in which inquiries can be made concerning
various naval vessels in the Pacific Ocean.
High-perplexity tasks with a vocabulary of thousands of words are intended
primarily for the dictation application. After working on isolated-word,
speaker-dependent systems for many years, the community has since 1992 moved
towards very-large-vocabulary (20,000 words and more), high-perplexity,
speaker-independent, continuous speech recognition.
With the steady improvements in speech recognition performance, systems are now
being deployed within telephone and cellular networks in
many countries. Within the next few years, speech recognition will be
pervasive in telephone networks around the world. There are tremendous forces
driving the development of the technology; in many countries, touch tone
penetration is low, and voice is the only option for controlling automated
services. In voice dialing, for
example, users can dial 10--20 telephone numbers by voice (e.g., call home)
after having enrolled their voices by saying the words associated with
telephone numbers. AT&T, on the other hand, has installed a call routing
system using speaker-independent word-spotting
technology that can detect a few key phrases (e.g., person to person,
calling card) in sentences such as: I want to charge it to my calling card.
At present, several very large vocabulary dictation systems are
available for document generation. These systems generally require speakers to
pause between words. Their performance can be further enhanced if one can
apply constraints of the specific domain such as dictating medical reports.
Even though much progress is being made, machines are a long way from recognizing
conversational speech. Word recognition rates on telephone conversations in
the Switchboard corpus are around 50% [CGF94].
It will be many years before unlimited vocabulary,
speaker-independent continuous dictation capability is realized.
What are the main differences between speech recognition and speech synthesis?
Speech recognition is the process of converting an acoustic signal,
captured by a microphone or a telephone, to a set of words. The recognized
words can be the final results, as for applications such as command and
control, data entry, and document
preparation. They can also serve as the input to further
linguistic processing in order to achieve speech understanding, a subject
covered in section
Speech recognition systems can be characterized by many parameters, some of the more
important of which are shown in Figure
The simplest language model can be specified as a finite-state network,
where the permissible words following each word are given explicitly. More
general language models approximating natural language are specified in terms
of a context-sensitive grammar.
A popular measure of the difficulty of the task, combining the vocabulary size
and the language model, is perplexity,
loosely defined as the geometric mean of the number of words that can follow a
word after the language model has been applied (see section
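The "geometric mean of the number of words that can follow a word" can be computed directly for a small finite-state model. The successor lists here are invented purely for illustration:

```python
import math

# Invented successor lists for a toy finite-state language model:
# each word lists the words permitted to follow it.
SUCCESSORS = {
    "<s>": ["show", "list"],
    "show": ["me", "flights"],
    "me": ["flights"],
    "flights": ["</s>"],
}

def branching_perplexity(model):
    """Geometric mean of the number of words that can follow each word."""
    counts = [len(following) for following in model.values()]
    return math.exp(sum(math.log(c) for c in counts) / len(counts))

# Geometric mean of (2, 2, 1, 1) is sqrt(2), about 1.414.
print(round(branching_perplexity(SUCCESSORS), 3))
```

For probabilistic language models, perplexity is defined from the per-word log-probability rather than from raw successor counts, but the intuition of an average branching factor is the same.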
Speech recognition is a difficult problem, largely because of the many sources of
variability associated with the signal. First, the acoustic realizations of
phonemes, the smallest sound units of which words are
composed, are highly dependent on the context in which they appear. These phonetic
variabilities are exemplified by the acoustic
differences of the same phoneme realised in different contexts. Second,
acoustic variabilities can result from changes in the environment as well as
in the position and
characteristics of the transducer. Third, within-speaker variabilities
can result from changes in the speaker's physical and emotional state,
speaking rate, or voice quality. Finally, differences in sociolinguistic
background, dialect, and vocal tract size and shape can contribute to
across-speaker variabilities.
Speech recognition systems attempt to model the sources of variability described
above in several ways. At the level of signal representation, researchers have
developed representations that emphasize perceptually important
speaker-independent features of the signal, and de-emphasize speaker-dependent
characteristics.
At the acoustic phonetic level, speaker variability is typically modeled
using statistical techniques applied to large amounts of data. Speaker
adaptation algorithms have also been developed that adapt
speaker-independent acoustic models to those of the current
speaker during system use, (see section
Word-level variability can be handled by allowing alternate pronunciations
of words in representations known as pronunciation networks.
Common alternate pronunciations of words, as well as effects of dialect and
accent are handled by allowing search algorithms to find alternate paths of
phonemes through these networks. Statistical language models, based on
estimates of the frequency of occurrence of word sequences, are often used to
guide the search through the most probable sequence of words.
The dominant recognition paradigm of the past fifteen years is known as hidden
Markov models (HMM). An HMM is a doubly stochastic model,
in which the generation of the underlying phoneme string and the
frame-by-frame, surface acoustic realizations are both represented
probabilistically as Markov processes, as discussed in sections
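The search through an HMM can be illustrated with a minimal Viterbi decoder. The states, probabilities, and observation symbols below are invented for the sketch; a real recogniser works on acoustic feature vectors, with states standing for subword units:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s]: probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (best[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            best[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["S1", "S2"]
start_p = {"S1": 0.8, "S2": 0.2}
trans_p = {"S1": {"S1": 0.6, "S2": 0.4}, "S2": {"S1": 0.3, "S2": 0.7}}
emit_p = {"S1": {"a": 0.7, "b": 0.3}, "S2": {"a": 0.2, "b": 0.8}}
print(viterbi(["a", "b", "b"], states, start_p, trans_p, emit_p))  # ['S1', 'S2', 'S2']
```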
An interesting feature of frame-based HMM systems
is that speech segments are identified during the search process,
rather than explicitly. An alternate approach is to first identify speech
segments, then classify the segments and use the segment scores to recognize
words. This approach has produced competitive recognition performance in
several tasks [ZGPS90,FBC95].
Speech generation is the process which allows the transformation of
a string of phonetic and prosodic symbols into a synthetic speech signal.
The quality of the result is a function of the quality of the string, as well
as of the quality of the generation process itself. For a review of speech
generation in English the reader is referred to [FR73]
Recent developments can be found in [BB92],
and in [VSSOH95].
Let us examine first what is requested today from a text-to-speech (TtS)
system. Usually two quality criteria are proposed. The first one is
intelligibility, which can be measured by taking into
account several kinds of units (phonemes, syllables,
words, phrases). The second one, more
difficult to define, is often labeled as pleasantness
or naturalness. Actually the concept of naturalness
may be related to the concept of realism in the field of
image synthesis: the goal is not to reproduce reality but to suggest it. Thus,
listening to a synthetic voice must allow
the listener to attribute this voice to some pseudo-speaker and to
perceive some kind of expressivity as well as some indices characterizing the
speaking style and the particular situation of elocution. For this purpose the
corresponding extra-linguistic information must be supplied to the system [GN92].
Most of the present TtS systems produce an acceptable
level of intelligibility, but the naturalness dimension, the ability to
control expressivity, speech style and pseudo-speaker identity still are
poorly mastered. Let us mention however that users' demands
vary to a large extent according to the field of application: general public
applications such as telephonic information retrieval need
maximal realism and naturalness, whereas
some applications involving professionals (process or vehicle control) or
highly motivated persons (visually impaired, applications in hostile
environments) demand intelligibility with the highest priority.
Speech-to-speech machine translation. List and describe at least three projects.
At present there are only a few speech-to-speech machine translation projects.
Because oral language is the most spontaneous and natural form of
communication among
people, speech technology is perceived as a determining factor in achieving
better interaction with computers. The industry is aware of this fact and
realises that the incorporation of speech technology will be the ultimate step
in bringing computers closer to the general public.
To the extent that personal computers are being equipped with more and more telematic
applications, coupled with the impending arrival of third generation mobile
phones, reliable speech recognition is becoming a must. There have been
important advances in recent years, although some limitations still persist
e.g. of vocabulary, of domain coverage, in the treatment of disfluencies
(the variation in the fluency of speech), etc. But despite these problems, the
technology today is ready to offer a wide range of services.
One of the most attractive applications is without a doubt speech-to-speech
machine translation. There are a small number of initiatives that have
contributed significantly to the development of this technology. Verbmobil,
a project sponsored by the German government, and the European EuTrans
project are two worth mentioning.
In the following interview, we have two representatives of one of the Spanish
research groups that has gained recognition in
recent years thanks to its research on speech-to-speech translation. The group
in question is the Pattern
Recognition and Human Language Technology (PRHLT)
Unit of the Universitat Politècnica
de València (UPV), co-directed by Francisco Casacuberta
Nolla and Enrique Vidal Ruiz.
The PRHLT group carries out research both in speech technologies and in computer
vision. The EuTrans project—Example-based
language translation systems—is one of the many projects currently
undertaken by the group. Other research projects include "EXTRA:
Example-based extensions to text and speech translation in restricted
domains" and "Translation and comprehension of the language spoken
through example-based learning techniques: TRACOM", both funded by the
Spanish Foundation of Science and Technology (CICYT). The group is also
currently participating in a new European project: "TransType2 (TT2)".
This report provides a general idea of what Human Language Technologies are
and how important they are in society. I have learnt that the technology is
developing rapidly and that everybody must learn to use it, for two important
reasons: firstly because it is very helpful, and secondly because nowadays it
is essential to know how to use it. People must therefore get used to using
these technologies while the technologies themselves are being adapted to
people's needs.