By: Nuria Sagaley


This report is a summary of the ideas presented in the university course English Language and New Technologies. During this course we have been thinking about the growing influence of new technologies on our daily lives, in a period that can also be called the "Information Age".

The aim of this report is to analyse briefly the questionnaires we have been answering from the beginning of the term until now. The structure I am going to follow is quite simple and is based on the order of the questionnaires given in class. The information I am going to use is taken not only from the web pages given by the teacher, Joseba Abaitua, but also from my own searches on the Internet.

We must realize that nowadays we are living in a period that can be called the "Information Age". We are completely surrounded by computers and we are being forced to learn how to use them.


A lot of people have found it difficult to understand how to use the technological tools that are being developed nowadays, the so-called New Technologies. These technologies give us a lot of advantages in our daily lives: we can get information about (almost) everything and we can also communicate immediately with people all over the world. Despite these advantages, we must also realize that we have a big problem: multilinguality, which means that we may get information in a language we don't understand. This problem is being addressed nowadays through Machine Translation, which allows us, as the name itself explains, to translate the information we get from the web into our native language.

My objective with this report is to learn the proper way of using the tools that New Technologies offer us and to apply them to our lives. I think that nowadays it is extremely important to learn how to use the New Technologies and to gain some fluency with them. What I will try to explain in the pages that follow is the importance of these technologies for our current and future lives as linguists. These tools will help us to translate, understand and share information with a lot of people all over the world. We should not forget the human factor in this kind of translation work: the computer only helps us to translate some lines, and every country has special expressions or words that the machine may not know.

The report is divided into four parts, in which I will define and briefly explain some important features of the field of New Technologies and the Information Society, as our times are called. The parts follow the weekly questionnaires our teacher asked us to complete during this course. I'm not going to explain every single question in depth, but I will try to give a general idea of this important element in our present and future lives.


FIRST WEEK (17-20 February)

Human language technologies

Human Language Technologies will help to build bridges across languages and cultures and provide natural access to information and communication services. They will enable an active use and assimilation of multimedia content, and further strengthen Europe's position at the forefront of language-enabled digital services. They will support business activities in a global context and promote a truly human-centred infostructure ensuring equal access and usage opportunities for all. The ultimate goal of Human Language Technologies is an optimal use of the human capital, maximising businesses' competitiveness and empowering people.

Natural language processing:

A natural language is one that evolved along with a culture of human native speakers who use the language for general-purpose communication. Languages like English, American Sign Language and Japanese are natural languages, while languages like Esperanto are called constructed languages, having been deliberately created for a specific purpose.

Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form.

Some people view NLG as the opposite of natural language understanding. The difference can be put this way: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to take decisions about how to put a concept into words.

From Wikipedia, the free encyclopaedia.


What’s computational linguistics?

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.



Language engineering:

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.


A deeper explanation of this terminology

We begin with the general terms language and engineering, then move on to CL, NLP and LE.

The Collins English Dictionary defines language as ``a system for the expression of thoughts, feelings, etc., by the use of spoken sounds or conventional symbols'' [ Makins1991]. Language is a communication mechanism whose medium is text or speech, and LE is concerned with computer processing of text and speech. We will define engineering in contrast to science (and, later, LE in contrast to CL and NLP).

Herb Simon contends that ``...far from striving to separate science from engineering, we need not distinguish them at all. But if we insist upon a distinction, we can think of engineering as science for people who are impatient''. This point is undermined a little by a criterion which immediately follows and by which the two fields may be separated: ``While the scientist is interested specifically in creating new knowledge, the engineer is interested also in creating systems that achieve desired goals'' [ Simon1995]. We'll preserve the distinction here and use a definition of science as ``the systematic study of the nature and behaviour of the material and physical universe, based on observation, experiment, and measurement, and the formulation of laws to describe these facts in general terms'' [ Makins1991].

Engineering connotes ``creating cost-effective solutions to practical problems by applying scientific knowledge'' [ Shaw and Garlan1996], or ``applying scientific principles to the design, construction and maintenance of engines, cars, machines etc.'' [ Makins1991], or a ``rigorous set of development methods'' [ Brown1989]. The ``...basic difference between science and engineering is that science is concerned with finding out how the world works, while engineering is concerned with using that knowledge to build artifacts in such a way that one can expect them to perform as required'' [ Cowling1996].

Buried in the phrases ``achieve desired goals'' and ``perform as required'' is the implication that engineered systems must conform to externally-specified criteria of adequacy, for example criteria relating to the space/time profile of an executable processing a certain type of data. This in turn has implications for the engineering process: unless our external criteria are floating free in time, a system will not perform as required or achieve its goals if the process of its development is unbounded with respect to resource requirements. In other words, engineers have to operate within finite time and with finite personnel resources and equipment. As well as implying that its outputs perform predictably, engineering implies that the process of constructing those outputs is also predictable.

If the act of engineering is a particular kind of construction process, the field of engineering is both the body of scientific knowledge relevant to a particular engineering task and also what we may call craft, art, lore, or heuristics: a body of known practice offering solutions to problems. For example, in electrical engineering, books with titles like ``The Art of Electronics'' [ Horowitz and Hill1989]; in software engineering books like ``Numerical Recipes in C - the Art of Scientific Computing'' [ Press et al. 1995], or ``Expert C Programming - Deep C Secrets'' [ van der Linden1994], or ``Design Patterns - Elements of Reusable Object-Oriented Software'' [ Gamma et al. 1995]. This craft knowledge takes many forms, e.g. a description of what scientific results are relevant to particular problems; example solutions to problems; notes of common problems or pitfalls in particular implementation paradigms; anecdotal evidence suggesting that certain solutions may be better than others in some circumstances. For example, a book on database design may refer to relational algebra; a book on algorithms may provide a proven, tested method for sorting with a known complexity and space requirement; a book on expert programming may contain hints like ``versions of the indent code formatting program up to the mid-80's silently transform x=!y into x!=y''; a book on design patterns might suggest that implementing a monitor pattern under a particular thread regime is inefficient.

To summarise, we define engineering as a construction process that is directed both by intended conformance of the resultant artifact to well-specified criteria of fitness and by constraints operating on the nature of the process itself; both the construction process itself and its outputs should be measurable and predictable; the activity is informed by relevant scientific knowledge and practical experience.

Turning to Computational Linguistics, we refer to our definitions of science and language: CL is that part of the science of human language that uses computers to aid observation of, or experiment with, language. If ``Theoretical linguists... attempt to characterise the nature of... either a language or Language'' or ``a grammar or Grammar'', then ``...theoretical Computational Linguistics proper consists in attempting such a characterisation computationally'' [ Thompson1985]. In other words, CL concentrates ``on studying natural languages, just as traditional Linguistics does, but using computers as a tool to model (and, sometimes, verify or falsify) fragments of linguistic theories deemed of particular interest'' [ Boguraev, Garigliano, and Tait1995].

Natural Language Processing is a term used in a variety of ways in different contexts. Much work that goes under the heading of NLP could well fit under our definition of CL, and some could also fit the definition of LE that follows. Here we'll use a narrow definition that makes CL and NLP disjoint. Whereas CL is ``a branch of linguistics in which computational techniques and concepts are applied to the elucidation of linguistic and phonetic problems'' [ Crystal1991], NLP is a branch of computer science that studies computer systems for processing natural languages. It includes the development of algorithms for parsing, generation, and acquisition of linguistic knowledge; the investigation of the time and space complexity of such algorithms; the design of computationally useful formal languages (such as grammar and lexicon formalisms) for encoding linguistic knowledge; the investigation of appropriate software architectures for various NLP tasks; and consideration of the types of non-linguistic knowledge that impinge on NLP. It is a fairly abstract area of study and it is not one that makes particular commitments to the study of the human mind, nor indeed does it make particular commitments to producing useful artifacts. [ Gazdar1996]

There are elements of both CL and NLP in Winograd's early description of the work as

...part of a newly developing paradigm for looking at human behaviour, which has grown up from working with computers. ... Computers and computer languages give us a formal metaphor, within which we can model the processes and test the implications of our theories. [ Winograd1972]

To summarise: CL is a part of the science of language that uses computers as investigative tools; NLP is part of the science of computation whose subject matter is computer systems that process human language. There is crossover and blurring of these definitions in practice, but they capture some important generalisations.

We have by implication conflated spoken language and textual language, but this hides the fact that there is a distinct field concerned with computer processing of spoken language which is

...known as speech processing or just Speech. Surprisingly, perhaps, Speech and NLP are really rather separate disciplines: until recently there has been little overlap in the personnel involved, the journals and conferences are largely disjoint, and the theoretical approaches and methods used have had little in common. Speech research studies the problems that are peculiar to processing the spoken form of language whereas NLP research, in principle at least, studies the problems that are common to the processing of both spoken and written forms of natural languages. [ Gazdar1996]

We'll have some more to say about speech below, but now we turn to our main subject, Language Engineering.

From our perspective, Gazdar's definition of ``Applied NLP'' is close to that of LE, a subject which

...involves the construction of intelligent computational artifacts that process natural languages in ways that are useful to people other than computational linguists. The test of utility here is essentially that of the market. Examples include machine translation packages, programs that convert numerical data or sequences of error codes into coherent text or speech, systems that map text messages into symbolic or numeric data, and natural language interfaces to databases. [ Gazdar1996]

LE is the application of NLP to the construction of computer systems that process language for some task usually other than modelling language itself, or ``...the instrumental use of language processing, typically as part of a larger system with some practical goal, such as accessing a database'' [ Thompson1985]. Probably as good a short definition as any of the products of LE is given by Jacobs: LE ``systems are meant to be fast, effective and helpful'' [ Jacobs1992]. A longer definition of the field appears in the editorial of the first issue of this journal, which also contrasts the practices of NLP and CL with those of LE:

The principal defining characteristic of NLE work is its objective: to engineer products which deal with natural language and which satisfy the constraints in which they have to operate. This definition may seem tautologous or a statement of the obvious to an engineer practising in another, well established area (e.g. mechanical or civil engineering), but is still a useful reminder to practitioners of software engineering, and it becomes near-revolutionary when applied to natural language processing. This is partly because of what, in our opinion, has been the ethos of most Computational Linguistics research. Such research has concentrated on studying natural languages, just as traditional Linguistics does, but using computers as a tool to model (and, sometimes, verify or falsify) fragments of linguistic theories deemed of particular interest. This is of course a perfectly respectable and useful scientific endeavour, but does not necessarily (or even often) lead to working systems for the general public. [ Boguraev, Garigliano, and Tait1995]

They go on to define the ``constraints to be satisfied'', including the existence of a target user group with a defined need that can be solved by the LE system, favourable cost-benefit profile and defined criteria for testing and evaluation of the system.

The final component of our definition of LE is that it is an art, or a body of rules of thumb. This is little written of at present, but has been noted at various workshops. For example, at panel discussions quoted in [ Mitkov1995, Mitkov1996]: Tsujii called LE heuristic and empirical (and not normally interested in language per se); Matusmoto said it is art, craft, and technique. Not strictly relevant, but included here for flavouring are Yusoff's comments that language engineers make things work without knowing why, whereas computational linguists know why their systems don't work, and that art is ``doing it'', engineering is ``doing right things'' and science is ``doing things right''. As with any engineering discipline, LE rests on an ever-expanding body of craft, lore, art, heuristics, and practice.

To summarise, our gloss on these various definitions is this:

Language Engineering is the discipline or act of engineering software systems that perform tasks involving processing human language. Both the construction process and its outputs are measurable and predictable. The literature of the field relates to both application of relevant scientific results and a body of practice.

Having arrived at an answer to ``what is LE?'', we now turn to the question of whether it constitutes a new field or just a new label.


Does the notion of "Information Society" have any relation to human language?

The term Information Society has been around for a long time now and, indeed, has become something of a cliché. The notion of the coming Information Society reminds me of the idea of the Sydney 2000 Olympics and the way it shimmers in the distance. We look towards the Olympics and resolve to prepare hard for it. We must rapidly transform ourselves, our city, our demeanour to be ready and worthy. Time is of the essence in making ourselves ready for the challenge. There is a certain breathlessness in all of this rhetoric.

Is there any concern in Europe with Human Language Technologies?

In the European Union, the concept of the Information Society has been evolving strongly over the past few years building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies.


Closer to home it is instructive to look at just a few policy (or would-be policy) documents to see the views of the Information Society dominant here. The Goldsworthy report sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world" (Department of Industry, Science and Tourism, 1997). In short, Goldsworthy's idea of the Information Society is entirely an economic one. At a broader level Barry Jones, the author of the House of Representatives Standing Committee's 1991 report 'Australia as an Information Society', sets out a definition of the Information Society which sees it as simply "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology" (Australia as an Information Society: Grasping New Paradigms, 1991).

These are just a few examples of ideas underpinning information policy drives in the developed world where the concept is accepted almost without challenge, and there is an inherent belief that like the Olympics, the Information Society is real - or will be very soon if only we can get ourselves organised properly. Some claim, of course, that the Information Society is here already and not just on its way. But one way or the other "it" exists and is a "good thing". By and large, national and regional Information Society documents do not question the belief that the Information Society will bring prosperity and happiness if a few basic safeguards are put in place. Some of the very few notes of serious caution in the practice of information policy have come through the influence of the Scandinavian countries which joined the European Union when the EU was already in full flight with implementing the actions flowing from the Bangemann report.

Interestingly, in recent travels in India I noticed an extraordinary level of hope and trust in that developing country in the potential of information technology to transform India into a modern fully developed economy. The push to develop information and technological infrastructure initiated by Rajiv Gandhi is seen as positive and a necessary step for the goal of a universally prosperous society in India. Effectively there is the same acceptance of the goodness of an Information Society and the absolute necessity to be one, that is found in the West.

Given this blind faith in the existence and the desirability of an Information Society among diverse nations, it is instructive to look at the theoretical literature which has spawned the idea to see what it claims for the Information Society. The term Information Society has many synonyms: Information Age, Information Revolution, Information Explosion and so on and it is found across a wide spectrum of disciplines. Fortunately the task of unravelling many of these ideas has been accomplished in a masterly way by Frank Webster. He has categorised the variety of concepts of the Information Society, Information Revolution, or whatever, and provided an analysis of five common conceptions of the Information Society (Webster, 1995).


What is the current situation of the office?

The overall objective of HLT is to support e-business in a global context and to promote a human centred infostructure ensuring equal access and usage opportunities for all. This is to be achieved by developing multilingual technologies and demonstrating exemplary applications providing features and functions that are critical for the realisation of a truly user friendly Information Society. Projects address generic and applied RTD from a multi- and cross-lingual perspective, and undertake to demonstrate how language specific solutions can be transferred to and adapted for other languages. (2001)

SECOND WEEK (24-27 February)


Which are the main techniques used in Language Engineering?

There are many techniques used in Language Engineering and some of these are described below.

*Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying reliably despite temporary changes (such as caused by illness).

*Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose.

There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.

*Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition: recognition of printed images, referred to as Optical Character Recognition (OCR), and recognition of handwriting, usually known as Intelligent Character Recognition (ICR).

OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences.

Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.

*Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels.

Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge.

Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information.

Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need to be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.

*Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.

*Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.

Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules. Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.
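The idea of generating text from filled templates, that is, common phrases with variable parts, can be illustrated with a few lines of Python. This is only a minimal sketch and not the way any particular system works: the template sentence, the field names (time, destination, platform) and the function generate_announcement are all invented for this example.

```python
from string import Template

# Hypothetical template: a fixed phrase with three variable parts.
DEPARTURE_TEMPLATE = Template(
    "The $time train to $destination leaves from platform $platform."
)


def generate_announcement(data):
    """Fill the variable parts of the template with the given data."""
    return DEPARTURE_TEMPLATE.substitute(data)


if __name__ == "__main__":
    print(generate_announcement(
        {"time": "10:15", "destination": "Bilbao", "platform": "3"}
    ))
```

In a telephone information system, a sentence produced like this would then be passed to a speech generator instead of being printed.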

Which language resources are essential components of Language Engineering?

 The essential components of Language Engineering are the ones that follow:

*Language Resources

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding.

The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA).


*Lexicons

A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.
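To make the idea of a lexicon more concrete, here is a small sketch of how an entry could be stored in a program. It is only an illustration under my own assumptions: the class LexiconEntry, its fields and the two sample entries are hypothetical, and a real lexicon would contain far more information and many thousands of entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """One word and some of the knowledge a lexicon may hold about it."""
    lemma: str
    part_of_speech: str
    inflected_forms: List[str] = field(default_factory=list)  # morphology
    pronunciation: str = ""                                    # phonology
    senses: List[str] = field(default_factory=list)            # meaning


# Two invented sample entries, keyed by lemma.
LEXICON: Dict[str, LexiconEntry] = {
    "cat": LexiconEntry("cat", "noun", ["cat", "cats"], "kæt",
                        ["a small domesticated feline"]),
    "dry": LexiconEntry("dry", "verb", ["dry", "dries", "dried", "drying"],
                        "draɪ", ["to remove moisture from"]),
}


def find_lemmas(word_form: str) -> List[str]:
    """Return the lemmas whose inflected forms include the given word."""
    form = word_form.lower()
    return [entry.lemma for entry in LEXICON.values()
            if form in entry.inflected_forms]


if __name__ == "__main__":
    print(find_lemmas("dries"))
```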

*Specialist Lexicons

There are a number of special cases which are usually researched and produced separately from general purpose lexicons:

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, or persons, or maybe animals. They take on a special significance in many applications, however, where the name is key to the application such as in a voice operated navigation system, a holiday reservations system, or railway timetable information system, based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

Word nets: A word net describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.
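As an illustration of how a word net can help information retrieval, the sketch below expands a search query with synonyms. The tiny WORD_NET dictionary and the function expand_query are invented for this example and are not taken from any real word net resource.

```python
# A toy word net: each word is linked to related words by relation type.
WORD_NET = {
    "car": {"synonyms": ["automobile"], "antonyms": []},
    "big": {"synonyms": ["large"], "antonyms": ["small"]},
    "hot": {"synonyms": ["warm"], "antonyms": ["cold"]},
}


def expand_query(terms):
    """Add the synonyms of each query term, keeping order and avoiding repeats."""
    expanded = []
    for term in terms:
        candidates = [term] + WORD_NET.get(term, {}).get("synonyms", [])
        for candidate in candidates:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query(["big", "car"]))
```

A search for "big car" would in this way also find documents that speak of a "large automobile".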


*Grammars

A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).


A corpus is a body of language, either text or speech, which provides the basis for:

There are national corpora of hundreds of millions of words but there are also corpora which are constructed for particular purposes. For example, a corpus could comprise recordings of car drivers speaking to a simulation of a control system, which recognises spoken commands, which is then used to help establish the user requirements for a voice operated control system for the market.





Check for the following terms (choose at least five):



A stemmer is a program or algorithm which determines the morphological root of a given inflected (or, sometimes, derived) word form -- generally a written word form.

A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem".

English stemmers are fairly trivial (with only occasional problems, such as "dries" being the third-person singular present form of the verb "dry", or "axes" being the plural of "ax" as well as of "axis"); but stemmers become harder to design as the morphology, orthography, and character encoding of the target language become more complex. For example, an Italian stemmer is more complex than an English one (because of more possible verb inflections), a Russian one is more complex (more possible noun declensions), a Hebrew one is even more complex (a hairy writing system), and so on.

Stemmers are common elements in query systems, since a user who runs a query on "daffodils" probably cares about documents that contain the word "daffodil" (without the s).
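The behaviour described above can be sketched with a very naive suffix-stripping function. This is not any published stemming algorithm, only a minimal illustration under my own assumptions: the function name `naive_stem`, the four suffixes it knows and the minimum stem length of three letters are all choices made for this example.

```python
def naive_stem(word):
    """Strip one common English suffix; a deliberately simplistic sketch."""
    word = word.lower()
    for suffix in ("ing", "ed", "er", "s"):
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling ("stemm" -> "stem"), but leave
            # "ll" and "ss" alone, as in "calling" or "passed".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word
```

With this sketch "cats" becomes "cat", "stemming" becomes "stem" and "daffodils" becomes "daffodil", so a query for one form would match the other. It also shows the occasional problems mentioned above: "dries" comes out as "drie" rather than "dry".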




*Shallow parser

Shallow parser: software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective.


Domain: usually applied to the area of application of the language enabled software, e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted, so the language resource requirements are effectively limited by limiting the domain of application.

*Translator's workbench

Translator's workbench: a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc.


*Authoring tools

Authoring tools: facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents.





THIRD WEEK (2-5 March)


State of the Art

Comments about the state-of-the-art need to be made in the context of specific applications which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

Performance of speech recognition systems is typically described in terms of word error rate, E, defined as:

$$E = \frac{S + I + D}{N} \times 100$$


Where N is the total number of words in the test set, and S, I, and D are the total number of substitutions, insertions, and deletions, respectively.
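The word error rate can be illustrated with a short calculation. The sketch below assumes that E is the number of substitutions, insertions and deletions divided by N and expressed as a percentage, and it counts the errors as the minimum number of word edits needed to turn the reference sentence into the recogniser's output. The function name `word_error_rate` and the example sentences are my own; real evaluations are run with standard scoring tools, not with this sketch.

```python
def word_error_rate(reference, hypothesis):
    """Return E = 100 * (S + I + D) / N for two space-separated sentences."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("the reference must contain at least one word")

    # dist[i][j] = fewest substitutions, insertions and deletions needed
    # to turn the first i reference words into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j          # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )

    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)
```

For instance, if the reference is "a b c d" and the system outputs "a x c", there is one substitution and one deletion in four words, so the word error rate is 50%.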

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress. First, there is the coming of age of the hidden Markov model (HMM). The HMM is powerful in that, with the availability of training data, the parameters of the model can be trained automatically to give optimal performance.

Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing. These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third, progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and had not been very careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13 respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to run many large scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced. In fact, speech recognition systems with reasonable performance can now run in real time using high-end workstations without additional hardware---a feat unimaginable only a few years ago.

One of the most popular, and potentially most useful tasks with low perplexity (PP=11) is the recognition of digits. For American English, speaker-independent recognition of digit strings spoken continuously and restricted to telephone bandwidth can achieve an error rate of 0.3% when the string length is known.

One of the best known moderate-perplexity tasks is the 1,000-word so-called Resource Management (RM) task, in which inquiries can be made concerning various naval vessels in the Pacific Ocean. The best speaker-independent performance on the RM task is a word error rate of less than 4%, using a word-pair language model that constrains the possible words following a given word (PP=60). More recently, researchers have begun to address the issue of recognizing spontaneously generated speech. For example, in the Air Travel Information Service (ATIS) domain, word error rates of less than 3% have been reported for a vocabulary of nearly 2,000 words and a bigram language model with a perplexity of around 15.
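The perplexity figures (PP) quoted above can be read, roughly, as the average number of equally likely words the recogniser has to choose between at each point in a sentence. The source does not give the formula, so the sketch below uses the usual definition (2 raised to the average negative log2 probability per word) as an assumption; the function name `perplexity` and the probabilities in the usage note are invented for illustration.

```python
import math


def perplexity(word_probabilities):
    """Perplexity of a word sequence, given the probability the language
    model assigned to each word in its context."""
    log_sum = sum(math.log2(p) for p in word_probabilities)
    return 2 ** (-log_sum / len(word_probabilities))
```

If a model always considers eleven words equally likely, every word has probability 1/11 and the perplexity is 11, whatever the sentence length. A language model that constrains which words may follow a given word assigns higher probabilities to the words that actually occur, which lowers the perplexity and makes the recognition task easier.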

High perplexity tasks with a vocabulary of thousands of words are intended primarily for the dictation application. After working on isolated-word, speaker-dependent systems for many years, the community has since 1992 moved towards very-large-vocabulary (20,000 words and more), high-perplexity, speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news.

With the steady improvements in speech recognition performance, systems are now being deployed within telephone and cellular networks in many countries. Within the next few years, speech recognition will be pervasive in telephone networks around the world. There are tremendous forces driving the development of the technology; in many countries, touch tone penetration is low, and voice is the only option for controlling automated services. In voice dialing, for example, users can dial 10--20 telephone numbers by voice (e.g., call home) after having enrolled their voices by saying the words associated with telephone numbers. AT&T, on the other hand, has installed a call routing system using speaker-independent word-spotting technology that can detect a few key phrases (e.g., person to person, calling card) in sentences such as: I want to charge it to my calling card.

At present, several very large vocabulary dictation systems are available for document generation. These systems generally require speakers to pause between words. Their performance can be further enhanced if one can apply constraints of the specific domain such as dictating medical reports.

Even though much progress is being made, machines are a long way from recognizing conversational speech. Word recognition rates on telephone conversations in the Switchboard corpus are around 50%. It will be many years before unlimited vocabulary, speaker-independent continuous dictation capability is realized.


Speech-to-Speech machine translation.

It is worth remembering that most prototypes developed within research projects are currently only capable of processing a few hundred sentences (around 300), on very specific topics (accommodation booking, planning trips, etc.) and for a small group of languages: English, German, Japanese, Spanish and Italian. It seems unlikely that any application will be able to go beyond these boundaries in the near future.

The direct incorporation of speech translation prototypes into industrial applications is at present too costly. However, the growing demand for these products leads us to believe that they will soon be on the market at more affordable prices. The systems developed in projects such as Verbmobil, EuTrans or Janus—despite being at the laboratory phase—contain in practice thoroughly evaluated and robust technologies. A manufacturer considering their integration may join R&D projects and take part in the development of prototypes with the prospect of a fast return on investment. It is quite clear that we are witnessing the emergence of a new technology with great potential for penetrating the telecommunications and microelectronics market in the not too distant future.

Another remarkable aspect of the EuTrans project is its methodological contribution to machine translation as a whole, both in speech and written modes. Although these two modes of communication are very different in essence, and their respective technologies cannot always be compared, speech-to-speech translation has brought prospects of improvement for text translation. Traditional methods for written texts tend to be based on grammatical rules. Therefore, many MT systems show no coverage problem, although this is achieved at the expense of quality. The most common way of improving quality is by restricting the topic of interest. It is widely accepted that broadening of coverage immediately endangers quality. In this sense, learning techniques that enable systems to automatically adapt to new textual typologies, styles, structures, terminological and lexical items could have a radical impact on the technology.

Due to the differences between oral and written communication, rule-based systems prepared for written texts can hardly be re-adapted to oral applications. This is an approach that has been tried, and has failed. By contrast, example-based learning methods designed for speech-to-speech translation systems can easily be adapted to written texts, given the increasing availability of bilingual corpora. One of the main contributions of the PRHLT-ITI group is precisely its learning model based on bilingual corpora. Herein lie some interesting prospects for improving written translation techniques.

Effective speech-to-speech translation, along with other voice-oriented technologies, will become available in the coming years, albeit with some limitations, e.g. in the number of languages, linguistic coverage, and context. It could be argued that EuTrans' main contribution has been to raise the possibilities of speech-to-speech translation to the levels of speech recognition technology, making any innovation immediately accessible.


Speech recognition

(Or voice recognition) The identification of spoken words by a machine. The spoken words are digitised (turned into a sequence of numbers) and matched against coded dictionaries in order to identify the words.

Most systems must be "trained," requiring samples of all the actual words that will be spoken by the user of the system. The sample words are digitised, stored in the computer and used to match against future words. More sophisticated systems require voice samples, but not of every word. The system uses the voice samples in conjunction with dictionaries of larger vocabularies to match the incoming words. Yet other systems aim to be "speaker-independent", i.e. they will recognise words in their vocabulary from any speaker without training.

Another variation is the degree to which systems can cope with connected speech. People tend to run words together, e.g. "next week" becomes "neksweek" (the "t" is dropped). For a voice recognition system to identify words in connected speech it must take into account the way words are modified by the preceding and following words.

It has been said (in 1994) that computers will need to be something like 1000 times faster before large vocabulary (a few thousand words), speaker-independent, connected speech voice recognition will be feasible.



Another definition of SR

This involves the computer taking the user's speech and interpreting what has been said. This allows the user to control the computer (or certain aspects of it) by voice, rather than having to use the mouse and keyboard, or alternatively just dictating the contents of a document.

The complex nature of translating the raw audio into phonemes involves a lot of signal processing and is not focused on here. These details are taken care of by an SR engine that will be installed on your machine. SR engines are often called recognisers and these days typically implement continuous speech recognition (older recognisers implemented isolated or discrete speech recognition, where pauses were required between words).

Speech recognition usually means one of two things. The application can understand and follow simple commands that it has been educated about in advance. This is known as command and control (sometimes seen abbreviated as CnC, or simply SR).

Alternatively an application can support dictation (sometimes abbreviated to DSR). Dictation is more complex as the engine has to try and identify arbitrary spoken words, and will need to decide which spelling of similarly sounding words is required. It develops context information based on the preceding and following words to try and help decide. Because this context analysis is not required with Command and Control recognition, CnC is sometimes referred to as context-free recognition.


Speech Synthesis

Speech synthesis is the computer-generated simulation of human speech. It is used to translate written information into aural information where it is more convenient, especially for mobile applications such as voice-enabled e-mail and unified messaging. It is also used to assist the vision-impaired so that, for example, the contents of a display screen can be automatically read aloud to a blind user. Speech synthesis is the counterpart of speech or voice recognition. The earliest speech synthesis effort was in 1779 when Russian Professor Christian Kratzenstein created an apparatus based on the human vocal tract to demonstrate the physiological differences involved in the production of five long vowel sounds. The first fully functional voice synthesizer, Homer Dudley's VODER (Voice Operating Demonstrator), was shown at the 1939 World's Fair. The VODER was based on Bell Laboratories' vocoder (voice coder) research of the mid-thirties.

FOURTH WEEK (9-12 March)

Information fatigue syndrome

David Lewis coined the term "information fatigue syndrome" for what he expects will soon be a recognized medical condition.

"Having too much information can be as dangerous as having too little. Among other problems, it can lead to a paralysis of analysis, making it far harder to find the right solutions or make the best decisions."

"Information is supposed to speed the flow of commerce, but it often just clogs the pipes."



Summary of Findings

How much new information is created each year? Newly created information is stored in four physical media – print, film, magnetic and optical – and seen or heard in four information flows through electronic channels – telephone, radio and TV, and the Internet. This study of information storage and flows analyzes the year 2002 in order to estimate the annual size of the stock of new information recorded in storage media, and heard or seen each year in information flows. Where reliable data was available we have compared the 2002 findings to those of our 2000 study (which used 1999 data) in order to describe a few trends in the growth rate of information.

  1. Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks.
  2. We estimate that the amount of new information stored on paper, film, magnetic, and optical media has about doubled in the last three years.
  3. Information flows through electronic channels (telephone, radio, TV, and the Internet) contained almost 18 exabytes of new information in 2002, three and a half times more than is recorded in storage media. Ninety-eight percent of this total is the information sent and received in telephone calls, including both voice and data on both fixed lines and wireless.



The overall objective of HLT is to support e-business in a global context and to promote a human centred infostructure ensuring equal access and usage opportunities for all. This is to be achieved by developing multilingual technologies and demonstrating exemplary applications providing features and functions that are critical for the realisation of a truly user friendly Information Society. Projects address generic and applied RTD from a multi- and cross-lingual perspective, and undertake to demonstrate how language specific solutions can be transferred to and adapted for other languages.

While elements of the three initial HLT action lines - Multilinguality, Natural Interactivity and Crosslingual Information Management are still present, there has been periodic re-assessment and tuning of them to emerging trends and changes in the surrounding economic, social, and technological environment. The trials and best practice in multilingual e-service and e-commerce action line was introduced in the IST 2000 work programme (IST2000) to stimulate new forms of partnership between technology providers, system integrators and users through trials and best practice actions addressing end-to-end multi-language platforms and solutions for e-service and e-commerce. The fifth IST call for proposals covered this action line.



Human language technologies

"Language technology refers to a range of technologies that have been developed over the last 40 years to enable people to more easily and naturally communicate with computers, through speech or text and, when called for, receive an intelligent and natural reply in much the same way as a person might respond." (E-S.l)

"Human Language Techology is the term for the language capabilities designed into the computing applications used in information and communication technology systems." (EM).

"Human Language Technology is sometimes quite familiar, e.g. the spell checker in your word processor, but can often be hidden away inside complex networks – a machine for automatically reading postal addresses, for example." (EM)

"From speech recognition to automatic translation, Human Language Technology products and services enable humans to communicate more naturally and more effectively with their computers – but above all, with each other." (EM)


Intelligent Text Processing

Intelligent Text Processing: "Ever been frustrated by a search engine? Find out how they work, but more importantly, find out how to make them intelligent. This unit also covers sophisticated web-based language technologies like document summarization, information extraction and machine translation. If you want to know about the Semantic Web, this is the unit for you." (CLT)


Semantic Web

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming.
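The way RDF names things with URIs can be pictured with a few statements of the form subject, predicate, object. The sketch below is only an illustration in plain Python, not real RDF/XML syntax and not the API of any RDF library: every URI under `http://example.org/` is a hypothetical placeholder, and the list `triples` and the function `objects_of` are names chosen for this example.

```python
# Each statement is a (subject, predicate, object) triple. Subjects and
# predicates are named by URIs; all URIs here are invented placeholders.
triples = [
    ("http://example.org/docs/report",
     "http://example.org/terms/topic",
     "http://example.org/topics/machine-translation"),
    ("http://example.org/docs/report",
     "http://example.org/terms/language",
     "English"),
    ("http://example.org/topics/machine-translation",
     "http://example.org/terms/label",
     "Machine Translation"),
]


def objects_of(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

Because the topic of the report is itself a URI, a program can follow it to find the label "Machine Translation", which is the kind of linking between data sets that the shared framework is meant to allow.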

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001



This report presents a large amount of information about the information society and New Technologies. Through it I have learnt the importance of being well informed in our daily lives as students. I explained earlier the difficulties we may find with these technologies; one of them is the problem of language, which we are expected to overcome because we are students of English Philology.

In our degree we have to read a lot of English poets and novelists whose books are often not available in our library; that is why the net provides us with the necessary tools to understand their works. But there are people who cannot understand the language in which the information appears, and that is why Machine Translation has appeared in our lives.