In this first report I am going to talk about human language technologies. The information it contains is based on material taken from several web pages. For eight weeks, with our teacher Joseba Abaitua, we have been working through and answering a series of questionnaires, and this report is a summary of all of them.

    In my report I will address questions such as: what are human language technologies? What is language engineering, and what are its main components? What is speech technology? Finally, I will write something about machine translation. In order to answer these questions I will divide my report into several parts.




As I said in the abstract, I am going to divide my report into several parts. The first part will be useful for anybody who wants a general idea of human language technologies, and covers:

        - Some definitions of human language technologies

        - Human language technologies and Europe

The second part will cover some questions related to language engineering, such as:

    - What is language engineering? Some definitions

    - What are the main techniques of language engineering?

The third part is going to be based on speech technology, and the questions inside this part are:

        - What is speech technology?

        - What is speech recognition and what is speech synthesis?

The fourth part is going to be based on "information". I am going to answer questions like:

            - How much new information is created every year?

The last part of my essay is going to be based on "machine translation"; I am going to quote sources and write about the usefulness of machine translation nowadays.







                Definitions of human language technologies:



"Human Language Techology is the term for the language capabilities designed into the computing applications used in information and communication technology systems."

"Human Language Technology is sometimes quite familiar, e.g. the spell checker in your word processor, but can often be hidden away inside complex networks – a machine for automatically reading postal addresses, for example."

"From speech recognition to automatic translation, Human Language Technology products and services enable humans to communicate more naturally and more effectively with their computers – but above all, with each other."

Joseba Abaitua, Universidad de Deusto

Last modified: 04/19/2004 16:09:20


The architecture of Human Language Technologies




Living and Working Together in the Information Society

Discussion Document, Luxembourg, July 1997


Last updated: 16.10.00 16:50




Language technologies are information technologies that are specialized for dealing with the most complex information medium in our world: human language. Therefore these technologies are also often subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Whereas speech is the oldest and most natural mode of language communication, complex information and most of human knowledge is maintained and transmitted in written texts.

Speech and text technologies process or produce language in these two modes of realization. But language also has aspects that are shared between speech and text, such as dictionaries, most of grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology had to create formal representation systems that link language to concepts and tasks in the real world. This provides the interface to the fast-growing area of knowledge technologies.


In our communication we mix language with other modes of communication and other information media. We combine speech with gesture and facial expressions. Digital texts are combined with pictures and sounds. Movies may contain language in spoken and written form. Thus speech and text technologies overlap and interact with many other technologies that facilitate processing of multimodal communication and multimedia documents.

For more detail, the reader is referred to: Cole, R. A., J. Mariani, H. Uszkoreit, G. Varile, A. Zaenen, V. Zue, A. Zampolli (Eds.) (1997) Survey of the State of the Art in Human Language Technology, Cambridge University Press and Giardini.


                                          HUMAN LANGUAGE TECHNOLOGIES AND EUROPE:

In Europe we have the benefit of a diversity of languages and cultures, which means that we have the opportunity to learn a great deal about each other's cultures and ways of life. This remains one of the bases for a cohesive European society. If the benefits of a multi-lingual society are to remain a feature of the European way of life, then we must explore ways in which to overcome the barriers to communication and understanding.



Europe's position as a naturally multi-lingual community in a multi-lingual world can be used to our commercial advantage. As we endeavour to collaborate more closely, to develop the single market as our home market, we have a special incentive to develop solutions to the problems of a multi-lingual market place. In successfully supporting our own language needs, especially in business, administration and education, Language Engineering will help us to compete for business in the global marketplace. On the one hand, our businesses will have a competitive edge through their experience in using technology to service the needs of a multi-lingual marketplace. On the other hand, we shall also have language products to sell to the rest of the world.







                              WHAT IS LANGUAGE ENGINEERING?



Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

The basic processes of Language Engineering are shown in the diagram below. These are broadly concerned with:



[Figure: Model of a Language Enabled System]

Wikipedia encyclopedia





Techniques

There are many techniques used in Language Engineering and some of these are described below:

Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying reliably despite temporary changes (such as caused by illness).

Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose. There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.
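The statistical idea described above can be sketched in a few lines of Python. This is a toy illustration only, with invented phoneme symbols and probabilities, not a real recogniser: each word has a simple per-position phoneme model, and recognition picks the word whose model best explains the observed phonemes.

```python
import math

# Hypothetical per-word "statistical models": the probability of observing
# each phoneme at each position in the word (toy numbers, invented).
WORD_MODELS = {
    "yes": [{"y": 0.9, "j": 0.1}, {"eh": 0.8, "e": 0.2}, {"s": 0.9, "z": 0.1}],
    "no":  [{"n": 0.9, "m": 0.1}, {"ow": 0.8, "o": 0.2}],
}

def score(word, phonemes):
    """Log-probability of the observed phoneme sequence under a word's model."""
    model = WORD_MODELS[word]
    if len(model) != len(phonemes):
        return float("-inf")  # toy simplification: no insertions or deletions
    return sum(math.log(slot.get(ph, 1e-6)) for slot, ph in zip(model, phonemes))

def recognise(phonemes):
    """Pick the word whose model best explains the observed phonemes."""
    return max(WORD_MODELS, key=lambda w: score(w, phonemes))

print(recognise(["y", "eh", "s"]))  # -> yes
```

A real system would use hidden Markov or neural acoustic models trained on large corpora, but the principle of scoring competing word hypotheses is the same.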

Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition:

- recognition of printed images, referred to as Optical Character Recognition (OCR)
- recognising handwriting, usually known as Intelligent Character Recognition (ICR)

OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences. Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.
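The role of a lexicon as a language model in ICR can be illustrated with a small Python sketch; the lexicon and the noisy input below are invented for the example. A noisy character-recognition result is corrected by choosing the nearest lexicon word by edit distance.

```python
# Toy lexicon standing in for the word-level language model.
LEXICON = ["language", "engineering", "recognition", "character"]

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]

def correct(noisy_word):
    """Return the lexicon entry nearest to the raw OCR/ICR output."""
    return min(LEXICON, key=lambda w: edit_distance(noisy_word, w))

print(correct("recogniti0n"))  # -> recognition
```

Real systems also weight corrections by statistical information about word sequences, not just by distance to a single word.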

Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels. Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge.

Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information. Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.
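The idea of a semantic model mapping terminology to underlying concepts can be sketched as follows. This is a deliberately tiny, hypothetical example: the term-to-concept table is invented, and a real system would use full morphological and syntactic analysis rather than simple word lookup.

```python
# A tiny, invented semantic model: surface terms in several languages
# map to language-independent concepts.
TERM_TO_CONCEPT = {
    "car": "VEHICLE", "automobile": "VEHICLE", "coche": "VEHICLE",
    "price": "COST", "cost": "COST", "precio": "COST",
}

def interpret(query):
    """Reduce a query to the set of concepts it mentions, independently
    of the terminology or language in which it was expressed."""
    return {TERM_TO_CONCEPT[t] for t in query.lower().split()
            if t in TERM_TO_CONCEPT}

# Two differently worded queries map to the same underlying meaning:
print(interpret("car price"))         # the set {'VEHICLE', 'COST'}
print(interpret("precio del coche"))  # the same set {'VEHICLE', 'COST'}
```

This is what makes multi-lingual access possible: retrieval is done over concepts, not over the words of any one language.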

Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.

Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response. Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules. Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.
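The 'canned recordings' approach described above can be sketched like this. The recording names and the template are invented for illustration; a real system would concatenate actual audio and smooth the joins for natural intensity, duration and stress.

```python
# A hypothetical library of pre-recorded units; digits are stored as
# individual recordings so that numbers can be spelled out.
RECORDINGS = {
    "your_balance_is": "your_balance_is.wav",
    "euros": "euros.wav",
    **{str(d): f"{d}.wav" for d in range(10)},
}

def render(template, **slots):
    """Fill a template and return the playlist of units to concatenate."""
    units = []
    for token in template.format(**slots).split():
        if token.isdigit():  # numbers are read out digit by digit
            units.extend(RECORDINGS[d] for d in token)
        else:
            units.append(RECORDINGS[token])
    return units

print(render("your_balance_is {amount} euros", amount=42))
# -> ['your_balance_is.wav', '4.wav', '2.wav', 'euros.wav']
```

This is exactly the kind of structured dialogue a non-programmer could assemble with the graphical tools mentioned above.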






Speech technology

Speech recognition: "This technology involves recognising spoken language and transforming it into text. The technology ranges from extremely accurate continuous dictation by systems trained to recognise an individual voice, through to systems that work in specific domains and with any user." (E-S.l)

Speech synthesis: "By breaking down human language into all its key parts, scientists have been able to recreate human voices that even have regional accents. Such synthesised voices can be used to instantly read out any text." (E-S.l)

Spoken dialogue systems: "The latest advances now allow users to have natural dialogue with a computer or 'agent' that can react to unexpected events in an apparently intelligent way. Scientists are now working on spoken group dialogue systems that involve numerous agents. " (E-S.l)

"These systems enable you to talk to a computer via a telephone in order to enact some transaction or information-seeking task. For example, you might call up on the phone and talk to a machine in order to buy or sell stocks and shares, or to get route directions from one city to another." (CLT)

Speech-to-speech machine translation: These are systems that recognise spoken input, analyse and translate this input, and then utter the translation in a target language. Some are bidirectional and cope with spontaneous dialogues. However, the majority can only cope with restricted topics and limited vocabularies.

© Joseba Abaitua, Universidad de Deusto

Last modified: 04/19/2004 16:09:20





                      WHAT IS SPEECH RECOGNITION?

Speech recognition technologies allow computers equipped with microphones to interpret human speech, e.g. for transcription or as a control method.

Such systems can be classified as to whether they require the user to "train" the system to recognise their own particular speech patterns or not, whether the system can recognise continuous speech or requires users to break up their speech into discrete words, and whether the vocabulary the system recognises is small (in the order of tens or at most hundreds of words), or large (thousands of words).

Systems requiring a short amount of training can (as of 2001) capture continuous speech with a large vocabulary at normal pace with an accuracy of about 98% (getting two words in one hundred wrong), and different systems that require no training can recognize a small number of words (for instance, the ten digits of the decimal system) as spoken by most English speakers. Such systems are popular for routing incoming phone calls to their destinations in large organisations.
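As a quick sanity check of the quoted figure, 98% word accuracy does indeed mean about two errors per hundred words, which adds up quickly over a long dictation:

```python
def wrong_words(accuracy, total_words):
    """Expected number of misrecognised words at a given word accuracy."""
    return round((1 - accuracy) * total_words)

print(wrong_words(0.98, 100))   # -> 2 (two words in one hundred wrong)
print(wrong_words(0.98, 1500))  # -> 30 (errors in a 1,500-word dictation)
```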

Commercial systems for speech recognition have been available off-the-shelf since the 1990s. However, it is interesting to note that despite the apparent success of the technology, few people use such speech recognition systems.

It appears that most computer users can create and edit documents more quickly with a conventional keyboard, despite the fact that most people are able to speak considerably faster than they can type. Additionally, heavy use of the speech organs results in vocal loading.

Some of the key technical problems in speech recognition are that:

From Wikipedia, the free encyclopedia. This page was last modified 08:11, 5 Apr 2004.



                        WHAT IS SPEECH SYNTHESIS?


Speech synthesis is the generation of human speech without directly using a human voice.

Generally speaking, a speech synthesizer is software or hardware capable of rendering artificial speech.

Speech synthesis systems are often called text-to-speech (TTS) systems in reference to their ability to convert text into speech. However, there exist systems that can only render symbolic linguistic representations like phonetic transcriptions into speech.
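The front end of a text-to-speech system, converting text into a symbolic phonetic transcription, can be sketched with a simple dictionary lookup. The pronunciation entries below are toy examples; real systems combine large pronunciation lexicons with letter-to-sound rules, stress and prosody.

```python
# Toy pronunciation dictionary (invented entries, ARPAbet-style symbols).
PRONUNCIATIONS = {
    "speech": ["s", "p", "iy", "ch"],
    "synthesis": ["s", "ih", "n", "th", "ah", "s", "ih", "s"],
}

def transcribe(text):
    """Map each word to its phonemes; unknown words get a placeholder.
    A real system would fall back on letter-to-sound rules instead."""
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATIONS.get(word, ["?"]))
    return phonemes

print(transcribe("speech synthesis"))
```

The back end of the synthesiser then renders this symbolic representation as audio, which is the step the paragraph above refers to.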

From Wikipedia, the free encyclopedia.






Newly created information is stored in four physical media – print, film, magnetic and optical – and seen or heard in four information flows through electronic channels – telephone, radio and TV, and the Internet. This study of information storage and flows analyzes the year 2002 in order to estimate the annual size of the stock of new information recorded in storage media, and heard or seen each year in information flows. Where reliable data was available we have compared the 2002 findings to those of our 2000 study (which used 1999 data) in order to describe a few trends in the growth rate of information.

1. Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks.

   - How big is five exabytes? If digitized, the nineteen million books and other print collections in the Library of Congress would contain about ten terabytes of information; five exabytes of information is equivalent in size to the information contained in half a million new libraries the size of the Library of Congress print collections.

   - Hard disks store most new information. Ninety-two percent of new information is stored on magnetic media, primarily hard disks. Film represents 7% of the total, paper 0.01%, and optical media 0.002%.

   - The United States produces about 40% of the world's new stored information, including 33% of the world's new printed information, 30% of the world's new film titles, 40% of the world's information stored on optical media, and about 50% of the information stored on magnetic media.

   - How much new information per person? According to the Population Reference Bureau, the world population is 6.3 billion, thus almost 800 MB of recorded information is produced per person each year. It would take about 30 feet of books to store the equivalent of 800 MB of information on paper.

2. We estimate that the amount of new information stored on paper, film, magnetic, and optical media has about doubled in the last three years.


   - Information explosion? We estimate that new stored information grew about 30% a year between 1999 and 2002.

   - Paperless society? The amount of information printed on paper is still increasing, but the vast majority of original information on paper is produced by individuals in office documents and postal mail, not in formally published titles such as books, newspapers and journals.

3. Information flows through electronic channels -- telephone, radio, TV, and the Internet -- contained almost 18 exabytes of new information in 2002, three and a half times more than is recorded in storage media. Ninety-eight percent of this total is the information sent and received in telephone calls - including both voice and data on both fixed lines and wireless.

   - Telephone calls worldwide – on both landlines and mobile phones – contained 17.3 exabytes of new information if stored in digital form; this represents 98% of the total of all information transmitted in electronic information flows, most of it person to person.

   - Most radio and TV broadcast content is not new information. About 70 million hours (3,500 terabytes) of the 320 million hours of radio broadcasting is original programming. TV worldwide produces about 31 million hours of original programming (70,000 terabytes) out of 123 million total hours of broadcasting.

   - The World Wide Web contains about 170 terabytes of information on its surface; in volume this is seventeen times the size of the Library of Congress print collections.

   - Instant messaging generates five billion messages a day (750 GB), or 274 terabytes a year.

   - Email generates about 400,000 terabytes of new information each year worldwide.

   - P2P file exchange on the Internet is growing rapidly. Seven percent of users provide files for sharing, while 93% of P2P users only download files. The largest files exchanged are video files larger than 100 MB, but the most frequently exchanged files contain music (MP3 files).

   - How we use information. Published studies on media use say that the average American adult uses the telephone 16.17 hours a month, listens to radio 90 hours a month, and watches TV 131 hours a month. About 53% of the U.S. population uses the Internet, averaging 25 hours and 25 minutes a month at home, and 74 hours and 26 minutes a month at work – about 13% of the time.
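Several of the figures above can be checked with straightforward arithmetic, assuming decimal prefixes (1 EB = 10^6 TB, 1 TB = 10^6 MB):

```python
# Decimal unit sizes assumed throughout.
EB, TB, GB, MB = 10**18, 10**12, 10**9, 10**6

new_info = 5 * EB              # new stored information in 2002
loc = 10 * TB                  # Library of Congress print collections
print(new_info / loc)          # -> 500000.0 Libraries of Congress

per_person = new_info / 6.3e9  # world population of ~6.3 billion
print(round(per_person / MB))  # -> 794 MB per person (report says ~800 MB)

im_per_year = 750 * GB * 365   # instant messaging at 750 GB a day
print(im_per_year / TB)        # -> 273.75 TB a year (report rounds to 274)
```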


II. Method

In 2000 we conducted a study to estimate how much information is produced every year. We then estimated that in 1999 the world produced between 1 and 2 exabytes of unique information. In Summer 2003 we repeated the study, using 2002 data, in order to begin to identify trends in the production and consumption of information. Some of the 1999 data has been revised in this study because new information sources were identified; our revised estimate is that in 1999 the world produced between 2 and 3 exabytes of new information.

As in 1999, we have estimated the magnitudes of information flows (TV, Radio, Telephone, Internet) that are currently not systematically archived, but may well be in the future. This year we have added two studies of the Internet. We have sampled the World Wide Web, to determine the size of the surface web and to define the source, functions and content of Web pages. And we have studied desktop disk drives, to determine how people consume information on the Internet.

Because information is created and distributed in different media or formats there is no common standard with which to measure the amount of information created each year, thus we have translated the vast array of information formats and media to a single standard – terabytes. Terabytes used as a common standard of measurement of the amount of new information is particularly useful given that most new information is in digital form, and other formats are increasingly giving way to digital form (i.e., digital images replacing film based photographs), or are archived in digital form (i.e., print newspapers also published on the Web).

However, this methodology measures only the volume of information, not the quality of information in a given format or its utility for different purposes, i.e., the relative value of information in an edited book or peer-reviewed journal articles when compared to digital storage of raw data.





                              MACHINE TRANSLATION:



The term machine translation (MT) is normally taken in its restricted and precise meaning of fully automatic translation. However, in this chapter we consider the whole range of tools that may support translation and document production in general, which is especially important when considering the integration of other language processing techniques and resources with MT. We therefore define Machine Translation to include any computer-based process that transforms (or helps a user to transform) written text from one human language into another. We define Fully Automated Machine Translation (FAMT) to be MT performed without the intervention of a human being during the process. Human-Assisted Machine Translation (HAMT) is the style of translation in which a computer system does most of the translation, appealing in case of difficulty to a (mono- or bilingual) human for help. Machine-Aided Translation (MAT) is the style of translation in which a human does most of the work but uses one or more computer systems, mainly as resources such as dictionaries and spelling checkers, as assistants.

Traditionally, two very different classes of MT have been identified. Assimilation refers to the class of translation in which an individual or organization wants to gather material written by others in a variety of languages and convert it all into his or her own language. Dissemination refers to the class in which an individual or organization wants to broadcast his or her own material, written in one language, in a variety of languages to the world. A third class of translation has also recently become evident. Communication refers to the class in which two or more individuals are in more or less immediate interaction, typically via email or otherwise online, with an MT system mediating between them. Each class of translation has very different features, is best supported by different underlying technology, and is to be evaluated according to somewhat different criteria.

Machine Translation




John Hutchins

(University of East Anglia)

[Paper presented at the MT Summit, Luxembourg, 1995]

Early pragmatism

When machine translation was in its infancy, in the early 1950s, research was necessarily modest in its aims. It was constrained by the limitations of hardware, in particular inadequate memories and slow access to storage, and the unavailability of high-level programming languages. Even more crucially it could look to no assistance from the language experts. Syntax was a relatively neglected area of linguistic study and semantics was virtually ignored in the United States thanks to the behaviourist inclinations of the leading scholars. It was therefore not surprising that the first MT researchers turned initially to crude dictionary based approaches, i.e. predominantly word-for-word translation, and to the application of statistical methods. Warren Weaver himself, in the 1949 memorandum which effectively launched MT research, had advocated statistical methods alongside cryptography, which was soon recognised as being irrelevant, and more futuristically the investigation of universal interlinguas.
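The 'crude dictionary-based, word-for-word' approach that Hutchins describes can be illustrated with a toy sketch; the Spanish-English dictionary entries are invented for the example. Each word is looked up independently, with no syntax or semantics, which is roughly what those first systems did.

```python
# Invented Spanish-English dictionary entries for the illustration.
ES_EN = {"la": "the", "casa": "house", "es": "is", "grande": "big"}

def word_for_word(sentence, dictionary):
    """Translate each word independently, with no syntax or semantics;
    unknown words simply pass through, as in the earliest systems."""
    return " ".join(dictionary.get(w, w) for w in sentence.lower().split())

print(word_for_word("la casa es grande", ES_EN))  # -> the house is big
```

Even this trivial example shows why output quality was low: word order, agreement and word-sense ambiguity are all ignored, which is exactly why the early researchers planned for human pre- and post-editing.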

With such limitations, early researchers set out with modest aims. They knew that whatever systems they could develop would produce low quality results, and consequently they suggested the major involvement of human translators both for the pre-editing of input texts and for the post-editing of the output, and they proposed the development of controlled languages and the restriction of systems to specific domains. Above all, they proposed that MT systems could progress by the cyclical improvement of imperfect approaches, i.e. an application of the engineering feedback mechanism with which they were familiar. In this atmosphere the first demonstration systems were developed, notably the collaboration between IBM and the Georgetown University in 1954.

The outcome of these early demonstrations was, however, that the general public and potential sponsors of MT research were led to believe that good quality output was achievable within a matter of a few years. The belief was strengthened by the emergence of greatly improved computer hardware, the first programming languages, and above all by developments in syntactic analysis. It was not clear which methods would prove most successful in the long run, so US agencies were encouraged to support a large number of projects. Enthusiasm for MT spread throughout the world, and in this period from the mid 1950s to the mid 1960s many of the approaches which are still current were first put forward.




As I have tried to show in this first report, human language technologies are very important. Language is the main way we communicate with each other, and human language technologies facilitate our communication; they make it easier. We have to take into account that "human language technologies" is a broad term that covers many other terms, like "language engineering", "speech technology" and so on; there is no clear or simple way to explain what human language technologies really are. In order to explain them, you have to give many definitions of the technologies in general and try to be precise about what you want to explain. With my report I have tried to give you a general idea of what human language technologies can be, but as I said before, it would be a hard, complicated and endless work to try to analyse, little by little, everything that human language technologies are.