HUMAN LANGUAGE TECHNOLOGIES AND THE INFORMATION SOCIETY

ABSTRACT:

This project is an assignment for the subject "English Language and New Technologies" at Deusto University. In this project we are going to talk about Human Language Technologies (HLT) and their relationship with what is called the Information Society. This is an important issue due to the incredible advances that computers have undergone in the last few years, as well as the impact that these changes have had on what are called Language Technologies.

 

INTRODUCTION

Throughout this last semester we have worked on several questionnaires. These questionnaires focused on topics such as Human Language Technologies, Language Engineering, Speech Technologies and Machine Translation, as well as many other topics related to the capacity of computers to analyse and translate texts written in different languages.

The ultimate goal of Human Language Technologies is to be able to write a question in one's mother tongue and get an answer in that same language.

This project has been written following this questionnaire format. Therefore, what we are going to find is a number of questions related to the chosen topic, which seek to answer, in the best way possible, what could be considered the key questions on this topic.

The first few questions seek to explain and clarify what could be considered the most important concepts in the report. The next questions then try to show the relationship between these concepts and how they interact. Last, we will find a conclusion, which gathers all that we have learned throughout this semester, followed by an index of references that shows the web sites we have visited to accomplish this task.

 

 

BODY

First of all, we are going to proceed to give an explanation of what Human Language Technologies and the Information Society are, as well as a few other terms that can be useful for understanding this project.

What are Human Language Technologies?

The overall objective of HLT is to support e-business in a global context and to promote a human centred infostructure ensuring equal access and usage opportunities for all. This is to be achieved by developing multilingual technologies and demonstrating exemplary applications providing features and functions that are critical for the realisation of a truly user friendly Information Society. Projects address generic and applied RTD from a multi- and cross-lingual perspective, and undertake to demonstrate how language specific solutions can be transferred to and adapted for other languages.

While elements of the three initial HLT action lines - Multilinguality, Natural Interactivity and Crosslingual Information Management are still present, there has been periodic re-assessment and tuning of them to emerging trends and changes in the surrounding economic, social, and technological environment. The trials and best practice in multilingual e-service and e-commerce action line was introduced in the IST 2000 work programme (IST2000) to stimulate new forms of partnership between technology providers, system integrators and users through trials and best practice actions addressing end-to-end multi-language platforms and solutions for e-service and e-commerce. The fifth IST call for proposals covered this action line.

HLT features in three action lines of the IST 2001 work programme. The Multilingual Web action line is a refocused version of the IST2000 crosslingual information management action line programme and is geared towards multilingual web content, translation and cross-media delivery.

The Natural and multilingual interactivity action line results from a merger and consolidation of the multilinguality and natural interactivity action lines in IST2000 and is geared towards intelligent information appliances and advanced communication services. It is covered in the sixth IST call for proposals.

The Key Action III specific support measures action line is a refocused version of the previous action line - working groups, and dissemination and awareness actions - in IST2000.

http://www.hltcentral.org/page-615.shtml

 

What is the Information Society?

The term Information Society has been around for a long time now and, indeed, has become something of a cliché. The notion of the coming Information Society reminds me of the idea of the Sydney 2000 Olympics and the way it shimmers in the distance. We look towards the Olympics and resolve to prepare hard for it. We must rapidly transform ourselves, our city, our demeanour to be ready and worthy. Time is of the essence in making ourselves ready for the challenge. There is a certain breathlessness in all of this rhetoric.

The same can be said of many of the documents and writings on the Information Society. The recent Department of Industry, Science and Tourism's Goldsworthy report on the Global Information Economy urges "...time is short, and the need for action is urgent. Government must grasp the challenge now." (Department of Industry, Science and Tourism, 1997:7). But when you push past the rhetoric and the sense of urgency being conveyed, what is the reality of the Information Society? What, in particular, do policy makers think it is?

In the European Union, the concept of the Information Society has been evolving strongly over the past few years building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies.

Closer to home it is instructive to look at just a few policy (or would-be policy) documents to see the views of the Information Society dominant here. The Goldsworthy report sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world" (Department of Industry, Science and Tourism, 1997). In short, Goldsworthy's idea of the Information Society is entirely an economic one. At a broader level Barry Jones, the author of the House of Representatives Standing Committee's 1991 report 'Australia as an Information Society', sets out a definition of the Information Society which sees it as simply "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology" (Australia as an Information Society: Grasping New Paradigms, 1991).

These are just a few examples of ideas underpinning information policy drives in the developed world where the concept is accepted almost without challenge, and there is an inherent belief that like the Olympics, the Information Society is real - or will be very soon if only we can get ourselves organised properly. Some claim, of course, that the Information Society is here already and not just on its way. But one way or the other "it" exists and is a "good thing". By and large, national and regional Information Society documents do not question the belief that the Information Society will bring prosperity and happiness if a few basic safeguards are put in place. Some of the very few notes of serious caution in the practice of information policy have come through the influence of the Scandinavian countries which joined the European Union when the EU was already in full flight with implementing the actions flowing from the Bangemann report.

Interestingly, in recent travels in India I noticed an extraordinary level of hope and trust in that developing country in the potential of information technology to transform India into a modern fully developed economy. The push to develop information and technological infrastructure initiated by Rajiv Gandhi is seen as positive and a necessary step for the goal of a universally prosperous society in India. Effectively there is the same acceptance of the goodness of an Information Society, and the absolute necessity to be one, that is found in the West.

Given this blind faith in the existence and the desirability of an Information Society among diverse nations, it is instructive to look at the theoretical literature which has spawned the idea to see what it claims for the Information Society. The term Information Society has many synonyms: Information Age, Information Revolution, Information Explosion and so on and it is found across a wide spectrum of disciplines. Fortunately the task of unravelling many of these ideas has been accomplished in a masterly way by Frank Webster. He has categorised the variety of concepts of the Information Society, Information Revolution, or whatever, and provided an analysis of five common conceptions of the Information Society (Webster, 1995). 

http://hltcentral.org/page-165.shtml

 

What is Natural Language Processing?

Natural Language Processing (NLP) technology allows a computer to understand the main linguistic concepts within a question or solution. Its goal is to design and build computers that analyse, understand and generate language that humans use naturally.

Natural language interfaces enable the user to communicate with the computer in German, English or another human language. Some applications of such interfaces are database queries, information retrieval from texts and so-called expert systems. Current advances in recognition of spoken language improve the usability of many types of natural language systems. Communication with computers using spoken language will have a lasting impact upon the work environment, opening up completely new areas of application for information technology.

http://nlplab.cn/nlp.html
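To make the idea of a natural language interface concrete, here is a minimal sketch of a database query front end. The product table, the question patterns and the wording of the answers are all invented for illustration; they stand in for a real database and a real grammar:

```python
import re

# Toy product "database": each row is a dict, standing in for a real table.
PRODUCTS = [
    {"name": "laptop", "price": 900},
    {"name": "phone", "price": 400},
    {"name": "tablet", "price": 250},
]

def answer(question: str) -> str:
    """Map a narrow class of English questions onto database lookups."""
    q = question.lower()
    # Pattern 1: "how much does/is a <product> ..."
    m = re.search(r"how much (?:does|is) (?:a |the )?(\w+)", q)
    if m:
        for row in PRODUCTS:
            if row["name"] == m.group(1):
                return f"A {row['name']} costs {row['price']} euros."
        return "I do not know that product."
    # Pattern 2: superlative query over the price column.
    if "cheapest" in q:
        row = min(PRODUCTS, key=lambda r: r["price"])
        return f"The cheapest product is the {row['name']}."
    return "Sorry, I cannot answer that."

print(answer("How much does a laptop cost?"))
print(answer("Which is the cheapest product?"))
```

Real natural language interfaces replace the regular expressions with full grammatical and semantic analysis, but the overall shape - analyse the question, map it to a formal query, generate an answer - is the same.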

 

Language Technology, Language Engineering and Computational Linguistics: similarities and differences.

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

Theoretical CL takes up issues in theoretical linguistics and cognitive science. It deals with formal theories about the linguistic knowledge that a human needs for generating and understanding language. Today these theories have reached a degree of complexity that can only be managed by employing computers. Computational linguists develop formal models simulating aspects of the human language faculty and implement them as computer programs. These programs constitute the basis for the evaluation and further development of the theories. In addition to linguistic theories, findings from cognitive psychology play a major role in simulating linguistic competence. Within psychology, it is mainly the area of psycholinguistics that examines the cognitive processes constituting human language use. The relevance of computational modelling for psycholinguistic research is reflected in the emergence of a new subdiscipline: computational psycholinguistics.

Applied CL focuses on the practical outcome of modelling human language use. The methods, techniques, tools and applications in this area are often subsumed under the term language engineering or (human) language technology. Although existing CL systems are far from achieving human ability, they have numerous possible applications. The goal is to create software products that have some knowledge of human language. Such products are going to change our lives. They are urgently needed for improving human-machine interaction, since the main obstacle in the interaction between human and computer is a communication problem. Today's computers do not understand our language, while computer languages are difficult to learn and do not correspond to the structure of human thought. Even if the language the machine understands and its domain of discourse are very restricted, the use of human language can increase the acceptance of software and the productivity of its users.

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_what_cl.htm

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#wile

Language Technologies are information technologies that are specialised for dealing with the most complex information medium in our world: human language. Therefore these technologies are also often subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Whereas speech is the oldest and most natural mode of language communication, complex information and most human knowledge is maintained and transmitted in written texts. Speech and text technologies process or produce language in these two modes of realisation. But language also has aspects that are shared between speech and text, such as dictionaries, most of grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology has to create formal representation systems that link language to concepts and tasks in the real world. This provides the interface to the fast growing area of knowledge technologies.

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

 

Does the notion of Information Society have any relationship to Human Language?

The Information Age

The development and convergence of computer and telecommunication technologies has led to a revolution in the way that we work, communicate with each other, buy goods and use services, and even in the way we entertain and educate ourselves. One of the results of this revolution is that large volumes of information will increasingly be held in a form which is more natural for human users than the strictly formatted, structured data typical of computer systems of the past. Information presented in visual images, as sound, and in natural language, either as text or speech, will become the norm.

We all deal with computer systems and services, either directly or indirectly, every day of our lives. This is the information age and we are a society in which information is vital to economic, social, and political success as well as to our quality of life. The changes of the last two decades may have seemed revolutionary but, in reality, we are only on the threshold of this new age. There are still many new ways in which the application of telematics and the use of language technology will benefit our way of life, from interactive entertainment to lifelong learning. Although these changes will bring great benefits, it is important that we anticipate difficulties which may arise, and develop ways to overcome them. Examples of such problems are: access to much of the information may be available only to the computer literate and those who understand English; and a surfeit of information from which it is impossible to identify and select what is really wanted. Language Engineering can solve these problems.

Information universally available

The language technologies will make an indispensable contribution to the success of this information revolution. The availability and usability of new telematics services will depend on developments in language engineering. Speech recognition will become a standard computer function providing us with the facility to talk to a range of devices, from our cars to our home computers, and to do so in our native language. In turn, these devices will present us with information, at least in part, by generating speech. Multi-lingual services will also be developed in many areas. In time, material provided by information services will be generated automatically in different languages. This will increase the availability of information to the general public throughout Europe. Initially, multi-lingual services will become available, based on basic data, such as weather forecasts and details of job vacancies, from which text can be generated in any language. Eventually, however, we can expect to see automated translation as an everyday part of information services so that we can both request and receive all sorts of information in our own language.

Home and Abroad

Language Engineering will also help in the way that we deal with associates abroad. Although the development of electronic commerce depends very much on the adoption of interchange standards for communications and business transactions, the use of natural language will continue, precisely because it is natural. However, systems to generate business letters and other forms of communication in foreign languages will ease and greatly enhance communication. Automated translation combined with the management of documentation, including technical manuals and user handbooks, will help to improve the quality of service in a global marketplace. Export business will be handled cost effectively with the same high level of customer care that is provided in the home market.

How can we cope with so much information?

One of the fundamental components of Language Engineering is the understanding of language by the computer. This is the basis of speech operated control systems and of translation, for example. It is also the way in which we can prevent ourselves from being overwhelmed with information, unable to collate, analyse, and select what we need. However, if information services are capable of understanding our requests, and can scan and select from the information base with real understanding, not only will the problem of information overload be solved but also no significant information will be missed. Language Engineering will deliver the right information at the right time.

http://sirio.deusto.es/abaitua/konzeptu/nlp/echo/infoage.html

Is there any concern in Europe with Human Language Technologies?

In the European Union, the concept of the Information Society has been evolving strongly over the past few years building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies.


http://www.gu.edu.au/centre/cmp/Papers_97/Browne_M.html

 



What is the role of HLTCentral.org?

HLTCentral - Gateway to Speech & Language Technology Opportunities on the Web. The HLTCentral web site was established as an online information resource on human language technologies and related topics of interest to the HLT community at large. It covers news, R&D, technological and business developments in the field of speech, language, multilinguality, automatic translation, localisation and related areas. Its coverage of HLT news and developments is worldwide - with a unique European perspective.

Two EU funded projects, ELSNET and EUROMAP, are behind the development of HLTCentral. EUROMAP ("Facilitating the path to market for language and speech technologies in Europe") aims to provide awareness, bridge-building and market-enabling services for accelerating the rate of technology transfer and market take-up of the results of European HLT RTD projects. ELSNET ("The European Network of Excellence in Human Language Technologies") aims to bring together the key players in language and speech technology, both in industry and in academia, and to encourage interdisciplinary co-operation through a variety of events and services.

http://www.hltcentral.org/page-615.shtml 

What are the main techniques used in Language Engineering?

There are many techniques used in Language Engineering; some of these are described below:

Speaker Identification and Verification A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying the speaker reliably despite temporary changes (such as those caused by illness).
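A toy illustration of the verification step: an enrolled voiceprint is compared with a fresh sample as feature vectors, and the speaker is accepted only if the vectors are similar enough. The four-number "voiceprints" and the threshold are invented; real systems extract far richer acoustic features:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def verify(enrolled_print, sample, threshold=0.95):
    """Accept the claimed identity only if the new sample is close enough."""
    return cosine(enrolled_print, sample) >= threshold

enrolled = [12.1, 3.4, 7.8, 0.9]   # stored voiceprint for the claimed user
same_speaker = [12.0, 3.5, 7.6, 1.0]
other_speaker = [2.0, 9.5, 1.1, 6.4]

print(verify(enrolled, same_speaker))   # accepted
print(verify(enrolled, other_speaker))  # rejected
```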

Speech Recognition The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose. There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.
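The statistical flavour of this can be shown with a deliberately tiny recogniser: a lexicon maps words to phoneme sequences, each time step of the "acoustic" input gives a probability for each candidate phoneme, and the word with the highest overall likelihood wins. The lexicon, pronunciations and probabilities are all invented for illustration; real recognisers use hidden Markov or neural acoustic models trained on large corpora:

```python
# Toy lexicon: each word as a phoneme sequence (invented pronunciations).
LEXICON = {"cat": ["k", "ae", "t"], "bat": ["b", "ae", "t"], "cap": ["k", "ae", "p"]}

def recognise(frames):
    """frames: one dict per time step, mapping phoneme -> acoustic probability.
    Returns the lexicon word with the highest total likelihood."""
    best_word, best_score = None, 0.0
    for word, phonemes in LEXICON.items():
        if len(phonemes) != len(frames):
            continue  # this toy model only scores words of matching length
        score = 1.0
        for ph, frame in zip(phonemes, frames):
            score *= frame.get(ph, 0.001)  # small floor for unseen phonemes
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Noisy observation: the first sound is ambiguous between /k/ and /b/.
frames = [{"k": 0.6, "b": 0.4}, {"ae": 0.9}, {"t": 0.7, "p": 0.3}]
print(recognise(frames))  # "cat" wins with likelihood 0.6 * 0.9 * 0.7
```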

Character and Document Image Recognition Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition: recognition of printed images, referred to as Optical Character Recognition (OCR), and recognition of handwriting, usually known as Intelligent Character Recognition (ICR). OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences. Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.
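The lexicon-based word recognition mentioned above can be sketched by snapping a noisy character-recognition output onto the closest word in a lexicon; the lexicon entries and the misread words below are invented for illustration:

```python
import difflib

# Invented lexicon; a real ICR system would use a full dictionary plus
# statistical information about word sequences.
LEXICON = ["language", "engineering", "recognition", "character", "handwriting"]

def correct(raw_word, lexicon=LEXICON):
    """Snap a noisy OCR/ICR output onto the closest lexicon entry, if any."""
    matches = difflib.get_close_matches(raw_word.lower(), lexicon, n=1, cutoff=0.6)
    return matches[0] if matches else raw_word

print(correct("recogmtion"))  # 'ni' misread as 'm' -> "recognition"
print(correct("chancter"))    # -> "character"
print(correct("zzzzz"))       # nothing close enough: returned unchanged
```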

Natural Language Understanding The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels. Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge. Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information. Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need to be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.
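A minimal sketch of shallow (partial) analysis: given part-of-speech-tagged input, a chunker groups determiner-adjective-noun sequences into noun phrases without attempting any deeper understanding. The simplified tag set and the sentence are invented for illustration:

```python
def chunk_nps(tagged):
    """Group (optional determiner, any adjectives, noun) runs into NP chunks.

    tagged: list of (word, tag) pairs with tags DET, ADJ, NOUN, VERB, ...
    """
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if j < len(tagged) and tagged[j][1] == "DET":
            j += 1
        while j < len(tagged) and tagged[j][1] == "ADJ":
            j += 1
        if j < len(tagged) and tagged[j][1] == "NOUN":
            chunks.append(" ".join(w for w, _ in tagged[i:j + 1]))
            i = j + 1
        else:
            i += 1  # no noun phrase starts here; move on
    return chunks

sentence = [("the", "DET"), ("shallow", "ADJ"), ("parser", "NOUN"),
            ("found", "VERB"), ("an", "DET"), ("interesting", "ADJ"),
            ("passage", "NOUN")]
print(chunk_nps(sentence))  # ['the shallow parser', 'an interesting passage']
```

The chunks found this way are exactly the "interesting parts" that a deeper semantic analysis could then examine within a limited domain.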

Natural Language Generation A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system. 
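A sketch of template-based generation from a semantic representation: a language-neutral frame is mapped onto a surface string through per-language templates. The frame fields and templates are invented for illustration, and a real system would also localise values such as day names:

```python
# Per-language surface templates for one kind of semantic frame (a weather
# forecast); both the frame fields and the wording are invented.
TEMPLATES = {
    "en": "The temperature in {city} will reach {temp} degrees on {day}.",
    "de": "Die Temperatur in {city} erreicht am {day} {temp} Grad.",
}

def generate(frame, language="en"):
    """Map a language-independent frame to a surface string."""
    return TEMPLATES[language].format(**frame)

forecast = {"city": "Bilbao", "temp": 23, "day": "Friday"}
print(generate(forecast, "en"))
print(generate(forecast, "de"))
```

This is the "basic data from which text can be generated in any language" idea mentioned earlier: one underlying representation, many surface realisations.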

Speech Generation Speech is generated from filled templates, by playing 'canned' recordings or by concatenating units of speech (phonemes, words) together. Generated speech has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response. Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules. Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.
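The concatenation approach can be sketched as follows, with placeholder lists of samples standing in for recorded speech units; real systems also smooth the joins and adjust intensity, duration and stress:

```python
# Library of recorded units: each word maps to a placeholder list of audio
# samples (invented values standing in for real recordings).
UNITS = {
    "your": [0.1, 0.2], "flight": [0.3, 0.1, 0.2],
    "departs": [0.2, 0.2], "at": [0.1], "nine": [0.4, 0.3],
}

def synthesise(words):
    """Concatenate the recorded unit for each word into one utterance."""
    samples = []
    for w in words:
        if w not in UNITS:
            raise KeyError(f"no recorded unit for {w!r}")
        samples.extend(UNITS[w])
    return samples

utterance = synthesise(["your", "flight", "departs", "at", "nine"])
print(len(utterance))  # 10 samples in the concatenated utterance
```

Filling a template such as "your flight departs at {time}" and synthesising the result is exactly the kind of simple generation used in automated telephone dialogues.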

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t




Which language resources are essential components of Language Engineering?

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding. The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA).

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t 

Here we have precise definitions of a few important terms concerning this topic: natural language processing, translator's workbench, shallow parser, formalism, speech recognition, text alignment, authoring tools, controlled language and domain.

Natural language processing is a term in use since the 1980s to define a class of software systems which handle text intelligently.

Translator's workbench is a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc.

Shallow parser is software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective.

Formalism is a means to represent the rules used in the establishment of a model of linguistic knowledge.

Speech Recognition: The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora), and vast quantities of speech have been collected, and continue to be collected, for this purpose.

There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by pauses. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium (the telephone line, for example). Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically. 
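The statistical scoring idea behind such recognisers can be illustrated with a toy example; the lexicon, phoneme symbols and probability tables below are all invented, and a real system would use trained HMMs rather than one probability table per observed sound.

```python
import math

# A toy statistical word recogniser: each word is modelled as a phoneme
# sequence, and each observation "frame" is an assumed table of phoneme
# probabilities. The recogniser picks the word whose phoneme sequence
# best explains the observed frames.
LEXICON = {"cat": ["k", "ae", "t"], "cap": ["k", "ae", "p"]}

def score(word, frames):
    """Log-probability of a word given one probability table per frame."""
    phonemes = LEXICON[word]
    if len(phonemes) != len(frames):
        return float("-inf")
    return sum(math.log(frame.get(p, 1e-9)) for p, frame in zip(phonemes, frames))

def recognise(frames):
    return max(LEXICON, key=lambda w: score(w, frames))

# Three observation frames; the last frame favours "t" over "p".
frames = [{"k": 0.9}, {"ae": 0.8}, {"t": 0.7, "p": 0.2}]
```

Even this sketch shows why training corpora matter: the quality of the probability tables determines which word wins.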

Text alignment is the process of aligning different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions.
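A minimal sketch of the idea, assuming sentence-split texts with the same number of sentences; real length-based aligners (in the spirit of Gale and Church) also handle one-to-two and two-to-one pairings.

```python
# A toy length-based sentence aligner: pair sentences one-to-one and
# flag pairs whose character-length ratio looks implausible. The ratio
# threshold is an illustrative assumption.
def align(src_sentences, tgt_sentences, max_ratio=2.0):
    assert len(src_sentences) == len(tgt_sentences)
    pairs = []
    for s, t in zip(src_sentences, tgt_sentences):
        ratio = max(len(s), len(t)) / max(1, min(len(s), len(t)))
        pairs.append((s, t, ratio <= max_ratio))  # True = plausible pair
    return pairs

en = ["The system is ready.", "Press the key."]
es = ["El sistema está listo.", "Pulse la tecla."]
```

Aligned pairs like these are the raw material from which equivalent terms and phrases can then be extracted.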

Authoring tools are facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents 

Controlled language (also artificial language) is language which has been designed to restrict the number of words and the structure of the language used, in order to make language processing easier; typical users of controlled language work in areas where precision of language and speed of response are critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.
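A controlled-language vocabulary check can be sketched in a few lines; the approved word list here is a made-up example, not any real controlled language.

```python
# A sketch of a controlled-language check: flag any word that falls
# outside an approved vocabulary. The word list is invented for
# illustration only.
APPROVED = {"cleared", "for", "takeoff", "runway", "two", "seven",
            "hold", "short"}

def check(utterance):
    """Return the words that fall outside the controlled vocabulary."""
    return [w for w in utterance.lower().split() if w not in APPROVED]
```

A real controlled language would also constrain sentence structure, not just vocabulary, but the principle is the same: a smaller language is an easier language to process.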

Domain is usually applied to the area of application of the language enabled software, e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted, so the language resource requirements are effectively limited by limiting the domain of application.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t

What is the state of the art in Speech Technology?

Comments about the state-of-the-art need to be made in the context of specific applications which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress. First, there is the coming of age of the hidden Markov model (HMM). The HMM is powerful in that, with the availability of training data, the parameters of the model can be trained automatically to give optimal performance.
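The scoring side of an HMM can be illustrated with a toy forward-algorithm computation; the two states and all the probabilities below are invented, whereas in a real recogniser they would be estimated automatically from training corpora.

```python
# A toy HMM likelihood computation (the forward algorithm). The states,
# transition, emission and start probabilities are illustrative
# assumptions, not values from any trained system.
states = ["s1", "s2"]
start  = {"s1": 0.6, "s2": 0.4}
trans  = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit   = {"s1": {"a": 0.5, "b": 0.5}, "s2": {"a": 0.1, "b": 0.9}}

def likelihood(observations):
    """P(observations | model), computed with the forward algorithm."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())
```

Training adjusts exactly these tables so that the likelihood of the observed speech is maximised, which is what makes the automatic optimisation mentioned above possible.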

Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing. These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third, progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and had not been very careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13 respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to run many large scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced. In fact, speech recognition systems with reasonable performance can now run in real time using high-end workstations without additional hardware, a feat unimaginable only a few years ago.

http://cslu.cse.ogi.edu/HLTsurvey/ch1node2.html#Chapter1

 

Main differences between speech recognition and speech synthesis.

Speech Synthesis:

This involves turning a string into spoken language that is played through the computer speakers. The complexities of turning words into phonemes, adding appropriate emphasis and translating the result into digital audio are beyond the scope of this paper and are catered for by a TTS engine installed on your machine.

The end result is that the computer talks to the user to save the user having to read some text on the screen.
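The first step of this pipeline, turning words into phonemes, can be sketched with a toy lookup; the tiny pronunciation dictionary is invented, and real TTS engines combine large lexicons with letter-to-sound rules.

```python
# A toy grapheme-to-phoneme step for TTS: look each word up in a small
# (invented) pronunciation dictionary. Unknown words are marked with
# "?" where a real engine would fall back on letter-to-sound rules.
PRONUNCIATIONS = {
    "speech": ["s", "p", "iy", "ch"],
    "works":  ["w", "er", "k", "s"],
}

def to_phonemes(text):
    phonemes = []
    for word in text.lower().split():
        phonemes.extend(PRONUNCIATIONS.get(word, ["?"]))
    return phonemes
```

The remaining stages (adding emphasis and rendering digital audio) are the parts a TTS engine takes care of.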

Speech Recognition:

This involves the computer taking the user's speech and interpreting what has been said. This allows the user to control the computer (or certain aspects of it) by voice, rather than having to use the mouse and keyboard, or alternatively just dictating the contents of a document.

The complex nature of translating the raw audio into phonemes involves a lot of signal processing and is not focused on here. These details are taken care of by an SR engine that will be installed on your machine. SR engines are often called recognisers and these days typically implement continuous speech recognition (older recognisers implemented isolated or discrete speech recognition, where pauses were required between words).

Speech recognition usually means one of two things. The application can understand and follow simple commands that it has been educated about in advance. This is known as command and control (sometimes seen abbreviated as CnC, or simply SR).

Alternatively an application can support dictation (sometimes abbreviated to DSR). Dictation is more complex as the engine has to try and identify arbitrary spoken words, and will need to decide which spelling of similarly sounding words is required. It develops context information based on the preceding and following words to try and help decide. Because this context analysis is not required with Command and Control recognition, CnC is sometimes referred to as context-free recognition.
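The command and control case can be sketched as a simple table lookup, which is why no context analysis is needed; the command set and action names below are invented examples.

```python
# A sketch of command-and-control recognition: the application only has
# to match an utterance against the small set of commands it was
# "educated about" in advance. Anything else would be handled (or
# rejected) by the far harder dictation path.
COMMANDS = {
    ("open", "file"):  "file.open",
    ("close", "file"): "file.close",
    ("save", "file"):  "file.save",
}

def interpret(utterance):
    """Return the action for a known command, or None otherwise."""
    return COMMANDS.get(tuple(utterance.lower().split()))
```

The contrast with dictation is visible here: a fixed table needs no context, while arbitrary words require disambiguating between similar-sounding spellings.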

http://bdn.borland.com/article/0,1410,29580,00.html

 

Speech-to-speech machine translation and projects related to it.

Speech-to-Speech Machine Translation (SSMT) is a multidisciplinary research area that addresses one of the most complex problems in speech and language processing. The challenges posed by SSMT have been the subject of several collaborative research projects across universities and laboratories around the world. Over the last decade SSMT has benefited from advances in speech and language processing as well as from the availability of large multilingual databases. These advances have spurred research on statistical machine translation and on exploiting machine translation for cross-lingual information retrieval. There have also been substantial efforts towards automating and evaluating a variety of metrics that are relevant to SSMT systems.

http://www.ewh.ieee.org/soc/sps/tap/sp_issue/s2smt.html

There are a small number of initiatives that have contributed significantly to the development of this technology. Verbmobil, a project sponsored by the German government, and the European EuTrans project are two worth mentioning.

http://www.hltcentral.org/page-1086.0.shtml

 

 

What is the most convenient way of representing information? Why?

The most convenient way of representing information is through an Information Architecture: a set of models, definitions, rules, and standards that give structure and order to an organization's information so that information needs can be matched with information resources. An Information Architecture defines: what types of information exist in the organization; where the information can be found; who the creators and owners of the information are; and how the information is to be used. An Information Architecture may contain several of the following: a model or representation of the main information entities and processes; a taxonomy or categorization scheme; standards; definitions and interpretations of terms; directories or inventories; resource maps and description frameworks; and designs for developing information systems, products and services. 

http://www.google.com/search?hl=es&ie=UTF-8&oe=UTF-8&q=How+many+words+of+technical+information+are+recorded+every+day%3F+&lr=





In what ways does Language Engineering improve the use of language? 

Language Engineering is a technology which uses our knowledge of language to enhance our application of computer systems: improving the way we interface with them; assimilating, analysing, selecting, using, and presenting information more effectively; and providing human language generation and translation facilities.

New opportunities are becoming available to change the way we do many things, to make them easier and more effective by exploiting our developing knowledge of language. When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a variety of languages, we shall all have easier access to the benefits of a wide range of information and communications services, as well as the facility to carry out business transactions remotely, over the telephone or other telematics services.

When a machine understands human language, translates between different languages, and generates speech as well as printed output, we shall have available an enormously powerful tool to help us in many areas of our lives. When a machine can help us quickly to understand each other better, this will enable us to co-operate and collaborate more effectively both in business and in government. The success of Language Engineering will be the achievement of all these possibilities. Already some of these things can be done, although they need to be developed further. The pace of advance is accelerating and we shall see many achievements over the next few years. 

http://sirio.deusto.es/abaitua/konzeptu/nlp/langeng.htm

 

 

CONCLUSION:

From this report we can deduce that the main aim of what we call Human Language Technologies is to make the interaction between human language and computers easier. Information is more and more necessary in the world we live in. It has become a major demand in our society, and its treatment is now a basic necessity. With the enormous amounts of information being produced day after day, it is easy to understand the great efforts invested in compiling and classifying all this information.

The techniques used for processing all this information have developed immensely in a short span of time. They have tried to break through the barrier that language sometimes is. Nevertheless, they still have not evolved enough to be able to exclude the human factor. The human being is still necessary for interpretation and translation tasks, and unless these technologies evolve very rapidly, the human being will still be indispensable in a few years' time.

For all these reasons, plenty of money and time should be invested in Human Language Technologies. This way, the computer would become an easier tool to use, especially for those who are not so familiar with computers, and a way of making the language problem disappear, making this a smaller, well-connected, better world.

 

 

REFERENCES:

Most of the references have been found through the links we were given during the course and provided through the university's web site. They have been organised according to the order of the questions.

http://www.hltcentral.org/page-615.shtml

http://hltcentral.org/page-165.shtml

http://nlplab.cn/nlp.html

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

http://sirio.deusto.es/abaitua/konzeptu/nlp/echo/infoage.html

http://www.gu.edu.au/centre/cmp/Papers_97/Browne_M.html


http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t

http://cslu.cse.ogi.edu/HLTsurvey/ch1node2.html#Chapter1

http://bdn.borland.com/article/0,1410,29580,00.html

http://www.ewh.ieee.org/soc/sps/tap/sp_issue/s2smt.html

http://www.hltcentral.org/page-1086.0.shtml

http://www.google.com/search?hl=es&ie=UTF-8&oe=UTF-8&q=How+many+words+of+technical+information+are+recorded+every+day%3F+&lr=

http://sirio.deusto.es/abaitua/konzeptu/nlp/langeng.htm