Language Engineering

Report A by Patricia Zaldua Azkuenaga

Abstract:

This report is a review of Language Engineering and its influence on human language. It focuses on the ways Language Engineering has changed our use of language and discusses the techniques and language resources that have contributed to the development of this new technology. As these techniques still need to be developed in more depth, the report analyses some of the problems that still exist and presents the results that the development of Language Engineering could bring in the future. It shows in what ways Language Engineering improves the interaction between humans and computers and which communication problems could remain.

Introduction:

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret and generate human language in all its forms. It provides ways in which we can extend and improve our use of language to make it a more effective tool. It is based on a vast amount of knowledge about language and the way it works, and it draws on a set of techniques and language resources. It is a technology which uses our knowledge of language to enhance our application of computer systems: improving the way we interface with those systems; assimilating, analysing, selecting, using and presenting information more effectively; and providing human language generation and translation facilities. The success of Language Engineering will be the achievement of all these possibilities. Some of them can already be done, but they need to be developed further.

In this paper I will first talk about human language (the natural means of human communication and the most effective way we have to communicate with each other) and the ways in which Language Engineering changes our use of language.

I will also talk about its main language resources. At the same time, I will present the basic processes of Language Engineering (how it works) and its main techniques. I will mention some of the problems with these techniques that still need to be overcome.

So, within new technologies, I will focus my review on Language Engineering. I will explain what it is, how it works, which techniques and language resources are used to apply it to human language, and which problems still exist in its application.

Objectives:

- Showing what Language Engineering is and in which ways it changes our use of language.

- Selecting the main language resources and techniques of Language Engineering and some of the problems that still exist.

- Presenting some of the results that the development of Language Engineering could bring.

Structure:

  1. A short introduction about human language
  2. Explanation of language engineering
    1. The way it changes our use of language
    2. The main language resources
    3. Some basic processes
    4. The main techniques and some of their problems
    5. Some of the results in the future of the development of language engineering

Methodology:

I have not followed the questions on the web to do this review. Instead, I have chosen a topic that we have seen in class (Language Engineering) and read most of the documents available on the website of the subject "English Language and New Technologies" to look for information about it. I have followed the copy-and-paste process that we practised in class. However, I have not copied everything exactly as it appears on the web, only the information I thought would be useful.

Language Engineering

1. Human language

Language is the natural means of human communication, the most effective way we have to communicate with each other. For most of us language is fundamental to all aspects of our life. However, between humans, understanding is usually limited to those groups who share a common language. In this respect language can sometimes be seen as much a barrier to communication as an aid.

2. What is Language Engineering?

In response to this communication barrier, a change is taking place which will revolutionise our use of language and greatly enhance the value of language in every aspect of communication. This change is the result of developments in Language Engineering. Language Engineering is a technology which uses our knowledge of language to enhance our application of computer systems: improving the way we interface with them; assimilating, analysing, selecting, using, and presenting information more effectively; and providing human language generation and translation facilities. It provides ways in which we can extend and improve our use of language to make it a more effective tool.

2.1 How does it change the way we use our language?

Thanks to Language Engineering new opportunities are becoming available to change the way we do many things, to make them easier and more effective by exploiting our developing knowledge of language. When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a variety of languages, we shall all have easier access to the benefits of a wide range of information and communications services, as well as the facility to carry out business transactions remotely, over the telephone or other telematics services. When a machine understands human language, translates between different languages, and generates speech as well as printed output, we shall have available an enormously powerful tool to help us in many areas of our lives. When a machine can help us quickly to understand each other better, this will enable us to co-operate and collaborate more effectively both in business and in government. The success of Language Engineering will be the achievement of all these possibilities. Already some of these things can be done, although they need to be developed further. The pace of advance is accelerating and we shall see many achievements over the next few years.

Language Engineering uses language resources, such as electronic dictionaries and grammars, terminology banks and corpora, which have been developed over time. These resources represent the knowledge base needed to recognise, validate, understand, and manipulate language using the power of computers. By applying this knowledge of language we can develop new ways to help solve problems across the political, social, and economic spectrum.

2.2 Main language resources:

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding. These resources are produced according to standard formats and protocols, and they are being made available through the European Language Resources Association (ELRA).

The main language resources are:

Lexicons: A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), and the meaning of the word in different textual contexts.

Specialist Lexicons: There are a number of special cases which are usually researched and produced separately from general purpose lexicons:

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that names can be recognised within their context as places, objects, persons, or perhaps animals. They take on a special significance in applications where the name is key, such as a voice operated navigation system, a holiday reservations system, or a railway timetable information system based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.

Grammars: A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).

Corpora: A corpus is a body of language, either text or speech. There are national corpora of hundreds of millions of words, but there are also corpora which are constructed for particular purposes.
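
As a rough illustration of the resources listed above, the following minimal Python sketch represents a lexicon, a wordnet and a corpus as plain data structures. All the entries, field names and helper functions are hypothetical and invented for this example; real resources, such as those made available through ELRA, are far larger and follow standard formats.

```python
from collections import Counter

# Hypothetical toy lexicon: each word maps to knowledge about that word.
LEXICON = {
    "train": {"pos": ["noun", "verb"], "phonology": "/treɪn/"},
    "ticket": {"pos": ["noun"], "phonology": "/ˈtɪkɪt/"},
}

# Hypothetical toy wordnet: relationships between words.
WORDNET = {
    "big": {"synonyms": {"large"}, "antonyms": {"small"}},
    "small": {"synonyms": {"little"}, "antonyms": {"big"}},
}

# Hypothetical toy corpus: a (very small) body of text.
CORPUS = "the train leaves at nine and the next train leaves at ten"


def related_words(word, relation):
    """Return the words linked to `word` by `relation` (empty set if unknown)."""
    return WORDNET.get(word, {}).get(relation, set())


def word_frequencies(text):
    """Count how often each whitespace-separated word occurs in a corpus."""
    return Counter(text.lower().split())


def unknown_words(text):
    """List words that have no lexicon entry, in order of first appearance."""
    seen = []
    for word in text.lower().split():
        if word not in LEXICON and word not in seen:
            seen.append(word)
    return seen
```

For instance, `related_words("big", "antonyms")` looks up a relationship in the toy wordnet, while `word_frequencies(CORPUS)` shows the kind of simple statistic that can be drawn from a corpus.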

2.3 Basic Processes:

Taking these language resources into account, the basic processes of language engineering are concerned with:

-entering material into the computer, using speech, printed text or handwriting, or text either keyed in or introduced electronically.

-recognising the language of the material (distinguishing separate words, for example), recording it in symbolic form and validating it.

-building an understanding of the meaning of the material, to the appropriate level for the particular application.

-using this understanding in an application such as transformation (e.g. speech to text), information retrieval, or human language translation.

-generating the medium for presenting the results of the application.

-finally, presenting the results to human users via a display of some kind, a printer or a plotter, a loudspeaker or the telephone.

However, this is only a model configuration. Many other configurations exist because, depending on the technology used, not all of these components are always needed.
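
The sequence of processes described above can be sketched as a small pipeline. This is only a minimal illustration in Python, assuming typed text as input and a railway timetable enquiry as the application; the vocabulary, the timetable and every function name are hypothetical and are not taken from any real system.

```python
# Hypothetical vocabulary used to validate the recognised words.
KNOWN_WORDS = {"when", "does", "the", "next", "train", "to", "bilbao", "leave"}

# Hypothetical timetable used by the application step.
TIMETABLE = {"bilbao": "09:15"}


def recognise(raw_text):
    """Distinguish separate words and flag those missing from the vocabulary."""
    words = [w.strip("?.,!").lower() for w in raw_text.split()]
    words = [w for w in words if w]
    unknown = [w for w in words if w not in KNOWN_WORDS]
    return words, unknown


def understand(words):
    """Build a partial understanding: look for a known destination."""
    for word in words:
        if word in TIMETABLE:
            return {"intent": "departure_time", "destination": word}
    return {"intent": "unknown"}


def apply_understanding(meaning):
    """Use the understanding in an application (here, information retrieval)."""
    if meaning["intent"] == "departure_time":
        return TIMETABLE[meaning["destination"]]
    return None


def generate(meaning, result):
    """Generate the text that will be presented to the user."""
    if result is None:
        return "Sorry, I did not understand the request."
    destination = meaning["destination"].title()
    return f"The next train to {destination} leaves at {result}."


def run_pipeline(raw_text):
    """Chain the steps: recognise, understand, apply, generate."""
    words, _unknown = recognise(raw_text)
    meaning = understand(words)
    result = apply_understanding(meaning)
    return generate(meaning, result)
```

Calling `run_pipeline("When does the next train to Bilbao leave?")` runs every step in order. A different configuration could drop or replace a step, for example by skipping validation or by sending the generated text to a speech generator instead of a display.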

2.4 Main techniques:

The main techniques of language engineering are:

-Speaker Identification and Verification: A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service.

-Speech Recognition: The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. The production of quality statistical phoneme models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose.

-Character and Document Image Recognition: Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition: recognition of printed images, referred to as Optical Character Recognition (OCR), and recognition of handwriting, usually known as Intelligent Character Recognition (ICR).

-Natural Language Understanding: The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process.

-Natural Language Generation: A semantic representation of a text can be used as the basis for generating language.

-Speech Generation: Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response. Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.
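
To make the idea of generating speech from filled templates and concatenated units more concrete, here is a minimal Python sketch. It only selects which 'canned' recordings would be played and in what order; the template, the file names and the function names are all hypothetical, and the sketch ignores intensity, duration and stress, which a real speech generator has to account for.

```python
# Hypothetical library of 'canned' recordings: unit of speech -> audio file.
RECORDINGS = {
    "the": "the.wav",
    "train": "train.wav",
    "leaves": "leaves.wav",
    "at": "at.wav",
    "nine": "nine.wav",
    "ten": "ten.wav",
}

# Hypothetical template with one slot to be filled by the application.
TEMPLATE = "the train leaves at {hour}"


def fill_template(template, **slots):
    """Fill the slots of a template with application data."""
    return template.format(**slots)


def units_to_play(sentence):
    """Return the recordings to concatenate, in order.

    Raises KeyError if a word has no recording in the library.
    """
    return [RECORDINGS[word] for word in sentence.split()]
```

For example, `units_to_play(fill_template(TEMPLATE, hour="nine"))` gives the ordered list of recordings for the whole sentence.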

2.4.1 Some problems of these techniques:

However, there are still some problems that have to be overcome. In speaker identification and verification, the problems include recognising that the speech is not recorded, picking out the voice through noise (either in the environment or in the transfer medium), and identifying the speaker reliably despite temporary changes (such as those caused by illness). In speech recognition, there are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by pauses. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of noise, which can interfere with recognition and comes either from the environment in which the speaker uses the system or from the transmission medium, such as the telephone line.

2.5 The results of the development of Language Engineering:

As we can see, language technologies still need further development to overcome all these problems. Our ability to develop our use of language holds the key to the multi-lingual information society, the European society of the future. New developments in Language Engineering will enable us to:

-access information efficiently, focusing precisely on the information we need, saving time and avoiding information overload.

-talk to our computer systems, at home as well as at work, in our cars and in public places where we need information or assistance.

-teach ourselves other languages and improve our use of our own, at our convenience: in our own time; at our own pace; and in our own place.

-do business efficiently over the telephone by interacting reliably and directly with voice operated computer systems; even instruct our PCs to carry out transactions on our behalf.

-learn more about what is happening around us, locally, nationally and internationally and have a greater influence on decisions affecting our lives.

-operate more effectively internationally, in business, in administration, in political activities and as citizens and consumers.

-provide a wider range of better services to the maximum number of fellow citizens, colleagues and customers.

Conclusion:

After writing this review, I now have at least an idea of how new technologies are applied to human language. I have learned that Language Engineering is the application of knowledge of human language to the development of computer systems, something that I did not know before doing this review.

Thanks to Language Engineering and similar new technologies, the interaction between humans and computers improves, and that makes communication between people who speak different languages easier. However, I have realised that there are still some problems to overcome before complete communication between humans and computers is reached.

In my opinion, if the development of these new techniques continues, they will revolutionise the way we communicate. I think it will be a great achievement for all of us.

References:

http://sirio.deusto.es/abaitua/konzeptu/nlp/langeng.htm

Language Engineering (brochure by HLTCentral)