By Idoia Martínez

English Filology- 2º Year



This is the first report we have been asked to formulate for the subject of English and New Technologies. This first report deals with the topic of language engineeering and its applications. in this report I will try to point out the most important features of language engineering nowadays.


This report is a brief and clear approach to the field of language engineering. It covers several pionts: first I present a brief definition of what language engineering is. Then, I try to show how language engineering works and how it relates with language. I also present the different applications that language engineering has nowadays and how can it help in the different fields of life.

As a second part to the body of the report I included some of the questions that appeared in the questionares that Professor abaitua proposed in class. In the answer to those questions we can find which are some of the elements present in language engineering or some of the techniques used.


The aim of language engineering is to facilitate the use of telematics applications and to increase the possibilities for communication in and between European languages.

Language Engineering provides ways in which we can extend and improve our use of language to make it a more effective tool. It is based on a vast amount of knowledge about language and the way it works, which has been accumulated through research. It uses language resources, such as electronic dictionaries and grammars, terminology banks and corpora, which have been developed over time. The research tells us what we need to know about language and develops the techniques needed to understand and manipulate it. The resources represent the knowledge base needed to recognise, validate, understand, and manipulate language using the power of computers. By applying this knowledge of language we can develop new ways to help solve problems across the political, social, and economic spectrum.

Language Engineering is a technology which uses our knowledge of language to enhance our application of computer systems:

New opportunities are becoming available to change the way we do many things, to make them easier and more effective by exploiting our developing knowledge of language.

When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a variety of languages, we shall all have easier access to the benefits of a wide range of information and communications services, as well as the facility to carry out business transactions remotely, over the telephone or other telematics services.

When a machine understands human language, translates between different languages, and generates speech as well as printed output, we shall have available an enormously powerful tool to help us in many areas of our lives.

Language technologies can be applied to a wide range of problems in business and administration to produce better, more effective solutions. They can also be used in education, to help the disabled, and to bring new services both to organisations and to consumers. There are a number of areas where the impact is significant: competing in a global market, providing information for business, administration and consumers, enabling effective communications, ensuring easier accessibility and participation, etc.


Which are the main techniques used in Language Engineering?

Speaker Identification and Verification

 The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying reliably despite temporary changes (such as caused by illness).

Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input.

Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition:

Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.

Natural Language Understanding

Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge.

Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.

Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.

Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.

Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules.

Informaton taken from HLTCentral:

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding.

The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA).


A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.

Specialist Lexicons

There are a number of special cases which are usually researched and produced separately from general purpose lexicons:

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, or person, or maybe animals. They take on a special significance in many applications, however, where the name is key to the application such as in a voice operated navigation system, a holiday reservations system, or railway timetable information system, based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.


A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).


A corpus is a body of language, either text or speech, which provides the basis for:

There are national corpora of hundreds of millions of words but there are also corpora which are constructed for particular purposes. For example, a corpus could comprise recordings of car drivers speaking to a simulation of a control system, which recognises spoken commands, which is then used to help establish the user requirements for a voice operated control system for the market.

Source: HLTCentral:



As the comclusion to this paper we can say that although there have been a lot of research in this field, language engineering is still a field in which projects and groups are working. Language engineering as we can see has many posible aplications and will help very much in several different field of life. 

Apart from having several applications language engineering also has several complications to face. the huge number of words and the many different languages make experts in this field have to face a huge task. There are many projects working in this field and a proof of this is the number of web pages related to language engineering present on the internet.

Body information taken from: