Language Engineering

Harnessing the Power of Language

Language Technology


Language Today

Language in Action

Language is the natural means of human communication; the most effective way we have to express ourselves to each other. We use language in a host of different ways: to explain complex ideas and concepts; to manage human resources; to negotiate; to persuade; to make our needs known; to express our feelings; to narrate stories; to record our culture for future generations; and to create beauty in poetry and prose. For most of us language is fundamental to all aspects of our lives.

The use of language is currently restricted. In the main, it is only used in direct communications between human beings and not in our interactions with the systems, services and appliances which we use every day of our lives. Even between humans, understanding is usually limited to those groups who share a common language. In this respect language can sometimes be seen as much a barrier to communication as an aid.

A change is taking place which will revolutionise our use of language and greatly enhance the value of language in every aspect of communication. This change is the result of developments in Language Engineering.

Language Engineering provides ways in which we can extend and improve our use of language to make it a more effective tool. It is based on a vast amount of knowledge about language and the way it works, which has been accumulated through research. It uses language resources, such as electronic dictionaries and grammars, terminology banks and corpora, which have been developed over time. The research tells us what we need to know about language and develops the techniques needed to understand and manipulate it. The resources represent the knowledge base needed to recognise, validate, understand, and manipulate language using the power of computers. By applying this knowledge of language we can develop new ways to help solve problems across the political, social, and economic spectrum.

Language Engineering is a technology which uses our knowledge of language to enhance our application of computer systems:

New opportunities are becoming available to change the way we do many things, to make them easier and more effective by exploiting our developing knowledge of language.

When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a variety of languages, we shall all have easier access to the benefits of a wide range of information and communications services, as well as the facility to carry out business transactions remotely, over the telephone or other telematics services.

When a machine understands human language, translates between different languages, and generates speech as well as printed output, we shall have available an enormously powerful tool to help us in many areas of our lives.

When a machine can help us quickly to understand each other better, this will enable us to co-operate and collaborate more effectively both in business and in government.

The success of Language Engineering will be the achievement of all these possibilities. Already some of these things can be done, although they need to be developed further. The pace of advance is accelerating and we shall see many achievements over the next few years.


Language is Fundamental

Language is a means of effective, efficient communication. It is also a medium for recording and assimilating information; in practice, the most convenient way of representing most of the information we need. Language is vital both to our business activities and to our administration. It is also very important in many of the social, cultural and political aspects of our lives. Language is integral to our culture. It helps each of us to define ourselves.

For each one of us, our own language is fundamental to our national and cultural identity, providing a link to our traditions as well as the foundation of our education and entertainment.

In Europe we have the benefit of a diversity of languages and cultures, which means that we have the opportunity to learn a great deal about each other's culture and way of life. This remains one of the bases for a cohesive European society. If the benefits of a multi-lingual society are to remain a feature of the European way of life then we must explore ways in which to overcome the barriers to communication and understanding.

It is sometimes said that it is possible to use only one or two languages for international activities in business, administration and politics. To a certain extent this is true. However, it could never be entirely satisfactory. The dominance of a few languages would be an unacceptable imbalance of power as well as a poor use of resources.

Above all, it reduces significantly the number of people who can participate effectively in any activity and this is bound to exclude valuable contributions and lead to discontent. In time, such an approach would also marginalise the languages which are not used so widely, reducing further the scope of their usage and inevitably diminishing the richness and variety of our culture. It would adversely affect not only our feeling for national, regional and cultural identities, but also our sense of belonging to a truly European society, not just tolerant of its minorities but supportive of them, recognising their value.

Such a restrictive approach to language use would also limit the availability of a wide range of important new services and facilities by denying many people access to computer systems in their native language.

Europe's position as a naturally multi-lingual community in a multi-lingual world can be used to our commercial advantage. As we endeavour to collaborate more closely, to develop the single market as our home market, we have a special incentive to develop solutions to the problems of a multi-lingual market place. In successfully supporting our own language needs, especially in business, administration and education, Language Engineering will help us to compete for business in the global marketplace. On the one hand, our businesses will have a competitive edge through their experience in using technology to service the needs of a multi-lingual marketplace. On the other hand, we shall also have language products to sell to the rest of the world.

A pattern of life-long learning is expected to be one of the significant features of the Information Society. It is also recognised that managers of the future will need to be capable in more than one language. Language Engineering will make an important contribution to the development of personal tuition systems, not only for language learning but also in developing systems which adapt more effectively to the needs of the student.

Language enabled products will improve the performance of business and administration as well as individuals. Products which are developed using language technology will revolutionise our systems and enhance the range of services available to business, government and the public at large.

Speech recognition, understanding, and generation by computer will make human computer interaction more efficient as well as more human. Natural language understanding by machines will deliver our information needs with more precision and sensitivity, helping us to overcome the problem of having too much information to cope with.

Computer aided translation services and the generation of documents in foreign languages will not only improve our dealings within Europe but will also help to give us greater access to external markets.


Making Language Work for Us

Our ability to develop our use of language holds the key to the multi-lingual information society; the European society of the future. New developments in Language Engineering will enable us to:


Techniques and Resources

What is Language Engineering?

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.


Components of the Technology

The basic processes of Language Engineering are shown in the diagram below. These are broadly concerned with:

Model of a Language Enabled System


Within this general model there are, of course, many different configurations. Depending on the application of the technology, not all these components are needed.


Techniques

There are many techniques used in Language Engineering and some of these are described below.


Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying the speaker reliably despite temporary changes in the voice (such as those caused by illness).
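The verification step can be illustrated with a minimal sketch. It assumes, purely for illustration, that each voice has already been reduced to a short numeric feature vector (a "voiceprint") and that similarity is measured by cosine similarity against a fixed threshold; the feature values, the threshold and the function names are hypothetical and are not taken from any particular system.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, sample, threshold=0.9):
    """Accept the caller only if the sample is close enough to the enrolled voiceprint.

    The threshold is an illustrative assumption; a tolerant threshold copes with
    temporary changes in the voice, a strict one reduces false acceptances.
    """
    return cosine_similarity(enrolled, sample) >= threshold


# Hypothetical voiceprints (real systems use far richer features).
enrolled_voiceprint = [0.9, 0.1, 0.4]
same_speaker_with_cold = [0.8, 0.2, 0.4]
different_speaker = [0.1, 0.9, 0.2]

print(verify_speaker(enrolled_voiceprint, same_speaker_with_cold))
print(verify_speaker(enrolled_voiceprint, different_speaker))
```

The sketch does not address the harder problems named above, such as detecting recorded speech or separating the voice from background noise.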


Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose.
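As a hedged illustration of what a statistical model of phonemes might look like, the sketch below uses a hidden Markov model decoded with the Viterbi algorithm. The text above does not name a specific model, so this choice is an assumption, and the phoneme labels, acoustic classes and probabilities are invented for the example rather than estimated from a real corpus.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (phoneme) sequence for the observations."""
    if not observations:
        return []
    first = observations[0]
    # Each column maps a state to (log probability of best path, previous state).
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Trace the best path backwards from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Toy model: three phonemes and three coarse acoustic classes (all hypothetical).
states = ["k", "ae", "t"]
start_p = {"k": 0.8, "ae": 0.1, "t": 0.1}
trans_p = {
    "k": {"k": 0.3, "ae": 0.6, "t": 0.1},
    "ae": {"k": 0.1, "ae": 0.4, "t": 0.5},
    "t": {"k": 0.1, "ae": 0.1, "t": 0.8},
}
emit_p = {
    "k": {"burst": 0.7, "voiced": 0.1, "hiss": 0.2},
    "ae": {"burst": 0.1, "voiced": 0.8, "hiss": 0.1},
    "t": {"burst": 0.3, "voiced": 0.1, "hiss": 0.6},
}
observations = ["burst", "voiced", "voiced", "hiss"]

print(viterbi(observations, states, start_p, trans_p, emit_p))
# ['k', 'ae', 'ae', 't']
```

In practice the probabilities would be estimated from the training corpora mentioned above, which is why such large collections of speech are needed.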

There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.


Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition:

Optical character recognition (OCR) from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using intelligent character recognition (ICR). This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences.
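One simple way in which a lexicon can support word recognition is to replace a doubtful character string with the closest word the lexicon knows. The sketch below assumes edit distance as the measure of closeness and a tiny hypothetical lexicon; it illustrates the idea only and is not a description of any specific recognition product.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_with_lexicon(raw_word, lexicon, max_distance=2):
    """Return the nearest lexicon word, or the raw word if nothing is close enough."""
    if raw_word in lexicon:
        return raw_word
    best = min(lexicon, key=lambda word: (edit_distance(raw_word, word), word))
    return best if edit_distance(raw_word, best) <= max_distance else raw_word


# Hypothetical lexicon; a useful one would hold many thousands of entries.
LEXICON = {"language", "engineering", "recognition", "character"}

print(correct_with_lexicon("recogn1tion", LEXICON))  # recognition
print(correct_with_lexicon("charaeter", LEXICON))    # character
print(correct_with_lexicon("xyz", LEXICON))          # xyz (left unchanged)
```

Statistical information about word sequences could be used in the same way to choose between several equally close candidates.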

Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.


Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels.

Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge.
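The idea of using a cheap first pass to decide where deeper analysis is worthwhile can be sketched as follows. The example assumes that "interesting" simply means a sentence mentioning at least two terms from a domain vocabulary; the vocabulary, the threshold and the function names are illustrative assumptions.

```python
import re

# Hypothetical domain vocabulary for a shipping-related application.
DOMAIN_TERMS = {"ship", "cargo", "manifest", "port"}


def split_sentences(text):
    """Very rough sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def interesting_sentences(text, domain_terms, min_hits=2):
    """Shallow pass: keep only sentences that mention enough domain terms."""
    selected = []
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        if len(tokens & domain_terms) >= min_hits:
            selected.append(sentence)
    return selected


text = ("The weather was fine. The ship left port without a manifest. "
        "Nobody noticed.")
print(interesting_sentences(text, DOMAIN_TERMS))
# ['The ship left port without a manifest.']
```

Only the selected sentences would then be passed on to the more expensive semantic analysis.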

Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information.
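A minimal sketch of this mapping is given below. It assumes a hand-built table from words in three languages to language-independent concept labels, and documents indexed by those labels; the concept names, the document identifiers and the vocabulary are hypothetical.

```python
# Words from different languages mapped to shared concept labels (hypothetical).
CONCEPT_OF_TERM = {
    "train": "RAIL_TRANSPORT", "zug": "RAIL_TRANSPORT", "treno": "RAIL_TRANSPORT",
    "timetable": "SCHEDULE", "fahrplan": "SCHEDULE", "orario": "SCHEDULE",
}

# Documents are indexed by concept, not by the words they happen to contain.
DOCUMENTS = {
    "doc-1": {"RAIL_TRANSPORT", "SCHEDULE"},
    "doc-2": {"RAIL_TRANSPORT"},
}


def query_concepts(query):
    """Map each known query word to its underlying concept."""
    return {CONCEPT_OF_TERM[word]
            for word in query.lower().split() if word in CONCEPT_OF_TERM}


def search(query):
    """Return the documents whose concepts cover every concept in the query."""
    wanted = query_concepts(query)
    if not wanted:
        return []
    return sorted(doc for doc, concepts in DOCUMENTS.items() if wanted <= concepts)


print(search("train timetable"))  # ['doc-1']
print(search("Zug Fahrplan"))     # ['doc-1']
print(search("treno"))            # ['doc-1', 'doc-2']
```

The English and German queries retrieve the same document because both are reduced to the same concepts before matching.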

Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need to be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.


Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion, either in a chosen language or according to stylistic specifications supplied by a text planning system.
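The simplest form of generation, in which a meaning representation fills the variable parts of a phrase template in a chosen language, can be sketched as follows. The templates, the field names and the `generate` function are assumptions made for this example; fuller systems plan the text and choose the wording rather than relying on fixed templates.

```python
# One template per output language, with variable parts in braces (hypothetical).
TEMPLATES = {
    "en": "The train to {destination} departs at {time} from platform {platform}.",
    "fr": "Le train pour {destination} part à {time} du quai {platform}.",
}


def generate(meaning, language):
    """Map a language-independent meaning representation to a surface string."""
    return TEMPLATES[language].format(**meaning)


# The same underlying meaning can be rendered in either language.
meaning = {"destination": "Paris", "time": "10:15", "platform": "4"}

print(generate(meaning, "en"))
print(generate(meaning, "fr"))
```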


Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.

Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules.

Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.
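A structured dialogue of this kind can be represented as a small table of states, each with a prompt and the recognised words which lead to other states. The sketch below assumes that speech recognition has already turned the caller's replies into words; the states, the prompts and the `run_dialogue` function are hypothetical.

```python
# Each state has a prompt and the recognised words that lead to a next state.
DIALOGUE = {
    "start": {
        "prompt": "Which service do you need: balance or payment?",
        "next": {"balance": "balance", "payment": "payment"},
    },
    "balance": {"prompt": "Your balance will be read out. Goodbye.", "next": {}},
    "payment": {"prompt": "Who do you want to pay?", "next": {}},
}


def run_dialogue(recognised_words, dialogue=DIALOGUE, state="start"):
    """Step through the dialogue; return the final state and the prompts spoken."""
    prompts = [dialogue[state]["prompt"]]
    for word in recognised_words:
        transitions = dialogue[state]["next"]
        if not transitions:
            break  # a state with no transitions ends the dialogue
        if word in transitions:
            state = transitions[word]
            prompts.append(dialogue[state]["prompt"])
        else:
            # Unrecognised reply: apologise and repeat the current prompt.
            prompts.append("Sorry, I did not understand. " + dialogue[state]["prompt"])
    return state, prompts


final_state, spoken = run_dialogue(["loan", "balance"])
print(final_state)
for line in spoken:
    print(line)
```

Because the dialogue is plain data, a graphical tool could let a designer edit the table without writing any program code.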


Language Resources

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding.

The work of producing and maintaining language resources is a huge task. Research laboratories and public institutions produce resources in many EU languages, following standard formats and protocols so that the resources can be accessed by others. Many of these resources are being made available through the European Language Resources Association (ELRA).


Lexicons

A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), and the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.
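To make the idea of a lexicon entry concrete, the sketch below stores morphology, phonology and context-dependent meanings for a single word. The field names, the phoneme notation and the rule for choosing a sense from neighbouring words are illustrative assumptions, not a standard lexicon format.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconEntry:
    """One word together with the knowledge recorded about it."""
    lemma: str
    part_of_speech: str
    inflections: dict = field(default_factory=dict)  # morphology
    phonemes: list = field(default_factory=list)     # phonology
    senses: dict = field(default_factory=dict)       # context cue -> meaning


# A one-entry lexicon; a useful one may have hundreds of thousands of entries.
LEXICON = {
    "bank": LexiconEntry(
        lemma="bank",
        part_of_speech="noun",
        inflections={"plural": "banks"},
        phonemes=["b", "ae", "ng", "k"],
        senses={
            "river": "sloping land beside a body of water",
            "money": "financial institution",
        },
    ),
}


def sense_in_context(word, neighbouring_words):
    """Pick the sense whose cue word appears near the word, if any."""
    entry = LEXICON[word]
    for cue, sense in entry.senses.items():
        if cue in neighbouring_words:
            return sense
    return None


print(sense_in_context("bank", ["the", "river"]))
print(sense_in_context("bank", ["deposit", "money"]))
```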


Specialist Lexicons

There are a number of special cases which are usually researched and produced separately from general purpose lexicons:

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, persons, or maybe animals. They take on a special significance in many applications, however, where the name is key to the application such as in a voice operated navigation system, a holiday reservations system, or railway timetable information system, based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.
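The use of a wordnet in information retrieval can be sketched as query expansion: each search term is supplemented by the words related to it. The tiny relation table and the `expand_query` function below are hypothetical and stand in for a real wordnet resource.

```python
# Each word maps relation names to sets of related words (hypothetical data).
WORDNET = {
    "car": {"synonym": {"automobile"}, "antonym": set()},
    "big": {"synonym": {"large"}, "antonym": {"small"}},
}


def expand_query(terms, relation="synonym"):
    """Add to the query every word linked to a query term by the given relation."""
    expanded = set(terms)
    for term in terms:
        expanded |= WORDNET.get(term, {}).get(relation, set())
    return expanded


print(sorted(expand_query(["car", "hire"])))
# ['automobile', 'car', 'hire']
```

A search for "car hire" would then also find documents which mention only "automobile".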


Grammars

A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).
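As an illustration of a grammar at the phrase and sentence level, the sketch below encodes a handful of rules and checks whether a sequence of words forms a sentence. It uses chart parsing in the style of the CYK algorithm, which is one technique among many and is an assumption of this example; the rules and the vocabulary are invented.

```python
# Word-level rules: which categories each word can have (hypothetical).
LEXICAL_RULES = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "sees": {"V"}}

# Phrase-level rules: a pair of adjacent categories combines into a larger one.
PHRASE_RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}


def is_sentence(words):
    """Return True if the words can be analysed as a sentence (category S)."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i] .. words[j].
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        chart[i][i] = set(LEXICAL_RULES.get(word, set()))
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span - 1
            for split in range(start, end):
                for left in chart[start][split]:
                    for right in chart[split + 1][end]:
                        chart[start][end] |= PHRASE_RULES.get((left, right), set())
    return "S" in chart[0][n - 1]


print(is_sentence("the dog sees the cat".split()))  # True
print(is_sentence("dog the sees".split()))          # False
```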


Corpora

A corpus is a body of language, either text or speech, which provides the basis for:

There are national corpora of hundreds of millions of words but there are also corpora which are constructed for particular purposes. For example, a corpus could comprise recordings of car drivers speaking to a simulation of a control system which recognises spoken commands. Such a corpus is then used to help establish the user requirements for a voice operated control system for the market.
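One common use of a corpus is to derive statistical information about word sequences. The sketch below counts adjacent word pairs (bigrams) in a tiny, invented corpus of spoken commands and estimates how likely one word is to follow another; the corpus and the function names are assumptions made for the example.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent word pairs, with markers for sentence start and end."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def bigram_probability(counts, first, second):
    """Estimate P(second | first) as a relative frequency."""
    total = sum(c for (f, _), c in counts.items() if f == first)
    return counts[(first, second)] / total if total else 0.0


# Invented transcripts of drivers' spoken commands.
corpus = ["turn the radio on", "turn the heating off", "turn the radio off"]
counts = bigram_counts(corpus)

print(counts[("turn", "the")])                     # 3
print(bigram_probability(counts, "the", "radio"))  # 2 out of 3 occurrences
print(bigram_probability(counts, "radio", "on"))   # 0.5
```

With so little data the estimates are crude, which is one reason why corpora of many millions of words are collected.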


The Chain of Development and Application

The diagram below depicts the chain of activities which are involved in Language Engineering, from research to the delivery of language-enabled and language enhanced products and services to end-users. The process of research and development leads to the development of techniques, the production of resources, and the development of standards. These are the basic building blocks.

Model of Language Engineering Activities


In practice, Language Engineering is applied at two levels. At the first level there are a number of generic classes of application, such as:

At the second level, these enabling applications are applied to real world problems across the social and economic spectrum. So, for example:

In general, language capability is embedded in systems to enhance their performance. Language Engineering is an 'enabling technology'.


The Impact of Language Engineering

Language technologies can be applied to a wide range of problems in business and administration to produce better, more effective solutions. They can also be used in education, to help the disabled, and to bring new services both to organisations and to consumers. There are a number of areas where the impact is significant:


Competing in a Global Market

Business success increasingly depends on the ability to compete in a global marketplace. Success is based on the ability to identify markets, sell into them effectively and provide the quality of aftersales service expected by customers. There are many areas where the application of Language Engineering can lead to greater efficiency and reduced costs. Such applications are:


Better Information

One of the key features of an information service is its ability to deliver information which meets the immediate, real needs of its client in a focused way. It is not sufficient to provide information which is broadly in the category requested, in such a way that the client must sift through it to extract what is useful. Equally, if the way that the information is extracted leads to important omissions, then the results are at best inadequate and at worst they could be seriously misleading.

Information is available throughout the world, on the World Wide Web, for example, in different languages. In reality, however, it is only available to a client who can firstly request the information in the language in which it is recorded and then understand the language in which the information is presented. Using machine translation facilities the person seeking information will be able to complete an information request in his or her native language and receive the information in that same language, regardless of the language in which the information is recorded.

Language Engineering can improve the quality of information services by using techniques which not only give more accurate results to search requests, but also increase greatly the possibility of finding all the relevant information available. The use of techniques like concept searches, i.e. using a semantic analysis of the search criteria and matching it against a semantic analysis of the database, gives far better results than simple keyword searches.

One of the major, direct benefits of the Information Society for the ordinary citizen will be the improvement in public service information. However, the wide accessibility of this information will depend upon Language Engineering. People who are not familiar with the conventional user interface of a computer system will be able to request information by voice and the system will guide them through the possibilities. Those who want information about other countries, which may be held in a foreign language, will be able to receive it in their own language. A good example of this is a service which is currently being developed which will provide information about job opportunities across the European Union in the native language of the potential applicant. Obviously these are jobs where language skills are not significant. The service will be available on the Internet and it is also planned to have public booths where job seekers can use the service. In a mono-lingual pilot service run in Flanders, a surprising 26% of applications for jobs were received from applicants who had seen the details on the Internet.

Language Engineering will make a contribution in a large number of public interest areas. Intelligence gathering for law enforcement is an interesting case. In detecting smuggling for example, there is a large amount of information available from public or commercial sources which, if collated and presented in the right way, can give clear indications of suspicious activity. Details about ship movements, manifests and company information can highlight abnormal profiles of activity. The ability of language based analysis to produce these profiles is an important aid.


Direct Access to Services

In recent years there has been an explosion in the use of the telephone to deliver services such as banking, arranging insurance cover, and providing help desk facilities. The advantage of this type of service to the customer is that it provides a rapid response, 'around the clock'. For the supplier it is cost-effective because the business does not have to be conducted from expensive retail premises. Using speaker identification and speech recognition techniques it is possible to automate many of these services. A customer's telephone call can be dealt with by a computer system which is capable of having a meaningful dialogue with the caller and delivering the service to the customer's satisfaction. Perhaps the most obvious example today is the automation of the telephone banking services which are already available from many banks. The customer telephoning the service would be answered by a computer which would, firstly, analyse the characteristics of the customer's voice to identify it and verify the customer's rights of access to the service. Then a dialogue would be conducted between the customer and the computer to establish the services required and to complete any transactions needed, e.g. paying a bill, providing a statement and so forth. Other examples could be ordering tickets for the theatre, making reservations for a journey by rail, ship, or aeroplane, and home shopping via cable television.

Apart from the economic advantage of automating services to provide 'around the clock' availability, it also removes the need for people to work long and unsociable hours to provide the necessary coverage. Services are likely to be more consistent, fast, and reliable. In addition the automatic recording of an audit trail for each transaction will mean that each party to the transaction can feel confident about its outcome.


Commerce in the Marketspace

Many of the actions involved in a business transaction, such as ordering, invoicing, and sending payment instructions to the bank, can be completed without the need for human intervention using, for example, EDI (Electronic Data Interchange) technology. However, at the present time, most business transactions are initiated by a dialogue between humans either on the telephone, in writing, or face-to-face. With improvements in the availability of telematics services and with the increasing use of the Internet and the World Wide Web, opportunities to automate more activities in the commercial cycle (see illustration below) have increased. Language enabled software will play a prominent role in making this automation easier to use and more effective.

The Cycle of Commerce


To the human user one of the advantages of the World Wide Web is that information is published in natural language. However, a software agent which is to scan and select information from the Web must be given the intelligence to understand the published information and match it to the requirements of its user. Language Engineering can make a significant contribution to the development of intelligent agents which can undertake to provide consumers with an easy way of using the facilities of electronic commerce. A consumer could instruct such an agent, by voice, to browse the Web or any similar service, to read catalogues and select suitable products, to look for and negotiate prices, even to assemble bids in an electronic auction. When the results have been reviewed the consumer would then tell the agent to place the order and, subsequent to delivery, instruct the bank to pay an electronic invoice. The human users would see none of the complexity of the underlying commercial transactions which would be dealt with by the agent.

After sales service can also be improved by using hypertext based electronic help desks with additional, language enabled facilities. The benefits of this automation are immense. Apart from the reduction of costs throughout the business transaction cycle, a wider choice of suppliers and products can be reviewed and assessed for suitability, and competitive pricing will be stimulated. The whole process will be faster and more efficient and, once the relevant information has been recorded, the accuracy of all the derivative processes can be assured.

In time, electronic commerce will change the business model itself. There will be less need for middlemen. New and small enterprises will be able to make the world aware of their products and services quickly, effectively and without too much expense. However, without language understanding and multi-lingual capability, these benefits cannot be fully realised.


Effective Communication

Communication is probably the most obvious use of language. On the other hand, language is also the most obvious barrier to communication. Across cultures and between nations, difficulties arise all the time not only because of the problem of translating accurately from one language to another, but also because of the cultural connotations of words and phrases. A typical example in the European context is the word 'federal' which can mean a devolved form of government to someone who already lives in a federation, but to someone living in a unitary sovereign state, it is likely to mean the imposition of another level of more remote, centralised government.

As the application of language knowledge enables better support for translators, with electronic dictionaries, thesauri, and other language resources, and eventually when high quality machine translation becomes a reality, so the barriers will be lowered. Agreements at all levels, whether political or commercial, will be better drafted more quickly in a variety of languages. International working will become more effective with a far wider range of individuals able to contribute. An example of a project which is successfully helping to improve communications in Europe is one which interconnects many of the police forces of northern Europe using a limited, controlled language which can be automatically translated, in real-time. Such a facility not only helps in preventing and detecting international crime, but also assists the emergency services to communicate effectively during a major incident.


Accessibility and Participation

One of the most important ways in which Language Engineering will have a significant impact is in the use of human language, especially speech, to interface with machines. This improves the usability of systems and services. It will also help to ensure that services can be used not just by the computer literate but by ordinary citizens without special training. This aspect of accessibility is fundamental to a democratic, open, and equitable society in the Information Age.

A good example of the type of service which will be available is an automated legal advice service. The accessibility of the justice system to all citizens is becoming a serious problem in many societies where the cost of legal expertise and the process of law prevents all but the very rich, and those qualifying for legal aid, from exercising their legal rights. It will be possible using language based techniques not only to provide advice which is based on an understanding of the problem and an analysis of the relevant body of law, but also to understand a natural language description of the problem and deliver the advice, as a human lawyer would have done, in spoken or printed form. Such a service could be made available through kiosks in court buildings or post offices, for example. This type of application can also be used to inform citizens of social security entitlements and job opportunities, as well as providing a useable, comprehensible interface to more open government.

Systems with the capacity to communicate with their users interactively, through human language, available either through access points in public places or in the home, via the telephone network or TV cables, will make it possible to change the nature of our democracy. There will be a potential for participation in the decision-making process through a far greater availability of information in understandable and 'objective' form and through opinion gathering on a very large scale. Many people whose lives are affected by disability can be helped through the application of language technology. Computers with an understanding of language, able to listen, see and speak, will offer new opportunities to access services at home and participate in the workplace.


Improved Education Opportunities

Distance learning has become an important part of the provision of education services. It is especially important to the concept of 'life-long learning' which is expected to become an important feature of life in the Information Age. The effectiveness of distance learning and self-study is improved by using telematics services and computer aided learning. The quality and success of computer aided learning can be greatly enhanced by the use of Language Engineering techniques. If the computer aided learning package can understand the answers which its users give to questions, rather than simply recognise that the answer is right or wrong, it can direct them down a path which is more appropriate to their needs. In this way, students are likely to learn more effectively and have a longer concentration span, because a more sensitive package is inherently more comfortable to work with.

In future, in Europe, it will be essential in many walks of life to be competent in more than one language. Computer aided language learning (CALL) is therefore an area of prime importance for the application of Language Engineering. The same knowledge that is essential to the machine's ability to understand is also the basis for the interactive teaching process, providing quality diagnostics of student errors as well as illustrating correct usage. New, more effective learning facilities at home and at work will greatly increase the opportunities to expand our knowledge and develop new skills.


Entertainment, Leisure and Creativity

The attraction of computer games to our children is a clear indication of the potential of the computer to affect our culture. Home entertainment can become more educational, while education can become more attractive: 'edutainment', as it has become known. The possibility of tele-presence in virtual environments such as museums, art galleries and libraries will provide a rich cultural experience, available to a wide section of society in the comfort and convenience of their own homes. Virtual visits to such cultural archives will be aided by language technology, which will enable the research and selection of all forms of digitised language based records, the indexing and retrieval of images, the dubbing of films, the automatic production of sub-titles, and the translation of library and archive material.

For a wider range of people, writing can become a more exciting activity. Authoring tools will make it possible for them to achieve much higher quality results. The use of on-line dictionaries and thesauri, for example, makes selection of the 'mot juste' more likely, and grammar can be checked. The result can be a far more satisfying experience for writers who are not naturally gifted or well educated but who want to express themselves effectively in their business or social correspondence.


The Benefits

The benefits to be gained from successful Language Engineering are immense. They include:


Glossary - Commonly used Terminology

The following glossary describes some of the commonly used terminology of Language Engineering. Each term is classified as being: [a] - acronym; [adj] - adjective; [n] - noun; [p] - phrase; [v] - verb.

abstract [n] a short, concise description of a document, which covers the full scope of its contents
ambiguity [n] a state whereby a word or sentence can be understood in different ways; the former because the word has more than one meaning or the latter because the structure of the sentence can be analysed in such a way as to convey more than one meaning
authoring tools [p] facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents
CALL [a] Computer Aided Language Learning
character recognition [p] see Character and Document Image Recognition
computational linguistics [p] an area of applied linguistics concerned with the processing of natural language by computers
concept search [p] used in the context of information retrieval to mean that the search is made using a semantic analysis of the search filter matched against a semantic analysis of the database
continuous speech [p] speech where the speaker makes no allowances for the listener (e.g. a speech recognition device) by pausing between words
controlled language (also artificial language) [p] language which has been designed to restrict the number of words and the structure of language used, in order to make language processing easier; typical users of controlled language work in an area where precision of language and speed of response is critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.
corpus (plural corpora) [n] see Corpora
dialogue [n] an interactive, two way alternate flow of language between two individuals, an individual and a machine, or between two machines
dictionary [n] a list of words and a description of each, usually confined to describing their meaning and possibly their etymology
discourse [n] a contiguous stretch of language comprising more than one sentence
discourse analysis [p] analysis to identify the linguistic dependencies which exist between sentences
document image recognition [p] see Character and Document Image Recognition
domain [n] usually applied to the area of application of the language enabled software e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted so the language resource requirements are effectively limited by limiting the domain of application
formalism [n] a means to represent the rules used in the establishment of a model of linguistic knowledge
generate [v] to produce language in one form from another form of language or information; see also Speech Generation and Natural Language Generation
globalisation [n] the process of preparing software for use in any language and cultural environment either by designing it to be usable in this way or by adding facilities to existing software to facilitate subsequent localisation (see below)
grammar [n] see Grammars
grammar checker [p] a software facility which checks text for the correctness of its grammar
hidden Markov model [p] a finite state machine in which not only transitions are probabilistic but also output; currently used in speech recognition systems to help to determine the words represented by the sound wave forms captured
hypertext [n] a system commonly used for help files and in the World Wide Web whereby highlighted text is used to provide a link (rather like an index) to related text (often a more detailed explanation of the item highlighted)
index [v] to build a concise means of reference to information within a database which, for textual information, can be based on keywords or concepts
information extraction [p] the process of selecting information from a database using indices based on keywords, semantics, and/or concept searching
information retrieval [p] usually used as a generic term to cover the access to and delivery of information from natural language databases by whatever method
interlingua [n] an invented language which can be used as a common, formal representation into which source natural language may be translated and from which target natural language can be generated
interpret [v] generally, to attribute meaning to language; but also, to translate from one language to another, usually orally, in real-time
language enabled [p] describes a computer application which has been improved or enhanced in functionality, performance and/or presentation by the use of language engineering
language engineering [p] the application of knowledge of language to the development of computer systems which can recognise, understand, interpret and generate human language in all its forms
language resources [p] see Language Resources
lemmatise [v] to break an inflected word into its root (base form) and ending components
lexicon [n] see Lexicons
localise [v] to adapt software to the local requirements in terms of language and culture (including legal practice and business conventions, for example)
machine translation [p] the process of automatically translating from one language to another by a computer
machine aided translation [p] the process of assisting a human translator in translating from one language to another using computer software tools
machine readable dictionary [p] a dictionary (see above) which can be read by computer software
mark up [v] to annotate text so that its structure and presentation are defined in such a way that the structure can be reproduced by a software system other than that used for its creation
morpheme [n] the smallest meaningful element of language
morphology [n] the science of the structure of words
multi-lingual [adj] properly used to mean that something exists in a form that can handle several languages but often used to describe the characteristic that versions exist in several languages
natural language generation [p] see Natural Language Generation
natural language processing [p] a term in use since the 1980s to define a class of software systems which handle text intelligently
OCR [a] see Optical Character Recognition below
Optical Character Recognition [p] see Character and Document Image Recognition
onomastics [n] scientific investigation of proper names (see Specialist Lexicons)
parse [v] to analyse language in order to establish its structure and relationships at the level of syntax and/or semantics
phoneme [n] the smallest unit of sound (analogous to a morpheme) which can be identified from an acoustic flow of speech and which is semantically distinct
proper names [p] see Specialist Lexicons
semantics [n] the analysis of language to determine meaning
shallow parser [p] software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective
speaker identification [p] see Speaker Identification and Verification
speaker independent [p] describes a speech recognition system which is capable of recognising speech regardless of the speaker, i.e. it does not need to be trained to recognise individual speakers
speaker verification [p] see Speaker Identification and Verification
speech recognition [p] see Speech Recognition
speech generation [p] see Speech Generation
speech to text [p] the process of analysing speech and producing its textual equivalent; a typical example of a speech to text application is in dictation systems
spell checker [p] software which checks the spelling of words
style check [p] software which checks a document to ensure that it conforms to a template defining the structure of the text and the document containing it; also the checking of the use of phrases or sentences in a predefined way
summarise [v] to produce a concise description of a document, which covers the full scope of its contents
syllable [n] a unit of pronunciation which is more than a single sound (see phoneme above) and smaller than a word
syntax [n] the system of rules which describe how sentences can be formed from basic elements of language, i.e. morphemes, words and parts of speech
tag [v] to annotate a corpus by attaching information to the words, which describes the grammatical context of the words and/or associations with other words
terminology [n] see Specialist Lexicons
text [n] used frequently to distinguish written, printed, or symbolically recorded (using character encoding) language from speech
text alignment [p] the process of aligning different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions
text to speech [p] the process of producing the speech equivalent of text; a typical example of a text to speech application is an automatic announcement system at an airport or railway station
thesaurus [n] a dictionary of synonyms
translate [v] to transfer a text from one language to another
translation memory [p] a system which builds knowledge about translating from one language to another by remembering and re-using previous translations
translator's workbench [p] a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc
user modelling [p] usually, in dialogue based speech recognition, a component which attempts to be sensitive to the various sorts of users that the system may encounter
utterance [n] the string of sounds produced by a speaker between two pauses
version [n] an edition of a document which is recorded as different from the previous edition
version control [p] the management of the production, recording, and issue of documents
voice authentication [p] speaker verification
voice recognition [p] speech recognition
wizard of Oz testing [p] testing in which the automated machine component is substituted by some form of human intervention but in such a way that the user participating in the test is unaware of the substitution
wordnet [n] see Specialist Lexicons
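
The glossary entry for hidden Markov model describes a finite state machine in which both the transitions and the outputs are probabilistic. As a minimal sketch of that idea, the Python below finds the most likely sequence of hidden states for a short run of observations, using the standard Viterbi procedure. The state names, observation symbols and all probabilities are invented for illustration; a real speech recognition system would work with far larger models estimated from recorded speech.

```python
# Toy hidden Markov model. Every name and number here is a hypothetical
# value chosen for illustration, not data from a real recogniser.
STATES = ("vowel", "consonant")
START = {"vowel": 0.4, "consonant": 0.6}           # initial state probabilities
TRANS = {                                           # probabilistic transitions
    "vowel": {"vowel": 0.3, "consonant": 0.7},
    "consonant": {"vowel": 0.6, "consonant": 0.4},
}
EMIT = {                                            # probabilistic outputs
    "vowel": {"low": 0.8, "high": 0.2},
    "consonant": {"low": 0.3, "high": 0.7},
}


def viterbi(observations):
    """Return (probability, path) for the most likely hidden state sequence."""
    if not observations:
        return 1.0, []
    # best[s] holds the probability and path of the best sequence ending in s.
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        step = {}
        for s in STATES:
            # Choose the predecessor state that maximises the path probability.
            prob, path = max(
                ((best[p][0] * TRANS[p][s] * EMIT[s][obs], best[p][1])
                 for p in STATES),
                key=lambda candidate: candidate[0],
            )
            step[s] = (prob, path + [s])
        best = step
    return max(best.values(), key=lambda candidate: candidate[0])


if __name__ == "__main__":
    probability, path = viterbi(["low", "high", "low"])
    print(path, probability)
```

In this sketch the observations stand in for crude acoustic measurements, and the hidden states stand in for the sound classes that a recogniser is trying to recover from them.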

