ABSTRACT

This report is intended as a review of the Information Society and the subjects related to it. The aim is to show how language and new technologies are now deeply connected and how they interact with present-day society. The development of new technologies has enabled fast and efficient ways to communicate, and these are now used broadly. But as the facilities for communication grow, so does the problem of multilinguality, for which machine translation offers a partial answer. A hint of a solution is presented in this report. Priority was given to explaining the contents clearly, in order to ease the reading for those who are not familiar with the subject.

 

 

INTRODUCTION

 

Since the construction of the first computer, technology has evolved exponentially; every day new technology appears and is improved, allowing feats otherwise impossible to accomplish. Computers have brought society new ways of interacting, transforming our lives. People of different nations can now communicate with a simple "click", and travellers can visit virtual representations of their destinations and choose whether or not to go. This has changed the concept of social interaction, destroying old social barriers and raising new ones; the Internet has played a crucial role here, giving every user the possibility to share their thoughts with the rest of the world. In this report we will see how technology influences language and the situation of both in society.

 

The report is divided into multiple sections, beginning with the theme of the information society, which includes a brief description of this concept: how new technologies have influenced our society to the point that they have become an important part of our lives, affecting many aspects, especially the use of language. How the information society is formed is the next point covered, followed by the concepts of Language Engineering and Human Language Technologies, the tools that allow the information society to grow and develop. The next part briefly treats the problem of multilinguality along with a possible solution, machine translation. Nevertheless, as we will see, machine translation is not flawless, and in fact is very far from perfection: these problems will be explained too, as they broaden our view of language.

 

 

 

THE INFORMATION SOCIETY

 

In the European Union, the concept of the Information Society has been evolving strongly over the past few years building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies.

 

The Goldsworthy report sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world" (Department of Industry, Science and Tourism, 1997). In short, Goldsworthy's idea of the Information Society is entirely an economic one. At a broader level Barry Jones, the author of the House of Representatives Standing Committee's 1991 report 'Australia as an Information Society', sets out a definition of the Information Society which sees it as simply "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology" (Australia as an Information Society: Grasping New Paradigms, 1991).

 

These are just a few examples of ideas underpinning information policy drives in the developed world where the concept is accepted almost without challenge, and there is an inherent belief that like the Olympics, the Information Society is real - or will be very soon if only we can get ourselves organised properly. Some claim, of course, that the Information Society is here already and not just on its way. But one way or the other "it" exists and is a "good thing". By and large, national and regional Information Society documents do not question the belief that the Information Society will bring prosperity and happiness if a few basic safeguards are put in place. Some of the very few notes of serious caution in the practice of information policy have come through the influence of the Scandinavian countries which joined the European Union when the EU was already in full flight with implementing the actions flowing from the Bangemann report. Interestingly, in recent travels in India I noticed an extraordinary level of hope and trust in that developing country in the potential of information technology to transform India into a modern fully developed economy. The push to develop information and technological infrastructure initiated by Rajiv Gandhi is seen as positive and a necessary step for the goal of a universally prosperous society in India. Effectively there is the same acceptance of the goodness of an Information Society and the absolute necessity to be one, that is found in the West.

 

Given this blind faith in the existence and the desirability of an Information Society among diverse nations, it is instructive to look at the theoretical literature which has spawned the idea to see what it claims for the Information Society. The term Information Society has many synonyms: Information Age, Information Revolution, Information Explosion and so on and it is found across a wide spectrum of disciplines. Fortunately the task of unravelling many of these ideas has been accomplished in a masterly way by Frank Webster. He has categorised the variety of concepts of the Information Society, Information Revolution, or whatever, and provided an analysis of five common conceptions of the Information Society (Webster, 1995).

 1-Technological

The initial reports focus on the convergence of computers and telecommunications and the capacity for storage, manipulation and transmission of vast amounts of data. The problem, however, is that drawing a direct line between the presence of information technology and some sort of new society is hard to justify. Will the presence of, say, a computer in every home make us an Information Society? Or should that be two computers? At what point will we know we've arrived? What changes in our fundamental institutions, ways of living and working characterise an Information Society, as opposed to a non-Information Society? A further weakness of this concept is highlighted by the many commentators who point out the dangers of technological determinism in thinking about the Information Society and reject the view that technology impacts on society and is the prime agent of change, defining the social world.

2-Economic

 

This concept of the Information Society has been built on Fritz Machlup's seminal study of the size and effect of the US information industries in the 1960s; he demonstrated that education, the media, computing, information services (including insurance, law and other information based professions), R+D and so on accounted for some 30% of GNP.

Entrancing as it is to have numbers to quote in support of the importance of information in the economy, it is difficult to argue that the existence of lots of information activities in society actually impacts on social life without moving to an analysis of the substance or quality of that information. In any event, what matters, surely, is not the amount but the meaning and value of information. Some econometric studies suggest that the early exponential growth of information activities as a proportion of economic activity actually slowed, with little change from 1958 to 1980.

 

3- Occupational

 

This idea of the Information Society rests on the claim that in an Information Society the dominant category of worker is the "information worker". Many commentators have produced data to demonstrate growth patterns in the need for workers who use their brain rather than their brawn. Daniel Bell's influential 'The Coming of Post-Industrial Society' argued that the professional and technical classes would dominate in the new era, with work organised around theoretically based knowledge for the purpose of social control and the directing of innovation and change (Bell 1974: 15-20).

The challenge was to find a way of saying definitively whether a job was predominantly an information professional's job or not, since it appears that all work involves a certain amount of information processing.

There was a time when policy was clearly the business of the public sector and was essentially about "what governments choose to do and what not to do" (Dye 1995). The trouble now is that the edges of the public and private spheres are becoming more difficult to distinguish as has been amply demonstrated by papers in this strand of the Conference. It is interesting that the field of information studies has in some ways anticipated this development as it has accepted the place of private sector organisational policy on information matters to be recognised as "information policy" even though, at least traditionally, these policies were turned inwards to the support of organisational roles.

With the general global drive to interweave public and private sector activities within market-led, neo-liberal frameworks, the burgeoning information and IT infrastructure within governments cannot be considered adequately without looking at the interaction of public and private sectors. The private sector can have monumental effects on what governments can do with information for their own use or in the context of making information available to the community at large. Take, for example, the decision to concentrate Microsoft and Apple interests. This cannot but impact on government through the extension of control of the IT and software industries. This effect is even more pointed when governments operate along strictly market philosophies and for-profit activities are incorporated in the government sector.

Some understanding of how the fusion of public and private impacts on information policy can be gained from Nick Moore's analysis of Western and East Asian information policy implementation strategies (Moore, 1997). Moore argues that there are two broad approaches to information policy formation. One, the neo-liberal, puts its trust in the market to move society along towards the Information Society. The European Union policies illustrate this particularly well as there the basic tenet of information policy is the belief that the achievement of the Information Society "is a task for the private sector" with the role of government confined to ensuring a supportive regulatory climate and a refocusing of current public expenditure patterns.

 

 

INFORMATION OVERLOAD

 

 

What is information?

 

The task of defining the concept of information has not been an easy one, and the term still remains somewhat ambiguous. Terms like "data" or "knowledge" are often used as synonyms, sometimes merely following fashions in usage. Views on the nature of information can be grouped into two camps: firstly, those who see information as a tangible entity which can be processed, moved, changed and so on; and secondly, those who see information as existing only in the human brain, the result of the absorption of symbols and signs. In this second approach information is seen as subjective and ambiguous, with no "reality", so that it can be understood only in terms of process and how it changes people, or through its use or impact on individual action.

Fortunately, there has been some useful work which has linked some of the many conceptions of information in a framework which can guide the policy maker. Sandra Braman has outlined four main categories of information to be considered in policy making:

  1. Information as a resource: This is the idea which historically has dominated thinking about the Information Society. Information consists of "pieces of information" unrelated to bodies of knowledge. Information and its creators, processors and users are regarded as discrete and isolated entities.
  2. Information as a commodity: Neo-liberal approaches to establishing an Information Society build strongly on the idea that information gains value as it passes through various information production processes such as indexing and abstracting. Notwithstanding the considerable difficulties of applying economic concepts generally to information it is accepted that information can be bought and sold for profit.
  3. Information as Perception: In moving up the hierarchy of definitions established by Braman, context is added when information is treated as perception of pattern. At this level the effects of information (such as its capacity to reduce uncertainty) are treated. This idea of information sits comfortably with the idea of information as an intangible and subjective phenomenon. It acknowledges that patterns and context differ between individuals and that information is relativistic.

  4. Information as a constitutive force in society: In this framework information is seen as having power in its own right and a capacity to shape context. Its capacity to change individuals and societies comes into play, with the idea that "information is power" falling squarely into this set of beliefs.

 

Braman argues that effective information policy must consider information at all levels of her hierarchy. Few information policies do this, although the European drive to underpin its Information Society policy with a philosophy of "putting people in charge of information", and viewing the "Information Society as a Learning Society" based on the know-how and wisdom of people rather than on information in machines, suggests a broader perspective than many information policy initiatives, including our own.

 

 

Information fatigue

 

David Lewis coined the term "information fatigue syndrome" for what he expects will soon be a recognised medical condition, one that especially affects administrators who must deal with the enormous amounts of data produced by the information society. Lewis claims that these problems will get worse with the increasing use of the Internet, and can cause mental anguish and even physical illness.

This state of mind and body is caused not by poor administration but by a continual flow of information, a flow that is bound to increase given the enormous ease with which the information society can create, share and move information around the world.

The only solution is specific training of individuals in information management, so that they are able to discern relevant from non-relevant data.

 

 

HUMAN LANGUAGE TECHNOLOGY AND LANGUAGE ENGINEERING

 

Language is the natural means of human communication; the most effective way we have to express ourselves to each other. We use language in a host of different ways: to explain complex ideas and concepts; to manage human resources; to negotiate; to persuade; to make our needs known; to express our feelings; to narrate stories; to record our culture for future generations; and to create beauty in poetry and prose. For most of us language is fundamental to all aspects of our lives. The use of language is currently restricted. In the main, it is only used in direct communications between human beings and not in our interactions with the systems, services and appliances which we use every day of our lives. Even between humans, understanding is usually limited to those groups who share a common language. In this respect language can sometimes be seen as much a barrier to communication as an aid.

A change is taking place which will revolutionise our use of language and greatly enhance the value of language in every aspect of communication. This change is the result of developments in Language Engineering.

Language Engineering provides ways in which we can extend and improve our use of language to make it a more effective tool. It is based on a vast amount of knowledge about language and the way it works, which has been accumulated through research. It uses language resources, such as electronic dictionaries and grammars, terminology banks and corpora, which have been developed over time. The research tells us what we need to know about language and develops the techniques needed to understand and manipulate it. The resources represent the knowledge base needed to recognise, validate, understand, and manipulate language using the power of computers. By applying this knowledge of language we can develop new ways to help solve problems across the political, social, and economic spectrum.

Our ability to develop our use of language holds the key to the multilingual information society, the European society of the future. New developments in Language Engineering will make this possible.

 

What is language engineering?

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

1- Basic processes of a Language Engineering System

·  entering material into the computer, using speech, printed text or handwriting, or text either keyed in or introduced electronically

·  recognising the language of the material, distinguishing separate words, for example, recording it in symbolic form and validating it

·  building an understanding of the meaning of the material, to the appropriate level for the particular application

·  using this understanding in an application such as transformation (e.g. speech to text), information retrieval, or human language translation

·  generating the medium for presenting the results of the application

·  finally, presenting the results to human users via a display of some kind: a printer or a plotter; a loudspeaker or the telephone.
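
These stages form a processing chain. As a purely illustrative sketch of how such a chain might be wired together in software, the Python fragment below strings toy functions for recognition, understanding, application and presentation into a pipeline; the function names and the trivial 'understanding' step are hypothetical and do not correspond to any particular system.

    # A minimal sketch of a Language Engineering pipeline with toy stage
    # functions; a real system would plug in speech recognisers, parsers, etc.

    def recognise(raw_input: str) -> list[str]:
        """Distinguish separate words and record them in symbolic form."""
        return raw_input.lower().split()

    def understand(tokens: list[str]) -> dict:
        """Build a (very shallow) representation of the material."""
        return {"tokens": tokens, "length": len(tokens)}

    def apply_application(analysis: dict) -> str:
        """Use the analysis in an application, here a toy 'summary'."""
        head = " ".join(analysis["tokens"][:5])
        return head + ("..." if analysis["length"] > 5 else "")

    def present(result: str) -> None:
        """Generate and present the result to the user."""
        print(result)

    present(apply_application(understand(recognise(
        "Language Engineering systems process material in successive stages"))))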

 

2- The techniques that are used:

 

 

The language resources

 

1-     Lexicons: A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), its sound structure (phonology), and the meaning of the word in different textual contexts.

2-     Specialist lexicons: these lexicons are researched and produced separately from general-purpose lexicons, and usually cover proper names, terminology and wordnets.

3-     Grammar: A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).

4-     Corpora: A corpus is a body of language, either text or speech, which provides the basis for analysing language to establish its characteristics, for training a machine (usually to adapt its behaviour to particular circumstances), for verifying a linguistic theory empirically, and for testing how well a Language Engineering technique or application works in practice.
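
To make these resources a little more concrete, the sketch below models a tiny lexicon, grammar and corpus as plain Python data structures; every entry is invented for illustration and is not drawn from any real resource.

    # Toy illustrations of language resources as data structures (invented entries).

    # Lexicon: each word carries grammatical and semantic information.
    lexicon = {
        "light": {
            "pos": ["noun", "verb", "adjective"],      # possible grammatical categories
            "senses": ["illumination", "not heavy"],   # meanings in different contexts
        },
    }

    # Grammar: simple phrase-structure rules describing sentence structure.
    grammar = {
        "S":  [["NP", "VP"]],
        "NP": [["Det", "Noun"]],
        "VP": [["Verb", "Adv"]],
    }

    # Corpus: a body of text used to analyse language or to train and test a system.
    corpus = [
        "Prices rose quickly in the market",
        "The bottle was opened with a corkscrew",
    ]

    # Example use: count how often a word occurs in the corpus.
    word = "the"
    count = sum(sentence.lower().split().count(word) for sentence in corpus)
    print(f"'{word}' occurs {count} times in the corpus")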

 

 

The chain of development and application

 

In practice, Language Engineering is applied at two levels. At the first level there are a number of generic classes of enabling application.

At the second level, these enabling applications are applied to real-world problems across the social and economic spectrum.

In general, language capability is embedded in systems to enhance their performance. Language Engineering is an 'enabling technology'.

THE IMPACT OF LANGUAGE ENGINEERING

 

Language technologies can be applied to a wide range of problems in business and administration to produce better, more effective solutions. They can also be used in education, to help the disabled, and to bring new services both to organisations and to consumers. There are a number of areas where the impact is significant:

 

1- Competing in a Global Market

Business success increasingly depends on the ability to compete in a global marketplace. Success is based on the ability to identify markets, sell into them effectively and provide the quality of after-sales service expected by customers. There are many areas where the application of Language Engineering can lead to greater efficiency and reduced costs, such as the generation of business letters, the production and management of multilingual customer documentation, in-line translation of electronic communications, or the provision of computer-aided translation services.

 

 2- Better information

 

One of the key features of an information service is its ability to deliver information which meets the immediate, real needs of its client in a focused way. It is not sufficient to provide information which is broadly in the category requested, in such a way that the client must sift through it to extract what is useful. Equally, if the way that the information is extracted leads to important omissions, then the results are at best inadequate and at worst they could be seriously misleading.

Language Engineering can improve the quality of information services by using techniques which not only give more accurate results to search requests, but also greatly increase the possibility of finding all the relevant information available. The use of techniques like concept searches, i.e. using a semantic analysis of the search criteria and matching them against a semantic analysis of the database, gives far better results than simple keyword searches.
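
As a rough illustration of the difference, the sketch below contrasts a literal keyword match with a 'concept' match driven by a small hand-made table of related terms; the table is invented, and a real system would rely on a proper semantic analysis or thesaurus rather than simple synonym sets.

    # Keyword search versus a toy concept search using an invented synonym table.

    documents = [
        "New car prices rose sharply this year",
        "The cost of automobiles increased in 2003",
    ]

    def keyword_search(query, docs):
        """Return documents containing the literal query word."""
        return [d for d in docs if query.lower() in d.lower().split()]

    # Hypothetical concept table mapping a term to semantically related terms.
    concepts = {"car": {"car", "cars", "automobile", "automobiles", "vehicle"}}

    def concept_search(query, docs):
        """Return documents containing the query word or any related term."""
        terms = concepts.get(query.lower(), {query.lower()})
        return [d for d in docs if terms & set(d.lower().split())]

    print(keyword_search("car", documents))   # finds only the first document
    print(concept_search("car", documents))   # finds both documents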

 

3- Direct access to services

 

Apart from the economic advantage of automating services to provide 'around the clock' availability, automation also removes the need for people to work long and unsociable hours to provide the necessary coverage. Services are likely to be more consistent, fast, and reliable. In addition, the automatic recording of an audit trail for each transaction will mean that each party to the transaction can feel confident about its outcome.

 

4- Commerce in marketplaces

 

Many of the actions involved in a business transaction, such as ordering, invoicing, and sending payment instructions to the bank, can be completed without the need for human intervention using, for example, EDI (Electronic Data Interchange) technology. However, at the present time, most business transactions are initiated by a dialogue between humans either on the telephone, in writing, or face-to-face. With improvements in the availability of telematics services and with the increasing use of the Internet and the World Wide Web, opportunities to automate more activities in the commercial cycle (see illustration below) have increased. Language enabled software will play a prominent role in making this automation easier to use and more effective.

In time, electronic commerce will change the business model itself. There will be less need for middlemen. New and small enterprises will be able to make the world aware of their products and services quickly, effectively and without too much expense. However, without language understanding and multi-lingual capability, these benefits cannot be fully realised.

 

 5-  Effective communication

 

As the application of language knowledge enables better support for translators, with electronic dictionaries, thesauri, and other language resources, and eventually when high quality machine translation becomes a reality, so the barriers will be lowered. Agreements at all levels, whether political or commercial, will be better drafted more quickly in a variety of languages. International working will become more effective with a far wider range of individuals able to contribute. An example of a project which is successfully helping to improve communications in Europe is one which interconnects many of the police forces of northern Europe using a limited, controlled language which can be automatically translated, in real-time. Such a facility not only helps in preventing and detecting international crime, but also assists the emergency services to communicate effectively during a major incident.

 

6- Accessibility and participation

 

One of the most important ways in which Language Engineering will have a significant impact is in the use of human language, especially speech, to interface with machines. This improves the usability of systems and services. It will also help to ensure that services can be used not just by the computer literate but by ordinary citizens without special training. This aspect of accessibility is fundamental to a democratic, open, and equitable society in the Information Age.

Systems with the capacity to communicate with their users interactively, through human language, available either through access points in public places or in the home, via the telephone network or TV cables, will make it possible to change the nature of our democracy. There will be a potential for participation in the decision-making process through a far greater availability of information in understandable and 'objective' form and through opinion gathering on a very large scale. Many people whose lives are affected by disability can be helped through the application of language technology. Computers with an understanding of language, able to listen, see and speak, will offer new opportunities to access services at home and participate in the workplace.

 

7- Improved education opportunities

 

Distance learning has become an important part of the provision of education services. It is especially important to the concept of 'life-long learning', which is expected to become an important feature of life in the Information Age. The effectiveness of distance learning and self-study is improved by using telematics services and computer-aided learning. The quality and success of computer-aided learning can be greatly enhanced by the use of Language Engineering techniques.

 

8- Entertainment, leisure and creativity

 

Computer games as well as films benefit from language engineering and may become 'edutainment' thanks to subtitling: children will learn to develop their language capabilities through these improvements, and for a wider range of people writing can become a more exciting activity, as authoring tools will make it possible to achieve much higher-quality results.

 

 

LIMITATIONS OF MACHINE TRANSLATION

 

The development of natural language applications which handle multi-lingual and multi-modal information is the next major challenge facing the field of computational linguistics. Over the past 50 years, a variety of language-related capabilities has been developed in areas such as machine translation, information retrieval, and speech recognition, together with core capabilities such as information extraction, summarization, parsing, generation, multimedia planning and integration, statistics-based methods, ontologies, lexicon construction and lexical representations, and grammar. The next few years will require the extension of these technologies to encompass multi-lingual and multi-modal information.

Extending current technologies will require integration of the various capabilities into multi-functional natural language systems. However, there is today no clear vision of how these technologies could or should be assembled into a coherent framework. What would be involved in connecting a speech recognition system to an information retrieval engine, and then using machine translation and summarization software to process the retrieved text? How can traditional parsing and generation be enhanced with statistical techniques? What would be the effect of carefully crafted lexicons on traditional information retrieval?

 

Why machine translation seems to be so difficult

 

The question to ask, therefore, is why some problems are more difficult for computers to deal with than others. With this knowledge, users should be able to understand why certain types of 'mistakes' need to be corrected constantly when 'post-editing', why certain types of ambiguity and constructions must always be avoided when 'pre-editing' texts or composing in controlled languages, and why certain types of questions recur again and again in 'interactive' systems.

The methods for dealing with translation difficulties vary from system to system. In many cases, the ambiguities specific to the source language are tackled in operations separate from the treatment of differences between languages. Commonly three basic operations are recognised: the analysis of the source text, the bilingual transfer of lexical items and structures, and the generation of the target text. Questions of ambiguity and choice occur at every stage. For example, resolving the ambiguity of English cry between 'weep' and 'shout' would be part of a program for the analysis of English. On the other hand, the selection of connaître or savoir in French for the English verb know would be a matter for a separate transfer program. Analysis also involves the identification and disambiguation of structures, e.g. whether He saw her shaking hands means that he saw someone who was welcoming a visitor or that he saw someone who was suffering from the cold weather. Transfer likewise can involve changes of structure, e.g. from an English infinitival construction He likes to swim to a German adverbial construction Er schwimmt gern. Generation is often incorporated in transfer operations, but when it is a separate component it might include operations to distinguish between English big, large and great (about which more later) and the production of correct morphology and word order in the target language (ses mains tremblantes, er darf nicht schwimmen).
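
A minimal sketch of such a bilingual transfer decision is given below. The rule encoded (connaître for acquaintance with people or places, savoir for facts and skills) is a textbook simplification, and the object-type labels are invented for the example.

    # A toy transfer rule for English "know" -> French (simplified illustration).

    def transfer_know(object_type: str) -> str:
        """Select the French verb for 'know' from the type of its object."""
        if object_type in {"person", "place"}:          # acquaintance with an entity
            return "connaître"
        if object_type in {"fact", "clause", "skill"}:  # knowledge of a fact or skill
            return "savoir"
        return "connaître"  # fallback; a real system would need a finer analysis

    print(transfer_know("person"))  # connaître  (I know Marie)
    print(transfer_know("fact"))    # savoir     (I know that she left)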

 

Methods of analysis and transfer

All translation is a problem-solving activity; choices have to be made continually. The assumption in MT systems, whether fully or partially automatic, is that there are sufficiently large areas of natural language and of translation processes that can be formalised for treatment by computer programs. The basic premise is therefore that the differences between languages can to some extent be regularised. What this means at the practical level is that problems of selection can be resolved by clearly definable procedures. The major task for MT researchers and developers is to determine what information is most effective in particular situations, what kind of information is appropriate in particular circumstances, and whether some data should be given greater weight than others.

Specific words

Decisions based on specific words are the easiest to apply and are capable of the highest degree of precision. At the same time, however, there is inflexibility since there is no allowance for inflected variation of forms or for the least variation of word order. Three examples will be analysed:

1-     Compound nouns: the relation of one word with another may imply a disambiguation of the meaning of the compound. Thus many MT systems include entries for compounds such as light ship and light bulb, and indicate the target-language equivalent directly (French ampoule, German Glühbirne).

 

2-     Idioms: The perceived difficulty of idioms is that the individual words take on meanings and connotations which they do not have in their literal usages. However, it is precisely because most idioms are relatively fixed expressions, consisting of the same words in the same sequence, that they can be easily translated into comparable idioms – or if none exist into a literal equivalent. Idioms can in fact be treated very much like any compound.

 

3-     Metaphors: The same approach can be taken with many metaphorical usages, e.g. mouth of a river, branch of a bank, flow of ideas, channel of communication, tide of opinion, foot of the mountain, leg of the table. Like idioms, metaphors of this kind can be treated as fixed compound expressions. We may note that among the European languages there is a common thread of similar formations, so that even if a metaphorical usage is not recorded in the dictionary, it may be possible to produce a 'literal' translation which has the same metaphorical impact.
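
A sketch of this 'fixed expression' treatment is given below: multi-word entries are matched before any word-by-word lookup, so compounds, idioms and frozen metaphors are translated as single units. The tiny bilingual tables are invented for illustration.

    # Treating compounds, idioms and fixed metaphors as single dictionary units.
    # The English-French entries are illustrative, not from a real system.

    fixed_expressions = {
        "light bulb": "ampoule",
        "foot of the mountain": "pied de la montagne",
        "kick the bucket": "casser sa pipe",
    }

    word_dictionary = {"the": "le", "light": "lumière", "bulb": "bulbe"}

    def translate(text: str) -> list[str]:
        """Match the longest fixed expression first, then fall back to single words."""
        words = text.lower().split()
        output, i = [], 0
        while i < len(words):
            for length in range(len(words) - i, 0, -1):      # longest match first
                chunk = " ".join(words[i:i + length])
                if chunk in fixed_expressions:
                    output.append(fixed_expressions[chunk])
                    i += length
                    break
                if length == 1:                              # no fixed expression found
                    output.append(word_dictionary.get(chunk, chunk))
                    i += 1
        return output

    print(translate("the light bulb"))  # ['le', 'ampoule'], not a word-by-word rendering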

 

The advantage of treating certain word combinations as fixed expressions and translating them as units is the considerable saving in processing, particularly in the analysis of syntactic structure, and the assurance that the target output will be correct. There are disadvantages as well, however, since idioms can vary in structure, and variation is very common for 'idiomatic' phrasal verbs (2). In other words, the identification of idiomatic expressions must often involve morphological and syntactic analysis.

 

Morphological analysis

It is a truism to say that one of the most straightforward operations of any MT system should be the identification and generation of morphological variants of nouns and verbs. There are basically two types of morphology in question: inflectional morphology, as illustrated by the familiar verb and noun paradigms (French marcher, marche, marchons, marchait, a marché, etc.), and derivational morphology, which is concerned with the formation of nouns from verb bases, verbs from noun forms, adjectives from nouns, and so forth, e.g. nation, nationalism, nationalise, nationalisation, and their equivalents in other languages.

It should be stressed that any MT system should as a minimum be capable of recognising morphological forms and of generating them correctly. However, the alignment of equivalences between verb forms across languages is another matter, particularly when modal forms are involved (must, might, devoir, falloir, mögen, dürfen, etc.). In general, an MT system which cannot go beyond morphological analysis will produce little more than word-for-word translations. It may cope well with compounds and other fixed expressions, and it may deal adequately with noun and verb forms in certain cases, but the omission of any treatment of word order will give poor results.
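
The flavour of such an analysis is sketched below with a few hand-written French suffix rules mapping inflected forms back to an infinitive; the rules are deliberately incomplete and serve only to illustrate the idea.

    # A toy morphological analyser for a few French verb endings (illustrative only).

    SUFFIX_RULES = [
        ("issons", "ir"),   # finissons -> finir
        ("ons",    "er"),   # marchons  -> marcher
        ("ait",    "er"),   # marchait  -> marcher
        ("e",      "er"),   # marche    -> marcher
    ]

    def analyse(form: str) -> tuple[str, str]:
        """Return a (lemma, suffix) guess for an inflected verb form."""
        for suffix, infinitive_ending in SUFFIX_RULES:
            if form.endswith(suffix):
                return form[: -len(suffix)] + infinitive_ending, suffix
        return form, ""   # unknown form: leave unchanged

    for form in ["marchons", "marchait", "marche", "finissons"]:
        print(form, "->", analyse(form))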

 

Syntactic structures

Syntactic analysis is based largely on the identification of grammatical categories: nouns, verbs, adjectives. For English, the major problem is the categorial ambiguity of so many words, as already illustrated with the word light. In essence, the solution is to look for words which are unambiguous as to category and to test all possible syntactic structures. In the case of a sentence such as:

“Prices rose quickly in the market”

Each of the words prices, rose, and market can be either a noun or a verb; however, quickly is unambiguously an adverb and the is unambiguously a definite article, and these facts ensure the unambiguous analysis as a phrase structure (5), where prices is identified as a subject noun phrase, in the market as a prepositional phrase, and rose quickly as part of a verb phrase. (Note that this particular analysis is not one necessarily found in any MT system and would not be adopted by many syntax theories.)
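
A crude sketch of that strategy follows: each word is assigned its possible categories from a small hand-made lexicon, and the unambiguous words (quickly, in, the) are used to prune the readings of their neighbours. The lexicon and the pruning rules are invented simplifications, not the method of any actual MT system.

    # Toy categorial disambiguation for "Prices rose quickly in the market".

    lexicon = {
        "prices":  {"noun", "verb"},
        "rose":    {"noun", "verb"},
        "quickly": {"adverb"},        # unambiguous
        "in":      {"preposition"},   # unambiguous
        "the":     {"determiner"},    # unambiguous
        "market":  {"noun", "verb"},
    }

    def disambiguate(sentence: str) -> list[tuple[str, set]]:
        words = sentence.lower().split()
        tags = [set(lexicon[w]) for w in words]
        for _ in range(2):            # a couple of constraint-propagation passes
            for i in range(len(words)):
                if i > 0 and tags[i - 1] == {"determiner"}:
                    tags[i] &= {"noun"}        # word after a determiner: noun
                if i + 1 < len(words) and tags[i + 1] == {"adverb"}:
                    tags[i] &= {"verb"}        # word before an adverb: verb
                if i + 1 < len(words) and tags[i + 1] == {"verb"}:
                    tags[i] &= {"noun"}        # word before a resolved verb: noun
        return list(zip(words, tags))

    for word, categories in disambiguate("Prices rose quickly in the market"):
        print(word, categories)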

Semantic roles and features

 

The recognition of implicit relations may well require access to semantic information. It is common to identify two types: semantic roles and semantic features. By the semantic roles in a structure is meant the specific relationships of nominal elements (entities) to verbal elements (actions or states): a particular noun may be the ‘agent’ of an action, another may be the ‘instrument’ (or means), another may be the ‘recipient’, and another may refer to the ‘location’, and so forth.

Unfortunately, there is no universally agreed set of semantic roles which can be applied without difficulty to any language. Developers of MT systems are usually obliged to draw up their own list. However, the principal difficulty is the identification of roles. In English the main indicators are the prepositions, but these can be ambiguous as to the role expressed; with can indicate instrument, manner or context:

The bottle was opened with a corkscrew

The bottle was opened with difficulty

The bottle was opened with the meal
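
The sketch below hints at how semantic features on the noun following with could select the role; the feature assignments (corkscrew as a tool, difficulty as an abstract manner noun, meal as an event) are hand-picked for this example and are not a general solution.

    # Resolving the role of "with" from semantic features of its object (toy example).

    semantic_features = {
        "corkscrew":  {"concrete", "tool"},
        "difficulty": {"abstract", "manner"},
        "meal":       {"event"},
    }

    def role_of_with(object_noun: str) -> str:
        features = semantic_features.get(object_noun, set())
        if "tool" in features:
            return "instrument"   # opened with a corkscrew
        if "manner" in features:
            return "manner"       # opened with difficulty
        if "event" in features:
            return "context"      # opened with the meal
        return "unknown"

    for noun in ["corkscrew", "difficulty", "meal"]:
        print(f"opened with {noun} -> {role_of_with(noun)}")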

 

Real world knowledge

 

While semantic features and roles combined with syntactic information can go a long way in resolving ambiguities in the source language and in deciding among translation variants, there are numerous instances where what is apparently needed is knowledge about the things and events being referred to. Take some simple problems of coordination:

pregnant women and children → des femmes enceintes et des enfants

not: des femmes et des enfants enceintes

Probably all MT systems have difficulties with this kind of construction. An examination of the semantic features of the words involved may suffice on occasion, but in many cases it will not. What seems to be involved is knowledge about human beings and their behaviour: the system needs to have some kind of human-like 'understanding'.
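
A toy version of such a feature check is sketched below: the adjective's selectional features are compared against each coordinated noun, so pregnant attaches only to women. The features are invented, and, as argued above, checks of this kind will not cover every case; fuller world knowledge is often needed.

    # Deciding which coordinated nouns an adjective can modify, using invented
    # semantic features; many real cases need fuller world knowledge than this.

    noun_features = {
        "women":    {"human", "female", "adult"},
        "children": {"human", "young"},
    }

    adjective_requirements = {
        "pregnant": {"female", "adult"},
    }

    def attach(adjective: str, nouns: list[str]) -> list[str]:
        """Return the nouns whose features satisfy the adjective's requirements."""
        required = adjective_requirements[adjective]
        return [n for n in nouns if required <= noun_features[n]]

    print(attach("pregnant", ["women", "children"]))   # ['women']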

We are led therefore to the argument that good quality translation is not possible without understanding the reality behind what is being expressed, i.e. translation goes beyond the familiar linguistic information: morphology, syntax and semantics.

 

MULTILINGUALITY

One of the most distinctive features of texts produced by MT systems is their unnatural literalness. In general, they adhere too closely to the structures of source texts. Of course, human translators can be guilty of this fault as well – although Newmark (1991) considers literalness to be desirable in literary and authoritative texts, as long as the result is in the appropriate style. However, the aim in technical translation is generally to produce texts which read as if they were originally written in the target language. It is quite evident that MT systems do not achieve this goal. Indeed, it can be argued that they should not aim for idiomaticity of this order, if only because recipients of MT output may be led to assume complete accuracy and fidelity in the translation. It does not need stressing that readability and fidelity do not go hand in hand: a readable translation may be inaccurate, and a faithful translation may be difficult to read.

As we can see, multilinguality is the major problem of machine translation, and it will only grow as the information society expands, encompassing more and more documents in different languages. Machine translation will partly solve this problem, but as we have seen it is far from perfect, since a machine does not have the kind of human-like 'understanding' that allows a translator to render the proper expressions of a language.

A "lingua franca" such as English can be used to link all languages together and ease machine translation, but the enormous number of languages still makes this a Herculean task, since every pairing with the pivot must be covered: English-Spanish, English-French, English-German, and so on. The task is far greater still if no base language is chosen, because the number of direct language pairs grows quadratically.
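
A quick calculation makes the scale of the problem clear: with N languages, direct translation between every ordered pair requires N x (N - 1) systems, whereas routing everything through a single pivot language requires only 2 x (N - 1). The snippet below works this out for a few illustrative values of N.

    # Number of translation systems needed: every pair directly vs. via a pivot.

    def direct_systems(n: int) -> int:
        """Each ordered pair of distinct languages needs its own system."""
        return n * (n - 1)

    def pivot_systems(n: int) -> int:
        """Each language only needs translation to and from the pivot."""
        return 2 * (n - 1)

    for n in (5, 10, 20):
        print(f"{n} languages: {direct_systems(n)} direct systems, "
              f"{pivot_systems(n)} with a pivot such as English")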

Machine translation technology has greatly improved in the last decades, but the task is hard and the technology and means at our disposal do not make it easy.

 

Translation technology: is it worth learning?

Localization is the paradigm case of the need for technology, while interpreting and literary translation are examples of fields where technology plays a smaller role. The localization business is intimately connected with the software industry, and companies in the field complain about the lack of qualified personnel who combine an adequate linguistic background with computational skills. This is the reason why the industry (around the LISA association) has taken the lead over educational institutions by proposing courseware standards (the LEIT initiative) for training localization professionals.

Information of many types is rapidly changing format and going digital. Electronic documentation is the natural realm for the incorporation of translation technology, and this is something that young students of translation must learn. As the conception and design of technical documentation becomes progressively influenced by the electronic medium, it is integrating more and more with the whole concept of a software product. The strategies and means for translating software packages and electronic documents are becoming very similar, and both are now, as we will see, the goal of the localization industry.

An important consequence of the popularization of Internet is that the access to information is now truly global and the demand for localizing institutional and commercial Web sites is growing fast. In the localization industry, the utilization of technology is congenital, and developing adequate tools has immediate economic benefits.

The main role of localization companies is to help software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing, and Web-based information in different languages for simultaneous worldwide release.

The recent expansion of these industries has considerably increased the demand for translation products and has created a new burgeoning market for the language business. According to a recent industry survey by LISA (the Localization Industry Standards Association), almost one third of software publishers, such as Microsoft, Oracle, Adobe, Quark, etc., generate above 20 percent of their sales from localized products, that is, from products which have been adapted to the language and culture of their targeted markets, and the great majority of publishers expect to be localizing into more than ten different languages.

 

LISA Educational Initiative Taskforce (LEIT)

 

LISA Education Initiative Taskforce (LEIT) is a consortium of schools training translators and computational linguists that was announced in 1998 as an initiative to develop a promotional program for the academic communities in Europe, North America, and Asia.

The main goal of the LEIT initiative is to introduce localization courseware into translation studies, with versions ready for the start of the 1999 academic year.

 

Professor Margaret King of Geneva University described the first step of the project as a clarification of the state of affairs and the planning of courses comprehensive enough to cover all aspects of the localization industry, from translation and technical writing through globalization, internationalization, and localization. The definition of the critical terms involved was a contentious topic, although there seems to be a consensus on the following:

Globalization: The adaptation of marketing strategies to regional requirements of all kinds (e.g., cultural, legal, and linguistic).

Internationalization: The engineering of a product (usually software) to enable efficient adaptation of the product to local requirements.

Localization: The adaptation of a product to a target language and culture (locale).

 

 

Tools for the industry

 

Machine translation has never been plug-and-play. It requires a huge effort in preparation, evaluation, and maintenance. The suitability of the technology depends on many factors, but fundamentally on the type of text. Without these considerations, the technology may be seen as a fiasco. Few informed people still see the original ideal of fully automatic high-quality translation of arbitrary texts as a realistic goal. Translation technology suppliers are now working under the assumption that, rather than batch processes, man-machine interaction together with the integration of tools into the translator's working environment is the solution.

 

    ·        The translation workstation

 

Leaving behind the old conception of a monolithic, compact translation engine, the industry is now moving in the direction of integrating systems. This approach of integrating different tools is largely the view advocated by many language-technology specialists. Below is a description of an ideal engine which captures the answers given by Muriel Vasconcellos (from the Pan American Health Organization) and Minako O'Hagan (author of The Coming Age of Teletranslations). The ideal workstation for the translator would combine the following features:

1-     Full integration in the translator's general working environment, which comprises the operating system, the document editor (hypertext authoring, desktop publisher or the standard word-processor), as well as the emailer or the Web browser. These would be complemented with a wide collection of linguistic tools: from spell, grammar and style checkers to on-line dictionaries, and glossaries, including terminology management, annotated corpora, concordances, collated texts, etc.

2-     The system should comprise all advances in machine translation (MT) and translation memory (TM) technologies, be able to perform batch extraction and reuse of validated translations, enable searches into TM databases by various keywords (such as phrases, authors, or issuing institutions). These TM databases could be distributed and accessible through Internet. There is a new standard for TM exchange (TMX) that would permit translators and companies to work remotely and share memories in real-time.
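
As a rough sketch of the translation-memory idea mentioned in point 2, the snippet below stores previously validated translations and retrieves the closest ('fuzzy') match for a new sentence with a simple similarity score; real TM tools use far more sophisticated matching and the TMX format for exchange, and the example data here is invented.

    # A toy translation memory: store validated translations and retrieve the
    # closest match for a new source sentence. Example entries are invented.

    from difflib import SequenceMatcher

    translation_memory = {
        "The printer is out of paper": "L'imprimante n'a plus de papier",
        "Close all open documents before installing":
            "Fermez tous les documents ouverts avant l'installation",
    }

    def best_match(sentence: str, threshold: float = 0.6):
        """Return (stored source, translation, score) for the closest match, if any."""
        def similarity(stored: str) -> float:
            return SequenceMatcher(None, sentence.lower(), stored.lower()).ratio()
        source = max(translation_memory, key=similarity)
        score = similarity(source)
        return (source, translation_memory[source], score) if score >= threshold else None

    print(best_match("The printer is out of ink"))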

 

 ·        Software localization tool

 

Localization packages are now being designed to assist users throughout the whole life cycle of a multilingual document. Unlike traditional translators, software localizers may be engaged in the early stages of software development, as there are issues, such as platform portability, code exchange and format conversion, which if not properly dealt with may hinder product internationalisation. Localizers are often involved in the selection and application of utilities that perform code scanning and checking, and that automatically isolate and suggest solutions to National Language Support (NLS) issues, saving time during the internationalisation enabling process. There are run-time libraries that enable software developers and localizers to create single-source, multilingual, and portable cross-platform applications. Unicode support is also fundamental for software developers who work with multilingual texts, as it provides a consistent coding format for international character sets.
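
A small, hedged illustration of the Unicode point: the same multilingual strings are normalised to a single canonical form and encoded once in UTF-8, rather than in a patchwork of legacy code pages.

    # Unicode handling for multilingual text: normalise, then encode consistently.

    import unicodedata

    strings = ["café", "niño", "Küche", "日本語"]

    for s in strings:
        normalised = unicodedata.normalize("NFC", s)   # one canonical composed form
        encoded = normalised.encode("utf-8")           # one consistent byte encoding
        print(normalised, "->", encoded)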

 

·        Human excellence

 

Having said all this, it is important to reassess the human factor. Like cooks, tailors or architects, professional translators need to become acquainted with technology, because good use of technology will make their jobs more competitive and satisfactory. But they should not dismiss craftsmanship. Technology enhances productivity, but translation excellence goes beyond technology. It is important to delimit the roles of humans and machines in translation. Martin Kay's (1987) words in this respect are most illustrative:

A computer is a device that can be used to magnify human productivity. Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of the human spirit and the dignity of human labor but, by taking over what is mechanical and routine, it frees human beings for what is essentially human. Translation is a fine and exacting art, but there is much about it that is mechanical and routine and, if this were given over to a machine, the productivity of the translator would not only be magnified but the work would become more rewarding, more exciting, more human.

 

The complexity of mastering translation

 

Although it may not be perceived at first sight, the complexity of natural language is of an order of magnitude far superior to any purely mechanical process. To how many words should the vocabulary be limited to make the complexity of producing "free sonnets" (that is, any combination of 6 words in 14 verses) comparable to the number of possible chess games? It may be difficult to believe, but the vocabulary should be restricted to 100 words. That is, making free sonnets with 100 words offers as many different alternatives as there are ways of playing a chess game (roughly, 10^120).

The number of possibilities would quickly come down if combinations were restricted so that they not only made sense but acquired some sort of poetic value. However, defining formally or mechanically the properties of "make sense" and "have poetic value" is not an easy task. Or at least, it is far more difficult than establishing winning heuristics for a color to succeed in a chess game.

Outside the limits of the mechanical and routine, MT is impracticable and human creativity becomes indispensable. Translators of the highest quality are only obtainable from first-class raw materials and constant, disciplined training. The potentially good translator must be a sensitive, wise, vigilant, talented, gifted, experienced, and knowledgeable person. An adequate use of mechanical means and resources can make a good human translator much more productive. Nevertheless, very much like dictionaries and other reference material, technology may be considered an excellent prosthesis, but little more than that.


 

However, even for skilled human translators, translation is often difficult. One clear example is when linguistic form, as opposed to content, becomes an important part of a literary piece. Conveying the content, but missing the poetic aspects of the signifier may considerably hinder the quality of the translation. This is a challenge to any translator. Jaime de Ojeda's (1989) Spanish translation of Lewis Carroll's Alice in Wonderland illustrates this problem:
 

Twinkle, twinkle, little bat 
how I wonder what you're at! 
Up above the world you fly 
like a tea-tray in the sky.

Brilla, luce, ratita alada 
¿en qué estás tan atareada? 
Por encima del universo vuelas 
como una bandeja de teteras.

Manuel Breva (1996) analyzes the example and shows how Ojeda solves the "formal hurdles" of the original:

            The above lines are a parody of the famous poem "Twinkle, twinkle, little star" by Jane Taylor, which, in Carroll's version, turns into a sarcastic attack against Bartholomew Price, a professor of mathematics, nicknamed "The Bat". Jaime de Ojeda translates "bat" as "ratita alada" for rhythmical reasons. "Murciélago", the Spanish equivalent of "bat", would be hard to fit in this context for the same poetic reasons. With Ojeda's choice of words the Spanish version preserves the meaning and maintains the same rhyming pattern (AABB) as in the original English verse-lines.

 

 

What would the output of any MT system be like if confronted with this fragment? Obviously, the result would be disastrous. Compared with the complexity of natural language, the figures that serve to quantify the "knowledge" of any MT program are absurd: 100,000-word bilingual vocabularies, 5,000 transfer rules.... Well-developed systems such as Systran or Logos hardly surpass these figures. How many more bilingual entries and transfer rules would be necessary to match Ojeda's competence? How long would it take to adequately train such a system? And even then, would it be capable of challenging Ojeda in the way the chess master Kasparov has been challenged? I have serious doubts about that being attainable at all.

But there are other opinions, as is the case of the famous Artificial Intelligence master, Marvin Minsky. Minsky would argue that it is all a matter of time. He sees the human brain as an organic machine, and as such, its behavior, reactions and performance can be studied and reproduced. Other people believe there is an important aspect separating organic, living "machines" from synthetic machines. They would claim that creativity is in life, and that it is an exclusive faculty of living creatures to be creative.

 

 

CONCLUSION

 

As we have seen, language engineering has become a powerful tool for industry, allowing information to be spread worldwide more efficiently. Managing this information in its many forms and languages is a task that experts will probably not fully accomplish in the coming years, or even decades: multilinguality remains a great obstacle to overcome, and its proposed solution, machine translation, is to date a tool of limited success, though experts are making great efforts to improve it.

Moreover, information management will be inherent to almost any job in the future, so it is not a foolish idea to say that a prosperous future awaits those who decide to pursue a career related to language engineering, machine translation or language technologies.

Concerning society, it seems that the "sudden technological development" of the last two decades is perceived as a natural evolution towards a society perhaps more dependent on machines, but certainly a more capable one.

 

REFERENCES

 

# Natural language processing. Wikipedia. http://en.wikipedia.org/wiki/Natural_language_processing

# Language Engineering and the Information Society (document from I*M Europe).

# Living and Working Together in the Information Society (discussion document from HLTCentral).

# Mairéad Browne. Information Policy for an Information Society (paper).

# Hans Uszkoreit. What is Language Technology?

# Joseph Mariani (ed.). 1999. Multilingual Speech Processing (Recognition and Synthesis). In Multilingual Information Management: Current Levels and Future Abilities. http://www.cs.cmu.edu/people/ref/mlim/index.html

# Jay Branegan and Peggy Salz-Trautman. 1996. Information Fatigue Syndrome.

# Introduction to Human Language Technologies.

# Judith Klavans and Eduard Hovy. 1999. Cross-lingual and Cross-modal Information Retrieval. In Multilingual Information Management: Current Levels and Future Abilities. http://www.cs.cmu.edu/people/ref/mlim/index.html

# John Hutchins. Reflections on the history and present state of machine translation.

# D. J. Arnold. Translation problems.

# John Hutchins. Why computers do not translate better.