This report is intended as a review of the Information Society and the subjects related to it. The idea is to show how language and new technologies are now deeply connected and how they interact with contemporary society. Indeed, the development of new technologies has made communication fast and efficient, and these technologies are now broadly used. But as the means of communication grow, so grows the problem of multilinguality, and with it the question of machine translation. A hint of a solution may be presented in this report. Priority was given to explaining the contents of the report in a clear way, in order to ease the reading for those who are not familiar with the subject.
Since the construction of the first computer, technology has evolved in an exponential
way; every day new technology appears and is improved, allowing amazing feats
otherwise impossible to accomplish. Computers have brought to society new
possible ways of interaction, transforming our lives. People of different
nations can now communicate with a simple “click”, travellers can visit
virtual representations of their destinations and choose whether to go or not.
This has changed the concept of social interaction, destroying ancient social
barriers and raising new ones; the Internet has played a crucial role here with the
possibility for every user to share their thoughts with the rest of the world.
In this report we will see how technology influences language and the situation
of both concepts in society.
The report is divided into multiple sections, beginning with the theme of the information society, which includes a brief description of this concept: how new technologies have influenced our society to the point that they have now become an important part of our lives, influencing many aspects, especially the use of language. How the information society is formed is the next point covered, which leads into the concepts of Language Engineering and Human Language Technologies, the tools that allow the information society to grow and develop. The next part briefly treats the problem of multilinguality along with a possible solution, machine translation. Nevertheless, as we will see, machine translation is not flawless, and in fact is very far from perfection: these problems will be explained too, as they enlarge our vision of language.
In the European Union, the concept of the Information Society has been evolving strongly over the past few years building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies.
The Goldsworthy report sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world" (Department of Industry, Science and Tourism, 1997). In short, Goldsworthy's idea of the Information Society is entirely an economic one. At a broader level Barry Jones, the author of the House of Representatives Standing Committee's 1991 report 'Australia as an Information Society' sets out a definition of the Information Society which sees it as simply "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology" (Australia as an Information Society: Grasping New Paradigms, 1991).
These are just a few examples of ideas underpinning information policy drives in the developed world where the concept is accepted almost without challenge, and there is an inherent belief that like the Olympics, the Information Society is real - or will be very soon if only we can get ourselves organised properly. Some claim, of course, that the Information Society is here already and not just on its way. But one way or the other "it" exists and is a "good thing". By and large, national and regional Information Society documents do not question the belief that the Information Society will bring prosperity and happiness if a few basic safeguards are put in place. Some of the very few notes of serious caution in the practice of information policy have come through the influence of the Scandinavian countries which joined the European Union when the EU was already in full flight with implementing the actions flowing from the Bangemann report. Interestingly, in recent travels in India I noticed an extraordinary level of hope and trust in that developing country in the potential of information technology to transform India into a modern fully developed economy. The push to develop information and technological infrastructure initiated by Rajiv Gandhi is seen as positive and a necessary step for the goal of a universally prosperous society in India. Effectively there is the same acceptance of the goodness of an Information Society and the absolute necessity to be one, that is found in the West.
Given this blind faith in the existence and the desirability of an Information Society among diverse nations, it is instructive to look at the theoretical literature which has spawned the idea to see what it claims for the Information Society. The term Information Society has many synonyms: Information Age, Information Revolution, Information Explosion and so on and it is found across a wide spectrum of disciplines. Fortunately the task of unravelling many of these ideas has been accomplished in a masterly way by Frank Webster. He has categorised the variety of concepts of the Information Society, Information Revolution, or whatever, and provided an analysis of five common conceptions of the Information Society (Webster, 1995).
The first of these conceptions focuses on the convergence of computers and telecommunications and the capacity for storage, manipulation and transmission of vast amounts of data. The problem, however, is that drawing a direct line between the presence of information technology and the emergence of some sort of new society is hard to justify. Will the presence of, say, a computer in every home make us an Information Society? Or should that be two computers? At what point will we know we've arrived? What changes in our fundamental institutions, ways of living and working characterise an Information Society, as opposed to a non-Information Society? A further weakness of this concept is highlighted by the many commentators who point out the dangers of technological determinism in thinking about the Information Society and reject the view that technology impacts on society and is the prime agent of change, defining the social world.
This concept of the Information Society has been built on Fritz Machlup's seminal study of the size and effect of the US information industries in the 1960s; he demonstrated that education, the media, computing, information services (including insurance, law and other information based professions), R+D and so on accounted for some 30% of GNP.
Entrancing as it is to have numbers to quote in support of the importance of information in the economy, it is difficult to argue that the existence of lots of information activities in society actually impacts on social life, without moving to an analysis of the substance or quality of that information. In any event, what matters, surely, is not the amount but the meaning and value of information. Some econometric studies suggest that the early exponential growth of information activities as a proportion of economic activity has actually slowed down, with little change from 1958 to 1980.
This idea of the Information Society rests on the idea that in an Information Society the dominant category of worker is engaged as an "information worker". Many commentators have produced data to demonstrate growth patterns in the need for more workers who will use their brain rather than their brawn. Daniel Bell's influential 'Coming of the Post-Industrial Society' argued that the professional and technical classes would dominate in the new era with work organised around theoretically based knowledge for the purpose of social control and directing of innovation and change (Bell 1974: 15-20).
The challenge was to find a way of saying definitively whether a job was predominantly an information professional's job or not, since it appears that all work involves a certain amount of information processing.
There was a time when policy was clearly the business of the public sector and was essentially about "what governments choose to do and what not to do" (Dye 1995). The trouble now is that the edges of the public and private spheres are becoming more difficult to distinguish as has been amply demonstrated by papers in this strand of the Conference. It is interesting that the field of information studies has in some ways anticipated this development as it has accepted the place of private sector organisational policy on information matters to be recognised as "information policy" even though, at least traditionally, these policies were turned inwards to the support of organisational roles.
With the general global drive to interweave public and private sector activities within market-led, neo-liberal frameworks, the burgeoning information and IT infrastructure within governments cannot be considered adequately without looking at the interaction of public and private sectors. The private sector can have monumental effects on what governments can do with information for their own use or in the context of making information available to the community at large. Take, for example, the decision to concentrate Microsoft and Apple interests. This cannot but impact on government through the extension of control of the IT and software industries. This effect is even more pointed when governments operate along strictly market philosophies and for-profit activities are incorporated in the government sector.
Some understanding of how the fusion of public and private impacts on information policy can be gained from Nick Moore's analysis of Western and East Asian information policy implementation strategies (Moore, 1997). Moore argues that there are two broad approaches to information policy formation. One, the neo-liberal, puts its trust in the market to move society along towards the Information Society. The European Union policies illustrate this particularly well as there the basic tenet of information policy is the belief that the achievement of the Information Society "is a task for the private sector" with the role of government confined to ensuring a supportive regulatory climate and a refocusing of current public expenditure patterns.
What is information?
The task of defining the concept of information has not been an easy one, and the term still remains somewhat ambiguous. Terms like "data" or "knowledge" are often used as synonyms, sometimes merely following fashions in usage, but views on the nature of information fall into two camps: first there are those who see information as a tangible entity which can be processed, moved, changed and so on; and then there are those who see information as existing only in the human brain, the result of the absorption of symbols and signs. In this approach information is seen as subjective and ambiguous, with no "reality", so that it can be understood only in terms of process and how it changes people, or through its use or impact on individual action.
Fortunately, there has been some useful work which has linked some of the many conceptions of information in a framework which can guide the policy maker. Sandra Braman has outlined four main categories of information to be considered in policy making: information as a resource, information as a commodity, information as a perception of pattern, and, the broadest category, information as a constitutive force in society. In this last framework information is seen as having power in its own right and a capacity to shape context. Its capacity to change individuals and societies comes into play with the idea that "information is power" falling squarely into this set of beliefs.
Braman argues that effective information policy must consider information at all levels of her hierarchy. Few information policies do this, although the European drive to underpin its Information Society policy with a philosophy of "putting people in charge of information", and viewing the "Information Society as a "Learning Society" based on know-how and wisdom of people, not on information in machines", suggests a broader perspective than many information policy initiatives, including our own.
David Lewis coined the term "information fatigue syndrome" for what he expects will soon be a recognised medical condition, one that especially affects administrators who must deal with the enormous amounts of data issued by the information society. Lewis claims that these problems will get worse with the increasing use of the Internet, and can cause mental anguish and even physical illness.
This state of mind and body is caused not by poor administration but by a continual flow of information, whose rate is bound to increase given the enormous facility the information society has to create, share and move information around the world.
The only solution to this is specific training of individuals in information management, so that they are able to discern relevant from non-relevant data.
HUMAN LANGUAGE TECHNOLOGY AND LANGUAGE ENGINEERING
Language is the natural means of human communication; the most effective way we have to
express ourselves to each other. We use language in a host of different
ways: to explain complex ideas and concepts; to manage human resources; to
negotiate; to persuade; to make our needs known; to express our feelings; to
narrate stories; to record our culture for future generations; and to create
beauty in poetry and prose. For
most of us language is fundamental to all aspects of our lives. The use of
language is currently restricted. In the main, it is only used in direct
communications between human beings and not in our interactions with the systems,
services and appliances which we use every day of our lives. Even between humans,
understanding is usually limited to those groups who share a common language. In
this respect language can sometimes be seen as much a barrier to communication
as an aid.
A change is taking place, however, which will revolutionise our use of language and greatly
enhance the value of language in every aspect of communication. This change is
the result of developments in Language Engineering.
Language Engineering provides ways in which we can extend and improve our use of language to make it a more effective tool. It is based on a vast amount of knowledge about language and the way it works, which has been accumulated through research. It uses language resources, such as electronic dictionaries and grammars, terminology banks and corpora, which have been developed over time. The research tells us what we need to know about language and develops the techniques needed to understand and manipulate it. The resources represent the knowledge base needed to recognise, validate, understand, and manipulate language using the power of computers. By applying this knowledge of language we can develop new ways to help solve problems across the political, social, and economic spectrum.
Our ability to develop our use of language holds the key to the multi-lingual information society: the European society of the future. New developments in Language Engineering will make this possible.
What is language engineering?
Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.
Basic processes of a Language Engineering System
- entering material into the computer, using speech, printed text or handwriting, or text either keyed in or introduced electronically
- recognising the language of the material, distinguishing separate words, for example, recording it in symbolic form and validating it
- building an understanding of the meaning of the material, to the appropriate level for the particular application
- using this understanding in an application such as transformation (e.g. speech to text), information retrieval, or human language translation
- generating the medium for presenting the results of the application
- finally, presenting the results to human users via a display of some kind: a printer or a plotter; a loud speaker or the telephone
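The stages above can be sketched, in a deliberately simplified form, as a chain of functions. Every stage function here is an invented stand-in for a real component, not part of any actual system; the "understanding" is reduced to a bag-of-words count purely for illustration.

```python
# Hypothetical pipeline mirroring the six stages listed above.

def capture(raw: str) -> str:
    """Stage 1: enter material (here, plain text keyed in)."""
    return raw.strip()

def recognise(text: str) -> list[str]:
    """Stage 2: distinguish separate words and record them symbolically."""
    return text.lower().split()

def understand(tokens: list[str]) -> dict:
    """Stage 3: build an application-appropriate 'understanding'
    (here, just a word-frequency count)."""
    meaning = {}
    for tok in tokens:
        meaning[tok] = meaning.get(tok, 0) + 1
    return meaning

def apply_task(meaning: dict) -> str:
    """Stages 4-5: use the understanding and generate a presentable result."""
    top = max(meaning, key=meaning.get)
    return f"most frequent word: {top}"

def present(result: str) -> None:
    """Stage 6: present the result to the user."""
    print(result)

present(apply_task(understand(recognise(capture("The cat saw the dog")))))
```

A real system would replace each stand-in with a substantial component (a speech recogniser, a parser, a generator), but the division of labour is the same.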
The techniques that are used:
Lexicons: A lexicon is a repository of words and knowledge about those
words. This knowledge may include details of the grammatical structure of each
word (morphology), the sound structure (phonology), the meaning of the word in
different textual contexts.
Specialist lexicons: these lexicons are usually researched and produced separately from general purpose lexicons, and typically cover proper names, terminology and wordnets.
Grammar: A grammar describes the structure of a language at different
levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal
with structure both in terms of surface (syntax) and meaning (semantics and
pragmatics).
Corpora: A corpus is a body of language, either text or speech, which provides the
basis for the analysis of language to establish its characteristics, to train a
machine, usually to adapt its behaviour to particular circumstances, to verify
empirically a theory concerning language and to set a test for a Language
Engineering technique or application to establish how well it works in practice.
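To make the idea of a lexicon concrete, here is a toy entry for the word light, recording the kinds of knowledge described above: grammatical categories, sound structure and context-dependent senses. The fields and glosses are invented for illustration, not taken from any real lexicon.

```python
# A toy lexicon entry; real lexicons hold far richer information.
LEXICON = {
    "light": {
        "categories": ["noun", "adjective", "verb"],  # morphology/grammar
        "phonology": "/laɪt/",                        # sound structure
        "senses": {                                   # meaning by category
            "noun": "electromagnetic radiation; a lamp",
            "adjective": "not heavy; pale in colour",
            "verb": "to ignite or illuminate",
        },
    },
}

def lookup(word: str, category: str) -> str:
    """Return the sense of a word for a given grammatical category."""
    entry = LEXICON.get(word.lower())
    if entry is None or category not in entry["categories"]:
        return "unknown"
    return entry["senses"][category]

print(lookup("light", "adjective"))  # not heavy; pale in colour
```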
In practice, Language Engineering is applied at two levels. At the first level
there are a number of generic classes of application, such as speech recognition,
machine translation, and information retrieval. At the second level, these
enabling applications are applied to real world problems across the social and
economic spectrum. In general, language capability is embedded in systems to
enhance their performance: Language Engineering is an 'enabling technology'.
These technologies can be applied to a wide range of problems in business and
administration to produce better, more effective solutions. They can also be
used in education, to help the disabled, and to bring new services both to
organisations and to consumers. There are a number of areas where the impact is
particularly significant. In commerce and industry, success increasingly depends
on the ability to compete in a global marketplace.
Success is based on the ability to identify markets, sell into them effectively
and provide the quality of aftersales service expected by customers. There are
many areas where the application of Language Engineering can lead to greater
efficiency and reduced costs such as the generation of business letters, the
production and management of multi-lingual customer documentation, in-line
translation of electronic communications or the provision of computer aided translation.
One of the key features of an information service is its ability to deliver
information which meets the immediate, real needs of its client in a focused way.
It is not sufficient to provide information which is broadly in the category
requested, in such a way that the client must sift through it to extract what is
useful. Equally, if the way that the information is extracted leads to important
omissions, then the results are at best inadequate and at worst they could be
misleading. Language Engineering can improve the quality of information services by using techniques
which not only give more accurate results to search requests, but also increase
greatly the possibility of finding all the relevant information available. Use
of techniques like concept searches, i.e. using a semantic analysis of the
search criteria and matching them against a semantic analysis of the database,
gives far better results than simple keyword searches.
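The difference between a keyword search and a concept search can be sketched as follows. The tiny thesaurus and document set are invented examples, and the "semantic analysis" is reduced to simple synonym expansion, a crude stand-in for the real thing.

```python
# Hypothetical thesaurus mapping each term to a set of related terms.
THESAURUS = {
    "car": {"car", "automobile", "vehicle"},
    "buy": {"buy", "purchase", "acquire"},
}

DOCS = [
    "how to purchase an automobile",
    "growing roses in spring",
]

def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Match only the literal query words."""
    return [d for d in docs if all(w in d.split() for w in query.split())]

def concept_search(query: str, docs: list[str]) -> list[str]:
    """Expand each query word to its concept set before matching."""
    hits = []
    for d in docs:
        words = set(d.split())
        if all(THESAURUS.get(w, {w}) & words for w in query.split()):
            hits.append(d)
    return hits

print(keyword_search("buy car", DOCS))  # [] - no literal match
print(concept_search("buy car", DOCS))  # ['how to purchase an automobile']
```

The keyword search misses the relevant document entirely; the concept search recovers it because "purchase" and "automobile" fall within the same concepts as "buy" and "car".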
Direct access to services
Apart from the economic advantage of automating services to provide 'around the clock'
availability, it also removes the need for people to work long and unsociable
hours to provide the necessary coverage. Services are likely to be more
consistent, fast, and reliable. In addition the automatic recording of an audit
trail for each transaction will mean that each party to the transaction can feel
confident about its outcome.
Commerce in marketplaces.
Many of the actions involved in a business transaction, such as ordering, invoicing,
and sending payment instructions to the bank, can be completed without the need
for human intervention using, for example, EDI (Electronic Data Interchange)
technology. However, at the present time, most business transactions are
initiated by a dialogue between humans either on the telephone, in writing, or
face-to-face. With improvements in the availability of telematics services and
with the increasing use of the Internet and the World Wide Web, opportunities to
automate more activities in the commercial cycle (see illustration below) have
increased. Language enabled software will play a prominent role in making this
automation easier to use and more effective.
In time, electronic commerce will change the business model itself. There will be
less need for middlemen. New and small enterprises will be able to make the
world aware of their products and services quickly, effectively and without too
much expense. However, without language understanding and multi-lingual
capability, these benefits cannot be fully realised.
As the application of language knowledge enables better support for translators,
with electronic dictionaries, thesauri, and other language resources, and
eventually when high quality machine translation becomes a reality, so the
barriers will be lowered. Agreements at all levels, whether political or
commercial, will be better drafted more quickly in a variety of languages.
International working will become more effective with a far wider range of
individuals able to contribute. An example of a project which is successfully
helping to improve communications in Europe is one which interconnects many of
the police forces of northern Europe using a limited, controlled language which
can be automatically translated in real time. Such a facility not only helps in
preventing and detecting international crime, but also assists the emergency
services to communicate effectively during a major incident.
Accessibility and participation
One of the most important ways in which Language Engineering will have a significant
impact is in the use of human language, especially speech, to interface with
machines. This improves the usability of systems and services. It will also help
to ensure that services can be used not just by the computer literate but by
ordinary citizens without special training. This aspect of accessibility is
fundamental to a democratic, open, and equitable society in the Information Age.
Systems with the capacity to communicate with their users interactively, through human
language, available either through access points in public places or in the home,
via the telephone network or TV cables, will make it possible to change the
nature of our democracy. There will be a potential for participation in the
decision-making process through a far greater availability of information in
understandable and 'objective' form and through opinion gathering on a very
large scale. Many people whose lives are affected by disability can be helped
through the application of language technology. Computers with an understanding
of language, able to listen, see and speak, will offer new opportunities to
access services at home and participate in the workplace.
Improved education opportunities
Distance learning has become an important part of the provision of education services. It
is especially important to the concept of 'life-long learning' which is expected
to become an important feature of life in the Information Age. The effectiveness
of distance learning and self-study is improved by using telematics services and
computer aided learning. The quality and success of computer aided learning can
be greatly enhanced by the use of Language Engineering techniques.
Entertainment, leisure and creativity
Computer games as well as films benefit from language engineering and may become 'edutainment' thanks to subtitles: our children will learn to develop their language capabilities thanks to these improvements. For a wider range of people, writing can become a more exciting activity, and authoring tools will make it possible for them to achieve much higher quality results.
PROBLEMS OF MACHINE TRANSLATION
The development of natural language applications which handle multi-lingual and
multi-modal information is the next major challenge facing the field of
computational linguistics. Over the past 50 years, a variety of language-related
capabilities has been developed in areas such as machine translation,
information retrieval, and speech recognition, together with core capabilities
such as information extraction, summarization, parsing, generation, multimedia
planning and integration, statistics-based methods, ontologies, lexicon
construction and lexical representations, and grammar. The next few years will
require the extension of these technologies to encompass multi-lingual and multi-modal
information, and will require integration of the various capabilities into multi-functional
natural language systems. However, there is today no clear vision of how these
technologies could or should be assembled into a coherent framework. What would
be involved in connecting a speech recognition system to an information
retrieval engine, and then using machine translation and summarization software
to process the retrieved text? How can traditional parsing and generation be
enhanced with statistical techniques? What would be the effect of carefully
crafted lexicons on traditional information retrieval?
Why machine translation seems to be so difficult
The question to be asked is therefore why some problems are more difficult for
computers to deal with than others. With this knowledge, users should be able to
understand why when 'post-editing’ certain types of ‘mistakes’ need to be
constantly corrected, why when ‘pre-editing’ texts or composing in
controlled languages’ certain types of ambiguity and constructions must always
be avoided, and why in ‘interactive’ systems certain types of questions
recur again and again.
The methods for dealing with translation difficulties vary from system to system. In
many cases, the ambiguities specific to the source language are tackled in
operations separate from the treatment of differences between languages.
Commonly three basic operations are recognised: the analysis of the source text,
the bilingual transfer of lexical items and structures and the generation of the
target text. Questions of ambiguity and choice occur at every stage. For example,
resolving the ambiguity of English cry between ‘weep’ and ‘shout’
would be part of a program for the analysis of English. On the other hand,
the selection of connaître or savoir in French for the English
verb know would be a matter for a separate transfer program.
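A bilingual transfer rule of the kind just mentioned can be sketched as follows. The division into an 'entity' object (a person, place or thing) versus a 'fact' object (a clause or skill) is a simplifying assumption standing in for the fuller analysis a real system would perform.

```python
def transfer_know(object_kind: str) -> str:
    """Select the French verb for English 'know' from the kind of
    object it governs (a hypothetical feature from the analysis stage)."""
    if object_kind == "entity":
        return "connaître"   # je connais Paris / je connais Marie
    if object_kind == "fact":
        return "savoir"      # je sais qu'il pleut / je sais nager
    raise ValueError("unrecognised object kind")

print(transfer_know("entity"))  # connaître
print(transfer_know("fact"))    # savoir
```

The point is structural: the choice is made in the transfer component, on the basis of features delivered by analysis, rather than during the analysis of English itself.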
Analysis involves also the identification and disambiguation of structures, e.g.
whether He saw her shaking hands means that he saw someone who was
welcoming a visitor or he saw someone who was suffering from the cold weather.
Transfer likewise can involve changes of structure, e.g. from an English
infinitival construction He likes to swim to a German adverbial
construction Er schwimmt gern. Generation is often incorporated in
transfer operations, but when it is a separate component it might include operations
to distinguish between English big, large and great (about which more
later) and the production of correct morphology and word order in the target
language (ses mains tremblantes, er darf nicht schwimmen).
Methods of analysis and transfer
Since translation is a problem-solving activity, choices have to be made continually.
The assumption in MT systems, whether fully or partially automatic, is that
there are sufficiently large areas of natural language and of translation
processes that can be formalised for treatment by computer programs. The basic
premise is therefore that the differences between languages can to some extent
be regularised. What this means at the practical level is that problems of
selection can be resolved by clearly definable procedures. The major task for MT
researchers and developers is to determine what information is most effective in
particular situations, what kind of information is appropriate in particular
circumstances, and whether some data should be given greater weight than others.
Methods based on specific words are the easiest to apply and are capable of the highest
degree of precision. At the same time, however, there is inflexibility since
there is no allowance for inflected variation of forms or for the least
variation of word order. Three examples will be analysed:
The relation of one word with another may imply a disambiguation of the meaning of
the compound. Thus many MT systems include entries for compounds such as light
ship and light bulb, and indicate directly the target language
equivalent (French ampoule, German Glühbirne).
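Treating compounds as fixed units might be sketched as a longest-match-first dictionary lookup: the two-word entry is tried before any single-word transfer. Only the Glühbirne equivalence comes from the text; the single-word German entries are invented for illustration.

```python
# Hypothetical transfer dictionaries: compounds first, then single words.
COMPOUNDS = {("light", "bulb"): "Glühbirne"}
WORDS = {"light": "leicht", "bulb": "Birne", "ship": "Schiff"}

def translate(tokens: list[str]) -> list[str]:
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:        # try the two-word compound first
            out.append(COMPOUNDS[pair])
            i += 2
        else:                        # fall back to single-word transfer
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return out

print(translate(["light", "bulb"]))  # ['Glühbirne']
```

The same longest-match mechanism serves for idioms and fixed metaphors, which is why the text can treat them "very much like any compound".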
The perceived difficulty of idioms is that the individual words take on meanings
and connotations which they do not have in their literal usages. However, it is
precisely because most idioms are relatively fixed expressions, consisting of
the same words in the same sequence, that they can be easily translated into
comparable idioms – or if none exist into a literal equivalent. Idioms can in
fact be treated very much like any compound.
The same approach can be taken with many metaphorical usages, e.g. mouth of a
river, branch of a bank, flow of ideas, channel of communication, tide of
opinion, foot of the mountain, leg of the table. Like idioms, metaphors of
this kind can be treated as fixed compound expressions. We may note that among
the European languages there is a common thread of similar formations, so that
even if a metaphorical usage is not recorded in the dictionary, it may be
possible to produce a ‘literal’ translation which has the same metaphorical meaning.
The advantage of treating certain word combinations as fixed expressions and
translating them as units is the considerable saving in processing, particularly
the analysis of syntactic structure, and the assurance that the target output
will be guaranteed to be correct. There are disadvantages also, however, since
idioms can vary in structure, and variation is very common for ‘idiomatic’
phrasal verbs (2). In other words the identification of idiomatic expressions
must often involve morphological and syntactic analysis.
It is a truism to say that one of the most straightforward operations of any MT
system should be the identification and generation of morphological variants of
nouns and verbs. There are basically two types of morphology in question:
inflectional morphology, as illustrated by the familiar verb and noun paradigms
(French marcher, marche, marchons, marchait, a marché, etc.), and
derivational morphology, which is concerned with the formation of nouns from
verb bases, verbs from noun forms, adjectives from nouns, and so forth, e.g. nation,
nationalism, nationalise, nationalisation, and equivalents in other
languages. It should be stressed that any MT system should as a minimum be capable of
recognising morphological forms and of generating them correctly. However, the
alignment of equivalent verb forms across languages is another
matter, particularly when modal forms are involved (must, might, devoir,
falloir, mögen, dürfen, etc.). In general, an MT system which cannot go
beyond morphological analysis will produce little more than word-for-word
translations. It may cope well with compounds and other fixed expressions, it
may deal adequately with noun and verb forms in certain cases, but the omission
of any treatment of word order will give poor results.
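The inflectional side of this task can be sketched very compactly for the regular French -er paradigm cited above. The `ENDINGS` table below covers only the simple present tense and is a minimal illustration, not a complete morphological component:

```python
# Sketch: generating inflectional variants of regular French -er verbs
# (simple present only; irregular verbs are out of scope).
ENDINGS = {"je": "e", "tu": "es", "il": "e", "nous": "ons", "vous": "ez", "ils": "ent"}

def conjugate_present(infinitive):
    assert infinitive.endswith("er"), "sketch handles regular -er verbs only"
    stem = infinitive[:-2]  # marcher -> march-
    return {person: stem + ending for person, ending in ENDINGS.items()}

forms = conjugate_present("marcher")
print(forms["nous"])  # marchons
print(forms["il"])    # marche
```

Running the same table in reverse (stripping a known ending to recover the lemma) gives the recognition direction; it is this two-way mapping, multiplied over tenses, moods and irregular paradigms, that every MT system must get right as a minimum.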
Syntactic analysis is based largely on the identification of grammatical categories: nouns,
verbs, adjectives. For English, the major problem is the categorial ambiguity of
so many words, as already illustrated with the word light. In essence,
the solution is to look for words which are unambiguous as to category and to
test all possible syntactic structures. In the case of a sentence such as:
“Prices rose quickly in the market”
Each of the words prices, rose, and market can be either a noun
or a verb; however, quickly is unambiguously an adverb and the is unambiguously a
definite article, and these facts ensure an unambiguous analysis of the phrase
structure (5), where prices is identified as a subject noun phrase, in
the market as a prepositional phrase, and rose quickly as part of a
verb phrase. (Note that this particular analysis is not one necessarily found in
any MT system and would not be adopted by many syntax theories.)
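The strategy of anchoring on unambiguous words and testing the remaining possibilities can be sketched as a brute-force filter. The toy lexicon and the grammatical constraints in `plausible` are invented for illustration and are far simpler than what any real parser uses:

```python
from itertools import product

# Possible categories per word; "quickly", "in" and "the" are unambiguous anchors.
LEXICON = {
    "prices": {"NOUN", "VERB"},
    "rose": {"NOUN", "VERB"},
    "quickly": {"ADV"},
    "in": {"PREP"},
    "the": {"DET"},
    "market": {"NOUN", "VERB"},
}

def plausible(tags):
    # Toy constraints: exactly one verb; a determiner must be followed by a
    # noun; the adverb must be adjacent to the verb.
    if tags.count("VERB") != 1:
        return False
    for i, t in enumerate(tags):
        if t == "DET" and (i + 1 >= len(tags) or tags[i + 1] != "NOUN"):
            return False
        if t == "ADV" and not ((i > 0 and tags[i - 1] == "VERB")
                               or (i + 1 < len(tags) and tags[i + 1] == "VERB")):
            return False
    return True

sentence = "prices rose quickly in the market".split()
readings = [tags for tags in product(*(sorted(LEXICON[w]) for w in sentence))
            if plausible(tags)]
print(readings)  # a single reading survives: NOUN VERB ADV PREP DET NOUN
```

Of the eight candidate tag sequences, the unambiguous adverb and article eliminate all but one, which is exactly the effect described in the text.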
Semantic roles and features
The recognition of implicit relations may well require access to semantic
information. It is common to identify two types: semantic roles and semantic
features. By the semantic roles in a structure is meant the specific
relationships of nominal elements (entities) to verbal elements (actions or
states): a particular noun may be the ‘agent’ of an action, another may be
the ‘instrument’ (or means), another may be the ‘recipient’, and another
may refer to the ‘location’, and so forth.
Unfortunately, there is no
universally agreed set of semantic roles which can be applied without difficulty
to any language. Developers of MT systems are usually obliged to draw up their
own list. However, the principal difficulty is the identification of roles. In
English the main indicators are the prepositions, but these can be ambiguous as
to the role expressed; with can indicate instrument, manner or context:
The bottle was opened with a corkscrew
The bottle was opened with difficulty
The bottle was opened with the meal
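One common way to attack this ambiguity is to let semantic features of the preposition's object suggest a role. The feature inventory and the decision rules below are invented for illustration; real systems need far richer lexicons:

```python
# Sketch: guessing the semantic role of an English "with"-phrase from toy
# semantic features of its object noun. Features are illustrative only.
FEATURES = {
    "corkscrew": {"concrete", "tool"},
    "difficulty": {"abstract", "manner-like"},
    "meal": {"concrete", "event-like"},
}

def role_of_with(obj):
    f = FEATURES.get(obj, set())
    if "tool" in f:
        return "instrument"
    if "manner-like" in f:
        return "manner"
    if "event-like" in f:
        return "context"
    return "unknown"

print(role_of_with("corkscrew"))   # instrument
print(role_of_with("difficulty"))  # manner
print(role_of_with("meal"))        # context
```

The three bottle sentences would thus receive three different roles, and hence potentially three different prepositions in the target language, from a single source preposition.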
Real world knowledge
While semantic features
and roles combined with syntactic information can go a long way in resolving
ambiguities in the source language and in deciding among translation variants,
there are numerous instances where what is apparently needed is knowledge about
the things and events being referred to. Take a simple problem of
coordination: should pregnant women and children be translated as
des femmes enceintes et des enfants, or as
des femmes et des enfants enceintes?
Virtually all MT systems have difficulties with this kind of construction. An examination
of the semantic features of the words involved may suffice on occasion, but in many
cases it will not. What seems to be involved is knowledge about human behaviour:
the system needs to have some kind of human-like ‘understanding’.
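A feature-based attempt at the example above can be sketched as a selectional-restriction check: "pregnant" is made to select a female adult referent, so it attaches only to "women". The feature sets and the requirement are invented for illustration, and the sketch also exposes the limitation the text points to: deciding what goes into those feature sets is precisely the world knowledge a fixed lexicon cannot fully encode.

```python
# Sketch: deciding which conjunct "pregnant" can modify in
# "pregnant women and children" using toy semantic features.
FEATURES = {
    "women": {"human", "female", "adult"},
    "children": {"human"},  # deliberately underspecified
}
REQUIRED = {"female", "adult"}  # what "pregnant" selects for (assumed)

def can_modify(noun):
    return REQUIRED <= FEATURES.get(noun, set())

scope = [n for n in ("women", "children") if can_modify(n)]
print(scope)  # ['women'] -> prefer "des femmes enceintes et des enfants"
```

The check picks the intended reading here, but only because the lexicon was hand-crafted to do so; a system covering arbitrary text would need an enormous, and ultimately open-ended, store of such knowledge.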
We are led therefore to the argument that good quality translation is not possible
without understanding the reality behind what is being expressed, i.e.
translation goes beyond the familiar levels of linguistic information: morphology, syntax
and semantics.
One of the most distinctive features of texts produced by MT systems is their
unnatural literalness. In general, they adhere too closely to the structures of
source texts. Of course, human translators can be guilty of this fault as well
– although Newmark (1991) considers literalness to be desirable in literary
and authoritative texts, as long as the result is in the appropriate style.
However, the aim in technical translation is generally to produce texts which
read as if they were originally written in the target language. It is quite
evident that MT systems do not achieve this goal. Indeed, it can be argued that
they should not aim for idiomaticity of this order, if only because recipients
of MT output may be led to assume complete accuracy and fidelity in the
translation. It does not need stressing that readability and fidelity do not go
hand in hand: a readable translation may be inaccurate, and a faithful
translation may be difficult to read.
As we can see, multilinguality is the major problem of machine translation, and it
will become an even greater one as the information society grows more and more,
including more and more documents in different languages. Machine translation will
in a way solve this problem, but as we have seen it is far from perfection,
since a machine does not have the kind of human-like ‘understanding’ that
allows a person to translate the proper expressions of a language.
A “lingua franca” such as English can be used to link all languages together
and ease machine translation, but the enormous number of languages makes
it a Herculean task, since we should cover all possibilities: English-Spanish,
English-French, English-German, and so on; and the task is far greater if a base
language is not chosen, due to the nearly infinite possible combinations.
Translation technology has greatly improved in the last decades, but the task is
hard and the technology and means at our disposal do not make it easy.
Translation technology: it is worth distinguishing between branches of translation
that depend heavily on technology and those that barely need it. Software localization
is the paradigm of the former, while interpreting and literary
translation are examples of the latter. The localization business is intimately
connected with the software industry and companies in the field complain about
the lack of qualified personnel that combine both an adequate linguistic
background and computational skills. This is the reason why the industry (around
the LISA association) has taken the lead over educational institutions by
proposing courseware standards (the LEIT initiative) for training localization
professionals.
Documentation of many types is rapidly changing format and going digital. Electronic
documentation is the adequate realm for the incorporation of translation
technology. This is something that young students of translation must learn. As
the conception and design of technical documentation becomes progressively
influenced by the electronic medium, it is integrating more and more with the
whole concept of a software product. The strategies and means for translating
both software packages and electronic documents are becoming very similar and
both are now, as we will see, the goal of the localization industry.
An important consequence of the popularization of the Internet is that access to
information is now truly global and the demand for localizing institutional and
commercial Web sites is growing fast. In the localization industry, the
utilization of technology is congenital, and developing adequate tools has
immediate economic benefits.
The main role of localization companies is to help software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing, and Web-based information in different languages for simultaneous worldwide release.
The recent expansion of these industries has considerably increased the demand for
translation products and has created a new burgeoning market for the language
business. According to a recent industry survey by LISA (the Localization
Industry Standards Association), almost one third of software publishers, such
as Microsoft, Oracle, Adobe, Quark, etc., generate
above 20 percent of their sales from localized products, that is, from products
which have been adapted to the language and culture of their targeted markets,
and the great majority of publishers expect to be localizing into more than ten
languages.
LISA Education Initiative Taskforce (LEIT)
The LISA Education Initiative Taskforce (LEIT) is a consortium of schools training
translators and computational linguists that was announced in 1998 as an
initiative to develop a promotional program for the academic communities in
Europe, North America, and Asia.
The main goal of the LEIT initiative is to introduce localization courseware into
translation studies, with versions ready for the start of the 1999 academic year.
Margaret King of Geneva University described the first step of the project as
consisting of the "clarification of the state of affairs and to plan
courses that are comprehensive enough to cover all aspects of interest of the
localization industry, to review all aspects of the localization industry, from
translation and technical writing through globalization, internationalization,
and localization". The definition of the critical terms involved was a
contentious topic, although there seems to be consensus on the following definitions:
Globalization: the adaptation of marketing strategies to regional requirements of all kinds (e.g.,
cultural, legal, and linguistic).
Internationalization: the engineering of a product (usually software) to enable efficient adaptation
of the product to local requirements.
Localization: the adaptation of a product to a target language and culture (locale).
Translation technology has never been plug-and-play. It requires a huge effort in
preparation, evaluation, and maintenance. Suitability of technology depends on
many factors, but fundamentally text type. Without these considerations, the
technology may be seen as a fiasco. Few informed people still see the original
ideal of fully automatic high-quality translation of arbitrary texts as a
realistic goal. Translation technology suppliers are now working under the
assumption that, rather than batch processes, man-machine interaction together
with the integration of tools into the translator's working environment is the
most promising approach. Leaving
behind the old conception of a monolithic compact translation engine, the
industry is now moving in the direction of integrating systems. This approach
for integrating different tools is largely the view advocated by many language-technology
specialists. Below is a description of an ideal engine which captures the
answers given by Muriel Vasconcellos (from the Pan American Health Organization)
and Minako O'Hagan (author of The Coming Industry of Teletranslation). The
ideal workstation for the translator would combine the following features:
integration in the translator's general working environment, which comprises the
operating system, the document editor (hypertext authoring, desktop publisher or
the standard word-processor), as well as the emailer or the Web browser. These
would be complemented with a wide collection of linguistic tools: from spell,
grammar and style checkers to on-line dictionaries, and glossaries, including
terminology management, annotated corpora, concordances, collated texts, etc.
It should comprise all advances in machine translation (MT) and translation memory
(TM) technologies, be able to perform batch extraction and reuse of validated
translations, enable searches into TM databases by various keywords (such as
phrases, authors, or issuing institutions). These TM databases could be
distributed and accessible through the Internet. There is a new standard for TM
exchange (TMX) that would permit translators and companies to work remotely and
share memories in real time.
Localization packages are now being designed to assist users throughout the whole life cycle of a multilingual document. Unlike traditional translators, software localizers may be engaged in the early stages of software development, as there are issues, such as platform portability, code exchange, and format conversion, which if not properly dealt with may hinder product internationalisation. Localizers are often involved in the selection and application of utilities that perform code scanning and checking and that automatically isolate and suggest solutions to National Language Support (NLS) issues, saving time during the internationalisation enabling process. There are run-time libraries that enable software developers and localizers to create single-source, multilingual, portable cross-platform applications. Unicode support is also fundamental for software developers who work with multilingual texts, as it provides a consistent coding format for international character sets.
Human excellence
Having said all this, it is important to reassess the human factor. Like cooks, tailors
or architects, professional translators need to become acquainted with
technology, because good use of technology will make their jobs more competitive
and satisfying. But they should not dismiss craftsmanship. Technology enhances
productivity, but translation excellence goes beyond technology. It is important
to delimit the roles of humans and machines in translation. Martin Kay's (1987)
words in this respect are most illustrative:
A computer is a device that can be used to magnify human productivity. Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of human spirit and the dignity of human labor but, by taking over what is mechanical and routine, it frees human beings for what is essentially human. Translation is a fine and exacting art, but there is much about it that is mechanical and routine and, if this were given over to a machine, the productivity of the translator would not only be magnified but his work would become more rewarding, more exciting, more human.
Although it may not be perceived at first sight, the complexity of natural language is of
an order of magnitude far superior to any purely mechanical process. To how many
words should the vocabulary be limited to make the complexity of producing
"free sonnets" (that is, any combination of 6 words in 14 verses)
comparable to the number of possible chess games? It may be difficult to believe,
but the vocabulary should be restricted to about 100 words. That is, making free
sonnets with 100 words offers roughly as many different alternatives as there are ways
of playing a chess game (roughly, 10^120).
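The arithmetic behind this claim can be checked under one reading of "combination": an unordered choice of 6 words for each verse, with the 14 verses chosen independently. That reading is an assumption, but it lands in the right ballpark:

```python
import math

# Back-of-envelope check of the sonnet claim, assuming each verse is an
# unordered choice of 6 words from the vocabulary, verses independent.
V = 100                             # vocabulary size
per_verse = math.comb(V, 6)         # ways to fill one verse
sonnets = per_verse ** 14           # 14 independent verses
magnitude = len(str(sonnets)) - 1   # decimal exponent, roughly

print(per_verse)   # 1192052400
print(magnitude)   # 127 -- the same order of magnitude as the ~10^120 chess estimate
```

A 100-word vocabulary thus yields on the order of 10^127 "free sonnets", comparable to the usual 10^120 estimate for the number of chess games.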
number of possibilities would quickly come down if combinations were restricted
so that they not only made sense but acquired some sort of poetic value. However,
defining formally or mechanically the properties of "make sense" and
"have poetic value" is not an easy task. Or at least, it is far more
difficult than establishing winning heuristics for a color to succeed in a chess
game.
Outside the limits of the mechanical and routine, MT is impracticable and human creativity becomes indispensable. Translators of the highest quality are only obtainable from first-class raw materials and constant and disciplined training. The potentially good translator must be a sensitive, wise, vigilant, talented, gifted, experienced, and knowledgeable person. An adequate use of mechanical means and resources can make a good human translator a much more productive one. Nevertheless, very much like dictionaries and other reference material, technology may be considered an excellent prosthesis, but little more than that.
Yet even for skilled human translators, translation is often difficult. One clear
example is when linguistic form, as opposed to content, becomes an important
part of a literary piece. Conveying the content, but missing the poetic aspects
of the signifier may considerably hinder the quality of the translation. This is
a challenge to any translator. Jaime de Ojeda's (1989) Spanish translation of
Lewis Carroll's Alice in Wonderland illustrates this problem:
Twinkle, twinkle, little bat
Brilla, luce, ratita alada
Manuel Breva (1996)
analyzes the example and shows how Ojeda solves the "formal hurdles"
of the original:
The above lines are a parody of the famous poem "Twinkle, twinkle, little star" by Jane Taylor, which, in Carroll's version, turns into a sarcastic attack against Bartholomew Price, a professor of mathematics, nicknamed "The Bat". Jaime de Ojeda translates "bat" as "ratita alada" for rhythmical reasons. "Murciélago", the Spanish equivalent of "bat", would be hard to fit in this context for the same poetic reasons. With Ojeda's choice of words the Spanish version preserves the meaning and maintains the same rhyming pattern (AABB) as in the original English verse-lines.
What would the output of any MT system be like if confronted with this fragment?
Obviously, the result would be disastrous. Compared with the complexity of
natural language, the figures that serve to quantify the "knowledge"
of any MT program are absurd: 100,000-word bilingual vocabularies, 5,000
transfer rules... Well-developed systems such as Systran or Logos
hardly surpass these figures. How many more bilingual entries and transfer rules
would be necessary to match Ojeda's competence? How long would it take to
adequately train such a system? And even then, would it be capable of
challenging Ojeda in the way the chess master Kasparov has been challenged? I
have serious doubts about that being attainable at all.
However, there are other opinions, as in the case of the famous Artificial Intelligence
pioneer Marvin Minsky. Minsky would argue that it is all a matter of time. He
sees the human brain as an organic machine, and as such, its behavior, reactions
and performance can be studied and reproduced. Other people believe there is an
important aspect separating organic, living "machines" from synthetic
machines. They would claim that creativity is in life, and that it is an
exclusive faculty of living creatures to be creative.
As we’ve seen, language engineering has become a powerful tool for industry,
allowing it to spread information worldwide more efficiently. The management
of this information in its many variable forms and languages is a task that
the experts will probably not be able to accomplish fully in the coming years, even
decades: multilinguality is still a great obstacle to overcome, and the solution,
machine translation, is to date a tool of limited success, though the
experts are making great efforts to improve it.
The use of information management will be something
inherent to any job in the future, and thus it is not foolish to say that a
prosperous future awaits those who decide to take up a career related to language
engineering, machine translation or language technologies.
# Language Engineering and the Information Society (Document from I*M Europe)
# Living and Working Together in the Information Society (Discussion Document from HLTCentral).
# Information Policy for an Information Society (Paper by Mairéad Browne).
# What is Language Technology, by Hans Uszkoreit
# Joseph Mariani (ed.). 1999. Multilingual Speech Processing (Recognition and Synthesis), in Multilingual Information Management: Current Levels and Future Abilities. http://www.cs.cmu.edu/people/ref/mlim/index.html
# Jay Branegan and Peggy Salz-Trautman. 1996. Information Fatigue Syndrome
# Introduction to Human Language Technologies
# Judith Klavans and Eduard Hovy. 1999. Cross-lingual and Cross-modal Information Retrieval Multilingual Information Management: Current Levels and Future Abilities. http://www.cs.cmu.edu/people/ref/mlim/index.html
# Reflections on the history and present state of machine translation, John Hutchins
# Translation problems, by D J Arnold
# Why computers do not translate better, by John Hutchins