“HUMAN LANGUAGE TECHNOLOGIES AND THEIR ROLE IN THE INFORMATION SOCIETY”

 

Abstract.

This report contains a series of questions, answered with passages taken from the on-line documentation provided by professor Abaitua. The main issue discussed is Human Language Technologies and their role in the Information Society: the problems that arise in automatic translation, methods to solve those problems, and so on.

 

Introduction.

This report follows a very concrete methodology: every week professor Abaitua published on-line a series of questions, together with the links where the answers to those questions could be found.

 

The objective here is to learn how to use the tools that the new technologies offer us and how to apply them to our profession as linguists. Traditional translation may still be improved with the use of these tools. In fact, in the near future every single translator will need to rely on them for better translations: better quality, efficiency, updated sources for translation (on-line dictionaries, etc.) and easy access to such materials. However, these professional users of new technologies should not lack good training on the subject. With such prospects, students should be aware of the importance of this subject early in their studies on translation, literature, etc., in order to be fluent in the use of these new technologies before they graduate.

Accordingly, the materials included in this report are very varied.

 

 

First week: students' first contact with the on-line materials.

Second week: Language technologies and the Information Society.

Third week: Information overload and methods to improve data-management.

Fourth week: Language technology and engineering.

Fifth week: Multilinguality: review of translation technology and resolution of problems.

Sixth week: Machine translation (MT): history, systems, methods, etc.

Seventh week: Machine translation II: multilingual resources.

 

Conclusion.

Through the realization of this report on Human Language Technologies and their role in the Information Society, students will find that they have acquired enough information to understand the importance of using the tools that new technologies offer today. During this course, students have learned not only the content of the subject but also how to improve their computing skills. The fact that the subject is computer-based brings students closer to what they are meant to learn in the course: new technologies are taught through the new technology tool itself, the Internet.

All in all, it is a very positive course, and a necessary one in this field of “Filología Inglesa”, where students deal with dictionaries every day. With the knowledge acquired in this course, students will be able to start getting used to the new systems and gain extra help with their translations, essays, reports, etc. It is useful in many ways, from making better use of the computer and the Internet to exploiting its wide range of possibilities: using on-line dictionaries, sending e-mails, finding information, and so on, in everyday life.

 

Questions.

 

Language Technologies and the Information Society

 

 

1.     WHAT IS THE “INFORMATION SOCIETY”?

 

 

This is the information age and we are a society in which information is vital to economic, social, and political success as well as to our quality of life.

http://www.serv-inf.deusto.es/abaitua/konzeptu/nlp/echo/infoage.html

 

The information society will permeate virtually every area of life involving interactions between people and organisations, in both the public and the private spheres.

http://europa.eu.int/en/record/white/c93700/ch01_1.html

The European Union has opened its www servers on the Internet, and its databases can currently be consulted through the Net.

 

A new “information society” is emerging, in which management, quality and speed of information are the key factors for competitiveness: as an input to industry as a whole and as a service provided to ultimate consumers, information and communication technologies influence the economy at all stages.

 

This decade is witnessing the forging of a link of unprecedented magnitude and significance between the technological innovation process and economic and social organization.

 

A new “information society” is emerging in which the services provided by information and communications technologies (ICTs) underpin human activities. The development of an “information society” will be a global phenomenon, led first of all by the Triad, but gradually extended to cover the entire planet.

 

 

 

2.     WHAT IS THE ROLE OF HLTCentral.org?

 

HLTCentral - Gateway to Speech & Language Technology Opportunities on the Web

The HLTCentral web site (http://www.hltcentral.org/page-615.shtml) was established as an online information resource on human language technologies and related topics of interest to the HLT community at large. It covers news, R&D, technological and business developments in the field of speech, language, multilinguality, automatic translation, localisation and related areas. Its coverage of HLT news and developments is worldwide - with a unique European perspective.

HLTCentral is powered by two EU-funded projects, ELSNET and EUROMAP.

EUROMAP ("Facilitating the path to market for language and speech technologies in Europe") - aims to provide awareness, bridge-building and market-enabling services for accelerating the rate of technology transfer and market take-up of the results of European HLT RTD projects. http://www.hltcentral.org/htmlengine.shtml?id=56

 

 

ELSNET ("The European Network of Excellence in Human Language Technologies") - aims to bring together the key players in language and speech technology, both in industry and in academia, and to encourage interdisciplinary co-operation through a variety of events and services. http://www.elsnet.org/ 

 

 

3.     WHY ARE LANGUAGE TECHNOLOGIES SO IMPORTANT FOR THE INFORMATION SOCIETY?

 

 

There are many new ways in which the application of telematics and the use of language technology will benefit our way of life, from interactive entertainment to lifelong learning.

 

The language technologies will make an indispensable contribution to the success of this information revolution. The availability and usability of new telematics services will depend on developments in language engineering. Speech recognition will become a standard computer function, providing us with the facility to talk to a range of devices, from our cars to our home computers, and to do so in our native language. In turn, these devices will present us with information, at least in part, by generating speech. Multilingual services will also be developed in many areas. http://www.serv-inf.deusto.es/abaitua/konzeptu/nlp/echo/infoage.html

 

Advances in computerised analysis, understanding and generation of written and spoken language are going to revolutionise human-computer interaction and technology-mediated person-to-person communication. In the globalisation of economy and society, human language technologies play a central role. All this according to HLT Central: http://www.hltcentral.org/htmlengine.shtml?id=55

DISCUSSION DOCUMENT, LUXEMBOURG, JULY 1997

 

HLT will enable the information society through intuitive, human-centred modes of interaction with products and services. These will include spoken interaction, removing the need for keyboards and keypads, and the use of many different languages to process information and interact with devices - as well as the ability to communicate across language barriers. According to HLT Central: http://www.hltcentral.org/page-219.shtml

 

Human Language Technologies activities are relevant to many of the action lines within the thematic programme on the Information Society, due to the pervasiveness of human language in information and communication related activities.

http://www.hltcentral.org/htmlengine.shtml?id=55

DISCUSSION DOCUMENT, LUXEMBOURG, JULY 1997

 

 

 

 

Information overload and methods to improve data-management

 

 

1.          WHY IS “KNOWLEDGE” OF MORE VALUE THAN “INFORMATION”?

Information is data given context, and endowed with meaning and significance. Knowledge is information that is transformed through reasoning and reflection into beliefs, concepts, and mental models.

“Knowledge is power, but information is not. It’s like the detritus that a gold-panner needs to sift through in order to find the nuggets.”

 

 

http://sirio.deusto.es/abaitua/Konzeptu/fatiga.htm#knowledge

 

 

 

2.          DOES THE POSSESSION OF LARGE QUANTITIES OF DATA IMPLY THAT WE ARE WELL INFORMED?

 

Information is data given context, and endowed with meaning and significance. “Having too much information can be as dangerous as having too little. Among other problems, it can lead to a paralysis of analysis, making it far harder to find the right solutions or make the best decisions.” “Information is supposed to speed the flow of commerce, but it often just clogs the pipes.”

 

 

 

“Information stress sets in when people in possession of a huge volume of data have to work against the clock, when major consequences (lives saved or lost, money made or lost) will flow from their decision, or when they feel at a disadvantage because even with their wealth of material they still think they do not have all the facts they need. So challenged, the human body reacts with a primitive survival response. This evolved millions of years ago to safeguard us when confronted by physical danger. In situations where the only options are to kill an adversary or flee from it, the 'fight-or-flight' response can make the difference between life and death.”

http://sirio.deusto.es/abaitua/Konzeptu/fatiga.htm#stress

 

 

3.          HOW MANY WORDS OF TECHNICAL INFORMATION ARE RECORDED EVERY DAY?

 

Notes

Telephone calls

The world spent nearly 60 billion minutes on the telephone -talking, faxing and sending data- in 1995. In 1985, the time spent was 15 billion minutes; in 2000 it is expected to be 95 billion minutes.

The BT/MCI Global Communications Report 1996/97. Trends, Analysis, Implications

Words per minute

Every day, approximately 20 million words of technical information are recorded. A reader capable of reading 1000 words per minute would require 1.5 months, reading eight hours every day, to get through one day's output, and at the end of that period he would have fallen 5.5 years behind in his reading.

Methods for Satisfying the Needs of the Scientist and the Engineer for Scientific and Technical Communication, Hubert Murray Jr.
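The arithmetic behind those figures can be checked in a few lines. This is only a rough sketch; the backlog estimate lands near, though not exactly at, the quoted 5.5 years, which presumably rests on slightly different assumptions:

```python
# Sanity check of the figures quoted above: 20 million words recorded
# daily, a reader at 1000 words per minute, reading 8 hours a day.
words_per_day = 20_000_000      # technical words recorded each day
reading_speed = 1000            # words per minute
minutes_per_day = 8 * 60        # an 8-hour reading day

# Time needed to read a single day's output:
days_to_read_one_day = words_per_day / (reading_speed * minutes_per_day)
print(round(days_to_read_one_day, 1))   # 41.7 days, i.e. about 1.5 months

# While reading those ~42 days, another ~42 days of output piles up,
# each of which itself takes ~42 days to read:
backlog_days = days_to_read_one_day * days_to_read_one_day
print(round(backlog_days / 365, 1))     # 4.8 years of accumulated reading
```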

 

http://www.serv-inf.deusto.es/abaitua/konzeptu/fatiga.htm#Notes

 

 

 

4.          WHAT IS THE MOST CONVENIENT WAY OF REPRESENTING INFORMATION? WHY?

 

 

One of the key features of an information service is its ability to deliver information which meets the immediate, real needs of its client in a focused way. It is not sufficient to provide information which is broadly in the category requested, in such a way that the client must sift through it to extract what is useful. Equally, if the way that the information is extracted leads to important omissions, then the results are at best inadequate and at worst they could be seriously misleading.

Information is available throughout the world, on the World Wide Web, for example, in different languages. In reality, however, it is only available to a client who can firstly request the information in the language in which it is recorded and then understand the language in which the information is presented. Using machine translation facilities the person seeking information will be able to complete an information request in his or her native language and receive the information in that same language, regardless of the language in which the information is recorded.

Language Engineering can improve the quality of information services by using techniques which not only give more accurate results to search requests, but also increase greatly the possibility of finding all the relevant information available.

 

http://www.serv-inf.deusto.es/abaitua/konzeptu/nlp/langeng.htm

 

 

5. HOW CAN COMPUTER SCIENCE AND LANGUAGE TECHNOLOGIES HELP MANAGE INFORMATION?

 

 

"Better training in separating essential data from material that, no matter how interesting, is irrelevant to the task at hand is needed."

D. Lewis

 

The European Commission is also encouraging governments, corporations and small businesses to train people in how to manage data.

 

The irony of the fact that Dying for Information? was sponsored by Reuters Business Information is not lost on its executives, who direct the production and marketing of information services to corporate clients around the world.

 

"We would argue the Reuters' whole raison d'être for the past 150 years is getting through the overload to the salient facts."

 

Paul Waddington, marketing manager at Reuters.

"Dealing with the information burden is one of the most urgent challenges facing businesses. Unless we can discover ways of staying afloat amidst the surging torrents of information, we may end up drowning in them."

D. Lewis

 

http://www.serv-inf.deusto.es/abaitua/konzeptu/fatiga.htm

 

 

6.     WHY CAN LANGUAGE SOMETIMES BE SEEN AS A BARRIER TO COMMUNICATION? HOW CAN THIS CHANGE?

 

Language is the natural means of human communication; the most effective way we have to express ourselves to each other. We use language in a host of different ways: to explain complex ideas and concepts; to manage human resources; to negotiate; to persuade; to make our needs known; to express our feelings; to narrate stories; to record our culture for future generations; and to create beauty in poetry and prose. For most of us language is fundamental to all aspects of our lives.

The use of language is currently restricted. In the main, it is only used in direct communications between human beings and not in our interactions with the systems, services and appliances which we use every day of our lives. Even between humans, understanding is usually limited to those groups who share a common language. In this respect language can sometimes be seen as much a barrier to communication as an aid.

A change is taking place which will revolutionise our use of language and greatly enhance the value of language in every aspect of communication. This change is the result of developments in Language Engineering.

http://www.serv-inf.deusto.es/abaitua/konzeptu/nlp/langeng.htm

 

 

 

 

 

 

Language technology and engineering

 

1.     IN WHAT WAYS DOES LANGUAGE ENGINEERING IMPROVE THE USE OF LANGUAGE?

 

 

Language is the natural means of human communication; the most effective way we have to express ourselves to each other.

 

Language Engineering provides ways in which we can extend and improve our use of language to make it a more effective tool. It is based on a vast amount of knowledge about language and the way it works, which has been accumulated through research. It uses language resources, such as electronic dictionaries and grammars, terminology banks and corpora, which have been developed over time. The research tells us what we need to know about language and develops the techniques needed to understand and manipulate it. The resources represent the knowledge base needed to recognise, validate, understand, and manipulate language using the power of computers. By applying this knowledge of language we can develop new ways to help solve problems across the political, social, and economic spectrum.

 

Language Engineering is a technology which uses our knowledge of language to enhance our application of computer systems:

 

·        improving the way we interface with them

·        assimilating, analysing, selecting, using, and presenting information more effectively

·        providing human language generation and translation facilities.

 

When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a variety of languages, we shall all have easier access to the benefits of a wide range of information and communications services, as well as the facility to carry out business transactions remotely, over the telephone or other telematics services.

http://www.serv-inf.deusto.es/abaitua/konzeptu/nlp/langeng.htm

 

2.     LANGUAGE TECHNOLOGY, LANGUAGE ENGINEERING AND COMPUTATIONAL LINGUISTICS. SIMILARITIES AND DIFFERENCES.

 

 

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

 

http://www.hltcentral.org/usr_docs/project-source/en/broch/arnés.html#wile

 

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

 

 

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_what_cl.htm

 

 

 

3.     WHICH ARE THE MAIN TECHNIQUES USED IN LANGUAGE ENGINEERING?

 

There are many techniques used in Language Engineering and some of these are described below:

 

. Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource.

 

. Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input.

 

. Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition:

-         recognition of printed images, referred to as Optical Character Recognition (OCR)

-         recognition of handwriting, usually known as Intelligent Character Recognition (ICR)

 

. Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels.

 

. Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.

 

. Speech Generation

Speech is generated from filled templates, by playing ‘canned’ recordings or by concatenating units of speech (phonemes, words) together. Generated speech has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.
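Template-based generation of the kind described can be sketched as the selection of canned recording units for later concatenation. The template and file names below are invented placeholders, not part of any real system:

```python
# A "filled template" for announcing a departure time: fixed recorded
# phrases plus slots that select per-word recordings. The resulting list
# is the sequence of units a speech generator would concatenate.
TEMPLATE = ["your_train_departs_at.wav", "{hour}.wav", "{minute}.wav"]

def units_for(hour, minute):
    """Return the sequence of recorded units to concatenate."""
    return [u.format(hour=hour, minute=minute) for u in TEMPLATE]

print(units_for("nine", "fifteen"))
# ['your_train_departs_at.wav', 'nine.wav', 'fifteen.wav']
```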

 

http://www.hltcentral.org/usr_docs/project-source/en/index.html

 

 

4.     WHICH LANGUAGE RESOURCES ARE ESSENTIAL COMPONENTS OF LANGUAGE ENGINEERING?

 

 

Language resources are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding.

 

 

Lexicons

 

A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.
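As an illustration of the kind of repository just described, here is a minimal lexicon sketch; the entry, its fields and the sample senses are all invented for the example:

```python
# A toy lexicon: each entry stores morphological, phonological and sense
# information about a word, as described in the text above.
lexicon = {
    "bank": {
        "pos": "noun",
        "morphology": {"plural": "banks"},
        "phonology": "/bæŋk/",
        "senses": [
            "financial institution",
            "sloping land beside a river",
        ],
    },
}

def lookup(word):
    """Return the stored knowledge about a word, or None if absent."""
    return lexicon.get(word.lower())

entry = lookup("Bank")
print(entry["senses"][0])   # financial institution
```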

 

 
Specialist Lexicons

 

There are a number of special cases which are usually researched and produced separately from general purpose lexicons:

 

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, persons, or animals. They take on a special significance in many applications, however, where the name is key to the application, such as in a voice-operated navigation system, a holiday reservation system, or a railway timetable information system based on automated telephone call handling.

 

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

 

Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.

 

 

Grammars

 

A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).

 

 

Corpora

 

A corpus is a body of language, either text or speech, which provides the basis for:

http://www.serv-inf.deusto.es/abaitua/konzeptu/nlp/langeng.htm

 

 

5.     CHECK FOR THE FOLLOWING TERMS:

 

 

 

NATURAL LANGUAGE PROCESSING:

natural language processing

[p]

a term in use since the 1980s to define a class of software systems which handle text intelligently

 

 

TRANSLATOR’S WORKBENCH:

translator's workbench

[p]

a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc
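One of the aids mentioned in this definition, the translation memory, can be sketched in a few lines: previously translated sentences are stored, and a new sentence is matched fuzzily against them. The sentence pairs and the similarity threshold below are invented for the example:

```python
import difflib

# A toy translation memory of the kind a translator's workbench consults.
memory = {
    "The file could not be opened.": "El archivo no se pudo abrir.",
    "Save your changes before closing.": "Guarde sus cambios antes de cerrar.",
}

def tm_lookup(sentence, threshold=0.6):
    """Return the stored translation of the closest matching source
    sentence, or None if nothing is similar enough (fuzzy matching)."""
    matches = difflib.get_close_matches(sentence, list(memory),
                                        n=1, cutoff=threshold)
    return memory[matches[0]] if matches else None

print(tm_lookup("The file could not be opened!"))  # near-exact match reused
print(tm_lookup("Completely unrelated text."))     # None
```

Real workbenches score matches more carefully and let the translator edit the retrieved suggestion, but the retrieval principle is the same.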

 

 

SHALLOW PARSER:

 

shallow parser

[p]

software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective
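The idea of parsing only to a rudimentary level can be illustrated with a toy chunker: given a POS-tagged sentence, it extracts noun-phrase chunks with a regular expression instead of building a full parse tree. The tag set and the sentence are invented for the example:

```python
import re

# A pre-tagged sentence (in practice a tagger would produce this).
tagged = [("the", "DET"), ("quick", "ADJ"), ("fox", "NOUN"),
          ("jumps", "VERB"), ("over", "PREP"),
          ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN")]

# Encode the tag sequence as a string and match "DET? ADJ* NOUN" chunks.
tags = " ".join(tag for _, tag in tagged)
pattern = r"(?:DET )?(?:ADJ )*NOUN"

chunks = []
for m in re.finditer(pattern, tags):
    start = tags[:m.start()].count(" ")        # index of chunk's first word
    length = m.group().count(" ") + 1          # number of tags matched
    chunks.append(" ".join(w for w, _ in tagged[start:start + length]))

print(chunks)   # ['the quick fox', 'the lazy dog']
```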

 

 

FORMALISM:

 

formalism

[n]

a means to represent the rules used in the establishment of a model of linguistic knowledge

 

 

SPEECH RECOGNITION:

 

speech recognition

[p]

 

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words

 

 

 

 

TEXT ALIGNMENT:

 

text alignment

[p]

the process of aligning different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions
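The process can be sketched very roughly: sentences of two language versions are paired one-to-one, and character-length ratios flag pairs that probably do not correspond. Real aligners, such as Gale and Church's length-based method, run dynamic programming over such length statistics; the sentences and threshold below are invented:

```python
# Toy sentence alignment by length ratio.
english = ["The door is open.", "Please come in."]
spanish = ["La puerta está abierta.", "Por favor, entre."]

def align(src, tgt, max_ratio=1.8):
    """Pair sentences one-to-one; mark a pair plausible if the longer
    sentence is at most max_ratio times the shorter one."""
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        pairs.append((s, t, ratio <= max_ratio))
    return pairs

for s, t, ok in align(english, spanish):
    print(ok, "|", s, "<->", t)
```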

 

 

AUTHORING TOOLS:

 

authoring tools

[p]

facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents

 

 

CONTROLLED LANGUAGE:

 

controlled language

[p]

language which has been designed to restrict the number of words and the structure of the language used, in order to make language processing easier (also called artificial language); typical users of controlled language work in an area where precision of language and speed of response is critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.
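A toy checker in this spirit: a message is accepted only if every word belongs to an approved vocabulary and the sentence is short. The vocabulary (loosely inspired by air-traffic phraseology) and the limits are invented for the example:

```python
# A minimal controlled-language checker: restricted vocabulary,
# restricted sentence length.
APPROVED = {"runway", "clear", "cleared", "hold", "position", "for",
            "takeoff", "two", "seven", "left"}
MAX_WORDS = 8

def is_controlled(sentence):
    """True if the sentence uses only approved words and is short enough."""
    words = sentence.lower().rstrip(".").split()
    return len(words) <= MAX_WORDS and all(w in APPROVED for w in words)

print(is_controlled("Runway two seven left cleared for takeoff."))  # True
print(is_controlled("You may depart whenever convenient."))         # False
```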

 

 

DOMAIN:

 

domain

[n]

usually applied to the area of application of the language enabled software e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted so the language resource requirements are effectively limited by limiting the domain of application

 

 

 

http://www.serv-inf.deusto.es/abaitua/konzeptu/nlp/langeng.htm

 

 

Multilinguality.

 

-Review of translation technology and its potential to help overcome that problem.

 

 

1.     IN THE TRANSLATION CURRICULA, WHICH FACTORS MAKE TECHNOLOGY MORE INDISPENSABLE?

When discussing the relevance of technological training in the translation curricula, it is important to clarify the factors that make technology more indispensable and show how the training should be tuned accordingly. The relevance of technology will depend on the medium that contains the text to be translated. This particular aspect is becoming increasingly evident with the rise of the localization industry, which deals solely with information in digital form. There may be no other imaginable means for approaching the translation of such things as on-line manuals in software packages or CD-ROMs with technical documentation than computational ones.

On the other hand, the traditional crafts of interpreting natural speech or translating printed material, which are peripheral to technology, may still benefit from technological training slightly more than anecdotally. It is clear that word processors, on-line dictionaries and all sorts of background documentation, such as concordances or collated texts, besides e-mail or other ways of network interaction with colleagues anywhere in the world may substantially help the literary translator's work. With the exception of a few eccentrics or maniacs, it will be rare in the future to see good professional interpreters and literary translators not using more or less sophisticated and specialized tools for their jobs, comparable to the familiarization with tape recorders or typewriters in the past. In any case, this might be something best left to the professional to decide, and may not be indispensable.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

 

2.     DO PROFESSIONAL INTERPRETERS AND LITERARY TRANSLATORS NEED TRANSLATION TECHNOLOGY? WHICH ARE THE TOOLS THEY NEED FOR THEIR JOB?

 

Like cooks, tailors or architects, professional translators need to become acquainted with technology, because good use of technology will make their jobs more competitive and satisfactory. But they should not dismiss craftsmanship. Technology enhances productivity, but translation excellence goes beyond technology. It is important to delimit the roles of humans and machines in translation.

 

It has taken some 40 years for the specialists involved in the development of MT to realize that the limits to technology arise when going beyond the mechanical and routine aspects of language. From the outside, translation is often seen as a mere mechanical process, not any more complex than playing chess, for example. If computers have been programmed with the capacity of beating a chess master champion such as Kasparov, why should they not be capable of performing translation of the highest quality?

 

The potentially good translator must be a sensitive, wise, vigilant, talented, gifted, experienced, and knowledgeable person. An adequate use of mechanical means and resources can make a good human translator a much more productive one. Nevertheless, very much like dictionaries and other reference material, technology may be considered an excellent prosthesis, but little more than that.

 

However, even for skilled human translators, translation is often difficult. One clear example is when linguistic form, as opposed to content, becomes an important part of a literary piece. Conveying the content, but missing the poetic aspects of the signifier may considerably hinder the quality of the translation. This is a challenge to any translator. Jaime de Ojeda's (1989) Spanish translation of Lewis Carroll's Alice in Wonderland illustrates this problem:
 

Twinkle, twinkle, little bat 
how I wonder what you're at! 
Up above the world you fly 
like a tea-tray in the sky.

Brilla, luce, ratita alada 
¿en qué estás tan atareada? 
Por encima del universo vuelas 
como una bandeja de teteras.

Manuel Breva (1996) analyzes the example and shows how Ojeda solves the "formal hurdles" of the original:

The above lines are a parody of the famous poem "Twinkle, twinkle, little star" by Jane Taylor, which, in Carroll's version, turns into a sarcastic attack against Bartholomew Price, a professor of mathematics, nicknamed "The Bat". Jaime de Ojeda translates "bat" as "ratita alada" for rhythmical reasons. "Murciélago", the Spanish equivalent of "bat", would be hard to fit in this context for the same poetic reasons. With Ojeda's choice of words the Spanish version preserves the meaning and maintains the same rhyming pattern (AABB) as in the original English verse-lines.

What would the output of any MT system be like if confronted with this fragment? Obviously, the result would be disastrous. Compared with the complexity of natural language, the figures that serve to quantify the "knowledge" of any MT program are absurd: 100,000-word bilingual vocabularies, 5,000 transfer rules... Well-developed systems such as Systran or Logos hardly surpass these figures. How many more bilingual entries and transfer rules would be necessary to match Ojeda's competence? How long would it take to adequately train such a system? And even then, would it be capable of challenging Ojeda in the way the chess master Kasparov has been challenged? I have serious doubts about that being attainable at all.

But there are other opinions, as is the case of the famous Artificial Intelligence master, Marvin Minsky. Minsky would argue that it is all a matter of time. He sees the human brain as an organic machine, and as such, its behavior, reactions and performance can be studied and reproduced. Other people believe there is an important aspect separating organic, living "machines" from synthetic machines. They would claim that creativity is in life, and that it is an exclusive faculty of living creatures to be creative.

 

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

 

 

3.     IN WHAT WAYS IS DOCUMENTATION BECOMING ELECTRONIC? HOW DOES THIS AFFECT THE INDUSTRY?

 

Electronic documentation is the adequate realm for the incorporation of translation technology

The increase of information in electronic format is linked to advances in computational techniques for dealing with it. Together with the proliferation of informational webs in Internet, we can also see a growing number of search and retrieval devices, some of which integrate translation technology. Technical documentation is becoming electronic, in the form of CD-ROM, on-line manuals, intranets, etc. An important consequence of the popularization of Internet is that the access to information is now truly global and the demand for localizing institutional and commercial Web sites is growing fast. In the localization industry, the utilization of technology is congenital, and developing adequate tools has immediate economic benefits.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

4.     WHAT IS THE FOCUS OF THE LOCALIZATION INDUSTRY? DO YOU BELIEVE THERE MIGHT BE A JOB FOR YOU IN THAT INDUSTRY SECTOR?

The main role of localization companies is to help software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing, and Web-based information in different languages for simultaneous worldwide release. The recent expansion of these industries has considerably increased the demand for translation products and has created a new burgeoning market for the language business. According to a recent industry survey by LISA (the Localization Industry Standards Association), almost one third of software publishers, such as Microsoft, Oracle, Adobe, Quark, etc., generate above 20 percent of their sales from localized products, that is, from products which have been adapted to the language and culture of their targeted markets, and the great majority of publishers expect to be localizing into more than ten different languages.

 

Besides the Internet, another emerging sector for the localization industry is the introduction of the e-book (electronic book) in the literary market. Microsoft, Bertelsmann, HarperCollins, Penguin Putnam, Simon & Schuster, and TimeWarner Books have launched a new association for standardizing the format of electronic books. Although there may be doubts about whether we will ever be able to bring the electronic page into line with the printed page in terms of readability and ease of use, it is clear that for a new generation of console and video-game users, who are more than adapted to reading on screens, literature on the console may be more than appealing.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

As to whether there might be a job for me in that industry sector, I do not think so at present, since I am not yet trained for it. Perhaps there will be in the future, if I decide to continue my studies and enter this field, which I consider very interesting but also very competitive.

 

 

5.     DEFINE INTERNATIONALIZATION, GLOBALIZATION AND LOCALIZATION. HOW DO THEY AFFECT THE DESIGN OF SOFTWARE PRODUCTS?

Professor Margaret King of Geneva University described the first step of the project as consisting of the "clarification of the state of affairs and to plan courses that are comprehensive enough to cover all aspects of interest of the localization industry, to review all aspects of the localization industry, from translation and technical writing through globalization, internationalization, and localization". The definition of the critical terms involved was a contentious topic, although there seems to be a consensus on the following:

Globalization: The adaptation of marketing strategies to regional requirements of all kinds (e.g., cultural, legal, and linguistic).

Internationalization: The engineering of a product (usually software) to enable efficient adaptation of the product to local requirements.

Localization: The adaptation of a product to a target language and culture (locale).

                           

Many aspects of software localization have not been considered, particularly the concepts of multilingual management and document-life monitoring. Corporations are now realizing that documentation is an integral part of the production line where the distinction between product, marketing and technical material is becoming more and more blurred. Product documentation is gaining importance in the whole process of product development with direct impact on time-to-market. Software engineering techniques that apply in other phases of software development are beginning to apply to document production as well. The appraisal of national and international standards of various types is also significant: text and character coding standards (e.g. SGML/XML and Unicode), as well as translation quality control standards (e.g. DIN 2345 in Germany, or UNI 10574 in Italy).

 

Unlike traditional translators, software localizers may be engaged in early stages of software development, as there are issues, such as platform portability, code exchange, format conversion, etc. which if not properly dealt with may hinder product internationalization. Localizers are often involved in the selection and application of utilities that perform code scanning and checking, that automatically isolate and suggest solutions to National Language Support (NLS) issues, which save time during the internationalization enabling process. There are run-time libraries that enable software developers and localizers to create single-source, multilingual, and portable cross-platform applications. Unicode support is also fundamental for software developers who work with multilingual texts, as it provides a consistent coding format for international character sets.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

 

 

6.     ARE TRANSLATION AND LOCALIZATION THE SAME THING? EXPLAIN THE DIFFERENCES.

There are conditions under which translation technology is not only worth learning but essential; then again, in many other circumstances it may be disregarded without substantial loss. Localization is the paradigm case of the need for technology, while interpreting and literary translation are examples of the latter. The localization business is intimately connected with the software industry, and companies in the field complain about the lack of qualified personnel who combine an adequate linguistic background with computational skills. This is why the industry (around the LISA association) has taken the lead over educational institutions by proposing courseware standards (the LEIT initiative) for training localization professionals. We will discuss this and other issues connected with the training of professional translators today.

Localization is not limited to the software-publishing business and it has infiltrated many other facets of the market, from software for manufacturing and enterprise resource planning, games, home banking, and edutainment (education and entertainment), to retail automation systems, medical instruments, mobile phones, personal digital assistants (PDA), and the Internet.

Van der Meer, president of AlpNet, puts it this way:

Localization was originally intended to set software (or information technology) translators apart from 'old fashioned' non-technical translators of all types of documents. Software translation required a different skill set: software translators had to understand programming code, they had to work under tremendous time pressure and be flexible about product changes and updates. Originally there was only a select group--the localizers--who knew how to respond to the needs of the software industry. From these beginnings, pure localization companies emerged focusing on testing, engineering, and project management.

Localization: the adaptation of a product to a target language and culture (locale).

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

7.     WHAT IS A TRANSLATION WORKSTATION? COMPARE IT WITH A STANDARD LOCALIZATION TOOL.

 

Leaving behind the old conception of a monolithic compact translation engine, the industry is now moving in the direction of integrating systems: "In the future Trados will offer solutions that provide enterprise-wide applications for multilingual information creation and dissemination, integrating logistical and language-engineering applications into smooth workflow that spans the globe," says Trados manager Henri Broekmate. Logos, the veteran translation technology provider, has announced "an integrated technology-based translation package, which will combine term management, TM, MT and related tools to create a seamless full service localization environment."

 

The ideal workstation for the translator would combine features such as those just mentioned: term management, translation memory, machine translation and related tools, integrated into a seamless, full-service environment.

Standard software localization tools, by contrast, are narrower in scope: they are the code-scanning and checking utilities, National Language Support (NLS) tools, run-time libraries and Unicode support already described under question 5, aimed at the engineering side of localization rather than at the translator's complete workflow.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

8.     MACHINE TRANSLATION VS. HUMAN TRANSLATION. DO YOU AGREE THAT TRANSLATION EXCELLENCE GOES BEYOND TECHNOLOGY? WHY?

 

Twinkle, twinkle, little bat 
how I wonder what you're at! 
Up above the world you fly 
like a tea-tray in the sky.

LEWIS CARROLL

Brilla, luce, ratita alada 
¿en qué estás tan atareada? 
Por encima del universo vuelas 
como una bandeja de teteras.

Tr. de Jaime Ojeda

Manuel Breva (1996) analyzes the example and shows how Ojeda solves the "formal hurdles" of the original:

The above lines are a parody of the famous poem "Twinkle, twinkle, little star" by Jane Taylor, which, in Carroll's version, turns into a sarcastic attack against Bartholomew Price, a professor of mathematics, nicknamed "The Bat". Jaime de Ojeda translates "bat" as "ratita alada" for rhythmical reasons. "Murciélago", the Spanish equivalent of "bat", would be hard to fit in this context for the same poetic reasons. With Ojeda's choice of words the Spanish version preserves the meaning and maintains the same rhyming pattern (AABB) as in the original English verse-lines.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

Centelleo, centelleo, pequeño palo, ¿cómo me pregunto en cuál usted está! Encima sobre del mundo usted vuela como una té-bandeja en el cielo.

Tr. de SYSTRAN

 

Brilla, luce, ratita alada

¿en qué estás tan atareada?

Por encima del universo vuelas

Como una bandeja de teteras.

 

Tr. de Jaime Ojeda

 

 

I agree that translation excellence goes beyond technology: as the SYSTRAN output above shows, many words cannot be translated literally, and machines are not able to supply the nuance, rhythm and rhyme that a human translator such as Ojeda achieves.

 

 

9.     WHICH PROFILES SHOULD ANY PERSON WITH A UNIVERSITY DEGREE IN TRANSLATION BE QUALIFIED FOR?

 

Any person with a university degree in translation should be qualified at least for the following three profiles:

 

The following three are also options that this person with a university degree in translation could carry out:

 

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

 

 

Machine Translation

 

 

 

1.     WHICH ARE THE MAIN PROBLEMS OF MT?

 

There are certain difficulties encountered by present computer systems which attempt to produce partial or complete translations of texts from one natural language into another.

The major problems of MT systems concern ambiguity, homonymy and alternative structures.

 

http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm

 

 

 

2.     WHICH PARTS OF LINGUISTICS ARE MORE RELEVANT FOR MT?

 

The full potential of machine-readable texts can be exploited in three ways: first, by adopting the notion of an 'electronic document' and embedding an MT system in a complete document processing system; second, by restricting the form of input by using simplified or controlled language; and third, by restricting both the form and the subject matter of the input texts to those that fall within a sublanguage --- it is here that the immediate prospects for MT are greatest.
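The second of these strategies, restricting input to a simplified or controlled language, can be sketched in a few lines of Python. The approved vocabulary and the length limit below are invented purely for illustration:

```python
# Minimal controlled-language checker: input sentences must use only an
# approved vocabulary and stay under a length limit, which makes the
# text far easier for an MT system to handle correctly.

APPROVED = {"insert", "the", "paper", "in", "printer", "then",
            "switch", "it", "on", "press", "button"}
MAX_WORDS = 12

def check_sentence(sentence):
    """Return a list of problems; an empty list means the sentence conforms."""
    words = sentence.lower().rstrip(".!?").split()
    problems = []
    if len(words) > MAX_WORDS:
        problems.append("sentence too long")
    problems.extend(f"unapproved word: {w}" for w in words if w not in APPROVED)
    return problems

print(check_sentence("Insert the paper in the printer."))  # []
print(check_sentence("Kindly slot the paper in."))
```

A technical author would rewrite any sentence that comes back with problems; the restricted input then falls within the system's sublanguage.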

 

 

3.     ILLUSTRATE YOUR DISCUSSION WITH:

 

 

·        TWO EXAMPLES OF LEXICAL AMBIGUITY.

 

1. Imagine that we are trying to translate these two sentences into French:

In the first sentence use is a verb, and in the second a noun; that is, we have a case of lexical ambiguity. An English-French dictionary will say that the verb can be translated by (inter alia) se servir de and employer, whereas the noun is translated as emploi or utilisation. One way a reader or an automatic parser can find out whether the noun or verb form of use is being employed in a sentence is by working out whether it is grammatically possible to have a noun or a verb in the place where it occurs.

2. Take for example the word button. Like the word use, it can be either a verb or a noun. As a noun, it can mean both the familiar small round object used to fasten clothes, as well as a knob on a piece of apparatus.

 

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node53.html#SECTION00820000000000000000
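The mechanism described in these two examples, choosing the target word once the part of speech is known, can be sketched as a toy transfer dictionary. The entries and the function are illustrative, not a real lexicon:

```python
# Toy transfer dictionary keyed by (word, part-of-speech): once a parser
# has decided whether "use" is a noun or a verb, a French equivalent can
# be chosen mechanically.

DICT = {
    ("use", "VERB"): "employer",      # also "se servir de"
    ("use", "NOUN"): "emploi",        # also "utilisation"
    ("button", "VERB"): "boutonner",
    ("button", "NOUN"): "bouton",
}

def translate(word, pos):
    """Look up a French equivalent for an English word with a known POS tag."""
    return DICT.get((word, pos), word)  # fall back to the source word

print(translate("use", "NOUN"))   # emploi
print(translate("use", "VERB"))   # employer
```

Note that part of speech alone does not resolve the two noun senses of button (clothes fastener vs. knob on an apparatus); a target language that lexicalizes them differently would still need sense disambiguation from context.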

 

 

 

·        ONE EXAMPLE OF STRUCTURAL AMBIGUITY.

Another source of syntactic ambiguity is where whole phrases, typically prepositional phrases, can attach to more than one position in a sentence. For example, in the following sentence, the prepositional phrase with a Postscript interface can attach either to the NP the word processor package, meaning ``the word-processor which is fitted or supplied with a Postscript interface'', or to the verb connect, in which case the sense is that the Postscript interface is to be used to make the connection.

·        Connect the printer to a word processor package with a Postscript interface.

Choosing between such readings requires knowledge of the world. This kind of real world knowledge is also an essential component in disambiguating the pronoun it in examples such as the following:

·        Put the paper in the printer. Then switch it on.

In order to work out that it is the printer that is to be switched on, rather than the paper, one needs to use the knowledge of the world that printers (and not paper) are the sort of thing one is likely to switch on.

 

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node53.html#SECTION00820000000000000000

All the examples given in these two questions are by Arnold DJ. Thu Dec 21 10:52:49 GMT 1995
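The world-knowledge test described for the printer example can be caricatured in code: a hand-made table of things that can plausibly be switched on stands in for real world knowledge, and the pronoun it is resolved to a candidate noun that passes the test. The table and function are, of course, invented for illustration:

```python
# A crude stand-in for "real world knowledge": a table of nouns that are
# plausible objects of "switch on".  The pronoun "it" is resolved to the
# first candidate antecedent (in order of mention) that passes the check.

SWITCHABLE = {"printer", "computer", "lamp"}

def resolve_it(candidates, verb):
    """Pick the antecedent of 'it' from candidate nouns, using the verb."""
    if verb == "switch on":
        for noun in candidates:
            if noun in SWITCHABLE:
                return noun
    return candidates[0] if candidates else None

# "Put the paper in the printer. Then switch it on."
print(resolve_it(["paper", "printer"], "switch on"))  # printer
```

A real system would need such selectional knowledge on a vast scale, which is precisely why this kind of disambiguation is hard for MT.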

 

·        THREE LEXICAL AND STRUCTURAL MISMATCHES.

 

 

In the best of all possible worlds for NLP, every word would have exactly one sense. While this is true for most NLP, it is an exaggeration as regards MT. It would be a better world, but not the best of all possible worlds, because we would still be faced with difficult translation problems. Some of these problems are to do with lexical differences between languages --- differences in the ways in which languages seem to classify the world, what concepts they choose to express by single words, and which they choose not to lexicalize.

 

However, when one turns to cases of structural mismatch, classification is not so easy, because one may often suspect that the reason one language uses one construction where another uses a different one lies in the stock of lexical items the two languages have. Thus, the distinction is to some extent a matter of taste and convenience.

 

A particularly obvious example of this involves problems arising from what are sometimes called lexical holes --- that is, cases where one language has to use a phrase to express what another language expresses in a single word. Examples of this include the 'hole' that exists in English with respect to French ignorer ('to not know', 'to be ignorant of'), and se suicider ('to suicide', i.e. 'to commit suicide', 'to kill oneself'). The problems raised by such lexical holes have a certain similarity to those raised by idioms: in both cases, one has phrases translating as single words.
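A lexical hole can be handled in a transfer dictionary by mapping a multi-word English expression onto a single French word. This minimal sketch covers just the two examples from the text:

```python
# Lexical holes: French has single words where English needs a phrase.
# A transfer dictionary can map a multi-word English expression to one
# French word; the entries below are the two examples from the text.

HOLES = {
    ("not", "know"): "ignorer",
    ("commit", "suicide"): "se suicider",
}

def transfer_phrase(words):
    """Return the single French word for a known English phrase, else None."""
    key = tuple(w.lower() for w in words)
    return HOLES.get(key)

print(transfer_phrase(["commit", "suicide"]))  # se suicider
print(transfer_phrase(["not", "know"]))        # ignorer
```

As the text notes, this is structurally the same device needed for idioms: a phrase is treated as a single translation unit.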

One kind of structural mismatch occurs where two languages use the same construction for different purposes, or use different constructions for what appears to be the same purpose. Consider, for example, translating an English clause such as Sam has just seen Kim into French as Sam vient de voir Kim; several things must happen at once:

   1. The adverb just must be translated as the verb venir-de (perhaps this is not the best way to think about it --- the point is that the French structure must contain venir-de, and just must not be translated in any other way).

2. Sam, the SUBJECT of see, must become the SUBJECT of venir-de.

3. Some information about tense, etc. must be taken from the S node of which see is the HEAD, and put on the S node of which venir-de is the HEAD. This is a complication, because normally one would expect such information to go on the node of which the translation of see, voir, is the HEAD.

4. Other parts of the English sentence should go into the corresponding parts of the sentence HEADed by voir. This is simple enough here, because in both cases Kim is an OBJECT, but it is not always the case that OBJECTs translate as OBJECTs, of course.

5. The link between the SUBJECT of venir-de and the SUBJECT of voir must be established --- but this can perhaps be left to French synthesis.  
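Under the considerable simplification of representing clauses as flat dictionaries rather than trees, the five steps above might be sketched as follows for Sam has just seen Kim becoming Sam vient de voir Kim. The representation and the toy verb lexicon are invented for illustration:

```python
# A minimal sketch of the transfer steps listed above.  A real system
# would operate on syntax trees; here a clause is a flat dictionary.

def transfer_just(sent):
    """Map an English 'just + VERB' clause onto a French venir-de clause."""
    assert sent["adverb"] == "just"
    verb_fr = {"see": "voir"}[sent["verb"]]   # toy bilingual verb lexicon
    return {
        "head": "venir-de",              # step 1: just becomes venir-de
        "subject": sent["subject"],      # step 2: SUBJECT of see -> SUBJECT of venir-de
        "tense": sent["tense"],          # step 3: tense moves to the venir-de clause
        "complement": {                  # step 4: remaining parts go under voir
            "head": verb_fr,
            "object": sent["object"],    # Kim stays an OBJECT here
        },
    }

english = {"subject": "Sam", "verb": "see", "object": "Kim",
           "adverb": "just", "tense": "present-perfect"}
print(transfer_just(english)["head"])  # venir-de
```

Step 5, linking the SUBJECT of venir-de to the understood SUBJECT of voir, is left to French synthesis, exactly as the text suggests.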

A slightly different sort of structural mismatch occurs where two languages have `the same' construction (more precisely, similar constructions, with equivalent interpretations), but where different restrictions on the constructions mean that it is not always possible to translate in the most obvious way.

Arnold DJ
Thu Dec 21 10:52:49 GMT 1995

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node54.html#SECTION00830000000000000000

·        THREE COLLOCATIONS.

 

Collocations are combinations of words whose meaning can be guessed from the meanings of the parts; what is not predictable is the particular words that are used.

For example, the fact that we say rancid butter, but not * sour butter, and sour cream, but not * rancid cream does not seem to be completely predictable from the meaning of butter or cream, and the various adjectives. Similarly the choice of take as the verb for walk is not simply a matter of the meaning of walk (for example, one can either make or take a journey).

 

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node55.html#SECTION00840000000000000000
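Because such choices cannot be computed from word meanings, a system simply has to store them. A minimal sketch, using only the collocations mentioned in the text:

```python
# Collocational choice is a property of the individual noun, so it must
# be listed in the lexicon rather than derived from meaning.

SPOILED_ADJ = {"butter": "rancid", "cream": "sour"}     # *sour butter, *rancid cream
SUPPORT_VERB = {"walk": ["take"], "journey": ["make", "take"]}

def spoiled(noun):
    """Pick the conventional 'spoiled' adjective for a foodstuff."""
    return SPOILED_ADJ.get(noun)

def verbs_for(noun):
    """List the support verbs that collocate with an event noun."""
    return SUPPORT_VERB.get(noun, [])

print(spoiled("butter"), spoiled("cream"))   # rancid sour
print(verbs_for("journey"))                  # ['make', 'take']
```

For MT the point is that the target language makes its own, independent collocational choices, so these tables have to exist on both sides.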

 

 

 

·        TWO IDIOMATIC EXPRESSIONS.

Idioms are expressions whose meaning cannot be completely understood from the meanings of the component parts. The problem with idioms, in an MT context, is that it is not usually possible to translate them using the normal rules. There are exceptions, for example take the bull by the horns (meaning `face and tackle a difficulty without shirking') can be translated literally into French  as prendre le taureau par les cornes, which has the same meaning. But, for the most part, the use of normal rules in order to translate idioms will result in nonsense. Instead, one has to treat idioms as single units in translation.

In many cases, a natural translation for an idiom will be a single word --- for example, the French word mourir ('die') is a possible translation for kick the bucket. This brings out the similarity, which we noted above, with lexical holes of the kind discussed earlier.

In general, there are two approaches one can take to the treatment of idioms. The first is to try to represent them as single units in the monolingual dictionaries. What this means is that one will have lexical entries such as kick_the_bucket. One might try to construct special morphological  rules to produce these representations before performing any syntactic  analysis --- this would amount to treating idioms as a special kind of word, which just happens to have spaces in it. As will become clear, this is not a workable solution in general. A more reasonable idea is not to regard lexical lookup as a single process that occurs just once, before any syntactic  or semantic processing, but to allow analysis rules to replace pieces of structure by information which is held in the lexicon at different stages of processing, just as they are allowed to change structures in other ways. This would mean that kick the bucket and the non-idiomatic kick the table would be represented alike (apart from the difference between bucket and table) at one level of analysis, but that at a later, more abstract representation kick the bucket would be replaced with a single node, with the information at this node coming from the lexical entry kick_the_bucket. This information would probably be similar to the information one would find in the entry for die.

One problem with sentences which contain idioms is that they are typically ambiguous, in the sense that either a literal or idiomatic interpretation is generally possible (i.e. the phrase kick the bucket can really be about buckets and kicking). However, the possibility of having a variety of interpretations does not really distinguish them from other sorts of expression. Another problem is that they need special rules (such as those above, perhaps), in addition to the normal rules for ordinary words and constructions. However, in this they are no different from ordinary words, for which one also needs special rules. The real problem with idioms is that they are not generally fixed in their form, and that the variation of forms is not limited to variations in inflection (as it is with ordinary words). Thus, there is a serious problem in recognising idioms.

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node55.html#SECTION00840000000000000000

 

Arnold D J
Thu Dec 21 10:52:49 GMT 1995
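The second approach described above, replacing a recognised idiom span with a single lexical-entry token at a later stage of analysis, might look like this minimal sketch. The tiny lemmatiser and the idiom table are invented for illustration, and cover only the inflected forms the example needs:

```python
# Recognising an idiom despite inflection: lemmatise the input, then
# replace any span matching a known idiom's lemma sequence with a single
# lexical-entry token such as kick_the_bucket.

LEMMA = {"kicked": "kick", "kicks": "kick", "kicking": "kick"}
IDIOMS = {("kick", "the", "bucket"): "kick_the_bucket"}

def mark_idioms(words):
    """Replace any known idiom span with its single lexical-entry token."""
    lemmas = [LEMMA.get(w, w) for w in words]
    out, i = [], 0
    while i < len(lemmas):
        for idiom, entry in IDIOMS.items():
            if tuple(lemmas[i:i + len(idiom)]) == idiom:
                out.append(entry)
                i += len(idiom)
                break
        else:
            out.append(words[i])
            i += 1
    return out

print(mark_idioms(["Sam", "kicked", "the", "bucket"]))
```

Note that this sketch always prefers the idiomatic reading; as the text points out, kick the bucket can really be about buckets and kicking, so a full system would have to keep both analyses available.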

 

 

Machine translation II

 

- history, methods, approaches, best known systems, etc.

 

1.     WHICH ARE THE MORE USUAL INTERPRETATIONS OF THE TERM “MACHINE TRANSLATION”?

 

The term machine translation (MT) is normally taken in its restricted and precise meaning of fully automatic translation. However, in this chapter we consider the whole range of tools that may support translation and document production in general, which is especially important when considering the integration of other language processing techniques and resources with MT. We therefore define Machine Translation to include any computer-based process that transforms (or helps a user to transform) written text from one human language into another. We define Fully Automated Machine Translation (FAMT) to be MT performed without the intervention of a human being during the process. Human-Assisted Machine Translation (HAMT) is the style of translation in which a computer system does most of the translation, appealing in case of difficulty to a (mono- or bilingual) human for help. Machine-Aided Translation (MAT) is the style of translation in which a human does most of the work but uses one or more computer systems, mainly as resources such as dictionaries and spelling checkers, as assistants.

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

 

2.     WHAT DO FAHQT AND ALPAC MEAN IN THE EVOLUTION OF MT?

 

FAHQT (Fully Automatic High Quality Translation) was the early ideal of MT research: translation of publishable quality with no human intervention, a goal Bar-Hillel famously argued was unattainable for unrestricted text. ALPAC (the Automatic Language Processing Advisory Committee) issued a report in 1966 concluding that MT had failed to deliver on that promise; as a result, funding for MT research in the United States was drastically cut for the following decade. Together the two mark the shift from the early optimism of MT to more modest, tool-oriented goals such as FAMT, HAMT and MAT as defined above.

 

 

3.     LIST SOME OF THE MAJOR METHODS, TECHNIQUES AND APPROACHES.

 

Statistical vs. linguistic (symbolic, rule-based) MT is the major methodological divide. As the passages below illustrate, the main approaches include purely statistical translation (as in IBM's CANDIDE), symbolic-linguistic approaches based on transfer rules or an interlingua (as in Pangloss), hybrid statistical-linguistic systems (as in LingStat), and the integration of MT with translation memory and multi-engine architectures.

 

 

4.     WHERE WAS MT TEN YEARS AGO?

Ten years ago, the typical users of machine translation were large organizations such as the European Commission, the US Government, the Pan American Health Organization, Xerox, Fujitsu, etc. Fewer small companies or freelance translators used MT, although translation tools such as online dictionaries were becoming more popular. However, ongoing commercial successes in Europe, Asia, and North America continued to illustrate that, despite imperfect levels of achievement, the levels of quality being produced by FAMT and HAMT systems did address some users’ real needs. Systems were being produced and sold by companies such as Fujitsu, NEC, Hitachi, and others in Japan, Siemens and others in Europe, and Systran, Globalink, and Logos in North America (not to mention the unprecedented growth of cheap, rather simple MT assistant tools such as PowerTranslator).

In response, the European Commission funded the Europe-wide MT research project Eurotra, which involved representatives from most of the European languages, to develop a large multilingual MT system (Johnson, et al., 1985). Eurotra, which ended in the early 1990s, had the important effect of establishing Computational Linguistics groups in several countries where none had existed before. Following this effort, and responding to the promise of statistics-based techniques (as introduced into Computational Linguistics by the IBM group with their MT system CANDIDE), the US Government funded a four-year effort, pitting three theoretical approaches against each other in a frequently evaluated research program. The CANDIDE system (Brown et al., 1990), taking a purely-statistical approach, stood in contrast to the Pangloss system (Frederking et al., 1994), which initially was formulated as a HAMT system using a symbolic-linguistic approach involving an interlingua; complementing these two was the LingStat system (Yamron et al., 1994), which sought to combine statistical and symbolic/linguistic approaches. As we reach the end of the decade, the only large-scale multi-year research project on MT worldwide is Verbmobil in Germany (Niemann et al., 1997), which focuses on speech-to-speech translation of dialogues in the rather narrow domain of scheduling meetings.

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html
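The purely statistical approach of CANDIDE (Brown et al., 1990, cited above) rests on the noisy-channel formulation: choose the target sentence e that maximises P(e) times P(f|e), a language-model score for fluency times a translation-model score for adequacy. A toy sketch, with invented probabilities just to show the arithmetic:

```python
# Noisy-channel translation in miniature: pick the candidate target
# sentence e maximising P(e) * P(f|e).  The probabilities are toy
# numbers invented for illustration, not estimates from real data.

LM = {"the cat": 0.4, "cat the": 0.01}          # P(e): target-language fluency
TM = {("le chat", "the cat"): 0.5,              # P(f|e): translation adequacy
      ("le chat", "cat the"): 0.5}

def best_translation(f, candidates):
    """Return argmax over candidate translations e of P(e) * P(f|e)."""
    return max(candidates, key=lambda e: LM[e] * TM[(f, e)])

print(best_translation("le chat", ["the cat", "cat the"]))  # the cat
```

Here the translation model cannot distinguish the two word orders, and the language model alone rules out the disfluent one; that division of labour is the core of the statistical approach.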

 

5.     NEW DIRECTIONS AND FORESEEABLE BREAKTHROUGHS OF MT IN THE SHORT TERM.

 

Future developments will include highly integrated approaches to translation (integration of translation memory and MT, hybrid statistical-linguistic translation, multi-engine translation systems, and the like). We are likely to witness the development of statistical techniques to address problems that defy easy formalization and obvious rule-based behavior, such as sound transliteration (Knight and Graehl, 1997), word equivalence across languages (Wu, 1995), wordsense disambiguation (Yarowsky, 1995), etc. The interplay between statistical and symbolic techniques is discussed in Chapter 6.

Two other ongoing developments do not draw much on empirical linguistics. The first is the continuing integration of low-level MT techniques with conventional word processing to provide a range of aids, tools, lexicons, etc., for both professional and occasional translators. This is now a real market, assisting translators to perform, and confirms Martin Kay’s predictions (Kay,1997; reprint) about the role of machine-aided human translation some twenty years ago. Kay’s remarks predated the more recent empirical upsurge and seemed to reflect a deep pessimism about the ability of any form of theoretical linguistics, or theoretically motivated computational linguistics, to deliver high-quality MT. The same attitudes underlie (Arnold et al., 1994), which was produced by a group long committed to a highly abstract approach to MT that failed in the Eurotra project; the book itself is effectively an introduction to MT as an advanced form of document processing.

The second continuing development, set apart from the statistical movement, is a continuing emphasis on large-scale handcrafted resources for MT. This emphasis implicitly rejects the assumptions of the empirical movement that such resources could be partly or largely acquired automatically by, e.g., extraction of semantic structures from machine readable dictionaries, of grammars from treebanks or by machine learning methods. As described in Chapter 1, efforts continue in a number of EC projects, including PAROLE/SIMPLE and EuroWordNet (Vossen et al., 1999), as well as on the ontologies WordNet (Miller et al., 1995), SENSUS (Knight and Luk, 1994; Hovy, 1998), and Mikrokosmos (Nirenburg, 1998). This work exemplifies something of the same spirit expressed by Kay and Arnold et al., as it has been conspicuous in parts of the Information Extraction community (see Chapter 3): the use of very simple heuristic methods, while retaining the option to use full scale theoretical methods (in this case knowledge-based MT).

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html