ABSTRACT

This report develops a set of questions, and my answers to them, related to "Human Languages and New Technologies", prepared at the University of Deusto, Bilbao.

As I have come to realize, New Technologies and computer-related information are advancing and growing in importance. New Technologies are essential for our daily work today and will be even more so in the future. This report deals with important information taken from several internet pages. The method followed to produce it is question and answer, that is, a questionnaire covering some of the most important topics. The report includes very interesting themes about Human Languages and New Technologies. In it, I explain the issues and information that are indispensable for understanding and enjoying Human Languages and New Technologies.

The objective of this information is a better understanding of Human Languages and New Technologies; not only to understand, but also to learn, and to enjoy learning and reading about this kind of material.

INTRODUCTION

New Technologies are essential for our present daily work and will be even more so in the future. New Technologies are going to shape our future, and we will not be able to live without them: information technology, the Internet, web pages, mobile phones and so on.

What is more, I realize that little by little we have to deal with these new kinds of technologies, and so we must begin to understand them, so that in the future we can work better and more professionally with computers and everything that surrounds them. Because I am aware of the importance that New Technologies are gaining, this report deals with important information, taken from several internet pages, on the theme of Human Languages and New Technologies. The method I have followed is question and answer, that is, a questionnaire with information about the theme I have already mentioned, information that is important in order to work efficiently with a computer. This report is directed especially at students of arts and languages, but not only at them: it is also for everybody interested in these kinds of themes. Some of the themes I have developed are:

Language Technologies and the Information Society, Language Technologies and Resources, Multilinguality, Translation Technology, Machine Translation (its history, methods, approaches, problems, etc.), multilingual resources, and Corpus Linguistics.

The objective of this information is a better understanding of Human Languages and New Technologies; not only to understand, but also to learn, and to enjoy learning and reading about this kind of material.

1. What is the role of HLTCentral.org?

HLTCentral - Gateway to Speech & Language Technology Opportunities on the Web. The HLTCentral web site was established as an online information resource of human language technologies and related topics of interest to the HLT community at large. It covers news, R&D, technological and business developments in the field of speech, language, multilinguality, automatic translation, localisation and related areas. Its coverage of HLT news and developments is worldwide - with a unique European perspective.

http://www.hltcentral.org/page-615.shtml

2. What is the "Information Society"?

The term Information Society has been around for a long time now and, indeed, has become something of a cliché. The notion of the coming Information Society reminds me of the idea of the Sydney 2000 Olympics and the way it shimmers in the distance.

We look towards the Olympics and resolve to prepare hard for it. We must rapidly transform ourselves, our city, our demeanour to be ready and worthy. Time is of the essence in making ourselves ready for the challenge. There is a certain breathlessness in all of this rhetoric. The same can be said of many of the documents and writings on the Information Society. The recent Department of Industry, Science and Tourism's Goldsworthy report on the Global Information Economy urges "...time is short, and the need for action is urgent. Government must grasp the challenge now." (Department of Industry, Science and Tourism, 1997:7). But when you push past the rhetoric and the sense of urgency being conveyed, what is the reality of the Information Society? What, in particular, do policy makers think it is?

In the European Union, the concept of the Information Society has been evolving strongly over the past few years building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies. Closer to home it is instructive to look at just a few policy (or would-be policy) documents to see the views of the Information Society dominant here. The Goldsworthy report sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world" (Department of Industry, Science and Tourism, 1997).

In short, Goldsworthy's idea of the Information Society is entirely an economic one. At a broader level Barry Jones, the author of the House of Representatives Standing Committee's 1991 report 'Australia as an Information Society', sets out a definition of the Information Society which sees it as simply "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology" (Australia as an Information Society: Grasping New Paradigms, 1991). These are just a few examples of ideas underpinning information policy drives in the developed world, where the concept is accepted almost without challenge, and there is an inherent belief that, like the Olympics, the Information Society is real - or will be very soon if only we can get ourselves organised properly.

Some claim, of course, that the Information Society is here already and not just on its way. But one way or the other "it" exists and is a "good thing". By and large, national and regional Information Society documents do not question the belief that the Information Society will bring prosperity and happiness if a few basic safeguards are put in place. Some of the very few notes of serious caution in the practice of information policy have come through the influence of the Scandinavian countries which joined the European Union when the EU was already in full flight with implementing the actions flowing from the Bangemann report.

Interestingly, in recent travels in India, I noticed an extraordinary level of hope and trust in the potential of information technology to transform that developing country into a modern, fully developed economy. The push to develop information and technological infrastructure initiated by Rajiv Gandhi is seen as positive and a necessary step towards the goal of a universally prosperous society in India. Effectively there is the same acceptance of the goodness of an Information Society, and the absolute necessity of being one, that is found in the West. Given this blind faith in the existence and the desirability of an Information Society among diverse nations, it is instructive to look at the theoretical literature which has spawned the idea to see what it claims for the Information Society. The term Information Society has many synonyms:

Information Age, Information Revolution, Information Explosion and so on. It is found across a wide spectrum of disciplines. Fortunately the task of unravelling many of these ideas has been accomplished in a masterly way by Frank Webster. He has categorised the variety of concepts of the Information Society, Information Revolution, or whatever, and provided an analysis of five common conceptions of the Information Society (Webster, 1995).

http://www.gu.edu.au/centre/cmp/Papers_97/Browne_M.html

3. Why are language technologies so important for the Information Society?

The strategic objective of the Information Society Technologies (IST) Programme is to realise the benefits of the information society for Europe both by accelerating its emergence and by ensuring that the needs of individuals and enterprises are met. The IST Programme has four inter-related specific objectives. For the private individual, the objective is to meet the need and expectation of high-quality affordable general interest services. For Europe’s enterprises, workers and consumers, the objective is to enable individuals and organisations to innovate and be more effective and efficient in their work, thereby providing the basis for sustainable growth and high added-value employment while also improving the quality of working life. In the sector of multimedia content, the key objective is to confirm Europe as a leading force, realising its full potential. For the enabling technologies which are the foundations of the information society, the programme objective is to drive their development, enhance their applicability and accelerate their take-up in Europe. The IST Programme is managed by the European Commission, with the assistance of the IST Committee consisting of representatives of each Member and Associated State. The Commission and the IST Committee are supported in their work by an IST Advisory Group of some 25 members who are highly experienced in this field. They provide independent expert advice concerning the content of the IST Workprogramme. The Programme follows on from the ESPRIT, ACTS and Telematics Applications Programmes, which were carried out by the Community within the 4th Framework Programme. It is based on a new, integrated approach which reflects the increasing convergence of the information and communications technologies which were addressed individually by those programmes.

http://europa.eu.int/information_society/programmes/research/index_en.htm

4. How many words of technical information are recorded every day?

Every day, approximately 20 million words of technical information are recorded. A reader capable of reading 1000 words per minute would require 1.5 months, reading eight hours every day, to get through one day's output, and at the end of that period he would have fallen 5.5 years behind in his reading.

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm
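As a quick sanity check of these figures (a rough back-of-the-envelope calculation, not part of the source), the claim can be reproduced in a few lines of Python:

    # Rough check of the reading-rate claim quoted above.
    words_per_day = 20_000_000        # technical words recorded per day (figure from the text)
    reading_speed = 1_000             # words read per minute
    hours_per_day = 8                 # hours spent reading each day

    minutes_needed = words_per_day / reading_speed        # 20,000 minutes
    days_needed = minutes_needed / 60 / hours_per_day     # about 41.7 days, roughly 1.5 months
    print(f"{days_needed:.1f} days to read one day's output")

    # While those ~42 days pass, about 41 further days of output pile up; clearing that
    # backlog takes ~41 * 42 days, i.e. several years, the same order as the 5.5 years quoted.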

5. Does the possession of large quantities of data imply that we are well informed?

Like most bureaucrats, business executives, teachers, doctors, lawyers and other professionals, Guilford increasingly feels he is suffering from information overload. The symptoms of this epidemic ailment can include tension, occasional irritability and frequent feelings of helplessness - all signs that the victim is under considerable stress.

David Lewis coined the term "information fatigue syndrome" for what he expects will soon be a recognized medical condition. "Having too much information can be as dangerous as having too little. Among other problems, it can lead to a paralysis of analysis, making it far harder to find the right solutions or make the best decisions." "Information is supposed to speed the flow of commerce, but it often just clogs the pipes."

"Information stress sets in when people in possession of a huge volume of data have to work against the clock, when major consequences -lives saved or lost, money made or lost- will flow from their decision, or when they feel at a disadvantage because even with their wealth of material they still think they do not have all the facts they need. So challenged, the human body reacts with a primitive survival response. This evolved millions of years ago to safeguard us when confronted by physical danger. In situations where the only options are to kill a adversary or flee from it, the 'fight-flight' response can make the difference between life and death." D. Lewis

Strategies for dealing with information:

"Just because you can't cope with a lot of information doesn't make you a bad manager. Organizations are gettig by with fewer people doing more, and aren't necessarily giving people time to devise strategies for dealing with information." R. Sachs.

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm#knowledge

6.Why "knowledge" is of more value than "information"?

Information and knowledge: "Knowledge is power, but information is not. It's like the detritus that a gold-panner needs to sift through in order to find the nuggets." D. Lewis.

Information management is the harnessing of the information resources and information capabilities of the organization in order to add and create value both for itself and for its clients or customers.

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm#knowledge

7. What is the most convenient way of representing information? Why?

Language is the natural means of human communication; the most effective way we have to express ourselves to each other. We use language in a host of different ways: to explain complex ideas and concepts; to manage human resources; to negotiate; to persuade; to make our needs known; to express our feelings; to narrate stories; to record our culture for future generations; and to create beauty in poetry and prose. For most of us language is fundamental to all aspects of our lives.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lt

One of the key features of an information service is its ability to deliver information which meets the immediate, real needs of its client in a focused way. It is not sufficient to provide information which is broadly in the category requested, in such a way that the client must sift through it to extract what is useful. Equally, if the way that the information is extracted leads to important omissions, then the results are at best inadequate and at worst they could be seriously misleading.

http://sirio.deusto.es/abaitua/konzeptu/nlp/langeng.htm

8. How can computer science and language technologies help manage information?

Language Engineering can improve the quality of information services by using techniques which not only give more accurate results to search requests, but also increase greatly the possibility of finding all the relevant information available. Use of techniques like concept searches, i.e. using a semantic analysis of the search criteria and matching them against a semantic analysis of the database, give far better results than simple keyword searches.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#bi
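To make the contrast between keyword search and concept search more concrete, here is a deliberately tiny sketch in Python (the concept lexicon, documents and query are invented for the example; real systems use full semantic analysis rather than a hand-made word-to-concept table):

    # Toy contrast between keyword search and concept-based search.
    concept_lexicon = {
        "car": "VEHICLE", "automobile": "VEHICLE", "lorry": "VEHICLE",
        "loan": "FINANCE", "mortgage": "FINANCE",
    }
    documents = {
        "doc1": "second hand automobile and lorry dealership",
        "doc2": "fixed rate mortgage and personal loan offers",
    }

    def keywords(text):
        return set(text.lower().split())

    def concepts(text):
        return {concept_lexicon[w] for w in keywords(text) if w in concept_lexicon}

    query = "car"
    keyword_hits = [d for d, t in documents.items() if keywords(query) & keywords(t)]
    concept_hits = [d for d, t in documents.items() if concepts(query) & concepts(t)]
    print("keyword search:", keyword_hits)   # [] - no document contains the literal word "car"
    print("concept search:", concept_hits)   # ['doc1'] - "automobile" maps to the same concept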

9. Why can language sometimes be seen as a barrier to communication? How can this change?

Communication is probably the most obvious use of language. On the other hand, language is also the most obvious barrier to communication. Across cultures and between nations, difficulties arise all the time not only because of the problem of translating accurately from one language to another, but also because of the cultural connotations of words and phrases. A typical example in the European context is the word 'federal' which can mean a devolved form of government to someone who already lives in a federation, but to someone living in a unitary sovereign state, it is likely to mean the imposition of another level of more remote, centralised government.

As the application of language knowledge enables better support for translators, with electronic dictionaries, thesauri, and other language resources, and eventually when high quality machine translation becomes a reality, so the barriers will be lowered. Agreements at all levels, whether political or commercial, will be better drafted more quickly in a variety of languages. International working will become more effective with a far wider range of individuals able to contribute. An example of a project which is successfully helping to improve communications in Europe is one which interconnects many of the police forces of northern Europe using a limited, controlled language which can be automatically translated, in real-time. Such a facility not only helps in preventing and detecting international crime, but also assists the emergency services to communicate effectively during a major incident.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#ec

10. Language Technology, Language Engineering and Computational Linguistics. Similarities and differences.

What is Language Engineering?

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

In practice, Language Engineering is applied at two levels. At the first level there are a number of generic classes of application, such as:

- language translation
- information management (multi-lingual)
- authoring (multi-lingual)
- human/machine interface (multi-lingual voice and text)

At the second level, these enabling applications are applied to real world problems across the social and economic spectrum. So, for example:

- information management can be used in an information service, as the basis for analysing requests for information and matching the request against a database of text or images, to select the information accurately
- authoring tools are typically used in word processing systems, but can also be used to generate text, such as business letters in foreign languages, as well as, in conjunction with information management, to provide document management facilities
- human language translation is currently used to provide translator workbenches and automatic translation in limited domains
- most applications can usefully be provided with natural language user interfaces, including speech, to improve their usability

In general, language capability is embedded in systems to enhance their performance. Language Engineering is an 'enabling technology'.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#cott

Computational linguistics:

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

http://www.coli.uni-sb.de/~hansu/what_is_cl.html

Language Technology:

Language technologies are information technologies that are specialized for dealing with the most complex information medium in our world: human language. Therefore these technologies are also often subsumed under the term Human Language Technology. Human language occurs in spoken and written forms. Whereas speech is the oldest and most natural mode of communication, complex information and most human knowledge is maintained and transmitted in written texts. Speech and text technologies process or produce language in these two modes of realization. But language also has aspects that are shared between speech and text, such as dictionaries, most of grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology has had to create formal representation systems that link language to concepts and tasks in the real world. This provides the interface to the fast growing area of knowledge technologies.

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

11. In what ways does language engineering improve the use of language?

Our ability to develop our use of language holds the key to the multi-lingual information society: the European society of the future. New developments in Language Engineering will enable us to:

- access information efficiently, focusing precisely on the information we need, saving time and avoiding information overload
- talk to our computer systems, at home as well as at work, in our cars and in public places where we need information or assistance
- teach ourselves other languages and improve our use of our own, at our convenience: in our own time, at our own pace, and in our own place
- do business efficiently over the telephone by interacting reliably and directly with voice operated computer systems, and even instruct our PCs to carry out transactions on our behalf
- learn more about what is happening around us, locally, nationally and internationally, and have a greater influence on decisions affecting our lives
- operate more effectively internationally, in business, in administration, in political activities and as citizens and consumers
- provide a wider range of better services to the maximum number of fellow citizens, colleagues and customers

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#mlwfu

12. Which are the main techniques used in Language Engineering?

1.- Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying the speaker reliably despite temporary changes (such as those caused by illness).

2.- Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose. There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.

3.- Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition: recognition of printed images, referred to as Optical Character Recognition (OCR), and recognition of handwriting, usually known as Intelligent Character Recognition (ICR).

OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences.
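As a very crude illustration of how a lexicon can support character recognition (the noisy words below are invented, and simple string similarity stands in for the statistical language models real ICR systems use):

    # Correct noisy character-recognition output against a small word list.
    import difflib

    lexicon = ["language", "engineering", "character", "recognition", "document"]
    ocr_output = ["lang11age", "enginecring", "rec0gnition"]   # invented noisy words

    corrected = [
        (difflib.get_close_matches(word, lexicon, n=1, cutoff=0.6) or [word])[0]
        for word in ocr_output
    ]
    print(corrected)   # ['language', 'engineering', 'recognition']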

Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.

4.- Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels.

Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge.
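A minimal sketch of such shallow analysis is noun-phrase chunking, shown here with the NLTK toolkit (assuming NLTK and its tokeniser and part-of-speech tagger data have been installed; the single chunking rule is deliberately simple):

    # Shallow parsing: extract noun phrases without building a full parse tree.
    import nltk
    # One-off downloads if the data is not already present:
    # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

    sentence = "Language Engineering improves the quality of information services."
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)                     # [('Language', 'NNP'), ...]

    # One chunking rule: an optional determiner, any adjectives, then one or more nouns.
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
    tree = chunker.parse(tagged)
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
        print(" ".join(word for word, tag in subtree.leaves()))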

Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information.

Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.

5.- Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.

6.- Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or by concatenating units of speech (phonemes, words) together. Generated speech has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.

Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or by synthesising speech using rules. Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlg

13. Which language resources are essential components of Language Engineering?

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding.

The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA). These are the essential components of Language Engineering:

1.- Lexicons

A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.
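Very schematically, a single lexicon entry of this kind might be represented as a simple data structure (the fields and values below are invented for illustration):

    # A schematic lexicon entry: one word with morphological, phonological
    # and semantic information attached (illustrative values only).
    lexicon = {
        "bank": {
            "pos": "noun",
            "morphology": {"plural": "banks"},
            "phonology": "/bæŋk/",
            "senses": [
                {"gloss": "financial institution", "domain": "finance"},
                {"gloss": "sloping land beside a river", "domain": "geography"},
            ],
        }
    }

    entry = lexicon["bank"]
    print(entry["phonology"], "-", len(entry["senses"]), "senses")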

2.- Specialist Lexicons

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, persons, or maybe animals. They take on a special significance in many applications, however, where the name is key to the application, such as in a voice operated navigation system, a holiday reservations system, or a railway timetable information system based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.
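Wordnet-style relations can be queried directly through NLTK's interface to Princeton WordNet (a small sketch, assuming the WordNet corpus data has been downloaded):

    # Query synonym and antonym relations from WordNet through NLTK.
    from nltk.corpus import wordnet as wn
    # One-off download if not already present: nltk.download("wordnet")

    synonyms = {lemma.name() for synset in wn.synsets("big", pos=wn.ADJ)
                for lemma in synset.lemmas()}
    antonyms = {antonym.name() for synset in wn.synsets("big", pos=wn.ADJ)
                for lemma in synset.lemmas()
                for antonym in lemma.antonyms()}

    print("synonyms:", sorted(synonyms))   # includes e.g. 'large', 'enormous'
    print("antonyms:", sorted(antonyms))   # e.g. 'small', 'little'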

3.- Grammars

A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).
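A toy grammar of the surface (syntactic) kind can be written and tried out with NLTK's context-free grammar machinery (the rules below are invented for the example):

    # A toy context-free grammar describing sentence structure (illustrative rules only).
    import nltk

    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> Det N
        VP -> V NP
        Det -> 'the' | 'a'
        N  -> 'translator' | 'text'
        V  -> 'reads' | 'translates'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the translator translates a text".split()):
        print(tree)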

4.- Corpora

A corpus is a body of language, either text or speech, which provides the basis for:

- analysis of language to establish its characteristics
- training a machine, usually to adapt its behaviour to particular circumstances
- verifying empirically a theory concerning language
- a test set for a Language Engineering technique or application, to establish how well it works in practice

There are national corpora of hundreds of millions of words but there are also corpora which are constructed for particular purposes. For example, a corpus could comprise recordings of car drivers speaking to a simulation of a control system, which recognises spoken commands, which is then used to help establish the user requirements for a voice operated control system for the market.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lr
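As a very small example of using a corpus to establish the characteristics of a language (here just word frequencies, over a stand-in two-sentence "corpus"):

    # Minimal corpus analysis: word frequencies over a tiny stand-in corpus.
    from collections import Counter
    import re

    corpus = """Language resources are essential components of Language Engineering.
    They are one of the main ways of representing the knowledge of language."""

    tokens = re.findall(r"[a-z]+", corpus.lower())
    frequencies = Counter(tokens)
    print(frequencies.most_common(4))   # [('of', 4), ('language', 3), ('are', 2), ('the', 2)]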

14. Check the following terms:

Natural language processing: Natural language processing is a term in use since the 1980s to define a class of software systems which handle text intelligently.

Translator's Workbench: A software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc.

Shallow parser: Software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu

Formalism: A means to represent the rules used in the establishment of a model of linguistic knowledge.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nl

Speech recognition: The analysis of the analogue waveforms of speech to identify the units of sound (phonemes) that make up words, using statistical models of phonemes and words trained on large speech corpora; a fuller description is given under question 12 above.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu

Text alignment: The process of aligning different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu
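A deliberately naive sketch of the idea: pairing up the sentences of two language versions one-to-one (the example sentences are invented; real alignment tools also handle one-to-two and two-to-one mappings, usually with length-based statistics):

    # Naive 1:1 sentence alignment of two language versions of the same text.
    english = [
        "The committee approved the proposal.",
        "The decision will be published tomorrow.",
    ]
    spanish = [
        "El comité aprobó la propuesta.",
        "La decisión se publicará mañana.",
    ]

    aligned = list(zip(english, spanish))   # assumes equal sentence counts and identical order
    for en, es in aligned:
        print(f"{en}  <->  {es}")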

Authoring tools: Facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar- and style-checking, and facilities for structuring, integrating and linking documents.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu

Controlled language (also artificial language): A language which has been designed to restrict the number of words and the structure of the language used, in order to make language processing easier; typical users of controlled language work in an area where precision of language and speed of response is critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu

Domain: Usually applied to the area of application of the language-enabled software, e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted, so the language resource requirements are effectively limited by limiting the domain of application.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu

15. What is the focus of the localization industry? Do you believe there might be a job for you in that industry sector?

The increase of information in electronic format is linked to advances in computational techniques for dealing with it. Together with the proliferation of informational webs on the Internet, we can also see a growing number of search and retrieval devices, some of which integrate translation technology. Technical documentation is becoming electronic, in the form of CD-ROMs, on-line manuals, intranets, etc. An important consequence of the popularization of the Internet is that access to information is now truly global and the demand for localizing institutional and commercial Web sites is growing fast. In the localization industry, the utilization of technology is congenital, and developing adequate tools has immediate economic benefits.

The main role of localization companies is to help software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing, and Web-based information in different languages for simultaneous worldwide release. The recent expansion of these industries has considerably increased the demand for translation products and has created a new burgeoning market for the language business. According to a recent industry survey by LISA (the Localization Industry Standards Association), almost one third of software publishers, such as Microsoft, Oracle, Adobe, Quark, etc., generate above 20 percent of their sales from localized products, that is, from products which have been adapted to the language and culture of their targeted markets, and the great majority of publishers expect to be localizing into more than ten different languages.

Localization is not limited to the software-publishing business and it has infiltrated many other facets of the market, from software for manufacturing and enterprise resource planning, games, home banking, and edutainment (education and entertainment), to retail automation systems, medical instruments, mobile phones, personal digital assistants (PDA), and the Internet. Doing business in an integrated global economy, with growing electronic transactions, and world wide access to products and services means an urgent need to break through language barriers. A prediction of $220 billion online spending by 2001 shows the potential of this new market. It means that product information, from purchasing procedures to user manuals, must be made available in the languages of potential customers.

According to the latest surveys, there are more than 35 million non-English-speaking Internet users. The Internet is thus evolving into a huge consumer of Web-based information in different languages. The company Nua Ltd. provides a good example of how the demand for multilingual Web sites is changing the notion of translation into localization. Nua has recently won a substantial contract to develop and maintain a searchable multilingual intranet for the American Export Group (AEG), a division of Thomas Publishing International. Nua's task is to transform the existing American Export Register (AER), a directory of some 6,000 pages, into a localized database of 45,000 company listings, with information about each company, including a categorization into one of AEG's 5,000 categories. AEG's intranet will link 47,000 US firms to overseas clients.

The first version of the AER register will provide access in five languages: English, French, German, Spanish, and Portuguese. Russian is due to follow, and the company hopes eventually to have an Arabic version. Any such multilingual service involves frequent revisions and updates, which in turn means a high demand for constant localization effort. Besides Internet, another emerging sector for the localization industry is the introduction of the e-book (electronic book) in the literary market. Microsoft, Bertelsmann, HarperCollins, Penguin Putnam, Simon & Schuster, and TimeWarner Books have launched a new association for standardizing the format of electronic books. Although there may be doubts about whether we will ever be able to bring the electronic page into line with the printed page in terms of readability and ease of use, it is clear that for a new generation of console and video-game users, who are more than adapted to reading on screens, literature on the console may be more than appealing.

To understand the relevance of the localization market we can look at some figures provided by companies in the field. AlpNet, for example, which claims to be the largest publicly owned dedicated supplier of worldwide translation and product localization services, with over 375 employees in 13 countries, has recently reported sales of US$10.4 million in one quarter of 1997, with net income of US$619,000. In addition to AlpNet, here are some more names of buoyant companies in the localization business: International Software Products, EnCompas Globalization, Lernout & Hauspie, Flanders Language Valley, Vertaalbureau Bothof, Intertrans, Bowne Global Solutions, LionBridge Technologies, Language Management International, International Language Engineering, Techno-Graphics & Translations, Accent Software International Ltd. The specialized magazine Language International, with six issues a year, is a good source of information to find out more about these companies. Many claim to have problems recruiting people. The General Manager of LionBridge, Santi van der Kruk, for example, declares:

The profile we look for in translators is an excellent knowledge of computer technology and superb linguistic ability in both the source and target languages. They must know how to use the leading CAT [computer assisted translation] tools and applications and be flexible.

The information technology and localization industries are evolving very rapidly and translators need to move with them. Van der Meer, president of AlpNet, puts it this way: "Localization was originally intended to set software (or information technology) translators apart from 'old fashioned' non-technical translators of all types of documents. Software translation required a different skill set: software translators had to understand programming code, they had to work under tremendous time pressure and be flexible about product changes and updates. Originally there was only a select group--the localizers--who knew how to respond to the needs of the software industry. From these beginnings, pure localization companies emerged focusing on testing, engineering, and project management." This shows that the localization market is requiring an expertise that the vast majority of academic centers are not properly providing. This state of affairs explains why the localization industry itself, around the LISA association, has seen the need to promote an educational initiative.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

17. Define internationalization, globalization and localization. How do they affect the design of software products?

LISA Education Initiative Taskforce (LEIT) is a consortium of schools training translators and computational linguists that was announced in 1998 as an initiative to develop a promotional program for the academic communities in Europe, North America, and Asia. The initial mandate of LEIT was to conduct a survey among academic and non-academic programs that offer courseware and training for internationalizers and localizers and to query the market players to determine their needs with respect to major job profiles. LEIT's main objective is to stimulate more formal education in skills beneficial to the localization industry that complains of a labor shortage. The academic institutions involved in the first release of LEIT are: University of Geneva (Switzerland), Brigham Young University (Utah), Kent State University (Ohio), University of Cologne (Germany), City College of Dublin (Ireland), Monterey Institute of International Studies (California), and National Software Center in Bombay (India).

Professor Margaret King of Geneva University described the first step of the project as consisting of the "clarification of the state of affairs and to plan courses that are comprehensive enough to cover all aspects of interest of the localization industry, to review all aspects of the localization industry, from translation and technical writing through globalization, internationalization, and localization". The definition of the critical terms involved was a contentious topic, although there seems to be a consensus with the following:

Globalization: The adaptation of marketing strategies to regional requirements of all kinds (e.g., cultural, legal, and linguistic).

Internationalization: The engineering of a product (usually software) to enable efficient adaptation of the product to local requirements.

Localization: The adaptation of a product to a target language and culture (locale).

The main goal of the LEIT initiative is to introduce localization courseware into translation studies, with versions ready for the start of the 1999 academic year.

However, this must be done with care. Bert Esselink (1998), from AlpNet, for example, argues against separating localization from other disciplines and claims its basic principles should be covered in all areas of translation training. Furthermore, it would be useful to add that trainers not only need constant feedback and guidance from the commercial sector, they also need to maintain close contact with the software industry. So, perhaps, one of the best features of the LEIT initiative is its combination of partnership from the academic as well as from the industrial world. LISA offers the first version of this courseware on its Web site, and users can contact the LEIT group and collaborate through an on-line questionnaire.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

18. What is a translation workstation? Compare it with a standard localization tool.

The translation workstation: leaving behind the old conception of a monolithic, compact translation engine, the industry is now moving in the direction of integrating systems. "In the future Trados will offer solutions that provide enterprise-wide applications for multilingual information creation and dissemination, integrating logistical and language-engineering applications into smooth workflow that spans the globe," says Trados manager Henri Broekmate. Logos, the veteran translation technology provider, has announced "an integrated technology-based translation package, which will combine term management, TM, MT and related tools to create a seamless full service localization environment."

Other software manufacturers also in the race are Corel, Star, IBM, and the small but belligerent Spanish company Atril. This approach for integrating different tools is largely the view advocated by many language-technology specialists. Below is a description of an ideal engine which captures the answers given by Muriel Vasconcellos (from the Pan American Health Organization), Minako O'Hagan (author of The Coming Age of Teletranslations) and Eduard Hovy (President of the Association of Machine Translation in the Americas) to a recent survey (by Language International 10.6).

The ideal workstation for the translator would combine the following features:

Full integration in the translator's general working environment, which comprises the operating system, the document editor (hypertext authoring, desktop publisher or the standard word-processor), as well as the emailer or the Web browser. These would be complemented with a wide collection of linguistic tools: from spell, grammar and style checkers to on-line dictionaries, and glossaries, including terminology management, annotated corpora, concordances, collated texts, etc. The system should comprise all advances in machine translation (MT) and translation memory (TM) technologies, be able to perform batch extraction and reuse of validated translations, enable searches into TM databases by various keywords (such as phrases, authors, or issuing institutions). These TM databases could be distributed and accessible through Internet.

There is a new standard for TM exchange (TMX) that would permit translators and companies to work remotely and share memories in real-time. Eduard Hovy underlines the need for a genre detector. "We need a genre topology, a tree of more or less related types of text and ways of recognizing and treating the different types computationally." He also sees the difficulty of constantly up-dating the dictionaries and suggests a "restless lexicon builder that crawls all over the Web every night, ceaselessly collecting words, names, and phrases, and putting them into the appropriate lexicons." Muriel Vasconcellos pictures her ideal design of the workstation in the following way:

A good view of the source text, extensive enough to offer the overall context, including the previous sentence and two or three sentences after the current one.

Relevant on-line topical word lists, glossaries and thesauri. These should be immediately accessible and, in the case of topical lists, there should be an optional switch that shows, possibly in color, when there are subject-specific entries available.

Three target-text windows. The first would be the main working area, and it would start by providing a sentence from the original document (or a machine pre-translation), which could be over-struck or quickly deleted to allow the translator to work from scratch. The original text or pre-translation could be switched off. Characters of any language and other symbols should be easy to produce.

Drag-and-drop is essential and editing macros are extremely helpful when overstriking or translating from scratch.

The second window would offer translation memory when it is available. The TM should be capable of fuzzy matching with a very large database, with the ability to include the organization's past texts if they are in some sort of electronic form.

The third window would provide a raw machine translation which should be easy to paste into the target document.

The grammar checker can be tailored so that it is not so sensitive. It would be ideal if one could write one's own grammar rules.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm
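The fuzzy matching of a new sentence against a translation-memory database, mentioned in the workstation description above, can be sketched with standard-library string similarity (a toy memory with invented segment pairs; real systems index millions of segments and use more refined similarity measures):

    # Toy translation memory with fuzzy matching.
    import difflib

    translation_memory = {
        "The report must be submitted by Friday.": "El informe debe entregarse el viernes.",
        "Press the red button to stop the machine.": "Pulse el botón rojo para detener la máquina.",
    }

    def best_match(sentence, memory, threshold=0.7):
        """Return the most similar stored segment and its translation, if similar enough."""
        scored = [
            (difflib.SequenceMatcher(None, sentence, source).ratio(), source, target)
            for source, target in memory.items()
        ]
        score, source, target = max(scored)
        return (source, target, score) if score >= threshold else None

    print(best_match("The report must be submitted by Monday.", translation_memory))
    # Matches the first segment with a high score; the translator only edits the weekday.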

19. Machine translation vs. human translation. Do you agree that translation excellence goes beyond technology? Why?

Although it may not be perceived at first sight, the complexity of natural language is of an order of magnitude far superior to any purely mechanical process. To how many words should the vocabulary be limited to make the complexity of producing "free sonnets" (that is, any combination of 6 words in 14 verses) comparable to the number of possible chess games? It may be difficult to believe, but the vocabulary should be restricted to 100 words. That is, making free sonnets with 100 words offers as many different alternatives as there are ways of playing a chess game (roughly, 10^120; see DELI's Web page for discussion).
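One way to reconstruct the arithmetic behind this claim is to read "combination" literally as an unordered choice of six words per verse from the restricted vocabulary (a reading the source does not spell out, assumed here only for the calculation):

    # Counting "free sonnets": an unordered choice of 6 words per verse,
    # over 14 verses, from a 100-word vocabulary (interpretation assumed, see above).
    import math

    vocabulary = 100
    per_verse = math.comb(vocabulary, 6)            # 1,192,052,400 ways to fill one verse
    order_of_magnitude = 14 * math.log10(per_verse)
    print(f"about 10^{order_of_magnitude:.0f} free sonnets")
    # ~10^127: the same order of magnitude as the roughly 10^120 possible chess games.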

The number of possibilities would quickly come down if combinations were restricted so that they not only made sense but acquired some sort of poetic value. However, defining formally or mechanically the properties of "make sense" and "have poetic value" is not an easy task. Or at least, it is far more difficult than establishing winning heuristics for a color to succeed in a chess game. No wonder then that Douglas Hofstadter's MT experiment translating 16th century French Clément Marot's poem Ma Mignonne into English using IBM's Candide system should have performed so badly (see Sgrung's interview in Language International 10.1): "Well, when you look at [IBM's Candide's] translation of Ma Mignonne, thinking of Ma Mignonne as prose, not as poetry, it's by far the worst. It's so terrible that it's not even laughable, it just stinks! It's pathetic!" Obviously, Hofstadter's experiment has gone beyond the recommended mechanical and routine scope of language and is therefore an abuse of MT. Outside the limits of the mechanical and routine, MT is impracticable and human creativity becomes indispensable. Translators of the highest quality are only obtainable from first-class raw materials and constant and disciplined training. The potentially good translator must be a sensitive, wise, vigilant, talented, gifted, experienced, and knowledgeable person.

An adequate use of mechanical means and resources can make a good human translator a much more productive one. Nevertheless, very much like dictionaries and other reference material, technology may be considered an excellent prosthesis, but little more than that. As Martin Kay (1992) argues, there is an intrinsic and irreplaceable human aspect of translation: "There is nothing that a person could know, or feel, or dream, that could not be crucial for getting a good translation of some text or other. To be a translator, therefore, one cannot just have some parts of humanity; one must be a complete human being."

However, even for skilled human translators, translation is often difficult. One clear example is when linguistic form, as opposed to content, becomes an important part of a literary piece. Conveying the content but missing the poetic aspects of the signifier may considerably hinder the quality of the translation. This is a challenge to any translator. Jaime de Ojeda's (1989) Spanish translation of Lewis Carroll's Alice in Wonderland illustrates this problem:

Twinkle, twinkle, little bat
how I wonder what you're at!
Up above the world you fly
like a tea-tray in the sky.

Brilla, luce, ratita alada
¿en qué estás tan atareada?
Por encima del universo vuelas
como una bandeja de teteras.

Manuel Breva (1996) analyzes the example and shows how Ojeda solves the "formal hurdles" of the original: the above lines are a parody of the famous poem "Twinkle, twinkle, little star" by Jane Taylor, which, in Carroll's version, turns into a sarcastic attack against Bartholomew Price, a professor of mathematics, nicknamed "The Bat".

Jaime de Ojeda translates "bat" as "ratita alada" for rhythmical reasons. "Murciélago", the Spanish equivalent of "bat", would be hard to fit in this context for the same poetic reasons. With Ojeda's choice of words the Spanish version preserves the meaning and maintains the same rhyming pattern (AABB) as in the original English verse-lines. What would the output of any MT system be like if confronted with this fragment? Obviously, the result would be disastrous. Compared with the complexity of natural language, the figures that serve to quantify the "knowledge" of any MT program are absurd: 100,000 word bilingual vocabularies, 5,000 transfer rules.... Well developed systems such as Systran, or Logos hardly surpass these figures. How many more bilingual entries and transfer rules would be necessary to match Ojeda's competence? How long would it take to adequately train such a system? And even then, would it be capable of challenging Ojeda in the way the chess master Kasparov has been challenged? I have serious doubts about that being attainable at all. But there are other opinions, as is the case of the famous Artificial Intelligence master, Marvin Minsky. Minsky would argue that it is all a matter of time. He sees the human brain as an organic machine, and as such, its behavior, reactions and performance can be studied and reproduced. Other people believe there is an important aspect separating organic, living "machines" from synthetic machines. They would claim that creativity is in life, and that it is an exclusive faculty of living creatures to be creative.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

20. Do professional interpreters and literary translators need translation technology? Which are the tools they need for their job?

Away from such metaphysical dilemmas, what I personally expect are systems that learn while they are exposed to translations like Ojeda's; systems that are capable of memorizing any bilingual chunk which may be considered a translation unit. Sometimes the translation unit will correspond to just a word or a phrase, like "bat" and "ratita alada", but more often whole paragraphs or even entire literary works could be taken as translation units. One can think of systems that, when confronted with a text which contains an occurrence of Lewis Carroll's parody of Jane Taylor, would be clever enough to resort to Ojeda's translation, and not only use "ratita alada" instead of "murciélago", but provide the whole verse if needed.

More remarkably than Carroll, Shakespeare may be taken as the literary author who is most frequently quoted or paraphrased in English. There are two established translators of Shakespeare into Spanish, Astrana Marín and Ángel Luis Pujante (Rupérez 1998). Astrana had been the main reference until Pujante's versions were published in 1998. Astrana translated Shakespeare in prose, with frequent paraphrases and explanations of the source text. Pujante tries to maintain as much poetic effect as possible not only in rhyme and rhythm, but also with the archaic flavor of the original. One would like to see a system with expertise in Shakespeare's translations into Spanish, where both Astrana and Pujante's versions were registered, together with other known alternatives, including Carmen Criado's work in the Spanish dubbed version of the film Shakespeare in love. It would be an attractive content for an e-book, as is in fact Francisco Rico's CD-ROM with collations and the final revised version of El Quijote in Spanish. Such an ideal MT system, rather than competing with Astrana's, or Pujante's translating skills, would just be able to reproduce their versions in one's own working environment, word by word, through the simple stroke of a key. And this is completely within the state of the art in translation technology.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

What should students learn about translation technology? As we now know, there is no one single answer to this question. Technological skills will depend on how students see their own future as translators. Those with good aptitudes for interpreting or literary translation could leave technology on a secondary level. However, it is clear that the vast majority of students should be prepared to satisfy the growing demand for specialists in technical documentation, and in particular the demand from the localization industry. Thus, training centers should seriously consider introducing the LEIT initiative into their training curricula.

Apart from a basic common computational background, these skills would include official and industrial standards in office software (word processing, database maintenance, spreadsheet management, Internet browsing, emailing, etc.). Students should have realistic knowledge of some specific translation technology, ideally in the form of a translation workstation. However, it is important to realize that software is constantly evolving, that software and hardware updating is expensive, and that key concepts and skills may be equally well acquired with tools which are two or three years old. What is most important is becoming competent with the basic functional operations such as file and window management, editing, and net interaction. More specialized operations will be easily acquired on top of the basic ones, and will largely depend on the student's natural sympathy for the computer.

I would recommend at least one year of basic computer training before attempting any specialization. It is thus important to tune training courses to the expectations of the students. Out of the following six profiles, any person with a University degree in Translation should be qualified at least for the first three:

Consultant: A person that is sufficiently informed to advise potential users of translation technology. This person should be able to find out when and how technology may be useful or cost-effective; how to find out the most adequate tools or where to get the necessary information to come up with an answer. That is, a person that has read at least one paper like this, or knows where to find the basic relevant literature and references.

User: A person that has sufficient technological training to be efficient not only using the computer but also any specialized translation software with a minimally standard way of working.

Instructor: A person that can both assess and use the technology is, with a little more experience, also capable of training other people. Teaching requires some confidence with hardware and software, so it would be desirable for the instructor to also be a regular computer user.

Evaluator: Evaluating the technology requires a little more expertise than being a consultant. An evaluator would be able to analyze how good or bad particular software is. Therefore, some experience in software evaluation in general, and in translation technology in particular, is recommendable.

Manager: A person that has the responsibility to make a translation or localization company profitable should have quite some experience in using and testing translation technology. That person should also be able to design an optimal distribution between human and machine resources; and should know what kind of professionals the company needs (translators, computational linguists, or software engineers), as well as how to acquire the most appropriate technological infrastructure.

Developer: Localization software very often needs customizing, integration or up-dating. Good professionals may be involved in software development, where both linguistic and technical skills may be required. Thus, it can be seen that the traditional role of the translator will be changing very quickly and a direct consequence of this is that there will be more career opportunities for the graduate in Translation Studies than ever before.

A recent survey, done by the Department of Languages at the University of Applied Sciences in Cologne provoked the following comments by Michael Grade, Professor of Technical English (Language International 10.4): Career prospects are favorable for technical translation graduates with further job qualifications. The chances of finding a language-related job in scientific, medical, or technical fields appear high. The result of the survey indicated that the most important area of activity was the international export and sales sector for technical products and services. The job description now includes a diversified range of activities such as commercial and specialized technical tasks, customer relations, clerical, and organizational responsibilities. So this, together with the fact that the market is beginning to recognize the value of the translator in the world today, seems to augur well for the future.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

21.In the translation curricula, which factors make technology more indispensable?

When considering technological training in the translation curricula, it is important to clarify the factors that make technology more indispensable and to show how the training should be tuned accordingly. The relevance of technology will depend on the medium that contains the text to be translated. This particular aspect is becoming increasingly evident with the rise of the localization industry, which deals solely with information in digital form. There may be no other imaginable means for approaching the translation of such things as on-line manuals in software packages or CD-ROMs with technical documentation than computational ones.

On the other hand, the traditional crafts of interpreting natural speech or translating printed material, which are peripheral to technology, may still benefit from technological training slightly more than anecdotally. It is clear that word processors, on-line dictionaries and all sorts of background documentation, such as concordances or collated texts, besides e-mail or other ways of network interaction with colleagues anywhere in the world may substantially help the literary translator's work. With the exception of a few eccentrics or maniacs, it will be rare in the future to see good professional interpreters and literary translators not using more or less sophisticated and specialized tools for their jobs, comparable to the familiarization with tape recorders or typewriters in the past.

In any case, this might be something best left to the professional to decide, and may not be indispensable. However, the greater number of jobs for our students is in the localization market. Information of many types is rapidly changing format and going digital. Electronic documentation is the adequate realm for the incorporation of translation technology. This is something that young students of translation must learn. As the conception and design of technical documentation becomes progressively influenced by the electronic medium, it is integrating more and more with the whole concept of a software product. The strategies and means for translating both software packages and electronic documents are becoming very similar and both are now, as we will see, the goal of the localization industry.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

22.In what ways is documentation becoming electronic? How does this affect the industry?

This particular aspect is becoming increasingly evident with the rise of the localization industry, which deals solely with information in digital form. There may be no other imaginable means for approaching the translation of such things as on-line manuals in software packages or CD-ROMs with technical documentation than computational ones.

An important consequence of the popularization of Internet is that the access to information is now truly global and the demand for localizing institutional and commercial Web sites is growing fast.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

23.Are translation and localization the same thing?

Localization - is the process during which a computer program is translated to a different language for a specific market. The user interface is translated into the target language, dialog boxes are resized due to the use of different character sets, and if necessary, double-byte enabling is done.

Van der Meer, president of AlpNet, puts it this way:

"Localization was originally intended to set software (or information technology) translators apart from 'old fashioned' non-technical translators of all types of documents. Software translation required a different skill set: software translators had to understand programming code, they had to work under tremendous time pressure and be flexible about product changes and updates. Originally there was only a select group--the localizers--who knew how to respond to the needs of the software industry. >From these beginnings, pure localization companies emerged focusing on testing, engineering, and project management."

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

24.Which profiles should any person with a University degree in Translation be qualified for?

Obviously, Hofstadter's experiment has gone beyond the recommended mechanical and routine scope of language and is therefore an abuse of MT. Outside the limits of the mechanical and routine, MT is impracticable and human creativity becomes indispensable. Translators of the highest quality are only obtainable from first-class raw materials and constant and disciplined training. The potentially good translator must be a sensitive, wise, vigilant, talented, gifted, experienced, and knowledgeable person. An adequate use of mechanical means and resources can make a good human translator a much more productive one. Nevertheless, very much like dictionaries and other reference material, technology may be considered an excellent prosthesis, but little more than that. As Martin Kay (1992) argues, there is an intrinsic and irreplaceable human aspect of translation:

There is nothing that a person could know, or feel, or dream, that could not be crucial for getting a good translation of some text or other. To be a translator, therefore, one cannot just have some parts of humanity; one must be a complete human being.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

25.Why is translation such a difficult task?

Translation is difficult because of (i) problems of ambiguity, (ii) problems that arise from structural and lexical differences between languages, and (iii) multiword units like idioms and collocations.

Of course, these sorts of problem are not the only reasons why MT is hard. Other problems include the sheer size of the undertaking, as indicated by the number of rules and dictionary entries that a realistic system will need, and the fact that there are many constructions whose grammar is poorly understood, in the sense that it is not clear how they should be represented, or what rules should be used to describe them. This is the case even for English, which has been extensively studied, and for which there are detailed descriptions -- both traditional `descriptive' and theoretically sophisticated -- some of which are written with computational usability in mind. It is an even worse problem for other languages. Moreover, even where there is a reasonable description of a phenomenon or construction, producing a description which is sufficiently precise to be used by an automatic system raises non-trivial problems.

http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node52.html#SECTION00810000000000000000

26.Which are the main problems of MT?

The methods for dealing with translation difficulties vary from system to system. In many cases, the ambiguities specific to the source language are tackled in operations separate from the treatment of differences between languages. Commonly three basic operations are recognised: the analysis of the source text, the bilingual transfer of lexical items and structures and the generation of the target text. Questions of ambiguity and choice occur at every stage. For example, resolving the ambiguity of English cry between ‘weep’ and ‘shout’ would be part of a program for the analysis of English.

On the other hand, the selection of connaître or savoir in French for the English verb know would be a matter for a separate transfer program. Analysis involves also the identification and disambiguation of structures, e.g. whether He saw her shaking hands means that he saw someone who was welcoming a visitor or he saw someone who was suffering from the cold weather. Transfer likewise can involve changes of structure, e.g. from an English infinitival construction He likes to swim to a German adverbial construction Er schwimmt gern. Generation is often incorporated in transfer operations, but when a separate component it might include operations to distinguish between English big, large and great (about which more later) and the production of correct morphology and word order in the target language (ses mains tremblantes, er darf nicht schwimmen).
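To illustrate the separation described above between monolingual analysis and bilingual transfer, here is a minimal sketch in Python. The context tests are toy heuristics that I have invented purely for illustration, not rules from any real system; only the linguistic examples (cry as 'weep' or 'shout', connaître versus savoir for know) come from the text.

def analyse_cry(sentence):
    """Analysis step: pick a sense for 'cry' from monolingual English context."""
    if any(w in sentence for w in ("tears", "sad", "baby")):
        return "cry/WEEP"
    return "cry/SHOUT"

def transfer_know(object_type):
    """Transfer step: choose the French verb for 'know' from the type of its object."""
    # 'connaitre' for acquaintance with people or places, 'savoir' for facts.
    return "connaître" if object_type in ("person", "place") else "savoir"

print(analyse_cry("the baby began to cry"))       # cry/WEEP
print(analyse_cry("he heard them cry for help"))  # cry/SHOUT
print(transfer_know("person"))                    # connaître
print(transfer_know("fact"))                      # savoir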

All translation is a problem-solving activity, choices have to be made continually. The assumption in MT systems, whether fully or partially automatic, is that there are sufficiently large areas of natural language and of translation processes that can be formalised for treatment by computer programs. The basic premise is therefore that the differences between languages can to some extent be regularised. What this means at the practical level is that problems of selection can be resolved by clearly definable procedures.

The major task for MT researchers and developers is to determine what information is most effective in particular situations, what kind of information is appropriate in particular circumstances, and whether some data should be given greater weight than others.

In general, an MT system which cannot go beyond morphological analysis will produce little more than word-for-word translations. It may cope well with compounds and other fixed expressions, and it may deal adequately with noun and verb forms in certain cases, but the omission of any treatment of word order will give poor results.

http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm

27.Which parts of Linguistics are more relevant for MT?

It is a truism to say that one of the most straightforward operations of any MT system should be the identification and generation of morphological variants of nouns and verbs. There are basically two types of morphology in question: inflectional morphology, as illustrated by the familiar verb and noun paradigms (French marcher, marche, marchons, marchait, est marché, etc.), and derivational morphology, which is concerned with the formation of nouns from verb bases, verbs from noun forms, adjectives from nouns, and so forth, e.g. nation, nationalism, nationalise, nationalisation, and equivalents in other languages.

http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm
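As an illustration of the two kinds of morphology mentioned above, I have sketched a tiny analyzer below. The paradigm table is a hand-written toy fragment for French marcher and the derivational family of nation follows the text's example; a real system would generate such tables from rules rather than listing every form, and the table layout is my own assumption.

INFLECTION = {                       # surface form -> (lemma, grammatical features)
    "marcher":  ("marcher", "infinitive"),
    "marche":   ("marcher", "present, 1st/3rd person singular"),
    "marchons": ("marcher", "present, 1st person plural"),
    "marchait": ("marcher", "imperfect, 3rd person singular"),
}

DERIVATION = {                       # base word -> derived forms (English example from the text)
    "nation": ["national", "nationalism", "nationalise", "nationalisation"],
}

def analyse(form):
    """Inflectional analysis: return the lemma and features of a surface form."""
    return INFLECTION.get(form, (form, "unknown"))

print(analyse("marchons"))           # ('marcher', 'present, 1st person plural')
print(DERIVATION["nation"])          # the derivational family of 'nation'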

28.How many different types of ambiguity are there?

Syntactic analysis is based largely on the identification of grammatical categories: nouns, verbs, adjectives. For English, the major problem is the categorial ambiguity of so many words, as already illustrated with the word light. In essence, the solution is to look for words which are unambiguous as to category and to test all possible syntactic structures.

There are two main types of ambiguity. A word is said to be lexically ambiguous when it has more than one meaning.

A phrase or sentence is said to be structurally ambiguous when it can have more than one structure.

http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm

29.Illustrate your discussion with:

Two examples of lexical ambiguity:

Prices rose quickly in the market. Each of the words prices, rose, and market can be either a noun or a verb; however, quickly is unambiguously an adverb and the is unambiguously a definite article, and these facts ensure an unambiguous analysis of the phrase structure, where prices is identified as a subject noun phrase, in the market as a prepositional phrase, and rose quickly as part of a verb phrase.
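The example can be made concrete with a small sketch: each word is given its possible categories, and the unambiguous words (the as article, quickly as adverb) are used to prune the alternatives of their neighbours. The category lists and the three pruning rules below are toy illustrations of the idea, not a real tagger.

LEXICON = {
    "prices":  {"noun", "verb"},
    "rose":    {"noun", "verb"},
    "quickly": {"adverb"},
    "in":      {"preposition"},
    "the":     {"article"},
    "market":  {"noun", "verb"},
}

def disambiguate(words):
    tags = [set(LEXICON[w]) for w in words]
    for i, w in enumerate(words):
        # Rule 1: a noun/verb word directly after an article must be a noun.
        if i > 0 and tags[i - 1] == {"article"}:
            tags[i] &= {"noun"}
        # Rule 2: a noun/verb word directly before an adverb must be a verb.
        if i + 1 < len(words) and tags[i + 1] == {"adverb"}:
            tags[i] &= {"verb"}
    # Rule 3: a sentence-initial noun/verb word followed by a verb is the subject noun.
    if tags[0] == {"noun", "verb"} and "verb" in tags[1]:
        tags[0] &= {"noun"}
    return list(zip(words, [",".join(sorted(t)) for t in tags]))

print(disambiguate("prices rose quickly in the market".split()))
# [('prices', 'noun'), ('rose', 'verb'), ('quickly', 'adverb'),
#  ('in', 'preposition'), ('the', 'article'), ('market', 'noun')]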

One example of structural ambiguity:

We can illustrate this with some examples. First, let us show how grammar rules, differently applied, can produce more than one syntactic analysis for a sentence. One way this can occur is where a word is assigned to more than one category in the grammar. For example, assume that the word cleaning is both an adjective and a verb in our grammar. This will allow us to assign two different analyses to the following sentence:

Cleaning fluids can be dangerous.

One of these analyses will have cleaning as a verb, and one will have it as an adjective. In the former (less plausible) case the sense is `to clean a fluid may be dangerous', i.e. it is about an activity being dangerous. In the latter case the sense is that fluids used for cleaning can be dangerous. Choosing between these alternative syntactic analyses requires knowledge about meaning.

It may be worth noting, in passing, that this ambiguity disappears when can is replaced by a verb which shows number agreement by having different forms for third person singular and plural. For example, the following are not ambiguous in this way:

Cleaning fluids is dangerous (this has only the sense that the action is dangerous).

Cleaning fluids are dangerous (this has only the sense that the fluids are dangerous).

We have seen that syntactic analysis is useful in ruling out some wrong analyses, and this is another such case, since, by checking for agreement of subject and verb, it is possible to find the correct interpretations. A system which ignored such syntactic facts would have to consider all these examples ambiguous, and would have to find some other way of working out which sense was intended, running the risk of making the wrong choice. For a system with proper syntactic analysis, this problem would arise only in the case of verbs like can which do not show number agreement.

http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node53.html#SECTION00820000000000000000
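The double analysis of the cleaning example can be reproduced with a toy grammar. The sketch below uses the nltk package (my own choice, not something the text mentions, so it assumes nltk is installed); assigning cleaning to two categories makes the parser return two trees, one for the adjective reading and one for the gerund ('to clean fluids') reading.

import nltk

grammar = nltk.CFG.fromstring("""
  S    -> NP VP
  NP   -> ADJ N | VG N
  VP   -> MOD BE ADJP
  ADJP -> 'dangerous'
  ADJ  -> 'cleaning'
  VG   -> 'cleaning'
  N    -> 'fluids'
  MOD  -> 'can'
  BE   -> 'be'
""")

parser = nltk.ChartParser(grammar)
# Prints two trees: one with 'cleaning' as ADJ, one with it as VG.
for tree in parser.parse("cleaning fluids can be dangerous".split()):
    print(tree)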

Three lexical and structural mismatches:

English chooses different verbs for the action/event of putting on, and the action/state of wearing. Japanese does not make this distinction, but differentiates according to the object that is worn. In the case of English to Japanese, a fairly simple test on the semantics of the NPs that accompany a verb may be sufficient to decide on the right translation. Some of the colour examples are similar, but more generally, investigation of colour vocabulary indicates that languages actually carve up the spectrum in rather different ways, and that deciding on the best translation may require knowledge that goes well beyond what is in the text, and may even be undecidable. In this sense, the translation of colour terminology begins to resemble the translation of terms for cultural artifacts (e.g. words like English cottage, Russian dacha, French château, etc. for which no adequate translation exists, and for which the human translator must decide between straight borrowing, neologism, and providing an explanation). In this area, translation is a genuinely creative act, which is well beyond the capacity of current computers.

A particularly obvious example of this involves problems arising from what are sometimes called lexical holes --- that is, cases where one language has to use a phrase to express what another language expresses in a single word. Examples of this include the `hole' that exists in English with respect to French ignorer (`to not know', `to be ignorant of'), and se suicider (`to suicide', i.e. `to commit suicide', `to kill oneself'). The problems raised by such lexical holes have a certain similarity to those raised by idioms: in both cases, one has phrases translating as single words. We will therefore postpone discussion of these until Section .

One kind of structural mismatch occurs where two languages use the same construction for different purposes, or use different constructions for what appears to be the same purpose.

http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node54.html#SECTION00830000000000000000

Three collocations:

Rather different from idioms are expressions like the examples below, which are usually referred to as collocations. Here the meaning can be guessed from the meanings of the parts. What is not predictable is the particular words that are used.

This butter is rancid (*sour, *rotten, *stale).

This cream is sour (*rancid, *rotten, *stale).

They took (*made) a walk.

They made (*took) an attempt.

They had (*made, *took) a talk.

For example, the fact that we say rancid butter, but not * sour butter, and sour cream, but not * rancid cream does not seem to be completely predictable from the meaning of butter or cream, and the various adjectives. Similarly the choice of take as the verb for walk is not simply a matter of the meaning of walk (for example, one can either make or take a journey).

In what we have called linguistic knowledge (LK) systems, at least, collocations can potentially be treated differently from idioms. This is because for collocations one can often think of one part of the expression as being dependent on, and predictable from, the other. For example, one may think that make, in make an attempt has little meaning of its own, and serves merely to `support' the noun (such verbs are often called light verbs, or support verbs). This suggests one can simply ignore the verb in translation, and have the generation or synthesis component supply the appropriate verb. For example, in Dutch, this would be doen, since the Dutch for make an attempt is een poging doen (`do an attempt').

http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node55.html#SECTION00840000000000000000
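The light-verb treatment just described can be sketched as follows: the transfer step keeps only the noun of make an attempt, and the target-language generation component supplies the conventional support verb for that noun. The tables are invented toy data; only the Dutch pair een poging doen comes from the text.

SUPPORT_VERB = {
    ("en", "attempt"): "make",
    ("en", "walk"):    "take",
    ("nl", "poging"):  "doen",      # een poging doen = 'do an attempt'
}

NOUN_TRANSFER = {("en", "nl", "attempt"): "poging"}

def translate_light_verb_phrase(noun, src="en", tgt="nl"):
    tgt_noun = NOUN_TRANSFER[(src, tgt, noun)]   # transfer: only the noun is translated
    verb = SUPPORT_VERB[(tgt, tgt_noun)]         # generation: the target verb is supplied here
    return f"een {tgt_noun} {verb}"

print(translate_light_verb_phrase("attempt"))    # een poging doen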

Two idiomatic expressions:

If Sam mends the bucket, her children will be rich.

If Sam kicks the bucket, her children will be rich.

The problem with idioms, in an MT context, is that it is not usually possible to translate them using the normal rules. There are exceptions, for example take the bull by the horns (meaning `face and tackle a difficulty without shirking') can be translated literally into French as prendre le taureau par les cornes, which has the same meaning. But, for the most part, the use of normal rules in order to translate idioms will result in nonsense. Instead, one has to treat idioms as single units in translation. In many cases, a natural translation for an idiom will be a single word --- for example, the French word mourir (`die') is a possible translation for kick the bucket.

http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node55.html#SECTION00840000000000000000
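Treating idioms as single units can also be sketched very simply: the system first tries to match the longest stored multiword entry, and only falls back to word-by-word translation when no idiom matches. The dictionaries below are toy data of my own; the English-French pairs (kick the bucket / mourir, take the bull by the horns / prendre le taureau par les cornes) follow the text's examples.

IDIOMS = {
    ("kick", "the", "bucket"): ["mourir"],
    ("take", "the", "bull", "by", "the", "horns"):
        ["prendre", "le", "taureau", "par", "les", "cornes"],
}

WORDS = {"mend": "réparer", "the": "le", "bucket": "seau"}

def translate(tokens):
    out, i = [], 0
    while i < len(tokens):
        for length in range(len(tokens) - i, 1, -1):     # try the longest idiom match first
            chunk = tuple(tokens[i:i + length])
            if chunk in IDIOMS:
                out += IDIOMS[chunk]
                i += length
                break
        else:                                            # no idiom matched: translate word by word
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)

print(translate("kick the bucket".split()))   # mourir (the idiom is translated as one unit)
print(translate("mend the bucket".split()))   # réparer le seau (ordinary word-by-word translation)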

30.Which are the most usual interpretations of the term "machine translation" (MT)?

The term machine translation (MT) is normally taken in its restricted and precise meaning of fully automatic translation. However, in this chapter we consider the whole range of tools that may support translation and document production in general, which is especially important when considering the integration of other language processing techniques and resources with MT. We therefore define Machine Translation to include any computer-based process that transforms (or helps a user to transform) written text from one human language into another. We define Fully Automated Machine Translation (FAMT) to be MT performed without the intervention of a human being during the process. Human-Assisted Machine Translation (HAMT) is the style of translation in which a computer system does most of the translation, appealing in case of difficulty to a (mono- or bilingual) human for help. Machine-Aided Translation (MAT) is the style of translation in which a human does most of the work but uses one or more computer systems, mainly as resources such as dictionaries and spelling checkers, as assistants.

Traditionally, two very different classes of MT have been identified. Assimilation refers to the class of translation in which an individual or organization wants to gather material written by others in a variety of languages and convert it all into his or her own language. Dissemination refers to the class in which an individual or organization wants to broadcast his or her own material, written in one language, in a variety of languages to the world. A third class of translation has also recently become evident. Communication refers to the class in which two or more individuals are in more or less immediate interaction, typically via email or otherwise online, with an MT system mediating between them. Each class of translation has very different features, is best supported by different underlying technology, and is to be evaluated according to somewhat different criteria.

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

31.What do FAHQT and ALPAC mean in the evolution of MT?

There were of course dissenters from the dominant 'perfectionism'. Researchers at Georgetown University and IBM were working towards the first operational systems, and they accepted the long-term limitations of MT in the production of usable translations. More influential was the well-known dissent of Bar-Hillel. In 1960, he published a survey of MT research at the time which was highly critical of the theory-based projects, particularly those investigating interlingua approaches, and which included his demonstration of the non-feasibility of fully automatic high quality translation (FAHQT) in principle. Instead, Bar-Hillel advocated the development of systems specifically designed on the basis of what he called 'man-machine symbiosis', a view which he had first proposed nearly ten years before when MT was still in its infancy (Bar-Hillel 1951).

Nevertheless, the main thrust of research was based on the explicit or implicit assumption that the aim of MT must be fully automatic systems producing translations at least as good as those made by human translators. The current operational systems were regarded as temporary solutions to be superseded in the near future. There was virtually no serious consideration of how 'less than perfect' MT could be used effectively and economically in practice. Even more damaging was the almost total neglect of the expertise of professional translators, who naturally became anxious and antagonistic. They foresaw the loss of their jobs, since this is what many MT researchers themselves believed was inevitable.

In these circumstances it is not surprising that the Automatic Language Processing Advisory Committee (ALPAC) set up by the US sponsors of research found that MT had failed by its own criteria, since by the mid 1960s there were clearly no fully automatic systems capable of good quality translation and there was little prospect of such systems in the near future. MT research had not looked at the economic use of existing 'less than perfect' systems, and it had disregarded the needs of translators for computer-based aids.

While the ALPAC report brought to an end many MT projects, it did not banish the public perception of MT research as essentially the search for fully automatic solutions. The subsequent history of MT is in part the story of how this mistaken emphasis of the early years has had to be repaired and corrected. The neglect of the translation profession has eventually been made good by the provision of translation tools and translator workstations. MT research has turned increasingly to the development of realistic practical MT systems where the necessity for human involvement at different stages of the process is fully accepted as an integral component of their design architecture. And 'pure' MT research has by and large recognised its role within the broader contexts of commercial and industrial realities.

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

Initial attempts to achieve Fully Automatic High Quality Translation (FAHQT) were doomed to failure; it was thus an error of ALPAC and other commentaries of the period to judge progress on MT by such an impossible standard. In later generations of MT it came to be accepted that machine translations not meeting the goal of FAHQT can also be of value, and that the demands made of MT systems should take into account both what is possible with current technology and what is necessary for particular applications.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm#a

32.List some of the major methods, techniques and approaches

The main areas are: tools for translators, practical machine translation, and research methods for machine translation.

Before the nineties, three main approaches to Machine Translation were developed: the so-called direct, transfer and interlingua approaches. Direct and transfer-based systems must be implemented separately for each language pair in each direction, while the interlingua-based approach is oriented to translation between any two of a group of languages for which it has been implemented. The implications of this fundamental difference, as well as other features of each type of system, are discussed in this and the following sections. The more recent corpus-based approach is considered later in this section.

The direct approach, chronologically the first to appear, is technically also the least sophisticated, although this does not mean, within the limits of present translation technology, that it necessarily produces inferior results. In its purest form, a system of the direct type "translates as it goes along", on a word-by-word basis. Nonetheless, the systems of this type now in general use incorporate many sophisticated features that improve their efficiency, often matching systems of more recent design in the quality and robustness of the translations they produce. Well known and widely used systems are based on such an approach, but greatly enhanced, mainly by including information in the lexicon incorporating rules for disambiguation and bilingual syntax rules.

More recently developed approaches to MT divide the translation process into discrete stages, including an initial stage of analysis of the structure of a sentence in the source language, and a corresponding final stage of generation of a sentence from a structure in the target language. Neither analysis nor generation are translation as such. The analysis stage involves interpreting sentences in the source language, arriving at a structural representation which may incorporate morphological, syntactic and lexical coding, by applying information stored in the MT system as grammatical rules and dictionaries. The generation stage performs approximately the same functions in reverse, converting structural representations into sentences, again applying information embodied in rules and dictionaries.
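The staged design described above can be summarized as a pipeline of three functions: analysis maps a source sentence to a structural representation, transfer maps that structure to a target-language structure, and generation produces the target sentence. The sketch below is deliberately crude (hand-coded lemma and category pairs, one hard-wired transfer rule following the text's He likes to swim / Er schwimmt gern example); real systems use far richer representations.

def analyse(sentence):
    # Toy "analysis": lemma and category for each word (hand-coded for this demo only).
    toy = {"he": ("he", "PRON"), "likes": ("like", "VERB"), "swimming": ("swim", "GERUND")}
    return [toy[w] for w in sentence.lower().split()]

def transfer(structure):
    # Toy structural transfer: English "like + gerund" -> German "gern" adverb pattern.
    lemmas = [lemma for lemma, _ in structure]
    if "like" in lemmas:
        subject = structure[0][0]
        verb = [l for l, cat in structure if cat == "GERUND"][0]
        return [(subject, "PRON"), (verb, "VERB"), ("gern", "ADV")]
    return structure

def generate(structure):
    # Toy generation: German surface forms in the transferred order.
    surface = {"he": "er", "swim": "schwimmt", "gern": "gern"}
    return " ".join(surface[lemma] for lemma, _ in structure)

print(generate(transfer(analyse("He likes swimming"))))   # er schwimmt gern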

None of the approaches described is trouble-free, and they all have trouble with much the same set of recalcitrant problems, in particular lexical and grammatical ambiguities in the source language.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm#a

33.Where was MT ten years ago?

Ten years ago, the typical users of machine translation were large organizations such as the European Commission, the US Government, the Pan American Health Organization, Xerox, Fujitsu, etc. Fewer small companies or freelance translators used MT, although translation tools such as online dictionaries were becoming more popular. However, ongoing commercial successes in Europe, Asia, and North America continued to illustrate that, despite imperfect levels of achievement, the levels of quality being produced by FAMT and HAMT systems did address some users’ real needs. Systems were being produced and sold by companies such as Fujitsu, NEC, Hitachi, and others in Japan, Siemens and others in Europe, and Systran, Globalink, and Logos in North America (not to mention the unprecedented growth of cheap, rather simple MT assistant tools such as PowerTranslator).

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

34.New directions and foreseeable breakthroughs of MT in the short term.

Several applications have proven to be able to work effectively using only subsets of the knowledge required for MT. It is possible now to evaluate different tasks, to measure the information involved in solving them, and to identify the most efficient techniques for a given task. Thus, we must face the decomposition of monolithic systems, and start talking about hybridization, engineering, architectural changes, shared modules, etc. It is important when identifying tasks to evaluate linguistic information in terms of what is generalizable, and thus a good candidate for traditional parsing techniques (argument structure of a transitive verb in active voice?), and what is idiosyncratic (what about collocations?). Besides, one cannot discard the power of efficient techniques that yield better results than older approaches, as illustrated clearly by part of speech disambiguation, which has proved to be better solved using Hidden Markov Models than traditional parsers. On the other hand, it has been proven that good theoretically motivated and linguistically driven tagging label sets improve the accuracy of statistical systems. Hence we must be ready to separate the knowledge we want to represent from the techniques/formalisms that have to process it.

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html
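To show what is meant above by solving part-of-speech disambiguation with Hidden Markov Models, I have sketched a tiny HMM tagger decoded with the Viterbi algorithm. The probabilities are invented toy numbers, not estimates from any real corpus; the point is only to show how the tag choice for an ambiguous word like rose can be reduced to finding the most probable tag sequence.

TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.7, "VERB": 0.3}                        # P(tag at sentence start)
TRANS = {("NOUN", "VERB"): 0.6, ("NOUN", "NOUN"): 0.4,    # P(next tag | current tag)
         ("VERB", "NOUN"): 0.7, ("VERB", "VERB"): 0.3}
EMIT = {("NOUN", "prices"): 0.05, ("VERB", "prices"): 0.01,   # P(word | tag)
        ("NOUN", "rose"):   0.02, ("VERB", "rose"):   0.03}

def viterbi(words):
    # best[i][t] = (probability, previous tag) of the best path ending in tag t at word i
    best = [{t: (START[t] * EMIT.get((t, words[0]), 1e-6), None) for t in TAGS}]
    for w in words[1:]:
        best.append({
            t: max((best[-1][p][0] * TRANS[(p, t)] * EMIT.get((t, w), 1e-6), p)
                   for p in TAGS)
            for t in TAGS
        })
    # Backtrack from the most probable final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for step in range(len(words) - 1, 0, -1):
        tag = best[step][tag][1]
        path.append(tag)
    return list(zip(words, reversed(path)))

print(viterbi(["prices", "rose"]))   # [('prices', 'NOUN'), ('rose', 'VERB')]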

35.Which are Internet's essential features?

The transfer approach, which characterizes the more sophisticated MT systems now in use, may be seen as a compromise between the direct and interlingua approaches, attempting to avoid the most extreme pitfalls of each. Although no attempt is made to arrive at a completely language-neutral interlingua representation, the system nevertheless performs an analysis of input sentences, and the sentences it outputs are obtained by generation. Analysis and generation are however shallower than in the interlingua approach, and in between analysis and generation, there is a transfer component, which converts structures in one language into structures in the other and carries out lexical substitution. The object of analysis here is to represent sentences in a way that will facilitate and anticipate the subsequent transfer to structures corresponding to the target language sentences.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm

36.What is the role of minority languages on the Internet (Catalan, Basque...)?

Every language stands on the Internet within a planetary space and face to face with all the other languages there present. Minority languages which have survived as enclaves within nation-states now have to perceive themselves, like all other languages, as standing at a cultural cross-roads, open to multilateral relationships and exchanges.

Some of the minority languages of the EU exist only in minority situations, whether minoritarian in one member-state only, as with Sorbian or Welsh, or minoritarian in two or more member-states, as in the cases of Catalan and Basque. But there are also transfrontier minority languages, where although the language is minoritarian on one side of the border, it also belongs to a large and sometimes powerful language-group possessing its own nation-state on the other side of the border (or further afield), as in the case of the German minorities in Belgium, Denmark or Italy.

We shall have in mind those minority languages which exist only in minority situations, or very small state languages, or languages which fall into each of those categories on two sides of a border. We do not have to worry about the availability of word-processors and Internet browsers, or the creation of linguistic corpora for German-speaking minorities outside Germany. These exist within the language. But when we come to consider uses of the Internet for communication within and between minority language-groups, German-speaking minorities will certainly find themselves in the same set of regional and minority languages as Frisian or Scottish Gaelic.

Once the hardware and communications infrastructure is in place, the Internet in its present form has many advantages for minority language communities as indeed for all small communities.

The uses to which minority language groups put the Internet may seem at first to be the same as many we find in majority languages, but the significance is often different. Any presentation of the minority language and culture to a world-wide audience by definition breaks new ground since minority language groups, whatever access they may have had to broadcasting within the nation-state, have scarcely ever had the political or economic strength to project themselves outside the state in which they live, or indeed, in many cases, to their fellow citizens in other parts of the same state.

Another ambitious and comprehensive electronic newspaper is the Catalan Vilaweb, founded in 1995 by Vicent Partal and Assumpció Maresme, both experienced journalists. It is an electronic newspaper with a network of local editions which appear in towns and villages throughout the Catalan lands but also in diaspora areas such as Boston and New York, creating a kind of "virtual nation". The site also incorporates a directory of electronic resources in the Catalan language and reaches 90,000 different readers each month. This critical mass of users attracts some international web advertising, and local editions collect their own local advertising. Indeed the organizational and financial arrangements of Vilaweb are every bit as interesting as the technical ones and could be of interest in other minority languages. A similar network exists in Galicia.

There are many courses teaching minority languages on the Internet. The most ambitious is likely to be HABENET, a three-year project for teaching Basque on the Internet and costing some 1.8m euros. Internet courses in minority languages have new possibilities but also face new challenges. Most face-to-face courses and course materials for learning minority languages assume a knowledge of the local majority language and this is undoubtedly where the main demand will be, on and off the Internet. But it seems to us that there would also be room to develop a multi-media language-learning package that was language-independent or language-adaptable so far as the language of instruction went. Such a course would make each language approachable from any other language at least at an elementary level.

Because of the incorporation of complex components into new versions of the programmes, updates to the localization become more expensive rather than cheaper as time goes on. Moreover, by the time a programme has been localized in small languages such as Basque - which are allocated low priority within Microsoft - new versions of the original are already becoming available in English and some other languages which offer a large market. Finally, what might be thought a major advantage of cooperation with an international company, namely access to its marketing skills and distribution network, does not apply. The Basque version was not important enough to Microsoft for them to be interested in promoting it themselves.

The Basque Government has now looked at information technology needs for the next ten years. Localization is only one kind of action contemplated, and on the whole the assessment of costs and benefits seems to favour other priorities: the development of spelling and grammar checkers, of OCR tools specific to Basque voice recognition software, also support for making Basque dictionaries and reference works available for on-line public use. A five year plan starting this year (2000) is likely to support local companies working in some of these fields. There is also an interest in developing tools for the automatic translation of web-pages.

The Catalan Autonomous Government too entered into an agreement with Microsoft and has appointed a committee of experts to ensure that a strategy is in place so that electronic resources are created in the Catalan language. But the experience from Catalunya we want to look at here is entirely within the non-commercial and voluntary sector.

Marketing is a problem, but, as we have seen, the same was the case with Microsoft software localized into Basque and Catalan. However, given that governmental or voluntary organizations have to do the marketing in each case, there must be some advantage in marketing a free product. The Basque Microsoft programmes, despite the heavy element of subsidy, have had to be purchased by individuals and institutions, including the Basque Government itself.

http://sirio.deusto.es/abaitua/konzeptu/ta/part1_en.htm

37.In what ways can Machine Translation be applied on the Internet?

The Internet today is less a homogeneous environment than a macro-environment: a range of only loosely related ways of using a global electronic network for a variety of purposes. There is therefore no reason to expect diverse applications of the Internet to share a single set of conditions relevant to the applicability of Machine Translation (MT), and so to be susceptible to identical solutions. Furthermore, as the Internet evolves, new specialized uses will emerge which may determine the potential roles of MT in the future. Thus, any realistic assessment of the contribution of MT to the Internet must be a complex one.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm#a

CONCLUSION

I think that New Technologies are essential for our present daily work and that we will need them even more in the future. This essay deals with some quite important information taken from some internet pages. The method that I have followed to do this report is of the Question - Answer kind, that is, a questionnaire with some of the most important topics about Human Languages and Technologies. Some of the themes that I have developed are:

Language Technologies and the Information Society, Language Technologies and Resources, Multilinguality, Translation Technology, Machine Translation and its history, methods, approaches, problems, etc., multilingual resources, and Corpus Linguistics. These are themes of great importance for New Technologies.

While I have been developing this information, I have realized that these topics are quite important, both for a student and for any kind of professional or ordinary person interested in Human Languages and New Technologies. If a person wants to know more about these themes or simply to work with them, I think the information that I have developed is very interesting and, in a way, necessary to understand and to inform yourself about New Technologies.

In my opinion, one of the most developed themes is the one about Machine Translation. I think that this is a very interesting theme to know about and to work with. But although I like this theme, I could also find a bad point in it. Although New Technologies help very much with the translation of texts, novels, poems and so on, human beings are better gifted for this kind of work, because we can see and study things from more than one perspective, and this is necessary to do a good job. As far as I am concerned, this is one of the shortcomings of Machine Translation.

About the other themes, I also think that they are quite interesting, but the bad point with this kind of information, I think, is that they sometimes contain very technical words, and this can at times be boring for a person who is starting out or even working in this field.

REFERENCES

1. http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm#a

2. http://sirio.deusto.es/abaitua/konzeptu/ta/part1_en.htm

3. http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

4. http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

5. http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node55.html#SECTION00840000000000000000

6. http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node54.html#SECTION00830000000000000000

7. http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm

8. http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node53.html#SECTION00820000000000000000

9. http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

10. http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu