Feb. 17-21

What is the "Information Society"?

Information Society is a term for a society in which the creation, distribution, and manipulation of information has become the most significant economic and cultural activity. An Information Society may be contrasted with societies in which the economic underpinning is primarily Industrial or Agrarian. The machine tools of the Information Society are computers and telecommunications, rather than lathes or ploughs.

http://whatis.techtarget.com/definition/0,,sid9_gci213588,00.html

 

What is the role of HLTCentral.org?

HLTCentral ("Gateway to Speech & Language Technology Opportunities on the Web") was established as an online information resource on human language technologies and related topics of interest to the HLT community at large. It covers news, R&D, technological and business developments in the field of speech, language, multilinguality, automatic translation, localisation and related areas. Its coverage of HLT news and developments is world-wide, with a unique European perspective.

http://www.hltcentral.org/page-615.0.shtml

 

Why are language technologies so important for the Information Society?

Language Engineering and the Information Society

The Information Age

The development and convergence of computer and telecommunication technologies has led to a revolution in the way that we work, communicate with each other, buy goods and use services, and even the way we entertain and educate ourselves. One of the results of this revolution is that large volumes of information will increasingly be held in a form which is more natural for human users than the strictly formatted, structured data typical of computer systems of the past. Information presented in visual images, as sound, and in natural language, either as text or speech, will become the norm.

We all deal with computer systems and services, either directly or indirectly, every day of our lives. This is the information age and we are a society in which information is vital to economic, social, and political success as well as to our quality of life. The changes of the last two decades may have seemed revolutionary but, in reality, we are only on the threshold of this new age. There are still many new ways in which the application of telematics and the use of language technology will benefit our way of life, from interactive entertainment to lifelong learning.

Although these changes will bring great benefits, it is important that we anticipate difficulties which may arise, and develop ways to overcome them. Examples of such problems are: access to much of the information may be available only to the computer literate and those who understand English; a surfeit of information from which it is impossible to identify and select what is really wanted. Language Engineering can solve these problems.

Information universally available

The language technologies will make an indispensable contribution to the success of this information revolution. The availability and usability of new telematics services will depend on developments in language engineering. Speech recognition will become a standard computer function providing us with the facility to talk to a range of devices, from our cars to our home computers, and to do so in our native language. In turn, these devices will present us with information, at least in part, by generating speech.

Multi-lingual services will also be developed in many areas. In time, material provided by information services will be generated automatically in different languages. This will increase the availability of information to the general public throughout Europe. Initially, multi-lingual services will become available, based on basic data, such as weather forecasts and details of job vacancies, from which text can be generated in any language. Eventually, however, we can expect to see automated translation as an everyday part of information services so that we can both request and receive all sorts of information in our own language.

Home and Abroad

Language Engineering will also help in the way that we deal with associates abroad. Although the development of electronic commerce depends very much on the adoption of interchange standards for communications and business transactions, the use of natural language will continue, precisely because it is natural. However, systems to generate business letters and other forms of communication in foreign languages will ease and greatly enhance communication. Automated translation combined with the management of documentation, including technical manuals and user handbooks, will help to improve the quality of service in a global marketplace. Export business will be handled cost effectively with the same high level of customer care that is provided in the home market.

How can we cope with so much information?

One of the fundamental components of Language Engineering is the understanding of language by the computer. This is the basis of speech operated control systems and of translation, for example. It is also the way in which we can prevent ourselves from being overwhelmed with information, unable to collate, analyse, and select what we need. However, if information services are capable of understanding our requests, and can scan and select from the information base with real understanding, not only will the problem of information overload be solved but also no significant information will be missed. Language Engineering will deliver the right information at the right time.

http://sirio.deusto.es/abaitua/konzeptu/nlp/echo/infoage.html
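
As a toy illustration of the idea above of generating multilingual text automatically from basic data such as weather forecasts, here is a minimal sketch; the templates, field names and values are invented and do not come from any real service:

```python
# Generate a short bulletin in several languages from the same data record.
# Templates, field names and values are invented for this sketch.
FORECAST_TEMPLATES = {
    "en": "Tomorrow in {city}: {sky_en}, high of {high} degrees.",
    "es": "Mañana en {city}: {sky_es}, máxima de {high} grados.",
}

def generate_bulletin(data, lang):
    """Fill the template for the requested language with the forecast data."""
    return FORECAST_TEMPLATES[lang].format(**data)

forecast = {"city": "Bilbao", "sky_en": "cloudy", "sky_es": "nublado", "high": 17}
for lang in ("en", "es"):
    print(generate_bulletin(forecast, lang))
```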

 

 

Feb. 24-28

Why "knowledge" is of more value than "information"?

What is the Difference Between Information Management and Knowledge Management?

Information management is the harnessing of the information resources and information capabilities of the organization in order to add and create value both for itself and for its clients or customers. Knowledge management is a framework for designing an organization's goals, structures, and processes so that the organization can use what it knows to learn and to create value for its customers and community. A KM framework involves designing and working with the following elements: categories of organizational knowledge (tacit knowledge, explicit knowledge, cultural knowledge); knowledge processes (knowledge creation, knowledge sharing, knowledge utilization); and organizational enablers (vision and strategy, roles and skills, policies and processes, tools and platforms). IM provides the foundation for KM, but the two are focused differently. IM is concerned with processing and adding value to information, and the basic issues here include access, control, coordination, timeliness, accuracy, and usability. KM is concerned with using the knowledge to take action, and the basic issues here include codification, diffusion, practice, learning, innovation, and community building.

http://choo.fis.utoronto.ca/IMfaq/

 

Does the possession of large quantities of data imply that we are well informed?

Like most bureaucrats, business executives, teachers, doctors, lawyers and other professionals, Guilford increasingly feels he is suffering from information overload. The symptoms of this epidemic ailment can include tension, occasional irritability and frequent feelings of helplessness, all signs that the victim is under considerable stress. David Lewis coined the term "information fatigue syndrome" for what he expects will soon be a recognized medical condition.

"Having too much information can be as dangerous as having too little. Among other problems, it can lead to a paralysis of analysis, making it far harder to find the right solutions or make the best decisions." "Information is supposed to speed the flow of commerce, but it often just clogs the pipes." "Information stress sets in when people in possession of a huge volume of data have to work against the clock, when major consequences (lives saved or lost, money made or lost) will flow from their decision, or when they feel at a disadvantage because even with their wealth of material they still think they do not have all the facts they need. So challenged, the human body reacts with a primitive survival response. This evolved millions of years ago to safeguard us when confronted by physical danger. In situations where the only options are to kill an adversary or flee from it, the 'fight-flight' response can make the difference between life and death." (D. Lewis)

Strategies for dealing with information: "Just because you can't cope with a lot of information doesn't make you a bad manager. Organisations are getting by with fewer people doing more, and aren't necessarily giving people time to devise strategies for dealing with information." (R. Sachs)

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm#knowledge

 

How many words of technical information are recorded every day?

Every day, approximately 20 million words of technical information are recorded. A reader capable of reading 1000 words per minute would require 1.5 months, reading eight hours every day, to get through one day's output, and at the end of that period he would have fallen 5.5 years behind in his reading.

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm
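
A rough check of the arithmetic quoted above (the quoted figures are rounded, so this sketch agrees with them in order of magnitude rather than exactly):

```python
# Rough check of the quoted figures: 20 million words/day at 1000 words/min,
# reading 8 hours a day.
WORDS_PER_DAY = 20_000_000
WORDS_PER_MINUTE = 1_000
READING_HOURS_PER_DAY = 8

minutes_per_day_of_output = WORDS_PER_DAY / WORDS_PER_MINUTE          # 20,000 min
days_to_read_one_day = minutes_per_day_of_output / 60 / READING_HOURS_PER_DAY

# While catching up on one day's output, the same number of days of new
# output accumulate, each of which again takes that long to read.
backlog_days = days_to_read_one_day ** 2

print(f"one day's output takes ~{days_to_read_one_day:.0f} days "
      f"(~{days_to_read_one_day / 30:.1f} months) to read")
print(f"backlog after catching up: ~{backlog_days / 365:.1f} years")
```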

 

What is the most convenient way of representing information? Why?

Parsing systems that use unification generally fall into two broad (and rather crude) categories. Computational grammars tend to run efficiently, but make it difficult to express linguistic information easily. Linguistic grammars are good at expressing linguistic information, but tend to run slowly. Previous implementations of LFG have usually been interpreters that are typically inefficient, even when implemented in the form of a chart parser, which is recognised as having good efficiency. The present work starts with LFG grammars and lexicons, written in a style that is very recognisably LFG. Grammars are treated as rules and lexicons as facts, which are compiled into a Prolog form. In particular, this involves using Prolog's term unification rather than the more usual linguistic unification. Previous systems have implemented linguistic unification on top of Prolog's term unification, with possible speed disadvantages and difficulties in ensuring the correctness of the new unification algorithm. Thus the work has the dual aims of allowing linguistic information to be encoded in a linguistically sophisticated way, while preserving the speed and accuracy of computational grammars.

http://www.cs.bham.ac.uk/research/booklet_97/arr/projects/node8.html
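
The passage above contrasts Prolog term unification with the "linguistic unification" usually built on top of it. Purely as an illustration of the latter idea (not of the LFG compiler described in the project), feature structures can be modelled as nested dictionaries which unify by merging, failing whenever atomic values clash:

```python
# Feature structures as nested dicts; unification merges them and fails on
# conflicting atomic values. An illustration of "linguistic unification",
# not of the Prolog term-unification approach described in the project.
def unify(fs1, fs2):
    """Return the merged feature structure, or None if the two conflict."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            merged = unify(result[feature], value)
            if merged is None:
                return None
            result[feature] = merged
        elif result[feature] != value:
            return None        # atomic values clash: unification fails
    return result

# A third-person-singular verb unifies with a singular subject...
verb_agr = {"num": "sg", "per": 3}
print(unify(verb_agr, {"num": "sg"}))   # {'num': 'sg', 'per': 3}
# ...but not with a plural one.
print(unify(verb_agr, {"num": "pl"}))   # None
```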

 

How can computer science and language technologies help manage information?

New opportunities are becoming available to change the way we do many things, to make them easier and more effective by exploiting our developing knowledge of language.

When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a variety of languages, we shall all have easier access to the benefits of a wide range of information and communications services, as well as the facility to carry out business transactions remotely, over the telephone or other telematics services.

When a machine understands human language, translates between different languages, and generates speech as well as printed output, we shall have available an enormously powerful tool to help us in many areas of our lives.

When a machine can help us quickly to understand each other better, this will enable us to co-operate and collaborate more effectively both in business and in government.

The success of Language Engineering will be the achievement of all these possibilities. Already some of these things can be done, although they need to be developed further. The pace of advance is accelerating and we shall see many achievements over the next few years.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lt

 

Why can language sometimes be seen as a barrier to communication? How can this change?

The use of language is currently restricted. In the main, it is only used in direct communications between human beings and not in our interactions with the systems, services and appliances which we use every day of our lives. Even between humans, understanding is usually limited to those groups who share a common language. In this respect language can sometimes be seen as much a barrier to communication as an aid.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lt

 

May. 3-7

In what ways does Language Engineering improve the use of language?

Language Engineering is a technology which uses our knowledge of language to enhance our application of computer systems: improving the way we interface with them; assimilating, analysing, selecting, using, and presenting information more effectively; and providing human language generation and translation facilities.

New opportunities are becoming available to change the way we do many things, to make them easier and more effective by exploiting our developing knowledge of language. When, in addition to accepting typed input, a machine can recognise written natural language and speech, in a variety of languages, we shall all have easier access to the benefits of a wide range of information and communications services, as well as the facility to carry out business transactions remotely, over the telephone or other telematics services. When a machine understands human language, translates between different languages, and generates speech as well as printed output, we shall have available an enormously powerful tool to help us in many areas of our lives. When a machine can help us quickly to understand each other better, this will enable us to co-operate and collaborate more effectively both in business and in government.

The success of Language Engineering will be the achievement of all these possibilities. Already some of these things can be done, although they need to be developed further. The pace of advance is accelerating and we shall see many achievements over the next few years.

http://sirio.deusto.es/abaitua/konzeptu/nlp/langeng.htm

 

Language Technology, Language Engineering and Computational Linguistics. Similarities and differences.

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

Theoretical CL takes up issues in theoretical linguistics and cognitive science. It deals with formal theories about the linguistic knowledge that a human needs for generating and understanding language. Today these theories have reached a degree of complexity that can only be managed by employing computers. Computational linguists develop formal models simulating aspects of the human language faculty and implement them as computer programmes. These programmes constitute the basis for the evaluation and further development of the theories. In addition to linguistic theories, findings from cognitive psychology play a major role in simulating linguistic competence. Within psychology, it is mainly the area of psycholinguistics that examines the cognitive processes constituting human language use. The relevance of computational modelling for psycholinguistic research is reflected in the emergence of a new subdiscipline: computational psycholinguistics.

Applied CL focuses on the practical outcome of modelling human language use. The methods, techniques, tools and applications in this area are often subsumed under the term language engineering or (human) language technology. Although existing CL systems are far from achieving human ability, they have numerous possible applications. The goal is to create software products that have some knowledge of human language. Such products are going to change our lives. They are urgently needed for improving human-machine interaction, since the main obstacle in the interaction between human and computer is a communication problem. Today's computers do not understand our language, but computer languages are difficult to learn and do not correspond to the structure of human thought. Even if the language the machine understands and its domain of discourse are very restricted, the use of human language can increase the acceptance of software and the productivity of its users.

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_what_cl.htm

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#wile

Language Technologies are information technologies that are specialised for dealing with the most complex information medium in our world: human language. Therefore these technologies are also often subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Whereas speech is the oldest and most natural mode of language communication, complex information and most human knowledge is maintained and transmitted in written texts. Speech and text technologies process or produce language in these two modes of realization. But language also has aspects that are shared between speech and text, such as dictionaries, most of the grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology has to create formal representation systems that link language to concepts and tasks in the real world. This provides the interface to the fast-growing area of knowledge technologies.

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

 

Which are the main techniques used in Language Engineering?

There are many techniques used in Language Engineering and some of these are described below.

Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying reliably despite temporary changes (such as caused by illness).

Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose. There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.

Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition: recognition of printed images, referred to as Optical Character Recognition (OCR), and recognising handwriting, usually known as Intelligent Character Recognition (ICR). OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences. Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.

Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels. Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge. Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information. Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.

Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.

Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response. Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules. Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t
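
As an illustration of the shallow (partial) analysis described under Natural Language Understanding above, here is a minimal noun-phrase chunker over part-of-speech tags; the tag set, the single chunking pattern and the example sentence are all invented, and a real shallow parser would be far more elaborate:

```python
# Minimal noun-phrase chunker over POS tags: a crude stand-in for the
# "shallow or partial analysis" described above. Tags, pattern and the
# example sentence are invented for this sketch.
import re

tagged = [("the", "DET"), ("abrasive", "ADJ"), ("cleaner", "N"),
          ("damaged", "V"), ("the", "DET"), ("printer", "N"),
          ("casing", "N")]

# An NP here is: optional determiner, any number of adjectives, one or more nouns.
tag_string = " ".join(tag for _, tag in tagged)
np_pattern = re.compile(r"(DET )?(ADJ )*(N ?)+")

chunks = []
for match in np_pattern.finditer(tag_string):
    start = len(tag_string[:match.start()].split())   # token index of match start
    end = start + len(match.group().split())          # token index just past match
    chunks.append(" ".join(word for word, _ in tagged[start:end]))

print(chunks)   # ['the abrasive cleaner', 'the printer casing']
```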

Which language resources are essential components of Language Engineering?

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding. The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA).

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t

Check for the following terms: natural language processing, translator's workbench, shallow parser, formalism, speech recognition, text alignment, authoring tools, controlled language, domain.

natural language processing: a term in use since the 1980s to define a class of software systems which handle text intelligently.

translator's workbench: a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc.

shallow parser: software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective.

formalism: a means to represent the rules used in the establishment of a model of linguistic knowledge.

speech recognition: the sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words; statistical models of phonemes and words are used to recognise discrete or continuous speech input (see the fuller description under Techniques above).

text alignment: the process of aligning different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions.

authoring tools: facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents.

controlled language (also artificial language): language which has been designed to restrict the number of words and the structure of the language used, in order to make language processing easier; typical users of controlled language work in an area where precision of language and speed of response is critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.

domain: usually applied to the area of application of the language-enabled software, e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted, so the language resource requirements are effectively limited by limiting the domain of application.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t
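
As a small illustration of the text alignment entry above, here is a deliberately simplified sketch that pairs sentences one-to-one and scores each pair by length similarity, the intuition behind length-based aligners such as Gale and Church's; real aligners also handle one-to-two and two-to-one pairings, which this sketch does not, and the example sentences are invented:

```python
# Simplified length-based alignment: pair sentences one-to-one and score each
# pair by how similar their lengths are (the intuition behind Gale & Church
# style aligners). Example sentences are invented.
def align(source_sents, target_sents):
    pairs = []
    for src, tgt in zip(source_sents, target_sents):
        ratio = len(tgt) / max(len(src), 1)   # near 1.0 suggests a good pairing
        pairs.append((src, tgt, round(ratio, 2)))
    return pairs

en = ["Switch the printer off.", "Remove the paper tray."]
fr = ["Éteignez l'imprimante.", "Retirez le bac à papier."]
for src, tgt, ratio in align(en, fr):
    print(f"{ratio:>5}  {src}  <->  {tgt}")
```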

May. 10-14

In the translation curricula, which factors make technology more indispensable?

When discussing the relevance of technological training in the translation curricula, it is important to clarify the factors that make technology more indispensable and show how the training should be tuned accordingly. The relevance of technology will depend on the medium that contains the text to be translated. This particular aspect is becoming increasingly evident with the rise of the localization industry, which deals solely with information in digital form. There may be no other imaginable means for approaching the translation of such things as on-line manuals in software packages or CD-ROMs with technical documentation than computational ones.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Do professional interpreters and literary translators need translation technology? Which are the tools they need for their job?

With the exception of a few eccentrics or maniacs, it will be rare in the future to see good professional interpreters and literary translators not using more or less sophisticated and specialised tools for their jobs, comparable to the familiarisation with tape recorders or typewriters in the past. In any case, this may be something best left to the professional to decide, and may not be indispensable. It is clear that word processors, on-line dictionaries and all sorts of background documentation, such as concordances or collated texts, besides e-mail or other ways of network interaction with colleagues around the world, may substantially help the literary translator's work.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

In what ways is documentation becoming electronic? How does this affect the industry?

Information of many types is rapidly changing format and going digital. Electronic documentation is the adequate realm for the incorporation of translation technology. This is something that young students of translation must learn. As the conception and design of technical documentation becomes progressively influenced by the electronic medium, it is integrating more and more with the whole concept of a software product. The strategies and means for translating both software packages and electronic documents are becoming very similar and both are now, as we will see, the goal of the localisation industry.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

What is the focus of the localisation industry? Do you believe there might be a job for you in that industry sector?

The main focus of the localisation industry is to help software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing, and Web-based information in different languages for simultaneous world-wide release. Yes, I believe so, because the capacity for translation is very important to this sector.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Define internationalisation, globalisation and localisation. How do they affect the design of software products?

Globalisation: the adaptation of marketing strategies to regional requirements of all kinds (e.g., cultural, legal, and linguistic).
Internationalisation: the engineering of a product (usually software) to enable efficient adaptation of the product to local requirements.
Localisation: the adaptation of a product to a target language and culture (locale).

The main goal of the LEIT initiative is to introduce localisation courseware into translation studies, with versions ready for the start of the 1999 academic year. However, this must be done with care. Bert Esselink (1998), from AlpNet, for example, argues against separating localisation from other disciplines and claims its basic principles should be covered in all areas of translation training. Furthermore, it would be useful to add that trainers not only need constant feedback and guidance from the commercial sector, they also need to maintain close contact with the software industry. So, perhaps, one of the best features of the LEIT initiative is its combination of partnership from the academic world as well as from industry. LISA offers the first version of this courseware on its Web site and users have the possibility to contact the LEIT group and collaborate through an on-line questionnaire.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm
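
To make the internationalisation/localisation distinction above concrete, here is a minimal, hypothetical sketch: the program code stays language-neutral (internationalisation), while localisation only adds or edits entries in per-locale resource tables; the locales, keys and strings are invented for the example.

```python
# Internationalised code keeps no hard-coded user-visible text: messages live
# in per-locale resource tables, so localisation only touches the tables.
# Locales, keys and strings below are invented for the example.
RESOURCES = {
    "en-GB": {"greeting": "Welcome, {name}!", "quit": "Quit"},
    "es-ES": {"greeting": "¡Bienvenido, {name}!", "quit": "Salir"},
}

def t(locale, key, **values):
    """Look up a message for the locale and fill in its placeholders."""
    return RESOURCES[locale][key].format(**values)

print(t("en-GB", "greeting", name="Ane"))
print(t("es-ES", "greeting", name="Ane"))
```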

Are translation and localisation the same thing? Explain the differences.

In the localisation industry, the utilisation of technology is congenital, and developing adequate tools has immediate economic benefits.

The above lines depict a view of a translation environment which is closer to more traditional needs of the translator than to current requirements of the industry. Many aspects of software localisation have not been considered, particularly the concepts of multilingual management and document-life monitoring. Corporations are now realising that documentation is an integral part of the production line where the distinction between product, marketing and technical material is becoming more and more blurred. Product documentation is gaining importance in the whole process of product development with direct impact on time-to-market. Software engineering techniques that apply in other phases of software development are beginning to apply to document production as well. The appraisal of national and international standards of various types is also significant: text and character coding standards (e.g. SGML/XML and Unicode), as well as translation quality control standards (e.g. DIN 2345 in Germany, or UNI 10574 in Italy).

In response to these new challenges, localisation packages are now being designed to assist users throughout the whole life cycle of a multilingual document. These take them through job set-up, authoring, translation preparation, translation, validation, and publishing, besides ensuring consistency and quality in source and target language variants of the documentation. New systems help developers monitor different versions, variants and languages of product documentation, and author customer specific solutions. An average localisation package today will normally consist of an industry standard SGML/XML editor (e.g. ArborText), a translation and terminology toolkit (Trados Translator's Workbench), and a publishing engine (e.g. Adobe's Frame+SGML).

Unlike traditional translators, software localizers may be engaged in early stages of software development, as there are issues, such as platform portability, code exchange, format conversion, etc. which if not properly dealt with may hinder product internationalisation. Localizers are often involved in the selection and application of utilities that perform code scanning and checking, that automatically isolate and suggest solutions to National Language Support (NLS) issues, which save time during the internationalisation enabling process. There are run-time libraries that enable software developers and localizers to create single-source, multilingual, and portable cross-platform applications. Unicode support is also fundamental for software developers who work with multilingual texts, as it provides a consistent coding format for international character sets.

In the words of Rose Lockwood (Language International 10.5), a consultant from Equip Consortium Ltd, "as traditional translation methods give way to language engineering and disciplined authoring, translation and document-management methods, the role of technically proficient linguists and authors will be increasingly important to global WWW. The challenge will be to employ the skills used in conventional technical publishing in the new environment of a digital economy."

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

What is a translation workstation? Compare it with a standard localisation tool.

Leaving behind the old conception of a monolithic compact translation engine, the industry is now moving in the direction of integrating systems: "In the future Trados will offer solutions that provide enterprise-wide applications for multilingual information creation and dissemination, integrating logistical and language-engineering applications into smooth workflow that spans the globe," says Trados manager Henri Broekmate. Logos, the veteran translation technology provider, has announced "an integrated technology-based translation package, which will combine term management, TM, MT and related tools to create a seamless full service localisation environment." Other software manufacturers also in the race are Corel, Star, IBM, and the small but belligerent Spanish company Atril. This approach of integrating different tools is largely the view advocated by many language-technology specialists. Below is a description of an ideal engine which captures the answers given by Muriel Vasconcellos (from the Pan American Health Organisation), Minako O'Hagan (author of The Coming Age of Teletranslations) and Eduard Hovy (President of the Association for Machine Translation in the Americas) to a recent survey (by Language International 10.6).

The ideal workstation for the translator would combine the following features:

Full integration in the translator's general working environment, which comprises the operating system, the document editor (hypertext authoring, desktop publisher or the standard word-processor), as well as the emailer or the Web browser. These would be complemented with a wide collection of linguistic tools: from spell, grammar and style checkers to on-line dictionaries and glossaries, including terminology management, annotated corpora, concordances, collated texts, etc.

The system should comprise all advances in machine translation (MT) and translation memory (TM) technologies, be able to perform batch extraction and reuse of validated translations, and enable searches into TM databases by various keywords (such as phrases, authors, or issuing institutions). These TM databases could be distributed and accessible through the Internet. There is a new standard for TM exchange (TMX) that would permit translators and companies to work remotely and share memories in real time.

Eduard Hovy underlines the need for a genre detector. "We need a genre topology, a tree of more or less related types of text and ways of recognizing and treating the different types computationally." He also sees the difficulty of constantly updating the dictionaries and suggests a "restless lexicon builder that crawls all over the Web every night, ceaselessly collecting words, names, and phrases, and putting them into the appropriate lexicons."

Muriel Vasconcellos pictures her ideal design of the workstation in the following way:

A good view of the source text, extensive enough to offer the overall context, including the previous sentence and two or three sentences after the current one.

Relevant on-line topical word lists, glossaries and thesauri. These should be immediately accessible and, in the case of topical lists, there should be an optimal switch that shows, possibly in colour, when there are subject-specific entries available.

Three target-text windows. The first would be the main working area, and it would start by providing a sentence from the original document (or a machine pre-translation), which could be over-struck or quickly deleted to allow the translator to work from scratch. The original text or pre-translation could be switched off. Characters of any language and other symbols should be easy to produce. Drag-and-drop is essential and editing macros are extremely helpful when overstriking or translating from scratch. The second window would offer translation memory when it is available. The TM should be capable of fuzzy matching with a very large database, with the ability to include the organization's past texts if they are in some sort of electronic form. The third window would provide a raw machine translation which should be easy to paste into the target document. The grammar checker can be tailored so that it is not so sensitive. It would be ideal if one could write one's own grammar rules.

The above lines depict a view of a translation environment which is closer to more traditional needs of the translator than to current requirements of the industry. Many aspects of software localization have not been considered, particularly the concepts of multilingual management and document-life monitoring. Corporations are now realizing that documentation is an integral part of the production line where the distinction between product, marketing and technical material is becoming more and more blurred. Product documentation is gaining importance in the whole process of product development with direct impact on time-to-market. Software engineering techniques that apply in other phases of software development are beginning to apply to document production as well. The appraisal of national and international standards of various types is also significant: text and character coding standards (e.g. SGML/XML and Unicode), as well as translation quality control standards (e.g. DIN 2345 in Germany, or UNI 10574 in Italy).

In response to these new challenges, localization packages are now being designed to assist users throughout the whole life cycle of a multilingual document. These take them through job setup, authoring, translation preparation, translation, validation, and publishing, besides ensuring consistency and quality in source and target language variants of the documentation. New systems help developers monitor different versions, variants and languages of product documentation, and author customer specific solutions. An average localization package today will normally consist of an industry standard SGML/XML editor (e.g. ArborText), a translation and terminology toolkit (Trados Translator's Workbench), and a publishing engine (e.g. Adobe's Frame+SGML).

Unlike traditional translators, software localizers may be engaged in early stages of software development, as there are issues, such as platform portability, code exchange, format conversion, etc. which if not properly dealt with may hinder product internationalization. Localizers are often involved in the selection and application of utilities that perform code scanning and checking, that automatically isolate and suggest solutions to National Language Support (NLS) issues, which save time during the internationalization enabling process. There are run-time libraries that enable software developers and localizers to create single-source, multilingual, and portable cross-platform applications. Unicode support is also fundamental for software developers who work with multilingual texts, as it provides a consistent coding format for international character sets.

In the words of Rose Lockwood (Language International 10.5), a consultant from Equipe Consortium Ltd, "as traditional translation methods give way to language engineering and disciplined authoring, translation and document-management methods, the role of technically proficient linguists and authors will be increasingly important to global WWW. The challenge will be to employ the skills used in conventional technical publishing in the new environment of a digital economy."

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm
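
As an illustration of the fuzzy matching against a translation memory mentioned above, here is a minimal sketch; the memory entries are invented, and difflib's ratio simply stands in for whatever similarity measure a real TM engine (such as the Trados workbench mentioned in the text) would use.

```python
# Minimal sketch of fuzzy matching against a translation memory: difflib's
# ratio stands in for a real TM similarity score. Memory entries are invented.
import difflib

TM = {
    "Switch the printer off before cleaning it.":
        "Éteignez l'imprimante avant de la nettoyer.",
    "Do not use abrasive cleaners on the casing.":
        "N'utilisez pas de nettoyants abrasifs sur le boîtier.",
}

def best_match(sentence, memory, threshold=0.7):
    """Return (score, source segment, stored translation) for the closest match."""
    scored = [(difflib.SequenceMatcher(None, sentence, src).ratio(), src, tgt)
              for src, tgt in memory.items()]
    score, src, tgt = max(scored)
    return (score, src, tgt) if score >= threshold else (score, None, None)

print(best_match("Switch the printer off before you clean it.", TM))
```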

Machine translation vs. human translation. Do you agree that translation excellence goes beyond technology? Why?

Having said all this, it is important to reassess the human factor. Like cooks, tailors or architects, professional translators need to become acquainted with technology, because good use of technology will make their jobs more competitive and satisfactory. But they should not dismiss craftsmanship. Technology enhances productivity, but translation excellence goes beyond technology. It is important to delimit the roles of humans and machines in translation. Martin Kay's (1987) words in this respect are most illustrative:

A computer is a device that can be used to magnify human productivity. Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of human spirit and the dignity of human labor but, by taking over what is mechanical and routine, it frees human beings for what is essentially human. Translation is a fine and exacting art, but there is much about it that is mechanical and routine; if this were given over to a machine, the productivity of the translator would not only be magnified but this work would become more rewarding, more exciting, more human.

It has taken some 40 years for the specialists involved in the development of MT to realize that the limits to technology arise when going beyond the mechanical and routine aspects of language. From the outside, translation is often seen as a mere mechanical process, not any more complex than playing chess, for example. If computers have been programmed with the capacity of beating a chess master champion such as Kasparov, why should they not be capable of performing translation of the highest quality? Few people are aware of the complexity of literary translation. Douglas Hofstadter (1998) depicts this well:

A skilled literary translator makes a far larger number of changes, and far more significant changes, than any virtuoso performer of classical music would ever dare to make in playing notes in the score of, say, a Beethoven piano sonata. In literary translation, it's totally humdrum stuff for new ideas to be interpreted, old ideas to be deleted, structures to be inverted, twisted around, and on and on.

Although it may not be perceived at first sight, the complexity of natural language is of an order of magnitude far superior to any purely mechanical process. To how many words should the vocabulary be limited to make the complexity of producing "free sonnets" (that is, any combination of 6 words in 14 verses) comparable to the number of possible chess games? It may be difficult to believe, but the vocabulary should be restricted to 100 words. That is, making free sonnets with 100 words offers as many different alternatives as there are ways of playing a chess game (roughly, 10^120; see DELI's Web page for discussion). The number of possibilities would quickly come down if combinations were restricted so that they not only made sense but acquired some sort of poetic value. However, defining formally or mechanically the properties of "make sense" and "have poetic value" is not an easy task. Or at least, it is far more difficult than establishing winning heuristics for a color to succeed in a chess game.

No wonder then that Douglas Hofstadter's MT experiment translating the 16th-century French poet Clément Marot's poem Ma Mignonne into English using IBM's Candide system should have performed so badly (see Sgrung's interview in Language International 10.1): "Well, when you look at [IBM's Candide's] translation of Ma Mignonne, thinking of Ma Mignonne as prose, not as poetry, it's by far the worst. It's so terrible that it's not even laughable, it just stinks! It's pathetic!" Obviously, Hofstadter's experiment has gone beyond the recommended mechanical and routine scope of language and is therefore an abuse of MT. Outside the limits of the mechanical and routine, MT is impracticable and human creativity becomes indispensable.

Translators of the highest quality are only obtainable from first-class raw materials and constant and disciplined training. The potentially good translator must be a sensitive, wise, vigilant, talented, gifted, experienced, and knowledgeable person. An adequate use of mechanical means and resources can make a good human translator a much more productive one. Nevertheless, very much like dictionaries and other reference material, technology may be considered an excellent prosthesis, but little more than that. As Martin Kay (1992) argues, there is an intrinsic and irreplaceable human aspect of translation:

There is nothing that a person could know, or feel, or dream, that could not be crucial for getting a good translation of some text or other. To be a translator, therefore, one cannot just have some parts of humanity; one must be a complete human being.

However, even for skilled human translators, translation is often difficult. One clear example is when linguistic form, as opposed to content, becomes an important part of a literary piece. Conveying the content, but missing the poetic aspects of the signifier, may considerably hinder the quality of the translation. This is a challenge to any translator. Jaime de Ojeda's (1989) Spanish translation of Lewis Carroll's Alice in Wonderland illustrates this problem:

Twinkle, twinkle, little bat
how I wonder what you're at!
Up above the world you fly
like a tea-tray in the sky.

Brilla, luce, ratita alada
¿en qué estás tan atareada?
Por encima del universo vuelas
como una bandeja de teteras.

Manuel Breva (1996) analyzes the example and shows how Ojeda solves the "formal hurdles" of the original: The above lines are a parody of the famous poem "Twinkle, twinkle, little star" by Jane Taylor, which, in Carroll's version, turns into a sarcastic attack against Bartholomew Price, a professor of mathematics, nicknamed "The Bat". Jaime de Ojeda translates "bat" as "ratita alada" for rhythmical reasons. "Murciélago", the Spanish equivalent of "bat", would be hard to fit in this context for the same poetic reasons. With Ojeda's choice of words the Spanish version preserves the meaning and maintains the same rhyming pattern (AABB) as in the original English verse-lines.

What would the output of any MT system be like if confronted with this fragment? Obviously, the result would be disastrous. Compared with the complexity of natural language, the figures that serve to quantify the "knowledge" of any MT program are absurd: 100,000-word bilingual vocabularies, 5,000 transfer rules... Well-developed systems such as Systran or Logos hardly surpass these figures. How many more bilingual entries and transfer rules would be necessary to match Ojeda's competence? How long would it take to adequately train such a system? And even then, would it be capable of challenging Ojeda in the way the chess master Kasparov has been challenged? I have serious doubts about that being attainable at all. But there are other opinions, as is the case of the famous Artificial Intelligence master, Marvin Minsky. Minsky would argue that it is all a matter of time. He sees the human brain as an organic machine, and as such, its behavior, reactions and performance can be studied and reproduced. Other people believe there is an important aspect separating organic, living "machines" from synthetic machines. They would claim that creativity is in life, and that it is an exclusive faculty of living creatures to be creative.

But from my point of view, machine translation will never be able to reach the level of human translation. Human beings are not perfect, but they have more imagination when adapting a text into another language, drawing on things such as feelings and their own knowledge, which a computer cannot use to translate a text.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Which profiles should any person with a University degree in Translation be qualified for?

LISA Education Initiative Taskforce (LEIT) is a consortium of schools training translators and computational linguists that was announced in 1998 as an initiative to develop a promotional program for the academic communities in Europe, North America, and Asia. The initial mandate of LEIT was to conduct a survey among academic and non-academic programs that offer courseware and training for internationalizers and localizers and to query the market players to determine their needs with respect to major job profiles. LEIT's main objective is to stimulate more formal education in skills beneficial to the localization industry that complains of a labor shortage. The academic institutions involved in the first release of LEIT are: University of Geneva (Switzerland), Brigham Young University (Utah), Kent State University (Ohio), University of Cologne (Germany), City College of Dublin (Ireland), Monterey Institute of International Studies (California), and National Software Center in Bombay (India).

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

 

May. 17-21

Why is translation such a difficult task?

There are several kinds of problem: (i) problems of ambiguity, (ii) problems that arise from structural and lexical differences between languages, and (iii) multiword units like idioms and collocations. We will discuss typical problems of ambiguity, lexical and structural mismatches, and multiword units in turn. Of course, these sorts of problem are not the only reasons why MT is hard. Other problems include the sheer size of the undertaking, as indicated by the number of rules and dictionary entries that a realistic system will need, and the fact that there are many constructions whose grammar is poorly understood, in the sense that it is not clear how they should be represented, or what rules should be used to describe them. This is the case even for English, which has been extensively studied, and for which there are detailed descriptions -- both traditional 'descriptive' and theoretically sophisticated -- some of which are written with computational usability in mind. It is an even worse problem for other languages. Moreover, even where there is a reasonable description of a phenomenon or construction, producing a description which is sufficiently precise to be used by an automatic system raises non-trivial problems.

http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node52.html#SECTION00810000000000000000

 

Which are the main problems of MT?

One of the main problems is that of lexical holes --- that is, cases where one language has to use a phrase to express what another language expresses in a single word. Examples of this include the 'hole' that exists in English with respect to French ignorer ('to not know', 'to be ignorant of') and se suicider ('to suicide', i.e. 'to commit suicide', 'to kill oneself'). The problems raised by such lexical holes have a certain similarity to those raised by idioms: in both cases, one has phrases translating as single words.

http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm

 

Which parts of Linguistics are more relevant for MT?

Morphology, syntax and semantics are the fields of linguistics most relevant for MT.

http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm

 

How many different types of ambiguity are there?

In the best of all possible worlds (as far as most Natural Language Processing is concerned, anyway) every word would have one and only one meaning. But, as we all know, this is not the case. When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure it is said to be structurally ambiguous. Ambiguity is a pervasive phenomenon in human languages. It is very hard to find words that are not at least two ways ambiguous, and sentences which are (out of context) several ways ambiguous are the rule, not the exception. This is not only problematic because some of the alternatives are unintended (i.e. represent wrong interpretations), but because ambiguities 'multiply'. In the worst case, a sentence containing two words, each of which is two ways ambiguous, may be four ways ambiguous (2 × 2); one with three such words may be eight (2 × 2 × 2) ways ambiguous, and so on. One can, in this way, get very large numbers indeed.

http://sirio.deusto.es/abaitua/konzeptu/ta/hutchins91.htm
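
To make the "ambiguities multiply" point concrete, here is a toy sketch (the words and senses are invented for the example) showing that three two-way ambiguous words already yield 2 × 2 × 2 = 8 candidate readings:

```python
# Toy illustration of how lexical ambiguities multiply: with two senses per
# ambiguous word, the number of candidate readings doubles with each word.
# Words and senses are invented for the example.
from itertools import product

senses = {
    "use":   ["employ (verb)", "usage (noun)"],
    "can":   ["be able to", "tin container"],
    "files": ["documents (noun)", "smooths with a file (verb)"],
}

readings = list(product(*senses.values()))
print(len(readings))        # 2 * 2 * 2 = 8 candidate readings
for combo in readings[:3]:  # show a few of them
    print(combo)
```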

 

Illustrate your discussion with examples.

Imagine that we are trying to translate these two sentences into French : You must not abrasive cleaners on the printer casing. The of abrasive cleaners on the printer casing is not recommended. In the first sentence use is a verb, and in the second a noun, that is, we have a case of lexical ambiguity. An English-French dictionary will say that the verb can be translated by (inter alia) se servir de and employer, whereas the noun is translated as emploi or utilisation. One way a reader or an automatic parser can find out whether the noun or verb form of use is being employed in a sentence is by working out whether it is grammatically possible to have a noun or a verb in the place where it occurs. For example, in English, there is no grammatical sequence of words which consists of the + V + PP --- so of the two possible parts of speech to which use can belong, only the noun is possible in the second sentence ( b). As we have noted in Chapter , we can give translation engines such information about grammar, in the form of grammar rules. This is useful in that it allows them to filter out some wrong analyses. However, giving our system knowledge about syntax will not allow us to determine the meaning of all ambiguous words. This is because words can have several meanings even within the same part of speech. Take for example the word button. Like the word use, it can be either a verb or a noun. As a noun, it can mean both the familiar small round object used to fasten clothes, as well as a knob on a piece of apparatus. To get the machine to pick out the right interpretation we have to give it information about meaning. In fact, arming a computer with knowledge about syntax, without at the same time telling it something about meaning can be a dangerous thing. This is because applying a grammar to a sentence can produce a number of different analyses, depending on how the rules have applied, and we may end up with a large number of alternative analyses for a single sentence. Now syntactic ambiguity may coincide with genuine meaning ambiguity, but very often it does not, and it is the cases where it does not that we want to eliminate by applying knowledge about meaning. We can illustrate this with some examples. First, let us show how grammar rules, differently applied, can produce more than one syntactic analysis for a sentence. One way this can occur is where a word is assigned to more than one category in the grammar. For example, assume that the word cleaning is both an adjective and a verb in our grammar. This will allow us to assign two different analyses to the following sentence. fluids can be dangerous. One of these analyses will have cleaning as a verb, and one will have it as an adjective. In the former (less plausible) case the sense is `to clean a fluid may be dangerous', i.e. it is about an activity being dangerous. In the latter case the sense is that fluids used for cleaning can be dangerous. Choosing between these alternative syntactic analyses requires knowledge about meaning. It may be worth noting, in passing, that this ambiguity disappears when can is replaced by a verb which shows number agreement by having different forms for third person singular and plural. For example, the following are not ambiguous in this way: ( a) has only the sense that the action is dangerous, ( b) has only the sense that the fluids are dangerous. Cleaning fluids is dangerous. Cleaning fluids are dangerous. 
We have seen that syntactic analysis is useful in ruling out some wrong analyses, and this is another such case, since, by checking for agreement of subject and verb, it is possible to find the correct interpretations. A system which ignored such syntactic facts would have to consider all these examples ambiguous, and would have to find some other way of working out which sense was intended, running the risk of making the wrong choice. For a system with proper syntactic analysis, this problem would arise only in the case of verbs like can which do not show number agreement.

Another source of syntactic ambiguity is where whole phrases, typically prepositional phrases, can attach to more than one position in a sentence. For example, in "Connect the printer to a word processor package with a Postscript interface.", the prepositional phrase with a Postscript interface can attach either to the NP the word processor package, meaning "the word processor which is fitted or supplied with a Postscript interface", or to the verb connect, in which case the sense is that the Postscript interface is to be used to make the connection. Notice, however, that this example is not genuinely ambiguous at all: knowledge of what a Postscript interface is (in particular, the fact that it is a piece of software, not a piece of hardware that could be used for making a physical connection between a printer and an office computer) serves to disambiguate. Similar problems arise with "You will require a printer and a word processor with Postscript interfaces.", which could mean that the printer and the word processor both need Postscript interfaces, or that only the word processor needs one.

This kind of real-world knowledge is also an essential component in disambiguating the pronoun it in examples such as "Put the paper in the printer. Then switch it on." In order to work out that it is the printer that is to be switched on, rather than the paper, one needs to use the knowledge of the world that printers (and not paper) are the sort of thing one is likely to switch on. There are other cases where real-world knowledge, though necessary, does not seem to be sufficient. The following dialogue, where two people are re-assembling a printer, seems to be such an example:

A: Now insert the cartridge at the back.
B: Okay.
A: By the way, did you order more toner today?
B: Yes, I got some when I picked up the new paper.
A: OK, how far have you got?
A: Did you get it fixed?

It is not clear that any kind of real-world knowledge will be enough to work out that it in the last sentence refers to the cartridge, rather than the new paper or the toner. All are probably equally reasonable candidates for fixing. What strongly suggests that it should be interpreted as the cartridge is the structure of the conversation: the discussion of the toner and new paper occurs in a digression, which has ended by the time it occurs. Here what one needs is knowledge of the way language is used. This is knowledge which is usually thought of as pragmatic in nature. Analysing the meaning of texts like the above example is important in dialogue translation, which is a long-term goal for MT research, but similar problems occur in other sorts of text. Another sort of pragmatic knowledge is involved in cases where the translation of a sentence depends on the communicative intention of the speaker, that is, on the sort of action (the speech act) that the speaker intends to perform with the sentence.
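The agreement point can be made concrete with a small sketch of my own (under the simplifying assumption that the two readings are just listed as strings): the verb form either selects one reading of "Cleaning fluids ... dangerous" or, with can, leaves both open.

# Toy sketch of how number agreement filters the readings of
# "Cleaning fluids is/are/can be dangerous". Illustrative only.

READINGS = {
    # "cleaning" read as a verb: the subject is the activity, hence singular.
    "singular": "the activity of cleaning fluids is dangerous",
    # "cleaning" read as an adjective: the subject is "fluids", hence plural.
    "plural": "fluids used for cleaning are dangerous",
}

def surviving_readings(verb_form):
    if verb_form == "is":
        return [READINGS["singular"]]
    if verb_form == "are":
        return [READINGS["plural"]]
    # Modal verbs like "can" show no number agreement, so the
    # sentence stays ambiguous and both readings survive.
    return list(READINGS.values())

print(surviving_readings("is"))   # only the 'activity' reading
print(surviving_readings("are"))  # only the 'cleaning fluids' reading
print(surviving_readings("can"))  # both readings: still ambiguous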
For example, "Can you reprogram the printer interface on this printer?" could be a request for action, or a request for information, and this might make a difference to the translation. In some cases, working out which is intended will depend on the non-linguistic situation, but it could also depend on the kind of discourse that is going on: for example, is it a discourse where requests for action are expected, and is the speaker in a position to make such a request of the hearer? In dialogues, such pragmatic information about the discourse can be important for translating even the simplest expressions. For example, the right translation of Thank you into French depends on what sort of speech act it follows. Normally, one would expect the translation to be merci. However, if it is uttered in response to an offer, the right translation would be s'il vous plaît ('please').
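That last point can be sketched as a single rule (my own illustration; the speech-act labels are invented for the example, not taken from the source):

# Toy rule for the example above: the French rendering of "Thank you"
# depends on the speech act it responds to.

def translate_thank_you(previous_speech_act):
    if previous_speech_act == "offer":
        # Accepting an offer: "Thank you" works like "yes, please".
        return "s'il vous plaît"
    # Default case, e.g. acknowledging help or information.
    return "merci"

print(translate_thank_you("offer"))    # s'il vous plaît
print(translate_thank_you("inform"))   # merci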

http://sirio.deusto.es/abaitua/konzeptu/ta/MT_book_1995/node53.html#SECTION00820000000000000000

 

May. 24-28

What are the most usual interpretations of the term "machine translation" (MT)?

The term machine translation (MT) is normally taken in its restricted and precise meaning of fully automatic translation. However, in this chapter we consider the whole range of tools that may support translation and document production in general, which is especially important when considering the integration of other language processing techniques and resources with MT. We therefore define Machine Translation to include any computer-based process that transforms (or helps a user to transform) written text from one human language into another. We define Fully Automated Machine Translation (FAMT) to be MT performed without the intervention of a human being during the process. Human-Assisted Machine Translation (HAMT) is the style of translation in which a computer system does most of the translation, appealing in case of difficulty to a (mono- or bilingual) human for help. Machine-Aided Translation (MAT) is the style of translation in which a human does most of the work but uses one or more computer systems, mainly as resources such as dictionaries and spelling checkers, as assistants. Traditionally, two very different classes of MT have been identified. Assimilation refers to the class of translation in which an individual or organization wants to gather material written by others in a variety of languages and convert it all into his or her own language. Dissemination refers to the class in which an individual or organization wants to broadcast his or her own material, written in one language, in a variety of languages to the world. A third class of translation has also recently become evident. Communication refers to the class in which two or more individuals are in more or less immediate interaction, typically via email or otherwise online, with an MT system mediating between them. Each class of translation has very different features, is best supported by different underlying technology, and is to be evaluated according to somewhat different criteria.
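Purely as an illustrative summary (the class names and wording below are my own, not the author's), the two sets of distinctions can be written down as Python enumerations:

from enum import Enum

class AutomationLevel(Enum):
    FAMT = "fully automated MT: no human intervention during the process"
    HAMT = "human-assisted MT: the system translates, a human helps with difficulties"
    MAT = "machine-aided translation: a human translates, tools such as dictionaries assist"

class TranslationClass(Enum):
    ASSIMILATION = "gather material in other languages and convert it into one's own"
    DISSEMINATION = "broadcast one's own material in a variety of languages"
    COMMUNICATION = "MT mediating a more or less immediate interaction, e.g. email"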

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

What do FAHQT and ALPAC mean in the evolution of MT?

Researchers at Georgetown University and IBM were working towards the first operational systems, and they accepted the long-term limitations of MT in the production of usable translations. More influential was the well-known dissent of Bar-Hillel. In 1960, he published a survey of MT research at the time which was highly critical of the theory-based projects, particularly those investigating interlingua approaches, and which included his demonstration of the non-feasibility of fully automatic high quality translation (FAHQT) in principle. Instead, Bar-Hillel advocated the development of systems specifically designed on the basis of what he called 'man-machine symbiosis', a view which he had first proposed nearly ten years before when MT was still in its infancy (Bar-Hillel 1951). In these circumstances it is not surprising that the Automatic Language Processing Advisory Committee (ALPAC) set up by the US sponsors of research found that MT had failed by its own criteria, since by the mid 1960s there were clearly no fully automatic systems capable of good quality translation and there was little prospect of such systems in the near future. MT research had not looked at the economic use of existing 'less than perfect' systems, and it had disregarded the needs of translators for computer-based aids.

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

List some of the major methods, techniques and approaches

The list of such applications of 'external' theories is long. It began in the 1950s and 1960s with information theory, categorial grammar, transformational-generative grammar, dependency grammar, and stratificational grammar. In the 1970s and 1980s came MT research based on artificial intelligence, non-linguistic knowledge bases, and formalisms such as Lexical-Functional Grammar, Generalized Phrase Structure Grammar, Head-driven Phrase Structure Grammar, Definite Clause Grammar, Principles and Parameters, and Montague semantics. In the 1990s, neural networks, connectionism, parallel processing, statistical methods and many more have been added. In nearly every case, it has been found that the 'pure' adoption of the new theory was not as successful as initial trials on small samples appeared to demonstrate. Inevitably the theory had to be adapted to the demands of MT and translation, and in the process it became modified. But innovativeness and idealism must not be discouraged in a field such as MT where the major problems are so great and all promising approaches must be examined closely. Unfortunately, there has been a tendency throughout the history of MT for the advocates of new approaches to exaggerate their contribution. Many new approaches have been proclaimed as definitive solutions on the basis of small-scale demonstrations with limited vocabulary and limited sentence structures. It is these initial untested claims that must always be treated with great caution. This lesson has been learnt by most MT researchers; no longer do they proclaim imminent breakthroughs.

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

Where was MT ten years ago?

Within the last ten years, research on spoken translation has developed into a major focus of MT activity. Of course, the idea or dream of translating the spoken word automatically was present from the beginning (Locke 1955), but it has remained a dream until now. Research projects such as those at ATR, CMU and on the Verbmobil project in Germany are ambitious. But they do not make the mistake of attempting to build all-purpose systems. The constraints and limitations are clearly defined by definition of domains, sublanguages and categories of users. That lesson has been learnt. The potential benefits, even if success is only partial, are clear for all to see, and it is a reflection of the standing of MT in general, and a sign that it is no longer suffering from old perceptions, that such ambitious projects can receive funding.

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

New directions and foreseeable breakthroughs of MT in the short term.

In the future, much MT research will be oriented towards the development of `translation modules' to be integrated in general `office' systems, rather than the design of systems to be self-contained and independent. It is already evident that the range of computer-based translation activities is expanding to embrace any process which results in the production or generation of texts and documents in bilingual and multilingual contexts, and it is quite possible that MT will be seen as the most significant component in the facilitation of international communication and understanding in the future `information age'. In this respect, the development of MT systems appropriate for electronic mail is an area which ought to be explored. Those systems which are in use (e.g. DP/Translator on CompuServe) were developed for quite different purposes and circumstances. It would be wrong to assume that existing systems are completely adequate for this purpose. They were not designed for the colloquial and often ungrammatical and incomplete dialogue style of the discussion lists on networks.

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

 

REPORT A.

 

ABSTRACT

This report analyses the roles that information and knowledge play in our society, asks whether large quantities of data imply that we are well informed or whether they harm us physically and psychologically, and finally considers whether language can be a barrier to communication.

Most of the material used in this report has been taken from the on-line pages shown in the REFERENCES.

INTRODUCTION

Information has always been necessary for us, both for our personal enrichment and for belonging to this society, in which everything is based on information and knowledge.

The aim of this report is to explain what having information implies in the present society and how it works.

First of all, we will define what information and knowledge are and what they are concerned with. We will see that knowledge is more important than information.

Secondly, I will try to explain that having large amounts of information does not necessarily mean that we are well informed, and that it can even end in illness. We can suffer from stress, among other problems, because we are not able to separate essential information from what is unimportant.

Finally, we will explain how computer science and language technologies help manage information, and why language can sometimes act as a barrier to communication.

 

INFORMATION AND KNOWLEDGE NOWADAYS

Firstly, let us define Information Management and Knowledge Management. Information Management is the harnessing of the information resources and information capabilities of the organisation in order to add and create value both for itself and for its clients or customers. On the other hand, Knowledge Management is a framework for designing an organisation's goals, structures and processes so that the organisation can use what it knows to learn and to create value for its customers and community.

Information Management is concerned with processing and adding value to information, and its basic issues include access, control, co-ordination, timeliness, accuracy and usability. Knowledge Management is concerned with using knowledge to take action, and its basic issues include codification, diffusion, practice, learning, innovation and community building.

We use information to add and create value, ideas, concepts, etc., and each person has the right to choose the amount of information that will be useful for him or her. Knowledge, on the other hand, has a more defined structure and a clear aim: to exclude data and simplify information in order to use only what is necessary to achieve our own and our community's goals. That way we won't get stressed by the large quantity of information stored in our brains.

Representing information.

There are two ways of representing linguistic information: computational grammars and linguistic grammars. The first kind is efficient to run, but it is difficult to write and, as it uses complex structures, harder to understand. The second kind tends to run slowly, but is good at expressing linguistic information; to use linguistic grammars we need interpreters, which are typically inefficient.

Grammar rules are treated as rules and lexicons as facts, which are compiled into a Prolog form; a computational grammar is the organised combination of these compiled rules and facts (a minimal sketch of this idea is given at the end of this subsection). Even so, this can have disadvantages in speed when we want to understand the information. That is, we can have problems when we want to understand information rapidly because of:

- having a large quantity of information that has not been simplified much

- not having enough information, which can obstruct what is being communicated.

The best would be a representation of information that is easy to understand, a sophisticated one, which could maintain the speed and precision of the computational grammar.
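As a purely illustrative sketch of the idea that grammar rules and lexical entries can be stored as data ("facts") and run by a small interpreter, here is a toy recogniser in Python; the grammar, vocabulary and function names are assumptions made for the example, not taken from the cited page. An interpreter like this is easy to write, but interpreting rules at run time is typically slower than compiling them into a lower-level form such as Prolog clauses, which is the trade-off mentioned above.

# Toy grammar rules and lexical facts stored as plain data.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}

LEXICON = {
    "Det": {"the", "a"},
    "N":   {"printer", "paper"},
    "V":   {"jams", "feeds"},
}

def parse(cat, words):
    """Yield the number of words consumed if `words` starts with a `cat`."""
    # Lexical fact: the first word belongs to this category.
    if cat in LEXICON:
        if words and words[0] in LEXICON[cat]:
            yield 1
        return
    # Grammar rule: try each right-hand side in turn.
    for rhs in GRAMMAR.get(cat, []):
        yield from parse_seq(rhs, words)

def parse_seq(cats, words):
    if not cats:
        yield 0
        return
    for used in parse(cats[0], words):
        for rest in parse_seq(cats[1:], words[used:]):
            yield used + rest

def recognises(sentence):
    words = sentence.lower().split()
    return any(n == len(words) for n in parse("S", words))

print(recognises("the printer jams"))  # True
print(recognises("printer the jams"))  # False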

 

What having large quantities of information implies.

Some investigations have found that, as Dr. David Lewis has said, having too much information can be as dangerous for our health as having too little.

Most bureaucrats, business executives, teachers, doctors, lawyers and other professionals suffer from information overload, known technically as "Information Fatigue Syndrome", which will soon be a recognised medical condition. The symptoms of this syndrome include tension, occasional irritability and frequent feelings of helplessness, all of them signs that the victim is under considerable stress. This overload of information can lead, among other problems, to a paralysis of analysis, making it far harder to find the right solutions or make the best decisions.

Information stress sets in when people in possession of a huge volume of data have to work against the clock and are incapable of organising all the information they have and acting immediately. As a result, they put lives at risk in the case of doctors, or money in the case of business executives, for example.

To counter the "Fatigue Syndrome", the human body reacts with a primitive survival response, which evolved millions of years ago to safeguard us when confronted by physical danger. Now, as millions of years ago, in situations where the only two options are to kill the adversary or flee from it, the "fight-or-flight" response can make the difference between life and death.

It is hard to admit, but human beings react this way. Normally people flee from the problem, or go to the doctor asking for help. But others, as we have seen all over the world, face the problem in the wrong way, killing the adversary, who can be their boss, neighbour, partner, etc.

 

How can computer science and language technologies help manage information?

New opportunities are becoming available to change the way we do many things, to make them easier and more effective by exploiting our developing knowledge of language.

Nowadays, machines can recognise different languages, so people all over the world can benefit from this technology. One benefit is that information from the different cultures of the world can become widely known.

Computer science also lets us carry out, for example, business transactions over the telephone or other telematic services; that is, it puts all these facilities at our fingertips.

When a machine understands human language, translates between different languages, and generates speech as well as printed output, we will have available an enormously powerful tool to help us in many areas of our lives.

The success of Language Engineering will lie in achieving all these possibilities. The pace of advance is accelerating, and we will see many achievements over the next few years.

Language as a barrier to information.

Language is the most effective way to communicate with other people, but problems can sometimes arise when we want to begin that communication. Between humans, understanding is limited to groups who share a common language, which is why language can be seen as a barrier to communication rather than as an aid.

 

CONCLUSIONS

 

There's no doubt that having information and knowledge is necessary to belong to this society and to be literate. We could ask ourselves the million-dollar question: which is more important, knowledge or information? The answer could be that, although knowledge has its basis in information, it is more important because knowledge makes us take action.

We have always thought that having a large quantity of information made us more intelligent and was useful. But recent investigations have shown that it can be dangerous for us if we do not select what the important information is and reject what is useless. If we do not do so, it can end in an illness called "Information Fatigue Syndrome", causing irritability and stress, among other symptoms.

After doing this report I have learned that what we thought was an advantage for human beings can also be an obstacle: language. Although computer science helps us reach places and do things that we could never have imagined, the simplest and most familiar thing for us, language, can act as a barrier to communication in systems, services and appliances, and also when we want to communicate with someone who speaks a different language.

 

 

REFERENCES

- FAQ on Information and Knowledge Management. Faculty of Information Studies, University of Toronto.

http://choo.fis.utoronto.ca/IMfaq/

- Maryann Bird, "System overload", Time, 9.12.96, pp. 44-45 (transcribed by Joseba Abaitua).

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm

- Jeremy L. Wyatt, 23 February 1998.

http://www.cs.bham.ac.uk/research/booklet_97/arr/projects/node8.html

- HLTCentral.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lt