This report is a brief review of several subjects related to human language technologies. The information it contains was collected from the Internet, and it is not an in-depth explanation of any particular topic. Its objective is to give a general idea of the relationship between new technologies and language. New technologies allow instant communication among people worldwide and the rapid broadcasting of news, which amounts to a revolution for society. This text points out the possibilities new technologies offer for overcoming the communication barrier that multilinguality implies and for tackling the problem of machine translation.
Over the last few years, we have witnessed great development in communications: the Internet has been created and many more technologies have been brought out. The influence of computers on every aspect of our lives has rocketed, and this new way of life has completely transformed society. Nowadays, news from all around the world is available, travellers can visit any place in the world, and instant communication between two distant points is possible, no matter how far apart they are. Despite the disadvantages this evolution may imply, such as changes in the relationships between people, it can help us approach problems we would not be able to solve without computers. Furthermore, it may improve the methods used for some processes. In this report, we will see how computers affect people and the use of language.
The report has several sections. The first part concerns aspects of the information society: the advantages of new communications and technologies and their effect on society. The second part deals with information overload: the problem of excess information is reviewed, together with guidelines for avoiding misleading information. The third topic is language technology and engineering; this section describes the field and how it can help us in the use of language. The next part treats some aspects of machine translation (MT). It covers the most important problems related to machine translation and the methodology used. Multilinguality is one of the biggest problems for MT, not only because of the phrases and expressions specific to each language but also because of different alphabets. Although this translation technology is not yet trustworthy, experts are working on it because it offers a wide range of possibilities. The last section covers linguistic diversity on the Internet. The Internet is a channel that could hold a complete structure able to provide an instant multilingual service to all users. If this great step were taken, all language-related barriers to communication would be removed.
The development of sophisticated translation systems and language engineering has arisen from the enormous quantity of information available in modern society and the emergence of terms such as 'information society'. The evolution of the Information Society has raised the issue of adequate and responsible policy formation in the area of information technology.
In the European Union, the evolution of the Information Society has been based on the philosophy of Commissioner Martin Bangemann (1994), who argued that it represents a "revolution based on information (which) adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together". He added that a driving motivation for the Information Society is the creation of employment in developing countries.
In Australia, the Goldsworthy Report (1997) sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world". Barry Jones (1991), author of the House of Representatives Standing Committee's report 'Australia as an Information Society', adds that the Information Society can be seen as "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology".
The existence and presence of the Information Society appear to have been accepted by the Western World without challenge. In India, likewise, the drive to develop information and technological infrastructure initiated by Rajiv Gandhi is viewed as a positive and necessary step towards a universally prosperous society. The term Information Society has a variety of connotations and concepts, which have been conveniently categorized by Frank Webster (1995):
The initial concept focuses on the development, modernization and increasing complexity of computers and telecommunications and their capacity to manipulate, store and transmit data. It is still unclear at what point a society becomes an Information Society: whether the criterion should be the number of computer-based appliances in homes and offices or their technological sophistication.
Fritz Machlup (1960) studied the size and effect of US information industries, demonstrating that education, the media, computing and information services accounted for 30% of GDP. However, other studies of the economic dimension of the Information Society suggest that the early exponential growth of information activities as a proportion of economic activity actually slowed, with little change from 1958 to 1980.
The apparent growth of 'information' workers over 'production' workers, as stipulated by Daniel Bell in The Coming of Post-Industrial Society (1974), where he argued that professional and technical classes would dominate the new era with theoretically based work, is still a matter of confusion. Defining the job task becomes difficult when almost all employment positions involve some level of information processing.
In terms of policy making in the Information Society, the fusion of the public and private spheres is becoming relevant. Whereas policy making was once clearly a public-sector responsibility, the influence of the private sphere is becoming increasingly relevant as governments operate along market philosophies and incorporate for-profit activities. The European Union claims that responsibility for policy making in information technology "is a task for the private sector", with the government confined to a regulatory role, while in places such as Singapore strong government influence in information policy is evident.
The very definition of information and its implications for policy formation remain ambiguous. Concepts such as 'data' and 'knowledge' are often used interchangeably, but the nature of information can be understood in two distinct ways: firstly, as a tangible entity that can be processed, moved, changed, etc.; secondly, as a concept of the human mind, the result of absorbing symbols and signs. Information therefore has both tangible and intangible definitions. A hierarchy of definitions of information, intended as a guide for policy making, has been provided by Sandra Braman (1989):
1. Information as a resource - information as 'pieces of information', with its creators, processors and users treated as separate entities.
2. Information as perception - information as perception of pattern - treating its effects such as its capacity to reduce uncertainty.
3. Information as commodity - information gains value as it passes through production processes such as indexing and abstracting - it can be bought and sold for profit.
4. Information as a constitutive force in society - information having a capacity to shape context thus exhibiting a power in its own right.
Braman argues that effective information policy must consider information at all levels of her hierarchy. The concern lies in the complexity of the concepts of 'information' and 'information society', which leaves policy formation vulnerable to domination by private-sector interests.
An interesting consequence of the information society, in which administrators are forced to deal with gigantic amounts of information, has been identified by David Lewis as 'information fatigue syndrome'. In the report 'Dying for Information', commissioned by Reuters Business International, he points out stress-related disorders among information processors. The disorder arises from the proliferation of data in the corporate world and the inability of individuals to cope with processing a multitude of information during decision making; it manifests itself in symptoms such as tension, irritability and feelings of hopelessness. Specific training in data management, including separating the relevant from the irrelevant, is viewed as the key to managing the syndrome.
Language is the basis of communication: we use it to explain ideas, express feelings, and record culture or beauty in prose. As fundamental as it is to communication, language can also become a barrier when its limitations in expression across cultures are considered. Language engineering has evolved in response to these limitations. By applying knowledge of the structure and use of human language through computer technology, it is revolutionizing our ability to recognize and manipulate language. The ability of a machine to recognize written and spoken language in different forms, process it and generate responses is greatly improving our ability to communicate and conduct business across cultures and distances.
Hans Uszkoreit (1996) describes the computerized form of language engineering as computational linguistics, a field that brings together linguistics and computer science. By examining the human cognition involved in generating and understanding language, it becomes possible to create computer programs that model and simulate the structure of human language. The goal of computational linguistics is to develop software that enables communication between the computer and its human user. Because this interaction applies to a multitude of languages and cultures, in pursuits such as database inquiries, information retrieval from texts or robot control, it is expected to bring significant innovation to the work environment and to information technology.
The development and diversity of language engineering facilitate communication between users of diverse languages through direct translation, with an efficiency far superior to the laborious work of human translators. The rapid growth in Internet use and society's need for information demand further development of language technology. Information on the Web, provided in a variety of languages, can only be processed and examined effectively with multilingual systems for its absorption and understanding. Perfecting such systems will facilitate education and international cooperation across cultures. Modelling human language by computer also facilitates a deeper understanding of its complex structure, hidden properties and linguistic application.
Language is fundamental to our identity and to the social and political aspects of our lives. For a multilingual society such as Europe, the development of language engineering is particularly important for enhancing communication, maintaining cultural identity and facilitating access to overseas markets. The benefits of language engineering may be summarised as:
- efficient access to information
- communication with computer systems at home, at work, in the car and in public places
- acquisition of new languages
- business interaction by telephone
- improved access to world and local events
- improved capability to operate internationally - business, social and political affairs
- provision of a wider range of services
Examining the structure of a language-enabled computer system reveals the complexities involved:
a. Entering material into the computer
Speech (telephone, microphone), text (keyboard), image (scanner, camera, video)
b. Recognition of the language
Understanding of the content, application to the task (information retrieval, translation), generation of the required output
c. Display of the output
Text, speech, image
a. Speaker identification - the individuality of the human voice enables identification of the speaker to facilitate access to information and services.
b. Speech recognition - speech is received by the computer in analogue or wave form and is analysed to identify the phonemes that make up words.
c. Character and document image recognition - recognising written or printed language requires that the symbolic representation of a language is derived from its graphic markings. Two cases of recognition exist (of images and handwriting).
d. Natural language understanding - interpretation of language using semantic models that represent its meaning in terms of concepts and the relationships between them.
e. Natural language generation - a semantic representation of a text can be used as the basis for generating language. Starting from a representation of the basic meaning, a text planning system can elaborate a sentence.
f. Speech generation - speech is generated by playing 'canned' recordings or by concatenating phonemes. Dialogue can be established by combining speech recognition with simple generation, which is particularly useful in the automatic management of phone calls.
3. Language resources - essential components of language engineering. Resources are produced to enable research laboratories and public institutions to access the different European Union languages.
a. Lexicons - a lexicon is a repository of words and knowledge about those words, which may include their grammatical structure (morphology), their sound structure (phonology) or their meaning in different contexts (depending on the preceding or following word).
b. Specialist lexicons - researched and produced separately, these include proper names (people, places), which are important in automated hotel reservation systems.
c. Grammars - describe the structure of a language at different levels: word, phrase and sentence.
d. Corpora - a body of language data that provides the basis for analysing a language to establish its characteristics, for training a machine to adapt its behaviour, and a test set for evaluating a language engineering technique.
Language engineering is applied at two levels. The first is a generic class of applications, such as language translation, information management, authoring or human/machine interaction; the second is the application of these systems to real-world problems, such as the creation of information services, the generation of texts such as business letters in foreign languages, and the building of translator workbenches.
Improved communication and comprehension of information through computer-enabled language systems have significant applications in society.
1. Competing in a global market
Business success relies on the ability to identify markets, sell into them effectively and provide quality after-sales service. The ability of language systems to produce business letters in appropriate languages, manage multilingual customer documentation and provide translation services facilitates this process immensely.
2. Better information
One of the key features of an information system is the ability to deliver information that meets the needs of its clients, as seen in the public service information field, for example job listings across the EU.
3. Direct access to services
Access to services by phone, such as banking or arranging insurance cover, enables round-the-clock service for the customer and a cost-effective system for the service provider.
4. Commerce in the marketplace
Language enabled software will create more opportunities to automate activities involved in the commercial areas. Customers can instruct agents by voice to browse the Web to select products, negotiate prices or collect bids.
5. Effective communication
Providing translators with language knowledge through electronic dictionaries and thesauri lowers the barriers to communication in business and political circles.
6. Accessibility and participation
One of the most important impacts of language engineering is the use of human language to interface with machines, for example access to automated legal advice services, which provide valuable advice without the expense of consulting legal personnel directly.
7. Improved education facilities
Computer aided learning has enabled education to be realised across distances and has facilitated competence in more than one language.
8. Entertainment, leisure and creativity
Such benefits are seen in computer games, dubbing and subtitling of films and translation of library and archive material.
An effective language engineering system can have wide-ranging applications across political, business and social circles, enhancing communication and leading to greater cohesion in society. The greatest challenge in building an intelligent system capable of coping with the vast amount of information available in today's technological society is handling multilingual and multimodal information robustly and efficiently, with quality performance. A workshop held in Granada (Spain) in 1998 was organised to address these issues, and its results were published in a report commissioned by the US National Science Foundation (1999). Its key questions included:
1. The current level of capability of the field dealing with language and human communication.
2. The integration of these functions in the near future and what kind of systems will result.
3. The considerations for extending these functions to handle multilingual and multimodal information.
The magnitude of this challenge can be appreciated by examining the complexities and ambiguities inherent in the structure of language. When a word has more than one meaning it is said to be lexically ambiguous, whilst a phrase or sentence with more than one meaning is structurally ambiguous. In English alone, where unambiguous words are a minority, the combination of lexical and structural ambiguity makes the range of possible meanings conveyed by a text extremely complicated. Across languages, the problem is further complicated by variations in subject-object-verb order and by the ability of words to act as either verbs or nouns. To enable a computer to interpret sentences, it is necessary to provide it with some meaning for each item of syntax. The difficulties and limitations encountered by computer translation systems have been examined in detail by John Hutchins (1991).
Basically, two types of translation systems exist: those that attempt to translate whole texts without human intervention, and those that require intervention to resolve ambiguities in the source text and to select appropriate words or phrases in the target language. Text translation consists of three operations: analysis of the source text, bilingual transfer of lexical items and structures, and generation of the new text. For example:
Analysis - resolving the ambiguity of the English word 'cry', which could mean 'weep' or 'shout'
Transfer - deciding the appropriate French verb for the English 'know': 'connaître' or 'savoir'
Generation - often incorporated in transfer, but it can include choosing between the English 'big', 'great' or 'large'
Many present systems have a separate program for each component, which facilitates multilingual translation.
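The three operations can be illustrated with a deliberately tiny sketch built around the 'know' example above. It is not how any real system works: the lexicon OBJECTS, the verb table PRESENT_1SG and the functions analyse, transfer and generate are hypothetical, and the analyser only accepts sentences of the form 'I know X.'

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: each English object is mapped to its French form
# and tagged as something one is acquainted with (people, places) or a fact.
OBJECTS = {
    "Marie": ("Marie", "acquaintance"),
    "Paris": ("Paris", "acquaintance"),
    "that he is here": ("qu'il est ici", "fact"),
}

# First-person singular present forms of the two candidate French verbs.
PRESENT_1SG = {"connaître": "connais", "savoir": "sais"}


@dataclass
class SourceAnalysis:
    verb: str      # English lemma
    obj: str       # English object phrase
    obj_kind: str  # "acquaintance" or "fact"


def analyse(sentence: str) -> SourceAnalysis:
    """Analysis: parse 'I know X.' and classify the object X."""
    words = sentence.rstrip(".").split(maxsplit=2)
    if len(words) != 3 or words[:2] != ["I", "know"]:
        raise ValueError("this toy analyser only handles 'I know X.'")
    _, kind = OBJECTS[words[2]]
    return SourceAnalysis("know", words[2], kind)


def transfer(a: SourceAnalysis) -> dict:
    """Transfer: choose 'connaître' or 'savoir' and map the object."""
    verb = "connaître" if a.obj_kind == "acquaintance" else "savoir"
    french_obj, _ = OBJECTS[a.obj]
    return {"verb": verb, "object": french_obj}


def generate(t: dict) -> str:
    """Generation: inflect the verb and assemble the French sentence."""
    return f"Je {PRESENT_1SG[t['verb']]} {t['object']}."


def translate(sentence: str) -> str:
    return generate(transfer(analyse(sentence)))


print(translate("I know Marie."))            # Je connais Marie.
print(translate("I know that he is here."))  # Je sais qu'il est ici.
```

Keeping the three stages as separate functions mirrors the separate components mentioned above: in principle a different transfer and generation stage could be attached to the same analysis.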
Translation is a problem-solving activity, and its formalisation in computer translation systems implies that the significant differences between languages can, to some extent, be regularised.
a. compound nouns - combining one noun with another can often remove ambiguities, such as that of 'light' (not heavy, not dark, illumination). Including key words in systems to signify meaning aids translation: light bulb, light weight.
b. idioms - idioms such as 'the spirit is willing but the flesh is weak' present difficulties because the meaning of the whole differs from the literal meaning of the individual words. Treating idioms as units can overcome the problem.
c. metaphors - expressions such as 'mouth of river' are also treated as compound expressions and given appropriate rather than literal translations.
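Treating compounds, idioms and metaphors as units is commonly done by looking up the longest listed word sequence first. The sketch below assumes a hypothetical phrase table PHRASE_TABLE with French glosses; a single word such as 'light' still receives only one gloss, so the lookup does not resolve that ambiguity on its own.

```python
# Hypothetical phrase table: multi-word units (compounds, idioms, metaphors)
# are listed alongside single words. Values are French glosses.
PHRASE_TABLE = {
    ("light", "bulb"): "ampoule",
    ("mouth", "of", "river"): "embouchure",
    ("light",): "léger",
    ("bulb",): "bulbe",
    ("mouth",): "bouche",
    ("of",): "de",
    ("river",): "rivière",
}
MAX_UNIT = max(len(key) for key in PHRASE_TABLE)


def segment(words: list[str]) -> list[tuple[str, str]]:
    """Split words into translation units, preferring the longest listed unit."""
    units = []
    i = 0
    while i < len(words):
        # Try the longest possible unit first, then shorter ones.
        for n in range(min(MAX_UNIT, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in PHRASE_TABLE:
                units.append((" ".join(key), PHRASE_TABLE[key]))
                i += n
                break
        else:
            # No entry at all: pass the word through untranslated.
            units.append((words[i], words[i]))
            i += 1
    return units


print(segment("mouth of river".split()))  # [('mouth of river', 'embouchure')]
print(segment("light bulb".split()))      # [('light bulb', 'ampoule')]
print(segment("light box".split()))       # [('light', 'léger'), ('box', 'box')]
```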
The need for syntactic analysis is highlighted by one of the limitations of treating word groups as units, e.g.:
The water pressure is low
To fill the well water pressure is obtained from the pump
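In the second sentence, 'water pressure' cannot safely be translated as a compound group, because 'water' may instead belong with 'well' ('to fill the well water, pressure is obtained from the pump'); only syntactic analysis can decide.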
Two types of morphology exist:
1. inflectional morphology - illustrated by familiar verb and noun paradigms, e.g. French marcher, marche, marchons, marchait, etc.
2. derivational morphology - concerned with the formation of nouns from verbs, adjectives from nouns, etc., e.g. nation, nationalistic, nationalise.
An effective translation system must be capable of recognising and generating morphological forms to avoid word-for-word translation.
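Recognising and generating inflected forms can be sketched for regular French -er verbs such as marcher: strip a known ending, rebuild the infinitive and check it against a lexicon. The ending table and lexicon below are hypothetical and far from complete; irregular verbs and spelling changes are ignored.

```python
from typing import Optional

# Hypothetical, incomplete table of regular -er verb endings
# (present and imperfect indicative only).
ER_ENDINGS = {
    "er": "infinitive",
    "e": "present, 1st/3rd person singular",
    "es": "present, 2nd person singular",
    "ons": "present, 1st person plural",
    "ez": "present, 2nd person plural",
    "ent": "present, 3rd person plural",
    "ait": "imperfect, 3rd person singular",
}
LEXICON = {"marcher", "parler"}  # hypothetical list of known infinitives


def analyse_er_verb(form: str, lexicon: set[str]) -> Optional[tuple[str, str]]:
    """Recognition: return (infinitive, grammatical description) or None."""
    # Try longer endings first so that 'marchons' is not cut as 'marchon' + 's'.
    for ending in sorted(ER_ENDINGS, key=len, reverse=True):
        if form.endswith(ending):
            lemma = form[: -len(ending)] + "er"
            if lemma in lexicon:
                return lemma, ER_ENDINGS[ending]
    return None


def generate_er_verb(lemma: str, ending: str) -> str:
    """Generation: replace the infinitive ending with the requested one."""
    if not lemma.endswith("er") or ending not in ER_ENDINGS:
        raise ValueError("unsupported verb or ending")
    return lemma[:-2] + ending


for form in ("marche", "marchons", "marchait"):
    print(form, analyse_er_verb(form, LEXICON))
print(generate_er_verb("parler", "ons"))  # parlons
```

Checking the rebuilt infinitive against the lexicon is what stops an unrelated word such as 'table' from being analysed as a verb.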
Problems of ambiguity can also be resolved by syntactic analysis. For example, the English 'return', which can mean 'go back' or 'give back', can be disambiguated by the presence of a direct object:
She returned to the office
She returned the book
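The direct-object test can be approximated very crudely by looking at the word that follows the verb. This sketch assumes a hypothetical list NO_OBJECT_STARTERS of words that cannot begin a direct object; a real system would rely on a full syntactic parse instead.

```python
# Hypothetical list of words that, right after 'return', signal that no
# direct object follows (prepositions and adverbs).
NO_OBJECT_STARTERS = {"to", "from", "into", "at", "home", "later"}
RETURN_FORMS = {"return", "returns", "returned", "returning"}


def sense_of_return(sentence: str) -> str:
    """Guess 'go back' (no direct object) or 'give back' (direct object)."""
    words = sentence.lower().rstrip(".").split()
    for i, word in enumerate(words):
        if word in RETURN_FORMS:
            following = words[i + 1] if i + 1 < len(words) else None
            if following is None or following in NO_OBJECT_STARTERS:
                return "go back"    # intransitive use
            return "give back"      # a noun phrase follows: treat it as the object
    raise ValueError("no form of 'return' found")


print(sense_of_return("She returned to the office"))  # go back
print(sense_of_return("She returned the book"))       # give back
```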
Semantic roles in a structure signify the specific relationships of nominal elements (entities) to verbal elements (actions or states). Developers of translation systems face difficulties in identifying semantic roles, such as the ambiguous role of 'with':
Instrument - The bottle was opened with a corkscrew
Manner - The bottle was opened with difficulty
Context - The bottle was opened with the meal
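One simple way to assign the role of 'with' is to look up the semantic class of the noun it introduces. The noun classes below are hypothetical and cover only the three example sentences; taking the last word of the phrase as its head noun is also a simplifying assumption.

```python
# Hypothetical semantic classes for the nouns in the three examples.
NOUN_CLASS = {"corkscrew": "tool", "difficulty": "manner", "meal": "event"}
ROLE_FOR_CLASS = {"tool": "Instrument", "manner": "Manner", "event": "Context"}
DETERMINERS = {"a", "an", "the"}


def role_of_with(sentence: str) -> str:
    """Assign a semantic role to the phrase introduced by 'with'."""
    words = sentence.lower().rstrip(".").split()
    if "with" not in words:
        raise ValueError("no 'with' phrase found")
    after = [w for w in words[words.index("with") + 1:] if w not in DETERMINERS]
    if not after:
        return "Unknown"
    head = after[-1]  # simplifying assumption: the last word is the head noun
    return ROLE_FOR_CLASS.get(NOUN_CLASS.get(head), "Unknown")


for s in ("The bottle was opened with a corkscrew",
          "The bottle was opened with difficulty",
          "The bottle was opened with the meal"):
    print(role_of_with(s))
# Instrument
# Manner
# Context
```

Any noun missing from the table yields "Unknown", which is where a real system would need the richer lexical knowledge described in the language resources section.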
General knowledge about things and events being referred to can also help to solve ambiguity:
Pregnant women and children - only the knowledge that 'pregnant' does not apply to 'children' allows the correct interpretation. Translation systems therefore require some kind of human-like understanding.
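The 'pregnant women and children' case can be sketched as generating both bracketings of 'ADJECTIVE NOUN and NOUN' and discarding those that contradict world knowledge. The table APPLIES_TO is a hypothetical stand-in for the general knowledge such a system would need; when nothing is known about the adjective, both readings are kept.

```python
# Hypothetical world knowledge: the nouns each adjective can sensibly modify.
APPLIES_TO = {"pregnant": {"women", "mothers"}}


def bracketings(adj: str, n1: str, n2: str) -> list[dict]:
    """The two structural readings of 'ADJ N1 and N2'."""
    return [
        {"label": f"({adj} {n1}) and {n2}", "modified": [n1]},
        {"label": f"{adj} ({n1} and {n2})", "modified": [n1, n2]},
    ]


def plausible_readings(adj: str, n1: str, n2: str) -> list[str]:
    allowed = APPLIES_TO.get(adj)
    result = []
    for reading in bracketings(adj, n1, n2):
        # With no knowledge about the adjective, keep every reading.
        if allowed is None or all(n in allowed for n in reading["modified"]):
            result.append(reading["label"])
    return result


print(plausible_readings("pregnant", "women", "children"))
# ['(pregnant women) and children']
print(plausible_readings("old", "men", "women"))
# ['(old men) and women', 'old (men and women)']
```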
Despite the inherent difficulties, computer translation systems can deal with a range of linguistic problems with reasonable success. Although some problems are apparently unsolvable, continued research will produce gradual, if not dramatic, improvement.
There are many different languages in the world. Every language has its own characteristics, reflecting the culture, society and history of its country or region. The most relevant inherent features of a language are its phonemes (which define how it is spoken) and its alphabet (which defines how it is written). The features of each language, and the way its speakers express themselves, are different. There may be similarities owing to mutual influence between regions through history, but every group of people develops in its own way. Consequently, some languages have nothing in common, not only in their spoken or written form but also in body language and gestures.
The first problem is that no language can be used as a basis for translating all the others. For instance, Spanish could serve as a starting point for translating Italian or Portuguese, but it would be of no use for translating Chinese.
Another problem is the large number of languages in the world. All possible combinations of languages would have to be covered, for instance English-German, English-French, English-Spanish, German-French... This implies an enormous number of pairs, which is very difficult to manage with today's technology and methodology.
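The growth in the number of pairs can be made concrete. With n languages there are n(n-1)/2 unordered pairs, and since each pair needs a system in both directions (English-German and German-English), n(n-1) translation directions. The four languages in the example above already give 6 pairs and 12 directions; the other values of n in the loop are arbitrary illustrations.

```python
def language_pairs(n: int) -> int:
    """Unordered pairs: English-German is counted once."""
    return n * (n - 1) // 2


def translation_directions(n: int) -> int:
    """Each pair needs two directions, e.g. English->German and German->English."""
    return n * (n - 1)


for n in (4, 11, 100):
    print(n, language_pairs(n), translation_directions(n))
# 4 6 12
# 11 55 110
# 100 4950 9900
```

The count grows with the square of the number of languages, which is why covering every pair directly quickly becomes unmanageable.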
As a result, multilinguality is a problem that cannot be solved under present technological conditions. Research into, and the discovery of, new techniques to overcome it would completely change our capacity to interact with people all around the world.
The emergence of the information society and the demand for language services to handle information in digital form are opening a new employment market for specialists in translating this type of data. The localization industry has formed in response to the existence of information in electronic format; its chief role is to assist software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing and Web-based information in different languages. At present, software publishers such as Microsoft and Adobe generate 20% of their sales from products localised or adapted to the language and culture of their local markets. Beyond software, localization extends to home banking, enterprise resource planning, mobile phones and the Internet, with a growing need for translation products in the integrated global economy.
LISA is a consortium of schools training translators and computational linguists, announced in 1998 as an initiative to develop a promotional program for academic institutions in Europe, North America and Asia. The course was designed to deal with all aspects of the localization industry:
Globalization - the adaptation of market strategies to regional requirements (cultural, legal, linguistic)
Internationalization - the engineering of a product to enable efficient adaptation to local markets
Localization - the adaptation of a product to a target language and culture
Machine translation requires a huge effort in preparation, evaluation and maintenance. Efficient translation services work on the assumption that the solution lies in man-machine interaction and in the integration of tools into the translator's environment.
The Translation Workstation
The versatility of multilingual information services relies on the integration of linguistic tools. Such a working environment comprises the operating system, the document editor and the email client or Web browser, complemented with tools ranging from spelling, grammar and style checkers to online dictionaries.
Software Localization Tools
Localization packages are now being designed to assist users throughout the whole life-cycle of a multi-lingual document, through authoring, translation, preparation, validation and publishing. Such new systems help developers monitor different versions, variants and languages of product documentation.
The human factor must be considered in the machine translation process. The computer is merely a device to magnify human productivity and translation services require expert utilization of linguistic machine tools.
As Martin Kay (1992) has stated: "There is nothing that a person could know or feel or dream that could not be crucial for getting a good translation of one text or another. To be a translator, therefore, one cannot just have some parts of humanity; one must be a complete human being."
The limitations of machine translation become obvious when linguistic form, as opposed to content, becomes relevant, as in the translation of prose or poetry. Douglas Hofstadter's experiment in translating the 16th-century French poet Clément Marot's poem 'Ma Mignonne' into English using IBM's Candide system provides ample evidence: the result was described as an utter disaster.
Students of translation technology need an excellent aptitude for interpretation and literary translation and, naturally, specialised computer training. Today's courses should be designed to enable graduates to fulfil the following functions:
Consultant - a person sufficiently informed to advise potential users of translation technology
User - a trained person capable of using the computer and specialised translation software
Instructor - a person capable of training others
Evaluator - a person able to assess the value of software
Manager - a person responsible for acquisition of appropriate professionals and technological infrastructure to make a translation company profitable
Developer - a person involved in software development, integration and updating.
The training of professional translators is becoming not only necessary but essential, as companies in the field of software production complain about the lack of qualified personnel who combine a linguistic background with computational skills. The rationale for training personnel, and for other associated developments in the language engineering industry, lies in socio-economic issues: achieving a competitive market position and sustaining growth in the global economy as business becomes associated with language and culture. Europe is at the forefront of language technology thanks to its ability to capitalise on the wealth represented by its linguistic and cultural diversity. Its strengths include:
Technological excellence in human language services
Telecommunications, Content Industry, High-Technology manufacturing
The European Union commitment to encourage and preserve cultural and linguistic diversity and sustain cohesive development among its many cultures will maintain its position at the forefront of this industry.
The Internet is a macro-environment, so there are many different ways of using it for a variety of purposes. Consequently, more than one method for developing machine translation should be expected. It is believed that new specialized uses will appear that could make MT possible in the future. It is therefore difficult to estimate how MT will contribute to the Internet in the future.
The Internet is a channel allowing information to be transmitted or stored. The essential features of this channel can be summed up in four points: its operation is efficient; its extension is global; its use is flexible; and its form is electronic:
Efficient operation: Communication via the Internet is rapid (in some cases instantaneous), powerful (large volumes of traffic can be supported), reliable (messages are delivered with precision), and, once the necessary technological infrastructure and tools are in place, cheap in comparison to alternative channels of communication.
Global extension: The Internet renders geographical distances insignificant, turning the world into a "global village". Consequently, other obstacles to communication acquire greater relevance, including possession of the required technology (hence ultimately economic factors) and cultural differences (particularly language).
Flexible use: A wide and increasing variety of types of communication can be realized via the Internet, transmitting different sorts of content through different media; the only limits are the potential for such content and media to be digitalized, the capacity of current technology to perform such digitalization, and the availability of hardware and communications infrastructure with the required capacities and power.
Electronic form: The electronic nature of the channel is the key element behind the aforementioned features; it also implies other benefits. Anything that can be done electronically can be done via the Internet; hence, more and more of modern technology can employ the same common channel, including the numerous aspects of Information Technology which are beginning to emerge at the present time.
The Internet will become a means of delivering MT services and will also benefit greatly from them. For that reason, improving MT services will help make the Internet a global network that overcomes not only geographical barriers but also linguistic ones.
Europe is the place in the world where the most important research could be carried out. Its capacity for technological innovation, together with its cultural and linguistic variety, creates the need for a proper MT system. Other places in the world have the same technological level but too little linguistic diversity to appreciate how important MT is. Moreover, in some places where different languages coexist there is no technological capability to start a project like this one.
At present, the Internet is a channel where information is provided by transmitting data from an author to a recipient. This is a functional system for the first steps of a network like the Internet, but it has already begun to change. If the Internet did not change, it would keep growing with ever more information; consequently, information would become less manageable and efficiency would fall drastically. The structure of the Internet should be reformed into a more complex one, able to interact with the user to provide the information needed. The working process would consist of a request from the user; the network would then start collecting information and finally produce a report with the precise information the user requested.
Moreover, the Internet is also a tool for carrying out transactions. These transactions are multiplying on the Net: products can be purchased, and services are ordered and delivered. According to experts, electronic transactions will boom over the next few years, which means a significant increase in the number of transactions carried out on the Internet. As a result, more specialized functions will require more specialized technology, which may change the present structure of the Internet. As complexity and specialization grow, so will the solutions for machine translation. Although it is difficult to predict what will happen to the Internet in the future, new advances in technology will presumably help tackle the problems of MT.
Language engineering has become a specialized industry with the proliferation of information in a globalized economy. The very nature of language, in all its complexities and forms, presents a huge challenge to professionals in the field who develop systems capable of dealing with multilingual data. The future for students pursuing a career in information technology appears bright, given the growing demand for experts in this field. In terms of its impact on the development of modern society, the information age seems to have been accepted with optimism, with some doubts concerning whether responsibility for policy making lies with the public or the private sector.
Yadigaroglu, G. (2001). How to write a good report.
Browne, M. (1997). Information Policy for an Information Society.
Uszkoreit, H. (1996). What is computational linguistics?
University of Toronto. FAQs. http://www.fis.utoronto.ca/kmi/resources.html
Language engineering and the information society. http://sirio.deusto.es/abaitua/kontzeptu/nlp/echo/infoage.html
Abaitua, J. (1999). Is it worth learning translation technology? Universidad de Deusto.
Hutchins, W. J. (1991). Why computers do not translate better. Paper for Translating and the Computer 13, London, November 1991. University of East Anglia, Norwich, England.
Arnold, D. J. (1995). Translation problems.
Multilingual Information Management: Current Levels and Future Abilities (1999). A report commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency. April 1999.
Hutchins, J. (1995). Reflections on the history and present state of machine translation. Paper presented at the MT Summit, Luxembourg, 1995. University of East Anglia.
STOA Publications. Linguistic Diversity on the Internet: Assessment of the Contribution of Machine Translation. PE 289 662/Fin. St.
Australia as an Information Society: Grasping New Paradigms (1991). (Jones Report). Canberra, AGPS.
Bell, D. (1974). The Coming of Post-Industrial Society. London, Heinemann.
Braman, S. (1989). Defining information: an approach for policy makers. Telecommunications Policy, 13, 233-242.
Department of Industry, Science and Tourism (1997). The Global Information Economy: The Way Ahead. (Goldsworthy Report). Canberra, DIST.
Frederking, R. et al. (1999). Multilingual Information Management: Current Levels and Future Abilities.