ABSTRACT

In this report I will try to answer the following question: how can Language Technology benefit the Information Society? The themes I am going to deal with all concern language and new technologies, such as Machine Translation and the Internet, which are very important for the development of our society, as well as for the development of other sciences.

INTRODUCTION

Nowadays, there is no doubt that new technologies are essential for coping with our society. We live in a society where language has a lot to do with new technologies. In this report, my main objective is to show the connection between Language Technology and the Information Society, but it is not the only one. In order to answer the main questions about each theme, I have consulted several documents and reports (which will be mentioned later on) that were available on the Internet.

There are many issues related to my main theme, and these sub-themes are structured in the following way:

In the first part of my report I have established a basic connection with the Information Society; it is the first contact with the browser and the on-line materials.

The second part of the report, which coincides with the second week of the semester, deals with the problem of having too much information, or even too little. In connection with this theme, I have explained some terms, such as Language Engineering and its terminology.

The main objective of the third part is to review translation technology and its potential to help overcome the problem of multilinguality as a barrier to communication. The translation curriculum is important for professional interpreters and literary translators because it is useful for their jobs, as well as for the localization industry.

The fourth part is connected with the previous one because it is about Machine Translation as a way of overcoming language barriers. This section contains several examples of lexical and structural ambiguity, as well as idioms and so on.

Then, in the fifth part, I have concentrated more deeply on the development of Machine Translation: what it was like ten years ago, its major methods, techniques and approaches, and the main breakthroughs expected in the future.

The sixth and final part of this report continues to deal with machine translation, but this time from a different point of view. On this occasion the main theme is Linguistic Diversity on the Internet. Here I have discussed the fundamental features of the Internet and the role of minority languages on it.

In recent decades there has been great development in the field of information technology and computing. This advance has meant an important innovation for many people, in different ways. It has improved communications all over the world, and it is an undoubted help for many enterprises, as it gives them the chance to increase their profits and to develop the services they offer more quickly.

To sum up, I can say that what we call English Language and New Technologies is something that concerns many people all over the world.

What is the "Information Society"?

The term Information Society has been around for a long time now and, indeed, has become something of a cliché. The notion of the coming Information Society reminds me of the idea of the Sydney 2000 Olympics and the way it shimmers in the distance. We look towards the Olympics and resolve to prepare hard for it. We must rapidly transform ourselves, our city, our demeanour to be ready and worthy. Time is of the essence in making ourselves ready for the challenge. There is a certain breathlessness in all of this rhetoric. The same can be said of many of the documents and writings on the Information Society. The recent Department of Industry, Science and Tourism's Goldsworthy report on the Global Information Economy urges "...time is short, and the need for action is urgent. Government must grasp the challenge now." (Department of Industry, Science and Tourism, 1997:7). But when you push past the rhetoric and the sense of urgency being conveyed, what is the reality of the Information Society? What, in particular, do policy makers think it is?

In the European Union, the concept of the Information Society has been evolving strongly over the past few years, building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies.

Closer to home, it is instructive to look at just a few policy (or would-be policy) documents to see the views of the Information Society dominant here. The Goldsworthy report sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world" (Department of Industry, Science and Tourism, 1997). In short, Goldsworthy's idea of the Information Society is entirely an economic one. At a broader level, Barry Jones, the author of the House of Representatives Standing Committee's 1991 report 'Australia as an Information Society', sets out a definition of the Information Society which sees it as simply "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology" (Australia as an Information Society: Grasping New Paradigms, 1991).

1. Technological

This notion of the Information Society focuses on gee-whiz technology as epitomised by the 'Towards 2000' TV series. In recent times, the emphasis is on the convergence of computers and telecommunications and the capacity for storage, manipulation and transmission of vast amounts of data. The Goldsworthy Report sits squarely in this category, following earlier Australian reports such as the Broadband Services Expert Group's document (Broadband Services Expert Group, 1994). The problem is, however, that drawing a direct line between the presence of information technology and some sort of new society is hard to justify. Will the presence of, say, a computer in every home make us an Information Society? Or should that be two computers? At what point will we know we've arrived? What changes in our fundamental institutions, ways of living and working characterise an Information Society, as opposed to a non-Information Society? A further weakness of this concept is highlighted by the many commentators who point out the dangers of technological determinism in thinking about the Information Society and reject the view that technology impacts on society and is the prime agent of change, defining the social world (Webster, 1995:10).

2. Economic

This concept of the Information Society has been built on Fritz Machlup's seminal study of the size and effect of the US information industries in the 1960s. Machlup demonstrated that education, the media, computing, information services (including insurance, law and other information-based professions), R&D and so on accounted for some 30% of GNP (Machlup, 1962). Marc Porat continued this line of enquiry and demonstrated the rising proportion of information-related activities in the US economy (Porat, 1977). Barry Jones replicated this work for Australia in his highly cited Sleepers, Wake! (Jones, 1983). More recently, an ABC "Background Briefing" programme on the Information Economy highlighted the significance of the value of logical structures, the expression of cognitive processes, within computer software. This was referred to as the "weightless economy".

Entrancing as it is to have numbers to quote in support of the importance of information in the economy, it is difficult to argue that the existence of lots of information activities in society actually impacts on social life, without moving to an analysis of the substance or quality of that information. In any event, what matters, surely, is not the amount but the meaning and value of information. Some econometric studies suggest that the early exponential growth of information activities as a proportion of economic activities has actually slowed down, with little change from 1958 to 1980. This hardly supports the idea that information is growing steadily in its dominance (Rubin and Huber, 1986).

And there is the added difficulty of applying economic concepts to the creation, processing, flow and use of information. Sandra Braman's analysis shows the pitfalls of thinking of information as a commodity, as this fails to accommodate the fact that many forms of activity around information are not driven by market forces, for example, culturally transmitted information. Nor does an economic approach acknowledge the inappropriateness of many basic economic assumptions, given that form and substance of information are not the same thing. Finally, there is the difficulty that economic approaches require information to be measured in terms of discrete pieces for economic valuation (Braman, 1996).

3. Occupational

This idea of the Information Society rests on the idea that in an Information Society the dominant category of worker is engaged as an "information worker". Many commentators have produced data to demonstrate growth patterns in the need for more workers who will use their brain rather than their brawn. Daniel Bell's influential 'The Coming of Post-Industrial Society' argued that the professional and technical classes would dominate in the new era, with work organised around theoretically based knowledge for the purpose of social control and the directing of innovation and change (Bell, 1974:15-20). Analyses of census data support the view that there are vast armies working in information (Porat, 1977; Jones, 1983). Jonscher's analysis of the role of information resources in productivity increases in the US economy demonstrated the scale, and categorised workers as belonging to either an 'Information Sector', where creating, processing and handling information dominates, or a 'Production Sector', which is concerned with the production and handling of physical goods (Jonscher, 1983).

As the former head of a school of Information Studies, I have real doubts about the usefulness of the figures in these analyses. Conscientious attempts by myself and colleagues to analyse market demand for graduates in Information Studies led to immense frustration as we grappled with the poor descriptive powers of job titles, and advertisements in general, in relation to the information activities in a given position. Some were fairly obvious - Data Base Designer, Librarian, Information Manager, Research Officer - but we quickly found that lurking beneath just about every position described in the Saturday advertisements was some component of information handling and processing. The challenge was to find a way of saying definitively whether a job was predominantly an information professional's job or not. This difficulty may not be enough on its own to say that occupational trends cannot be reliably tapped and used as an indicator of broad developments over time, but it suggests the basis of the Bell, Porat and Jones studies is probably more than a bit wobbly.

A quick consideration of Jonscher's two categories of worker applied to publishers and booksellers points up the same difficulty at a more general level. If workers deal with tangible products such as books, are they production workers or are they information workers? The dilemma comes from the reality that just about everyone's job has some information activities embedded in it, so that deciding when information handling dominates to the point where the worker is an "information worker" rather than a "production worker" is simply too hard. It has to be concluded, then, that attempts to define an Information Society according to the number of people in the business of information are problematic. Consequently, measurement of trends in employment in information work, or comparison between societies to decide which, if any, is an Information Society, seems destined to be highly unreliable.

Webster has identified two more concepts of the Information Society which I will mention only briefly here, namely, spatial and cultural. Firstly, there is the spatial idea of the Information Society as a networked society, a global village where people of like minds and purposes are linked together through electronic networks.
This idea is now coming through in some EU Information Society policy documents in the idea of the Information Society as a mechanism for developing cultural cohesion, empowerment and integration of communities across the Union (European Union, 1996a). I think it would be fair to say also that Australian information policy documents incorporate both cultural and spatial concepts of the Information Society. The Broadband Services Expert Group final report dealt with the question of equity of access (regardless of geography) and called for communication and information infrastructure developments to build on community and individual user need rather than technological capacity (Broadband Services Expert Group, 1994:5). The Jones report mentioned earlier, while focusing on economic and occupational aspects, acknowledges the Information Society as a period in which use of time and family life will be influenced by access to information technology.

Looking to the implications of these varied ideas of the Information Society for public policy making, it is clear there was a time when policy was clearly the business of the public sector and was essentially about "what governments choose to do and what not to do" (Dye, 1995). The trouble now is that the edges of the public and private spheres are becoming more difficult to distinguish, as has been amply demonstrated by papers in this strand of the Conference. It is interesting that the field of information studies has in some ways anticipated this development, as it has accepted the place of private sector organisational policy on information matters to be recognised as "information policy", even though, at least traditionally, these policies were turned inwards to the support of organisational roles. With the general global drive to interweave public and private sector activities within market-led, neo-liberal frameworks, the burgeoning information and IT infrastructure within governments cannot be considered adequately without looking at the interaction of public and private sectors. The private sector can have monumental effects on what governments can do with information for their own use or in the context of making information available to the community at large. Take, for example, the decision to concentrate Microsoft and Apple interests. This cannot but impact on government through the extension of control of the IT and software industries. This effect is even more pointed when governments operate along strictly market philosophies and for-profit activities are incorporated in the government sector.

Some understanding of how the fusion of public and private impacts on information policy can be gained from Nick Moore's analysis of Western and East Asian information policy implementation strategies (Moore, 1997). Moore argues that there are two broad approaches to information policy formation. One, the neo-liberal, puts its trust in the market to move society along towards the Information Society. The European Union policies illustrate this particularly well, as there the basic tenet of information policy is the belief that the achievement of the Information Society "is a task for the private sector", with the role of government confined to ensuring a supportive regulatory climate and a refocussing of current public expenditure patterns. Bangemann is adamant that additional public money, subsidies or protectionism will not be available and talks about the need to "strike down entrenched positions which put Europe at a competitive disadvantage".
The role of government is strictly limited to providing a regulatory framework for a partnership of private and public sectors (European Commission, 1994:3). It seems ironic in one sense, but also understandable, that a number of countries which might at first glance seem likely to adopt market-driven strategies actually drive their information policy with strong interventions of government, with Singapore being an obvious example (Moore, 1997). Many Australian reports and information policy documents also fall into this more interventionist or dirigiste category, with the Goldsworthy report the most recent of a long line of reports calling for direct government assistance to the private sector (Department of Industry, Science and Tourism, 1997).

In conclusion, it can be said that there is a generally optimistic response to the idea of the Information Society and it is mostly enthusiastically endorsed as desirable. Many go further and say that it is absolutely essential for nations and regions to become an Information Society. There are, however, many conceptions of the Information Society, which means that there is an ambiguous foundation for policy makers. Added to this is the complexity of different political philosophies which impact on the implementation of information policy. This complexity is further compounded when we start to look at the informational component of the Information Society.

http://sirio.deusto.es/abaitua/konzeptu/nlp/Browne_M.html

What is the role of HLTCentral.org?

The HLTCentral web site was established as an online information resource on human language technologies and related topics of interest to the HLT community at large. It covers news, R&D, technological and business developments in the field of speech, language, multilinguality, automatic translation, localisation and related areas. Its coverage of HLT news and developments is worldwide - with a unique European perspective.

Two EU-funded projects, ELSNET and EUROMAP, are behind the development of HLTCentral.

EUROMAP ("Facilitating the path to market for language and speech technologies in Europe") - aims to provide awareness, bridge-building and market-enabling services for accelerating the rate of technology transfer and market take-up of the results of European HLT.

ELSNET ("The European Network of Excellence in Human Language Technologies") - aims to bring together the key players in language and speech technology, both in industry and in academia, and to encourage interdisciplinary co-operation through a variety of events and services.

http://www.hltcentral.org/htmlengine.shtml?id=615

Why are language technologies so important for the Information Society?

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

Applied CL focusses on the practical outcome of modelling human language use. The methods, techniques, tools and applications in this area are often subsumed under the term language engineering or (human) language technology. Although existing CL systems are far from achieving human ability, they have numerous possible applications. The goal is to create software products that have some knowledge of human language. Such products are going to change our lives. They are urgently needed for improving human-machine interaction, since the main obstacle in the interaction between human and computer is a communication problem. Today's computers do not understand our language, but computer languages are difficult to learn and do not correspond to the structure of human thought. Even if the language the machine understands and its domain of discourse are very restricted, the use of human language can increase the acceptance of software and the productivity of its users.

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_what_cl.htm

Why is "knowledge" of more value than "information"?

Consider a document containing a table of numbers indicating product sales for the quarter. As they stand, these numbers are Data. An employee reads these numbers, recognizes the name and nature of the product, and notices that the numbers are below last year’s figures, indicating a downward trend. The data has become Information. The employee considers possible explanations for the product decline (perhaps using additional information and personal judgment), and comes to the conclusion that the product is no longer attractive to its customers. This new belief, derived from reasoning and reflection, is Knowledge.

Thus, information is data given context, and endowed with meaning and significance. Knowledge is information that is transformed through reasoning and reflection into beliefs, concepts, and mental models.

Information management is the harnessing of the information resources and information capabilities of the organization in order to add and create value both for itself and for its clients or customers. Knowledge management is a framework for designing an organization’s goals, structures, and processes so that the organization can use what it knows to learn and to create value for its customers and community. A KM framework involves designing and working with the following elements:

Categories of organizational knowledge (tacit knowledge, explicit knowledge, cultural knowledge)

Knowledge processes (knowledge creation, knowledge sharing, knowledge utilization)

Organizational enablers (vision and strategy; roles and skills; policies and processes; tools and platforms)

IM provides the foundation for KM, but the two are focused differently. IM is concerned with processing and adding value to information, and the basic issues here include access, control, coordination, timeliness, accuracy, and usability. KM is concerned with using the knowledge to take action, and the basic issues here include codification, diffusion, practice, learning, innovation, and community building.

TD Wilson emphatically differentiates 'information' from 'knowledge'. In addition he notes that knowledge is a property of an individual and cannot be directly transmitted; information and data can. [I would add that a corporation, a learning community and human systems, generally, can meet the criteria for knowing. See what you think after you read what I say below.]

"Knowledge is power, but information is not. It's like the detritus that a gold-panner needs to sift through in order to find the nuggets." D. Lewis

http://sirio.deusto.es/abaitua/konzeptu/nlp/infomanage.htm

http://radio.weblogs.com/0106698/2002/10/22.html

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm#knowledge

Does the possession of large quantities of data imply that we are well informed?

According to Gilbert Ryle, "Know is a capacity verb, and a capacity verb of that special sort that is used for signifying that the person described can bring things off, or get things right." Thus, an observer saying, "Agnes can make bread" is making a statement about her knowledge. The claim as to her knowledge is based on observing her interactions with and effects upon her environment.

Karl Popper, Jean Piaget, and Dewey also see knowledge as something which allows one to achieve goals; to get it right [the relation between a goal and an action which will transform situations so as to allow realization of the goal] is to be able to transform situations so as to match goals. Those philosophers also see knowledge as being achieved in successive stages in which an individual applies capacity for critical analysis to her/his most recent experience. By doing so, that individual has altered his/her method for producing the transformations that s/he desires. At some point or another, after multiple alterations in approach and successive enhancements in the ability to produce results under widely varying conditions, it is obvious that the individual is 'getting things right'. Whether the transformations that are produced are referred to as something specific (e.g., 'giving an order in a restaurant' or 'setting the table' or 'bringing a meeting to order') or more general (e.g., cooking, fishing, boat building, writing code, blacksmithing, horticulture, child raising or teaching), the knowledgeable individual is seen to be able to achieve a quality result through her/his own efforts under a variety of conditions.

"Better training in separating essential data from material that, no matter how interesting, is irrelevant to the task at hand is needed." D. Lewis

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm#brain

How many words of technical information are recorded every day?

Statistics from Reuters (1997) show that every day 20 million words of technical information are recorded. In human terms this means that, assuming a reading speed of 1,000 words per minute, it would take six weeks of 8-hour days to read one day's output. Upon completion, any sense of accomplishment would be quickly dulled by the realisation that the reader had fallen five and a half years behind.
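A quick back-of-the-envelope check of these figures can be sketched in a few lines of Python. The reading speed and the 8-hour day are simply the assumptions quoted above; the "five and a half years" figure presumably compounds the backlog that keeps accumulating while one reads.

```python
# Rough check of the Reuters (1997) figures quoted above.
WORDS_PER_DAY = 20_000_000   # technical words recorded per day (claimed)
READING_SPEED = 1_000        # words per minute (assumption in the source)
WORKDAY_MINUTES = 8 * 60     # an 8-hour reading day

days_to_read_one_day = WORDS_PER_DAY / (READING_SPEED * WORKDAY_MINUTES)
print(f"Days needed to read one day's output: {days_to_read_one_day:.1f}")     # ~41.7
print(f"That is roughly {days_to_read_one_day / 7:.1f} weeks of 8-hour days")  # ~6.0

# Meanwhile new material keeps piling up, so the reader keeps falling behind.
backlog_words = days_to_read_one_day * WORDS_PER_DAY
print(f"Words recorded while reading: {backlog_words:,.0f}")
```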

http://www.p-jones.demon.co.uk/infintro.htm

What is the most convenient way of representing information? Why?

Language is the natural means of human communication; the most effective way we have to express ourselves to each other. We use language in a host of different ways: to explain complex ideas and concepts; to manage human resources; to negotiate; to persuade; to make our needs known; to express our feelings; to narrate stories; to record our culture for future generations; and to create beauty in poetry and prose. For most of us language is fundamental to all aspects of our lives.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lt

How can computer science and language technologies help manage information?

Language Engineering can improve the quality of information services by using techniques which not only give more accurate results to search requests, but also greatly increase the possibility of finding all the relevant information available. The use of techniques like concept searches, i.e. using a semantic analysis of the search criteria and matching them against a semantic analysis of the database, gives far better results than simple keyword searches.
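As a toy illustration of the difference, the sketch below maps both the query and the documents onto hand-written concept labels before matching. The concept table and the three documents are invented for the example and are not part of any real system.

```python
# Minimal sketch: keyword search vs. a (very simplified) concept search.
CONCEPTS = {   # invented word-to-concept table
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "doctor": "physician", "physician": "physician",
}
DOCUMENTS = {  # invented mini-collection
    1: "The automobile industry reported record sales",
    2: "Your physician can advise on vaccination",
    3: "Truck drivers face new regulations",
}

def keyword_search(query):
    terms = query.lower().split()
    return [d for d, text in DOCUMENTS.items()
            if any(t in text.lower().split() for t in terms)]

def concept_search(query):
    query_concepts = {CONCEPTS.get(t, t) for t in query.lower().split()}
    return [d for d, text in DOCUMENTS.items()
            if query_concepts & {CONCEPTS.get(t, t) for t in text.lower().split()}]

print(keyword_search("car"))   # [] -- no literal match anywhere
print(concept_search("car"))   # [1, 3] -- matched through the 'vehicle' concept
```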

One of the major, direct benefits of the Information Society for the ordinary citizen will be the improvement in public service information. However, the wide accessibility of this information will depend upon Language Engineering. People who are not familiar with the conventional user interface of a computer system will be able to request information by voice and the system will guide them through the possibilities. Those who want information about other countries, which may be held in a foreign language, will be able to receive it in their own language. A good example of this is a service which is currently being developed which will provide information about job opportunities across the European Union in the native language of the potential applicant. Obviously these are jobs where language skills are not significant. The service will be available on the Internet and it is also planned to have public booths where job seekers can use the service. In a mono-lingual pilot service run in Flanders, a surprising 26% of applications for jobs were received from applicants who had seen the details on the Internet.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#bi

Why can language sometimes be seen as a barrier to communication? How can this change?

Communication is probably the most obvious use of language. On the other hand, language is also the most obvious barrier to communication. Across cultures and between nations, difficulties arise all the time not only because of the problem of translating accurately from one language to another, but also because of the cultural connotations of words and phrases. A typical example in the European context is the word 'federal', which can mean a devolved form of government to someone who already lives in a federation, but to someone living in a unitary sovereign state, it is likely to mean the imposition of another level of more remote, centralised government.

As the application of language knowledge enables better support for translators, with electronic dictionaries, thesauri, and other language resources, and eventually when high quality machine translation becomes a reality, so the barriers will be lowered. Agreements at all levels, whether political or commercial, will be better drafted more quickly in a variety of languages. International working will become more effective with a far wider range of individuals able to contribute. An example of a project which is successfully helping to improve communications in Europe is one which interconnects many of the police forces of northern Europe using a limited, controlled language which can be automatically translated, in real-time. Such a facility not only helps in preventing and detecting international crime, but also assists the emergency services to communicate effectively during a major incident.
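The reason a controlled language makes real-time automatic translation feasible is that every sentence comes from a closed, pre-translated inventory. The sketch below illustrates only that basic idea; the two phrases and their translations are invented and have nothing to do with the actual police network mentioned above.

```python
# Toy controlled-language translation with a fixed phrase table (invented data).
PHRASE_TABLE = {
    "request backup at location": {"de": "Verstärkung am Einsatzort anfordern",
                                   "fr": "demander des renforts sur place"},
    "suspect vehicle is red":     {"de": "verdächtiges Fahrzeug ist rot",
                                   "fr": "le véhicule suspect est rouge"},
}

def translate(sentence: str, target: str) -> str:
    """Translate a controlled-language sentence, or reject it if unapproved."""
    entry = PHRASE_TABLE.get(sentence.strip().lower())
    if entry is None:
        raise ValueError("sentence is not in the controlled language")
    return entry[target]

print(translate("Request backup at location", "fr"))
```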

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#nlu

In what ways does Language Engineering improve the use of language?

Communication is probably the most obvious use of language. On the other hand, language is also the most obvious barrier to communication. Across cultures and between nations, difficulties arise all the time not only because of the problem of translating accurately from one language to another, but also because of the cultural connotations of words and phrases. A typical example in the European context is the word 'federal', which can mean a devolved form of government to someone who already lives in a federation, but to someone living in a unitary sovereign state, it is likely to mean the imposition of another level of more remote, centralised government.

As the application of language knowledge enables better support for translators, with electronic dictionaries, thesauri, and other language resources, and eventually when high quality machine translation becomes a reality, so the barriers will be lowered. Agreements at all levels, whether political or commercial, will be better drafted more quickly in a variety of languages. International working will become more effective with a far wider range of individuals able to contribute. An example of a project which is successfully helping to improve communications in Europe is one which interconnects many of the police forces of northern Europe using a limited, controlled language which can be automatically translated, in real-time. Such a facility not only helps in preventing and detecting international crime, but also assists the emergency services to communicate effectively during a major incident.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#ec

Language Technology, Language Engineering and Computational Linguistics. Similarities and differences.

Language technologies are information technologies that are specialized for dealing with the most complex information medium in our world: human language. Therefore these technologies are also often subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Whereas speech is the oldest and most natural mode of language communication, complex information and most of human knowledge is maintained and transmitted in written texts. Speech and text technologies process or produce language in these two modes of realization. But language also has aspects that are shared between speech and text such as dictionaries, most of grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology had to create formal representation systems that link language to concepts and tasks in the real world. This provides the interface to the fast growing area of knowledge technologies.

In our communication we mix language with other modes of communication and other information media. We combine speech with gestures and facial expressions. Digital texts are combined with pictures and sounds. Movies may contain language in spoken and written form. Thus speech and text technologies overlap and interact with many other technologies that facilitate the processing of multimodal communication and multimedia documents.

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.

Applied CL focusses on the practical outcome of modelling human language use. The methods, techniques, tools and applications in this area are often subsumed under the term language engineering or (human) language technology. Although existing CL systems are far from achieving human ability, they have numerous possible applications. The goal is to create software products that have some knowledge of human language. Such products are going to change our lives. They are urgently needed for improving human-machine interaction, since the main obstacle in the interaction between human and computer is a communication problem. Today's computers do not understand our language, but computer languages are difficult to learn and do not correspond to the structure of human thought. Even if the language the machine understands and its domain of discourse are very restricted, the use of human language can increase the acceptance of software and the productivity of its users.

Much older than communication problems between human beings and machines are those between people with different mother tongues. One of the original aims of applied computational linguistics has always been fully automatic translation between human languages. From bitter experience scientists have realized that they are still far away from achieving the ambitious goal of translating unrestricted texts. Nevertheless, computational linguists have created software systems that simplify the work of human translators and clearly improve their productivity. Less than perfect automatic translations can also be of great help to information seekers who have to search through large amounts of texts in foreign languages.

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#wile

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_what_cl.htm

Which are the main techniques used in Language Engineering?

There are many techniques used in Language Engineering and some of these are described below.

Speaker Identification and Verification:

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying the speaker reliably despite temporary changes (such as those caused by illness).
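In outline, verification comes down to comparing features extracted from a fresh utterance with a stored template for the claimed speaker and accepting the claim only if they are close enough. The sketch below shows just that decision step; the three-number "voiceprints" and the threshold are invented, whereas real systems use rich acoustic features.

```python
import math

# Toy speaker verification with invented feature vectors.
ENROLLED = {"alice": [0.9, 0.1, 0.4], "bob": [0.2, 0.8, 0.5]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def verify(claimed_id, utterance_features, threshold=0.95):
    """Accept the identity claim only if the new sample matches the template."""
    return cosine(ENROLLED[claimed_id], utterance_features) >= threshold

print(verify("alice", [0.85, 0.15, 0.42]))  # True  -- close to Alice's template
print(verify("alice", [0.25, 0.75, 0.50]))  # False -- sounds more like Bob
```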

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose.

There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.
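Real recognisers score statistical models (typically hidden Markov models) trained on the corpora described above. Purely as an illustration of the final lexicon-matching step, the sketch below picks the word whose canonical phoneme sequence is closest to a noisily decoded phoneme string; the three-word pronouncing lexicon is invented.

```python
from difflib import SequenceMatcher

# Toy isolated-word recognition against an invented pronouncing lexicon.
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "speed":  ["S", "P", "IY", "D"],
    "peach":  ["P", "IY", "CH"],
}

def recognise(observed_phonemes):
    """Return the lexicon word whose pronunciation best matches the input."""
    def score(word):
        return SequenceMatcher(None, LEXICON[word], observed_phonemes).ratio()
    return max(LEXICON, key=score)

# A noisy decoding with one spurious phoneme still finds the right word.
print(recognise(["S", "P", "IY", "T", "CH"]))   # 'speech'
```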

Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition:

recognition of printed images, referred to as Optical Character Recognition (OCR)

recognising handwriting, usually known as Intelligent Character Recognition (ICR)

OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences.
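The word-recognition step mentioned above can be pictured as follows: expand the uncertain character positions into candidate strings, keep only those found in a lexicon, and rank the survivors with a simple language model. In the sketch below both the lexicon and its word frequencies are invented.

```python
from itertools import product

# Toy ICR word recognition: per-position character hypotheses are combined,
# filtered through a lexicon and ranked by (invented) word frequencies.
LEXICON_FREQ = {"dear": 85, "bear": 60, "clear": 120}

def recognise_word(char_hypotheses):
    """char_hypotheses: one set of candidate characters per position."""
    candidates = ("".join(chars) for chars in product(*char_hypotheses))
    in_lexicon = [w for w in candidates if w in LEXICON_FREQ]
    return max(in_lexicon, key=LEXICON_FREQ.get) if in_lexicon else None

# The recogniser was unsure about the first character: 'b', 'd' or 'o'.
hypotheses = [{"b", "d", "o"}, {"e"}, {"a"}, {"r"}]
print(recognise_word(hypotheses))   # 'dear' -- 'oear' is rejected, 'dear' outranks 'bear'
```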

Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.

Natural Language Understanding:

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels.

Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge.

Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information.
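A minimal sketch of this idea: words from queries in different languages are mapped to shared, language-independent concept identifiers, and documents are indexed by those concepts rather than by surface words. The concept table and document index below are invented.

```python
# Toy semantic model: surface words in two languages map to shared concept IDs.
WORD_TO_CONCEPT = {
    "flight": "C_FLIGHT", "vuelo": "C_FLIGHT",
    "price":  "C_PRICE",  "precio": "C_PRICE",
}
DOC_INDEX = {   # invented documents, indexed by concept rather than by word
    "fares.txt":   {"C_FLIGHT", "C_PRICE"},
    "baggage.txt": {"C_BAGGAGE"},
}

def search(query: str):
    concepts = {WORD_TO_CONCEPT[w] for w in query.lower().split()
                if w in WORD_TO_CONCEPT}
    return [doc for doc, indexed in DOC_INDEX.items() if concepts & indexed]

print(search("precio vuelo"))   # ['fares.txt'] -- Spanish query, English document
```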

Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need to be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid the generation of a high quality text.

Natural Language Generation:

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion: either in a chosen language or according to stylistic specifications provided by a text planning system.
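As a toy example of that mapping, the sketch below fills language-specific templates from one and the same underlying "meaning"; the templates, slots and example frame are all invented.

```python
# Toy natural language generation: one semantic frame, two surface realisations.
TEMPLATES = {
    "en": "The {product} sold {units} units in {quarter}.",
    "es": "El producto {product} vendió {units} unidades en {quarter}.",
}

def generate(semantics: dict, language: str) -> str:
    """Map an underlying meaning (a slot/value frame) to a surface sentence."""
    return TEMPLATES[language].format(**semantics)

frame = {"product": "X200", "units": 1200, "quarter": "Q3"}
print(generate(frame, "en"))
print(generate(frame, "es"))
```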

Speech Generation:

Speech is generated from filled templates, by playing 'canned' recordings or by concatenating units of speech (phonemes, words) together. Generated speech has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.

Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or by synthesising speech using rules.

Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t

Which language resources are essential components of Language Engineering?

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding.

The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA).

Lexicons:

A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), and the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.
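One simple way to picture such a repository is as structured entries keyed by word form. The sketch below holds the kinds of knowledge just listed; the two entries are invented examples, not taken from a real resource.

```python
from dataclasses import dataclass, field

# A toy lexicon entry carrying morphology, phonology and senses (invented data).
@dataclass
class LexEntry:
    lemma: str
    pos: str                                          # part of speech
    morphology: dict = field(default_factory=dict)    # e.g. inflected forms
    phonology: str = ""                               # rough pronunciation
    senses: list = field(default_factory=list)        # context-dependent meanings

LEXICON = {
    "bank": LexEntry("bank", "noun",
                     morphology={"plural": "banks"},
                     phonology="/bæŋk/",
                     senses=["financial institution", "edge of a river"]),
    "run":  LexEntry("run", "verb",
                     morphology={"past": "ran", "participle": "run"},
                     phonology="/rʌn/",
                     senses=["move quickly", "operate (a program)"]),
}

print(LEXICON["bank"].senses)
```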

Specialist Lexicons

There are a number of special cases which are usually researched and produced separately from general-purpose lexicons.

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, or persons, or maybe animals. They take on a special significance in many applications, however, where the name is key to the application, such as in a voice-operated navigation system, a holiday reservations system, or a railway timetable information system based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.
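A wordnet can be thought of as a small graph of such lexical relations. The toy fragment below shows how synonym links might be used to expand a retrieval query; the relations are invented.

```python
# Toy wordnet fragment (invented relations) used for query expansion.
RELATIONS = {
    "car": {"synonym": {"automobile", "motorcar"}, "antonym": set()},
    "big": {"synonym": {"large"}, "antonym": {"small"}},
}

def expand_query(terms):
    expanded = set(terms)
    for term in terms:
        expanded |= RELATIONS.get(term, {}).get("synonym", set())
    return expanded

print(expand_query({"big", "car"}))
# {'big', 'large', 'car', 'automobile', 'motorcar'}  (set order may vary)
```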

Grammars:

A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).
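A grammar of this kind can be written down as a small set of rewrite rules. The toy context-free grammar below, with a naive top-down recogniser, covers only surface syntax; the rules and the vocabulary are invented.

```python
# A toy context-free grammar and a naive recogniser (invented rules and words).
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N":   [["dog"], ["ball"]],
    "V":   [["chases"], ["sleeps"]],
}

def parse(symbol, tokens, pos):
    """Return the set of positions reachable after expanding `symbol` at `pos`."""
    if symbol not in GRAMMAR:   # terminal word
        return {pos + 1} if pos < len(tokens) and tokens[pos] == symbol else set()
    results = set()
    for rule in GRAMMAR[symbol]:
        positions = {pos}
        for part in rule:
            positions = {q for p in positions for q in parse(part, tokens, p)}
        results |= positions
    return results

def accepts(sentence):
    tokens = sentence.lower().split()
    return len(tokens) in parse("S", tokens, 0)

print(accepts("The dog chases a ball"))   # True
print(accepts("Dog the chases"))          # False
```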

Corpora

A corpus is a body of language, either text or speech, which provides the basis for:

There are national corpora of hundreds of millions of words but there are also corpora which are constructed for particular purposes. For example, a corpus could comprise recordings of car drivers speaking to a simulation of a control system, which recognises spoken commands, which is then used to help establish the user requirements for a voice operated control system for the market.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lr

Check for the following terms:

[p] Natural language processing (NLP): a term in use since the 1980s to define a class of software systems which handle text intelligently

[p] Translator's workbench: a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc.

[p] Shallow parser: software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective

[n] Formalism: a means to represent the rules used in the establishment of a model of linguistic knowledge

[p] Alignment: the process of aligning different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions

[p] Authoring tools: facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents

[p] Controlled language (also artificial language): language which has been designed to restrict the number of words and the structure of the language used, in order to make language processing easier; typical users of controlled language work in an area where precision of language and speed of response is critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.

[n] Domain: usually applied to the area of application of the language enabled software, e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted, so the language resource requirements are effectively limited by limiting the domain of application

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#gcut

In the translation curricula, which factors make technology more indispensable?

When discussing the relevance of technological training in the translation curricula, it is important to clarify the factors that make technology more indispensable and show how the training should be tuned accordingly. The relevance of technology will depend on the medium that contains the text to be translated. This particular aspect is becoming increasingly evident with the rise of the localization industry, which deals solely with information in digital form. There may be no other imaginable means for approaching the translation of such things as on-line manuals in software packages or CD-ROMs with technical documentation than computational ones.

What should students learn about translation technology? As we now know, there is no single answer to this question. Technological skills will depend on how students see their own future as translators. Those with good aptitudes for interpreting or literary translation could treat technology as secondary. However, it is clear that the vast majority of students should be prepared to satisfy the growing demand for specialists in technical documentation, and in particular the demand from the localization industry. Thus, training centers should seriously consider introducing the LEIT initiative into their training curricula.

Apart from a basic common computational background, these would include official and industrial standards in office automation (word-processing, database maintenance, spreadsheet management, Internet browsing, emailing, etc.); students should have realistic knowledge of some specific translation technology, ideally in the form of a translation workstation. However, it is important to realize that software is constantly evolving, that software and hardware updating is expensive, and that key concepts and skills may be equally well acquired with tools which are two or three years old. What is most important is becoming competent with the basic functional operations such as file and window management, editing, and net interaction. More specialized operations will be easily acquired on top of the basic ones, and will largely depend on the student's natural affinity for the computer. I would recommend at least one year of basic computer training before attempting any specialization.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Do professional interpreters and literary translators need translation technology? Which are the tools they need for their job?

The traditional crafts of interpreting natural speech or translating printed material, which are peripheral to technology, may still benefit from technological training slightly more than anecdotally. It is clear that word processors, on-line dictionaries and all sorts of background documentation, such as concordances or collated texts, besides e-mail or other ways of network interaction with colleagues anywhere in the world, may substantially help the literary translator's work.

With the exception of a few eccentrics or maniacs, it will be rare in the future to see good professional interpreters and literary translators not using more or less sophisticated and specialized tools for their jobs, comparable to the familiarization with tape recorders or typewriters in the past. In any case, this might be something best left to the professional to decide, and may not be indispensable.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

In what ways is documentation becoming electronic? How does this affect the industry?

The greatest number of jobs for our students is in the localization market. Information of many types is rapidly changing format and going digital. Electronic documentation is the appropriate realm for the incorporation of translation technology. This is something that young students of translation must learn. As the conception and design of technical documentation becomes progressively influenced by the electronic medium, it is integrating more and more with the whole concept of a software product. The strategies and means for translating both software packages and electronic documents are becoming very similar, and both are now, as we will see, the goal of the localization industry.

The increase of information in electronic format is linked to advances in computational techniques for dealing with it. Together with the proliferation of informational webs on the Internet, we can also see a growing number of search and retrieval devices, some of which integrate translation technology. Technical documentation is becoming electronic, in the form of CD-ROMs, on-line manuals, intranets, etc. An important consequence of the popularization of the Internet is that access to information is now truly global and the demand for localizing institutional and commercial Web sites is growing fast. In the localization industry, the use of technology is congenital, and developing adequate tools has immediate economic benefits.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

What is the focus of the localization industry? Do you believe there might be a job for you in that industry sector?

The main role of localization companies is to help software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing, and Web-based information in different languages for simultaneous worldwide release. The recent expansion of these industries has considerably increased the demand for translation products and has created a new burgeoning market for the language business. According to a recent industry survey by LISA (the Localization Industry Standards Association), almost one third of software publishers, such as Microsoft, Oracle, Adobe, Quark, etc., generate above 20 percent of their sales from localized products, that is, from products which have been adapted to the language and culture of their targeted markets, and the great majority of publishers expect to be localizing into more than ten different languages.

Localization is not limited to the software-publishing business and it has infiltrated many other facets of the market, from software for manufacturing and enterprise resource planning, games, home banking, and edutainment (education and entertainment), to retail automation systems, medical instruments, mobile phones, personal digital assistants (PDA), and the Internet. Doing business in an integrated global economy, with growing electronic transactions and worldwide access to products and services, means an urgent need to break through language barriers. A prediction of $220 billion online spending by 2001 shows the potential of this new market. It means that product information, from purchasing procedures to user manuals, must be made available in the languages of potential customers. According to the latest surveys, there are more than 35 million non-English-speaking Internet users. The Internet is thus evolving into a huge consumer of Web-based information in different languages.

The company Nua Ltd. provides a good example of how the demand for multilingual Web sites is changing the notion of translation into localization. Nua has recently won a substantial contract to develop and maintain a searchable multilingual intranet for the American Export Group (AEG), a division of Thomas Publishing International. Nua's task is to transform the existing American Export Register (AER), a directory of some 6,000 pages, into a localized database of 45,000 company listings, with information about each company, including a categorization into one of AEG's 5,000 categories. AEG's intranet will link 47,000 US firms to overseas clients. The first version of the AER register will provide access in five languages: English, French, German, Spanish, and Portuguese. Russian is due to follow, and the company hopes eventually to have an Arabic version. Any such multilingual service involves frequent revisions and updates, which in turn means a high demand for constant localization effort.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Define internationalization, globalization and localization. How do they affect the design of software products?

Globalization: The adaptation of marketing strategies to regional requirements of all kinds (e.g., cultural, legal, and linguistic).

Internationalization: The engineering of a product (usually software) to enable efficient adaptation of the product to local requirements.

Localization: The adaptation of a product to a target language and culture (locale).
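To make these three definitions more concrete, here is a minimal sketch, in Python, of what internationalization means for software design, under the assumption of a simple message-catalogue layout; the locale names, strings and date formats are invented for illustration, and real products usually rely on established frameworks such as GNU gettext or ICU. The program logic never hard-codes a language, so localizing it into a new language only requires supplying another catalogue.

# Minimal sketch of internationalization vs. localization (hypothetical example).
# Internationalization: the code never hard-codes language or locale conventions;
# all user-visible text and date formats are looked up in a catalogue.
# Localization: supporting a new language is just a matter of adding a catalogue.

from datetime import date

# Per-locale message catalogues (the localized part of the product).
CATALOGUES = {
    "en-US": {"greeting": "Welcome, {name}!", "date_format": "%m/%d/%Y"},
    "es-ES": {"greeting": "¡Bienvenido, {name}!", "date_format": "%d/%m/%Y"},
    "fr-FR": {"greeting": "Bienvenue, {name} !", "date_format": "%d/%m/%Y"},
}

def render_welcome(locale: str, name: str, today: date) -> str:
    """Build a locale-appropriate welcome line without touching program logic."""
    catalogue = CATALOGUES.get(locale, CATALOGUES["en-US"])  # fall back to a default locale
    greeting = catalogue["greeting"].format(name=name)
    return f"{greeting} {today.strftime(catalogue['date_format'])}"

if __name__ == "__main__":
    for loc in ("en-US", "es-ES", "fr-FR"):
        print(loc, "->", render_welcome(loc, "Ana", date(1999, 9, 1)))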

The main goal of the LEIT initiative is to introduce localization courseware into translation studies, with versions ready for the start of the 1999 academic year. However, this must be done with care. Bert Esselink (1998), from AlpNet, for example, argues against separating localization from other disciplines and claims its basic principles should be covered in all areas of translation training. Furthermore, it is worth adding that trainers not only need constant feedback and guidance from the commercial sector; they also need to maintain close contact with the software industry. So, perhaps, one of the best features of the LEIT initiative is its combination of partners from academia as well as from industry. LISA offers the first version of this courseware on its Web site, and users can contact the LEIT group and collaborate through an on-line questionnaire.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Are translation and localization the same thing? Explain the differences.

Localization packages are now being designed to assist users throughout the whole life cycle of a multilingual document. They take users through job setup, authoring, translation preparation, translation, validation, and publishing, besides ensuring consistency and quality in source and target language variants of the documentation. New systems help developers monitor different versions, variants and languages of product documentation, and author customer-specific solutions. An average localization package today will normally consist of an industry-standard SGML/XML editor (e.g. ArborText), a translation and terminology toolkit (e.g. Trados Translator's Workbench), and a publishing engine (e.g. Adobe's Frame+SGML).

Unlike traditional translators, software localizers may be engaged in the early stages of software development, since issues such as platform portability, code exchange and format conversion may hinder product internationalization if not dealt with properly. Localizers are often involved in the selection and application of utilities that perform code scanning and checking and that automatically isolate and suggest solutions to National Language Support (NLS) issues, saving time during the internationalization enabling process. There are run-time libraries that enable software developers and localizers to create single-source, multilingual, and portable cross-platform applications. Unicode support is also fundamental for software developers who work with multilingual texts, as it provides a consistent coding format for international character sets.
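The remark about Unicode can be illustrated with a small, hypothetical Python sketch: the same words survive a round trip through UTF-8 whatever their script, while a legacy single-byte code page such as Latin-1 cannot represent all of them. The sample words are chosen only for the example.

# Minimal sketch (illustrative only): why a consistent Unicode encoding matters
# when the same product ships in several languages.

samples = ["localization", "Lokalisierung", "localización", "локализация", "ローカリゼーション"]

for text in samples:
    utf8_bytes = text.encode("utf-8")           # UTF-8 can encode any of these strings
    assert utf8_bytes.decode("utf-8") == text   # and the round trip is lossless
    try:
        text.encode("latin-1")                  # a single-byte legacy code page
        legacy = "fits in Latin-1"
    except UnicodeEncodeError:
        legacy = "cannot be represented in Latin-1"
    print(f"{text!r}: {len(utf8_bytes)} bytes in UTF-8; {legacy}")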

What is a translation workstation? Compare it with a standard localization tool.

Leaving behind the old conception of a monolithic, compact translation engine, the industry is now moving in the direction of integrating systems: "In the future Trados will offer solutions that provide enterprise-wide applications for multilingual information creation and dissemination, integrating logistical and language-engineering applications into a smooth workflow that spans the globe," says Trados manager Henri Broekmate. Logos, the veteran translation technology provider, has announced "an integrated technology-based translation package, which will combine term management, TM, MT and related tools to create a seamless full-service localization environment." Other software manufacturers also in the race are Corel, Star, IBM, and the small but belligerent Spanish company Atril. This approach of integrating different tools is largely the one advocated by many language-technology specialists. Below are some of the answers given by Muriel Vasconcellos (from the Pan American Health Organization), Minako O'Hagan (author of The Coming Age of Teletranslations) and Eduard Hovy (President of the Association for Machine Translation in the Americas) to a recent survey (by Language International 10.6) on the features an ideal translator's workstation should combine.

Eduard Hovy underlines the need for a genre detector. "We need a genre typology, a tree of more or less related types of text and ways of recognizing and treating the different types computationally." He also sees the difficulty of constantly updating the dictionaries and suggests a "restless lexicon builder that crawls all over the Web every night, ceaselessly collecting words, names, and phrases, and putting them into the appropriate lexicons."

Muriel Vasconcellos, for her part, pictures her own ideal design of the workstation.


http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm
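To make the idea of such an integrated workstation a little more tangible, the following sketch shows a toy version of one of its central components, a translation memory that proposes the closest previously translated segment. The stored segments, the Spanish translations and the similarity threshold are all invented for the example; commercial tools such as Trados use far more elaborate matching and store their memories in databases.

# Minimal sketch of a translation-memory lookup (hypothetical data and threshold).
# A real workstation would combine this with terminology management, MT and workflow tools.

from difflib import SequenceMatcher

# Previously translated segments: source sentence -> stored translation.
MEMORY = {
    "Press the OK button to continue.": "Pulse el botón Aceptar para continuar.",
    "Save the document before closing.": "Guarde el documento antes de cerrar.",
    "The file could not be opened.": "No se pudo abrir el archivo.",
}

def suggest(segment: str, threshold: float = 0.75):
    """Return the best fuzzy match from the memory, or None below the threshold."""
    best_source, best_score = None, 0.0
    for source in MEMORY:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_score >= threshold:
        return MEMORY[best_source], best_score
    return None, best_score

if __name__ == "__main__":
    translation, score = suggest("Press the OK button to close.")
    print(f"fuzzy match (score {score:.2f}):", translation)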

Machine translation vs. human translation. Do you agree that translation excellence goes beyond technology? Why?

A computer translates an entire document automatically and presents it to a human being. This is machine translation. But computers have no sense of audience and just blindly follow rules. Words with ambiguous meanings are not accounted for in translation, and this is a serious drawback.

For multilingual Web sites, it is useful instead to use a database-driven web infrastructure to save on maintenance and programming costs. A number of translation tools available on the market are listed in the source cited below; the companies behind them also offer translation services and translation software.

It is important to reassess the human factor. Like cooks, tailors or architects, professional translators need to become acquainted with technology, because good use of technology will make their jobs more competitive and more satisfying. But they should not dismiss craftsmanship. Technology enhances productivity, but translation excellence goes beyond technology. It is important to delimit the roles of humans and machines in translation. Martin Kay's (1987) words in this respect are most illustrative:

A computer is a device that can be used to magnify human productivity. Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of the human spirit and the dignity of human labor but, by taking over what is mechanical and routine, it frees human beings for what is essentially human. Translation is a fine and exacting art, but there is much about it that is mechanical and routine. If this were given over to a machine, the productivity of the translator would not only be magnified but the work would become more rewarding, more exciting, more human.

It has taken some 40 years for the specialists involved in the development of MT to realize that the limits of the technology arise when going beyond the mechanical and routine aspects of language. From the outside, translation is often seen as a mere mechanical process, no more complex than playing chess, for example. If computers have been programmed with the capacity to beat a chess world champion such as Kasparov, why should they not be capable of producing translations of the highest quality? Few people are aware of the complexity of literary translation. Douglas Hofstadter (1998) depicts this well:

A skilled literary translator makes a far larger number of changes, and far more significant changes, than any virtuoso performer of classical music would ever dare to make in playing notes in the score of, say, a Beethoven piano sonata. In literary translation, it's totally humdrum stuff for new ideas to be interpreted, old ideas to be deleted, structures to be inverted, twisted around, and on and on.

http://www.indiawebdevelopers.com/articles/web_translation.asp

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Which profiles should any person with a University degree in Translation be qualified for?

The General Manager of LionBridge, Santi van der Kruk, for example, declares:

The profile we look for in translators is an excellent knowledge of computer technology and superb linguistic ability in both the source and target languages. They must know how to use the leading CAT [computer assisted translation] tools and applications and be flexible. The information technology and localization industries are evolving very rapidly and translators need to move with them.

Van der Meer, president of AlpNet, puts it this way:

Localization was originally intended to set software (or information technology) translators apart from 'old fashioned' non-technical translators of all types of documents. Software translation required a different skill set: software translators had to understand programming code, they had to work under tremendous time pressure and be flexible about product changes and updates. Originally there was only a select group--the localizers--who knew how to respond to the needs of the software industry. From these beginnings, pure localization companies emerged focusing on testing, engineering, and project management.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/vic.htm

Why is translation such a difficult task?

Although it may not be perceived at first sight, the complexity of natural language is orders of magnitude greater than that of any purely mechanical process.

Which are the main problems of MT?

We will consider some particular problems which the task of translation poses for the builder of MT systems, some of the reasons why MT is hard. It is useful to think of these problems under three headings: (i) problems of ambiguity, (ii) problems that arise from structural and lexical differences between languages, and (iii) multiword units such as idioms and collocations. Typical problems of ambiguity, lexical and structural mismatches, and multiword units are discussed in turn below.

Of course, these sorts of problem are not the only reasons why MT is hard. Other problems include the sheer size of the undertaking, as indicated by the number of rules and dictionary entries that a realistic system will need, and the fact that there are many constructions whose grammar is poorly understood, in the sense that it is not clear how they should be represented, or what rules should be used to describe them. This is the case even for English, which has been extensively studied, and for which there are detailed descriptions, both traditional 'descriptive' and theoretically sophisticated, some of which are written with computational usability in mind. It is an even worse problem for other languages. Moreover, even where there is a reasonable description of a phenomenon or construction, producing a description which is sufficiently precise to be used by an automatic system raises non-trivial problems.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/MT_book_1995/node52.html#SECTION00810000000000000000

Which parts of Linguistics are more relevant for MT?

We are led therefore to the argument that good quality translation is not possible without understanding the reality behind what is being expressed, i.e. translation goes beyond the familiar linguistic information: morphology, syntax and semantics. The need is particularly striking in the treatment of pronouns. Human translators have virtually no problems with pronouns, and it must seem strange to many that while MT systems can deal quite well with complex idioms and certain complex structures, they all seem to have great difficulties with pronouns. Why do we get such errors as die Europäische Gemeinschaft und ihre Mitglieder, rendered as the European Community and her members? The problem is that the antecedent of pronouns must be identified; the default translation of ihr as her does not work.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/hutchins91.htm
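A toy sketch may help to show why the antecedent matters: the correct English possessive for German ihr(e) depends on properties of the antecedent, not on the German form itself. The mini-lexicon below is invented purely for illustration and stands in for the anaphora resolution a real system would need.

# Toy sketch (invented data): choosing the English possessive for German "ihr(e)"
# requires knowing the antecedent, because the German form marks the gender of the
# possessor noun, not the natural gender or number that English cares about.

# Hypothetical mini-lexicon: antecedent -> the properties English needs.
ANTECEDENTS = {
    "die Europäische Gemeinschaft": {"english": "the European Community", "animate": False, "plural": False},
    "Frau Schmidt": {"english": "Mrs Schmidt", "animate": True, "plural": False},
    "die Mitgliedstaaten": {"english": "the member states", "animate": False, "plural": True},
}

def english_possessive(antecedent: str) -> str:
    """Pick the English possessive pronoun based on the antecedent's properties."""
    props = ANTECEDENTS[antecedent]
    if props["plural"]:
        return "their"
    return "her" if props["animate"] else "its"

for phrase, props in ANTECEDENTS.items():
    # e.g. "die Europäische Gemeinschaft ... ihre Dokumente" -> "the European Community and its documents"
    print(f"{phrase} und ihre Dokumente -> {props['english']} and {english_possessive(phrase)} documents")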

How many different types of ambiguity are there?

When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure, it is said to be structurally ambiguous.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/MT_book_1995/node53.html#SECTION00820000000000000000

Illustrate your discussion with:

  1. Although I agree to a few of Smith’s points, I must disagree to the majority.

The problem lies in the two uses of the word "to". The difficulty is that English allows a number of prepositions to fit idiomatically with the verb agree, but you must know when to use which preposition. You can use "to", but only in a sentence like this: "She agreed to the conditions spelled out in the contract." In the example above, the idiomatic forms would be "agree with a few of Smith's points" and "disagree with the majority".

  2. We can differ the transportation and ritual models of communication by contrasting the space-biased nature of one with the time-biased nature of the second.

The correct idiomatic wording would be "We can differentiate between…".

http://suo.ieee.org/email/msg02295.html

http://logic.philosophy.ox.ac.uk/tutorial1/Tut1-03.htm

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node54.html

http://www.longman-elt.com/dictionaries/llreview/r3komuro.html

http://www.sfu.ca/~gmccarro/Grammar/Expressions.html

Which are the most usual interpretations of the term "machine translation" (MT)?

The term machine translation (MT) is normally taken in its restricted and precise meaning of fully automatic translation. However, in this chapter we consider the whole range of tools that may support translation and document production in general, which is especially important when considering the integration of other language processing techniques and resources with MT. We therefore define Machine Translation to include any computer-based process that transforms (or helps a user to transform) written text from one human language into another. We define Fully Automated Machine Translation (FAMT) to be MT performed without the intervention of a human being during the process. Human-Assisted Machine Translation (HAMT) is the style of translation in which a computer system does most of the translation, appealing in case of difficulty to a (mono- or bilingual) human for help. Machine-Aided Translation (MAT) is the style of translation in which a human does most of the work but uses one or more computer systems, mainly as resources such as dictionaries and spelling checkers, as assistants.

What do FAHQT and ALPAC mean in the evolution of MT?

There were of course dissenters from the dominant 'perfectionism'. Researchers at Georgetown University and IBM were working towards the first operational systems, and they accepted the long-term limitations of MT in the production of usable translations. More influential was the well-known dissent of Bar-Hillel. In 1960, he published a survey of MT research at the time which was highly critical of the theory-based projects, particularly those investigating interlingua approaches, and which included his demonstration of the non-feasibility of fully automatic high quality translation (FAHQT) in principle. Instead, Bar-Hillel advocated the development of systems specifically designed on the basis of what he called 'man-machine symbiosis', a view which he had first proposed nearly ten years before when MT was still in its infancy (Bar-Hillel 1951).

Nevertheless, the main thrust of research was based on the explicit or implicit assumption that the aim of MT must be fully automatic systems producing translations at least as good as those made by human translators. The current operational systems were regarded as temporary solutions to be superseded in the near future. There was virtually no serious consideration of how 'less than perfect' MT could be used effectively and economically in practice. Even more damaging was the almost total neglect of the expertise of professional translators, who naturally became anxious and antagonistic. They foresaw the loss of their jobs, since this is what many MT researchers themselves believed was inevitable.

In these circumstances it is not surprising that the Automatic Language Processing Advisory Committee (ALPAC) set up by the US sponsors of research found that MT had failed by its own criteria, since by the mid 1960s there were clearly no fully automatic systems capable of good quality translation and there was little prospect of such systems in the near future. MT research had not looked at the economic use of existing 'less than perfect' systems, and it had disregarded the needs of translators for computer-based aids.

While the ALPAC report brought many MT projects to an end, it did not banish the public perception of MT research as essentially the search for fully automatic solutions. The subsequent history of MT is in part the story of how this mistaken emphasis of the early years has had to be repaired and corrected. The neglect of the translation profession has eventually been made good by the provision of translation tools and translator workstations. MT research has turned increasingly to the development of realistic practical MT systems where the necessity for human involvement at different stages of the process is fully accepted as an integral component of their design architecture. And 'pure' MT research has by and large recognised its role within the broader contexts of commercial and industrial realities.

List some of the major methods, techniques and approaches

Tools for translators, practical machine translation and research methods for machine translation.

Where was MT ten years ago?

Ten years ago, the typical users of machine translation were large organizations such as the European Commission, the US Government, the Pan American Health Organization, Xerox, Fujitsu, etc. Fewer small companies or freelance translators used MT, although translation tools such as online dictionaries were becoming more popular. However, ongoing commercial successes in Europe, Asia, and North America continued to illustrate that, despite imperfect levels of achievement, the levels of quality being produced by FAMT and HAMT systems did address some users' real needs. Systems were being produced and sold by companies such as Fujitsu, NEC, Hitachi, and others in Japan, Siemens and others in Europe, and Systran, Globalink, and Logos in North America (not to mention the unprecedented growth of cheap, rather simple MT assistant tools such as PowerTranslator).

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

New directions and foreseeable breakthroughs of MT in the short term

Several applications have proven able to work effectively using only subsets of the knowledge required for MT. It is now possible to evaluate different tasks, to measure the information involved in solving them, and to identify the most efficient techniques for a given task. Thus, we must face the decomposition of monolithic systems and start talking about hybridization, engineering, architectural changes, shared modules, etc. When identifying tasks, it is important to evaluate linguistic information in terms of what is generalizable, and thus a good candidate for traditional parsing techniques (the argument structure of a transitive verb in active voice?), and what is idiosyncratic (what about collocations?). Besides, one cannot discard the power of efficient techniques that yield better results than older approaches, as illustrated clearly by part-of-speech disambiguation, which has proved to be better solved using Hidden Markov Models than with traditional parsers. On the other hand, it has been proven that well-motivated, linguistically driven tag sets improve the accuracy of statistical systems. Hence we must be ready to separate the knowledge we want to represent from the techniques and formalisms that have to process it.

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html
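As a deliberately small illustration of the statistical techniques mentioned above, the following sketch runs the Viterbi algorithm over a hand-made Hidden Markov Model with four tags, so that the word "flies" is tagged as a noun after a determiner but as a verb after a pronoun. Every probability in it is invented for the example; real taggers estimate these figures from annotated corpora.

# Toy Viterbi decoder over a hand-made HMM (all probabilities are invented).
# It tags "flies" as a NOUN after a determiner but as a VERB after a pronoun.

TAGS = ["DET", "PRON", "NOUN", "VERB"]
START = {"DET": 0.4, "PRON": 0.3, "NOUN": 0.2, "VERB": 0.1}      # P(tag at sentence start)
TRANS = {                                                         # P(next tag | current tag)
    "DET":  {"DET": 0.025, "PRON": 0.025, "NOUN": 0.90, "VERB": 0.05},
    "PRON": {"DET": 0.05,  "PRON": 0.05,  "NOUN": 0.10, "VERB": 0.80},
    "NOUN": {"DET": 0.10,  "PRON": 0.10,  "NOUN": 0.30, "VERB": 0.50},
    "VERB": {"DET": 0.50,  "PRON": 0.20,  "NOUN": 0.20, "VERB": 0.10},
}
EMIT = {                                                          # P(word | tag)
    "DET":  {"the": 1.0},
    "PRON": {"he": 1.0},
    "NOUN": {"flies": 0.6, "time": 0.4},
    "VERB": {"flies": 0.4, "time": 0.1},
}
UNK = 1e-6  # tiny probability for unseen word/tag pairs

def viterbi(words):
    """Return the most probable tag sequence for a list of words."""
    prev = {t: (START[t] * EMIT[t].get(words[0], UNK), [t]) for t in TAGS}
    for word in words[1:]:
        curr = {}
        for t in TAGS:
            curr[t] = max((prev[p][0] * TRANS[p][t] * EMIT[t].get(word, UNK), prev[p][1] + [t])
                          for p in TAGS)
        prev = curr
    return max(prev.values())[1]

print(viterbi(["the", "flies"]))  # -> ['DET', 'NOUN']
print(viterbi(["he", "flies"]))   # -> ['PRON', 'VERB']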

Within the last ten years, research on spoken translation has developed into a major focus of MT activity. Of course, the idea or dream of translating the spoken word automatically was present from the beginning (Locke 1955), but it has remained a dream until now. Research projects such as those at ATR, CMU and on the Verbmobil project in Germany are ambitious. But they do not make the mistake of attempting to build all-purpose systems. The constraints and limitations are clearly defined by the definition of domains, sublanguages and categories of users. That lesson has been learnt. The potential benefits, even if success is only partial, are clear for all to see, and it is a reflection of the standing of MT in general, and a sign that it is no longer suffering from old perceptions, that such ambitious projects can receive funding.

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

Which are Internet's essential features?

Before the nineties, three main approaches to Machine Translation were developed: the so-called direct, transfer and interlingua approaches. Direct and transfer-based systems must be implemented separately for each language pair in each direction, while the interlingua-based approach is oriented to translation between any two of a group of languages for which it has been implemented. The implications of this fundamental difference, as well as other features of each type of system, are discussed in this and the following sections. The more recent corpus-based approach is considered later in this section.

More recently developed approaches to MT divide the translation process into discrete stages, including an initial stage of analysis of the structure of a sentence in the source language, and a corresponding final stage of generation of a sentence from a structure in the target language. Neither analysis nor generation is translation as such. The analysis stage involves interpreting sentences in the source language, arriving at a structural representation which may incorporate morphological, syntactic and lexical coding, by applying information stored in the MT system as grammatical rules and dictionaries. The generation stage performs approximately the same functions in reverse, converting structural representations into sentences, again applying information embodied in rules and dictionaries.

The transfer approach, which characterizes the more sophisticated MT systems now in use, may be seen as a compromise between the direct and interlingua approaches, attempting to avoid the most extreme pitfalls of each. Although no attempt is made to arrive at a completely language-neutral interlingua representation, the system nevertheless performs an analysis of input sentences, and the sentences it outputs are obtained by generation. Analysis and generation are however shallower than in the interlingua approach, and in between analysis and generation there is a transfer component, which converts structures in one language into structures in the other and carries out lexical substitution. The object of analysis here is to represent sentences in a way that will facilitate and anticipate the subsequent transfer to structures corresponding to the target language sentences.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm
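The division into analysis, transfer and generation described above can be caricatured in a few lines of code. The sketch below is only a toy, with an invented three-word lexicon and a single reordering rule, but it shows how lexical substitution (transfer) is kept separate from source-language analysis and from target-language generation.

# Toy sketch of the transfer architecture (invented mini-lexicon and one reordering rule).
# Real systems use full grammars and dictionaries at each of the three stages.

LEXICON = {"the": "el", "white": "blanco", "dog": "perro"}

def analyse(sentence):
    """Analysis: split a flat English noun phrase into determiner, adjective and noun."""
    det, adj, noun = sentence.lower().split()
    return {"det": det, "adj": adj, "noun": noun}

def transfer(structure):
    """Transfer: replace each English lexical item with its Spanish counterpart."""
    return {role: LEXICON[word] for role, word in structure.items()}

def generate(structure):
    """Generation: Spanish noun phrases place the adjective after the noun."""
    return " ".join([structure["det"], structure["noun"], structure["adj"]])

print(generate(transfer(analyse("the white dog"))))  # -> "el perro blanco"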

What is the role of minority languages on the Internet (Catalan, Basque...)?

This point requires even more careful consideration when what is needed is not merely a bilingual but a multilingual MT network, in which translation is possible from any language into any other language among a given network of languages or in a multilingual community. Unless a high degree of reusability is achieved, serious problems arise as soon as the multilingual set grows beyond a very limited size. When, in 1978, an ambitious project named Eurotra was started to develop "a machine translation system of advanced design" between all official languages of the European Community (a target which was not achieved before the programme came to an end), the Community's official languages numbered only six: English, French, German, Dutch, Danish and Italian. This meant fifteen language pairs. Within eight years, the entry of Greece and subsequently Spain and Portugal into the Community had added three new official languages which had to be integrated into the system, still under development. This increase from six to nine languages meant that the number of language pairs more than doubled, rising from fifteen to thirty-six. If the programme had continued a little longer, by the time there were twelve official languages of the Community, the number of language pairs would have gone from 36 to 66; fifteen languages would have brought the figure up to 105, and so on, with the number of pairs growing quadratically.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm
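The arithmetic behind these figures is simply the number of unordered language pairs among n languages, n(n-1)/2, or n(n-1) if each translation direction requires a separate system; the short sketch below reproduces the progression quoted above.

# Number of language pairs (and translation directions) as the set of languages grows.

def pairs(n: int) -> int:
    """Number of unordered language pairs among n languages."""
    return n * (n - 1) // 2

for n in (6, 9, 12, 15):
    print(f"{n:2d} languages: {pairs(n):3d} pairs, {n * (n - 1):3d} translation directions")
# 6 -> 15, 9 -> 36, 12 -> 66 and 15 -> 105 pairs, matching the figures quoted above.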

In what ways can Machine Translation be applied on the Internet?

The Internet is, and will be to an increasing degree, both a vehicle for providing MT services and a major beneficiary of their application. To this extent, MT is likely to provide a further key to making the Internet a truly global medium which can transcend not only geographical barriers but also linguistic ones.

Europe, as the most notable focal point in the present-day world where a great capacity for technological innovation crosses paths with a high level of linguistic diversity, is excellently placed to lead the way forward. Other parts of the world are technologically capable but too self-contained and homogeneous culturally to acquire immediate awareness of the need for information technology to find its way across linguistic barriers, while still other communities are fully aware of the language problem but lack a comparable degree of access to the technological resources and initiative needed to address the issue on such a scale. Whoever succeeds in making future communication global in linguistic terms will have forged a new tool of incalculable value to the entire world.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm

CONCLUSION

This report contains a collection of quotations gathered from different on-line documents, reflecting the main issues concerning language technologies and their role in the information society. Looking back over the report, I see that the most important subject is that of Machine Translation, which is seen from different points of view in parts four, five and six.

In my opinion, this theme of machine translation is the most important because it shows clearly the connection between language and new technologies, although it is doubtful whether it can match human translation. In connection with this, part four analyses the problem of language barriers and shows that Machine Translation has weaknesses too.

Nowadays, science develops so quickly that I believe the current barriers to new technologies will vanish little by little. In the near future we will be able to manage almost everything through information technology, and the Internet will play an important role in this.

REFERENCES

http://www.hltcentral.org/

http://sirio.deusto.es/ABAITUA/konzeptu/nlp/Browne_M.html

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#t

http://www.multilingual.com/volume2.htm

http://choo.fis.utoronto.ca/IMfaq/

http://sirio.deusto.es/abaitua/konzeptu/fatiga.htm

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

http://mull.ccl.umist.ac.uk/events/eamt-bcs/

http://ourworld.compuserve.com/homepages/WJHutchins/Aslib13.htm

http://www.essex.ac.uk/linguistics/clmt/MTbook/

http://sirio.deusto.es/abaitua/deli/testuteka/index.html

http://cst.dk/bente/

http://ourworld.compuserve.com/homepages/WJHutchins/

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

http://sirio.deusto.es/abaitua/konzeptu/ta/mt10h_es/index.html

http://sirio.deusto.es/abaitua/konzeptu/ta/EuroParlament.htm

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim0.html