Abstract:

As agreed during these months, this essay presents and discusses a study of "English Language and New Technologies". I will comment on the relevance and influence of the new technologies linked to the English language. The subjects I will focus on are: the influence of information, the advantages of the new technologies, and comments on some of the topics studied.

Introduction:

During these months I have learned about the role and aims of the "Human Language & New Technologies" subject. Here I am going to explain my point of view and the knowledge I have gained during these months; I will discuss and exemplify the information and the appreciation I have taken from it. The main aim is to explain and clarify what I have understood through some specific themes, which I will comment on.

The influence of information:

The need to be well and properly informed is one of the aims of learning foreign languages. The problem arises when an excess of information makes no sense: we may be well informed but not know how to use and explain that information in the best way.

This can sometimes be a disadvantage for the speaker, and even more so if the speaker is not a native English speaker. Here is an explanation of how technology and information come together to improve the quality of the language:

"We live, it is said, in the 'information age'. A moments reflection, however, might suggest that these times are in fact in the 'data' age - with the true information age just around the corner. From the dialogues of Plato (What is knowledge?) to the present day, questions about the nature of knowledge have perplexed philosophers for millennia. Since Plato's age enlightenment has waxed and waned, it is only since the digital age that our preoccupation with knowledge, data and information has exploded".

http://www.p-jones.demon.co.uk/infintro.htm

The advantages of the new technologies:

The advantages take several different forms. Even the traditional crafts of interpreting natural speech or translating printed material, which are peripheral to technology, may still benefit from technological training slightly more than anecdotally: "It is clear that word processors, on-line dictionaries and all sorts of background documentation, such as concordances or collated texts" may help the translator's work.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Another new technology is machine translation. These tools make the work of translating foreign sentences easier and better, and they are defined as follows: the term machine translation (MT) is normally taken in its restricted and precise meaning of fully automatic translation. This software helps break down the barrier of communication, but the programs still have problems with semantics, which is why researchers are improving them day by day.
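To see why semantics is such a stumbling block, here is a minimal, purely illustrative Python sketch of word-for-word substitution; the glossary, function and example sentence are invented for this essay, and real MT systems are far more sophisticated than this.

    # A toy word-for-word "translator" (illustration only, not a real MT method).
    # The ambiguity of "bank" (financial institution vs. river bank) shows why
    # a system with no semantics cannot choose the right equivalent.
    GLOSSARY = {
        "the": "el",
        "bank": "banco",      # could also be "orilla" (river bank)
        "is": "está",
        "closed": "cerrado",
    }

    def word_for_word(sentence: str) -> str:
        """Translate by direct dictionary lookup, keeping unknown words unchanged."""
        return " ".join(GLOSSARY.get(w.lower(), w) for w in sentence.split())

    print(word_for_word("The bank is closed"))
    # -> "el banco está cerrado" (right only if the financial sense was intended)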

The Internet has also contributed to the promotion and diffusion of machine translation: "the Internet is, and will be to an increasing degree, both a vehicle for providing MT services and a major beneficiary of their application. To this extent, it is likely to provide a further key to making the Internet a truly global medium which can transcend not only geographical barriers but also linguistic ones".

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm

Comments about some subjects:

Some of the many features of the "English Language and New Technologies" subject have been explained. The languages of the world have barriers, and these barriers lie in mutual understanding between speakers; that is why programs and new techniques are used to improve understanding and to bring different cultures closer together.

The Internet has also contributed to the improvement of minority languages and to explaining them to the interested public. "The Community's (European Community) official languages numbered only six: English, French, German, Dutch, Danish and Italian. This meant fifteen language pairs. Within eight years, the entry of Greece and subsequently Spain and Portugal into the Community had added three new official languages which had to be integrated into the system, still under development".

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm

Body:

HLT Central

Gateway to Speech & Language Technology Opportunities on the Web: "HLTCentral web site was established as an online information resource of human language technologies and related topics of interest to the HLT community at large. It covers news, R&D, technological and business developments in the field of speech, language, multilinguality, automatic translation, localisation and related areas. Its coverage of HLT news and developments is worldwide - with a unique European perspective. Two EU funded projects, ELSNET and EUROMAP, are behind the development of HLTCentral".

What we can perceive in today's world is that we often try to simplify and understand different kinds of languages and words in context. In this case, related to new technologies and new research in linguistic development, we have seen a great change. The purpose of HLTCentral is to unite both sides, joining new technologies such as computers and machines with new engineering research.

What is the Information Society Commission?

The Information Society Commission (ISC) is an independent advisory body to Government, reporting directly to the Taoiseach. It draws on high-level representation from the business community, the social partners, and government itself. The ISC has a key role in shaping the evolving public policy framework for the Information Society in Ireland. It contributes to the policy formulation process, monitors progress, and highlights issues that need to be prioritised. The ISC has 21 members drawn from a broad range of interests. Its chairman is Dr Danny O'Hare, former President of Dublin City University.

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

http://www.isc.ie/about/reports.html

http://www.isc.ie/about/commission.html

Is it worth learning translation technology? Joseba Abaitua, Universidad de Deusto. Abstract: "There are conditions under which translation technology is not only worth learning but essential. Then again, it may be completely disregarded in many other circumstances without substantial loss. Localization is the paradigm of the need for technology, while interpreting and literary translation are examples of the latter. The localization business is intimately connected with the software industry and companies in the field complain about the lack of qualified personnel that combine both an adequate linguistic background and computational skills. This is the reason why the industry (around the LISA association) has taken the lead over educational institutions by proposing courseware standards (the LEIT initiative) for training localization professionals. We will discuss this and other issues connected with the formation of professional translators for today."

0. Introduction

When discussing the relevance of technological training in the translation curricula, it is important to clarify the factors that make technology more indispensable and show how the training should be tuned accordingly. The relevance of technology will depend on the medium that contains the text to be translated. This particular aspect is becoming increasingly evident with the rise of the localization industry, which deals solely with information in digital form. There may be no other imaginable means for approaching the translation of such things as on-line manuals in software packages or CD-ROMs with technical documentation than computational ones. On the other hand, the traditional crafts of interpreting natural speech or translating printed material, which are peripheral to technology, may still benefit from technological training slightly more than anecdotally. It is clear that word processors, on-line dictionaries and all sorts of background documentation, such as concordances or collated texts, besides e-mail or other ways of network interaction with colleagues anywhere in the world may substantially help the literary translator's work. With the exception of a few eccentrics or maniacs, it will be rare in the future to see good professional interpreters and literary translators not using more or less sophisticated and specialized tools for their jobs, comparable to the familiarization with tape recorders or typewriters in the past. In any case, this might be something best left to the professional to decide, and may not be indispensable. However, the greater number of jobs for our students is in the localization market. Information of many types is rapidly changing format and going digital. Electronic documentation is the adequate realm for the incorporation of translation technology. This is something that young students of translation must learn. As the conception and design of technical documentation becomes progressively influenced by the electronic medium, it is integrating more and more with the whole concept of a software product. The strategies and means for translating both software packages and electronic documents are becoming very similar and both are now, as we will see, the goal of the localization industry".

Language Engineering and the Information Society
The Information Age.

The development and convergence of computer and telecommunication technologies has led to a revolution in the way that we work, communicate with each other, buy goods and use services, and even the way we entertain and educate ourselves. One of the results of this revolution is that large volumes of information will increasingly be held in a form which is more natural for human users than the strictly formatted, structured data typical of computer systems of the past. Information presented in visual images, as sound, and in natural language, either as text or speech, will become the norm.

We all deal with computer systems and services, either directly or indirectly, every day of our lives. This is the information age and we are a society in which information is vital to economic, social, and political success as well as to our quality of life. The changes of the last two decades may have seemed revolutionary but, in reality, we are only on the threshold of this new age. There are still many new ways in which the application of telematics and the use of language technology will benefit our way of life, from interactive entertainment to lifelong learning. Although these changes will bring great benefits, it is important that we anticipate difficulties which may arise, and develop ways to overcome them. Examples of such problems are: access to much of the information may be available only to the computer literate and those who understand English; a surfeit of information from which it is impossible to identify and select what is really wanted. Language Engineering can solve these problems. Information universally available: the language technologies will make an indispensable contribution to the success of this information revolution. The availability and usability of new telematics services will depend on developments in language engineering.

Speech recognition will become a standard computer function providing us with the facility to talk to a range of devices, from our cars to our home computers, and to do so in our native language. In turn, these devices will present us with information, at least in part, by generating speech. Multi-lingual services will also be developed in many areas. In time, material provided by information services will be generated automatically in different languages. This will increase the availability of information to the general public throughout Europe. Initially, multi-lingual services will become available, based on basic data, such as weather forecasts and details of job vacancies, from which text can be generated in any language. Eventually, however, we can expect to see automated translation as an everyday part of information services so that we can both request and receive all sorts of information in our own language.

Home and Abroad: Language Engineering will also help in the way that we deal with associates abroad. Although the development of electronic commerce depends very much on the adoption of interchange standards for communications and business transactions, the use of natural language will continue, precisely because it is natural. However, systems to generate business letters and other forms of communication in foreign languages will ease and greatly enhance communication. Automated translation combined with the management of documentation, including technical manuals and user handbooks, will help to improve the quality of service in a global marketplace. Export business will be handled cost effectively with the same high level of customer care that is provided in the home market. How can we cope with so much information? One of the fundamental components of Language Engineering is the understanding of language by the computer.

This is the basis of speech operated control systems and of translation, for example. It is also the way in which we can prevent ourselves from being overwhelmed with information, unable to collate, analyse, and select what we need. However, if information services are capable of understanding our requests, and can scan and select from the information base with real understanding, not only will the problem of information overload be solved but also no significant information will be missed. Language Engineering will deliver the right information at the right time".

http://sirio.deusto.es/abaitua/konzeptu/nlp/echo/infoage.html

http://babelfish.altavista.com/

http://www.hltcentral.org/

Why "knowledge" is of more value than "information"?

http://www.genecehamby.com/Knowledge.htm

Information versus Knowledge: Knowing the Difference, by Gene Pinder. As part of a marketing research project for a client recently, I was obtaining a fair amount of information on my own company’s telemarketing efforts. In fact, I was getting a lot of information—so much so that I felt I was beginning to drown in data. Sound familiar?

While it’s true that most businesses do not have good, deep information on their customers, markets or competitors because they often don’t perform what one market research firm calls the “rigorous analysis of unimpeachable data,” it’s equally true that many businesses are currently experiencing information overload. A credit union executive, for example, showed me a survey that was completed by a marketing professional. Nicely bound in a black three-ring binder, the survey went on and on, page after page, of single and multi-layered cross tabulations. The only problem was the executive had no clue what the data said or even how to interpret it. In other words, it was a wasted exercise. Perhaps more importantly, it was a waste of precious financial resources.

Don’t confuse data with knowledge. The former is the ingredients one needs to make a great cake, but it’s the latter that’s going to be eaten and enjoyed. Knowledge begins by knowing what information to look for, which often begins by identifying the problem that needs to be solved, and then it’s using only that information or data that will help you make good, sound business decisions. In my case, the most important question for me was “Am I reaching my quota of qualifying individuals for this survey?” By working backwards from that important question, I was able to use the data to see how effective and efficient my efforts were in achieving that goal. All of that, however, was secondary to my main purpose.

You may think that solo entrepreneurs are not drowning in data. Perhaps. However, they could be drowning in advice, especially now that so many people have discovered the joys of using newsletters as marketing tools. My e-mail boxes are becoming besieged with too many newsletters—to the point now where I won’t sign up for a new one unless it’s a “must have.”

How about your situation? Are you overloaded on information but lacking important, critical knowledge about your business? The sooner you distinguish between the two, the stronger your business will be.

Gene Pinder is the president and CEO of Pinder Marketing, a strategic marketing consulting and research firm located in Research Triangle Park, North Carolina. He can be reached at genep@pindermarketing.com.

Does the possession of large quantities of data imply that we are well informed?

http://www.p-jones.demon.co.uk

We live, it is said, in the 'information age'. A moment's reflection, however, might suggest that these times are in fact in the 'data' age - with the true information age just around the corner. From the dialogues of Plato (What is knowledge?) to the present day, questions about the nature of knowledge have perplexed philosophers for millennia. Since Plato's age enlightenment has waxed and waned; it is only since the digital age that our preoccupation with knowledge, data and information has exploded.

Together cognitive science and information science have given rise to whole new areas of ontological study, and in related fields such as search and retrieval. The digital age is prompting new, radical definitions of information. Devlin (1991) backtracks a little to address the need for a definition of 'perception'. This is defined as a two stage process, corresponding to an analogue/digital distinction: 'The first stage is perception, where the information in the environment becomes directly accessible to the agent by way of some sort of sensor ... At this stage the information flow is analogue, relative to whatever information we are concerned with.

The second stage, if there is one, involves the extraction of a specific item (or items) of information from that perceived 'continuum'; that is to say, it involves the conversion from analogue to digital information. This stage is cognition.' INFORMATION ABOUT DEFINING DATA: Despite emphasis upon 'analogue' and 'digital devices', 'data gathering', 'CPUs', 'flowcharts', 'resource management', and other 'infospeak', little is understood about data and information. Everyone uses it, but what is information? Scarrot (1987) and Stamper (1985) posed this question, which businesses are often forced to ask as they seek to realize benefits from ICT. Bawden (1992) defines information as the fourth "corporate resource". During the mid-1980s, as the microcomputer boom gathered pace, three pivotal applications - wordprocessing, spreadsheets and databases - were becoming accessible to more non-specialist people. Commentators at this time pointed out that despite references to 'information theory', there was in fact no 'theory of information' (Editorial, 1985). Some proposed that the beginnings of a theory of information existed, but much needed to be done. Hamming (1980) wrote: "Information theory does not handle the meaning of the information, it treats only the amount of information.." and "The applicability of the ideas is not exact - they are often merely suggestive - but the ideas are still very useful. ... ; the theory provides an intellectual tool for understanding the processing of information."

So how far has our understanding progressed? What is data, information? Data is usually regarded as the most fundamental form of information. A symbol, e.g. 'a', 'y', 'K', '6', '%', '+', or a signal can all be viewed as data. Usually the term 'data' suggests something raw and unrefined. Something that must be polished into a finished product. Researchers actually talk of 'cleaning' their data. Data in machine readable form may be unintelligible to people, for example, barcodes and sensor readings. Currently, as defined by their operation, computers process data; they do not (yet) process information. Specific words - such as a person's name - are usually considered 'information'; while a person's hospital number is usually viewed as data. Despite this the terms 'data' and 'information' are often used synonymously. For example, a collection of data items commonly called a 'data set' may in fact include information. Executives 'mine' the data in their corporate databases. What they seek is information, hidden in vast quantities of data. Information usually denotes data, in a combined form used for some specific purpose. So people speak of the database, from which the report is produced containing the information required. Further complexities are found when information is encoded. What looks like 'data' as I have just defined it could in fact be information, as described below. Despite the advent of the information age there is no clear and precise definition of information.

Those definitions that have emerged, however, are remarkable in their underlying simplicity and potential scope of application. Furthermore, they are especially interesting when applied to health related disciplines. A classic, and much cited text in communication theory is - 'A Mathematical Theory of Communication' by Shannon and Weaver (1949). Weaver's contribution is quite accessible to non-mathematicians (like the author of this paper), while Shannon concentrates on the mathematics of telecommunications. Weaver provides the following definition of information: '.. this word information in communication theory relates not so much to what you do say, as to what you could say. That is, information is a measure of one's freedom of choice when one selects a message. If one is confronted with a very elementary situation where one has to choose one of two alternative messages, then it is arbitrarily said that the information, associated with this situation, is unity.' pp.8-9 Work continues under the flag of 'information theory', which is concerned with the processing of information, its constitution, transmission and properties.

Dretske (1981) writes: 'Information theory identifies the amount of information associated with, or generated by, the occurrence of an event (or the realization of a state of affairs) with the reduction in uncertainty, the elimination of possibilities, represented by that event or state of affairs.' p.4
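Weaver's 'unity' of information is what we now call one bit. A short sketch (the probabilities are made up for illustration) shows how the usual Shannon formula, H = -sum of p * log2(p), assigns exactly one unit to a free choice between two equally likely messages and less to a biased one.

    import math

    def entropy(probabilities):
        """Shannon entropy in bits: H = -sum(p * log2(p))."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    # Weaver's "elementary situation": one of two equally likely messages.
    print(entropy([0.5, 0.5]))   # 1.0 bit -- the "unity" of information
    # A heavily biased choice carries less information.
    print(entropy([0.9, 0.1]))   # about 0.47 bits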

How many words of technical information are recorded every day?

When we talk about language proficiency, many people often think about the size of a language user's vocabulary, i.e. how many words she knows in the language in question. Many figures have been advanced for how many words a language user really knows actively in her native language - the figure 20,000 has often been mentioned (e.g. Nation, 1990). But the question then is: How many and which of these words are central for a given learner of a foreign or second language?

The concepts of core words or basic words indicate a need to lay down certain criteria for the selection of precisely the words that are to be learned first, and that are absolutely necessary for building up, for example, sufficient reading proficiency in the target language. Many people have mentioned 2000 words as a minimum requirement, and a figure that is enough to ensure that more than 80% of a normal text can be read and understood. In my article on various types of tasks in connection with work on vocabulary in this number some of the criteria are discussed that can be established for drawing up a list of basic words.

In his article, J. Gimbel mentions second language pupils' problems with the so-called pre-subject vocabulary, i.e. the words that lie between the most basic vocabulary and the vocabulary that is subject-specific, which is far less frequent and which most Danish children do not master, either. Even though 80% of a text can possibly be read with a knowledge of the most frequent words, there is still a long way to the 95% that Gimbel mentions as a minimum for the quality of one's reading to be satisfactory. His investigation emphasises, then, the need for research that examines more specifically the size of the vocabulary of various groups of people, and which tries to determine more precisely how large a vocabulary must be if a language user is to be in a position to solve particular tasks.
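The coverage figures mentioned above (about 80% of running text from roughly 2000 core words, 95% for comfortable reading) can be checked on any corpus with a few lines of code. This is only a sketch: the file name corpus.txt is a placeholder, and the crude split() tokenisation would need refining for a serious count.

    from collections import Counter

    def coverage(text: str, core_size: int) -> float:
        """Share of running words covered by the core_size most frequent word types."""
        words = text.lower().split()
        counts = Counter(words)
        core = {w for w, _ in counts.most_common(core_size)}
        return sum(1 for w in words if w in core) / len(words)

    # With a real corpus this would test the claim that ~2000 core words
    # account for more than 80% of a normal text.
    sample = open("corpus.txt", encoding="utf-8").read()   # any plain-text corpus
    print(f"{coverage(sample, 2000):.1%}")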

What is the most convenient way of representing information? Why?

All information has structure, and any physical rendering of a document is a projection of this structure onto a particular medium, e.g., printed paper. A 'rendering' of a document on some medium is best understood if it makes this logical structure readily apparent. For example, a visual rendering - onto a two-dimensional medium like paper - may use cues like boldface, different fonts, and indenting to help reveal structure. A visual rendering takes advantage of the eye's ability to rapidly access different parts of a two-dimensional display. An audio rendering has to use an entirely different set of cues to reveal structure.

Early in the development of AsTeR, we realized that the ability to render information in a variety of output modalities would be a direct function of the richness of the internal representation used to capture structure and content. Abstractly speaking, the high-level structure of a document is independent of any particular mode of display, and the internal representation should reflect this. As a first step in realizing AsTeR, therefore, we developed high-level models to represent document structure. For instance, the richness of the representation used by AsTeR completely frees the order in which subterms in an equation are rendered aurally from the order in which they would appear on paper.

This section briefly outlines some of the representations used in AsTeR. Rendering this high-level representation is outlined in s:rendering. Based on these ideas, we define a set of requirements in s:conclusion that should prevent electronic encodings from being tied down to any single display form.
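AsTeR's actual models are not reproduced here, but the underlying idea - one logical structure, several renderings - can be pictured with a small hypothetical sketch: the same tree is projected once onto a visual medium, using indentation as the cue, and once onto a linear audio-like medium, using spoken begin/end cues.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        """One element of a document's logical structure (section, paragraph, ...)."""
        kind: str
        text: str = ""
        children: list = field(default_factory=list)

    def render_visual(node: Node, depth: int = 0) -> str:
        """Project the structure onto a 2-D medium, with indentation as the cue."""
        line = "  " * depth + (node.text or node.kind)
        return "\n".join([line] + [render_visual(c, depth + 1) for c in node.children])

    def render_audio(node: Node) -> str:
        """Project the same structure onto a linear (spoken) medium with verbal cues."""
        parts = [f"Begin {node.kind}. {node.text}".strip()]
        parts += [render_audio(c) for c in node.children]
        parts.append(f"End {node.kind}.")
        return " ".join(parts)

    doc = Node("section", "Representing information",
               [Node("paragraph", "All information has structure.")])
    print(render_visual(doc))
    print(render_audio(doc))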

But how exactly would they reflect the specificity of the social and human experience of living in this new society, which appears to resist easy visualization?

(For instance, all kinds of work are reduced to sitting in front of a computer screen; all kinds of activities are reduced to invisible streams of data traveling through the global computer networks.)

A related question is what kind of aesthetics is appropriate for a society where most work and many forms of leisure are computer based? If industrial society led to a range of different aesthetic strategies, from montage to streamlined, ornament-free architecture and design, what are the new aesthetics appropriate for the information society?

The workshop would consist of two parts: theoretical and practical. In the first part (to be accomplished in September) we will explore historical parallels between the economics and culture of the industrial age (nineteenth and early 20th century) and the economics and culture of the information age (today). We will also look at selected areas of contemporary culture (new media, architecture, fashion, and cinema) to see if we can already find signs of info-aesthetics at work.

In the second, practical part (which will extend into the Fall and will involve virtual collaboration), the participants will work on individual projects designed to explore info-aesthetics -- that is, they will use digital media to represent different social and human dimensions of information society.

http://www.manovich.net/IA/IA_workshop.html

How can computer science and language technologies help manage information?

Information management is the harnessing of the information resources and information capabilities of the organization in order to add and create value both for itself and for its clients or customers.

The idea underlying IM is that just as an organization purposefully and systematically manages its human resources or financial assets, it should do likewise for its information resources and processes. All the classic functions of managing an organizational activity apply to IM as well: defining goals, providing leadership, developing policies, allocating resources, training staff, evaluation and feedback.

Consider a document containing a table of numbers indicating product sales for the quarter. As they stand, these numbers are Data. An employee reads these numbers, recognizes the name and nature of the product, and notices that the numbers are below last year’s figures, indicating a downward trend. The data has become Information. The employee considers possible explanations for the product decline (perhaps using additional information and personal judgment), and comes to the conclusion that the product is no longer attractive to its customers. This new belief, derived from reasoning and reflection, is Knowledge.

Thus, information is data given context, and endowed with meaning and significance. Knowledge is information that is transformed through reasoning and reflection into beliefs, concepts, and mental models.
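The sales-table example can be restated as a tiny sketch (all figures invented). The point is where the program stops: it can turn data into information by adding context, while the final step to knowledge remains a human act of reasoning and reflection.

    # Data: bare numbers with no context.
    sales_this_year = [120, 95, 88, 80]     # hypothetical units sold per month
    sales_last_year = [130, 125, 128, 126]

    # Information: the data given context -- compared with last year, a trend appears.
    trend = "downward" if sum(sales_this_year) < sum(sales_last_year) else "upward"
    print(f"Quarterly sales show a {trend} trend.")

    # Knowledge: a belief derived from reasoning on that information, e.g. "the
    # product no longer appeals to its customers" -- something this program
    # cannot supply by itself.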

A KM framework involves designing and working with the following elements: categories of organizational knowledge (tacit knowledge, explicit knowledge, cultural knowledge); knowledge processes (knowledge creation, knowledge sharing, knowledge utilization); and organizational enablers (vision and strategy; roles and skills; policies and processes; tools and platforms). IM provides the foundation for KM, but the two are focused differently. IM is concerned with processing and adding value to information, and the basic issues here include access, control, coordination, timeliness, accuracy, and usability. KM is concerned with using the knowledge to take action, and the basic issues here include codification, diffusion, practice, learning, innovation, and community building.

Why can language sometimes be seen as a barrier to communication? How can this change?

Our research problem originates from the two contradicting tales of Falungong. Each claims to be the only appropriate understanding of Li Hongzhi's theory and the followers' behaviors, thus generating serious debates since 1999. Controversies have been following Falungong ever since 1996, when the Guangming Daily published a critical review of Zhuan Falun. By that time, it had been four years since Li Hongzhi first appeared in the Victory Park (Shengli Gongyuan) in the city of Changchun on May 8, 1992 and announced his discovery. When it drew critical reviews in 1996, Falungong had already attracted hundreds of thousands, and probably over a million followers, including some in the law enforcement departments of the central government (such as Mr. Ye Hao, now the leader of Falungong's Minghui Net in Canada, then a ranking officer in the Ministry of Public Security) and several scholars in prominent educational institutions. Using their numbers, strategic acumen, and effective organization, Falungong successfully outmaneuvered its critics from the civil society for three more years.

http://216.239.33.100/search?q=cache:qJ7dFEq3FTgC:www.xys.org/xys/netters/Fang-Zhouzi/religion/2tales.doc+%22language+as+a+barrier+of+communication%22&hl=es&ie=UTF-8

There have been two types of communication. One is face-to-face communication. It involves not only meeting but also talking by telephone, chat, etc. Its trait is that we can get the answer soon, but we cannot take it into deep consideration and we have to share the same time together. The other is composition-to-composition communication.

It involves not only mail, but also fax, video letters, etc. Its trait is that we cannot get the answer soon, but we can read or watch it any time we are free, and we can think about it deeply. Given these characteristics, we can say that the former is mainly used in private communication, and the latter is mainly used in official communication. In order to prevent fruitless disputes, we accept the time lag, which might be more than a week, and wait for the answer. Some composition-to-composition communications, though, are used to send private messages as a Japanese custom. One is called Nengajo, the New Year's greeting. Its origin is in the Heian period (794-1192), and it had become widespread by the Edo period (1603-1868).

In any case, in the future, technology will break down the barrier of communication; that is, we will be able to express our feelings easily with new communication tools. Then we should polish up our own ideas. There are other aspects of future communication. As urbanization expands, people become more and more lonely and need easy communication. Greetings and other small exchanges are used to maintain communication. In future communication, more and more people will use new tools to communicate only for the sake of communication, that is, to talk with others. As the cyber world makes us feel at ease when communicating, these communications will grow in importance.

It may be difficult for parents and their children to communicate, especially when the children are in puberty. It is important to express each other's feelings, and a place to do this is needed. We propose that parents and their children should make web pages together. The only problem left is how.

Volunteers who teach people how to make web pages may be of help. There is one example: Mariko Oka and her father, Toru Oka, made web pages together with Takayuki Matsuo.

Mariko and Toru made the contents. Takayuki assembled them into a web page. As she is interested in the topic, V6, she actively made the page with her father. Mariko said, "I felt that my page was growing fast. At first, I couldn't believe that I could make my page. My own page made it much easier to communicate with my mail friends." Mr Oka said, "The time I spent with my daughter increased. I was satisfied that I could be involved in my daughter's education."

As the example shows, a new era needs a new form of communication.

http://contest.thinkquest.gr.jp/tqj2000/30202/english/communication.html

In what ways does Language Engineering improve the use of language?

Natural Language engineering is a new approach to natural language processing, with respect to the traditional computational linguistics method, and has been acknowledged by the EU in their LRE programme as the approach most likely to bring substantial benefits in the medium term to end users. The Laboratory for Natural Language Engineering (LNLE) (one of the first named and completely specialised in this branch of NLP) has been working in this area for over nine years and is dedicated to research which can lead to working NLE products used for industrial-strength applications. We are not interested in "toy" systems. To that end, the LNLE undertakes research on the LOLITA large scale natural language processing system, built according to the principles of natural language engineering with a strong theoretical foundation in neo-pragmatist philosophy.

http://www.dur.ac.uk/~dcs0www3/lnle/lnlehome.html

What is Language Engineering?

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

Language Technology, Language Engineering and Computational Linguistics. Similarities and differences.

Language technologies are information technologies that are specialized for dealing with the most complex information medium in our world: human language. Therefore these technologies are also often subsumed under the term Human Language Technology. Human language occurs in spoken and written form. Whereas speech is the oldest and most natural mode of language communication, complex information and most of human knowledge is maintained and transmitted in written texts.

Speech and text technologies process or produce language in these two modes of realization. But language also has aspects that are shared between speech and text such as dictionaries, most of grammar and the meaning of sentences. Thus large parts of language technology cannot be subsumed under speech and text technologies. Among those are technologies that link language to knowledge. We do not know how language, knowledge and thought are represented in the human brain. Nevertheless, language technology had to create formal representation systems that link language to concepts and tasks in the real world.

This provides the interface to the fast growing area of knowledge technologies. In our communication we mix language with other modes of communication and other information media. We combine speech with gesture and facial expressions. Digital texts are combined with pictures and sounds. Movies may contain language in spoken and written form. Thus speech and text technologies overlap and interact with many other technologies that facilitate processing of multimodal communication and multimedia documents. Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components. Applied CL focusses on the practical outcome of modelling human language use. The methods, techniques, tools and applications in this area are often subsumed under the term language engineering or (human) language technology. Although existing CL systems are far from achieving human ability, they have numerous possible applications.

The goal is to create software products that have some knowledge of human language. Such products are going to change our lives. They are urgently needed for improving human-machine interaction since the main obstacle in the interaction between human and computer is a communication problem. Today's computers do not understand our language but computer languages are difficult to learn and do not correspond to the structure of human thought. Even if the language the machine understands and its domain of discourse are very restricted, the use of human language can increase the acceptance of software and the productivity of its users. Much older than communication problems between human beings and machines are those between people with different mother tongues.

One of the original aims of applied computational linguistics has always been fully automatic translation between human languages. From bitter experience scientists have realized that they are still far away from achieving the ambitious goal of translating unrestricted texts. Nevertheless computational linguists have created software systems that simplify the work of human translators and clearly improve their productivity. Less than perfect automatic translations can also be of great help to information seekers who have to search through large amounts of texts in foreign languages. Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#wile

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

Which are the main techniques used in Language Engineering?

There are many techniques used in Language Engineering and some of these are described below. Speaker Identification and Verification: A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying reliably despite temporary changes (such as caused by illness). Speech Recognition: The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose.
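The statistical idea can be caricatured in a few lines: choose the word whose modelled pronunciation best matches an observed phoneme string, weighted by how frequent the word is. The pronunciation lexicon and frequency counts below are invented, and real recognisers use trained acoustic models (hidden Markov models or neural networks) rather than a simple string-similarity score.

    import difflib

    # Invented pronunciation lexicon and word counts (illustration only).
    PRONUNCIATIONS = {"cat": "k ae t", "cut": "k ah t", "cart": "k aa r t"}
    WORD_FREQ = {"cat": 50, "cut": 30, "cart": 5}

    def recognise(observed: str) -> str:
        """Pick the word that best explains the observed phoneme string."""
        def score(word: str) -> float:
            acoustic = difflib.SequenceMatcher(
                None, PRONUNCIATIONS[word].split(), observed.split()).ratio()
            language = WORD_FREQ[word] / sum(WORD_FREQ.values())
            return acoustic * language        # crude combination of the two models

        return max(PRONUNCIATIONS, key=score)

    print(recognise("k ae t"))   # -> "cat"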

There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example.

Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically. Character and Document Image Recognition: Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters.

There are two cases of character recognition: recognition of printed images, referred to as Optical Character Recognition (OCR), and recognition of handwriting, usually known as Intelligent Character Recognition (ICR). OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences.
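A toy version of 'word recognition techniques which use language models' is to snap a noisy recogniser output onto the closest word in a lexicon. The four-word lexicon is a stand-in, and a real ICR system would also use word-sequence statistics to break ties between equally close candidates.

    def edit_distance(a: str, b: str) -> int:
        """Classic Levenshtein distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                # deletion
                                curr[j - 1] + 1,            # insertion
                                prev[j - 1] + (ca != cb)))  # substitution
            prev = curr
        return prev[-1]

    LEXICON = ["language", "engineering", "recognition", "document"]

    def correct(ocr_word: str) -> str:
        """Replace a noisy OCR/ICR output with the nearest lexicon entry."""
        return min(LEXICON, key=lambda w: edit_distance(ocr_word.lower(), w))

    print(correct("Ianguage"))   # the common l/I confusion -> "language"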

Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively. Natural Language Understanding: The understanding of language is obviously fundamental to many applications.

However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels. Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge. Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information. Combinations of analysis and generation with a semantic model allow texts to be translated.

At the current stage of development, applications where this can be achieved need be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text. Natural Language Generation: A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion, either in a chosen language or according to stylistic specifications by a text planning system. Speech Generation: Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response. Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or by synthesising speech using rules. Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.
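The shallow or partial analysis mentioned above can be pictured with a small sketch that merely groups determiner/adjective/noun runs into noun-phrase chunks. The part-of-speech tags are supplied by hand here; a real system would obtain them from a tagger, and a real shallow parser would use a richer chunk grammar.

    # Hand-tagged input (word, part-of-speech) for the sake of the example.
    TAGGED = [("the", "DT"), ("automatic", "JJ"), ("translation", "NN"),
              ("of", "IN"), ("technical", "JJ"), ("documents", "NNS"),
              ("is", "VBZ"), ("improving", "VBG")]

    def chunk_noun_phrases(tagged):
        """Group consecutive determiner/adjective/noun tokens into chunks."""
        chunks, current = [], []
        for word, tag in tagged:
            if tag in ("DT", "JJ", "NN", "NNS"):
                current.append(word)
            elif current:
                chunks.append(" ".join(current))
                current = []
        if current:
            chunks.append(" ".join(current))
        return chunks

    print(chunk_noun_phrases(TAGGED))
    # -> ['the automatic translation', 'technical documents']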

Which language resources are essential components of Language Engineering?

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding. The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA). Lexicons: A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it.

A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application. Specialist Lexicons: There are a number of special cases which are usually researched and produced separately from general purpose lexicons. Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, or persons, or maybe animals. They take on a special significance in many applications, however, where the name is key to the application, such as in a voice operated navigation system, a holiday reservations system, or a railway timetable information system based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks. Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring. Grammars: A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).
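As a concrete glimpse of what a wordnet offers, the following sketch queries Princeton WordNet through NLTK. It assumes NLTK is installed and that the WordNet data has already been fetched with nltk.download('wordnet').

    from nltk.corpus import wordnet as wn

    # List a few senses of "bank" with their synonyms and broader terms.
    for synset in wn.synsets("bank")[:3]:
        print(synset.name(), "-", synset.definition())
        print("  synonyms:     ", synset.lemma_names())
        print("  broader terms:", [h.name() for h in synset.hypernyms()])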

Corpora: A corpus is a body of language, either text or speech, which provides the basis for: analysis of language to establish its characteristics; training a machine, usually to adapt its behaviour to particular circumstances; verifying empirically a theory concerning language; or a test set for a Language Engineering technique or application to establish how well it works in practice. There are national corpora of hundreds of millions of words but there are also corpora which are constructed for particular purposes.

For example, a corpus could comprise recordings of car drivers speaking to a simulation of a control system, which recognises spoken commands, which is then used to help establish the user requirements for a voice operated control system for the market.
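One of the simplest corpus tools is the concordance (keyword in context) mentioned several times above. This sketch assumes a plain-text corpus in a file called corpus.txt, which is only a placeholder name.

    import re

    def concordance(text: str, keyword: str, width: int = 30) -> None:
        """Print every occurrence of keyword with `width` characters of context."""
        for match in re.finditer(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE):
            left = text[max(0, match.start() - width):match.start()].rjust(width)
            right = text[match.end():match.end() + width]
            print(f"{left}[{match.group()}]{right}")

    corpus = open("corpus.txt", encoding="utf-8").read()   # any plain-text corpus
    concordance(corpus, "language")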

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lr

Check for the following terms:

natural language processing: [p] a term in use since the 1980s to define a class of software systems which handle text intelligently.

translator's workbench: [p] a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc.

shallow parser: [p] software which parses language to a point where a rudimentary level of understanding can be realised; this is often used in order to identify passages of text which can then be analysed in further depth to fulfil the particular objective.

formalism: [n] a means to represent the rules used in the establishment of a model of linguistic knowledge.

speech recognition: The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose. There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.

text alignment: [p] the process of aligning different language versions of a text in order to be able to identify equivalent terms, phrases, or expressions.

authoring tools: [p] facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents.

controlled language (also artificial language): [p] language which has been designed to restrict the number of words and the structure of language used, in order to make language processing easier; typical users of controlled language work in an area where precision of language and speed of response is critical, such as the police and emergency services, aircraft pilots, air traffic control, etc.

domain: [n] usually applied to the area of application of the language enabled software, e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted, so the language resource requirements are effectively limited by limiting the domain of application.
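Text alignment can be illustrated with a deliberately simplified length-based aligner: it pairs sentences one-to-one and flags pairs whose lengths diverge sharply, which a real aligner (for example Gale and Church's probabilistic method) would resolve by allowing one-to-two or two-to-one matches. The sentence pairs and the length-ratio thresholds are invented for the example.

    def align(source_sentences, target_sentences):
        """Naive one-to-one alignment that flags suspicious length ratios."""
        pairs = []
        for src, tgt in zip(source_sentences, target_sentences):
            ratio = len(tgt) / max(len(src), 1)
            flag = "" if 0.7 <= ratio <= 1.4 else "  <-- check manually"
            pairs.append((src, tgt, flag))
        return pairs

    english = ["The bank is closed.", "It opens tomorrow."]
    spanish = ["El banco está cerrado.", "Abre mañana."]
    for src, tgt, flag in align(english, spanish):
        print(f"{src}  ||  {tgt}{flag}")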

Multilinguality

In the translation curricula, which factors make technology more indispensable?

When discussing the relevance of technological training in the translation curricula, it is important to clarify the factors that make technology more indispensable and show how the training should be tuned accordingly. The relevance of technology will depend on the medium that contains the text to be translated. This particular aspect is becoming increasingly evident with the rise of the localization industry, which deals solely with information in digital form. There may be no other imaginable means for approaching the translation of such things as on-line manuals in software packages or CD-ROMs with technical documentation than computational ones.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Do professional interpreters and literary translators need translation technology? Which are the tools they need for their job?

On the other hand, the traditional crafts of interpreting natural speech or translating printed material, which are peripheral to technology, may still benefit from technological training slightly more than anecdotally. It is clear that word processors, on-line dictionaries and all sorts of background documentation, such as concordances or collated texts, besides e-mail or other ways of network interaction with colleagues anywhere in the world may substantially help the literary translator's work. With the exception of a few eccentrics or maniacs, it will be rare in the future to see good professional interpreters and literary translators not using more or less sophisticated and specialized tools for their jobs, comparable to the familiarization with tape recorders or typewriters in the past. In any case, this might be something best left to the professional to decide, and may not be indispensable.

However, the greater number of jobs for our students is in the localization market. Information of many types is rapidly changing format and going digital. Electronic documentation is the adequate realm for the incorporation of translation technology. This is something that young students of translation must learn. As the conception and design of technical documentation becomes progressively influenced by the electronic medium, it is integrating more and more with the whole concept of a software product. The strategies and means for translating both software packages and electronic documents are becoming very similar and both are now, as we will see, the goal of the localization industry.
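A translator's workbench of the kind described here typically includes a translation memory. A toy lookup can be sketched with an invented two-entry memory and Python's standard difflib for the fuzzy match; real workbenches score matches far more carefully.

    import difflib

    # Invented translation memory: previously translated sentence pairs.
    MEMORY = {
        "The printer is out of paper.": "La impresora se ha quedado sin papel.",
        "Restart the computer.": "Reinicie el ordenador.",
    }

    def best_match(sentence: str):
        """Return the closest stored source sentence and its translation, if any."""
        hits = difflib.get_close_matches(sentence, MEMORY.keys(), n=1, cutoff=0.6)
        return (hits[0], MEMORY[hits[0]]) if hits else None

    print(best_match("The printer is out of ink."))
    # -> a fuzzy match and its stored translation, for the translator to adapt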

In what ways is documentation becoming electronic? How does this affect the industry?

The greater number of jobs for our students is in the localization market. Information of many types is rapidly changing format and going digital. Electronic documentation is the adequate realm for the incorporation of translation technology. This is something that young students of translation must learn. As the conception and design of technical documentation becomes progressively influenced by the electronic medium, it is integrating more and more with the whole concept of a software product. The strategies and means for translating both software packages and electronic documents are becoming very similar and both are now, as we will see, the goal of the localization industry.

The increase of information in electronic format is linked to advances in computational techniques for dealing with it. Together with the proliferation of informational webs in Internet, we can also see a growing number of search and retrieval devices, some of which integrate translation technology. Technical documentation is becoming electronic, in the form of CD-ROM, on-line manuals, intranets, etc. An important consequence of the popularization of Internet is that the access to information is now truly global and the demand for localizing institutional and commercial Web sites is growing fast. In the localization industry, the utilization of technology is congenital, and developing adequate tools has immediate economic benefits.

What is the focus of the localization industry? Do you believe there might be a job for you in that industry sector?

The main role of localization companies is to help software publishers, hardware manufacturers and telecommunications companies with versions of their software, documentation, marketing, and Web-based information in different languages for simultaneous worldwide release. The recent expansion of these industries has considerably increased the demand for translation products and has created a new burgeoning market for the language business. According to a recent industry survey by LISA (the Localization Industry Standards Association), almost one third of software publishers, such as Microsoft, Oracle, Adobe, Quark, etc., generate above 20 percent of their sales from localized products, that is, from products which have been adapted to the language and culture of their targeted markets, and the great majority of publishers expect to be localizing into more than ten different languages.

Localization is not limited to the software-publishing business and it has infiltrated many other facets of the market, from software for manufacturing and enterprise resource planning, games, home banking, and edutainment (education and entertainment), to retail automation systems, medical instruments, mobile phones, personal digital assistants (PDA), and the Internet. Doing business in an integrated global economy, with growing electronic transactions, and world wide access to products and services means an urgent need to break through language barriers. A prediction of $220 billion online spending by 2001 shows the potential of this new market. It means that product information, from purchasing procedures to user manuals, must be made available in the languages of potential customers. According to the latest surveys, there are more than 35 million non-English-speaking Internet users. Internet is thus evolving into a huge consumer of Web-based information in different languages. The company Nua Ltd. provides a good example of how the demand for multilingual Web-sites is changing the notion of translation into localization.

Nua has recently won a substantial contract to develop and maintain a searchable multilingual intranet for the American Export Group (AEG), a division of Thomas Publishing International. Nua's task is to transform the existing American Export Register (AER), a directory of some 6,000 pages, into a localized database of 45,000 company listings, with information about each company, including a categorization into one of AEG's 5,000 categories. AEG's intranet will link 47,000 US firms to overseas clients. The first version of the AER register will provide access in five languages: English, French, German, Spanish, and Portuguese. Russian is due to follow, and the company hopes eventually to have an Arabic version. Any such multilingual service involves frequent revisions and updates, which in turn means a high demand for constant localization effort.

Define internationalization, globalization and localization. How do they affect the design of software products?

Professor Margaret King of Geneva University described the first step of the project as consisting of the "clarification of the state of affairs" and of planning courses comprehensive enough to cover all aspects of the localization industry, from translation and technical writing through globalization, internationalization, and localization. The definition of the critical terms involved was a contentious topic, although there seems to be a consensus on the following:

Globalization: The adaptation of marketing strategies to regional requirements of all kinds (e.g., cultural, legal, and linguistic).

Internationalization: The engineering of a product (usually software) to enable efficient adaptation of the product to local requirements.

Localization: The adaptation of a product to a target language and culture (locale).
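
The distinction between internationalization and localization can be illustrated with a small example. The following Python sketch shows an internationalized program in the sense defined above: the code never hard-codes a language, and every user-visible string goes through a translation lookup, so that localizing it later only means supplying message catalogues. The domain name "demo" and the "locale" directory are hypothetical placeholders, not part of any real product.

    import gettext

    # Internationalization: the program itself is locale-neutral.
    # All user-visible strings pass through the _() lookup, so a localizer
    # can later add compiled message catalogues (.mo files) for each target
    # locale without touching the source code.
    def get_translator(language_code):
        # "demo" and the "locale" directory are hypothetical; with
        # fallback=True the original English strings are used whenever no
        # catalogue for the requested locale is installed.
        translation = gettext.translation(
            "demo", localedir="locale", languages=[language_code], fallback=True
        )
        return translation.gettext

    if __name__ == "__main__":
        for locale_code in ("en", "es", "eu"):  # English, Spanish, Basque
            _ = get_translator(locale_code)
            print(locale_code, "->", _("Welcome to the catalogue"))

Localization, in this picture, is the separate step of producing the Spanish or Basque catalogues (and adapting dates, currencies and other cultural conventions) for the already internationalized code.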

The main goal of the LEIT initiative is to introduce localization courseware into translation studies, with versions ready for the start of the 1999 academic year. However, this must be done with care. Bert Esselink (1998), from AlpNet, for example, argues against separating localization from other disciplines and claims its basic principles should be covered in all areas of translation training. Furthermore, it is worth adding that the trainers not only need constant feedback and guidance from the commercial sector; they also need to maintain close contact with the software industry. So, perhaps, one of the best features of the LEIT initiative is its combination of partners from the academic world as well as from industry. LISA offers the first version of this courseware on its Web site, and users can contact the LEIT group and collaborate through an on-line questionnaire.

Are translation and localization the same thing? Explain the differences.

The above lines depict a view of a translation environment which is closer to more traditional needs of the translator than to current requirements of the industry. Many aspects of software localization have not been considered, particularly the concepts of multilingual management and document-life monitoring. Corporations are now realizing that documentation is an integral part of the production line where the distinction between product, marketing and technical material is becoming more and more blurred. Product documentation is gaining importance in the whole process of product development with direct impact on time-to-market. Software engineering techniques that apply in other phases of software development are beginning to apply to document production as well. The appraisal of national and international standards of various types is also significant: text and character coding standards (e.g. SGML/XML and Unicode), as well as translation quality control standards (e.g. DIN 2345 in Germany, or UNI 10574 in Italy).

In response to these new challenges, localization packages are now being designed to assist users throughout the whole life cycle of a multilingual document. These take them through job setup, authoring, translation preparation, translation, validation, and publishing, besides ensuring consistency and quality in source and target language variants of the documentation. New systems help developers monitor different versions, variants and languages of product documentation, and author customer specific solutions. An average localization package today will normally consist of an industry standard SGML/XML editor (e.g. ArborText), a translation and terminology toolkit (Trados Translator's Workbench), and a publishing engine (e.g. Adobe's Frame+SGML).
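
To make the idea of structured, language-tagged documentation slightly more tangible, here is a minimal Python sketch that walks a hypothetical XML fragment with the standard xml.etree.ElementTree module and reports which segments still need a translation. The element names (manual, segment, source, target) are invented for illustration and do not correspond to any particular commercial format.

    import xml.etree.ElementTree as ET

    # A hypothetical fragment of multilingual product documentation.
    # Real localization packages use richer formats (SGML/XML, TMX and,
    # later, XLIFF), but the principle is the same: structure and text are
    # kept separate and every text unit is tagged with its language.
    DOCUMENT = """
    <manual>
      <segment id="s1">
        <source xml:lang="en">Insert the CD-ROM into the drive.</source>
        <target xml:lang="es">Introduzca el CD-ROM en la unidad.</target>
      </segment>
      <segment id="s2">
        <source xml:lang="en">Press the power button.</source>
      </segment>
    </manual>
    """

    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

    root = ET.fromstring(DOCUMENT)
    for segment in root.iter("segment"):
        source = segment.find("source")
        target = segment.find("target")
        status = "translated" if target is not None else "PENDING"
        print(segment.get("id"), f"[{source.get(XML_LANG)}]", status, "|", source.text)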

Unlike traditional translators, software localizers may be engaged in early stages of software development, as there are issues, such as platform portability, code exchange, format conversion, etc. which if not properly dealt with may hinder product internationalization. Localizers are often involved in the selection and application of utilities that perform code scanning and checking, that automatically isolate and suggest solutions to National Language Support (NLS) issues, which save time during the internationalization enabling process. There are run-time libraries that enable software developers and localizers to create single-source, multilingual, and portable cross-platform applications. Unicode support is also fundamental for software developers who work with multilingual texts, as it provides a consistent coding format for international character sets.
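
As a small illustration of why consistent character coding matters to developers and localizers alike, the sketch below round-trips text through two legacy, region-specific encodings and through UTF-8. Internally, Python (like most modern environments) keeps the text as Unicode code points, which is what lets a single code base carry any of the target languages; the sample strings are, of course, just examples.

    # Each sample uses characters that its legacy, region-specific encoding
    # can represent; UTF-8 (a Unicode encoding) can represent all of them.
    samples = {
        "latin-1": "localización de la interfaz",   # Spanish
        "shift_jis": "ソフトウェアの国際化",           # Japanese
    }

    for legacy, text in samples.items():
        legacy_bytes = text.encode(legacy)      # what an old regional file might contain
        utf8_bytes = text.encode("utf-8")       # one encoding that covers every locale
        # Round-trip: decoding with the right codec recovers the same Unicode string.
        assert legacy_bytes.decode(legacy) == utf8_bytes.decode("utf-8") == text
        print(f"{legacy:>9}: {len(legacy_bytes):2d} bytes | utf-8: {len(utf8_bytes):2d} bytes | {text}")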

What is a translation workstation? Compare it with a standard localization tool.

Leaving behind the old conception of a monolithic compact translation engine, the industry is now moving in the direction of integrating systems: "In the future Trados will offer solutions that provide enterprise-wide applications for multilingual information creation and dissemination, integrating logistical and language-engineering applications into smooth workflow that spans the globe," says Trados manager Henri Broekmate. Logos, the veteran translation technology provider, has announced "an integrated technology-based translation package, which will combine term management, TM, MT and related tools to create a seamless full service localization environment." Other software manufacturers also in the race are Corel, Star, IBM, and the small but belligerent Spanish company Atril. This approach for integrating different tools is largely the view advocated by many language-technology specialists. Below is a description of an ideal engine which captures the answers given by Muriel Vasconcellos (from the Pan American Health Organization), Minako O'Hagan (author of The Coming Age of Teletranslations) and Eduard Hovy (President of the Association of Machine Translation in the Americas) to a recent survey (by Language International 10.6). The ideal workstation for the translator would combine the following features:

Full integration in the translator's general working environment, which comprises the operating system, the document editor (hypertext authoring, desktop publisher or the standard word-processor), as well as the emailer or the Web browser. These would be complemented with a wide collection of linguistic tools: from spell, grammar and style checkers to on-line dictionaries, and glossaries, including terminology management, annotated corpora, concordances, collated texts, etc.

The system should comprise all advances in machine translation (MT) and translation memory (TM) technologies, be able to perform batch extraction and reuse of validated translations, enable searches into TM databases by various keywords (such as phrases, authors, or issuing institutions). These TM databases could be distributed and accessible through Internet. There is a new standard for TM exchange (TMX) that would permit translators and companies to work remotely and share memories in real-time.
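
The translation-memory behaviour described above can be approximated in a few lines. The sketch below uses Python's standard difflib module to fuzzy-match a new source sentence against a toy in-memory memory; real systems (and the TMX exchange format) add segmentation, attributes, networking and far larger databases, so this is only meant to show the principle, and the example sentences are invented.

    from difflib import SequenceMatcher

    # A toy translation memory: previously validated source/target pairs.
    TM = {
        "Insert the CD-ROM into the drive.": "Introduzca el CD-ROM en la unidad.",
        "Press the power button.": "Pulse el botón de encendido.",
        "Save the document before closing.": "Guarde el documento antes de cerrar.",
    }

    def tm_lookup(sentence, threshold=0.75):
        """Return (score, source, target) for the best fuzzy match, or None."""
        best_score, best_pair = 0.0, None
        for source, target in TM.items():
            score = SequenceMatcher(None, sentence.lower(), source.lower()).ratio()
            if score > best_score:
                best_score, best_pair = score, (source, target)
        if best_pair is not None and best_score >= threshold:
            return (best_score, *best_pair)
        return None

    match = tm_lookup("Insert the DVD-ROM into the drive.")
    if match:
        score, source, target = match
        print(f"{score:.0%} match with '{source}' -> suggested translation: '{target}'")
    else:
        print("No usable match; translate from scratch.")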

Eduard Hovy underlines the need for a genre detector. "We need a genre topology, a tree of more or less related types of text and ways of recognizing and treating the different types computationally." He also sees the difficulty of constantly updating the dictionaries and suggests a "restless lexicon builder that crawls all over the Web every night, ceaselessly collecting words, names, and phrases, and putting them into the appropriate lexicons."

Muriel Vasconcellos pictures her ideal design of the workstation in the following way:

A good view of the source text, extensive enough to offer the overall context, including the previous sentence and two or three sentences after the current one.

Relevant on-line topical word lists, glossaries and thesauri. These should be immediately accessible and, in the case of topical lists, there should be an optional switch that shows, possibly in color, when there are subject-specific entries available.

Three target-text windows. The first would be the main working area, and it would start by providing a sentence from the original document (or a machine pre-translation), which could be over-struck or quickly deleted to allow the translator to work from scratch. The original text or pre-translation could be switched off. Characters of any language and other symbols should be easy to produce.

Drag-and-drop is essential and editing macros are extremely helpful when overstriking or translating from scratch.

The second window would offer translation memory when it is available. The TM should be capable of fuzzy matching with a very large database, with the ability to include the organization's past texts if they are in some sort of electronic form.

The third window would provide a raw machine translation which should be easy to paste into the target document.

The grammar checker can be tailored so that it is not so sensitive. It would be ideal if one could write one's own grammar rules.

Machine translation vs. human translation. Do you agree that translation excellence goes beyond technology? Why?

Having said all this, it is important to reassess the human factor. Like cooks, tailors or architects, professional translators need to become acquainted with technology, because good use of technology will make their jobs more competitive and satisfactory. But they should not dismiss craftsmanship. Technology enhances productivity, but translation excellence goes beyond technology. It is important to delimit the roles of humans and machines in translation. Martin Kay's (1987) words in this respect are most illustrative:

A computer is a device that can be used to magnify human productivity. Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of human spirit and the dignity of human labor but, by taking over what is mechanical and routine, it frees human beings from what is mechanical and routine. Translation is a fine and exacting art, but there is much about it that is mechanical and routine; if this were given over to a machine, the productivity of the translator would not only be magnified but the work would become more rewarding, more exciting, more human.

It has taken some 40 years for the specialists involved in the development of MT to realize that the limits to technology arise when going beyond the mechanical and routine aspects of language. From the outside, translation is often seen as a mere mechanical process, not any more complex than playing chess, for example. If computers have been programmed with the capacity of beating a chess champion such as Kasparov, why should they not be capable of performing translation of the highest quality? Few people are aware of the complexity of literary translation. Douglas Hofstadter (1998) depicts this well:

A skilled literary translator makes a far larger number of changes, and far more significant changes, than any virtuoso performer of classical music would ever dare to make in playing notes in the score of, say, a Beethoven piano sonata. In literary translation, it's totally humdrum stuff for new ideas to be interpreted, old ideas to be deleted, structures to be inverted, twisted around, and on and on.

Although it may not be perceived at first sight, the complexity of natural language is of an order of magnitude far superior to any purely mechanical process. To how many words should the vocabulary be limited to make the complexity of producing "free sonnets" (that is, any combination of 6 words in 14 verses) comparable to the number of possible chess games? It may be difficult to believe, but the vocabulary should be restricted to 100 words. That is, making free sonnets with 100 words offers as many different alternatives as there are ways of playing a chess game (roughly 10^120; see DELI's Web page for discussion). The number of possibilities would quickly come down if combinations were restricted so that they not only made sense but acquired some sort of poetic value. However, defining formally or mechanically the properties of "make sense" and "have poetic value" is not an easy task. Or at least, it is far more difficult than establishing winning heuristics for a color to succeed in a chess game. No wonder, then, that Douglas Hofstadter's MT experiment translating the 16th-century French poet Clément Marot's poem Ma Mignonne into English using IBM's Candide system should have performed so badly (see Sgrung's interview in Language International 10.1):

Well, when you look at [IBM's Candide's] translation of Ma Mignonne, thinking of Ma Mignonne as prose, not as poetry, it's by far the worst. It's so terrible that it's not even laughable, it just stinks! It's pathetic!

Obviously, Hofstadter's experiment has gone beyond the recommended mechanical and routine scope of language and is therefore an abuse of MT. Outside the limits of the mechanical and routine, MT is impracticable and human creativity becomes indispensable. Translators of the highest quality are only obtainable from first-class raw materials and constant and disciplined training. The potentially good translator must be a sensitive, wise, vigilant, talented, gifted, experienced, and knowledgeable person. An adequate use of mechanical means and resources can make a good human translator a much more productive one. Nevertheless, very much like dictionaries and other reference material, technology may be considered an excellent prosthesis, but little more than that. As Martin Kay (1992) argues, there is an intrinsic and irreplaceable human aspect of translation:

There is nothing that a person could know, or feel, or dream, that could not be crucial for getting a good translation of some text or other. To be a translator, therefore, one cannot just have some parts of humanity; one must be a complete human being.

However, even for skilled human translators, translation is often difficult. One clear example is when linguistic form, as opposed to content, becomes an important part of a literary piece. Conveying the content, but missing the poetic aspects of the signifier may considerably hinder the quality of the translation. This is a challenge to any translator. Jaime de Ojeda's (1989) Spanish translation of Lewis Carroll's Alice in Wonderland illustrates this problem:

Twinkle, twinkle, little bat
how I wonder what you're at!
Up above the world you fly
like a tea-tray in the sky.

Brilla, luce, ratita alada
¿en qué estás tan atareada?
Por encima del universo vuelas
como una bandeja de teteras.

Manuel Breva (1996) analyzes the example and shows how Ojeda solves the "formal hurdles" of the original:

The above lines are a parody of the famous poem "Twinkle, twinkle, little star" by Jane Taylor, which, in Carroll's version, turns into a sarcastic attack against Bartholomew Price, a professor of mathematics, nicknamed "The Bat". Jaime de Ojeda translates "bat" as "ratita alada" for rhythmical reasons. "Murciélago", the Spanish equivalent of "bat", would be hard to fit in this context for the same poetic reasons. With Ojeda's choice of words the Spanish version preserves the meaning and maintains the same rhyming pattern (AABB) as in the original English verse-lines.

What would the output of any MT system be like if confronted with this fragment? Obviously, the result would be disastrous. Compared with the complexity of natural language, the figures that serve to quantify the "knowledge" of any MT program are absurd: 100,000 word bilingual vocabularies, 5,000 transfer rules.... Well developed systems such as Systran, or Logos hardly surpass these figures. How many more bilingual entries and transfer rules would be necessary to match Ojeda's competence? How long would it take to adequately train such a system? And even then, would it be capable of challenging Ojeda in the way the chess master Kasparov has been challenged? I have serious doubts about that being attainable at all. But there are other opinions, as is the case of the famous Artificial Intelligence master, Marvin Minsky. Minsky would argue that it is all a matter of time. He sees the human brain as an organic machine, and as such, its behavior, reactions and performance can be studied and reproduced. Other people believe there is an important aspect separating organic, living "machines" from synthetic machines. They would claim that creativity is in life, and that it is an exclusive faculty of living creatures to be creative.

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

Which profiles should any person with a University degree in Translation be qualified for?

Why is translation such a difficult task?

Although it may not be perceived at first sight, the complexity of natural language is of an order of magnitude far superior to any purely mechanical process.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/vic.htm

Which are the main problems of MT?

We will consider some particular problems which the task of translation poses for the builder of MT systems -- some of the reasons why MT is hard. It is useful to think of these problems under three headings: (i) problems of ambiguity, (ii) problems that arise from structural and lexical differences between languages, and (iii) multiword units like idioms and collocations. Typical problems of ambiguity, lexical and structural mismatches, and multiword units are discussed in separate sections of the source.

Of course, these sorts of problem are not the only reasons why MT is hard. Other problems include the sheer size of the undertaking, as indicated by the number of rules and dictionary entries that a realistic system will need, and the fact that there are many constructions whose grammar is poorly understood, in the sense that it is not clear how they should be represented, or what rules should be used to describe them. This is the case even for English, which has been extensively studied, and for which there are detailed descriptions -- both traditional 'descriptive' and theoretically sophisticated -- some of which are written with computational usability in mind. It is an even worse problem for other languages. Moreover, even where there is a reasonable description of a phenomenon or construction, producing a description which is sufficiently precise to be used by an automatic system raises non-trivial problems.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/MT_book_1995/node52.html#SECTION00810000000000000000

Which parts of Linguistics are more relevant for MT?

We are led therefore to the argument that good quality translation is not possible without understanding the reality behind what is being expressed, i.e. translation goes beyond the familiar linguistic information: morphology, syntax and semantics. The need is particularly striking in the treatment of pronouns. Human translators have virtually no problems with pronouns, and it must seem strange to many that while MT systems can deal quite well with complex idioms and certain complex structures, they all seem to have great difficulties with pronouns. Why do we get such errors as die Europäische Gemeinschaft und ihre Mitglieder, rendered as the European Community and her members? The problem is that the antecedent of pronouns must be identified; the default translation of ihr as her does not work.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/hutchins91.htm
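
The ihre/her example can be made concrete with a toy rule. The sketch below is not a pronoun-resolution algorithm: the classification of the antecedent is supplied by hand, precisely because finding it automatically is the hard part the quotation points to. It only shows that the correct English possessive depends on the antecedent, not on the German surface form.

    # The English possessive is chosen according to how English treats the
    # antecedent (an organisation takes "its"), not according to the German
    # grammatical gender of "die Europäische Gemeinschaft" (feminine, hence "ihre").
    ENGLISH_POSSESSIVE = {
        "female person": "her",
        "male person": "his",
        "organisation or thing": "its",
        "plural antecedent": "their",
    }

    def render_possessive(antecedent_type):
        return ENGLISH_POSSESSIVE.get(antecedent_type, "its")

    naive = "her"                                           # blind mapping: ihre -> her
    informed = render_possessive("organisation or thing")   # needs the antecedent to be identified
    print(f"naive    : the European Community and {naive} members")
    print(f"informed : the European Community and {informed} members")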

How many different types of ambiguity are there?

When a word has more than one meaning, it is said to be lexically ambiguous. When a phrase or sentence can have more than one structure it is said to be structurally ambiguous.

http://sirio.deusto.es/ABAITUA/konzeptu/ta/MT_book_1995/node53.html#SECTION00820000000000000000
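
A toy illustration of the two kinds of ambiguity, with invented mini-lexicon entries and two hand-written bracketings of a classic example sentence:

    # Lexical ambiguity: one word form, several unrelated senses.
    LEXICON = {
        "bank": ["financial institution", "edge of a river"],
        "light": ["not heavy", "illumination"],
    }

    # Structural ambiguity: the same word sequence with two possible structures.
    SENTENCE = "I saw the man with the telescope"
    PARSES = [
        ("(I (saw (the man) (with the telescope)))", "the seeing is done with the telescope"),
        ("(I (saw (the man (with the telescope))))", "the man carries the telescope"),
    ]

    for word, senses in LEXICON.items():
        print(f"'{word}' is lexically ambiguous ({len(senses)} senses): {senses}")

    print(f"\n'{SENTENCE}' is structurally ambiguous:")
    for bracketing, reading in PARSES:
        print(f"  {bracketing}  ->  {reading}")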

Illustrate your discussion with:

  1. Although I agree to a few of Smith’s points, I must disagree to the majority.

The problem is the two uses of the word "to". The difficulty here is that English allows a number of prepositions to fit idiomatically with the verb "agree", but you must know when to use which preposition. For instance, you can use "to", but only in a sentence like this: "She agreed to the conditions spelled out in the contract."

  2. We can differ the transportation and ritual models of communication by contrasting the space-biased nature of one with the time-biased nature of the second.

The correct idiomatic wording would be "We can differentiate between…".

http://suo.ieee.org/email/msg02295.html

http://logic.philosophy.ox.ac.uk/tutorial1/Tut1-03.htm

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node54.html

http://www.longman-elt.com/dictionaries/llreview/r3komuro.html

http://www.sfu.ca/~gmccarro/Grammar/Expressions.html

Which are the most usual interpretations of the term "machine translation" (MT)?

The term machine translation (MT) is normally taken in its restricted and precise meaning of fully automatic translation. However, in this chapter we consider the whole range of tools that may support translation and document production in general, which is especially important when considering the integration of other language processing techniques and resources with MT. We therefore define Machine Translation to include any computer-based process that transforms (or helps a user to transform) written text from one human language into another. We define Fully Automated Machine Translation (FAMT) to be MT performed without the intervention of a human being during the process. Human-Assisted Machine Translation (HAMT) is the style of translation in which a computer system does most of the translation, appealing in case of difficulty to a (mono- or bilingual) human for help. Machine-Aided Translation (MAT) is the style of translation in which a human does most of the work but uses one or more computer systems, mainly as resources such as dictionaries and spelling checkers, as assistants.
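
The difference between the three styles is essentially where the human sits in the loop. The toy sketch below models it with an optional callback that is consulted only when the confidence of the automatic component is low; the two-entry dictionary, the confidence scores and the function names are all invented for illustration.

    # A toy "engine": segment -> (candidate translation, confidence).
    TOY_ENGINE = {
        "good morning": ("buenos días", 0.95),
        "kick the bucket": ("patear el cubo", 0.30),  # idiom: low confidence, literal guess
    }

    def translate(segment, ask_human=None, threshold=0.8):
        candidate, confidence = TOY_ENGINE.get(segment, (segment, 0.0))
        if ask_human is not None and confidence < threshold:
            # HAMT: the system does most of the work but appeals to a human
            # (here a callback) in case of difficulty.
            return ask_human(segment, candidate)
        # FAMT: no human intervention at all during the process.
        return candidate

    def human(segment, machine_guess):
        fixes = {"kick the bucket": "estirar la pata"}  # the idiomatic rendering
        return fixes.get(segment, machine_guess)

    print("FAMT:", translate("kick the bucket"))                   # literal and wrong
    print("HAMT:", translate("kick the bucket", ask_human=human))  # the human repairs it

MAT would invert the roles: the human produces the translation and the computer is consulted only as a resource, much like a dictionary or spelling checker.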

What do FAHQT and ALPAC mean in the evolution of MT?

There were of course dissenters from the dominant 'perfectionism'. Researchers at Georgetown University and IBM were working towards the first operational systems, and they accepted the long-term limitations of MT in the production of usable translations. More influential was the well-known dissent of Bar-Hillel. In 1960, he published a survey of MT research at the time which was highly critical of the theory-based projects, particularly those investigating interlingua approaches, and which included his demonstration of the non-feasibility of fully automatic high quality translation (FAHQT) in principle. Instead, Bar-Hillel advocated the development of systems specifically designed on the basis of what he called 'man-machine symbiosis', a view which he had first proposed nearly ten years before when MT was still in its infancy (Bar-Hillel 1951).

Nevertheless, the main thrust of research was based on the explicit or implicit assumption that the aim of MT must be fully automatic systems producing translations at least as good as those made by human translators. The current operational systems were regarded as temporary solutions to be superseded in the near future. There was virtually no serious consideration of how 'less than perfect' MT could be used effectively and economically in practice. Even more damaging was the almost total neglect of the expertise of professional translators, who naturally became anxious and antagonistic. They foresaw the loss of their jobs, since this is what many MT researchers themselves believed was inevitable.

In these circumstances it is not surprising that the Automatic Language Processing Advisory Committee (ALPAC) set up by the US sponsors of research found that MT had failed by its own criteria, since by the mid 1960s there were clearly no fully automatic systems capable of good quality translation and there was little prospect of such systems in the near future. MT research had not looked at the economic use of existing 'less than perfect' systems, and it had disregarded the needs of translators for computer-based aids.

While the ALPAC report brought to an end many MT projects, it did not banish the public perception of MT research as essentially the search for fully automatic solutions. The subsequent history of MT is in part the story of how this mistaken emphasis of the early years has had to be repaired and corrected. The neglect of the translation profession has been made good eventually by the provision of translation tools and translator workstations. MT research has turned increasingly to the development of realistic practical MT systems where the necessity for human involvement at different stages of the process is fully accepted as an integral component of their design architecture. And 'pure' MT research has by and large recognised its role within the broader contexts of commercial and industrial realities.

List some of the major methods, techniques and approaches

Tools for translators, practical machine translation and research methods for machine translation.

Where was MT ten years ago?

Ten years ago, the typical users of machine translation were large organizations such as the European Commission, the US Government, the Pan American Health Organization, Xerox, Fujitsu, etc. Fewer small companies or freelance translators used MT, although translation tools such as online dictionaries were becoming more popular. However, ongoing commercial successes in Europe, Asia, and North America continued to illustrate that, despite imperfect levels of achievement, the levels of quality being produced by FAMT and HAMT systems did address some users' real needs. Systems were being produced and sold by companies such as Fujitsu, NEC, Hitachi, and others in Japan, Siemens and others in Europe, and Systran, Globalink, and Logos in North America (not to mention the unprecedented growth of cheap, rather simple MT assistant tools such as PowerTranslator).

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

New directions and foreseeable breakthroughs of MT in the short term

Several applications have proven to be able to work effectively using only subsets of the knowledge required for MT. It is possible now to evaluate different tasks, to measure the information involved in solving them, and to identify the most efficient techniques for a given task. Thus, we must face the decomposition of monolithic systems and start talking about hybridization, engineering, architectural changes, shared modules, etc. It is important when identifying tasks to evaluate linguistic information in terms of what is generalizable, and thus a good candidate for traditional parsing techniques (the argument structure of a transitive verb in active voice?), and what is idiosyncratic (what about collocations?). Besides, one cannot discard the power of efficient techniques that yield better results than older approaches, as illustrated clearly by part-of-speech disambiguation, which has proved to be better solved using Hidden Markov Models than traditional parsers. On the other hand, it has been proven that good theoretically motivated and linguistically driven tagging label sets improve the accuracy of statistical systems. Hence we must be ready to separate the knowledge we want to represent from the techniques and formalisms that have to process it.

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html
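
The remark about Hidden Markov Models can be illustrated with a very small Viterbi decoder over invented probabilities. The tag set, transition and emission values below are made up; the only point is that an ambiguous word such as "flies" is disambiguated from the statistics of its neighbours rather than by hand-written parse rules.

    # A tiny HMM part-of-speech tagger with two tags and invented probabilities.
    STATES = ("NOUN", "VERB")
    START = {"NOUN": 0.6, "VERB": 0.4}
    TRANS = {
        "NOUN": {"NOUN": 0.3, "VERB": 0.7},
        "VERB": {"NOUN": 0.6, "VERB": 0.4},
    }
    EMIT = {
        "NOUN": {"time": 0.5, "flies": 0.3, "arrow": 0.2},
        "VERB": {"time": 0.1, "flies": 0.8, "arrow": 0.1},
    }

    def viterbi(words):
        # best[t][s]: probability of the best tag sequence ending in state s at word t.
        best = [{s: START[s] * EMIT[s].get(words[0], 1e-6) for s in STATES}]
        back = [{}]
        for t in range(1, len(words)):
            best.append({})
            back.append({})
            for s in STATES:
                prob, prev = max(
                    (best[t - 1][p] * TRANS[p][s] * EMIT[s].get(words[t], 1e-6), p)
                    for p in STATES
                )
                best[t][s], back[t][s] = prob, prev
        # Trace the best path backwards.
        last = max(STATES, key=lambda s: best[-1][s])
        path = [last]
        for t in range(len(words) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    print(viterbi(["time", "flies"]))  # expected output: ['NOUN', 'VERB']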

Within the last ten years, research on spoken translation has developed into a major focus of MT activity. Of course, the idea or dream of translating the spoken word automatically was present from the beginning (Locke 1955), but it has remained a dream until now. Research projects such as those at ATR, CMU and on the Verbmobil project in Germany are ambitious. But they do not make the mistake of attempting to build all-purpose systems. The constraints and limitations are clearly defined by definition of domains, sublanguages and categories of users. That lesson has been learnt. The potential benefits even if success is only partial are clear for all to see, and it is a reflection of the standing of MT in general and a sign that it is no longer suffering from old perceptions that such ambitious projects can receive funding.

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

What are the Internet's essential features?

Before the nineties, three main approaches to Machine Translation were developed: the so-called direct, transfer and interlingua approaches. Direct and transfer-based systems must be implemented separately for each language pair in each direction, while the interlingua-based approach is oriented to translation between any two of a group of languages for which it has been implemented. The implications of this fundamental difference, as well as other features of each type of system, are discussed in this and the following sections. The more recent corpus-based approach is considered later in this section.

More recently developed approaches to MT divide the translation process into discrete stages, including an initial stage of analysis of the structure of a sentence in the source language, and a corresponding final stage of generation of a sentence from a structure in the target language. Neither analysis nor generation is translation as such. The analysis stage involves interpreting sentences in the source language, arriving at a structural representation which may incorporate morphological, syntactic and lexical coding, by applying information stored in the MT system as grammatical rules and dictionaries. The generation stage performs approximately the same functions in reverse, converting structural representations into sentences, again applying information embodied in rules and dictionaries.

The transfer approach, which characterizes the more sophisticated MT systems now in use, may be seen as a compromise between the direct and interlingua approaches, attempting to avoid the most extreme pitfalls of each. Although no attempt is made to arrive at a completely language-neutral interlingua representation, the system nevertheless performs an analysis of input sentences, and the sentences it outputs are obtained by generation. Analysis and generation are however shallower than in the interlingua approach, and in between analysis and generation, there is a transfer component, which converts structures in one language into structures in the other and carries out lexical substitution. The object of analysis here is to represent sentences in a way that will facilitate and anticipate the subsequent transfer to structures corresponding to the target language sentences.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm
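
A caricature of the transfer architecture in a few lines of Python: an analysis step that builds a crude structure for a Spanish noun phrase, a transfer step that substitutes lexical items and reorders adjective and noun, and a generation step that linearizes the result. The one-pattern grammar and the five-word dictionary are invented purely for illustration.

    # Toy transfer-based pipeline for Spanish noun phrases of the shape
    # DET NOUN ADJ (e.g. "la casa blanca").
    LEXICON = {"la": "the", "casa": "house", "blanca": "white", "idea": "idea", "nueva": "new"}

    def analyse(phrase):
        """Analysis: build a minimal source-language structure."""
        det, noun, adj = phrase.split()
        return {"det": det, "noun": noun, "adj": adj}

    def transfer(structure):
        """Transfer: lexical substitution plus the structural change
        (Spanish DET NOUN ADJ becomes English DET ADJ NOUN)."""
        english = {slot: LEXICON[word] for slot, word in structure.items()}
        return [english["det"], english["adj"], english["noun"]]

    def generate(target_structure):
        """Generation: linearize the target-language structure."""
        return " ".join(target_structure)

    for np in ("la casa blanca", "la idea nueva"):
        print(np, "->", generate(transfer(analyse(np))))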

What is the role of minority languages on the Internet (Catalan, Basque...)?

This point requires even more careful consideration when what is needed is not merely a bilingual but a multilingual MT network, in which translation is possible from any language into any other language among a given network of languages or in a multilingual community. Unless a high degree of reusability is achieved, serious problems arise as soon as the multilingual set grows beyond a very limited size. When, in 1978, an ambitious project, named Eurotra, was started to develop "a machine translation system of advanced design" between all official languages of the European Community (a target which was not achieved before the programme came to an end), the Community's official languages numbered only six: English, French, German, Dutch, Danish and Italian. This meant fifteen language pairs. Within eight years, the entry of Greece and subsequently Spain and Portugal into the Community had added three new official languages which had to be integrated into the system, still under development. This increase from six to nine languages meant that the number of language pairs more than doubled, rising from fifteen to thirty-six. If the programme had continued a little longer, by the time there were twelve official languages of the Community, the number of language pairs would have gone from 36 to 66; fifteen languages would have brought the figure up to 105, and so on in geometric progression.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm
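
The arithmetic behind those figures is simply the number of unordered language pairs, n(n-1)/2 (and twice that many modules if each translation direction needs its own), which is why the Eurotra numbers grow so fast. The snippet below just reproduces the counts quoted above.

    def language_pairs(n):
        """Unordered pairs among n languages; a direct or transfer-based
        system needs one module per pair (or per direction, i.e. twice as many)."""
        return n * (n - 1) // 2

    for n in (6, 9, 12, 15):
        print(f"{n:2d} languages -> {language_pairs(n):3d} pairs "
              f"({2 * language_pairs(n)} directed pairs)")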

In what ways can Machine Translation be applied on the Internet?

The Internet is, and will be to an increasing degree, both a vehicle for providing MT services and a major beneficiary of their application. To this extent, it is likely to provide a further key to making the Internet a truly global medium which can transcend not only geographical barriers but also linguistic ones.

Europe, as the most notable focal point in the present-day world where a great capacity for technological innovation crosses paths with a high level of linguistic diversity, is excellently placed to lead the way forward. Other parts of the world are technologically capable but too self-contained and homogeneous culturally to acquire immediate awareness of the need for information technology to find its way across linguistic barriers, while still other communities are fully aware of the language problem but lack a comparable degree of access to technological resources and initiative needed to address the issue on such a scale. Whoever succeeds in making future communication global in linguistic terms will have forged a new tool of incalculable value to the entire world.

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm

Conclusion:

The purpose of this work was to bring the subject "English Language and New Technologies" closer to people who have never heard of it, and to explain it in general terms.

In my opinion, without the software that has been developed and the research carried out by many specialists, the barriers to understanding between foreign languages would be enormous. Machine translation systems, sociolinguistic studies and many other advances have made languages easier to work with and to study.

I like this kind of project because it is very useful: I had never noticed many of the things that were taught in class, and it is valuable to have at hand so many resources (data, software and much more) for improving one's English.

To conclude, I can add that these kinds of evaluation and self-assessment projects are worthwhile because they give you complete freedom in how you work. I have enjoyed this experience very much, and I hope it will be useful to me in the future.

References:

http://sirio.deusto.es/abaitua/konzeptu/nlp/HU_whatLT.pdf

http://www.isc.ie/about/reports.html

http://www.isc.ie/about/commission.html

http://sirio.deusto.es/abaitua/konzeptu/nlp/echo/infoage.html

http://babelfish.altavista.com/

http://www.hltcentral.org/

http://www.genecehamby.com/Knowledge.htm

http://www.p-jones.demon.co.uk

http://www.manovich.net/IA/IA_workshop.html

http://216.239.33.100/search?q=cache:qJ7dFEq3FTgC:www.xys.org/xys/netters/Fang-Zhouzi/religion/2tales.doc+%22language+as+a+barrier+of+communication%22&hl=es&ie=UTF-8

http://contest.thinkquest.gr.jp/tqj2000/30202/english/communication.html

http://www.dur.ac.uk/~dcs0www3/lnle/lnlehome.html

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#wile

http://www.hltcentral.org/usr_docs/project-source/en/broch/harness.html#lr

http://sirio.deusto.es/abaitua/konzeptu/ta/vic.htm

http://sirio.deusto.es/ABAITUA/konzeptu/ta/vic.htm

http://sirio.deusto.es/ABAITUA/konzeptu/ta/MT_book_1995/node52.html#SECTION00810000000000000000

http://sirio.deusto.es/ABAITUA/konzeptu/ta/hutchins91.htm

http://sirio.deusto.es/ABAITUA/konzeptu/ta/MT_book_1995/node53.html#SECTION00820000000000000000

http://suo.ieee.org/email/msg02295.html

http://logic.philosophy.ox.ac.uk/tutorial1/Tut1-03.htm

http://www.essex.ac.uk/linguistics/clmt/MTbook/HTML/node54.html

http://www.longman-elt.com/dictionaries/llreview/r3komuro.html

http://www.sfu.ca/~gmccarro/Grammar/Expressions.html

http://sirio.deusto.es/abaitua/konzeptu/nlp/Mlim/mlim4.html

http://ourworld.compuserve.com/homepages/WJHutchins/MTS-95.htm

http://www.europarl.eu.int/stoa/publi/99-12-01/part2_en.htm

Student Name: Mikel Vergara Hormaza