By: Olatz Garcia


In this brief report I am going to explain the influence that New Technologies have on our society. The social system in which we now live is also known as the Information Society, in which these New Technologies are of great importance.

In our course on English Language and New Technologies we have been asked to complete a few weekly questionnaires. These are going to be used as the basis for the development of this report.


This report is divided into four parts, corresponding to the questionnaires we have completed from the beginning of the course until now. In these questionnaires we were asked to find answers to different questions about New Technologies and the Information Society. The answers are taken from the net, using the references our teacher gave us.

We are living in a society full of information. In fact, the net is extremely important for acquiring the information needed for our degree. This can sometimes be a problem, because there are people who do not know how to use the New Technologies. A large amount of information is spread widely throughout the net, and we must be able to identify which part is useful for us.

Some people may find another difficulty: the problem of language. The information may appear in a language (normally English) that they do not understand. This problem can be addressed through the use of Machine Translation.





1·1 Human language technologies

Human Language Technologies not only help us to build bridges across languages and cultures but also provide natural access to information and communication services. They will enable an active use and assimilation of multimedia content, and further strengthen Europe's position at the forefront of language-enabled digital services. They will support business activities in a global context and promote a truly human-centred infostructure ensuring equal access and usage opportunities for all. The ultimate goal of Human Language Technologies is an optimal use of the human capital, maximising businesses' competitiveness and empowering people.

1·2 Natural language processing:

A natural language is one that evolved along with a culture of human native speakers who use the language for general-purpose communication. Languages like English, American Sign Language and Japanese are natural languages, while languages like Esperanto are called constructed languages, having been deliberately created for a specific purpose.

Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form.

Some people view NLG as the opposite of natural language understanding. The difference can be put this way: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to take decisions about how to put a concept into words.

From Wikipedia, the free encyclopaedia.

1·3 What’s computational linguistics?

Computational linguistics (CL) is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. Computational linguistics has applied and theoretical components.



1·4 Language engineering:

Language Engineering is the application of knowledge of language to the development of computer systems which can recognise, understand, interpret, and generate human language in all its forms. In practice, Language Engineering comprises a set of techniques and language resources. The former are implemented in computer software and the latter are a repository of knowledge which can be accessed by computer software.


2 Does the notion of "Information Society" have any relation to human language?

The term Information Society has been around for a long time now and, indeed, has become something of a cliché. The notion of the coming Information Society reminds me of the idea of the Sydney 2000 Olympics and the way it shimmers in the distance. We look towards the Olympics and resolve to prepare hard for it. We must rapidly transform ourselves, our city, our demeanour to be ready and worthy. Time is of the essence in making ourselves ready for the challenge. There is a certain breathlessness in all of this rhetoric.




3 Is there any concern in Europe with Human Language Technologies?

In the European Union, the concept of the Information Society has been evolving strongly over the past few years building on the philosophy originally spelled out by Commissioner Martin Bangemann in 1994. Bangemann argued that the Information Society represents a "revolution based on information ... [which] adds huge new capacities to human intelligence and constitutes a resource which changes the way we work together and the way we live together..." (European Commission, 1994:4). One of the main implications of this "revolution" for Bangemann is that the Information Society can secure badly needed jobs (Europe and the Global Information Society, 1994:3). In other words, a driving motivation for the Information Society is the creation of employment for depressed economies.

Closer to home it is instructive to look at just a few policy (or would-be policy) documents to see the views of the Information Society dominant here. The Goldsworthy report sees the Information Society as a "societal revolution based around information and communication technologies and about the role of these in developing global competitiveness and managing the transition to a globalised free trade world" (Department of Industry, Science and Tourism, 1997). In short, Goldsworthy's idea of the Information Society is entirely an economic one. At a broader level Barry Jones, the author of the House of Representatives Standing Committee's 1991 report 'Australia as an Information Society', sets out a definition of the Information Society which sees it as simply "a period when use of time, family life, employment, education and social interaction are increasingly influenced by access to Information Technology" (Australia as an Information Society: Grasping New Paradigms, 1991).

These are just a few examples of ideas underpinning information policy drives in the developed world where the concept is accepted almost without challenge, and there is an inherent belief that like the Olympics, the Information Society is real - or will be very soon if only we can get ourselves organised properly. Some claim, of course, that the Information Society is here already and not just on its way. But one way or the other "it" exists and is a "good thing". By and large, national and regional Information Society documents do not question the belief that the Information Society will bring prosperity and happiness if a few basic safeguards are put in place. Some of the very few notes of serious caution in the practice of information policy have come through the influence of the Scandinavian countries which joined the European Union when the EU was already in full flight with implementing the actions flowing from the Bangemann report.

Interestingly, in recent travels in India I noticed an extraordinary level of hope and trust in that developing country in the potential of information technology to transform India into a modern fully developed economy. The push to develop information and technological infrastructure initiated by Rajiv Gandhi is seen as positive and a necessary step for the goal of a universally prosperous society in India. Effectively there is the same acceptance of the goodness of an Information Society and the absolute necessity to be one, that is found in the West.

Given this blind faith in the existence and the desirability of an Information Society among diverse nations, it is instructive to look at the theoretical literature which has spawned the idea to see what it claims for the Information Society. The term Information Society has many synonyms: Information Age, Information Revolution, Information Explosion and so on and it is found across a wide spectrum of disciplines. Fortunately the task of unravelling many of these ideas has been accomplished in a masterly way by Frank Webster. He has categorised the variety of concepts of the Information Society, Information Revolution, or whatever, and provided an analysis of five common conceptions of the Information Society (Webster, 1995).


4 What is the current situation of the office

The overall objective of HLT is to support e-business in a global context and to promote a human centred infostructure ensuring equal access and usage opportunities for all. This is to be achieved by developing multilingual technologies and demonstrating exemplary applications providing features and functions that are critical for the realisation of a truly user friendly Information Society. Projects address generic and applied RTD from a multi- and cross-lingual perspective, and undertake to demonstrate how language specific solutions can be transferred to and adapted for other languages.
While elements of the three initial HLT action lines - Multilinguality, Natural Interactivity and Crosslingual Information Management are still present, there has been periodic re-assessment and tuning of them to emerging trends and changes in the surrounding economic, social, and technological environment.

Multilingual Web
Objectives: To advance towards a fuller realisation of the multilingual Internet for personal development and informational purposes, and for distributed enterprise knowledge management across languages and delivery platforms.

Natural and multilingual interactivity
Objectives: To progress towards a more intuitive interaction with, and effective use of intelligent network services and appliances. RTD will address both relatively short term applicative showcases and longer term research efforts aimed at robust dialogue and unconstrained speech/language understanding. The intended orientation towards middleware and embedded technologies presupposes significant advances of the component technologies and further progress towards their integration within mass market products and services.

Natural interactivity
Objectives: To enhance the naturalness of interaction between humans and digital services and devices, the ease of use of computer systems in non-expert environments, and the richness and effectiveness of technology-mediated interpersonal communication.

Cross-lingual information management and knowledge discovery
Objectives: To empower people confronted with large quantities of digital information and to support them in knowledge intensive tasks, by exploiting the linguistic knowledge embodied in documents, messages, dialogues and audio-visual objects. (2001)






1·Which are the main techniques used in Language Engineering?



There are many techniques used in Language Engineering and some of these are described below.


*Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying reliably despite temporary changes (such as caused by illness).


*Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose.

There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.




*Character and Document Image Recognition

Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters.


*Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels.

Trivial or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. One use for this initial analysis can be to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain.

Semantic models are used to represent the meaning of language in terms of concepts and relationships between them.

Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need to be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied.

*Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.

*Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.

Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules.


Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.

2·Which language resources are essential components of Language Engineering?





*Language Resources

Language resources are essential components of Language Engineering. They are one of the main ways of representing the knowledge of language, which is used for the analytical work leading to recognition and understanding.

The work of producing and maintaining language resources is a huge task. Resources are produced, according to standard formats and protocols to enable access, in many EU languages, by research laboratories and public institutions. Many of these resources are being made available through the European Language Resources Association (ELRA).


A lexicon is a repository of words and knowledge about those words. This knowledge may include details of the grammatical structure of each word (morphology), the sound structure (phonology), and the meaning of the word in different textual contexts, e.g. depending on the word or punctuation mark before or after it. A useful lexicon may have hundreds of thousands of entries. Lexicons are needed for every language of application.

*Specialist Lexicons

There are a number of special cases which are usually researched and produced separately from general purpose lexicons:

Proper names: Dictionaries of proper names are essential to effective understanding of language, at least so that they can be recognised within their context as places, objects, persons, or maybe animals. They take on a special significance in many applications, however, where the name is key to the application, such as in a voice operated navigation system, a holiday reservations system, or a railway timetable information system based on automated telephone call handling.

Terminology: In today's complex technological environment there are a host of terminologies which need to be recorded, structured and made available for language enhanced applications. Many of the most cost-effective applications of Language Engineering, such as multi-lingual technical document management and machine translation, depend on the availability of the appropriate terminology banks.

Wordnets: A wordnet describes the relationships between words; for example, synonyms, antonyms, collective nouns, and so on. These can be invaluable in such applications as information retrieval, translator workbenches and intelligent office automation facilities for authoring.


A grammar describes the structure of a language at different levels: word (morphological grammar), phrase, sentence, etc. A grammar can deal with structure both in terms of surface (syntax) and meaning (semantics and discourse).


A corpus is a body of language, either text or speech, which provides the basis for:

  1. analysis of language to establish its characteristics
  2. training a machine, usually to adapt its behaviour to particular circumstances
  3. verifying empirically a theory concerning language
  4. a test set for a Language Engineering technique or application to establish how well it works in practice.

There are national corpora of hundreds of millions of words but there are also corpora which are constructed for particular purposes. For example, a corpus could comprise recordings of car drivers speaking to a simulation of a control system, which recognises spoken commands, which is then used to help establish the user requirements for a voice operated control system for the market.
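As a rough illustration of the first use listed above (analysing language to establish its characteristics), the following Python sketch counts word frequencies in a tiny invented corpus. The function name `word_frequencies` and the sample commands are assumptions made for this example; a real corpus would be far larger and would need more careful tokenisation.

```python
from collections import Counter
import re

def word_frequencies(corpus_text):
    """Count how often each lowercased word occurs in a text corpus."""
    # Very simple tokenisation: runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", corpus_text.lower())
    return Counter(tokens)

# Hypothetical miniature corpus of spoken car commands (invented for illustration)
toy_corpus = "turn on the radio. turn off the radio. turn up the heating."
freqs = word_frequencies(toy_corpus)

# Show the three most frequent words with their counts
print(freqs.most_common(3))
```

Counts like these are the kind of basic evidence that could help decide which commands a voice operated control system must handle first.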

3·Check for the following terms (choose at least five):



A stemmer is a program or algorithm which determines the morphological root of a given inflected (or, sometimes, derived) word form -- generally a written word form.

A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem".

English stemmers are fairly trivial (with only occasional problems, such as "dries" being the third-person singular present form of the verb "dry", "axes" being the plural of "ax" as well as "axis"); but stemmers become harder to design as the morphology, orthography, and character encoding of the target language become more complex. For example, an Italian stemmer is more complex than an English one (because of more possible verb inflections), a Russian one is more complex (more possible noun declensions), a Hebrew one is even more complex (a hairy writing system), and so on.

Stemmers are common elements in query systems, since a user who runs a query on "daffodils" probably cares about documents that contain the word "daffodil" (without the s).
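To make the idea more concrete, here is a minimal sketch of a suffix-stripping stemmer in Python. The suffix list and the function name `naive_stem` are assumptions chosen for this example only; it is a toy rule set, not one of the published stemming algorithms.

```python
# Toy suffix list, checked in this order (an assumption for this sketch)
SUFFIXES = ("ing", "ed", "er", "s")

def naive_stem(word):
    """Strip one common English suffix; a toy rule set, not a real stemmer."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least three letters would remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "stemm" -> "stem"
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word

for w in ("cats", "stemmer", "stemming", "stemmed", "daffodils", "dries"):
    print(w, "->", naive_stem(w))
```

With these rules "cats" becomes "cat" and "daffodils" becomes "daffodil", but "dries" becomes "drie" rather than "dry", which is exactly the kind of occasional problem mentioned above.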


Formalism: a means to represent the rules used in the establishment of a model of linguistic knowledge.


Domain: usually applied to the area of application of the language enabled software, e.g. banking, insurance, travel, etc.; the significance in Language Engineering is that the vocabulary of an application is restricted, so the language resource requirements are effectively limited by limiting the domain of application.

*Translator's workbench

Translator's workbench: a software system providing a working environment for a human translator, which offers a range of aids such as on-line dictionaries, thesauri, translation memories, etc.
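One of the aids mentioned here, the translation memory, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the stored sentence pairs, the function name `lookup` and the similarity threshold of 0.7 are all invented, and the matching uses the standard library's `difflib` rather than whatever method a real workbench uses.

```python
import difflib

# Hypothetical translation memory: English source sentences and stored Spanish translations
memory = {
    "Open the window.": "Abra la ventana.",
    "Close the door.": "Cierre la puerta.",
}

def lookup(sentence, memory, threshold=0.7):
    """Return (stored source, translation) for the closest stored sentence, or None."""
    matches = difflib.get_close_matches(sentence, list(memory), n=1, cutoff=threshold)
    if not matches:
        return None
    return matches[0], memory[matches[0]]

# A near match is offered to the translator as a suggestion to edit
print(lookup("Open the windows.", memory))
# A sentence unlike anything stored produces no suggestion
print(lookup("Good morning.", memory))
```

The translator still decides whether the suggested translation fits; the memory only saves retyping sentences that have already been translated once.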


*Authoring tools

Authoring tools: facilities provided in conjunction with word processing to aid the author of documents, typically including an on-line dictionary and thesaurus, spell-, grammar-, and style-checking, and facilities for structuring, integrating and linking documents.




State of the Art

Comments about the state-of-the-art need to be made in the context of specific applications which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress. First, there is the coming of age of the hidden Markov model (HMM). The HMM is powerful in that, with the availability of training data, the parameters of the model can be trained automatically to give optimal performance.

Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing. These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third, progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and had not been very careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13 respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to run many large scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced. In fact, speech recognition systems with reasonable performance can now run in real time using high-end workstations without additional hardware---a feat unimaginable only a few years ago.

One of the most popular, and potentially most useful tasks with low perplexity is the recognition of digits. For American English, speaker-independent recognition of digit strings spoken continuously and restricted to telephone bandwidth can achieve an error rate of 0.3% when the string length is known.

One of the best known moderate-perplexity tasks is the 1,000-word so-called Resource Management (RM) task, in which inquiries can be made concerning various naval vessels in the Pacific Ocean. The best speaker-independent performance on the RM task is less than 4%, using a word-pair language model that constrains the possible words following a given word. More recently, researchers have begun to address the issue of recognizing spontaneously generated speech. For example, in the Air Travel Information Service (ATIS) domain, word error rates of less than 3% have been reported for a vocabulary of nearly 2,000 words and a bigram language model with a perplexity of around 15.
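Perplexity can be read, roughly, as the average number of equally likely words the recogniser has to choose between at each point. The Python sketch below uses the standard definition (two raised to the negative average log2 probability of the words); the function name `perplexity` and the example probabilities are my own assumptions, not figures from the tasks described above.

```python
import math

def perplexity(probabilities):
    """Perplexity of a word sequence, given each word's conditional probability.

    Assumes a non-empty list of probabilities greater than zero.
    """
    log_sum = sum(math.log2(p) for p in probabilities)
    return 2 ** (-log_sum / len(probabilities))

# If every word is one of 15 equally likely continuations, the perplexity is about 15
print(perplexity([1 / 15] * 8))

# Unequal probabilities: the result lies between the two branching factors 2 and 4
print(perplexity([0.5, 0.25]))
```

A lower perplexity means the language model leaves fewer plausible candidates for each word, which is why low-perplexity tasks such as digit recognition are easier.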

High perplexity tasks with a vocabulary of thousands of words are intended primarily for the dictation application. After working on isolated-word, speaker-dependent systems for many years, the community has since 1992 moved towards very-large-vocabulary (20,000 words and more), high-perplexity, speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news.
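The error rates quoted in this section are usually word error rates. A common way to compute one is to count the fewest word substitutions, deletions and insertions needed to turn the recogniser's output into the reference transcription, and to divide by the number of reference words. The following Python sketch does this with a standard edit-distance calculation; the function name and the example sentences are assumptions for illustration, not data from the evaluations mentioned above.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes the reference contains at least one word.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j]: edit distance between the reference words seen so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word deleted
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("home" -> "phone") and one deletion ("please") over four reference words
print(word_error_rate("call home now please", "call phone now"))
```

Because insertions are counted too, a word error rate can exceed 100% when the output is much longer than the reference.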

With the steady improvements in speech recognition performance, systems are now being deployed within telephone and cellular networks in many countries. Within the next few years, speech recognition will be pervasive in telephone networks around the world. There are tremendous forces driving the development of the technology; in many countries, touch tone penetration is low, and voice is the only option for controlling automated services. In voice dialing, for example, users can dial 10--20 telephone numbers by voice (e.g., call home) after having enrolled their voices by saying the words associated with telephone numbers. AT&T, on the other hand, has installed a call routing system using speaker-independent word-spotting technology that can detect a few key phrases (e.g., person to person, calling card) in sentences such as: I want to charge it to my calling card.

At present, several very large vocabulary dictation systems are available for document generation. These systems generally require speakers to pause between words. Their performance can be further enhanced if one can apply constraints of the specific domain such as dictating medical reports.

Speech recognition

(Or voice recognition) The identification of spoken words by a machine. The spoken words are digitised (turned into a sequence of numbers) and matched against coded dictionaries in order to identify the words.

Most systems must be "trained," requiring samples of all the actual words that will be spoken by the user of the system. The sample words are digitised, stored in the computer and used to match against future words. More sophisticated systems require voice samples, but not of every word. The system uses the voice samples in conjunction with dictionaries of larger vocabularies to match the incoming words. Yet other systems aim to be "speaker-independent", i.e. they will recognise words in their vocabulary from any speaker without training.

Another variation is the degree to which systems can cope with connected speech. People tend to run words together, e.g. "next week" becomes "neksweek" (the "t" is dropped). For a voice recognition system to identify words in connected speech it must take into account the way words are modified by the preceding and following words.

It has been said (in 1994) that computers will need to be something like 1000 times faster before large vocabulary (a few thousand words), speaker-independent, connected speech voice recognition will be feasible.

This definition may also be useful

From Wikipedia, the free encyclopedia.

Speech recognition technologies allow computers equipped with microphones to interpret human speech, e.g. for transcription or as a control method.

Such systems can be classified as to whether they require the user to "train" the system to recognise their own particular speech patterns or not, whether the system can recognise continuous speech or requires users to break up their speech into discrete words, and whether the vocabulary the system recognises is small (in the order of tens or at most hundreds of words), or large (thousands of words).

Systems requiring a short amount of training can (as of 2001) capture continuous speech with a large vocabulary at normal pace with an accuracy of about 98% (getting two words in one hundred wrong), and different systems that require no training can recognize a small number of words (for instance, the ten digits of the decimal system) as spoken by most English speakers. Such systems are popular for routing incoming phone calls to their destinations in large organisations.

Commercial systems for speech recognition have been available off-the-shelf since the 1990s. However, it is interesting to note that despite the apparent success of the technology, few people use such speech recognition systems.

It appears that most computer users can create and edit documents more quickly with a conventional keyboard, despite the fact that most people are able to speak considerably faster than they can type. Additionally, heavy use of the speech organs results in vocal loading.

Some of the key technical problems in speech recognition are that:

The "understanding" of the meaning of spoken words is regarded by some as a separate field, that of natural language understanding. However, there are many examples of sentences that sound the same, but can only be disambiguated by an appeal to context: one famous T-shirt worn by Apple Computer researchers stated:

I helped Apple wreck a nice beach.

A general solution of many of the above problems effectively requires human knowledge and experience, and would thus require advanced artificial intelligence technologies to be implemented on a computer. In particular, statistical language models are often employed for disambiguation and improvement of the recognition accuracies.
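The way a statistical language model helps with a case like the T-shirt sentence can be sketched as follows. Two candidate transcriptions that sound alike are scored with bigram probabilities estimated from a training text, and the more probable one is kept. Everything here is an assumption for illustration: the tiny training text is invented so that "recognize speech" is the more common phrase, add-one smoothing is just one simple choice, and the score is averaged per bigram so that the shorter candidate is not favoured merely for being shorter.

```python
import math
from collections import Counter

# Hypothetical training text; a real model would be estimated from a large corpus
training = ("computers recognize speech . people recognize speech . "
            "storms wreck a pier . we found a nice beach .").split()

unigrams = Counter(training)
bigrams = Counter(zip(training, training[1:]))
vocab_size = len(unigrams)

def avg_log_score(sentence):
    """Average add-one-smoothed bigram log-probability of a sentence (needs two or more words)."""
    words = sentence.split()
    pairs = list(zip(words, words[1:]))
    total = 0.0
    for prev, word in pairs:
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total / len(pairs)

# Two transcriptions that sound nearly the same
candidates = ["recognize speech", "wreck a nice beach"]
best = max(candidates, key=avg_log_score)
print(best)
```

With a different training text the other candidate could win, which is the point: the model reflects only what is common in the data it was trained on, not real understanding.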



Speech synthesis

Another definition

Speech synthesis is the computer-generated simulation of human speech. It is used to translate written information into aural information where it is more convenient, especially for mobile applications such as voice-enabled e-mail and unified messaging. It is also used to assist the vision-impaired so that, for example, the contents of a display screen can be automatically read aloud to a blind user. Speech synthesis is the counterpart of speech or voice recognition. The earliest speech synthesis effort was in 1779 when Russian Professor Christian Kratzenstein created an apparatus based on the human vocal tract to demonstrate the physiological differences involved in the production of five long vowel sounds. The first fully functional voice synthesizer, Homer Dudley's VODER (Voice Operating Demonstrator), was shown at the 1939 World's Fair. The VODER was based on Bell Laboratories' vocoder (voice coder) research of the mid-thirties.

A brief definition

The generation of a sound waveform of human speech from a textual or phonetic description. See also speech recognition.


It is worth remembering that most prototypes developed within research projects are currently only capable of processing a few hundred sentences (around 300), on very specific topics (accommodation-booking, planning trips, etc.) and for a small group of languages: English, German, Japanese, Spanish and Italian. It seems unlikely that any application will be able to go beyond these boundaries in the near future.

The direct incorporation of speech translation prototypes into industrial applications is at present too costly. However, the growing demand for these products leads us to believe that they will soon be on the market at more affordable prices. The systems developed in projects such as Verbmobil, EuTrans or Janus—despite being at the laboratory phase—contain in practice thoroughly evaluated and robust technologies. A manufacturer considering their integration may join R&D projects and take part in the development of prototypes with the prospect of a fast return on investment. It is quite clear that we are witnessing the emergence of a new technology with great potential for penetrating the telecommunications and microelectronics market in the not too distant future.

Another remarkable aspect of the EuTrans project is its methodological contribution to machine translation as a whole, both in speech and written modes. Although these two modes of communication are very different in essence, and their respective technologies cannot always be compared, speech-to-speech translation has brought prospects of improvement for text translation. Traditional methods for written texts tend to be based on grammatical rules. Therefore, many MT systems show no coverage problem, although this is achieved at the expense of quality. The most common way of improving quality is by restricting the topic of interest. It is widely accepted that broadening of coverage immediately endangers quality. In this sense, learning techniques that enable systems to automatically adapt to new textual typologies, styles, structures, terminological and lexical items could have a radical impact on the technology.

Due to the differences between oral and written communication, rule-based systems prepared for written texts can hardly be re-adapted to oral applications. This approach has been tried, and it has failed. By contrast, example-based learning methods designed for speech-to-speech translation systems can easily be adapted to written texts, given the increasing availability of bilingual corpora. One of the main contributions of the PRHLT-ITI group lies precisely in its learning model based on bilingual corpora. Herein lie some interesting prospects for improving written translation techniques.

Effective speech-to-speech translation, along with other voice-oriented technologies, will become available in the coming years, albeit with some limitations, e.g. in the number of languages, linguistic coverage, and context. It could be argued that EuTrans' main contribution has been to raise the possibilities of speech-to-speech translation to the levels of speech recognition technology, making any innovation immediately accessible.




Information fatigue syndrome

David Lewis coined the term "information fatigue syndrome" for what he expects will soon be a recognized medical condition.

"Having too much information can be as dangerous as having too little. Among other problems, it can lead to a paralysis of analysis, making it far harder to find the right solutions or make the best decisions."

"Information is supposed to speed the flow of commerce, but it often just clogs the pipes."

David Lewis

Dr. David Lewis is a British psychologist and the author of the report Dying for Information?, commissioned by London-based Reuters Business Information. He is a consultant who has studied the impact of data proliferation in the corporate world.




How much new information is created each year? Newly created information is stored in four physical media – print, film, magnetic and optical – and seen or heard in four information flows through electronic channels – telephone, radio and TV, and the Internet. This study of information storage and flows analyzes the year 2002 in order to estimate the annual size of the stock of new information recorded in storage media, and heard or seen each year in information flows. Where reliable data was available we have compared the 2002 findings to those of our 2000 study (which used 1999 data) in order to describe a few trends in the growth rate of information.

  1. Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks.

  3. We estimate that the amount of new information stored on paper, film, magnetic, and optical media has about doubled in the last three years.

  5. Information flows through electronic channels -- telephone, radio, TV, and the Internet -- contained almost 18 exabytes of new information in 2002, three and a half times more than is recorded in storage media. Ninety-eight percent of this total is the information sent and received in telephone calls, including both voice and data on both fixed lines and wireless.
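The figures quoted above can be checked with a little arithmetic. The sketch below uses only the numbers given in the summary. The variable names are illustrative, and the yearly growth rate is an inference that assumes steady compound growth over the three years, which the summary itself does not state.

```python
# Figures quoted in the summary above (year 2002), in exabytes (EB).
STORED_EB = 5.0          # new information stored on print, film, magnetic, optical
MAGNETIC_SHARE = 0.92    # share of stored information held on magnetic media
FLOWS_EB = 18.0          # new information in electronic flows ("almost 18")
TELEPHONE_SHARE = 0.98   # share of the flows carried by telephone calls

# Amount stored on magnetic media: 92% of about 5 EB.
magnetic_eb = STORED_EB * MAGNETIC_SHARE

# Amount carried by telephone calls: 98% of almost 18 EB.
telephone_eb = FLOWS_EB * TELEPHONE_SHARE

# Ratio of flows to storage, which the summary rounds to "three and a half times".
flow_to_storage = FLOWS_EB / STORED_EB

# Assumption: if stored information doubled in three years at a steady rate g,
# then (1 + g) ** 3 == 2, so g is the cube root of 2 minus 1.
annual_growth = 2 ** (1 / 3) - 1

print(f"magnetic media: {magnetic_eb:.2f} EB")
print(f"telephone calls: {telephone_eb:.2f} EB")
print(f"flows / storage: {flow_to_storage:.1f}")
print(f"implied yearly growth: {annual_growth:.1%}")
```

Under these assumptions, magnetic media account for roughly 4.6 exabytes, telephone calls for roughly 17.6 exabytes, and the stock of stored information grows by about a quarter each year.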








The overall objective of HLT is to support e-business in a global context and to promote a human-centred infostructure ensuring equal access and usage opportunities for all. This is to be achieved by developing multilingual technologies and demonstrating exemplary applications providing features and functions that are critical for the realisation of a truly user-friendly Information Society. Projects address generic and applied RTD from a multi- and cross-lingual perspective, and undertake to demonstrate how language-specific solutions can be transferred to and adapted for other languages.

While elements of the three initial HLT action lines - Multilinguality, Natural Interactivity and Crosslingual Information Management - are still present, they have been periodically re-assessed and tuned to emerging trends and changes in the surrounding economic, social, and technological environment. The action line "Trials and best practice in multilingual e-service and e-commerce" was introduced in the IST 2000 work programme (IST2000) to stimulate new forms of partnership between technology providers, system integrators and users through trials and best practice actions addressing end-to-end multi-language platforms and solutions for e-service and e-commerce. The fifth IST call for proposals covered this action line.

Human language technologies

"Language technology refers to a range of technologies that have been developed over the last 40 years to enable people to more easily and naturally communicate with computers, through speech or text and, when called for, receive an intelligent and natural reply in much the same way as a person might respond." (E-S.l)

"Human Language Techology is the term for the language capabilities designed into the computing applications used in information and communication technology systems." (EM)

"Human Language Technology is sometimes quite familiar, e.g. the spell checker in your word processor, but can often be hidden away inside complex networks – a machine for automatically reading postal addresses, for example." (EM)

"From speech recognition to automatic translation, Human Language Technology products and services enable humans to communicate more naturally and more effectively with their computers – but above all, with each other." (EM)





Intelligent Text Processing: "Ever been frustrated by a search engine? Find out how they work, but more importantly, find out how to make them intelligent. This unit also covers sophisticated web-based language technologies like document summarization, information extraction and machine translation. If you want to know about the Semantic Web, this is the unit for you." (CLT)



Semantic Web

The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming.
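The idea behind RDF can be illustrated with a small sketch. RDF describes data as subject-predicate-object statements whose names are URIs. The example below is not a real RDF library: the Triple type, the match function and the example.org URIs are all hypothetical, chosen only to show how such statements can be stored and queried. The two Dublin Core property names (title and creator) are real vocabulary terms.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Dublin Core namespace (real) and a made-up namespace for the example.
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"  # hypothetical URIs, for illustration only

GRAPH = [
    Triple(EX + "report", DC + "title", "A sample report"),
    Triple(EX + "report", DC + "creator", EX + "author"),
    Triple(EX + "author", EX + "studies", EX + "english-philology"),
]


def match(graph, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything that is said about the report.
about_report = match(GRAPH, subject=EX + "report")
# Who created it?
creators = [t.obj for t in match(GRAPH, predicate=DC + "creator")]
print(len(about_report), creators)
```

Because every name is a URI, two applications that have never seen each other's data can still agree on what a property such as "creator" refers to, which is the kind of sharing across boundaries described above.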

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001







This report has helped me learn how to use the new tools that we may need as students in our daily lives. It is important for us to know how to look for information on the net and how to find the appropriate information in each case. I now think I am able to find what I need and to solve small problems related to the New Technologies.

On the other hand, we must be able to determine which information is worthwhile, and one of the main problems we have to deal with is language. As students of English Philology we are expected to translate the information ourselves, but if we have any difficulty we know how to use Machine Translation.

In our future careers we may also need information about authors and novels, which is why this course matters: it teaches us how to use the tools that the Internet provides.