The information society and HLT

1. Abstract

2. Introduction

3. The information society

3.1. Too much information

4. Human Language Technologies

4.1. Techniques

5. Conclusion




In this paper, I am going to explain how human language technologies are essential in helping us manage the huge amounts of information we use every day. Since the only way to keep that information in order is to use machines, we must work on language to improve our communication with them and make it simpler. This will help us build more effective search engines for the internet and better translation tools, and thus lead to better communication between cultures.




In recent years, the flow of information has increased enormously, thanks to new means of transport and communication technology. Instantaneous communication between places far apart is now possible, and the exchange of information is, no doubt, beneficial. But such a large amount of information circulating around the world brings new problems. The first one is, of course, that there is too much of it. This makes it very difficult to keep it in order and to pick out the useful part in order to learn what we wanted. The other problem is that, when looking for information, we can end up with the wrong information, because there is so much of it, and it is so interrelated, that search engines often get confused. That is why the understanding between humans and machines has to be improved.

Language Engineering was created for that purpose, among others. It studies human language: how it works, its grammar, its vocabulary, the construction of sentences, and so on. Thanks to this effort, speech recognition and generation will be possible, making understanding between machines and humans easier. Language Engineering is also essential for creating better translation tools, which translate not only the literal meaning but also the connotations (something quite difficult, which will still require human work, as every culture has different traditions and taboos, and these change constantly).

All this will improve our productivity, as the vast amounts of information will be more manageable, so it will take less time to get the right information and work it up for our purposes (reports on the economy, statistics, etc.) and, as a result, to provide better services to customers.

There are different techniques within Language Engineering, which I will explain in this paper: Speaker Identification and Verification, Speech Recognition, Character and Document Image Recognition, Natural Language Understanding, Natural Language Generation, Speech Generation, Lexicons, etc. But first, let us look more closely at what the "information society" is, and then at how it is related to Language Engineering.


The information society

Definition according to (  ):

"The information society is a new kind of society. Specific to this kind of society is the central position information technology has for production and economy. Information society is seen as successor to industrial society. Closely related concepts are post-industrial society (Daniel Bell), post-fordism, post-modern society, knowledge society, and informational society (Manuel Castells).

Most theoreticians agree that we see a transformation which started somewhere between the 1970s and today and is changing the way our societies work fundamentally. Information technology is not only internet, and there are discussions how big the influence of specific media or specific modes of production really is.

Caveat: Information society is often used by politicians meaning something like "we all do internet now"; the sociological term information society (or informational society) has some deeper implications about change of societal structure."



*Too much information 

The information society depends on the flow of information, so new technologies are used to make this flow bigger. But this also leads to having too much information, and when that happens, we find ourselves trying to sort the "crap" information from the useful. See the quantities of information recorded in 2003. (Researchers: Peter Charles, Nathan Good, Laheem Lamar Jordan, Joyojeet Pal)

Before all this information was available everywhere, it was said that whoever had the information had the power. Now it is better to say that "whoever has the knowledge has the power".

According to David Lewis:

"Knowledge is power, but information is not. It's like the detritus that a gold-panner needs to sift through in order to find the nuggets."


"Having too much information can be as dangerous as having too little. Among other problems, it can lead to a paralysis of analysis, making it far harder to find the right solutions or make the best decisions."

"Information is supposed to speed the flow of commerce, but it often just clogs the pipes."

See more about the Information Fatigue Syndrome

Human Language Technologies will help to solve this problem of excessive information by creating better search tools, among other things.



Human Language Technologies

[Excerpts from Edinburgh-Stanford link (E-S.l), EuroMap (EM), Centre for Language Technology (CLT - Macquarie University)]

"Language technology refers to a range of technologies that have been developed over the last 40 years to enable people to more easily and naturally communicate with computers, through speech or text and, when called for, receive an intelligent and natural reply in much the same way as a person might respond." (E-S.l)

"From speech recognition to automatic translation, Human Language Technology products and services enable humans to communicate more naturally and more effectively with their computers – but above all, with each other." (EM)

"Language Technology is all about getting computers to do useful things with human language, whether in spoken or written form." (CLT)

Computers have a much bigger capacity for managing information than we do, but a smaller capacity for selecting what the user is asking for. By making communication between computers and humans easier and more precise, Human Language Technologies will help solve the problem of getting the wrong information. They will also help us to communicate across different languages and cultures by developing better translation tools (machine translation).


*Some techniques HLT is working on:

Speaker Identification and Verification

A human voice is as unique to an individual as a fingerprint. This makes it possible to identify a speaker and to use this identification as the basis for verifying that the individual is entitled to access a service or a resource. The types of problems which have to be overcome are, for example, recognising that the speech is not recorded, selecting the voice through noise (either in the environment or the transfer medium), and identifying reliably despite temporary changes (such as caused by illness).
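
To make the verification step more concrete, it can be pictured as comparing features extracted from the incoming voice with a "voiceprint" stored when the user enrolled. The following is only a minimal sketch in Python: the feature vectors, the verify_speaker helper and the 0.9 threshold are invented for illustration, and real systems rely on much richer acoustic features and statistical models.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify_speaker(enrolled, sample, threshold=0.9):
    """Accept the claimed identity only if the sample is close enough to
    the stored voiceprint. The threshold is an assumed value."""
    return cosine_similarity(enrolled, sample) >= threshold

# Hypothetical feature vectors, invented for this example.
enrolled_voiceprint = [0.9, 0.1, 0.4]
same_speaker = [0.85, 0.15, 0.38]   # small change, e.g. a cold
other_speaker = [0.1, 0.9, 0.2]

accepted = verify_speaker(enrolled_voiceprint, same_speaker)
rejected = not verify_speaker(enrolled_voiceprint, other_speaker)
```

The threshold is the design choice that matters here: raising it makes impostors less likely to be accepted, but also makes it more likely that the real speaker is rejected on a bad day.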


Speech Recognition

The sound of speech is received by a computer in analogue wave forms which are analysed to identify the units of sound (called phonemes) which make up words. Statistical models of phonemes and words are used to recognise discrete or continuous speech input. The production of quality statistical models requires extensive training samples (corpora) and vast quantities of speech have been collected, and continue to be collected, for this purpose.
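
As an illustration of what a statistical model of words can do, the sketch below counts word pairs (bigrams) in a tiny training corpus and uses them to choose between two candidate transcriptions that sound alike. It is a toy example under my own assumptions: the three-sentence corpus, the function names and the add-one smoothing are not taken from any real recogniser, which would be trained on far larger corpora and combined with an acoustic model.

```python
from collections import Counter

def train_bigrams(corpus):
    """Count single words and adjacent word pairs in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def score(candidate, unigrams, bigrams):
    """Product of add-one smoothed bigram probabilities for a candidate."""
    words = candidate.lower().split()
    vocab_size = len(unigrams)
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob

# Tiny made-up training corpus.
corpus = [
    "please recognise speech",
    "machines recognise speech",
    "recognise speech quickly",
]
unigrams, bigrams = train_bigrams(corpus)

# Two candidates that could be confused acoustically.
candidates = ["recognise speech", "recognise beach"]
best = max(candidates, key=lambda c: score(c, unigrams, bigrams))
```

Here the model prefers "recognise speech" simply because that word pair was seen in training, which is why the size and quality of the corpus matter so much.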

There are a number of significant problems to be overcome if speech is to become a commonly used medium for dealing with a computer. The first of these is the ability to recognise continuous speech rather than speech which is deliberately delivered by the speaker as a series of discrete words separated by a pause. The next is to recognise any speaker, avoiding the need to train the system to recognise the speech of a particular individual. There is also the serious problem of the noise which can interfere with recognition, either from the environment in which the speaker uses the system or through noise introduced by the transmission medium, the telephone line, for example. Noise reduction, signal enhancement and key word spotting can be used to allow accurate and robust recognition in noisy environments or over telecommunication networks. Finally, there is the problem of dealing with accents, dialects, and language spoken, as it often is, ungrammatically.


Character and Document Image Recognition

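Recognition of written or printed language requires that a symbolic representation of the language is derived from its spatial form of graphical marks. For most languages this means recognising and transforming characters. There are two cases of character recognition: recognition of printed text, referred to as Optical Character Recognition (OCR), and recognition of handwriting, usually known as Intelligent Character Recognition (ICR).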


OCR from a single printed font family can achieve a very high degree of accuracy. Problems arise when the font is unknown or very decorative, or when the quality of the print is poor. In these difficult cases, and in the case of handwriting, good results can only be achieved by using ICR. This involves word recognition techniques which use language models, such as lexicons or statistical information about word sequences.
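
The role of a lexicon in word recognition can be sketched in a few lines of Python. In this hedged example, a misread word is replaced by the closest entry in a small word list, using the standard difflib module. The lexicon, the correct_word helper and the 0.7 cutoff are assumptions made for the example, not part of any particular ICR product.

```python
import difflib

# Hypothetical lexicon; a real system would use a full dictionary.
LEXICON = ["information", "language", "recognition", "society", "translation"]

def correct_word(raw, lexicon=LEXICON, cutoff=0.7):
    """Return the lexicon entry most similar to the raw recogniser output,
    or the raw string unchanged when nothing is similar enough."""
    matches = difflib.get_close_matches(raw, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw

# Typical character-level confusions: "m" read as "rn", "e" read as "c".
noisy = ["inforrnation", "socicty", "xyz"]
corrected = [correct_word(word) for word in noisy]
```

Words that resemble nothing in the lexicon, such as "xyz" here, are left as they are rather than forced onto a wrong entry.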

Document image analysis is closely associated with character recognition but involves the analysis of the document to determine firstly its make-up in terms of graphics, photographs, separating lines and text, and then the structure of the text to identify headings, sub-headings, captions etc. in order to be able to process the text effectively.


Natural Language Understanding

The understanding of language is obviously fundamental to many applications. However, perfect understanding is not always a requirement. In fact, gaining a partial understanding is often a very useful preliminary step in the process because it makes it possible to be intelligently selective about taking the depth of understanding to further levels.

Shallow or partial analysis of texts is used to obtain a robust initial classification of unrestricted texts efficiently. This initial analysis can then be used, for example, to focus on 'interesting' parts of a text for a deeper semantic analysis which determines the content of the text within a limited domain. It can also be used, in conjunction with statistical and linguistic knowledge, to identify linguistic features of unknown words automatically, which can then be added to the system's knowledge.
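
A very rough picture of this kind of shallow analysis is a filter that keeps only the sentences mentioning certain key words, so that a deeper and more expensive analysis can be limited to them. The sketch below is my own simplification in Python; the interesting_sentences function and the keyword list are hypothetical, and real shallow parsers use much more linguistic knowledge than keyword matching.

```python
import re

def interesting_sentences(text, keywords):
    """Keep the sentences that contain at least one of the keywords."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    wanted = {k.lower() for k in keywords}
    selected = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words & wanted:
            selected.append(sentence)
    return selected

text = ("The weather was fine. The merger was announced on Monday. "
        "Shares rose sharply. Everyone went home.")
focus = interesting_sentences(text, ["merger", "shares"])
```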

Semantic models are used to represent the meaning of language in terms of concepts and relationships between them. A semantic model can be used, for example, to map an information request to an underlying meaning which is independent of the actual terminology or language in which the query was expressed. This supports multi-lingual access to information without a need to be familiar with the actual terminology or structuring used to index the information.
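
The idea of mapping a request to a meaning that does not depend on its wording can be sketched as follows. This is a deliberately tiny Python example under stated assumptions: the concept identifiers, the term table and the document index are all invented, and a real semantic model would also represent relationships between concepts.

```python
# Hypothetical table: surface terms in several languages -> concept id.
TERM_TO_CONCEPT = {
    "car": "VEHICLE_CAR", "automobile": "VEHICLE_CAR",
    "coche": "VEHICLE_CAR", "voiture": "VEHICLE_CAR",
    "price": "ATTR_PRICE", "precio": "ATTR_PRICE", "prix": "ATTR_PRICE",
}

# Documents are indexed by concept, not by their original wording.
CONCEPT_INDEX = {
    "VEHICLE_CAR": {"doc1", "doc3"},
    "ATTR_PRICE": {"doc3", "doc7"},
}

def query_to_concepts(query):
    """Map each known word of the query to its language-independent concept."""
    return {TERM_TO_CONCEPT[w] for w in query.lower().split() if w in TERM_TO_CONCEPT}

def search(query):
    """Return the documents that are indexed under every concept in the query."""
    concepts = query_to_concepts(query)
    if not concepts:
        return set()
    return set.intersection(*(CONCEPT_INDEX.get(c, set()) for c in concepts))

english_hits = search("car price")
spanish_hits = search("precio coche")
```

Both queries reach the same document because they are reduced to the same two concepts before the index is consulted.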

Combinations of analysis and generation with a semantic model allow texts to be translated. At the current stage of development, applications where this can be achieved need to be limited in vocabulary and concepts so that adequate Language Engineering resources can be applied. Templates for document structure, as well as common phrases with variable parts, can be used to aid generation of a high quality text.


Natural Language Generation

A semantic representation of a text can be used as the basis for generating language. An interpretation of basic data or the underlying meaning of a sentence or phrase can be mapped into a surface string in a selected fashion; either in a chosen language or according to stylistic specifications by a text planning system.
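
As a small illustration, the same underlying meaning can be turned into a sentence in a chosen language by filling a template. The Python sketch below is only an assumption of how this might look: the templates, the slot names and the generate function are hypothetical, and template filling is the simplest form of generation, with no real text planning.

```python
# Hypothetical templates: one surface pattern per language.
TEMPLATES = {
    "en": "The train to {destination} leaves at {time}.",
    "es": "El tren a {destination} sale a las {time}.",
}

def generate(meaning, language):
    """Map a language-independent meaning onto a surface string."""
    return TEMPLATES[language].format(**meaning)

# The same meaning, expressed once and rendered in two languages.
meaning = {"destination": "Bilbao", "time": "10:30"}
english = generate(meaning, "en")
spanish = generate(meaning, "es")
```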


Speech Generation

Speech is generated from filled templates, by playing 'canned' recordings or concatenating units of speech (phonemes, words) together. Speech generated has to account for aspects such as intensity, duration and stress in order to produce a continuous and natural response.

Dialogue can be established by combining speech recognition with simple generation, either from concatenation of stored human speech components or synthesising speech using rules.

Providing a library of speech recognisers and generators, together with a graphical tool for structuring their application, allows someone who is neither a speech expert nor a computer programmer to design a structured dialogue which can be used, for example, in automated handling of telephone calls.
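
A structured dialogue of this kind can be thought of as a small graph of states, where each state has a prompt to speak and a set of words that lead to other states. The following Python sketch is a simplified assumption of mine: the DIALOGUE table and the run_dialogue function are invented, and the recognised words are passed in as plain text instead of coming from a speech recogniser.

```python
# Hypothetical dialogue for a telephone service: state -> prompt and transitions.
DIALOGUE = {
    "start": {
        "prompt": "Say 'balance' or 'operator'.",
        "next": {"balance": "balance", "operator": "operator"},
    },
    "balance": {"prompt": "Your balance is being read out.", "next": {}},
    "operator": {"prompt": "Transferring you to an operator.", "next": {}},
}

def run_dialogue(recognised_words, dialogue=DIALOGUE, state="start"):
    """Walk the dialogue graph and return the prompts the system would speak."""
    spoken = [dialogue[state]["prompt"]]
    for word in recognised_words:
        transitions = dialogue[state]["next"]
        if word in transitions:
            state = transitions[word]
            spoken.append(dialogue[state]["prompt"])
        elif transitions:
            # Unknown word in a state that expects input: repeat the prompt.
            spoken.append("Sorry, I did not understand. " + dialogue[state]["prompt"])
    return spoken

prompts = run_dialogue(["hello", "balance"])
```

Because the dialogue is just a table, someone who is not a programmer could edit it with a graphical tool, which is the point made in the paragraph above.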


Taken from here:



Although there is still a lot to do, and HLT is not going to solve some issues (at least from a purely technological point of view), such as perfect translation from one language to another, it is going to provide us with useful tools in the future. For translation, human intervention will still be required (as trends and cultural tendencies change constantly, making some meanings difficult for a machine to translate). And, most probably, speech recognition and generation by computers will sound somewhat artificial for the same reason, but as long as they make themselves understood, they will be a very useful tool for people with disabilities, for instance. They will be able to dictate documents, search for information intuitively, and so on, which will also make the technology very useful for people who do not know much about computers.

This being so, information will be much more manageable and more accessible for everyone who wants it and can pay for it (it could also happen that a new form of discrimination appears between those who can pay for the technology and those who cannot, and who would thus be more "ignorant" in comparison). But, overall, I think it is a technology worth investing in, one that will bring better communication between us in general.


Ane Alaña Mendezona

2º Fil. Inglesa, 19/04/04