INFORMATION RETRIEVAL

 

 

 

By: Iera Zinkunegi, Dafne Gurrutxaga and Lide Zubiaurre

 

 

 

1. ABSTRACT

 

In this report we analyse how different internet search engines work and examine whether they take morphology into account when searching. The search engines we used are Google, Aurki and Jalgi. We also describe the procedure these search engines follow and the tools they use in that procedure. We include some examples of the results of our tests. Lastly, we give a firm conclusion on how reliable these search engines are.

 

 

2. INTRODUCTION

 

We use internet search engines every day. They have become very important tools. But how do they manage to find the information we want?

Searching for that information is possible thanks to information retrieval. An information retrieval system is the system interposed between a potential user of information and the information collection.

This report is about information retrieval. First, we explain what information retrieval is, with a brief description and some short definitions. Next, we look at the process information retrieval follows and at the components of an information retrieval system. With that background, we turn to the search engines themselves: we try several of them and see how they find the information we ask for. In doing so we become aware of some of their weak points, which are more serious in some engines than in others. Lastly, we draw our conclusions from the search results.

The aim of this report is to give an idea of what information retrieval is and to see how search engines do their job.

 

 

3. BODY

 

Information retrieval (IR) is the art and science of searching for information in documents, searching for documents themselves, searching for metadata which describes documents, or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images or data. There is a common confusion, however, between data, document, information, and text retrieval, and each of these has its own body of literature, theory, praxis and technologies.

IR is a broad interdisciplinary field that draws on many other disciplines. Indeed, because it is so broad, it is often poorly understood, being approached typically from only one perspective or another. It stands at the junction of many established fields, and draws upon cognitive psychology, information architecture, information design, human information behaviour, linguistics, semiotics, information science, computer science and librarianship.

Automated IR systems were originally used to manage the information explosion in scientific literature in the last few decades. Many universities and public libraries use IR systems to provide access to books, journals, and other documents. IR systems are usually described in terms of objects and queries. Queries are formal statements of information needs that are put to an IR system by the user. An object is an entity which keeps or stores information in a database. User queries are matched to documents stored in a database. A document is, therefore, a data object. Often the documents themselves are not kept or stored directly in the IR system, but are instead represented in the system by document surrogates.

The following are some short definitions of information retrieval:

- "The location and the presentation to a user of information relevant to an information need expressed as a query" (KORFHAGE)

- "An information retrieval system is a device interposed between a potential user of information and the information collection itself. For a given information problem, the purpose of the system is to capture wanted items and to filter out unwanted items." (HARTER)

 

Process

- Early view of computer IR: IR is a simple matching process:

QUERY ---> FILE ---> ANSWER

- It is now recognised as an extremely complex process because:

  - The information need is amorphous and hard to express

  - The document representation is inexact and ambiguous

  - The process is probabilistic rather than deterministic
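
The contrast between the two views of the process can be sketched in Python. This is only an illustration under our own assumptions: the function names, the three sample documents and the scoring rule (the fraction of query words a document contains) are hypothetical and are not how any of the search engines in this report work.

```python
def exact_match(query: str, documents: dict[str, str]) -> list[str]:
    """Early view: a document is an answer only if it contains the query text."""
    needle = query.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


def ranked_match(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Graded view: score each document by the fraction of query words it contains."""
    terms = query.lower().split()
    if not terms:
        return []
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        score = sum(term in words for term in terms) / len(terms)
        if score > 0:
            scored.append((doc_id, score))
    # Best-scoring documents first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample collection, for illustration only.
docs = {
    "d1": "houses in the old town of Azkoitia",
    "d2": "a group of houses",
    "d3": "weather report",
}

print(exact_match("old houses", docs))   # no document contains that exact phrase
print(ranked_match("old houses", docs))  # d1 contains both words, d2 only one
```

The exact matcher gives a yes-or-no answer, while the ranked matcher returns documents with a degree of relevance, which is closer to the probabilistic view of the process.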

 

Components of an IR system

- Document processing (indexing)

- Query input

- Document-query "matching"

- Output module

- Feedback module

- User interface
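
As a rough sketch of how the first four components fit together, the following Python class builds a small inverted index, parses a query, matches it against the index and produces an output list. The class name `TinyIRSystem`, its methods and the sample documents are our own assumptions for illustration; the feedback module and the user interface are left out, and real systems are far more elaborate.

```python
from collections import defaultdict


class TinyIRSystem:
    """Toy IR system: indexing, query input, matching and output only."""

    def __init__(self) -> None:
        # Inverted index: term -> ids of the documents that contain it.
        self.index: dict[str, set[str]] = defaultdict(set)
        # Document surrogates: only a title is kept, not the full text.
        self.titles: dict[str, str] = {}

    def add_document(self, doc_id: str, title: str, text: str) -> None:
        """Document processing (indexing)."""
        self.titles[doc_id] = title
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def parse_query(self, query: str) -> list[str]:
        """Query input: split the query into lower-case terms."""
        return query.lower().split()

    def match(self, terms: list[str]) -> set[str]:
        """Document-query matching: documents containing every term."""
        if not terms:
            return set()
        postings = [self.index.get(term, set()) for term in terms]
        return set.intersection(*postings)

    def search(self, query: str) -> list[str]:
        """Output module: titles of the matching documents, ordered by id."""
        hits = self.match(self.parse_query(query))
        return [self.titles[doc_id] for doc_id in sorted(hits)]


# Hypothetical sample collection, for illustration only.
system = TinyIRSystem()
system.add_document("d1", "Azkoitia town guide", "azkoitia is a town in gipuzkoa")
system.add_document("d2", "Basque houses", "an etxe is a house in a basque town")

print(system.search("town"))
print(system.search("basque town"))
print(system.search("azkoitiko"))  # the inflected form is not in the index
```

Because the index stores words exactly as they appear, a query for an inflected form such as azkoitiko finds nothing here even though azkoitia is indexed. This is the behaviour the tests in the rest of this report look for.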

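After this introduction to information retrieval, we test in practice how different search engines work. For each engine we searched for pairs of related word forms (for example, Casa and Casas) and noted the number of results reported: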

 

GOOGLE

Azkoitia: Results 1 - 10 of about 14,600 for azkoitia. (0.27 seconds)

Azkoitiko: Results 1 - 10 of about 1,540 for Azkoitiko. (0.45 seconds)

Azkoitiako: Results 1 - 10 of about 267 for azkoitiako. (0.35 seconds)

Telendro: Results 1 - 10 of about 158,000 for telendro. (0.15 seconds)

Telendros: Results 1 - 10 of about 1,530 for telendros. (0.57 seconds)

Grupo: Results 1 - 10 of about 14,400,000 for grupo. (0.23 seconds)

Grupos: Results 1 - 10 of about 7,640,000 for grupos. (0.17 seconds)

Casa: Results 1 - 10 of about 17,000,000 for Casa. (0.19 seconds)

Casas: Results 1 - 10 of about 3,580,000 for Casas. (0.20 seconds)

Etxe: Results 1 - 10 of about 25,700 for etxe. (0.24 seconds)

Etxeak: Results 1 - 10 of about 7,490 for etxeak. (0.14 seconds)

 

 

JALGI

Azkoitia: number of pages found for the search query: 146

Azkoitiko: number of pages found for the search query: 33

Azkoitiako: number of pages found for the search query: 10

Telendro: number of pages found for the search query:

Telendros: number of pages found for the search query:

Grupo: number of pages found for the search query: 3659

Grupos: number of pages found for the search query: 2465

Casa: number of pages found for the search query: 2508

Casas: number of pages found for the search query: 1679

Etxe: number of pages found for the search query:

Etxeak: number of pages found for the search query:

 

AURKI

 

Azkoitia: SECTIONS [1-5] Total: 5 / SITES [1-7] Total: 7

Azkoitiko: SECTIONS [1-4] Total: 4 / SITES [1-7] Total: 7

Azkoitiako: SECTIONS [1-1] Total: 1 / SITES [1-1] Total: 1

Telendro: SECTIONS [1-1] Total: 1 / SITES [1-1] Total: 1

Telendros: There are problems on the server, or we found nothing when searching for telendros. We will redirect you to the Google search engine.

Grupo: SECTIONS [1-5] Total: 17 / SITES [1-20] Total: 5026

Grupos: SECTIONS [1-5] Total: 5 / SITES [1-20] Total: 875

Casa: SECTIONS [1-5] Total: 11 / SITES [1-20] Total: 6400

Casas: SECTIONS [1-5] Total: 5 / SITES [1-20] Total: 1197

Etxe: SECTIONS [1-5] Total: 5 / SITES [1-11] Total: 11

Etxeak: SECTIONS [1-5] Total: 5 / SITES [1-6] Total: 6
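
One simple way to compare the three result lists is to divide the number of hits for the inflected form by the number of hits for the base form. The short Python sketch below does this for Azkoitia and Azkoitiko, using the counts listed above (for Aurki, the number of sites). The function names are our own, and the measure is only a rough indicator: a share close to 1 is consistent with an engine treating both forms alike, but with counts as small as Aurki's it could also be a coincidence.

```python
# Hit counts copied from the result lists above (for Aurki: number of sites).
HITS = {
    "Google": {"Azkoitia": 14600, "Azkoitiko": 1540},
    "Jalgi": {"Azkoitia": 146, "Azkoitiko": 33},
    "Aurki": {"Azkoitia": 7, "Azkoitiko": 7},
}


def inflected_share(base_hits: int, inflected_hits: int) -> float:
    """Hits for the inflected form as a fraction of hits for the base form."""
    if base_hits <= 0:
        raise ValueError("base form must have at least one hit")
    return inflected_hits / base_hits


def compare(hits: dict[str, dict[str, int]], base: str, inflected: str) -> dict[str, float]:
    """Compute the share for every engine, rounded to two decimals."""
    return {
        engine: round(inflected_share(counts[base], counts[inflected]), 2)
        for engine, counts in hits.items()
    }


shares = compare(HITS, "Azkoitia", "Azkoitiko")
for engine, share in shares.items():
    print(f"{engine}: {share}")
```

With these counts the share is about 0.11 for Google, 0.23 for Jalgi and 1.0 for Aurki.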

 

4. CONCLUSION

In conclusion, of these three search engines the best one is Google, which has more and better sources than the other two. However, when we search for Azkoitia, Google reports about 14,600 pages, but when we search for Azkoitiko the number drops to about 1,540. If Google treated the two as forms of the same word, the counts would be similar, so we can say that Google does not take morphology into account. The same happens with Jalgi, which finds 146 pages for Azkoitia and 33 for Azkoitiko. Aurki finds 5 sections and 7 sites for Azkoitia and 4 sections and 7 sites for Azkoitiko, so there the form of the word makes little difference, and Aurki appears to take morphology into account better than the other two. When Aurki cannot find anything on its own, however, it redirects the user to Google. Jalgi is more or less limited to information from the Basque Country.

Nevertheless, search engines are very useful for internet users. They give us access to almost any site we like, without our having to know much about computers. They are very good tools for access, communication and news from the rest of the world.
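
To make concrete what taking morphology into account could mean for the words we tested, here is a minimal Python sketch that strips a few endings so that related forms reduce to the same key. The suffix list and the minimum stem length are assumptions chosen only to fit our test words. This is not real Basque or Spanish morphology, it is not the method used by Google, Jalgi or Aurki, and it over-strips many words (casa becomes cas, for example).

```python
# Hypothetical endings chosen to fit the test words of this report only.
SUFFIXES = ("ko", "ak", "s", "a")
MIN_STEM = 3  # never shorten a word below this many letters


def toy_stem(word: str) -> str:
    """Repeatedly strip a known ending until none applies."""
    stem = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                changed = True
                break
    return stem


for pair in [("Azkoitia", "Azkoitiko"), ("Etxe", "Etxeak"),
             ("Casa", "Casas"), ("Grupo", "Grupos")]:
    print(pair, "->", [toy_stem(word) for word in pair])
```

A search engine that indexed and queried with such keys would return the same pages for both forms of each pair. Practical systems use proper stemmers or lemmatisers built for each language rather than a hand-written suffix list like this one.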

 

5. REFERENCES

1. http://en.wikipedia.org/wiki/Information_retrieval

2. http://www.sis.pitt.edu/~erasmus/week1.ppt