REPORT C, by Ana Belén Rodríguez Piñuelos (firstname.lastname@example.org).
In this Report C I am going to deal with a topic called "information retrieval". I will talk about the different problems you can run into when using search engines, and the methods for finding exactly what you are searching for.
Students are accustomed to searching the Internet for different purposes, for example to obtain information to add to a paper for English Literature. I will draw on my personal experience to give an example.
When you try to find something about Thomas Hardy on the web, the chances of obtaining something that you can use to extend your work are really amazing. I will go step by step through what my strategy would be to find something about this writer.
- First of all I enter the topic "Thomas Hardy". The number of matches found is 1,520,000 pages. Well, I cannot read all these pages, so I must narrow my search.
- I add more words related to the topic. For example, I want to find information about a book called "Tess of the d'Urbervilles". From now on my keywords are "Thomas Hardy Tess of the d'Urbervilles". The number of matches found is 30,600, far fewer than in the first attempt.
- I also want the pages on which the word "Talbothays" appears, so I include this word. The number of matches found is 426.
- "Tess of the d'Urbervilles" was published in 1891. Adding this extra information, we now have to search for this string of words: "Thomas Hardy Tess of the d'Urbervilles Talbothays 1891". The number of matches found is 23.
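The narrowing effect of these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four "pages" are made up, the names `PAGES`, `tokenize` and `count_matches` are hypothetical, and the sketch assumes the engine requires every keyword to be present, which is what the shrinking counts above suggest.

```python
import re

# Toy collection: four made-up "pages" (hypothetical data, not real search results).
PAGES = [
    "Thomas Hardy was an English novelist and poet",
    "Thomas Hardy wrote Tess of the d'Urbervilles",
    "In Tess of the d'Urbervilles by Thomas Hardy, Tess works at Talbothays dairy",
    "Thomas Hardy published Tess of the d'Urbervilles in 1891; Talbothays is a dairy farm",
]


def tokenize(text):
    """Lowercase the text and return its set of words, ignoring punctuation."""
    return set(re.findall(r"[\w']+", text.lower()))


def count_matches(query, pages=PAGES):
    """Count the pages that contain every word of the query."""
    wanted = tokenize(query)
    return sum(1 for page in pages if wanted <= tokenize(page))


# Each query adds keywords to the previous one, as in the steps above.
queries = [
    "Thomas Hardy",
    "Thomas Hardy Tess of the d'Urbervilles",
    "Thomas Hardy Tess of the d'Urbervilles Talbothays",
    "Thomas Hardy Tess of the d'Urbervilles Talbothays 1891",
]
for q in queries:
    print(count_matches(q), "matches for:", q)
```

On this toy collection the counts go 4, 3, 2, 1: every extra keyword can only keep or reduce the number of matching pages.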
The aim of all this explanation is to illustrate how the number of matches changes when you make your search more specific. This is the basis for introducing some definitions of information retrieval.
Here are some definitions of information retrieval:
- Actions, methods and procedures for recovering stored data to provide information on a given subject. [ISO 2382/1 (1984)]
- Searching a body of information for objects that match a search query.
- The activity of retrieving information by extracting documents or their parts from larger quantities of documents with the help of a computer, auxiliary structures and mathematical methods.
- IR is the process of determining the relevant documents from a collection of documents, based on a query presented by the user.
In summary: information retrieval helps you to narrow the search to obtain the most relevant pages according to your preferences.
But this "selection" of information is not always successful. You have to choose the best search engine to find what you want. There are three main types of search engines:
a) Automatic search engines: those which, from some information given, can deduce and retrieve the information you are looking for. Their objective is to find the documents that match the keywords you write.
b) Thematic search engines: those which find the documents directly related to the subject of the search. They analyze topics from the most general to the most specific.
c) Specialized search engines: those which compile all the resources about a specific topic. They can answer a formulated question.
An example is www.search.com.
Depending on the kind of material you want to obtain, you had better use one search engine or another.
|Type of search.||Recommended search engine.|
|Indefinite exploration.||Thematic search engines.|
|Generic search.||Specialized search engines.|
|Specific search.||Automatic search engines.|
There are some logical operators that can make your search much easier. These are:
- Logical AND: for example, you want to find something about English Philology. You enter "English AND Philology".
The search engine will find the pages containing both words, not the pages containing only one of them.
You can also use the symbol "&".
- Logical NOT: for example, you want to find something about Philology, but not about English Philology. You enter "Philology NOT English".
The search engine will find pages containing the first element but not the second.
You can also use the symbol "!".
- Logical OR: for example, you want to find something about English or something about Philology. You enter "English OR Philology".
The search engine will find pages containing one of those elements, or the pages with both elements.
You can also use the symbol "/". When you don't write any logical operator, some search engines interpret it as a logical OR; others, such as Google, treat it as a logical AND.
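The three logical operators map directly onto set operations. The following is a minimal sketch, assuming a tiny made-up collection; `DOCS`, `build_index` and the three `search_*` functions are hypothetical names used only for illustration, not the way any particular search engine works.

```python
from collections import defaultdict

# Made-up documents, identified by number (hypothetical data).
DOCS = {
    1: "english philology studies the english language",
    2: "spanish philology and literature",
    3: "english football results",
    4: "cooking recipes",
}


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


INDEX = build_index(DOCS)


def search_and(a, b):
    """Documents containing both words (set intersection)."""
    return INDEX.get(a, set()) & INDEX.get(b, set())


def search_or(a, b):
    """Documents containing at least one of the words (set union)."""
    return INDEX.get(a, set()) | INDEX.get(b, set())


def search_not(a, b):
    """Documents containing the first word but not the second (set difference)."""
    return INDEX.get(a, set()) - INDEX.get(b, set())


print(search_and("english", "philology"))   # {1}
print(search_not("philology", "english"))   # {2}
print(search_or("english", "philology"))    # {1, 2, 3}
```

The OR result is always at least as large as the AND result, which is why replacing AND with OR widens a search.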
But there are also proximity operators, which help us to specify the relationship between the elements appearing in your search.
- NEAR: for example, you want to find a page about ships, but you also want to read something about commercial ones. So you enter "ships NEAR commercial".
It is similar to the logical AND, but in this case the requirement is that there be no more than ten words between the two required words.
This function is not present in all search engines. It is present, for instance, in Altavista.
- ADJ: for example, you want to find a page about commercial ships. So you enter "ships ADJ commercial".
The search engine will find the pages on which both words appear together.
You can also use quotation marks "" around the two words, or a dash "-" between them.
- To look for an exact text you have to enter all of its words joined with dashes. For example "to-be-or-not-to-be-that-is-the-question".
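The proximity operators can be sketched by comparing word positions. This is an illustrative sketch only: the function names are hypothetical, the ten-word limit is taken from the description of NEAR above, and ADJ is assumed here to accept the two words in either order, which real search engines may not do.

```python
import re


def word_positions(text, term):
    """Return the positions (0-based) at which term occurs in text."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == term.lower()]


def near(text, a, b, max_between=10):
    """True if a and b occur with at most max_between words between them."""
    return any(
        i != j and abs(i - j) - 1 <= max_between
        for i in word_positions(text, a)
        for j in word_positions(text, b)
    )


def adj(text, a, b):
    """True if a and b occur next to each other (assumed: in either order)."""
    return near(text, a, b, max_between=0)


page_1 = "Commercial ships cross the ocean"
page_2 = ("Ships are large vessels and some of them are used "
          "for commercial purposes")          # ten words in between
page_3 = ("Ships are large vessels and some of them are often used "
          "for commercial purposes")          # eleven words in between

for page in (page_1, page_2, page_3):
    print(near(page, "ships", "commercial"), adj(page, "ships", "commercial"))
```

Only the first page satisfies ADJ, the first two satisfy NEAR, and the third fails both because eleven words separate the two terms.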
When you want to specify the presence or absence of a word you can use existence operators. There are two kinds:
- if you want a word to be present, you have to enter the symbol "+" before it.
- if you want a word to be absent, you have to enter the symbol "-" before it.
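A query written with these two symbols can be interpreted roughly as follows. This is a sketch under assumptions: `parse_query` and `matches` are hypothetical names, and the rule that words without a symbol are optional is a choice made for this example, not something every search engine follows.

```python
def parse_query(query):
    """Split a query into required (+word), excluded (-word) and optional words."""
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            optional.add(token)
    return required, excluded, optional


def matches(text, query):
    """True if the text satisfies the + and - conditions of the query."""
    words = set(text.lower().split())
    required, excluded, optional = parse_query(query)
    if not required <= words:      # a required word is missing
        return False
    if excluded & words:           # an excluded word is present
        return False
    # With no required words, at least one optional word must appear
    # (assumption made for this sketch).
    return bool(required) or not optional or bool(optional & words)


print(matches("spanish philology course", "+philology -english"))  # True
print(matches("english philology course", "+philology -english"))  # False
print(matches("cooking recipes", "philology"))                      # False
```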
We normally use automatic search engines, like Google. Three main problems come up again and again. Here I will explain some techniques for refining the search in each case.
1. Too many results:
- Try to be more specific.
- Use more key words.
- To require a word to appear, use the logical AND. To exclude some words, use the logical NOT.
- Use phrases instead of single words.
- Restrict the search using field keywords such as "title", "url", "link" or "host".
- Use capital letters for proper nouns and include written accents.
- If you think that a word is really important, just repeat it.
2. No results, or very few of them:
- Remove some keywords.
- Replace the logical AND with the logical OR.
- Check your spelling.
- Use synonyms.
- Change from singular to plural or vice versa.
3. The search is very slow:
- Remove frequent words such as articles.
- Use only a few words.
Now I am going to illustrate how four different search engines handle the main kinds of errors. The results are:
|Overclocking||2,830,000 results found.||770,119 results found.||1,105,040 results found.||No results found.|
|Overcloking||It corrects you, but it only looks for the word as entered.||It tells you what you probably meant and searches for overclocking.||It makes a "spelling suggestion", but the word appearing on the pages is overcloking.||No results found.|
|Esdrujula||The search engine adds the written accent and searches only for the corrected word.||No results found. It recommends checking the spelling.||It searches for the word with and without the written accent.||No results found.|
|Esdrújula||The same results as in the previous example.||It finds the required word.||It finds only the word with the written accent.||No results found.|
|Lamia||134,000 results found.||21,554 results found.||45,529 pages found.||2 results found.|
|Lamía||135,000 results found. The machine corrects you: "Did you mean La mía?".||No matches found. "Review your spelling."||3,501 pages found. It doesn't correct you and searches for the word as entered.||2 results found.|
|lamia||The same results as for the capitalized word.||The same results as for the capitalized word.||The same results as for the capitalized word.||The same results as for the capitalized word.|
|Piñuelos||No case distinction: besides the capitalized form, it also looks for "piñuelos". 28 results found.||14 results found. It also looks for "piñuelos".||12 results found. It also looks for "piñuelos".||No results found.|
|piñuelos||The machine corrects you: "Did you mean pueblos/pañuelos?". It searches for the word as entered.||The same results as in the previous example.||The same results as in the previous example.||No results found.|
|gogle||"Did you mean google?". It searches for the word as entered. 59,800 matches found.||No results found. "Correct the spelling."||8,968 matches found.||No results found.|
|flugelhorn||106,000 matches found.||21,523 matches found.||25,470 matches found.||No results found.|
|Apel||It looks for APEL, APeL & apel. 650,000 matches found.||It looks for APEL, Apel, APeL & apel. 26,682 matches found.||As in Alltheweb, it looks for four different variants. 157,764 matches found.||The system understands the word as "papel". 99 matches found.|
|peach||3,930,000 matches found.||795,270 matches found.||975,238 matches found.||No results found.|
|peaches||1,220,000 matches found.||322,288 matches found.||408,480 matches found.||No results found.|
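Two of the behaviours seen in the table, ignoring capital letters and accents, and offering a "did you mean" suggestion, can be imitated with Python's standard library. This is only a sketch: the vocabulary is made up, `fold` and `suggest` are hypothetical names, and the search engines tested above certainly use their own, unknown methods.

```python
import difflib
import unicodedata

# Made-up vocabulary of known words (hypothetical data).
VOCABULARY = ["overclocking", "google", "peach", "peaches", "flugelhorn"]


def fold(word):
    """Lowercase a word and strip its accents, e.g. 'Esdrújula' -> 'esdrujula'.

    Note: this also turns 'ñ' into 'n', which may not be wanted for Spanish.
    """
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


def suggest(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a 'did you mean' suggestion, or None if the word is known
    or nothing in the vocabulary is similar enough."""
    folded = fold(word)
    if folded in vocabulary:
        return None
    close = difflib.get_close_matches(folded, vocabulary, n=1, cutoff=cutoff)
    return close[0] if close else None


print(fold("Esdrújula"))            # esdrujula
print(fold("Lamia") == fold("lamia"))  # True
print(suggest("overcloking"))       # overclocking
print(suggest("gogle"))             # google
print(suggest("peach"))             # None
```

The `cutoff` value of 0.8 is an arbitrary choice for this example; a lower value gives more suggestions but also more wrong ones.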
"In today's "information society" there is an increasing need for effective management of the information available; through local networks and the Internet. Improving access to and efficient use of on-line information mainly involves "Information Retrieval" (IR), i.e. identifying the documents containing relevant information; and "Information Extraction" (IE), i.e. extracting relevant information from documents, and "Text Summarisation" (TS), i.e. presenting condensed information." (CCL-UMIST)
So I could sum up by saying that Information Retrieval is only one of those "steps" for obtaining information from the Internet. As it is the first action you must take, it is important to know some "tricks" to make the search much easier.
I hope you have understood what I wanted to explain!