Report C by Ager Gondra, Unai Diego de Somonte, Andrew San Juan, Stéphane Cos.
In this project we are going to work on Information Retrieval. For this purpose we
have selected a few internet search engines, and we are going to look for
certain words on them. First, we are going to search for the words in their
singular form, and afterwards in their plural form. Once we have done this we
shall compare the results obtained and draw our conclusions on why the
results are different.
The first step towards accomplishing our task has been to select a few internet
search engines, such as Google, Aurki and Yahoo.
Some of these, like Google or Yahoo, are well known throughout the world. But others,
like Aurki and Jalgi, are not that famous. This is an interesting contrast
because it lets us see whether the fame of an internet search engine is
directly proportional to its semantic capabilities.
Once we have chosen the search engines, the next step is to select a list of
words. The words we have selected are: ball, giant, apple, war, connexion and
boy. These words are going to be looked for in each internet search engine in
their singular form as well as in their plural form. This procedure will be
carried out not only in English, but also in Basque (Euskera) and in Spanish.
Near the end of the report it will also be possible to find a description of
other internet search engines as well as some of their characteristics.
The last step will be to compare the results obtained in each search engine and
try to explain the differences in those results, provided that there
are any differences.
First of all we have searched for a definition of what Information Retrieval means:
Information Retrieval (IR), or document retrieval, is the systematic manipulation of textual
information so that it can easily be found again (retrieved). On the WWW, the
most important method of IR is the indexing of free-form text. IR exhibits similarities
to (but is not the same as) other areas of information processing, such as
expert systems and database management systems (DBMS).
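To make the idea of indexing free-form text concrete, here is a minimal sketch of an inverted index, written by us as an illustration. The sample documents and the function names (`tokenize`, `build_index`, `search`) are our own assumptions and do not describe how any of the search engines in this report is actually built. The index matches words exactly, so a singular query does not find the plural form.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, word):
    """Return the sorted ids of the documents containing exactly this word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical sample documents, invented for this illustration only.
documents = {
    1: "The boy kicked the ball",
    2: "Two boys played with three balls",
    3: "A giant apple",
}
index = build_index(documents)

print(search(index, "ball"))   # only document 1 contains "ball"
print(search(index, "balls"))  # only document 2 contains "balls"
```

With exact matching, "ball" and "balls" are two unrelated index entries, which is one simple way in which singular and plural queries can return different numbers of results.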
We have also included a
detailed description of each search engine selected.
Google: one of the most versatile web search engines. It allows searches using 5 criteria:
- searches in the web
- searches by groups or
- searches of images only
- searches by directories
- searches of news
associated with the typed word
It also has an integrated translator in the search engine that allows translations
of up to 150 words; moreover, it can translate most of the web pages the user may find
interesting, and its interface can be displayed in various languages,
including ones such as "Klingon" or "Bork!Bork!Bork!".
This configuration is available through a complete preferences menu,
together with an "advanced search" option, which makes Google a
globally efficient translator.
Yahoo: one of the oldest on the web. Conceived more as an information portal than as a
search engine, it offers direct access to the most interesting news and the
possibility to personalize it as long as the user is registered, allowing access
to e-mail, horoscope, etc. The advanced search option is as detailed as the one
in Google. Nevertheless, the range of languages is smaller and the option of text
translation does not exist. Yahoo has the following search criteria:
- search in all the web
- search of images
- yellow pages
An exclusively Spanish portal similar to Yahoo. It offers services like free
e-mail and access to forums and chats. The portal emphasises these options while
neglecting others, such as a more complete "advanced search" option; it
nevertheless offers a simple but complete translation system offered by
A Basque search engine that can be displayed in French, Spanish, English or Basque.
It is a limited search engine since it has neither an advanced search option
nor a text translation system.
It is interesting to note that the most popular search engines, Google and Yahoo, are
more "international" since they are registered in different domains;
that is, they have international (Spanish, French, English...) portals,
modifying their services depending on the chosen country.
is not very famous. The most popular search engine in the Basque language remains
Kaixo. As its title says, it is the first Basque-only search engine. Here we
will not find any option for translating the page into Spanish or French,
which makes its usage more exclusive, limited to those users with a
knowledge of Basque.
However, as we dig deeper into this search engine, we find an option for other languages.
Actually, it is not very useful by itself, because it takes us directly to Google.
the Search Engines
|Ball||Ball 32,100,000||Balls 9,960,000|
|Giant||Giant 12,200,000||Giants 5,520,000|
|Apple||Apple 41,500,000||Apples 3,150,00|
|War||War 97,500,000||Wars 19,900,000|
|Connexion||Connexion 4,360,000||Connexions 926,000|
|Boy||Boys 47,200,000||Boys 35,900,000|
|Words in Spanish||Singular
|Gigante 924,000||Gigantes 545,000|
|Conexión 1,260,000||Conexiones 354,000|
|Niño 1,480,000||Niños 2,290,000|
|Pilota||Pilota 15,200||Pilotak 864|
|Erraldoia||Erraldoia 974||Erraldoiak 830|
|Sagarra||Sagarra 13,900||Sagarrak 449|
|Guda||Guda 643||Gudak 22|
|Konexioa||Konexioa 936||Konexioak 952|
|Mutila||Mutila 16,100||Mutilak 2,120|
These are the same words above searched with the
Yahoo! search engine.
|Ball||Ball 50,300,000||Balls 13,800,000|
|Giant||Giant 22,800,000||Giants 10,300,000|
|Apple||Apple 38,900,000||Apples 4,830,000|
|War||War 136,000,000||Wars 29,200,000|
|Connexion||Connexion 6,400,000||Connexions 864,000|
|Boy||Boys 62,000,000||Boys 45,200,000|
|Words in Spanish||Singular
|Gigante 1,470,000||Gigantes 609,000|
|Conexión 1,260,000||Conexiones 832,000|
|Niño 824||Niños 735|
|Pilota||Pilota 1,140,000||Pilotak 43,000|
|Erraldoia||Erraldoia 491||Erraldoiak 419|
|Sagarra||Sagarra 24,300||Sagarrak 198|
|Guda||Guda 34,800||Gudak 194|
|Konexioa||Konexioa 505||Konexioak 138|
|Mutila||Mutila 28,100||Mutilak 540|
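The gap between singular and plural can be expressed as a simple ratio. The sketch below is our own illustration and takes the English Yahoo! figures from the table above; the "Boy" row is left out because both of its cells are labelled "Boys". The helper names `parse_count` and `singular_to_plural_ratio` are assumptions introduced only for this example, and `parse_count` accepts either dots or commas as thousands separators.

```python
def parse_count(text):
    """Turn a count such as '50.300.000' or '50,300,000' into an integer."""
    return int(text.replace(".", "").replace(",", ""))


def singular_to_plural_ratio(singular, plural):
    """How many times more results the singular form returns than the plural."""
    return parse_count(singular) / parse_count(plural)


# English results reported for Yahoo! in the table above: (singular, plural).
yahoo_english = {
    "ball": ("50.300.000", "13.800.000"),
    "giant": ("22.800.000", "10.300.000"),
    "apple": ("38.900.000", "4.830.000"),
    "war": ("136.000.000", "29.200.000"),
    "connexion": ("6.400.000", "864.000"),
}

ratios = {
    word: singular_to_plural_ratio(singular, plural)
    for word, (singular, plural) in yahoo_english.items()
}

for word, ratio in ratios.items():
    print(f"{word}: {ratio:.1f} times more results in the singular")
```

A ratio above 1 means that the singular form returned more results than the plural form of the same word.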
The popularity of certain words such as
sagarra, pilota or mutila gives us more than 10,000 results here. In any case, as
a general rule there are many more results for singular words than for plural ones, except for
konexioak, which is a word that tends to occur in the plural, regarding the chosen
We are going to look at the following Basque words: "baloia", "erraldoia", "sagarra", "guda", "konexioa" and "mutila".
3 sections and 4 places
1 section and 2 places
3 sections and 3 places
1 section and 1 place
3 sections and 2 places
In the case of "erraldoi" we find different interpretations for this word. It appears both as an adjective and as a noun, which reflects the ambiguity of such a word when it is taken out of context.
It is strange how the search engine understands the word "sagarra": instead of restricting itself to the word's own meaning, it interprets it as "sagarroi", a word based on "sagarra" but which has nothing to do with the meaning of apple.
It is also remarkable that this search engine does not pay much attention to the declensions of the Basque words we have tested: the words were tested in singular and plural, but it also returns the places where the words appear in other cases, such as the indefinite form "mutil" for "mutila" or "mutilak". This is why the results in singular and plural are the same.
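The behaviour described above, where "mutila" and "mutilak" give the same results, is what we would expect if the query and the indexed words were both reduced to a common stem before being compared. The following is only a toy sketch of that idea: the suffix rule (dropping a final "-ak" or "-a") is our own simplification and is not real Basque morphology, the sample pages are invented, and we do not know how the search engine actually works.

```python
def strip_case_ending(word):
    """Toy rule: drop a final '-ak' (plural) or '-a' (singular article).

    Very short words are left untouched so that 'da' is not reduced to 'd'.
    """
    word = word.lower()
    for suffix in ("ak", "a"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def matching_pages(query, pages):
    """Return the titles of the pages containing any form sharing the query's stem."""
    stem = strip_case_ending(query)
    return [
        title
        for title, text in pages.items()
        if any(strip_case_ending(token) == stem for token in text.lower().split())
    ]


# Hypothetical pages, invented for this illustration only.
pages = {
    "page1": "mutil bat",
    "page2": "mutila etorri da",
    "page3": "mutilak etorri dira",
    "page4": "sagarra gorria da",
}

print(matching_pages("mutila", pages))
print(matching_pages("mutilak", pages))
```

Both queries are reduced to "mutil", so they return the same three pages, whichever form was typed.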
From what we can derive from the results obtained, we can see that in most cases
the noun in its singular form generates far more results in each search
than the plural form of the same word.
This could be related to the limitations of computer translators. In report B we
saw how computers had problems dealing with lexical variations: plurals,
compound nouns and neologisms are most of the time not well recognized. This
lexical limitation seems to extend to search engines. Thus a change of a single
letter, passing from singular to plural, results in fewer
entries when searching for a word. This lack of
ability to recognise derived forms is one important problem of the
We have also noticed that some search engines, for example Google, actually
correct the word you have chosen to look for in case you have not spelled it
correctly, or have made a mistake while typing it.
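A correction of this kind can be sketched by comparing the typed word with a list of known words and proposing the closest one. The example below is a minimal illustration using Python's standard `difflib` module; the vocabulary is simply our own word list, the function name `suggest` and the cutoff of 0.6 are assumptions, and this is not a description of how Google implements its corrections.

```python
import difflib

# Assumed vocabulary: the English words used in this report.
VOCABULARY = ["ball", "giant", "apple", "war", "connexion", "boy"]


def suggest(word, vocabulary=VOCABULARY):
    """Return the word itself if it is known, the closest known word, or None."""
    word = word.lower()
    if word in vocabulary:
        return word
    # get_close_matches ranks candidates by similarity and drops those below the cutoff.
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("aple"))      # a one-letter slip for "apple"
print(suggest("conexion"))  # the Spanish-influenced spelling of "connexion"
print(suggest("xyzzy"))     # nothing in the vocabulary is close enough
```

A real search engine would have a far larger vocabulary and would probably also take into account how often each word is searched for, which this sketch ignores.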
Basque internet search engines obtain far fewer results than any of the other
search engines. This could be due to the fact that their lexicon is much more
limited, and perhaps also because those search engines are designed to look for
words strictly in Basque.