MACHINE TRANSLATION PROBLEMS

by Paola García, Nerea Curto and Gala Díez

 

ABSTRACT

INTRODUCTION TO MACHINE TRANSLATION

PROVERBS AND IDIOM PROBLEMS

GRAMMAR ERRORS

USUAL EXPRESSIONS IN SPANISH

CONCLUSION

REFERENCES

 

 

ABSTRACT

            This report examines machine translation services on the Internet and some of the problems they have translating correctly from Spanish into English, which we attribute to the still limited technology behind them. The report is structured in several parts. First, we give a short introduction to Machine Translation. The body of the report then groups the translation problems into proverbs and idioms, grammar errors in simple sentences, and colloquial expressions. We have studied many example sentences and their errors, and we have compared the output of each translator. The translators we have chosen are Freetranslation, El Mundo and Reverso. For each example we also give the correct form in English. With these examples the reader can clearly see the problems that machine translators still have.

            

INTRODUCTION TO MACHINE TRANSLATION

            Machine Translation (MT) has been the great hope and the great disappointment of CL (Computational Linguistics). Taking into account the linguistic and technical challenges and difficulties of the task - translating is without any doubt one of the most complex linguistic processes conceivable, involving large amounts of data and highly complex mental operations that are by no means well known or understood - it is surprising that MT should have been one of the earliest attempted applications, actually the first non-numerical application of electronic data processing. From the middle of the 1940s information theorists tried to tackle the problem of MT on a mathematical basis as a cryptographic or, more generally, statistical problem. They viewed the translation of text from one language to another as a computable transformation from one method of encoding corresponding information into a different one.

            As this approach did not work out, linguists entered the scene. They first tried literal word-for-word substitution, adding some supplementary rearrangement rules. Further development saw successive extensions and refinements of the rules, until syntactic surface description was complemented by the analysis of underlying logical and semantic structures. Currently available commercial systems running on personal computers still use this linguistic technology.
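The literal word-for-word substitution with a supplementary rearrangement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon, the word classes and the function names (`word_for_word`, `reorder_noun_adjective`, `translate_literally`) are hypothetical and do not come from any of the systems discussed in this report.

```python
# Toy Spanish-to-English lexicon. All entries are illustrative assumptions.
LEXICON = {
    "la": "the", "casa": "house", "blanca": "white",
    "es": "is", "grande": "big",
    "tengo": "I have", "frio": "cold",
}
# Hypothetical word classes for the English side, used by the reordering rule.
NOUNS = {"house"}
ADJECTIVES = {"white", "big", "cold"}


def word_for_word(sentence):
    """Replace each Spanish word by its lexicon entry; unknown words pass through."""
    return [LEXICON.get(word, word) for word in sentence.lower().split()]


def reorder_noun_adjective(tokens):
    """Rearrangement rule: Spanish 'noun adjective' becomes English 'adjective noun'."""
    out = list(tokens)
    i = 0
    while i < len(out) - 1:
        if out[i] in NOUNS and out[i + 1] in ADJECTIVES:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return out


def translate_literally(sentence):
    """Substitute word for word, then apply the single reordering rule."""
    return " ".join(reorder_noun_adjective(word_for_word(sentence)))
```

With this sketch, `translate_literally("la casa blanca es grande")` gives "the white house is big", while `translate_literally("tengo frio")` gives "I have cold": the substitution is correct word by word, but nothing in the method knows that English uses "to be" here.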

            But, in fact, not all information that is necessary to correctly translate the content (let alone the stylistic) features of a text is explicitly encoded in lexical and syntactic structures. Human translators also refer to implicit (unexpressed) linguistic and extra-linguistic knowledge (cotextual knowledge and contextual, so-called "world knowledge"). The problem is how to represent and organise this implicit information in order to make all and only the relevant data available to an MT system.

            Trying to solve this and other problems, MT developers have sought on the one hand to achieve further improvements on the linguistic level, and on the other hand to take advantage of methods and achievements of artificial intelligence and "knowledge processing". These efforts are referred to as "third generation MT" - after a first, information theory-based, and a second, (computational) linguistics-based generation. In parallel, quantitative methods have emerged which, based on large parallel corpora of existing translations, rely on probabilities of interlingual correspondence or on (partial) literal and structural matches. Currently, efforts in CL concentrate on the (often comparative) evaluation of existing MT systems, emphasising a differentiating and pragmatic approach to translation quality and usability as a function of parameters such as text types, text categories, or quality and cost requirements.
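To give a rough idea of what relying on correspondences in a parallel corpus can mean, the sketch below counts how often a Spanish word and an English word occur in the same aligned sentence pair and scores each pair with the Dice coefficient. This is one simple association measure chosen for illustration, not the method of any particular system; the three-sentence corpus and the names `dice_scores` and `best_translation` are assumptions.

```python
from collections import Counter
from itertools import product

# Tiny sentence-aligned parallel corpus (illustrative assumption).
PARALLEL = [
    ("la casa", "the house"),
    ("la mesa", "the table"),
    ("una casa", "a house"),
]


def dice_scores(pairs):
    """Score every (source word, target word) pair seen in an aligned sentence pair."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))
    # Dice coefficient: 2 * co-occurrences / (source frequency + target frequency)
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_translation(word, scores):
    """Return the target word with the highest score, or None if the word is unseen."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get) if candidates else None
```

On this toy corpus, `best_translation("casa", dice_scores(PARALLEL))` returns "house", because the two words always appear together. Real corpora are far larger and noisier, so this should be read only as a picture of the principle.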

            Quantitative methods have led to one of the most interesting and promising developments in the field of computer-assisted translation, viz. translation memories. These use archive databases containing parallel sentence-for-sentence versions of previously translated texts. Input (source) sentences which are identical or very similar to sentences in the archive then do not have to be reanalysed and retranslated. Instead, their translation can be used again, if necessary with some modifications. The strength of such programs lies in retrieving and intelligently handling less than 100% matches of new input sentences with archived previously translated ones. They are particularly useful and effective for translating new versions of, for instance, users' guides and operation manuals, where large parts of the text may be not at all or only slightly altered in comparison to former versions. Translation memories do not translate automatically, i.e. without any human intervention during the translation process proper, but interactively, displaying potential matches of input sentences from their archives with their translation, including possible differences. It is up to the user in each case to decide whether to accept, modify or reject the solution offered. It should be noted that there are some interactive MT systems, too, and that some commercially available systems now have integrated translation memories.
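The retrieval step of a translation memory, including matches of less than 100%, can be sketched with Python's standard `difflib` module. This is a minimal sketch under assumptions: the archive contents, the 0.75 threshold and the names `MEMORY`, `similarity` and `lookup` are hypothetical, and commercial tools use their own matching schemes.

```python
from difflib import SequenceMatcher

# Archive of previously translated sentences (illustrative assumption).
MEMORY = {
    "Pulse el botón de encendido para encender el dispositivo.":
        "Press the power button to switch on the device.",
    "Retire la batería antes de limpiar.":
        "Remove the battery before cleaning.",
}


def similarity(a, b):
    """Character-level similarity between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def lookup(sentence, memory, threshold=0.75):
    """Return (score, archived source, archived translation) for the best
    match at or above the threshold, or None when nothing is close enough."""
    best = None
    for source, target in memory.items():
        score = similarity(sentence, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best
```

For the new sentence "Pulse el botón de encendido para apagar el dispositivo.", `lookup` proposes the archived translation of the "encender" sentence with a score below 1.0. In line with the interactive use described above, the translator would then change "switch on" to "switch off" rather than accept the match blindly.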

            We will consider some particular problems which the task of translation poses for the builder of MT systems, that is, some of the reasons why MT is hard. It is useful to think of these problems under three headings: (i) problems of ambiguity, (ii) problems that arise from structural and lexical differences between languages, and (iii) multiword units like idioms and collocations. Of course, these sorts of problems are not the only reasons why MT is hard. Other problems include the sheer size of the undertaking, as indicated by the number of rules and dictionary entries that a realistic system will need, and the fact that there are many constructions whose grammar is poorly understood, in the sense that it is not clear how they should be represented, or what rules should be used to describe them. This is the case even for English, which has been extensively studied, and for which there are detailed descriptions (both traditional 'descriptive' and theoretically sophisticated), some of which are written with computational usability in mind. It is an even worse problem for other languages. Moreover, even where there is a reasonable description of a phenomenon or construction, producing a description which is sufficiently precise to be used by an automatic system raises non-trivial problems.

 

PROVERBS AND IDIOM PROBLEMS

            Because the three translators handle proverbs in very similar ways, the following table has three columns: the proverb in Spanish, the wrong translation produced by the translators named above, and the correct form in English.

SPANISH PROVERB | TRANSLATOR OUTPUT | THE CORRECT FORM
Dios los cría y ellos se juntan | God raises them and they join | Birds of a feather flock together
Quien madruga Dios le ayuda | The one who gets up early God helps him(her) | The early bird catches the worm
Más vale ser cabeza de ratón que cola de león | More it is worth being a head of mouse that rat tail | Better to be the first among roosters than last among bulls
Por el humo se sabe donde está el fuego | By the smoke it is known where the fire is | Where there’s smoke, there’s fire
A donde fueres haz lo que vieres | To where you will be a bundle what you will see | When in Rome, do as the Romans do
El que siembra recoge | The one that he(she) sows, he(she)catches | As you sow so shall you reap
No hay miel sin hiel | There is no honey without gall | Where there's a sweet, there's always a bitter
Al que no quiere caldo, taza y media | To that it(he,she) does not want broth, cup and a half | It never rains, but it pours
El que nace para mulo del cielo le cae el arnés | The one that it(he,she) is born for mule of the sky falls(falls due) the harness | He that is born to be hanged shall never be drowned
Paga lo que debes, sanarás del mal que tienes | He(She) pays what you owe, you will recover of the evil that you have | Out of debt, out of danger
Dinero llama dinero | Money calls money | Money goes where money is
Tanto tienes, tanto vales | So much you have, so much you cost(suit) | A man is worth as much as he owns
La sarten le dijo al cazo, apartate que me tiznas | The frying pan said to the ladle, give way that you stain me | It’s the pot calling the kettle black
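One common way to handle multiword units such as proverbs is to look the whole expression up in a phrase table before falling back to literal translation. The following is a minimal sketch of that idea in Python, not a description of how Freetranslation, El Mundo or Reverso work. The names `IDIOMS`, `normalize` and `translate_with_idioms` are hypothetical, the three entries are taken from the table above, and stripping accents and punctuation is an assumption made so that differently typed input still matches.

```python
import unicodedata


def normalize(text):
    """Lowercase, drop accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # Combining accent marks and punctuation are neither alphanumeric nor
    # whitespace, so they are dropped here. Note that this also maps "ñ" to "n".
    kept = [ch for ch in decomposed if ch.isalnum() or ch.isspace()]
    return " ".join("".join(kept).split())


# Phrase table keyed by the normalized Spanish proverb.
IDIOMS = {
    normalize("Dinero llama dinero"): "Money goes where money is",
    normalize("A donde fueres haz lo que vieres"): "When in Rome, do as the Romans do",
    normalize("Por el humo se sabe donde está el fuego"): "Where there's smoke, there's fire",
}


def translate_with_idioms(sentence, literal_fallback):
    """Try the phrase table first; otherwise call the literal translator passed in."""
    key = normalize(sentence)
    if key in IDIOMS:
        return IDIOMS[key]
    return literal_fallback(sentence)
```

Here `literal_fallback` stands for any word-by-word translator. The design choice is simply the order of the two steps: a proverb is treated as one unit, so it never reaches the literal stage that produced outputs such as "Money calls money".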

 

GRAMMAR ERRORS

Sentences | Translation2 (paralink.com) | El mundo | Freetranslation | The real answer
¿Què hora es? | What hour is? | That hour (o`clock) is? | Què hour is? | What is the time?
¿Cuándo es tu cumpleaños? | What day is your birthday? | When it is your birthday? | When it is your birthday? | When is your birthday?
¿Cuántos años tienes? | How old are you? | All the years have you? | How many years have you? | How old are you?
Tengo frio | I have cold | I have cold | I have frío | I am cold
¿Cuánto cuesta? | How much costs? | All that costs? | How much costs? | How much does it cost?
¿Me lo puedo probar? | Can I prove it? | Can I me prove (try) it? | I can test it myself | Can I try it on?
¿Lo puedo probar? | Can I prove it? | Can I me prove (try) it? | I can test it? | Can I test it?
¿Tienes ganas de ir a casa de tu tía? | Do you desire to go to the house of your aunt? | Have you desire of going home of your aunt? | You want to go home of your tía? | Do you fancy going to your aunt's house?
¿Has alquilado una pista de tenis? | Have you rented a track of tennis? | Have you rented a tennis court? | You have rented a trail of tennis? | Have you rented a tennis court?
¡Qué sorpresa más grande! | That bigger surprise! | That surprise big más | Qué larger surprise! | What a big surprise!
La chica que subía las escaleras era delgada | The girl who was raising the stairs was thing | The girl who was raising the stairs was thing | The girl that subia the stairs was delgada | The girl who was going up the stairs was thin
Mucho calor | A great deal of heat! | A lot of heat! | A lot of heat! |
Guapa! | handsome | beauty | Handsome/pretty | Pretty
belleza | beauty | beauty | beauty | beauty

             Thanks to this table we can see, for example, that "freetranslation.com" does not recognise accented words: if you want to translate "Qué", the translator does not output "what" but leaves "Qué" as it is.

             Another failure that we have found in this table is that relative pronouns are badly used by all three translators, as the sentence "The girl who was raising the stairs..." shows. Moreover, there are basic errors in simple questions such as asking for the time or "How old are you?": the translators render these basic questions literally, without taking grammar into account.

            To finish the analysis of this table, we have to say that none of the translators has given a good translation overall, so we cannot rely on any of them.

 

USUAL EXPRESSIONS IN SPANISH

          Now we are going to show how these three translators handle usual expressions in Spanish. As we can see, sometimes they translate the same expression in the same way, nearly always with mistakes, but sometimes they produce completely different translations.

 

SPANISH | ENGLISH | REVERSO | EL MUNDO | FREETRANSLATION
Venga ya | Come on | Come already | come already | come already
Date prisa | Quickly! | Date hurry | you give you hurry | date hurry
mira que tengo ganas de verte | I'm looking forward to seeing you | it looks that I want to see you | Gun-sight that I have desire of meeting | sight that I desire to see you
vaya lento | such a slow person | go slow | Go slowly | go slowly
me estoy cociendo | So hot | I am cooking myself | I am cooking myself | I am cooking myself
Menudo lio | Such a problem | small lio | Tiny mess | tiny it tied

 

CONCLUSION

             In conclusion, the translations given by "Reverso", "Freetranslation" and "El Mundo" contain a lot of failures, so the result is disappointing for us, as we have shown above. The main reason for this may be that a translator is a program: it does not deal with real human language, though it tries to imitate the language in a natural way.

            The main problems that we have found are these: the translators are not updated, ambiguous words cannot be analysed, and syntax rules are not respected. We have noticed that the translators have memorised some common sentences that we usually use in our daily speech, and this is something that helps us. But although they have memorised some words, they do not really understand grammatical structures and they make basic errors in the language, so this could be the biggest problem in machine translation.

REFERENCES

http://www.systransoft.com/

http://www.el-mundo.es/traductor/

http://www.freetranslation.com/

http://www.reverso.com/