Examples like the ones in ( ) below are familiar to translators,
but the examples of colours (c), and the Japanese examples in (d) are
particularly striking. The latter are striking because they show how languages
can differ not only with respect to the fineness or `granularity' of
the distinctions they make, but also with respect to the basis for the
distinction: English chooses different verbs for the action/event of
putting on, and the action/state of wearing. Japanese does not make
this distinction, but differentiates according to the object that is
worn. In the case of English to Japanese, a fairly simple test on the
semantics of the NPs that accompany a verb may be sufficient to decide
on the right translation. Some of the colour examples are similar, but
more generally, investigation of colour vocabulary indicates that
languages actually carve up the spectrum in rather different
ways, and that deciding on the best translation may require
knowledge that goes well beyond what is in the text, and may even be
undecidable. In this sense, the translation of colour terminology
begins to resemble the translation of terms for cultural artifacts
(e.g. words like English cottage, Russian dacha,
French château, etc. for which no adequate translation
exists, and for which the human translator must decide between
straight borrowing, neologism, and providing an explanation). In this
area, translation is a genuinely creative act, which is well beyond the
capacity of current computers.
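The `simple test on the semantics of the NPs' mentioned above for English to Japanese can be sketched as follows. This is only an illustration under stated assumptions: the noun lexicon, the semantic class names, and the function name are all hypothetical, and the romanised verbs (kiru, haku, kaburu, kakeru) are standard dictionary glosses supplied for illustration rather than taken from the examples in (d).

```python
# Hypothetical sketch: choose a Japanese verb for English "wear" by
# testing the semantic class of the OBJECT noun. All names are invented
# for illustration.

# Toy semantic lexicon: English noun -> semantic class (an assumption).
NOUN_CLASS = {
    "shirt": "upper_body",
    "coat": "upper_body",
    "shoes": "feet_or_legs",
    "trousers": "feet_or_legs",
    "hat": "head",
    "glasses": "face",
}

# Semantic class -> Japanese verb (romanised, dictionary form).
WEAR_VERB = {
    "upper_body": "kiru",
    "feet_or_legs": "haku",
    "head": "kaburu",
    "face": "kakeru",
}


def translate_wear(object_noun):
    """Return a Japanese verb for 'wear <object_noun>', or None if unknown."""
    semantic_class = NOUN_CLASS.get(object_noun)
    if semantic_class is None:
        # No basis for a decision; a real system would need a fallback.
        return None
    return WEAR_VERB[semantic_class]
```

The test is cheap because the deciding information (the object NP) is in the sentence itself. The colour cases discussed next are harder precisely because no such local test is available.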
Calling cases such as those above lexical mismatches is not controversial. However, when one turns to cases of structural mismatch, classification is not so easy. This is because one can often argue that the reason one language uses one construction where another uses a different one lies in the stock of lexical items the two languages have. Thus, the distinction is to some extent a matter of taste and convenience.
A particularly obvious example of this involves
problems arising from what are sometimes called lexical holes
--- that is, cases where one language has to use a phrase to express
what another language expresses in a single word. Examples of this
include the `hole' that exists in English with respect to French
ignorer (`to not know', `to be ignorant of'), and se
suicider (`to suicide', i.e. `to commit suicide', `to kill
oneself'). The problems raised by such lexical holes have a certain
similarity to those raised by idioms: in both cases, one has phrases
translating as single words. We will therefore postpone discussion of
these until Section .
One kind of structural mismatch occurs where two languages use the same construction for different purposes, or use different constructions for what appears to be the same purpose.
Cases where the same structure is used for different purposes include the use of passive constructions in English and Japanese. In the example below, the Japanese particle wa, which we have glossed as `TOP' here, marks the `topic' of the sentence --- intuitively, what the sentence is about.
Example ( ) indicates that Japanese has a passive-like
construction, i.e. a construction where the PATIENT, which is normally
realized as an OBJECT, is realized as SUBJECT. It is different from
the English passive in the sense that in Japanese this construction
tends to have an extra adversative nuance, which might make (a) rather
odd, since it suggests an
interpretation where Mr Satoh did not want to be elected, or where
election is somehow bad for him. This is not suggested by the English
translation, of course. The translation problem from Japanese to
English is one of those that looks unsolvable for MT, though one might
try to convey the intended sense by adding an adverb such as
unfortunately. The translation problem from English to Japanese is
on the other hand within the scope of MT, since one must just choose
another form. This is possible, since Japanese allows SUBJECTs to be
omitted freely, so one can say the equivalent of `elected Mr
Satoh', and thus avoid having to mention an AGENT. However,
in general, the result of this is that one cannot have simple rules
like those described in Chapter  for passives. In fact,
unless one uses a very abstract structure indeed, the rules will be
rather complicated.
We can see different constructions used for the same effect in cases like the following:
Figure: venir-de and have-just
The first example shows how English, German and French choose
different methods for expressing `naming'. The other two examples
show one language using an adverbial ADJUNCT (just, or
graag (Dutch) `likingly' or `with pleasure'), where another uses a
verbal construction. This is actually one of the most
discussed problems in current MT, and it is worth examining why it is
problematic. This can be seen by looking at the representations for
( ) in Figure .
These representations are relatively abstract (e.g. the information
about tense and aspect conveyed by the auxiliary verb have
has been expressed in a feature), but they are still rather different.
In particular, notice that while the main verb of (a) is
see, the main verb of (b) is venir-de.
Now notice what is involved in writing rules which relate these
structures (we will look at the direction English to French).
All this is summarized in Figure  and Figure .
Figure: Translating have-just into venir-de
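To make the shape of such a rule concrete, here is a minimal sketch of a transfer rule relating the two structures in the English to French direction. It rests on several assumptions not in the text: representations are modelled as plain dictionaries, the feature names (`head`, `subj`, `obj`, `adjuncts`, `tense`, `aspect`, `comp`, `form`) and the toy lexicon are hypothetical, and the function name is invented. The sketch follows the division of labour described in this section: venir-de becomes the main verb, the SUBJECT and the tense feature go on the upper sentence, and everything else goes into the lower sentence.

```python
# Hypothetical sketch of a have-just -> venir-de transfer rule.
# The dictionary-based representation and all feature names are assumptions.

# Toy lexical transfer table (an assumption for this sketch).
LEXICON = {"see": "voir", "Sam": "Sam", "Kim": "Kim"}


def transfer_have_just(english):
    """Map an English 'have just V-ed' representation to a venir-de one.

    Assumed input shape:
      {"head": "see", "subj": "Sam", "obj": "Kim",
       "adjuncts": ["just"], "tense": "pres", "aspect": "perfect"}
    Returns None when the rule does not apply.
    """
    adjuncts = english.get("adjuncts", [])
    if "just" not in adjuncts or english.get("aspect") != "perfect":
        return None

    # Lower sentence: translated main verb and OBJECT, with 'just' removed.
    # It gets no SUBJECT of its own, which avoids inserting an extra
    # SUBJECT into the translation.
    lower = {
        "head": LEXICON[english["head"]],
        "obj": LEXICON[english["obj"]],
        "adjuncts": [LEXICON.get(a, a) for a in adjuncts if a != "just"],
        "form": "infinitive",
    }

    # Upper sentence: venir-de takes over the SUBJECT and the tense feature.
    return {
        "head": "venir-de",
        "subj": LEXICON[english["subj"]],
        "tense": english["tense"],
        "comp": lower,
    }
```

Note that the sketch places every remaining adjunct in the lower sentence. That is a simplification, and it ignores entirely the question of how this rule interacts with the ordinary rule for translating see.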
Of course, given a complicated enough rule, all this can be stated.
However, there will still be problems because writing a rule in
isolation is not enough. One must also consider how the rule interacts
with other rules. For example, there will be a rule somewhere that
tells the system how see is to be translated, and what one
should do with its SUBJECT and OBJECT. One must make sure that this
rule still works (e.g. its application is not blocked by the fact that
the SUBJECT is dealt with by the special rule above; or that it does
not insert an extra SUBJECT into the translation, which would give
* Sam vient de Sam voir Kim). One must also make sure that the
rule works when there are other problematic phenomena around. For
example, one might like to make sure the system produces (b) as
the translation of (a).
Figure: The Representation of venir-de
We said above that everything except the SUBJECT and some of the
tense information goes into the `lower' sentence in French. But this
is clearly not true, since here the translation of `probably'
actually becomes part of the main sentence, with the translation of
(a) as its COMPLEMENT.
Of course, one could try to argue that the difference between English just and French venir de is only superficial. The argument could, for example, say that just should be treated as a verb at the semantic level. However, this is not very plausible, and there are other cases where such a move does not seem possible at all. Examples like the following show that where English uses a `manner' verb and a directional adverb/prepositional phrase, French (and other Romance languages) use a directional verb and a manner adverb. That is, where English classifies the event described as `running', French classifies it as an `entering':
The syntactic structures of these examples are very different, and it is hard to see how one can naturally reduce them to similar structures without using very abstract representations indeed.
A slightly different sort of structural mismatch occurs where two languages have `the same' construction (more precisely, similar constructions, with equivalent interpretations), but where different restrictions on the constructions mean that it is not always possible to translate in the most obvious way. The following is a relatively simple example of this.
What this shows is that English and French differ in that English
permits prepositions to be `stranded' (i.e. to appear without their
objects, as in (a)). French normally requires the preposition and
its object to appear together, as in (d); of course, English
allows this too. This will make translating (a) into French
difficult for many sorts of system (in particular, for systems that
try to manage without fairly abstract syntactic representations).
However, the general solution is fairly clear: what one wants is to
build a structure where (a) is represented in the same way as
(c), since this will eliminate the translation problem. The most
obvious representation would probably be something along the lines of
(a), or perhaps (b).
While by no means a complete solution to the treatment of relative clause constructions, such an approach probably overcomes this particular translation problem. There are other cases which pose worse problems, however.
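The idea of giving the stranded and non-stranded variants one and the same representation can be sketched as below. This is a rough illustration only: it assumes a relative clause has already been split into the words fronted with the relative pronoun and the words of the clause body, it only detects a stranded preposition in clause-final position, and the preposition list, the output shape, the function name, and the example word lists in the comments are all hypothetical.

```python
# Hypothetical sketch: map stranded and non-stranded relative clauses to
# a single representation in which the preposition is attached to the gap.

PREPOSITIONS = {"to", "about", "for", "with"}  # toy list, an assumption


def normalise_relative(head_noun, fronted, body):
    """Return one representation for both variants of a relative clause.

    fronted: words at the front of the clause, e.g. ["which"] or
             ["to", "which"]
    body:    the remaining clause words, e.g. ["I", "replied", "to"]
    """
    body = list(body)  # copy, so the caller's list is not modified
    if len(fronted) == 2 and fronted[0] in PREPOSITIONS:
        prep = fronted[0]   # preposition fronted together with the pronoun
    elif body and body[-1] in PREPOSITIONS:
        prep = body.pop()   # stranded preposition at the end of the clause
    else:
        prep = None         # the gap is not inside a prepositional phrase
    return {
        "head": head_noun,
        "clause": body,
        "gap": {"prep": prep, "filler": head_noun},
    }
```

Because both variants come out identical, a single transfer rule can then produce the French form in which preposition and object appear together.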
In general, relative clause constructions in English consist of a
head noun (letters in the previous example), a relative pronoun
(such as which), and a sentence with a `gap' in it. The
relative pronoun (and hence the head noun) is understood as if it
filled the gap --- this is the idea behind the representations in
( ). In English, there are restrictions on where the `gap' can
occur. In particular, it cannot occur inside an indirect question, or
a `reason' ADJUNCT.
Thus, (b) and (d) are both ungrammatical. However, these
restrictions are not exactly paralleled in other languages. For
example, Italian allows the former, as in (a), and Japanese the
latter, as in (c).
These sorts of problem are beyond the scope of current MT systems ---
in fact, they are difficult even for human translators.