A colloquium presented by Joseba Abaitua (Universidad de Deusto in Bilbao, Spain) at the Computer Research Laboratory, New Mexico State University, Las Cruces, New Mexico. July 1998.
At the outermost tip of the branch of the appletree on the hillside a bird was singing... liruliruli, liruliruli... Who's going to dance that little tune? |
Example 1 Basque popular song |
aldapa | cuesta | hillside | Noun |
sagar (sagarondoa) | manzana (manzano) | apple (appletree) | Noun Noun |
adar | rama | branch | Noun |
punta | punta | tip | Noun |
txori | pájaro | bird | Noun |
zegoen | estaba | was | Finite Verb |
kantari | cantando | singing | Noun |
nork | quién | who | Pronoun |
dantzatuko | bailará | will dance | Verbal Noun |
ote | quizás | perhaps (dare) | Particle |
du | ha | has | Finite Verb |
soinutxo | musiquita | little tune | Noun |
hori | esa | that | Determiner |
Aldapeko sagarraren adarraren puntaren puntan | |
hillside-on-of appletree-of branch-of tip-of tip-at | |
En la punta de la punta de la rama del manzano de la cuesta | |
At the outermost tip of the branch of the appletree on the hillside | |
Example 2 Head final structure of a Basque phrase |
puntan | En la punta |
puntaren | de la punta |
adarraren | de la rama |
sagarraren | del manzano |
Aldapeko | de la cuesta |
Figure 1 Head final vs. head initial phrases
See note 1.
PP -> N" P | aldapa-n, punta-n | Postposition '-n' (inessive) |
N" -> AP N' | adarraren punta, aldapeko sagarra | |
N' -> N Det | punta-a, sagar-a | |
Other postpositions: | | |
PP -> N" P | aldapa-ra | Postposition '-ra' (directional 'to') |
| aldapa-rantz | Postposition '-rantz' (directional 'towards') |
| aldapa-raino | Postposition '-raino' (directional-endpoint 'up to') |
| aldapa-tik | Postposition '-tik' (origin 'from') |
| sagarra-z | Postposition '-z' (instrumental 'with', 'by') |
| sagarra-gatik | Postposition '-gatik' (cause, motive 'because', 'for') |
| sagarra-rekin | Postposition '-rekin' (comitative 'with') |
| sagarra-rentzat | Postposition '-rentzat' (goal 'for') |
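The postposition inventory above can be held in a simple lookup table. The following Python sketch assumes the hyphenated stem-suffix segmentation used in this handout; the names `POSTPOSITIONS` and `gloss_postposition` are hypothetical, and the morphophonological rules mentioned in note 1 are ignored.

```python
# Postposition suffixes and their glosses, copied from the table above.
POSTPOSITIONS = {
    "n": "inessive",
    "ra": "directional 'to'",
    "rantz": "directional 'towards'",
    "raino": "directional-endpoint 'up to'",
    "tik": "origin 'from'",
    "z": "instrumental 'with', 'by'",
    "gatik": "cause, motive 'because', 'for'",
    "rekin": "comitative 'with'",
    "rentzat": "goal 'for'",
}


def gloss_postposition(form):
    """Split a hyphenated form at its last hyphen and gloss the suffix.

    Returns (stem, suffix, gloss); raises ValueError if the form has no
    hyphen or the suffix is not in the table.
    """
    stem, sep, suffix = form.rpartition("-")
    if not sep or suffix not in POSTPOSITIONS:
        raise ValueError(f"no known postposition in {form!r}")
    return stem, suffix, POSTPOSITIONS[suffix]


print(gloss_postposition("aldapa-rantz"))
```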
See note 2.
NP -> N" Case | adarra-0 > adarra | -0 case (absolutive / iv<SUBJ> or tv<OBJ>) |
| adarra-k > adarrak | -k case (ergative / tv<SUBJ>) |
| adarra-i > adarrari | -i case (dative <OBJ2>) |
N" -> AP N' | amaren sagar gorria | mother's red apple |
N" -> S' AP N' | jan dituzun Eako hiru sagarrak | The three apples from Ea which you ate |
N' -> No N Det | hiru sagar-ak | the three apples |
N' -> Dgr N A Det | txit sagar gorri -a | a very red apple |
Category order under Basque NP: S'(rel) > AP > No > Dgr > N > A > Det
Category order under Spanish NP: Det > No > N > Dgr > A > PP > S'(rel)
The categories S', AP and PP above are recursive. Therefore, unlike Spanish, Basque, like Japanese, suffers from left recursion, which is a well-known problem for straightforward top-down parsers (DCGs, etc.). Japanese researchers realized this very quickly and developed bottom-up parsers to solve the problem (such as BUP, by Matsumoto et al. 83, which was later applied to Basque by Ruiz et al. 91).
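To see why a bottom-up strategy sidesteps the problem, consider the rules N" -> AP N' and AP -> N" ren: a naive top-down parser expanding N" would immediately try to expand N" again without consuming any input. The sketch below is not BUP; it is a minimal CYK-style chart recognizer written for illustration, using only the rules given in this section. The unary rule N' -> N" is an assumption read off Figure 2, and the names `BINARY`, `UNARY`, `close` and `recognize` are hypothetical.

```python
from itertools import product

# Binary rules from this section, indexed by their right-hand side.
BINARY = {
    ("N", "Det"): "N'",
    ("AP", "N'"): 'N"',
    ('N"', "P"): "PP",
    ('N"', "ren"): "AP",
    ("PP", "ko"): "AP",
}
# Assumption: a bare N' may project to N" (as in the lowest N" of Figure 2).
UNARY = {"N'": 'N"'}


def close(cats):
    """Return cats plus every category reachable through unary rules."""
    cats = set(cats)
    todo = list(cats)
    while todo:
        parent = UNARY.get(todo.pop())
        if parent and parent not in cats:
            cats.add(parent)
            todo.append(parent)
    return cats


def recognize(tags):
    """Fill a chart bottom-up; return the categories spanning all of tags."""
    n = len(tags)
    chart = {(i, i + 1): close({t}) for i, t in enumerate(tags)}
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            found = set()
            for mid in range(start + 1, end):
                for pair in product(chart[start, mid], chart[mid, end]):
                    if pair in BINARY:
                        found.add(BINARY[pair])
            chart[start, end] = close(found)
    return chart[0, n]


# Categories of: aldapa -a -n -ko sagar -a -ren adar -a -ren punta -a -n
TAGS = ["N", "Det", "P", "ko", "N", "Det", "ren",
        "N", "Det", "ren", "N", "Det", "P"]
print(recognize(TAGS))
```

Because the chart is built from the words upwards, each rule is applied only to constituents that already exist, so the left-recursive cycle between N" and AP never causes looping.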
See note 3.
Complements of the Noun in Basque are derived by means of two suffixes. Complements derived from Noun Phrases take the suffix ren and those derived from Postpositional Phrases take ko. Given that these complements show conceptual and situational similarities with some adjectives (e.g. euskal 'Basque'), we use the category AP (Adjectival Phrase) to define them.
AP -> N" ren | adarra-ren | ren is attached to N" |
AP -> PP ko | aldapan-ko | ko is attached to PP |
PP[ N"[ AP[ N"[ AP[ N"[ AP[ PP[ N"[ N'[ aldapa_N -a_Det N'] N"] -n_P PP] -ko AP] N'[ sagar_N -a_Det N'] N"] -ren AP] N'[ adar_N -a_Det N'] N"] -ren AP] N'[ punta_N -a_Det N'] N"] -n_P PP]
N | Det | P | ko | N | Det | ren | N | Det | ren | N | Det | P |
aldapa | -a | -n | -ko | sagar | -a | -ren | adar | -a | -ren | punta | -a | -n |
Figure 2 Tree structure of phrase in (2)
Aldapekoarenari begira nago | Aldapeko sagarraren adarrari begira nago |
aldapa-n-ko-a-ren-a-ri hillside-on-of-the-of-the-to | hillside-on-of appletree-the-of branch-the-to |
a la del de la cuesta | a la rama del manzano de la cuesta |
to the one of the one on the hillside | to the branch of the appletree on the hillside |
Example 3 Agglutination of suffixes in Basque |
This elliptical derived nominal has several anaphoric elements. We can think of it as an answer to a question such as:
Zein sagarren adarri zaude begira?
Aldapekoarenari.
¿A la rama de qué manzano miras? A la del de la cuesta.
To the branch of which appletree are you looking? To the one of the one on the hillside.
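The relation between the elliptical form and the full phrase in Example 3 can be sketched in a few lines of Python. The glosses are those of Example 3; the function names and the rule "a determiner that directly follows ko or ren stands for an elided head noun" are assumptions made for this illustration, not a general analysis of Basque ellipsis.

```python
# Morpheme glosses taken from the gloss lines of Example 3.
GLOSSES = {
    "aldapa": "hillside",
    "sagar": "appletree",
    "adar": "branch",
    "n": "on",
    "ko": "of",
    "a": "the",
    "ren": "of",
    "ri": "to",
}
LINKERS = {"ko", "ren"}  # suffixes that derive a noun complement (AP)


def restore_heads(segmented, heads):
    """Insert an elided head noun before each determiner that directly
    follows ko or ren, taking the nouns from heads in order."""
    remaining = list(heads)
    out = []
    for morpheme in segmented.split("-"):
        if morpheme == "a" and out and out[-1] in LINKERS and remaining:
            out.append(remaining.pop(0))
        out.append(morpheme)
    return out


def gloss(morphemes):
    """Gloss a list of morphemes; unknown morphemes are shown as '?'."""
    return "-".join(GLOSSES.get(m, "?") for m in morphemes)


ELLIPTICAL = "aldapa-n-ko-a-ren-a-ri"
print(gloss(ELLIPTICAL.split("-")))
full = restore_heads(ELLIPTICAL, ["sagar", "adar"])
print("-".join(full))
print(gloss(full))
```

Each bare determiner in the elliptical form is thus an anaphoric slot: filling the two slots with sagar and adar yields the morpheme sequence of the full phrase.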
[The bitext: a sample] [Techniques] [RBBP] [HBP] [Large language resources]
Foru Agindua | Orden Foral |
Foru Agindua, 767/1994 zk., urriaren 24ko. Aipatutako Foru Aginduaren bidez hurrengo hau xedatu da: Lurzoru batzuk dentsitate txikiko lurzoru urbanizagai gisa birsailkatzeko Zallako Udalerriko Planeamenduari buruzko Sorospidezko Arauen aldarazpena ukatzea. Erabaki honen aurka, haren jakinarazpenetik zenbatu beharreko hilabete biko epearen barruan, administraziozko liskarrauzi-errekurtsoa jarri ahal izango da, Euskal Herriko Justizia Auzitegi Nagusiko Administraziozko Liskarrauzietarako Salan, komeniesten diren beste defentsabideak erabil daitezkeelako kalterik gabe. Adierazi den epearen barruan, BHI-015/94-P05-A espedientea Bilbaoko Gran Vía, 19-21eko 5gn. solairuan egongo da ageriko, azter dadin. Bilbon, 1994.eko urriaren 24an.-Hirigintzako foru diputatua. Pedro Hernández González. |
Orden Foral número 767/1994 de 24 octubre. Mediante la Orden Foral de referencia se ha dispuesto lo siguiente: Denegar la Modificación de las Normas Subsidiarias de Planeamiento del municipio de Zalla para la reclasificación de unos terrenos como Suelo Apto para Urbanizar de Baja Densidad. Contra dicha Orden Foral podrá interponerse, en el plazo de dos meses desde su notificación, recurso contencioso-administrativo ante la Sala de lo Contencioso-Administrativo del Tribunal Superior de Justicia del País Vasco, sin perjuicio de la utilización de otros medios de defensa que estime conveniente. Durante el referido plazo el expediente BHI-015/94-P05-A, quedará de manifiesto para su examen en las dependencias situadas en Bilbao calle Alameda Rekalde, 30, 5.a y 6.a plantas. Bilbao, 24 de octubre de 1994.-El Diputado Foral de Urbanismo.- Pedro Hernández González. |
Example 4. Bitext sample |
Lurzoru batzuk dentsitate txikiko lurzoru urbanizagai gisa birsailkatzeko Zallako Udalerriko Planeamenduari buruzko Sorospidezko Arauen aldarazpena ukatzea. |
Denegar la Modificación de las Normas Subsidiarias de Planeamiento del municipio de Zalla para la reclasificación de unos terrenos como Suelo Apto para Urbanizar de Baja Densidad. |
Example 5. Recursive phrases in the bitext |
VP[ NP[ [AP5] AP1[ AP2[ AP3[ AP4[ Zallako] Udalerriko AP3] Planeamenduari buruzko AP2] AP[Sorospidezko] Arauen AP1] aldarazpena NP] ukatzea_V VP] | VP[ Denegar_V NP[ la Modificación PP1[ de las Normas Subsidiarias_A PP2[ de Planeamiento PP3[ de el municipio PP4[ de Zalla] PP3] PP2] PP1] [PP5] NP] VP] |
Example 6. Bracketed tree structure of (5) |
Example 7. Symmetric recursion in (5) |
AP5[ N"[ VP[ NP[ N"[ Lurzoru_N batzuk_Det N"] Case NP] [ PP[ N"[ AP[ N"[ dentsitate_N txiki_A N"] _ko AP] lurzoru_N urbanizagai_A ] N"] gisa_P PP] birsailka_VP] _tze N'] _ko AP] | PP5[ para_P NP[ la_Det reclasificación_N PP[ de_P NP[ unos_Det terrenos_N NP] PP] PP[ como_P NP[ Suelo_N AP[ Apto_A PP[ para_P NP[ Urbanizar_N NP] PP] AP] PP[ de_P NP[ Baja_A Densidad_N NP] PP] NP] PP] NP] PP] |
Example 8. Asymmetric recursion in (5) |
A combination of techniques (structural and POS tagging, heuristics, cognate matching, dictionary lookup, statistics, rules, etc.; see Martínez et al. 98) has been used to process the bitext. This has resulted in the segmentation and alignment of:
<rs type=organization>Euskal Herriko Justizia Auzitegi Nagusiko Administraziozko Liskarrauzietarako Salan</rs> | <rs type=organization>Sala de lo Contencioso-Administrativo del Tribunal Superior de Justicia del País Vasco</rs> |
<rs type=law>Zallako Udalerriko Planeamenduari buruzko Sorospidezko Arauen aldarazpena</rs> | <rs type=law>Modificación de las Normas Subsidiarias de Planeamiento del municipio de Zalla </rs> |
<term>Lurzoru batzuk dentsitate txikiko lurzoru urbanizagai gisa birsailkatzeko | <term>para la reclasificación de unos terrenos como Suelo Apto para Urbanizar de Baja Densidad |
<seg type=9>Erabaki honen aurka, haren jakinarazpenetik zenbatu beharreko hilabete biko epearen barruan, administraziozko liskarrauzi-errekurtsoa jarri ahal izango da, Euskal Herriko Justizia Auzitegi Nagusiko Administraziozko Liskarrauzietarako Salan, komeniesten diren beste defentsabideak erabil daitezkeelako kalterik gabe. Adierazi den epearen barruan, BHI-015/94-P05-A espedientea Bilbaoko Gran Vía, 19-21eko 5gn. solairuan egongo da ageriko, azter dadin. Bilbon, 1994.eko urriaren 24an.-Hirigintzako foru diputatua. Pedro Hernández González. </seg> |
<seg type=9>Contra dicha Orden Foral podrá interponerse, en el plazo de dos meses desde su notificación, recurso contencioso-administrativo ante la Sala de lo Contencioso-Administrativo del Tribunal Superior de Justicia del País Vasco, sin perjuicio de la utilización de otros medios de defensa que estime conveniente. Durante el referido plazo el expediente BHI-015/94-P05-A, quedará de manifiesto para su examen en las dependencias situadas en Bilbao calle Alameda Rekalde, 30, 5.a y 6.a plantas. Bilbao, 24 de octubre de 1994.-El Diputado Foral de Urbanismo.- Pedro Hernández González. </seg> |
Example 9. Segmented translation units
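Among the techniques listed above, cognate matching is the easiest to illustrate, since numerals tend to survive translation unchanged and can serve as alignment anchors. The sketch below is only an illustration under that assumption and is not the procedure of Martínez et al. 98; the names `numerals` and `shared_numerals` are hypothetical. The two test sentences are the opening sentences of Example 4.

```python
import re

# A numeric expression: digit runs optionally joined by slashes (767/1994).
NUMERAL = re.compile(r"\d+(?:/\d+)*")


def numerals(text):
    """Return the set of numeric expressions found in text."""
    return set(NUMERAL.findall(text))


def shared_numerals(source, target):
    """Return, sorted, the numeric expressions present in both texts."""
    return sorted(numerals(source) & numerals(target))


EU = "Foru Agindua, 767/1994 zk., urriaren 24ko."
ES = "Orden Foral número 767/1994 de 24 octubre."
print(shared_numerals(EU, ES))
```

Extracting the digits rather than comparing whole tokens matters for Basque, because suffixes attach directly to numerals (24ko), so the surface tokens of the two languages differ even when the number is the same.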
An aligned bitext encoded by means of rich descriptive markup becomes a large language resource in itself (we follow TEI P3 guidelines for SGML markup). But the following additional resources can also be derived, as a by-product, from the annotated and aligned bitext:
<div>... <seg type=9 id=9EU2 corresp=9ES2> <p id=pEU11> <s id=sEU11 corresp=ES11> <rs type=law id=LEU10 corresp=LES12>Foru agindu </rs> horrek amaiera eman dio administrazio bideari; eta beraren aurka <rs type=organization id=OEU10> Administrazioarekiko </rs> auzibide-errekurtsoa jarri ahal izango zaio <rs type=organization id=OEU11 corresp=OES9> Euskal Herriko Justizi Auzitegi Nagusiko Administrazioarekiko Auzibideetarako Salari </rs>, bi hilabeteko epean; jakinarazpen hau egiten den egunaren biharamunetik zenbatuko da epe hori; hala eta guztiz ere, egokiesten diren beste defentsabideak ere erabil litezke. </s> </p> </seg> <seg type=10 id=10EU1 corresp=10ES1> <p id=pEU12> <s id=sEU12 corresp=ES12> Epe hori amaitu arte BHI-<num num=10094> 100/94 </num>-P05-A espedientea agerian egongo da, nahi duenak azter dezan, <rs type=place id=PEU2 corresp=PES3> Bilboko Errekalde zumarkaleko </rs> <num num=30> 30.eko </num> bulegoetan, <num num=5> 5 </num> eta <num num=6> 6.</num> solairuetan.</s> </p> </seg> </div> <closer id=pEU13> <docAuthor> <s id=sEU13 corresp=ES13> <rs type=title id=TLEU4 corresp=TLES4> Hirigintzako foru diputatua </rs>. </s> <s id=sEU14 corresp=ES14> _ <rs type=name id=NEU4 corresp=NES4> Pedro Hernández González </rs>.</s> </docAuthor> </closer> |
<div> ... <seg type=9 id=9ES2 corresp=9EU2> <p id=pES11> <s id=sES11 corresp=EU11> Contra dicha <rs type=law id=LES12 corresp=LEU10> Orden Foral </rs>, que agota la vía administrativa podrá interponerse recurso contencioso-administrativo ante la <rs type=organization id=OES9 corresp=OEU11> Sala de lo Contencioso-Administrativo del Tribunal Superior de Justicia del País Vasco </rs>, en el plazo de dos meses, contado desde el día siguiente a esta notificación sin perjuicio de la utilización de otros medios de defensa que estime oportunos.</s> </p> </seg> <seg type=10 id=10ES1 corresp=10EU1> <p id=pES12> <s id=sES12 corresp=EU12> Durante el referido plazo el expediente BHI-<num num=10094> 100/94 </num>- P05-A quedará de manifiesto para su exámen en las dependencias de <rs type=place id=PES3 corresp=PEU2> Bilbao calle Alameda Rekalde </rs>, <num num=30> 30 </num>, <num num=5> 5.a </num> y <num num=6> 6.a </num> plantas. </s> </p> </seg> </div> <closer id=pES13> <docAuthor> <s id=sES13 corresp=EU13> El <rs type=title id=TLES4 corresp=TLEU4> Diputado Foral de Urbanismo </rs>. </s> <s id=sES14 corresp=EU14> - <rs type=name id=NES4 corresp=NEU4> Pedro Hernández González </rs> </s> </docAuthor> </closer> |
Example 10. Aligned annotated bitext sample |
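One resource that can be read directly off markup of this kind is a bilingual list of aligned names and terms, obtained by following the id and corresp attributes of the rs elements. The sketch below assumes the attribute order and unquoted values seen in Example 10 and uses a regular expression rather than a real SGML parser; the function names are hypothetical, and the two strings are short excerpts from the sample.

```python
import re

# <rs type=T id=I corresp=C> text </rs>, with unquoted attribute values.
RS = re.compile(
    r"<rs\s+type=([^\s>]+)\s+id=([^\s>]+)\s+corresp=([^\s>]+)>"
    r"\s*(.*?)\s*</rs>",
    re.S,
)


def index_rs(markup):
    """Map each rs id to (type, corresp, text); rs without corresp are skipped."""
    return {m[2]: (m[1], m[3], m[4]) for m in RS.finditer(markup)}


def aligned_pairs(eu_markup, es_markup):
    """Return (type, Basque text, Spanish text) for mutually linked rs elements."""
    eu, es = index_rs(eu_markup), index_rs(es_markup)
    pairs = []
    for eu_id, (kind, target, text) in eu.items():
        if target in es and es[target][1] == eu_id:
            pairs.append((kind, text, es[target][2]))
    return pairs


EU = ("<rs type=law id=LEU10 corresp=LES12>Foru agindu </rs> horrek "
      "<rs type=organization id=OEU10> Administrazioarekiko </rs> "
      "<rs type=title id=TLEU4 corresp=TLES4> Hirigintzako foru diputatua </rs>")
ES = ("Contra dicha <rs type=law id=LES12 corresp=LEU10> Orden Foral </rs> "
      "<rs type=title id=TLES4 corresp=TLEU4> Diputado Foral de Urbanismo </rs>")

for pair in aligned_pairs(EU, ES):
    print(pair)
```

Requiring the link to hold in both directions is a deliberate choice: it filters out elements whose corresp value points at an id missing from the other side.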
1. Things are more complicated than represented here. Itziar Laka has a good introductory Basque grammar that can be consulted on the Internet.
Both Basque cases and postpositions have plural as well as singular forms (adarra[abs sg]/adarrak[abs pl]; adarrak[erg sg]/adarrek[erg pl]; adarrari[dat sg]/adarrei[dat pl], etc.).
It must be pointed out also that a large number of morphophonological rules apply when cases, postpositions and suffixes get agglutinated. These rules will depend on such things as whether the nominal stem ends in a vowel or a consonant, whether two vowels are together, and so on: adar-a > adarra, aldapa-a > aldapa, punta-ei > puntei, etc.
In any case, these rules will be interpreted differently depending on the analyst's affiliation. Linguists will be more concerned about etymological and morphophonological issues. Programmers will try to cope with data in the most implementable way, whatever the etymological reality of their rules may be. We agree with the former, but write our rules largely as the latter. Two further explanations are due here:
2. Our NP rules as explained here might be somewhat contentious. The agglutination of determiner and case suffixes makes them become compacted and appear more like inflections than isolated suffixes (in a sense, Basque is both agglutinative and inflectional, as is seen also in finite verbs). It is therefore difficult to truly separate grammatical suffixes, although doing so is very convenient for the sake of writing phrase structure grammar rules. We will maintain our NP rules as:
sagarra[det sg]-0[case abs] | > sagarra[sg abs] |
sagarrak[det pl]-0[case abs] | > sagarrak[pl abs] |
sagarra[det sg]-k[case erg] | > sagarrak[sg erg] |
sagarrak[det pl]-k[case erg] | > sagarrek[pl erg] |
sagarra[det sg]-i[case dat] | > sagarrari[sg dat] |
sagarrak[det pl]-i[case dat] | > sagarrei[pl dat] |
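The compacted endings in this table can be written down as a small lookup, which is the "implementable" view mentioned in note 1. The sketch below starts from the definite absolutive singular (sagarra, adarra) and swaps its final -a for the fused determiner-plus-case ending read off the table; the function name `decline` and the choice of base form are assumptions for illustration, and only the forms shown in this handout have been checked.

```python
# Fused determiner + case endings, read off the table above.
ENDINGS = {
    ("sg", "abs"): "a",
    ("pl", "abs"): "ak",
    ("sg", "erg"): "ak",
    ("pl", "erg"): "ek",
    ("sg", "dat"): "ari",
    ("pl", "dat"): "ei",
}


def decline(definite_sg, number, case):
    """Build a definite form from the absolutive singular, e.g. 'sagarra'."""
    if not definite_sg.endswith("a"):
        raise ValueError("expected a definite singular form ending in -a")
    stem = definite_sg[:-1]  # strip the singular determiner -a
    return stem + ENDINGS[number, case]


for number, case in ENDINGS:
    print(number, case, decline("sagarra", number, case))
```

Note that the singular ergative and the plural absolutive come out identical (sagarrak), exactly as in the table.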
3. Laka explains the difference between the ko and ren suffixes in the following manner: "the morpheme ko can indicate location, and this is why it is sometimes referred to as a 'locative genitive', but location is not the only relation it can convey. However, one general guideline that is helpful in distinguishing the use of ko and ren phrases involves location: ko is attached to phrases that denote location, or phrases that denote a property. All other relations a phrase may bear with respect to a Noun are dealt with by means of the morpheme ren." (See more examples in Laka's section on Complements of [Basque] Nouns).
One major feature of the ko suffix, related to the locative denotation, is that it is used to derive adjectival phrases from postpositional phrases. The latter are not allowed inside noun phrases (unless they get a ko attached). This fact permits the generalization expressed by the rule AP -> PP -ko.
The ko suffix has, however, a wider derivational force. There are also cases involving participial clauses, which are a much used alternative to relative clauses:
Ean erosi genituen hiru sagar gorriak
Ea-in buy have three apple red-det
The three red apples that we bought in Ea
Example 11
Ean erositako hiru sagar gorriak
Ea-in buy-ta-ko three apple red-det
The three red apples bought in Ea
Example 12
The -ta ending is used to give participles a resultative meaning.
But in addition to PPs and participles, ko can also be used to derive predicative phrases out of nominal constructions:
Sapore oneko sagarra
Taste good-ko apple-det
An apple with good taste
Example 13
There has been some discussion concerning the nature of the underlying phrase that is attached to the ko suffix. Be that as it may, for our brief introduction we will assume that, at least for phrases involving location, there does exist an underlying PP, as is the case in example (2) above: aldapa-a-n-ko > aldapeko
Although Basque morphology is complex to deal with, it has been successfully processed by means of a two-level morphological analyzer. See the references to the software developed by the IXA Group at the Euskal Herriko Unibertsitatea.