MG

Constraint Grammar for Basque

IXA Group


In the computational treatment of a language, syntactic analysis is the task that necessarily follows morphological analysis. Our group has therefore taken up the syntactic challenge and has chosen the Constraint Grammar (CG) formalism as its tool for general syntactic analysis of Basque.

Unlike many grammars, CG is not meant as a toy grammar for laboratory sentences: it is designed to analyse real, unrestricted text, that is, raw text. This philosophy matches our aims closely, since the parser we want for Basque must serve as the basis for other applications.

Moreover, CG supports parsing based on morphology, which suits a language like Basque, where morphology and syntax are so closely related. A very important part of CG analysis is morphological disambiguation, that is, resolving the ambiguous output of morphological analysis by means of constraints based on linguistic knowledge.

We considered the CG formalism suitable for general syntactic analysis. It avoids the rigidity of other formalisms and, despite its problems, we believe it is well suited to languages with relatively free word order.

So far, with the Basque lemmatiser (EUSLEM) in mind, we have focused mainly on resolving categorial ambiguity, that is, ambiguity about a word's part of speech.

Example 1 shows the morphological analysis of the sentence "Gero, hegoak moztu eta poxpolu kaxa batean gartzelaratuko zizkizun." The analyser returns more than one reading for every word; only the punctuation marks receive a single analysis.

Example 1.
Analysis of the sentence: Gero, hegoak moztu eta poxpolu kaxa batean gartzelaratuko zizkizun.

"<$.>"
  PUNT_PUNT
"<Gero>"
  "gero"  ADB ADO  HAS_MAI @ADLG
  "gero"  IZE ARR  DEK ABS MG @OBJ @SUBJ  HAS_MAI
  "gero"  IZE ARR  ZERO HAS_MAI @KM>
"<,>"
  PUNT_KOMA
"<hegoak>"
  "hego"  IZE ARR DEK ABS NUMP MUGM @OBJ @SUBJ
  "hego"  IZE ARR DEK ERG NUMS MUGM @SUBJ
"<moztu>"
  "motz"  ADI SIN ASP PART DEK ABS MG @OBJ @SUBJ
  "motz"  ADI SIN ASP PART  ZERO NOTDEK @-JADNAG
"<eta>"
  "eta"  LOT JNT @PJ @SJ AORG
  "eta"  LOT MEN KAUS @MP AORG
"<poxpolu>"
  "poxpolu"  IZE ARR DEK ABS MG @OBJ @SUBJ
  "poxpolu"  IZE ARR  ZERO @KM>
"<kaxa>"
  "kaxa"  IZE ARR DEK ABS MG @OBJ @SUBJ  AORG 
  "kaxa"  IZE ARR DEK ABS NUMS MUGM @OBJ @SUBJ  AORG 
  "kaxa"  IZE ARR  ZERO AORG @KM>
"<batean>"
  "bat"  DET DZH DEK NUMS MUGM DEK INE @ADLG 
  "bat"  IZE ARR DEK NUMS MUGM DEK INE @ADLG 
  "bate"  IZE ARR DEK NUMS MUGM DEK INE @ADLG 
"<gartzelaratuko>"
  "gartzelara"  ADI SIN ASP PART ASP ETOR  NOTDEK AORG
  "gartzelara"  ADI SIN ASP PART DEK NUMS MUGM DEK GEL @IZLG> @<IZLG @ADLG DEK ABS MG @OBJ @SUBJ  AORG 
  "gartzelara"  ADI SIN ASP PART DEK NUMS MUGM DEK GEL @IZLG> @<IZLG @ADLG  AORG @-JADNAG
"<zizkizun>"
  "*edun"  ADL B1 NR_HK NI_ZU NK_HU LOT MEN @+JADNAG_MP @+JADLAG_MP  
  "*edun"  ADL B1 NR_HK NI_ZU NK_HU LOT MEN ERLT @+JADNAG_IZLG> @+JADLAG_IZLG>  
  "*edun"  ADL B1 NR_HK NI_ZU NK_HU @+JADLAG
"<$.>"
  PUNT_PUNT
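The analyser output above follows a simple layout: a quoted wordform line such as "<hegoak>" opens a cohort, and each indented line below it is one reading. The Python sketch below reads that layout into cohorts so the ambiguity can be counted. It is only an illustration: the names `Cohort`, `parse_cohorts` and `main_categories` are hypothetical and not part of EUSLEM or any CG tool, and it assumes that the first tag after the quoted lemma (IZE, ADB, DET, ...) is the word's main category, which is what distinguishes categorial ambiguity from other kinds.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    wordform: str
    # Each reading is kept as its list of whitespace-separated tokens.
    readings: list[list[str]] = field(default_factory=list)


def parse_cohorts(text: str) -> list[Cohort]:
    """Hypothetical parser for the '"<word>"' + indented-readings layout."""
    cohorts: list[Cohort] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith('"<'):
            # Wordform line, e.g. "<Gero>"; a trailing trace such as D:395 is ignored.
            end = line.index('>"')
            cohorts.append(Cohort(wordform=line[2:end]))
        elif cohorts:
            # Reading line: quoted lemma (absent for punctuation) followed by tags.
            cohorts[-1].readings.append(line.split())
    return cohorts


def main_categories(cohort: Cohort) -> set[str]:
    """Assumption: the first tag after the quoted lemma is the main category."""
    cats = set()
    for reading in cohort.readings:
        tags = [t for t in reading if not t.startswith('"')]
        if tags:
            cats.add(tags[0])
    return cats


SAMPLE = '''
"<Gero>"
  "gero"  ADB ADO  HAS_MAI @ADLG
  "gero"  IZE ARR  DEK ABS MG @OBJ @SUBJ  HAS_MAI
  "gero"  IZE ARR  ZERO HAS_MAI @KM>
"<,>"
  PUNT_KOMA
"<hegoak>"
  "hego"  IZE ARR DEK ABS NUMP MUGM @OBJ @SUBJ
  "hego"  IZE ARR DEK ERG NUMS MUGM @SUBJ
"<batean>"
  "bat"  DET DZH DEK NUMS MUGM DEK INE @ADLG
  "bat"  IZE ARR DEK NUMS MUGM DEK INE @ADLG
  "bate"  IZE ARR DEK NUMS MUGM DEK INE @ADLG
'''

for c in parse_cohorts(SAMPLE):
    print(c.wordform, len(c.readings), sorted(main_categories(c)))
```

On this fragment, Gero (ADB/IZE) and batean (DET/IZE) are categorially ambiguous, whereas both readings of hegoak are nouns (IZE) and differ only in case and number.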

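After some CG rules are applied (395, 223, 16, 392, 30, 164, 187, 208), the aim is to leave a single analysis for each word. Example 2 shows the result: the label D:n after a word gives the number of the rule that disambiguated it, and two words (poxpolu and zizkizun) still keep two readings.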

Example 2.
Disambiguated analysis of the sentence in Example 1.

"<$.>"
  PUNT_PUNT
"<Gero>" D:395
  "gero"  ADB ADO  HAS_MAI @ADLG
"<,>"
  PUNT_KOMA
"<hegoak>" D:223
  "hego"  IZE ARR DEK ABS NUMP MUGM @OBJ @SUBJ 
"<moztu>" D:16
  "motz"  ADI SIN ASP PART  ZERO NOTDEK @-JADNAG
"<eta>" D:392
  "eta"  LOT JNT @PJ @SJ AORG
"<poxpolu>"
  "poxpolu"  IZE ARR DEK ABS MG @OBJ @SUBJ  
  "poxpolu"  IZE ARR  ZERO @KM>
"<kaxa>" D:30
  "kaxa"  IZE ARR  ZERO AORG @KM>
"<batean>" D:164
  "bat"  DET DZH DEK NUMS MUGM DEK INE @ADLG 
"<gartzelaratuko>" D:187
  "gartzelara"  ADI SIN ASP PART ASP ETOR  NOTDEK AORG @-JADNAG
"<zizkizun>" D:208
  "*edun"  ADL B1 NR_HK NI_ZU NK_HU LOT MEN @+JADNAG_MP @+JADLAG_MP  
  "*edun"  ADL B1 NR_HK NI_ZU NK_HU @+JADLAG
"<$.>"
  PUNT_PUNT
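One way to quantify what the rules achieve is the average number of readings per cohort, a figure commonly used to report residual ambiguity. The minimal Python sketch below computes it from reading counts copied by hand from Examples 1 and 2 (the two $. markers and the comma are included as cohorts). The helper names are hypothetical.

```python
# Readings per cohort, copied by hand from Example 1 (before) and Example 2 (after).
BEFORE = [("$.", 1), ("Gero", 3), (",", 1), ("hegoak", 2), ("moztu", 2),
          ("eta", 2), ("poxpolu", 2), ("kaxa", 3), ("batean", 3),
          ("gartzelaratuko", 3), ("zizkizun", 3), ("$.", 1)]
AFTER = [("$.", 1), ("Gero", 1), (",", 1), ("hegoak", 1), ("moztu", 1),
         ("eta", 1), ("poxpolu", 2), ("kaxa", 1), ("batean", 1),
         ("gartzelaratuko", 1), ("zizkizun", 2), ("$.", 1)]


def readings_per_cohort(counts):
    """Average number of readings per cohort."""
    return sum(n for _, n in counts) / len(counts)


def still_ambiguous(counts):
    """Cohorts that keep more than one reading."""
    return [w for w, n in counts if n > 1]


print(f"{readings_per_cohort(BEFORE):.2f} -> {readings_per_cohort(AFTER):.2f}")  # 2.17 -> 1.17
print(still_ambiguous(AFTER))  # ['poxpolu', 'zizkizun']
```

The sentence goes from 26 readings over 12 cohorts to 14, and the remaining ambiguity is confined to poxpolu and zizkizun.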

For example, rule 187, @w =! ETOR (0 C PART) (NOT 1 DET), which disambiguated gartzelaratuko in Example 2, reads as follows:

	°  @w =! ETOR
	   for any word (@w), select (=!) the reading with the feature ETOR(kizun), 'future'
	°  (0 C PART)
	   if all the readings of this word are PARTiciples (C marks a careful test)
	°  (NOT 1 DET)
	   and the next word (position 1, one step to the right) has no DETerminer reading
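To make the three parts of rule 187 concrete, here is a minimal Python sketch of how its conditions could be checked on the gartzelaratuko cohort from Example 1. It is not the implementation used by any CG compiler: cohorts are assumed to be (wordform, readings) pairs with each reading an unordered set of tags (lemmas omitted), the function name `rule_187` is hypothetical, and treating a missing next word as satisfying (NOT 1 DET) is also an assumption.

```python
from typing import Optional

Reading = frozenset[str]
Cohort = tuple[str, list[Reading]]


def rule_187(sentence: list[Cohort], i: int) -> Optional[list[Reading]]:
    """Hypothetical check of: @w =! ETOR (0 C PART) (NOT 1 DET).

    Returns the readings kept for word i, or None if the rule does not fire.
    """
    _, readings = sentence[i]
    # (0 C PART): careful test, every reading of the target must be a participle.
    if not all("PART" in r for r in readings):
        return None
    # (NOT 1 DET): the next word (position +1, to the right) has no DET reading.
    # Assumption: if there is no next word, the negated test counts as satisfied.
    if i + 1 < len(sentence) and any("DET" in r for r in sentence[i + 1][1]):
        return None
    # =! (SELECT): keep only the ETOR readings; never leave the word empty.
    selected = [r for r in readings if "ETOR" in r]
    return selected or None


def tags(s: str) -> Reading:
    # Duplicated tags (e.g. ASP, DEK) collapse because readings are sets.
    return frozenset(s.split())


sentence = [
    ("gartzelaratuko", [
        tags("ADI SIN ASP PART ASP ETOR NOTDEK AORG"),
        tags("ADI SIN ASP PART DEK NUMS MUGM DEK GEL @IZLG> @<IZLG @ADLG DEK ABS MG @OBJ @SUBJ AORG"),
        tags("ADI SIN ASP PART DEK NUMS MUGM DEK GEL @IZLG> @<IZLG @ADLG AORG @-JADNAG"),
    ]),
    ("zizkizun", [
        tags("ADL B1 NR_HK NI_ZU NK_HU LOT MEN @+JADNAG_MP @+JADLAG_MP"),
        tags("ADL B1 NR_HK NI_ZU NK_HU @+JADLAG"),
    ]),
]

kept = rule_187(sentence, 0)
print(len(kept), "ETOR" in kept[0])  # 1 True
```

The rule fires on gartzelaratuko because all three readings are participles and zizkizun has no DET reading, so only the ETOR reading survives, matching the D:187 line in Example 2. If the following word had a DET reading, the function would return None and all three readings would stay.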