Constraint Grammar for Basque
IXA Group
In the computational treatment of a language, syntactic analysis is the task that naturally follows morphological analysis. Our group has therefore taken up the syntactic challenge and has chosen the Constraint Grammar (CG) formalism as the tool for general syntactic analysis of Basque.
The aim of CG is not, as often happens, to create a toy grammar for playing with laboratory sentences. CG is designed to analyse real text, that is, to be used with raw text. That philosophy matches our aims closely, since the parser we want for Basque must serve as the basis for other applications.
In addition, CG supports parsing based on morphology. This suits a language like Basque, where morphology and syntax are so closely related. A very important part of CG analysis is morphological disambiguation, that is, resolving the ambiguous output of morphological analysis by means of constraints based on linguistic knowledge.
We considered the CG formalism appropriate for general syntactic treatment. It is far less rigid than other formalisms and, in spite of the problems, we think it is well suited to languages with fairly free word order.
Up to now, and with the Basque Lemmatiser (EUSLEM) in mind, we have mainly focused on resolving categorial ambiguity.
Example 1 shows the analysis of the sentence Gero, hegoak moztu eta poxpolu kaxa batean gartzelaratuko zizkizun. As we can see, the analyser gives more than one analysis for each word.
"<$.>" PUNT_PUNT "<Gero>" "gero" ADB ADO HAS_MAI @ADLG "gero" IZE ARR DEK ABS MG @OBJ @SUBJ HAS_MAI "gero" IZE ARR ZERO HAS_MAI @KM> "<,>" PUNT_KOMA "<hegoak>" "hego" IZE ARR DEK ABS NUMP MUGM @OBJ @SUBJ "hego" IZE ARR DEK ERG NUMS MUGM @SUBJ "<moztu>" "motz" ADI SIN ASP PART DEK ABS MG @OBJ @SUBJ "motz" ADI SIN ASP PART ZERO NOTDEK @-JADNAG "<eta>" "eta" LOT JNT @PJ @SJ AORG "eta" LOT MEN KAUS @MP AORG "<poxpolu>" "poxpolu" IZE ARR DEK ABS MG @OBJ @SUBJ "poxpolu" IZE ARR ZERO @KM> "<kaxa>" "kaxa" IZE ARR DEK ABS MG @OBJ @SUBJ AORG "kaxa" IZE ARR DEK ABS NUMS MUGM @OBJ @SUBJ AORG "kaxa" IZE ARR ZERO AORG @KM> "<batean>" "bat" DET DZH DEK NUMS MUGM DEK INE @ADLG "bat" IZE ARR DEK NUMS MUGM DEK INE @ADLG "bate" IZE ARR DEK NUMS MUGM DEK INE @ADLG "<gartzelaratuko>" "gartzelara" ADI SIN ASP PART ASP ETOR NOTDEK AORG "gartzelara" ADI SIN ASP PART DEK NUMS MUGM DEK GEL @IZLG> @<IZLG @ADLG DEK ABS MG @OBJ @SUBJ AORG "gartzelara" ADI SIN ASP PART DEK NUMS MUGM DEK GEL @IZLG> @<IZLG @ADLG AORG @-JADNAG "<zizkizun>" "*edun" ADL B1 NR_HK NI_ZU NK_HU LOT MEN @+JADNAG_MP @+JADLAG_MP "*edun" ADL B1 NR_HK NI_ZU NK_HU LOT MEN ERLT @+JADNAG_IZLG> @+JADLAG_IZLG> "*edun" ADL B1 NR_HK NI_ZU NK_HU @+JADLAG "<$.>" PUNT_PUNT |
After applying some CG rules (395, 223, 16, 392, 30, 164, 187, 208), the aim is to leave just a single analysis for each word.
"<$.>" PUNT_PUNT "<Gero>" D:395 "gero" ADB ADO HAS_MAI @ADLG "<,>" PUNT_KOMA "<hegoak>" D:223 "hego" IZE ARR DEK ABS NUMP MUGM @OBJ @SUBJ "<moztu>" D:16 "motz" ADI SIN ASP PART ZERO NOTDEK @-JADNAG "<eta>" D:392 "eta" LOT JNT @PJ @SJ AORG "<poxpolu>" "poxpolu" IZE ARR DEK ABS MG @OBJ @SUBJ "poxpolu" IZE ARR ZERO @KM "<kaxa>" D:30 "kaxa" IZE ARR ZERO AORG @KM "<batean>" D:164 "bat" DET DZH DEK NUMS MUGM DEK INE @ADLG "<gartzelaratuko>" D:187 "gartzelara" ADI SIN ASP PART ASP ETOR NOTDEK AORG @-JADNAG "<zizkizun>" D:208 "*edun" ADL B1 NR_HK NI_ZU NK_HU LOT MEN @+JADNAG_MP @+JADLAG_MP "*edun" ADL B1 NR_HK NI_ZU NK_HU @+JADLAG "<$.>" PUNT_PUNT |
For example, rule 187, (@w =! ETOR (0 C PART) (NOT 1 DET)), states the following:
° @w =! ETOR: select (=!) the reading with the feature ETOR(kizun), 'future',
° (0 C PART): if all the interpretations of this word are PARTiciples
° (NOT 1 DET): and the word one step to the right is not a DETerminer.
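The logic of rule 187 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CG engine itself: the function apply_select_rule is hypothetical, each reading is reduced to a set of tags, the tag sets are simplified from the example above, and position 1 is taken to be the next word to the right, as in the usual CG convention.

```python
def apply_select_rule(cohorts, target, own_tag, banned_next_tag):
    """Sketch of a rule shaped like: @w =! target (0 C own_tag) (NOT 1 banned_next_tag).

    cohorts is a list of words; each word is a list of readings;
    each reading is a set of tags. Returns a new list of words.
    """
    result = []
    for i, readings in enumerate(cohorts):
        # The rule only applies to words that have a reading with the target tag.
        has_target = any(target in r for r in readings)
        # (0 C own_tag): every reading of the current word carries own_tag.
        all_own = bool(readings) and all(own_tag in r for r in readings)
        # (NOT 1 banned_next_tag): no reading of the next word carries the tag.
        nxt = cohorts[i + 1] if i + 1 < len(cohorts) else []
        next_clear = not any(banned_next_tag in r for r in nxt)
        if has_target and all_own and next_clear:
            # Select: keep only the readings that carry the target tag.
            readings = [r for r in readings if target in r]
        result.append(readings)
    return result

# Simplified tag sets for "gartzelaratuko zizkizun" (assumed for illustration).
gartzelaratuko = [
    {"ADI", "SIN", "ASP", "PART", "ETOR", "NOTDEK"},
    {"ADI", "SIN", "ASP", "PART", "DEK", "GEL", "ABS"},
    {"ADI", "SIN", "ASP", "PART", "DEK", "GEL"},
]
zizkizun = [{"ADL", "LOT", "MEN"}, {"ADL"}]
sentence = [gartzelaratuko, zizkizun]

out = apply_select_rule(sentence, "ETOR", "PART", "DET")
print(len(out[0]), len(out[1]))
```

Because every reading of gartzelaratuko is a participle and zizkizun has no DET reading, only the ETOR reading survives, while zizkizun is left untouched by this rule. If the following word had a DET reading, all three readings would be kept.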