Faculty of Arts (Letras)

Joseba Abaitua: Undergraduate courses

Office: 323-D  Email: abaitua@fil.deusto.es

Office hours

 Mon 12.00, Wed 11.00, Thu 16.00
English Language and New Technologies


Objectives

  1. Become acquainted with new technologies applied to natural languages in general and to English in particular.
  2. Learn the contribution of linguistics to the development of information technologies.
  3. Focus on practical applications of theoretical linguistics and of computational linguistics in the framework of the new information society. We will consider applications such as information extraction, natural language interfaces, multilingual electronic publishing, machine translation, digital document localization, and the multilingual Internet.
  4. Review the techniques of computational linguistics and experiment with two of them: automatic corpus tagging and feature-based grammars.
  5. Study the straightforward connection between formal syntax and computational parsers. Rules and principles of formal linguistics will be implemented in a grammar development environment (PATR-II). At least the ten most relevant grammatical constructions of English, Spanish and Basque will be implemented.
  6. Create a computational lexicon of at least 50 representative lexical entries in each of the languages under consideration.
  7. Construct a semantic taxonomy with at least three levels of abstraction and test its effectiveness in disambiguating prepositional phrases.
  8. Experiment with some translation software and corpus processors.
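Objectives 4 and 5 both rest on unification, the operation PATR-II uses to combine feature structures. As a minimal sketch, and not PATR-II's own implementation, the Python below assumes feature structures are nested dictionaries with atomic string values, and it ignores reentrancy (shared substructures), which PATR-II does support. The function names unify and agree and the three lexical entries are hypothetical.

```python
from typing import Optional, Union

# A feature structure is either an atomic value (str) or a dict of features.
FeatureStructure = Union[str, dict]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the most general structure compatible with a and b, or None on a clash."""
    if isinstance(a, str) and isinstance(b, str):
        return a if a == b else None          # atoms unify only if identical
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)                      # shallow copy; inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None               # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value       # feature only in b: just add it
        return result
    return None                               # atom vs. complex structure


# Hypothetical lexical entries, for illustration only.
she = {"cat": "NP", "head": {"agr": {"num": "sg", "pers": "3"}}}
sleeps = {"cat": "V", "head": {"agr": {"num": "sg", "pers": "3"}, "tense": "pres"}}
sleep = {"cat": "V", "head": {"agr": {"num": "pl"}, "tense": "pres"}}


def agree(subject: dict, verb: dict) -> Optional[FeatureStructure]:
    """Simplified subject-verb constraint: <NP head agr> = <V head agr>."""
    return unify(subject["head"]["agr"], verb["head"]["agr"])


print(agree(she, sleeps))  # {'num': 'sg', 'pers': '3'}
print(agree(she, sleep))   # None
```

A failed unification (None) is what blocks "she sleep" while "she sleeps" goes through; in a PATR-II grammar the same check is stated declaratively as a path equation attached to a rule rather than as a procedure.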

Contents

  1. Human Language Technologies and the information society
  2. Information technologies for the new society
  3. Information overload
  4. Information management, retrieval and extraction
  5. Translation technology
  6. Text collections and corpora
  7. Automatic text processing and tagging
  8. Linguistic annotations
  9. Morphosyntactic tagging
  10. Formal grammars and shallow parsers
  11. Feature-based grammars
  12. PATR-II (phrase structure rules)
  13. PATR-II (lexical entries)
  14. Parsing and feature representations
  15. Applications: Information management, retrieval and extraction
  16. Applications: Machine translation
  17. Applications: The multilingual Internet
  18. Applications: Electronic publishing

Weekly schedule (academic year 2000/2001):

  1. Feb 19-23. Read or browse and discuss:
  2. Feb 26-Mar 2.
  3. Mar 5-9.
  4. Mar 12-16.
  5. Mar 19-23.
  6. Mar 26-30. Review of the grammar in the "samples\eng-ela" folder:
    • Feature template definitions
    • Category defaults
    • Grammar rules
  7. Apr 2-6. Adding lexical entries and grammar rules to eng-ela.
  8. Apr 23-27. Introduction to Prolog. Downloading the Prolog interpreter from Deusto's intranet and installing it. Introduction to difference lists (diffl.dec or diffl.htm) and definite clause grammars (dcg.dec or dcg.htm).
  9. Apr 30-May 4. Generation of phrases from DCGs. The complexity of natural language. Try the permutation(L,PedL) predicate. The number of possible orderings of n distinct words is n!, so in the example with seven words it is 7! (7*6*5*4*3*2*1 = 5,040). If words can repeat, there are 7^7 possible sequences (823,543). Try dcg0.dec or dcg0.htm. DCGs serve to constrain this generative explosion. Try dcg1.dec or dcg1.htm: the possibilities are cut down to just four, three of which should be ruled out by a more restrictive grammar (try dcg2.dec or dcg2.htm). If we enrich the grammar a bit more, we again get four solutions, all of them correct (try dcg3.dec or dcg3.htm).
  10. May 7-11. Extending the grammar to one of Shakespeare's sonnets. Use the file dict.dec or dict.htm to reuse and define the vocabulary. Create new DCG files.
  11. May 14-18. Translating basic English and Spanish constructions. Try dcg6.dec or dcg6.htm (dcg5.dec or dcg5.htm show an earlier stage of the program).
  12. May 21-25. Translating into logical forms. A natural language interface to royal.dec (an earlier version is i_royal.dec).
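The arithmetic in week 9, and the way a DCG threads its input through difference lists, can be illustrated outside Prolog. The Python below is a rough analogue under stated assumptions, not a translation of dcg0.dec to dcg3.dec: the seven-word vocabulary, the toy grammar and the function names word, np, s and accepts are all hypothetical. Each nonterminal takes a token sequence and returns the possible remainders, which is the information carried by the pair of difference-list arguments Prolog adds to every DCG rule.

```python
from itertools import permutations
from math import factorial

# Hypothetical seven-word vocabulary (not the words used in the course files).
WORDS = ["the", "old", "man", "saw", "a", "small", "dog"]

# Hypothetical toy lexicon: one set of words per category.
DET, ADJ, NOUN, VERB = {"the", "a"}, {"old", "small"}, {"man", "dog"}, {"saw"}


def word(category, tokens):
    """Consume one token of the given category; return the possible remainders."""
    return [tokens[1:]] if tokens and tokens[0] in category else []


def np(tokens):
    """np --> det, adj, n."""
    return [r3 for r1 in word(DET, tokens)
               for r2 in word(ADJ, r1)
               for r3 in word(NOUN, r2)]


def s(tokens):
    """s --> np, v, np."""
    return [r3 for r1 in np(tokens)
               for r2 in word(VERB, r1)
               for r3 in np(r2)]


def accepts(tokens):
    """Like phrase(s, Tokens) in Prolog: some parse must consume the whole input."""
    return any(len(rest) == 0 for rest in s(tuple(tokens)))


n = len(WORDS)
print(factorial(n))  # 5040 orderings of seven distinct words
print(n ** n)        # 823543 sequences if words may repeat

accepted = [p for p in permutations(WORDS) if accepts(p)]
print(len(accepted))          # 8
print(" ".join(accepted[0]))  # the old man saw a small dog
```

Of the 5,040 orderings, this toy grammar accepts only 8, which is the sense in which a grammar constrains the explosion. Unlike a Prolog DCG, the sketch only recognizes strings; it cannot run backwards to generate them.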

You can install our Prolog interpreter from the file Prolog0.zip. You can also try this demo of a simple Spanish-Catalan machine translation system, Demollull.zip.

Methodology

We will learn how to use electronic documentation as reference material for the course. Every week a questionnaire will be set, and students will work through the documentation to answer it. This will be combined with practical exercises and the use of dedicated software. A grammar development environment (PATR-II) will be used to implement formal grammars of English, Basque and Spanish. References and documentation will be provided in the form of hypertext.

Evaluation

Class exercises will be assessed continuously. In addition to regular attendance and participation, students will prepare an individual project chosen from a list suggested at the beginning of the course. Students who have not completed the continuous assessment will sit a written examination. Grading in this course will depend on class attendance and participation (30%), group projects (30%), and a final individual project or exam (40%).

Assignments
  1. By March 23rd: Notes taken from on-line references (Report A; optional and individual)
  2. By April 27th: Evaluation of PATR exercises (Report B; groups)
  3. By May 31st: Prolog exercises (Report C; groups)

Report A: This report will consist of a collection of text fragments or quotations taken (copied and pasted) from the on-line documentation. You must organize these fragments according to the course syllabus. The report must include a handwritten Abstract (no more than 100 words), an Introduction (around 500 words) and a Conclusion (around 200 words), written in your own words (that is, you cannot copy and paste these parts!). The report should be 5,000 to 10,000 words long. Quotations from on-line documents need not be handwritten (they can be printed), but you must give the full reference for each one: the date, the title, and the name of the author(s) of the paper from which you copied the quotation. In summary, the report will consist of i) an Abstract, ii) a Contents list (index), iii) an Introduction, iv) as many exposition sections as you wish, and v) a Conclusion. The Introduction and the Conclusion must reflect your personal view and style.

Reports B and C: These will describe your work on the class exercises and can be prepared in groups of up to three people.

References

On-line
On paper
  • Arnold, Douglas, L. Balkan, Lee Humphreys, and S. Meijer, eds. 1994. Machine Translation: An Introductory Guide. Blackwell Publishers.
  • Gazdar, Gerald and Chris Mellish. 1989. Natural Language Processing in Prolog: An Introduction to Computational Linguistics. Addison-Wesley Publishing Company.
  • Hutchins, W. J. and Harold Somers. 1992. An Introduction to Machine Translation. Academic Press.
  • Jones, Daniel. 1996. Analogical Natural Language Processing. University College London Press.
  • McEnery, Tony. 1992. Computational Linguistics: A Handbook and Toolbox for Natural Language Processing. Sigma Press.
  • McEnery, Tony and Andrew Wilson. 1996. Corpus Linguistics. Edinburgh University Press.
  • Melby, Alan K. and C. T. Warner. 1995. The Possibility of Language: A Discussion of the Nature of Language, with Implications for Human and Machine Translation. John Benjamins, Amsterdam.
© Universidad de Deusto 2000
Last modified: April 2000