English Language and New Technologies
- Become acquainted with new technologies applied to natural
languages in general and to English in particular.
- Learn the contribution of linguistics to the development of
- Focus on practical applications of theoretical linguistics
and of computational linguistics in the framework of the new
information society. We
will consider applications such as information extraction,
natural language interfaces, multilingual electronic publishing,
machine translation, digital document localization, and the
- Review the techniques of computational linguistics and
experiment with two of them: automatic corpus tagging
and feature-based grammars.
- Study the direct connection between formal syntax and
computational parsers. Rules and principles of formal
linguistics will be implemented in a grammar development
environment (PATR-II). At least the 10 most relevant grammatical
constructions of English, Spanish and Basque will be
- Create a computational lexicon of at least 50 representative
lexical entries in each of the languages under consideration.
- Construct a semantic taxonomy of semantic features with
at least 3 levels of abstraction and test its efficiency
in the disambiguation of prepositional phrases.
- Experiment with some translation software and corpus
- Human Language Technologies and the
- Information technologies for the
- Information overload
- Information management,
retrieval and extraction
- Translation technology
- Text collections and corpora
- Automatic text processing and
- Linguistic annotations
- Morphosyntactic tagging
- Formal grammars and shallow
- Feature-based grammars
- PATR-II (phrase structure
- PATR-II (lexical entries)
- Parsing and feature
- Applications: Information
management, retrieval and extraction
- Applications: Machine translation
- Applications: The multilingual
- Applications: Electronic publishing
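The feature-based grammars in the topic list rest on unification: two feature structures combine when their values agree, and the combination fails when they clash. The following is a minimal Python sketch of that idea, not PATR-II notation. The function name `unify`, the nested-dictionary representation, and the sample agreement features are all assumptions made for illustration, and reentrancy (values shared between paths) is left out.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns a new dict holding the combined information, or None when
    the two structures assign different atomic values to the same feature.
    Simplification: no reentrancy, and untouched sub-dicts are not copied.
    """
    result = dict(a)
    for feature, b_val in b.items():
        if feature not in result:
            # Only b says something about this feature: just add it.
            result[feature] = b_val
            continue
        a_val = result[feature]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            # Both values are complex: unify them recursively.
            sub = unify(a_val, b_val)
            if sub is None:
                return None
            result[feature] = sub
        elif a_val != b_val:
            # Atomic clash (or atom vs. complex value): unification fails.
            return None
    return result


# Hypothetical agreement features for a determiner and two nouns.
these_agr = {"num": "pl"}
dogs_agr = {"num": "pl", "per": 3}
dog_agr = {"num": "sg", "per": 3}

print(unify(these_agr, dogs_agr))  # compatible: the information is merged
print(unify(these_agr, dog_agr))   # number clash: None
```

A grammar rule such as "a determiner and its noun share agreement features" then amounts to requiring that this call does not return None.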
Weeks (Course 2000/2001):
- Feb 19-23. Read or browse and discuss:
- Feb 26-Mar 2.
- Mar 5-9.
- Mar 12-16.
- Mar 19-23.
- Mar 26-30. Review of the grammar in "samples\eng-ela"
- Feature template definitions
- Category defaults
- Grammar rules
- Apr 2-6. Adding lexical entries and grammar rules to
- Apr 23-27. Introduction to Prolog. Downloading the
Prolog interpreter from
Deusto's intranet. Installation. Introduction to difference
and definite clause grammars (
- Apr 30-May 4. Generation of phrases from DCGs. The
complexity of natural
predicate. The number of possibilities is n!, and in the
example below with seven words it will be 7! (7*6*5*4*3*2*1 =
5,040). But if words can repeat, then there will be 7^7 (823,543) solutions.
DCGs serve to constrain this generative explosion. Try
possibilities are cut down to just four (three of which should
be ruled out by a more restrictive grammar; try
we enrich the grammar a bit more, then we get again four correct
- May 7-11. Extending the grammar to one of
Shakespeare's sonnets. Use the file
reuse and define the vocabulary. Create new DCG files.
- May 14-18. Translating basic English and Spanish
show an earlier stage of the program).
- May 21-25. Translating into logic forms.
A natural language interface
royal.dec (this was an earlier version
You can install our Prolog interpreter from this file,
Prolog0.zip. You can
also try this demo of a simple Spanish-Catalan machine
translation system, Demollull.zip.
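The counting argument in the Apr 30-May 4 entry can be checked with a short Python sketch. It computes n! and n^n for seven words and then filters the orderings through a toy grammar to show how grammar rules cut the search space down. The seven-word vocabulary, the lexical categories, and the two grammar rules are hypothetical; they are not the course's DCG files, so the number of surviving sentences differs from the figure quoted in the schedule.

```python
from itertools import permutations
from math import factorial

# Hypothetical seven-word vocabulary, one category per word.
LEXICON = {
    "the": "Det", "a": "Det",
    "big": "Adj", "small": "Adj",
    "dog": "N", "cat": "N",
    "sees": "V",
}
WORDS = list(LEXICON)

# Toy grammar (an assumption, written here as comments in DCG style):
#   s  --> np, v, np.
#   np --> det, adj, n.


def parse_np(tags, start):
    """Return the position just after an NP beginning at start, else None."""
    if tags[start:start + 3] == ["Det", "Adj", "N"]:
        return start + 3
    return None


def accepts(sentence):
    """True if the word sequence is an S under the toy grammar."""
    tags = [LEXICON[word] for word in sentence]
    after_subject = parse_np(tags, 0)
    if after_subject is None or after_subject >= len(tags):
        return False
    if tags[after_subject] != "V":
        return False
    return parse_np(tags, after_subject + 1) == len(tags)


n = len(WORDS)
without_repetition = factorial(n)  # every word used exactly once: n!
with_repetition = n ** n           # any word in any of the n slots: n^n

# Keep only the orderings (no repetition) that the grammar accepts.
grammatical = [p for p in permutations(WORDS) if accepts(p)]

print(without_repetition, with_repetition, len(grammatical))
for sentence in grammatical:
    print(" ".join(sentence))
```

Here the grammar is applied as a filter over all orderings, which is the expensive way round; a DCG used as a generator only ever builds the strings the rules allow.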
We will learn how to use electronic documentation as reference
material for the course. Every week a questionnaire will be set and
the students will work through the documentation to answer
it. This will be combined with practical exercises and
the use of dedicated software. A grammar development
computer environment will be used to implement formal grammars of
English, Basque and Spanish. References and documentation will be
provided in the form of hypertext.
There will be a continuous assessment of class exercises. In
addition to regular attendance and participation, students will
prepare an individual project, from a list suggested at the
beginning of the course. A written examination will be set for
students who have not completed the continuous assessment. Grading in this
course will depend on class attendance and participation (30%),
group projects (30%), and a final individual project or exam
- By March 23rd: Notes taken from on-line references
(optional and individual)
- By April 27th: Evaluation of PATR exercises (groups)
- By May 31st: Prolog exercises (groups)
Report A: It will consist of a collection of text fragments or
quotations taken (copied and pasted) from the on-line
documentation. You must organize these fragments in accordance
with the course syllabus. The report must include a handwritten
Abstract (no more than 100 words), an Introduction (around 500
words) and a Conclusion (around 200 words), all in your own words
(that is, you cannot copy and paste these parts!). The report
should be 5,000 to 10,000 words long. Quotations from on-line
documents need not be handwritten (they can be printed), but you
should not forget to provide the full reference (that is, the
date, the title and the name of the author(s) of the paper from which
you copied the quotation). In sum, reports will consist of i)
Abstract, ii) Content list (index), iii) Introduction, iv) as many
exposition sections as you wish, and v) a Conclusion. The
Introduction and the Conclusion must reflect your personal view
Reports B and C: These will illustrate experiences with class
exercises, and can be developed in groups of three people at most.
- Survey of the State of the Art in Human Language Technology.
This online book, available on the Internet,
surveys the state of the art of human language technology. The
book consists of thirteen chapters written by 97 different
authors. Editorial Board: Ronald A. Cole, Editor in Chief;
Joseph Mariani; Hans Uszkoreit; Annie Zaenen; Victor Zue.
Contents: Spoken Language Input, Written Language Input,
Language Analysis and Understanding, Language Generation, Spoken
Output Technologies, Discourse and Dialogue, Document
Processing, Multilinguality, Multimodality, Transmission and
Storage, Mathematical Methods, Language Resources, Evaluation.
- Our main reference in corpus linguistics will be Tony McEnery
and Andrew Wilson's book on Corpus
Linguistics, but there will be other material available (Downloads
- European Commission, DG XIII.
Engineering sector of the Telematics Applications
Sites, (Copy, Sept. 1998)
- European Commission, DG XIII.
Society, (Copy, Sept.
Lingüística Computacional, by Juan Carlos Ruiz
Antón, Universitat Jaume I.
- Multilingual Information Management:
Current Levels and Future Abilities, by Eduard Hovy, USC
Information Sciences Institute (co-chair) Nancy Ide, Vassar
College (co-chair) Robert Frederking, Carnegie Mellon University
Joseph Mariani, LIMSI-CNRS Antonio Zampolli, University of Pisa
retrieval & natural language processing, by Felisa
Verdejo, Julio Gonzalo and Anselmo Peñas at the UNED
Natural Language Processing Group under the auspices of the
ACO*HUM European network, and partially funded by ELSNET.
Language Research Group (UPC)
- Arnold, Douglas, L. Balkan, Lee Humphreys, and S. Meijer, eds.
1994. Machine Translation: An Introductory Guide. Blackwell
- Gazdar, Gerald & Chris Mellish. 1989. Natural
Language Processing In Prolog. An Introduction to Computational
Linguistics. Addison-Wesley Publishing Company.
- Hutchins, W. J. and Harold Somers. 1992. An introduction
to machine translation. Academic Press.
- Jones, Daniel. 1996. Analogical Natural Language
Processing. University College London Press.
- McEnery, Tony. 1992. Computational Linguistics: A
handbook and toolbox for natural language processing. Sigma
- McEnery, Tony and Andrew Wilson. 1996. Corpus Linguistics.
Edinburgh University Press.
- Melby, Alan K. and C.T. Warner. 1995. The Possibility of
Language. A discussion of the nature of language, with the
implications for human and machine translation. John
© Universidad de Deusto 2000
Last modified: April 2000