English Language and New Technologies
Objectives
- Become acquainted with new technologies applied to natural
languages in general and to English in particular.
- Learn the contribution of linguistics to the development of
information technologies.
- Focus on practical applications of theoretical linguistics
and of computational linguistics in the framework of the new
information society. We will consider applications such as
information extraction, natural language interfaces,
multilingual electronic publishing, machine translation,
digital document localization, and the multilingual Internet.
- Review the techniques of computational linguistics and
experiment with two of them: automatic corpus tagging and
feature-based grammars.
- Study the straightforward connection between formal syntax and
computational parsers. Rules and principles of formal
linguistics will be implemented in a grammar development
environment (PATR-II). At least the 10 most relevant grammatical
constructions of English, Spanish and Basque will be
implemented.
- Create a computational lexicon of at least 50 representative
lexical entries in each of the languages under consideration.
- Construct a semantic taxonomy consisting of semantic features
at no fewer than 3 levels of abstraction, and assess its
effectiveness in the disambiguation of prepositional phrases.
- Experiment with some translation software and corpus
processors.
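As a rough illustration of what a feature-based grammar checks, the sketch below unifies two feature structures written as nested dictionaries. It is written in Python rather than in PATR-II notation, the feature names (cat, agr, num, per) and the sample entries are hypothetical, and it leaves out shared (reentrant) values, so it should be read as a simplified model and not as the behaviour of the course software.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    A feature structure is either an atomic string or a dict that maps
    feature names to feature structures.
    """
    # Atomic values (or an atom against a dict) unify only if identical.
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None
    result = dict(a)  # shallow copy, so the inputs are left untouched
    for feature, value in b.items():
        if feature in result:
            merged = unify(result[feature], value)
            if merged is None:
                return None  # incompatible values for this feature
            result[feature] = merged
        else:
            result[feature] = value  # feature only present in b
    return result


# Hypothetical entries: a third person singular subject and the
# agreement requirements of two verb forms.
she = {"cat": "NP", "agr": {"num": "sg", "per": "3"}}
subject_of_walks = {"cat": "NP", "agr": {"num": "sg"}}
subject_of_walk = {"cat": "NP", "agr": {"num": "pl"}}

agreeing = unify(she, subject_of_walks)   # compatible structures
clashing = unify(she, subject_of_walk)    # number clash
print(agreeing)
print(clashing)
```

The first call merges the two structures because nothing conflicts; the second fails because the values of num differ, which is how agreement errors are ruled out in this simplified model.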
Contents
- Human Language Technologies and the information society
- Information technologies for the new society
- Information overload
- Information management, retrieval and extraction
- Translation technology
- Text collections and corpora
- Automatic text processing and tagging
- Linguistic annotations
- Morphosyntactic tagging
- Formal grammars and shallow parsers
- Feature-based grammars
- PATR-II (phrase structure rules)
- PATR-II (lexical entries)
- Parsing and feature representations
- Applications: Information management, retrieval and extraction
- Applications: Machine translation
- Applications: The multilingual Internet
- Applications: Electronic publishing
Weeks (Course 2000/2001):
- Feb 19-23. Read or browse and discuss:
- Feb 26-Mar 2.
- Mar 5-9.
- Mar 12-16.
- Mar 19-23.
- Mar 26-30. Review of the grammar in the "samples\eng-ela"
folder.
- Feature template definitions
- Category defaults
- Grammar rules
- Apr 2-6. Adding lexical entries and grammar rules to
eng-ela.
- Apr 23-27. Introduction to Prolog. Downloading the Prolog
interpreter from Deusto's intranet. Installation. Introduction
to difference lists (diffl.dec or diffl.htm) and definite
clause grammars (dcg.dec or dcg.htm).
- Apr 30-May 4. Generation of phrases from DCGs. The complexity
of natural language. Try the permutation(L,PedL) predicate. The
number of possible orderings of n distinct words is n!, so in
the example with seven words it is 7! (7*6*5*4*3*2*1 = 5,040).
If words can repeat, there are 7^7 solutions (823,543). Try
dcg0.dec or dcg0.htm. DCGs serve to constrain this generative
explosion. Try dcg1.dec or dcg1.htm; the possibilities are cut
down to just four (three of which should be ruled out by a more
restrictive grammar; try dcg2.dec or dcg2.htm). If we enrich the
grammar a bit more, we again get four solutions, all of them
correct this time; try dcg3.dec or dcg3.htm.
- May 7-11. Extending the grammar to one of Shakespeare's
sonnets. Use the file dict.dec or dict.htm to reuse and define
the vocabulary. Create new DCG files.
- May 14-18. Translating basic English and Spanish
constructions. Try dcg6.dec or dcg6.htm (dcg5.dec or dcg5.htm
shows an earlier stage of the program).
- May 21-25. Translating into logical forms. A natural language
interface to royal.dec (i_royal.dec is an earlier version).
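The counts quoted for the Apr 30-May 4 session, and the way a grammar narrows them down, can be checked with a short sketch. It is written in Python instead of the course's Prolog, and the toy grammar and its five lexical items are hypothetical; they are not the contents of the dcg*.dec files.

```python
from itertools import product
from math import factorial

n = 7  # number of distinct words in the example

# Orderings that use each of the seven words exactly once: n!
n_orderings = factorial(n)
# Seven-word sequences in which words may repeat: n^n
n_with_repetition = n ** n
print(n_orderings, n_with_repetition)

# A toy phrase structure grammar (an assumption made for illustration).
# Symbols that do not appear as keys are treated as words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"]],
}


def generate(symbol):
    """Yield every word sequence the grammar derives from `symbol`."""
    if symbol not in GRAMMAR:  # a word: nothing left to expand
        yield [symbol]
        return
    for right_hand_side in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine the
        # alternatives in every possible way, preserving word order.
        expansions = [list(generate(s)) for s in right_hand_side]
        for combination in product(*expansions):
            yield [word for part in combination for word in part]


sentences = [" ".join(words) for words in generate("S")]
for sentence in sentences:
    print(sentence)
```

With this grammar only a handful of word sequences are generated, because every sentence has to follow the pattern Det N V Det N; an unconstrained generator over the same vocabulary would produce every possible sequence.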
You can install our Prolog interpreter from the file
Prolog0.zip. You can also try a demo of a simple
Spanish-Catalan machine translation system, Demollull.zip.
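To give an idea of what translating a basic construction involves, the following sketch translates simple English noun phrases into Spanish. It is a Python illustration under stated assumptions, not the approach taken in dcg5.dec or dcg6.dec: the three small dictionaries and the function translate_np are hypothetical, and only the patterns "article noun" and "article adjective(s) noun" are handled.

```python
# Hypothetical bilingual lexicon. Spanish nouns carry a gender, and
# articles and adjectives must agree with the noun.
NOUNS = {"house": ("casa", "f"), "car": ("coche", "m")}
ADJECTIVES = {
    "red": {"m": "rojo", "f": "roja"},
    "white": {"m": "blanco", "f": "blanca"},
}
ARTICLES = {
    "the": {"m": "el", "f": "la"},
    "a": {"m": "un", "f": "una"},
}


def translate_np(english):
    """Translate 'article [adjective...] noun' into Spanish.

    Returns None when the phrase does not fit the pattern or contains
    a word that is missing from the lexicon.
    """
    words = english.lower().split()
    if len(words) < 2:
        return None
    article, *adjectives, noun = words
    if article not in ARTICLES or noun not in NOUNS:
        return None
    if any(adjective not in ADJECTIVES for adjective in adjectives):
        return None
    spanish_noun, gender = NOUNS[noun]
    # Word order changes: the adjectives follow the noun in Spanish.
    spanish_words = [ARTICLES[article][gender], spanish_noun]
    spanish_words += [ADJECTIVES[a][gender] for a in adjectives]
    return " ".join(spanish_words)


print(translate_np("the red house"))
print(translate_np("a white car"))
print(translate_np("the green house"))  # 'green' is not in the lexicon
```

Even this small case needs two kinds of knowledge beyond word-for-word substitution: a reordering rule and gender agreement.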
Methodology
We will learn how to use electronic documentation as reference
material for the course. Every week a questionnaire will be set,
and the students will work through the documentation to answer
it. This will be combined with practical exercises and the use
of dedicated software. A grammar development environment will be
used to implement formal grammars of English, Basque and
Spanish. References and documentation will be provided in the
form of hypertext.
Evaluation
Class exercises will be assessed continuously. In addition to
regular attendance and participation, students will prepare an
individual project chosen from a list suggested at the
beginning of the course. A written examination will be set for
students who have not completed the regular evaluation. Grading
in this course will depend on class attendance and participation
(30%), group projects (30%), and a final individual project or
exam (40%).
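As a worked example of how these weights combine, the sketch below computes a final mark as a weighted average. The 0-10 marking scale, the sample marks and the function name final_grade are assumptions made for illustration; only the three percentages come from the paragraph above.

```python
# Weights in percent, taken from the grading scheme described above.
WEIGHTS = {"participation": 30, "group_projects": 30, "final_project": 40}


def final_grade(marks):
    """Weighted average of the component marks (assumed 0-10 scale)."""
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks for one student.
example = {"participation": 8, "group_projects": 7, "final_project": 9}
# (30*8 + 30*7 + 40*9) / 100 = (240 + 210 + 360) / 100 = 8.1
print(final_grade(example))
```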
Assignments
- By March 23rd: Notes taken from on-line references
(optional and individual)
- By April 27th: Evaluation of PATR exercises (groups)
- By May 31st: Prolog exercises (groups)
Report A: It will consist of a collection of text fragments or
quotations taken (copied and pasted) from the on-line
documentation. You must organize these fragments in accordance
with the course syllabus. The report must include a handwritten
Abstract (no more than 100 words), an Introduction (around 500
words) and a Conclusion (around 200 words), all in your own
words (that is, you cannot copy and paste these parts!). The
report should be 5,000 to 10,000 words long. Quotations from
on-line documents need not be handwritten (they can be printed),
but you must provide the full reference for each one (that is,
the date, the title and the name of the author(s) of the paper
from which you copied the quotation). In sum, reports will
consist of i) an Abstract, ii) a Content list (index), iii) an
Introduction, iv) as many exposition sections as you wish, and
v) a Conclusion. The Introduction and the Conclusion must
reflect your personal view and style.
Reports B and C: These will illustrate experiences with class
exercises, and can be developed in groups of three people at most.
References
On-line
- Survey of the State of the Art in Human Language Technology.
This on-line book, available on the Internet, surveys the state
of the art of human language technology. The book consists of
thirteen chapters written by 97 different authors. Editorial
Board: Ronald A. Cole, Editor in Chief; Joseph Mariani; Hans
Uszkoreit; Annie Zaenen; Victor Zue. Contents: Spoken Language
Input, Written Language Input, Language Analysis and
Understanding, Language Generation, Spoken Output Technologies,
Discourse and Dialogue, Document Processing, Multilinguality,
Multimodality, Transmission and Storage, Mathematical Methods,
Language Resources, Evaluation.
- Our main reference in corpus linguistics will be Tony McEnery
and Andrew Wilson's book on Corpus Linguistics, but other
material will also be available (downloads from the Internet).
- European Commission, DG XIII. Language Engineering sector of
the Telematics Applications Programme, Related Sites (Copy,
Sept. 1998)
- European Commission, DG XIII. Information Society (Copy,
Sept. 1998)
- La Lingüística Computacional, by Juan Carlos Ruiz Antón,
Universitat Jaume I.
- Multilingual Information Management: Current Levels and Future
Abilities, by Eduard Hovy, USC Information Sciences Institute
(co-chair); Nancy Ide, Vassar College (co-chair); Robert
Frederking, Carnegie Mellon University; Joseph Mariani,
LIMSI-CNRS; Antonio Zampolli, University of Pisa.
- Information retrieval & natural language processing, by Felisa
Verdejo, Julio Gonzalo and Anselmo Peñas at the UNED Natural
Language Processing Group, under the auspices of the ACO*HUM
European network and partially funded by ELSNET.
- Natural Language Research Group (UPC)
On paper
- Arnold, Douglas, L. Balkan, Lee Humphreys, and S. Meijer, eds.
1994. Machine Translation: An Introductory Guide. Blackwell
Publishers.
- Gazdar, Gerald & Chris Mellish. 1989. Natural
Language Processing In Prolog. An Introduction to Computational
Linguistics. Addison-Wesley Publishing Company.
- Hutchins, W. J. and Harold Somers. 1992. An introduction
to machine translation. Academic Press.
- Jones, Daniel. 1996. Analogical Natural Language
Processing. University College London Press.
- McEnery, Tony. 1992. Computational Linguistics: A
handbook and toolbox for natural language processing. Sigma
Press.
- McEnery, Tony and Andrew Wilson. 1996. Corpus Linguistics.
Edinburgh University Press.
- Melby, Alan K. and C.T. Warner. 1995. The Possibility of
Language: A discussion of the nature of language, with
implications for human and machine translation. John
Benjamins, Amsterdam.
© Universidad de Deusto 2000
Last modified: April 2000