English Language and New Technologies
Objectives
- Become acquainted with new technologies applied to natural languages in
general and to English in particular.
- Learn the contribution of linguistics to the development of information
technologies.
- Apply information technologies to your own educational curriculum.
- Review the main techniques developed in computational linguistics.
- Focus on practical applications of theoretical and computational
linguistics in the framework of the new information society. We will consider
applications such as information extraction, natural language interfaces,
multilingual electronic publishing, machine translation, digital document
localization, and the multilingual Internet.
- Study the connection between formal syntax and computational parsers.
- Review the structure of computational lexicons.
- Experiment with some translation software and corpus processors.
Contents
- Human Language Technologies and the information society (Presentation of
Action Line, by the EC)
- Language Engineering (Brochure by HLTCentral)
- Information overload
- Information management, retrieval and extraction
- Translation technology (Paper)
- Text collections and corpora (Reference page)
- Automatic text processing and tagging (Paper)
- Grammar formalisms (Short description by Hans
Uszkoreit & Annie Zaenen)
- Applications: Information management, retrieval and extraction (PPT presentation by Julio Gonzalo)
- Applications: Machine translation
- Applications: The multilingual Internet
- Applications: Electronic publishing
Weeks
Weekly questionnaire
- Feb 12-14. First contact with
the browser and on-line materials.
- Feb 17-21. Language technologies
and the Information Society.
- Feb 24-28. "Having too much
information can be as dangerous as having too little"
- Mar 3-7. Language technologies
and resources.
- Mar 10-14. Multilinguality is a
barrier to communication. We will review translation technology and its
potential to help overcome that problem.
- Mar 17-21. Machine Translation
(MT) is a possible way of overcoming language barriers. However, quality is
often a problem for MT systems.
- Mar 24-28. We are going to learn
more about MT: history, methods, approaches, best known systems, etc.
Linguistic Diversity on the Internet: Assessment of the Contribution of Machine
Translation. Multilingual resources
- Mar 31-Apr 4. Corpus Linguistics.
- Apr 7-11. Submission of
Report A. See the list of
submitted reports (Apr. 16).
- Apr 14. Evaluation of Machine
Translation systems
- Apr 28. Evaluation of Machine
Translation systems
- May 5-9. Evaluation of Machine
Translation systems
- May 12-16. Submission of
Report B.
Evaluation of search and content retrieval engines on the Internet (1)
- May 19-23. Evaluation of search
and content retrieval engines on the Internet (2)
- May 26-30. Submission of
Report C. Review.
Methodology
We will learn how to use electronic documentation as reference material
for the course. Every week a questionnaire will be set, and students will
work through the documentation to answer it. This will be combined
with practical exercises and the use of dedicated software. References
and documentation will be provided mainly in the form of hypertext.
Evaluation
In addition to regular attendance and participation, students will submit
three reports (see below). A written examination will be set for students who
have not completed the regular evaluation. Grading in this course will depend on
class attendance and participation (10%), group projects (30%), and an
individual project or exam (60%).
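The weighting above can be worked through as simple arithmetic. The following is a minimal sketch, assuming that all components are marked on a common 0-10 scale (the scale is an assumption, not stated in this syllabus) and using hypothetical marks. Only the three percentages come from the course description.

```python
# Weights taken from the Evaluation section; everything else is illustrative.
WEIGHTS = {
    "attendance_participation": 0.10,
    "group_projects": 0.30,
    "individual_project_or_exam": 0.60,
}


def final_grade(marks):
    """Return the weighted average of the component marks.

    `marks` maps each component name in WEIGHTS to a mark on a common
    scale (a 0-10 scale is assumed here, not stated in the syllabus).
    """
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS)


# Hypothetical student marks, for illustration only.
example = {
    "attendance_participation": 8.0,
    "group_projects": 7.0,
    "individual_project_or_exam": 6.0,
}
print(round(final_grade(example), 2))
```

With these hypothetical marks the components contribute 0.8, 2.1 and 3.6 points respectively, so the individual project or exam accounts for most of the final result.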
Assignments
- By April 11th: Report based on notes taken from on-line references
(optional and individual)
- By May 12th: Evaluation of Machine Translation systems
- By May 27th: Evaluation of search and content retrieval engines
Report A
Grades May 22nd, 2003
This report could have a title such as "A review of Human
Language Technologies and their role in the Information Society".
It can contain as many text fragments or quotations taken (copied
and pasted) from the on-line documentation as necessary. However, the author
must clarify and justify the relevance of the selected fragments in their own
words. The main line of argumentation must be original, not copied!
It is very important that quotations are acknowledged, i.e. do not
forget to provide the source:
- Name of the author, group or institution
- Date
- Publisher's name, or URL
In addition to the main body, the report must also include:
- Abstract (no more than 100 words)
- Introduction (around 500 words)
- Conclusion (around 200 words)
It is very important that these sections contain original text. Try
to convey your own personal view and style.
The report should be 3,000 to 7,000 words long (i.e. 15-25 pages).
Have a look at these interesting recommendations:
- How to Write a Good Report, by Prof. G. Yadigaroglu, ETHZ
- Student Handbook and Essay Writing Guide, Brock University
- A letter of Professor Michael Stubbs to his students (Englische
Sprachwissenschaft, Universität Trier)
Check these documents:
- Academic paper: Is it worth learning translation technology?
- Technical report: Linguistic Diversity on the Internet: Assessment of the
Contribution of Machine Translation
Reports B and C: These will illustrate experiments with on-line
resources, and can be developed in groups of four people at most.
References
On-line
- Survey of the State of
the Art in Human Language Technology. This on-line book,
available on the Internet, surveys the state of the art of human language
technology. The book consists of thirteen chapters written by 97 different
authors. Editorial Board: Ronald A. Cole, Editor in Chief; Joseph Mariani; Hans
Uszkoreit; Annie Zaenen; Victor Zue. Contents: Spoken Language Input, Written
Language Input, Language Analysis and Understanding, Language Generation,
Spoken Output Technologies, Discourse and Dialogue, Document Processing,
Multilinguality, Multimodality, Transmission and Storage, Mathematical Methods,
Language Resources, Evaluation.
- Our main reference in corpus linguistics will be Tony McEnery and Andrew
Wilson's book on Corpus Linguistics, but
there will be other material available (Downloads from
Internet).
- European Commission, DG XIII.
Information Society
- La Lingüística Computacional, by Juan Carlos Ruiz Antón,
Universitat Jaume I.
- Multilingual Information Management: Current Levels
and Future Abilities, by Eduard Hovy, USC Information Sciences Institute
(co-chair); Nancy Ide, Vassar College (co-chair); Robert Frederking, Carnegie
Mellon University; Joseph Mariani, LIMSI-CNRS; Antonio Zampolli, University of
Pisa
- Information retrieval
& natural language processing, by Felisa Verdejo, Julio Gonzalo and
Anselmo Peñas at the UNED Natural Language Processing Group under the
auspices of the ACO*HUM European network, and partially funded by ELSNET.
- Natural Language Research
Group (UPC)
- Walter J. Ong SJ and the Technology of
Writing
In print
- Arnold, Douglas, L. Balkan, Lee Humphreys, and S. Meijer, eds. 1994.
Machine translation: an introductory guide. Blackwell Publishers.
- Gazdar, Gerald & Chris Mellish. 1989. Natural Language Processing In
Prolog. An Introduction to Computational Linguistics. Addison-Wesley
Publishing Company.
- Hutchins, W. J. and Harold Somers. 1992. An introduction to machine
translation. Academic Press.
- Jones, Daniel. 1996. Analogical Natural Language Processing.
University College London Press.
- McEnery, Tony. 1992. Computational Linguistics: A handbook and toolbox
for natural language processing. Sigma Press.
- McEnery, Tony and Andrew Wilson. 1996. Corpus Linguistics. Edinburgh
University Press.
- Melby, Alan K. and C.T. Warner. 1995. The Possibility of Language. A
discussion of the nature of language, with the implications for human and
machine translation. John Benjamins. Amsterdam.
© Universidad de Deusto 2003
Last modified: June 2003