Text typology | DELi | Feb. 2002
This Report and its recommendations should be read in conjunction with EAGLES interim recommendations on Corpus Typology (EAGLES, 1996a).
The purpose of this project is to develop models for empirical and theoretical study of the simultaneous interpreting process, including assessment of theories about text understanding and text production, text/discourse types, and the applicability of translation theory to simultaneous interpreting of expert (LSP) discourse.
Robert-Alain de Beaugrande & Wolfgang Ulrich Dressler. 1981. Introduction to Text Linguistics. Longman.
Vijay K. Bhatia. 1993. Analysing Genre. Language use in professional settings. Longman.
Douglas Biber. 1989. A Typology of English Texts. Linguistics 27: 3-43.
Douglas Biber. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge University Press. (Review by Nigel Armstrong)
Douglas Biber and Edward Finegan. 1986. An initial typology of English text types. Jan Aarts and Willem Meijs (Eds.) Corpus Linguistics II: New Studies in the Analysis and Exploitation of Computer Corpora. Rodopi: 19-46.
Douglas Biber, S. Conrad, and R. Reppen. 1998. Corpus linguistics: Investigating language structure and use. Cambridge University Press.
Douglas Biber, S. Johansson, G. Leech, S. Conrad, E. Finegan. 1999. The Longman grammar of spoken and written English. Longman.
Philip R. Cohen & C. Raymond Perrault. 1979. Elements of a Plan-Based Theory of Speech Acts. Cognitive Science 3: 177-212.
S. Conrad & Douglas Biber (eds.). 2001. Variation in English: Multi-Dimensional studies. Longman.
EAGLES. 1996. Preliminary Recommendations on Text Typology. http://www.ilc.pi.cnr.it/EAGLES/texttyp/texttyp.html
James L. Kinneavy. 1980. A Theory of Discourse. Norton.
M. A. K. Halliday & R. Hasan. 1976. Cohesion in English. London.
Sara Laviosa. 1998. The English Comparable Corpus: A resource and a methodology. Lynne Bowker, Michael Cronin, Dorothy Kenny and Jennifer Pearson (Eds.). Unity in Diversity? Current Trends in Translation Studies. St. Jerome Publishing.
Junsaku Nakamura. 1991. The relationships among genres in the LOB corpus based upon the distribution of grammatical tags. Jacet Bulletin 22: 55-74.
Christiane Nord. 1997. A Functional Typology of Translations. In Trosborg: 43-65.
Roel Popping. 2000. Computer-assisted Text Analysis. Sage.
Roda P. Roberts. 1995. Towards a Typology of Translations. Hieronymus Complutensis 1: 69-78.
Michael Stubbs. 1996. Text and Corpus Analysis. Blackwell.
John M. Swales. 1990. Genre Analysis. English in academic and research settings. Cambridge University Press.
Anna Trosborg. 1997. Text Typology: Register, Genre and Text Type. In Trosborg: 3-23.
Anna Trosborg (Ed.). 1997. Text Typology and Translation. John Benjamins.
Full text available in various formats from ResearchIndex.
The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to the early '60s. Until the late '80s, the most effective approach to the problem seemed to be that of manually building automatic classifiers by means of knowledge-engineering techniques, i.e. manually defining a set of rules encoding expert knowledge on how to classify documents under a given set of categories.
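The knowledge-engineering approach described above can be illustrated with a minimal sketch. It assumes rules written in disjunctive normal form (a category applies when every term of at least one clause occurs in the document); the rule set, the category names, and the function names are hypothetical and are not taken from any particular system.

```python
import re

# Hypothetical hand-written rules (an assumption for illustration only).
# Each category maps to a list of clauses; a clause is a set of terms
# that must ALL occur in the document for the clause to fire.
RULES = {
    "wheat": [{"wheat", "farm"}, {"wheat", "commodity"}, {"bushels", "export"}],
    "finance": [{"bank", "loan"}, {"interest", "rate"}],
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_by_rules(text, rules=RULES):
    """Return the sorted categories for which at least one clause fires."""
    tokens = tokenize(text)
    return sorted(
        category
        for category, clauses in rules.items()
        if any(clause <= tokens for clause in clauses)
    )


print(classify_by_rules("Wheat prices rose as farm output fell."))
```

Every rule here has to be written and maintained by hand, which is the expert-manpower cost that the later machine learning approach is said to reduce.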
In the '90s, with the booming production and availability of on-line documents, automated text categorisation attracted renewed and increased interest. This prompted the emergence of the machine learning paradigm for automatic classifier construction, which has definitively superseded the knowledge-engineering approach.
Within the machine learning paradigm, a general inductive process (called the learner) automatically builds a classifier (also called the rule, or the hypothesis) by learning, from a set of previously classified documents, the characteristics of one or more categories. The advantages of this approach are very good effectiveness, considerable savings in expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm.
Issues pertaining to document indexing, classifier construction, and classifier evaluation will be discussed in detail. A final section will be devoted to the techniques specifically devised for an emerging application: the automatic classification of Web pages into "Yahoo!-like" hierarchically structured sets of categories.
In this tutorial we look at the main approaches that have been taken towards automatic text categorization within the general machine learning paradigm. A general presentation of the basic issues in document categorization will be followed by the presentation of basic machine learning concepts and techniques (such as linear separators, decision trees, etc.) and advanced ones (such as boosting, support vector machines, etc.). Then issues pertaining to document indexing, classifier construction, and classifier evaluation will be discussed in detail, and a review of the most relevant current research in text categorization by machine learning tools will be presented. Finally, the special case of automatic classification of Web pages is considered, and the concepts and techniques specifically devised for this case are discussed.
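As a rough illustration of a learner that builds a classifier from previously classified documents, the sketch below uses a multinomial Naive Bayes model with add-one smoothing. This is only one possible learner, chosen here for brevity; the abstract does not single it out. The training texts, category names, and function names are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens as a list."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """The learner: build a classifier (a dict of counts) from (text, category) pairs."""
    doc_counts = Counter()              # number of training documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocabulary = set()
    for text, category in labelled_docs:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocabulary": vocabulary}


def classify(model, text):
    """Return the category with the highest smoothed log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_category, best_score = None, -math.inf
    for category, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][category]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in tokenize(text):
            if word in model["vocabulary"]:    # ignore words never seen in training
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Invented toy training set (assumption for illustration only).
TRAINING = [
    ("the match ended with a late goal", "sport"),
    ("the team won the cup final", "sport"),
    ("the bank cut its interest rate", "finance"),
    ("shares fell as the market closed", "finance"),
]
model = train(TRAINING)
print(classify(model, "a late goal won the match"))
```

No category-specific rules are written by hand: changing the domain only requires a different set of labelled documents, which is the domain independence the abstract refers to.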
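Document indexing, mentioned above, turns each text into a weighted term vector that a learner can consume. The sketch below assumes a plain tf-idf weighting (raw term frequency times the logarithm of inverse document frequency), which is one common choice among several; the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens as a list."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_index(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


# Invented sample documents (assumption for illustration only).
docs = ["the cat sat", "the dog sat down", "the bird flew"]
vectors = tfidf_index(docs)
print(vectors[0])
```

With this weighting, a term that occurs in every document receives weight zero, so it contributes nothing to telling categories apart.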
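Classifier evaluation, as referred to above, is commonly reported per category in terms of precision, recall, and their harmonic mean (F1). The sketch below assumes single-label predictions and invented label lists; the function name is hypothetical.

```python
def precision_recall_f1(true_labels, predicted_labels, category):
    """Compute precision, recall, and F1 for one category from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == category and p == category)  # true positives
    fp = sum(1 for t, p in pairs if t != category and p == category)  # false positives
    fn = sum(1 for t, p in pairs if t == category and p != category)  # false negatives
    # Guard against division by zero when a category is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented gold labels and predictions (assumption for illustration only).
true_labels = ["sport", "sport", "finance", "finance"]
predicted_labels = ["sport", "sport", "sport", "sport"]
print(precision_recall_f1(true_labels, predicted_labels, "sport"))
```

In this toy case the classifier labels everything "sport", so recall for that category is perfect while precision is only one half, which is why the two figures are usually reported together.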
Text categorization and text routing both involve taking a text, and assigning keywords to it, to reflect its content. The applications of categorization and routing are many and varied. For example, large companies sometimes use a text routing tool to scan incoming telexes and assign a keyword to them, typically the name of the department or of the person the telex should go to.
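The routing scenario above can be sketched as follows, assuming each department is described by a small set of trigger words and each message is sent to the department with the most matching words. The department names, trigger words, and the fallback destination are all hypothetical.

```python
import re

# Hypothetical departments and trigger words (assumption for illustration only).
DEPARTMENT_KEYWORDS = {
    "sales": {"order", "quote", "price", "purchase"},
    "support": {"fault", "repair", "broken", "replacement"},
    "accounts": {"invoice", "payment", "overdue", "refund"},
}


def route(message, keywords=DEPARTMENT_KEYWORDS, default="front-desk"):
    """Return the department whose trigger words overlap most with the message."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    scores = {dept: len(words & tokens) for dept, words in keywords.items()}
    # On a tie, max() keeps the department listed first in the dictionary.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


print(route("Please send a quote and price list for our purchase"))
```

Unlike general categorization, which may assign several keywords, this sketch assigns exactly one destination per message and falls back to a default when nothing matches.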