Electronic publishing: links, associations, universities

Samples in SGML/XML (see also libraries, other centres, corpora)

Model by Ivan Fernández: electronic edition (TEI-P3) of the silva "Al lino", based on a copy attributed to Francisco de Calatayud, 17th century.
Browsing William Shakespeare in XML, based on Jon Bosak's tagged text. (Incomplete work. It serves as a sample of what Java and XML make possible. XML sample.)
Love and hate in Shakespeare
Browsing the BOB/LEGEBiDUNA corpus (DELi Group). (Provisional design test of an access interface to the LEGE-Bi corpus. XML sample.) Inspired by the TranSearch service for the Canadian Hansard (1986-1993).
The Decameron: an SGML-encoded English translation of the Italian text, based on The Decameron of Giovanni Boccaccio, faithfully translated by J.M. Rigg, London, 1921.
Two paradigmatic examples of access to corpora in SGML are the RAE's CORDE and CREA, and the BNC. (More information on corpora.)
CAMENA: Corpus Automatum Manhemiense Electorum Neolatinitatis Auctorum. Latin poetry of 50 major German writers of the early modern period. The corpus will comprise about 50,000 pages of printed text, presented in two ways. Standard editions published in the 16th, 17th or 18th century will be made available through digital facsimile images. Full-text transcriptions marked up according to the principles of the Text Encoding Initiative (TEILITE.DTD) will be presented in XML format and made accessible through field searching. The text will be linked throughout with the corresponding image files.
TELRI's "Plato" Corpus: plato-sl.sgml. Within the scope of Working Group 9 (Joint Research) of the TELRI concerted action, electronic editions of Plato's Republic in Bulgarian, Chinese, Czech, English, German, Latvian, Polish, Romanian, Slovak, and Slovene were collected.

Libraries

Euskal testuen gordailua (repository of Basque texts), P. Salaberri (EHU)
Biblioteca Virtual Miguel de Cervantes
Cervantes Digital Library
eBooks at the University of Virginia Library
The Perseus Digital Library. Perseus is a non-profit enterprise, located in the Department of the Classics, Tufts University. The Perseus Project is funded by the Digital Libraries Initiative Phase 2, the National Endowment for the Humanities, the National Science Foundation, private donations, and Tufts University.
History E-Book Project. The American Council of Learned Societies will collaborate with five learned societies and a select group of university presses to publish electronic books, exploring the intellectual possibilities of new technologies and helping to assure the continued viability of history writing in today's publishing environment.
Making of America (MOA) is a digital library of primary sources in American social history from the antebellum period through reconstruction. The collection is particularly strong in the subject
The Bibliothèque Nationale de France: The Age of King Charles V (1338-1380)
The Library of Iberian Resources Online
Bibliomania. Free online literature.

Corpora

Gnome translations. The Gnome desktop manager was translated for the Basque version of the Mandrake Linux operating system. Since this Linux distribution is free, we think the .po files of that translation should be open to public consultation. We (Code & Syntax) obtained them through the EuskalGNU association, and they are available here for consultation. This dictionary-like resource contains 12,800 terms, messages, or text strings, in Basque and English. In essence, it is an almost complete computing glossary. We hope this initiative can help spread consistency and correctness among the Basque-speaking professionals and volunteers working in software development, localization, and computing translation.
EuskaraCorpusa.net. The corpus available for consultation here is a statistical corpus of 20th-century Basque, made up of 4,658,036 text words. Its main and almost sole purpose is to document and exemplify the Basque that has been and is being used, not to propose a model language. The basis of the statistical corpus is the complete inventory of 20th-century Basque publications, classified according to a number of criteria, as described below. Starting from the universe formed by those publications, a sample that represents the whole proportionally was obtained by random draw; it consists of the 6,351 work extracts collected in total. For now it is an open corpus, updated every year, although it will soon become a closed corpus representing an entire century. It records written Basque, not spoken Basque. Spoken texts do have their place, but they have been included only insofar as they have been transcribed and published.
Project Gutenberg (Sailor) (PromoNet). Fine Literature Digitally Re-Published, an initiative of Michael Stern Hart. ASCII, plain text files, listing by title (cached, 9.12.1999). [E.g. copied ASCII: Typee by Herman Melville]
Project for American and French Research on the Treasury of the French Language. ARTFL is a cooperative project of the Institut National de la Langue Française (INaLF) of the Centre National de la Recherche Scientifique (CNRS) and the Divisions of the Humanities and Social Sciences of the University of Chicago. Bibliography of the ARTFL Database (cached, 9.12.1999).
Biblioteca Virtual Miguel de Cervantes. The Biblioteca Virtual Miguel de Cervantes is the most ambitious project for digitizing documentary cultural heritage with entirely free access. It aims to make thirty thousand works by Spanish and Spanish-American authors available to the scholarly community and to the Spanish-speaking population in general. Catalogue, Other sources (cached, 9.12.1999).
Electronic Text Center (University of Virginia Library). The Center combines an on-line archive of thousands of SGML-encoded electronic texts and images with a library service that offers hardware and software suitable for the creation and analysis of text. English Online Resources (cached, 9.12.1999). [E.g. copied HTML: Alice in Wonderland by Lewis Carroll]
Electronic Text Service (Columbia University Library). ETS, located in 504 Butler Library, is a research and instructional facility of the Columbia University Libraries designed to help Columbia faculty and students incorporate computer-based textual and bibliographic information into their research, study, and teaching. ETS has machine-readable primary source texts, software programs for textual analysis and critical editing, hypermedia and database research tools in the humanities, bibliographic database management programs, IBM and Macintosh microcomputers, X terminals, and optical scanning equipment for the creation of machine-readable text. The ETS staff will provide demonstrations, workshops, and classes for students and faculty, as well as individual consultations. Major Online Text Collections (cached, 9.12.1999).
Center for Electronic Texts in the Humanities. CETH offices are located in Alexander Library on the College Avenue Campus of Rutgers University, NJ.
Oxford Text Archive. Founded in 1976 by Lou Burnard, OTA has over twenty years' experience of serving the research and teaching needs of electronic text users within the scholarly community. This is an example: [copied text record and copied SGML: Billy Budd, Foretopman by Herman Melville]. Lou Burnard also belongs to the CTI Textual Studies, Guide to Digital Resources 1996-98 (cached, 9.12.1999).
The British National Corpus. BNC is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of current British English, both spoken and written. The project was carried out and is managed by an industrial/academic consortium led by Oxford University Press, of which the other members are major dictionary publishers Addison-Wesley Longman and Larousse Kingfisher Chambers; academic research centres at Oxford University Computing Services and Lancaster University's Centre for Computer Research on the English Language; and the British Library's Research and Innovation Centre. Work on building the corpus began in 1991, and was completed in 1994. The project was funded by the commercial partners, the Science and Engineering Council (now EPSRC) and the DTI under the Joint Framework for Information Technology (JFIT) programme. Additional support was provided by the British Library and the British Academy.
University Centre for Computer Corpus Research on Language. UCREL is a research centre of Lancaster University specialized in the automatic or computer-aided analysis of large bodies of naturally occurring language ('corpora'). It is well known for its web-based course in corpus linguistics. UCREL also hosts a large collection of corpora (cached, 10.12.1999).
International Computer Archive of Modern and Medieval English. ICAME is an international organization of linguists and information scientists working with English machine-readable texts. The aim of the organization is to collect and distribute information on English language material available for computer processing and on linguistic research completed or in progress on the material, to compile an archive of English text corpora in machine-readable form, and to make material available to research institutions. The archive mentioned in the name resides at the Norwegian Computing Centre for the Humanities (NCCH) in Bergen, Norway. This acts as a distribution centre for computerized English-language corpora and corpus-related software. ICAME publishes the ICAME Journal, which appears at least once a year, with articles and information about English computer corpora. There is also an electronic information service. Conferences, usually in May/June each year, have been arranged since 1979. See their Texts, text centres, resources and programs on the Web (cached, 10.12.1999).
The Bank of English was launched in 1991 by COBUILD (a division of HarperCollins Publishers) and The University of Birmingham. Since 1980 COBUILD, which is based within the School of English at Birmingham University, has been collecting a corpus of texts on computer for dictionary compilation and language study. In 1991 HarperCollins decided on a major initiative to increase the scale of the corpus to 200 million words, to form the basic data resource for a new generation of authoritative language reference publications.
Linguistic Data Consortium. LDC is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC's host institution. The LDC was founded in 1992 with a grant from the Advanced Research Projects Agency (ARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the National Science Foundation. Text Collection and Processing (cached, 9.12.1999).
European Language Resources Association. ELRA was established in Luxembourg in February 1995, with the goal of founding an organization to promote the creation, verification, and distribution of language resources in Europe. A non-profit organization, ELRA aims to serve as a focal point for information related to language resources in Europe. It will collect, market, distribute, and license European language resources. ELRA will help users and developers of language resources, government agencies, and other interested parties exploit language resources for a wide variety of uses. Eventually, ELRA will serve as the European repository for EU-funded language resources and interact with similar bodies in other parts of the world. Written Resources (cached, 9.12.1999).
The Translational English Corpus (TEC). TEC is a corpus of contemporary translational English: it consists of written texts translated into English from a variety of source languages. The basic structure of TEC was first proposed in Baker (1995), but the design principles have since been elaborated in greater detail in Laviosa-Braithwaite (1996) and Laviosa (1997). The corpus is currently held at the Department of Language Engineering, University of Manchester Institute of Science and Technology (UMIST). Part of the work involved in enlarging the corpus and in making it available on the web is funded by the British Academy (cached, 24.5.2000). See also Core patterns of lexical use in a comparable corpus of English narrative prose, by Sara Laviosa, UMIST.
International Corpus of Learner English. ICLE is a computerized corpus of argumentative essays on different topics written by advanced learners of English (university students of English, mainly in their second or third year). The ICLE project was launched in 1990 by Sylviane Granger, University of Louvain-la-Neuve, Belgium, and the corpus is made up of a number of subcorpora representing the following language backgrounds: Chinese, Czech, Dutch, Finnish, French, German, Japanese, Polish, Russian, Spanish, and Swedish. There is also a smaller comparable corpus of British and American undergraduate essays. The length of the essays varies between 500 and 1000 words. The corpus is under compilation and nearing completion. It is scheduled to be available in grammatically tagged form in 1998 (cached, 24.5.2000).
Euralex 2000 Tutorial - Homepage. Listing of selected links.

Centres

Oxford Text Archive (OTA)
Center for Humanities Computing at Oxford University (CHC)
Leeds Electronic Text Centre (LETC)
Center for Electronic Texts in the Humanities (CETH)
Leiden Centre for the Book. The Leiden Centre for the Book offers teaching and research in European and non-European book history and textual studies from the manuscript period to the latest digital developments.
Poesia-inter.net
Chaucer and Spain, by Jesús Luis Serrano Reyes
Chronology of Geoffrey Chaucer's life and times
Menéndez Pelayo digital
The Cortes of Castile-León, by Joseph F. O'Callaghan
Kairos: teaching through hypertext