WRITTEN RESOURCES
The descriptions of language resources (LRs) given here are brief summaries, kept short for readability. Follow the links for further information.
R : | For academic use |
RC : | For research use by a commercial organisation |
C : | For commercial use |
If none of these abbreviations (R, C or RC) appears, there are no restrictions on the type of use. |
Discounts on the non-member price are offered to members of organizations with which ELRA has entered into special agreements (e.g. ELSNET). |
*** : | At cost | |
ELRA : | Please contact ELRA office. | |
--- : | Price under discussion | |
WWW : | Please download this free resource from the Web (follow the links) | |
The following prices are given in euros (1 EUR ≈ 1.2 USD). Some prices were negotiated in local currency and have been readjusted for the exchange rate. |
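As an illustration of how the price cells in the tables can be read against the legend above, here is a minimal Python sketch. The function name `parse_price_cell` and the `"any"` key for unrestricted prices are assumptions made for this example, not part of the catalogue. It only handles simple cells such as `R 360 C 1500`, `20` or `---`; itemised cells (`i) ... ii) ...`) and per-entry prices are out of scope.

```python
import re

# Markers from the legend that stand in for a price.
SPECIAL = {
    "***": "at cost",
    "---": "price under discussion",
    "ELRA": "contact ELRA office",
    "WWW": "free download from the Web",
}

# Optional licence code (RC, R or C; "R." and "C." also occur),
# followed by an amount that may use a decimal comma.
PRICE = re.compile(r"(?:(RC|R|C)\.?\s+)?(\d+(?:,\d+)?)")

def parse_price_cell(cell):
    """Return {licence code: price in EUR} for a simple price cell."""
    cell = cell.strip()
    if cell in SPECIAL:
        return {"note": SPECIAL[cell]}
    prices = {}
    for code, amount in PRICE.findall(cell):
        # No code means no restriction on the type of use (hypothetical key "any").
        prices[code or "any"] = float(amount.replace(",", "."))
    return prices

print(parse_price_cell("R 270 RC 800 C 1600"))
```

A cell without a licence code, such as `20`, is stored under `"any"`, following the rule that a missing abbreviation means no restriction on the type of use.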
Ref. ELRA | Name | Type & No. of entries | Language | Member price | Non-member price | Date |
W0001 | BRITISH NATIONAL CORPUS - BNC (OTA) | 100 million words | English | R 175 | R 254 | 01/09/96 |
W0002 | CONTEMPORARY PORTUGUESE CORPUS | 1.5 million words | Portuguese | --- | --- | |
W0003 | CRATER Multi-lingual aligned corpus | 1 million tokens | English, French, Spanish | 20 | 100 | 23/01/97 |
W0004 | ECI/MCI European Corpus Initiative | Multilingual Corpus 98 million words | Major European languages + Turkish, Japanese, Russian, Chinese, Malay, etc. | R 45 | R 45 | 01/09/96 |
W0005 | ECI-ELSNET Italian & German tagged sub-corpus | Economy 17,000 words Politics 14,000 words Culture 18,000 words Sports 9,000 words Local Events 8,500 words | Italian & German | R 20 | R 45 | 01/09/96 |
W0006 | MLCC - Multi-lingual corpus | Het Financieele Dagblad (8.5 million words) The Financial Times (30 million words) Le Monde (10 million words) Handelsblatt (33 million words) Il sole 24 Ore (1.88 million words) Expansion (10 million words) | Dutch, English, French, German, Italian, Spanish | R 360 C 1500 | R 750 C 3200 | 01/09/96 |
W0007 | MLCC - Office of Official Publications of the European Communities (Parliamentary Debates + OJ) | Parallel corpus of translated documents in the nine European official languages, divided into 2 sub-corpora: written questions and parliamentary debates | Multilingual | R 120 C 480 | R 200 C 800 | 01/09/96 |
W0008 | MTP annotated German Corpus (500000 Words from FAZ/ Die Zeit) | 500,000 words | German | untagged: 2000 tagged: 8000 | untagged: 3500 tagged: 12000 | 01/09/96 |
W0009 | MULTEXT / MULTEXT East (Data/Tools) | Written Lexicon and Corpora | Multilingual | *** | *** | |
W0010 | Swedish Corpus PRESS 65 (Corpus of over 1m Words) | 1 million words | Swedish | R 12000 | R 20000 | 23/01/97 |
W0011 | Tagged text in French (MEMODATA) Typographic tagging | 170 books | French | R 1723 C 2154 | R 2154 C 2692 | 23/01/97 |
W0012 | Tagged text in French (MEMODATA) Morphologic tagging | 170 books | French | R 2461 C 3077 | R 3077 C 3846 | 23/01/97 |
W0013 | TSNLP (Test Suites for NLP Testing) | 4,000 test items | Multilingual | *** | *** | 01/09/96 |
W0014 | Monolingual Greek corpus | 1 million words | Greek | R 360 | R 600 | 17/02/97 |
W0015 | Text corpus of "Le Monde" | Corpus from "Le Monde" newspaper. From 1 to 5 years of data are available. Each tape/year contains some 10 Mbytes of data per month (circa 120 Mbytes per year). | French | R. 1year 238,91 2yrs 477,83 3yrs 716,74 4yrs 955,65 5yrs 1194,56 | R. 1year 310,59 2yrs 621,17 3yrs 931,76 4yrs 1242,35 5yrs 1552,93 | 15/09/97 |
W0016 | Karl May Korpus (KMK) | Karl-May-Korpus is a German monolingual corpus, available in an SGML-tagged ASCII text format. It contains the works of the German author Karl May and consists of around 1.6 million words (divided into 9 sub-corpora of about 180,000 words each). | German | R 400 C 2500 | R 800 C 3500 | 28/11/97 |
W0017 | MULTEXT JOC Corpus | This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050). This part contains raw, tagged and aligned data from the Written Questions and Answers of the Official Journal of the European Community. The corpus contains ca. 5 million words in English, French, German, Italian and Spanish (ca. 1 million words per language). About 800,000 words were grammatically tagged and manually checked for English, French, Italian and Spanish, i.e. roughly 200,000 words per language. The same subset for French, German, Italian and Spanish was aligned to English at the sentence level. | English, French, German, Italian, Spanish | R 45 C 2000 | R 100 C 5000 | 23/11/98 |
W0018 | ARCADE/ROMANSEVAL corpus | The corpus contains raw data from the JOC corpus developed in the MULTEXT project financed by the European Commission (LRE 62-050), composed of 1 million words in English and four Romance languages: French, Italian, Spanish and Portuguese (Written Questions and Answers from the Official Journal of the European Commission). The annotation concerns all the contexts of 60 different test words (20 nouns, 20 adjectives, 20 verbs), i.e. ca. 3700 contexts altogether. It comprises: semantic tagging of all the occurrences of the test words in the JOC corpus for French and Italian; and word-level alignment of all the occurrences of the test words between French and English. | English, French, Italian | R 45 C 2000 | R 100 C 5000 | 23/11/98 |
W0019 | Dutch PAROLE Distributable Corpus | This Dutch corpus is a 3-million-word selection built according to the specifications of the PAROLE project. Over 250,000 words of corpus texts have been PoS-tagged automatically. A total of 59,798 running words have been manually corrected and checked. | Dutch | R 270 RC 800 C 1600 | R 300 RC 1300 C 2500 | 12/07/99 |
* Special price for academic users from the Netherlands and Belgium: 150 EURO (the data will be supplied directly by the Instituut voor Nederlandse Lexicologie, http://www.inl.nl) |
Ref. ELRA | Name | Type & No. of entries | Language | Member price | Non-member price | Date |
L0001 | DICO-MORPH_lemme. MEMODATA | Morpho-syntactic information 400,000 entries | French | R 12090 C 15112 | R 15112 C 18890 | 23/01/97 |
L0002 | DICO-MORPH_Collocation. MEMODATA | Collocation lexicon 35,000 entries | French | R 6992 C 8740 | R 8740 C 10925 | 23/01/97 |
L0003 | DICO-SYNT. MEMODATA | 90,000 inflexional forms | French | R 8861 C 11077 | R 11077 C 13846 | 23/01/97 |
L0004 | Dutch Lexicon. (LanTmark) | General vocabulary 64,000 entries | Dutch | R 9360 C 32400 | R 15600 C 54000 | 23/01/97 |
L0005 | French Lexicon (LanTmark) | General vocabulary 50,000 entries | French | R 7440 C 25440 | R 12400 C 42400 | 23/01/97 |
L0006 | ILC Italian Morphological lexicon | Lexicon About 60,000 lemmas/lexical entries | Italian | R 4000 C 12000 | R 8000 C 20000 | 15/09/97 |
L0007 | LexIn 1:e Swedish Lexicon | Lexicon 17,000 headwords and 21,000 senses | Swedish | R 1200 C 12000 | R 2000 C 20000 | 23/01/97 |
L0008 | Monolingual Danish lexicon. (Institut for Erhvervsinformatik) | lexicon 25,000 entries | Danish | R 1,2/entry C 2,4/entry | R 2/entry C 4/entry | 13/05/97 |
L0009 | Monolingual Portuguese lexicon. (Centro de Linguistica da Universidade de Lisboa) | lexicon 60,000 entries | Portuguese | --- | --- | |
L0010 | MULTEXT lexicons | This CD-ROM contains a set of lexicons developed in the MULTEXT project financed by the European Commission (LRE 62-050). The set contains the following languages: English, French, German, Italian and Spanish. English 66,214 Word forms French 306,795 Word forms German 233,861 Word forms Italian 145,530 Word forms Spanish 510,710 Word forms | English, French, German, Italian, Spanish | R 45 C 2000 | R 100 C 5000 | 23/11/98 |
L0011 | Portuguese morphological lexicon PALAVROSO (INESC) | lexicon 60,000 entries | Portuguese | --- | --- | |
L0012 | Spanish gilcUB-M-Dictionary | General vocabulary 60,000 entries | Spanish | R 6500 C 8250 | R 8225 C 10300 | 23/01/97 |
L0013 | THAMUS. Generic Italian dictionary (Consorzio per la linguistica computazionale) | i) Generic (canonical forms) 87,000 ii) Generic (inflected forms) 612,000 iii) Technical (canonical forms) 48,000 iv) Technical (inflected forms) 96,000 | Italian | R. i) 19140 ii) 135080 iii) 10560 iv) 21120 C. i) 47850 ii) 336600 iii) 26400 iv) 52800 | R. i) 20880 ii) 147360 iii) 11520 iv) 23040 C. i) 52200 ii) 367200 iii) 28800 iv) 57600 | 13/05/97 |
L0014 | Adverbial Equivalence Dictionary (CORA) | Generic Dictionary 1,200 entries | French | C 243,92 | C 304,90 | 23/01/97 |
L0015 | Nominalisation Dictionary (CORA) | Generic Dictionary 2,300 entries | French | C 365,88 | C 457,35 | 23/01/97 |
L0016 | Tri-quadri-pentagrams Dictionary (CORA) | Generic Dictionary 5,487 entries | French | C 365,88 | C 457,35 | 23/01/97 |
L0017 | N de N Dictionary (CORA) | Generic Dictionary 10,000 entries | French | C 1219,59 | C 1524,49 | 23/01/97 |
L0018 | German lexicon (CORA) | Lexicon 466,300 | German | C 4878,37 | C 6097,96 | 23/01/97 |
L0019 | English lexicon (CORA) | Lexicon 160,000 entries | English | C 4878,37 | C 6097,96 | 23/01/97 |
L0020 | DST Dictionary (CORA) 1) String dictionary 2) Optional extra sets: i) Part of speech (optional) ii) Gender, number, conjugation (optional) iii) Lemma (optional) iv) Semantical information (optional) v) Syntactical information (optional) vi) Prep/adv. phrases (optional) vii) Compound nouns (optional) 3) The whole dictionary | Generic Dictionary 550,000 inflected forms | French | C 1) 4878,37 2) i) 2439,18 ii) 1219,59 iii) 1219,59 iv) 1219,59 v) 609,80 vi) 609,80 vii) 1219,59 3) 12195,92 | C 1) 6097,96 2) i) 3048,98 ii) 1524,49 iii) 1524,49 iv) 1524,49 v) 762,25 vi) 762,25 vii) 1524,49 3) 15244,90 | 23/01/97 |
L0021 | Dictionary of French verbs (CORA - Jean Dubois) | >25,610 verbs | French | C 7317,55 | C 9146,94 | 21/05/97 |
L0022 | Dictionary of words (CORA - Jean Dubois) | 126,844 words | French | C 4878,35 | C 6097,96 | 21/05/97 |
L0023 | Dictionary of affixes (CORA) | 4,286 suffixes and prefixes | French | C 609,80 | C 762,25 | 21/05/97 |
L0024 | Dictionary of verb phrases (CORA) | 3,480 entries based on the model of the dictionary of French verbs (ELRA-L0021) | French | C 487,84 | C 609,80 | 21/05/97 |
L0025 | Dictionary of invariable forms and phrases (CORA) | 4,783 entries based on the model of the dictionary of words (ELRA-L0022) | French | C 243,92 | C 304,90 | 21/05/97 |
L0026 | Dictionary of exclamatory stereotyped phrases (CORA) | 1,901 entries based on the model of the dictionary of invariable forms and phrases (ELRA-L0025) | French | C 243,92 | C 304,90 | 21/05/97 |
L0027 | Dictionary of French local authorities (CORA) | 38,965 entries in lower case with accents, checked against the Michelin guide, without localities | French | C 243,92 | C 304,90 | 21/05/97 |
L0028 | Dictionary of noun phrases and plural-only words (CORA) | 2,138 compound names and 1,397 entries of plural-only words | French | C 243,92 | C 304,90 | 21/05/97 |
L0029 | CELEX - Dutch lexical database | Dutch lexical database containing lemmas (124136 entries), wordforms (381292 entries), abbreviations (1622 entries), syllables (31358 entries). The database is divided into different subsets. i) Complete set of data ii) Subset Orthography iii) Subset Phonology iv) Subset Morphology Infl. v) Subset Morphology Der. vi) Subset Syntax vii) Subset Frequency | Dutch | C. i) 56087,32 ii) 5989,90 iii) 12252,07 iv) 5989,90 v) 13613,41 vi) 5989,90 vii) 12252,07 R. ELRA | C. i) 93478,72 ii) 9983,16 iii) 20420,11 iv) 9983,16 v) 22689,01 vi) 9983,16 vii) 20420,11 R. ELRA | 15/09/97 |
L0030 | Bulgarian Morphological Dictionary | 67,500 entries divided into 242 inflectional types (including proper nouns), morphosyntactic information for each entry, and a morphological engine (MS DOS and WINDOWS 95/NT) for morphological analysis and generation | Bulgarian | R 45 C 6000 | R 100 C 12000 | 16/04/98 |
L0031 | Dutch PAROLE lexicon | The entry list of the lexicon consists of about 20,200 entries distributed over 13 parts of speech (POS). The entries have been described along the dimensions of morphosyntax and syntax, according to the specifications of the PAROLE project. The lexicon is set up as an SGML file. | Dutch | R 300 RC 1600 C 8000 | R 400 RC 3000 C 10000 | 12/07/99 |
* Special price for academic users from the Netherlands and Belgium: 200 EURO (the data will be supplied directly by the Instituut voor Nederlandse Lexicologie, http://www.inl.nl) |
Ref. ELRA | Name | Type & No. of entries | Language | Member price | Non-member price | Date |
M0001 | Basic multilingual lexicon (MEMODATA) | Lexicon 30 000 each language | French, English, Italian, German, Spanish | R 8861 C 11077 | R 11077 C 13846 | 23/01/97 |
M0002 | Bilingual Spanish-English and English-Spanish Lexicons (INCYTA) | Technical domains Economics, law & business management 10,642 Leisure, Tourism, Sports, Food 3,144 Geography, History, Arts 4,116 Sociology, Psychology, Pedagogy 4,089 Natural and medical sciences 10,535 Exact sciences, Phys., Chemistry, Geology 10,616 Data Processing, Electronics, Telecoms 4,904 Technology, Engineering & Construction 11,953 Economics 1,320 Data Processing 3,565 Telecommunications 3,733 Electrical Engineering 1,760 Plastics and Chemistry 9,022 Aeronaut., Navigat., Mechanic. Engin. 23,170 | Spanish-English English-Spanish | R 0,12/entry C 0,96/entry | R 0,2/entry C 1,6/entry | 23/01/97 |
M0003 | Danish-German dictionary (Institut for Erhvervsinformatik) | General vocabulary 10,000 | Danish-German | R 1,2/entry C 2,4/entry | R 2/entry C 4/entry | 23/01/97 |
M0004 | Dutch-French Lexicon (LanTmark) | Vocabularies for transfer i) General Vocabulary 26,000 ii) Administrative 32,000 iii) Data processing 10,000 | Dutch-French | R i) 7800 ii) 8160 iii) 2400 C i) 17760 ii) 19920 iii) 6000 | R i) 12800 ii) 13600 iii) 4000 C i) 29600 ii) 23200 iii) 10000 | 23/01/97 |
M0005 | English-French Lexicon (LanTmark) | General vocabulary for transfer 27,000 entries | English-French | R 8160 C 18720 | R 13600 C 31200 | 23/01/97 |
M0006 | French-Dutch Lexicon (LanTmark) | Vocabularies for transfer i) General Vocabulary 34,000 ii) Administrative 18,000 iii) Data processing 10,000 | French-Dutch | R i) 8880 ii) 4800 iii) 2400 C i) 21480 ii) 11520 iii) 6000 | R i) 14800 ii) 8000 iii) 4000 C i) 35800 ii) 19200 iii) 10000 | 23/01/97 |
M0007 | French-English Lexicon (LanTmark) | General vocabulary for transfer 34,000 entries | French-English | R 10320 C 23640 | R 17200 C 39400 | 23/01/97 |
M0008 | German-Danish dictionaries (Institut for Erhvervsinformatik) | Technical 6,800 General 15,500 | German-Danish | R 1,2/entry C 2,4/entry | R 2/entry C 4/entry | 23/01/97 |
M0009 | THAMUS Bilingual dictionaries (Consorzio per la linguistica computazionale) | Technical domains Computer Science i) canonical forms 17,800 ii) inflected forms 35,000 | German-Italian or Italian-German | R. i) 3916 ii) 7700 C. i) 19580 ii) 38500 | R. i) 4272 ii) 8400 C. i) 21360 ii) 42000 | 13/05/97 |
M0010 | THAMUS Bilingual dictionaries (Consorzio per la linguistica computazionale) | Technical domains i) Aeronautics 6,300 ii) Law (canonical forms) 8,900 iii) Law (inflected forms) 18,000 iv) Computer Science (canonical forms) 15,700 v) Computer Science (inflected forms) 32,000 vi) Medicine (canonical forms) 20,000 vii) Economics (canonical forms) 50,000 viii) Economics (inflected forms) 86,000 ix) Engineering (canonical forms) 13,000 x) Engineering (inflected forms) 27,000 | English-Italian or Italian-English | R. i) 1386 ii) 1958 iii) 3960 iv) 3454 v) 7040 vi) 4400 vii) 11000 viii) 18920 ix) 2860 x) 5940 C. i) 6930 ii) 9790 iii) 19800 iv) 17270 v) 35200 vi) 22000 vii) 55000 viii) 94600 ix) 14300 x) 29700 | R. i) 1512 ii) 2136 iii) 4320 iv) 3768 v) 7680 vi) 4800 vii) 12000 viii) 20640 ix) 3120 x) 6480 C. i) 7560 ii) 10680 iii) 21600 iv) 18840 v) 38400 vi) 24000 vii) 60000 viii) 103200 ix) 15600 x) 32400 | 13/05/97 |
M0013 | Bilingual Collocational Dictionary | The bilingual English-German collocational dictionary consists of around 40,000 English headwords, including concepts expressed with more than one word and hyphenated compounds. It contains verbs, adjectives, synonyms and phrases that collocate with the headword. It provides the German equivalents for the headwords as well as their English synonyms. | English, German | 210 | 300 | 28/11/97 |
M0014 | Bilingual Dictionaries | Bilingual dictionaries containing local linguistic variants, local spelling variants, word frequency, usage (familiar, old, slang, etc.) and semantic features: GROUP 1 English <=> Spanish, French, German, Italian, Brazilian Portuguese, Portuguese, Dutch. | See description | R. G1 0.06/ent. C. G1 0.25/ent. | R. G1 0.12/ent. C. G1 0.5/ent. | 16/04/98 |
M0015 | English EuroWordNet | Each EuroWordNet database is composed of the following: - The Inter-Lingual-Index, which is a list of records (ILI-records), in the form of synsets mainly taken from WordNet1.5 or manually created. - A top-ontology which consists of an ontology of 63 basic semantic classes based on fundamental distinctions. - A domain-ontology which consists of an ontology of subject-domains optionally assigned to ILI-records. - A selection of ILI-records, the so-called Base-Concepts, which play a major role in the different wordnets. - WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in EuroWordNet format. | English | More info | More info | 30/08/99 |
M0016 | Dutch EuroWordNet | See ELRA-M0015 | Dutch-English | More info | More info | 30/08/99 |
M0017 | Spanish EuroWordNet | See ELRA-M0015 | Spanish-English | More info | More info | 30/08/99 |
M0018 | Italian EuroWordNet | See ELRA-M0015 | Italian-English | More info | More info | 15/10/99 |
M0019 | German EuroWordNet | See ELRA-M0015 | German-English | More info | More info | 15/10/99 |
M0020 | French EuroWordNet | See ELRA-M0015 | French-English | More info | More info | 15/10/99 |
M0021 | Czech EuroWordNet | See ELRA-M0015 | Czech-English | More info | More info | 15/10/99 |
M0022 | Estonian EuroWordNet | See ELRA-M0015 | Estonian-English | More info | More info | 15/10/99 |