enow.com Web Search

Search results

  1. Results from the WOW.Com Content Network
  2. Lancaster-Oslo-Bergen Corpus - Wikipedia

    en.wikipedia.org/wiki/Lancaster-Oslo-Bergen_Corpus

    The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, to provide a British counterpart to the Brown Corpus compiled by Henry Kučera and W. Nelson Francis for American English in ...

  3. International Computer Archive of Modern and Medieval English

    en.wikipedia.org/wiki/International_Computer...

    The International Computer Archive of Modern and Medieval English (ICAME) is an international group of linguists and data scientists working in corpus linguistics to digitise English texts. [1] The organisation was founded in Oslo, Norway, in 1977 as the International Computer Archive of Modern English and was later renamed to its current title.

  4. List of text corpora - Wikipedia

    en.wikipedia.org/wiki/List_of_text_corpora

    Timestamped JSI web corpora: web corpora of news articles crawled from a list of RSS feeds. The newsfeed corpora are being prepared within a project implemented by the Jožef Stefan Institute, a Slovenian scientific research institute, [43] and are published in Sketch Engine. More information is available on the project websites.

  5. International Corpus of English - Wikipedia

    en.wikipedia.org/.../International_Corpus_of_English

    Comparable varieties include British English, American English, and Indian English, each represented by its own computer corpus. [2] Researchers use the corpora to compare the syntax of these varieties of English. [3] Once complete, the ICE corpora would allow a comprehensive linguistic analysis of the varieties of English that have emerged. [4]

  6. Cambridge English Corpus - Wikipedia

    en.wikipedia.org/wiki/Cambridge_English_Corpus

    The Cambridge International Corpus (CIC) is a collection of over 2 billion words [1] of real spoken and written English. The texts are stored in a database that can be searched to see how English is used. The CIC also contains the Cambridge Learner Corpus, a unique collection of over 60,000 exam papers from Cambridge ESOL.

  7. English-Arabic Parallel Corpus of United Nations Texts

    en.wikipedia.org/wiki/English-Arabic_Parallel...

    The EAPCOUNT consists mainly, but not exclusively, of resolutions and annual reports issued by various UN organizations and institutions. Some texts are taken from the authoritative publications of another UN-like institution, the Inter-Parliamentary Union (IPU); these account for 2.18% of the total number of tokens in the English subcorpus.

  8. Bank of English - Wikipedia

    en.wikipedia.org/wiki/Bank_of_English

    The Bank of English totals 650 million running words. [1] Copies of the corpus are held at both HarperCollins Publishers and the University of Birmingham. The version at Birmingham can be accessed for academic research. The Bank of English forms part of the Collins Word Web, together with the French, German and Spanish corpora.

  9. TenTen Corpus Family - Wikipedia

    en.wikipedia.org/wiki/TenTen_Corpus_Family

    The TenTen Corpus Family (also called TenTen corpora) is a set of comparable web text corpora, i.e. collections of texts that have been crawled from the World Wide Web and processed to match the same standards. These corpora are made available through the Sketch Engine corpus manager. There are TenTen corpora for more than 35 languages.