enow.com Web Search

Search results

  1. Results from the WOW.Com Content Network
  2. List of text corpora - Wikipedia

    en.wikipedia.org/wiki/List_of_text_corpora

    British National Corpus; Bergen Corpus of London Teenage Language (COLT) Brown Corpus, forming part of the "Brown Family" of corpora, together with LOB, Frown and F-LOB; Corpus of Contemporary American English (COCA) 425 million words, 1990–2011. Freely searchable online; Corpus Resource Database (CoRD), more than 80 English language corpora. [2]

  3. British National Corpus - Wikipedia

    en.wikipedia.org/wiki/British_National_Corpus

    The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. [1] The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time.

  4. Category:Corpora - Wikipedia

    en.wikipedia.org/wiki/Category:Corpora

    Main page; Contents; Current events; Random article; About Wikipedia; Contact us; Pages for logged out editors learn more

  5. Text corpus - Wikipedia

    en.wikipedia.org/wiki/Text_corpus

    An example of annotating a corpus is part-of-speech tagging, or POS-tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags. Another example is indicating the lemma (base) form of each word.

  6. CLAWS (linguistics) - Wikipedia

    en.wikipedia.org/wiki/CLAWS_(linguistics)

    In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent from the tagsets. For example, the BNC project used two tagset versions: "a main tagset (C5) with 62 tags with which the whole of the corpus has been tagged, and a larger (C7) tagset with 152 tags, which has been used to make a ...

  7. Lancaster-Oslo-Bergen Corpus - Wikipedia

    en.wikipedia.org/wiki/Lancaster-Oslo-Bergen_Corpus

    The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, to provide a British counterpart to the Brown Corpus compiled by Henry Kučera and W. Nelson Francis for American English in ...

  8. Bank of English - Wikipedia

    en.wikipedia.org/wiki/Bank_of_English

    The Bank of English (BoE) is a representative subset of the 4.5 billion words COBUILD corpus, a collection of English texts.These are mainly British in origin, but content from North America, Australia, New Zealand, South Africa and other Commonwealth countries is also being included.

  9. Xaira - Wikipedia

    en.wikipedia.org/wiki/Xaira

    It is based on SARA, [2] an SGML-aware text-searching system originally developed for searching the British National Corpus. Xaira has been redeveloped as a generic XML system for constructing query-systems for any kind of XML data, in particular for use with TEI. The current Windows implementation is intended for non-specialist users.