enow.com Web Search

Search results

  1. Results from the WOW.Com Content Network
  2. British National Corpus - Wikipedia

    en.wikipedia.org/wiki/British_National_Corpus

    The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. [1] The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time.

  3. Bank of English - Wikipedia

    en.wikipedia.org/wiki/Bank_of_English

    The Bank of English (BoE) is a representative subset of the 4.5-billion-word COBUILD corpus, a collection of English texts. These are mainly British in origin, but content from North America, Australia, New Zealand, South Africa and other Commonwealth countries is also being included. The majority of the texts are from written ...

  4. List of text corpora - Wikipedia

    en.wikipedia.org/wiki/List_of_text_corpora

    Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by corpus linguists and within other branches of linguistics for statistical analysis, hypothesis testing, finding patterns of language use, investigating language change and ...

  5. Beryl Atkins - Wikipedia

    en.wikipedia.org/wiki/Beryl_Atkins

    Among her contributions to corpus linguistics, Atkins originated the idea of the British National Corpus. [6][7] In 1997 and 1998, Atkins, together with Michael Rundell, planned and presented two week-long workshops in South Africa, for linguists and lexicographers from the eleven language communities. [8]

  6. TenTen Corpus Family - Wikipedia

    en.wikipedia.org/wiki/TenTen_Corpus_Family

    The TenTen Corpus Family (also called TenTen corpora) is a set of comparable web text corpora, i.e. collections of texts that have been crawled from the World Wide Web and processed to match the same standards. These corpora are made available through the Sketch Engine corpus manager. There are TenTen corpora for more than 35 languages.

  7. International Corpus of English - Wikipedia

    en.wikipedia.org/wiki/International_Corpus_of...

    Each corpus contains one million words in 500 texts of 2000 words, [7] following the sampling methodology used for the Brown Corpus. Unlike Brown or the Lancaster-Oslo-Bergen (LOB) Corpus (or indeed mega-corpora such as the British National Corpus), however, the majority of texts are derived from spoken data.

  8. CLAWS (linguistics) - Wikipedia

    en.wikipedia.org/wiki/CLAWS_(linguistics)

    CLAWS4 was used for the 100-million-word British National Corpus (BNC). A general-purpose grammatical tagger, it is a successor of the CLAWS1 tagger. [11] In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent from the tagsets.

  9. Concordance (publishing) - Wikipedia

    en.wikipedia.org/wiki/Concordance_(publishing)

    Concordancing techniques are widely used in national text corpora such as the American National Corpus (ANC), the British National Corpus (BNC), and the Corpus of Contemporary American English (COCA), which are available online. Stand-alone applications that employ concordancing techniques are known as concordancers [3] or more advanced corpus managers.
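
The concordancing mentioned in the last result can be illustrated with a minimal keyword-in-context (KWIC) sketch. This is not how any of the corpus tools named above work internally; the function name `kwic`, the `width` parameter, the simple regex tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, hit, right) tuples with up to `width` tokens on each side of every match."""
    # Naive tokenizer (an assumption): lowercase the text and keep runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Hypothetical sample text, loosely based on wording in the snippets above.
sample = "The corpus covers British English. Each corpus contains one million words."
for left, hit, right in kwic(sample, "corpus"):
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>25} [{hit}] {right}")
```

Real concordancers typically add indexing, part-of-speech filtering, and sorting by left or right context; the sketch only shows the core idea of listing each occurrence of a word with its surrounding tokens.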