enow.com Web Search

Search results

  1. Results from the WOW.Com Content Network
  2. Corpus of Contemporary American English - Wikipedia

    en.wikipedia.org/wiki/Corpus_of_Contemporary...

    The Corpus of Contemporary American English (COCA) is composed of one billion words as of November 2021. [1] [2] [4] The corpus is constantly growing: In 2009 it contained more than 385 million words; [5] in 2010 the corpus grew in size to 400 million words; [6] by March 2019, [7] the corpus had grown to 560 million words.

  3. List of text corpora - Wikipedia

    en.wikipedia.org/wiki/List_of_text_corpora

    Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used both by AI developers to train large language models and by corpus linguists and within other branches of linguistics for statistical analysis, hypothesis testing, finding patterns of language use, investigating language change and variation, and teaching ...

  4. International Corpus of English - Wikipedia

    en.wikipedia.org/.../International_Corpus_of_English

    Each corpus contains one million words in 500 texts of 2000 words, [7] following the sampling methodology used for the Brown Corpus. Unlike Brown or the Lancaster-Oslo-Bergen (LOB) Corpus (or indeed mega-corpora such as the British National Corpus), however, the majority of texts are derived from spoken data.

  5. Mark Davies (linguist) - Wikipedia

    en.wikipedia.org/wiki/Mark_Davies_(linguist)

    Mark E. Davies (born 1963) is an American linguist. He specializes in corpus linguistics and language variation and change. He is the creator of most of the text corpora from English-Corpora.org (including the Corpus of Contemporary American English/COCA) as well as the Corpus del español and the Corpus do português.

  6. Category:English corpora - Wikipedia

    en.wikipedia.org/wiki/Category:English_corpora

    Pages in category "English corpora" The following 18 pages are in this category, out of 18 total. ... International Corpus of ...

  7. American National Corpus - Wikipedia

    en.wikipedia.org/wiki/American_National_Corpus

    The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus.

  8. Category:Corpora - Wikipedia

    en.wikipedia.org/wiki/Category:Corpora

    English corpora (18 P) P. Persian corpora ... Bank of English; Bergen Corpus of London Teenage Language;

  9. Manually Annotated Sub-Corpus - Wikipedia

    en.wikipedia.org/wiki/Manually_Annotated_Sub-Corpus

    Manually Annotated Sub-Corpus (MASC) is a balanced subset of 500K words of written texts and transcribed speech drawn primarily from the Open American National Corpus (OANC). The OANC is a 15 million word (and growing) corpus of American English produced since 1990, all of which is in the public domain or otherwise free of usage and ...