enow.com Web Search

Search results

  1. Results from the WOW.Com Content Network
  2. Corpus of Contemporary American English - Wikipedia

    en.wikipedia.org/wiki/Corpus_of_Contemporary...

    The Corpus of Contemporary American English (COCA) is composed of one billion words as of November 2021. [1] [2] [4] The corpus is constantly growing: In 2009 it contained more than 385 million words; [5] in 2010 the corpus grew in size to 400 million words; [6] by March 2019, [7] the corpus had grown to 560 million words.

  3. List of text corpora - Wikipedia

    en.wikipedia.org/wiki/List_of_text_corpora

    Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by both AI developers to train large language models and corpus linguists and within other branches of linguistics for statistical analysis, hypothesis testing, finding patterns of language use, investigating language change and variation, and teaching ...

  4. Category:English corpora - Wikipedia

    en.wikipedia.org/wiki/Category:English_corpora

    This page was last edited on 29 September 2023, at 00:16 (UTC). Text is available under the Creative Commons Attribution-ShareAlike 4.0 License; additional terms may apply.

  5. Brown Corpus - Wikipedia

    en.wikipedia.org/wiki/Brown_Corpus

    The Brown Corpus was a carefully compiled selection of current American English, totalling about a million words drawn from a wide variety of sources. Kučera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, psychology, statistics, and sociology.

  6. American National Corpus - Wikipedia

    en.wikipedia.org/wiki/American_National_Corpus

    The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus.

  7. TenTen Corpus Family - Wikipedia

    en.wikipedia.org/wiki/TenTen_Corpus_Family

    The first text corpora were created in the 1960s, such as the 1-million-word Brown Corpus of American English. Over time, many further corpora were produced (such as the British National Corpus and the LOB Corpus), and work also began on larger corpora and on corpora covering languages other than English. This development was linked with the ...

  8. Manually Annotated Sub-Corpus - Wikipedia

    en.wikipedia.org/wiki/Manually_Annotated_Sub-Corpus

    Manually Annotated Sub-Corpus (MASC) is a balanced subset of 500K words of written texts and transcribed speech drawn primarily from the Open American National Corpus (OANC). The OANC is a 15 million word (and growing) corpus of American English produced since 1990, all of which is in the public domain or otherwise free of usage and ...

  9. Word list - Wikipedia

    en.wikipedia.org/wiki/Word_list

    In particular, words relating to technology, such as "blog," which, in 2014, was #7665 in frequency [7] in the Corpus of Contemporary American English, [8] was first attested to in 1999, [9] [10] [11] and does not appear in any of these three lists. The Teachers Word Book of 30,000 words (Thorndike and Lorge, 1944)