Search results
Results from the WOW.Com Content Network
The British National Corpus (BNC) is a 100-million-word text corpus of samples of written and spoken English from a wide range of sources. [1] The corpus covers British English of the late 20th century from a wide variety of genres, with the intention that it be a representative sample of spoken and written British English of that time.
Bank of English. The Bank of English (BoE) is a representative subset of the 4.5 billion words COBUILD corpus, a collection of English texts. These are mainly British in origin, but content from North America, Australia, New Zealand, South Africa and other Commonwealth countries is also being included. The majority of the texts are from written ...
List of text corpora. Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by corpus linguists and within other branches of linguistics for statistical analysis, hypothesis testing, finding patterns of language use, investigating language change and ...
Among her contributions to corpus linguistics, Atkins originated the idea of the British National Corpus. [ 6 ] [ 7 ] In 1997 and 1998, Atkins, together with Michael Rundell, planned and presented two week-long workshops in South Africa, for linguists and lexicographers from the eleven language communities. [ 8 ]
The TenTen Corpus Family (also called TenTen corpora) is a set of comparable web text corpora, i.e. collections of texts that have been crawled from the World Wide Web and processed to match the same standards. These corpora are made available through the Sketch Engine corpus manager. There are TenTen corpora for more than 35 languages.
Each corpus contains one million words in 500 texts of 2000 words, [7] following the sampling methodology used for the Brown Corpus.Unlike Brown or the Lancaster-Oslo-Bergen (LOB) Corpus (or indeed mega-corpora such as the British National Corpus), however, the majority of texts are derived from spoken data.
The CLAWS4 was used for the 100-million-word British National Corpus (BNC). A general-purpose grammatical tagger, it is a successor of the CLAWS1 tagger. [11] In tagging the BNC, the many rounds of work that went into CLAWS4 focused on making the CLAWS program independent from the tagsets.
Concordancing techniques are widely used in national text corpora such as American National Corpus (ANC), British National Corpus (BNC), and Corpus of Contemporary American English (COCA) available on-line. Stand-alone applications that employ concordancing techniques are known as concordancers [3] or more advanced corpus managers.