Search results
Results from the WOW.Com Content Network
The corpus of Global Web-based English (GloWbE; pronounced "globe") contains about 1.9 billion words of text from twenty different countries. This makes it about 100 times as large as other corpora like the International Corpus of English, and it allows for many types of searches that would not be possible otherwise.
The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, to provide a British counterpart to the Brown Corpus compiled by Henry Kučera and W. Nelson Francis for American English in ...
The Cambridge Learner Corpus (CLC) is a collection of exam scripts written by students learning English, built in collaboration with Cambridge English Language Assessment. The CLC contains scripts from over 180,000 students, from around 200 countries, speaking 138 different first languages and is growing all the time. [ 3 ]
The TenTen Corpus Family – comparable web corpora of target size 10 billion words. These corpora are available in the corpus management system Sketch Engine, currently, there exist TenTen corpora for more than 30 languages (such as English TenTen corpus, [38] Arabic TenTen corpus, [39] Spanish TenTen corpus, [40] Russian Tenten corpus, [41 ...
The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus.
The TenTen Corpus Family (also called TenTen corpora) is a set of comparable web text corpora, i.e. collections of texts that have been crawled from the World Wide Web and processed to match the same standards. These corpora are made available through the Sketch Engine corpus manager. There are TenTen corpora for more than 35 languages.
Download QR code; Print/export Download as PDF; ... Appearance. move to sidebar hide. Help. Pages in category "English corpora" The following 18 pages are in this ...
To ensure compatibility between the individual corpora in ICE, each team is following a common corpus design, as well as a common scheme for grammatical annotation. [11] Many corpora are currently available for download on the ICE official webpage, though some require a license. Others, however, are not ready for publication. [12]