Search results
Results from the WOW.Com Content Network
The International Computer Archive of Modern and Medieval English (ICAME) is an international group of linguists and data scientists working in corpus linguistics to digitise English texts. [1] The organisation was founded in Oslo , Norway in 1977 as the International Computer Archive of Modern English, before being renamed to its current title.
The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts which was compiled in the 1970s in collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen, to provide a British counterpart to the Brown Corpus compiled by Henry Kučera and W. Nelson Francis for American English in ...
Timestamped JSI web corpora – web corpora of news articles crawled from a list of RSS feeds. Newsfeed corpora are being prepared in the framework of the project implemented by the Jožef Stefan Institute at Slovenian scientific research institute. [43] and published in Sketch Engine. More information about the project is on the project websites.
To ensure compatibility between the individual corpora in ICE, each team is following a common corpus design, as well as a common scheme for grammatical annotation. [11] Many corpora are currently available for download on the ICE official webpage, though some require a license. Others, however, are not ready for publication. [12]
The American National Corpus (ANC) is a text corpus of American English containing 22 million words of written and spoken data produced since 1990. Currently, the ANC includes a range of genres, including emerging genres such as email, tweets, and web data that are not included in earlier corpora such as the British National Corpus.
The Brown University Standard Corpus of Present-Day American English, better known as simply the Brown Corpus, is an electronic collection of text samples of American English, the first major structured corpus of varied genres. This corpus first set the bar for the scientific study of the frequency and distribution of word categories in ...
The Cambridge Business English Corpus also includes the Cambridge and Nottingham Spoken Business English Corpus (CANBEC), the result of a joint project between Cambridge University Press and the University of Nottingham. This is a collection of recordings of English from companies of all sizes, ranging from big multinational companies to small ...
Download QR code; Print/export Download as PDF; Printable version; In other projects ... Pages in category "English corpora" The following 18 pages are in this ...