enow.com Web Search

Search results

  1. Results from the WOW.Com Content Network
  2. Text corpus - Wikipedia

    en.wikipedia.org/wiki/Text_corpus

    In a comparable corpus, the texts are of the same kind and cover the same content, but they are not translations of each other. [2] To exploit a parallel text, some kind of text alignment identifying equivalent text segments (phrases or sentences) is a prerequisite for analysis.

  3. List of text corpora - Wikipedia

    en.wikipedia.org/wiki/List_of_text_corpora

    Text corpora (singular: text corpus) are large and structured sets of texts, which have been systematically collected. Text corpora are used by corpus linguists and within other branches of linguistics for statistical analysis, hypothesis testing, finding patterns of language use, investigating language change and variation, and teaching language proficiency.

  4. TenTen Corpus Family - Wikipedia

    en.wikipedia.org/wiki/TenTen_Corpus_Family

    The TenTen Corpus Family (also called TenTen corpora) is a set of comparable web text corpora, i.e. collections of texts that have been crawled from the World Wide Web and processed to match the same standards. These corpora are made available through the Sketch Engine corpus manager. There are TenTen corpora for more than 35 languages.

  5. Category:English corpora - Wikipedia

    en.wikipedia.org/wiki/Category:English_corpora

    Pages in category "English corpora" The following 18 ...

  6. Category:Corpora - Wikipedia

    en.wikipedia.org/wiki/Category:Corpora

    Pages in category "Corpora" The following 51 pages are in this category, out of 51 ...

  7. International Corpus of English - Wikipedia

    en.wikipedia.org/wiki/International_Corpus_of...

    With only one million words per corpus, ICE corpora are considered very small by modern standards. [8] ICE corpora contain 60% (600,000 words) of orthographically transcribed spoken English. The father of the project, Sidney Greenbaum, insisted on the primacy of the spoken word, following Randolph Quirk and Jan Svartvik's collaboration on the ...

  8. Ancient text corpora - Wikipedia

    en.wikipedia.org/wiki/Ancient_text_corpora

    Ancient text corpora are the entire collection of texts from the period of ancient history, defined in this article as the period from the beginning of writing up to 300 AD. These corpora are important for the study of literature, history, linguistics, and other fields, and are a fundamental component of the world's cultural heritage.

  9. Parallel text - Wikipedia

    en.wikipedia.org/wiki/Parallel_text

    A parallel corpus contains translations of the same document in two or more languages, aligned at least at the sentence level. These tend to be rarer than less-comparable corpora. [citation needed] A noisy parallel corpus contains bilingual sentences that are not perfectly aligned or have poor quality translations. Nevertheless, most of its ...