Search results
Results from the WOW.Com Content Network
Dataset HF card and project's GitHub repository. [393] Diggelmann et al. Climate News dataset: a dataset for NLP and climate change media researchers. The dataset is made up of a number of data artifacts (JSON, JSONL, and CSV text files, and an SQLite database). Climate news DB, project's GitHub repository. [394] ADGEfficiency. Climatext
As an offline reader, Kiwix works with a library of contents that are zim files: you can pick & choose whichever Wikimedia project (Wikipedia in any language, Wiktionary, Wikisource, etc.), as well as TED Talks, PhET Interactive Maths & Physics simulations, Project Gutenberg, etc. It is free and open source, and currently available for download on:
GitHub (/ˈɡɪthʌb/) is a proprietary developer platform that allows developers to create, store, manage, and share their code. It uses Git to provide distributed version control, and GitHub itself provides access control, bug tracking, software feature requests, task management, continuous integration, and wikis for every project. [8]
On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open source version of the LLaMA dataset. [47] The dataset has approximately 1.2 trillion tokens and is publicly available for download. [48] Llama 2 foundational models were trained on a data set with 2 trillion tokens. This data set was curated to ...
Pandas (styled as pandas) is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.
The Pile is an 886.03 GB diverse, open-source dataset of English text created as a training dataset for large language models (LLMs). It was constructed by EleutherAI in 2020 and publicly released on December 31 of that year. [1] [2] It is composed of 22 smaller datasets, including 14 new ones. [1]
Previously, NIST released two datasets: Special Database 1 (NIST Test Data I, or SD-1) and Special Database 3 (or SD-3). They were released on two CD-ROMs. SD-1 was the test set, and it contained digits written by high school students: 58,646 images written by 500 different writers.
InfluxDB is a time series database (TSDB) developed by the company InfluxData. It is used for storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.