Search results
Results from the WOW.Com Content Network
A dataset for NLP and climate change media researchers The dataset is made up of a number of data artifacts (JSON, JSONL & CSV text files & SQLite database) Climate news DB, Project's GitHub repository [394] ADGEfficiency Climatext Climatext is a dataset for sentence-based climate change topic detection. HF dataset [395] University of Zurich ...
RAWPED is a dataset for detection of pedestrians in the context of railways. The dataset is labeled box-wise. 26000 Images Object recognition and classification 2020 [70] [71] Tugce Toprak, Burak Belenlioglu, Burak Aydın, Cuneyt Guzelis, M. Alper Selver OSDaR23 OSDaR23 is a multi-sensory dataset for detection of objects in the context of railways.
The database is available for download as a ZIP archive, which includes the entire database in JSON and CSV file formats. [5] From all the sources which it draws information, including funding datasets, Digital Science claims that GRID covers 92% of institutions. [11] [12]
Kaggle is a data science competition platform and online community for data scientists and machine learning practitioners under Google LLC.Kaggle enables users to find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges.
The Enron Corpus is a database of over 600,000 emails generated by 158 employees [1] of the Enron Corporation in the years leading up to the company's collapse in December 2001. The corpus was generated from Enron email servers by the Federal Energy Regulatory Commission (FERC) during its subsequent investigation. [ 2 ]
seed is a type of reference table used in dbt for static or infrequently changed data, like for example country codes or lookup tables), which are CSV based and typically stored in a seeds folder. References
In 2006, the Internet company AOL released a large excerpt from its web search query logs to the public. AOL did not identify users in the report, but personally identifiable information was present in many of the queries. This allowed some users to be identified by their search queries.
Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. [2] The Dataframe API was released as an abstraction on top of the RDD, followed by the Dataset API.