A dataset for NLP and climate change media researchers. The dataset is made up of a number of data artifacts (JSON, JSONL and CSV text files, and a SQLite database). Climate news DB, Project's GitHub repository [394]; ADGEfficiency. Climatext is a dataset for sentence-based climate change topic detection. HF dataset [395]; University of Zurich ...
Global Roads Open Access Data Set (gROADS): a well-documented global dataset from NASA's Socioeconomic Data and Applications Center of roads between settlements, built on a consistent data model (UNSDI-T v.2). To the extent possible, the data are topologically integrated and accurate to approximately 50 m. Only roads between settlements are included, not ...
The Enron Corpus is a database of over 600,000 emails generated by 158 employees[1] of the Enron Corporation in the years leading up to the company's collapse in December 2001. The corpus was generated from Enron email servers by the Federal Energy Regulatory Commission (FERC) during its subsequent investigation.[2]
From the website, customizable search options allow users to download the data as CSV files with the relevant information sorted.[24] An API is also available. The online dataset contains security incidents (attacks) from 1997 to the present and is updated in real time.
Extended MNIST (EMNIST) is a newer dataset developed and released by NIST to be the (final) successor to MNIST.[15][16] MNIST included images only of handwritten digits. EMNIST includes all the images from NIST Special Database 19 (SD 19), which is a large database of 814,255 handwritten uppercase and lowercase letters and digits.
The SDMX converter is an open-source application that converts DSPL (Google's Dataset Publishing Language) messages to SDMX-ML, and vice versa. The output file of a DSPL dataset is a zip file containing data (in the form of CSV files) and metadata (as an XML file). Datasets in this format can be visualized in the Google ...
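The packaging described above (CSV data plus XML metadata inside a zip file) can be unpacked with Python's standard library. This is a minimal sketch, not the SDMX converter's or Google's implementation. It assumes, hypothetically, that data members end in `.csv` and the metadata member ends in `.xml`; the file names `dataset.xml` and `countries.csv`, the `<dspl><info><name>` layout, and the helper `load_dspl_bundle` are made-up stand-ins, not the real DSPL schema.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile


def load_dspl_bundle(zip_bytes: bytes):
    """Split a zip bundle into parsed CSV tables and an XML metadata root.

    Hypothetical helper: assumes data files end in .csv and the single
    metadata file ends in .xml. Real DSPL bundles may differ in detail.
    """
    tables = {}
    metadata = None
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            raw = zf.read(name)
            if name.endswith(".csv"):
                # Each CSV member becomes a list of rows (lists of strings).
                tables[name] = list(csv.reader(io.StringIO(raw.decode("utf-8"))))
            elif name.endswith(".xml"):
                # Parse the bytes directly so any XML encoding declaration is honoured.
                metadata = ET.fromstring(raw)
    return tables, metadata


# Build a small in-memory bundle with hypothetical file names and contents.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dataset.xml", "<dspl><info><name>demo</name></info></dspl>")
    zf.writestr("countries.csv", "country,name\nUS,United States\n")

tables, meta = load_dspl_bundle(buf.getvalue())
print(sorted(tables))               # ['countries.csv']
print(meta.find("info/name").text)  # demo
```

Keeping the data tables and the metadata tree separate mirrors the split described above: the CSV files carry the values, while the XML file describes them.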
A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video). [3] A data lake can be established on premises (within an organization's data centers) or in the cloud (using cloud services).
Comma-separated values (CSV) is a text file format that uses commas to separate values, and newlines to separate records. A CSV file stores tabular data (numbers and text) in plain text, where each line of the file typically represents one data record. Each record consists of the same number of fields, and these are separated by commas in the ...
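The record and field structure described above can be illustrated with Python's built-in `csv` module. This is a minimal sketch with made-up sample data; the helper names `read_csv_records` and `has_uniform_field_count` are hypothetical. It also shows one detail not covered above: a field that itself contains a comma is enclosed in double quotes, which the `csv` module handles by default.

```python
import csv
import io


def read_csv_records(text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of dicts keyed by the header record."""
    # DictReader uses the first record as field names and splits each
    # following line on commas, honouring double-quoted fields.
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]


def has_uniform_field_count(text: str) -> bool:
    """Check that every record has the same number of fields."""
    rows = list(csv.reader(io.StringIO(text)))
    return len({len(row) for row in rows}) <= 1


# Illustrative data only: a header record and two data records.
# The second data record quotes a field that contains a comma.
sample = (
    "id,label,score\n"
    "1,alpha,0.5\n"
    '2,"beta, gamma",0.75\n'
)

records = read_csv_records(sample)
for rec in records:
    print(rec["id"], rec["label"], rec["score"])
print(has_uniform_field_count(sample))
```

Splitting each line with `str.split(",")` would break the quoted `beta, gamma` field in two, which is why a CSV-aware parser is the safer choice.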