COVID-19 dataset clearinghouse: Difference between revisions

From Polymath Wiki
Line 20: Line 20:
* [https://docs.google.com/spreadsheets/d/1jS24DjSPVWa4iuxuD4OAXrE3QeI8c9BC1hSlqr-NMiU/edit#gid=1187587451 Google sheets from DXY.cn]  
** Contains some patient information [age,gender,etc]
* [https://www.kaggle.com/imdevskp/corona-virus-report COVID-19 Complete Dataset (Updated every 24hrs)], Kaggle


==== North America ====
Line 25: Line 26:
* [https://github.com/COVID19Tracking/covid-tracking-data COVID Tracking Data], from the [https://covidtracking.com/ COVID tracking project]
** A daily updated repository with CSV representations of data from the [https://github.com/COVID19Tracking/covid-tracking-api/blob/master/README.md Covid Tracking API].   
* [https://www.kaggle.com/sudalairajkumar/covid19-in-usa COVID-19 in USA], Kaggle
* [https://coronavirus.1point3acres.com/en COVID-19 in US and Canada]
** [https://coronavirus.1point3acres.com/en/data Data request form]
Line 39: Line 41:
* [https://github.com/pcm-dpc/COVID-19 COVID-19 Italia - Monitoraggio situazione]
* [https://www.epicentro.iss.it/coronavirus/sars-cov-2-sorveglianza-dati Sorveglianza integrata COVID-19: i principali dati nazionali] (Italy), Epicentro
* [https://www.kaggle.com/sudalairajkumar/covid19-in-india COVID-19 in India], Kaggle
* [https://www.kaggle.com/sudalairajkumar/covid19-in-italy COVID-19 in Italy], Kaggle
* [https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/dd4580c810204019a7b8eb3e0b329dd6_0/data RKI COVID19] (Germany), NPGEO Corona  
* [https://www.bag.admin.ch/bag/en/home/krankheiten/ausbrueche-epidemien-pandemien/aktuelle-ausbrueche-epidemien/novel-cov/situation-schweiz-und-international.html New coronavirus: Current situation – Switzerland and international], Bundesamt für Gesundheit.
Line 49: Line 51:
* [https://www.covid19india.org/ India COVID-19 tracker]
** [https://docs.google.com/spreadsheets/d/e/2PACX-1vSc_2y5N0I67wDU38DjDh35IZSIS30rQf7_NYZhtYYGU1jJYT6_kDx4YpF-qw0LSlGsBYP8pqM_a1Pd/pubhtml Patient database]
* [https://www.kaggle.com/sudalairajkumar/covid19-in-india Dataset on Novel Corona Virus Disease 2019 in India], Kaggle
* [https://www.kaggle.com/imdevskp/covid19-corona-virus-india-dataset COVID-19 Corona Virus India Dataset], Kaggle
** State/UT/NCR wise COVID-19 data
* [https://github.com/jihoo-kim/Data-Science-for-COVID-19-old Data Science for COVID-19 in South Korea]
** [https://www.kaggle.com/kimjihoo/coronavirusdataset The data set on Kaggle]
Line 55: Line 60:
==== Other regional data ====


* [https://www.kaggle.com/unanimad/corona-virus-brazil Coronavirus (COVID-19) - Brazil Dataset], Kaggle
* [https://www.health.nsw.gov.au/Infectious/diseases/Pages/covid-19-latest.aspx Latest updates on COVID-19], New South Wales


=== Genomics and homology ===
Line 65: Line 72:
* [https://www.kaggle.com/paultimothymooney/coronavirus-genome-sequence Coronavirus Genome Sequence], Kaggle
* [https://www.kaggle.com/paultimothymooney/repository-of-coronavirus-genomes Repository of Coronavirus Genomes], Kaggle
* [https://www.kaggle.com/jamzing/sars-coronavirus-accession SARS coronavirus accession], Kaggle
** Exploration of mutations of the SARS corona virus with complete genome
* [https://3dprint.nih.gov/discover/3DPX-012867 Wuhan coronavirus 2019-nCoV protease homology model], National Institutes of Health



Revision as of 16:19, 28 March 2020

This is a repository for public data sets relating to the COVID-19 pandemic. It was initially also envisioned as a clearinghouse for matching requests for data cleaning of such datasets with volunteers willing to perform this cleaning. However, the existing clearinghouse at United Against COVID-19 is already up and running for this purpose, so we are redirecting such requests to that site in order not to fragment the pools of requests and volunteers.

For discussion of this project, see this blog post.

Data sets

Further contributions are very welcome. They can be made directly on this wiki page (after requesting an account), placed in the comments to this blog post, or sent by email to tao@math.ucla.edu.

Epidemiology

North America

Europe

Asia

Other regional data


Genomics and homology

Literature

Medical imagery

Other data

Data scrapers and aggregators

Visualizations and summaries

Other lists

Data or Data cleaning requests

As mentioned at the top of this page, future requests for data or data cleaning should be directed to this data discourse page at United Against COVID-19. Below are the legacy requests of this project prior to this redirect.

From Chris Strohmeier (UCLA), Mar 25

The biorxiv_medrxiv file at https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge contains another folder titled biorxiv_medrxiv, which in turn contains hundreds of json files. Each file corresponds to a research article, at least tangentially related to COVID-19.

We are requesting:

  • A tf-idf matrix for the subset of the above collection that contains full-text articles (some files appear to have only abstracts).
  • The rows should correspond to the (e.g. 5000) most commonly used words.
  • The columns should correspond to the individual JSON files.
  • The clean data should be stored as an npy or mat file (or both).
  • Finally, there should be a CSV or text document (or both) explaining the meaning of the individual rows and columns of the matrix (which word does each row correspond to? Which file does each column correspond to?).
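
One way the requested matrix could be produced is sketched below in Python, using only the standard library. It assumes that each JSON file follows the CORD-19 layout, with a `body_text` list of paragraph objects that each carry a `text` field. The function names, the tokenizer, and the unsmoothed weighting tf × log(N/df) are illustrative choices and not part of the request.

```python
import json
import math
import re
from collections import Counter
from pathlib import Path

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return TOKEN_RE.findall(text.lower())


def load_full_text(json_dir):
    """Return {file name: body text} for files that contain a non-empty body."""
    docs = {}
    for path in sorted(Path(json_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            record = json.load(fh)
        # Assumed schema: "body_text" is a list of {"text": ...} paragraphs.
        paragraphs = record.get("body_text") or []
        body = " ".join(p.get("text", "") for p in paragraphs).strip()
        if body:  # skip abstract-only entries
            docs[path.name] = body
    return docs


def tfidf_matrix(docs, vocab_size=5000):
    """Return (vocab, names, matrix) with one row per word and one column per file."""
    names = list(docs)
    counts = [Counter(tokenize(docs[name])) for name in names]

    # Vocabulary: the vocab_size most frequent words over the whole collection.
    total = Counter()
    for c in counts:
        total.update(c)
    vocab = [word for word, _ in total.most_common(vocab_size)]

    n_docs = len(names)
    lengths = [sum(c.values()) for c in counts]
    matrix = []
    for word in vocab:
        doc_freq = sum(1 for c in counts if word in c)
        idf = math.log(n_docs / doc_freq)  # unsmoothed inverse document frequency
        row = [
            (c[word] / n) * idf if n else 0.0
            for c, n in zip(counts, lengths)
        ]
        matrix.append(row)
    return vocab, names, matrix
```

The nested list can then be converted with `numpy.array(matrix)` and written with `numpy.save` (npy) or `scipy.io.savemat` (mat), while `vocab` and `names` supply the row and column legend for the accompanying CSV. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed idf by default, so their values will differ slightly from this sketch.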

Contact: c.strohmeier@math.ucla.edu

From Juan José Piñero de Armas (U. Católica de Murcia), Mar 27

We request information at the level of individual patients in order to perform survival analyses, regressions with random effects, etc. Some data exist, for instance, at

  • https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/data
  • https://www.kaggle.com/kimjihoo/coronavirusdataset
  • https://www.kaggle.com/imdevskp/covid-19-analysis-visualization-comparisons/data
  • https://www.sirm.org/category/senza-categoria/covid-19/

but we need much more detail (date when each person was diagnosed, date of infection for the same person, discharge date, date of death, gender, age, treatments, temperatures...), not just summaries or country-aggregated data.
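
To illustrate why per-person dates are needed, here is a minimal Kaplan-Meier sketch in Python. The record layout (diagnosis date, end date, death flag), the function name, and the example records are all made up for illustration. Treating discharged patients as censored observations is a simplifying assumption, not a recommendation.

```python
from datetime import date


def kaplan_meier(records):
    """Kaplan-Meier estimate from (diagnosis_date, end_date, died) tuples.

    end_date is the date of death if died is True, otherwise the date of
    discharge or last follow-up (treated as censoring in this sketch).
    Returns a list of (days_since_diagnosis, survival_probability) pairs,
    one per distinct time at which at least one death occurred.
    """
    durations = sorted(((end - start).days, died) for start, end, died in records)
    at_risk = len(durations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(durations):
        t = durations[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(durations) and durations[i][0] == t:
            deaths += durations[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical records, invented purely to show the input shape.
example = [
    (date(2020, 3, 1), date(2020, 3, 11), True),
    (date(2020, 3, 2), date(2020, 3, 12), False),
    (date(2020, 3, 1), date(2020, 3, 21), False),
    (date(2020, 3, 5), date(2020, 4, 4), True),
]
curve = kaplan_meier(example)
```

Country-aggregated counts cannot feed an estimator like this, because each step of the curve depends on how many individuals were still under observation at that time.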

Contact: jjpinero@ucam.edu