COVID-19 dataset clearinghouse: Difference between revisions

From Polymath Wiki
Line 11: Line 11:
* [https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset Novel Corona Virus 2019 Dataset - Day level information on covid-19 affected cases], Kaggle
* [https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset Novel Corona Virus 2019 Dataset - Day level information on covid-19 affected cases], Kaggle
* [https://ourworldindata.org/coronavirus Coronavirus Disease (COVID-19) – Statistics and Research], Our World in Data, by Max Roser, Hannah Ritchie and Esteban Ortiz-Ospina
* [https://ourworldindata.org/coronavirus Coronavirus Disease (COVID-19) – Statistics and Research], Our World in Data, by Max Roser, Hannah Ritchie and Esteban Ortiz-Ospina
** [https://ourworldindata.org/coronavirus?fbclid=IwAR0d1z_t-W9OQVDE08wMbR3-RX-XPCaogFM5hUmFMqL2QZDgY9x-R2CRErE Cases + mortality rate daily]
* [https://github.com/CSSEGISandData/COVID-19 Novel Coronavirus (COVID-19) Cases], Johns Hopkins University Center for Systems Science and Engineering
* [https://github.com/CSSEGISandData/COVID-19 Novel Coronavirus (COVID-19) Cases], Johns Hopkins University Center for Systems Science and Engineering
** [https://github.com/datasets/covid-19 Novel Coronavirus 2019 time series data on cases], sourced and cleaned from the above data set
** [https://github.com/datasets/covid-19 Novel Coronavirus 2019 time series data on cases], sourced and cleaned from the above data set
Line 22: Line 23:
* [https://www.kaggle.com/imdevskp/corona-virus-report COVID-19 Complete Dataset (Updated every 24hrs)], Kaggle
* [https://www.kaggle.com/imdevskp/corona-virus-report COVID-19 Complete Dataset (Updated every 24hrs)], Kaggle
* [https://datarepository.wolframcloud.com/resources/Epidemic-Data-for-Novel-Coronavirus-COVID-19 Epidemic Data for Novel Coronavirus COVID-19], Wolfram  
* [https://datarepository.wolframcloud.com/resources/Epidemic-Data-for-Novel-Coronavirus-COVID-19 Epidemic Data for Novel Coronavirus COVID-19], Wolfram  
* [https://coronavirus-disasterresponse.hub.arcgis.com/datasets/bbb2e4f589ba40d692fab712ae37b9ac Coronavirus COVID-19 Cases], ESRI


==== North America ====
==== North America ====
Line 28: Line 30:
** A daily updated repository with CSV representations of data from the [https://github.com/COVID19Tracking/covid-tracking-api/blob/master/README.md Covid Tracking API].   
** A daily updated repository with CSV representations of data from the [https://github.com/COVID19Tracking/covid-tracking-api/blob/master/README.md Covid Tracking API].   
* [https://www.kaggle.com/sudalairajkumar/covid19-in-usa COVID-19 in USA], Kaggle
* [https://www.kaggle.com/sudalairajkumar/covid19-in-usa COVID-19 in USA], Kaggle
* [https://coronavirus-disasterresponse.hub.arcgis.com/datasets/628578697fb24d8ea4c32fa0c5ae1843_0?geometry=110.290%2C-19.609%2C-134.749%2C71.199 COVID-19 Cases US], ESRI
* [https://coronavirus.1point3acres.com/en COVID-19 in US and Canada]
* [https://coronavirus.1point3acres.com/en COVID-19 in US and Canada]
** [https://coronavirus.1point3acres.com/en/data Data request form]
** [https://coronavirus.1point3acres.com/en/data Data request form]
Line 91: Line 94:
* [https://www.kaggle.com/darshan1504/covid19-detection-xray-dataset COVID-19 Detection X-Ray Dataset], Kaggle
* [https://www.kaggle.com/darshan1504/covid19-detection-xray-dataset COVID-19 Detection X-Ray Dataset], Kaggle
* [https://www.sirm.org/category/senza-categoria/covid-19/ COVID-19: casistica radiologica Italiana], Società Italiana di Radiologia Medica e Interventistica
* [https://www.sirm.org/category/senza-categoria/covid-19/ COVID-19: casistica radiologica Italiana], Società Italiana di Radiologia Medica e Interventistica
=== Healthcare ===
* [https://coronavirus-disasterresponse.hub.arcgis.com/datasets/definitivehc::definitive-healthcare-usa-hospital-beds Definitive Healthcare: USA Hospital Beds], ESRI
* [https://www.arcgis.com/home/webmap/viewer.html?webmap=6afcaeb7549f4390b07224a0be01b3a6 COVID-19 Provider Practice Locations], ArcGIS.


=== Other data ===
=== Other data ===
Line 121: Line 130:
* [https://covidactnow.org/ COVID Act Now] - predictions of COVID cases in the US by state
* [https://covidactnow.org/ COVID Act Now] - predictions of COVID cases in the US by state
** [https://covidactnow.org/model The model used]
** [https://covidactnow.org/model The model used]
* [https://91-divoc.com/pages/covid-visualization/?fbclid=IwAR3vdyvNKRRvfw1t_xEXMwfEO4WMA-sOEoiSF_-w5lH8aDRJMR28vcOm2J8 An interactive visualization of the exponential spread of COVID-19]


=== Other lists ===
=== Other lists ===
Line 130: Line 140:
* [https://datarepository.wolframcloud.com/search?i=COVID Data sets for COVID], Wolfram Data Repository
* [https://datarepository.wolframcloud.com/search?i=COVID Data sets for COVID], Wolfram Data Repository
* [https://www.tableau.com/covid-19-coronavirus-data-resources COVID-19 Data Hub], Tableau
* [https://www.tableau.com/covid-19-coronavirus-data-resources COVID-19 Data Hub], Tableau
* [https://coronavirus-disasterresponse.hub.arcgis.com/ COVID-19 GIS Hub], ESRI


== Data or Data cleaning requests ==
== Data or Data cleaning requests ==

Revision as of 10:35, 29 March 2020

This is a repository for public data sets relating to the COVID-19 pandemic. It was also initially envisioned as a clearinghouse for matching requests for data cleaning of such data sets with volunteers willing to perform this cleaning. However, the existing clearinghouse at United Against COVID-19 is already up and running for this purpose, so we are redirecting such requests to that site in order not to fragment the pools of requests and volunteers.

For discussion of this project, see this blog post.

Data sets

Further contributions are very welcome. They can be made directly on this wiki page (after requesting an account), posted in the comments to this blog post, or sent by email to tao@math.ucla.edu.

Epidemiology

North America

Europe

Asia

Other regional data

Genomics and homology

Literature

Medical imagery and records

Healthcare


Other data

Data scrapers and aggregators

Visualizations and summaries

Other lists

Data or Data cleaning requests

As mentioned at the top of this page, future requests for data or data cleaning should be directed to this data discourse page at United Against COVID-19. Below are the legacy requests of this project prior to this redirect.

From Chris Strohmeier (UCLA), Mar 25

The biorxiv_medrxiv file at https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge contains a folder, also titled biorxiv_medrxiv, which in turn contains hundreds of JSON files. Each file corresponds to a research article that is at least tangentially related to COVID-19.

We are requesting:

  • A tf-idf matrix for the subset of the above collection that contains full-text articles (some files appear to have only abstracts).
  • The rows should correspond to the most commonly used words (e.g. the top 5000).
  • The columns should correspond to the individual JSON files.
  • The cleaned data should be stored as a .npy or .mat file (or both).
  • Finally, there should be a CSV or text document (or both) explaining the rows and columns of the matrix: which word each row corresponds to, and which file each column corresponds to.
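
The matrix described above could be built roughly as follows. This is only a minimal sketch, not the requester's method: it assumes each parsed JSON article stores its paragraphs under a "body_text" list of {"text": ...} entries, it uses a deliberately crude tokenizer, and it uses one common tf-idf variant (relative term frequency times log(N/df)). The function names, the file names a.json, b.json, c.json and the toy articles are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def full_text(article):
    """Join the body paragraphs of one parsed JSON article.

    ASSUMPTION: paragraphs live in a "body_text" list of {"text": ...} dicts.
    """
    return " ".join(p.get("text", "") for p in article.get("body_text", []))


def tfidf_matrix(articles, vocab_size=5000):
    """Return (vocab, doc_ids, matrix) with rows = words and columns = files."""
    # Drop articles without body text (e.g. abstract-only entries).
    docs = {doc_id: tokenize(full_text(a)) for doc_id, a in articles.items()}
    docs = {doc_id: toks for doc_id, toks in docs.items() if toks}
    doc_ids = sorted(docs)

    counts = {d: Counter(docs[d]) for d in doc_ids}
    total = Counter()
    for c in counts.values():
        total.update(c)

    # Most frequent words in the corpus; ties are broken alphabetically.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [word for word, _ in ranked[:vocab_size]]

    n_docs = len(doc_ids)
    matrix = []
    for word in vocab:
        df = sum(1 for d in doc_ids if word in counts[d])  # document frequency
        idf = math.log(n_docs / df)
        matrix.append([counts[d][word] / len(docs[d]) * idf for d in doc_ids])
    return vocab, doc_ids, matrix


# Toy input standing in for the parsed JSON files (made up for illustration).
articles = {
    "a.json": {"body_text": [{"text": "virus spreads virus"}]},
    "b.json": {"body_text": [{"text": "vaccine trial"}]},
    "c.json": {"abstract": [{"text": "abstract only"}], "body_text": []},
}
vocab, doc_ids, matrix = tfidf_matrix(articles, vocab_size=3)
```

Here vocab labels the rows and doc_ids labels the columns, so writing those two lists to a CSV file would give the requested legend. To store the matrix itself, one option is numpy.save on the matrix converted to an array for a .npy file, or scipy.io.savemat for a .mat file.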

Contact: c.strohmeier@math.ucla.edu

From Juan José Piñero de Armas (U. Católica de Murcia), Mar 27

We request per-person information in order to perform survival analyses, regressions with random effects, etc. Some data exists, for instance, at

  • https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/data
  • https://www.kaggle.com/kimjihoo/coronavirusdataset
  • https://www.kaggle.com/imdevskp/covid-19-analysis-visualization-comparisons/data
  • https://www.sirm.org/category/senza-categoria/covid-19/

but we need much more detail for each person (date of diagnosis, date of infection, discharge date, date of death, gender, age, treatments, temperatures...), not just summaries or country-aggregated data.
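
To illustrate why per-person dates matter for survival analysis, here is a minimal sketch of one possible record layout and of how a (duration, event) pair could be derived from it. The class name, field names and example records are hypothetical and do not come from any of the data sets above. Treating discharged or still-hospitalized patients as right-censored is a modelling assumption, not a recommendation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class PatientRecord:
    """One row of hypothetical person-level data."""
    diagnosed: date
    infected: Optional[date] = None
    discharged: Optional[date] = None
    died: Optional[date] = None
    gender: Optional[str] = None
    age: Optional[int] = None


def survival_observation(rec: PatientRecord, study_end: date) -> Tuple[int, bool]:
    """Return (days since diagnosis, death_observed) for one patient.

    ASSUMPTION: patients without a recorded death are right-censored at
    discharge if known, otherwise at the end of the study period.
    """
    if rec.died is not None:
        return (rec.died - rec.diagnosed).days, True
    last_seen = rec.discharged if rec.discharged is not None else study_end
    return (last_seen - rec.diagnosed).days, False


# Made-up records, for illustration only.
study_end = date(2020, 3, 29)
records = [
    PatientRecord(diagnosed=date(2020, 3, 1), died=date(2020, 3, 11), age=81),
    PatientRecord(diagnosed=date(2020, 3, 1), discharged=date(2020, 3, 15), age=45),
    PatientRecord(diagnosed=date(2020, 3, 20), age=60),
]
observations = [survival_observation(r, study_end) for r in records]
```

Pairs of this form are the usual input to survival estimators, which is why summaries or country-level aggregates are not sufficient for this kind of analysis.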

Contact: jjpinero@ucam.edu