COVID-19 dataset clearinghouse: Difference between revisions

From Polymath Wiki
 
(9 intermediate revisions by the same user not shown)
Line 16: Line 16:
** [https://github.com/datasets/covid-19 Novel Coronavirus 2019 time series data on cases], sourced and cleaned from the above data set
** [https://github.com/summyfeb12/COVID-19-JHU-Data-API JSON Wrapper API for the JHU CSSE data]
** [http://shiny.science.ku.dk/pbm/COVID19/ A visualization of the JHU CSSE data]
* [https://github.com/covid19-data/covid19-data 2019-nCoV Data Processing Pipelines and datasets]   
** Country and state names are normalized using ISO 3166-1 codes.
Line 45: Line 46:
** [https://covidtracking.com/api/ API]
* [https://github.com/kgjenkins/covid-19-ny Covid-19 coronavirus cases in New York State]
* [https://covid19tracker.health.ny.gov/ COVID-19 Tracker], New York State Department of Health
* [https://a816-hrt.nyc.gov/DataCatalog/Pages/DataView.cshtml?id=5ff1078d-8e34-43de-b339-7556dd09b5b2 Communicable Disease Surveillance Data], NYC Health Department Data Resources
* [https://www.nytimes.com/article/coronavirus-county-data-us.html Coronavirus Case Data for Every U.S. County], New York Times
Line 53: Line 55:
** Record of official data from US government websites for the 50 states and DC
* [https://dph.georgia.gov/covid-19-daily-status-report COVID-19 Daily Status Report], Georgia Department of Public Health
* [https://www150.statcan.gc.ca/n1/en/type/data?text=COVID COVID statistics] (Canada), Statistics Canada
* [https://fr.flatten.ca/ FLATTEN] (Canada)
** Online screening tool to provide information on COVID-19
* [https://github.com/midas-network/COVID-19/tree/master/data/cases/canada/ontario_situation_updates/ The 2019 Novel Coronavirus (2019 nCoV), Status of cases in Ontario], Ontario Ministry of Health
* [https://github.com/reichlab/covid19-forecast-hub Projections of COVID-19, in standardized format], The Reich Lab at UMass-Amherst


==== Europe ====
Line 64: Line 70:
* [https://npgeo-corona-npgeo-de.hub.arcgis.com/search?groupIds=b28109b18022405bb965c602b13e1bbc RKI COVID19] (Germany), NPGEO Corona  
* [https://coronavirus.digitaler-harz.de/ Coronavirus API Deutschland] (Germany), Digitaler Harz
* [https://www.kaggle.com/headsortails/covid19-tracking-germany COVID-19 Tracking Germany], Heads or Tails
* [https://www.bag.admin.ch/bag/en/home/krankheiten/ausbrueche-epidemien-pandemien/aktuelle-ausbrueche-epidemien/novel-cov/situation-schweiz-und-international.html New coronavirus: Current situation – Switzerland and international], Bundesamt für Gesundheit.
** [https://www.bag.admin.ch/dam/bag/de/dokumente/mt/k-und-i/aktuelle-ausbrueche-pandemien/2019-nCoV/covid-19-datengrundlage-lagebericht.xlsx.download.xlsx/200325_Datengrundlage_Grafiken_COVID-19-Bericht.xlsx data set]
Line 86: Line 93:
* [https://www.cdc.go.kr/board/board.es?mid=a30402000000&bid=0030 Press releases], Korea Centers for Disease Control and Prevention
* [https://github.com/midas-network/COVID-19/tree/master/data/cases/south%20korea/line_list_park_github/ COVID 19 South Korea], Sang Woo Park
* [https://github.com/ThisIsIsaac/Data-Science-for-COVID-19 COVID-19 Korea Dataset & Comprehensive Medical Dataset & visualizer], DS4C (Data Science for COVID-19) Project
* [https://hira-covid19.net/ #OpenData4Covid19], Ministry of Health and Welfare of Korea and Health Insurance Review and Assessment Service of Korea
** Medical history of COVID-19 patients, based on their insurance claims from the last five years.
* [https://github.com/midas-network/COVID-19/tree/master/data/cases/south%20korea/confirmed_cases_movement/ Confirmed patient movement route], Korean Centers for Disease Control
* [https://covid19ph.com/ COVID-19 Philippines], Negros Island
Line 125: Line 135:
** requested by the White House Office of Science and Technology Policy, and part of the [https://www.whitehouse.gov/briefings-statements/call-action-tech-community-new-machine-readable-covid-19-dataset/ Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset]
* [http://eppi.ioe.ac.uk/cms/Projects/DepartmentofHealthandSocialCare/Publishedreviews/COVID-19Livingsystematicmapoftheevidence/tabid/3765/Default.aspx COVID-19: a living systematic map of the evidence], EPPI
=== Experts ===
* [https://www.science.org.au/covid19/experts COVID-19 Expert Database], Australian Academy of Science
** A mechanism for governments, the business sector, the research sector, and other decision-makers to easily access the expertise they need to inform their decision making.


=== Medical imagery and records ===
Line 175: Line 190:


=== Visualizations, projections, summaries ===
* [https://www.worldometers.info/coronavirus/ COVID-19 Coronavirus Pandemic], Worldometer
* [https://bnonews.com/index.php/2020/03/the-latest-coronavirus-cases/ Tracking coronavirus: Map, data and timeline], BNO News
Line 193: Line 207:
* [https://corona.help/ Corona.help], Alex Dumitru
* [https://aatishb.com/covidtrends/ Covid trends], Aatish Bhatia, Minute Physics
* [https://hidden-fjord-23808.herokuapp.com/ COVID-19 Time Exploration]


=== Other lists, hubs, and groups ===

Latest revision as of 19:03, 26 April 2020

This is a repository for public data sets relating to the COVID-19 pandemic. It was initially also envisioned as a clearinghouse for matching requests for data cleaning of such datasets with volunteers willing to perform the cleaning. However, the existing clearinghouse at United Against COVID-19 is already up and running for this purpose, so we are redirecting such requests to that site in order not to fragment the pools of requests and volunteers.

For discussion of this project, see this blog post.

Data sets

Further contributions are very welcome. They can be made directly on this wiki page (after requesting an account), in the comments to this blog post, or by email to tao@math.ucla.edu.

Epidemiology

North America

Europe

Asia

Other regional data

Genomics and homology

Literature

Experts

  • COVID-19 Expert Database, Australian Academy of Science
    • A mechanism for governments, the business sector, the research sector, and other decision-makers to easily access the expertise they need to inform their decision making.

Medical imagery and records

Healthcare, vaccine development and equipment

Social and traffic data

Economic and Policy

Data scrapers and aggregators

Visualizations, projections, summaries

Other lists, hubs, and groups

Data or Data cleaning requests

As mentioned at the top of this page, future requests for data or data cleaning should be directed to this data discourse page at United Against COVID-19. Below are the legacy requests of this project prior to this redirect.

From Chris Strohmeier (UCLA), Mar 25

The biorxiv_medrxiv file at https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge contains another folder titled biorxiv_medrxiv, which in turn contains hundreds of json files. Each file corresponds to a research article, at least tangentially related to COVID-19.

We are requesting:

  • A tf-idf matrix for the subset of the above collection consisting of full-text articles (some files appear to contain only abstracts).
  • The rows should correspond to the (e.g. 5000) most commonly used words.
  • The columns should correspond to the individual json files.
  • The clean data should be stored as an npy or mat file (or both).
  • Finally, there should be a csv or text document (or both) explaining the meaning of the individual rows and columns of the matrix (which word does each row correspond to? Which file does each column correspond to?).

Contact: c.strohmeier@math.ucla.edu
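
A minimal sketch of how the requested matrix could be assembled, using only the Python standard library. It is an illustration under stated assumptions, not the requester's method: the helper names (extract_full_text, tfidf_matrix, legend_csv) are hypothetical, the json layout (a "body_text" list of objects with a "text" field) is an assumption about the CORD-19 files, and the weighting is the plain tf × log(N/df) form (other tools often use smoothed variants).

```python
import csv
import io
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+")


def extract_full_text(paper):
    """Join the body paragraphs of one parsed json file.

    ASSUMPTION: each file has a "body_text" list of {"text": ...} objects.
    Returns "" when there is no body (e.g. abstract-only files).
    """
    return " ".join(p.get("text", "") for p in paper.get("body_text", []))


def tfidf_matrix(docs, vocab_size=5000):
    """docs maps file name -> full text.

    Returns (words, doc_ids, matrix), where matrix[i][j] is the tf-idf
    weight of words[i] (row) in doc_ids[j] (column).
    """
    doc_ids = sorted(docs)
    counts = [Counter(TOKEN_RE.findall(docs[d].lower())) for d in doc_ids]
    lengths = [sum(c.values()) for c in counts]

    total = Counter()
    for c in counts:
        total.update(c)
    # Most commonly used words; ties are broken alphabetically.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    words = [w for w, _ in ranked[:vocab_size]]

    n_docs = len(doc_ids)
    matrix = []
    for w in words:
        df = sum(1 for c in counts if w in c)  # documents containing w
        idf = math.log(n_docs / df)            # unsmoothed idf
        matrix.append([
            (c[w] / n) * idf if n else 0.0
            for c, n in zip(counts, lengths)
        ])
    return words, doc_ids, matrix


def legend_csv(words, doc_ids):
    """CSV text explaining which word each row and which file each column is."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["axis", "index", "label"])
    for i, w in enumerate(words):
        writer.writerow(["row", i, w])
    for j, d in enumerate(doc_ids):
        writer.writerow(["column", j, d])
    return buf.getvalue()
```

To use it, one would load each json file with json.load, keep only the files for which extract_full_text returns a non-empty string, and pass the resulting dictionary to tfidf_matrix. The matrix could then be written to an npy file with numpy.save, and the output of legend_csv saved alongside it.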

From Juan José Piñero de Armas (U. Católica de Murcia), Mar 27

We request information on a per-person basis to perform survival analyses, regressions with random effects, etc. Some data exists, for instance at

  • https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/data
  • https://www.kaggle.com/kimjihoo/coronavirusdataset
  • https://www.kaggle.com/imdevskp/covid-19-analysis-visualization-comparisons/data
  • https://www.sirm.org/category/senza-categoria/covid-19/

but we need much more detail (date when each person was diagnosed, date of infection for the same person, discharge date, date of death, gender, age, treatments, temperatures...) not just summaries or country-aggregated data.

Contact: jjpinero@ucam.edu
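
To illustrate why per-person dates matter for this request, here is a small sketch of a record layout and a Kaplan-Meier survival estimate built from it. Everything named here (PatientRecord, to_duration_event, kaplan_meier, the study_end cutoff) is hypothetical and not part of the request. Treating discharge as censoring is a simplifying assumption; a real analysis might model it as a competing outcome.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientRecord:
    """One row per person (hypothetical layout based on the fields requested)."""
    diagnosed: date
    age: int
    gender: str
    infected: Optional[date] = None
    discharged: Optional[date] = None
    died: Optional[date] = None


def to_duration_event(rec, study_end):
    """Return (days since diagnosis, death_observed).

    ASSUMPTION: death is the event of interest; discharge or reaching
    study_end without an outcome censors the observation.
    """
    if rec.died is not None:
        return (rec.died - rec.diagnosed).days, True
    end = rec.discharged if rec.discharged is not None else study_end
    return (end - rec.diagnosed).days, False


def kaplan_meier(pairs):
    """Kaplan-Meier estimate: list of (t, S(t)) at each time with a death."""
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    for t in sorted({d for d, _ in pairs}):
        deaths = sum(1 for d, e in pairs if d == t and e)
        leaving = sum(1 for d, _ in pairs if d == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve
```

Country-level totals cannot feed an estimator like this, because each step needs to know how many individuals were still under observation at a given time since diagnosis.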

Miscellaneous