COVID-19 dataset clearinghouse

This is a repository for public data sets relating to the COVID-19 pandemic. It was also initially envisioned as a clearinghouse for matching requests for data cleaning of such data sets with volunteers willing to perform this cleaning. However, the existing clearinghouse at United Against COVID-19 is already up and running for this purpose, so we are redirecting such requests to that site in order not to fragment the pools of requests and volunteers.

For discussion of this project, see this blog post.

Data sets

Further contributions are very welcome. They can be made directly on this wiki page (after requesting an account), posted in the comments to this blog post, or sent by email to tao@math.ucla.edu.

Epidemiology

Novel Corona Virus 2019 Dataset - Day level information on covid-19 affected cases, Kaggle
Coronavirus Disease (COVID-19) – Statistics and Research, Our World in Data, by Max Roser, Hannah Ritchie and Esteban Ortiz-Ospina
- Cases + mortality rate daily
- How many tests for COVID-19 are being performed around the world?
Novel Coronavirus (COVID-19) Cases, Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
- Novel Coronavirus 2019 time series data on cases, sourced and cleaned from the above data set
- JSON Wrapper API for the JHU CSSE data
- A visualization of the JHU CSSE data
2019-nCoV Data Processing Pipelines and datasets
- Country and state names are normalized using ISO 3166-1 codes.
Location for summaries and analysis of data related to n-CoV 2019, first reported in Wuhan, China, Outbreak and Pandemic Preparedness team at the Institute for Health Metrics and Evaluation, University of Washington
- A visualization of one of the data sets
Daily data on the geographic distribution of COVID-19 cases worldwide, European Centre for Disease Prevention and Control
Google sheets from DXY.cn
- Contains some patient information (age, gender, etc.)
COVID-19 Complete Dataset (Updated every 24hrs), Kaggle
Epidemic Data for Novel Coronavirus COVID-19, Wolfram
Coronavirus COVID-19 Cases, ESRI
COVID-19 Coronavirus data, European Union Open Data Portal
Novel Coronavirus (2019 nCoV) situation reports, World Health Organization
nCoV line listings from various sources and data processing, Oxford University
US Health Weather Map, Kinsa Insights
- Cumulative number of atypical illnesses observed since March 1

North America

COVID Tracking Data, from the COVID tracking project
- A daily updated repository with CSV representations of data from the Covid Tracking API.
COVID-19 in USA, Kaggle
COVID-19 Cases US, ESRI
COVID-19 in US and Canada
- Data request form
COVID tracking project
- Includes positive and negative results, pending tests, and total people tested for each state in the US
- raw data
- API
Covid-19 coronavirus cases in New York State
COVID-19 Tracker, New York State Department of Health
Communicable Disease Surveillance Data, NYC Health Department Data Resources
Coronavirus Case Data for Every U.S. County, New York Times
- Github repository
- Interactive visualization
COVID-19 Coronavirus US Case Density over Time by State, using JHU CSSE data
Coronavirus API Public Health Initiative
- Record of official data from US government websites for the 50 states and DC
COVID-19 Daily Status Report, Georgia Department of Public Health
COVID statistics (Canada), Statistics Canada
FLATTEN (Canada)
- Online screening tool to provide information on COVID-19
The 2019 Novel Coronavirus (2019 nCoV), Status of cases in Ontario, Ontario Ministry of Health
Projections of COVID-19, in standardized format, The Reich Lab at UMass-Amherst

Europe

Influenzanet
Studying SARS-CoV-2 in European patients, Lean European Open Survey on SARS-CoV-2 Infected Patients (LEOSS)
COVID-19 Italia - Monitoraggio situazione
Sorveglianza integrata COVID-19: i principali dati nazionali (Italy), Epicentro
COVID-19 in Italy, Kaggle
RKI COVID19 (Germany), NPGEO Corona
Coronavirus API Deutschland (Germany), Digitaler Harz
COVID-19 Tracking Germany, Heads or Tails
New coronavirus: Current situation – Switzerland and international, Bundesamt für Gesundheit.
- data set
Situacion Actual (Spain), Ministerio de Sanidad, Consumo y Bienestar
Koronamonitor (Hungary), atlatszo.hu, atlo.team
Fr-SARS-CoV-2 (France), European Data Portal
Consolidation des données de sources officielles concernant l'épidémie de COVID19 (France)
Coronavirus (Netherlands), Rijksinstituut voor Volksgezondheid en Milieu (RIVM)
Covid 19 - заражени (Serbia), European Data Portal
Covid 19 - самоизолација (Serbia), European Data Portal
CovidCountyStatisticsHPSCIreland (Ireland), Health Protection Surveillance Centre

Asia

India COVID-19 tracker
- Patient database
Dataset on Novel Corona Virus Disease 2019 in India, Kaggle
COVID-19 Corona Virus India Dataset, Kaggle
- State/UT/NCR wise COVID-19 data
Data Science for COVID-19 in South Korea
- The data set on Kaggle
Press releases, Korea Centers for Disease Control and Prevention
COVID 19 South Korea, Sang Woo Park
COVID-19 Korea Dataset & Comprehensive Medical Dataset & visualizer, DS4C (Data Science for COVID-19) Project
#OpenData4Covid19, Ministry of Health and Welfare of Korea and Health Insurance Review and Assessment Service of Korea
- Medical history of COVID19 patients based on their insurance claims of the last five years.
Confirmed patient movement route, Korea Centers for Disease Control and Prevention
COVID-19 Philippines, Negros Island
Hubei early deaths 2020 07 02, Imperial College
Distribution of new coronavirus pneumonia, China CDC
Latest local situation of Severe Respiratory Disease associated with a Novel Infectious Agent (Hong Kong), Hong Kong Centre for Health Protection
Novel Coronavirus 2019 Pneumonia Situation (Thailand), Emergency Operation Center, Department of Disease Control

Other regional data

Coronavirus (COVID-19) - Brazil Dataset, Kaggle
COVID-19 (Brazil)
- Informational bulletins and coronavirus cases by municipality by day
Coronavirus In Sub-Saharan Africa, Geopoll
Latest updates on COVID-19, New South Wales

Genomics and homology

GISAID data (Global Initiative on Sharing All Influenza Data)
- Registration is required.
- Nextstrain build for novel coronavirus (nCoV), based on GISAID data
  - A Genomic epidemiology of novel coronavirus
Coronavirus Genome Sequence, Kaggle
Repository of Coronavirus Genomes, Kaggle
SARS coronavirus accession, Kaggle
- Exploration of mutations of the SARS corona virus with complete genome
Genetic Sequences for the SARS-CoV-2 Coronavirus, Wolfram
- Nucleotide sequences of the SARS-CoV-2 virus (the virus associated with the COVID-19 disease, formerly known as 2019-nCoV) including location, collection time and similar supporting data.
Wuhan coronavirus 2019-nCoV protease homology model, National Institutes of Health

Literature

LitCovid - a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus
COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv
BioMed Sanity, karpathy
- Indexing bioRxiv papers on COVID-19
COVID-19 Open Research Dataset (CORD-19), Allen Institute for AI, Microsoft, NLM, CZI, Georgetown University
- Over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community
- requested by the White House Office of Science and Technology Policy, and part of the Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset
COVID-19: a living systematic map of the evidence, EPPI

Experts

COVID-19 Expert Database, Australian Academy of Science
- A mechanism for governments, the business sector, the research sector, and other decision-makers to easily access the expertise they need to inform their decision making.

Data scrapers and aggregators

Visualizations, projections, summaries

COVID-19 Coronavirus Pandemic, Worldometer
Tracking coronavirus: Map, data and timeline, BNO News
Coronavirus COVID-19 Global Cases, JHU CSSE
Infection2020
covy.app
CoronaTracker
COVID-19 Global Pandemic Real-Time report, dxy.cn (English version)
Coronavirus tracked: the latest figures as the pandemic spreads, Financial Times
COVID-19 - official Indian government site
COVID-19 - Analysis, Visualization & Comparisons, Kaggle
COVID Act Now - predictions of COVID cases in the US by state
- The model used
COVID-19 Projections, IHME
An interactive visualization of the exponential spread of COVID-19
CoVID 19 Worldwide Growth Rates, Mike Handley, UCL
Corona.help, Alex Dumitru
Covid trends, Aatish Bhatia, Minute Physics
COVID-19 Time Exploration

Other lists, hubs, and groups

COVID-19 data sets, Kaggle
Reddit thread collecting coronavirus datasets
Review of COVID-19 APIs, Wendell Santos
NPGEO Corona Hub 2020, Nationale Plattform für geografische Daten (NPGEO)
Data sets for COVID, Wolfram Data Repository
COVID-19 Data Hub, Tableau
COVID-19 GIS Hub, ESRI
Coronavirus (COVID-19) Data, ESRI
- This is a collection of data that is available from the Esri Living Atlas as well as data from authoritative sources.
COVID-19 information
Coronavirus Tech Handbook
European Data Portal for COVID-19
Call for Action: COVID-19 Data Collaboratives
Possible Covid datasets
COVID-19 Data Providers, Amass Insights
COVID-19 Pandemic Symptom Trackers, Alan Turing Institute
Wuhan2020
- a real-time and synchronous data service for hospitals, factories, procurement and other information
COVID-19 Data Exchange, Dawex
EndCoronavirus
Online Portal for COVID-19 Modeling Research, MIDAS
COVID-19 Public Datasets, Google Cloud

Data or Data cleaning requests

As mentioned at the top of this page, future requests for data or data cleaning should be directed to this data discourse page at United Against COVID-19. Below are the legacy requests of this project prior to this redirect.

From Chris Strohmeier (UCLA), Mar 25

The biorxiv_medrxiv file at https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge contains another folder titled biorxiv_medrxiv, which in turn contains hundreds of json files. Each file corresponds to a research article, at least tangentially related to COVID-19.

We are requesting:

A tf-idf matrix for the subset of the above collection consisting of full-text articles (some files appear to contain only abstracts).
The rows should correspond to the most commonly used words (e.g. the top 5000).
The columns should correspond to the individual json files.
The clean data should be stored as an npy or mat file (or both).
Finally, there should be a csv or text document (or both) explaining the meaning of the rows and columns of the matrix: which word each row corresponds to, and which file each column corresponds to.
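As an illustration only, the request above could be approached along the following lines. This is a minimal sketch, not a delivered solution. It assumes that each json file has a `paper_id` field and a `body_text` list of paragraphs with a `text` field, and that an article counts as full-text when `body_text` is non-empty. The function names, the tokenizer, and the tf-idf weighting (term frequency divided by document length, times log(N/df)) are hypothetical choices; other conventions are equally common.

```python
import json
import math
import re
from collections import Counter
from pathlib import Path


def load_papers(folder):
    """Read every .json file in `folder` into a list of dicts."""
    return [json.loads(p.read_text(encoding="utf-8"))
            for p in sorted(Path(folder).glob("*.json"))]


def tokenize(text):
    """Lower-case the text and keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", text.lower())


def full_text(paper):
    """Join the body paragraphs (assumed field: body_text[i]['text'])."""
    return " ".join(p.get("text", "") for p in paper.get("body_text", []))


def tfidf_matrix(papers, vocab_size=5000):
    """Return (matrix, vocab, ids): rows are words, columns are papers."""
    docs = [(p["paper_id"], tokenize(full_text(p))) for p in papers]
    docs = [(pid, toks) for pid, toks in docs if toks]  # drop abstract-only files

    counts = [Counter(toks) for _, toks in docs]
    total, doc_freq = Counter(), Counter()
    for c in counts:
        total.update(c)            # corpus-wide word counts
        doc_freq.update(c.keys())  # number of papers containing each word

    # Most commonly used words; ties are broken alphabetically.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [w for w, _ in ranked[:vocab_size]]

    n = len(docs)
    lengths = [sum(c.values()) for c in counts]
    matrix = []
    for w in vocab:
        idf = math.log(n / doc_freq[w])
        matrix.append([c[w] / length * idf for c, length in zip(counts, lengths)])
    return matrix, vocab, [pid for pid, _ in docs]
```

Here `vocab` labels the rows and `ids` labels the columns, so writing each of them to a csv or text file would give the requested explanation of the matrix. The nested list could be converted with `numpy.array` and stored with `numpy.save` to obtain an npy file; that step is left out so that the sketch depends only on the standard library.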

Contact: c.strohmeier@math.ucla.edu

From Juan José Piñero de Armas (U. Católica de Murcia), Mar 27

We request per-person information to perform survival analyses, regressions with random effects, etc. Some data exists, for instance, at

https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/data
https://www.kaggle.com/kimjihoo/coronavirusdataset
https://www.kaggle.com/imdevskp/covid-19-analysis-visualization-comparisons/data
https://www.sirm.org/category/senza-categoria/covid-19/

but we need much more detail (date when each person was diagnosed, date of infection for the same person, discharge date, date of death, gender, age, treatments, temperatures...) not just summaries or country-aggregated data.

Contact: jjpinero@ucam.edu

Miscellaneous

Rapid assistance in modelling the pandemic: RAMP
- A call for assistance, addressed to the scientific modelling community and coordinated by the Royal Society
Letter on the Coronavirus Disease 2019 (COVID-19), National Science Foundation
- A solicitation for RAPID funding requests relating to COVID-19
CoronaCheck: Computational Fact Checking for Statistical Coronavirus Claims, Paolo Papotti (EURECOM), Immanuel Trummer (Cornell)
COVID-19 Wikiproject
Help with COVID
- New or established projects helping with the COVID-19 crisis that need help
COVID-19 Solutions, Airtable
