COVID-19 dataset clearinghouse: Difference between revisions

From Polymath Wiki
Line 56: Line 56:
* [https://www.kaggle.com/paultimothymooney/repository-of-coronavirus-genomes Repository of Coronavirus Genomes], Kaggle
* [https://3dprint.nih.gov/discover/3DPX-012867 Wuhan coronavirus 2019-nCoV protease homology model], National Institutes of Health
=== Literature ===
* [https://www.ncbi.nlm.nih.gov/research/coronavirus/ LitCovid] - a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus
* [https://connect.biorxiv.org/relate/content/181 COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv]


=== Other data ===
Line 81: Line 86:
* [https://covy.app/ covy.app]
* [https://ncov.dxy.cn/ncovh5/view/pneumonia?from=dxy&source=&link=&share= COVID-19 Global Pandemic Real-Time report], dxy.cn ([https://ncov.dxy.cn/ncovh5/view/en_pneumonia?from=dxy&source=&link=&share= English version])
* [https://www.ft.com/coronavirus-latest Coronavirus tracked: the latest figures as the pandemic spreads], Financial Times
* [https://www.mygov.in/covid-19/ COVID-19] - official Indian government site


=== Other lists ===
Line 108: Line 115:
== Miscellaneous links ==


* [https://www.ncbi.nlm.nih.gov/research/coronavirus/ LitCovid] - a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus
* [https://united-against-covid.org/ United Against COVID-19], which also crowdsources scientific and coding efforts to study the COVID-19 pandemic
* [https://connect.biorxiv.org/relate/content/181 COVID-19 SARS-CoV-2 preprints from medRxiv and bioRxiv]
* [https://www.mygov.in/covid-19/ COVID-19] - official Indian government site

Revision as of 07:06, 27 March 2020

Data cleaning proposal

Instructions for posting a request for a data set to be cleaned

Ideally, the submission should consist of a single plain text file that clearly states your request, specifying what the “cleaned” data set should contain and the format in which it should be saved (e.g. csv, npy, mat, json). The file should also link to a webpage where the raw data can easily be accessed or downloaded, with specific instructions for locating the data set on that page.

We do not yet have a platform for these requests, so for now please post them at the above blog post or email tao@math.ucla.edu.

Data sets

Epidemiology

North America

Other regional data

Genomics and homology

Literature

Other data

Data scrapers and aggregators

Visualizations and summaries

Other lists

Data cleaning requests

We do not yet have a platform to handle queries about, or submissions for, these cleaning requests, so for now please use the comment thread at this blog post.

From Chris Strohmeier (UCLA), Mar 25

The biorxiv_medrxiv folder at https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge contains another folder titled biorxiv_medrxiv, which in turn contains hundreds of json files. Each file corresponds to a research article that is at least tangentially related to COVID-19.
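As a rough illustration of reading this folder, the sketch below collects the text of every json file that has a non-empty body and skips the rest. It assumes (this is not stated in the request) that each file stores its paragraphs in a "body_text" list whose entries have a "text" field; the function name load_articles and the folder argument are hypothetical.

```python
import json
from pathlib import Path


def load_articles(folder):
    """Return {file name: full text} for json files with a non-empty body.

    Assumption: each file has a "body_text" list whose entries carry a
    "text" string. Files without body text (e.g. abstract only) are skipped.
    """
    articles = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as handle:
            record = json.load(handle)
        # Join the non-empty paragraphs of the article body, if any.
        paragraphs = [p.get("text", "") for p in record.get("body_text", [])]
        text = "\n".join(p for p in paragraphs if p.strip())
        if text:
            articles[path.name] = text
    return articles
```

Calling load_articles on a local copy of the inner biorxiv_medrxiv folder would give a dictionary keyed by file name, which is convenient for labelling matrix columns later.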

We are requesting:

  • A tf-idf matrix for the subset of the above collection that contains full-text articles (some files appear to have only abstracts).
  • The rows should correspond to the most commonly used words (e.g. the top 5000).
  • The columns should correspond to the individual json files.
  • The clean data should be stored as an npy or mat file (or both).
  • Finally, there should be a csv or text document (or both) explaining the meaning of the rows and columns of the matrix: which word each row corresponds to, and which file each column corresponds to.
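One possible way to produce such a matrix is sketched below, using only the Python standard library. It is not a prescribed method. The request does not fix a tokenizer or a tf-idf variant, so the sketch assumes lowercase alphabetic tokens, term frequency normalised by document length, and idf = log(N / df). The names tokenize, tfidf_matrix and write_labels are hypothetical, and the input is assumed to be a dictionary mapping each json file name to its full text.

```python
import csv
import math
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")  # assumption: words are runs of ASCII letters


def tokenize(text):
    return TOKEN.findall(text.lower())


def tfidf_matrix(documents, vocab_size=5000):
    """documents: {file name: text}. Returns (words, names, matrix), where
    matrix[i][j] is the tf-idf weight of words[i] in the file names[j]."""
    names = sorted(documents)
    counts = [Counter(tokenize(documents[name])) for name in names]
    lengths = [sum(c.values()) for c in counts]

    # Keep the vocab_size most common words over the whole collection,
    # breaking ties alphabetically so the row order is reproducible.
    total = Counter()
    for c in counts:
        total.update(c)
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    words = [word for word, _ in ranked[:vocab_size]]

    matrix = []
    for word in words:
        df = sum(1 for c in counts if word in c)  # document frequency, >= 1
        idf = math.log(len(names) / df)           # plain, unsmoothed idf
        row = [(c[word] / n if n else 0.0) * idf for c, n in zip(counts, lengths)]
        matrix.append(row)
    return words, names, matrix


def write_labels(words, names, path):
    """Write a csv explaining which word each row and which file each column is."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["axis", "index", "label"])
        for i, word in enumerate(words):
            writer.writerow(["row", i, word])
        for j, name in enumerate(names):
            writer.writerow(["column", j, name])
```

The nested list returned here could then be converted with numpy.array and written with numpy.save for an npy file, or passed in a dictionary to scipy.io.savemat for a mat file. With unsmoothed idf, a word that occurs in every file gets weight zero in every column; a smoothed variant may be preferable if that is not wanted.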

Contact: c.strohmeier@math.ucla.edu

Miscellaneous links