Covid-19 Open Research Dataset Of Coronavirus Research Papers
The Covid-19 Open Research Dataset (CORD-19) has been released as a collaborative effort by researchers across the world. It includes over 24,000 research publications on COVID-19 and related coronaviruses, drawn from preprint servers such as medRxiv and bioRxiv as well as from peer-reviewed journals. The dataset is the most extensive collection of research publications related to the current coronavirus pandemic, and it will continue to be updated as new publications are released.
The National Library of Medicine (NLM) at the National Institutes of Health, Microsoft, and the research nonprofit Allen Institute for Artificial Intelligence (AI2) compiled the dataset at the request of the White House Office of Science and Technology Policy (OSTP). NLM provided access to existing scientific publications, Microsoft identified relevant articles using its literature curation algorithms, and AI2 converted the PDFs and webpages into a structured, machine-readable format. The dataset can be accessed through AI2’s Semantic Scholar website.
AI2 has processed the new dataset using the same information extraction and analysis techniques that it applies to all new research. These techniques surface key details such as the data, methods, citations, and authors of each paper, so that scientists can quickly evaluate how it adds to the existing research.
To map out the similarities between papers, AI2 is also using state-of-the-art natural-language models such as BERT and ELMo.
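One common way to compare papers with models of this kind is to represent each paper as an embedding vector and rank other papers by cosine similarity. The sketch below illustrates that general idea only; it is not AI2's pipeline. The paper IDs and three-dimensional vectors are made-up stand-ins for the much larger embeddings a model such as BERT or ELMo would produce, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank every other paper by similarity to the query paper."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real ones would come from a language model.
toy_embeddings = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
}

ranked = most_similar("paper_a", toy_embeddings)
for paper_id, score in ranked:
    print(f"{paper_id}: {score:.3f}")
```

Here paper_b ranks above paper_c for the query paper_a because its vector points in nearly the same direction, which is the sense in which embedding-based methods treat two papers as similar.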
Scientists are rushing to understand the nature of the novel coronavirus in hopes of controlling its spread. The dataset not only consolidates existing research in one place but also makes that research easier to search with natural-language processing algorithms.
The new dataset of coronavirus research papers was developed to accelerate scientific research and help the world respond to the COVID-19 pandemic as quickly as possible.