By Tiffany Fox
San Diego, Calif., July 8, 2016 -- Researchers at UC San Diego have created a new online tool that will help biomedical scientists find and access datasets from multiple sources that can be reused to accelerate scientific discoveries.
Biomedical investigators generate large amounts of data every day, from experimental data produced by clinical trials and molecular studies to video and images derived from radiology or dermatology studies, as well as other clinical data, such as physiological measurements and diagnoses. Complicating matters, those seeking to use this data must know how to access it, whether by simply downloading it or by working with their institutions to sign agreements with the data owners.
The newly created tool, called DataMed, is a Data Discovery Index (DDI) that will make it easier for researchers or citizen scientists to find and access this data. The hope is that it will lead to faster discoveries and more ways to closely monitor the efficacy of interventions to prevent or cure disease. It was developed through the Biomedical and Healthcare Data Discovery Indexing Ecosystem (bioCADDIE) project, which is based at the UC San Diego Department of Medical Bioinformatics. The name bioCADDIE evokes a golf caddie: the project was created to help data producers, disseminators, and consumers “play their game” in the new era of team science.
“Finding these data so that they could be reused to accelerate discoveries is currently not easy,” said Professor Lucila Ohno-Machado, an affiliate of the Qualcomm Institute, principal investigator of the bioCADDIE project and chair of the Department of Medical Bioinformatics at UC San Diego. “Building a data discovery index is also not easy because there are no general rules about how to describe the data, such as what ‘metadata’ should be used to characterize what a data set is about, who produced the data, how and when, etc.
“The bioCADDIE project,” she continued, “engaged the scientific community and developed metadata specifications and a prototype search engine to find datasets of interest. Investigators benefit from using this tool to find biomedical data that can be accessed under different types of controls.”
The National Institutes of Health (NIH) provided funding for DataMed as part of its NIH BD2K Commons, an emerging interconnected digital ecosystem of resources around data and other research digital objects. NIH is now asking researchers to provide feedback and evaluations of DataMed to help shape the prototype so that it benefits all intended users and contributes to further advances in biomedical research and patient care.
The collective feedback will also help the DataMed development team establish optimal approaches to finding and accessing biomedical data, concepts that are part of the NIH-supported FAIR principles.
“In order to guarantee that data in the commons will have broad impact, all data must adhere to the FAIR principles - a set of guiding principles to make data Findable, Accessible, Interoperable, and Re-usable,” added Jeffrey S. Grethe, Associate Director of the UC San Diego Center for Research in Biological Systems and also a member of the bioCADDIE Steering Committee. Grethe noted that “bioCADDIE plays a critical role in ensuring that data is findable.”
bioCADDIE is led by the University of California San Diego, and its executive team brings together investigators from the University of Michigan, Texas, Oxford, and NIH. Collaborators from a large number of institutions participate in its funded pilot and supplemental projects. Since its outset, bioCADDIE has also engaged with the wider community of researchers, developers and service providers through a number of Working Groups to ensure a community-driven approach to its development.
The NIH Big Data to Knowledge (BD2K; www.datascience.nih.gov) initiative seeks to address the needs of the biomedical research community as it confronts the emerging challenges of data. Its overarching principles are the need to Find, Access, Interoperate, and Re-use (FAIR; http://www.nature.com/articles/sdata201618) data to catalyze greater advances in biomedical science and patient care. To be findable and accessible, data must be indexed and made searchable. The biomedical and healthCAre Data Discovery Indexing Ecosystem (bioCADDIE) is an NIH BD2K-initiated, community-based project. Its mission is to fulfill the mandate to establish a Data Discovery Index (DDI) to enhance the discoverability, citation, and accessibility of biomedical data in accordance with the FAIR principles.
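For readers curious about what "indexed and made searchable" can mean in practice, the following is a minimal sketch in Python, not a description of how DataMed itself is built. It assumes a hypothetical metadata record (the field names `identifier`, `title`, `creator`, `year` and `access` are illustrative only, as are the two made-up example datasets) and builds a simple keyword index over the title and creator fields.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str
    title: str      # what the dataset is about
    creator: str    # who produced the data
    year: int       # when it was produced
    access: str     # e.g. "open" or "controlled"


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_index(records):
    """Map each word in a record's title or creator to matching identifiers."""
    index = defaultdict(set)
    for record in records:
        for word in tokenize(f"{record.title} {record.creator}"):
            index[word].add(record.identifier)
    return index


def search(index, query):
    """Return identifiers of records that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Made-up example records, not real datasets.
records = [
    DatasetRecord("ds-001", "Chest radiology images", "Example Lab A", 2015, "controlled"),
    DatasetRecord("ds-002", "Dermatology images and diagnoses", "Example Lab B", 2016, "open"),
]
index = build_index(records)
print(sorted(search(index, "images")))              # ['ds-001', 'ds-002']
print(sorted(search(index, "dermatology images")))  # ['ds-002']
```

A real discovery index would rely on an agreed metadata specification and far richer search than exact keyword matching; the sketch only shows why consistent metadata is a precondition for findability.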