SDSC, Calit2 and CRBS Win $1.4 Million Grant to Develop New Bioinformatics Tools
Project to Harness Next-Generation DNA Sequencing and Analysis
San Diego, October 18, 2011 -- Researchers at the San Diego Supercomputer Center (SDSC), California Institute for Telecommunications and Information Technology (Calit2) and Center for Research in Biological Systems (CRBS) at the University of California, San Diego, have been awarded a three-year, $1.4 million grant from the National Science Foundation (NSF) to create a Kepler Scientific Workflow System module. Researchers will develop new tools to help manage ever-growing data sets used in next-generation DNA sequencing.
The project receiving the NSF award is called Advances in Biological Informatics Development: bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data. Bioinformatics refers to a field of science that combines biology, information technology, computers and statistical techniques to create research-driven solutions such as customized medications and treatments to help prevent disease, three-dimensional models of genomes and proteins, and advanced agricultural technologies.
“The enormous growth in data-intensive research means that as these data sets get larger, moving data over the network becomes more complicated, error-prone and costly to maintain,” said Altintas, who also serves as SDSC’s deputy coordinator for research.
The bioKepler project is motivated by the following three challenges that remain unsolved:
- How can large-scale sequencing data be analyzed systematically in a way that incorporates and enables reuse of best practices by the scientific community?
- How can such analysis be easily configured or programmed by end-users with various skill levels to formulate actual bioinformatics workflows?
- How can such workflows be executed on the computing resources available to scientists in an efficient and intuitive manner?
Training Next-Generation Scientists
“These tools will be applicable to a wide range of bioinformatics and computational biology problems,” said Altintas, noting that “a key part of this project will also focus on education and outreach efforts, underscoring the importance of training next-generation scientists, as well as the need to narrow the gap between bioinformatics and technology.”
All the resources, materials, and open-source software products produced by the bioKepler project will be integrated with Calit2’s Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA), a data repository and bioinformatics resource for metagenomic analysis.
“The Kepler workflow system has already been used comprehensively in the CAMERA project,” said project co-investigator Weizhong Li, a research scientist at Calit2 and the Center for Research in Biological Systems (CRBS), and Bioinformatics group leader for CAMERA. “With the proposed developments in bioKepler, the CAMERA project and its big user communities will benefit from a larger set of next-generation sequence analysis tools with much better scalability and flexibility. Other projects that heavily rely on next-generation sequencing, such as various microbiome projects, can also take advantage of the bioKepler software.”
Moreover, bioKepler will be packaged to be installed on diverse, distributed execution environments (e.g., as a Web service and as virtual machines tuned for various Grid and Cloud systems), which in turn will enable deployment of bioKepler on public and private clusters and clouds.
In addition to Altintas and Li, the bioKepler research team includes Eric E. Allen, assistant professor of marine biology at the Scripps Institution of Oceanography (SIO); Jianwu Wang, project scientist with CI-RED; Daniel Crawl, workflow specialist with CI-RED; and Shulei Sun and Sitao Wu, bioinformaticians at CRBS.
The bioKepler project is funded by NSF DBI-1062565 under the CI Reuse and Advances in Bioinformatics programs.
About the Kepler Project
The Kepler Project is dedicated to furthering and supporting the capabilities, use, and awareness of the free and open-source scientific workflow environment called Kepler. Kepler is designed to help scientists, analysts, and computer programmers create and share models and analyses across a broad range of scientific and engineering disciplines. Using Kepler's graphical user interface, researchers can create a "scientific workflow" - an executable representation of the steps required to generate results. The Kepler software was developed and is maintained by a team consisting of several key institutions that originated the project: UC Davis, UC Santa Barbara, and UC San Diego.
Related Links
Kepler Scientific Workflow System
San Diego Supercomputer Center
Media Contacts
Jan Zverina, SDSC Communications, (858) 534-5111 or jzverina@sdsc.edu
Warren R. Froelich, SDSC Communications, (858) 822-3622 or froelich@sdsc.edu
Doug Ramsey, Calit2 Communications, (858) 822-5825 or dramsey@ucsd.edu