Cross-training Students in Biology and Computer Science Receives Major Support

By Anna Lynn Spitzer

Professor Pierre Baldi

10.17.02 -- According to Francis Collins, head of the National Human Genome Research Institute, "graduate students come up to me and ask how they can get into computational biology. They can see this coming. We just have to be sure we are providing them with superb training experience."*

"And that's exactly what we're hoping to deliver," says Pierre Baldi, PI on a $4.3M grant awarded to UCI from NIH's National Library of Medicine.

Baldi, a professor of Information and Computer Science (ICS) with a joint appointment in Biological Chemistry in the College of Medicine, is also the leader of the Digitally Enabled Genomic Medicine (DeGEM) "layer" of Calit² at UCI. He and co-PI G. Wesley Hatfield, professor of Microbiology and Molecular Genetics in the College of Medicine, and of Chemical Engineering and Materials Science in The Henry Samueli School of Engineering, serve as co-directors of the five-year program. "This support will also help us create a new DeGEM project in Calit²," says Baldi.

Hatfield notes that collaborative research projects between UCI computer scientists and molecular biologists are increasing at a rapid rate. "It is these projects," he says, "that provide the training environment for the graduate students and postdoctoral fellows of the new UCI Biomedical Informatics Training Program."

In each collaboration, a student from the computational area works with a student in the biological sciences. In this close collaborative research atmosphere, much of the interdisciplinary training is accomplished by the students sharing their knowledge while working together on a common research project. In the end, the cross-trained molecular biologists and computer scientists that emerge from this program will feel equally at home in either biology or computer laboratory environments.

Says Hatfield, "These students will be the researchers of the future that will facilitate much needed interactions between the computational and biological sciences in the post-genomics era."

Why is this type of training so critical? Genome sequencing projects are providing the genetic blueprint of organisms across the kingdoms of life. Biologists, who traditionally focused on single genes and proteins, now can study genes and proteins across multiple organisms and conditions. Thus, in various forms, genomic sequences are catalyzing our abilities to understand biological systems at levels of detail never before possible.

Baldi talks about his current research, including prediction of protein structures, using machine learning techniques to analyze DNA microarray data, and bioterrorism--including a database of proteins involved in smallpox (pictured).
Length: 1:28 [video]

"While this is quite an opportunity, we're also faced with some challenges," admits Baldi. New genomic technologies are generating overwhelming amounts of data that must be processed and analyzed. And these data have created a critical need for theoretical, algorithmic, software, and hardware advances in storing, retrieving, networking, processing, modeling, analyzing, and visualizing biological and medical information.

Clearly, computers are having a significant impact on the biological sciences by helping bring sense to increasing volumes of data. But biology, in turn, is inspiring new concepts in computer science, such as genetic algorithms, artificial neural networks, computer viruses, synthetic immune systems, DNA computing methods, artificial life, and hybrid DNA gene chips. This cross-fertilization has enriched both fields and will push the integration of the two in the coming decades.

In addition to DeGEM, this confluence of biology and computer science is recognized and supported by one of the Calit² technology-driven living labs, Knowledge and Data Systems, led by Chaitan Baru, UCSD/San Diego Supercomputer Center, and Sharad Mehrotra, UCI/ICS.

Why was UCI chosen for this grant? Baldi points to successful training programs in biomedical informatics at UCI, in particular, a training emphasis in bioinformatics in ICS, and an interdisciplinary graduate training program in biomedical informatics in ICS and the College of Medicine. Because of the success of these programs in increasing interdisciplinary research and training, a campus-wide Institute for Genomics and Bioinformatics (IGB) was established earlier this year, directed by Baldi, and bioinformatics training tracks have been developed in other academic units.

"We seek to consolidate these accomplishments into a comprehensive campus-wide training program administered by the IGB, and facilitate increased training and research interactions among the clinical and basic science researchers at UCI," says Baldi.

The grant, at steady state, will support 20 graduate students and six postdoctoral fellows. It will draw students from the College of Medicine, the School of Biological Sciences, the departments of Chemistry and Bioengineering, and ICS, which has plans in its own right for development into a school.

"All these groups at UCI are moving forward aggressively, and it's our great fortune that they appreciate the advantage, for both research and education, of working together to do something new to support this emerging scientific field," says Baldi. "We don't often get these life-changing opportunities."

A team of 15 investigators, drawn from across the units mentioned above, nominates the students. Students will be required to take coursework in both biology and computer science. With dual advisors from both disciplines, they will rotate among various research assignments during their Ph.D. thesis work to ensure they receive the broad-based training this grant intends. Already, in just its first year, the program had twice the number of applicants it could accommodate. "As we disseminate information about the program, we expect the number of applicants to increase," says Baldi.

This grant is "perfectly aligned" with Calit², says Baldi, because it complements the research underway at both campuses in the DeGEM layer. "Students supported through this grant will work on DeGEM projects, and, in fact, many DeGEM students will move into, and benefit from, the new Calit² facilities coming online in late 2004."

Complementing these developments from a Calit² perspective, UCSD has an interdisciplinary stand-alone graduate degree program in bioinformatics now in its second year, led by bioengineer and bioinformaticist Shankar Subramaniam and supported by a large NIH training grant from the National Institute of General Medical Sciences.

"We look forward," says Baldi, "to integrating our efforts with those of our Calit² partner campus UCSD to produce more students trained in this important interdisciplinary area."

"The experiment that Calit² is engaged in," says Calit² director Larry Smarr, "is to create a collaborative framework that encourages disciplines to work together, both for research and curriculum development - all to benefit the students. Pierre's major new program teaming biology and computer science is a perfect example of the kinds of synergies we're convinced have great value for California and the nation. We're delighted with his success and grateful to him for leading this charge."

* Science, 287:2396, 2000.