11.7.02 -- "I straddle what's unfortunately still considered a 'no man's land' between computing and biology," says Shankar Subramaniam with a laugh. He's a professor of Bioengineering in the Jacobs School of Engineering and of Chemistry and Biochemistry in the School of Natural Sciences at UCSD, and is affiliated with the Digitally Enabled Genomic Medicine layer of Calit² and the San Diego Supercomputer Center (SDSC). He is also an adjunct professor at the Salk Institute for Biological Studies.
"What is also not typical is that my work bridges experimentation and computation," he says. Subramaniam has been a long-time - and big-time - consumer of cycles at the National Center for Supercomputing Applications and, more recently at SDSC, when he moved to UCSD. He plays a significant role in a host of visionary, multi-disciplinary, multi-institutional, and well supported research projects including The Alliance for Cellular Signaling and the Joint Center for Structural Genomics. And he's very well known for his Biology Workbench, a Web-based environment that enables searching all publicly available protein and nucleic acid sequence databases, and provides analysis and modeling tools to support biology research.
For all of this, Subramaniam was honored last month as "High Performance Computing's Highest Guru" by Genome Technology Magazine. Initially named one of five finalists in this category, Subramaniam won the honor resoundingly in voting by the magazine's readership. "Nearly 3,000 people voted in this year's GT All-Stars, with highly respected candidates in each category," says Meredith Salisbury, managing editor of Genome Technology. "Shankar won by an overwhelming margin -- there is no question that voters consider him the best of our high-performance computing nominees." "I've worked closely with Shankar on many things, including NIH's Biomedical Information Science and Technology Initiative," says Calit² director Larry Smarr, "so I'm delighted he's won this well deserved honor!"
In exploring the reasons behind the award, our conversation quickly turns to simple questions: What's next for biology, and why is the integration exemplified by Subramaniam's career so necessary? These are some of his favorite topics.
The genome sequence and related things are at a mature stage, admits Subramaniam, so "now we need to focus on the 'great beyond': how cells function. That will move us toward an understanding of the molecular basis of disease. But it's just not clear how to do this." So, much work remains - likely extending long beyond his lifetime, he ruefully admits.
What's required, he says, is to apply computational biology and bioinformatics approaches to integrate large, heterogeneous data sets to begin piecing together how an individual cell functions. "Today you can determine what genes are expressed in a cell via DNA microarray methodology. But what proteins are there, what states they're in - all that is a cottage industry research area," he says.
To obtain a dynamic picture of a cell, he says, we need to integrate a large body of knowledge, which has the infuriating characteristic of including lots of inconsistent, even contradictory, data.
"It's like putting together a giant jigsaw puzzle where the pieces keep changing shape," he says. "That gives you some idea of the complexity."
The main challenge, he says, is reconstructing the cellular network. "Once we have that," he says, "we can build models to ask quantitative questions related to mapping input to response." And this work, he quickly points out, affects health care for all of us. "Once we understand cellular networks, we'll begin to integrate them together, and then we'll be in a position to determine, more quantitatively, how a particular pharmaceutical agent influences the function of a cell."
Among the many problems he points to are things we still lack: sufficient experimental data, an understanding of how to bridge data across time and size scales, and enough cross-disciplinary expertise to move us ahead.
Which brings him to a topic he likes even more: the country's premier graduate degree program in bioinformatics, which he launched at UCSD two years ago and continues to direct. Some 30 students are now enrolled.
"We're training students versed in biology, medicine, computer science, and engineering," enthuses Subramaniam. "Other universities are starting similar programs, and, even so, together we can't begin to meet the demand for our graduates." This program is complemented by a large NIH training grant in bioinformatics.
As the newly appointed chair of SDSC's Executive Committee, Subramaniam believes the supercomputing community must recognize that computing infrastructure in biology needs to become increasingly domain-specific. "Our focus needs to be less on the computing and more on the infrastructure, software, tools, and expertise." And that may be true for other disciplines as well.
In this regard, he points to a good example of what's needed that's already underway: the Biomedical Informatics Research Network (BIRN), launched one year ago by PI Mark Ellisman, a UCSD professor with joint appointments in the Neurosciences and Bioengineering departments, with significant technical support from SDSC. Just days ago, this project received a second investment from NIH of $10.9M to fund investigators at nine institutions studying schizophrenia. The PIs include Professor Steven Potkin at Calit² partner campus UCI (news release).
"My mantra is 'open, open, open' - open source, open data, open research. They're all exceptionally important for this next phase in computational biology. And BIRN supports all these things. It's the next logical step towards next-generation, large-scale, discipline-oriented infrastructure," says Subramaniam.
"And this is where Calit² comes in," he says. "The institute supports this idea of openness via communication: Calit² is not only helping get the word out about these needs, in the old-fashioned sense of the word 'communication,' but it's pushing research and development on the telecommunications aspects of the infrastructure that we're coming to depend increasingly on."
To underscore this point, he thinks it's important to keep in mind that biology is not as mature a science as physics, for example, which relies on fundamental laws identified long ago. "Biology's still taking baby steps, collecting data based on observed phenomena. We want to be able to translate understanding of that data into fundamental rules and algorithms that we can use to 'compute' on those phenomena."
While we need to bring domain-specific information to computation, Subramaniam admits it's a struggle and says that growing pains are an inevitable part of the process. Models will have limitations, but the payoff is that they will play an important role in supporting the development of new hypotheses and new experiments. "We've got to 'trust the process,'" he says.
For one of Subramaniam's recent presentations, see [http://www.calit2.net/events/2002/7-17-biocom_article.html] where Calit² researchers briefed San Diego biotechnology executives on bioinformatics at a standing-room-only breakfast meeting hosted by BIOCOM. Subramaniam addressed the topic: "Bioinformatics: Is the Post-Genomic Sequence Era a Harbinger for Systems Biology?" Both his PowerPoint slides and a video of his presentation are available.