Calit2 Co-Sponsors and Smarr Speaks at GEON Annual Meeting

5.12.2005 – The GEON Annual Meeting, held May 5-6 in San Diego and co-sponsored by Calit2, was an interesting mix: a look back at why GEON was funded, progress reports on various aspects of the project, demonstrations of newly developed software, and presentations from experts in related fields. GEON is a five-year, NSF-funded Information Technology Research (ITR) project to develop cyberinfrastructure for the earth sciences.

The presentations included one by Calit2 director Larry Smarr on “Analyzing Large Earth Data Sets: New Tools from the OptIPuter and LOOKING Projects.” Smarr is advisor to the NASA administrator and chair of NASA’s Earth Systems Science and Application Advisory Committee, as well as PI on two large NSF Information Technology Research grants related to the earth sciences.

More than 115 people attended the meeting – more than double the attendance a year ago, indicating a growing community pushing development of cyberinfrastructure for the earth sciences. Reflecting GEON's early success, the attendees included four NSF program officers.

Larry Smarr
Chaitan Baru, GEON PI

Leonard Johnson, program director in the Division of Earth Sciences (Directorate for Geosciences), recalled in remarks early in the program the ITR competition and the tough decisions the program officers had to make. A large number of projects were competing for funding, he indicated, and at one point the review committee had to choose between funding GEON and funding a competitor in another field. The committee based its choice on what each project could do for its field as a whole: unlike its more narrowly focused competitor, GEON was seen as serving the entire earth science community.

Johnson emphasized that the committee's faith had been borne out in just the first two years of the project. He said he regularly receives calls from directors of other disciplines at NSF who have heard about GEON and want to free up funds so that their researchers can participate.

Krishna Sinha
Deborah McGuinness, speaker from Stanford University
Poster session

He cited two signs that GEON has succeeded. One, he joked, is when a program director can use the word "ontology"[1] in a sentence and know what it means. ("Ontology" seemed to be the byword of the meeting and came up in many other contexts, some of which are described below.) The other is when a garden-variety university scientist can sit down at his or her computer and solve a problem using GEON tools without knowing their inner details. Johnson encouraged the GEON team to start planning the transition to the next step so that their important work doesn't come to a halt when the grant ends.

Krishna Sinha of Virginia Tech, a GEON PI and geoscience lead for the Mid-Atlantic testbed, addressed the issue of science integration. He emphasized that GEON studies earth systems in their entirety, which requires a holistic approach to developing cyberinfrastructure. Speaking as a geoscientist, he said the way forward is for his community to partner with information technology experts to develop tools, computational resources, knowledge management protocols, and data preservation techniques.

“Our ‘database’ is the rock record through time and space,” he said. “Any scientific question you might want to ask of that record is relevant to our purposes in this project.”

GEON features two testbeds. One, called DYSCERN, is examining the broad dynamic evolution of a plateau in the Rocky Mountains. It requires a good understanding of geologic history over 1.8 billion years. “Generally speaking, we want to determine what happened when, where, why, and how,” said Sinha. “We’re looking at correlations among datasets using three-dimensional visualization of those datasets.”

The other testbed, CREATOR, focuses on the mid-Atlantic. “We believe its surface geology reflects its surface history, and we see evidence of at least two supercontinent building events,” said Sinha. “We want to understand the geometry associated with the breakup of these supercontinents.”

Responding to Johnson's urging to think about the future, Sinha said that maintaining a research environment that supports these kinds of studies requires an organizational structure for long-term community support. Sinha and others believe national societies can provide that structure. To that end, the Geological Society of America has been asked to create a new division of geoinformatics.

Chaitan Baru, also a GEON PI and lead for IT infrastructure, design, and development representing the San Diego Supercomputer Center and UCSD, said, “Integration is our mantra: We’re trying to integrate heterogeneous datasets, tools, and models. We want to create infrastructure that will help scientists do their day-to-day work – not just heroic computations.”

To achieve that, he said, GEON takes a two-tiered approach: it adopts best practices, including open standards, commercial tools, and software developed in intersecting projects such as BIRN and SEEK, while also developing advanced technologies and conducting computer science research. He echoed Sinha's view of science and information technology as equal partners. "We're creating shared science infrastructure – integrated online databases with advanced search and query engines, online models, robust tools, and applications," he said.

The team is also balancing centralization and coordination with distribution and local autonomy through development of GEONgrid systems and portals. As part of this, the SDSC technical team is visiting partner sites to help with system upgrades and portal customization.

Baru also described work enabling ontology-based “smart searches” across multiple data sets of interest to a particular researcher. The GEON team is developing “geo-ontologies” and registering data sets to relevant ontologies. To integrate data sets for further exploration, the researcher will simply drag and drop data sets of interest into a data integration “shopping cart.”

The team is also working on map integration, so that geoscience information can be presented as GIS layers in a useful and intuitive way, and on knowledge-based integration of Web mapping services, promoting the Web Map Service (WMS) as a standard. In addition, they are developing a visual environment for scientific data analysis and modeling that supports tasks such as ingesting LiDAR data, mineral classification, and gravity modeling.

Visualization also saw progress this year. On March 1-2, 2005, GEON hosted a visualization workshop to address 4-D (three dimensions of space plus time) representation of earth science data sets and models reaching back hundreds of millions of years; define the project's visualization requirements; evaluate available tools; identify areas where development is needed; and address issues of data discovery, retrieval, standards, and interoperability. Conclusions from the workshop included:

- Generalized data access for the scientists is critical.
- The group needs to establish data format standards.
- The GEON portal needs access to data via a visual browser (e.g., Geofusion, a browser to be developed by the team over the coming year).
- The group needs to identify a small intensive science study area on which to focus development efforts.
- The group needs easy access to data and visualization capabilities using the GeoWall.

The word "ontology" surfaced again as the subject of the talk by Deborah McGuinness, of Stanford University and McGuinness Associates (her consulting firm). She defined an ontology as a "controlled vocabulary providing technical encoding of the meaning of terms supporting interoperation across reasoning systems, data sets, applications, etc."

Scientists should be able to access the global, distributed knowledge base of scientific data, and these data should appear integrated and locally available, she said. That's the perfect world. The real world, however, is full of differences: in the instruments supplying the data, in protocols and assumptions, in the ways a given technical term is used, and in how metadata is applied to data sets. She maintained that knowledge representation tools should make users appear to know more, help them function at a higher level, and be usable without effort. They should also use a language with which the user is comfortable.

As an example, she described the Virtual Solar Terrestrial Observatory, a distributed, scalable education and research environment funded by NSF for searching, integrating, and analyzing observational, experimental, and model databases. With goals similar to GEON’s, it provides virtual access to data sets, models, tools, and material archives related to solar, solar-terrestrial, and space physics.

She ended her talk with some speculative questions about a more perfect scientific world: What if scientists could not only use their own data and tools but remote colleagues’ data and tools with confidence? What if they could understand their colleagues’ assumptions and constraints so as to use the colleagues’ data more effectively? What if they knew whose research would benefit from their own results to further the scientific enterprise in general? What if they knew whose results were consistent or inconsistent with theirs?

These are exactly the kinds of questions the GEON project is working to address.

[1] An ontology can be thought of as a map that relates the concepts in a particular discipline to one another hierarchically, from most general to most specific. Its usefulness derives from the consensus of the disciplinary community that designs it. [--Ed.]