UCI Professor's Software Scores Supercomputing Shortcuts

By Anna Lynn Spitzer

03.22.05 – What do a high-performance supercomputer and an efficient cookie-baking operation have in common? Charlie Zender, assistant professor of Earth system science and a Calit2 academic affiliate, says they share a fundamental backbone.

UCI’s Earth System Modeling Facility (ESMF) is a high-performance computer and storage system that allows scientists to make predictions about the physical climate, chemistry and biogeochemical cycles of the Earth system. The IBM supercomputer consists of eight powerful servers connected by clustering technology that allows them to work together in parallel, performing an enormous number of calculations simultaneously.

Charlie Zender

“Imagine that you’re stamping out cookies,” Zender says. “Instead of taking a cookie cutter and stamping out 10 cookies, one at a time, you assemble the cookie cutters into a row of 10 and you stamp one time. The ESMF works on the same principle.”
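
In computing terms, the analogy describes data parallelism: the same operation applied to many pieces of data at once. Here is a toy Python sketch of that idea; it assumes nothing about the ESMF’s actual software, and the chunked averaging is purely illustrative.

from multiprocessing import Pool

def stamp(chunk):
    # the same "cookie cutter" applied to every chunk of data
    return sum(chunk) / len(chunk)

# ten chunks of data, analogous to ten rows of cookie dough
chunks = [list(range(i, i + 10)) for i in range(0, 100, 10)]

if __name__ == "__main__":
    with Pool() as pool:
        averages = pool.map(stamp, chunks)  # all ten stamped at once
    print(averages)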

Zender has written software that makes the ESMF – and by extension, climate prediction research – even more efficient. netCDF Operators (NCO), the software that Zender began developing 10 years ago as a graduate student, was underwritten last fall by a $594,000 National Science Foundation grant. The software, which is used by climate-modeling facilities the world over, gives researchers the ability to analyze huge chunks of very specific data from other supercomputers located anywhere in the world. The researchers do not have to import whole data sets and sift through them; nor do they need access to, or accounts on, the other computers, as was required not long ago.

Zender explains the software system and why it is such a welcome tool for number-crunching scientists.

Q. In the past, how was scientific data analyzed to make weather predictions?

A. Ever since weather prediction began, scientists have had to analyze data stored on their local computer. In recent years, with the advent of the Internet, that’s changed slightly. Researchers could copy data from all over the Internet to their local computers and then analyze it. But ultimately, that data had to be downloaded to the same computer the researcher was sitting at.

Earth System Modeling Facility 

Q. How does this NCO software change that paradigm?

A. The new paradigm is called distributed data reduction and analysis. That means the data no longer has to be local, and the researchers do not need to be at any of the computers where the data resides. They just need to be able to issue “commands” that grab the relevant data from other computers, perform the analysis there, and transfer back only the parts they need. So if a researcher at UCI wants to compare data from the Calit2 OptIPuter project in San Diego with data on the ESMF, he or she does not need to be sitting in front of either computer and doesn’t need accounts on either computer. Rather than transferring all of the simulation data from both computers, the technique automatically does what we call hyperslabbing: we precut the data so we receive only the data we need.
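
As a concrete illustration of hyperslabbing, the sketch below uses the netCDF4 Python library to subset a remote dataset over OPeNDAP, one common mechanism for this kind of remote subsetting. The URL, variable name and index ranges are hypothetical placeholders, not actual ESMF or OptIPuter endpoints.

from netCDF4 import Dataset

# Hypothetical dataset served over OPeNDAP; the server subsets the data
# before sending it, so the whole file never crosses the network.
url = "http://example.edu/opendap/climate/run01.nc"

with Dataset(url) as ds:
    # Request one time step and a small latitude/longitude window; only
    # this slab is transferred. Variable name and indices are illustrative.
    slab = ds.variables["air_temperature"][0, 100:110, 200:210]
    print(slab.shape, slab.mean())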

Q. Why is distributed data reduction and analysis so important?

A. Because these geophysical data sets are huge. We analyze the world in 200-square-kilometer chunks – called grid cells – and there are a lot of those chunks in a model. The amount of data this generates is enormous, but if you can run the calculations remotely and get back only the data you need, rather than all of it, you save enormously on network traffic and bandwidth.
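
To put rough numbers on that savings, here is a back-of-envelope sketch; every figure in it (cell size, run length, output frequency) is an assumption made for illustration, not a number from the article.

# Back-of-envelope arithmetic for the savings described above. Every
# number here is an assumption for illustration only.
EARTH_SURFACE_KM2 = 510e6                # approximate surface area of the Earth
cell_km2 = 200.0 * 200.0                 # assume a grid cell 200 km on a side
n_cells = EARTH_SURFACE_KM2 / cell_km2   # ~12,750 cells cover the globe
n_steps = 100 * 365                      # assume daily output from a 100-year run
bytes_per_value = 8                      # double-precision floating point

full_field = n_cells * n_steps * bytes_per_value  # one variable, whole globe
regional = 25 * n_steps * bytes_per_value         # a 5x5-cell region only

print(f"full field:    {full_field / 1e9:.1f} GB")  # ~3.7 GB per variable
print(f"regional slab: {regional / 1e6:.1f} MB")    # ~7.3 MB

Multiply the full-field figure by dozens of variables and many simulations, and the terabyte totals Zender mentions later become plausible; the regional slab stays tiny either way.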

Q. What is unusual about your software?

A. This software package handles data that carries a sort of identification tag so that scientists can recognize it, including what physical properties it represents. It used to be that researchers had to write individual programs for each specific type of data they wanted to locate. This program is one-size-fits-all in the sense that one tool fits all data. So you can use one set of operators to process data of all types, whether it’s data about temperature, precipitation, wind speed or whatever.
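
The “identification tag” Zender describes is the self-describing metadata that the netCDF format attaches to every variable. A minimal Python sketch of why that enables one-size-fits-all tools, with a hypothetical file name: the same routine works whatever physical quantity the variables hold.

from netCDF4 import Dataset

def describe(path):
    # One generic routine handles any variable, because each variable
    # carries its own name, units, and dimensions in the file.
    with Dataset(path) as ds:
        for name, var in ds.variables.items():
            units = getattr(var, "units", "unknown")
            print(f"{name}: units={units}, shape={var.shape}")

describe("model_output.nc")  # temperature, precipitation, wind ... all alike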

Q. The NSF grant also funded certain computer connections?

A. Yes, it funds a connection from the ESMF to the Calit2 OptIPuter in San Diego. They’re already connected by the Internet, of course, but this proposal is going to demonstrate that you really can accelerate data reduction if you have a large enough data pipe and a very fast network connection. When the ESMF is connected to the OptIPuter, we can pull the data we need into the analysis more quickly, and we can also go in the opposite direction.

Q. The opposite direction?

A. Falko Kuester (UCI assistant professor of electrical engineering and computer science, and director of the Calit2 Center of GRAVITY [Graphics, Visualization and Imaging Technology]) has another project underway called the HIPerWall, a high-performance visualization system. The HIPerWall is connected to the ESMF, so with these fast switches that this proposal paid for, we will be able to send massive amounts of data to the HIPerWall for real-time visualization of the forecasts.

Q. What’s next?

A. Every five years, the United Nations produces a report on the current state of knowledge about climate change. The next report is due in 2006. In order to answer the relevant questions, scientists run climate models; in the last five-year period, one model has been run in multiple places with different variables. It’s too much data to store at any one location. So one government agency runs one forecast and stores the results at its facility, another agency simulates a different scenario and stores those results at its own facility, and so on. You end up with the same model, run under different scenarios, distributed internationally. No one has ever inter-compared these simulations, because you’ve got hundreds of terabytes here and hundreds of terabytes there. (A terabyte is one trillion bytes.) There’s no place in the world that could hold all of that data. Our proof of concept is to inter-compare these climate prediction scenarios for the next IPCC (Intergovernmental Panel on Climate Change) climate assessment for the United Nations.
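
A minimal sketch of what such a distributed inter-comparison could look like: reduce two remotely stored scenario runs to small regional means and compare them locally. Both URLs, the variable name and the index ranges are hypothetical placeholders, and the real workflow would use NCO’s operators rather than this hand-rolled Python.

from netCDF4 import Dataset

def regional_mean(url, var, lat_slice, lon_slice):
    # Only the requested hyperslab crosses the network; the terabytes
    # of full model output stay where they are stored.
    with Dataset(url) as ds:
        return float(ds.variables[var][:, lat_slice, lon_slice].mean())

# Hypothetical OPeNDAP endpoints at two different agencies.
a = regional_mean("http://agency-a.example.gov/opendap/scenario_a.nc",
                  "tas", slice(100, 110), slice(200, 210))
b = regional_mean("http://agency-b.example.org/opendap/scenario_b.nc",
                  "tas", slice(100, 110), slice(200, 210))
print(f"regional difference between scenarios: {a - b:.3f}")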