10.13.2003 -- Chicago, Illinois. A new milestone was reached in trans-Atlantic data transmission today by researchers at the University of Illinois at Chicago (UIC) who demonstrated the practicality of transferring even very large data sets over high-speed production networks.
UIC's National Center for Data Mining (NCDM) and Laboratory for Advanced Computing flashed a set of astronomical data across the Atlantic at 6.8 gigabits per second, 6,800 times faster than the effective speed of 1 megabit per second that connects most companies to the internet.
In the test, 1.4 terabytes of astronomical data was transmitted from Chicago to Amsterdam in 30 minutes using UDT, a new protocol developed by the NCDM at the University of Illinois at Chicago. By comparison, moving the same amount of data using TCP, the standard protocol for data transfers on the internet today, would take 25 days.
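The figures above can be sanity-checked with simple arithmetic. The sketch below assumes decimal units (1 terabyte = 10^12 bytes) and a rate held constant for the whole transfer; neither assumption is stated in the release, and the helper names are hypothetical.

```python
# Back-of-the-envelope check of the transfer figures quoted in the release.
# Assumptions (not stated in the release): 1 terabyte = 10**12 bytes, and
# the 6.8 Gbit/s rate is sustained for the entire transfer.

DATA_BYTES = 1.4e12          # 1.4 terabytes, decimal units assumed
UDT_RATE_BPS = 6.8e9         # reported UDT throughput, bits per second
TCP_DAYS = 25                # reported TCP transfer time
SECONDS_PER_DAY = 86_400


def transfer_seconds(num_bytes: float, rate_bps: float) -> float:
    """Time to move num_bytes at a constant rate of rate_bps."""
    return num_bytes * 8 / rate_bps


def implied_rate_bps(num_bytes: float, seconds: float) -> float:
    """Average rate needed to move num_bytes in the given time."""
    return num_bytes * 8 / seconds


udt_minutes = transfer_seconds(DATA_BYTES, UDT_RATE_BPS) / 60
tcp_rate_mbps = implied_rate_bps(DATA_BYTES, TCP_DAYS * SECONDS_PER_DAY) / 1e6

print(f"UDT at 6.8 Gbit/s: {udt_minutes:.1f} minutes")          # 27.5 minutes
print(f"TCP rate implied by 25 days: {tcp_rate_mbps:.2f} Mbit/s")  # 5.19 Mbit/s
```

Under these assumptions the UDT transfer takes about 27.5 minutes, in line with the roughly 30 minutes reported, and the 25-day TCP estimate corresponds to an average TCP rate of only about 5 megabits per second on the same path.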
Moving large data sets over the internet faces several hurdles:
First, the network infrastructure for long-distance links running at 1 and 10 gigabits per second is still maturing, and software that can exploit this infrastructure is only now being developed. The UIC computer clusters used for the test were connected to the SURFnet network in Amsterdam and the Abilene network in Chicago, and the test also demonstrated the quality and capability of these two networks, which are among the world's leading research networks.
Second, TCP, today's predominant network protocol, is not effective at moving massive data sets over long distances: after a packet loss, standard TCP halves its sending rate and then raises it only gradually, so on high-bandwidth, long-delay paths it can take a long time to return to full speed. UDP, another widely deployed network protocol, cannot transport data reliably (some data may be lost) and is not friendly to other flows (using it for large data transfers can starve other network traffic). Efforts are currently underway to improve TCP, to develop new protocols to replace it, and to build protocols on top of TCP and UDP that are effective for high-performance data transport.
To work around these problems, high-speed transfers of very large data sets have in the past relied on special-purpose research networks and specialized protocols that, in practice, did not allow other network traffic to share the same link.
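To make the second hurdle concrete, the sketch below estimates how long a standard, Reno-style TCP connection needs to climb back to full speed after a single packet loss. It is a simplified model that ignores slow start, selective acknowledgments and later TCP variants; the 100 ms round-trip time and 1500-byte packet size are illustrative assumptions, not figures from the test.

```python
# Simplified model of standard (Reno-style) TCP recovery after one loss:
# the congestion window is halved, then grows by one packet per round trip.
# Assumptions for illustration only: a hypothetical 100 ms Chicago-Amsterdam
# round-trip time and 1500-byte packets.

def window_packets(rate_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Window (in packets) needed to fill the path: the bandwidth-delay
    product divided by the packet size in bits."""
    return rate_bps * rtt_s / (packet_bytes * 8)


def recovery_seconds(rate_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Time to grow from W/2 back to W at one extra packet per round trip."""
    w = window_packets(rate_bps, rtt_s, packet_bytes)
    rtts_needed = w / 2          # W/2 packets to regain, one per RTT
    return rtts_needed * rtt_s


RATE_BPS = 6.8e9      # target rate, taken from the demonstration
RTT_S = 0.100         # hypothetical trans-Atlantic round-trip time
PACKET_BYTES = 1500   # typical Ethernet-sized packet (assumed)

w = window_packets(RATE_BPS, RTT_S, PACKET_BYTES)
t = recovery_seconds(RATE_BPS, RTT_S, PACKET_BYTES)
print(f"window needed: {w:,.0f} packets")               # 56,667 packets
print(f"recovery after one loss: {t / 60:.0f} minutes")  # 47 minutes
```

Under these assumptions a single lost packet leaves TCP below full speed for roughly 47 minutes, longer than the entire 30-minute UDT transfer, which illustrates why standard TCP struggles on such paths.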
Friday's test run used a new network protocol called UDP-based Data Transport, or UDT, which was developed by the National Center for Data Mining at the University of Illinois at Chicago. Unlike some other protocols now being studied for high-speed data transfer, UDP-based protocols can be used over today's Internet without changes to the network infrastructure. Today's demonstration showed not only that UDT was fast, but also that it was friendly to other traffic and could effectively coexist with thousands of other network connections.
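Because UDP itself does not guarantee delivery, a protocol layered on top of it has to detect and recover lost data on its own. The sketch below is a generic illustration of one such building block, not UDT's actual implementation: the receiver tracks datagram sequence numbers and reports the gaps so that only missing data needs to be resent. The function name and the range-based report format are hypothetical.

```python
# Generic sketch (NOT UDT's actual code) of loss detection in a reliable
# protocol built on UDP: the receiver records which numbered datagrams have
# arrived and lists the missing ranges, which it could report back to the
# sender so only lost data is retransmitted. Names are hypothetical.

from typing import Iterable, List, Optional, Tuple


def missing_ranges(received: Iterable[int], highest_seen: int) -> List[Tuple[int, int]]:
    """Return inclusive (first, last) ranges of sequence numbers in
    0..highest_seen that have not arrived yet."""
    have = set(received)
    gaps: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for seq in range(highest_seen + 1):
        if seq not in have:
            if start is None:
                start = seq                  # a new gap begins here
        elif start is not None:
            gaps.append((start, seq - 1))    # gap ended just before seq
            start = None
    if start is not None:
        gaps.append((start, highest_seen))   # gap runs to the last seen number
    return gaps


# Datagrams 3, 4 and 8 were dropped somewhere on the path.
arrived = [0, 1, 2, 5, 6, 7, 9]
print(missing_ranges(arrived, highest_seen=max(arrived)))  # [(3, 4), (8, 8)]
```

Reporting ranges rather than individual numbers keeps loss reports small when many consecutive datagrams are dropped, which matters at multi-gigabit rates.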
The demonstration is part of an ongoing international effort to find and test new ways of reliably moving massive data sets around the globe using advanced networks and new data transfer protocols. Such systems hold enormous promise for advancing scientific research, as well as for numerous commercial applications. Although it is becoming common for global businesses to keep important data in different cities, it is still quite difficult to integrate this data into a common view.
"Using UDT, it is now practical for the first time to move even massive data sets over very long distances in a friendly fashion using today's networks," said Robert Grossman, Director of UIC's National Center for Data Mining and President of Open Data Partners.
UDT is currently being used by several international research projects. UDT is used by the OptIPuter, a research project developing next-generation computing infrastructures based on advanced photonics. UDT also plays a role in research projects developing high-performance web services, which are needed to scale today's web services to large remote and distributed data sets.
UDT is used as the network transport layer in the joint University of Illinois/Northwestern project on Photonic Data Services (PDS), which is developing open source data services for next-generation photonic networks such as the OptIPuter. The OptIPuter is an example of what are sometimes called lambda grids: distributed computing infrastructures in which applications can set up their own photonic paths (lambdas) supporting data transport at speeds of a gigabit per second and higher.
"Moving data at 6.8 Gigabits per second across the Atlantic using UDT is an important milestone for the OptIPuter Project and brings us a bit closer to effective data management over lambda grids," said Larry Smarr, Principal Investigator of the OptIPuter Project and Director of the California Institute for Telecommunications and Information Technology, a UC San Diego/UC Irvine partnership.
UDT is also being used as one of the layers of a UIC project called Open DMIX (for Data Mining, Data Integration, and Data Exploration), which is developing open source high performance web services for data mining.
"Using UDT and the scalable data mining and data integration web services built on top of it may emerge as an important enabling technology for the grid computing required for next generation virtual observatories," according to Alex Szalay, Alumni Centennial Professor in the Department of Physics and Astronomy at The Johns Hopkins University.
The tests were made possible by support from the following manufacturers and organizations, who have generously contributed their equipment, facilities, and know-how: OMNInet, StarLight, Nortel, SARA and CANARIE. Partial funding for the tests was provided by the National Science Foundation (Grants 0129609, 9977868 and 0225642) and the University of Illinois at Chicago.
For more information, contact:
Shirley Connelly, Associate Director, NCDM
Robert Grossman, Director, NCDM
About the National Center for Data Mining
The National Center for Data Mining (NCDM) at the University of Illinois at Chicago (UIC) was established in 1998 to serve as a national resource for high-performance and distributed data mining. The Center sponsors research projects, helps develop standards, operates testbeds, and provides outreach. The Center is coordinating the development of the Predictive Model Markup Language (PMML), the standard for statistical and data mining models, as well as WS-DMX, a standard for web services for data mining and data exploration. The NCDM also operates the Terra Wide Data Mining Testbed, a worldwide testbed for high-performance and distributed data mining. For more information about NCDM, see www.ncdm.uic.edu.
About SURFnet
SURFnet operates and develops the national research network in the Netherlands, to which 150 higher education and research institutions in the Netherlands are connected. To remain at the forefront, SURFnet works continually to improve the infrastructure and to develop new applications that give users faster and better access to new Internet services. SURFnet's network innovation is currently funded by the Dutch government through the GigaPort project. For more information please visit www.surfnet.nl.
About the OptIPuter
The OptIPuter, started in October 2002, is a five-year, $13.5 million project funded by the National Science Foundation. It will enable scientists who generate massive amounts of data to interactively visualize, analyze and correlate their data from multiple storage sites connected to optical networks. The University of California, San Diego and the University of Illinois at Chicago lead the research team, with funded partners at Northwestern University, San Diego State University, the Information Sciences Institute at the University of Southern California, UC Irvine and Texas A&M University; industrial partners are IBM, Sun Microsystems, Telcordia Technologies, Inc. and Chiaro Networks. See www.optiputer.net.