NSF Funds ‘Big Data’ Innovation Hub for the Western U.S.

By Jan Zverina

San Diego, Calif., Nov. 2, 2015 -- The National Science Foundation (NSF) has announced funding for a ‘Big Data’ Innovation Hub for the Western United States intended to facilitate collaboration among the region’s technology sector and other organizations to address research challenges in areas such as precision medicine, natural resource utilization, hazard management, and metro regional development.

Image from a 15-hour forecast of integrated water vapor (IWV), an estimate of the total amount of water in the atmosphere that could become precipitation. CalWater 2015 provided an opportunity earlier this year to test new forecast methods using large observational data sets. Hot colors (red) indicate high values; cool colors (blue) indicate low values. The arrows are wind barbs indicating wind speed and direction. Image by Andrew Martin/SIO and John Helly/SDSC, SIO

The Western Hub is part of an NSF program announced today that includes four awards totaling more than $5 million to establish regional hubs for data science innovation. The consortia are coordinated by top data scientists at Columbia University (Northeast Hub); Georgia Tech and the University of North Carolina (South Hub); the University of Illinois at Urbana-Champaign (Midwest Hub); and the University of California, San Diego, the University of California, Berkeley, and the University of Washington (West Hub).

Covering all 50 states, they include commitments from 281 organizations – from universities and cities to foundations and Fortune 500 corporations – with the ability to expand further over time. Building upon the White House National Big Data Research and Development Initiative announced in 2012, the awards are made through the Big Data Regional Innovation Hubs (BD Hubs) program, which creates a new framework for multi-sector collaborations among academia, industry and government.

The program calls for creating an infrastructure to define and evaluate those collaborations. The Western BD Hub will connect state and regional organizations including academia, industry, state agencies, and non-profit organizations that regard the potential of large-scale data management and analysis as transforming or adding value to their operations.

Project principal investigators for the Western BD Hub include Michael Norman, director of the San Diego Supercomputer Center (SDSC) at UC San Diego; Michael Franklin, the Thomas M. Siebel Professor of Computer Science and Chair of the Computer Sciences Division at UC Berkeley; and Ed Lazowska, the Bill & Melinda Gates Chair in Computer Science & Engineering at the University of Washington. A full list of project personnel can be found here.

The NSF has defined big data as large, diverse, complex, longitudinal, and/or distributed data sets generated from instruments, sensors, Internet transactions, email, video, click streams, and/or all other digital sources available today and in the future.

“Partnerships created through the Western BD Hub will focus on development and application of big data technologies, data standards, relevant policies and ethics, and innovative data-intensive discovery techniques,” said SDSC Director Michael Norman. “These will be leveraged with the aim of transforming how data is collected, integrated, stored, analyzed, and shared, all with the goal of assessing risks related to regional and long-term decisions.”

Moreover, partnerships enabled by the BD Hubs program will lead to professional certificate programs and student internships, creating a pipeline of graduates from partner institutions to join industry, public/government agencies, national labs, resource-planning agencies, and regulatory commissions.

Larry Smarr, director of the California Institute for Telecommunications and Information Technology (Calit2) at UC San Diego and UC Irvine, says that Calit2 and SDSC will be “exploring the synergies” between the Western BD Hub and the Pacific Research Platform, a science-driven, high-capacity, data-centric “freeway system” that will give participating universities and other research institutions the ability to move data 1,000 times faster than is possible on today’s inter-campus shared Internet.

“The BD Hubs program represents a unique approach to improving the impact of data science by establishing partnerships among likeminded stakeholders,” said Jim Kurose, NSF’s head of Computer and Information Science and Engineering. “In doing so, it enables teams of data science researchers to come together with domain experts, with cities and municipalities, and with anchor institutions to establish and grow collaborations that will accelerate progress in a wide range of science and education domains with the potential for great societal benefit.”

Big Data “Spokes”

Along with the BD Hubs awards, the NSF posted a solicitation for the next phase of the BD Hubs program. The agency will award approximately $10 million in grants as part of the Big Data Spokes program (BD Spokes) to help initiate research in specific priority areas identified by the BD Hubs. Each BD Spoke will focus on a specific BD Hub priority area and address one or more of three key issues: improving access to data; automating the data lifecycle; and applying data science techniques to solve a domain science problem and/or demonstrate societal impact.

The following are thematic areas which could develop into spokes over the course of the BD Hubs program:
•    Big Data Technology: Widespread interest in big data is fueling a surge of activity and innovation in data management technologies across the entire hardware/software stack. The Western region leads the nation in these efforts through its unique blend of leading universities and national laboratories such as the Lawrence Berkeley National Laboratory and Lawrence Livermore National Laboratory, which develop technology and applications that push the limits of existing technologies. The region also has a developed ecosystem of start-ups and established companies that are at the center of big data research and analysis.
•    Managing Natural Resources and Hazards: Challenges and related opportunities in the region include fresh and salt-water management, land management, plant and animal management, air quality management, and natural disaster management and response related to earthquakes, tsunamis, and wildfires. All of these create a need to catalog, control, and defend the region’s resources, while eliminating or mitigating associated hazards.
•    Precision Medicine: Medicine is undergoing a dramatic change through the aggregation, integration and analysis of big data. In early 2015, President Obama announced the launch of the Precision Medicine Initiative to enhance innovation in biomedical research, with the goal of moving the U.S. into an era where medical treatment is tailored to each patient based on data about multiple factors to individually optimize their prognosis. These factors include genetics and other molecular profiles, individual history and lifestyle, and multiple assays of a patient’s physiological state.
•    Metro Data Science: Cities in the Western U.S., and particularly the areas of Seattle, San Francisco, and San Diego represented by the three BD Hub co-leads, are frequently structured as true metropolitan areas: interconnected and interdependent urban, suburban, and rural regions with complex dynamics between citizens, policy, infrastructure, and the environment. With increasing urbanization come new challenges of creating efficient infrastructures in transportation, utilities, housing, communication, public services, and resource consumption.
•    Data-Enabled Scientific Discovery and Learning: Digitally generated data is streaming in from myriad sources: simulations such as global climate models or earthquake scenarios; networks of powerful sensors on the seafloor or in buildings, roads and bridges; high-bandwidth remote sensing platforms such as satellites and telescopes; high-throughput laboratory instruments; and social science data, including global economic indicators. Applying the data science methodology fields of computer science, statistics, and mathematics to traditional research domains such as the life, environmental, physical, and social sciences will advance discovery and the nation’s ability to extract meaningful value and knowledge from massive amounts of data.

BD Hubs Leadership Meeting

The announcement of the BD Hubs awards and BD Spokes solicitation comes days before the first national stakeholders meeting of the BD Hubs, to be held on November 3-5 in Arlington, Virginia. This national BD Hubs “charrette” will provide an opportunity for leaders and researchers representing each BD Hub to discuss governance and sustainability models, coordinate ideas for BD Spokes and identify next steps.

The last day of the meeting will include two public webinars. At the first webinar, BD Hubs representatives will publicly present and discuss their plans, as well as mechanisms for governance and coordination among BD Hubs stakeholders. The second webinar will be held in conjunction with the National Data Science Organizer’s Workshop and will discuss the role of the BD Hubs in engaging with grassroots data science groups, such as Meetup groups and non-profits.


