November 16, 2021 / By Fungible
The Teams Deliver 10M IOPS Performance from a Single Server to a Single Storage System
Fungible Inc., a pioneer in data-centric computing, and the San Diego Supercomputer Center (SDSC), a leader and pioneer in high-performance and data-intensive computing at UC San Diego, today announced they have shattered the NVMe over TCP storage initiator performance world record, achieving 10M IOPS.*
Distributed AI, machine learning, and other data-centric workflows have long been constrained by the limitations of traditional RDMA-, iSCSI-, and Fibre Channel-based storage protocols and products. The Fungible solution uses the modern NVMe over TCP standard, together with Fungible’s Storage Initiator card and storage target, to unlock the potential of the rest of the infrastructure. The performance record announced today exceeds the prior record by over 50%. The prior record was attained using the Fungible Storage Cluster without the benefit of the Fungible Storage Initiator cards. The Fungible Storage Initiator cards delivered this increase in performance while simultaneously freeing up a significant amount of server resources to do other work.
The experiment was performed under the auspices of SDSC’s Advanced Technology Lab. The ATL’s team of scientists and engineers surveys, evaluates, and assembles the computing and storage technologies needed for emerging scientific computing and data analysis systems.
“While impressive from a performance perspective, the results of this testing are more about expanding the scope of what AI, machine learning, data analytics, and other data-centric environments can deliver,” said Eric Hayes, CEO of Fungible. “The Fungible Storage Initiator cards developed on our standards-based Fungible DPU™ free up tremendous amounts of server CPU resources to run application code, and the application now has faster access to data than it ever has before. Scale-out data centers, powered by Fungible, can now surpass their performance goals economically, reliably and securely.”
According to John Graham, UC San Diego senior development engineer working at SDSC and the Qualcomm Institute, “The Fungible solution has set a new bar for storage performance in our environment. The results are potentially transformational for large-scale scientific cyberinfrastructure such as the Pacific Research Platform (PRP) and its follow-on, the National Research Platform (NRP). With Fungible’s innovative DPU technology, we are able to deploy a high-performance storage solution that achieves our planned density and cost requirements,” he said. “The PRP and NRP are unique, multi-institutional distributed systems for conducting at-scale AI and data-intensive computing for scientific research in a wide area environment.”
“One of the challenges of doing distributed AI at scale is storage performance, both raw bandwidth and IOPS,” noted Frank Wuerthwein, interim director of SDSC and principal investigator for the National Research Platform. “Fungible’s technology looks very promising in delivering the storage performance we need to achieve our future goals for a wide area, distributed AI and data science platform.”
“We are proud that AMD EPYC processors and their high-performance capabilities were able to help Fungible and SDSC showcase a new level of storage performance,” said Kumaran Siva, corporate vice president, Server Software and Systems, AMD. “Achievements like this have profound impacts on scale-out data centers around the world for scalability of storage technologies.”
Test Methodology
The tests were administered from a Gigabyte R282-Z93 server with dual 64-core AMD EPYC™ 7763 processors and 2 TB of memory. The 10M IOPS benchmark was achieved using 5 Fungible Storage Initiator cards installed on the PCIe bus of the server and running the newly launched NVMe/TCP Storage Initiator (SI) software. The previous record of 6.55 million IOPS was achieved using Mellanox ConnectX-5 NICs, and it required almost completely saturating the CPU cores of the host AMD EPYC™ processor-powered server. The new record used only 63% of the CPU cores to drive the higher performance, leaving more of the cores available for user applications.
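The headline comparison can be checked with simple arithmetic. The sketch below uses only the figures quoted above; the variable names are illustrative, and the per-card figure assumes the load was spread evenly across the five cards, which the test description does not state.

```python
# Figures quoted in the test methodology; names are illustrative only.
NEW_RECORD_IOPS = 10_000_000       # 10M IOPS with Storage Initiator cards
PREVIOUS_RECORD_IOPS = 6_550_000   # 6.55M IOPS with ConnectX-5 NICs
INITIATOR_CARDS = 5                # cards installed in the test server


def percent_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, expressed in percent."""
    return (new - old) / old * 100.0


improvement = percent_improvement(NEW_RECORD_IOPS, PREVIOUS_RECORD_IOPS)

# Assumption: traffic was balanced evenly across the cards, so this is
# an average per card rather than a measured per-card result.
iops_per_card = NEW_RECORD_IOPS / INITIATOR_CARDS

print(f"Improvement over previous record: {improvement:.1f}%")
print(f"Average IOPS per initiator card: {iops_per_card:,.0f}")
```

The result is consistent with the "over 50%" gain cited earlier in the announcement.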
Media Contacts
Xochitl Rojas-Rocha
xrojasrocha@eng.ucsd.edu