A new “Ultra Ethernet Consortium” (UEC) brings together rivals AMD and Intel, HPE and Microsoft to rethink the foundations of Ethernet – as bandwidth and latency pressures from AI and high-performance computing (HPC) mount on the networking protocol in the data centre.
The UEC, operating under the auspices of the Linux Foundation, intends to develop a modern “Ethernet-based communication stack architecture for high-performance networking” – in a genuinely unusual meeting of minds from across some of the technology world’s biggest companies.
“The new era”, its homepage pronounces, “needs a new network as performant as a supercomputing interconnect, as ubiquitous and cost-effective as Ethernet [and] as scalable as a cloud data center.”
(The Ultra Ethernet Consortium’s founding members are AMD, Arista, Atos’s Eviden, Broadcom, Cisco, HPE, Intel, Meta and Microsoft. It will be accepting applications from prospective new members from Q4 2023.)
It will have four core working groups, collaborating to modernise the physical layer, link layer, transport layer, and software layer.
Ultra Ethernet Consortium: “A vested interest…”
Dr. J Metz, Chair of the Ultra Ethernet Consortium, told The Stack: “All of the contributors to the UEC have a vested interest in ensuring the new network transport protocols are interoperable, as each of the members have key roles to play in both AI and HPC workloads.
“The new transport protocols are designed to augment the existing standards for workloads with greater sensitivity to margin-for-error and add in functionality that is highly desirable in such networks.
“For instance, the ability to packet-spray across multipath topologies, enable flexible ordering of delivery, advanced congestion control, and include inherent security at a targeted million end points (orders of magnitude greater than existing environments) all are enticing to UEC Members and, ultimately, anyone who deploys these workloads at scale.
“It’s important to note that UEC has no intention to replace existing Ethernet or even InfiniBand, but rather enhance well-known, well-understood, and well-supported Ethernet protocols across the stack that are highly tuned to these workloads,” he added by email.
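To make the packet-spraying idea concrete, here is a minimal sketch contrasting conventional per-flow hashing, where every packet of a flow follows one path, with per-packet spraying across several equal-cost paths. It is an illustration only, not the UEC's design: the path count, the round-robin spraying policy and the function names (`flow_hash_path`, `spray_path`, `path_load`) are all assumptions made for the example.

```python
import zlib
from collections import Counter

NUM_PATHS = 4  # assumed number of equal-cost paths through the fabric


def flow_hash_path(flow_id: str, num_paths: int = NUM_PATHS) -> int:
    """Per-flow hashing: every packet of a given flow maps to the same path."""
    return zlib.crc32(flow_id.encode()) % num_paths


def spray_path(seq: int, num_paths: int = NUM_PATHS) -> int:
    """Per-packet spraying (simple round-robin, chosen here for illustration)."""
    return seq % num_paths


def path_load(paths) -> list:
    """Count how many packets landed on each path index."""
    counts = Counter(paths)
    return [counts.get(p, 0) for p in range(NUM_PATHS)]


# One large flow of 1,000 packets between a single pair of hosts (hypothetical names).
packets = range(1000)
hashed = path_load(flow_hash_path("hostA->hostB") for _ in packets)
sprayed = path_load(spray_path(seq) for seq in packets)

print("per-flow hashing:", hashed)   # all packets pile onto one path
print("packet spraying: ", sprayed)  # packets spread evenly over all paths
```

Spraying evens out link utilisation for a single heavy flow, but packets can then arrive out of order, which is one reason flexible ordering of delivery appears alongside it in Metz's list.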
In recent decades, many proposals for addressing congestion have been made (e.g., DCQCN, DCTCP, SWIFT, Timely), the UEC's inaugural whitepaper notes: "None of the current algorithms, however, meet all the needs of a transport protocol optimized for AI, which are:
> Ramping quickly to wire rate in a high-speed, low round-trip-time network where there is an uncongested path, without reducing the performance of existing traffic
> Managing path congestion in the fabric and on the last hop to the destination
> Controlling incast by fairly sharing the final link without resulting in expensive packet loss, retransmission, or increased tail latency."
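The incast requirement can be illustrated with simple arithmetic. The sketch below assumes a hypothetical 400 Gbps final link and 16 senders each able to transmit at line rate; the figures and the function names are made up for the example and do not come from the whitepaper.

```python
def fair_share_gbps(link_gbps: float, num_senders: int) -> float:
    """Equal share of the final link available to each sender."""
    if num_senders < 1:
        raise ValueError("need at least one sender")
    return link_gbps / num_senders


def overload_factor(link_gbps: float, sender_gbps: float, num_senders: int) -> float:
    """How far offered load exceeds link capacity if every sender runs flat out."""
    return (sender_gbps * num_senders) / link_gbps


LINK_GBPS = 400.0  # assumed capacity of the last hop to the destination
SENDERS = 16       # assumed number of hosts sending to one receiver at once

share = fair_share_gbps(LINK_GBPS, SENDERS)
overload = overload_factor(LINK_GBPS, LINK_GBPS, SENDERS)

print(f"fair share per sender: {share} Gbps")
print(f"offered load without control: {overload}x link capacity")
```

Under these assumptions each sender would need to settle at 25 Gbps; without coordination the last hop is offered 16 times what it can carry, and the excess has to be queued or dropped.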
“Weaknesses in system interconnects...”
“Many HPC and AI users are finding it difficult to obtain the full performance from their systems due to weaknesses in the system interconnect capabilities. It’s also difficult for users to integrate and learn multiple new or different solutions. It’s exciting to see this impressive group of leading companies work together to create a new common higher-performance interconnect solution. Buyers in the HPC and AI areas have very demanding workloads, which the UEC approach could greatly help improve interoperability, performance and capabilities. We look forward to seeing a new set of products enter the market in the near future,” said Dr Earl Joseph, CEO of Hyperion Research.
The consortium will see a “tight integration between those who will develop the technology and those who will implement the technology”, added UEC Chair Dr J Metz, who is by day the technical director of systems design at AMD. He told The Stack: “The key goal of Ultra Ethernet was to primarily tackle the integration of the Ethernet stack with advances in efficiency, reliability and scalability from the physical through the software layer. The goal of interoperability with these advances are simply non-negotiable elements for criteria of success.”