130,000 GPUs connected with just two tiers of switches.
OpenAI has developed a new networking protocol to make communication between GPUs in large supercomputers faster and more reliable.
The frontier lab has teamed up with AMD, Broadcom, Intel, Microsoft, and NVIDIA on the open protocol called Multipath Reliable Connection (MRC).
The technique is designed to speed up data transfer between GPU clusters, an essential element of training large AI models. OpenAI said that MRC means it can connect 130,000+ GPUs "with only two tiers of switches."