Wall Street had estimated that Nvidia would post sales of $7.2 billion during this current quarter. Nvidia expects them to top $11 billion. The prediction, made during its Q1 earnings, triggered an almost overnight $200 billion spike in the company’s market capitalisation, as executives at the semiconductor and software company predicted jaw-dropping growth driven by demand for its accelerated computing product set worldwide.
Large language models (LLMs) and AI are the talk of every organisation, and “AI” showed up 106 times on Nvidia’s earnings call. CFO Colette Kress was clear that this is where the sudden Nvidia surge in sales is coming from.
“Generative AI large language models are driving the surge in demand, and it's broad-based across both our consumer Internet companies, our CSPs (cloud service providers), our enterprises, and our AI start-ups.”
“Enterprise demand for AI and accelerated computing is strong. We are seeing momentum in verticals such as automotive, financial services, healthcare, and telecom, where AI and accelerated computing are quickly becoming integral to customers' innovation roadmaps and competitive positioning,” Kress told analysts on the May 24 earnings call.
Nvidia surge driven by data centre transformation
But AI is driving a more fundamental shift in what Nvidia CEO Jensen Huang described as $1 trillion of global installed data centre infrastructure. At conferences around the world over the past week – from Dell’s to Red Hat’s – this shift was on everyone’s lips and the evolution of the technology stack was being closely watched, as a nascent AI revolution looks set to upend a data centre world dominated by what Huang described as “CPUs and dumb NICs” (network interface cards).
Here are five key takeaways from the Nvidia earnings call, from InfiniBand vs Ethernet to accelerated computing’s “full stack” challenges.
1: AI plus an energy crisis? Hello, big spender…
“We're seeing incredible orders to retool the world's data centres” said Jensen Huang on Nvidia’s blockbuster fiscal Q1 2024 earnings call, as the company reported record quarterly data centre revenue of $4.28 billion.
That’s being driven not just by the need to install the engines that can power AI applications, but the tailwinds of the past few years’ energy crisis. Call it “sustainability” or call it fiscal sustainability, but people are looking to cut energy bills and emissions and that is taking investment
What is accelerated computing?
Accelerated computing takes certain intensive elements of an application and processes them on a separate acceleration device. For example, a data-intensive application may offload its raw data processing to a GPU with its parallel processing architecture. Reconfigurable accelerators from vendors like AMD and Nvidia, among others, increasingly let users handle workloads ranging from machine learning inference to video processing on the same accelerator card. Many are task-specific and ship with software libraries designed to support particular needs like financial market risk processing pipelines.
GPUs and other accelerators that can offload processing tasks from the CPU and punch their way through them efficiently can make data centres a lot more efficient. Nvidia, like others in this space, has made hay of this performance + efficiency win-win. It is pushing forward with things like its liquid-cooled A100 PCIe card (a single-slot design with a pair of tubing connectors on the rear, which it is testing with data centre heavyweight Equinix in a bid to drive PUE – a measure of data centre efficiency – down from 1.6 to just 1.15) and argues that switching all the CPU-only servers running AI and HPC worldwide to GPU-accelerated systems would save 11 trillion watt-hours of energy yearly.
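To put those PUE figures in context: PUE (power usage effectiveness) is the ratio of total facility power to the power drawn by the IT equipment itself, so a lower number means less overhead for cooling and power distribution. The sketch below is purely illustrative. The 1 MW IT load and the function name are assumptions made for the arithmetic, not figures from Nvidia or Equinix.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE ratio.

    PUE = total facility power / IT equipment power, so it cannot be below 1.0.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


# Hypothetical IT load, chosen only to make the arithmetic easy to follow.
IT_LOAD_KW = 1000.0

before = facility_power_kw(IT_LOAD_KW, 1.6)   # PUE cited as the starting point
after = facility_power_kw(IT_LOAD_KW, 1.15)   # PUE cited as the target
saving_fraction = (before - after) / before

print(f"Facility power at PUE 1.6:  {before:.0f} kW")
print(f"Facility power at PUE 1.15: {after:.0f} kW")
print(f"Reduction in total facility power: {saving_fraction:.1%}")
```

On those assumptions, the same IT load would need roughly 28% less total facility power, with the overhead falling from 0.6 kW to 0.15 kW per kW of IT load.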
As Huang summed it up: “Accelerated computing is so energy efficient, that the budget of the data center will shift very dramatically towards accelerated computing and you're seeing that now… the beginning of a 10-year transition to basically recycle or reclaim the world's data centers and build it out as accelerated computing. You'll have a pretty dramatic shift in the spend of the data center from traditional computing, and to accelerated computing with smart NICs, smart switches, of course, GPUs.”
2: Smart switches and NICs for the win...
SmartNICs offload a growing array of jobs from CPUs. That’s important because, as Huang has noted, “probably half of the CPU cores inside the data centre are not running applications. That's kind of strange because you created the data center to run services and applications, which is the only thing that makes money... The other half of the computing is soaked up running the software-defined data center, just to provide for those applications [which] commingles the infrastructure, the security plane and the application plane and exposes the data centre to attackers…”
Nvidia plays in this space via the technology it gained with the acquisition of Mellanox, not least its BlueField programmable SmartNICs; Nvidia claims that by taking on data centre workloads, a single BlueField-3 replaces or frees approximately 300 CPU cores. SmartNICs are capable of taking on networking, security and storage jobs, including enabling firewalls and a hardware root-of-trust for a secure boot; running TLS, IP security and MAC security; handling storage and data-access protocols such as RoCE, GPUDirect Storage, NVM Express and Transmission Control Protocol; and supporting the workloads of virtualised data centres with single-root I/O virtualisation and virtual switching and routing.
3: AI, APIs, and examples…
Examples? Nvidia CFO Colette Kress told analysts that customers using Nvidia’s platforms for AI included auto insurance company CCC Intelligent Solutions, which “is using AI for estimating repairs… AT&T is working with us on AI to improve fleet dispatches so their field technicians can better serve customers… Deloitte [is using AI] for logistics and customer service and Amgen for drug discovery and protein engineering.”
Huang added that the three core engines of the future of computing will be AI-centric: recommender systems, large language models, and vector databases. As he sees it, the future is a world of AI “factories” which will use APIs to connect to all kinds of applications and workflows.
As he put it on the earnings call: “There’ll be hundreds of APIs in [a given company], some of them they built themselves, some of them that come from companies like ServiceNow and Adobe that we're partnering with… they'll create a whole bunch of generative AI APIs that companies can then connect into their workflows or use as an application.” This will require a real rethink of their underlying IT infrastructure, owing to the demand such work places on traditional CPU and dumb NIC systems.
“These [AI] applications could have image in, video out, video in, text out, image in, proteins out, text in, 3D out, video in, in the future, 3D graphics out” Huang told analysts. “So, the input and the output requires a lot of pre and post-processing. The pre and post-processing can't be ignored. The model itself is only ~25% of the data -- of the overall processing of inference. The rest of it is about preprocessing and post-processing, security, decoding, all kinds of things like that.” This is where, Nvidia claims, it stands out, because this world is not just about hardware.
4: Competition and the full stack approach
Nvidia captured the headlines over the past week, but AMD and many others are playing well in this space. (AMD will be unveiling its own refreshed data centre roadmap at a closely watched event on June 12 that The Stack will be reporting from.) Huang was clear: “We have competition from every direction: Really well-funded and innovative startups, countless of them all over the world; competition from existing semiconductor companies; competition from CSPs with internal projects.”
“NVIDIA's value proposition at the core is… we're the lowest TCO solution” he claimed. “The reason for that is that accelerated computing is a full stack challenge: you have to engineer all of the software and all the libraries and all the algorithms, integrate them into and optimise the frameworks for the architecture of not just one chip but the architecture of an entire data center…your networking, operating system, your distributed computing engines, your understanding of the architecture of the networking gear, the switches and the computing systems, the computing fabric, that entire system is your computer.”
5: Ethernet vs InfiniBand
A major part of this "entire system" to watch is not just the libraries that support it but the technologies that connect systems in the data centre, e.g. Ethernet and InfiniBand; the former the most widely used communication protocol in the LAN, the latter widely used in supercomputing. Where do both sit in a world of accelerated computing?
Asked that question by one analyst, Nvidia's CEO Jensen Huang had a ready answer: "They target different applications in a data center. They both have their place. Nvidia's Quantum InfiniBand has an exceptional roadmap. It's going to be really incredible. But the two networks are very different. InfiniBand is designed for an AI factory, if you will. If that data center is running a few applications for a few people for a specific use case and it's doing it continuously [then use InfiniBand]. But if your data center is a cloud datacenter and it's multi-tenant, doing a bunch of little jobs and is shared by millions of people [then use Ethernet]."
But there is, he added, "a new segment in the middle where the cloud is becoming a generative AI cloud. Still a multi-tenant cloud but it wants to run generative AI workloads. This new segment is a wonderful opportunity. At COMPUTEX, we're going to announce a major product line for this segment..."
Watch this space for more on that soon.
Meanwhile, around the world, every data centre operator's CapEx budget will, says Nvidia's CEO, "lean very heavily into generative AI and into accelerated computing infrastructure; everywhere from the number of GPUs that would be used to the accelerated switches and accelerated networking chips that connect them all" – a claim that could be dismissed as vendor hype, if those real term revenue outlooks were not quite so astonishingly, transformationally bullish.
For multi-tenant cloud transitioning to support generative AI, our high-speed Ethernet platform with BlueField-3 DPUs and Spectrum-4 Ethernet switching offers the highest available Ethernet network performance. BlueField-3 is in production and has been adopted by multiple hyperscale and CSP customers, including Microsoft Azure, Oracle Cloud, CoreWeave, Baidu, and others.