AI

7 models, 1 GPU: Alibaba pools out of NVIDIA scarcity

Production-tested token-level autoscaling can hugely reduce the need for chips to feed those inefficient LLMs, says Chinese cloud operator.

Phillip de Wet
Oct 21, 2025 - 3 min read

Photo by James Wainscoat on Unsplash