Apple found a way to sharply cut token use

“Sonata” proposes a “reliable proxy for thinking necessity”

The Stack
Jun 08, 2026 - 2 min read

Apple's HQ. Image credit: Avi Richards, via Unsplash.com

Machine learning engineers at Apple quietly created a way to cut AI token use by up to 60%, without undermining model performance.