Microsoft researchers: LLMs degrade “artifact fidelity”

19 LLMs, 50% error rates on average. Blame the harness?

Edward Targett

May 18, 2026

Microsoft researchers warned that “even frontier models” corrupt an average of 25% of document material during extended workflows.

Testing 19 LLMs on a bespoke set of work environments spanning 52 domains, the researchers found that, averaged across all models, 50% of the material they were given to work with was degraded as tasks progressed.
