Microsoft researchers warned that “even frontier models” corrupt an average of 25% of document material during extended workflows.
Testing 19 LLMs on a bespoke set of work environments spanning 52 domains, the researchers found that the models degraded, on average, 50% of the material they were given to work with as tasks progressed.