Claude is wildly outperforming GPT and Gemini on BullshitBench. Its creator thinks some vendors may be losing touch with fundamentals.
If you fed a nonsense prompt into an LLM before Q3 2025, the model would most likely accept it at face value.
Regardless of vendor and model, LLMs would accept nonsensical prompts as a reasonable starting point at least 50% of the time, according to a benchmark tracking the behaviour.
Towards the end of 2025, first Claude Sonnet and then Claude Opus broke that habit. The Anthropic models began pushing back against bullshit prompts more than 70% of the time, then more than 90% of the time.
Its competitors did not. Fed the same illogical prompts, GPT and Gemini remain far less likely to question whether a request makes sense, instead assuming that the human knows best.
LLMs' inability to judge the legitimacy of claims within prompts – and to push back on them – is a reminder of how vulnerable current models are to prompt injection and inaccurate outputs.
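BullshitBench's methodology is not detailed here, but the pushback rates described above imply a simple evaluation shape: send nonsense prompts to a model, classify each reply as pushback or compliance, and report the fraction that push back. A minimal sketch of that scoring idea follows; the marker phrases, function names, and sample replies are all hypothetical, not the benchmark's actual implementation.

```python
# Hypothetical sketch of pushback-rate scoring. BullshitBench's real prompts
# and grading method are not described in the article; this is illustrative only.

# Phrases that crudely signal the model is challenging the prompt's premise.
PUSHBACK_MARKERS = (
    "doesn't make sense",
    "premise is false",
    "no evidence",
    "cannot be done",
)

def pushes_back(response: str) -> bool:
    """Heuristic check: does the reply challenge the prompt rather than comply?"""
    lower = response.lower()
    return any(marker in lower for marker in PUSHBACK_MARKERS)

def pushback_rate(responses: list[str]) -> float:
    """Fraction of replies that push back against a nonsense prompt."""
    return sum(pushes_back(r) for r in responses) / len(responses)

# Two hypothetical model replies to an illogical prompt: one complies, one refuses.
replies = [
    "Sure! Here is a step-by-step plan to do exactly that...",
    "That premise is false: the request doesn't make sense as stated.",
]
print(pushback_rate(replies))  # 0.5
```

A production grader would more plausibly use a second model as a judge rather than keyword matching, but the aggregate metric – pushbacks over total prompts – is the same.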