benchmarks
Claude is wildly outperforming GPT and Gemini on BullshitBench. The benchmark's creator thinks some vendors may be losing touch with fundamentals.
Trillion-dollar industry can't even measure its biggest problem; "calibration" not the answer, says OpenAI