Anthropic’s CISO drinks the AI kool aid - backpedals frantically on security analysis claim

Jason Clinton, CISO of generative AI company Anthropic, has backpedalled on his claims that the company’s latest model “Opus” is “capable of reading source code and identifying complex security vulnerabilities used by APTs” (Advanced Persistent Threat groups) with “trivial prompting.”

The post making these claims was “written for a broad, non-expert audience; sorry for the churn,” Clinton said on March 11, responding to demonstrations that his claim, which had drawn some derision, was inaccurate.

In an initial post on X on March 8, Anthropic’s CISO Clinton had said: “I just simply asked the model to role-play a cyberdefense assistant and to look for a class of vulnerability. And yet, even with this trivial prompting, Claude was able to identify the vulnerability which was unveiled in [a Google Project Zero blog] a month after our training data cutoff.”

He shared a blog with details of this purported successful analysis.

Unfortunately, this claim was balderdash, as a close look by Sean Heelan promptly appeared to demonstrate. Heelan is a developer whose credentials (cometh the hour, cometh the Twitter respondent) include a PhD from the University of Oxford on the topic of automatic exploit generation.

“To find CVE-2023-0266 the LLM needs to look at both control.c and control_compat.c. This is what Claude Opus generates when given those. No mention of the write issue unfortunately, and the bug it does report is a false positive (snd_ctl_ioctl takes the required locks)” he wrote.

Clinton backpedalled, writing in response: “Some prompt engineering is required to actually take advantage of this level of analysis… the output is limited to 4K and models need space to enumerate the call paths…

“It's not perfect, of course, and getting the correct output is stochastic, but the problem is found about half of the time,” he added on X.

Heelan responded: “I think you may be misunderstanding me… What I am saying is that Opus does not find the vulnerability at all. What it is referring to in the chat you shared is not CVE-2023-0266, and doesn't appear to be a bug at all. The patch is also wrong,” he added crisply.

Alex Matrosov, CEO of Binarly, a firmware supply chain security platform, chimed in, telling Clinton that “[Sean Heelan] just demonstrated that the entire analysis from the original post is wrong. It shows only the negative value of using LLM in such cases (based on the present example), leading to false statements with extra alert fatigue flavor,” adding: “To give credit, Claude-3 is the best so far for code analysis and contextualization.”

Can LLMs do vuln identification/analysis?

In a March 11 blog, computer security expert Ben Hawkes, founder of Isosceles, reflected that "finding security bugs in real software is hard – impossibly, stupidly hard – at least from the perspective of computational complexity. The basic problem is state explosion, where each system interaction leads to an exponential number of new possibilities, which in turn leads to an exponential number of new possibilities, and so on. If you see 'find a bug in this source code' as a search optimization problem, then the search space is mind boggling.

"One way to make it tractable is to simplify the problem: CTF [capture the flag] problems, CGC [cyber grand challenge] binaries, looking at a single file/function at a time. But real world security bugs don't work like that, they involve huge codebases with all sorts of cross-function, cross-library, and cross-process interactions that blow up the search space immediately.

"This is why fuzzing has been winning the methodology wars. When the search space is this big, all of the fancy program analysis stuff breaks down, and you're left with some fairly primitive tools – random mutations with a code-coverage/compare-value feedback loop, and a bunch of clever trimming of the search space (like enabling compiler sanitizers to make bugs easier to trigger)..."
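The feedback loop Hawkes describes can be illustrated with a toy sketch. This is a minimal illustration under stated assumptions, not how any production fuzzer is built: `target`, `Crash`, `mutate` and `fuzz` are hypothetical names invented here, the "program" is a three-byte nested check with a planted bug, and coverage is reported by hand rather than by compiler instrumentation.

```python
import random


class Crash(Exception):
    """Raised by the toy target when the planted bug is reached."""


def target(data: bytes) -> set:
    """Hypothetical program under test; returns the branch IDs it executed."""
    covered = {0}
    if len(data) > 0 and data[0] == ord("b"):
        covered.add(1)
        if len(data) > 1 and data[1] == ord("u"):
            covered.add(2)
            if len(data) > 2 and data[2] == ord("g"):
                raise Crash(data)
    return covered


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Random mutation: overwrite one randomly chosen byte."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_input: bytes = b"aaaa", max_iters: int = 200_000, rng_seed: int = 0):
    """Return (crashing_input, iterations) or None if the budget runs out."""
    rng = random.Random(rng_seed)
    corpus = [seed_input]
    seen = set(target(seed_input))
    for i in range(1, max_iters + 1):
        child = mutate(rng.choice(corpus), rng)
        try:
            covered = target(child)
        except Crash:
            return child, i
        # Feedback loop: keep only inputs that reach previously unseen branches.
        if not covered <= seen:
            seen |= covered
            corpus.append(child)
    return None


if __name__ == "__main__":
    print(fuzz())
```

Because each input that unlocks a new branch is kept and mutated further, the search proceeds one byte at a time. A blind random guess would need to hit all three bytes at once, a 1 in 256^3 (roughly 1 in 16.7 million) chance per attempt.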

“Vulnerable to illusions of understanding”

The incident could be dismissed as a storm in a social media teacup.

Yet it is arguably also a timely example of how often generative AI is given credit for plausible-looking but fatally flawed outputs that readers are not equipped to understand, yet are willing to take as gospel, whether through ignorance or a desire to promote a product.

As a powerful paper in Nature by Yale Professor of Anthropology Lisa Messeri and Princeton Psychology Professor and neuroscientist Molly Crockett noted this week, generative AI can “exploit our cognitive limitations, making us vulnerable to illusions of understanding in which we believe we understand more about the world than we actually do.”

The Stack put the incident to a number of CISOs for their views. One, preferring not to be named, said: “I can understand the desire for trying to find the next level of defensive tooling. The current frameworks and approaches are incredibly resource-intensive, time-sensitive, and for those without big corp funding, reliant on others. Either in the form of threat intel, signatures and pattern matching updates for XDR, and prescribed correlation matrixes in SIEM or CIEM apps… [But] I'm yet to be convinced to risk the family silver on a fragile technology. After all, it only has to hallucinate once. But who knows what next month will bring.”

Former Holland & Barrett CISO Dinis Cruz meanwhile told The Stack that he continued to see huge scope for generative AI to play a role in cybersecurity, irrespective of incidents like this. He said: “[GenAI] is going to be massive help, in fact it will allow a dramatic change in the productivity and capabilities of defence teams (in just about all areas, but big ones are: Data Analysis, Data transformation, Data correlation, Risk Management, Code/Infrastructure analysis ... all in aid to the Human)

“[In terms of hallucination risk] My view is that we want to 'bring our own content' (i.e. add it to the prompt). We also want models that don't learn (i.e. 'read-only'),” Cruz said, cautioning however that “the people who put LLMs in line with users/use-cases [sic], plug it to APIs and don't control the content that it acts on, will be on the receiving end of crazy exploits including security teams who don't control the data (potentially malicious since some of that data come from attackers) that they feed their LLMs.”

“But the productivity potential for security use cases is insane and will make a MASSIVE difference” he insisted to The Stack.

What are your views on the role of generative AI in supporting vulnerability analysis, application security or other areas of cybersecurity?

Get in touch.

See also: Red RAG to a Bull? Generative AI, data security, and toolchain maturity. An ecosystem evolves...