
Developers are blithely incorporating AI components into applications and systems without considering the security and governance implications, and their bosses have no idea what’s happening, according to Sonatype cofounder and CTO Brian Fox.
Sound familiar? During the open source explosion of the 2010s, devs happily incorporated code and components they found somewhere out there, giving threat actors a golden ticket to embed themselves in systems.
Sonatype had a front row seat for this. It operates the Maven Central Repository for Java components. “We derive a lot of ecosystem intelligence from that,” Fox told us during a chat at Kubecon EMEA earlier this month.
Over time it branched out into monitoring the broader software supply chain, and "we started to see this new line of attacks, which we call malicious components – open source malware."
Rather than attempting to steal credit card data or PII, the attackers would target “API keys from the development infrastructure and then come back later and actually hack the company.”
“You can have the best vulnerability program in the world, which is focused on making sure you don't ship stuff out the end to your developers, but does nothing to help defend your users and your infrastructure.”
Moreover, Fox continued, company leaders were often oblivious to the problem – often because they didn't realize their developers were using open source components in the first place.
“I used to talk to organizations who would be downloading, let's say 100,000 components from Maven Central. Big, big organizations. And we talked to their leaders, and they'd say, perfectly serious, ‘We don't use open source.’”
To which Sonatype would reply, “Well, let me show you something. You might be surprised.”
Developers, their employers, and even the US government got wise to the supply chain threat over time. At least up to a point. Attackers are still exploiting it, with Sonatype’s most recent research showing it logged 512,847 malicious packages in the last year.
A new wave of stupidity
It also put the total number of Python package requests at 530 billion, up 80 percent on the year, largely driven by AI and cloud.
And that is throwing up a new wave of security and governance challenges. “In terms of AI, we’re seeing a repetition,” Fox told us. And again, employers are often oblivious to the way AI is being used by their developers.
It’s not just that coders are using tools like Copilot to produce code, he said. “We're finding that they've included AI models of different categories in the software itself.” Even though their leaders are “very confident” this isn’t the case.
"We've added capabilities to our system to be able to detect, manage, [and] provide the vulnerability [and] quality [information] about the AI models. So it feels almost like we're coming full circle again."
The components in question could be relatively small, Fox said. It could simply be that a developer needs an algorithm that does some fuzzy matching between two pieces of data. "Well, there are very small AI-like things that are pretty good at that."
The developer might think of this as just another component and pull it in. The problem, said Fox, is that "in 2024 and 2025 a lot of those are not, let's say, deterministic code, but they're built on LLMs in some form or another. They have all of the same problems as the bigger AI chatbots, which is you don't know what data they were trained on. It's hard to inspect that."
It might be possible to "trust" the code itself, he said. But it's impossible to inspect the data it was trained on.
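To make that distinction concrete, here's a minimal Python sketch – not from Sonatype, with a hypothetical llm_match package standing in for any LLM-backed library. The first matcher is plain, inspectable code that always returns the same score; the second depends on a model whose weights and training data can't be audited.

```python
import difflib

def fuzzy_match_deterministic(a: str, b: str) -> float:
    """Classic fuzzy matching: pure, inspectable code with a repeatable result."""
    return difflib.SequenceMatcher(None, a, b).ratio()

# Hypothetical LLM-backed alternative a developer might pull in as
# "just another component". The call looks similar, but the score now
# depends on a model trained on data nobody can inspect, and the same
# inputs may not return the same answer twice.
#
# from llm_match import similarity                       # hypothetical package
# score = similarity("Acme Corp.", "ACME Corporation")   # non-deterministic

print(fuzzy_match_deterministic("Acme Corp.", "ACME Corporation"))
```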
Organizations don't even know these components are there, he said, meaning they're not looking for them, never mind trying to put governance in place. "So they can't tell the developers, 'Okay, you can use this one, but not that one.' They kind of do something like, 'You can't use anything.' And the developers are like, 'Right, I'm just gonna do it anyway.'"
The inclusion of AI components raises new governance issues, he said. "Because it's not deterministic. It's not like you can even test all of the things, right?"
And, once it's embedded in an application, "you have no idea what it's doing. And it may be that the model has been trained to do something bad later. It may be that the runtime of the model includes specific malicious code directly."
Other issues might be more mundane, Fox said. “The model code itself might be open source, but the particular implementation that your developers pulled in might have been trained on a data set that disallows commercial use.”
Fox is less concerned about AI-generated code changing the whole development dynamic. "At the end of the day, when we're looking at dependencies that might be introduced, it's less relevant whether that was recommended by an AI or a human."
Ultimately, he said, "If the AI recommends a dumb dependency, well then we're going to flag it, just like we would [with] a junior engineer."
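As a rough illustration of that kind of policy gate – a generic sketch, not Sonatype's tooling, with made-up package names and an npm-style manifest as the input:

```python
# Generic sketch of a dependency policy check: flag anything on a deny list,
# regardless of whether a human or an AI assistant suggested adding it.
# The package names and manifest path are illustrative only.
import json

DENY_LIST = {"left-pad-clone", "totally-not-malware"}  # hypothetical entries

def flag_dependencies(manifest_path: str) -> list[str]:
    """Return the names of declared dependencies that violate policy."""
    with open(manifest_path) as f:
        deps = json.load(f).get("dependencies", {})
    return [name for name in deps if name in DENY_LIST]

if __name__ == "__main__":
    for dep in flag_dependencies("package.json"):
        print(f"FLAGGED: {dep} is not allowed by policy")
```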