"Instead of relying on LLMs, we can develop focused, efficient solutions."
Wiz has developed an AI model optimised to identify secrets hidden within code, offering a more efficient and accurate approach than traditional techniques and suggesting the AI-ification of security isn’t always just hype.
The fine-tuned version of the Llama-3.2-1B small language model (SLM) was developed by the cloud security company to run on standard CPU hardware and to address the limitations of using large language models (LLMs) and regular expression (regex) patterns for the problem.
AI Research Lead Erez Harush said the research sought to avoid a “bigger is better” mentality: “This approach changes how we think about applying AI to security challenges—instead of relying on LLMs, we can develop focused, efficient solutions tailored to specific security needs.”
In testing, the SLM, adapted with Low-Rank Adaptation (LoRA) and quantized to shrink its model weights, recorded 86% precision and 82% recall, compared with an average recall of 60% for regex-based techniques, at a throughput of 27 tokens/sec.
It detects secrets, such as credentials and API keys, that have been accidentally, or lazily, left in code repositories and could be abused by threat actors to gain access to a company’s systems.
Regex, a pattern used to match characters in a string of text, has often been used as a relatively self-contained and simple-to-use way of scanning for secrets, but it is limited by the need to specify in advance which key types to look for and by its inability to detect secrets that don’t fit an obvious pattern.
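For readers less familiar with the two metrics, the short Python sketch below shows how precision and recall are derived from detection counts. The counts are invented purely so the result lands near the reported figures; they are not Wiz's test data, and the function name is hypothetical.

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) from raw detection counts."""
    flagged = true_positives + false_positives   # everything the scanner reported
    actual = true_positives + false_negatives    # every real secret in the data
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (an assumption, not from the article):
# 82 real secrets found, 13 false alarms, 18 real secrets missed.
p, r = precision_recall(true_positives=82, false_positives=13, false_negatives=18)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.86 recall=0.82"
```

Precision answers "how many flagged items were real secrets", while recall answers "how many real secrets were flagged", which is why the regex comparison in the article is made on recall.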
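The limitation described above can be illustrated with a minimal Python sketch. It is not any vendor's scanner: the two patterns and the `scan` helper are assumptions chosen for illustration, and the sample key is the placeholder value AWS uses in its documentation.

```python
import re

# Illustrative rules only (an assumption); real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def scan(text: str) -> list[tuple[str, str]]:
    """Return (rule name, matched text) for every rule that fires."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings


sample = '''
aws_key = "AKIAIOSFODNN7EXAMPLE"
db_password = "correct-horse-battery-staple"
'''

print(scan(sample))  # prints [('aws_access_key_id', 'AKIAIOSFODNN7EXAMPLE')]
```

The AWS-style key is caught because a rule was written for it in advance. The hard-coded database password has no distinctive shape, so no rule fires and it goes unreported.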
Why keep things small?
While Wiz said LLMs had also been used for the job of secret detection, with “impressive capabilities”, it highlighted the significant computational requirements when using an LLM for the task, along with data privacy concerns related to feeding in potentially harmful secrets and credentials.
For example, Wiz said using an API-based LLM, such as GPT-4o, to scan five million code files could take 2-3 seconds per file, making the entire process a 174-day affair at the upper end of that range, at a potential cost of more than $100k.
This made an SLM better suited to the task, Harush said, with Wiz’s model recording a 75% smaller model footprint and 2.3x faster processing on CPU hardware compared to an LLM using 32-bit floating-point precision, with just a 1% drop in accuracy.
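The arithmetic behind that estimate can be checked with a few lines of Python. This sketch assumes files are processed strictly one after another with no parallelism, which the article does not state; the function name is hypothetical.

```python
FILES = 5_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def scan_days(seconds_per_file: float, files: int = FILES) -> float:
    """Days needed to scan all files sequentially at a fixed per-file latency."""
    return files * seconds_per_file / SECONDS_PER_DAY


low, high = scan_days(2), scan_days(3)
print(f"{low:.0f} to {high:.0f} days")  # prints "116 to 174 days"

# A total of $100k spread over five million files implies this per-file cost.
print(f"${100_000 / FILES:.2f} per file")  # prints "$0.02 per file"
```

The 174-day figure corresponds to the 3-second end of the range; at 2 seconds per file the same scan would still take roughly 116 days.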
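To give a sense of where a 75% footprint reduction can come from, the sketch below applies simple symmetric quantization from 32-bit floats to 8-bit integers. The 8-bit width, the quantization scheme, and the weight values are all assumptions for illustration; the article does not say which scheme Wiz used.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in quantized]


weights = [0.12, -0.5, 0.33, 0.0, -0.07]  # made-up values, not real model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = 4 * len(weights)  # 32 bits per weight
int8_bytes = 1 * len(weights)  # 8 bits per weight (ignores the single stored scale)
print(f"footprint reduction: {1 - int8_bytes / fp32_bytes:.0%}")  # prints "footprint reduction: 75%"
print("max rounding error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight is stored in a quarter of the space, and the rounding error per weight is bounded by half the scale factor, which is one way a model can shrink substantially with only a small loss in accuracy.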
Wiz’s model is currently in private preview but will be used for secret scanning across the company’s services in the future, with plans for coverage to be expanded to secret detection in other data types such as configuration files and documentation.
“We also see opportunities for enhanced contextual understanding, training the model to better assess the severity and exploitability of discovered secrets,” Harush added.