Free Malicious Packages Repository is a powerful asset

The last couple of months have been rough for open source security, from burgeoning supply chain attacks and (arguably overblown) concerns around a ubiquitous library to the liability-heavy EU Cyber Resilience Act.

The Open Source Security Foundation's (OpenSSF) new Malicious Packages Repository can be seen in part as a response to these external anxieties – as a "comprehensive, high quality, open source database of reports of malicious packages published on open source package repositories."

OpenSSF's Malicious Packages Repository

This database primarily aims to help developers stop malicious dependencies from moving through CI/CD pipelines, refine detection engines, scan for and prevent usage in their environments, and accelerate incident response. It already holds about 15,000 reports of malicious packages.

Currently, each open source package repository has its own approach to handling malicious packages. When a malicious package is reported by the community, the package repository's security team is likely to remove the package and its associated metadata, though not all repositories will create a public record of the package. This has meant that records are scattered across many disparate public sources or held in proprietary threat intelligence feeds.

The OpenSSF, as a cross-industry foundation, enjoys a certain custodian-like legitimacy, and the fact that its members include big tech names such as AWS, Alphabet's Google, GitHub, Dell, IBM, Meta (formerly Facebook) and Microsoft means that this push for a centralised repository could be significant.

Background Chaos

According to Caleb Brown from the Google Open Source Security Team and Jossef Harush Kadouri, who works on software supply chain security at Checkmarx, the repository has been built in direct response to the rising number of malicious package attacks on systems.

"Earlier this year, the Lazarus Group (a prolific North Korean state-backed hacking group) targeted the blockchain and cryptocurrency sectors. The group used sophisticated methods, including deceptive npm packages to compromise various software supply chains. A centralized repository for shared intelligence could have alerted the community to the attack sooner and helped the open-source community understand the complete range of threats," the duo wrote in a OpenSSF blog post.

Just last month, users of Telegram, AWS and Alibaba Cloud were targeted in a unique open source supply chain attack using malicious packages, according to a Checkmarx report. The attacker, operating under the pseudonym "kohlersbtuh15", attempted to exploit the open-source community by uploading a series of malicious packages to the PyPI package repository.

("Rather than performing automatic execution, the malicious code within these packages was strategically hidden within functions, designed to trigger only when these functions were called," Checkmarx noted.)

Malicious Packages Repository: How to get it

The reports in the Malicious Packages repository use the Open Source Vulnerability (OSV) format, a JSON format used for specifying vulnerabilities in open source projects. By using the OSV format for malicious packages, it is possible to make use of existing integrations, including the osv.dev API, the osv-scanner tool, and deps.dev.
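As a rough illustration of what that integration could look like, the sketch below builds a request for the osv.dev query endpoint and picks out results that look like malicious-package reports. It is a minimal sketch, not an official client: the helper names are hypothetical, and it assumes malicious-package reports are identified by a "MAL-" prefix on their IDs.

```python
import json
import urllib.request

# Public osv.dev query endpoint (accepts a JSON body via POST).
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name, ecosystem, version=None):
    """Build the JSON body for a package query (hypothetical helper)."""
    body = {"package": {"name": name, "ecosystem": ecosystem}}
    if version is not None:
        body["version"] = version
    return body


def malicious_ids(response):
    """Return the IDs in a query response that look like malicious-package reports.

    Assumption: such reports carry IDs that start with "MAL-".
    """
    return [
        vuln["id"]
        for vuln in response.get("vulns", [])
        if vuln.get("id", "").startswith("MAL-")
    ]


def query_osv(name, ecosystem, version=None, timeout=10):
    """POST the query to osv.dev and return the decoded JSON response."""
    data = json.dumps(build_query(name, ecosystem, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.load(reply)
```

A CI step could call `malicious_ids(query_osv(name, "PyPI"))` for each dependency and fail the build if the list is not empty. In practice the osv-scanner tool mentioned above already covers this kind of check.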

The OSV format is also extensible, allowing additional data, such as indicators of compromise or classification data, to be recorded.
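To make the extensibility point concrete, here is a minimal sketch of reading an OSV-style record in Python. The top-level fields (`id`, `summary`, `affected`, `database_specific`) come from the OSV schema, but the sample record, its ID, and the `iocs` and `classification` keys inside `database_specific` are made up for illustration and are not taken from the repository.

```python
import json

# Invented sample record: the field names follow the OSV schema, the values
# and the keys under "database_specific" are hypothetical.
SAMPLE_REPORT = json.loads("""
{
  "id": "MAL-0000-0000",
  "summary": "Malicious code in example-package (PyPI)",
  "affected": [
    {
      "package": {"ecosystem": "PyPI", "name": "example-package"},
      "versions": ["1.0.0", "1.0.1"]
    }
  ],
  "database_specific": {
    "iocs": {"domains": ["malicious.example"]},
    "classification": "infostealer"
  }
}
""")


def affected_packages(report):
    """Yield (ecosystem, name, versions) for each affected entry in a report."""
    for entry in report.get("affected", []):
        package = entry.get("package", {})
        yield package.get("ecosystem"), package.get("name"), entry.get("versions", [])


def extension_data(report, key, default=None):
    """Read one key from the free-form database_specific extension object."""
    return report.get("database_specific", {}).get(key, default)


for ecosystem, name, versions in affected_packages(SAMPLE_REPORT):
    print(ecosystem, name, versions)
print(extension_data(SAMPLE_REPORT, "iocs"))
```

Because the extension object is free-form, consumers should treat every key in it as optional, which is why `extension_data` falls back to a default.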

Commenting on the repository to The Stack via email, Henrik Plate, security researcher at application security startup Endor Labs, said: "For academic researchers, in particular, it offers a nice opportunity to explore and test new approaches to malware detection without being required to redo the basic plumbing over and over again, e.g. the monitoring of new package publications on various package registries like PyPI or npm.

“Thankfully, this part is covered by the associated OpenSSF package-feeds project, which goes hand in hand with the OpenSSF package-analysis project to populate the database mentioned in the blog post. The database could also be an invaluable dataset for AI/ML training, comparable to the Backstabber’s Knife Collection, if only they would also publish the actual malware (e.g. Python wheels or tarballs). I hope this is going to change in the future.

“From a technical perspective, this seems to mostly rely on the dynamic detection of malicious behavior. To this end, they install packages in a gVisor sandbox and observe potentially malicious activities. What’s noteworthy is that they go a long way to actually trigger the malicious code, at least in the case of Python. For example, they import Python modules present in a given package and try to call into its functions..."

"This approach attempts to overcome the typical problem of dynamic detection, which is that the conditions for malicious code execution are not met, hence, no malicious activity can be detected....“