Researchers from software supply chain security firm Rezilion have investigated the security posture of the 50 most popular generative AI projects on GitHub. They found that the more popular and newer a generative AI open-source project is, the less mature its security is.
Rezilion used the Open Source Security Foundation (OpenSSF) Scorecard to evaluate the large language model (LLM) open-source ecosystem, highlighting significant gaps in security best practices and potential risks in many LLM-based projects.
The findings are published in the Expl[AI]ning the Risk report, authored by researchers Yotam Perkal and Katya Donchenko.
The emergence and popularity of generative AI technology based on LLMs has been explosive, with machines now possessing the ability to generate human-like text, images, and even code. The number of open-source projects integrating these technologies has grown significantly.
For example, there are currently more than 30,000 open-source projects on GitHub using the GPT-3.5 family of LLMs, despite OpenAI only debuting ChatGPT seven months ago.
Despite strong demand, generative AI/LLM technologies introduce security issues, ranging from the risk of sharing sensitive business information with advanced self-learning algorithms to malicious actors using the technology to significantly enhance attacks.
Earlier this month, the Open Worldwide Application Security Project (OWASP) published the top 10 most critical vulnerabilities often seen in LLM applications, highlighting their potential impact, ease of exploitation, and prevalence.
Examples of vulnerabilities included prompt injections, data leakage, inadequate sandboxing, and unauthorised code execution.
What is the OpenSSF Scorecard?
The OpenSSF Scorecard is a tool created by the OpenSSF to assess the security of open-source projects and help improve them. It bases its assessment on facts about the repository, such as the number of vulnerabilities it has, how often it is maintained, and whether it contains binary files.
By running Scorecard on a project, different parts of its software supply chain will be checked, including the source code, build dependencies, testing, and project maintenance.
The purpose of the checks is to ensure adherence to security best practices and industry standards. Each check has an associated risk level, representing the estimated risk of not adhering to a specific best practice. Individual check scores are then compiled into a single aggregate score to gauge the overall security posture of a project.
Currently, there are 18 checks, which can be divided into three themes: holistic security practices, source code risk assessment, and build process risk assessment. The Scorecard assigns each check a score on an ordinal scale of 0 to 10, along with a risk level.
A project with a score nearing 10 indicates a highly secure and well-maintained posture, whereas a score approaching 0 represents a weak security posture with inadequate maintenance and increased vulnerability to open-source risks.
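To make the aggregation step concrete, here is a minimal Python sketch of how per-check scores could be rolled up into a single figure. It assumes a risk-weighted average with the weights Critical 10, High 7.5, Medium 5 and Low 2.5, and it assumes inconclusive checks are reported as -1 and left out. The function name `aggregate_score` and the sample check results are hypothetical, so treat this as an illustration rather than the tool's actual implementation.

```python
# Assumed risk-level weights for the weighted average; verify against the
# Scorecard documentation before relying on them.
RISK_WEIGHTS = {"critical": 10.0, "high": 7.5, "medium": 5.0, "low": 2.5}


def aggregate_score(checks):
    """Return a risk-weighted average of (score, risk_level) pairs.

    Each score is expected on the 0 to 10 scale. A negative score is treated
    as "inconclusive" and skipped (an assumption of this sketch).
    """
    weighted_total = 0.0
    weight_sum = 0.0
    for score, risk in checks:
        if score < 0:
            continue  # inconclusive check: contributes nothing
        if score > 10:
            raise ValueError(f"score out of range: {score}")
        weight = RISK_WEIGHTS[risk]
        weighted_total += weight * score
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("no conclusive checks to aggregate")
    return round(weighted_total / weight_sum, 1)


# Hypothetical results for four checks, purely for illustration.
example_checks = [
    (10, "critical"),
    (3, "high"),
    (0, "medium"),
    (10, "low"),
    (-1, "high"),  # inconclusive, ignored
]
print(aggregate_score(example_checks))
```

Because higher-risk checks carry more weight, a project that fails a critical check loses far more of its aggregate score than one that fails a low-risk check.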
Most popular open-source generative AI projects the least secure
Rezilion’s research revealed a troubling trend: The more popular a generative AI/LLM project is (based on GitHub’s star popularity rating system), the lower its security score (based on the OpenSSF Scorecard).
“This highlights the fact that the popularity of a project alone is not a reflection of its quality, let alone its security posture,” the researchers wrote.
The most popular GPT-based project on GitHub, Auto-GPT, which has over 138,000 stars and is less than three months old, has a Scorecard score of just 3.7, according to the report. The average score among the 50 projects checked isn’t much better at 4.6 out of 10.
For wider context, the researchers compared the risk of the most-popular generative AI and LLM projects on GitHub with other popular open-source projects on the platform that are not generative AI- or LLM-related.
They analysed a group of 94 critical projects (defined by the OpenSSF Securing Critical Projects Work Group) with an average Scorecard score of 6.18, along with a group of seven projects that use the OpenSSF Scorecard as part of their SDLC workflow, with an average score of 7.37.
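The gap between the groups is easier to see as plain arithmetic. The short Python sketch below uses only the three averages quoted in this article; the group labels and the helper name `gap_to_baseline` are illustrative.

```python
# Average Scorecard scores as reported in the article.
AVERAGES = {
    "generative AI/LLM projects (50)": 4.6,
    "OpenSSF critical projects (94)": 6.18,
    "projects using Scorecard in their SDLC (7)": 7.37,
}


def gap_to_baseline(averages, baseline_key):
    """Return how many points each other group scores above the baseline."""
    baseline = averages[baseline_key]
    return {
        name: round(score - baseline, 2)
        for name, score in averages.items()
        if name != baseline_key
    }


gaps = gap_to_baseline(AVERAGES, "generative AI/LLM projects (50)")
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} points versus the generative AI/LLM average")
```

On these figures, the critical projects sit about 1.58 points above the generative AI/LLM group, and the projects that run Scorecard as part of their SDLC sit about 2.77 points above it.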
“The maturity and security posture of the open-source ecosystem surrounding LLMs leave a lot to be desired,” the researchers wrote. “In fact, as these systems gain more popularity and adoption, and as long as the security standards in which they are developed and maintained remain the same, it seems inevitable that they will become the target of attackers, and significant vulnerabilities affecting them will continue to surface.”
Generative AI, LLM risks will increase over next 12-18 months
The risks posed to organisations by generative AI/LLMs are expected to evolve over the next 12 to 18 months as the popularity and adoption of these systems continue to grow, said Yotam Perkal, director of vulnerability research at Rezilion.
“Without significant improvements in the security standards and practices surrounding LLMs, the likelihood of targeted attacks and the discovery of vulnerabilities in these systems will increase. Organisations must stay vigilant and prioritise security measures to mitigate evolving risks and ensure the responsible and secure use of LLM technology.”
Organisations can prepare for LLM risks by adopting a secure-by-design approach when developing generative AI-based systems. They should also leverage existing frameworks like the Secure AI Framework (SAIF), NeMo Guardrails, or MITRE ATLAS to incorporate security measures into their AI systems, Perkal added.
“It is also imperative to monitor and log LLM interactions and regularly audit and review the LLM’s responses to detect potential security and privacy issues and update and fine-tune the LLM accordingly. Responsibility for preparing and mitigating LLM risks lies with both the organisations integrating the technology and the developers involved in building and maintaining these systems.”
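As a rough illustration of the monitoring and logging Perkal describes, the Python sketch below records each LLM interaction as a structured log entry and flags entries that may contain sensitive data for later review. Everything in it is an assumption made for illustration: the logger name, the function names, the record fields and the two simple regular expressions, which a real deployment would replace with proper data-loss-prevention tooling.

```python
import json
import logging
import re
from datetime import datetime, timezone

# Hypothetical logger name; attach handlers to suit your environment.
audit_logger = logging.getLogger("llm_audit")

# Deliberately simple, illustrative patterns. They will miss many cases.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "secret_like": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{16,}\b"),
}


def build_audit_record(prompt, response, user_id):
    """Build one audit record, flagging text that matches a pattern."""
    flags = sorted(
        name
        for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(prompt) or pattern.search(response)
    )
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        # Storing raw text is itself a privacy decision; consider redaction.
        "prompt": prompt,
        "response": response,
        "flags": flags,
        "needs_review": bool(flags),
    }


def log_interaction(prompt, response, user_id):
    """Write the record as a JSON line and return it to the caller."""
    record = build_audit_record(prompt, response, user_id)
    audit_logger.info(json.dumps(record))
    return record
```

Records marked `needs_review` could then feed the regular audits of LLM responses that the researchers recommend.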