Researchers tested more than a hundred prominent AI models on coding tasks and found that nearly half of the code they produced contained significant security vulnerabilities.
According to Veracode's 2025 GenAI Code Security Report, vulnerabilities in AI-generated code remain a major concern: 45% of AI-generated code samples fail security tests and introduce OWASP Top 10 vulnerabilities.
Among the specific weaknesses measured, models failed to defend against Cross-Site Scripting (CWE-80) in 86% of relevant AI-generated code samples and against log injection (CWE-117) in 88% of cases.
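To make these two weakness classes concrete, the sketch below (in Java, the language the report flags as riskiest) shows the kind of insecure patterns involved and the corresponding fixes. It is illustrative only: the method names are hypothetical, and it assumes the OWASP Java Encoder library (org.owasp.encoder.Encode) and SLF4J are available on the classpath.

```java
import org.owasp.encoder.Encode;          // OWASP Java Encoder (assumed dependency)
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;           // SLF4J logging facade (assumed dependency)

public class VulnerabilityExamples {
    private static final Logger log = LoggerFactory.getLogger(VulnerabilityExamples.class);

    // CWE-80: reflecting untrusted input straight into HTML enables cross-site scripting.
    static String greetingInsecure(String name) {
        return "<p>Hello, " + name + "</p>";                 // attacker-controlled markup
    }

    // Fix: HTML-encode the untrusted value before it reaches the page.
    static String greetingSecure(String name) {
        return "<p>Hello, " + Encode.forHtml(name) + "</p>";
    }

    // CWE-117: logging raw input lets an attacker forge log entries via CR/LF characters.
    static void auditInsecure(String userId) {
        log.info("login attempt for user {}", userId);       // newlines pass through unchanged
    }

    // Fix: strip line breaks (or otherwise encode) before the value is written to the log.
    static void auditSecure(String userId) {
        log.info("login attempt for user {}", userId.replaceAll("[\r\n]", "_"));
    }
}
```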
The research, which tested over 100 large language models (LLMs), found that bigger or more advanced models do not necessarily produce more secure code. Security failures appear to be a systemic issue across all models rather than a problem fixed by scaling up model size.
The findings underline a major risk attached to 'vibe coding', in which developers rely heavily on AI output to quickly generate code for their software.
Security failure rates also vary significantly across programming languages. Java is the riskiest, with a failure rate of about 72%; C# follows at about 45%, JavaScript at about 43%, and Python, the lowest of the four but still significant, at about 38%.
Regarding AI models, the research found that LLMs were worst at generating Java safely, with an average security pass rate of just 28.5% in this widely used language.
While AI coding assistants improve productivity, their ability to generate secure code lags behind their functional accuracy. Organizations therefore need to integrate automated security tools, static analysis, and remediation workflows to detect and fix vulnerabilities early, especially because AI-generated code often omits key security considerations unless explicitly guided.
Key vulnerabilities include Cross-Site Scripting (CWE-80) and log injection (CWE-117), with LLMs failing to properly defend against these in the vast majority of cases.
To mitigate these risks, the report recommends that companies embed security checks into development workflows, use AI-powered remediation tools, and enforce strict policies to prevent insecure AI-generated code from reaching production.
It's worth noting that AI tools are widely used for code generation, with 84% of software developers using AI to produce code more quickly. However, three-quarters (75.3%) of developers say they do not trust AI outputs, and 61.7% have security concerns about the use of AI-generated code.
Despite these concerns, major tech companies such as Google and Microsoft are already integrating AI-generated code into their systems. Alphabet CEO Sundar Pichai has said that 25% of Google's internal code is now AI-generated, while Microsoft CEO Satya Nadella recently put the share of his firm's code written by AI at as much as 20-35%.
However, even with a large context window, it is unclear whether models can perform the detailed interprocedural dataflow analysis required to determine which variables need sanitization. This raises further questions about the security implications of relying on AI for code generation.
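As a minimal illustration of why this matters, the hypothetical snippet below shows untrusted input passing through an intermediate helper before it reaches an HTML sink; knowing that `name` needs sanitizing requires tracing the value across all three methods, which is exactly the kind of interprocedural dataflow reasoning in question. All names here are invented for the example.

```java
public class TaintFlowSketch {
    // Source: "name" arrives from an untrusted caller (e.g. a request parameter).
    static String handleRequest(String name) {
        return render(buildMessage(name));
    }

    // Intermediate hop: the tainted value is merely concatenated, not sanitized.
    static String buildMessage(String name) {
        return "Hello, " + name + "!";
    }

    // Sink: the value ends up in HTML output, so it needed encoding somewhere upstream.
    static String render(String message) {
        return "<p>" + message + "</p>";
    }

    public static void main(String[] args) {
        // A malicious value survives the whole path untouched.
        System.out.println(handleRequest("<script>alert(1)</script>"));
    }
}
```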
In conclusion, while AI-generated code offers significant productivity benefits, it is crucial for organizations to prioritize security measures so that the code they adopt is safe to deploy.
Summary table of security failure rates across languages:
| Language   | Security Failure Rate (AI-generated code) |
|------------|-------------------------------------------|
| Java       | ~72%                                      |
| C#         | ~45%                                      |
| JavaScript | ~43%                                      |
| Python     | ~38%                                      |
- The report reveals that 45% of AI-generated code samples fail security tests and introduce OWASP Top 10 vulnerabilities.
- Cross-Site Scripting (CWE-80) is the leading vulnerability among AI-generated code samples, appearing in 86% of relevant cases.
- The research found that bigger or more advanced models do not necessarily produce more secure code.
- In terms of programming languages, Java is the riskiest with a security failure rate of about 72%.
- Regarding AI models, LLMs were the worst at generating Java safely, with an average security pass rate of just 28.5% in this widely used language.
- While AI coding assistants improve productivity, their ability to generate secure code lags behind functional accuracy.
- Key vulnerabilities include Cross-Site Scripting (CWE-80) and log injection (CWE-117), with LLMs failing to properly defend against these in the vast majority of cases.
- To mitigate risks, companies are recommended to embed security checks into development workflows, use AI-powered remediation tools, and maintain strict policy enforcement.
- Despite concerns about security, major tech companies like Google and Microsoft are integrating AI-generated code into their systems, with about 25% of Google's internal code now AI-generated.