The Range Shrinks, the Threat Remains: Re-evaluating LLM Package Hallucinations on the 2026 Frontier-Model Cohort

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Expert, short

Summary

A recent study re-evaluated package-name hallucinations by code-generating large language models. It replicated the 2025 methodology of Spracklen et al. on five frontier LLMs released between October 2025 and March 2026: Claude Sonnet 4.6, Claude Haiku 4.5, GPT-5.4-mini, Gemini 2.5 Pro, and DeepSeek V3.2. Across 199,845 Python and JavaScript prompts checked against PyPI and npm, hallucination rates ranged from 4.62% (Claude Haiku 4.5) to 6.10% (GPT-5.4-mini). This 1.48-percentage-point spread is an order of magnitude narrower than the inter-model spread in earlier findings. The "slopsquatting" threat, in which attackers register package names that LLMs hallucinate, nonetheless remains. Crucially, all five models hallucinated the same 127 package names. After coordinated disclosure, 53 of these (41 on PyPI, 12 on npm) are still registrable, creating a model-agnostic supply-chain attack surface. The study also observed a Python-over-JavaScript asymmetry in hallucination rates. The highest pairwise Jaccard similarity between hallucinated-name sets (J = 0.343) occurred between DeepSeek V3.2 and GPT-5.4-mini.
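The cross-model overlap figures reduce to simple set operations. The sketch below assumes each model's output has already been reduced to a set of hallucinated package names. It computes pairwise Jaccard similarity, J(A, B) = |A ∩ B| / |A ∪ B|, picks the most similar pair, and intersects all sets to find the names every model hallucinated. The function names and the placeholder package names are hypothetical, and this is not the study's own code.

```python
from functools import reduce
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_pair(per_model: dict[str, set[str]]) -> tuple[str, str, float]:
    """Return (model_1, model_2, J) for the pair with the highest Jaccard similarity."""
    return max(
        ((m1, m2, jaccard(per_model[m1], per_model[m2]))
         for m1, m2 in combinations(sorted(per_model), 2)),
        key=lambda triple: triple[2],
    )


def shared_by_all(per_model: dict[str, set[str]]) -> set[str]:
    """Names hallucinated by every model (intersection of all sets); per_model must be non-empty."""
    return reduce(set.intersection, per_model.values())


if __name__ == "__main__":
    # Hypothetical hallucinated-name sets for three placeholder models.
    per_model = {
        "model_a": {"fakepkg-x", "fakepkg-y", "fakepkg-z"},
        "model_b": {"fakepkg-x", "fakepkg-y"},
        "model_c": {"fakepkg-x", "fakepkg-w"},
    }
    print(most_similar_pair(per_model))
    print(shared_by_all(per_model))
```

Fed the five per-model sets from a study like this one, `shared_by_all` would yield the names that every model produced. Whether those names are still registrable would need a separate check against the live registries.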

Key takeaway

For AI security engineers evaluating supply-chain risk from code-generating LLMs, the narrower variance in hallucination rates across models does not mean slopsquatting has gone away. Look beyond the weaknesses of any single model and identify the non-existent package names that multiple frontier models hallucinate in common. Either defensively register these shared names or enforce robust internal validation of dependency names, to mitigate the model-agnostic attack surface this research reveals.
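One way to implement internal package validation is a pre-install gate that rejects any dependency not on an organization-approved allowlist. Under such a gate, a hallucinated name fails closed instead of resolving to whatever an attacker registered. The sketch below is a minimal, hypothetical example. It reads requirements-style lines with a deliberately simplified parser (not a full PEP 508 implementation) and normalizes names the way PyPI does under PEP 503. It then returns every requested name missing from the allowlist. The function names, the allowlist, and the placeholder package names are assumptions for illustration.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of '-', '_' and '.' become '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Leading distribution name of a requirement line, e.g. "requests" in "requests>=2.31".
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(lines: list[str]) -> list[str]:
    """Extract normalized package names from requirements-style lines (simplified parser)."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing and full-line comments
        if not line or line.startswith("-"):
            continue  # blank line or pip option such as -r / --index-url
        match = _NAME.match(line)
        if match:
            names.append(normalize(match.group(1)))
    return names


def unapproved(lines: list[str], allowlist: set[str]) -> list[str]:
    """Return requested names absent from the allowlist; an empty list means the gate passes."""
    approved = {normalize(n) for n in allowlist}
    return [n for n in requirement_names(lines) if n not in approved]


if __name__ == "__main__":
    requested = ["requests>=2.31", "Django==5.0  # web framework", "fastjson_x", "-r dev.txt"]
    print(unapproved(requested, {"requests", "django"}))
```

In a CI pipeline, a non-empty result would fail the build before `pip install` runs. The same allowlist idea applies to npm dependencies. npm has its own naming rules, though, so the normalization step would differ.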

Key insights

LLM package hallucination rates have converged across frontier models. The names those models hallucinate in common still pose a significant, model-agnostic supply-chain security risk.

Method

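The study replicated the Spracklen et al. methodology on 199,845 paired Python/JavaScript prompts. It checked each generated package name against the PyPI and npm master package lists to identify hallucinated names. Hallucinated names that remain unclaimed on the registries constitute the registrable attack surface.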
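The classification step can be sketched as follows. The sketch assumes the registry master lists are available as in-memory sets of already-normalized names, and that a rate is hallucinated names divided by all generated names per ecosystem. The study's exact counting rules (for example, per prompt versus per unique name) are not given in this summary. The function name and the placeholder package names are hypothetical.

```python
from collections import Counter


def hallucination_rates(
    generated: list[tuple[str, str]],
    master: dict[str, set[str]],
) -> tuple[dict[str, float], dict[str, set[str]]]:
    """Classify generated package names against registry master lists.

    generated: (ecosystem, package_name) pairs, e.g. ("pypi", "requests").
    master: ecosystem -> set of names registered at snapshot time.
    Returns per-ecosystem hallucination rates and the unique hallucinated names,
    which are only candidate registrable names until re-checked on the live registry.
    """
    totals = Counter()
    misses = Counter()
    hallucinated: dict[str, set[str]] = {}
    for ecosystem, name in generated:
        totals[ecosystem] += 1
        if name not in master.get(ecosystem, set()):
            misses[ecosystem] += 1
            hallucinated.setdefault(ecosystem, set()).add(name)
    rates = {eco: misses[eco] / totals[eco] for eco in totals}
    return rates, hallucinated


if __name__ == "__main__":
    # Placeholder snapshot and model output; not data from the study.
    master = {"pypi": {"requests", "numpy"}, "npm": {"express", "left-pad"}}
    generated = [
        ("pypi", "requests"), ("pypi", "fakepkg_alpha"),
        ("npm", "express"), ("npm", "left-pad"), ("npm", "fakepkg-beta"),
    ]
    print(hallucination_rates(generated, master))
```

Using a frozen snapshot keeps the classification reproducible. The registrable subset, however, changes over time, which is why the study re-checked it after coordinated disclosure.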

Best for: CTO, Research Scientist, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.