A Study of LLMs' Preferences for Libraries and Programming Languages
Summary
A recent study investigated the programming language and library preferences of eight diverse Large Language Models, including GPT-4o, GPT-3.5, Claude3.5, Llama3.2, Qwen2.5, DeepSeekLLM, and Mistral7b. Researchers prompted these LLMs to complete language-agnostic benchmark tasks and to generate initial structural code for new projects. The findings reveal a significant bias towards Python: the LLMs used it in 90% to 97% of benchmark cases and in 58% of project initialisation tasks, even when Python was unsuitable. Furthermore, the LLMs contradicted their own language recommendations in 83% of project initialisation tasks. Similar biases appeared for well-established libraries such as NumPy, which was used unnecessarily in up to 48% of cases. These biases were common to all models tested, raising concerns about the models' reliability in guiding language selection and about reduced discoverability of newer open-source projects.
Key takeaway
If you are starting a new project or using LLMs for code generation, be aware that these models exhibit significant biases towards Python and established libraries. Your LLM may suggest Python even when a compiled language like C++ or Rust is more appropriate for performance-critical tasks. Always specify your desired programming language and core libraries explicitly in prompts. Review LLM-generated code critically for its language and library choices, and check that they align with project requirements and best practices rather than with the model's default preferences.
Key insights
LLMs exhibit strong, universal biases towards Python and established libraries, often contradicting their own recommendations.
Principles
- LLM code generation preferences are heavily skewed by training data distribution.
- Internal consistency between LLM recommendations and actual code generation is low.
- Dominant LLM biases can hinder open-source project discoverability.
Method
Eight LLMs were prompted to solve language-agnostic benchmark tasks and to generate initial code for new projects. Responses were analysed for the languages and libraries used and for consistency with the models' own recommendations.
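The kind of analysis described above can be illustrated with a minimal sketch. This is not the study's actual pipeline: it assumes each response is a Markdown string whose code fences carry a language tag, and the helper names (fenced_languages, language_shares, consistency_rate) are hypothetical.

```python
import re
from collections import Counter

# Build the fence marker programmatically so this example can itself sit in a fenced block.
FENCE = "`" * 3

# Assumption: the model labels its code fences with a language tag (e.g. "python", "rust").
FENCE_RE = re.compile(r"^" + FENCE + r"([A-Za-z0-9+#_-]+)[ \t]*$", re.MULTILINE)


def fenced_languages(response: str) -> list[str]:
    """Return the lowercase language tags of all tagged code fences in a response."""
    return [tag.lower() for tag in FENCE_RE.findall(response)]


def language_shares(responses: list[str]) -> dict[str, float]:
    """Share of code-bearing responses whose first tagged fence uses each language."""
    counts = Counter()
    for response in responses:
        tags = fenced_languages(response)
        if tags:  # responses without a tagged fence are skipped
            counts[tags[0]] += 1
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()} if total else {}


def consistency_rate(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (recommended, used) language pairs that agree, ignoring case."""
    if not pairs:
        return 0.0
    return sum(rec.lower() == used.lower() for rec, used in pairs) / len(pairs)
```

Relying on fence tags is a simplification; untagged or mislabelled blocks would need a separate language detector.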
In practice
- Verify LLM-generated language/library choices against project requirements.
- Explicitly specify desired languages or libraries in prompts.
- Explore alternative, less popular libraries manually.
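The first two practices above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the prompt wording is only an example, the audit handles Python output only (it uses the standard-library ast module), and the helper names (build_prompt, imported_packages, unexpected_imports) are hypothetical.

```python
import ast


def build_prompt(task: str, language: str, libraries: list[str]) -> str:
    """Compose a prompt that pins the language and libraries instead of leaving them to the model."""
    libs = ", ".join(libraries) if libraries else "the standard library only"
    return (
        f"{task}\n"
        f"Write the solution in {language}. "
        f"Use {libs}; do not add other dependencies."
    )


def imported_packages(source: str) -> set[str]:
    """Collect the top-level package names imported anywhere in Python source code."""
    packages = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                packages.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports (level > 0); they refer to the project itself.
            if node.level == 0 and node.module:
                packages.add(node.module.split(".")[0])
    return packages


def unexpected_imports(source: str, allowed: set[str]) -> set[str]:
    """Packages the generated code imports that are not on the project's allowlist."""
    return imported_packages(source) - set(allowed)
```

For example, unexpected_imports(generated_code, {"json", "pathlib"}) would flag a numpy import that the project never asked for, which is a cue to review whether the dependency is actually needed.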
Topics
- LLM Code Generation
- Programming Language Bias
- Library Selection
- Software Engineering
- Model Evaluation Benchmarks
- Technical Debt
Best for: AI Engineer, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.