A Study of LLMs' Preferences for Libraries and Programming Languages

2024-07-18 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

A recent study investigated the programming-language and library preferences of eight diverse large language models (LLMs), including GPT-4o, GPT-3.5, Claude 3.5, Llama 3.2, Qwen2.5, DeepSeek LLM, and Mistral 7B. The researchers prompted these LLMs to solve language-agnostic benchmark tasks and to generate the initial structural code for new projects. The findings reveal a strong bias towards Python, which the LLMs used in 90% to 97% of benchmark cases and in 58% of project-initialisation tasks, even when Python was not a suitable choice. Furthermore, in 83% of project-initialisation tasks the LLMs contradicted their own language recommendations. Similar biases appeared for well-established libraries such as NumPy, which was used unnecessarily in up to 48% of cases. These biases were shared across the models tested, raising concerns about the models' reliability as guides for language selection and about their potential to hinder the discoverability of newer open-source projects.

Key takeaway

Software engineers who use LLMs to start new projects or to generate code should be aware that these models are strongly biased towards Python and towards established libraries. An LLM may suggest Python even when a compiled language such as C++ or Rust is better suited to a performance-critical task. Always state the desired programming language and core libraries explicitly in prompts, and critically review the language and library choices in LLM-generated code to confirm that they follow project requirements and best practices rather than the model's default preferences.
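
As a minimal sketch of the first recommendation, the hypothetical helper below wraps a task description with explicit language and library constraints before it is sent to a model. The function name `build_pinned_prompt` and the wording of the constraint lines are illustrative assumptions, not a prompt taken from the study.

```python
def build_pinned_prompt(task: str, language: str, libraries: list[str]) -> str:
    """Wrap a task description with explicit language and library constraints.

    Hypothetical helper: the constraint wording is illustrative only.
    """
    # Fall back to an explicit "standard library only" rule when no libraries are given.
    lib_list = ", ".join(libraries) if libraries else "the standard library only"
    return (
        f"{task.strip()}\n\n"
        "Constraints:\n"
        f"- Write the solution in {language}.\n"
        f"- Use only these libraries: {lib_list}.\n"
        "- If another language or library seems better suited, say so in prose "
        "instead of switching silently.\n"
    )


if __name__ == "__main__":
    print(build_pinned_prompt(
        "Implement a high-throughput log parser.",
        language="Rust",
        libraries=["regex", "serde"],
    ))
```

The last constraint line is a design choice: it asks the model to surface any disagreement in prose, which keeps a recommendation and the generated code from silently diverging.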

Key insights

LLMs exhibit strong biases towards Python and established libraries that are consistent across all models studied, and they often contradict their own recommendations.

Principles

LLM code-generation preferences appear to be heavily skewed by the distribution of their training data.
Internal consistency between LLM recommendations and actual code generation is low.
Dominant LLM biases can hinder open-source project discoverability.

Method

Eight LLMs were prompted to solve language-agnostic benchmark tasks and to write the initial code for new projects. Their responses were analysed for the languages and libraries used and for consistency with each model's own recommendations.
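
The analysis step can be approximated as follows. This is a hypothetical simplification, not the study's pipeline (the authors' code is in the itsluketwist/llms-love-python repository and may work differently): it infers the language of a response from the tags on its fenced code blocks, computes the share of responses that use each language, and measures how often the language used matches the single language the model recommended. The function names and the toy responses are assumptions made for illustration.

```python
import re
from collections import Counter

# Three backticks, built this way so the snippet can itself sit inside a Markdown fence.
FENCE = "`" * 3
# Captures the language tag right after an opening fence; [ \t]* (not \s*) stops a
# bare closing fence from matching the first word of the following line.
FENCE_LANG_RE = re.compile("^" + re.escape(FENCE) + r"[ \t]*([A-Za-z0-9_+#-]+)", re.MULTILINE)


def languages_used(response: str) -> set[str]:
    """Lower-cased language tags of all fenced code blocks in one response."""
    return {tag.lower() for tag in FENCE_LANG_RE.findall(response)}


def language_share(responses: list[str]) -> dict[str, float]:
    """Fraction of responses whose code uses each language (one response may count for several)."""
    if not responses:
        return {}
    counts: Counter = Counter()
    for response in responses:
        counts.update(languages_used(response))
    return {lang: count / len(responses) for lang, count in counts.items()}


def consistency_rate(pairs: list[tuple[str, str]]) -> float:
    """Share of (recommended, used) pairs where the model used the language it recommended."""
    if not pairs:
        return 0.0
    agree = sum(rec.lower() == used.lower() for rec, used in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    # Toy responses, invented for illustration only.
    toy = [
        f"{FENCE}python\nprint('hi')\n{FENCE}",
        f"{FENCE}rust\nfn main() {{}}\n{FENCE}",
    ]
    print(language_share(toy))
    print(consistency_rate([("Rust", "python"), ("Python", "python")]))
```

Comparing against a single top recommendation is the simplest reading of "consistency"; a fuller analysis might accept any language from a ranked list of recommendations.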

In practice

Verify LLM-generated language/library choices against project requirements.
Explicitly specify desired languages or libraries in prompts.
Explore alternative, less popular libraries manually.

Topics

LLM Code Generation
Programming Language Bias
Library Selection
Software Engineering
Model Evaluation Benchmarks
Technical Debt

Code references

itsluketwist/llms-love-python

Best for: AI Engineer, NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Software Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.