Prompt Quality and Pull Request Outcomes: A Stage-Based Empirical Study of LLM-Assisted Development

2026-06-19 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Advanced, extended

Summary

An empirical study analyzed 265 manually validated developer–ChatGPT interactions from open-source pull requests to understand how prompt structure influences downstream outcomes. The researchers operationalized prompt quality along three dimensions: Context, Specificity, and Verification (CSV). The study first evaluated LLM-assisted annotation. Specificity showed the most stable agreement with human judgment, Context was systematically under-scored by the LLM, and Verification remained difficult to assess consistently; these results led to a hybrid human–LLM annotation strategy. The research then examined how prompt structure relates to actionable code generation, code adoption, and integration depth. Specificity and Context are strongly associated with actionable code generation, Verification is the primary predictor of code adoption, and Context is most strongly linked to integration depth. These findings suggest that prompt characteristics have distinct, stage-dependent effects in collaborative AI-assisted software engineering workflows.
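The comparison between LLM and human annotations can be illustrated with a small per-dimension agreement check. This is a minimal sketch, not the study's procedure: the summary does not say which agreement statistic was used, so Cohen's kappa is an assumption here, as are the 0-2 scoring scale, the `annotations` structure, and the toy scores, which are invented for illustration.

```python
from collections import Counter


def cohen_kappa(human, llm):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(human) != len(llm) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(human)
    # Observed agreement: share of items where both annotators gave the same label.
    observed = sum(h == m for h, m in zip(human, llm)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    human_counts = Counter(human)
    llm_counts = Counter(llm)
    expected = sum(human_counts[k] * llm_counts[k] for k in human_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_signed_difference(human, llm):
    """Average of (LLM score - human score); negative means the LLM scores lower."""
    return sum(m - h for h, m in zip(human, llm)) / len(human)


# Hypothetical toy scores on an assumed 0-2 scale (not data from the study).
annotations = {
    "context": {"human": [2, 2, 1, 2, 1, 0], "llm": [1, 1, 1, 2, 0, 0]},
    "specificity": {"human": [2, 1, 0, 2, 1, 0], "llm": [2, 1, 0, 2, 1, 1]},
}

for dimension, scores in annotations.items():
    kappa = cohen_kappa(scores["human"], scores["llm"])
    bias = mean_signed_difference(scores["human"], scores["llm"])
    print(f"{dimension}: kappa={kappa:.2f}, mean LLM-human difference={bias:+.2f}")
```

Reporting a signed difference alongside kappa separates two failure modes: low agreement in general, and a consistent tendency for the LLM to score a dimension lower than humans do.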

Key takeaway

For software engineers developing with LLM assistance: tailor your prompt structure to the specific workflow stage. When generating code, prioritize clear Context and Specificity. To improve the odds that generated code is adopted, include strong Verification cues in your prompts. To achieve deeper integration of LLM-generated code, align the prompt closely with your existing implementation environment. Treating prompt design as an integral part of the development process can improve downstream collaborative outcomes.

Key insights

Prompt quality dimensions (Context, Specificity, Verification) exert distinct, stage-dependent effects on LLM-assisted pull request outcomes.

Principles

Prompt effectiveness is workflow-stage-dependent.
Reliable LLM annotation requires selective human oversight.
AI-assisted development involves iterative evaluation and alignment.

Method

The study analyzed 265 developer–ChatGPT interactions, operationalizing prompt structure via Context, Specificity, and Verification (CSV) dimensions. It used a hybrid human–LLM annotation strategy and stage-based regression models for code generation, adoption, and integration depth.
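One way to picture a hybrid human–LLM annotation strategy is as a routing rule that decides, per dimension, whether the LLM's score is accepted or a human score is required. The sketch below is an assumption-heavy illustration: the summary does not describe the study's actual routing rule, so trusting only Specificity (because it was reported as the most stable dimension) is a guess, and `Annotation`, `LLM_TRUSTED_DIMENSIONS`, and the `pr-001` identifier are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only Specificity is accepted from the LLM without review.
# The study's real routing rule is not given in this summary.
LLM_TRUSTED_DIMENSIONS = {"specificity"}


@dataclass
class Annotation:
    interaction_id: str          # hypothetical identifier for one interaction
    dimension: str               # "context", "specificity", or "verification"
    llm_score: int
    human_score: Optional[int] = None


def needs_human_review(annotation: Annotation) -> bool:
    """True when the dimension is not trusted to the LLM alone."""
    return annotation.dimension not in LLM_TRUSTED_DIMENSIONS


def final_score(annotation: Annotation) -> int:
    """Return the LLM score for trusted dimensions, otherwise the human score."""
    if not needs_human_review(annotation):
        return annotation.llm_score
    if annotation.human_score is None:
        raise ValueError(
            f"{annotation.interaction_id}/{annotation.dimension} is awaiting human review"
        )
    return annotation.human_score


batch = [
    Annotation("pr-001", "specificity", llm_score=2),
    Annotation("pr-001", "context", llm_score=1, human_score=2),
    Annotation("pr-001", "verification", llm_score=0),
]

# Items that still need a human annotator before they can be used in analysis.
review_queue = [a for a in batch if needs_human_review(a) and a.human_score is None]
print([f"{a.interaction_id}/{a.dimension}" for a in review_queue])
```

Keeping the trusted set as data rather than hard-coding it in the functions makes it easy to widen or narrow human oversight as agreement figures change.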

In practice

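When prompting for code generation, supply Context and be Specific about the request.
To improve the chance that generated code is adopted, include Verification cues in the prompt.
For deeper integration, give rich Context about the existing implementation environment.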
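The stage-to-dimension guidance above can be captured as a small checklist helper. This is a sketch under stated assumptions: the stage names, the `STAGE_DIMENSIONS` mapping, and `missing_dimensions` are hypothetical, and the `covered` set is a manual judgment by the prompt author about which dimensions the prompt addresses, not an automated score.

```python
# Mapping from workflow stage to the prompt dimensions the study associates with it.
STAGE_DIMENSIONS = {
    "generation": ("context", "specificity"),
    "adoption": ("verification",),
    "integration": ("context",),
}


def missing_dimensions(stage, covered):
    """List the dimensions emphasized for `stage` that the prompt does not yet cover."""
    if stage not in STAGE_DIMENSIONS:
        raise ValueError(f"unknown stage: {stage!r}")
    return [d for d in STAGE_DIMENSIONS[stage] if d not in covered]


# Example: a generation prompt that describes the codebase but leaves the request vague.
print(missing_dimensions("generation", {"context"}))
```

A check like this only flags which dimension to revisit before sending a prompt; it says nothing about how well a covered dimension is expressed.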

Topics

Large Language Models
Prompt Engineering
Pull Request Workflows
AI-assisted Software Engineering
Human-AI Collaboration
Code Adoption

Code references

nbd-wtf/nostr-tools

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Software Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.