Prompt Quality and Pull Request Outcomes: A Stage-Based Empirical Study of LLM-Assisted Development
Summary
An empirical study analyzed 265 manually validated developer–ChatGPT interactions drawn from open-source pull requests to understand how prompt structure influences downstream outcomes. The researchers operationalized prompt quality along three dimensions: Context, Specificity, and Verification (CSV). They first evaluated LLM-assisted annotation. Specificity showed the most stable agreement with human judgment, the LLM systematically underscored Context relative to human annotators, and Verification remained difficult to assess consistently. These results led to a hybrid human–LLM annotation strategy. The study then examined how prompt structure relates to three outcomes: actionable code generation, code adoption, and integration depth. Specificity and Context are strongly associated with actionable code generation, Verification is the primary predictor of code adoption, and Context is most strongly linked to integration depth. Together, these findings suggest that prompt characteristics have distinct, stage-dependent effects in collaborative, AI-assisted software engineering workflows.
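The summary does not say which statistic was used to measure agreement between LLM and human scores. As one common option (an assumption here, not the paper's stated method), Cohen's kappa can be computed per dimension and used to decide whether an LLM may annotate that dimension alone or needs human review. The function `choose_annotator` and its 0.6 threshold are hypothetical illustrations of such a hybrid routing rule.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over labels of the product of each rater's marginal frequency.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label for every item: agreement is perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


def choose_annotator(human, llm, threshold=0.6):
    """Hypothetical routing rule: trust LLM labels for a dimension only if kappa >= threshold."""
    return "llm" if cohens_kappa(human, llm) >= threshold else "human"
```

With toy labels where the LLM scores every item one point below the human (the kind of systematic underscoring reported for Context), for example `human = [2, 2, 1, 2, 1, 2]` and `llm = [1, 1, 0, 1, 0, 1]`, kappa is -2/7 and the rule routes the dimension to human annotators. Because CSV scores are presumably ordinal, a weighted kappa may be a better fit in practice.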
Key takeaway
For software engineers developing with LLM assistance, tailor prompt structure to the workflow stage. When generating code, prioritize clear Context and Specificity. To improve the odds that generated code is adopted, include strong Verification cues. For deeper integration of LLM-generated code, provide rich context that aligns the request with your existing implementation environment. Treating prompt design as an integral part of the development process is associated with better downstream collaborative outcomes.
Key insights
Prompt quality dimensions (Context, Specificity, Verification) exert distinct, stage-dependent effects on LLM-assisted pull request outcomes.
Principles
- Prompt effectiveness is workflow-stage-dependent.
- Reliable LLM annotation requires selective human oversight.
- AI-assisted development involves iterative evaluation and alignment.
Method
The study analyzed 265 developer–ChatGPT interactions, operationalizing prompt structure via Context, Specificity, and Verification (CSV) dimensions. It used a hybrid human–LLM annotation strategy and stage-based regression models for code generation, adoption, and integration depth.
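The summary does not give the model specification, so the following is only a minimal sketch of what "stage-based" modeling could look like: one logistic regression per outcome, all fit on the same three CSV scores, so that the weight each dimension receives can differ by stage. The outcome names, the gradient-descent fitter, and the synthetic `toy_records` data are all assumptions for illustration. Integration depth is left out because it is presumably graded rather than binary and would need an ordinal or linear model.

```python
import math
from itertools import product


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss; returns [bias, w_1, ..., w_k]."""
    k = len(X[0])
    w = [0.0] * (k + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


DIMENSIONS = ("context", "specificity", "verification")
STAGES = ("actionable_code", "adopted")  # hypothetical binary outcome names


def fit_stages(records):
    """Fit one model per workflow stage on the same CSV feature vector."""
    X = [[r[d] for d in DIMENSIONS] for r in records]
    return {stage: fit_logistic(X, [r[stage] for r in records]) for stage in STAGES}


def toy_records():
    """Synthetic, centered (-1/+1) scores for illustration only; NOT data from the study."""
    rows = []
    for c, s, v in product((-1, 1), repeat=3):
        rows.append({
            "context": c, "specificity": s, "verification": v,
            "actionable_code": int(c == 1 or s == 1),  # toy rule: driven by Context/Specificity
            "adopted": int(v == 1),                    # toy rule: driven by Verification
        })
    return rows
```

On these synthetic rows, `fit_stages(toy_records())` gives the "adopted" model a large positive Verification weight with Context and Specificity near zero, while the "actionable_code" model weights Context and Specificity and leaves Verification at zero. This mirrors the shape of the reported pattern, not its values. A real analysis would more likely use a statistics package such as statsmodels, which also reports standard errors and significance tests.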
In practice
- Use Context and Specificity for code generation prompts.
- Emphasize Verification for code adoption.
- Provide rich Context for deeper code integration.
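The three practices above can be made concrete with a small prompt builder that keeps Context, Specificity, and Verification as separate sections and flags which ones a given stage leans on but the prompt leaves empty. This is a hypothetical sketch: `CSVPrompt`, `STAGE_PRIORITIES`, and the section titles are illustrative names, and the stage mapping simply restates the list above rather than a rule taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical mapping of the "In practice" list: CSV dimensions each stage emphasizes.
STAGE_PRIORITIES = {
    "generation": ("context", "specificity"),
    "adoption": ("verification",),
    "integration": ("context",),
}


@dataclass
class CSVPrompt:
    task: str
    context: list[str] = field(default_factory=list)       # existing code, APIs, constraints
    specificity: list[str] = field(default_factory=list)   # exact inputs, outputs, edge cases
    verification: list[str] = field(default_factory=list)  # tests or checks the answer must pass

    def render(self) -> str:
        """Assemble the prompt as labeled sections, skipping empty ones."""
        parts = [f"Task: {self.task}"]
        # Specificity items are rendered under a "Requirements" heading.
        for title, items in (("Context", self.context),
                             ("Requirements", self.specificity),
                             ("Verification", self.verification)):
            if items:
                parts.append(title + ":\n" + "\n".join(f"- {item}" for item in items))
        return "\n\n".join(parts)

    def missing_for_stage(self, stage: str) -> list[str]:
        """Dimensions the given stage emphasizes that this prompt leaves empty."""
        return [d for d in STAGE_PRIORITIES[stage] if not getattr(self, d)]
```

For example, a prompt with a task, one Context item, and one Verification item renders without a "Requirements" section, and `missing_for_stage("generation")` returns `["specificity"]`, a cue to state inputs, outputs, or edge cases before asking for code.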
Topics
- Large Language Models
- Prompt Engineering
- Pull Request Workflows
- AI-assisted Software Engineering
- Human-AI Collaboration
- Code Adoption
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Software Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.