Logo image
What characteristics make ChatGPT effective for software issue resolution? An empirical study of task, project, and conversational signals in GitHub issues
Journal article   Open access   Peer reviewed

What characteristics make ChatGPT effective for software issue resolution? An empirical study of task, project, and conversational signals in GitHub issues

Ramtin Ehsani, Sakshi Pathak, Esteban Parra, Sonia Haiduc and Preetha Chatterjee
Empirical software engineering, v 31(1), 22
18 Nov 2025
url
https://doi.org/10.1007/s10664-025-10745-8View
Published, Version of Record (VoR)Open Access via Drexel Libraries Read and Publish Program 2025CC BY V4.0 Open

Abstract

Issue resolution Large language models GitHub Conversation analysis Artificial Intelligence or Cybernetics Counseling
Conversational large-language models (LLMs), such as ChatGPT, are extensively used for issue resolution tasks, particularly for generating ideas to implement new features or resolve bugs. However, not all developer-LLM conversations are useful for effective issue resolution and it is still unknown what makes some of these conversations not helpful. In this paper, we analyze 686 developer-ChatGPT conversations shared within GitHub issue threads to identify characteristics that make these conversations effective for issue resolution. First, we empirically analyze the conversations and their corresponding issue threads to distinguish helpful from unhelpful conversations. We begin by categorizing the types of tasks developers seek help with (e.g., code generation, bug identification and fixing, test generation), to better understand the scenarios in which ChatGPT is most effective. Next, we examine a wide range of conversational, project, and issue-related metrics to uncover statistically significant factors associated with helpful conversations. Finally, we identify common deficiencies in unhelpful ChatGPT responses to highlight areas that could inform the design of more effective developer-facing tools. We found that only 62% of the ChatGPT conversations were helpful for successful issue resolution. Among different tasks related to issue resolution, ChatGPT was most helpful in assisting with code generation, and tool/library/API recommendations, but struggled with generating code explanations. Our conversational metrics reveal that helpful conversations are shorter, more readable, and exhibit higher semantic and linguistic alignment. Our project metrics reveal that larger, more popular projects and experienced developers benefit more from ChatGPT’s assistance. Our issue metrics indicate that ChatGPT is more effective on simpler issues characterized by limited developer activity and faster resolution times. These typically involve well-scoped technical problems such as compilation errors and tool feature requests. In contrast, it performs less effectively on complex issues that demand deep project-specific understanding, such as system-level code debugging and refactoring. The most common deficiencies in unhelpful ChatGPT responses include incorrect information and lack of comprehensiveness. Our findings have wide implications including guiding developers on effective interaction strategies for issue resolution, informing the development of tools or frameworks to support optimal prompt design, and providing insights on fine-tuning LLMs for issue resolution tasks.

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
Web of Science research areas
Computer Science, Software Engineering
Logo image