Conference proceeding
DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research
2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)
May 2022
Abstract
Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such as Discord facilitate live communication and collaboration among developers all over the world. With numerous topics discussed in parallel, mining and analyzing the chat data of these platforms would offer researchers and tool makers opportunities to develop software tools and services such as automated virtual assistants, chat bots, chat summarization techniques, Q&A thesaurus, and more. In this paper, we propose a dataset called DISCO consisting of the one-year public DIScord chat COnversations of four software development communities. We have collected the chat data of the channels containing general programming Q&A discussions from the four Discord servers, applied a disentanglement technique [13] to extract conversations from the chat transcripts, and performed a manual validation of conversations on a random sample (500 conversations). Our dataset consists of 28,712 conversations, 1,508,093 messages posted by 323,562 users. As a case study on the dataset, we applied a topic modelling technique for extracting the top five general topics that are most discussed in each Discord channel.
Details
- Title
- DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research
- Creators
- Keerthana Muthu Subash - School of Computer Science, Carleton University, Ottawa, Canada
- Lakshmi Prasanna Kumar - School of Computer Science, Carleton University, Ottawa, Canada
- Sri Lakshmi Vadlamani - School of Computer Science, Carleton University, Ottawa, Canada
- Preetha Chatterjee - Drexel University, Department of Computer Science, Philadelphia, PA, United States
- Olga Baysal - School of Computer Science, Carleton University, Ottawa, Canada
- Publication Details
- 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)
- Publisher
- Association for Computing Machinery (ACM)
- Grant note
- RGPIN-2021-03809 / Natural Sciences and Engineering Research Council of Canada (NSERC) (10.13039/501100000038)
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Computer Science
- Web of Science ID
- WOS:000850208000032
- Scopus ID
- 2-s2.0-85134067741
- Other Identifier
- 991019173549304721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Artificial Intelligence
- Computer Science, Information Systems
- Computer Science, Software Engineering