DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research
Conference proceeding   Open access

Keerthana Muthu Subash, Lakshmi Prasanna Kumar, Sri Lakshmi Vadlamani, Preetha Chatterjee and Olga Baysal
2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)
May 2022
URL: https://doi.org/10.1145/3524842.3528018
Published, Version of Record (VoR); Maybe Open Access (Publisher Bronze)

Abstract

Keywords: Chat conversations; Collaboration; conversation disentanglement; Data mining; Discord; Manuals; online communities; Oral communication; Programming; software developers; Thesauri; Virtual assistants
Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such as Discord facilitate live communication and collaboration among developers all over the world. With numerous topics discussed in parallel, mining and analyzing the chat data of these platforms would offer researchers and tool makers opportunities to develop software tools and services such as automated virtual assistants, chat bots, chat summarization techniques, Q&A thesaurus, and more. In this paper, we propose a dataset called DISCO consisting of the one-year public DIScord chat COnversations of four software development communities. We have collected the chat data of the channels containing general programming Q&A discussions from the four Discord servers, applied a disentanglement technique [13] to extract conversations from the chat transcripts, and performed a manual validation of conversations on a random sample (500 conversations). Our dataset consists of 28,712 conversations, 1,508,093 messages posted by 323,562 users. As a case study on the dataset, we applied a topic modelling technique for extracting the top five general topics that are most discussed in each Discord channel.
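
To make the case study in the abstract more concrete, the following is a minimal sketch of listing the five most frequent terms per channel. It is not the authors' topic modelling technique: plain term counting is swapped in as a crude stand-in. The record layout (channel, conversation id, text), the channel names, the sample messages, and the stopword list are all hypothetical and may not match the actual DISCO files.

```python
import re
from collections import Counter, defaultdict

# Hypothetical records: (channel, conversation_id, text).
# The real DISCO schema may differ; these fields are assumptions.
MESSAGES = [
    ("python-help", 1, "How do I read a CSV file with pandas?"),
    ("python-help", 1, "Use pandas read_csv and pass the file path."),
    ("python-help", 2, "My list comprehension raises a syntax error."),
    ("go-help", 3, "How do goroutines share a channel safely?"),
    ("go-help", 3, "Pass the channel to each goroutine and close the channel once."),
]

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "the", "do", "i", "with", "and", "my", "to",
             "how", "use", "each", "once"}


def top_terms_per_channel(messages, k=5):
    """Return the k most frequent non-stopword terms for each channel.

    This is simple frequency counting, used here only as a stand-in
    for a real topic model.
    """
    counts = defaultdict(Counter)
    for channel, _conversation_id, text in messages:
        tokens = re.findall(r"[a-z_]+", text.lower())
        counts[channel].update(t for t in tokens if t not in STOPWORDS)
    return {
        channel: [term for term, _ in counter.most_common(k)]
        for channel, counter in counts.items()
    }


if __name__ == "__main__":
    for channel, terms in top_terms_per_channel(MESSAGES).items():
        print(channel, terms)
```

A real analysis would likely replace the counting step with a proper topic model and operate on whole disentangled conversations rather than single messages.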

Metrics

39 Record Views
15 citations in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Computer Science, Artificial Intelligence
Computer Science, Information Systems
Computer Science, Software Engineering