Journal article
An empirical study of code clones: Density, entropy, and patterns
Science of computer programming, v 242, 103259
May 2025
Abstract
In recent years, there has been a growing consensus among researchers regarding the dual nature of code clones. While some cloned code is valuable for reuse or extraction as components, other cloned segments can pose significant maintenance challenges for developers. Consequently, the judicious management of code clones has emerged as a pivotal solution to address these issues. Nevertheless, it remains critical to ascertain the number of code clones within a project and to identify the components where code clones are more concentrated. In this paper, we introduce three novel metrics, namely Clone Distribution, Clone Density, and Clone Entropy (the dispersion of code clones within a project), for the quantification and characterization of code clones. We have formulated associated mathematical expressions to precisely represent these code clone metrics. We collected a dataset covering three different domains of Java projects, formulated research questions for the three proposed metrics, conducted a large-scale empirical study, and provided detailed numerical statistics. Furthermore, we have introduced a novel clone visualization approach, which effectively portrays Clone Distribution and Clone Density. Developers can leverage this approach to efficiently identify target clones. By reviewing cloned code with respect to its distribution, we have identified nine distinct code clone patterns and summarized specific clone management strategies that have the potential to enhance the efficiency of clone management practices. Our experiments demonstrate that the proposed code clone metrics provide valuable insights into the nature of code clones, and the visualization approach assists developers in inspecting and summarizing cloned code patterns.
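The abstract names the metrics, but this record does not reproduce their mathematical expressions. As a rough illustration only, the sketch below assumes that Clone Density is the share of cloned lines in a component and that Clone Entropy is the Shannon entropy of how cloned lines are spread across components. Both definitions, the function names, and the sample project are assumptions made for this example and may differ from the formulas in the paper.

```python
import math
from typing import Dict


def clone_density(cloned_lines: int, total_lines: int) -> float:
    """Share of a component's lines that belong to clones (assumed definition)."""
    if total_lines <= 0:
        return 0.0
    return cloned_lines / total_lines


def clone_entropy(cloned_lines_per_component: Dict[str, int]) -> float:
    """Shannon entropy, in bits, of cloned lines across components (assumed definition).

    0.0 means all cloned lines sit in one component; higher values mean
    the clones are dispersed more evenly over the project.
    """
    total = sum(cloned_lines_per_component.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in cloned_lines_per_component.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical project: component name -> (cloned lines, total lines).
project = {"core": (120, 1000), "ui": (30, 600), "util": (150, 400)}

densities = {name: clone_density(c, t) for name, (c, t) in project.items()}
entropy = clone_entropy({name: c for name, (c, _) in project.items()})

for name, value in densities.items():
    print(f"{name}: density = {value:.3f}")
print(f"project clone entropy = {entropy:.3f} bits")
```

Under these assumed definitions, a component with high density is a candidate for closer inspection, while a low project-level entropy suggests the clones are concentrated in a few components rather than scattered.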
Details
- Title
- An empirical study of code clones: Density, entropy, and patterns
- Creators
- Bin Hu - Hangzhou Dianzi University; Dongjin Yu - Hangzhou Dianzi University; Yijian Wu - Fudan University; Tianyi Hu - Hangzhou Dianzi University; Yuanfang Cai - Drexel University
- Publication Details
- Science of computer programming, v 242, 103259
- Publisher
- Elsevier
- Number of pages
- 14
- Grant note
- National Natural Science Foundation of China: 62372145
This work was supported by the National Natural Science Foundation of China (62372145).
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Computer Science
- Web of Science ID
- WOS:001391383800001
- Scopus ID
- 2-s2.0-85212317194
- Other Identifier
- 991022008197004721
InCites Highlights
Data related to this publication, from the InCites Benchmarking & Analytics tool:
- Collaboration types
- Domestic collaboration
- International collaboration
- Web of Science research areas
- Computer Science, Software Engineering