Logo image
TECCD: A Tree Embedding Approach for Code Clone Detection
Conference proceeding

TECCD: A Tree Embedding Approach for Code Clone Detection

Yi Gao, Zan Wang, Shuang Liu, Lin Yang, Wei Sang, Yuanfang Cai and IEEE
2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)
Sep 2019

Abstract

AST Cloning code clone detection Machine learning Natural language processing Skip-gram Task analysis Tools Training
Clone detection techniques have been explored for decades. Recently, deep learning techniques has been adopted to improve the code representation capability, and improve the state-of-the-art in code clone detection. These approaches usually require a transformation from AST to binary tree to incorporate syntactical information, which introduces overheads. Moreover, these approaches conduct term-embedding, which requires large training datasets. In this paper, we introduce a tree embedding technique to conduct clone detection. Our approach first conducts tree embedding to obtain a node vector for each intermediate node in the AST, which captures the structure information of ASTs. Then we compose a tree vector from its involving node vectors using a lightweight method. Lastly Euclidean distances between tree vectors are measured to determine code clones. We implement our approach in a tool called TECCD and conduct an evaluation using the BigCloneBench (BCB) and 7 other large scale Java projects. The results show that our approach achieves good accuracy and recall and outperforms existing approaches.

Metrics

16 Record Views
38 citations in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
International collaboration
Web of Science research areas
Computer Science, Software Engineering
Logo image