Exploring Paraphrasing Techniques on Formal Language for Generating Semantics Preserving Source Code Transformations
Conference proceeding

Aviel J. Stein, Levi Kapllani, Spiros Mancoridis and Rachel Greenstadt
2020 IEEE 14th International Conference on Semantic Computing (ICSC 2020), pp. 242-248
01 Jan 2020

Abstract

Automatically identifying and generating content that is semantically equivalent to a word, phrase, or sentence is an important part of natural language processing (NLP). Research on paraphrasing in NLP has so far focused exclusively on textual data, but it has significant potential when applied to formal languages such as source code. In this paper, we present a novel technique for generating source code transformations using paraphrases. We explore how to extract and validate source code paraphrases. The resulting transformations can be used for stylometry tasks and for processes such as refactoring. Identifying valid transformations with a machine learning method avoids writing transformations by hand and is likely to yield more valid transformations. Our data set comprises 27,300 C++ source code files, consisting of 273 topics each with 10 parallel files. This generates approximately 152,000 paraphrases, of which 11% yield valid code transformations. We then train a random forest classifier that identifies valid transformations with 83% accuracy. We also discuss some of the observed relationships between linked paraphrase transformations and depict the relationships that emerge between alternative equivalent code transformations in a graph formalism.
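The abstract does not say how paraphrases are extracted from the parallel files or how linked transformations are grouped, so the following is only a hypothetical sketch of one simple way to do it, not the authors' method. It assumes that parallel files are aligned line by line with Python's standard `difflib.SequenceMatcher`, that each differing span is one candidate paraphrase pair, and that "linked" paraphrases are fragments in the same connected component of an undirected graph. The helper names (`normalize_lines`, `candidate_paraphrases`, `paraphrase_graph`, `linked_groups`) and the toy C++ snippets are illustrative. The validation step (checking that a substituted fragment still compiles and behaves the same) and the random forest classifier are not shown.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize_lines(code: str) -> list[str]:
    """Collapse whitespace inside each line and drop blank lines."""
    lines = (" ".join(line.split()) for line in code.splitlines())
    return [line for line in lines if line]


def candidate_paraphrases(a: str, b: str) -> list[tuple[str, str]]:
    """Align two parallel files line by line and return the spans that differ.

    Hypothetical rule: each 'replace' opcode from SequenceMatcher is treated
    as one candidate paraphrase pair (left fragment, right fragment).
    """
    la, lb = normalize_lines(a), normalize_lines(b)
    matcher = SequenceMatcher(a=la, b=lb, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append(("\n".join(la[i1:i2]), "\n".join(lb[j1:j2])))
    return pairs


def paraphrase_graph(files: list[str]) -> dict[str, set[str]]:
    """Undirected graph: nodes are code fragments, edges are candidate pairs."""
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in combinations(files, 2):
        for left, right in candidate_paraphrases(a, b):
            graph[left].add(right)
            graph[right].add(left)
    return dict(graph)


def linked_groups(graph: dict[str, set[str]]) -> list[set[str]]:
    """Return connected components, i.e. groups of mutually linked fragments."""
    seen: set[str] = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Three toy "parallel files" that differ only in how n is incremented.
TEMPLATE = "int main() {{\n    int n;\n    cin >> n;\n    {}\n    cout << n;\n}}\n"
files = [TEMPLATE.format(s) for s in ("n = n + 1;", "n++;", "n += 1;")]

if __name__ == "__main__":
    print(candidate_paraphrases(files[0], files[1]))  # [('n = n + 1;', 'n++;')]
    print(linked_groups(paraphrase_graph(files)))
```

In this toy case, the three increment statements end up in one connected group, which is the kind of structure a graph of alternative equivalent transformations would expose. Real parallel files differ in many more places, so a line-level diff like this would also produce many invalid pairs; that is where a validation step and a classifier would come in.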

Metrics

17 Record Views
1 citation in Scopus

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration type: Domestic collaboration
Web of Science research area: Computer Science, Artificial Intelligence