Linguistic Approach to Segmenting Source Code
Conference proceeding

Aviel J. Stein, Daniel Schwartz, Yiwen Shi and Spiros Mancoridis
2022 IEEE 16th International Conference on Semantic Computing (ICSC), pp 177-178
Jan 2022

Abstract

Keywords: Big Data; Codes; Deep Learning; Java; Linguistics; Natural Language; Neural networks; Segmentation; Semantics; Source Code Analysis; Syntactics; Training
Source code segmentation is the process of dividing the source code of a program into meaningful pieces, for example in preparation for source code analysis (SCA) tasks. Our goal is to segment code based on the semantics of its content, specifically so that the segment boundaries mark logical locations that are good candidates for inserting manually composed or automatically generated comments. Instead of focusing on syntactic boundaries for segmentation, such as function and class declarations, we exploit the semantic content of the code. We use code snippets mined from GitHub as known semantic segments to train an LSTM neural network model. The model is able to infer locations in the code where a programmer would likely insert comments. It can operate on any text and performs well across multiple programming languages for detecting candidate segment boundaries within a program. This semantic code segmentation is especially useful for incomplete code repositories under development, which may also be written in more than one programming language. Additionally, our technique supports a detection threshold parameter so that users can adjust the number of suggestions provided by our tool.
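
The role of the detection threshold can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the trained model has already produced one boundary score in [0, 1] per line of code, and the function names `candidate_boundaries` and `split_into_segments`, as well as the example scores, are hypothetical.

```python
from typing import List, Sequence


def candidate_boundaries(scores: Sequence[float], threshold: float) -> List[int]:
    """Return 0-based line indices whose boundary score meets the threshold.

    `scores` stands in for per-line model output (an assumption of this
    sketch); a higher threshold yields fewer suggested boundaries.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return [i for i, score in enumerate(scores) if score >= threshold]


def split_into_segments(lines: Sequence[str], boundaries: Sequence[int]) -> List[List[str]]:
    """Split `lines` so that a new segment starts at each boundary index."""
    segments: List[List[str]] = []
    start = 0
    for b in sorted(set(boundaries)):
        # Index 0 and out-of-range indices cannot start a new segment.
        if 0 < b < len(lines):
            segments.append(list(lines[start:b]))
            start = b
    segments.append(list(lines[start:]))
    return segments


if __name__ == "__main__":
    code = [
        "int total = 0;",
        "int count = 0;",
        "for (int v : values) {",
        "    total += v; count++; }",
        "double mean = (double) total / count;",
    ]
    # Hypothetical scores, made up for illustration only.
    scores = [0.1, 0.2, 0.9, 0.3, 0.6]
    for threshold in (0.5, 0.8):
        cuts = candidate_boundaries(scores, threshold)
        print(threshold, cuts, len(split_into_segments(code, cuts)))
```

Raising the threshold from 0.5 to 0.8 in this toy input reduces the suggested boundaries from two to one, which mirrors how a user might trade off the number of comment-insertion suggestions against their confidence.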


Web of Science research areas
Computer Science, Artificial Intelligence