Conference proceeding
On the Use of Discretized Source Code Metrics for Author Identification
1ST INTERNATIONAL SYMPOSIUM ON SEARCH BASED SOFTWARE ENGINEERING, PROCEEDINGS
01 Jan 2009
Abstract
Intellectual property infringement and plagiarism litigation involving source code would be more easily resolved using code authorship identification tools. Previous efforts in this area have demonstrated the potential of determining the authorship of a disputed piece of source code automatically. This was achieved by using source code metrics to build a database of developer profiles, thus characterizing a population of developers. These profiles were then used to determine the likelihood that the unidentified source code was authored by a given developer.
In this paper we evaluate the effect of discretizing source code metrics for use in building developer profiles. It is well known that machine learning techniques perform better when using categorical variables as opposed to continuous ones. We present a genetic algorithm to discretize metrics to improve source code to author classification. We evaluate the approach with a case study involving 20 open source developers and over 750,000 lines of Java source code.
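The idea of searching for metric discretizations with a genetic algorithm can be illustrated with a minimal sketch. The abstract does not specify the paper's encoding, fitness function, or classifier, so everything below is an assumption made for illustration: a single hypothetical metric (for example, line length), individuals encoded as sorted lists of cut points, histograms of bin frequencies as developer profiles, and leave-one-out nearest-histogram accuracy as the fitness. The function names and parameters are hypothetical and are not taken from the paper.

```python
import bisect
import random


def discretize(value, cut_points):
    """Map a continuous metric value to a bin index, given sorted cut points."""
    return bisect.bisect_right(cut_points, value)


def histogram(values, cut_points):
    """Relative frequency of each bin for one sample of metric values."""
    counts = [0] * (len(cut_points) + 1)
    for v in values:
        counts[discretize(v, cut_points)] += 1
    return [c / len(values) for c in counts]


def fitness(cut_points, samples):
    """Leave-one-out accuracy of a nearest-histogram classifier (an assumed fitness).

    samples: list of (author, metric_values) pairs; this layout is hypothetical.
    """
    hists = [(author, histogram(vals, cut_points)) for author, vals in samples]
    correct = 0
    for i, (author, h) in enumerate(hists):
        # Find the closest other sample by squared Euclidean distance.
        nearest = min(
            (j for j in range(len(hists)) if j != i),
            key=lambda j: sum((x - y) ** 2 for x, y in zip(h, hists[j][1])),
        )
        correct += hists[nearest][0] == author
    return correct / len(hists)


def evolve(samples, n_cuts=3, lo=0.0, hi=100.0, pop_size=20, generations=30, seed=0):
    """Search for cut points with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [
        sorted(rng.uniform(lo, hi) for _ in range(n_cuts)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the better half as parents (truncation selection).
        population.sort(key=lambda ind: fitness(ind, samples), reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            k = rng.randrange(n_cuts)
            child[k] = min(hi, max(lo, child[k] + rng.gauss(0, 5)))  # mutation
            children.append(sorted(child))
        population = parents + children
    return max(population, key=lambda ind: fitness(ind, samples))
```

Calling `evolve(samples)` with a list of `(author, metric_values)` pairs returns one sorted list of cut points; `histogram(values, cut_points)` then turns any new sample into a categorical profile that can be compared against known developers.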
Details
- Title
- On the Use of Discretized Source Code Metrics for Author Identification
- Creators
- Maxim Shevertalov - Drexel University; Jay Kothari - Drexel University; Edward Stehle - Drexel University; Spiros Mancoridis - Drexel University
- Contributors
- M DiPenta (Editor); S Poulding (Editor)
- Publication Details
- 1ST INTERNATIONAL SYMPOSIUM ON SEARCH BASED SOFTWARE ENGINEERING, PROCEEDINGS
- Publisher
- IEEE
- Number of pages
- 10
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Computer Science (Computing)
- Identifiers
- 991019167583504721
InCites Highlights
These are selected metrics from InCites Benchmarking & Analytics tool, related to this output
- Web of Science research areas
- Computer Science, Software Engineering
- Engineering, Electrical & Electronic