Conference proceeding
Using Code Metric Histograms and Genetic Algorithms to Perform Author Identification for Software Forensics
GECCO 2007 : Genetic and Evolutionary Computation Conference, July 7-11, 2007 University College London, London, UK, pp 2082-2089
01 Jan 2007
Abstract
We have developed a technique to characterize software developers' styles using a set of source code metrics. This style fingerprint can be used to identify the likely author of a piece of code from a pool of candidates. Author identification has applications in criminal justice, corporate litigation, and plagiarism detection. Furthermore, we can identify candidate developers who share similar styles, making our technique useful for software maintenance as well. Our method involves measuring the differences in histogram distributions for code metrics.
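As a rough illustration of comparing histogram distributions for a code metric, the following minimal Python sketch normalizes metric observations into histograms and picks the candidate whose histograms differ least from an unknown sample. The distance measure (sum of absolute bin differences), the `line_length` metric, the author names, and all data values are assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter


def normalized_histogram(values):
    """Turn raw metric observations into a relative-frequency histogram."""
    counts = Counter(values)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_distance(h1, h2):
    """Sum of absolute differences over all bins (an assumed distance measure)."""
    bins = set(h1) | set(h2)
    return sum(abs(h1.get(b, 0.0) - h2.get(b, 0.0)) for b in bins)


def closest_author(unknown, profiles):
    """Return the candidate whose histograms differ least from the unknown sample."""
    def total_distance(author):
        return sum(
            histogram_distance(unknown[metric], profiles[author][metric])
            for metric in unknown
        )
    return min(profiles, key=total_distance)


# Hypothetical data: observed line lengths for two candidate authors.
profiles = {
    "alice": {"line_length": normalized_histogram([40, 40, 60, 80])},
    "bob": {"line_length": normalized_histogram([100, 120, 120, 120])},
}
unknown = {"line_length": normalized_histogram([40, 60, 60, 80])}

print(closest_author(unknown, profiles))  # prints "alice"
```

With several metrics per author, the per-metric distances are simply summed here; any weighting between metrics would be a further design choice.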
Identifying a combination of metrics that is effective in distinguishing developer styles is key to the utility of the technique. Our case study involves 18 metrics, and the time involved in exhaustive searching of the problem space prevented us from adding additional metrics. Using a genetic algorithm to perform the search, we were able to find good metric combinations in hours as opposed to weeks. The genetic algorithm has enabled us to begin adding new metrics to our catalog of available metrics. This paper documents the results of our experiments in author identification for software forensics and outlines future directions of research to improve the utility of our method.
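To make the search problem concrete: if a combination is taken to be any subset of the 18 metrics (an assumption), exhaustive search must score 2^18 = 262,144 subsets. The sketch below shows one generic way a genetic algorithm could search that space, encoding each subset as an 18-bit mask. The selection scheme, crossover, mutation rate, population size, and especially `toy_fitness` are hypothetical placeholders, not the authors' configuration; a real fitness function would measure identification accuracy on code of known authorship.

```python
import random

NUM_METRICS = 18  # size of the metric catalog in the case study


def toy_fitness(mask):
    """Placeholder score: rewards a hypothetical 'useful' subset, penalizes extras.

    A real fitness function would measure author-identification accuracy.
    """
    useful = {0, 3, 7, 11}  # hypothetical metric indices
    chosen = {i for i, bit in enumerate(mask) if bit}
    return 2 * len(chosen & useful) - len(chosen - useful)


def evolve(fitness, generations=50, pop_size=30, mutation_rate=0.05, seed=0):
    """Evolve bit masks over the metric catalog and return the fittest one found."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(NUM_METRICS))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: carry the two best masks over unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three randomly sampled individuals.
            parent1 = max(rng.sample(population, 3), key=fitness)
            parent2 = max(rng.sample(population, 3), key=fitness)
            # Single-point crossover followed by per-bit mutation.
            cut = rng.randrange(1, NUM_METRICS)
            child = parent1[:cut] + parent2[cut:]
            child = tuple(
                bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in child
            )
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


best = evolve(toy_fitness)
selected = [i for i, bit in enumerate(best) if bit]
print("selected metric indices:", selected, "score:", toy_fitness(best))
```

Each generation scores only `pop_size` masks, so the number of fitness evaluations grows with the population size and generation count rather than with the size of the metric catalog, which is what makes adding metrics tractable.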
Metrics
7 Record Views
60 citations in Scopus
Details
- Title
- Using Code Metric Histograms and Genetic Algorithms to Perform Author Identification for Software Forensics
- Creators
- Robert Lange - Drexel Univ, Dept Comp Sci, Philadelphia, PA 19104 USA
- Spiros Mancoridis - Drexel University, Computer Science
- Publication Details
- GECCO 2007 : Genetic and Evolutionary Computation Conference, July 7-11, 2007 University College London, London, UK, pp 2082-2089
- Publisher
- Association for Computing Machinery
- Number of pages
- 8
- Resource Type
- Conference proceeding
- Language
- English
- Academic Unit
- Computer Science
- Web of Science ID
- WOS:000268226900379
- Scopus ID
- 2-s2.0-34548108422
- Other Identifier
- 991014878066304721