behavioral malware detection; BERT; language models; Machine Learning
As malware becomes stealthier and more difficult to detect, behavioral malware detection, which uses representative run-time data from the device to determine whether an infection has occurred, has become the preferred method of detection. In this work, we collected kernel-level system calls from a router serving IoT devices during periods of benign behavior and periods of known malware infection. The system calls were processed using our custom-trained sys2vec model, which created contextual embeddings for each system call observed. We then passed the embeddings to a classifier built from a Gated Recurrent Unit (GRU) with an Attention layer. Although this pipeline performed well for noisy, easy-to-detect malware, it struggled with stealthier malware. To combat this, we trained a classifier that uses a custom-trained BERT encoder in place of the GRU/Attention layers, which results in much better detection at a usable false positive rate (FPR) ≤ 1 × 10⁻⁵.
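The abstract reports detection at a fixed false positive rate rather than at a default score cutoff. As a minimal sketch of what that evaluation step could look like, the Python below picks an alarm threshold from benign classifier scores so that the empirical FPR stays at or below a target, then measures the detection rate on malware scores. The function names, the "higher score means more suspicious" convention, and the toy scores are assumptions for illustration and are not taken from the paper.

```python
import math


def threshold_at_fpr(benign_scores, max_fpr):
    """Pick a threshold so that the fraction of benign scores strictly
    above it is at most max_fpr (hypothetical helper, not from the paper)."""
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign samples that may raise an alarm without exceeding max_fpr.
    allowed = math.floor(max_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every benign sample is allowed to alarm
    # At most `allowed` scores are strictly greater than the (allowed+1)-th largest.
    return ranked[allowed]


def rate_above(scores, threshold):
    """Fraction of scores that would raise an alarm at this threshold."""
    return sum(score > threshold for score in scores) / len(scores)


# Toy scores (assumed): higher means more likely malicious.
benign = [0.1, 0.2, 0.3, 0.9]
malware = [0.95, 0.5, 0.2]

threshold = threshold_at_fpr(benign, max_fpr=0.25)
false_positive_rate = rate_above(benign, threshold)
detection_rate = rate_above(malware, threshold)
```

With a target of 1 × 10⁻⁵, a threshold chosen this way permits no benign alarms at all until the benign set holds at least 100,000 samples, so an evaluation at that rate presumably needs a large amount of benign data.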
Details
Title
sysBERT: Improved Behavioral Malware Detection using BERT Trained on sys2vec Embeddings
Creators
John Carter - Drexel University, Computer Science
Spiros Mancoridis - Drexel University
Pavlos Protopapas - Harvard University
Publication Details
Proceedings of the Annual Hawaii International Conference on System Sciences, pp 7122-7131
Conference
Hawaii International Conference on System Sciences 2025 (HICSS-58) (Waikoloa, Hawaii, United States, 07 Jan 2025–10 Jan 2025)
Number of pages
10
Grant note
Center for Long-Term Cybersecurity, University of California Berkeley (100014236)