contrastBERT: Behavioral Anomaly Detection for Malware Using Contrastive Learning
Conference paper

John Carter, Spiros Mancoridis, Pavlos Protopapas and Brian Mitchell
Game Theory and AI for Security, pp 167-186
2026

Abstract

Keywords: behavioral anomaly detection; BERT; contrastive learning; malware detection
Behavioral anomaly detection is a useful tool for finding malware attacks. One approach is binary classification, in which the detector learns from the behavior of both benign software and malware. However, such classifiers are less likely to detect zero-day attacks, as the set of possible malware is enormous and constantly changing. An alternative to binary classification is anomaly detection, which trains a malware detector using only the behavior of benign software. Recent advances in Large Language Models (LLMs), in combination with contrastive learning, have created an opportunity to develop anomaly detectors that expose malware threats. Our detector, contrastBERT, defines its language vocabulary as the set of calls to the operating system and treats sequences of such calls as sentences. These system call sequences are logged during the execution of benign software on the system to be monitored and, separately, during malware execution. contrastBERT is evaluated using a similarity distribution between benign sample sentences fed to an isolation forest (IF), which provides an effective separation between benign and malicious behavior when trained on embeddings from contrastBERT. Our results show that, using this approach, several types of common malware behavior patterns can be detected with a high detection rate and a low false positive rate.
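The idea of treating system calls as a vocabulary and call sequences as sentences can be illustrated with a minimal sketch. This is not the authors' implementation: the special tokens, the fixed sentence length of 4, the function names `build_vocab` and `to_sentences`, and the toy trace are all assumptions made for illustration, and the BERT embedding and isolation forest stages are not shown.

```python
from typing import Dict, List

# Hypothetical special tokens; the abstract does not specify a tokenization scheme.
SPECIAL_TOKENS = ["[PAD]", "[UNK]"]


def build_vocab(traces: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct system call seen in the traces."""
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for trace in traces:
        for call in trace:
            if call not in vocab:
                vocab[call] = len(vocab)
    return vocab


def to_sentences(trace: List[str], vocab: Dict[str, int], length: int = 4) -> List[List[int]]:
    """Cut one trace into fixed-length 'sentences' of token ids, padding the last one."""
    unk, pad = vocab["[UNK]"], vocab["[PAD]"]
    # Calls never seen while building the vocabulary map to the unknown token.
    ids = [vocab.get(call, unk) for call in trace]
    sentences = []
    for start in range(0, len(ids), length):
        chunk = ids[start:start + length]
        chunk += [pad] * (length - len(chunk))
        sentences.append(chunk)
    return sentences


# Toy benign trace (assumed data, for illustration only).
benign_trace = ["open", "read", "read", "close", "open", "write"]
vocab = build_vocab([benign_trace])
sentences = to_sentences(benign_trace, vocab)

# A trace containing a call that was absent from the benign vocabulary.
unseen = to_sentences(["open", "ptrace"], vocab)
```

In a pipeline of the kind the abstract describes, sentences like these would be passed to the language model to obtain embeddings, and an anomaly detector such as an isolation forest would then be fit on the embeddings of benign sentences.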
