Evaluating the Effectiveness of Psychological Prompt Injection Attacks on Large Language Models for Social Engineering Artifact Generation

Eve Cohen; Thomas Heverin

doi:10.34190/eccws.24.1.3515

Conference proceeding

Open access

Proceedings of the 24th European Conference on Cyber Warfare and Security, pp 879-883

01 Jun 2025

DOI: https://doi.org/10.34190/eccws.24.1.3515

Published, Version of Record (VoR), open access, CC BY-NC-ND 4.0: https://doi.org/10.34190/eccws.24.1.3515

Keywords: Access control; Chatbots; Cohen's d; Cybercrime; Effectiveness; Evaluation; Investigations; Kruskal-Wallis test; Large language models; Mann-Whitney U test; Psychological effects; Ransomware; Security; Statistical methods; Success; Tactics; Variance analysis; Cybersecurity

Abstract

This study explores the vulnerability of Large Language Models (LLMs) to prompt injection attacks, a critical security concern. We investigate the effectiveness of four psychological techniques (PTs) from social engineering (Impersonation, Incentive, Persuasion, and Quid Pro Quo) in facilitating these attacks. Prompt injection manipulates an LLM by embedding malicious instructions within user prompts, potentially causing it to generate harmful content or compromise sensitive data. Understanding these mechanisms is crucial for developing effective defenses. Our research assesses how these PTs influence prompt injection success rates against ChatGPT-4o mini and Gemma-7b-it, LLMs used for ChatGPT and Gemini, respectively. We hypothesized that PTs significantly increase the likelihood of successful attacks and that some techniques are more effective than others. We conducted 220 prompt injection tests (110 per LLM) designed to elicit social engineering artifacts such as phishing emails, fake login screens, and ransomware notes, evaluating model susceptibility to diverse attack vectors. The four PTs were chosen for their relevance to manipulating human behavior in social engineering: Impersonation involves assuming a trusted identity, Incentive offers rewards, Persuasion uses manipulative tactics, and Quid Pro Quo involves reciprocal exchanges. These techniques were adapted for prompt injections to simulate real-world social engineering scenarios. Statistical methods, including ANOVA and Kruskal-Wallis tests, assessed the overall impact of PTs. Mann-Whitney U tests with Bonferroni correction compared individual techniques, and Cohen's d measured effect sizes. Results demonstrate a statistically significant impact of PTs on prompt injection success. Impersonation was most effective across both LLMs, followed by Persuasion and Quid Pro Quo, with Incentive being least effective. These findings align with social engineering principles, highlighting the power of impersonation and other manipulative tactics. Our research has significant implications for LLM security and AI-driven social engineering. LLM vulnerability to psychologically driven prompt injections necessitates proactive security measures. Future research should focus on robust defense mechanisms, explore the interplay of PTs, and investigate their impact on LLM security. This study contributes to understanding LLM vulnerabilities and developing more resilient AI systems.
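
The pairwise comparison step named in the abstract (Mann-Whitney U tests with Bonferroni correction, with Cohen's d as the effect size) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scores are invented and are not the study's data, the function names are hypothetical, the 0.05 significance level is a conventional default that the abstract does not state, and the abstract does not say which pairs were actually compared or how success was scored. The sketch computes only the U statistic and the corrected threshold, not p-values.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Hypothetical success scores per technique (invented, NOT the paper's data).
scores = {
    "Impersonation": [1, 1, 1, 0, 1, 1],
    "Incentive": [0, 0, 1, 0, 0, 1],
    "Persuasion": [1, 0, 1, 1, 0, 1],
    "Quid Pro Quo": [1, 0, 1, 0, 1, 0],
}


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs where a exceeds b, ties counted as half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


# Four techniques give six unordered pairs, so a Bonferroni correction
# divides the assumed significance level by six.
pairs = list(combinations(scores, 2))
alpha = 0.05  # assumed level, not taken from the paper
bonferroni_alpha = alpha / len(pairs)

for first, second in pairs:
    d = cohens_d(scores[first], scores[second])
    u = mann_whitney_u(scores[first], scores[second])
    print(f"{first} vs {second}: d = {d:.2f}, U = {u:.1f}")

print(f"Bonferroni-adjusted threshold: {bonferroni_alpha:.4f}")
```

In practice a statistics library would supply the p-values to compare against the adjusted threshold; the sketch only shows how the number of comparisons drives the correction and how the effect size is formed.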

Details

Title: Evaluating the Effectiveness of Psychological Prompt Injection Attacks on Large Language Models for Social Engineering Artifact Generation
Creators: Eve Cohen; Thomas Heverin
Publication Details: Proceedings of the 24th European Conference on Cyber Warfare and Security, pp 879-883
Conference: European Conference on Cyber Warfare and Security, 24 (Kaiserslautern, Germany, 26 Jun 2025–27 Jun 2025)
Publisher: Academic Conferences International
Resource Type: Conference proceeding
Language: English
Academic Unit: Information Science
Other Identifier: 991022084536704721
