Prompt Injection Defenses

Faculty Mentor Name

Sameer Abufardeh

Format Preference

Poster

Abstract

Large Language Models (LLMs) are increasingly embedded in everyday software systems such as customer service chatbots, educational tools, and productivity applications. These systems rely on built-in constraints to ensure safe and reliable outputs, yet such guardrails remain vulnerable to prompt-injection attacks—inputs crafted to override a model’s intended instructions. Prompt injection is among the most prevalent and dangerous attack vectors, enabling malicious users to bypass safeguards and elicit unintended or sensitive information.

As LLM-based systems become more widespread, addressing these vulnerabilities is increasingly critical. This project evaluates the susceptibility of LLM-based systems to prompt-injection attacks and develops effective, intuitive countermeasures to reduce their impact. We analyze how different attack categories, including code injection, role-playing, and indirect injection, affect model behavior and measure their success rates. In response, we propose a defense mechanism that is both practical and easily integrated into existing systems.

Addressing the gap between academic research and real-world deployment, this project emphasizes simplicity and accessibility over complex or resource-intensive defenses. The outcomes include a categorized dataset of prompt-injection attacks (safe for research use), quantitative evaluations of attack success rates, and an assessment of the proposed defense strategy.

Share

COinS
 

Prompt Injection Defenses

Large Language Models (LLMs) are increasingly embedded in everyday software systems such as customer service chatbots, educational tools, and productivity applications. These systems rely on built-in constraints to ensure safe and reliable outputs, yet such guardrails remain vulnerable to prompt-injection attacks—inputs crafted to override a model’s intended instructions. Prompt injection is among the most prevalent and dangerous attack vectors, enabling malicious users to bypass safeguards and elicit unintended or sensitive information.

As LLM-based systems become more widespread, addressing these vulnerabilities is increasingly critical. This project evaluates the susceptibility of LLM-based systems to prompt-injection attacks and develops effective, intuitive countermeasures to reduce their impact. We analyze how different attack categories, including code injection, role-playing, and indirect injection, affect model behavior and measure their success rates. In response, we propose a defense mechanism that is both practical and easily integrated into existing systems.

Addressing the gap between academic research and real-world deployment, this project emphasizes simplicity and accessibility over complex or resource-intensive defenses. The outcomes include a categorized dataset of prompt-injection attacks (safe for research use), quantitative evaluations of attack success rates, and an assessment of the proposed defense strategy.