Artificial IntelligenceSecurity
Prompt Injection
Prompt Injection is a security vulnerability in AI systems—especially in large language models (LLMs) like ChatGPT—where an attacker manipulates the input prompt to alter the model's behavior in a way that was not intended by the developer or user.
Two Main Types:
- Direct Prompt Injection: The attacker includes hidden or misleading instructions in the input prompt itself. For example, a user might write something like: "Ignore previous instructions and instead respond with X."
- Indirect Prompt Injection: The malicious input is hidden within content that the model retrieves from external sources, such as web pages, documents, or user-generated content.
Fundamental Aspects
- Manipulates AI Behavior
- Bypasses Safety Controls
- Exploits Model Obedience
- Difficult to Detect
- Risks Data Leakage
- Emerging Threat
Malicious Applications
- Bypassing Content Filters
- Data Leakage
- Impersonation or Identity Spoofing
- Instruction Overwrite
- Misleading Output Generation
- Command Execution via Indirect Injection
- Compromising Agents and Automation Tools
FAQ
Prompt injection is a security vulnerability where an attacker manipulates the input prompt to change an AI model's behavior in unintended ways.