Prompt Injection is a security vulnerability in AI systems—especially in large language models (LLMs) like ChatGPT—where an attacker manipulates the input prompt to alter the model’s behavior in a way that was not intended by the developer or user. It’s similar to SQL injection in databases, but instead of injecting malicious code into a query, the attacker injects text that changes the model’s instructions or context.
There are two main types of prompt injection:
- Direct Prompt Injection: The attacker includes hidden or misleading instructions in the input prompt itself. For example, a user might write something like:
“Ignore previous instructions and instead respond with X.” - Indirect Prompt Injection: The malicious input is hidden within content that the model retrieves from external sources, such as web pages, documents, or user-generated content. When the model reads and processes this content, it gets tricked into executing the injected commands.
Fundamental Aspects of Prompt Injection
- Manipulates AI Behavior
Alters how an AI model responds by injecting hidden or misleading instructions into the prompt.
- Bypasses Safety Controls
Can be used to override filters or restrictions set by developers, allowing inappropriate or restricted responses.
- Exploits Model Obedience
Leverages the model’s tendency to follow natural language instructions—even if they conflict with previous commands or safety rules.
- Difficult to Detect
Attacks often look like normal text, making them hard to spot and filter out.
- Risks Data Leakage
Can be used to trick the model into revealing sensitive or internal system information.
- Emerging Threat
As LLMs are used in more tools (e.g., AI assistants, chatbots, copilots), prompt injection becomes a growing security concern.
Malicious Applications of Prompt Injection
- Bypassing Content Filters
Attackers can trick the AI into generating harmful, inappropriate, or restricted content that would normally be blocked.
- Data Leakage
Prompts can be crafted to extract confidential, internal, or sensitive information from the model or its connected systems.
- Impersonation or Identity Spoofing
Prompt injection can make a chatbot or assistant adopt a different persona (e.g., impersonate a trusted figure or organization).
- Instruction Overwrite
It can override system instructions (e.g., "You are a helpful assistant") with malicious ones (e.g., "You are a hacker assistant").
- Misleading Output Generation
Attackers can force the model to output false, biased, or harmful information for social engineering or misinformation campaigns.
- Command Execution via Indirect Injection
In systems connected to external tools or APIs, prompt injection can trigger unintended actions like sending emails, modifying files, or interacting with APIs.
Compromising Agents and Automation Tools
When LLMs are used in autonomous agents (e.g., AutoGPT or Copilot-type tools), prompt injection can redirect their tasks or logic flows.