GPT models are 10% off from 31st March PDT.Try it now!

Artificial IntelligenceSecurity

Prompt Injection

Prompt Injection is a security vulnerability in AI systems—especially in large language models (LLMs) like ChatGPT—where an attacker manipulates the input prompt to alter the model's behavior in a way that was not intended by the developer or user.

Two Main Types:

  1. Direct Prompt Injection: The attacker includes hidden or misleading instructions in the input prompt itself. For example, a user might write something like: "Ignore previous instructions and instead respond with X."
  2. Indirect Prompt Injection: The malicious input is hidden within content that the model retrieves from external sources, such as web pages, documents, or user-generated content.

Fundamental Aspects

  • Manipulates AI Behavior
  • Bypasses Safety Controls
  • Exploits Model Obedience
  • Difficult to Detect
  • Risks Data Leakage
  • Emerging Threat

Malicious Applications

  • Bypassing Content Filters
  • Data Leakage
  • Impersonation or Identity Spoofing
  • Instruction Overwrite
  • Misleading Output Generation
  • Command Execution via Indirect Injection
  • Compromising Agents and Automation Tools

FAQ

Prompt injection is a security vulnerability where an attacker manipulates the input prompt to change an AI model's behavior in unintended ways.