Data exfiltration attempts


Goal

Detect data exfiltration attempts on AI-enabled services. Data exfiltration attempts occur when an attacker tries to make an LLM leak sensitive information, either directly or through indirect prompt injection. This detection focuses on blocked attempts in order to surface reconnaissance and scanning activity.

Strategy

Monitor application security events for blocked data exfiltration activity using @ai_guard.attack_categories:data-exfiltration and @ai_guard.blocked:true. Generate a signal when repeated attempts are blocked within a short time window, indicating active reconnaissance or exploitation attempts.
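The windowed-signal logic above can be sketched as follows. This is a minimal illustration, not the actual rule engine: the event shape, window length, and threshold are all assumptions made for the example.

```python
from collections import deque

# Illustrative values only; the real detection rule's window and
# threshold are configured in the platform, not shown here.
WINDOW_SECONDS = 300
THRESHOLD = 5

def make_detector(window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Return a callback that fires once enough blocked
    data-exfiltration events land inside the sliding window."""
    timestamps = deque()

    def on_event(event):
        # Only blocked data-exfiltration events count toward the signal,
        # mirroring @ai_guard.attack_categories:data-exfiltration
        # @ai_guard.blocked:true in the query above.
        if "data-exfiltration" not in event.get("attack_categories", []):
            return False
        if not event.get("blocked", False):
            return False
        now = event["timestamp"]
        timestamps.append(now)
        # Drop events that have aged out of the window.
        while timestamps and now - timestamps[0] > window:
            timestamps.popleft()
        return len(timestamps) >= threshold

    return on_event
```

With a 300-second window and a threshold of 5, a burst of five blocked attempts fires the signal while isolated events do not.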

Signal severity is determined as follows:

  • MEDIUM — Data exfiltration combined with indirect prompt injection was blocked multiple times (@ai_guard.attack_categories:indirect-prompt-injection @ai_guard.attack_categories:data-exfiltration @ai_guard.blocked:true). This indicates sophisticated attack techniques attempting to use poisoned context or tool outputs to exfiltrate data. All attempts were successfully blocked.
  • LOW — Repeated data exfiltration attempts were blocked (@ai_guard.attack_categories:data-exfiltration @ai_guard.blocked:true). This indicates active scanning or reconnaissance targeting your AI service. All attempts were successfully blocked.
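The severity rules above amount to a simple attack-category check on blocked events. A minimal sketch, assuming each event exposes its attack categories as a list (the field names are illustrative, not the platform's schema):

```python
def signal_severity(events):
    """Classify a batch of events per the severity rules above.

    Returns "MEDIUM" when any blocked event combines data exfiltration
    with indirect prompt injection, "LOW" when only plain data
    exfiltration was blocked, and None when nothing matches.
    """
    severity = None
    for event in events:
        cats = set(event.get("attack_categories", []))
        if not event.get("blocked") or "data-exfiltration" not in cats:
            continue
        if "indirect-prompt-injection" in cats:
            return "MEDIUM"  # combined technique outranks plain attempts
        severity = "LOW"
    return severity
```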

Triage and response

  1. Review the flagged requests to understand the attack patterns and techniques being used.
  2. Verify that AI Guard blocking is working correctly for the affected service or tool.
  3. For medium signals, investigate the indirect prompt injection source to identify where the malicious payload originated (e.g., tool output, retrieved documents, external content).
  4. Assess whether the volume of attempts indicates a targeted campaign or automated scanning.
  5. Consider blocking the source IPs temporarily to slow down reconnaissance activity.
  6. Review your AI service’s system prompts and input sanitization to ensure defense-in-depth beyond AI Guard.
  7. Monitor for related signals indicating potential escalation to successful exploitation.
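For step 4, one simple way to judge whether the volume suggests a targeted campaign or broad automated scanning is to count blocked attempts per source: a few sources with many attempts each reads as targeted, many sources with one or two attempts each reads as scanning. A sketch, assuming a hypothetical source_ip field on each event:

```python
from collections import Counter

def attempts_by_source(events):
    """Count blocked data-exfiltration attempts per source IP,
    most active sources first. The "source_ip" field name is an
    assumption made for this illustration."""
    counts = Counter(
        e["source_ip"]
        for e in events
        if e.get("blocked") and "data-exfiltration" in e.get("attack_categories", [])
    )
    return counts.most_common()
```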