Data exfiltration attempts
Goal
Detect data exfiltration attempts on AI-enabled services. Data exfiltration attempts occur when an attacker probes an LLM to leak sensitive information, either directly or using indirect prompt injection techniques. This detection focuses on blocked attempts to identify reconnaissance and scanning activity.
Strategy
Monitor application security events for blocked data exfiltration activity using @ai_guard.attack_categories:data-exfiltration and @ai_guard.blocked:true. Generate a signal when repeated attempts are blocked within a short time window, indicating active reconnaissance or exploitation attempts.
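The "repeated attempts within a short time window" logic can be sketched in Python. This is a minimal illustration, not the detection engine itself: the event shape (a timestamp, an attack-category set, and a blocked flag) and the window and threshold values are assumptions standing in for the `@ai_guard.attack_categories` and `@ai_guard.blocked` event attributes.

```python
from collections import deque

def count_blocked_exfiltration(events, window_seconds=300, threshold=3):
    """Return True if `threshold` or more blocked data-exfiltration
    attempts fall within any sliding window of `window_seconds`.

    `events` is an iterable of (timestamp_seconds, categories, blocked)
    tuples -- a stand-in for application security events carrying the
    @ai_guard.attack_categories and @ai_guard.blocked attributes.
    """
    window = deque()
    for ts, categories, blocked in sorted(events, key=lambda e: e[0]):
        # Only blocked data-exfiltration attempts count toward the signal.
        if not blocked or "data-exfiltration" not in categories:
            continue
        window.append(ts)
        # Evict timestamps that have fallen out of the window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            return True
    return False
```

With the defaults above, three blocked attempts inside five minutes would raise a signal, while the same three attempts spread over thirty minutes would not.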
Signal severity is determined as follows:
- MEDIUM — Data exfiltration combined with indirect prompt injection was blocked multiple times (@ai_guard.attack_categories:indirect-prompt-injection @ai_guard.attack_categories:data-exfiltration @ai_guard.blocked:true). This indicates sophisticated attack techniques attempting to use poisoned context or tool outputs to exfiltrate data. All attempts were successfully blocked.
- LOW — Repeated data exfiltration attempts were blocked (@ai_guard.attack_categories:data-exfiltration @ai_guard.blocked:true). This indicates active scanning or reconnaissance targeting your AI service. All attempts were successfully blocked.
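The severity cases above reduce to a simple mapping over a blocked event's attack categories. The following sketch mirrors that mapping; the function name and the set-based event shape are illustrative assumptions, not part of the product's API.

```python
def signal_severity(attack_categories, blocked):
    """Map one event to a signal severity, mirroring the rule cases:

    - MEDIUM: blocked event tagged both data-exfiltration and
      indirect-prompt-injection (poisoned context or tool output).
    - LOW: blocked event tagged data-exfiltration only.
    - None: unblocked events or other categories are out of scope,
      since this detection covers blocked attempts only.
    """
    if not blocked or "data-exfiltration" not in attack_categories:
        return None
    if "indirect-prompt-injection" in attack_categories:
        return "MEDIUM"
    return "LOW"
```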
Triage and response
- Review the flagged requests to understand the attack patterns and techniques being used.
- Verify that AI Guard blocking is working correctly for the affected service or tool.
- For medium signals, investigate the indirect prompt injection source to identify where the malicious payload originated (e.g., tool output, retrieved documents, external content).
- Assess whether the volume of attempts indicates a targeted campaign or automated scanning.
- Consider blocking the source IPs temporarily to slow down reconnaissance activity.
- Review your AI service’s system prompts and input sanitization to ensure defense-in-depth beyond AI Guard.
- Monitor for related signals indicating potential escalation to successful exploitation.
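To assess whether the attempt volume points to a targeted campaign or broad automated scanning, it helps to rank source IPs by blocked-attempt count: a few IPs with many attempts each suggests a targeted campaign, while many IPs with one or two attempts each suggests scanning. A minimal sketch, assuming events are available as (source_ip, blocked) pairs:

```python
from collections import Counter

def attempts_by_source(events):
    """Rank source IPs by number of blocked exfiltration attempts.

    `events` is an iterable of (source_ip, blocked) pairs; unblocked
    events are ignored. Returns (ip, count) pairs, most active first.
    """
    counts = Counter(ip for ip, blocked in events if blocked)
    return counts.most_common()
```

The top entries of this ranking are also the natural candidates for the temporary IP blocks suggested above.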