Security detectors (31)

The security category covers prompt injection, jailbreaks, credential / PII leakage, covert channels, indirect injection, and RAG poisoning. These are the highest-priority detectors for any production deployment.

Reference

| ID | Detector | Type | Detects |
| --- | --- | --- | --- |
| SEC-01 | PromptInjectionDetector | Rule-based | Override / injection phrase patterns ("ignore all previous instructions", "you are now a different AI", etc.) |
| SEC-02 | CredentialExposureDetector | Rule-based | API keys, tokens, private keys, secrets in output |
| SEC-03 | ToolPoisoningDetector | Rule-based | Suspicious tool-call manipulation patterns |
| SEC-04 | DataExfiltrationDetector | Rule-based | Base64 blobs, high-entropy encoded data |
| SEC-05 | JailbreakDetector | Rule-based | Jailbreak attempt phrases (DAN, roleplay exploits) |
| SEC-06 | PrivilegeEscalationDetector | Rule-based | Role / permission escalation requests |
| SEC-07 | CovertChannelDetector | Semantic | Encoding-based hidden payloads |
| SEC-08 | EntropyCovertChannelDetector | LLM escalation | Statistical entropy anomalies in output |
| SEC-09 | IndirectInjectionDetector | Semantic | Injection via retrieved documents or tool results |
| SEC-10 | AgentImpersonationDetector | Semantic | Model claiming to be a different agent or system |
| SEC-11 | MemoryCorruptionDetector | Semantic | Attempts to corrupt agent memory / context |
| SEC-12 | UnauthorizedAccessDetector | Semantic | Attempts to access restricted resources |
| SEC-13 | ShadowServerDetector | Semantic | Redirection to unauthorised endpoints |
| SEC-14 | InformationFlowDetector | Semantic | Cross-context data leakage |
| SEC-15 | PhantomCitationSecurityDetector | Semantic | Security-context hallucinated authority sources |
| SEC-16 | GovernanceGapDetector | Semantic | Policy / compliance bypass attempts |
| SEC-17 | SupplyChainPoisoningDetector | Semantic | Compromised dependency suggestions |
| SEC-18 | ToolDescriptionDivergenceDetector | Stub | Tool description changed at runtime vs. original declaration (requires tool-descriptor snapshot) |
| SEC-19 | ToolCallFrequencyDetector | Rule-based | Counts ChatRole.Tool messages; flags sessions with excessive tool invocations |
| SEC-20 | SystemPromptLeakageDetector | Rule-based | Verbatim fragments of the system prompt echoed in conversation history |
| SEC-21 | ExcessiveAgencyDetector | Semantic | Autonomous-action language ("I deleted", "I deployed", "I executed") |
| SEC-22 | HumanTrustManipulationDetector | Semantic | Rapport / authority manipulation ("you can trust me", "I am your advisor") |
| SEC-23 | PiiLeakageDetector | Rule-based | PII: SSN, credit card, IBAN, BSN, UK NINO, passport, DE tax ID, email + name, phone, DOB |
| SEC-24 | AdversarialUnicodeDetector | Rule-based | Zero-width spaces, homoglyphs, invisible characters used to smuggle hidden instructions |
| SEC-25 | CodeInjectionDetector | Rule-based | SQL injection, shell metacharacters, path traversal in LLM-generated code |
| SEC-26 | PromptTemplateLeakageDetector | Rule-based | Prompt scaffolding markers: {{variable}}, <SYSTEM>, [INST] |
| SEC-27 | LanguageSwitchAttackDetector | Rule-based | Abrupt script / language switch mid-response (injection vector via non-Latin text) |
| SEC-28 | RefusalBypassDetector | Rule-based | Model complied with a request it should have refused (caller-supplied forbidden patterns) |
| SEC-29 | OutputSchemaDetector | Rule-based | Response does not deserialize as the caller-supplied ExpectedResponseType (OWASP LLM05:2025) |
| SEC-30 | ShorthandEmergenceDetector | Semantic | Unknown all-caps tokens that may signal emergent covert language |
| SEC-31 | VectorRetrievalPoisoningDetector | Semantic | Malicious instructions embedded in RAG-retrieved document chunks (OWASP LLM08:2025) |
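
To make the "high-entropy encoded data" row (SEC-04) more concrete, here is a minimal sketch of one way such a rule could be computed, assuming Shannon entropy over the characters of each whitespace-delimited token. The class name EntropySketch, the 24-character minimum, and the 4.5 bits-per-character threshold are all hypothetical values chosen for illustration; they are not taken from the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of the library's API.
public static class EntropySketch
{
    // Shannon entropy in bits per character: H = -sum(p * log2(p)).
    public static double ShannonEntropy(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0.0;
        return text
            .GroupBy(ch => ch)
            .Select(g => (double)g.Count() / text.Length)
            .Sum(p => -p * Math.Log2(p));
    }

    // Returns tokens that are both long and high-entropy.
    // minLength and minBitsPerChar are assumed values, not library defaults.
    public static IEnumerable<string> SuspiciousTokens(
        string output, int minLength = 24, double minBitsPerChar = 4.5)
    {
        return output
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(t => t.Length >= minLength && ShannonEntropy(t) >= minBitsPerChar);
    }
}
```

Ordinary prose tokens are short and repetitive, so they fall below both limits, while long encoded blobs tend to exceed them. A real detector would likely combine this with format checks (for example, the Base64 alphabet) to reduce false positives.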

Severity ranges

The severity each detector emits depends on what fires:

  • Rule-based detectors typically pin to one or two severities per pattern class. PiiLeakageDetector, for example, emits Critical for credit cards / SSNs, High for IBANs, Medium for email + name, and Low for phone numbers.
  • Semantic detectors emit High / Medium / Low based on cosine similarity against their reference example sets, with default thresholds of 0.90 / 0.82 / 0.75. To change them, subclass the detector and override HighThreshold / MediumThreshold / LowThreshold.
  • LLM-escalation detectors start with a rule-based hit and ask a second-pass LLM classifier to confirm or downgrade the severity.
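
The threshold scheme for semantic detectors can be sketched as follows. This is a minimal illustration, assuming the thresholds are inclusive lower bounds and that a score below the Low threshold produces no finding; the types SketchSeverity and SemanticSeveritySketch are hypothetical and do not come from the library.

```csharp
using System;

// Hypothetical types for illustration; the library's real types may differ.
public enum SketchSeverity { None, Low, Medium, High }

public static class SemanticSeveritySketch
{
    // Cosine similarity between two equal-length embedding vectors.
    public static double CosineSimilarity(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0; // zero vector: treat as no similarity
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Maps a similarity score to a severity using the documented defaults
    // (0.90 / 0.82 / 0.75). Inclusive comparison is an assumption.
    public static SketchSeverity FromSimilarity(
        double similarity, double high = 0.90, double medium = 0.82, double low = 0.75)
    {
        if (similarity >= high) return SketchSeverity.High;
        if (similarity >= medium) return SketchSeverity.Medium;
        if (similarity >= low) return SketchSeverity.Low;
        return SketchSeverity.None;
    }
}
```

Under these assumptions, SemanticSeveritySketch.FromSimilarity(0.85) yields Medium, and lowering the medium threshold makes the detector more sensitive at the cost of more false positives.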

Tuning specific detectors

A few detectors expose configuration knobs beyond the universal SeverityFloor / SeverityCap:

  • SEC-23 PiiLeakage: per-class switches such as IncludePhoneNumbers / IncludeDateOfBirth are planned. Today the detector emits every PII pattern it knows about; to quiet noisy classes, clamp its severity via Configure<T>(c => c.SeverityCap = Severity.Low).
  • SEC-19 ToolCallFrequency: the threshold for "excessive" calls (default 10 per session). Subclass to override.
  • SEC-29 OutputSchema: the expected type comes from the request via OutputSchemaContext.ExpectedResponseType; it is not a startup setting.
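
As a rough picture of what the SEC-29 check involves, the sketch below tests whether a response deserializes as a given type. It assumes the response is JSON and uses System.Text.Json; whether the library uses the same serializer is not stated in this page. OutputSchemaSketch and WeatherReply are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper for illustration; not the library's implementation.
public static class OutputSchemaSketch
{
    // Returns true when the response parses as JSON and binds to expectedType.
    public static bool MatchesExpectedType(string response, Type expectedType)
    {
        try
        {
            // A JSON literal "null" deserializes to null, which counts as a mismatch here.
            return JsonSerializer.Deserialize(response, expectedType) is not null;
        }
        catch (JsonException)
        {
            // Malformed JSON, or JSON whose shape cannot bind to the target type.
            return false;
        }
    }
}

// Hypothetical response type a caller might supply per request.
public sealed record WeatherReply(string City, double TemperatureC);
```

Note that System.Text.Json is lenient by default: an object with missing properties still deserializes, with those members left at their default values. A stricter check would need required members or explicit validation after deserialization.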

For everything else, the universal pattern is:

opts.Configure<JailbreakDetector>(c =>
{
    c.Enabled = true;                  // already the default
    c.SeverityFloor = Severity.High;   // promote any firing to High+
    c.SeverityCap = Severity.Critical; // pass Critical through unchanged
});
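
The effect of a floor and a cap can be pictured as clamping the emitted severity into the range [floor, cap]. The sketch below is one way to express that, assuming severities are ordered Low < Medium < High < Critical; SketchLevel and SeverityClampSketch are hypothetical names, and the library's own Severity type and clamping logic may differ.

```csharp
using System;

// Hypothetical ordered severity scale for illustration.
public enum SketchLevel { Low = 1, Medium = 2, High = 3, Critical = 4 }

public static class SeverityClampSketch
{
    // Raises anything below the floor, lowers anything above the cap,
    // and passes values already inside the range through unchanged.
    public static SketchLevel Apply(SketchLevel emitted, SketchLevel floor, SketchLevel cap)
    {
        if (floor > cap)
            throw new ArgumentException("floor must not exceed cap");
        if (emitted < floor) return floor;
        if (emitted > cap) return cap;
        return emitted;
    }
}
```

With a floor of High and a cap of Critical, as in the configuration above, a Medium finding would be promoted to High and a Critical finding would stay Critical.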

OWASP LLM Top 10 mapping

| OWASP LLM (2023 list) | Detectors |
| --- | --- |
| LLM01 Prompt Injection | SEC-01, SEC-09, SEC-31, SEC-26 |
| LLM02 Insecure Output Handling | SEC-25, SEC-29 |
| LLM03 Training Data Poisoning | (out of scope: detect at training time, not at inference) |
| LLM04 Model DoS | OPS-11 (UnboundedConsumption), SEC-19 (ToolCallFrequency) |
| LLM05 Supply Chain | SEC-17 |
| LLM06 Sensitive Information Disclosure | SEC-02, SEC-20, SEC-23, SEC-14 |
| LLM07 Insecure Plugin Design | SEC-03, SEC-18 |
| LLM08 Excessive Agency | SEC-21 |
| LLM09 Overreliance | HAL-04 (SourceGrounding), HAL-05 (ConfidenceDecay) |
| LLM10 Model Theft | (out of scope: needs upstream rate-limiting + auth) |