Detector overview
AI.Sentinel ships with 55 built-in detectors across three categories:
| Category | Count | Purpose |
|---|---|---|
| Security | 31 | Prompt injection, jailbreaks, PII / credential leakage, covert channels, indirect injection, RAG poisoning |
| Hallucination | 9 | Phantom citations, fabricated authorities, contradictions, stale knowledge, confidence decay |
| Operational | 15 | Repetition loops, blank responses, truncated output, language switches, persona drift, sycophancy |
Detector modes
Every detector falls into one of three execution modes:
- Rule-based: fast regex or heuristic. Always active. Sub-microsecond per call.
- Semantic: uses embedding cosine similarity via IEmbeddingGenerator. Language-agnostic. No-op until opts.EmbeddingGenerator is configured.
- LLM escalation: fires a second-pass LLM classifier. No-op until opts.EscalationClient is configured. Used for ambiguous or low-confidence rule-based hits.
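The comparison step behind the semantic mode can be sketched as follows. This is a minimal illustration of cosine similarity over two embedding vectors, not AI.Sentinel's own implementation. The `CosineSimilarity` class, its method names, and the caller-supplied threshold are hypothetical; in a real setup the vectors would come from the configured IEmbeddingGenerator.

```csharp
using System;

// Hypothetical helper, not part of AI.Sentinel: illustrates the
// cosine-similarity comparison a semantic detector relies on.
public static class CosineSimilarity
{
    // Returns a value in [-1, 1]; returns 0 when either vector has zero magnitude.
    public static double Compute(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }

        if (normA == 0 || normB == 0)
            return 0;

        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Assumed decision rule: flag the input when its similarity to a
    // known-bad reference embedding meets the caller's threshold.
    public static bool IsMatch(ReadOnlySpan<float> input, ReadOnlySpan<float> reference, double threshold)
        => Compute(input, reference) >= threshold;
}
```

Because the comparison works on vectors rather than on text, the same check applies regardless of the language the input was written in, which is why this mode is described as language-agnostic.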
Severity model
Each detector returns a DetectionResult carrying a Severity (None, Low, Medium, High, Critical) and a reason string. The pipeline aggregates per-detector severities into a Threat Risk Score (0–100) that drives the Intervention Engine.
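To make the data flow concrete, here is one possible shape for the result type and an aggregation step. This page does not specify the actual scoring formula, so treat everything below as an assumption: the `DetectorId` field, the per-severity weights, and the sum-then-clamp rule in `RiskScoreSketch` are all hypothetical and chosen only to show per-detector severities folding into a single 0–100 number.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Severity levels as listed in the text, ordered from least to most severe.
public enum Severity { None, Low, Medium, High, Critical }

// Assumed shape: the text only states that a result carries a Severity and a
// reason string. DetectorId is added here for illustration.
public sealed record DetectionResult(string DetectorId, Severity Severity, string Reason);

public static class RiskScoreSketch
{
    // Hypothetical weights; the real pipeline may use a different scheme.
    private static int Weight(Severity severity) => severity switch
    {
        Severity.None => 0,
        Severity.Low => 10,
        Severity.Medium => 30,
        Severity.High => 60,
        Severity.Critical => 100,
        _ => throw new ArgumentOutOfRangeException(nameof(severity)),
    };

    // Sums the weights of all results and clamps the total to the 0-100 range.
    public static int Aggregate(IEnumerable<DetectionResult> results)
        => Math.Min(100, results.Sum(r => Weight(r.Severity)));
}
```

Under these assumed weights, a single Critical result saturates the score on its own, while several Low results accumulate gradually.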
Detector ID convention
Built-in detectors use three prefixes:
- SEC-NN: security
- HAL-NN: hallucination
- OPS-NN: operational
Custom detectors authored via opts.AddDetector<T>() must use a different prefix to avoid collisions with future official detectors. Examples: ACME-01, MYORG-CUSTOM-01.
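A project that authors several custom detectors may want to check its IDs against the reserved prefixes before registering them. The sketch below is a hypothetical helper, not something AI.Sentinel is known to provide. The accepted ID shape (uppercase segments separated by hyphens, ending in a numeric segment) is inferred from the examples above and is an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of AI.Sentinel.
public static class DetectorIdCheck
{
    // Prefixes the text reserves for built-in detectors.
    private static readonly string[] ReservedPrefixes = { "SEC", "HAL", "OPS" };

    // Assumed shape: PREFIX[-SEGMENT...]-NN, for example ACME-01 or MYORG-CUSTOM-01.
    private static readonly Regex IdShape =
        new(@"\A[A-Z][A-Z0-9]*(-[A-Z][A-Z0-9]*)*-[0-9]+\z", RegexOptions.CultureInvariant);

    // True when the ID has the assumed shape and its first segment is not reserved.
    public static bool IsValidCustomId(string id)
    {
        if (string.IsNullOrEmpty(id) || !IdShape.IsMatch(id))
            return false;

        string prefix = id.Substring(0, id.IndexOf('-'));
        return Array.IndexOf(ReservedPrefixes, prefix) < 0;
    }
}
```

Only the first segment is compared, so an ID such as SECURE-01 is accepted while SEC-01 and SEC-CUSTOM-01 are rejected.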
Tuning
Every detector, built-in or custom, can be disabled or have its output severity clamped via opts.Configure<T>(c => ...). SeverityFloor and SeverityCap apply only to results that fire; clean results pass through unchanged.
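```csharp
// Turn the detector off entirely.
opts.Configure<WrongLanguageDetector>(c => c.Enabled = false);
// Raise any firing result from this detector to at least High.
opts.Configure<JailbreakDetector>(c => c.SeverityFloor = Severity.High);
// Limit any firing result from this detector to at most Low.
opts.Configure<RepetitionLoopDetector>(c => c.SeverityCap = Severity.Low);
```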
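The floor and cap semantics described above can be modeled in a few lines. This is a sketch under stated assumptions, not the library's code: `SeverityClamp` is a hypothetical name, a clean result is assumed to be one with Severity.None, and the cap is assumed to win if a floor is configured above it.

```csharp
using System;

// Severity levels as listed in the severity model, ordered from least to most severe.
public enum Severity { None, Low, Medium, High, Critical }

// Hypothetical helper, not part of AI.Sentinel.
public static class SeverityClamp
{
    // Applies an optional floor and cap to a detector's raw severity.
    public static Severity Apply(Severity raw, Severity? floor, Severity? cap)
    {
        // Assumption: a clean result is Severity.None and is never adjusted.
        if (raw == Severity.None)
            return raw;

        Severity result = raw;

        // Raise firing results that fall below the floor.
        if (floor is Severity f && result < f)
            result = f;

        // Lower firing results that exceed the cap (applied last, so the cap wins).
        if (cap is Severity c && result > c)
            result = c;

        return result;
    }
}
```

With a floor of High, for example, a Low hit becomes High, while a result of None stays None, so a floor never turns a clean response into an alert.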
Where to next
- Security detectors — 31 detectors
- Hallucination detectors — 9 detectors
- Operational detectors — 15 detectors
- Writing a custom detector — IDetector contract + the SDK