stygian_charon/vendor_classifier/
mod.rs

//! Vendor fingerprinting confidence classifier (T89).
//!
//! Identifies likely anti-bot vendor(s) for a target and produces
//! a confidence-scored evidence bundle for policy routing. The
//! classifier consumes cookies, response headers, challenge URLs,
//! and body markers; each piece of evidence is labelled by
//! [`EvidenceSource`] so the diagnostic payload can be audited
//! without re-running the match.
//!
//! ## Vendor taxonomy
//!
//! The four **Tier 1** vendors ship with signal catalogues
//! embedded at compile time:
//!
//! | `VendorId`     | Display name                | TOML file                        |
//! |----------------|-----------------------------|----------------------------------|
//! | `DataDome`     | DataDome                    | `data/vendors/datadome.toml`     |
//! | `PerimeterX`   | PerimeterX / HUMAN Security | `data/vendors/perimeter_x.toml`  |
//! | `Akamai`       | Akamai Bot Manager          | `data/vendors/akamai.toml`       |
//! | `Cloudflare`   | Cloudflare                  | `data/vendors/cloudflare.toml`   |
//!
//! Tier 2 vendors ([`VendorId::Hcaptcha`], [`VendorId::Recaptcha`],
//! [`VendorId::Kasada`], [`VendorId::FingerprintCom`],
//! [`VendorId::ShapeSecurity`], [`VendorId::Imperva`]) are present
//! in the enum so downstream T88/T90 layers can name them, but no
//! baseline signals ship for them. Operators register their own
//! catalogues via [`VendorDefinition`].
//!
//! [`VendorId::Unknown`] is the catch-all when no vendor matched
//! or no classification can be produced. It must remain the
//! **last** variant so it sorts last in the deterministic
//! tie-break rule.
//!
//! ## Determinism
//!
//! The classifier is fully deterministic:
//!
//! 1. Patterns are case-folded at load time and at the match site,
//!    so a vendor's score is byte-stable across runs.
//! 2. The top-score tie-break is **VendorId discriminant order**:
//!    the earlier a variant is declared in [`VendorId`] (that is,
//!    the lower its discriminant), the higher its priority when
//!    scores are equal.
//! 3. Confidence is `top_score / (top_score + second_score)`, so
//!    a single matched vendor always reports `1.0`.
//! 4. The `ranked` output is a `Vec` sorted by `(score DESC,
//!    VendorId ASC)`. The `evidence` bundle is sorted by
//!    `(source, signal)` and deduplicated so the JSON form is
//!    byte-stable.
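//!
//! Rules 2 and 3 can be sketched in isolation. The block below is a
//! minimal, self-contained illustration and not the classifier's
//! actual code: `Vendor`, `rank`, and `confidence` are hypothetical
//! stand-ins, integer scores are an assumption, and returning `0.0`
//! when nothing matched is a choice made for this sketch only.
//!
//! ```rust
//! /// Hypothetical stand-in for `VendorId`. Deriving `Ord` on an enum
//! /// orders variants by declaration order.
//! #[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
//! enum Vendor {
//!     DataDome,
//!     PerimeterX,
//!     Akamai,
//!     Cloudflare,
//!     Unknown, // declared last, so it loses every tie
//! }
//!
//! /// Sort by score descending, then by declaration order ascending.
//! fn rank(mut scores: Vec<(Vendor, u32)>) -> Vec<(Vendor, u32)> {
//!     scores.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
//!     scores
//! }
//!
//! /// Confidence of the top entry: top / (top + second).
//! /// Assumption of this sketch: an empty or all-zero ranking yields 0.0.
//! fn confidence(ranked: &[(Vendor, u32)]) -> f64 {
//!     let top = ranked.first().map_or(0, |entry| entry.1);
//!     let second = ranked.get(1).map_or(0, |entry| entry.1);
//!     if top == 0 {
//!         return 0.0;
//!     }
//!     f64::from(top) / f64::from(top + second)
//! }
//! ```
//!
//! With scores `Akamai = 5` and `DataDome = Cloudflare = 3`, `rank`
//! orders the vendors Akamai, DataDome, Cloudflare, and `confidence`
//! is 5 / (5 + 3) = 0.625.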
//!
//! ## High-confidence threshold
//!
//! The classifier carries a configurable threshold
//! ([`DEFAULT_HIGH_CONFIDENCE_THRESHOLD`] = 0.60). The
//! [`VendorClassification::is_high_confidence`] flag is set when
//! the top vendor's confidence crosses the threshold. Callers can
//! override the threshold via
//! [`VendorClassifier::with_threshold`].
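//!
//! As a rough illustration of the override pattern, the sketch below
//! uses a hypothetical `Classifier` type rather than the real
//! [`VendorClassifier`]. The builder-style signature of
//! `with_threshold` and the inclusive (`>=`) comparison are both
//! assumptions; consult the `classifier` submodule for the actual
//! behaviour.
//!
//! ```rust
//! /// Hypothetical mirror of the documented default (0.60).
//! const DEFAULT_THRESHOLD: f64 = 0.60;
//!
//! /// Hypothetical stand-in that only tracks the threshold.
//! struct Classifier {
//!     threshold: f64,
//! }
//!
//! impl Classifier {
//!     fn new() -> Self {
//!         Self { threshold: DEFAULT_THRESHOLD }
//!     }
//!
//!     /// Builder-style override (assumed shape).
//!     fn with_threshold(mut self, threshold: f64) -> Self {
//!         self.threshold = threshold;
//!         self
//!     }
//!
//!     /// Assumes the flag is set when confidence >= threshold.
//!     fn is_high_confidence(&self, confidence: f64) -> bool {
//!         confidence >= self.threshold
//!     }
//! }
//! ```
//!
//! Under these assumptions a confidence of `0.625` is high-confidence
//! with the default threshold but not after `with_threshold(0.75)`.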
//!
//! ## Feature flag
//!
//! The module is **default-on** and lives in
//! `crates/stygian-charon/src/vendor_classifier/`. It adds two new
//! public types ([`VendorClassification`] and the underlying
//! [`VendorScore`]) and a single new field on
//! [`crate::bundle::DiagnosticBundle`], annotated with
//! `#[serde(default, skip_serializing_if = "Option::is_none")]` so
//! that payloads without the field still deserialize and the field
//! is omitted from the output when it is `None`. No new feature gate
//! is introduced because the changes are purely additive.
//!
//! # Example
//!
//! ```
//! use stygian_charon::vendor_classifier::{VendorClassifier, VendorId, EvidenceSource};
//! use std::collections::BTreeMap;
//!
//! let classifier = VendorClassifier::with_builtin_defaults();
//! let cookies = vec!["datadome=xyz; path=/".to_string()];
//! let mut headers = BTreeMap::new();
//! headers.insert("x-datadome".to_string(), "protected".to_string());
//! headers.insert("x-datadome-cid".to_string(), "abc".to_string());
//! let body = Some("captcha-delivery.com iframe");
//! let url = "https://www.example.com/cdn-cgi/challenge-platform";
//!
//! let classification = classifier.classify(&cookies, &headers, body, url);
//! assert_eq!(classification.top_vendor, VendorId::DataDome);
//! assert!(classification.is_high_confidence);
//! assert!(classification.evidence.source_summary.contains_key(&EvidenceSource::Cookie));
//! ```

mod builtins;
mod classifier;
mod error;
mod evidence;
mod vendor;

pub use classifier::{
    DEFAULT_HIGH_CONFIDENCE_THRESHOLD, VendorClassification, VendorClassifier, VendorScore,
};
pub use error::VendorError;
pub use evidence::{Evidence, EvidenceBundle, EvidenceSource};
pub use vendor::{VendorDefinition, VendorId, VendorSignal, parse_vendor_definition};