Tiered Escalation Pipeline
Most sites can be scraped with a fast, lightweight HTTP client. A smaller percentage require TLS fingerprint matching to pass network-layer checks. A still-smaller set need a real browser to execute JavaScript and render the page. Running a full browser for every request wastes CPU and memory.
Tiered escalation starts cheap and only pays the higher cost when the site actively blocks the lighter approach. The pipeline tries tiers in order and stops as soon as a satisfactory response is obtained.
Escalation tiers
| Tier | Name | When to use |
|---|---|---|
0 — HttpPlain | Standard HTTP | Most sites; lowest overhead |
1 — HttpTlsProfiled | HTTP + TLS fingerprint | Sites that JA3/JA4-fingerprint at the TCP layer |
2 — BrowserBasic | Headless Chrome, basic CDP stealth | JS-heavy sites without advanced anti-bot |
3 — BrowserAdvanced | Full stealth browser (all patches) | Cloudflare, DataDome, PerimeterX, Akamai |
Tiers are ordered — each higher tier is a strict superset of the previous one's capabilities and cost.
The EscalationPolicy trait
#![allow(unused)] fn main() { use stygian_graph::ports::escalation::{EscalationPolicy, EscalationTier, ResponseContext}; pub trait EscalationPolicy: Send + Sync { /// The tier to attempt first. fn initial_tier(&self) -> EscalationTier; /// Given a response, decide whether to escalate or accept. /// Return `Some(next_tier)` to escalate, `None` to accept the response. fn should_escalate( &self, ctx: &ResponseContext, current: EscalationTier, ) -> Option<EscalationTier>; /// The highest tier this policy may reach. fn max_tier(&self) -> EscalationTier; } }
ResponseContext carries the signals the policy uses to decide:
| Field | Description |
|---|---|
status | HTTP status code |
body_empty | Response body is empty |
has_cloudflare_challenge | Cloudflare, DataDome, or PerimeterX challenge detected |
has_captcha | reCAPTCHA, hCaptcha, or Turnstile widget detected |
DefaultEscalationPolicy
The built-in DefaultEscalationPolicy requires no custom trait implementation.
It combines automatic challenge detection with a per-domain learning cache.
Challenge detection
DefaultEscalationPolicy::context_from_body(status, body) inspects the response
body for well-known markers from all major vendors:
| Vendor | Detected by |
|---|---|
| Cloudflare | "Just a moment", cf-browser-verification, __cf_bm |
| DataDome | "datadome", dd_referrer |
| PerimeterX | _pxParam, _px.js, blockScript |
| reCAPTCHA / hCaptcha / Turnstile | Script tag markers |
All anti-bot challenges map to has_cloudflare_challenge: true, which triggers
escalation on status 403, 429, or any challenge/CAPTCHA detected.
Per-domain learning cache
When EscalatingScrapingService successfully reaches a domain at a tier above
base_tier, it records that tier in the policy's internal cache (TTL: 1 hour by default).
On the next request to the same domain the pipeline skips the tiers it knows won't
work, saving latency.
#![allow(unused)] fn main() { use std::time::Duration; use stygian_graph::adapters::escalation::{DefaultEscalationPolicy, EscalationConfig}; use stygian_graph::ports::escalation::EscalationTier; let policy = DefaultEscalationPolicy::new(EscalationConfig { // Allow escalation all the way to a full stealth browser. max_tier: EscalationTier::BrowserAdvanced, // Start from plain HTTP on the first request to an unknown domain. base_tier: EscalationTier::HttpPlain, // Cache that "this domain needs BrowserBasic" for 30 minutes. cache_ttl: Duration::from_secs(1_800), }); }
| Config field | Default | Description |
|---|---|---|
max_tier | BrowserAdvanced | Highest tier the policy may attempt |
base_tier | HttpPlain | Starting tier for unknown domains |
cache_ttl | 3 600 s (1 h) | How long domain-tier cache entries live |
EscalatingScrapingService
EscalatingScrapingService implements the ScrapingService port and wires the
policy to a set of concrete service implementations.
#![allow(unused)] fn main() { use std::sync::Arc; use stygian_graph::adapters::escalation::{ DefaultEscalationPolicy, EscalationConfig, EscalatingScrapingService, }; use stygian_graph::adapters::http::HttpAdapter; use stygian_graph::ports::escalation::EscalationTier; // 1. Build the policy. let policy = DefaultEscalationPolicy::new(EscalationConfig::default()); // 2. Register a concrete service for each tier you want available. let svc = EscalatingScrapingService::new(policy) .with_tier(EscalationTier::HttpPlain, Arc::new(HttpAdapter::new())) // Add HttpTlsProfiled and BrowserBasic/Advanced services here as needed. ; // 3. Register in the pipeline's service registry. // registry.register(Arc::new(svc)); }
If a tier has no service registered, the next available higher tier is used automatically — you do not need to configure every tier.
Metadata annotations
On success the service annotates the ServiceOutput metadata with two fields:
| Key | Example value |
|---|---|
escalation_tier | "browser_basic" |
escalation_path | ["http_plain","http_tls_profiled"] |
These are useful for observability dashboards and for diagnosing why a particular domain is consistently reaching higher tiers.
Escalation flow
flowchart TD
S([Request]) --> H[HTTP Plain]
H -- 200 OK, no challenge --> R([Accept])
H -- 403 / challenge / CAPTCHA --> T[HTTP TLS-Profiled]
T -- 200 OK --> R
T -- 403 / challenge --> B[Browser Basic]
B -- 200 OK --> R
B -- still blocked --> A[Browser Advanced]
A -- 200 OK --> R
A -- error at max tier --> E([Return error])
style R fill:#22c55e,color:#fff
style E fill:#ef4444,color:#fff
Graph pipeline integration
Register the service as "http_escalating" in the pipeline config. All pipeline
nodes that specify service = "http_escalating" will use it:
# examples/escalation-pipeline.toml
[[services]]
name = "http_escalating"
kind = "http_escalating"
# Optional: set max escalation tier and cache TTL
[services.escalation]
max_tier = "browser_advanced" # default
base_tier = "http_plain" # default
cache_ttl_secs = 1800
[[nodes]]
name = "fetch-protected"
service = "http_escalating"
url = "https://example.com/data"
Custom EscalationPolicy
For specialised logic (status-code allow-lists, per-domain overrides, etc.) you can
implement EscalationPolicy directly:
#![allow(unused)] fn main() { use stygian_graph::ports::escalation::{EscalationPolicy, EscalationTier, ResponseContext}; struct AggressivePolicy; impl EscalationPolicy for AggressivePolicy { fn initial_tier(&self) -> EscalationTier { // Always start from TLS-profiled HTTP — skip plain HTTP entirely. EscalationTier::HttpTlsProfiled } fn should_escalate( &self, ctx: &ResponseContext, current: EscalationTier, ) -> Option<EscalationTier> { // Escalate on any non-2xx response. if ctx.status < 200 || ctx.status >= 300 { current.next().filter(|&t| t <= self.max_tier()) } else { None } } fn max_tier(&self) -> EscalationTier { EscalationTier::BrowserBasic } } }
Detection landscape
Which escalation tier handles which detection vector:
| Detection vector | HttpPlain | HttpTlsProfiled | BrowserBasic | BrowserAdvanced |
|---|---|---|---|---|
| IP reputation / rate limit | — | — | — | — |
| TLS fingerprint (JA3/JA4) | ✗ | ✓ | ✓ | ✓ |
| Missing JavaScript execution | ✗ | ✗ | ✓ | ✓ |
navigator.webdriver flag | ✗ | ✗ | ✓ | ✓ |
| Canvas/WebGL fingerprint | ✗ | ✗ | ✗ | ✓ |
| CDP detection | ✗ | ✗ | partial | ✓ |
| Behavioural analysis | ✗ | ✗ | ✗ | ✓ |
IP reputation is orthogonal to tier — use sticky-session proxy rotation (see Sticky Sessions) in combination with escalation.