Tiered Escalation Pipeline

Most sites can be scraped with a fast, lightweight HTTP client. A smaller share require a matching TLS fingerprint to pass handshake-level checks. A still smaller set needs a real browser to execute JavaScript and render the page. Running a full browser for every request wastes CPU and memory.

Tiered escalation starts cheap and only pays the higher cost when the site actively blocks the lighter approach. The pipeline tries tiers in order and stops as soon as a satisfactory response is obtained.


Escalation tiers

| Tier | Name | When to use |
|---|---|---|
| 0 (HttpPlain) | Standard HTTP | Most sites; lowest overhead |
| 1 (HttpTlsProfiled) | HTTP + TLS fingerprint | Sites that fingerprint the TLS handshake (JA3/JA4) |
| 2 (BrowserBasic) | Headless Chrome, basic CDP stealth | JS-heavy sites without advanced anti-bot |
| 3 (BrowserAdvanced) | Full stealth browser (all patches) | Cloudflare, DataDome, PerimeterX, Akamai |

Tiers are ordered: each higher tier can do everything the tier below it can, and costs more to run.
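The ordering can be modelled with a plain enum. The sketch below is a stand-in, not the crate's actual definition of EscalationTier: the variant names come from the tier table, while the derive list and the `next` helper are assumptions (the custom-policy example on this page calls `current.next()` and compares tiers with `<=`, which suggests something along these lines).

```rust
/// Stand-in for the crate's `EscalationTier` (hypothetical definition).
/// Deriving `PartialOrd`/`Ord` orders variants by declaration order, so
/// `HttpPlain < HttpTlsProfiled < BrowserBasic < BrowserAdvanced`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum EscalationTier {
    HttpPlain,
    HttpTlsProfiled,
    BrowserBasic,
    BrowserAdvanced,
}

impl EscalationTier {
    /// The next, more expensive tier, or `None` when already at the top.
    pub fn next(self) -> Option<EscalationTier> {
        match self {
            EscalationTier::HttpPlain => Some(EscalationTier::HttpTlsProfiled),
            EscalationTier::HttpTlsProfiled => Some(EscalationTier::BrowserBasic),
            EscalationTier::BrowserBasic => Some(EscalationTier::BrowserAdvanced),
            EscalationTier::BrowserAdvanced => None,
        }
    }
}
```

With this shape, "may I escalate?" reduces to `current.next().filter(|&t| t <= max_tier)`.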


The EscalationPolicy trait

use stygian_graph::ports::escalation::{EscalationTier, ResponseContext};

pub trait EscalationPolicy: Send + Sync {
    /// The tier to attempt first.
    fn initial_tier(&self) -> EscalationTier;

    /// Given a response, decide whether to escalate or accept.
    /// Return `Some(next_tier)` to escalate, `None` to accept the response.
    fn should_escalate(
        &self,
        ctx: &ResponseContext,
        current: EscalationTier,
    ) -> Option<EscalationTier>;

    /// The highest tier this policy may reach.
    fn max_tier(&self) -> EscalationTier;
}

ResponseContext carries the signals the policy uses to decide:

| Field | Description |
|---|---|
| status | HTTP status code |
| body_empty | Response body is empty |
| has_cloudflare_challenge | Cloudflare, DataDome, or PerimeterX challenge detected |
| has_captcha | reCAPTCHA, hCaptcha, or Turnstile widget detected |

DefaultEscalationPolicy

The built-in DefaultEscalationPolicy requires no custom trait implementation. It combines automatic challenge detection with a per-domain learning cache.

Challenge detection

DefaultEscalationPolicy::context_from_body(status, body) inspects the response body for well-known markers from the following anti-bot and CAPTCHA vendors:

| Vendor | Detected by |
|---|---|
| Cloudflare | "Just a moment", cf-browser-verification, __cf_bm |
| DataDome | "datadome", dd_referrer |
| PerimeterX | _pxParam, _px.js, blockScript |
| reCAPTCHA / hCaptcha / Turnstile | Script tag markers |

Every detected anti-bot challenge, whichever vendor it comes from, sets has_cloudflare_challenge: true. The policy escalates when the status is 403 or 429, or when a challenge or CAPTCHA is detected.
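A minimal sketch of this kind of marker scan, assuming plain substring matching. The struct is a stand-in for ResponseContext with the fields listed earlier, the challenge markers are the ones in the vendor table, and the three CAPTCHA substrings are illustrative guesses, since the page only says "script tag markers". The crate's real matching logic may differ.

```rust
/// Stand-in for the crate's `ResponseContext` (hypothetical definition).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ResponseContext {
    pub status: u16,
    pub body_empty: bool,
    pub has_cloudflare_challenge: bool,
    pub has_captcha: bool,
}

/// Challenge markers copied from the vendor table.
const CHALLENGE_MARKERS: &[&str] = &[
    "Just a moment", "cf-browser-verification", "__cf_bm", // Cloudflare
    "datadome", "dd_referrer",                             // DataDome
    "_pxParam", "_px.js", "blockScript",                   // PerimeterX
];

/// Assumed CAPTCHA markers (not taken from the crate), matched case-insensitively.
const CAPTCHA_MARKERS: &[&str] = &["recaptcha", "hcaptcha", "turnstile"];

/// Build the decision signals from a status code and a response body.
pub fn context_from_body(status: u16, body: &str) -> ResponseContext {
    let lower = body.to_lowercase();
    ResponseContext {
        status,
        body_empty: body.is_empty(),
        // Any vendor's challenge sets the same flag, as described above.
        has_cloudflare_challenge: CHALLENGE_MARKERS.iter().any(|m| body.contains(*m)),
        has_captcha: CAPTCHA_MARKERS.iter().any(|m| lower.contains(*m)),
    }
}

/// Escalation trigger: status 403 or 429, or any challenge/CAPTCHA flag.
pub fn wants_escalation(ctx: &ResponseContext) -> bool {
    matches!(ctx.status, 403 | 429) || ctx.has_cloudflare_challenge || ctx.has_captcha
}
```

Note that a 200 response can still trigger escalation here: a Cloudflare interstitial served with status 200 sets the challenge flag on its own.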

Per-domain learning cache

When EscalatingScrapingService successfully reaches a domain at a tier above base_tier, it records that tier in the policy's internal cache (TTL: 1 hour by default). On the next request to the same domain the pipeline skips the tiers it knows won't work, saving latency.

use std::time::Duration;
use stygian_graph::adapters::escalation::{DefaultEscalationPolicy, EscalationConfig};
use stygian_graph::ports::escalation::EscalationTier;

fn main() {
    let policy = DefaultEscalationPolicy::new(EscalationConfig {
        // Allow escalation all the way to a full stealth browser.
        max_tier: EscalationTier::BrowserAdvanced,
        // Start from plain HTTP on the first request to an unknown domain.
        base_tier: EscalationTier::HttpPlain,
        // Cache that "this domain needs BrowserBasic" for 30 minutes.
        cache_ttl: Duration::from_secs(1_800),
    });
}

| Config field | Default | Description |
|---|---|---|
| max_tier | BrowserAdvanced | Highest tier the policy may attempt |
| base_tier | HttpPlain | Starting tier for unknown domains |
| cache_ttl | 3600 s (1 h) | How long domain-tier cache entries live |
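The learning cache can be pictured as a map from domain to the tier that last worked plus the time it was recorded. The sketch below is an illustration only: `Tier`, `DomainTierCache`, `record`, and `starting_tier` are hypothetical names, and the crate's internal cache may be structured differently (for instance, it presumably needs interior locking to be shared across tasks).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Stand-in for `EscalationTier` (hypothetical definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    HttpPlain,
    HttpTlsProfiled,
    BrowserBasic,
    BrowserAdvanced,
}

/// Hypothetical per-domain cache: domain -> (tier that worked, when recorded).
pub struct DomainTierCache {
    ttl: Duration,
    entries: HashMap<String, (Tier, Instant)>,
}

impl DomainTierCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Remember that `domain` was reached successfully at `tier`.
    pub fn record(&mut self, domain: &str, tier: Tier, now: Instant) {
        self.entries.insert(domain.to_owned(), (tier, now));
    }

    /// Tier to try first: the cached tier while the entry is younger than
    /// the TTL, otherwise `base_tier`. Never returns less than `base_tier`.
    pub fn starting_tier(&self, domain: &str, base_tier: Tier, now: Instant) -> Tier {
        match self.entries.get(domain) {
            Some(&(tier, stored_at)) if now.duration_since(stored_at) < self.ttl => {
                tier.max(base_tier)
            }
            _ => base_tier,
        }
    }
}
```

Passing `now` explicitly instead of calling `Instant::now()` inside the methods keeps expiry easy to exercise in tests.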

EscalatingScrapingService

EscalatingScrapingService implements the ScrapingService port and wires the policy to a set of concrete service implementations.

use std::sync::Arc;
use stygian_graph::adapters::escalation::{
    DefaultEscalationPolicy, EscalationConfig, EscalatingScrapingService,
};
use stygian_graph::adapters::http::HttpAdapter;
use stygian_graph::ports::escalation::EscalationTier;

fn main() {
    // 1. Build the policy.
    let policy = DefaultEscalationPolicy::new(EscalationConfig::default());

    // 2. Register a concrete service for each tier you want available.
    let svc = EscalatingScrapingService::new(policy)
        .with_tier(EscalationTier::HttpPlain, Arc::new(HttpAdapter::new()))
        // Add HttpTlsProfiled and BrowserBasic/Advanced services here as needed.
        ;

    // 3. Register in the pipeline's service registry.
    // registry.register(Arc::new(svc));
}

If a tier has no service registered, the next available higher tier is used automatically — you do not need to configure every tier.
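One way to get this "next available higher tier" behaviour is to keep the registered services in an ordered map and look up the first key at or above the wanted tier. This is a sketch of the idea, not the service's actual lookup; `Tier` and `resolve_tier` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Stand-in for `EscalationTier` (hypothetical definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    HttpPlain,
    HttpTlsProfiled,
    BrowserBasic,
    BrowserAdvanced,
}

/// Return the lowest registered tier that is at or above `wanted`,
/// or `None` when nothing is registered that high.
pub fn resolve_tier<S>(registered: &BTreeMap<Tier, S>, wanted: Tier) -> Option<Tier> {
    // `range(wanted..)` iterates keys >= `wanted` in ascending order.
    registered.range(wanted..).next().map(|(tier, _)| *tier)
}
```

With only HttpPlain and BrowserAdvanced registered, a request for HttpTlsProfiled resolves to BrowserAdvanced, skipping both unregistered tiers in between.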

Metadata annotations

On success the service annotates the ServiceOutput metadata with two fields:

| Key | Example value |
|---|---|
| escalation_tier | "browser_basic" |
| escalation_path | ["http_plain","http_tls_profiled"] |

These are useful for observability dashboards and for diagnosing why a particular domain is consistently reaching higher tiers.


Escalation flow

flowchart TD
    S([Request]) --> H[HTTP Plain]
    H -- 200 OK, no challenge --> R([Accept])
    H -- 403 / challenge / CAPTCHA --> T[HTTP TLS-Profiled]
    T -- 200 OK --> R
    T -- 403 / challenge --> B[Browser Basic]
    B -- 200 OK --> R
    B -- still blocked --> A[Browser Advanced]
    A -- 200 OK --> R
    A -- error at max tier --> E([Return error])

    style R fill:#22c55e,color:#fff
    style E fill:#ef4444,color:#fff
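
The flow can be read as a simple loop: fetch at the current tier, accept if the response is not blocked, otherwise step up until the maximum tier is exhausted. The sketch below assumes stand-in types (`Tier`, `Response`) and a hypothetical `fetch_with_escalation` function; it also assumes the recorded path lists the tiers that were tried and rejected, which the page does not spell out.

```rust
/// Stand-in for `EscalationTier` (hypothetical definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    HttpPlain,
    HttpTlsProfiled,
    BrowserBasic,
    BrowserAdvanced,
}

impl Tier {
    fn next(self) -> Option<Tier> {
        match self {
            Tier::HttpPlain => Some(Tier::HttpTlsProfiled),
            Tier::HttpTlsProfiled => Some(Tier::BrowserBasic),
            Tier::BrowserBasic => Some(Tier::BrowserAdvanced),
            Tier::BrowserAdvanced => None,
        }
    }
}

/// Minimal response signals (stand-in for `ResponseContext`).
pub struct Response {
    pub status: u16,
    pub challenged: bool,
}

/// Try tiers from `start` up to `max`.
/// Ok((accepted tier, rejected tiers)) on success, Err(rejected tiers) otherwise.
pub fn fetch_with_escalation<F>(
    start: Tier,
    max: Tier,
    mut fetch: F,
) -> Result<(Tier, Vec<Tier>), Vec<Tier>>
where
    F: FnMut(Tier) -> Response,
{
    let mut rejected = Vec::new();
    let mut current = start;
    loop {
        let resp = fetch(current);
        let blocked = matches!(resp.status, 403 | 429) || resp.challenged;
        if !blocked {
            return Ok((current, rejected));
        }
        rejected.push(current);
        // Step up one tier, but never past `max`.
        match current.next().filter(|&t| t <= max) {
            Some(next) => current = next,
            None => return Err(rejected),
        }
    }
}
```

The closure passed as `fetch` stands in for whichever concrete service is registered at each tier.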

Graph pipeline integration

Register the service as "http_escalating" in the pipeline config. All pipeline nodes that specify service = "http_escalating" will use it:

# examples/escalation-pipeline.toml

[[services]]
name = "http_escalating"
kind = "http_escalating"

# Optional: set max escalation tier and cache TTL
[services.escalation]
max_tier  = "browser_advanced"   # default
base_tier = "http_plain"         # default
cache_ttl_secs = 1800

[[nodes]]
name    = "fetch-protected"
service = "http_escalating"
url     = "https://example.com/data"

Custom EscalationPolicy

For specialised logic (status-code allow-lists, per-domain overrides, etc.) you can implement EscalationPolicy directly:

use stygian_graph::ports::escalation::{EscalationPolicy, EscalationTier, ResponseContext};

struct AggressivePolicy;

impl EscalationPolicy for AggressivePolicy {
    fn initial_tier(&self) -> EscalationTier {
        // Always start from TLS-profiled HTTP; skip plain HTTP entirely.
        EscalationTier::HttpTlsProfiled
    }

    fn should_escalate(
        &self,
        ctx: &ResponseContext,
        current: EscalationTier,
    ) -> Option<EscalationTier> {
        // Escalate on any non-2xx response, but never past max_tier().
        if ctx.status < 200 || ctx.status >= 300 {
            current.next().filter(|&t| t <= self.max_tier())
        } else {
            None
        }
    }

    fn max_tier(&self) -> EscalationTier {
        EscalationTier::BrowserBasic
    }
}
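
The status-code allow-list mentioned above could look like the following. To stay self-contained, the sketch defines its own stand-ins (`Tier`, `Ctx`, and a `Policy` trait mirroring the three methods of EscalationPolicy); `AllowListPolicy` and its fields are hypothetical names, not part of the crate.

```rust
/// Stand-in for `EscalationTier` (hypothetical definition).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    HttpPlain,
    HttpTlsProfiled,
    BrowserBasic,
    BrowserAdvanced,
}

impl Tier {
    pub fn next(self) -> Option<Tier> {
        match self {
            Tier::HttpPlain => Some(Tier::HttpTlsProfiled),
            Tier::HttpTlsProfiled => Some(Tier::BrowserBasic),
            Tier::BrowserBasic => Some(Tier::BrowserAdvanced),
            Tier::BrowserAdvanced => None,
        }
    }
}

/// Stand-in for `ResponseContext`, reduced to the one field used here.
pub struct Ctx {
    pub status: u16,
}

/// Stand-in mirroring the shape of `EscalationPolicy`.
pub trait Policy {
    fn initial_tier(&self) -> Tier;
    fn should_escalate(&self, ctx: &Ctx, current: Tier) -> Option<Tier>;
    fn max_tier(&self) -> Tier;
}

/// Accept only the listed status codes; escalate on anything else.
pub struct AllowListPolicy {
    pub accepted: Vec<u16>,
    pub max: Tier,
}

impl Policy for AllowListPolicy {
    fn initial_tier(&self) -> Tier {
        Tier::HttpPlain
    }

    fn should_escalate(&self, ctx: &Ctx, current: Tier) -> Option<Tier> {
        if self.accepted.contains(&ctx.status) {
            None // status is on the allow-list: accept the response
        } else {
            // Step up one tier, capped at `max_tier()`.
            current.next().filter(|&t| t <= self.max_tier())
        }
    }

    fn max_tier(&self) -> Tier {
        self.max
    }
}
```

As with AggressivePolicy, `should_escalate` returns `None` once the maximum tier is reached, so the caller has to decide whether a rejected response at the top tier counts as an error.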

Detection landscape

Which escalation tier handles which detection vector:

Detection vector | HttpPlain | HttpTlsProfiled | BrowserBasic | BrowserAdvanced
IP reputation / rate limit
TLS fingerprint (JA3/JA4)
Missing JavaScript execution
navigator.webdriver flag
Canvas/WebGL fingerprint
CDP detection: partial
Behavioural analysis

IP reputation is orthogonal to tier — use sticky-session proxy rotation (see Sticky Sessions) in combination with escalation.