Crate stygian_browser


§stygian-browser

High-performance, anti-detection browser automation library for Rust.

Built on the Chrome DevTools Protocol via chromiumoxide with comprehensive stealth features for bypassing modern anti-bot systems: Cloudflare, DataDome, PerimeterX, Akamai.


§Features

| Feature | Description | Default |
|---|---|---|
| stealth | Navigator spoofing, canvas noise, WebGL randomization, CDP protection | yes |
| tls-config | TLS fingerprint profiling via rustls (requires stealth) | no |
| mcp | MCP (Model Context Protocol) tools | no |
| mcp-attach | Attach to an existing browser via CDP WebSocket (browser_attach tool; requires mcp) | no |
| metrics | Prometheus metrics exporter | no |
| extract | Structured data extraction via #[derive(Extract)] | no |
| similarity | Similarity scoring for duplicate detection | no |
| browserbase | Optional Browserbase-managed acquisition stage integration (requires MCP/runtime opt-in) | no |
| full | All features enabled | no |

§Installation

[dependencies]
stygian-browser = "*"
tokio = { version = "1", features = ["full"] }

Stealth is enabled by default. To disable it for a minimal build:

[dependencies]
# stealth is the default feature; disable for a minimal build
stygian-browser = { version = "*", default-features = false }

Enable optional Browserbase acquisition integration:

[dependencies]
stygian-browser = { version = "*", features = ["browserbase"] }

When enabled, runner-first acquisition can opt into a Browserbase-managed stage via request flags. Set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID in the runtime environment.


§Quick Start

use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a config — defaults are headless Chrome with Advanced stealth
    let config = BrowserConfig::default();

    // Launch a warm pool (2 browsers ready immediately)
    let pool = BrowserPool::new(config).await?;

    // Acquire a browser handle (< 100 ms from warm pool)
    let handle = pool.acquire().await?;

    // Open a tab and navigate
    let browser = handle
        .browser()
        .ok_or_else(|| std::io::Error::other("browser handle already released"))?;
    let mut page = browser.new_page().await?;
    page.navigate(
        "https://example.com",
        WaitUntil::Selector("body".to_string()),
        Duration::from_secs(30),
    )
    .await?;

    println!("Title: {}", page.title().await?);

    // Release the browser back to the pool
    handle.release().await;
    Ok(())
}

§Configuration

BrowserConfig controls every aspect of browser launch, anti-detection, and pooling.

use stygian_browser::{BrowserConfig, StealthLevel};
use stygian_browser::config::PoolConfig;
use stygian_browser::webrtc::{WebRtcConfig, WebRtcPolicy};
use std::time::Duration;

let config = BrowserConfig::builder()
    // Browser basics
    .headless(true)
    .window_size(1920, 1080)
    // Use a specific Chrome binary
    // .chrome_path("/usr/bin/google-chrome".into())
    // Stealth level
    .stealth_level(StealthLevel::Advanced)
    // Proxy (supports http/https/socks5)
    // .proxy("http://user:pass@proxy.example.com:8080".to_string())
    // WebRTC policy
    .webrtc(WebRtcConfig {
        policy: WebRtcPolicy::DisableNonProxied,
        ..Default::default()
    })
    // Pool settings
    .pool(PoolConfig {
        min_size: 2,
        max_size: 10,
        acquire_timeout: Duration::from_secs(5),
        ..Default::default()
    })
    .build();

§Environment Variable Overrides

The following config values can be overridden at runtime through environment variables, without recompiling:

| Variable | Default | Description |
|---|---|---|
| STYGIAN_CHROME_PATH | auto-detect | Path to Chrome/Chromium binary |
| STYGIAN_HEADLESS | true | false for headed mode |
| STYGIAN_STEALTH_LEVEL | advanced | none, basic, advanced |
| STYGIAN_POOL_MIN | 2 | Minimum warm browser count |
| STYGIAN_POOL_MAX | 10 | Maximum concurrent browsers |
| STYGIAN_POOL_ACQUIRE_TIMEOUT_SECS | 30 | Seconds to wait for pool slot |
| STYGIAN_CDP_FIX_MODE | addBinding | addBinding, isolatedWorld, enableDisable, none |
| STYGIAN_PROXY | | Proxy URL |
| STYGIAN_DISABLE_SANDBOX | auto-detect | true to pass --no-sandbox (see note below) |
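
The override pattern in this table can be sketched in plain Rust. This is not the crate's actual loader: `EnvOverrides` and `from_lookup` are hypothetical names, only three of the variables are covered, and the fallback-on-parse-error behavior is an assumption. The lookup is passed in as a closure so the parsing can be exercised without touching the process environment.

```rust
use std::env;

/// Hypothetical subset of the settings listed in the table.
#[derive(Debug, PartialEq)]
struct EnvOverrides {
    headless: bool,
    pool_min: usize,
    pool_max: usize,
}

/// Resolve settings from a lookup function, falling back to the documented
/// defaults when a variable is unset or fails to parse (assumed behavior).
fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> EnvOverrides {
    let parse_usize = |key: &str, default: usize| {
        lookup(key)
            .and_then(|v| v.trim().parse::<usize>().ok())
            .unwrap_or(default)
    };
    // Anything other than a case-insensitive "false" keeps headless mode on.
    let headless = lookup("STYGIAN_HEADLESS")
        .map(|v| !v.trim().eq_ignore_ascii_case("false"))
        .unwrap_or(true);
    EnvOverrides {
        headless,
        pool_min: parse_usize("STYGIAN_POOL_MIN", 2),
        pool_max: parse_usize("STYGIAN_POOL_MAX", 10),
    }
}

fn main() {
    // Read from the real process environment.
    let cfg = from_lookup(|key| env::var(key).ok());
    println!("{cfg:?}");
}
```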

§Stealth Levels

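| Level | Navigator spoof | Canvas noise | WebGL random | CDP protection | Human behavior |
|---|---|---|---|---|---|
| None | no | no | no | no | no |
| Basic | yes | no | no | yes | no |
| Advanced | yes | yes | yes | yes | yes |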

Trade-offs:

  • None — maximum performance, no evasion. Suitable for sites with no bot detection.
  • Basic — hides navigator.webdriver, masks the headless UA, enables CDP protection. Fast; appropriate for most scraping workloads.
  • Advanced — full fingerprint injection (canvas noise, WebGL, audio, fonts, hardware concurrency, device memory), human-like mouse/keyboard events. Adds ~10–30 ms overhead per page but passes all major detection suites.

§Browser Pool

The pool maintains a configurable number of warm browser instances and enforces backpressure when all slots are occupied.

use stygian_browser::{BrowserConfig, BrowserPool};
use stygian_browser::config::PoolConfig;
use std::time::Duration;

let config = BrowserConfig::builder()
    .pool(PoolConfig {
        min_size: 2,
        max_size: 8,
        idle_timeout: Duration::from_secs(300),
        acquire_timeout: Duration::from_secs(10),
    })
    .build();

let pool = BrowserPool::new(config).await?;
let stats = pool.stats();
println!("pool: {}/{} browsers, {} active", stats.available, stats.max, stats.active);

Browsers returned via BrowserHandle::release() go back into the pool automatically. Browsers that fail their health check are discarded and replaced with fresh instances.
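
The backpressure behavior described here can be illustrated with a small standard-library sketch. It is synchronous and deliberately simplified, whereas the crate's pool is async, so treat `SlotPool` and its methods as hypothetical stand-ins, not as the real `BrowserPool` implementation. When every slot is taken, `acquire` waits until an item is released or the timeout elapses.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical bounded pool: `acquire` blocks (backpressure) until a slot
/// is returned or the timeout elapses.
struct SlotPool<T> {
    idle: Mutex<Vec<T>>,
    returned: Condvar,
}

impl<T> SlotPool<T> {
    fn new(items: Vec<T>) -> Self {
        Self { idle: Mutex::new(items), returned: Condvar::new() }
    }

    /// Take an idle item, waiting up to `timeout` for one to be released.
    fn acquire(&self, timeout: Duration) -> Option<T> {
        let deadline = Instant::now() + timeout;
        let mut idle = self.idle.lock().unwrap();
        loop {
            if let Some(item) = idle.pop() {
                return Some(item);
            }
            // Returns None (timeout) once the deadline has passed.
            let remaining = deadline.checked_duration_since(Instant::now())?;
            let (guard, _) = self.returned.wait_timeout(idle, remaining).unwrap();
            idle = guard;
        }
    }

    /// Return an item to the pool and wake one waiter.
    fn release(&self, item: T) {
        self.idle.lock().unwrap().push(item);
        self.returned.notify_one();
    }
}

fn main() {
    let pool = SlotPool::new(vec!["browser-1", "browser-2"]);
    let first = pool.acquire(Duration::from_millis(20)).expect("slot available");
    let _second = pool.acquire(Duration::from_millis(20)).expect("slot available");
    // Both slots are taken, so a third acquire times out.
    assert!(pool.acquire(Duration::from_millis(20)).is_none());
    pool.release(first);
    assert!(pool.acquire(Duration::from_millis(20)).is_some());
}
```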


§Anti-Detection Techniques

§Navigator Spoofing

  • Overwrites navigator.webdriver to undefined
  • Patches navigator.plugins with a realistic PluginArray
  • Sets navigator.languages, navigator.language, navigator.vendor
  • Aligns navigator.hardwareConcurrency and navigator.deviceMemory with the chosen device profile

§Canvas Fingerprint Noise

Adds sub-pixel noise (<1 px) to HTMLCanvasElement.toDataURL() and CanvasRenderingContext2D.getImageData() — indistinguishable visually but unique per page load.
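
One way to picture seeded fingerprint noise is a pass over an RGBA buffer that nudges each colour channel by at most one step. The sketch below is an illustration under that assumption only: the crate injects its noise in JavaScript, and the function name, the xorshift generator, and the one-step amplitude are all hypothetical choices made here.

```rust
/// Small xorshift PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Nudge each colour channel by at most one step, leaving alpha untouched.
/// `seed` must be non-zero; a fresh seed per page load yields a fresh pattern.
fn add_canvas_noise(rgba: &mut [u8], seed: u64) {
    let mut rng = XorShift64(seed);
    for pixel in rgba.chunks_exact_mut(4) {
        for channel in &mut pixel[..3] {
            match rng.next() % 3 {
                0 => *channel = channel.saturating_sub(1),
                1 => *channel = channel.saturating_add(1),
                _ => {} // leave this channel unchanged
            }
        }
    }
}

fn main() {
    let mut pixels = vec![128u8; 16]; // four mid-grey RGBA pixels
    add_canvas_noise(&mut pixels, 0x5EED);
    println!("{pixels:?}");
}
```

Because the generator is deterministic, the same seed always reproduces the same perturbation, which keeps repeated reads within one page load consistent with each other.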

§WebGL Randomisation

Randomises RENDERER and VENDOR WebGL parameter responses to prevent GPU-based fingerprinting while keeping values plausible (real GPU family names are used).

§CDP Leak Protection

The Chrome DevTools Protocol itself can expose automation. Three protection modes are available via CdpFixMode (a fourth setting, none, turns the protection off):

| Mode | Protection | Compatibility |
|---|---|---|
| AddBinding | Wraps calls to hide Runtime.enable side-effects | Best overall |
| IsolatedWorld | Runs injection in a separate execution context | Moderate |
| EnableDisable | Toggles enable/disable around each command | Broad |
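
As a rough illustration of how the STYGIAN_CDP_FIX_MODE values could map onto these modes, the sketch below parses the documented strings into a local enum. `FixMode` is a stand-in defined here, not the crate's `CdpFixMode`, and the case-sensitive matching is an assumption.

```rust
use std::str::FromStr;

/// Local stand-in for the crate's `CdpFixMode`; the variants mirror the
/// values documented for STYGIAN_CDP_FIX_MODE.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FixMode {
    AddBinding,
    IsolatedWorld,
    EnableDisable,
    None,
}

impl FromStr for FixMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-sensitive match on the documented spellings (assumption).
        match s.trim() {
            "addBinding" => Ok(Self::AddBinding),
            "isolatedWorld" => Ok(Self::IsolatedWorld),
            "enableDisable" => Ok(Self::EnableDisable),
            "none" => Ok(Self::None),
            other => Err(format!("unknown CDP fix mode: {other}")),
        }
    }
}

fn main() {
    // Fall back to the documented default when unset or unrecognised.
    let mode: FixMode = std::env::var("STYGIAN_CDP_FIX_MODE")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(FixMode::AddBinding);
    println!("{mode:?}");
}
```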

§Human-Like Behavior (Advanced only)

MouseSimulator generates Bézier-curve mouse paths with:

  • Distance-aware step counts (12 steps for <100 px, up to 120 for >1000 px)
  • Perpendicular control-point offsets for natural arc shapes
  • Sub-pixel micro-tremor jitter (±0.3 px)
  • 10–50 ms inter-event delays
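
The path geometry in this list can be sketched as a cubic Bézier whose control points are pushed sideways from the straight line. This is a minimal illustration, not MouseSimulator's code: the type and function names are hypothetical, the linear interpolation of step counts between 100 px and 1000 px is an assumption, and the jitter and inter-event delays are left out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

/// 12 steps below 100 px, 120 above 1000 px; linear in between (assumed).
fn step_count(distance: f64) -> usize {
    if distance < 100.0 {
        12
    } else if distance > 1000.0 {
        120
    } else {
        let t = (distance - 100.0) / 900.0;
        (12.0 + t * 108.0).round() as usize
    }
}

/// Cubic Bézier from `from` to `to`, with both control points offset
/// perpendicular to the direction of travel by `arc` pixels.
fn mouse_path(from: Point, to: Point, arc: f64) -> Vec<Point> {
    let (dx, dy) = (to.x - from.x, to.y - from.y);
    let dist = dx.hypot(dy);
    if dist == 0.0 {
        return vec![from];
    }
    // Unit vector perpendicular to the straight line.
    let (px, py) = (-dy / dist, dx / dist);
    let c1 = Point { x: from.x + dx / 3.0 + px * arc, y: from.y + dy / 3.0 + py * arc };
    let c2 = Point {
        x: from.x + 2.0 * dx / 3.0 + px * arc,
        y: from.y + 2.0 * dy / 3.0 + py * arc,
    };
    let steps = step_count(dist);
    (0..=steps)
        .map(|i| {
            let t = i as f64 / steps as f64;
            let u = 1.0 - t;
            // Bernstein basis weights for a cubic curve.
            let (b0, b1, b2, b3) = (u * u * u, 3.0 * u * u * t, 3.0 * u * t * t, t * t * t);
            Point {
                x: b0 * from.x + b1 * c1.x + b2 * c2.x + b3 * to.x,
                y: b0 * from.y + b1 * c1.y + b2 * c2.y + b3 * to.y,
            }
        })
        .collect()
}

fn main() {
    let path = mouse_path(Point { x: 0.0, y: 0.0 }, Point { x: 300.0, y: 0.0 }, 20.0);
    println!("{} points, midpoint {:?}", path.len(), path[path.len() / 2]);
}
```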

TypingSimulator models:

  • Per-key WPM variation (70–130 WPM base)
  • Configurable typo-and-correct rate
  • Burst/pause rhythm typical of humans
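
To make the WPM figures concrete: with the conventional five characters per word, a typing speed converts to a mean inter-key delay of 60 000 / (WPM × 5) milliseconds. The sketch below applies that conversion with a per-key speed drawn from the 70–130 range. It is an assumption-laden illustration, not TypingSimulator's code: the function names and the xorshift generator are hypothetical, and typos and burst/pause rhythm are omitted.

```rust
/// Mean delay between keystrokes for a typing speed in words per minute,
/// using the conventional five characters per word.
fn key_delay_ms(wpm: f64) -> f64 {
    60_000.0 / (wpm * 5.0)
}

/// One delay per character, each drawn from a per-key speed in 70..=130 WPM.
fn keystroke_delays(text: &str, seed: u64) -> Vec<f64> {
    let mut state = seed.max(1); // xorshift state must be non-zero
    text.chars()
        .map(|_| {
            // xorshift64 step; no external crates needed.
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            let wpm = 70.0 + (state % 61) as f64;
            key_delay_ms(wpm)
        })
        .collect()
}

fn main() {
    // 70 WPM is roughly 171 ms per key; 130 WPM is roughly 92 ms per key.
    println!("{:.1} ms, {:.1} ms", key_delay_ms(70.0), key_delay_ms(130.0));
    println!("{:?}", keystroke_delays("hello", 7));
}
```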

§Integration with stygian-proxy

To use proxies from a stygian-proxy pool dynamically (at browser launch time):

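use stygian_browser::{BrowserConfig, BrowserPool};
use stygian_proxy::types::ProxyConfig;
use stygian_proxy::ProxyManager;
use std::sync::Arc;
// MemoryProxyStore and ProxyManagerBridge must also be imported from
// stygian-proxy; their module paths are not shown in this example.

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {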

    // Create proxy pool
    let manager = Arc::new(
        ProxyManager::with_round_robin(
            Arc::new(MemoryProxyStore::default()),
            ProxyConfig::default()
        )?
    );

    // Create bridge that implements ProxySource
    let bridge = Arc::new(ProxyManagerBridge::new(manager));

    // Pass to browser config
    let config = BrowserConfig::builder()
        .proxy_source(bridge)
        .build();

    // Each browser context will acquire its own proxy via the bridge
    let pool = BrowserPool::new(config).await?;
    let handle = pool.acquire().await?;

    // This browser is now routed through a proxy from the pool
    // On release: proxy success/failure is automatically recorded
    
    handle.release().await;
    Ok(())
}

When a browser is released after use, the proxy’s circuit breaker is updated:

  • Clean return to idle queue: proxy marked as success ✓
  • Browser unhealthy: proxy marked as failure ✗
  • Browser crashed: proxy marked as failure ✗
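
The outcome-to-breaker mapping in this list could be modelled as follows. The sketch is hypothetical throughout: `CircuitBreaker`, `ReleaseOutcome`, and the rule that the breaker opens after a fixed number of consecutive failures are assumptions for illustration, not stygian-proxy's actual types or policy.

```rust
/// How a browser came back to the pool.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ReleaseOutcome {
    CleanReturn,
    Unhealthy,
    Crashed,
}

/// Hypothetical per-proxy breaker that opens after `threshold`
/// consecutive failures and resets on any success.
#[derive(Debug)]
struct CircuitBreaker {
    consecutive_failures: u32,
    threshold: u32,
}

impl CircuitBreaker {
    fn new(threshold: u32) -> Self {
        Self { consecutive_failures: 0, threshold }
    }

    /// Record the result of one browser release against this proxy.
    fn record(&mut self, outcome: ReleaseOutcome) {
        match outcome {
            ReleaseOutcome::CleanReturn => self.consecutive_failures = 0,
            ReleaseOutcome::Unhealthy | ReleaseOutcome::Crashed => {
                self.consecutive_failures += 1;
            }
        }
    }

    /// An open breaker means the proxy should be skipped for now.
    fn is_open(&self) -> bool {
        self.consecutive_failures >= self.threshold
    }
}

fn main() {
    let mut breaker = CircuitBreaker::new(2);
    breaker.record(ReleaseOutcome::Unhealthy);
    breaker.record(ReleaseOutcome::Crashed);
    println!("open after two failures: {}", breaker.is_open());
    breaker.record(ReleaseOutcome::CleanReturn);
    println!("open after a clean return: {}", breaker.is_open());
}
```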

§Page Operations

use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use stygian_browser::page::ResourceFilter;
use std::time::Duration;

let pool = BrowserPool::new(BrowserConfig::default()).await?;
let handle = pool.acquire().await?;
let browser = handle
    .browser()
    .ok_or_else(|| std::io::Error::other("browser handle already released"))?;
let mut page = browser.new_page().await?;

// Block images/fonts to speed up text-only scraping
page.set_resource_filter(ResourceFilter::block_media()).await?;

page.navigate(
    "https://example.com",
    WaitUntil::Selector("h1".to_string()),
    Duration::from_secs(30),
).await?;

// Evaluate JavaScript
let title: String = page.eval("document.title").await?;
let h1: String = page.eval("document.querySelector('h1')?.textContent ?? ''").await?;

// Full page HTML
let html = page.content().await?;

// Save cookies for session reuse
let cookies = page.save_cookies().await?;

page.close().await?;
handle.release().await;

§WebRTC & Proxy

use stygian_browser::BrowserConfig;
use stygian_browser::webrtc::{WebRtcConfig, WebRtcPolicy, ProxyLocation};

let config = BrowserConfig::builder()
    .proxy("http://proxy.example.com:8080".to_string())
    .webrtc(WebRtcConfig {
        policy: WebRtcPolicy::DisableNonProxied,
        location: Some(ProxyLocation::new_us_east()),
        ..Default::default()
    })
    .build();

WebRtcPolicy::BlockAll is the safest option for anonymous scraping — it prevents any IP addresses from leaking via WebRTC peer connections.


§MCP Behavior Policy Usage

When compiled with the mcp feature, you can apply structured anti-bot behavior plans at runtime using browser_apply_behavior_json.

Accepted behavior shapes:

  • RuntimePolicy object (direct policy payload)
  • InvestigationBundle object (uses nested policy)
  • Direct override object (for simple headless / stealth_level tuning)

Example: RuntimePolicy payload

{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "browser_apply_behavior_json",
        "arguments": {
            "behavior": {
                "execution_mode": "Browser",
                "session_mode": "Sticky",
                "telemetry_level": "Deep",
                "rate_limit_rps": 0.8,
                "max_retries": 4,
                "backoff_base_ms": 1200,
                "enable_warmup": true,
                "enforce_webrtc_proxy_only": true,
                "sticky_session_ttl_secs": 1800,
                "required_stygian_features": ["browser", "stealth"],
                "config_hints": {
                    "proxy_url": "http://127.0.0.1:8080",
                    "viewport_width": "1366",
                    "viewport_height": "768"
                },
                "risk_score": 0.92
            }
        }
    }
}

Example: InvestigationBundle payload

{
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "browser_apply_behavior_json",
        "arguments": {
            "behavior": {
                "report": {
                    "page_title": "example",
                    "total_requests": 42,
                    "blocked_requests": 4,
                    "status_histogram": {"200": 38, "403": 4},
                    "resource_type_histogram": {"document": 1, "script": 12},
                    "provider_histogram": {"Cloudflare": 4},
                    "top_markers": [],
                    "hosts": [],
                    "suspicious_requests": [],
                    "aggregate": {
                        "provider": "Cloudflare",
                        "confidence": 0.87,
                        "markers": ["turnstile"]
                    }
                },
                "requirements": {
                    "provider": "Cloudflare",
                    "confidence": 0.87,
                    "requirements": [],
                    "recommendation": {
                        "strategy": "StickyProxy",
                        "rationale": "bot challenge observed",
                        "required_stygian_features": ["browser", "stealth"],
                        "config_hints": {"proxy_url": "http://127.0.0.1:8080"}
                    }
                },
                "policy": {
                    "execution_mode": "Browser",
                    "session_mode": "Sticky",
                    "telemetry_level": "Standard",
                    "rate_limit_rps": 1.0,
                    "max_retries": 3,
                    "backoff_base_ms": 800,
                    "enable_warmup": true,
                    "enforce_webrtc_proxy_only": true,
                    "sticky_session_ttl_secs": 1200,
                    "required_stygian_features": ["browser", "stealth"],
                    "config_hints": {"proxy_url": "http://127.0.0.1:8080"},
                    "risk_score": 0.78
                }
            }
        }
    }
}

Bind behavior to an active session by also passing session_id:

{
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {
        "name": "browser_apply_behavior_json",
        "arguments": {
            "session_id": "01JXYZ...",
            "behavior": {
                "headless": false,
                "stealth_level": "basic",
                "interaction_level": "medium"
            }
        }
    }
}

Response includes:

  • adapter_kind (RuntimePolicy, InvestigationBundle, or DirectOverrides)
  • adapter_kind (runtime_policy, investigation_bundle, or direct_overrides)
  • plan (normalized behavior plan)
  • effective_config (resolved browser config view)
  • session_updated flag

§FAQ

Q: Does this work on macOS / Linux / Windows?
A: macOS and Linux are fully supported. Windows is validated in CI on windows-latest, with runtime behavior depending on the chromiumoxide backend.

Q: Which Chrome versions are supported?
A: The library targets Chrome 120+. Older versions may work but stealth scripts are only tested against current release channels.

Q: Can I use it without a display (CI/CD)?
A: Yes — the default config is headless: true. No display server is required.

Q: Does Advanced stealth guarantee Cloudflare bypass?
A: There is no guarantee. Cloudflare Turnstile and Bot Management use both JavaScript signals and TLS/network-layer heuristics. Advanced stealth eliminates all known JavaScript signals, which is necessary but may not be sufficient.

Q: How do I set a custom Chrome path?
A: Set STYGIAN_CHROME_PATH=/path/to/chrome or use BrowserConfig::builder().chrome_path("/path/to/chrome".into()).build().

Q: Why does stats().idle always return 0?
A: idle is a lock-free approximation. The count is not maintained in the hot acquire/release path to avoid contention. Use available and active instead.

Q: Should I set STYGIAN_DISABLE_SANDBOX=true?
A: Only inside a container (Docker, Kubernetes, etc.) where Chromium’s renderer sandbox cannot function due to missing user namespaces. This is auto-detected via /.dockerenv and /proc/1/cgroup on Linux — you normally don’t need to set it explicitly. Never set this on a bare-metal host without an equivalent isolation boundary; doing so removes a meaningful OS-level security layer.

For highest-security deployments, run each browser session in its own container and let the container runtime provide isolation — the sandbox flag will be set automatically inside the container.
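
A container check along the lines described in this answer might look like the sketch below. Only the two inspected paths come from the text; the function names and the marker strings searched for in the cgroup file are assumptions, and the crate's real detection logic may differ.

```rust
use std::fs;
use std::path::Path;

/// True if the cgroup listing mentions a container runtime.
/// The marker strings are assumptions chosen for illustration.
fn cgroup_indicates_container(cgroup: &str) -> bool {
    ["docker", "kubepods", "containerd"]
        .iter()
        .any(|marker| cgroup.contains(*marker))
}

/// Heuristic check using the two paths named in the FAQ.
fn running_in_container() -> bool {
    Path::new("/.dockerenv").exists()
        || fs::read_to_string("/proc/1/cgroup")
            .map(|contents| cgroup_indicates_container(&contents))
            .unwrap_or(false) // unreadable or missing file: assume bare metal
}

fn main() {
    println!("container detected: {}", running_in_container());
}
```

Keeping the string check separate from the file access makes the heuristic testable on any platform, including ones without a /proc filesystem.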


§Testing

# Pure-logic unit tests (no Chrome required)
cargo test --lib -p stygian-browser

# Integration tests (requires Chrome 120+)
cargo test --all-features -p stygian-browser

# Run all tests, including the ignored Chrome tests
cargo test --all-features -p stygian-browser -- --include-ignored

# Measure coverage for logic units
cargo tarpaulin -p stygian-browser --lib --ignore-tests --out Lcov

Coverage notes: All tests that launch a real browser instance are annotated #[ignore = "requires Chrome"] so the suite passes in CI without a Chrome binary. Pure-logic coverage (config, stealth scripts, fingerprint generation, simulator math) is high; overall line coverage is structurally bounded by the CDP requirement.


§License

Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-only).

Browser automation and stealth tooling for sites protected by Cloudflare, DataDome, PerimeterX, and Akamai Bot Manager.

§Features

  • Browser pooling — warm pool with min/max sizing, LRU eviction, and backpressure; sub-100 ms acquire from the warm queue
  • Anti-detection — User-Agent patching and plugin population
  • Human behaviour — Bézier-curve mouse paths, human-paced typing with typos, random scroll and micro-interactions
  • Fingerprint generation — statistically-weighted device profiles matching real-world browser market share distributions
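
Weighted profile selection of the kind mentioned in the last bullet can be sketched as a cumulative-weight walk. The function below is hypothetical, and the profile names and weights in the usage example are placeholders, not real market-share data or the crate's actual profiles.

```rust
/// Pick an entry with probability proportional to its weight.
/// `roll` is a uniform value in [0, 1) supplied by the caller's RNG.
fn pick_weighted<'a>(profiles: &'a [(&'a str, f64)], roll: f64) -> Option<&'a str> {
    let total: f64 = profiles.iter().map(|(_, weight)| *weight).sum();
    if total <= 0.0 {
        return None;
    }
    let mut threshold = roll * total;
    for (name, weight) in profiles {
        if threshold < *weight {
            return Some(*name);
        }
        threshold -= *weight;
    }
    // Guard against floating-point drift when `roll` is very close to 1.
    profiles.last().map(|(name, _)| *name)
}

fn main() {
    // Placeholder names and weights, for illustration only.
    let profiles = [("profile-a", 6.0), ("profile-b", 3.0), ("profile-c", 1.0)];
    for roll in [0.0, 0.7, 0.95] {
        println!("roll {roll}: {:?}", pick_weighted(&profiles, roll));
    }
}
```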

§Quick Start

use stygian_browser::{BrowserPool, BrowserConfig, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Default config: headless, Advanced stealth, pool of 2–10 browsers
    let config = BrowserConfig::default();
    let pool = BrowserPool::new(config).await?;

    // Acquire a browser from the warm pool (< 100 ms)
    let handle = pool.acquire().await?;

    // Open a tab and navigate
    let mut page = handle.browser().expect("valid browser").new_page().await?;
    page.navigate(
        "https://example.com",
        WaitUntil::DomContentLoaded,
        Duration::from_secs(30),
    ).await?;

    println!("Title: {}", page.title().await?);

    // Release the browser back to the pool
    handle.release().await;
    Ok(())
}

§Module Overview

| Module | Description |
|---|---|
| browser | BrowserInstance: launch, health-check, shutdown |
| pool | BrowserPool + BrowserHandle: warm pool management |
| page | PageHandle: navigate, eval, content, cookies |
| config | BrowserConfig + builder pattern |
| error | BrowserError and Result alias |
| fingerprint | DeviceProfile, BrowserKind |
| webrtc | WebRtcConfig, WebRtcPolicy, ProxyLocation |
| cdp_protection | CDP leak protection modes |

§Re-exports

pub use extract::Extractable;
pub use similarity::ElementFingerprint;
pub use similarity::SimilarMatch;
pub use similarity::SimilarityConfig;
pub use acquisition::AcquisitionMode;
pub use acquisition::AcquisitionRequest;
pub use acquisition::AcquisitionResult;
pub use acquisition::AcquisitionRunner;
pub use acquisition::StageFailure;
pub use acquisition::StageFailureKind;
pub use acquisition::StrategyUsed;
pub use behavior_adapter::AdapterKind;
pub use behavior_adapter::AppliedBehaviorPlan;
pub use behavior_adapter::BehaviorInteractionLevel;
pub use behavior_adapter::BrowserBehaviorAdapter;
pub use behavior_adapter::ExecutionMode;
pub use behavior_adapter::PolymorphicBehaviorAdapter;
pub use behavior_adapter::SessionMode;
pub use behavior_adapter::TelemetryLevel;
pub use browser::BrowserInstance;
pub use config::BrowserConfig;
pub use config::HeadlessMode;
pub use config::StealthLevel;
pub use error::BrowserError;
pub use error::Result;
pub use page::NodeHandle;
pub use page::PageHandle;
pub use page::ResourceFilter;
pub use page::WaitUntil;
pub use pool::BrowserHandle;
pub use pool::BrowserPool;
pub use pool::PoolStats;
pub use proxy::DirectLease;
pub use proxy::ProxyLease;
pub use proxy::ProxySource;
pub use stealth::NavigatorProfile;
pub use stealth::StealthConfig;
pub use stealth::StealthProfile;
pub use behavior::InteractionLevel;
pub use behavior::RequestPacer;
pub use fingerprint::BrowserKind;
pub use fingerprint::DeviceProfile;
pub use webrtc::ProxyLocation;
pub use webrtc::WebRtcConfig;
pub use webrtc::WebRtcPolicy;

§Modules

| Module | Description |
|---|---|
| acquisition | Opinionated acquisition runner with deterministic escalation. |
| audio_noise | Audio fingerprint noise injection. |
| behavior | Human behavior simulation for anti-detection |
| behavior_adapter | Polymorphic adapter for structured JSON-driven browser behavior tuning. |
| browser | Browser instance lifecycle management |
| canvas_noise | Canvas fingerprint noise injection. |
| cdp_hardening | Advanced CDP leak hardening. |
| cdp_protection | CDP (Chrome DevTools Protocol) leak protection |
| config | Browser configuration and options |
| diagnostic | Stealth self-diagnostic: JavaScript detection checks. |
| error | Error types for browser automation operations |
| extract | Typed DOM extraction via Extract derive macro. |
| fingerprint | Browser fingerprint generation and JavaScript injection. |
| mcp | MCP (Model Context Protocol) server for browser automation. |
| metrics | Performance metrics for stygian-browser. |
| navigator_coherence | Comprehensive navigator property coherence injection. |
| noise | Deterministic noise seed engine for fingerprint perturbation. |
| page | Resource blocking |
| peripheral_stealth | Peripheral detection surface hardening. |
| pool | Browser instance pool with warmup, health checks, and idle eviction |
| prelude | |
| profile | Unified fingerprint identity profile. |
| proxy | Proxy source port for browser context pools. |
| recorder | Browser session recording and debugging tools. |
| rects_noise | ClientRects and TextMetrics fingerprint noise injection. |
| session | Session persistence for long-running scraping campaigns. |
| similarity | Adaptive element similarity search for crate::page::PageHandle. |
| stealth | Stealth configuration and anti-detection features |
| timing_noise | Performance timing noise injection. |
| tls | TLS fingerprint profile types with JA3/JA4 representation. |
| tls_validation | Automated TLS fingerprint validation suite. |
| validation | Anti-bot service validation suite. |
| webgl_noise | WebGL parameter spoofing and readPixels noise injection. |
| webrtc | WebRTC IP leak prevention and geolocation consistency |