Crate stygian_browser

§stygian-browser

High-performance, anti-detection browser automation library for Rust.

Built on the Chrome DevTools Protocol via chromiumoxide with comprehensive stealth features for bypassing modern anti-bot systems: Cloudflare, DataDome, PerimeterX, Akamai.


§Features

| Feature | Description |
| --- | --- |
| Browser pooling | Warm pool with configurable min/max, LRU eviction, backpressure |
| Anti-detection | Navigator spoofing, canvas noise, WebGL randomisation, UA patching |
| Human behavior | Bézier-curve mouse paths, realistic keystroke timing, random interactions |
| CDP leak protection | Hides Runtime.enable artifacts that expose automation |
| WebRTC control | Block, proxy-route, or allow WebRTC to prevent IP leaks |
| Fingerprint generation | Statistically-weighted device profiles (Windows, Mac, Linux, Android, iOS) |
| Stealth levels | None / Basic / Advanced, to tune evasion vs. performance |

§Installation

[dependencies]
stygian-browser = { path = "../crates/stygian-browser" }   # workspace
# or once published to crates.io:
# stygian-browser = "0.2"
tokio = { version = "1", features = ["full"] }

Enable (or disable) stealth features:

[dependencies]
# stealth is the default feature; disable for a minimal build
stygian-browser = { version = "0.2", default-features = false }

§Quick Start

use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Build a config — defaults are headless Chrome with Advanced stealth
    let config = BrowserConfig::default();

    // Launch a warm pool (2 browsers ready immediately)
    let pool = BrowserPool::new(config).await?;

    // Acquire a browser handle (< 100 ms from warm pool)
    let handle = pool.acquire().await?;

    // Open a tab and navigate
    let mut page = handle.browser().unwrap().new_page().await?;
    page.navigate(
        "https://example.com",
        WaitUntil::Selector("body".to_string()),
        Duration::from_secs(30),
    )
    .await?;

    println!("Title: {}", page.title().await?);

    // Release the browser back to the pool
    handle.release().await;
    Ok(())
}

§Configuration

BrowserConfig controls every aspect of browser launch, anti-detection, and pooling.

use stygian_browser::{BrowserConfig, StealthLevel};
use stygian_browser::config::PoolConfig;
use stygian_browser::webrtc::{WebRtcConfig, WebRtcPolicy};
use std::time::Duration;

let config = BrowserConfig::builder()
    // Browser basics
    .headless(true)
    .window_size(1920, 1080)
    // Use a specific Chrome binary
    // .chrome_path("/usr/bin/google-chrome".into())
    // Stealth level
    .stealth_level(StealthLevel::Advanced)
    // Proxy (supports http/https/socks5)
    // .proxy("http://user:pass@proxy.example.com:8080".to_string())
    // WebRTC policy
    .webrtc(WebRtcConfig {
        policy: WebRtcPolicy::DisableNonProxied,
        ..Default::default()
    })
    // Pool settings
    .pool(PoolConfig {
        min_size: 2,
        max_size: 10,
        acquire_timeout: Duration::from_secs(5),
        ..Default::default()
    })
    .build();

§Environment Variable Overrides

All config values can be overridden at runtime without recompiling:

| Variable | Default | Description |
| --- | --- | --- |
| STYGIAN_CHROME_PATH | auto-detect | Path to Chrome/Chromium binary |
| STYGIAN_HEADLESS | true | false for headed mode |
| STYGIAN_STEALTH_LEVEL | advanced | none, basic, advanced |
| STYGIAN_POOL_MIN | 2 | Minimum warm browser count |
| STYGIAN_POOL_MAX | 10 | Maximum concurrent browsers |
| STYGIAN_POOL_ACQUIRE_TIMEOUT_SECS | 30 | Seconds to wait for pool slot |
| STYGIAN_CDP_FIX_MODE | addBinding | addBinding, isolatedworld, enabledisable |
| STYGIAN_PROXY | | Proxy URL |
| STYGIAN_DISABLE_SANDBOX | auto-detect | true to pass --no-sandbox (see note below) |
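
The override pattern for the variables listed above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `env_or` and `from_process_env` are hypothetical helper names, and the lookup is passed in as a closure so the logic can be exercised without touching the real process environment. An unset or unparsable value falls back to the default.

```rust
use std::str::FromStr;

/// Hypothetical helper: use the override if it is present and parses, else the default.
/// `lookup` stands in for `std::env::var(..).ok()` so the logic is easy to test.
fn env_or<T: FromStr>(lookup: impl Fn(&str) -> Option<String>, key: &str, default: T) -> T {
    lookup(key)
        .and_then(|raw| raw.trim().parse::<T>().ok())
        .unwrap_or(default)
}

/// Same logic, reading from the real process environment.
fn from_process_env<T: FromStr>(key: &str, default: T) -> T {
    env_or(|k| std::env::var(k).ok(), key, default)
}
```

With this shape, `from_process_env("STYGIAN_POOL_MAX", 10usize)` would yield 10 unless the variable is set to a valid integer.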

§Stealth Levels

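| Level | navigator spoof | Canvas noise | WebGL random | CDP protection | Human behavior |
| --- | --- | --- | --- | --- | --- |
| None | no | no | no | no | no |
| Basic | yes | no | no | yes | no |
| Advanced | yes | yes | yes | yes | yes |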

Trade-offs:

  • None: maximum performance, no evasion. Suitable for sites with no bot detection.
  • Basic: hides navigator.webdriver, masks the headless UA, enables CDP protection. Fast; appropriate for most scraping workloads.
  • Advanced: full fingerprint injection (canvas noise, WebGL, audio, fonts, hardware concurrency, device memory) plus human-like mouse/keyboard events. Adds ~10–30 ms overhead per page and targets the major detection suites, though passing them is not guaranteed (see the FAQ).

§Browser Pool

The pool maintains a configurable number of warm browser instances and enforces backpressure when all slots are occupied.

use stygian_browser::{BrowserConfig, BrowserPool};
use stygian_browser::config::PoolConfig;
use std::time::Duration;

let config = BrowserConfig::builder()
    .pool(PoolConfig {
        min_size: 2,
        max_size: 8,
        idle_timeout: Duration::from_secs(300),
        acquire_timeout: Duration::from_secs(10),
    })
    .build();

let pool = BrowserPool::new(config).await?;
let stats = pool.stats();
println!("pool: {}/{} browsers, {} active", stats.available, stats.max, stats.active);

Browsers returned via BrowserHandle::release() go back into the pool automatically. Browsers that fail their health check are discarded and replaced with fresh instances.
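
The backpressure behaviour described above (callers wait for a free slot, up to acquire_timeout) can be illustrated with a small blocking pool. This is only a sketch under stated assumptions: the real pool is async and manages browser processes, whereas `SlotPool` is a hypothetical std-only stand-in built on a mutex and a condition variable.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical bounded pool: `acquire` blocks until an item is free or the timeout expires.
struct SlotPool<T> {
    idle: Mutex<Vec<T>>,
    freed: Condvar,
}

impl<T> SlotPool<T> {
    fn new(items: Vec<T>) -> Self {
        Self { idle: Mutex::new(items), freed: Condvar::new() }
    }

    /// Returns `None` if no item became available within `timeout`.
    fn acquire(&self, timeout: Duration) -> Option<T> {
        let deadline = Instant::now() + timeout;
        let mut idle = self.idle.lock().unwrap();
        loop {
            if let Some(item) = idle.pop() {
                return Some(item);
            }
            // Give up once the deadline has passed.
            let remaining = deadline.checked_duration_since(Instant::now())?;
            // Sleep until a release wakes us or the remaining time runs out.
            let (guard, _timed_out) = self.freed.wait_timeout(idle, remaining).unwrap();
            idle = guard;
        }
    }

    /// Returns an item to the pool and wakes one waiting caller.
    fn release(&self, item: T) {
        self.idle.lock().unwrap().push(item);
        self.freed.notify_one();
    }
}
```

When every slot is taken, a caller waits rather than creating an additional instance, which is what keeps the number of live browsers bounded.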


§Anti-Detection Techniques

  • Overwrites navigator.webdriver to undefined
  • Patches navigator.plugins with a realistic PluginArray
  • Sets navigator.languages, navigator.language, navigator.vendor
  • Aligns navigator.hardwareConcurrency and navigator.deviceMemory with the chosen device profile

§Canvas Fingerprint Noise

Adds sub-pixel noise (<1 px) to HTMLCanvasElement.toDataURL() and CanvasRenderingContext2D.getImageData() — indistinguishable visually but unique per page load.
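
To make the idea concrete, here is a sketch of seeded, visually negligible noise applied to an RGBA pixel buffer. It is an assumption about the general technique, not the crate's code: the library injects JavaScript into the page, while this example models the same perturbation in Rust. `add_canvas_noise` and `XorShift64` are hypothetical names, and the seed is assumed to be chosen once per page load.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Nudges each colour channel of an RGBA buffer by -1, 0, or +1 and leaves alpha untouched.
fn add_canvas_noise(rgba: &mut [u8], seed: u64) {
    let mut rng = XorShift64(seed | 1); // xorshift state must be non-zero
    for pixel in rgba.chunks_exact_mut(4) {
        for channel in &mut pixel[..3] {
            let delta = (rng.next_u64() % 3) as i16 - 1;
            *channel = (*channel as i16 + delta).clamp(0, 255) as u8;
        }
    }
}
```

The same seed always produces the same output, so repeated reads within one page load stay consistent while a new seed changes the resulting fingerprint.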

§WebGL Randomisation

Randomises RENDERER and VENDOR WebGL parameter responses to prevent GPU-based fingerprinting while keeping values plausible (real GPU family names are used).

§CDP Leak Protection

The Chrome DevTools Protocol itself can expose automation. Three modes are available via CdpFixMode:

| Mode | Protection | Compatibility |
| --- | --- | --- |
| AddBinding | Wraps calls to hide Runtime.enable side-effects | Best overall |
| IsolatedWorld | Runs injection in a separate execution context | Moderate |
| EnableDisable | Toggles enable/disable around each command | Broad |
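
As an illustration of how the STYGIAN_CDP_FIX_MODE strings could map onto these modes, the sketch below parses them case-insensitively. `FixMode` is a local stand-in for the crate's CdpFixMode, whose actual parsing rules are not shown in this document, so treat the accepted spellings as an assumption.

```rust
use std::str::FromStr;

/// Local stand-in for the crate's `CdpFixMode`; variant names follow the modes listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FixMode {
    AddBinding,
    IsolatedWorld,
    EnableDisable,
}

impl FromStr for FixMode {
    type Err = String;

    /// Accepts the documented values regardless of case or surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "addbinding" => Ok(Self::AddBinding),
            "isolatedworld" => Ok(Self::IsolatedWorld),
            "enabledisable" => Ok(Self::EnableDisable),
            other => Err(format!("unknown CDP fix mode: {other}")),
        }
    }
}
```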

§Human-Like Behavior (Advanced only)

MouseSimulator generates Bézier-curve mouse paths with:

  • Distance-aware step counts (12 steps for <100 px, up to 120 for >1000 px)
  • Perpendicular control-point offsets for natural arc shapes
  • Sub-pixel micro-tremor jitter (±0.3 px)
  • 10–50 ms inter-event delays
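
The path shape described above can be sketched as a quadratic Bézier curve whose control point is pushed sideways from the midpoint. This is a simplified illustration, not MouseSimulator itself: `step_count` and `bezier_path` are hypothetical names, the linear interpolation between 12 and 120 steps is an assumption (the text gives only the endpoints), and jitter and inter-event delays are left out to keep the example deterministic.

```rust
/// Maps travel distance (px) to a step count: 12 below 100 px, 120 above 1000 px,
/// and (as an assumption) linear in between.
fn step_count(distance: f64) -> usize {
    if distance < 100.0 {
        12
    } else if distance > 1000.0 {
        120
    } else {
        (12.0 + (distance - 100.0) / 900.0 * 108.0).round() as usize
    }
}

/// Quadratic Bézier from `start` to `end`. The control point is the midpoint moved
/// `arc` pixels along the perpendicular to the straight line between the endpoints.
fn bezier_path(start: (f64, f64), end: (f64, f64), arc: f64) -> Vec<(f64, f64)> {
    let (dx, dy) = (end.0 - start.0, end.1 - start.1);
    let dist = (dx * dx + dy * dy).sqrt();
    let steps = step_count(dist);
    // Unit vector perpendicular to the travel direction (zero for a zero-length move).
    let (px, py) = if dist > 0.0 { (-dy / dist, dx / dist) } else { (0.0, 0.0) };
    let ctrl = (
        (start.0 + end.0) / 2.0 + px * arc,
        (start.1 + end.1) / 2.0 + py * arc,
    );
    (0..=steps)
        .map(|i| {
            let t = i as f64 / steps as f64;
            let u = 1.0 - t;
            (
                u * u * start.0 + 2.0 * u * t * ctrl.0 + t * t * end.0,
                u * u * start.1 + 2.0 * u * t * ctrl.1 + t * t * end.1,
            )
        })
        .collect()
}
```

A larger `arc` gives a more pronounced curve; varying its sign and magnitude per movement is one way to avoid repeating the same trajectory.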

TypingSimulator models:

  • Per-key WPM variation (70–130 WPM base)
  • Configurable typo-and-correct rate
  • Burst/pause rhythm typical of humans
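
For a sense of scale, typing speed converts to an average inter-key delay using the common convention that one word is five characters, so 70–130 WPM corresponds to roughly 171–92 ms per key. The sketch below is illustrative only: `key_delay` and `keystroke_delays` are hypothetical names, and a fixed cycling pattern replaces the random variation, typos, and pauses a real simulator would apply.

```rust
use std::time::Duration;

/// Average delay between keystrokes at `wpm`, assuming five characters per word.
fn key_delay(wpm: f64) -> Duration {
    Duration::from_secs_f64(60.0 / (wpm * 5.0))
}

/// Per-key delays for `text`, cycling the speed between `min_wpm` and `max_wpm`.
/// A real simulator would draw the speed randomly; a fixed pattern keeps this deterministic.
fn keystroke_delays(text: &str, min_wpm: f64, max_wpm: f64) -> Vec<Duration> {
    let pattern = [0.0, 0.5, 1.0, 0.25, 0.75];
    text.chars()
        .enumerate()
        .map(|(i, _)| key_delay(min_wpm + (max_wpm - min_wpm) * pattern[i % pattern.len()]))
        .collect()
}
```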

§Page Operations

use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use stygian_browser::page::ResourceFilter;
use std::time::Duration;

let pool = BrowserPool::new(BrowserConfig::default()).await?;
let handle = pool.acquire().await?;
let mut page = handle.browser().unwrap().new_page().await?;

// Block images/fonts to speed up text-only scraping
page.set_resource_filter(ResourceFilter::block_media()).await?;

page.navigate(
    "https://example.com",
    WaitUntil::Selector("h1".to_string()),
    Duration::from_secs(30),
).await?;

// Evaluate JavaScript
let title: String = page.eval("document.title").await?;
let h1: String = page.eval("document.querySelector('h1')?.textContent ?? ''").await?;

// Full page HTML
let html = page.content().await?;

// Save cookies for session reuse
let cookies = page.save_cookies().await?;

page.close().await?;
handle.release().await;

§WebRTC & Proxy

use stygian_browser::BrowserConfig;
use stygian_browser::webrtc::{WebRtcConfig, WebRtcPolicy, ProxyLocation};

let config = BrowserConfig::builder()
    .proxy("http://proxy.example.com:8080".to_string())
    .webrtc(WebRtcConfig {
        policy: WebRtcPolicy::DisableNonProxied,
        location: Some(ProxyLocation::new_us_east()),
        ..Default::default()
    })
    .build();

WebRtcPolicy::BlockAll is the safest option for anonymous scraping — it prevents any IP addresses from leaking via WebRTC peer connections.


§FAQ

Q: Does this work on macOS / Linux / Windows?
A: macOS and Linux are fully supported. Windows support depends on the chromiumoxide backend; not actively tested.

Q: Which Chrome versions are supported?
A: The library targets Chrome 120+. Older versions may work but stealth scripts are only tested against current release channels.

Q: Can I use it without a display (CI/CD)?
A: Yes — the default config is headless: true. No display server is required.

Q: Does Advanced stealth guarantee Cloudflare bypass?
A: There is no guarantee. Cloudflare Turnstile and Bot Management use both JavaScript signals and TLS/network-layer heuristics. Advanced stealth eliminates all known JavaScript signals, which is necessary but may not be sufficient.

Q: How do I set a custom Chrome path?
A: Set STYGIAN_CHROME_PATH=/path/to/chrome or use BrowserConfig::builder().chrome_path("/path/to/chrome".into()).build().

Q: Why does stats().idle always return 0?
A: idle is a lock-free approximation. The count is not maintained in the hot acquire/release path to avoid contention. Use available and active instead.

Q: Should I set STYGIAN_DISABLE_SANDBOX=true?
A: Only inside a container (Docker, Kubernetes, etc.) where Chromium’s renderer sandbox cannot function due to missing user namespaces. This is auto-detected via /.dockerenv and /proc/1/cgroup on Linux — you normally don’t need to set it explicitly. Never set this on a bare-metal host without an equivalent isolation boundary; doing so removes a meaningful OS-level security layer.

For highest-security deployments, run each browser session in its own container and let the container runtime provide isolation — the sandbox flag will be set automatically inside the container.
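
The container auto-detection mentioned above can be approximated as follows. This is a sketch, not the crate's code: the function names are hypothetical, and the marker strings searched for in /proc/1/cgroup are assumptions about what a containerised process typically shows there.

```rust
use std::fs;
use std::path::Path;

/// Pure check on the contents of `/proc/1/cgroup`. The marker strings are an assumption.
fn cgroup_indicates_container(cgroup: &str) -> bool {
    ["docker", "kubepods", "containerd"]
        .iter()
        .any(|marker| cgroup.contains(*marker))
}

/// True if `/.dockerenv` exists or the cgroup file points at a container runtime.
/// On systems without `/proc` the read fails and the result falls back to `false`.
fn running_in_container() -> bool {
    Path::new("/.dockerenv").exists()
        || fs::read_to_string("/proc/1/cgroup")
            .map(|contents| cgroup_indicates_container(&contents))
            .unwrap_or(false)
}
```

Keeping the string check separate from the file access makes the heuristic testable without running inside a container.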


§Testing

# Pure-logic unit tests (no Chrome required)
cargo test --lib -p stygian-browser

# Integration tests (requires Chrome 120+)
cargo test --all-features -p stygian-browser

# Run all tests, including the ignored Chrome tests
cargo test --all-features -p stygian-browser -- --include-ignored

# Measure coverage for logic units
cargo tarpaulin -p stygian-browser --lib --ignore-tests --out Lcov

Coverage notes: All tests that launch a real browser instance are annotated #[ignore = "requires Chrome"] so the suite passes in CI without a Chrome binary. Pure-logic coverage (config, stealth scripts, fingerprint generation, simulator math) is high; overall line coverage is structurally bounded by the CDP requirement.


§License

Licensed under the GNU Affero General Public License v3.0 (AGPL-3.0-only).

High-performance, anti-detection browser automation library for Rust.

Built on Chrome DevTools Protocol (CDP) via chromiumoxide with comprehensive stealth features to bypass modern anti-bot systems: Cloudflare, DataDome, PerimeterX, and Akamai Bot Manager.

§Features

  • Browser pooling: warm pool with min/max sizing, LRU eviction, and backpressure; sub-100 ms acquire from the warm queue
  • Anti-detection: navigator spoofing, canvas noise, WebGL randomisation, User-Agent patching, and plugin population
  • Human behaviour: Bézier-curve mouse paths, human-paced typing with typos, random scroll and micro-interactions
  • CDP leak protection: hides Runtime.enable side-effects that expose automation
  • WebRTC control: block, proxy-route, or allow WebRTC to prevent IP leaks
  • Fingerprint generation: statistically-weighted device profiles matching real-world browser market share distributions
  • Stealth levels: None / Basic / Advanced for tuning evasion vs performance

§Module Overview

§Re-exports

pub use browser::BrowserInstance;
pub use config::BrowserConfig;
pub use config::HeadlessMode;
pub use config::StealthLevel;
pub use error::BrowserError;
pub use error::Result;
pub use page::PageHandle;
pub use page::ResourceFilter;
pub use page::WaitUntil;
pub use pool::BrowserHandle;
pub use pool::BrowserPool;
pub use pool::PoolStats;
pub use stealth::NavigatorProfile;
pub use stealth::StealthConfig;
pub use stealth::StealthProfile;
pub use behavior::InteractionLevel;
pub use fingerprint::BrowserKind;
pub use fingerprint::DeviceProfile;
pub use webrtc::ProxyLocation;
pub use webrtc::WebRtcConfig;
pub use webrtc::WebRtcPolicy;

§Modules

  • behavior: Human behavior simulation for anti-detection
  • browser: Browser instance lifecycle management
  • cdp_protection: CDP (Chrome DevTools Protocol) leak protection
  • config: Browser configuration and options
  • error: Error types for browser automation operations
  • fingerprint: Browser fingerprint generation and JavaScript injection
  • mcp: MCP (Model Context Protocol) server for browser automation
  • metrics: Performance metrics for stygian-browser
  • page: Page and browsing context management for isolated, parallel scraping
  • pool: Browser instance pool with warmup, health checks, and idle eviction
  • prelude: Prelude module for convenient imports
  • recorder: Browser session recording and debugging tools
  • session: Session persistence for long-running scraping campaigns
  • stealth: Stealth configuration and anti-detection features
  • webrtc: WebRTC IP leak prevention and geolocation consistency