Introduction

stygian is a high-performance web scraping toolkit for Rust, delivered as two complementary crates in a single workspace.

Crate              Purpose
stygian-graph      Graph-based scraping engine — DAG pipelines, AI extraction, distributed execution
stygian-browser    Anti-detection browser automation — stealth profiles, browser pooling, CDP automation

Both crates share a common philosophy: zero-cost abstractions, extreme composability, and secure defaults.


At a glance

Design goals

  • Hexagonal architecture — the domain core has zero I/O dependencies; all external capabilities are declared as port traits and injected via adapters.
  • DAG execution — scraping pipelines are directed acyclic graphs. Nodes run concurrently within each topological wave, maximising parallelism.
  • AI-first extraction — Claude, GPT-4o, Gemini, GitHub Copilot, and Ollama are first-class adapters. Structured data flows out of raw HTML without writing parsers.
  • Anti-bot resilience — the browser crate ships stealth scripts that pass Cloudflare, DataDome, PerimeterX, and Akamai checks at the Advanced stealth level.
  • Fault-tolerant — circuit breakers, retry policies, and idempotency keys are built into the execution path, not bolted on.

Minimum supported Rust version

1.94.0 — Rust 2024 edition. Only the stable toolchain is required; no nightly features are used.


Installation

Add both crates to Cargo.toml:

[dependencies]
stygian-graph   = "0.2"
stygian-browser = "0.2"   # optional — only needed for JS-rendered pages
tokio           = { version = "1", features = ["full"] }
serde_json      = "1"

Enable optional feature groups on stygian-graph:

stygian-graph = { version = "0.2", features = ["browser", "ai-claude", "distributed"] }

Available features:

Feature       Includes
browser       BrowserAdapter backed by stygian-browser
ai-claude     Anthropic Claude adapter
ai-openai     OpenAI adapter
ai-gemini     Google Gemini adapter
ai-copilot    GitHub Copilot adapter
ai-ollama     Ollama (local) adapter
distributed   Redis/Valkey work queue adapter
metrics       Prometheus metrics export

Quick start — scraping pipeline

use stygian_graph::{Pipeline, adapters::HttpAdapter};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = json!({
        "nodes": [
            {"id": "fetch",   "service": "http"},
            {"id": "extract", "service": "ai_claude"}
        ],
        "edges": [{"from": "fetch", "to": "extract"}]
    });

    let pipeline = Pipeline::from_config(config)?;
    let results  = pipeline.execute(json!({"url": "https://example.com"})).await?;

    println!("{}", serde_json::to_string_pretty(&results)?);
    Ok(())
}

Quick start — browser automation

use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool   = BrowserPool::new(BrowserConfig::default()).await?;
    let handle = pool.acquire().await?;

    let mut page = handle.browser().new_page().await?;
    page.navigate(
        "https://example.com",
        WaitUntil::Selector("body".to_string()),
        Duration::from_secs(30),
    ).await?;

    println!("Title: {}", page.title().await?);
    handle.release().await;
    Ok(())
}

Repository layout

stygian/
├── crates/
│   ├── stygian-graph/     # Scraping engine
│   └── stygian-browser/   # Browser automation
├── book/                   # This documentation (mdBook)
├── docs/                   # Architecture reference docs
├── examples/               # Example pipeline configs (.toml)
└── .github/workflows/      # CI, release, security, docs

Source, issues, and pull requests live at github.com/greysquirr3l/stygian.


Documentation

Resource                          URL
This guide                        greysquirr3l.github.io/stygian
API reference (stygian-graph)     greysquirr3l.github.io/stygian/api/stygian_graph
API reference (stygian-browser)   greysquirr3l.github.io/stygian/api/stygian_browser
crates.io (stygian-graph)         crates.io/crates/stygian-graph
crates.io (stygian-browser)       crates.io/crates/stygian-browser

Architecture

stygian-graph is built around the Hexagonal Architecture (Ports & Adapters) pattern. The domain core is pure Rust with zero I/O dependencies. All external capabilities — HTTP, AI, caching, queues — are declared as port traits and injected from the outside.


Layer diagram

┌──────────────────────────────────────────────────────┐
│                   CLI / Entry Points                 │
│                 (src/bin/stygian.rs)                 │
├──────────────────────────────────────────────────────┤
│                  Application Layer                   │
│   ServiceRegistry · PipelineParser · Metrics         │
├──────────────────────────────────────────────────────┤
│                    Domain Core                       │
│       Pipeline · DagExecutor · WorkerPool            │
│       IdempotencyKey · PipelineTypestate             │
├──────────────────────────────────────────────────────┤
│                      Ports                           │
│  ScrapingService · AIProvider · CachePort · ...      │
├──────────────────────────────────────────────────────┤
│                    Adapters                          │
│   http · claude · openai · gemini · browser · ...    │
└──────────────────────────────────────────────────────┘

The dependency arrow always points inward:

CLI → Application → Domain ← Ports ← Adapters

The domain never imports from adapters. Adapters implement port traits and are injected at startup. This lets you swap any adapter — or mock it in tests — without touching business logic.


Domain layer (src/domain/)

Pure Rust. Only std, serde, and arithmetic/pure-data crates allowed. No tokio, no reqwest, no file I/O.

Type             File                    Purpose
Pipeline         domain/graph.rs         Owned graph of Nodes and Edges; wraps a petgraph DAG
DagExecutor      domain/graph.rs         Topological sort → wave-based concurrent execution
WorkerPool       domain/executor.rs      Bounded Tokio worker pool with back-pressure
IdempotencyKey   domain/idempotency.rs   ULID-based key for safe retries

Pipeline typestate

Pipelines enforce their lifecycle at compile time using the typestate pattern:

#![allow(unused)]
fn main() {
use stygian_graph::domain::pipeline::{
    PipelineUnvalidated, PipelineValidated,
    PipelineExecuting, PipelineComplete,
};

let p: PipelineUnvalidated = PipelineUnvalidated::new("crawl", metadata);
let p: PipelineValidated   = p.validate()?;      // rejects invalid graphs here
let p: PipelineExecuting   = p.execute();         // only valid pipelines may execute
let p: PipelineComplete    = p.complete(result);  // only executing pipelines may complete
}

Out-of-order transitions are compiler errors. Phantom types carry zero runtime cost.
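The mechanism is a handful of lines of std-only Rust. The names below are illustrative stand-ins, not the crate's actual types:

```rust
use std::marker::PhantomData;

struct Unvalidated;
struct Validated;

/// The state parameter exists only at compile time: `PhantomData` is
/// zero-sized, so state transitions compile down to plain moves.
struct Pipeline<State> {
    id: String,
    _state: PhantomData<State>,
}

impl Pipeline<Unvalidated> {
    fn new(id: &str) -> Self {
        Pipeline { id: id.into(), _state: PhantomData }
    }
    /// Consumes the unvalidated pipeline; the only way to obtain a
    /// `Pipeline<Validated>` is through this check.
    fn validate(self) -> Result<Pipeline<Validated>, String> {
        if self.id.is_empty() {
            return Err("pipeline id must not be empty".into());
        }
        Ok(Pipeline { id: self.id, _state: PhantomData })
    }
}

impl Pipeline<Validated> {
    fn id(&self) -> &str {
        &self.id
    }
}
```

Methods defined only on `Pipeline<Validated>` are simply absent on `Pipeline<Unvalidated>`, so calling them out of order is a type error, not a runtime check.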


Ports layer (src/ports.rs)

Port traits are the only interface the domain exposes to infrastructure. No adapter code ever leaks inward.

#![allow(unused)]
fn main() {
/// Any scraping backend — HTTP, browser, Playwright, custom.
pub trait ScrapingService: Send + Sync + 'static {
    fn name(&self) -> &'static str;
    async fn execute(&self, input: ServiceInput) -> Result<ServiceOutput>;
}

/// Any LLM — cloud APIs or local Ollama.
pub trait AIProvider: Send + Sync {
    fn capabilities(&self) -> ProviderCapabilities;
    async fn extract(&self, prompt: String, schema: Value) -> Result<Value>;
    async fn stream_extract(
        &self,
        prompt: String,
        schema: Value,
    ) -> Result<BoxStream<'static, Result<Value>>>;
}

/// Any cache backend — LRU, DashMap, Redis.
pub trait CachePort: Send + Sync {
    async fn get(&self, key: &str) -> Result<Option<String>>;
    async fn set(&self, key: &str, value: String, ttl: Option<Duration>) -> Result<()>;
    async fn invalidate(&self, key: &str) -> Result<()>;
    async fn exists(&self, key: &str) -> Result<bool>;
}

/// Circuit breaker abstraction.
pub trait CircuitBreaker: Send + Sync {
    fn state(&self) -> CircuitState;
    fn record_success(&self);
    fn record_failure(&self);
    fn attempt_reset(&self) -> bool;
}

/// Token-bucket rate limiter abstraction.
pub trait RateLimiter: Send + Sync {
    async fn check_rate_limit(&self, key: &str) -> Result<bool>;
    async fn record_request(&self, key: &str) -> Result<()>;
}
}

Adapters layer (src/adapters/)

Adapters implement port traits and handle real I/O. They are never imported by the domain.

Adapter              Port              Notes
HttpAdapter          ScrapingService   reqwest, UA rotation, cookie jar, retry
BrowserAdapter       ScrapingService   chromiumoxide via stygian-browser
ClaudeProvider       AIProvider        Anthropic API, streaming
OpenAiProvider       AIProvider        OpenAI Chat Completions API
GeminiProvider       AIProvider        Google Gemini API
CopilotProvider      AIProvider        GitHub Copilot API
OllamaProvider       AIProvider        Local LLM via Ollama HTTP API
BoundedLruCache      CachePort         LRU eviction, NonZeroUsize capacity limit
DashMapCache         CachePort         Concurrent hash map with TTL cleanup task
CircuitBreakerImpl   CircuitBreaker    Sliding-window failure threshold
NoopCircuitBreaker   CircuitBreaker    Passthrough — useful in tests
NoopRateLimiter      RateLimiter       Always allows — useful in tests

Adding a new adapter

  1. Identify the port trait your adapter will implement (e.g. ScrapingService).
  2. Create src/adapters/my_adapter.rs.
  3. Implement the trait — use native async fn (Rust 2024, no #[async_trait] wrapper needed).
  4. Re-export from src/adapters/mod.rs.
  5. Register via ServiceRegistry at startup.

No changes to domain code are required.
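A minimal sketch of steps 1–3, using local stand-in types rather than the real ServiceInput/ServiceOutput from src/ports.rs; note the native async fn, with no macro wrapper:

```rust
/// Local stand-ins for the real port types in src/ports.rs.
type ServiceResult = Result<String, String>;

trait ScrapingService: Send + Sync + 'static {
    fn name(&self) -> &'static str;
    // Native async fn in traits (stable Rust): no #[async_trait] needed.
    async fn execute(&self, input: String) -> ServiceResult;
}

/// Steps 2-3: the new adapter lives in its own module and implements
/// the port trait; the domain never has to know it exists.
struct MyAdapter;

impl ScrapingService for MyAdapter {
    fn name(&self) -> &'static str {
        "my_adapter"
    }
    async fn execute(&self, input: String) -> ServiceResult {
        // Real I/O (reqwest, chromiumoxide, ...) would happen here.
        Ok(format!("fetched {input}"))
    }
}
```

Steps 4 and 5 are then just a re-export and a registry call; the adapter is reachable by its `name()` at runtime.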


Application layer (src/application/)

Orchestrates adapters, holds runtime configuration, and owns the ServiceRegistry.

Module              Role
ServiceRegistry     Runtime map of name → Arc<dyn ScrapingService>
PipelineParser      Parses JSON/TOML pipeline configs into Pipeline
MetricsCollector    Prometheus counter/histogram/gauge facade
DagExecutor (app)   Top-level entry point: parse → validate → execute → collect
Config              Environment-driven configuration with validation

DAG execution model

Pipeline execution proceeds in topological waves:

  1. Parse — JSON/TOML config → Pipeline domain object.
  2. Validate — check node uniqueness, edge validity, absence of cycles (Kahn's algorithm).
  3. Sort — compute topological order; group into waves of independent nodes.
  4. Execute wave-by-wave — all nodes in a wave run concurrently via tokio::spawn.
  5. Collect — outputs from each wave become inputs to the next.

Wave-based execution maximises parallelism while respecting data dependencies.
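The sort-then-wave loop (steps 3 and 4) can be sketched with Kahn's algorithm in plain, std-only Rust; this is an illustration of the technique, not the crate's implementation:

```rust
use std::collections::HashMap;

/// Group DAG nodes into topological "waves". Every node in a wave has all
/// of its dependencies satisfied by earlier waves, so all nodes in a wave
/// can run concurrently. Edges are (from, to) pairs over node ids.
fn waves(nodes: &[&str], edges: &[(&str, &str)]) -> Vec<Vec<String>> {
    let mut indegree: HashMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    for (_, to) in edges {
        *indegree.get_mut(to).unwrap() += 1;
    }

    let mut remaining: Vec<&str> = nodes.to_vec();
    let mut result = Vec::new();
    while !remaining.is_empty() {
        // Current wave: all remaining nodes with no unsatisfied dependencies.
        let wave: Vec<&str> = remaining
            .iter()
            .copied()
            .filter(|n| indegree[n] == 0)
            .collect();
        if wave.is_empty() {
            panic!("cycle detected"); // Kahn's algorithm stalls on cycles
        }
        // Completing a wave satisfies the dependencies of its successors.
        for n in &wave {
            for (from, to) in edges {
                if from == n {
                    *indegree.get_mut(to).unwrap() -= 1;
                }
            }
        }
        remaining.retain(|n| !wave.contains(n));
        result.push(wave.into_iter().map(String::from).collect());
    }
    result
}
```

A node enters a wave only once every upstream node has completed, which is exactly the property that makes concurrent execution within a wave safe.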

Building Pipelines

A stygian pipeline is a directed acyclic graph (DAG) of service nodes. Pipelines can be defined in JSON, TOML, or built programmatically in Rust.


JSON pipeline format

{
  "id": "product-scraper",
  "nodes": [
    {
      "id": "fetch_html",
      "service": "http",
      "config": {
        "timeout_ms": 10000,
        "user_agent": "Mozilla/5.0 (compatible; stygian/0.1)"
      }
    },
    {
      "id": "render_js",
      "service": "browser",
      "config": {
        "wait_strategy": "network_idle",
        "stealth_level": "advanced"
      }
    },
    {
      "id": "extract_data",
      "service": "ai_claude",
      "config": {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 2048,
        "schema": {
          "title":        "string",
          "price":        "number",
          "availability": "boolean",
          "images":       ["string"]
        }
      }
    }
  ],
  "edges": [
    {"from": "fetch_html",  "to": "render_js"},
    {"from": "render_js",   "to": "extract_data"}
  ]
}

Field reference

Field             Type     Required   Description
id                string   no         Human-readable pipeline identifier
nodes[].id        string   yes        Unique node identifier within the pipeline
nodes[].service   string   yes        Registered service name (must exist in ServiceRegistry)
nodes[].config    object   no         Service-specific configuration (passed as ServiceInput.config)
edges[].from      string   yes        Source node id
edges[].to        string   yes        Target node id

TOML pipeline format

The same structure works in TOML:

id = "product-scraper"

[[nodes]]
id      = "fetch_html"
service = "http"
[nodes.config]
timeout_ms = 10000

[[nodes]]
id      = "extract_data"
service = "ai_claude"
[nodes.config]
model      = "claude-3-5-sonnet-20241022"
max_tokens = 2048

[[edges]]
from = "fetch_html"
to   = "extract_data"

Programmatic builder

For pipelines constructed at runtime:

#![allow(unused)]
fn main() {
use stygian_graph::domain::pipeline::PipelineUnvalidated;
use stygian_graph::domain::graph::{Node, Edge};
use serde_json::json;

let pipeline = PipelineUnvalidated::builder()
    .id("product-scraper")
    .node(Node { id: "fetch".into(), service: "http".into(), config: json!({}) })
    .node(Node { id: "extract".into(), service: "ai_claude".into(), config: json!({
        "model": "claude-3-5-sonnet-20241022"
    })})
    .edge(Edge { from: "fetch".into(), to: "extract".into() })
    .build()?;

let validated = pipeline.validate()?;
let results   = validated.execute().await?;
}

Pipeline validation

validate() runs four checks before the pipeline may execute:

  1. Node uniqueness — all node.id values are distinct.
  2. Edge validity — every edge references nodes that exist.
  3. Cycle detection — Kahn's topological sort; fails if a cycle is detected.
  4. Connectivity — all nodes are reachable from at least one source node.

Any validation failure returns a typed PipelineError — never panics.

#![allow(unused)]
fn main() {
use stygian_graph::domain::error::PipelineError;

match pipeline.validate() {
    Ok(validated)                     => { /* proceed */ }
    Err(PipelineError::DuplicateNode(id)) => eprintln!("duplicate node: {id}"),
    Err(PipelineError::CycleDetected)    => eprintln!("pipeline has a cycle"),
    Err(PipelineError::UnknownEdge { from, to }) =>
        eprintln!("edge {from} → {to} references unknown nodes"),
    Err(e) => eprintln!("validation failed: {e}"),
}
}

Idempotency

Every execution is assigned an IdempotencyKey — a ULID that acts as a deduplication token across retries:

#![allow(unused)]
fn main() {
use stygian_graph::domain::idempotency::IdempotencyKey;

// Auto-generated (recommended)
let key = IdempotencyKey::new();

// Deterministic from a stable input (replays return the same result)
let key = IdempotencyKey::from_input("pipeline-1", "https://example.com/product/123");
}

Pass the key to execute_idempotent(). If the same key is seen again within the TTL, the cached result is returned immediately without re-executing.
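The mechanism can be sketched in std-only Rust. The real crate uses ULIDs; a std hash stands in here, and `execute_cached` is a hypothetical helper, not the crate's API:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic key derivation: the same pipeline id plus the same input
/// always yields the same key, so replays deduplicate naturally.
fn idempotency_key(pipeline_id: &str, input: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (pipeline_id, input).hash(&mut h);
    h.finish()
}

/// Hypothetical replay cache: a repeated key returns the stored result
/// without running the closure again.
fn execute_cached(
    cache: &mut HashMap<u64, String>,
    key: u64,
    run: impl FnOnce() -> String,
) -> String {
    cache.entry(key).or_insert_with(run).clone()
}
```
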


Branching and fan-out

A node can have multiple outgoing edges. All downstream nodes receive the same output:

{
  "nodes": [
    {"id": "fetch",    "service": "http"},
    {"id": "store_raw","service": "s3"},
    {"id": "extract",  "service": "ai_claude"}
  ],
  "edges": [
    {"from": "fetch", "to": "store_raw"},
    {"from": "fetch", "to": "extract"}
  ]
}

store_raw and extract run concurrently in the same wave.


Conditional execution

Nodes support an optional condition field (JSONPath expression):

{
  "id": "render_js",
  "service": "browser",
  "condition": "$.content_type == 'application/javascript'"
}

When the condition evaluates to false the node is skipped and its outputs are forwarded as empty, allowing downstream nodes to handle the gap gracefully.
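The skip logic can be illustrated with a toy evaluator; a flat string map stands in for the node's JSON input, and only the `$.field == 'value'` form is handled:

```rust
use std::collections::HashMap;

/// Toy evaluator for conditions of the form `$.field == 'value'`.
/// Anything it cannot parse defaults to "run the node".
fn eval_condition(expr: &str, input: &HashMap<&str, &str>) -> bool {
    let Some((path, want)) = expr.split_once("==") else {
        return true; // no comparison: treat the node as unconditional
    };
    let key = path.trim().trim_start_matches("$.");
    let want = want.trim().trim_matches('\'');
    input.get(key).is_some_and(|v| *v == want)
}
```
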

Built-in Adapters

stygian-graph ships adapters for every major scraping and AI workload. All adapters implement the port traits defined in src/ports.rs and are registered by name in the ServiceRegistry.


HTTP Adapter

The default content-fetching adapter. Uses reqwest with connection pooling, automatic redirect following, and configurable retry logic.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::{HttpAdapter, HttpConfig};
use std::time::Duration;

let adapter = HttpAdapter::with_config(HttpConfig {
    timeout:          Duration::from_secs(15),
    user_agent:       Some("stygian/0.1".to_string()),
    follow_redirects: true,
    max_redirects:    10,
    ..Default::default()
});
}

Registered service name: "http"

Config field       Default   Description
timeout            30 s      Per-request timeout
user_agent         None      Override User-Agent header
follow_redirects   true      Follow 3xx responses
max_redirects      10        Redirect chain limit
proxy              None      HTTP/HTTPS/SOCKS5 proxy URL

REST API Adapter

Purpose-built for structured JSON REST APIs. Handles authentication, automatic multi-strategy pagination, JSON response extraction, and retry — without the caller needing to manage any of that manually.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::rest_api::{RestApiAdapter, RestApiConfig};
use stygian_graph::ports::{ScrapingService, ServiceInput};
use serde_json::json;
use std::time::Duration;

let adapter = RestApiAdapter::with_config(RestApiConfig {
    timeout:      Duration::from_secs(20),
    max_retries:  3,
    ..Default::default()
});

let input = ServiceInput {
    url: "https://api.github.com/repos/rust-lang/rust/issues".to_string(),
    params: json!({
        "auth":       { "type": "bearer", "token": "${env:GITHUB_TOKEN}" },
        "query":      { "state": "open", "per_page": "100" },
        "pagination": { "strategy": "link_header", "max_pages": 10 },
        "response":   { "data_path": "" }
    }),
};
// let output = adapter.execute(input).await?;
}

Registered service name: "rest-api"

Config fields

Field              Default   Description
timeout            30 s      Per-request timeout
max_retries        3         Retry attempts on transient errors (429, 5xx, network)
retry_base_delay   1 s       Base for exponential backoff
proxy_url          None      HTTP/HTTPS/SOCKS5 proxy URL

ServiceInput.params contract

Param                       Required   Default              Description
method                      no         "GET"                GET, POST, PUT, PATCH, DELETE, HEAD
body                        no         —                    JSON body for POST/PUT/PATCH
body_raw                    no         —                    Raw string body (takes precedence over body)
headers                     no         —                    Extra request headers object
query                       no         —                    Extra query string parameters object
accept                      no         "application/json"   Accept header
auth                        no         none                 Authentication object (see below)
response.data_path          no         full body            Dot path into the JSON response to extract
response.collect_as_array   no         false                Force multi-page results into a JSON array
pagination.strategy         no         "none"               "none", "offset", "cursor", "link_header"
pagination.max_pages        no         1                    Maximum pages to fetch

Authentication

# Bearer token
[nodes.params.auth]
type  = "bearer"
token = "${env:API_TOKEN}"

# HTTP Basic
[nodes.params.auth]
type     = "basic"
username = "${env:API_USER}"
password = "${env:API_PASS}"

# API key in header
[nodes.params.auth]
type   = "api_key_header"
header = "X-Api-Key"
key    = "${env:API_KEY}"

# API key in query string
[nodes.params.auth]
type  = "api_key_query"
param = "api_key"
key   = "${env:API_KEY}"

Pagination strategies

Strategy        How it works                                                               Best for
"none"          Single request                                                             Simple endpoints
"offset"        Increments page_param from start_page                                      REST APIs with ?page=N
"cursor"        Extracts next cursor from cursor_field (dot path), sends as cursor_param   GraphQL-REST hybrids, Stripe-style
"link_header"   Follows RFC 8288 Link: <url>; rel="next"                                   GitHub API, GitLab API

Offset example

[nodes.params.pagination]
strategy        = "offset"
page_param      = "page"
page_size_param = "per_page"
page_size       = 100
start_page      = 1
max_pages       = 20

Cursor example

[nodes.params.pagination]
strategy     = "cursor"
cursor_param = "after"
cursor_field = "meta.next_cursor"
max_pages    = 50
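Under the hood the cursor strategy loops roughly as follows; `fetch` stands in for the real HTTP call and returns the page body together with the value extracted from `cursor_field`:

```rust
/// Sketch of the "cursor" pagination strategy: keep requesting pages,
/// feeding the cursor extracted from each response back as the next
/// request's `cursor_param`, until the cursor disappears or `max_pages`
/// is hit.
fn paginate_cursor(
    mut fetch: impl FnMut(Option<&str>) -> (String, Option<String>),
    max_pages: usize,
) -> Vec<String> {
    let mut pages = Vec::new();
    let mut cursor: Option<String> = None;
    for _ in 0..max_pages {
        let (body, next) = fetch(cursor.as_deref());
        pages.push(body);
        match next {
            Some(c) => cursor = Some(c),
            None => break, // no next cursor in the response: done
        }
    }
    pages
}
```
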

Output

ServiceOutput.data — pretty-printed JSON string of the extracted data.

ServiceOutput.metadata:

{
  "url":        "https://...",
  "page_count": 3
}

OpenAPI Adapter

Purpose-built for APIs with an OpenAPI 3.x specification document. At runtime the adapter fetches and caches the spec, resolves the target operation by operationId or "METHOD /path" syntax, binds arguments to path/query/body parameters, and delegates the actual HTTP call to the inner RestApiAdapter.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::openapi::{OpenApiAdapter, OpenApiConfig};
use stygian_graph::adapters::rest_api::RestApiConfig;
use stygian_graph::ports::{ScrapingService, ServiceInput};
use serde_json::json;
use std::time::Duration;

let adapter = OpenApiAdapter::with_config(OpenApiConfig {
    rest: RestApiConfig {
        timeout:     Duration::from_secs(20),
        max_retries: 3,
        ..Default::default()
    },
});

let input = ServiceInput {
    url: "https://petstore3.swagger.io/api/v3/openapi.json".to_string(),
    params: json!({
        "operation": "findPetsByStatus",
        "args": { "status": "available" },
        "auth": { "type": "api_key_header", "header": "api_key", "key": "${env:PETSTORE_API_KEY}" },
    }),
};
// let output = adapter.execute(input).await?;
}

Registered service name: "openapi"

ServiceInput contract

Field               Required   Description
url                 yes        URL of the OpenAPI spec document (.json or .yaml)
params.operation    yes        operationId (e.g. "listPets") or "METHOD /path" (e.g. "GET /pet/findByStatus")
params.args         no         Key/value map of path, query, and body arguments (all merged; adapter classifies them)
params.auth         no         Same shape as REST API auth
params.server.url   no         Override spec's servers[0].url at runtime
params.rate_limit   no         Proactive rate throttle (see below)

Operation resolution

# By operationId
[nodes.params]
operation = "listPets"

# By HTTP method + path
[nodes.params]
operation = "GET /pet/findByStatus"

Argument binding

The adapter classifies each key in params.args against the operation's declared parameters:

Parameter location       What happens
in: path                 Value is substituted into the URL template ({petId} → 42)
in: query                Value is appended to the query string
in: body (requestBody)   All remaining keys are collected into the JSON request body

[nodes.params]
operation = "getPetById"

[nodes.params.args]
petId = 42          # path parameter — substituted into /pet/{petId}
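Path binding amounts to template substitution. A minimal sketch (illustrative only; the real adapter also routes unmatched keys to the query string or request body):

```rust
use std::collections::HashMap;

/// `in: path` binding: every `{name}` in the path template is replaced
/// by the matching entry from `args`.
fn bind_path(template: &str, args: &HashMap<&str, String>) -> String {
    let mut url = template.to_string();
    for (name, value) in args {
        url = url.replace(&format!("{{{name}}}"), value);
    }
    url
}
```
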

Rate limiting

Two independent layers operate simultaneously:

  1. Proactive — optional params.rate_limit block enforced before the HTTP call:

     [nodes.params.rate_limit]
     max_requests = 100
     window_secs  = 60
     strategy     = "token_bucket"  # or "sliding_window" (default)
     max_delay_ms = 30000

  2. Reactive — inherited from RestApiAdapter: a 429 response with a Retry-After header causes an automatic sleep-and-retry without any additional configuration.
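A token bucket of the kind configured above can be sketched in a few lines. Elapsed time is passed in explicitly to keep the example deterministic; a real limiter would track wall-clock time:

```rust
/// Minimal token bucket: `max_requests` tokens refill evenly over
/// `window_secs`; each request spends one token or is refused.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
}

impl TokenBucket {
    fn new(max_requests: u32, window_secs: u64) -> Self {
        let capacity = max_requests as f64;
        Self {
            capacity,
            tokens: capacity,
            refill_per_sec: capacity / window_secs as f64,
        }
    }

    /// Advance time by `elapsed_secs`, then try to spend one token.
    fn try_acquire(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

When the bucket is empty the caller sleeps (up to max_delay_ms) instead of firing the request, which is what makes the layer proactive rather than reactive.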

Spec caching

The parsed OpenAPI document is cached per spec URL for the lifetime of the adapter instance (or any clone of it — all clones share the same cache Arc). The cache is populated on the first call and reused on every subsequent call, so spec fetching is a one-time cost per URL.

Full TOML example

[[services]]
name = "openapi"
kind = "openapi"

# Unauthenticated list
[[nodes]]
id      = "list-pets"
service = "openapi"
url     = "https://petstore3.swagger.io/api/v3/openapi.json"

[nodes.params]
operation = "findPetsByStatus"

[nodes.params.args]
status = "available"

# API key auth + server override + proactive rate limit
[[nodes]]
id      = "get-pet"
service = "openapi"
url     = "https://petstore3.swagger.io/api/v3/openapi.json"

[nodes.params]
operation = "getPetById"

[nodes.params.args]
petId = 1

[nodes.params.auth]
type   = "api_key_header"
header = "api_key"
key    = "${env:PETSTORE_API_KEY}"

[nodes.params.server]
url = "https://petstore3.swagger.io/api/v3"

[nodes.params.rate_limit]
max_requests = 50
window_secs  = 60
strategy     = "token_bucket"

Output

ServiceOutput.data — pretty-printed JSON string returned by the resolved endpoint.

ServiceOutput.metadata:

{
  "url":              "https://...",
  "page_count":       1,
  "openapi_spec_url": "https://petstore3.swagger.io/api/v3/openapi.json",
  "operation_id":     "findPetsByStatus",
  "method":           "GET",
  "path_template":    "/pet/findByStatus",
  "server_url":       "https://petstore3.swagger.io/api/v3",
  "resolved_url":     "https://petstore3.swagger.io/api/v3/pet/findByStatus"
}

Browser Adapter

Delegates to stygian-browser for JavaScript-rendered pages. Requires the browser feature flag and a Chrome binary.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::{BrowserAdapter, BrowserAdapterConfig};
use stygian_browser::StealthLevel;
use std::time::Duration;

let adapter = BrowserAdapter::with_config(BrowserAdapterConfig {
    headless:          true,
    stealth_level:     StealthLevel::Advanced,
    timeout:           Duration::from_secs(30),
    viewport_width:    1920,
    viewport_height:   1080,
    ..Default::default()
});
}

Registered service name: "browser"

See the Browser Automation section for the full feature set.


AI Adapters

All AI adapters implement AIProvider and perform structured extraction: they receive raw HTML (or text) from an upstream scraping node and return a typed JSON object matching the schema declared in the node config.

Claude (Anthropic)

#![allow(unused)]
fn main() {
use stygian_graph::adapters::ClaudeAdapter;

let adapter = ClaudeAdapter::new(
    std::env::var("ANTHROPIC_API_KEY")?,
    "claude-3-5-sonnet-20241022",
);
}

Registered service name: "ai_claude"

Config field    Description
model           Model ID (e.g. claude-3-5-sonnet-20241022)
max_tokens      Max response tokens (default 4096)
system_prompt   Optional system-level instruction
schema          JSON schema for structured output

OpenAI

#![allow(unused)]
fn main() {
use stygian_graph::adapters::OpenAiAdapter;

let adapter = OpenAiAdapter::new(
    std::env::var("OPENAI_API_KEY")?,
    "gpt-4o",
);
}

Registered service name: "ai_openai"

Gemini (Google)

#![allow(unused)]
fn main() {
use stygian_graph::adapters::GeminiAdapter;

let adapter = GeminiAdapter::new(
    std::env::var("GOOGLE_API_KEY")?,
    "gemini-2.0-flash",
);
}

Registered service name: "ai_gemini"

GitHub Copilot

Uses the Copilot API with your personal access token (PAT) or GitHub App credentials.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::CopilotAdapter;

let adapter = CopilotAdapter::new(
    std::env::var("GITHUB_TOKEN")?,
    "gpt-4o",
);
}

Registered service name: "ai_copilot"

Ollama (local)

Run any GGUF model locally without sending data to an external API.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::OllamaAdapter;

let adapter = OllamaAdapter::new(
    "http://localhost:11434",
    "llama3.3",
);
}

Registered service name: "ai_ollama"


AI fallback chain

Adapters can be wrapped in a fallback chain. If the primary provider fails (rate-limit, outage), the next in the list is tried:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_graph::adapters::{AiFallbackChain, ClaudeAdapter, OllamaAdapter, OpenAiAdapter};

let chain = AiFallbackChain::new(vec![
    Arc::new(ClaudeAdapter::new(api_key.clone(), "claude-3-5-sonnet-20241022")),
    Arc::new(OpenAiAdapter::new(openai_key, "gpt-4o")),
    Arc::new(OllamaAdapter::new("http://localhost:11434", "llama3.3")),
]);
}

Registered service name: "ai_fallback"


Resilience adapters

Wrap any ScrapingService with circuit breaker and retry logic without touching the underlying implementation:

#![allow(unused)]
fn main() {
use stygian_graph::adapters::resilience::{
    CircuitBreakerImpl, RetryPolicy, ResilientAdapter,
};
use std::sync::Arc;
use std::time::Duration;

let cb = CircuitBreakerImpl::new(
    5,                           // open after 5 consecutive failures
    Duration::from_secs(120),    // half-open attempt after 2 min
);

let policy = RetryPolicy::exponential(
    3,                           // max 3 attempts
    Duration::from_millis(200),  // initial back-off
    Duration::from_secs(5),      // max back-off
);

let resilient = ResilientAdapter::new(
    Arc::new(http_adapter),
    Arc::new(cb),
    policy,
);
}
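The Closed → Open transition can be sketched as a small state machine. Consecutive-failure counting is shown here for brevity; the real CircuitBreakerImpl uses a sliding window:

```rust
#[derive(Debug, PartialEq)]
enum CircuitState {
    Closed,
    Open,
}

/// Opens after `threshold` consecutive failures; any success resets the
/// count and closes the breaker again.
struct Breaker {
    threshold: u32,
    failures: u32,
    state: CircuitState,
}

impl Breaker {
    fn new(threshold: u32) -> Self {
        Self { threshold, failures: 0, state: CircuitState::Closed }
    }

    fn record_success(&mut self) {
        self.failures = 0;
        self.state = CircuitState::Closed;
    }

    fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.threshold {
            self.state = CircuitState::Open; // stop calling the backend
        }
    }
}
```

While Open, the wrapper fails fast instead of dispatching requests; after the configured cool-down it probes the backend again (the half-open attempt in the example above).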

Cache adapters

Two in-process cache implementations are included. Both implement CachePort.

BoundedLruCache

Thread-safe LRU with a hard capacity limit:

#![allow(unused)]
fn main() {
use stygian_graph::adapters::BoundedLruCache;
use std::num::NonZeroUsize;

let cache = BoundedLruCache::new(NonZeroUsize::new(10_000).unwrap());
}

DashMapCache

Concurrent hash-map backed cache with a background TTL cleanup task:

#![allow(unused)]
fn main() {
use stygian_graph::adapters::DashMapCache;
use std::time::Duration;

let cache = DashMapCache::new(Duration::from_secs(300)); // 5-minute default TTL
}
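The TTL behaviour can be sketched with a plain HashMap; this is a single-threaded stand-in, whereas the real adapter uses DashMap plus a background eviction task:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Entries carry an expiry instant; lookups ignore anything past it.
/// (A background task would actually evict expired entries.)
struct TtlCache {
    default_ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
}

impl TtlCache {
    fn new(default_ttl: Duration) -> Self {
        Self { default_ttl, entries: HashMap::new() }
    }

    fn set(&mut self, key: &str, value: String, ttl: Option<Duration>) {
        let expires = Instant::now() + ttl.unwrap_or(self.default_ttl);
        self.entries.insert(key.to_string(), (value, expires));
    }

    fn get(&self, key: &str) -> Option<&str> {
        self.entries
            .get(key)
            .filter(|(_, expires)| Instant::now() < *expires)
            .map(|(v, _)| v.as_str())
    }
}
```
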

GraphQL adapter

The GraphQlService adapter executes queries against any GraphQL endpoint using GraphQlTargetPlugin implementations registered in a GraphQlPluginRegistry.

For most APIs, use GenericGraphQlPlugin via the fluent builder rather than writing a dedicated struct:

#![allow(unused)]
fn main() {
use stygian_graph::adapters::graphql_plugins::generic::GenericGraphQlPlugin;
use stygian_graph::adapters::graphql_throttle::CostThrottleConfig;

let plugin = GenericGraphQlPlugin::builder()
    .name("github")
    .endpoint("https://api.github.com/graphql")
    .bearer_auth("${env:GITHUB_TOKEN}")
    .header("X-Github-Next-Global-ID", "1")
    .cost_throttle(CostThrottleConfig::default())
    .page_size(30)
    .build()
    .expect("name and endpoint required");
}

For runtime-rotating credentials inject an AuthPort:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_graph::adapters::graphql::{GraphQlConfig, GraphQlService};
use stygian_graph::ports::auth::{EnvAuthPort, ErasedAuthPort};

let service = GraphQlService::new(GraphQlConfig::default(), Some(Arc::new(registry)))
    .with_auth_port(Arc::new(EnvAuthPort::new("MY_API_TOKEN")) as Arc<dyn ErasedAuthPort>);
}

See the GraphQL Plugins page for the full builder reference, AuthPort implementation guide, proactive cost throttling, and custom plugin examples.


Cloudflare Browser Rendering adapter

Submits a multi-page crawl job to the Cloudflare Browser Rendering API, polls until it completes, and returns the aggregated content. All page rendering is done inside Cloudflare's infrastructure — no local Chrome binary needed.

Feature flag: cloudflare-crawl (not included in default or browser; add it explicitly or use full).

Quick start

# Cargo.toml
[dependencies]
stygian-graph = { version = "0.2", features = ["cloudflare-crawl"] }

#![allow(unused)]
fn main() {
use stygian_graph::adapters::cloudflare_crawl::{
    CloudflareCrawlAdapter, CloudflareCrawlConfig,
};
use std::time::Duration;

let adapter = CloudflareCrawlAdapter::with_config(CloudflareCrawlConfig {
    poll_interval: Duration::from_secs(3),
    job_timeout:   Duration::from_secs(120),
    ..Default::default()
});
}

Registered service name: "cloudflare-crawl"

ServiceInput.params contract

All per-request options are passed via ServiceInput.params. account_id and api_token are required; the rest are optional and forwarded verbatim to the Cloudflare API.

Param key         Required   Default       Description
account_id        yes        —             Cloudflare account ID
api_token         yes        —             Cloudflare API token with Browser Rendering permission
output_format     no         "markdown"    "markdown", "html", or "raw"
max_depth         no         API default   Maximum crawl depth from the seed URL
max_pages         no         API default   Maximum pages to crawl
url_pattern       no         API default   Regex or glob restricting which URLs are followed
modified_since    no         API default   ISO-8601 timestamp; skip pages not modified since
max_age_seconds   no         API default   Skip cached pages older than this many seconds
static_mode       no         false         Set "true" to skip JS execution (faster, static HTML only)

Config fields

Field           Default   Description
poll_interval   2 s       How often to poll for job completion
job_timeout     5 min     Hard timeout per crawl job; returns ServiceError::Timeout if exceeded

Output

ServiceOutput.data contains the page content of all crawled pages joined by newlines. ServiceOutput.metadata is a JSON object:

{
  "job_id":        "some-uuid",
  "pages_crawled": 12,
  "output_format": "markdown"
}

TOML pipeline usage

[[nodes]]
id     = "crawl"
type   = "scrape"
target = "https://docs.example.com"

  [nodes.params]
  account_id    = "${env:CF_ACCOUNT_ID}"
  api_token     = "${env:CF_API_TOKEN}"
  output_format = "markdown"
  max_depth     = "3"
  max_pages     = "50"
  url_pattern   = "https://docs.example.com/**"

  [nodes.service]
  name = "cloudflare-crawl"

Error mapping

Condition                             ServiceError variant
Missing account_id or api_token       ServiceError::Unavailable
Cloudflare API non-2xx                ServiceError::Unavailable (with CF error code)
Job still pending after job_timeout   ServiceError::Timeout
Unexpected response shape             ServiceError::InvalidResponse

Request signing adapters

The SigningPort trait lets any adapter attach signatures, HMAC tokens, device attestation headers, or OAuth material to outbound requests without coupling the adapter to the scheme.

| Adapter | Use case |
|---------|----------|
| `NoopSigningAdapter` | Passthrough — no headers added; useful as a default or in unit tests |
| `HttpSigningAdapter` | Delegate to any external sidecar (Frida RPC bridge, AWS SigV4 server, OAuth 1.0a service, …) |
#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_graph::adapters::signing::{HttpSigningAdapter, HttpSigningConfig};
use stygian_graph::ports::signing::ErasedSigningPort;

let signer: Arc<dyn ErasedSigningPort> = Arc::new(
    HttpSigningAdapter::new(HttpSigningConfig {
        endpoint: "http://localhost:27042/sign".to_string(),
        ..Default::default()
    })
);
}

See Request Signing for the full sidecar wire format, Frida RPC bridge example, and guide to implementing a pure-Rust SigningPort.

Request Signing

The SigningPort trait decouples request signing from the adapters that make HTTP calls. Any adapter that needs signed outbound requests holds an Arc<dyn ErasedSigningPort> and calls sign() with the request material before dispatching it.

This separation means the calling adapter never knows or cares how requests are signed — whether by a Frida RPC bridge, an AWS SDK, a pure-Rust HMAC function, or a lightweight Python sidecar.


When to use SigningPort

| Signing scheme | Typical use |
|----------------|-------------|
| Frida RPC bridge | Hook native `.so` signing code inside a running mobile app (Tinder, Snapchat, …) via a thin HTTP sidecar |
| AWS Signature V4 | Sign S3 / API Gateway requests; keep IAM credentials out of the graph pipeline |
| OAuth 1.0a | Generate per-request `oauth_signature` for Twitter/X API v1 endpoints |
| Custom HMAC | Add `X-Request-Signature` + `X-Signed-At` headers required by trading or payment APIs |
| Timestamp + nonce | Anti-replay headers for any API that validates request freshness |
| Device attestation | Attach Play Integrity / Apple DeviceCheck tokens to every request |
| mTLS client credentials | Surface a client certificate thumbprint as a header when TLS termination is upstream |

Input and output types

SigningInput

The request material passed to the signer:

#![allow(unused)]
fn main() {
use stygian_graph::ports::signing::SigningInput;
use serde_json::json;

let input = SigningInput {
    method:  "POST".to_string(),
    url:     "https://api.example.com/v2/messages".to_string(),
    headers: Default::default(),   // headers already present before signing
    body:    Some(b"{\"text\":\"hello\"}".to_vec()),
    context: json!({ "nonce_seed": 42 }),  // arbitrary caller data
};
}
| Field | Type | Description |
|-------|------|-------------|
| `method` | `String` | HTTP method (`"GET"`, `"POST"`, …) |
| `url` | `String` | Fully-qualified target URL |
| `headers` | `HashMap<String, String>` | Headers already present on the request |
| `body` | `Option<Vec<u8>>` | Raw request body; `None` for bodyless methods |
| `context` | `serde_json::Value` | Caller-supplied metadata (nonce seeds, session tokens, …) |

SigningOutput

The material to merge into the request:

#![allow(unused)]
fn main() {
use stygian_graph::ports::signing::SigningOutput;
use std::collections::HashMap;

let mut headers = HashMap::new();
headers.insert("Authorization".to_string(), "HMAC-SHA256 sig=abc123".to_string());
headers.insert("X-Signed-At".to_string(), "1710676800000".to_string());

let output = SigningOutput {
    headers,
    query_params:  vec![],
    body_override: None,
};
}
| Field | Type | Description |
|-------|------|-------------|
| `headers` | `HashMap<String, String>` | Headers to add or override on the request |
| `query_params` | `Vec<(String, String)>` | Query parameters to append to the URL |
| `body_override` | `Option<Vec<u8>>` | If `Some`, replaces the request body (for digest-in-body schemes) |

All fields default to empty — a default SigningOutput is a valid no-op.


Built-in adapters

NoopSigningAdapter

Passes requests through unsigned. Use as a default when signing is optional, or to disable signing in tests:

#![allow(unused)]
fn main() {
use stygian_graph::adapters::signing::NoopSigningAdapter;
use stygian_graph::ports::signing::{SigningPort, SigningInput};
use serde_json::json;

tokio::runtime::Runtime::new().unwrap().block_on(async {
let signer = NoopSigningAdapter;

let output = signer.sign(SigningInput {
    method:  "GET".to_string(),
    url:     "https://example.com".to_string(),
    headers: Default::default(),
    body:    None,
    context: json!({}),
}).await.unwrap();

assert!(output.headers.is_empty());
});
}

HttpSigningAdapter

Delegates signing to any external HTTP sidecar. The sidecar receives a JSON payload describing the request and returns the headers / query params / body override to apply.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::signing::{HttpSigningAdapter, HttpSigningConfig};
use std::time::Duration;

let signer = HttpSigningAdapter::new(HttpSigningConfig {
    endpoint:     "http://localhost:27042/sign".to_string(),
    timeout:      Duration::from_secs(5),
    bearer_token: Some("sidecar-secret".to_string()),
    ..Default::default()
});
}

Config fields

| Field | Default | Description |
|-------|---------|-------------|
| `endpoint` | `"http://localhost:27042/sign"` | Full URL of the sidecar's sign endpoint |
| `timeout` | 10 s | Per-request timeout when calling the sidecar |
| `bearer_token` | `None` | Bearer token used to authenticate with the sidecar |
| `extra_headers` | `{}` | Static headers forwarded on every sidecar call |

Sidecar wire format

The sidecar receives a POST with this JSON body:

{
  "method":   "GET",
  "url":      "https://api.tinder.com/v2/profile",
  "headers":  { "Content-Type": "application/json" },
  "body_b64": null,
  "context":  {}
}

The request body (if any) is base64-encoded in body_b64. The sidecar must respond with a JSON object:

{
  "headers":      { "X-Auth-Token": "abc123", "X-Signed-At": "1710676800000" },
  "query_params": [],
  "body_b64":     null
}

All response fields are optional — omit any field that your scheme does not use.


Frida RPC bridge

The most common use of HttpSigningAdapter is hooking a mobile app's native signing function via Frida and exposing it through a thin HTTP sidecar.

┌─────────── Your machine ────────────────────────────────┐
│                                                         │
│  stygian-graph pipeline                                 │
│    └─ HttpSigningAdapter → POST http://localhost:27042  │─ adb forward ─►┐
│                                                         │                │
└─────────────────────────────────────────────────────────┘                │
                                                                           ▼
                                                          ┌──── Android device / emulator ────┐
                                                          │                                   │
                                                          │  Frida sidecar (Python/Flask)     │
                                                          │    └─ frida.attach("com.example") │
                                                          │       └─ libauth.so!computeHMAC() │
                                                          │                                   │
                                                          └───────────────────────────────────┘

Example Python sidecar (frida_sidecar.py):

import frida
from flask import Flask, request, jsonify
import base64

session = frida.get_usb_device().attach("com.example.app")
script = session.create_script("""
    const computeHmac = new NativeFunction(
        Module.findExportByName("libauth.so", "_Z11computeHMACPKcS0_"),  // gitleaks:allow
        'pointer', ['pointer', 'pointer']
    );
    rpc.exports.sign = (url, body) =>
        computeHmac(
            Memory.allocUtf8String(url),
            Memory.allocUtf8String(body)
        ).readUtf8String();
""")
script.load()

app = Flask(__name__)

@app.route("/sign", methods=["POST"])
def sign():
    req = request.json
    body = base64.b64decode(req["body_b64"]).decode() if req.get("body_b64") else ""
    sig = script.exports_sync.sign(req["url"], body)
    return jsonify({"headers": {"X-Request-Signature": sig}})

app.run(host="0.0.0.0", port=27042)

Forward the port and point the adapter at it:

adb forward tcp:27042 tcp:27042
#![allow(unused)]
fn main() {
use stygian_graph::adapters::signing::{HttpSigningAdapter, HttpSigningConfig};

let signer = HttpSigningAdapter::new(HttpSigningConfig {
    endpoint: "http://localhost:27042/sign".to_string(),
    ..Default::default()
});
}

Implementing a custom SigningPort

For pure-Rust schemes, implement SigningPort directly — no sidecar needed:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};
use stygian_graph::ports::signing::{SigningError, SigningInput, SigningOutput, SigningPort};

pub struct TimestampNonceAdapter {
    secret: Vec<u8>,
}

impl SigningPort for TimestampNonceAdapter {
    async fn sign(&self, _input: SigningInput) -> Result<SigningOutput, SigningError> {
        let ts = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map_err(|e| SigningError::Other(e.to_string()))?
            .as_millis()
            .to_string();

        let mut headers = HashMap::new();
        headers.insert("X-Timestamp".to_string(), ts);
        headers.insert("X-Nonce".to_string(), uuid::Uuid::new_v4().to_string());
        // Add HMAC over (method + url + ts) using self.secret here …

        Ok(SigningOutput { headers, ..Default::default() })
    }
}
}

Follow the Custom Adapters guide for the full checklist.


Wiring into a pipeline

Use Arc<dyn ErasedSigningPort> to hold any signer at runtime:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_graph::adapters::signing::{HttpSigningAdapter, HttpSigningConfig};
use stygian_graph::ports::signing::ErasedSigningPort;

let signer: Arc<dyn ErasedSigningPort> = Arc::new(
    HttpSigningAdapter::new(HttpSigningConfig {
        endpoint: "http://localhost:27042/sign".to_string(),
        ..Default::default()
    })
);

// Pass `signer` to any adapter or service that accepts `Arc<dyn ErasedSigningPort>`
}

ErasedSigningPort is the object-safe version of SigningPort that enables dynamic dispatch. The blanket impl<T: SigningPort> ErasedSigningPort for T means any concrete SigningPort implementation can be erased without additional boilerplate.


Error handling

SigningError converts to StygianError::Service(ServiceError::AuthenticationFailed) via the From trait, so signing failures surface as authentication errors in the pipeline.

| Variant | Meaning |
|---------|---------|
| `BackendUnavailable(msg)` | Sidecar is unreachable (network error, DNS failure) |
| `InvalidResponse(msg)` | Sidecar returned an unexpected HTTP status or malformed JSON |
| `CredentialsMissing(msg)` | Signing key or secret was not configured |
| `Timeout(ms)` | Sidecar did not respond within the configured timeout |
| `Other(msg)` | Catch-all for any other signing failure |

GraphQL Plugins

stygian-graph ships a generic, builder-based GraphQL plugin system built on top of the GraphQlTargetPlugin port trait. Instead of writing a dedicated struct for each API you want to query, reach for GenericGraphQlPlugin.


GenericGraphQlPlugin

GenericGraphQlPlugin implements GraphQlTargetPlugin and is configured entirely via a fluent builder. Only name and endpoint are required; everything else is optional with sensible defaults.

#![allow(unused)]
fn main() {
use stygian_graph::adapters::graphql_plugins::generic::GenericGraphQlPlugin;
use stygian_graph::adapters::graphql_throttle::CostThrottleConfig;

let plugin = GenericGraphQlPlugin::builder()
    .name("github")
    .endpoint("https://api.github.com/graphql")
    .bearer_auth("${env:GITHUB_TOKEN}")
    .header("X-Github-Next-Global-ID", "1")
    .cost_throttle(CostThrottleConfig::default())
    .page_size(30)
    .description("GitHub GraphQL API v4")
    .build()
    .expect("name and endpoint are required");
}

Builder reference

| Method | Required | Description |
|--------|----------|-------------|
| `.name(impl Into<String>)` | yes | Plugin identifier used in the registry |
| `.endpoint(impl Into<String>)` | yes | Full GraphQL endpoint URL |
| `.bearer_auth(impl Into<String>)` | no | Shorthand: sets a Bearer auth token |
| `.auth(GraphQlAuth)` | no | Full auth struct (Bearer, API key, or custom header) |
| `.header(key, value)` | no | Add a single request header (repeatable) |
| `.headers(HashMap<String, String>)` | no | Bulk-replace all headers |
| `.cost_throttle(CostThrottleConfig)` | no | Enable proactive point-budget throttling |
| `.rate_limit(RateLimitConfig)` | no | Enable client-side request rate limiting |
| `.page_size(usize)` | no | Default page size for paginated queries (default 50) |
| `.description(impl Into<String>)` | no | Human-readable description |
| `.build()` | | Returns `Result<GenericGraphQlPlugin, BuildError>` |

Auth options

#![allow(unused)]
fn main() {
use stygian_graph::ports::{GraphQlAuth, GraphQlAuthKind};

// Bearer token (most common)
let plugin = GenericGraphQlPlugin::builder()
    .name("shopify")
    .endpoint("https://my-store.myshopify.com/admin/api/2025-01/graphql.json")
    .bearer_auth("${env:SHOPIFY_ACCESS_TOKEN}")
    .build()
    .unwrap();

// Custom header (e.g. X-Shopify-Access-Token)
let plugin = GenericGraphQlPlugin::builder()
    .name("shopify-legacy")
    .endpoint("https://my-store.myshopify.com/admin/api/2025-01/graphql.json")
    .auth(GraphQlAuth {
        kind:        GraphQlAuthKind::Header,
        token:       "${env:SHOPIFY_ACCESS_TOKEN}".to_string(),
        header_name: Some("X-Shopify-Access-Token".to_string()),
    })
    .build()
    .unwrap();
}

Tokens containing ${env:VAR_NAME} are expanded by the pipeline template processor (expand_template) or by GraphQlService::apply_auth, not by EnvAuthPort itself. EnvAuthPort reads the env var value directly — no template syntax is needed when using it as a runtime auth port.
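The expansion itself is plain string substitution. A minimal std-only sketch of what such a processor does — the function name and single-placeholder limitation are illustrative, not the crate's actual `expand_template`:

```rust
// Illustrative sketch of `${env:VAR}` expansion. Handles one placeholder
// per call; unset variables expand to an empty string.
fn expand_env_template(input: &str) -> String {
    let Some(start) = input.find("${env:") else {
        return input.to_string();
    };
    let Some(end) = input[start..].find('}') else {
        return input.to_string();
    };
    let var = &input[start + 6..start + end];
    let value = std::env::var(var).unwrap_or_default();
    format!("{}{}{}", &input[..start], value, &input[start + end + 1..])
}
```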


AuthPort — runtime credential management

For credentials that rotate, expire, or need a refresh flow, implement the AuthPort trait and inject it into GraphQlService.

#![allow(unused)]
fn main() {
use stygian_graph::ports::auth::{AuthPort, AuthError, TokenSet};

pub struct MyOAuthPort { /* ... */ }

impl AuthPort for MyOAuthPort {
    async fn load_token(&self) -> Result<Option<TokenSet>, AuthError> {
        // Return None if no cached token exists; Some(token) if you have one.
        Ok(Some(TokenSet {
            access_token:  fetch_stored_token().await?,
            refresh_token: Some(fetch_stored_refresh_token().await?),
            expires_at:    Some(std::time::SystemTime::now()
                               + std::time::Duration::from_secs(3600)),
            scopes:        vec!["read".to_string()],
        }))
    }

    async fn refresh_token(&self) -> Result<TokenSet, AuthError> {
        // Exchange the refresh token for a new access token.
        Ok(TokenSet {
            access_token:  exchange_refresh_token().await?,
            refresh_token: Some(fetch_new_refresh_token().await?),
            expires_at:    Some(std::time::SystemTime::now()
                               + std::time::Duration::from_secs(3600)),
            scopes:        vec!["read".to_string()],
        })
    }
}
}

Wiring into GraphQlService

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_graph::adapters::graphql::{GraphQlConfig, GraphQlService};
use stygian_graph::ports::auth::ErasedAuthPort;

let service = GraphQlService::new(GraphQlConfig::default(), None)
    .with_auth_port(Arc::new(MyOAuthPort { /* ... */ }) as Arc<dyn ErasedAuthPort>);
}

The service calls resolve_token before each request. If the token is expired (or within 60 seconds of expiry), refresh_token is called automatically.
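The expiry check reduces to simple clock arithmetic. A sketch of the decision — the 60-second window matches the behaviour described above, but the function itself is illustrative, not the crate's internal code:

```rust
use std::time::{Duration, SystemTime};

// Illustrative: refresh when the token is expired or expires within 60 s.
fn needs_refresh(expires_at: Option<SystemTime>, now: SystemTime) -> bool {
    match expires_at {
        None => false, // no recorded expiry: token treated as still valid
        Some(t) => now + Duration::from_secs(60) >= t,
    }
}
```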

EnvAuthPort — zero-config static token

For non-rotating tokens, EnvAuthPort reads a bearer token from an environment variable at load time:

#![allow(unused)]
fn main() {
use stygian_graph::ports::auth::EnvAuthPort;

let auth = EnvAuthPort::new("GITHUB_TOKEN");
}

If GITHUB_TOKEN is not set, EnvAuthPort::load_token returns Ok(None). An error (AuthError::TokenNotFound) is only raised later when resolve_token is called and finds no token available.


Cost throttling

GraphQL APIs that expose extensions.cost.throttleStatus (Shopify Admin API, Jobber, and others) can be configured for proactive point-budget management.

CostThrottleConfig

#![allow(unused)]
fn main() {
use stygian_graph::ports::graphql_plugin::CostThrottleConfig;

let config = CostThrottleConfig {
    max_points:                  10_000.0,  // bucket capacity
    restore_per_sec:             500.0,     // points restored per second
    min_available:               50.0,      // don't send if fewer points remain
    max_delay_ms:                30_000,    // wait at most 30 s before giving up
    estimated_cost_per_request:  100.0,     // pessimistic per-request reservation
};
}
| Field | Default | Description |
|-------|---------|-------------|
| `max_points` | `10_000.0` | Total bucket capacity |
| `restore_per_sec` | `500.0` | Points/second restored |
| `min_available` | `50.0` | Points threshold below which we pre-sleep |
| `max_delay_ms` | `30_000` | Hard ceiling on proactive sleep duration (ms) |
| `estimated_cost_per_request` | `100.0` | Pessimistic reservation per request to prevent concurrent tasks from all passing the pre-flight check simultaneously |

Attach config to a plugin via .cost_throttle(config) on the builder, or override GraphQlTargetPlugin::cost_throttle_config() on a custom plugin implementation.

How budget tracking works

  1. Pre-flight reserve: pre_flight_reserve inspects the current LiveBudget for the plugin. If the projected available points (net of in-flight reservations) fall below min_available + estimated_cost_per_request, it sleeps for the exact duration needed to restore enough points, up to max_delay_ms. It then atomically reserves estimated_cost_per_request points so concurrent tasks immediately see a reduced balance and cannot all pass the pre-flight check simultaneously.
  2. Post-response: update_budget parses extensions.cost.throttleStatus out of the response JSON and updates the per-plugin LiveBudget to the true server-reported balance. release_reservation is then called to remove the in-flight reservation.
  3. Reactive back-off: If a request is throttled anyway (HTTP 429 or extensions.cost signals exhaustion), reactive_backoff_ms computes an exponential delay.

The budgets are stored in a HashMap<String, PluginBudget> keyed by plugin name and protected by a tokio::sync::RwLock, so all concurrent requests share the same view of remaining points.
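The sleep in step 1 is simple budget arithmetic. A hedged sketch — the formula is inferred from the field descriptions above, not copied from the crate:

```rust
// Inferred from the config semantics: sleep long enough for the restore
// rate to bring available points back over the send threshold.
fn preflight_sleep_ms(
    available: f64, // current points net of in-flight reservations
    min_available: f64,
    estimated_cost_per_request: f64,
    restore_per_sec: f64,
    max_delay_ms: u64,
) -> u64 {
    let needed = min_available + estimated_cost_per_request;
    if available >= needed {
        return 0; // enough budget: send immediately
    }
    let deficit = needed - available;
    let sleep_ms = (deficit / restore_per_sec * 1000.0).ceil() as u64;
    sleep_ms.min(max_delay_ms) // never sleep past the hard ceiling
}
```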


Request rate limiting

While cost throttling manages point budgets returned by the server, request rate limiting operates entirely client-side: it counts requests (or tokens) before each one leaves the process. The two systems are complementary and can both be active at the same time.

Enable rate limiting by returning a RateLimitConfig from GraphQlTargetPlugin::rate_limit_config(). The default implementation returns None (disabled).

RateLimitConfig

#![allow(unused)]
fn main() {
use std::time::Duration;
use stygian_graph::ports::graphql_plugin::{RateLimitConfig, RateLimitStrategy};

let config = RateLimitConfig {
    max_requests: 100,                          // requests allowed per window
    window:       Duration::from_secs(60),      // rolling window duration
    max_delay_ms: 30_000,                       // hard cap on pre-flight sleep (ms)
    strategy:     RateLimitStrategy::SlidingWindow,  // algorithm (see below)
};
}
| Field | Default | Description |
|-------|---------|-------------|
| `max_requests` | `100` | Maximum requests allowed inside any rolling window |
| `window` | 60 s | Rolling window duration |
| `max_delay_ms` | `30_000` | Maximum pre-flight sleep before giving up with an error |
| `strategy` | `SlidingWindow` | Rate-limiting algorithm — see below |

RateLimitStrategy

RateLimitStrategy selects which algorithm protects outgoing requests. Both variants also honour server-returned Retry-After headers regardless of which is active.

| Variant | Behaviour | Best for |
|---------|-----------|----------|
| `SlidingWindow` (default) | Counts requests in a rolling time window; blocks new requests once `max_requests` is reached until old timestamps expire | APIs with strict fixed-window quotas (e.g. "100 req / 60 s") |
| `TokenBucket` | Refills tokens at `max_requests / window` per second; absorbs bursts up to `max_requests` capacity before blocking | APIs that advertise burst allowances — allows short spikes, then throttles gracefully |
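To make the refill math concrete, here is a minimal std-only token bucket — a sketch of the strategy, not the crate's implementation; elapsed time is passed in explicitly to keep it deterministic:

```rust
// Token bucket: capacity = max_requests, refill rate = max_requests / window.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
}

impl TokenBucket {
    fn new(max_requests: u32, window_secs: f64) -> Self {
        let capacity = max_requests as f64;
        Self { capacity, tokens: capacity, refill_per_sec: capacity / window_secs }
    }

    /// `elapsed_secs` is the time since the previous call; returns whether
    /// the request may be sent now.
    fn try_acquire(&mut self, elapsed_secs: f64) -> bool {
        self.tokens = (self.tokens + elapsed_secs * self.refill_per_sec).min(self.capacity);
        if self.tokens >= 1.0 {
            self.tokens -= 1.0; // spend one token for this request
            true
        } else {
            false
        }
    }
}
```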

Sliding window example

#![allow(unused)]
fn main() {
use std::time::Duration;
use stygian_graph::ports::graphql_plugin::{RateLimitConfig, RateLimitStrategy};

// GitHub GraphQL API: 5 000 points/hour with a hard req/s ceiling
let config = RateLimitConfig {
    max_requests: 10,
    window:       Duration::from_secs(1),
    max_delay_ms: 5_000,
    strategy:     RateLimitStrategy::SlidingWindow,
};
}

Token bucket example

#![allow(unused)]
fn main() {
use std::time::Duration;
use stygian_graph::ports::graphql_plugin::{RateLimitConfig, RateLimitStrategy};

// Shopify Admin API: 2 req/s sustained, bursts up to 40
let config = RateLimitConfig {
    max_requests: 40,                        // bucket depth = burst capacity
    window:       Duration::from_secs(20),   // refill rate = 40 / 20 = 2 req/s
    max_delay_ms: 30_000,
    strategy:     RateLimitStrategy::TokenBucket,
};
}

Wiring into a custom plugin

#![allow(unused)]
fn main() {
use std::time::Duration;
use stygian_graph::ports::graphql_plugin::{
    GraphQlTargetPlugin, RateLimitConfig, RateLimitStrategy,
};

pub struct ShopifyPlugin { token: String }

impl GraphQlTargetPlugin for ShopifyPlugin {
    fn name(&self)     -> &str { "shopify" }
    fn endpoint(&self) -> &str { "https://my-store.myshopify.com/admin/api/2025-01/graphql.json" }

    fn rate_limit_config(&self) -> Option<RateLimitConfig> {
        Some(RateLimitConfig {
            max_requests: 40,
            window:       Duration::from_secs(20),
            max_delay_ms: 30_000,
            strategy:     RateLimitStrategy::TokenBucket,
        })
    }
    // ...other methods omitted
}
}

For GenericGraphQlPlugin, pass it via the builder:

#![allow(unused)]
fn main() {
use stygian_graph::adapters::graphql_plugins::generic::GenericGraphQlPlugin;
use stygian_graph::ports::graphql_plugin::{RateLimitConfig, RateLimitStrategy};
use std::time::Duration;

let plugin = GenericGraphQlPlugin::builder()
    .name("shopify")
    .endpoint("https://my-store.myshopify.com/admin/api/2025-01/graphql.json")
    .bearer_auth("${env:SHOPIFY_ACCESS_TOKEN}")
    .rate_limit(RateLimitConfig {
        max_requests: 40,
        window:       Duration::from_secs(20),
        max_delay_ms: 30_000,
        strategy:     RateLimitStrategy::TokenBucket,
    })
    .build()
    .expect("name and endpoint are required");
}

Rate limiting vs cost throttling

| | Request rate limiting | Cost throttling |
|---|---|---|
| What it counts | Raw request count | Query complexity points |
| Data source | Local state only | Server `extensions.cost` response |
| Algorithm options | Sliding window, token bucket | Leaky-bucket (token refill) |
| Reactive to 429? | Yes — honours `Retry-After` | Yes — exponential back-off |
| Enabled by | `rate_limit_config()` | `cost_throttle_config()` |

Writing a custom plugin

For complex APIs — multi-tenant endpoints, per-request header mutations, non-standard auth flows — implement GraphQlTargetPlugin directly:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use stygian_graph::ports::{GraphQlAuth, GraphQlAuthKind};
use stygian_graph::ports::graphql_plugin::{CostThrottleConfig, GraphQlTargetPlugin};

pub struct AcmeApi {
    token: String,
}

impl GraphQlTargetPlugin for AcmeApi {
    fn name(&self) -> &str { "acme" }
    fn endpoint(&self) -> &str { "https://api.acme.io/graphql" }

    fn version_headers(&self) -> HashMap<String, String> {
        [("Acme-Api-Version".to_string(), "2025-01".to_string())]
            .into_iter()
            .collect()
    }

    fn default_auth(&self) -> Option<GraphQlAuth> {
        Some(GraphQlAuth {
            kind:        GraphQlAuthKind::Bearer,
            token:       self.token.clone(),
            header_name: None,
        })
    }

    fn default_page_size(&self) -> usize { 25 }
    fn description(&self) -> &str { "Acme Corp GraphQL API" }
    fn supports_cursor_pagination(&self) -> bool { true }

    // opt-in to proactive throttling
    fn cost_throttle_config(&self) -> Option<CostThrottleConfig> {
        Some(CostThrottleConfig::default())
    }
}
}

Register it the same way as any built-in plugin:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_graph::application::graphql_plugin_registry::GraphQlPluginRegistry;

let mut registry = GraphQlPluginRegistry::new();
registry.register(Arc::new(AcmeApi { token: /* ... */ }));
}

Custom Adapters

This guide walks through building a production-quality custom adapter from scratch. The worked example implements a hypothetical PlaywrightService — a browser automation adapter that drives the Playwright protocol instead of CDP.

By the end you will know how to:

  • Implement any port trait as a new adapter
  • Register the adapter in the service registry
  • Wrap the adapter in resilience primitives
  • Write integration tests against it

Prerequisites

Read Architecture first. Adapters live in src/adapters/ and implement traits defined in src/ports.rs. The domain never imports adapters — only ports.


Step 1: Choose the right port

| Port | Use when |
|------|----------|
| `ScrapingService` | Fetching or processing content in a new way |
| `AIProvider` | Adding a new LLM API |
| `CachePort` | Adding a new cache backend (Redis, Memcached, …) |
| `SigningPort` | Attaching signatures, HMAC tokens, or authentication material to outgoing requests |

PlaywrightService fetches rendered HTML, so it implements ScrapingService.


Step 2: Scaffold the adapter

Create src/adapters/playwright.rs:

#![allow(unused)]
fn main() {
//! Playwright browser adapter.

use std::time::Duration;
use serde_json::Value;

use crate::domain::error::{StygianError, ServiceError};
use crate::ports::{ScrapingService, ServiceInput, ServiceOutput};

// ── Configuration ─────────────────────────────────────────────────────────────

/// Configuration for the Playwright adapter.
///
/// # Example
///
/// ```rust
/// use stygian_graph::adapters::playwright::PlaywrightConfig;
///
/// let cfg = PlaywrightConfig {
///     ws_endpoint:     "ws://localhost:3000/playwright".into(),
///     default_timeout: std::time::Duration::from_secs(30),
///     headless:        true,
/// };
/// ```
#[derive(Debug, Clone)]
pub struct PlaywrightConfig {
    pub ws_endpoint:     String,
    pub default_timeout: Duration,
    pub headless:        bool,
}

impl Default for PlaywrightConfig {
    fn default() -> Self {
        Self {
            ws_endpoint:     "ws://localhost:3000/playwright".into(),
            default_timeout: Duration::from_secs(30),
            headless:        true,
        }
    }
}

// ── Adapter ───────────────────────────────────────────────────────────────────

/// Browser adapter backed by a Playwright JSON-RPC server.
pub struct PlaywrightService {
    config: PlaywrightConfig,
}

impl PlaywrightService {
    pub fn new(config: PlaywrightConfig) -> Self {
        Self { config }
    }
}

// ── Port implementation ───────────────────────────────────────────────────────

impl ScrapingService for PlaywrightService {
    fn name(&self) -> &'static str {
        "playwright"
    }

    /// Navigate to `input.url`, wait for network idle, return rendered HTML.
    ///
    /// # Errors
    ///
    /// `ServiceError::Unavailable` — Playwright server is unreachable.  
    /// `ServiceError::Timeout`     — navigation exceeded `default_timeout`.
    async fn execute(&self, input: ServiceInput) -> crate::domain::error::Result<ServiceOutput> {
        // Real implementation would:
        //   1. Connect to self.config.ws_endpoint via WebSocket
        //   2. Send Browser.newPage  
        //   3. Navigate to input.url with self.config.default_timeout
        //   4. Await "networkidle" lifecycle event
        //   5. Call Page.getContent() for rendered HTML
        //   6. Close the page
        Err(StygianError::Service(ServiceError::Unavailable(
            format!("PlaywrightService not connected (url={})", input.url),
        )))
    }
}
}

Step 3: Re-export

Add a pub mod playwright; line to src/adapters/mod.rs:

#![allow(unused)]
fn main() {
// src/adapters/mod.rs
pub mod http;
pub mod browser;
pub mod claude;
// ... existing adapters ...
pub mod playwright;   // ← add this line
}

Step 4: Register in the service registry

In your binary entry point or application/executor.rs startup code:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_graph::adapters::playwright::{PlaywrightConfig, PlaywrightService};
use stygian_graph::application::registry::ServiceRegistry;

let registry = ServiceRegistry::new();

let config = PlaywrightConfig {
    ws_endpoint: std::env::var("PLAYWRIGHT_WS_ENDPOINT")
        .unwrap_or_else(|_| "ws://localhost:3000/playwright".into()),
    ..Default::default()
};

registry.register(
    "playwright".into(),
    Arc::new(PlaywrightService::new(config)),
);
}

Pipelines can now reference the adapter with service = "playwright" in any node.
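For example, a minimal pipeline node using the new adapter — a hypothetical snippet following the TOML shape used elsewhere in this book:

```toml
[[nodes]]
id     = "render"
type   = "scrape"
target = "https://example.com"

  [nodes.service]
  name = "playwright"
```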


Step 5: Add resilience wrappers

Wrap before registering to get circuit-breaker and retry behaviour for free:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use std::time::Duration;
use stygian_graph::adapters::resilience::{CircuitBreakerImpl, RetryPolicy, ResilientAdapter};

let cb = CircuitBreakerImpl::new(
    5,                          // open after 5 consecutive failures
    Duration::from_secs(120),   // half-open probe after 2 min
);

let policy = RetryPolicy::exponential(
    3,                          // max 3 attempts
    Duration::from_millis(200), // initial back-off
    Duration::from_secs(5),     // cap
);

let resilient = ResilientAdapter::new(
    Arc::new(PlaywrightService::new(config)),
    Arc::new(cb),
    policy,
);

registry.register("playwright".into(), Arc::new(resilient));
}

Step 6: Integration tests

Use stygian's built-in mock transport for unit tests that don't require a real server:

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;
    use stygian_graph::ports::ServiceInput;

    fn make_input(url: &str) -> ServiceInput {
        ServiceInput { url: url.into(), config: serde_json::Value::Null, ..Default::default() }
    }

    #[tokio::test]
    async fn returns_unavailable_when_not_connected() {
        let svc = PlaywrightService::new(PlaywrightConfig::default());
        let err = svc.execute(make_input("https://example.com")).await.unwrap_err();
        assert!(err.to_string().contains("not connected"));
    }
}
}

For full integration tests that require a real Playwright server, mark them with #[ignore = "requires playwright server"] so CI passes without the dependency.


Adapter checklist

Before merging a new adapter:

  • Implements the correct port trait
  • name() returns a lowercase, kebab-case identifier
  • All public types have doc comments with an example
  • Default impl provided where sensible
  • Config can be loaded from environment variables
  • At least one unit test that does not require external services
  • Integration tests marked #[ignore = "requires …"]
  • Re-exported from src/adapters/mod.rs
  • Registered in the service registry example in the binary

Distributed Execution

stygian-graph supports horizontal scaling via a Redis/Valkey-backed work queue. Multiple worker processes can consume from the same queue, enabling throughput that scales linearly with the number of nodes.


Overview

In distributed mode, the DistributedDagExecutor splits pipeline execution across a shared work queue:

Producer process              Redis/Valkey               Worker processes
─────────────────             ──────────────             ─────────────────────
PipelineParser                                           DistributedDagExecutor
    ↓                                                          ↓
DagExecutor            ──── work items ────►          ServiceRegistry
    ↓                                                          ↓
IdempotencyKey         ◄─── results ────────          ScrapingService / AI

Setup

1. Start Redis or Valkey

# Docker — Valkey (Redis-compatible, open-source fork)
docker run -d --name valkey -p 6379:6379 valkey/valkey:8

# Or Redis
docker run -d --name redis -p 6379:6379 redis:7

2. Enable the distributed feature

# Cargo.toml
stygian-graph = { version = "0.1", features = ["distributed"] }

3. Create a work queue and executor

use stygian_graph::adapters::{DistributedDagExecutor, RedisWorkQueue};
use stygian_graph::application::registry::ServiceRegistry;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let queue    = RedisWorkQueue::new("redis://localhost:6379").await?;
    let registry = ServiceRegistry::default_with_env()?;

    // Spawn 10 concurrent workers on this process
    let executor = DistributedDagExecutor::new(Arc::new(queue), registry, 10);

    // Execute a wave of tasks (`tasks` and `services` prepared elsewhere)
    let results = executor
        .execute_wave("pipeline-id-1", tasks, &services)
        .await?;

    for result in results {
        println!("{}", serde_json::to_string_pretty(&result)?);
    }
    Ok(())
}

Work queue operations

RedisWorkQueue implements the WorkQueue port trait:

#![allow(unused)]
fn main() {
pub trait WorkQueue: Send + Sync {
    /// Enqueue a task and return its unique ID.
    async fn enqueue(&self, task: Task) -> Result<TaskId>;

    /// Block until a task is available; returns `None` if queue is empty and closed.
    async fn dequeue(&self) -> Result<Option<Task>>;

    /// Acknowledge successful completion.
    async fn ack(&self, id: &TaskId) -> Result<()>;

    /// Return a task to the queue for another worker to retry.
    async fn nack(&self, id: &TaskId) -> Result<()>;
}
}

Tasks that are nack-ed (e.g. worker crashes mid-execution) are requeued automatically after a visibility timeout.
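The dequeue → execute → ack/nack cycle can be sketched with a toy in-memory queue standing in for RedisWorkQueue. The synchronous `ToyQueue` and `run_worker` helper below are illustrative only, not part of the crate:

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::Mutex;

struct ToyQueue {
    items: Mutex<VecDeque<String>>,
}

impl ToyQueue {
    fn dequeue(&self) -> Option<String> {
        self.items.lock().unwrap().pop_front()
    }
    fn nack(&self, task: String) {
        // Return the task so a worker can retry it
        self.items.lock().unwrap().push_back(task);
    }
}

/// Drain the queue, retrying each failed task once by nack-ing it.
fn run_worker(queue: &ToyQueue, exec: impl Fn(&str) -> bool) -> usize {
    let mut done = 0;
    let mut retried: HashSet<String> = HashSet::new();
    while let Some(task) = queue.dequeue() {
        if exec(&task) {
            done += 1; // ack: task completed, drop it from the queue
        } else if retried.insert(task.clone()) {
            queue.nack(task); // first failure: requeue for retry
        } // second failure: give up (a real worker would dead-letter it)
    }
    done
}
```

A real worker loops forever on a blocking dequeue; the bounded loop here just makes the retry path visible.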


Idempotency in distributed mode

Every task carries an IdempotencyKey. Before executing, the worker checks a shared idempotency store (backed by the same Redis instance):

  • Key not seen — execute, store result under key.
  • Key already present — return stored result immediately, skip execution.

This makes distributed task execution safe to retry — duplicate network deliveries and worker restarts produce the same observable outcome.

#![allow(unused)]
fn main() {
use stygian_graph::domain::idempotency::IdempotencyKey;

// Deterministic key from pipeline id + input URL — replays the same result
let key = IdempotencyKey::from_input("pipeline-1", "https://example.com/item/42");
}
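The check-before-execute flow can be sketched with a HashMap standing in for the shared Redis-backed store. The `execute_once` helper below is illustrative, not a crate API:

```rust
use std::collections::HashMap;

/// Return the stored result for `key`, or run `work` and store its output.
fn execute_once(
    store: &mut HashMap<String, String>,
    key: &str,
    work: impl FnOnce() -> String,
) -> String {
    if let Some(cached) = store.get(key) {
        return cached.clone(); // key already present: skip execution
    }
    let result = work(); // key not seen: execute
    store.insert(key.to_string(), result.clone());
    result
}
```

Replaying the same key returns the stored result without re-running the work, which is exactly why duplicate deliveries are safe.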

Pipeline config for distributed mode

Enable distributed execution in a TOML pipeline:

[execution]
mode    = "distributed"
queue   = "redis://localhost:6379"
workers = 20

[[nodes]]
id      = "fetch"
service = "http"

[[nodes]]
id      = "extract"
service = "ai_claude"

[[edges]]
from = "fetch"
to   = "extract"

Scaling tips

  • Worker count — start at num_cpus * 4 for I/O-heavy pipelines; reduce for CPU-heavy extraction workloads.
  • Redis connection pool — the adapter maintains a pool internally; set REDIS_MAX_CONNECTIONS to tune (default 20).
  • Backpressure — the work queue has a configurable maximum depth. When full, enqueue() blocks the producer, preventing runaway memory growth.
  • Multiple queues — use separate queue names per pipeline type to prevent high-volume crawls from starving high-priority extractions.

Observability

stygian-graph exposes structured metrics and distributed tracing out of the box. Both are opt-in: neither requires code changes to your adapters or domain logic.


Prometheus metrics

Enable the metrics feature flag:

stygian-graph = { version = "0.2", features = ["metrics"] }

Creating a collector

#![allow(unused)]
fn main() {
use stygian_graph::application::MetricsCollector;

let metrics = MetricsCollector::new();
}

MetricsCollector registers counters, histograms, and gauges on the global Prometheus registry automatically. It is Clone + Send + Sync and safe to share across threads.

Exposing /metrics

Attach the Prometheus scrape handler to any HTTP server. Example with Axum:

#![allow(unused)]
fn main() {
use axum::{Router, routing::get};
use stygian_graph::application::MetricsCollector;

let metrics  = MetricsCollector::new();
let handler  = metrics.prometheus_handler();

let app = Router::new()
    .route("/metrics", get(handler))
    .route("/health",  get(|| async { "ok" }));

axum::serve(listener, app).await?;
}

Available metrics

| Metric name                      | Type      | Labels              | Description                   |
|----------------------------------|-----------|---------------------|-------------------------------|
| stygian_requests_total           | counter   | service, status     | Total requests per adapter    |
| stygian_request_duration_seconds | histogram | service             | Request latency distribution  |
| stygian_errors_total             | counter   | service, error_kind | Errors by type                |
| stygian_worker_pool_active       | gauge     | pool                | Active workers                |
| stygian_worker_pool_queued       | gauge     | pool                | Queued tasks                  |
| stygian_circuit_breaker_state    | gauge     | service             | 0=closed, 1=open, 2=half-open |
| stygian_cache_hits_total         | counter   | cache               | Cache hits                    |
| stygian_cache_misses_total       | counter   | cache               | Cache misses                  |

Structured tracing

stygian-graph instruments all hot paths with the tracing crate. Any compatible subscriber (JSON, OTLP, Jaeger) receives full span trees.

Basic JSON logging

#![allow(unused)]
fn main() {
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

tracing_subscriber::registry()
    .with(EnvFilter::from_default_env()
        .add_directive("stygian_graph=debug".parse()?)
        .add_directive("stygian_browser=info".parse()?))
    .with(tracing_subscriber::fmt::layer().json())
    .init();
}

Set RUST_LOG=stygian_graph=trace at runtime for full span output.

OpenTelemetry export (Jaeger / OTLP)

[dependencies]
opentelemetry          = "0.22"
opentelemetry-otlp     = { version = "0.15", features = ["grpc-tonic"] }
tracing-opentelemetry  = "0.23"
#![allow(unused)]
fn main() {
use opentelemetry_otlp::WithExportConfig;
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

let tracer = opentelemetry_otlp::new_pipeline()
    .tracing()
    .with_exporter(
        opentelemetry_otlp::new_exporter()
            .tonic()
            .with_endpoint("http://localhost:4317"),
    )
    .install_batch(opentelemetry::runtime::Tokio)?;

tracing_subscriber::registry()
    .with(EnvFilter::from_default_env())
    .with(OpenTelemetryLayer::new(tracer))
    .init();
}

Key spans

SpanAttributesEmitted by
dag_executepipeline_id, node_count, wave_countDagExecutor
wave_executewave, node_ids[]DagExecutor
service_callservice, urlServiceRegistry
ai_extractprovider, model, tokens_in, tokens_outAI adapters
cache_lookuphit, key_prefixCache adapters
circuit_breakerservice, state_transitionCircuitBreakerImpl

Health checks

MetricsCollector exposes a health-check endpoint that reports the state of every registered service:

#![allow(unused)]
fn main() {
let health_json = metrics.health_check(&registry).await;
// {"status":"ok","services":{"http":"healthy","ai_claude":"healthy"}}
}

A service is reported as "degraded" when its circuit breaker is half-open, and "unhealthy" when it is open.
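That mapping can be sketched as a pure function over the breaker state. The `CircuitState` enum below mirrors the values of the stygian_circuit_breaker_state gauge; the `health_label` helper is illustrative:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum CircuitState {
    Closed,   // 0 in the stygian_circuit_breaker_state gauge
    Open,     // 1
    HalfOpen, // 2
}

/// Health label reported for a service, derived from its breaker state.
fn health_label(state: CircuitState) -> &'static str {
    match state {
        CircuitState::Closed   => "healthy",
        CircuitState::HalfOpen => "degraded",
        CircuitState::Open     => "unhealthy",
    }
}
```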

Performance Tuning

This guide covers strategies for maximising throughput and efficiency: worker pool sizing, channel depth, memory management, cache tuning, and profiling.


Worker pool sizing

WorkerPool bounds concurrency for a set of services. The optimal size depends on whether work is I/O-bound or CPU-bound.

I/O-bound services (HTTP, browser, AI APIs)

Network operations spend most of their time waiting for remote responses. A large pool hides latency with concurrency:

#![allow(unused)]
fn main() {
// Rule of thumb: 10–100× logical CPU count
let concurrency = num_cpus::get() * 50;
let queue_depth  = concurrency * 4;   // back-pressure buffer

let pool = WorkerPool::new(concurrency, queue_depth);
}

Start with 50× and adjust based on:

  • Target API rate limits (reduce if hitting 429s)
  • Available file descriptors (ulimit -n)
  • Memory pressure (every in-flight request buffers a response body)

CPU-bound services (parsing, transforms, NLP)

CPU work cannot overlap on the same core. Match the pool to physical cores to avoid context-switch overhead:

#![allow(unused)]
fn main() {
// Rule of thumb: 1–2× logical CPU count
let concurrency = num_cpus::get();
let queue_depth  = concurrency * 2;

let pool = WorkerPool::new(concurrency, queue_depth);
}

Mixed workloads

When a pipeline mixes HTTP fetching and CPU-heavy extraction, use separate pools per service type:

#![allow(unused)]
fn main() {
let http_pool = WorkerPool::new(num_cpus::get() * 40, 512);
let cpu_pool  = WorkerPool::new(num_cpus::get(),        32);
}

Back-pressure

queue_depth controls how many tasks accumulate before callers block. A shallow queue applies back-pressure sooner, limiting memory. A deep queue smooths bursts but can hide overload. A ratio of 4–8× concurrency is a good starting point.


Channel sizing

Stygian uses bounded tokio::sync::mpsc channels internally.

| Channel depth | Throughput | Latency  | Memory   |
|---------------|------------|----------|----------|
| 1             | Low        | Low      | Minimal  |
| 64 (default)  | Medium     | Medium   | Low      |
| 512           | High       | Higher   | Moderate |
| Unbounded     | Max        | Variable | Uncapped |

Always use bounded channels in library code. Unbounded channels are only appropriate for producers with a known-small, finite rate (e.g. a timer).

Detect saturation at runtime: when capacity - len consistently approaches zero, the downstream consumer is a bottleneck — either scale it out or increase depth.
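A std-only sketch of that saturation check, tracking queue depth with an atomic gauge (the `QueueGauge` type and the 10% threshold are illustrative; with tokio channels the equivalent signal comes from the sender's remaining capacity):

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

struct QueueGauge {
    len:      AtomicUsize, // tasks currently queued
    capacity: usize,       // bounded channel depth
}

impl QueueGauge {
    /// Free slots remaining; saturation means this approaches zero.
    fn free(&self) -> usize {
        self.capacity.saturating_sub(self.len.load(Relaxed))
    }

    /// Flag the consumer as a bottleneck when under 10% of capacity is free.
    fn is_saturated(&self) -> bool {
        self.free() * 10 < self.capacity
    }
}
```

Sample this periodically rather than on every send; a single near-zero reading is a burst, a consistent one is a bottleneck.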


Memory optimisation

Limit response body size

Unbounded response buffering will exhaust memory on large responses:

#![allow(unused)]
fn main() {
const MAX_BODY_BYTES: usize = 8 * 1024 * 1024; // 8 MiB

// Reject early when the server declares an oversized Content-Length;
// chunked responses have no declared length, so check again after buffering.
if let Some(len) = response.content_length() {
    if len as usize > MAX_BODY_BYTES {
        return Err(ScrapingError::ResponseTooLarge(len as usize));
    }
}
let body = response.bytes().await?;
if body.len() > MAX_BODY_BYTES {
    return Err(ScrapingError::ResponseTooLarge(body.len()));
}
}

Arena allocation for wave processing

Allocate all intermediate data for a wave from a single arena and free it in one shot:

# Cargo.toml
bumpalo = "3"
#![allow(unused)]
fn main() {
use bumpalo::Bump;

async fn process_wave(inputs: &[ServiceInput]) {
    let arena = Bump::new();
    let urls: Vec<&str> = inputs
        .iter()
        .map(|i| arena.alloc_str(&i.url))
        .collect();
    // arena dropped here — all scratch freed in one operation
}
}

Buffer pooling

Avoid allocating a fresh Vec<u8> for every HTTP response — reuse from a pool:

#![allow(unused)]
fn main() {
use tokio::sync::Mutex;

struct BufferPool {
    pool:       Mutex<Vec<Vec<u8>>>,
    buf_size:   usize,
    max_pooled: usize,   // cap so the pool itself cannot grow without bound
}

impl BufferPool {
    async fn acquire(&self) -> Vec<u8> {
        let mut p = self.pool.lock().await;
        p.pop().unwrap_or_else(|| Vec::with_capacity(self.buf_size))
    }

    async fn release(&self, mut buf: Vec<u8>) {
        buf.clear();
        let mut p = self.pool.lock().await;
        if p.len() < self.max_pooled {
            p.push(buf);   // otherwise drop the buffer
        }
    }
}
}

Cache tuning

Measure hit rate first

A cache that rarely hits wastes memory and adds lookup overhead:

#![allow(unused)]
fn main() {
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

struct CacheMetrics { hits: AtomicU64, misses: AtomicU64 }

fn hit_rate(metrics: &CacheMetrics) -> f64 {
    let hits   = metrics.hits.load(Relaxed)   as f64;
    let misses = metrics.misses.load(Relaxed) as f64;
    if hits + misses == 0.0 { return 0.0; }
    hits / (hits + misses)
}
}

Target ≥ 0.80 for URL-keyed caches before increasing capacity.

LRU vs DashMap

|                      | BoundedLruCache         | DashMapCache                |
|----------------------|-------------------------|-----------------------------|
| Eviction policy      | LRU (capacity-based)    | TTL-based (background task) |
| Best for             | Fixed working set       | Time-sensitive data         |
| Concurrency overhead | Higher (LRU list mutex) | Lower (sharded map)         |

For AI extraction results with a 24 h freshness requirement, use DashMapCache with ttl = Duration::from_secs(86_400). For deduplication of seen URLs, BoundedLruCache is the right choice.


Rayon vs Tokio

|             | Tokio                                | Rayon                                |
|-------------|--------------------------------------|--------------------------------------|
| Use for     | I/O-bound work (network, disk)       | CPU-bound work (parsing, transforms) |
| Blocking?   | No — async tasks never block threads | Yes — tasks block Rayon threads      |
| Thread pool | Shared Tokio runtime                 | Separate Rayon thread pool           |

Rule: never call synchronous CPU-heavy work directly in a Tokio task. Offload with tokio::task::spawn_blocking or rayon::spawn:

#![allow(unused)]
fn main() {
// CPU-heavy HTML parsing — do NOT do this in an async fn directly
let html = response.text().await?;
let extracted = tokio::task::spawn_blocking(move || {
    heavy_parsing(html)
}).await??;
}

Profiling

Flamegraph with cargo-flamegraph

cargo install flamegraph
# Profile a benchmark
cargo flamegraph --bench dag_executor -- --bench

Criterion benchmarks

The stygian-graph crate ships Criterion benchmarks in benches/:

cargo bench                       # run all benchmarks
cargo bench dag_executor          # run a specific group
cargo bench -- --save-baseline v0 # save baseline for comparison
cargo bench -- --load-baseline v0 # compare against baseline

Key benchmark targets:

BenchmarkWhat it measures
dag_executor/wave_1010-node wave execution overhead
dag_executor/wave_100100-node wave execution overhead
http_adapter/singleSingle HTTP request round-trip
cache/lru_hitLRU cache read under contention

Benchmarks (Apple M4 Pro)

OperationLatency
DAG executor overhead per wave~50 µs
HTTP adapter (cached DNS)~2 ms
Browser acquisition (warm pool)<100 ms
LRU cache read (1 M ops/s)~1 µs

Browser Automation Overview

stygian-browser is a high-performance, anti-detection browser automation library for Rust. It is built on the Chrome DevTools Protocol via chromiumoxide and ships a comprehensive suite of stealth features for bypassing modern bot-detection systems.


Feature summary

FeatureDescription
Browser poolingWarm pool with configurable min/max, LRU eviction, backpressure, and per-context segregation
Anti-detectionnavigator spoofing, canvas noise, WebGL randomisation, UA patching
Headless modeHeadlessMode::New default (--headless=new) — shares Chrome's headed rendering pipeline, harder to fingerprint-detect
Human behaviourBézier-curve mouse paths, realistic keystroke timing, typo simulation
CDP leak protectionHides Runtime.enable artifacts that expose automation
WebRTC controlBlock, proxy-route, or allow — prevents IP leaks
Fingerprint generationStatistically-weighted device profiles (Windows, Mac, Linux, mobile)
Stealth levelsNone / Basic / Advanced — tune evasion vs. performance
Resource filteringBlock images, fonts, media per-tab to speed up text scraping
Cookie persistenceSave/restore full session state (cookies + localStorage); inject_cookies() for seeding individual tokens

Use cases

ScenarioRecommended config
Public HTML scraping (no bot detection)StealthLevel::None, HTTP adapter preferred
Single-page app renderingStealthLevel::Basic, WaitUntil::NetworkIdle
Cloudflare / DataDome protected sitesStealthLevel::Advanced
Price monitoring (authenticated sessions)StealthLevel::Basic + cookie persistence
CAPTCHA-adjacent flowsStealthLevel::Advanced + human behaviour + proxy

Quick start

use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Default config: headless, Advanced stealth, 2 warm browsers
    let pool   = BrowserPool::new(BrowserConfig::default()).await?;
    let handle = pool.acquire().await?;            // < 100 ms from warm pool

    let mut page = handle.browser().new_page().await?;

    page.navigate(
        "https://example.com",
        WaitUntil::Selector("body".to_string()),
        Duration::from_secs(30),
    ).await?;

    println!("Title: {}", page.title().await?);
    println!("HTML:  {} bytes", page.content().await?.len());

    handle.release().await;               // returns browser to pool
    Ok(())
}

Performance targets

OperationTarget
Browser acquisition (warm pool)< 100 ms
Browser launch (cold start)< 2 s
Advanced stealth injection overhead10–30 ms per page
Pool health check< 5 ms

Platform support

PlatformStatus
macOS (Apple Silicon / Intel)Fully supported, actively tested
Linux (x86-64, ARM64)Fully supported, CI tested
WindowsDepends on chromiumoxide backend; not actively tested
Headless CI (GitHub Actions)Supported — default config is headless: true

Installation

[dependencies]
stygian-browser = "0.2"
tokio            = { version = "1", features = ["full"] }

To disable stealth features for a minimal build:

stygian-browser = { version = "0.2", default-features = false }

Chrome 120+ must be available on the system or specified via STYGIAN_CHROME_PATH. On CI, install it with apt-get install google-chrome-stable or use the browser-actions/setup-chrome GitHub Action.

Configuration

All browser behaviour is controlled through BrowserConfig. Every field can be set programmatically or overridden at runtime via environment variables — no recompilation needed.


Builder pattern

#![allow(unused)]
fn main() {
use stygian_browser::{BrowserConfig, HeadlessMode, StealthLevel};
use stygian_browser::config::PoolConfig;
use stygian_browser::webrtc::{WebRtcConfig, WebRtcPolicy};
use std::time::Duration;

let config = BrowserConfig::builder()
    // ── Browser ──────────────────────────────────────────────────────────
    .headless(true)
    .headless_mode(HeadlessMode::New)      // default; --headless=new shares headed rendering
    .window_size(1920, 1080)
    // .chrome_path("/usr/bin/google-chrome".into())   // auto-detect if omitted
    // .user_data_dir("/tmp/my-profile")   // omit for auto unique temp dir per instance

    // ── Stealth ───────────────────────────────────────────────────────────
    .stealth_level(StealthLevel::Advanced)

    // ── Network ───────────────────────────────────────────────────────────
    // .proxy("http://user:pass@proxy.example.com:8080".to_string())
    .webrtc(WebRtcConfig {
        policy: WebRtcPolicy::DisableNonProxied,
        ..Default::default()
    })

    // ── Pool ──────────────────────────────────────────────────────────────
    .pool(PoolConfig {
        min_size:        2,
        max_size:        10,
        idle_timeout:    Duration::from_secs(300),
        acquire_timeout: Duration::from_secs(10),
    })
    .build();
}

Field reference

Browser settings

FieldTypeDefaultDescription
headlessbooltrueRun without visible window
headless_modeHeadlessModeNewNew = --headless=new (full Chromium rendering, default since Chrome 112); Legacy = classic --headless flag for Chromium < 112
window_sizeOption<(u32, u32)>(1920, 1080)Browser viewport dimensions
chrome_pathOption<PathBuf>auto-detectPath to Chrome/Chromium binary
stealth_levelStealthLevelAdvancedAnti-detection level
proxyOption<String>NoneProxy URL (http://, https://, socks5://)
user_data_dirOption<PathBuf>auto-generatedPer-instance temp dir ($TMPDIR/stygian-<id>); set explicitly to share a persistent profile. Auto-generation prevents SingletonLock races between concurrent pools.
argsVec<String>[]Additional Chrome command-line flags

Pool settings (PoolConfig)

FieldTypeDefaultDescription
min_sizeusize2Browsers kept warm at all times
max_sizeusize10Maximum concurrent browsers
idle_timeoutDuration5 minEvict idle browser after this duration
acquire_timeoutDuration30 sMax wait for a pool slot

WebRTC settings (WebRtcConfig)

FieldTypeDefaultDescription
policyWebRtcPolicyDisableNonProxiedWebRTC IP leak policy
locationOption<ProxyLocation>NoneSimulated geo-location for WebRTC

WebRtcPolicy variants:

VariantBehaviour
AllowDefault browser behaviour — real IPs may leak
DisableNonProxiedBlock direct connections; only proxied paths allowed
BlockAllBlock all WebRTC — safest for anonymous scraping

Environment variable overrides

All config values can be overridden without touching source code:

VariableDefaultDescription
STYGIAN_CHROME_PATHauto-detectPath to Chrome/Chromium binary
STYGIAN_HEADLESStrueSet false for headed mode
STYGIAN_HEADLESS_MODEnewnew (--headless=new) or legacy (classic --headless for Chromium < 112)
STYGIAN_STEALTH_LEVELadvancednone, basic, advanced
STYGIAN_POOL_MIN2Minimum warm browsers
STYGIAN_POOL_MAX10Maximum concurrent browsers
STYGIAN_POOL_IDLE_SECS300Idle timeout before browser eviction
STYGIAN_POOL_ACQUIRE_SECS5Seconds to wait for a pool slot
STYGIAN_LAUNCH_TIMEOUT_SECS10Browser launch timeout
STYGIAN_CDP_TIMEOUT_SECS30Per-operation CDP timeout
STYGIAN_CDP_FIX_MODEaddBindingaddBinding, isolatedworld, enabledisable
STYGIAN_PROXYProxy URL
STYGIAN_PROXY_BYPASSComma-separated proxy bypass list (e.g. <local>,localhost)
STYGIAN_DISABLE_SANDBOXauto-detecttrue inside containers, false on bare metal
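Each override boils down to an env-read-with-fallback; a minimal sketch of the pattern (the `env_or` helper is illustrative, not the crate's loader):

```rust
use std::env;
use std::str::FromStr;

/// Read an environment variable, falling back to `default`
/// when the variable is unset or fails to parse.
fn env_or<T: FromStr>(key: &str, default: T) -> T {
    env::var(key)
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(default)
}
```

Usage looks like `let pool_max: usize = env_or("STYGIAN_POOL_MAX", 10);`.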

Examples

Minimal — fast text scraping

#![allow(unused)]
fn main() {
use stygian_browser::{BrowserConfig, StealthLevel};

let config = BrowserConfig::builder()
    .headless(true)
    .stealth_level(StealthLevel::None)  // no overhead
    .build();
}

Headed debugging session

#![allow(unused)]
fn main() {
let config = BrowserConfig::builder()
    .headless(false)
    .stealth_level(StealthLevel::Basic)
    .build();
}

Proxy with full stealth

#![allow(unused)]
fn main() {
use stygian_browser::webrtc::{WebRtcConfig, WebRtcPolicy};

let config = BrowserConfig::builder()
    .headless(true)
    .stealth_level(StealthLevel::Advanced)
    .proxy("socks5://user:pass@proxy.example.com:1080".to_string())
    .webrtc(WebRtcConfig { policy: WebRtcPolicy::BlockAll, ..Default::default() })
    .build();
}

Anti-detection for JS-heavy sites (X/Twitter, LinkedIn)

StealthLevel::Advanced combined with HeadlessMode::New is the most evasion-resistant configuration. HeadlessMode::New is the default since v0.1.11 — existing code elevates automatically.

#![allow(unused)]
fn main() {
use stygian_browser::{BrowserConfig, HeadlessMode, StealthLevel};
use stygian_browser::webrtc::{WebRtcConfig, WebRtcPolicy};

let config = BrowserConfig::builder()
    .headless(true)
    .headless_mode(HeadlessMode::New)   // default; shared rendering pipeline with headed Chrome
    .stealth_level(StealthLevel::Advanced)
    .webrtc(WebRtcConfig { policy: WebRtcPolicy::BlockAll, ..Default::default() })
    .build();
}

For Chromium ≥ 112 (all modern Chrome / Chromium builds), New is the right choice. Legacy falls back to the classic --headless flag which uses an older rendering pipeline — use it only when targeting Chromium < 112.

#![allow(unused)]
fn main() {
// Only needed for Chromium < 112
let config = BrowserConfig::builder()
    .headless_mode(HeadlessMode::Legacy)
    .build();
}

Stealth & Anti-Detection

stygian-browser implements a layered anti-detection system. Each layer targets a different class of bot-detection signal.


Stealth levels

| Level    | navigator spoof | Canvas noise | WebGL random | CDP protection | Human behaviour |
|----------|-----------------|--------------|--------------|----------------|-----------------|
| None     | —               | —            | —            | —              | —               |
| Basic    | ✓               | —            | —            | ✓              | —               |
| Advanced | ✓               | ✓            | ✓            | ✓              | ✓               |

Trade-offs:

  • None — maximum performance; no evasion. Suitable for internal services or sites with no bot detection.
  • Basic — hides navigator.webdriver, masks the headless User-Agent, enables CDP protection. Adds < 1 ms overhead. Appropriate for most scraping workloads.
  • Advanced — full fingerprint injection (canvas, WebGL, audio, fonts, hardware concurrency, device memory) plus human-like mouse and keyboard events. Adds 10–30 ms per page but passes all major detection suites.

Headless mode

The classic --headless flag (HeadlessMode::Legacy) is a well-known detection signal: sites like X/Twitter and LinkedIn inspect the Chrome renderer version string and reject old-headless sessions before any session state is even checked.

Since v0.1.11, stygian-browser defaults to --headless=new (HeadlessMode::New), which shares the same rendering pipeline as headed Chrome and is significantly harder to fingerprint-detect.

#![allow(unused)]
fn main() {
use stygian_browser::{BrowserConfig, HeadlessMode};

// Default since v0.1.11 — no change needed for existing code
let config = BrowserConfig::builder()
    .headless_mode(HeadlessMode::New)
    .build();

// Legacy mode: only needed for Chromium < 112
let config = BrowserConfig::builder()
    .headless_mode(HeadlessMode::Legacy)
    .build();
}

Or via env var (no recompilation):

STYGIAN_HEADLESS_MODE=legacy cargo run   # opt back to old behaviour

Navigator spoofing

Executed on every new document context before any page script runs.

  • Sets navigator.webdriver → undefined
  • Patches navigator.plugins with a realistic PluginArray
  • Sets navigator.languages, navigator.language, navigator.vendor
  • Aligns navigator.hardwareConcurrency and navigator.deviceMemory with the chosen device fingerprint

Two layers of protection prevent webdriver detection:

  1. Instance patch — Object.defineProperty(navigator, 'webdriver', { get: () => undefined }) hides the flag from direct access (navigator.webdriver === undefined).
  2. Prototype patch — Object.defineProperty(Navigator.prototype, 'webdriver', ...) hides the underlying getter from Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver'), which some scanners (e.g. pixelscan.net, Akamai) probe directly.

Both patches are injected into every new document context before any page script runs.

Fingerprint generation

The fingerprint is drawn from statistically-weighted device profiles:

#![allow(unused)]
fn main() {
use stygian_browser::fingerprint::{DeviceProfile, Platform};

let profile = DeviceProfile::random();   // weighted: Windows 60%, Mac 25%, Linux 15%
println!("Platform:    {:?}", profile.platform);
println!("CPU cores:   {}",   profile.hardware_concurrency);
println!("Device RAM:  {} GB", profile.device_memory);
println!("Screen:      {}×{}", profile.screen_width, profile.screen_height);
}

Canvas fingerprint noise

HTMLCanvasElement.toDataURL() and CanvasRenderingContext2D.getImageData() are patched to add sub-pixel noise (< 1 px) — visually indistinguishable but unique per page load, preventing cross-site canvas fingerprint correlation.

The noise function is applied via JavaScript injection into each document:

// Simplified representation of the injected script
const origToDataURL = HTMLCanvasElement.prototype.toDataURL;
HTMLCanvasElement.prototype.toDataURL = function(...args) {
    const result = origToDataURL.apply(this, args);
    return injectNoise(result, sessionNoiseSeed);
};

WebGL randomisation

GPU-based fingerprinting reads RENDERER and VENDOR strings from WebGL. These are intercepted and replaced with plausible — but randomised — GPU family names:

| Real value                                                             | Spoofed value (example)                                          |
|------------------------------------------------------------------------|------------------------------------------------------------------|
| ANGLE (Apple, ANGLE Metal Renderer: Apple M4 Pro, Unspecified Version) | ANGLE (NVIDIA, ANGLE Metal Renderer: NVIDIA GeForce RTX 3070 Ti) |
| Google SwiftShader                                                     | ANGLE (Intel, ANGLE Metal Renderer: Intel Iris Pro)              |

The spoofed values are consistent within a session and coherent with the chosen device profile.


CDP leak protection

The Chrome DevTools Protocol itself can expose automation. Three modes are available, set via STYGIAN_CDP_FIX_MODE or BrowserConfig::cdp_fix_mode:

ModeProtectionCompatibility
AddBinding (default)Wraps calls to hide Runtime.enable side-effectsBest overall
IsolatedWorldRuns injection in a separate execution contextModerate
EnableDisableToggles enable/disable around each commandBroad

Human behaviour simulation (Advanced only)

Mouse movement — MouseSimulator

Generates Bézier-curve paths with natural arc shapes:

  • Distance-aware step counts (12 steps for < 100 px, up to 120 for > 1 000 px)
  • Perpendicular control-point offsets for curved trajectories
  • Sub-pixel micro-tremor jitter (± 0.3 px per step)
  • 10–50 ms inter-event delays
#![allow(unused)]
fn main() {
use stygian_browser::behavior::MouseSimulator;

let sim = MouseSimulator::new();
// Move from (100, 200) to (450, 380) with realistic arc
sim.move_to(&page, 100.0, 200.0, 450.0, 380.0).await?;
sim.click(&page, 450.0, 380.0).await?;
}
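The curved trajectory can be sketched as quadratic Bézier interpolation with a perpendicular control-point offset. The `bezier_path` helper below is an illustrative reconstruction, not the crate's implementation:

```rust
/// Quadratic Bézier path from (x0, y0) to (x1, y1), bowed sideways by `arc` pixels.
fn bezier_path(x0: f64, y0: f64, x1: f64, y1: f64, arc: f64, steps: usize) -> Vec<(f64, f64)> {
    // Control point: the midpoint pushed `arc` pixels along the perpendicular
    let (mx, my) = ((x0 + x1) / 2.0, (y0 + y1) / 2.0);
    let (dx, dy) = (x1 - x0, y1 - y0);
    let len = (dx * dx + dy * dy).sqrt().max(1e-9);
    let (cx, cy) = (mx - dy / len * arc, my + dx / len * arc);

    (0..=steps)
        .map(|i| {
            let t = i as f64 / steps as f64;
            let u = 1.0 - t;
            // B(t) = u²·P0 + 2ut·C + t²·P1
            (
                u * u * x0 + 2.0 * u * t * cx + t * t * x1,
                u * u * y0 + 2.0 * u * t * cy + t * t * y1,
            )
        })
        .collect()
}
```

Micro-tremor jitter and inter-event delays would then be applied per point; the curve alone is what distinguishes the path from a robotic straight line.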

Keyboard — TypingSimulator

Models realistic typing cadence:

  • Per-key WPM variation (70–130 WPM base rate)
  • Configurable typo-and-correct probability
  • Burst/pause rhythm typical of human typists
#![allow(unused)]
fn main() {
use stygian_browser::behavior::TypingSimulator;

let typer = TypingSimulator::new()
    .wpm(90)
    .typo_rate(0.03);   // 3% typo probability

typer.type_into(&page, "#search-input", "rust async web scraping").await?;
}
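The base cadence follows the standard 5-characters-per-word convention; a sketch of the average per-key delay (the helper is illustrative, and the simulator layers per-key variation and pauses on top):

```rust
/// Average inter-key delay in milliseconds for a given words-per-minute rate,
/// using the conventional 5 characters per word.
fn key_delay_ms(wpm: u32) -> f64 {
    60_000.0 / (wpm as f64 * 5.0)
}
```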

Network Information API spoofing

navigator.connection (Network Information API) reveals connection quality and type.
Headless browsers return null here, which is an immediate headless signal on connection-aware scanners.

Advanced stealth injects a realistic NetworkInformation-like object:

PropertySpoofed value
effectiveType"4g"
type"wifi"
downlinkSeeded from performance.timeOrigin (stable per session, ≈ 10 Mbps range)
rttSeeded jitter (50–100 ms range)
saveDatafalse

Battery Status API spoofing

navigator.getBattery() returns null in headless Chrome — a clear automation signal for scanners that enumerate battery state.

Advanced stealth overrides getBattery() to resolve with a plausible disconnected-battery state:

PropertySpoofed value
chargingfalse
chargingTimeInfinity
dischargingTimeSeeded (≈ 3600–7200 s)
levelSeeded (0.65–0.95)

The seed values are derived from performance.timeOrigin so they are stable within a page load but differ across sessions, preventing replay detection.


Fingerprint consistency

All spoofed signals are derived from a single DeviceProfile generated at browser launch. The profile is consistent across tabs and across the entire session, preventing inconsistency-based detection (e.g. a Windows User-Agent combined with macOS font metrics).
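Deriving every signal from one seed can be sketched with a tiny deterministic generator. The LCG and `Profile` struct below are illustrative stand-ins, not the crate's DeviceProfile:

```rust
/// Minimal linear congruential generator: the same seed yields the same sequence.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self.0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
    fn pick<'a, T>(&mut self, options: &'a [T]) -> &'a T {
        &options[(self.next() % options.len() as u64) as usize]
    }
}

#[derive(Debug, PartialEq)]
struct Profile {
    cores:  u64,
    ram_gb: u64,
}

/// Every field comes from the same seeded generator,
/// so the values stay mutually consistent across the session.
fn profile_from_seed(seed: u64) -> Profile {
    let mut rng = Lcg(seed);
    Profile {
        cores:  *rng.pick(&[4, 8, 10, 12]),
        ram_gb: *rng.pick(&[8, 16, 32]),
    }
}
```

Re-deriving from the same seed reproduces the identical profile, which is what keeps tabs within one session indistinguishable from each other.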

Browser Pool

The pool maintains a configurable number of warm browser instances, enforces a maximum concurrency limit, and applies backpressure when all slots are occupied.


How it works

BrowserPool
├── [Browser 0] — idle       ← returned immediately on acquire()
├── [Browser 1] — active     ← in use by a caller
└── [Browser 2] — idle       ← available

acquire()  →  lease one idle browser  →  returns BrowserHandle
release()  →  return browser to pool  →  health check, keep or discard

When all browsers are active and the pool is at max_size, callers block in acquire() until one becomes available or acquire_timeout expires.


Creating a pool

#![allow(unused)]
fn main() {
use stygian_browser::{BrowserConfig, BrowserPool};
use stygian_browser::config::PoolConfig;
use std::time::Duration;

let config = BrowserConfig::builder()
    .pool(PoolConfig {
        min_size:        2,    // launch 2 browsers immediately
        max_size:        8,    // cap at 8 concurrent browsers
        idle_timeout:    Duration::from_secs(300),
        acquire_timeout: Duration::from_secs(10),
    })
    .build();

let pool = BrowserPool::new(config).await?;
}

BrowserPool::new() launches min_size browsers in parallel and returns once they are all ready. Subsequent calls to acquire() return warm instances with no launch overhead.


Acquiring and releasing

#![allow(unused)]
fn main() {
// Acquire — blocks if pool is saturated
let handle = pool.acquire().await?;

// Do work on the browser
let mut page = handle.browser().new_page().await?;
page.navigate("https://example.com", WaitUntil::Load, Duration::from_secs(30)).await?;

// Release — returns browser to pool; discards if health check fails
handle.release().await;
}

BrowserHandle also implements Drop: if you forget to call release() the browser is returned to the pool automatically, though calling release() explicitly is preferred in async contexts.


Pool stats

#![allow(unused)]
fn main() {
let stats = pool.stats();

println!("available : {}", stats.available);   // idle, ready to use
println!("active    : {}", stats.active);       // currently leased out
println!("max       : {}", stats.max);          // pool capacity
println!("total     : {}", stats.total);        // active + available
}

Note: stats().idle always returns 0 — this counter is intentionally not maintained on the hot acquire/release path to avoid contention. Use available and active instead.


Health checks

When a browser is released, the pool runs a lightweight health check:

  1. Check the CDP connection is still open (no round-trip required).
  2. Verify that the browser process is alive.
  3. If either fails, discard the browser and spawn a replacement asynchronously.

Discarded browsers are replaced in the background — the pool never drops below min_size for long, and callers are never blocked waiting for a replace that will never arrive.


Idle eviction

Browsers that have been idle longer than idle_timeout are gracefully closed and removed from the pool. This reclaims system memory when scraping activity is low.

If eviction would drop the pool below min_size, eviction is skipped for those browsers (they stay warm).

Eviction applies to both the shared queue and all per-context queues. Empty context queues are pruned automatically.
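The eviction decision reduces to a pure predicate over idle time and the pool floor; a minimal sketch (the `should_evict` helper is illustrative):

```rust
use std::time::Duration;

/// Evict an idle browser only when it has exceeded `idle_timeout`
/// and closing it would not drop the pool below `min_size`.
fn should_evict(
    idle_for: Duration,
    idle_timeout: Duration,
    pool_len: usize,
    min_size: usize,
) -> bool {
    idle_for >= idle_timeout && pool_len > min_size
}
```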


Context segregation

When multiple bots or tenants share a single pool, use acquire_for() to keep their browser instances isolated. Browsers acquired for one context are never returned to a different context.

#![allow(unused)]
fn main() {
// Bot A and Bot B use the same pool, but their browsers never mix
let a = pool.acquire_for("bot-a").await?;
let b = pool.acquire_for("bot-b").await?;

// Each handle knows its context
assert_eq!(a.context_id(), Some("bot-a"));
assert_eq!(b.context_id(), Some("bot-b"));

a.release().await;
b.release().await;
}

The global max_size still governs total capacity across every context. If the pool is full, acquire_for() blocks just like acquire().

Releasing a context

When a bot or tenant is deprovisioned, drain its idle browsers:

#![allow(unused)]
fn main() {
let shut_down = pool.release_context("bot-a").await;
println!("Closed {shut_down} browsers for bot-a");
}

Active handles for that context are unaffected; they will be disposed normally when released or dropped.

Listing contexts

#![allow(unused)]
fn main() {
let ids = pool.context_ids().await;
println!("Active contexts with idle browsers: {ids:?}");
}

Shared vs scoped

| Method | Queue | Reuse scope |
|---|---|---|
| acquire() | shared | any acquire() caller |
| acquire_for("x") | scoped to "x" | only acquire_for("x") |

Both paths share the same semaphore and max_size, so global backpressure is applied regardless of how browsers were acquired.
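The interaction between per-context queues and the global cap can be modelled in a few lines of std Rust. `MiniPool` below is a toy model, not the crate's implementation: it substitutes a plain counter for the real semaphore and returns `None` where the real pool would block.

```rust
use std::collections::HashMap;

/// Toy model of context-scoped queueing under one global capacity cap.
struct MiniPool {
    max_size: usize,
    total: usize, // launched browsers (idle + active), capped by max_size
    scoped_idle: HashMap<String, Vec<u32>>, // idle browser ids per context
    next_id: u32,
}

impl MiniPool {
    fn new(max_size: usize) -> Self {
        Self { max_size, total: 0, scoped_idle: HashMap::new(), next_id: 0 }
    }

    /// Take from the context's own queue, else launch if under the global
    /// cap. A browser idle in another context's queue is never handed out.
    fn acquire_for(&mut self, ctx: &str) -> Option<u32> {
        if let Some(id) = self.scoped_idle.get_mut(ctx).and_then(|q| q.pop()) {
            return Some(id);
        }
        if self.total < self.max_size {
            self.total += 1;
            self.next_id += 1;
            return Some(self.next_id);
        }
        None // the real pool would block on the shared semaphore here
    }

    fn release_for(&mut self, ctx: &str, id: u32) {
        self.scoped_idle.entry(ctx.to_string()).or_default().push(id);
    }
}
```

Note how releasing a browser into `bot-b`'s queue does not make it available to `bot-a`, yet both contexts count against the same `max_size`.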


Cold start behaviour

If acquire() is called when the pool is empty (e.g. on first call before min_size browsers have launched) or when all browsers are active and total < max_size, a new browser is launched on demand. Cold starts take < 2 s on modern hardware.


Graceful shutdown

#![allow(unused)]
fn main() {
// Closes all browsers gracefully — waits for active handles to be released first
pool.shutdown().await;
}

shutdown() signals the pool to stop accepting new acquire() calls, waits for all active handles to be released (or times out after acquire_timeout), then closes every browser.

Page Operations

A Page represents a single browser tab. You get one by calling browser.new_page() on a BrowserHandle.


#![allow(unused)]
fn main() {
use stygian_browser::WaitUntil;
use std::time::Duration;

// Wait for a specific CSS selector to appear
page.navigate("https://example.com", WaitUntil::Selector("h1".into()), Duration::from_secs(30)).await?;

// Wait for the DOM to be parsed (fast; before images/CSS load)
page.navigate("https://example.com", WaitUntil::DomContentLoaded, Duration::from_secs(30)).await?;

// Wait for network to go idle (good for SPAs — ≤ 2 in-flight requests for 500 ms)
page.navigate("https://example.com", WaitUntil::NetworkIdle, Duration::from_secs(30)).await?;
}

WaitUntil variants from fastest to safest:

| Variant | Condition |
|---|---|
| DomContentLoaded | HTML fully parsed; DOM ready (fires before images/stylesheets) |
| Load | load event fired (document and subresources finished loading) |
| NetworkIdle | Load event fired and ≤ 2 in-flight requests for 500 ms |
| Selector(css) | document.querySelector(selector) returns a non-null element |

Reading page content

#![allow(unused)]
fn main() {
// Full page HTML
let html  = page.content().await?;

// Page title
let title = page.title().await?;

// Current URL (may differ from navigated URL after redirects)
let url   = page.url().await?;

// HTTP status code of the last navigation, if available
// (None if no navigation has committed yet, the URL is non-HTTP such as file://,
//  or network events were not captured)
if let Some(status) = page.status_code()? {
    println!("HTTP {status}");
}
}

JavaScript evaluation

#![allow(unused)]
fn main() {
// Evaluate an expression; return type must implement serde::DeserializeOwned
let title:   String = page.eval("document.title").await?;
let is_auth: bool   = page.eval("!!document.cookie.match(/session=/)").await?;
let item_count: u32 = page.eval("document.querySelectorAll('.item').length").await?;

// Execute a statement (return value ignored)
page.eval::<serde_json::Value>("window.scrollTo(0, document.body.scrollHeight)").await?;
}

Element interaction

High-level click/type helpers are provided by the human behaviour module (stygian_browser::behavior) when the stealth feature is enabled. These simulate realistic mouse paths and typing cadence. See the Stealth & Anti-Detection page for full usage.

#![allow(unused)]
fn main() {
use stygian_browser::behavior::{MouseSimulator, TypingSimulator};

let mouse = MouseSimulator::new();
mouse.move_to(&page, 100.0, 200.0, 450.0, 380.0).await?;
mouse.click(&page, 450.0, 380.0).await?;

let typer = TypingSimulator::new().wpm(90);
typer.type_into(&page, "#search-input", "rust async scraping").await?;

// For selector-based waiting without mouse simulation:
page.wait_for_selector(".results", Duration::from_secs(10)).await?;
}

Screenshots

#![allow(unused)]
fn main() {
// Full-page screenshot — returns raw PNG bytes
let png: Vec<u8> = page.screenshot().await?;
tokio::fs::write("screenshot.png", &png).await?;
}

Resource filtering

Block resource types to reduce bandwidth and speed up text-only scraping:

#![allow(unused)]
fn main() {
use stygian_browser::page::{ResourceFilter, ResourceType};

// Block images, fonts, CSS, and media
page.set_resource_filter(ResourceFilter::block_media()).await?;

// Block only images and fonts (keep styles for layout-sensitive work)
page.set_resource_filter(ResourceFilter::block_images_and_fonts()).await?;

// Custom filter
page.set_resource_filter(
    ResourceFilter::default()
        .block(ResourceType::Image)
        .block(ResourceType::Font)
        .block(ResourceType::Stylesheet)
).await?;
}

Must be called before navigate() to take effect.


Session persistence

Session persistence is handled via the session module. Save and restore full session state (cookies + localStorage) across runs, or inject individual cookies without a full round-trip.

#![allow(unused)]
fn main() {
use stygian_browser::session::{save_session, restore_session, SessionSnapshot, SessionCookie};

// Save full session state after login
let snapshot: SessionSnapshot = save_session(&page).await?;
snapshot.save_to_file("session.json")?;

// Restore in a later run
let snapshot = SessionSnapshot::load_from_file("session.json")?;
restore_session(&page, &snapshot).await?;

// Inject individual cookies without a full snapshot (e.g. seed a known token)
let cookies = vec![SessionCookie {
    name:      "session".to_string(),
    value:     "abc123".to_string(),
    domain:    ".example.com".to_string(),
    path:      "/".to_string(),
    expires:   -1.0,   // session cookie
    http_only: true,
    secure:    true,
    same_site: "Lax".to_string(),
}];
page.inject_cookies(&cookies).await?;
}

Check whether a saved snapshot is still fresh before restoring:

#![allow(unused)]
fn main() {
let mut snapshot = SessionSnapshot::load_from_file("session.json")?;
snapshot.ttl_secs = Some(3600);   // 1-hour TTL
if snapshot.is_expired() {
    // re-authenticate
} else {
    restore_session(&page, &snapshot).await?;
}
}
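The freshness check boils down to comparing the snapshot's age against its TTL. A minimal std-only model of that logic, using hypothetical `saved_at`/`ttl_secs` fields rather than the crate's real struct:

```rust
use std::time::{Duration, SystemTime};

/// Minimal model of a snapshot's freshness check.
struct Snapshot {
    saved_at: SystemTime,
    ttl_secs: Option<u64>,
}

impl Snapshot {
    /// No TTL means the snapshot never expires on its own.
    fn is_expired(&self, now: SystemTime) -> bool {
        match self.ttl_secs {
            None => false,
            Some(ttl) => now
                .duration_since(self.saved_at)
                .map(|age| age > Duration::from_secs(ttl))
                .unwrap_or(false), // clock went backwards: treat as fresh
        }
    }
}
```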

Closing a tab

#![allow(unused)]
fn main() {
page.close().await?;
}

Always close pages explicitly when done. Unclosed pages count against the browser's internal tab limit and may degrade performance.


Complete example

use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use stygian_browser::page::ResourceFilter;
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool   = BrowserPool::new(BrowserConfig::default()).await?;
    let handle = pool.acquire().await?;
    let mut page = handle.browser().new_page().await?;

    // Block images to reduce bandwidth
    page.set_resource_filter(ResourceFilter::block_media()).await?;

    page.navigate(
        "https://example.com/products",
        WaitUntil::Selector(".product-list".to_string()),
        Duration::from_secs(30),
    ).await?;

    let count: u32 = page.eval("document.querySelectorAll('.product').length").await?;
    println!("{count} products found");

    let url = page.url().await?;
    let status = page.status_code()?;
    match status {
        Some(code) => println!("{url} → HTTP {code}"),
        None => println!("{url} → HTTP status unknown"),
    }

    let html = page.content().await?;
    // … pass html to an AI extraction node …

    page.close().await?;
    handle.release().await;
    Ok(())
}

Proxy Rotation Overview

stygian-proxy is a high-performance, resilient proxy rotation library for the Stygian scraping ecosystem. It manages a pool of proxy endpoints, tracks per-proxy health and latency metrics, and integrates directly with both stygian-graph HTTP adapters and stygian-browser page contexts.


Feature summary

| Feature | Description |
|---|---|
| Rotation strategies | Round-robin, random, weighted (by proxy weight), least-used (by request count) |
| Per-proxy metrics | Atomic latency and success-rate tracking — zero lock contention |
| Async health checker | Configurable-interval background task; each proxy probed concurrently via JoinSet |
| Circuit breaker | Per-proxy lock-free FSM: Closed → Open → HalfOpen; auto-recovery after cooldown |
| In-memory pool | No external database required; satisfies the ProxyStoragePort trait |
| graph integration | ProxyManagerPort trait for stygian-graph HTTP adapters (feature graph) |
| browser integration | Per-context proxy binding for stygian-browser (feature browser) |
| SOCKS support | Socks4 and Socks5 proxy types (feature socks) |

Quick start

Add the dependency:

[dependencies]
stygian-proxy = { version = "0.1", features = ["graph"] }

Build a pool and make a request:

use std::sync::Arc;
use stygian_proxy::{MemoryProxyStore, ProxyConfig, ProxyManager};
use stygian_proxy::types::{Proxy, ProxyType};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let storage = Arc::new(MemoryProxyStore::default());
    let manager = Arc::new(
        ProxyManager::with_round_robin(storage, ProxyConfig::default())?
    );

    // Register proxies at startup (or dynamically at any time)
    manager.add_proxy(Proxy {
        url: "http://proxy1.example.com:8080".into(),
        proxy_type: ProxyType::Http,
        username: None,
        password: None,
        weight: 1,
        tags: vec!["us-east".into()],
    }).await?;

    // Start background health checks
    let (cancel, _task) = manager.start();

    // Acquire a proxy for a request
    let handle = manager.acquire_proxy().await?;
    println!("using proxy: {}", handle.proxy_url);

    // Signal success — omitting this counts as a failure toward the circuit breaker
    handle.mark_success();

    cancel.cancel(); // stop health checker
    Ok(())
}

Architecture

┌─────────────────────────────────────────┐
│              ProxyManager               │
│                                         │
│  ┌──────────────┐  ┌─────────────────┐  │
│  │ HealthChecker│  │ CircuitBreakers │  │
│  │  (background)│  │  (per proxy)    │  │
│  └──────────────┘  └─────────────────┘  │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │       RotationStrategy           │   │
│  │  RoundRobin / Random / Weighted  │   │
│  │  / LeastUsed                     │   │
│  └──────────────────────────────────┘   │
│                                         │
│  ┌──────────────────────────────────┐   │
│  │       ProxyStoragePort           │   │
│  │  MemoryProxyStore (built-in)     │   │
│  └──────────────────────────────────┘   │
└─────────────────────────────────────────┘
         │                    │
         ▼                    ▼
  stygian-graph        stygian-browser
  HTTP adapters        page contexts

ProxyManager is the main entry point. It composes a storage backend, a rotation strategy, a background health checker, and a map of per-proxy circuit breakers. Callers interact only with the acquire_proxy() → ProxyHandle → mark_success() flow.


Cargo features

| Feature | Enables |
|---|---|
| (default: none) | Core pool, strategies, health checker, circuit breaker |
| graph | ProxyManagerPort trait + blanket impl + NoopProxyManager |
| browser | BrowserProxySource trait + ProxyManagerBridge |
| socks | ProxyType::Socks4 and ProxyType::Socks5 variants |

ProxyConfig defaults

| Field | Default | Description |
|---|---|---|
| health_check_url | https://httpbin.org/ip | URL probed to verify liveness |
| health_check_interval | 60 s | How often to run checks |
| health_check_timeout | 5 s | Per-probe HTTP timeout |
| circuit_open_threshold | 5 | Consecutive failures before circuit opens |
| circuit_half_open_after | 30 s | Cooldown before attempting recovery |

Rotation Strategies

stygian-proxy ships four built-in rotation strategies. All implement the RotationStrategy trait and operate on a slice of ProxyCandidate values built from the live pool. Strategies that find zero healthy candidates return ProxyError::AllProxiesUnhealthy rather than panicking.


Comparison

| Strategy | Best for | Notes |
|---|---|---|
| RoundRobinStrategy | Even distribution across identical proxies | Atomic counter, lock-free |
| RandomStrategy | Spreading load unpredictably | rand::rng() per call; no shared state |
| WeightedStrategy | Prioritising faster or higher-quota proxies | Weighted random sampling; O(n) |
| LeastUsedStrategy | Never overloading a single proxy | Picks the candidate with the lowest total request count |

RoundRobinStrategy

Distributes requests evenly across healthy proxies in insertion order. Uses an atomic counter so there is no Mutex on the hot path.

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_proxy::{MemoryProxyStore, ProxyConfig, ProxyManager};

let storage = Arc::new(MemoryProxyStore::default());
let manager = ProxyManager::with_round_robin(storage, ProxyConfig::default()).unwrap();
}

ProxyManager::with_round_robin is the recommended default. The counter wraps safely at u64::MAX, which at 1 million requests per second takes ~585,000 years.
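The core of a lock-free round-robin selector is a single `fetch_add` on an atomic counter. A std-only sketch of that idea (the names here are illustrative, not the crate's internals):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Lock-free round-robin index: one fetch_add per selection,
/// wrapping safely at u64::MAX.
struct RoundRobin {
    counter: AtomicU64,
}

impl RoundRobin {
    fn new() -> Self {
        Self { counter: AtomicU64::new(0) }
    }

    /// Pick the next index among `healthy` candidates.
    fn select(&self, healthy: usize) -> Option<usize> {
        if healthy == 0 {
            return None; // mirrors ProxyError::AllProxiesUnhealthy
        }
        let n = self.counter.fetch_add(1, Ordering::Relaxed);
        Some((n % healthy as u64) as usize)
    }
}
```

`Ordering::Relaxed` suffices because the counter only needs atomicity, not ordering with other memory operations.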


RandomStrategy

Picks a healthy proxy at random on every call. Useful when you want to avoid any predictable rotation pattern that fingerprinting could detect.

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_proxy::{MemoryProxyStore, ProxyConfig, ProxyManager};
use stygian_proxy::strategy::RandomStrategy;

let storage = Arc::new(MemoryProxyStore::default());
let manager = ProxyManager::builder()
    .storage(storage)
    .strategy(Arc::new(RandomStrategy))
    .config(ProxyConfig::default())
    .build()
    .unwrap();
}

WeightedStrategy

Each Proxy has a weight: u32 field (default 1). WeightedStrategy performs weighted random sampling so proxies with higher weights are selected proportionally more often.

#![allow(unused)]
fn main() {
use stygian_proxy::types::{Proxy, ProxyType};

// This proxy is 3× more likely to be selected than a weight-1 proxy.
let fast_proxy = Proxy {
    url: "http://fast.example.com:8080".into(),
    proxy_type: ProxyType::Http,
    weight: 3,
    ..Default::default()
};
}

Use this strategy when proxies have different capacities, quotas, or observed speeds.
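Weighted random sampling is a prefix-sum walk: draw a value in `[0, total_weight)` and step through the candidates until the running sum exceeds it. The sketch below takes the random draw `r` as a parameter so the behaviour is deterministic; the real strategy would supply it from a random source such as rand::rng().

```rust
/// Weighted selection by prefix sums. O(n) per call.
/// Returns None when every weight is zero (nothing selectable).
fn weighted_pick(weights: &[u32], r: u64) -> Option<usize> {
    let total: u64 = weights.iter().map(|&w| w as u64).sum();
    if total == 0 {
        return None;
    }
    let mut x = r % total; // caller supplies the randomness
    for (i, &w) in weights.iter().enumerate() {
        if x < w as u64 {
            return Some(i);
        }
        x -= w as u64;
    }
    None // unreachable when total > 0
}
```

With weights `[1, 3]`, draws 0 land on the first proxy and draws 1 through 3 land on the second, giving the 3× selection bias described above.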


LeastUsedStrategy

Selects the healthy proxy with the lowest total request count at the time of the call. This maximises even distribution over time even when proxies are added dynamically.

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_proxy::{MemoryProxyStore, ProxyConfig, ProxyManager};
use stygian_proxy::strategy::LeastUsedStrategy;

let storage = Arc::new(MemoryProxyStore::default());
let manager = ProxyManager::builder()
    .storage(storage)
    .strategy(Arc::new(LeastUsedStrategy))
    .config(ProxyConfig::default())
    .build()
    .unwrap();
}
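The selection rule itself is a one-liner over the candidate list. A std-only sketch, modelling each candidate as a hypothetical `(healthy, request_count)` pair rather than the crate's ProxyCandidate type:

```rust
/// Least-used selection: among healthy candidates, pick the one with the
/// lowest total request count (ties broken by position).
fn least_used(candidates: &[(bool, u64)]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .filter(|(_, (healthy, _))| *healthy)
        .min_by_key(|(_, (_, count))| *count)
        .map(|(i, _)| i)
}
```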

Custom strategies

Implement RotationStrategy to plug in your own selection logic:

#![allow(unused)]
fn main() {
use async_trait::async_trait;
use stygian_proxy::error::ProxyResult;
use stygian_proxy::strategy::{ProxyCandidate, RotationStrategy};

/// Always pick the proxy with the best success rate.
pub struct BestSuccessRateStrategy;

#[async_trait]
impl RotationStrategy for BestSuccessRateStrategy {
    async fn select<'a>(
        &self,
        candidates: &'a [ProxyCandidate],
    ) -> ProxyResult<&'a ProxyCandidate> {
        candidates
            .iter()
            .filter(|c| c.healthy)
            .max_by(|a, b| {
                let ra = a.metrics.success_rate();
                let rb = b.metrics.success_rate();
                ra.partial_cmp(&rb).unwrap_or(std::cmp::Ordering::Equal)
            })
            .ok_or(stygian_proxy::error::ProxyError::AllProxiesUnhealthy)
    }
}
}

Pass it to ProxyManager::builder().storage(storage).strategy(Arc::new(BestSuccessRateStrategy)).config(config).build().unwrap().


ProxyCandidate fields

| Field | Type | Description |
|---|---|---|
| id | Uuid | Stable proxy identifier |
| weight | u32 | Relative selection weight |
| metrics | Arc<ProxyMetrics> | Shared atomics: requests, failures, latency |
| healthy | bool | Result of the last health check |

ProxyMetrics exposes success_rate() -> f64 and avg_latency_ms() -> f64 computed from the atomic counters without any locking.
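Lockless derived metrics follow a simple pattern: increment atomic counters on the hot path, compute ratios from snapshots of those counters on demand. A std-only sketch (field and method names are illustrative, not the crate's exact API):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Success rate derived from two atomic counters, no locking.
struct Metrics {
    requests: AtomicU64,
    failures: AtomicU64,
}

impl Metrics {
    fn new() -> Self {
        Self { requests: AtomicU64::new(0), failures: AtomicU64::new(0) }
    }

    fn record(&self, ok: bool) {
        self.requests.fetch_add(1, Ordering::Relaxed);
        if !ok {
            self.failures.fetch_add(1, Ordering::Relaxed);
        }
    }

    /// 1.0 when no requests have been seen yet (optimistic default).
    fn success_rate(&self) -> f64 {
        let req = self.requests.load(Ordering::Relaxed);
        if req == 0 {
            return 1.0;
        }
        let fail = self.failures.load(Ordering::Relaxed);
        (req - fail) as f64 / req as f64
    }
}
```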

Health Checking & Circuit Breaker

stygian-proxy keeps the pool fresh through two complementary mechanisms: an async health checker that periodically probes each proxy, and a per-proxy circuit breaker that trips automatically when a proxy starts failing live requests.


Health checker

HealthChecker runs a background tokio task that probes every registered proxy on a configurable interval. Probes run concurrently via JoinSet so a slow or timing-out proxy does not delay checks for healthy ones.

Starting the health checker

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_proxy::{MemoryProxyStore, ProxyConfig, ProxyManager};

let storage = Arc::new(MemoryProxyStore::default());
let config = ProxyConfig {
    health_check_url: "https://httpbin.org/ip".into(),
    health_check_interval: std::time::Duration::from_secs(60),
    health_check_timeout: std::time::Duration::from_secs(5),
    ..ProxyConfig::default()
};
let manager = Arc::new(
    ProxyManager::with_round_robin(storage, config)?
);

let (cancel, _task) = manager.start();

// When shutting down:
cancel.cancel();
}

On-demand check

Call health_checker.check_once().await to run a single probe cycle without spawning a background task. Useful in tests or before the first request batch.

Health state

Each proxy's health state is stored in HealthMap — an Arc<DashMap<Uuid, bool>>. The rotation strategy reads this map when building the ProxyCandidate slice; proxies marked unhealthy are filtered out before selection.


Circuit breaker

Every proxy gets its own CircuitBreaker when it is added to the pool. The circuit breaker is a lock-free atomic FSM with three states:

          failure ≥ threshold
  CLOSED ──────────────────────► OPEN
    ▲                               │
    │  success on probe             │ half_open_after elapsed
    │                               ▼
    └──────────────────────── HALF-OPEN
        (failure on probe reopens the circuit)

| State | Behaviour |
|---|---|
| Closed | Proxy is selectable; failures increment counter |
| Open | Proxy is excluded from selection; requests are not attempted |
| Half-Open | One probe attempt allowed; success → Closed, failure → Open (timer reset) |
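The three-state FSM can be held entirely in atomics. The sketch below is a simplified model, not the crate's implementation: time is an abstract `now` tick supplied by the caller (for determinism), where the real breaker would read a clock.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicU8, Ordering};

const CLOSED: u8 = 0;
const OPEN: u8 = 1;
const HALF_OPEN: u8 = 2;

/// Lock-free breaker sketch: state, consecutive-failure counter, and the
/// tick at which the circuit opened, all atomics.
struct Breaker {
    state: AtomicU8,
    failures: AtomicU32,
    opened_at: AtomicU64,
    threshold: u32,        // circuit_open_threshold
    half_open_after: u64,  // cooldown, in ticks
}

impl Breaker {
    fn new(threshold: u32, half_open_after: u64) -> Self {
        Self {
            state: AtomicU8::new(CLOSED),
            failures: AtomicU32::new(0),
            opened_at: AtomicU64::new(0),
            threshold,
            half_open_after,
        }
    }

    /// Is the proxy selectable at tick `now`?
    /// Moves Open → Half-Open once the cooldown has elapsed.
    fn allow(&self, now: u64) -> bool {
        match self.state.load(Ordering::Acquire) {
            OPEN => {
                let opened = self.opened_at.load(Ordering::Acquire);
                if now.saturating_sub(opened) >= self.half_open_after {
                    self.state.store(HALF_OPEN, Ordering::Release);
                    true // one probe attempt
                } else {
                    false
                }
            }
            _ => true, // Closed or Half-Open: selectable
        }
    }

    fn record_success(&self) {
        self.failures.store(0, Ordering::Release);
        self.state.store(CLOSED, Ordering::Release);
    }

    fn record_failure(&self, now: u64) {
        let state = self.state.load(Ordering::Acquire);
        let n = self.failures.fetch_add(1, Ordering::AcqRel) + 1;
        // A failed probe reopens immediately; otherwise open on threshold.
        if state == HALF_OPEN || n >= self.threshold {
            self.opened_at.store(now, Ordering::Release);
            self.state.store(OPEN, Ordering::Release);
        }
    }
}
```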

How it integrates with ProxyHandle

acquire_proxy() returns a ProxyHandle. The handle holds an Arc<CircuitBreaker>.

  • handle.mark_success() — resets the failure counter, moves the circuit to Closed.
  • Drop without mark_success — records a failure; opens the circuit after circuit_open_threshold consecutive failures.
#![allow(unused)]
fn main() {
let handle = manager.acquire_proxy().await?;

match do_request(&handle.proxy_url).await {
    Ok(_) => handle.mark_success(),
    Err(e) => {
        // handle is dropped here → failure recorded automatically
        eprintln!("request failed: {e}");
    }
}
}

ProxyHandle::direct()

For code paths that conditionally use a proxy, ProxyHandle::direct() returns a sentinel handle with an empty URL and a noop circuit breaker that can never trip. Pass it wherever a ProxyHandle is expected when no proxy should be used.

#![allow(unused)]
fn main() {
use stygian_proxy::manager::ProxyHandle;

let handle = if use_proxy {
    manager.acquire_proxy().await?
} else {
    ProxyHandle::direct()
};
}

PoolStats

manager.pool_stats().await returns a PoolStats snapshot:

#![allow(unused)]
fn main() {
let stats = manager.pool_stats().await?;
println!("total={} healthy={} open_circuits={}",
    stats.total, stats.healthy, stats.open);
}
| Field | Description |
|---|---|
| total | Total proxies registered |
| healthy | Proxies that passed the last health check |
| open | Proxies whose circuit breaker is currently Open |

Tuning recommendations

| Scenario | Recommendation |
|---|---|
| High-churn scraping (many short requests) | Lower circuit_open_threshold to 2–3; shorter circuit_half_open_after (10–15 s) |
| Long-lived connections | Raise circuit_open_threshold to 10+; extend circuit_half_open_after (60–120 s) |
| Residential proxies (naturally flaky) | Use WeightedStrategy with lower weights for known flaky proxies; keep threshold at default (5) |
| Dev / testing | health_check_interval = 5 s; health_check_url pointing to a local echo server |

Ecosystem Integrations

stygian-proxy can be wired into both stygian-graph HTTP adapters and stygian-browser page contexts via optional Cargo features.


stygian-graph (graph feature)

Enable the graph feature to get ProxyManagerPort — the trait that decouples graph HTTP adapters from any specific proxy implementation.

[dependencies]
stygian-proxy = { version = "0.1", features = ["graph"] }

ProxyManagerPort

#![allow(unused)]
fn main() {
use stygian_proxy::graph::ProxyManagerPort;

// ProxyManager already implements ProxyManagerPort via a blanket impl.
// Inside any HTTP adapter:
async fn fetch(proxy_src: &dyn ProxyManagerPort, url: &str) -> Result<String, Box<dyn std::error::Error>> {
    let handle = proxy_src.acquire_proxy().await?;
    // ... make request through handle.proxy_url ...
    handle.mark_success();
    Ok(String::new())
}
}

BoxedProxyManager

BoxedProxyManager is a type alias for Arc<dyn ProxyManagerPort>. Use it to store a proxy source in HttpAdapter or any other adapter without naming the concrete type:

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_proxy::graph::{BoxedProxyManager, ProxyManagerPort};
use stygian_proxy::{MemoryProxyStore, ProxyConfig, ProxyManager};

let storage = Arc::new(MemoryProxyStore::default());
let manager = Arc::new(
    ProxyManager::with_round_robin(storage, ProxyConfig::default()).unwrap()
);

// Coerce to the trait object
let boxed: BoxedProxyManager = manager;
}

NoopProxyManager

When no proxying is needed, pass NoopProxyManager instead. It returns ProxyHandle::direct() on every call — a noop handle with an empty URL.

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_proxy::graph::{BoxedProxyManager, NoopProxyManager};

let no_proxy: BoxedProxyManager = Arc::new(NoopProxyManager);
}

This avoids Option<BoxedProxyManager> branches in adapter code: always pass a BoxedProxyManager, just swap implementations at construction time.
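The "always pass a trait object, swap in a no-op" pattern looks like this in miniature. The trait and type names below are illustrative, not the crate's real API:

```rust
/// A no-op implementation in place of Option<...> branching.
trait ProxySource {
    /// None means connect directly, without a proxy.
    fn proxy_url(&self) -> Option<String>;
}

struct Fixed(String);
impl ProxySource for Fixed {
    fn proxy_url(&self) -> Option<String> {
        Some(self.0.clone())
    }
}

struct Noop;
impl ProxySource for Noop {
    fn proxy_url(&self) -> Option<String> {
        None // direct connection
    }
}

/// Adapter code takes the trait object and never branches on Option<Source>.
fn describe(src: &dyn ProxySource) -> String {
    match src.proxy_url() {
        Some(u) => format!("via {u}"),
        None => "direct".to_string(),
    }
}
```

The caller decides once, at construction time, which implementation to wire in; the adapter's request path stays branch-free.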


stygian-browser (browser feature)

Enable the browser feature to bind a specific proxy to each browser page context.

[dependencies]
stygian-proxy = { version = "0.1", features = ["browser"] }

ProxyManagerBridge

ProxyManagerBridge wraps a ProxyManager and exposes bind_proxy(), which acquires one proxy and returns (proxy_url, ProxyHandle). The URL can be passed directly to chromiumoxide when launching a browser context.

#![allow(unused)]
fn main() {
use std::sync::Arc;
use stygian_proxy::{MemoryProxyStore, ProxyConfig, ProxyManager};
use stygian_proxy::browser::ProxyManagerBridge;

let storage = Arc::new(MemoryProxyStore::default());
let manager = Arc::new(
    ProxyManager::with_round_robin(storage, ProxyConfig::default()).unwrap()
);
let bridge = ProxyManagerBridge::new(Arc::clone(&manager));

// In your browser context setup:
let (proxy_url, handle) = bridge.bind_proxy().await?;
// Pass proxy_url to chromiumoxide BrowserConfig::builder().proxy(proxy_url)
// ...
handle.mark_success(); // after the page session completes
}

BrowserProxySource

BrowserProxySource is the trait implemented by ProxyManagerBridge. Implement it directly to plug in any proxy source without depending on ProxyManager:

#![allow(unused)]
fn main() {
use async_trait::async_trait;
use stygian_proxy::browser::BrowserProxySource;
use stygian_proxy::manager::ProxyHandle;
use stygian_proxy::error::ProxyResult;

pub struct MyProxySource;

#[async_trait]
impl BrowserProxySource for MyProxySource {
    async fn bind_proxy(&self) -> ProxyResult<(String, ProxyHandle)> {
        // return (proxy_url, handle)
        Ok((
            "http://my-proxy.example.com:8080".into(),
            ProxyHandle::direct(),
        ))
    }
}
}

Failure tracking across integrations

ProxyHandle uses RAII to track request outcomes. The same contract applies regardless of whether it came from a graph or browser integration:

  1. Acquire a handle from acquire_proxy() (graph) or bind_proxy() (browser).
  2. Perform the I/O operation.
  3. Call handle.mark_success() on success.
  4. Drop the handle (success or failure is recorded in the circuit breaker).

If the handle is dropped without mark_success(), the failure counter increments. After circuit_open_threshold consecutive failures the circuit opens and the proxy is skipped until after the circuit_half_open_after cooldown.
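The RAII contract can be demonstrated with a Drop impl in a few lines. This sketch uses `Rc<Cell<u32>>` for the shared failure counter where the real handle presumably uses atomics; the names are illustrative.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// RAII outcome tracking in miniature: dropping the handle without
/// marking success records a failure.
struct Handle {
    ok: bool,
    failures: Rc<Cell<u32>>,
}

impl Handle {
    fn new(failures: Rc<Cell<u32>>) -> Self {
        Self { ok: false, failures }
    }

    /// Consumes the handle; Drop then sees success and records nothing.
    fn mark_success(mut self) {
        self.ok = true;
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        if !self.ok {
            self.failures.set(self.failures.get() + 1);
        }
    }
}
```

Because `mark_success` takes `self` by value, the success path and the failure path converge on the same Drop, which is what makes the accounting impossible to forget.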

Environment Variables

All configuration values for both crates can be overridden at runtime via environment variables. No recompilation required.


stygian-browser

| Variable | Default | Description |
|---|---|---|
| STYGIAN_CHROME_PATH | auto-detect | Absolute path to Chrome or Chromium binary |
| STYGIAN_HEADLESS | true | false for headed mode (displays browser window) |
| STYGIAN_HEADLESS_MODE | new | new (--headless=new) or legacy (classic --headless for Chromium < 112) |
| STYGIAN_STEALTH_LEVEL | advanced | none, basic, or advanced |
| STYGIAN_POOL_MIN | 2 | Minimum warm browser instances |
| STYGIAN_POOL_MAX | 10 | Maximum concurrent browser instances |
| STYGIAN_POOL_IDLE_SECS | 300 | Idle timeout before a browser is evicted |
| STYGIAN_POOL_ACQUIRE_SECS | 5 | Seconds to wait for a pool slot before error |
| STYGIAN_LAUNCH_TIMEOUT_SECS | 10 | Browser launch timeout |
| STYGIAN_CDP_TIMEOUT_SECS | 30 | Per-operation CDP command timeout |
| STYGIAN_CDP_FIX_MODE | addBinding | addBinding, isolatedworld, or enabledisable |
| STYGIAN_PROXY | (unset) | Proxy URL (http://, https://, or socks5://) |
| STYGIAN_PROXY_BYPASS | (unset) | Comma-separated proxy bypass list (e.g. <local>,localhost) |
| STYGIAN_DISABLE_SANDBOX | auto-detect | true inside containers, false on bare metal |

Chrome binary auto-detection order

When STYGIAN_CHROME_PATH is not set, the library searches:

  1. google-chrome on $PATH
  2. chromium on $PATH
  3. /usr/bin/google-chrome
  4. /usr/bin/chromium-browser
  5. /Applications/Google Chrome.app/Contents/MacOS/Google Chrome (macOS)
  6. C:\Program Files\Google\Chrome\Application\chrome.exe (Windows)

stygian-proxy

stygian-proxy is configured by constructing ProxyConfig directly in code. There are no environment variable overrides; pass the struct to ProxyManager::with_round_robin or ProxyManager::with_strategy.

| Field | Default | Description |
|---|---|---|
| health_check_url | https://httpbin.org/ip | URL probed to verify proxy liveness |
| health_check_interval | 60 s | How often the background task runs |
| health_check_timeout | 5 s | Per-probe HTTP timeout |
| circuit_open_threshold | 5 | Consecutive failures before circuit opens |
| circuit_half_open_after | 30 s | Cooldown before attempting recovery |

stygian-graph

| Variable | Default | Description |
|---|---|---|
| STYGIAN_LOG | info | Log filter; acts as RUST_LOG when set, otherwise RUST_LOG applies |
| STYGIAN_WORKERS | num_cpus * 4 | Default worker pool concurrency |
| STYGIAN_QUEUE_DEPTH | workers * 4 | Worker pool channel depth |
| STYGIAN_CACHE_CAPACITY | 10000 | Default LRU cache entry limit |
| STYGIAN_CACHE_TTL_SECS | 300 | Default DashMap cache TTL in seconds |
| STYGIAN_HTTP_TIMEOUT_SECS | 30 | HTTP adapter request timeout |
| STYGIAN_HTTP_MAX_REDIRECTS | 10 | HTTP redirect chain limit |

AI provider variables

| Variable | Used by |
|---|---|
| ANTHROPIC_API_KEY | ClaudeAdapter |
| OPENAI_API_KEY | OpenAiAdapter |
| GOOGLE_API_KEY | GeminiAdapter |
| GITHUB_TOKEN | CopilotAdapter |
| OLLAMA_BASE_URL | OllamaAdapter (default: http://localhost:11434) |

Distributed execution variables

| Variable | Default | Description |
|---|---|---|
| REDIS_URL | redis://localhost:6379 | Redis/Valkey connection URL |
| REDIS_MAX_CONNECTIONS | 20 | Redis connection pool size |
| STYGIAN_QUEUE_NAME | stygian:work | Default work queue key |
| STYGIAN_VISIBILITY_TIMEOUT_SECS | 60 | Task visibility timeout for in-flight items |

Tracing and logging

| Variable | Example | Description |
|---|---|---|
| RUST_LOG | stygian_graph=debug,stygian_browser=info | Log level per crate |
| RUST_LOG | trace | Enable all tracing (very verbose) |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 | OTLP trace export endpoint |
| OTEL_SERVICE_NAME | stygian-scraper | Service name in trace metadata |

Testing & Coverage


Running tests

# All workspace tests (no Chrome required)
cargo test --workspace

# All features including browser integration tests
cargo test --workspace --all-features

# Also run the Chrome-gated tests (marked #[ignore])
cargo test --workspace --all-features -- --include-ignored

# Specific crate
cargo test -p stygian-graph
cargo test -p stygian-browser

Test organisation

stygian-graph

Tests live alongside their module in src/ (unit tests) and in crates/stygian-graph/tests/ (integration tests).

| Layer | Location | Approach |
|---|---|---|
| Domain | src/domain/*/tests | Pure Rust, no I/O |
| Adapters | src/adapters/*/tests | Mock port implementations |
| Application | src/application/*/tests | In-process service registry |
| Integration | tests/ | End-to-end with real HTTP (httpbin) |

All tests in the graph crate pass without any external services.

stygian-browser

Browser tests fall into two categories:

| Category | Attribute | Runs in CI |
|---|---|---|
| Pure-logic tests (config, stealth scripts, math) | none | ✅ always |
| Integration tests (real Chrome required) | #[ignore = "requires Chrome"] | ❌ opt-in |

To run integration tests locally:

# Ensure Chrome 120+ is on PATH, then:
cargo test -p stygian-browser --all-features -- --include-ignored

Coverage

Coverage is measured with cargo-tarpaulin.

Install

cargo install cargo-tarpaulin

Measure

# Workspace summary (excludes Chrome-gated tests)
cargo tarpaulin --workspace --all-features --ignore-tests --out Lcov

# stygian-graph only
cargo tarpaulin -p stygian-graph --all-features --ignore-tests --out Lcov

# stygian-browser logic-only (no Chrome)
cargo tarpaulin -p stygian-browser --lib --ignore-tests --out Lcov

Current numbers

| Scope | Line coverage | Notes |
|---|---|---|
| Workspace | 65.74 % | 2 882 / 4 384 lines · 209 tests |
| stygian-graph | ~72 % | All unit and integration logic covered |
| stygian-browser | structurally bounded | Chrome-gated tests excluded from CI |

High-coverage modules in stygian-graph:

| Module | Coverage |
|---|---|
| application/config.rs | ~100 % |
| application/executor.rs | ~100 % |
| domain/idempotency.rs | ~100 % |
| application/registry.rs | ~100 % |
| adapters/claude.rs | ~95 % |
| adapters/openai.rs | ~95 % |
| adapters/gemini.rs | ~95 % |

Why browser coverage is bounded

Every test that launches a real Chrome instance is annotated:

#![allow(unused)]
fn main() {
#[tokio::test]
#[ignore = "requires Chrome"]
async fn pool_acquire_release() {
    // ...
}
}

This keeps cargo test fast and green in CI environments without a Chrome binary. Pure-logic coverage (fingerprint generation, stealth script math, config validation, simulator algorithms) is high; only the CDP I/O paths are excluded.


Adding tests

Unit test pattern (graph)

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn my_adapter_returns_error_on_empty_url() {
        let adapter = MyAdapter::default();
        let result  = adapter.execute(ServiceInput::default()).await;
        assert!(result.is_err());
    }
}
}

Browser test pattern (logic only)

#![allow(unused)]
fn main() {
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn stealth_level_debug_display() {
        assert_eq!(format!("{:?}", StealthLevel::Advanced), "Advanced");
    }

    #[tokio::test]
    #[ignore = "requires Chrome"]
    async fn pool_warm_start() {
        let pool = BrowserPool::new(BrowserConfig::default()).await.unwrap();
        assert!(pool.stats().available >= 1);
        pool.shutdown().await;
    }
}
}

CI test matrix

The GitHub Actions CI workflow (.github/workflows/ci.yml) runs:

| Job | Runs on | Command |
|---|---|---|
| test | ubuntu-latest | cargo test --workspace --all-features |
| clippy | ubuntu-latest | cargo clippy --workspace --all-features -- -D warnings |
| fmt | ubuntu-latest | cargo fmt --check |
| docs | ubuntu-latest | cargo doc --workspace --no-deps --all-features |
| msrv | ubuntu-latest | cargo +1.94.0 check --workspace |
| cross-platform | windows-latest, macos-latest | cargo test --workspace |

Browser integration tests (--include-ignored) are not run in CI — they require a Chrome binary and a display server which are not available in the default runner images.