Introduction
stygian is a high-performance web scraping toolkit for Rust, delivered as five complementary crates in a single workspace.
| Crate | Purpose |
|---|---|
| stygian-graph | Graph-based scraping engine — DAG pipelines, AI extraction, distributed execution |
| stygian-browser | Anti-detection browser automation — stealth profiles, browser pooling, CDP automation |
| stygian-proxy | Proxy pool management — rotation strategies, circuit breakers, sticky sessions |
| stygian-charon | Diagnostics and policy planning — HAR forensics, SLO assessment, runtime acquisition guidance |
| stygian-mcp | Unified Model Context Protocol server — LLM agent integration |
All crates share a common philosophy: zero-cost abstractions, extreme composability, and secure defaults.
At a glance
Design goals
- Hexagonal architecture — the domain core has zero I/O dependencies; all external capabilities are declared as port traits and injected via adapters.
- DAG execution — scraping pipelines are directed acyclic graphs. Nodes run concurrently within each topological wave, maximising parallelism.
- AI-first extraction — Claude, GPT-4o, Gemini, GitHub Copilot, and Ollama are first-class adapters. Structured data flows out of raw HTML without writing parsers.
- Anti-bot resilience — the browser crate ships stealth scripts that pass Cloudflare, DataDome, PerimeterX, and Akamai checks at the Advanced stealth level.
- Fault-tolerant — circuit breakers, retry policies, and idempotency keys are built into the execution path, not bolted on.
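To make the "topological wave" idea from the DAG execution goal concrete, here is a minimal sketch that uses only the standard library. It is not stygian-graph's scheduler: the function name `topological_waves` and the node names are hypothetical and chosen purely for illustration. It groups nodes so that each wave depends only on earlier waves, which is what allows the members of one wave to run concurrently.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups the nodes of a DAG into waves: every node in a wave depends only on
/// nodes from earlier waves, so one wave's members could run concurrently.
/// Returns `None` if the graph contains a cycle.
/// (Hypothetical helper for illustration; not part of the stygian API.)
fn topological_waves(edges: &[(&str, &str)], nodes: &[&str]) -> Option<Vec<Vec<String>>> {
    // Count unmet dependencies per node and record each node's dependents.
    let mut indegree: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(from, to) in edges {
        *indegree.entry(to).or_insert(0) += 1;
        indegree.entry(from).or_insert(0);
        dependents.entry(from).or_default().push(to);
    }

    // The first wave is every node with no dependencies at all.
    let mut waves: Vec<Vec<String>> = Vec::new();
    let mut ready: BTreeSet<&str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();
    let mut placed = 0;

    while !ready.is_empty() {
        // Completing this wave may unblock nodes for the next one.
        let mut next = BTreeSet::new();
        for node in &ready {
            for dep in dependents.get(node).into_iter().flatten() {
                let d = indegree.get_mut(dep).expect("every edge endpoint was registered");
                *d -= 1;
                if *d == 0 {
                    next.insert(*dep);
                }
            }
        }
        placed += ready.len();
        waves.push(ready.iter().map(|n| n.to_string()).collect());
        ready = next;
    }

    // Any node that was never placed sits on a cycle.
    (placed == indegree.len()).then_some(waves)
}

fn main() {
    // Hypothetical pipeline: one listing fetch fans out to two fetches,
    // which both feed a final extraction step.
    let nodes = ["fetch_list", "fetch_detail", "fetch_reviews", "extract"];
    let edges = [
        ("fetch_list", "fetch_detail"),
        ("fetch_list", "fetch_reviews"),
        ("fetch_detail", "extract"),
        ("fetch_reviews", "extract"),
    ];
    let waves = topological_waves(&edges, &nodes).expect("graph is acyclic");
    for (i, wave) in waves.iter().enumerate() {
        println!("wave {i}: {wave:?}");
    }
}
```

In this example the two middle fetches land in the same wave, so a scheduler could run them in parallel, while `extract` waits for both. A graph with a cycle yields `None`, which mirrors the kind of check a pipeline validation step would need before execution.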
Minimum supported Rust version
1.94.0 — Rust 2024 edition. Only the stable toolchain is required.
Installation
Add the crates you need to Cargo.toml:
[dependencies]
stygian-graph = "*"
stygian-browser = "*" # optional — only needed for JS-rendered pages
stygian-proxy = "*" # optional — proxy pool management
stygian-charon = "*" # optional — anti-bot diagnostics and policy planning
tokio = { version = "1", features = ["full"] }
serde_json = "1"
Enable optional feature groups on stygian-graph:
stygian-graph = { version = "*", features = ["browser", "redis", "mcp"] }
Available features:
| Feature | Includes |
|---|---|
| browser | BrowserAdapter backed by stygian-browser (default) |
| redis | Redis/Valkey cache and distributed work queue adapters |
| object-storage | S3-compatible object storage adapter |
| api | REST API server binary |
| postgres | PostgreSQL storage adapter |
| cloudflare-crawl | Cloudflare Browser Rendering crawl adapter |
| escalation | Default tiered escalation policy adapter |
| wasm-plugins | WASM plugin system via wasmtime |
| mcp | MCP server — exposes scraping & pipeline tools over JSON-RPC 2.0 |
| full | All of the above |
Quick start — scraping pipeline
use stygian_graph::domain::graph::{Pipeline, Node};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut pipeline = Pipeline::new("my_scraper");
    pipeline.add_node(Node::new(
        "fetch",
        "http",
        json!({"url": "https://example.com"}),
    ));
    pipeline.validate()?;
    println!("Pipeline '{}' has {} nodes", pipeline.name, pipeline.nodes.len());
    Ok(())
}
Quick start — browser automation
use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool = BrowserPool::new(BrowserConfig::default()).await?;
    let handle = pool.acquire().await?;
    let mut page = handle
        .browser()
        .expect("browser is available")
        .new_page()
        .await?;
    page.navigate(
        "https://example.com",
        WaitUntil::Selector("body".to_string()),
        Duration::from_secs(30),
    )
    .await?;
    println!("Title: {}", page.title().await?);
    handle.release().await;
    Ok(())
}
Repository layout
stygian/
├── crates/
│ ├── stygian-graph/ # Scraping engine
│ ├── stygian-browser/ # Browser automation
│ ├── stygian-proxy/ # Proxy pool management
│ ├── stygian-charon/ # Anti-bot diagnostics and policy planning
│ └── stygian-mcp/ # Unified MCP aggregator binary
├── book/ # This documentation (mdBook)
├── docs/ # Architecture reference docs
├── examples/ # Example pipeline configs (.toml)
└── .github/workflows/ # CI, release, security, docs
Source, issues, and pull requests live at github.com/greysquirr3l/stygian.
Documentation
| Resource | URL |
|---|---|
| This guide | greysquirr3l.github.io/stygian |
| API reference (stygian-graph) | greysquirr3l.github.io/stygian/api/stygian_graph |
| API reference (stygian-browser) | greysquirr3l.github.io/stygian/api/stygian_browser |
| API reference (stygian-charon) | greysquirr3l.github.io/stygian/api/stygian_charon |
| crates.io (stygian-graph) | crates.io/crates/stygian-graph |
| crates.io (stygian-browser) | crates.io/crates/stygian-browser |
| crates.io (stygian-charon) | crates.io/crates/stygian-charon |