# Introduction

stygian is a high-performance web scraping toolkit for Rust, delivered as two complementary crates in a single workspace.

| Crate | Purpose |
|---|---|
| `stygian-graph` | Graph-based scraping engine — DAG pipelines, AI extraction, distributed execution |
| `stygian-browser` | Anti-detection browser automation — stealth profiles, browser pooling, CDP automation |

Both crates share a common philosophy: zero-cost abstractions, extreme composability, and secure defaults.


## At a glance

### Design goals

- **Hexagonal architecture** — the domain core has zero I/O dependencies; all external capabilities are declared as port traits and injected via adapters.
- **DAG execution** — scraping pipelines are directed acyclic graphs. Nodes run concurrently within each topological wave, maximising parallelism.
- **AI-first extraction** — Claude, GPT-4o, Gemini, GitHub Copilot, and Ollama are first-class adapters. Structured data flows out of raw HTML without writing parsers.
- **Anti-bot resilience** — the browser crate ships stealth scripts that pass Cloudflare, DataDome, PerimeterX, and Akamai checks at the Advanced stealth level.
- **Fault tolerance** — circuit breakers, retry policies, and idempotency keys are built into the execution path, not bolted on.
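To make the wave-based execution model concrete, here is a minimal, dependency-free sketch (not the actual `stygian-graph` implementation) of how DAG nodes can be grouped into topological waves, where every node in a wave has all of its dependencies satisfied by earlier waves:

```rust
use std::collections::HashMap;

/// Group DAG nodes into "waves": every node in wave N has all of its
/// dependencies in waves 0..N, so nodes within a wave can run concurrently.
fn topological_waves(nodes: &[&str], edges: &[(&str, &str)]) -> Vec<Vec<String>> {
    // Count incoming edges for each node (Kahn's algorithm).
    let mut indegree: HashMap<&str, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    for &(_from, to) in edges {
        *indegree.get_mut(to).expect("edge references unknown node") += 1;
    }

    let mut waves: Vec<Vec<String>> = Vec::new();
    let mut remaining: Vec<&str> = nodes.to_vec();
    while !remaining.is_empty() {
        // Every node whose dependencies are all satisfied is ready now.
        let wave: Vec<&str> = remaining
            .iter()
            .copied()
            .filter(|n| indegree[n] == 0)
            .collect();
        assert!(!wave.is_empty(), "cycle detected: input is not a DAG");

        // "Finishing" a wave unblocks its successors.
        for &done in &wave {
            for &(from, to) in edges {
                if from == done {
                    *indegree.get_mut(to).unwrap() -= 1;
                }
            }
        }
        remaining.retain(|n| !wave.contains(n));
        waves.push(wave.into_iter().map(String::from).collect());
    }
    waves
}

fn main() {
    // fetch feeds both extract and screenshot, so those two share wave 1.
    let waves = topological_waves(
        &["fetch", "extract", "screenshot"],
        &[("fetch", "extract"), ("fetch", "screenshot")],
    );
    assert_eq!(waves[0], vec!["fetch"]);
    assert_eq!(waves[1].len(), 2);
    println!("{waves:?}");
}
```

In the real engine each wave's nodes are dispatched to the async runtime together; the sketch only shows the scheduling order, not the concurrency machinery.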

### Minimum supported Rust version

1.94.0 (Rust 2024 edition). Requires only the stable toolchain.


## Installation

Add both crates to `Cargo.toml`:

```toml
[dependencies]
stygian-graph   = "0.2"
stygian-browser = "0.2"   # optional — only needed for JS-rendered pages
tokio           = { version = "1", features = ["full"] }
serde_json      = "1"
```

Enable optional feature groups on `stygian-graph`:

```toml
stygian-graph = { version = "0.2", features = ["browser", "ai-claude", "distributed"] }
```

Available features:

| Feature | Includes |
|---|---|
| `browser` | `BrowserAdapter` backed by `stygian-browser` |
| `ai-claude` | Anthropic Claude adapter |
| `ai-openai` | OpenAI adapter |
| `ai-gemini` | Google Gemini adapter |
| `ai-copilot` | GitHub Copilot adapter |
| `ai-ollama` | Ollama (local) adapter |
| `distributed` | Redis/Valkey work queue adapter |
| `metrics` | Prometheus metrics export |

## Quick start — scraping pipeline

```rust
use stygian_graph::Pipeline;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = json!({
        "nodes": [
            {"id": "fetch",   "service": "http"},
            {"id": "extract", "service": "ai_claude"}
        ],
        "edges": [{"from": "fetch", "to": "extract"}]
    });

    let pipeline = Pipeline::from_config(config)?;
    let results  = pipeline.execute(json!({"url": "https://example.com"})).await?;

    println!("{}", serde_json::to_string_pretty(&results)?);
    Ok(())
}
```
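Pipelines can also be described in standalone config files like those shipped in `examples/`. The TOML below is an illustrative assumption that simply mirrors the JSON structure above — check the shipped examples for the exact schema:

```toml
# Hypothetical TOML equivalent of the JSON pipeline above.
[[nodes]]
id      = "fetch"
service = "http"

[[nodes]]
id      = "extract"
service = "ai_claude"

[[edges]]
from = "fetch"
to   = "extract"
```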

## Quick start — browser automation

```rust
use stygian_browser::{BrowserConfig, BrowserPool, WaitUntil};
use std::time::Duration;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pool   = BrowserPool::new(BrowserConfig::default()).await?;
    let handle = pool.acquire().await?;

    let mut page = handle.browser().new_page().await?;
    page.navigate(
        "https://example.com",
        WaitUntil::Selector("body".to_string()),
        Duration::from_secs(30),
    ).await?;

    println!("Title: {}", page.title().await?);
    handle.release().await;
    Ok(())
}
```
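`BrowserPool` follows the classic acquire/release pool pattern: browsers are expensive to launch, so finished handles go back to a free list instead of being torn down. As a rough mental model — a dependency-free sketch of the pattern, not the actual `stygian-browser` internals — a fixed-size pool can be built on a mutex-guarded free list:

```rust
use std::sync::{Arc, Mutex};

/// Minimal fixed-size resource pool: `acquire` checks a resource out,
/// `release` returns it for reuse by the next caller.
struct Pool<T> {
    free: Arc<Mutex<Vec<T>>>,
}

impl<T> Pool<T> {
    fn new(resources: Vec<T>) -> Self {
        Pool { free: Arc::new(Mutex::new(resources)) }
    }

    /// `None` when everything is checked out; a production pool would
    /// queue the caller (or launch another browser) instead of failing fast.
    fn acquire(&self) -> Option<T> {
        self.free.lock().unwrap().pop()
    }

    fn release(&self, resource: T) {
        self.free.lock().unwrap().push(resource);
    }
}

fn main() {
    let pool = Pool::new(vec!["browser-0", "browser-1"]);
    let first = pool.acquire().expect("one free");
    let _second = pool.acquire().expect("two free");
    assert!(pool.acquire().is_none()); // exhausted
    pool.release(first);
    assert!(pool.acquire().is_some()); // reusable after release
    println!("pool round-trip ok");
}
```

The real pool is async and adds health checks and stealth-profile management on top, but the checkout/return lifecycle shown by `acquire`/`release` is the core contract.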

## Repository layout

```text
stygian/
├── crates/
│   ├── stygian-graph/      # Scraping engine
│   └── stygian-browser/    # Browser automation
├── book/                   # This documentation (mdBook)
├── docs/                   # Architecture reference docs
├── examples/               # Example pipeline configs (.toml)
└── .github/workflows/      # CI, release, security, docs
```

Source, issues, and pull requests live at [github.com/greysquirr3l/stygian](https://github.com/greysquirr3l/stygian).


## Documentation

| Resource | URL |
|---|---|
| This guide | <https://greysquirr3l.github.io/stygian> |
| API reference (`stygian-graph`) | <https://greysquirr3l.github.io/stygian/api/stygian_graph> |
| API reference (`stygian-browser`) | <https://greysquirr3l.github.io/stygian/api/stygian_browser> |
| crates.io (`stygian-graph`) | <https://crates.io/crates/stygian-graph> |
| crates.io (`stygian-browser`) | <https://crates.io/crates/stygian-browser> |