Observability

stygian-graph ships with support for Prometheus metrics and structured tracing. Both are opt-in, and neither requires changes to your adapters or domain logic.


Prometheus metrics

Enable the metrics feature flag:

stygian-graph = { version = "0.1", features = ["metrics"] }

Creating a collector

use stygian_graph::application::MetricsCollector;

fn main() {
    let metrics = MetricsCollector::new();
}

MetricsCollector registers counters, histograms, and gauges on the global Prometheus registry automatically. It is Clone + Send + Sync and safe to share across threads.
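To illustrate what Clone + Send + Sync buys you, here is a minimal sketch using only the standard library. SharedCounter is a hypothetical stand-in, not the real MetricsCollector; it only shows the usual pattern where every clone points at the same underlying state, so worker threads can each hold their own handle.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a cloneable metrics handle (not the real MetricsCollector).
#[derive(Clone, Default)]
struct SharedCounter {
    // Clones share the same atomic, so every thread updates one value.
    requests: Arc<AtomicU64>,
}

impl SharedCounter {
    fn inc(&self) {
        self.requests.fetch_add(1, Ordering::Relaxed);
    }

    fn get(&self) -> u64 {
        self.requests.load(Ordering::Relaxed)
    }
}

/// Spawns `workers` threads that each record `per_worker` requests on their own clone.
fn record_from_threads(counter: &SharedCounter, workers: usize, per_worker: u64) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let local = counter.clone();
            thread::spawn(move || {
                for _ in 0..per_worker {
                    local.inc();
                }
            })
        })
        .collect();
    // Joining guarantees all increments are visible before we read the total.
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
}

fn main() {
    let counter = SharedCounter::default();
    record_from_threads(&counter, 4, 1_000);
    println!("total requests: {}", counter.get());
}
```

In practice you would clone the collector once per task or worker in the same way, rather than wrapping it in an extra lock.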

Exposing /metrics

Attach the Prometheus scrape handler to any HTTP server. Example with Axum:

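use axum::{routing::get, Router};
use stygian_graph::application::MetricsCollector;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let metrics = MetricsCollector::new();
    let handler = metrics.prometheus_handler();

    let app = Router::new()
        .route("/metrics", get(handler))
        .route("/health", get(|| async { "ok" }));

    // The bind address is only an example; use whatever suits your deployment.
    let listener = tokio::net::TcpListener::bind("0.0.0.0:9090").await?;
    axum::serve(listener, app).await?;
    Ok(())
}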

Available metrics

| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| stygian_requests_total | counter | service, status | Total requests per adapter |
| stygian_request_duration_seconds | histogram | service | Request latency distribution |
| stygian_errors_total | counter | service, error_kind | Errors by type |
| stygian_worker_pool_active | gauge | pool | Active workers |
| stygian_worker_pool_queued | gauge | pool | Queued tasks |
| stygian_circuit_breaker_state | gauge | service | 0=closed, 1=open, 2=half-open |
| stygian_cache_hits_total | counter | cache | Cache hits |
| stygian_cache_misses_total | counter | cache | Cache misses |
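For orientation, a scrape of /metrics returns samples in the Prometheus text exposition format, one `name{labels} value` line per series. The sketch below renders such lines with the standard library only, so you can see how the names and labels above appear on the wire. The render_sample helper, the label value status="ok", and the sample values are all illustrative assumptions, not output captured from stygian-graph.

```rust
/// Escapes a label value as the text exposition format requires
/// (backslash, double quote, and newline).
fn escape_label(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}

/// Renders one sample line: `name{label="value",...} value`.
/// Hypothetical helper for illustration; real exporters do this for you.
fn render_sample(name: &str, labels: &[(&str, &str)], value: f64) -> String {
    if labels.is_empty() {
        return format!("{name} {value}");
    }
    let rendered: Vec<String> = labels
        .iter()
        .map(|(key, val)| format!("{key}=\"{}\"", escape_label(val)))
        .collect();
    format!("{name}{{{}}} {value}", rendered.join(","))
}

fn main() {
    // Made-up values, shaped like the metrics in the table above.
    let lines = [
        render_sample(
            "stygian_requests_total",
            &[("service", "http"), ("status", "ok")],
            42.0,
        ),
        // 2 encodes a half-open circuit breaker.
        render_sample("stygian_circuit_breaker_state", &[("service", "http")], 2.0),
    ];
    for line in lines {
        println!("{line}");
    }
}
```

A real scrape also includes `# HELP` and `# TYPE` comment lines, and histograms expand into `_bucket`, `_sum`, and `_count` series.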

Structured tracing

stygian-graph instruments all hot paths with the tracing crate. Any compatible subscriber (JSON, OTLP, Jaeger) receives full span trees.

Basic JSON logging

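use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::registry()
        .with(
            // Start from RUST_LOG, then add per-crate directives.
            // EnvFilter needs tracing-subscriber's "env-filter" feature.
            EnvFilter::from_default_env()
                .add_directive("stygian_graph=debug".parse()?)
                .add_directive("stygian_browser=info".parse()?),
        )
        // One JSON object per event; needs the "json" feature.
        .with(tracing_subscriber::fmt::layer().json())
        .init();
    Ok(())
}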

Set RUST_LOG=stygian_graph=trace at runtime for full span output.

OpenTelemetry export (Jaeger / OTLP)

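[dependencies]
opentelemetry          = "0.22"
opentelemetry_sdk      = { version = "0.22", features = ["rt-tokio"] }
opentelemetry-otlp     = { version = "0.15", features = ["grpc-tonic"] }
tracing-opentelemetry  = "0.23"

use opentelemetry_otlp::WithExportConfig;
use tracing_opentelemetry::OpenTelemetryLayer;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Batch spans and send them over gRPC to a local OTLP collector.
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        // In the 0.22 release line the runtime types live in opentelemetry_sdk.
        .install_batch(opentelemetry_sdk::runtime::Tokio)?;

    tracing_subscriber::registry()
        .with(EnvFilter::from_default_env())
        .with(OpenTelemetryLayer::new(tracer))
        .init();
    Ok(())
}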

Key spans

| Span | Attributes | Emitted by |
| --- | --- | --- |
| dag_execute | pipeline_id, node_count, wave_count | DagExecutor |
| wave_execute | wave, node_ids[] | DagExecutor |
| service_call | service, url | ServiceRegistry |
| ai_extract | provider, model, tokens_in, tokens_out | AI adapters |
| cache_lookup | hit, key_prefix | Cache adapters |
| circuit_breaker | service, state_transition | CircuitBreakerImpl |

Health checks

MetricsCollector provides a health_check method that reports the state of every registered service as JSON:

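// Call from an async context. `metrics` is a MetricsCollector and
// `registry` is the registry your services are registered with.
let health_json = metrics.health_check(&registry).await;
// {"status":"ok","services":{"http":"healthy","ai_claude":"healthy"}}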

A service is reported as "degraded" when its circuit breaker is half-open, and "unhealthy" when it is open.
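The mapping between breaker state and health status can be sketched as follows. This is a standard-library illustration, not stygian-graph's internal code: BreakerState, from_gauge, and health are hypothetical names, and treating a closed breaker as "healthy" is an assumption inferred from the example JSON above. The numeric encoding matches the stygian_circuit_breaker_state gauge.

```rust
/// Circuit-breaker states, numbered as in the `stygian_circuit_breaker_state` gauge.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BreakerState {
    Closed,
    Open,
    HalfOpen,
}

impl BreakerState {
    /// Decodes a gauge sample: 0 = closed, 1 = open, 2 = half-open.
    fn from_gauge(value: u8) -> Option<Self> {
        match value {
            0 => Some(Self::Closed),
            1 => Some(Self::Open),
            2 => Some(Self::HalfOpen),
            _ => None,
        }
    }

    /// Health label for this state. "healthy" for a closed breaker is an
    /// assumption; "degraded" and "unhealthy" follow the text above.
    fn health(self) -> &'static str {
        match self {
            Self::Closed => "healthy",
            Self::HalfOpen => "degraded",
            Self::Open => "unhealthy",
        }
    }
}

fn main() {
    for value in 0..=3u8 {
        match BreakerState::from_gauge(value) {
            Some(state) => println!("{value} -> {state:?} -> {}", state.health()),
            None => println!("{value} -> unknown state"),
        }
    }
}
```

A decoder like this can be handy when an alerting rule or dashboard reads the gauge and needs the same labels the health check reports.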