Monitoring Guide

This guide focuses on visibility: dashboards, metrics, and signals that help catch issues early.

Dashboard deployment

  • Local: run the dashboard on a developer workstation for quick checks.
  • Internal: deploy behind an auth proxy (SSO, basic auth, or IP allowlist).
  • Shared ops: place behind a reverse proxy with TLS termination and strict access controls.

Restrict control actions to operators and ensure TLS is terminated at the proxy or upstream load balancer.
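For the shared-ops option, a minimal reverse-proxy sketch is shown below, assuming nginx with the dashboard listening on localhost port 8080; the hostname, port, certificate paths, and allowlisted network are all illustrative, not Stem defaults:

```nginx
server {
    listen 443 ssl;
    server_name stem-dashboard.internal;

    ssl_certificate     /etc/nginx/tls/dashboard.crt;
    ssl_certificate_key /etc/nginx/tls/dashboard.key;

    # Restrict access to the ops network; everything else is rejected.
    allow 10.0.0.0/8;
    deny  all;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
    }
}
```

TLS terminates at the proxy here; pair this with SSO or basic auth at the same layer if an IP allowlist alone is too coarse.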

See the Dashboard guide for setup details.

Key indicators

  • Queue depth: growing queues indicate stalled workers or upstream spikes.
  • Queue latency: long gaps between enqueue time and start time point to saturation or routing issues.
  • Retry rate: a rise in retries usually signals transient dependency issues.
  • DLQ volume: poison pills or schema mismatches show up here.
  • Worker heartbeats: missing heartbeats point to crashed or partitioned workers.
  • Scheduler drift: drift spikes indicate schedule store or broker delays.
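Queue latency is the most direct saturation signal in the list above. A minimal sketch of how it can be derived and thresholded for alerting (the 30-second threshold is an illustrative default, not a Stem recommendation):

```python
from datetime import datetime, timedelta

def queue_latency(enqueued_at: datetime, started_at: datetime) -> timedelta:
    # Gap between enqueue time and execution start time.
    return started_at - enqueued_at

def is_saturated(latency: timedelta,
                 threshold: timedelta = timedelta(seconds=30)) -> bool:
    # Sustained latency above the threshold points to stalled workers
    # or an upstream enqueue spike outpacing worker capacity.
    return latency > threshold
```

Trend this value per queue rather than globally, since one saturated queue can hide behind healthy averages.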

Signals & metrics

  • Use lifecycle signals to emit structured events early.
  • Export metrics to your standard stack (OTLP, Prometheus, Datadog, etc.).
  • Correlate task IDs with traces/logs.
lib/observability.dart

void configureMetrics() {
  StemMetrics.instance.configure(exporters: [ConsoleMetricsExporter()]);
}

CLI probes

stem observe queues
stem observe workers
stem observe schedules
stem worker stats --json
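The `--json` output can feed ad-hoc health checks. A sketch of flagging stale heartbeats from a captured payload; the field names (`workers`, `active`, `heartbeat_age_s`) are hypothetical, so adapt them to the actual shape of `stem worker stats --json`:

```python
import json

# Hypothetical example payload for illustration only.
sample = '''
{"workers": [
  {"id": "w1", "active": 3, "heartbeat_age_s": 2},
  {"id": "w2", "active": 0, "heartbeat_age_s": 95}
]}
'''

stats = json.loads(sample)
# Flag workers whose last heartbeat is older than 60 seconds,
# a likely sign of a crashed or partitioned worker.
stale = [w["id"] for w in stats["workers"] if w["heartbeat_age_s"] > 60]
```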

Set STEM_SCHEDULE_STORE_URL before running stem observe schedules.
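For example, assuming a Postgres-backed schedule store (the connection string below is a placeholder, not a documented default):

```shell
# Placeholder connection string; substitute your actual schedule store URL.
export STEM_SCHEDULE_STORE_URL="postgres://stem:secret@localhost:5432/stem"
```

Then run stem observe schedules in the same shell session.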

Next steps