Monitoring Guide
This guide focuses on visibility: dashboards, metrics, and signals that help catch issues early.
Dashboard deployment
- Local: run the dashboard on a developer workstation for quick checks.
- Internal: deploy behind an auth proxy (SSO, basic auth, or IP allowlist).
- Shared ops: place behind a reverse proxy with TLS termination and strict access controls.
Restrict control actions to operators and ensure TLS is terminated at the proxy or upstream load balancer.
See the Dashboard guide for setup details.
Key indicators
- Queue depth: growing queues indicate stalled workers or upstream spikes.
- Queue latency: long gaps between enqueue time and start time point to saturation or routing issues.
- Retry rate: a rise in retries usually signals transient dependency issues.
- DLQ (dead-letter queue) volume: poison-pill messages and schema mismatches accumulate here.
- Worker heartbeats: missing heartbeats point to crashed or partitioned workers.
- Scheduler drift: drift spikes indicate schedule store or broker delays.
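The queue latency and queue depth indicators above reduce to simple calculations. The following is a minimal sketch in plain Dart that does not use any Stem API; `queueLatency` and `isDepthGrowing` are hypothetical helper names chosen for illustration, and the "strictly increasing" rule is only one possible definition of a growing queue.

```dart
/// Queue latency: the gap between when a task was enqueued and when a
/// worker started it. (Hypothetical helper, not part of Stem.)
Duration queueLatency(DateTime enqueuedAt, DateTime startedAt) =>
    startedAt.difference(enqueuedAt);

/// Returns true when every depth sample is strictly larger than the one
/// before it, i.e. the queue grew throughout the sampling window.
/// Fewer than two samples cannot show a trend, so the result is false.
bool isDepthGrowing(List<int> samples) {
  if (samples.length < 2) return false;
  for (var i = 1; i < samples.length; i++) {
    if (samples[i] <= samples[i - 1]) return false;
  }
  return true;
}
```

For example, a task enqueued at 12:00:00 and started at 12:00:45 has a queue latency of 45 seconds, and depth samples of 10, 25, 60 count as growing while 10, 25, 20 do not.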
Signals & metrics
- Use lifecycle signals to emit structured events early.
- Export metrics to your standard stack (OTLP, Prometheus, Datadog, etc.).
- Correlate task IDs with traces/logs.
Metrics exporters (lib/observability.dart):

    void configureMetrics() {
      // Print metrics to the console.
      StemMetrics.instance.configure(exporters: [ConsoleMetricsExporter()]);
    }

Signal hooks (lib/observability.dart):

    void registerSignals() {
      // Record the delay until the next attempt each time a task is retried.
      StemSignals.taskRetry.connect((payload, _) {
        metrics.recordRetry(delay: payload.nextRetryAt.difference(DateTime.now()));
      });
      // Mark the worker as alive whenever it reports a heartbeat.
      StemSignals.workerHeartbeat.connect((payload, _) {
        heartbeatGauge.set(1, tags: {'worker': payload.worker.id});
      });
    }

Queue depth gauge (lib/observability.dart):

    final queueDepthGauge = GaugeMetric();

    void recordQueueDepth(String queue, int depth) {
      queueDepthGauge.set(depth.toDouble(), tags: {'queue': queue});
    }

Env vars:

    export STEM_METRIC_EXPORTERS=otlp:http://localhost:4318/v1/metrics
    export STEM_OTLP_ENDPOINT=http://localhost:4318
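To correlate task IDs with traces and logs, as recommended in this section, one option is to emit a structured log line that carries both identifiers. The sketch below is plain Dart using only `dart:convert`; `taskLogLine` and the field names (`ts`, `event`, `task_id`, `trace_id`) are assumptions made for illustration, not a Stem API or a required schema.

```dart
import 'dart:convert';

/// Builds one JSON log line that carries the task ID alongside the trace ID,
/// so logs and traces can be joined on either field.
/// (Hypothetical helper and field names; not part of Stem.)
String taskLogLine({
  required String taskId,
  required String traceId,
  required String event,
  required DateTime at,
}) {
  return jsonEncode({
    'ts': at.toUtc().toIso8601String(), // UTC timestamp in ISO 8601 form
    'event': event,
    'task_id': taskId,
    'trace_id': traceId,
  });
}
```

A helper like this could be called from a signal handler, so that every retry or failure event is searchable by task ID in the log backend.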
CLI probes
CLI commands:

    stem observe queues
    stem observe workers
    stem observe schedules
    stem worker stats --json

Set STEM_SCHEDULE_STORE_URL before running stem observe schedules.

Dart (heartbeats), lib/observability_ops.dart:

    import 'dart:io';

    Future<void> listWorkerHeartbeats() async {
      final backend = await RedisResultBackend.connect(
        Platform.environment['STEM_RESULT_BACKEND_URL']!,
      );
      try {
        final heartbeats = await backend.listWorkerHeartbeats();
        for (final hb in heartbeats) {
          print('${hb.workerId} -> queues=${hb.queues} inflight=${hb.inflight}');
        }
      } finally {
        // Close the connection even if listing heartbeats throws.
        await backend.close();
      }
    }
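Once heartbeats are available, missing ones can be flagged by comparing each worker's last heartbeat time against a threshold. The following is a minimal sketch in plain Dart; `staleWorkers`, the `Map<String, DateTime>` input shape, and the 30-second default are assumptions for illustration and do not reflect the types returned by Stem.

```dart
/// Returns the IDs of workers whose most recent heartbeat is older than
/// [maxAge] relative to [now]. (Hypothetical helper; the map shape is an
/// assumption, not the type returned by Stem.)
List<String> staleWorkers(
  Map<String, DateTime> lastHeartbeat,
  DateTime now, {
  Duration maxAge = const Duration(seconds: 30),
}) {
  return [
    for (final entry in lastHeartbeat.entries)
      if (now.difference(entry.value) > maxAge) entry.key,
  ];
}
```

Passing `now` as a parameter, rather than calling `DateTime.now()` inside the function, keeps the check deterministic and easy to test.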