Documentation Prelude Collector 1.0.0

Output selection

How to choose between NATS, Prometheus, InfluxDB, Kafka, webhook, and file outputs — with decision criteria and recommended defaults.

Recommendation

Start with NATS as the canonical bus and Prometheus for operator-facing dashboards. Add InfluxDB or Kafka when you have a specific reason. Use webhook for integration glue and file for archive or replay, not for production queries.

Why this matters

The collector publishes every Snapshot to every enabled Output in parallel. There is no per-subscription routing and no per-Output queue, retry, or dead-letter queue (DLQ): the manager fans each batch out, waits for every Output to return, increments a per-backend success/failure counter, and moves on. A failed delivery is dropped.
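The delivery model described above can be sketched in a few lines. This is an illustration of the behavior, not the collector's actual code: the `fan_out` function, the `Send` callable type, and the counter objects are all hypothetical names chosen for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Snapshot = dict
# Hypothetical Output interface: a callable that raises when delivery fails.
Send = Callable[[List[Snapshot]], None]


def fan_out(batch: List[Snapshot], outputs: Dict[str, Send],
            success: Counter, failure: Counter) -> None:
    """Send one batch to every enabled Output in parallel and wait for all."""
    if not outputs:
        return
    with ThreadPoolExecutor(max_workers=len(outputs)) as pool:
        futures = {name: pool.submit(send, batch) for name, send in outputs.items()}
        for name, future in futures.items():
            try:
                future.result()  # blocks until this Output has returned
                success[name] += 1
            except Exception:
                # No queue, retry, or DLQ: the batch is dropped for this Output.
                failure[name] += 1
```

Because the loop waits on every future, one slow Output delays the whole batch, and a failing Output costs you that batch's records for that backend only.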

That makes "pick the right one" more important than it sounds: each Output has its own query model, retention story, and operational cost, and they pull your downstream architecture in different directions. Pick badly and you end up paying twice — once to operate the Output you do not really need, and once to migrate off it later. Pick a flaky receiver and you lose data without a safety net.

Decision criteria

Score each Output against the four questions that actually matter:

  1. Query model — How will downstream consumers read the data? PromQL, Flux/SQL, raw events, or "shipped to another team's system"?
  2. Retention — How long do you need the data, and who pays for storing it?
  3. Scale — How many Snapshots per second sustained? How bursty?
  4. Ops cost — Who runs the backend, and how much do they want to run another one?

At a glance

| Output | Best for | Query model | Retention story | Ops cost |
|---|---|---|---|---|
| NATS | Fan-out, integration with the rest of your platform | Subject subscribe | None (bus, not store) | Low |
| Prometheus | Operator dashboards, alerting on rates and percentiles | PromQL | Days to weeks (TSDB) | Medium |
| InfluxDB | Long-horizon time series, ad-hoc Flux/SQL | Flux / InfluxQL | Months to years | Medium |
| Kafka | High-throughput streaming pipelines, multi-team consumers | Consumer groups | Configurable, days+ | High |
| Webhook | Glue: ticketing, paging, custom apps | None (push events) | Wherever the receiver puts it | Low |
| File | Archive, replay, offline analysis, audit | None (read on disk) | As long as you keep the volume | Very low |

How to choose

NATS — the default bus

NATS is the recommended bus for streaming Snapshots: configure the NATS Output (PUT /api/v1/outputs/nats) and every record lands on collector.data.{model-name}.{device-id}. Treat it as the canonical fan-out point: every other consumer (your own services, dashboards, automation) hangs off NATS rather than off the collector directly, which keeps the collector decoupled from changes in your downstream stack.
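To make the subject layout concrete, here is a minimal sketch of how a consumer might reason about it. The two helper functions are hypothetical and use no NATS client library; the matcher follows the standard NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). It assumes model names and device IDs contain no `.` characters, which the source does not state.

```python
def snapshot_subject(model_name: str, device_id: str) -> str:
    """Build the subject a Snapshot is published on.

    Assumption: neither value contains ".", the NATS token separator.
    """
    return f"collector.data.{model_name}.{device_id}"


def subject_matches(pattern: str, subject: str) -> bool:
    """Return True if a subscription pattern would receive the subject."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # ">" is only valid as the last token and needs at least one token.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens) or (p != "*" and p != s_tokens[i]):
            return False
    return len(p_tokens) == len(s_tokens)
```

With this layout, a subscription to `collector.data.>` sees every record, `collector.data.interfaces.*` sees one model across all devices, and `collector.data.*.router-1` sees every model for one device (the model and device names here are made up).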

The NATS Output uses a single NATS connection, which carries both exported data and the collector's internal signaling (ICMP reachability, OneBoard device-sync). Configure it once in the web UI under Output Settings → NATS; it is not enabled out of the box.

Caveat: NATS is a bus, not a store. The collector publishes to core NATS subjects (no JetStream stream config); replay is whatever the NATS server itself retains. Without server-side JetStream retention, missed messages are gone. Do not treat NATS as your historical archive.

Prometheus — operator dashboards

The collector exposes metrics on port 9090 at /metrics/collector by default. Metric names follow collector_{model}_{field} (no prelude_ prefix). For interface counters, queue depths, and similar high-cardinality operational data, scraping into a Prometheus you already run is the shortest path to a useful Grafana panel. PromQL is the right tool for "what is the 95th percentile of egress error rate over the last hour?"
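As a small illustration of the naming scheme and the default endpoint, the sketch below builds a metric name and a scrape URL. Both functions are hypothetical helpers. Replacing characters that Prometheus does not allow in metric names with `_` is an assumption about how model and field names are normalized, not documented collector behavior.

```python
import re


def metric_name(model: str, field: str) -> str:
    """Build a name following collector_{model}_{field}."""
    raw = f"collector_{model}_{field}"
    # Assumption: anything outside [a-zA-Z0-9_:] is replaced with "_".
    return re.sub(r"[^a-zA-Z0-9_:]", "_", raw)


def scrape_url(host: str, port: int = 9090,
               path: str = "/metrics/collector") -> str:
    """URL an existing Prometheus would scrape, using the documented defaults."""
    return f"http://{host}:{port}{path}"
```

Knowing the name in advance lets you write dashboards and alert rules before the first scrape lands; confirm the exact names against the live endpoint before relying on them.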

Caveat: Prometheus retention is intentionally short and TSDB cardinality is the thing you will hit first. Plan retention and label hygiene up front, not when the alerts about Prometheus itself start firing.

InfluxDB — long-horizon time series

Reach for InfluxDB when you need months or years of history that Prometheus is not built to keep, or when your team prefers Flux/SQL. It is also the easier path when you want to mix high-frequency counter data with annotation events in the same query.

The InfluxDB Output uses the v2 client's non-blocking write API and batches points internally. Tune batch-size (records per flush) and flush-interval (max wait in ms) on the backend config — leave both unset to use the client defaults. Async write errors are logged but don't increment the backend's per-batch failures counter, so treat the collector log channel as the primary signal for InfluxDB trouble.
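The interaction between batch-size and flush-interval is easier to tune once you can see it. The class below is a simplified model of size-or-time batching, written for illustration only; it is not the InfluxDB v2 client's implementation, and `PointBatcher`, `add`, and `tick` are hypothetical names. Time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, List, Optional


class PointBatcher:
    """Flush when batch_size points are buffered or flush_interval_ms has passed."""

    def __init__(self, batch_size: int, flush_interval_ms: int,
                 write: Callable[[List[dict]], None]) -> None:
        self.batch_size = batch_size
        self.flush_interval_ms = flush_interval_ms
        self.write = write
        self._buffer: List[dict] = []
        self._first_at_ms: Optional[int] = None

    def add(self, point: dict, now_ms: int) -> None:
        if not self._buffer:
            self._first_at_ms = now_ms  # start the timer on the first point
        self._buffer.append(point)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def tick(self, now_ms: int) -> None:
        """Call periodically; flushes a partial batch that has waited too long."""
        if self._buffer and now_ms - self._first_at_ms >= self.flush_interval_ms:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._first_at_ms = None
            self.write(batch)
```

A larger batch size means fewer writes under load, and the flush interval bounds how stale a point can get when traffic is light. Note that `add` returns before the write outcome is known to the caller, which mirrors why async write errors surface in the log rather than in the per-batch failures counter.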

Caveat: it is another database to run. If you already run Prometheus and "long horizon" means 90 days, scaling Prometheus storage is usually cheaper than adding InfluxDB.

Kafka — multi-team streaming

Kafka makes sense when more than one downstream team consumes the data, when you need durable replay, or when you are feeding a stream processor (Flink, Spark, your own consumers) that already speaks Kafka. Treat it as platform infrastructure, not as a collector detail.

Caveat: Kafka is the most expensive Output to operate. Do not add it just because you might need it later — add it when the second consumer team shows up.

Webhook — integration glue

Webhook is the right tool for "when this Snapshot looks like X, post to that ticketing system." It is push-based, fire-and-forget, and trivially testable. Pair it with upstream filters so you do not POST every Snapshot to a bug tracker.

Two delivery modes are available via the batch-mode config flag: individual mode (one HTTP request per record — easy mental model, easy receivers) and batch mode (one request per collection cycle, body is a JSON array — fewer round-trips, but a single failed request counts every record in the batch as failed). Pick batch mode for high-throughput receivers; individual for ticket-style "one event per request" integrations.
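The difference between the two modes, including how failures are counted, can be sketched as follows. The function names are hypothetical, and encoding an individual request's body as a single JSON object is an assumption; the source only specifies that batch mode sends a JSON array. Skipping the request entirely for an empty cycle is also an assumption.

```python
import json
from typing import List, Tuple


def build_requests(records: List[dict], batch_mode: bool) -> List[Tuple[str, int]]:
    """Return one (body, record_count) pair per HTTP request for a cycle."""
    if not records:
        return []  # assumption: nothing is sent for an empty cycle
    if batch_mode:
        # One request per collection cycle; the body is a JSON array.
        return [(json.dumps(records), len(records))]
    # One request per record.
    return [(json.dumps(record), 1) for record in records]


def failed_records(requests: List[Tuple[str, int]], request_ok: List[bool]) -> int:
    """Count records lost, given the success flag of each request."""
    return sum(count for (_, count), ok in zip(requests, request_ok) if not ok)
```

For a three-record cycle, one failed request loses one record in individual mode and all three in batch mode, which is the trade-off to weigh against the saved round-trips.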

Caveat: webhook receivers vary wildly in throughput. The collector fans out to every Output in parallel per batch, but there is no per-Output queue, retry, or DLQ — a slow or failing webhook doesn't back up an internal queue, it simply drops records (and adds latency to that batch's wait, since the manager waits for all Outputs to return). Track failures on /api/v1/outputs/metrics and put a buffer in front of the receiver itself when the receiver can't keep up.
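Since dropped records are only visible through the per-backend counters, it helps to turn those counters into a simple health check. The sketch below assumes you have already extracted success and failure counts per backend into a dictionary; the shape of that dictionary, the function name, and the 1% threshold are all hypothetical, and the actual response schema of /api/v1/outputs/metrics is not shown here.

```python
from typing import Dict, List


def unhealthy_backends(counters: Dict[str, Dict[str, int]],
                       max_failure_ratio: float = 0.01) -> List[str]:
    """Return backends whose failure ratio exceeds the threshold.

    `counters` maps a backend name to {"success": n, "failure": m}.
    This shape is an assumption made for the example.
    """
    flagged = []
    for name, c in sorted(counters.items()):
        total = c["success"] + c["failure"]
        if total and c["failure"] / total > max_failure_ratio:
            flagged.append(name)
    return flagged
```

Run a check like this on a schedule and alert on the result; every flagged backend represents records that were not retried.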

File — archive and replay

File output is the cheapest possible long-term archive: write Snapshots to a mounted volume and let your existing backup tooling handle the rest. It is also the easiest way to replay a known good sequence into a test pipeline.

Caveat: there is no query language. Files are inputs to other systems, not a system of record by themselves.

Recommended defaults

For a new deployment that has not yet decided what it wants:

  • Enable NATS. It is already plumbed and gives every future consumer a place to attach.
  • Enable Prometheus on its default port (9090) and path (/metrics/collector), and point an existing Prometheus at it. You will want a dashboard within a week.
  • Enable file to a retained volume. Cheap insurance, useful for replay during outages.
  • Leave InfluxDB, Kafka, and webhook off until a specific need shows up.

Trade-offs

What you give up by following the defaults:

  • One source of truth. With multiple Outputs enabled, two teams can disagree about a number because they queried different backends. Standardize on which Output answers which kind of question.
  • Effort spent on retention you may not need. Prometheus and InfluxDB both want retention policies; without them you eventually fill the disk. Treat retention as a Day 1 decision.
  • Webhook reliability. Webhook is the easiest Output to enable and the easiest one to overload a receiver with. There is no collector-side retry or buffer, so a flaky receiver loses records. Expect to put a queue, proxy, or other small intermediary that absorbs bursts in front of the receiver before calling it "production."

When to deviate

  • You already have a streaming platform. If Kafka is the organization's standard, send to Kafka first and let other teams consume from there. Skip the NATS-as-bus pattern.
  • You do not run Prometheus and do not want to. Use the collector's metrics endpoint for self-monitoring and send operational data to InfluxDB or to a vendor's hosted TSDB through webhook. Do not stand up Prometheus just to follow the recommendation.
  • You are in an air-gapped or compliance-bound environment. File output may be the only safe target. Lean on it; size the volume generously; back it up.
  • You are doing event-shaped work, not metrics. SNMP traps, syslog-derived events, alarm state changes — these belong on a bus (NATS or Kafka), not in a TSDB. Time-series tools handle event sparsity badly.