Documentation Prelude Collector 1.0.0

Data flow

An end-to-end walkthrough of a single sample as it travels from device to output backend.

This page follows a single value as it moves through Prelude Collector, from the moment the collector opens a session to a device to the moment a snapshot lands in your downstream backend. Read it once to build a mental model of how the stages line up. After that, you should be able to predict where to look when something does not behave the way you expected.

If you have not already done so, skim the Architecture overview and the Glossary first. This page assumes you know what a device, a model, a mapping, a transform, a snapshot, and an output are.

The pipeline at a glance

flowchart LR
    Device["Device<br/>(router, switch, ...)"] -->|protocol session| Collector["Prelude Collector<br/>(subscription worker)"]
    Collector -->|raw response| Mapping["Mapping<br/>(path -> field)"]
    Mapping -->|field values| Model["Model instance"]
    Model -->|per-field functions| Transform["Transforms<br/>(built-in + Starlark)"]
    Transform -->|finalized values| Snapshot["Snapshot<br/>(model + metadata)"]
    Snapshot -->|fan-out| Output["Outputs<br/>(NATS, Prometheus,<br/>InfluxDB, Kafka,<br/>webhook, file)"]

The collector walks left to right on every collection cycle, once per enabled subscription. Each box is a stage you can configure and inspect on its own.

Stage 1 — Collection

Collection begins with a subscription waking up on its interval. The collector picks up the protocol record attached to the device, authenticates, and opens a session.

What happens in this stage depends on the protocol:

  • gNMI — the collector issues a Get or Subscribe request against the paths the model declares.
  • NETCONF — the collector sends <get> or <get-config> RPCs against the relevant subtrees.
  • SNMP — the collector walks or polls the OIDs the model declares, using the configured community or v3 credentials.
  • CLI — the collector logs in over SSH, issues the configured commands, and captures the output.

The output of this stage is a raw response. For gNMI that is structured protobuf, and for NETCONF it is XML. For SNMP it is a set of (OID, value) pairs. For CLI it is a block of text. None of this is useful to your downstream systems yet; that is what the next stage is for.
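To make the shapes of a raw response concrete, here is a minimal Python sketch. The RawResponse type, its field names, and the sample values are hypothetical and only illustrate the SNMP and CLI cases described above; they are not the collector's internal types.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical container, for illustration only.
@dataclass
class RawResponse:
    protocol: str  # "gnmi", "netconf", "snmp", or "cli"
    payload: Union[bytes, str, list[tuple[str, str]]]

# SNMP: a set of (OID, value) pairs.
snmp_raw = RawResponse(
    protocol="snmp",
    payload=[
        ("1.3.6.1.2.1.31.1.1.1.1.1", "eth0"),  # ifName, row 1
        ("1.3.6.1.2.1.31.1.1.1.6.1", "1024"),  # ifHCInOctets, row 1
    ],
)

# CLI: a block of text, exactly as the device printed it.
cli_raw = RawResponse(protocol="cli", payload="eth0 is up, line protocol is up\n")
```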

If a session fails — wrong credentials, unreachable host, protocol mismatch — the collector records the failure against the subscription, surfaces it in the UI and API, and waits for the next interval. Transient failures do not stop the collector; you keep collecting from every other device while one is down.
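The failure-isolation behavior can be sketched as a loop that records an error against the subscription instead of raising it. This is a simplified, sequential illustration: the Subscription record, run_cycle, and fake_collect are hypothetical names, and the real collector schedules subscriptions on their own intervals.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Subscription:  # hypothetical record, for illustration only
    device: str
    last_error: Optional[str] = None

def run_cycle(subs: list[Subscription], collect: Callable[[str], str]) -> dict[str, str]:
    """Collect from every subscription; a failure is recorded, never raised."""
    results: dict[str, str] = {}
    for sub in subs:
        try:
            results[sub.device] = collect(sub.device)
            sub.last_error = None
        except Exception as exc:       # e.g. wrong credentials, unreachable host
            sub.last_error = str(exc)  # kept on the subscription until the next interval
    return results

def fake_collect(device: str) -> str:  # stand-in for a real protocol session
    if device == "r2":
        raise ConnectionError("unreachable host")
    return f"raw response from {device}"

subs = [Subscription("r1"), Subscription("r2"), Subscription("r3")]
results = run_cycle(subs, fake_collect)
```

Here r1 and r3 still produce raw responses while r2 carries its error until a later cycle succeeds.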

Stage 2 — Mapping

Mapping is where raw bytes turn into named field values.

For each field in the model, the collector finds the matching mapping, evaluates its source expression against the raw response, and pulls out the value. The expression is protocol-specific:

  • For gNMI and NETCONF, it is a YANG-style path or XPath.
  • For SNMP, it is an OID (or a column in an SNMP table).
  • For CLI, it is a TTP (Template Text Parser) template.

Mappings include key extraction. When the model has a repeating field — for example, one entry per interface — the mapping declares which part of the response is the key (the interface name) and which part is the value (the counter). The collector emits one field instance per key.

By the end of this stage, every field in the model either has a raw value bound to it or is marked as missing. Missing fields are not an error by themselves; they simply do not appear in the resulting snapshot.
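Key extraction is easiest to see on an SNMP table. The sketch below assumes a hypothetical mapping that names one column OID as the key (ifName) and one as the value (ifHCInOctets). The map_table helper is illustrative and is not the collector's mapping syntax.

```python
# Column OIDs from IF-MIB; the mapping structure itself is an assumption.
IF_NAME = "1.3.6.1.2.1.31.1.1.1.1"          # key column: interface name
IF_HC_IN_OCTETS = "1.3.6.1.2.1.31.1.1.1.6"  # value column: inbound octet counter

def map_table(raw, key_col, value_col):
    """Return {key: raw value}, one entry per table row that has both columns."""
    keys, values = {}, {}
    for oid, value in raw:
        if oid.startswith(key_col + "."):
            keys[oid[len(key_col) + 1:]] = value      # row index -> key
        elif oid.startswith(value_col + "."):
            values[oid[len(value_col) + 1:]] = value  # row index -> value
    # A row with a key but no value is treated as missing and left out.
    return {keys[i]: values[i] for i in keys if i in values}

raw = [
    (IF_NAME + ".1", "eth0"),
    (IF_NAME + ".2", "eth1"),
    (IF_HC_IN_OCTETS + ".1", "1024"),
]
fields = map_table(raw, IF_NAME, IF_HC_IN_OCTETS)
```

In this example eth1 has no counter in the response, so it is simply absent from the result rather than reported as an error.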

Stage 3 — Transformation

Once a field has a raw value, the collector runs the transform chain configured on that field. A transform is a function: one value in, one value out. You can chain multiple transforms on the same field — the output of each becomes the input of the next.
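Chaining amounts to function composition applied left to right. A minimal sketch, assuming transforms are plain one-argument callables (run_chain and the example chain are hypothetical, not built-in transform names):

```python
from functools import reduce
from typing import Any, Callable

Transform = Callable[[Any], Any]

def run_chain(value: Any, chain: list[Transform]) -> Any:
    """Feed the output of each transform into the next, left to right."""
    return reduce(lambda acc, fn: fn(acc), chain, value)

# Hypothetical chain: coerce the raw string to an integer, then convert octets to bits.
chain = [int, lambda octets: octets * 8]
result = run_chain("1024", chain)
```

An empty chain returns the raw value unchanged, which matches a field with no transforms configured.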

Two kinds of transforms exist:

  • Built-in transforms — shipped with the collector. Use them for unit conversion, scale changes, string parsing, type coercion, bitfield decoding, OID-to-label lookups, and other common operations.
  • Custom transforms — Starlark scripts you author. Use them when the built-ins are not enough. Starlark is sandboxed and Python-like; scripts are short, easy to test in isolation, and hot-reloaded on save so you can iterate without restarting the collector.

This is also where vendor differences disappear.

If two vendors report the same metric on different scales — say, bytes versus megabytes — you give each device the appropriate transform on its mapping and the model sees a single, consistent value. Downstream consumers do not need to know which vendor produced any given snapshot.
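As an illustration of the bytes-versus-megabytes case, the sketch below attaches a different transform to each device so both produce the same unit. The device names, the transform_for lookup, and the decimal-megabyte assumption are all hypothetical. The function bodies use only constructs that Starlark also accepts, but how the collector invokes a custom transform is not shown here.

```python
# Hypothetical per-device transforms.
def identity(value):
    return value

def megabytes_to_bytes(value):
    return value * 1000 * 1000  # assumes decimal megabytes (10**6 bytes)

# Illustrative device names, each paired with the transform on its mapping.
transform_for = {
    "vendor-a-router": identity,            # already reports bytes
    "vendor-b-router": megabytes_to_bytes,  # reports megabytes
}

raw_values = {"vendor-a-router": 5000000, "vendor-b-router": 5}
normalized = {dev: transform_for[dev](v) for dev, v in raw_values.items()}
```

After the transforms run, both devices report the same value in bytes, so the model field has one consistent unit.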

Stage 4 — Snapshot

When every mapping has been resolved and every transform has run, the collector assembles a snapshot. The snapshot is the concrete instance of the model, populated with the finalized field values and tagged with the metadata your downstream systems need to make sense of it:

  • The device name and identifier.
  • The subscription that produced the snapshot.
  • The protocol that was used.
  • The timestamp at which collection started.
  • The duration of the collection cycle.
  • The model name and version.

Snapshots are immutable. Each collection cycle produces a new snapshot — the collector does not edit previous ones in place. This keeps your downstream systems simple: every event they receive is a complete, self-describing data point.
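One way to picture an immutable snapshot is a frozen dataclass that carries the metadata listed above. The Snapshot layout, the field names, and the sample values are assumptions for illustration. The actual wire format depends on the output.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timedelta, timezone
from types import MappingProxyType
from typing import Any, Mapping

@dataclass(frozen=True)
class Snapshot:  # hypothetical layout; field names are illustrative
    device_name: str
    device_id: str
    subscription: str
    protocol: str
    started_at: datetime   # when collection started
    duration: timedelta    # how long the collection cycle took
    model_name: str
    model_version: str
    fields: Mapping[str, Any]

snap = Snapshot(
    device_name="r1", device_id="dev-001", subscription="interfaces-30s",
    protocol="snmp",
    started_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    duration=timedelta(milliseconds=420),
    model_name="interface_counters", model_version="1",
    fields=MappingProxyType({"eth0": 8192}),  # read-only view of the field values
)

try:
    snap.protocol = "gnmi"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```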

Stage 5 — Output

Finally, the snapshot is published to every enabled output. Outputs run in parallel, and a slow output does not block fast ones. The collector supports several backends out of the box:

  • NATS — the default backend. Useful for fan-out to multiple consumers and for chaining the collector into the rest of your platform.
  • Prometheus — direct scrape endpoint or remote-write target, depending on how you configure it.
  • InfluxDB — line-protocol writes for time-series storage.
  • Kafka — for high-throughput streaming pipelines.
  • Webhook — HTTP POST against any URL you control. Use this to push into ticketing, observability, or custom systems.
  • File — write snapshots to disk for archive, replay, or offline analysis.

You can enable any combination at the same time. Adding or removing an output does not affect the others — change them through the API or the web UI and the collector picks up the change without a restart.
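The parallel fan-out can be sketched with asyncio: each output runs as its own task, so a slow one finishes late without holding up the rest. The output names and simulated latencies are hypothetical, and the collector's actual concurrency mechanism is not specified here.

```python
import asyncio

async def publish(name: str, delay: float, snapshot: dict, done: list[str]) -> None:
    await asyncio.sleep(delay)  # stand-in for a network or disk write
    done.append(name)           # record completion order

async def fan_out(snapshot: dict) -> list[str]:
    done: list[str] = []
    # Hypothetical outputs with simulated latencies, in seconds.
    outputs = [("kafka-slow", 0.2), ("nats", 0.01), ("file", 0.05)]
    # gather() runs all publishes concurrently and waits for every one to finish.
    await asyncio.gather(*(publish(n, d, snapshot, done) for n, d in outputs))
    return done

order = asyncio.run(fan_out({"eth0": 8192}))
```

The completion order reflects each output's own latency rather than its position in the list: the fast outputs finish first even though the slow one was started before them.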

What you control where

A short reference for "where does this knob live":

  • Which devices to monitor — Device records.
  • How to reach them — Protocol records on each device.
  • What to collect and how often — Subscriptions.
  • What the data looks like to your downstream systems — Models and their fields.
  • Where the data comes from in the protocol response — Mappings on each field.
  • How to clean up the values — Transforms (built-in or custom).
  • Where the data goes — Outputs.

Every one of these is editable at runtime. Most changes take effect on the next collection cycle of the affected subscription.

Next

You now know how a single value flows from a device to a backend. The fastest way to make this mental model stick is to run the pipeline yourself: head to the Tutorials and walk through a first collection end to end. From there, the API reference and the protocol-specific guides will fill in the details.
