Documentation Prelude Topology Engine 1.0.0

Health

How to probe Prelude TE for liveness and readiness with the /api/health endpoint, and where to look when something is not flowing.

Prelude TE exposes a dedicated health endpoint suitable for load balancers, container healthchecks, Kubernetes probes, and ops dashboards.

GET /api/health

A public JSON endpoint — no authentication required — that returns a snapshot of the engine and its subsystems.

curl -fsSL https://te.example.com/api/health

Response shape

{
  "status": "ok",
  "service": "prelude-te",
  "version": "1.2.3",
  "commit": "a1b2c3d",
  "uptime_seconds": 84213,
  "timestamp": "2026-05-26T08:31:14Z",
  "checks": {
    "database": { "status": "ok" },
    "bgp": {
      "status": "ok",
      "running": true,
      "peers_total": 4,
      "peers_established": 4
    },
    "topology": {
      "status": "ok",
      "running": true,
      "domains": 2,
      "nodes": 318,
      "links": 642
    },
    "outputs": {
      "status": "ok",
      "running": true,
      "total": 1,
      "connected": 1,
      "errors": 0
    },
    "licensing": {
      "status": "ok",
      "tier": "standard",
      "trial_state": "registered",
      "trial_days_remaining": 0
    }
  }
}
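
For scripts and dashboards, the same request can be made from Python's standard library. The following is a minimal sketch, not part of Prelude TE: the helper names fetch_health and check_statuses are hypothetical, the "unknown" placeholder is an illustration, and it assumes a 503 response still carries a JSON body.

```python
import json
import urllib.error
import urllib.request


def fetch_health(base_url: str, timeout: float = 5.0) -> tuple[int, dict]:
    """GET <base_url>/api/health and return (http_status, parsed_body)."""
    url = base_url.rstrip("/") + "/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # urllib raises on 503; assumption: the "down" response still has a JSON body.
        return err.code, json.load(err)


def check_statuses(payload: dict) -> dict[str, str]:
    """Map each entry under "checks" to its status string."""
    return {
        name: check.get("status", "unknown")  # "unknown" is a placeholder of this sketch
        for name, check in payload.get("checks", {}).items()
    }
```

Calling fetch_health("https://te.example.com") returns the HTTP code together with the parsed payload, so a caller can branch on the code and still inspect the per-check statuses.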

Global status

The top-level status aggregates the per-check statuses:

Value    | Meaning                                                                       | HTTP
ok       | Every subsystem is healthy.                                                   | 200
degraded | A non-critical check is unhealthy (e.g. no peer established, output errors). | 200
down     | The database is unreachable. The engine cannot serve traffic.                 | 503

The database is the only critical dependency: if its check is down, the global status is down and the endpoint returns HTTP 503. Any other unhealthy check yields degraded with HTTP 200.

This split is intentional: a Kubernetes liveness probe that restarts on 503 should not restart the engine just because BGP peers are flapping. Use down as the only "restart me" signal.
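
The aggregation rule described above can be sketched as a small function. This illustrates the documented behavior and is not the engine's actual code; the name aggregate_status is hypothetical, and treating any value other than "ok" as unhealthy is an assumption.

```python
def aggregate_status(check_statuses: dict[str, str]) -> tuple[str, int]:
    """Return (global_status, http_code) from a {check_name: status} map."""
    # The database is the only critical dependency.
    if check_statuses.get("database") == "down":
        return "down", 503
    # Assumption: any status other than "ok" counts as unhealthy.
    if any(status != "ok" for status in check_statuses.values()):
        return "degraded", 200
    return "ok", 200
```

Under this rule a flapping BGP session can only move the result between ok and degraded, both served with HTTP 200, so a probe that reacts to 503 fires only when the database check is down.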

Per-check details

Check     | Status becomes degraded when…
database  | down if the DB connection or ping fails. No degraded state.
bgp       | At least one peer is enabled but none are in established state, or BGP isn't running.
topology  | The topology manager is not running.
outputs   | One or more outputs are in error state, or the output manager is not running.
licensing | The built-in trial has expired and no license has been registered yet.

The bgp, topology, and outputs checks also expose live counters (peers, domains, nodes, links, connected outputs, errors) so you can read operational state without a second round-trip.
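
As an illustration of reading those counters from a single response, the sketch below flattens every integer field under checks into dotted keys. The function name live_counters and the dotted-key format are choices of this example, not part of the API.

```python
def live_counters(payload: dict) -> dict[str, int]:
    """Flatten integer fields under "checks" into {"<check>.<field>": value}."""
    counters = {}
    for name, check in payload.get("checks", {}).items():
        for key, value in check.items():
            # bool is a subclass of int in Python, so skip flags such as "running".
            if isinstance(value, int) and not isinstance(value, bool):
                counters[f"{name}.{key}"] = value
    return counters
```

Because it selects by type, this also picks up integer fields outside the three checks named above, such as licensing.trial_days_remaining; filter by check name if only the operational counters are wanted.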

Trimmed payload for liveness probes

For Kubernetes liveness probes that only need an up/down verdict, pass ?verbose=false to get a smaller payload:

curl -fsSL 'https://te.example.com/api/health?verbose=false'
{
  "status": "ok",
  "service": "prelude-te",
  "uptime_seconds": 84213,
  "timestamp": "2026-05-26T08:31:14Z"
}
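
Where an exec-style healthcheck is preferred over an HTTP probe, the up/down verdict can be turned into a process exit code. This is a sketch under assumptions: health_url and liveness_exit_code are hypothetical helpers, and the trimmed payload is assumed to keep its status field when the engine is down.

```python
from urllib.parse import urlencode


def health_url(base_url: str, verbose: bool = True) -> str:
    """Build the health URL, adding ?verbose=false for the trimmed payload."""
    url = base_url.rstrip("/") + "/api/health"
    if not verbose:
        url += "?" + urlencode({"verbose": "false"})
    return url


def liveness_exit_code(payload: dict) -> int:
    """Return 1 only when the payload reports "down"; ok and degraded return 0."""
    return 1 if payload.get("status") == "down" else 0
```

Treating degraded as exit code 0 mirrors the HTTP behavior: only down should trigger a restart.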

Kubernetes probe example

livenessProbe:
  httpGet:
    path: /api/health?verbose=false
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

Both probes succeed on HTTP 200 (ok or degraded) and fail on HTTP 503 (down). Use the readiness probe's verbose payload in logs and dashboards to see why a pod is degraded without leaving the cluster.

Operational signals to watch

Beyond the endpoint itself, a working Prelude TE deployment shows three healthy signals at the same time:

  • At least one peer is established: checks.bgp.peers_established > 0, or open BGP → Peers in the web UI. A fleet where every peer is idle or stuck in connect / active means nothing is feeding the graph.
  • The topology is non-empty and recent: checks.topology.nodes > 0, or open Topology in the UI. Pair this with the per-domain last-update timestamp from GET /api/topology/stats for a liveness signal on the data pipeline.
  • Enabled outputs are connected: checks.outputs.connected == checks.outputs.total and checks.outputs.errors == 0. Open Outputs to see each output's last-error when something is off.
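
The three signals above can be evaluated from a single verbose /api/health payload. The sketch below is one possible reading: the function name and result keys are hypothetical, missing fields are treated as 0, and the "recent" half of the topology signal is not covered because it needs the timestamps from GET /api/topology/stats.

```python
def healthy_signals(payload: dict) -> dict[str, bool]:
    """Evaluate the three operational signals from a verbose health payload."""
    checks = payload.get("checks", {})
    bgp = checks.get("bgp", {})
    topology = checks.get("topology", {})
    outputs = checks.get("outputs", {})
    return {
        # At least one BGP peer is established.
        "peer_established": bgp.get("peers_established", 0) > 0,
        # The topology graph holds at least one node.
        "topology_non_empty": topology.get("nodes", 0) > 0,
        # Every output is connected and none reports an error.
        "outputs_connected": (
            outputs.get("connected", 0) == outputs.get("total", 0)
            and outputs.get("errors", 0) == 0
        ),
    }
```

A deployment is in the expected state when all three values are True, for example all(healthy_signals(payload).values()).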

When something is wrong

Walk the pipeline from the source to the sink:

  1. Peer state — if checks.bgp.peers_established is 0 while peers are enabled, look at each peer's State history for the reason of the last failure. See Peers.
  2. Topology stats — if peers are up but checks.topology.nodes stays at 0, the router may not be exporting BGP-LS NLRIs. Confirm the BGP-LS AFI/SAFI is enabled on the peer side.
  3. Output state — if topology is populated but a downstream consumer is silent, check checks.outputs.errors and the output's last-error on the Outputs detail page. See Outputs / NATS.
  4. Logs — the engine writes per-module logs under storage/logs/ (e.g. prelude-te.log, access.log). Tail them when the UI and /api/health signals are not enough.
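
The walk above can be sketched as a function that returns the first stage worth inspecting. The name first_suspect and the returned labels are hypothetical, and the sketch does not distinguish enabled from disabled peers, because the payload shown on this page does not expose that detail.

```python
def first_suspect(payload: dict) -> str:
    """Return the first pipeline stage to inspect, from source to sink."""
    checks = payload.get("checks", {})
    bgp = checks.get("bgp", {})
    topology = checks.get("topology", {})
    outputs = checks.get("outputs", {})

    # Step 1: no established peer means nothing feeds the graph.
    if bgp.get("peers_established", 0) == 0:
        return "peers"
    # Step 2: peers are up but the graph is empty.
    if topology.get("nodes", 0) == 0:
        return "topology"
    # Step 3: the graph is populated but an output is failing or disconnected.
    if outputs.get("errors", 0) > 0 or outputs.get("connected", 0) < outputs.get("total", 0):
        return "outputs"
    # Step 4: the health payload looks clean; fall back to the logs.
    return "logs"
```

The order of the checks matters: a failure upstream explains silence downstream, so the function stops at the first stage that looks wrong.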

For Prometheus-scrapable metrics — peer state, session counters, topology counts, change rates — see Metrics.

See Support for how to escalate.
