
metadata

Attaches Expanso runtime metadata — pipeline IDs, node info, runtime counters, and custom fields — to every message that flows through it. Use it to declaratively stamp events with provenance, lineage, or operational context without writing Bloblang.

```yaml
pipeline:
  processors:
    - metadata:
        include: [core, orchestrator, node, pipeline]
        custom:
          pipeline_owner: data-team@example.com  # placeholder address
          pii_redaction: enabled
        target: body
        format: nested
        body_key: lineage
```

When to Use

Use the metadata processor when you need to:

  • Stamp events with provenance — attach run_id, pipeline_name, pipeline_version, and node_id to every message for lineage tooling or downstream auditing.
  • Tag output by node identity — write the node's region, environment, and cluster_name onto messages so downstream systems can route or partition by location.
  • Attach static custom fields — add owner emails, compliance tiers, or environment labels declaratively, instead of writing a mapping block.
  • Surface runtime counters — opt in to the runtime category to splice records_in, bytes_out, error_count, and duration_ms into the body for observability sinks.

Don't use this if:

  • You only need to reference a single key in an interpolated field — use the implicit @pipeline_id / @node_id keys directly. See the pipeline metadata guide.
  • You need conditional or computed metadata — use mapping and write meta foo = ... directly.
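
For the conditional case, a mapping step along the lines of the sketch below is the usual alternative. It assumes the message body is a JSON object; severity and priority are hypothetical field names chosen only for illustration.

```yaml
pipeline:
  processors:
    - mapping: |
        # Keep the body unchanged; only metadata is added.
        root = this
        # Hypothetical rule: derive a metadata value from a body field.
        meta priority = if this.severity == "critical" { "high" } else { "normal" }
```

Downstream, @priority can then be read like any other metadata key.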

Configuration

```yaml
# Common config fields, showing default values
metadata:
  include: [core, orchestrator, node, pipeline] # categories of fields to attach
  custom: {}                                    # extra static key/value pairs
  target: meta                                  # "meta" or "body"
```

Fields

| Field | Type | Default | Description |
|---|---|---|---|
| include | string list | [core, orchestrator, node, pipeline] | Categories of metadata fields to resolve. Valid values: core, orchestrator, node, pipeline, runtime. runtime is not in the default; opt in explicitly. |
| exclude | string list | [] | Specific field names (or category names) to omit from the resolved set. |
| custom | string→string map | {} | User-supplied key/value pairs to attach. Keys must not collide with reserved field names. |
| target | string | meta | Where to write the fields. meta writes to message metadata; body splices them into the JSON message body. |
| format | string | flat | When target: body, flat merges fields at the JSON root and nested places them under body_key. Ignored when target: meta. |
| body_key | string | "" | Required when target: body and format: nested. Must be empty otherwise. |
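
For reference, the defaults from the table collected into one block look like this. It only restates documented defaults; with target: meta, the format setting is ignored and body_key stays empty.

```yaml
# All config fields, showing default values
metadata:
  include: [core, orchestrator, node, pipeline]  # add runtime here to opt in to counters
  exclude: []       # field or category names to omit
  custom: {}        # extra static string -> string pairs
  target: meta      # "meta" or "body"
  format: flat      # "flat" or "nested"; ignored when target is meta
  body_key: ""      # set only when target is body and format is nested
```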

Field Categories

Each category in include resolves to a fixed set of keys.

core — OpenLineage-aligned identity

| Key | Description |
|---|---|
| run_id | The unique identifier for this pipeline execution |
| job_name | The pipeline name |
| job_namespace | The execution namespace |
| event_time | Current time, RFC3339Nano in UTC, recomputed per message |
| producer | URL identifying the Expanso edge agent and its version (e.g., https://expanso.io/edge/v1.2.3) |

These field names are intentionally aligned with the OpenLineage spec. Expanso Edge is moving toward emitting OpenLineage events natively, so attaching these fields now means your messages are already shaped for lineage tooling.

orchestrator — orchestrator-populated context

| Key | Description |
|---|---|
| job_id | The pipeline (job) identifier |
| deployment_id | The deployment identifier (reserved; currently empty) |
| namespace | The execution namespace |
| eval_id | The evaluation identifier |
| rollout_wave | The rollout wave for staged deployments |

node — infrastructure context

| Key | Description |
|---|---|
| node_id | The edge node running the pipeline |
| hostname | The node's hostname |
| region | From the node's region label |
| environment | From the node's environment label |
| cluster_name | From the node's cluster_name label |
| agent_version | The edge agent version |

Only the region, environment, and cluster_name labels are promoted by this category. Arbitrary node labels remain available via the @node_label_* keys — see node labels in the metadata guide.

pipeline — pipeline provenance

| Key | Description |
|---|---|
| pipeline_name | The pipeline name |
| pipeline_version | The pipeline version |
| git_commit_sha | Reserved; populated when the pipeline is built from a git source |
| git_repo_url | Reserved; populated when the pipeline is built from a git source |
| git_branch | Reserved; populated when the pipeline is built from a git source |

The three git_* fields are placeholders: they are emitted as empty strings until pipelines are built from a git source.

runtime — per-execution counters

| Key | Description |
|---|---|
| start_time | When this pipeline execution started, RFC3339Nano in UTC |
| records_in | Messages observed by this processor since the execution started |
| records_out | Messages successfully written by this processor |
| bytes_in | Bytes observed by this processor |
| bytes_out | Bytes successfully written by this processor |
| error_count | Body-write failures recorded by this processor |
| duration_ms | Milliseconds since start_time |

The runtime category is not included by default — list it in include to opt in. Counter values reflect the processor's own observations of the message stream up to and including the current message.
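
As a sketch of opting in, the configuration below adds runtime to include and places the counters, together with the core fields, under a nested key in the body. The key name stats is a hypothetical choice, and the message body must be a JSON object for the write to succeed.

```yaml
pipeline:
  processors:
    - metadata:
        include: [core, runtime]  # runtime must be listed explicitly
        target: body
        format: nested
        body_key: stats           # hypothetical key name
```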

Targets and Formats

The target and format fields together control where resolved metadata is written:

| target | format | body_key | Result |
|---|---|---|---|
| meta | (ignored) | must be empty | Each resolved key becomes a Bento message metadata entry, accessible via @key_name and ${! metadata("key_name") }. |
| body | flat | must be empty | Resolved keys are merged at the JSON body's root. Metadata keys overwrite body keys on collision. |
| body | nested | required | Resolved keys are placed under body[body_key], leaving the rest of the body untouched. |

When target: body, the body must be a JSON object. Non-JSON bodies, JSON arrays, and JSON scalars are passed through unchanged; the processor logs a warning and increments error_count (visible via the runtime category).
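
A minimal flat-format sketch, using only the documented fields, writes the node category straight into the body root:

```yaml
pipeline:
  processors:
    - metadata:
        include: [node]
        target: body
        format: flat  # merge at the JSON root; body_key must stay empty
```

With this configuration, an incoming body that already has a region key has it overwritten by the node's region, and a body that is not a JSON object passes through unchanged while error_count is incremented.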

Custom Fields

The custom: map accepts string→string pairs that are attached alongside the resolved category fields. Custom keys must not collide with reserved field names.

```yaml
metadata:
  include: [core]
  custom:
    pipeline_owner: data-team@example.com  # placeholder address
    compliance_tier: pii
    cost_center: data-platform
```

To drop a built-in field while keeping the rest of its category, list it in exclude:

```yaml
metadata:
  include: [pipeline]
  exclude: [git_commit_sha, git_repo_url, git_branch]
```

exclude accepts either field names or category names. Only include is checked against the closed set of categories at submission time, so unknown entries in exclude pass silently.
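
For example, a sketch that keeps the default categories but drops one whole category and one individual field could look like this:

```yaml
metadata:
  # Start from the default categories...
  include: [core, orchestrator, node, pipeline]
  # ...then drop the orchestrator category and the hostname field.
  exclude: [orchestrator, hostname]
```

Because exclude is not validated, a misspelled entry such as hostnme is accepted at submission, and hostname is still emitted.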

Reserved Field Names

The following names cannot be used as custom keys:

run_id, job_name, job_namespace, event_time, producer,
job_id, deployment_id, namespace, eval_id, rollout_wave,
node_id, hostname, region, environment, cluster_name, agent_version,
pipeline_name, pipeline_version, git_commit_sha, git_repo_url, git_branch,
start_time, records_in, records_out, bytes_in, bytes_out, error_count, duration_ms,
pipeline_id, execution_id, job_version

A pipeline whose custom: map uses a reserved name is rejected at submission; see Validation Errors for the exact message and fix.

The internal _execution_id field is auto-populated by the runtime at pipeline build time. User-supplied values are rejected at submission.

Validation Errors

Pipeline submission validates the metadata processor's configuration. The errors you may see and how to fix each:

| Error | Fix |
|---|---|
| unknown category "X"; valid categories: [core orchestrator node pipeline runtime] | Replace X in include with one of the listed categories. |
| body_key cannot be set when target is "meta" | Remove body_key, or change target to body. |
| body_key is only meaningful with format: "nested" | Remove body_key for format: flat, or change format to nested. |
| body_key is required when target: "body" and format: "nested" | Set body_key to the JSON object key the metadata should be placed under. |
| unknown format "X"; must be "flat" or "nested" | Use flat or nested. |
| unknown target "X"; must be "meta" or "body" | Use meta or body. |
| custom key "X" collides with a reserved field name | Rename the custom key. If you only want the built-in field gone, drop it with exclude; note that exclude only removes the field and does not let you supply your own value for it. |
| _execution_id is internal and auto-populated by the Expanso runtime; remove it from your config | Remove the _execution_id entry from the processor's config. |
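
As an illustration of the body_key rule, the first (commented-out) configuration below would be rejected at submission, and the second is the corrected form. The key name lineage matches the opening example.

```yaml
# Rejected at submission:
#   metadata:
#     target: body
#     format: nested
#   -> body_key is required when target: "body" and format: "nested"

# Accepted: body_key names the object that receives the fields.
metadata:
  target: body
  format: nested
  body_key: lineage
```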

Examples

Attach the default categories to message metadata, then reference them in a downstream output:

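```yaml
pipeline:
  processors:
    - metadata: {}  # default categories, written to message metadata

output:
  kafka:
    addresses: [localhost:9092]
    topic: events.${! metadata("region") }
    metadata:
      exclude_prefixes: []  # forward all metadata keys as Kafka headers
```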

After the processor runs, every message carries metadata keys like @run_id, @job_name, @node_id, @region, etc. You can read them via metadata("...") in interpolation or @... in Bloblang.
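
A variant that splices provenance into the JSON body instead, which can help when the destination does not preserve message metadata, might look like the sketch below. stdout is used only to make the result easy to inspect, and the compliance_tier value is illustrative.

```yaml
pipeline:
  processors:
    - metadata:
        include: [core, pipeline]
        custom:
          compliance_tier: pii  # illustrative static value
        target: body
        format: nested
        body_key: lineage

output:
  stdout: {}
```

Each JSON object body then gains a lineage object holding the core and pipeline fields alongside compliance_tier, while its other keys are left untouched.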

See Also

  • Pipeline metadata guide: the implicit @pipeline_id, @node_id, @namespace, and @node_label_* keys available without this processor.
  • Bloblang guide: for conditional or computed metadata via mapping.
  • mapping processor: the imperative alternative for editing metadata.