metadata
Attaches Expanso runtime metadata — pipeline IDs, node info, runtime counters, and custom fields — to every message that flows through it. Use it to declaratively stamp events with provenance, lineage, or operational context without writing Bloblang.
pipeline:
  processors:
    - metadata:
        include: [core, orchestrator, node, pipeline]
        custom:
          pipeline_owner: "[email protected]"
          pii_redaction: enabled
        target: body
        format: nested
        body_key: lineage
When to Use
Use the metadata processor when you need to:
- Stamp events with provenance: attach run_id, pipeline_name, pipeline_version, and node_id to every message for lineage tooling or downstream auditing.
- Tag output by node identity: write the node's region, environment, and cluster_name onto messages so downstream systems can route or partition by location.
- Attach static custom fields: add owner emails, compliance tiers, or environment labels declaratively, instead of writing a mapping block.
- Surface runtime counters: opt in to the runtime category to splice records_in, bytes_out, error_count, and duration_ms into the body for observability sinks.
Don't use this if:
- You only need to reference a single key in an interpolated field: use the implicit @pipeline_id / @node_id keys directly. See the pipeline metadata guide.
- You need conditional or computed metadata: use mapping and write meta foo = ... directly.
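For the conditional case, a minimal sketch of the mapping alternative might look like the following. The level body field, the alert_tier key, and its two values are hypothetical and only illustrate the meta foo = ... form mentioned above.

```yaml
pipeline:
  processors:
    - mapping: |
        # Hypothetical rule: compute a metadata key from a body field.
        meta alert_tier = if this.level == "error" { "page" } else { "log" }
```

Downstream components can then read the value as @alert_tier, the same way they read keys written by this processor with target: meta.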
Configuration
# Common config fields, showing default values
metadata:
  include: [core, orchestrator, node, pipeline] # categories of fields to attach
  custom: {}                                    # extra static key/value pairs
  target: meta                                  # "meta" or "body"

# All config fields, showing default values
metadata:
  include: [core, orchestrator, node, pipeline]
  exclude: []      # specific field names to drop
  custom: {}
  target: meta     # "meta" or "body"
  format: flat     # "flat" or "nested" (only when target: body)
  body_key: ""     # required when target: body and format: nested
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| include | string list | [core, orchestrator, node, pipeline] | Categories of metadata fields to resolve. Valid values: core, orchestrator, node, pipeline, runtime. Note that runtime is not in the default set; opt in explicitly. |
| exclude | string list | [] | Specific field names (or category names) to omit from the resolved set. |
| custom | string→string map | {} | User-supplied key/value pairs to attach. Keys must not collide with reserved field names. |
| target | string | meta | Where to write the fields. meta writes to message metadata; body splices into the JSON message body. |
| format | string | flat | When target: body: flat merges fields at the JSON root; nested places them under body_key. Ignored when target: meta. |
| body_key | string | "" | Required when target: body and format: nested. Must be empty otherwise. |
Field Categories
Each category in include resolves to a fixed set of keys.
core — OpenLineage-aligned identity
| Key | Description |
|---|---|
| run_id | The unique identifier for this pipeline execution |
| job_name | The pipeline name |
| job_namespace | The execution namespace |
| event_time | Current time, RFC3339Nano in UTC, recomputed per message |
| producer | URL identifying the Expanso edge agent and its version (e.g., https://expanso.io/edge/v1.2.3) |
These field names are intentionally aligned with the OpenLineage spec. Expanso Edge is moving toward emitting OpenLineage events natively, so attaching these fields now means your messages are already shaped for lineage tooling.
orchestrator — orchestrator-populated context
| Key | Description |
|---|---|
| job_id | The pipeline (job) identifier |
| deployment_id | The deployment identifier (reserved; currently empty) |
| namespace | The execution namespace |
| eval_id | The evaluation identifier |
| rollout_wave | The rollout wave for staged deployments |
node — infrastructure context
| Key | Description |
|---|---|
| node_id | The edge node running the pipeline |
| hostname | The node's hostname |
| region | From the node's region label |
| environment | From the node's environment label |
| cluster_name | From the node's cluster_name label |
| agent_version | The edge agent version |
Only the region, environment, and cluster_name labels are promoted by this category. Arbitrary node labels remain available via the @node_label_* keys — see node labels in the metadata guide.
pipeline — pipeline provenance
| Key | Description |
|---|---|
| pipeline_name | The pipeline name |
| pipeline_version | The pipeline version |
| git_commit_sha | Reserved; populated when the pipeline is built from a git source |
| git_repo_url | Reserved; populated when the pipeline is built from a git source |
| git_branch | Reserved; populated when the pipeline is built from a git source |
The three git_* fields are reserved placeholders. They are emitted as empty strings until pipelines are built from a git source.
runtime — per-execution counters
| Key | Description |
|---|---|
| start_time | When this pipeline execution started, RFC3339Nano in UTC |
| records_in | Messages observed by this processor since the execution started |
| records_out | Messages successfully written by this processor |
| bytes_in | Bytes observed by this processor |
| bytes_out | Bytes successfully written by this processor |
| error_count | Body-write failures recorded by this processor |
| duration_ms | Milliseconds since start_time |
The runtime category is not included by default — list it in include to opt in. Counter values reflect the processor's own observations of the message stream up to and including the current message.
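As a sketch of opting in, the configuration below adds runtime to the default categories and nests the result under a body key so the counters travel with each event. The key name stats is an assumption chosen for illustration.

```yaml
pipeline:
  processors:
    - metadata:
        # runtime is not in the default include list, so name it explicitly.
        include: [core, orchestrator, node, pipeline, runtime]
        target: body
        format: nested
        body_key: stats   # hypothetical key name
```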
Targets and Formats
The target and format fields together control where resolved metadata is written:
| target | format | body_key | Result |
|---|---|---|---|
| meta | (ignored) | must be empty | Each resolved key becomes a Bento message metadata entry, accessible via @key_name and ${! metadata("key_name") }. |
| body | flat | must be empty | Resolved keys are merged at the JSON body's root. Metadata keys overwrite body keys on collision. |
| body | nested | required | Resolved keys are placed under body[body_key], leaving the rest of the body untouched. |
When target: body, the body must be a JSON object. Non-JSON bodies, JSON arrays, and JSON scalars are passed through unchanged; the processor logs a warning and increments error_count (visible via the runtime category).
Custom Fields
The custom: map accepts string→string pairs that are attached alongside the resolved category fields. Custom keys must not collide with reserved field names.
metadata:
  include: [core]
  custom:
    pipeline_owner: "[email protected]"
    compliance_tier: pii
    cost_center: data-platform
To drop a built-in field while keeping the rest of its category, list it in exclude:
metadata:
  include: [pipeline]
  exclude: [git_commit_sha, git_repo_url, git_branch]
exclude accepts either field names or category names. Only include is checked against the closed set of categories at submission time, so unknown entries in exclude pass silently.
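A small sketch mixing both forms, assuming a category name in exclude drops every field that category would otherwise contribute:

```yaml
metadata:
  include: [core, orchestrator, node, pipeline]
  # Drop the whole orchestrator category plus two individual node fields.
  exclude: [orchestrator, hostname, agent_version]
```

Because exclude entries are not validated, a misspelled field name would be accepted and simply match nothing, so it is worth checking spelling against the category tables.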
Reserved Field Names
The following names cannot be used as custom keys:
run_id, job_name, job_namespace, event_time, producer,
job_id, deployment_id, namespace, eval_id, rollout_wave,
node_id, hostname, region, environment, cluster_name, agent_version,
pipeline_name, pipeline_version, git_commit_sha, git_repo_url, git_branch,
start_time, records_in, records_out, bytes_in, bytes_out, error_count, duration_ms,
pipeline_id, execution_id, job_version
Submitting a pipeline with a colliding custom: key is rejected with a clear error message — see Validation Errors.
The internal _execution_id field is auto-populated by the runtime at pipeline build time. User-supplied values are rejected at submission.
Validation Errors
Pipeline submission validates the metadata processor's configuration. The errors you may see and how to fix each:
| Error | Fix |
|---|---|
| unknown category "X"; valid categories: [core orchestrator node pipeline runtime] | Replace X in include with one of the listed categories. |
| body_key cannot be set when target is "meta" | Remove body_key, or change target to body. |
| body_key is only meaningful with format: "nested" | Remove body_key for format: flat, or change format to nested. |
| body_key is required when target: "body" and format: "nested" | Set body_key to the JSON object key the metadata should be placed under. |
| unknown format "X"; must be "flat" or "nested" | Use flat or nested. |
| unknown target "X"; must be "meta" or "body" | Use meta or body. |
| custom key "X" collides with a reserved field name | Rename the key. To remove the built-in field from the output, list it in exclude; note that exclude only removes the field and does not let a custom key override it. |
| _execution_id is internal and auto-populated by the Expanso runtime; remove it from your config | Remove the _execution_id entry from the processor's config. |
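To make the body_key rules concrete, here is a sketch of two configurations that would be rejected according to the table above, followed by one that should pass. The key name lineage is illustrative, and each document is a separate alternative rather than one pipeline.

```yaml
# Rejected: body_key cannot be set when target is "meta"
metadata:
  target: meta
  body_key: lineage
---
# Rejected: body_key is required when target: "body" and format: "nested"
metadata:
  target: body
  format: nested
---
# Accepted: nested body output with an explicit key
metadata:
  target: body
  format: nested
  body_key: lineage
```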
Examples
Default to meta
Attach the default categories to message metadata, then reference them in a downstream output:
pipeline:
  processors:
    - metadata: {}
output:
  kafka:
    addresses: [localhost:9092]
    topic: events.${! metadata("region") }
    metadata:
      exclude_prefixes: []
After the processor runs, every message carries metadata keys like @run_id, @job_name, @node_id, @region, etc. You can read them via metadata("...") in interpolation or @... in Bloblang.
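The output above shows the interpolation form. For the Bloblang form, a minimal sketch could copy two of the attached keys into the body with a mapping processor; the origin field name is hypothetical.

```yaml
pipeline:
  processors:
    - metadata: {}
    - mapping: |
        root = this
        # origin is a hypothetical body field; @region and @node_id are set by the metadata processor.
        root.origin.region = @region
        root.origin.node_id = @node_id
```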
Body, nested
Splice metadata under a lineage key at the top level of the JSON body:
pipeline:
  processors:
    - metadata:
        include: [core, pipeline, node]
        target: body
        format: nested
        body_key: lineage
Given an incoming body of {"event_id": "abc123", "level": "info"}, the processor emits:
{
  "event_id": "abc123",
  "level": "info",
  "lineage": {
    "run_id": "01HF...",
    "job_name": "events-pipeline",
    "job_namespace": "default",
    "event_time": "2026-04-30T12:34:56.789Z",
    "producer": "https://expanso.io/edge/v1.4.2",
    "pipeline_name": "events-pipeline",
    "pipeline_version": "v3",
    "git_commit_sha": "",
    "git_repo_url": "",
    "git_branch": "",
    "node_id": "node-eu-west-1-a",
    "hostname": "edge-01",
    "region": "eu-west-1",
    "environment": "production",
    "cluster_name": "edge-eu",
    "agent_version": "1.4.2"
  }
}
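Once the fields live in the body, downstream Bloblang can address them by path. The sketch below assumes the lineage key from the example above and copies one nested field to a hypothetical top-level site field.

```yaml
pipeline:
  processors:
    - metadata:
        include: [core, pipeline, node]
        target: body
        format: nested
        body_key: lineage
    - mapping: |
        root = this
        # site is a hypothetical field name; lineage.region comes from the node category.
        root.site = this.lineage.region
```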
Body, flat
Merge metadata at the body's root. Use this when downstream consumers expect a flat JSON object:
pipeline:
  processors:
    - metadata:
        include: [pipeline, node]
        exclude: [git_commit_sha, git_repo_url, git_branch, hostname, agent_version]
        target: body
        format: flat
If the incoming body has a key that collides with a metadata field, the metadata value wins. For example, an incoming body of {"region": "us-east-1", "user_id": 42} becomes {"region": "eu-west-1", "user_id": 42, "node_id": "...", "pipeline_name": "...", ...} — the body's region is overwritten with the node's region. Use exclude to drop fields that would collide if you need to preserve the body's values.
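For instance, to keep the body's own region in the example above, a sketch would add region to exclude so the node's value is never merged:

```yaml
pipeline:
  processors:
    - metadata:
        include: [pipeline, node]
        # region is excluded so the incoming body's region survives the flat merge.
        exclude: [git_commit_sha, git_repo_url, git_branch, hostname, agent_version, region]
        target: body
        format: flat
```

With this configuration, an incoming body of {"region": "us-east-1", "user_id": 42} should keep region as us-east-1 while still gaining the remaining pipeline and node fields.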
Custom + selective
Combine selective categories, exclusions, and a custom block to attach a focused set of fields:
pipeline:
  processors:
    - metadata:
        include: [pipeline]
        exclude: [git_commit_sha, git_repo_url, git_branch]
        custom:
          pipeline_owner: "[email protected]"
          compliance_tier: pii
        target: body
        format: nested
        body_key: tags
Result: messages carry body.tags = { pipeline_name, pipeline_version, pipeline_owner, compliance_tier } and nothing else.
Related
- Pipeline metadata guide: the implicit @pipeline_id, @node_id, @namespace, and @node_label_* keys available without this processor.
- Bloblang guide: for conditional or computed metadata via mapping.
- mapping processor: the imperative alternative for editing metadata.