What is Expanso and how does it work?

Expanso is a managed platform for deploying intelligent data pipelines at the edge. It processes data where it's generated - reducing bandwidth, latency, and costs. You deploy lightweight agents on your infrastructure, build pipelines using our visual builder or YAML, and control everything from a central SaaS platform.

Can I run AI/ML models directly in my data pipelines?

Yes! Expanso supports running ONNX, TensorFlow Lite, and other models as native pipeline steps. Execute low-latency inference on streaming data, enrich events with model outputs (like risk scores), and make decisions at the edge without cloud round-trips.

How many pre-built components are available?

Expanso provides 200+ pre-built components including inputs (Kafka, HTTP, files), processors (transformations, filtering, PII masking, aggregations), and outputs (S3, Snowflake, Datadog, Splunk). Browse the complete catalog in our Component Reference.

Do I need to write code to build pipelines?

No - use our drag-and-drop visual pipeline builder to create sophisticated pipelines without code. For advanced use cases, you can also write pipelines in YAML or use the Bloblang transformation language for complex data mappings.

How does Expanso help with data governance and compliance?

Expanso includes built-in governance features: automatic PII detection and masking, policy enforcement at the edge, RBAC, SSO integration, and comprehensive audit trails. Mask sensitive data before it ever leaves your network.

metadata

Attaches Expanso runtime metadata — pipeline IDs, node info, runtime counters, and custom fields — to every message that flows through it. Use it to declaratively stamp events with provenance, lineage, or operational context without writing Bloblang.

pipeline:
  processors:
    - metadata:
        include: [core, orchestrator, node, pipeline]
        custom:
          pipeline_owner: owner@example.com   # placeholder address
          pii_redaction: enabled
        target: body
        format: nested
        body_key: lineage

When to Use

Use the metadata processor when you need to:

Stamp events with provenance — attach run_id, pipeline_name, pipeline_version, and node_id to every message for lineage tooling or downstream auditing.
Tag output by node identity — write the node's region, environment, and cluster_name onto messages so downstream systems can route or partition by location.
Attach static custom fields — add owner emails, compliance tiers, or environment labels declaratively, instead of writing a mapping block.
Surface runtime counters — opt in to the runtime category to splice records_in, bytes_out, error_count, and duration_ms into the body for observability sinks.

Don't use this if:

You only need to reference a single key in an interpolated field — use the implicit @pipeline_id / @node_id keys directly. See the pipeline metadata guide.
You need conditional or computed metadata — use mapping and write meta foo = ... directly.
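
Both alternatives might look roughly like the sketch here. It assumes the implicit keys can be read through the same metadata("...") interpolation used elsewhere on this page. The severity key and the level body field are hypothetical names chosen only for illustration.

```yaml
pipeline:
  processors:
    # Computed metadata: derive a key from the message body with Bloblang.
    # `severity` and `this.level` are hypothetical, not built-in names.
    - mapping: |
        meta severity = if this.level == "error" { "high" } else { "normal" }

output:
  kafka:
    addresses: [localhost:9092]
    # Single-key reference: read the implicit node_id key directly,
    # with no metadata processor in the pipeline.
    topic: events.${! metadata("node_id") }
```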

Configuration


# Common config fields, showing default values
metadata:
  include: [core, orchestrator, node, pipeline]   # categories of fields to attach
  custom: {}                                      # extra static key/value pairs
  target: meta                                    # "meta" or "body"

# All config fields, showing default values
metadata:
  include: [core, orchestrator, node, pipeline]
  exclude: []                                     # specific field names to drop
  custom: {}
  target: meta                                    # "meta" or "body"
  format: flat                                    # "flat" or "nested" (only when target: body)
  body_key: ""                                    # required when target: body and format: nested

Fields

Field	Type	Default	Description
`include`	string list	`[core, orchestrator, node, pipeline]`	Categories of metadata fields to resolve. Valid values: `core`, `orchestrator`, `node`, `pipeline`, `runtime`. Note `runtime` is not in the default — opt in explicitly.
`exclude`	string list	`[]`	Specific field names (or category names) to omit from the resolved set.
`custom`	string→string map	`{}`	User-supplied key/value pairs to attach. Keys must not collide with reserved field names.
`target`	string	`meta`	Where to write the fields. `meta` writes to message metadata; `body` splices into the JSON message body.
`format`	string	`flat`	When `target: body`: `flat` merges fields at the JSON root; `nested` places them under `body_key`. Ignored when `target: meta`.
`body_key`	string	`""`	Required when `target: body` and `format: nested`. Must be empty otherwise.

Field Categories

Each category in include resolves to a fixed set of keys.

`core` — OpenLineage-aligned identity

Key	Description
`run_id`	The unique identifier for this pipeline execution
`job_name`	The pipeline name
`job_namespace`	The execution namespace
`event_time`	Current time, RFC3339Nano in UTC, recomputed per message
`producer`	URL identifying the Expanso edge agent and its version (e.g., `https://expanso.io/edge/v1.2.3`)

These field names are intentionally aligned with the OpenLineage spec. Expanso Edge is moving toward emitting OpenLineage events natively, so attaching these fields now means your messages are already shaped for lineage tooling.

`orchestrator` — orchestrator-populated context

Key	Description
`job_id`	The pipeline (job) identifier
`deployment_id`	The deployment identifier (reserved; currently empty)
`namespace`	The execution namespace
`eval_id`	The evaluation identifier
`rollout_wave`	The rollout wave for staged deployments

`node` — infrastructure context

Key	Description
`node_id`	The edge node running the pipeline
`hostname`	The node's hostname
`region`	From the node's `region` label
`environment`	From the node's `environment` label
`cluster_name`	From the node's `cluster_name` label
`agent_version`	The edge agent version

Only the region, environment, and cluster_name labels are promoted by this category. Arbitrary node labels remain available via the @node_label_* keys — see node labels in the metadata guide.
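
As a rough sketch, a label outside the three promoted ones could be copied into the body with a mapping step placed after the processor. The rack label is hypothetical, and the key name is assumed to follow the @node_label_* pattern described in the metadata guide.

```yaml
pipeline:
  processors:
    # Promotes region, environment, and cluster_name along with the
    # other node fields, merged at the root of the JSON body.
    - metadata:
        include: [node]
        target: body
        format: flat
    # Copies one arbitrary label by hand. `rack` is a hypothetical label name.
    - mapping: |
        root = this
        root.rack = @node_label_rack
```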

`pipeline` — pipeline provenance

Key	Description
`pipeline_name`	The pipeline name
`pipeline_version`	The pipeline version
`git_commit_sha`	Reserved — populated when the pipeline is built from a git source
`git_repo_url`	Reserved — populated when the pipeline is built from a git source
`git_branch`	Reserved — populated when the pipeline is built from a git source

The three git_* fields are reserved slots: they are always present in the output, but they hold empty strings until pipelines are built from a git source.

`runtime` — per-execution counters

Key	Description
`start_time`	When this pipeline execution started, RFC3339Nano in UTC
`records_in`	Messages observed by this processor since the execution started
`records_out`	Messages successfully written by this processor
`bytes_in`	Bytes observed by this processor
`bytes_out`	Bytes successfully written by this processor
`error_count`	Body-write failures recorded by this processor
`duration_ms`	Milliseconds since `start_time`

The runtime category is not included by default — list it in include to opt in. Counter values reflect the processor's own observations of the message stream up to and including the current message.
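
A minimal sketch of opting in, using only fields documented on this page. The stats key is a hypothetical choice for body_key.

```yaml
pipeline:
  processors:
    - metadata:
        # `runtime` is not a default category, so it is listed explicitly.
        include: [node, runtime]
        target: body
        format: nested
        body_key: stats   # hypothetical key; counters land under body.stats
```

With this configuration each message body would gain a stats object holding the node fields plus start_time, records_in, records_out, bytes_in, bytes_out, error_count, and duration_ms.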

Targets and Formats

The target and format fields together control where resolved metadata is written:

`target`	`format`	`body_key`	Result
`meta`	(ignored)	must be empty	Each resolved key becomes a Bento message metadata entry, accessible via `@key_name` and `${! metadata("key_name") }`.
`body`	`flat`	must be empty	Resolved keys are merged at the JSON body's root. Metadata keys overwrite body keys on collision.
`body`	`nested`	required	Resolved keys are placed under `body[body_key]`, leaving the rest of the body untouched.

When target: body, the body must be a JSON object. Non-JSON bodies, JSON arrays, and JSON scalars are passed through unchanged; the processor logs a warning and increments error_count (visible via the runtime category).

Custom Fields

The custom: map accepts string→string pairs that are attached alongside the resolved category fields. Custom keys must not collide with reserved field names.

metadata:
  include: [core]
  custom:
    pipeline_owner: owner@example.com   # placeholder address
    compliance_tier: pii
    cost_center: data-platform

To drop a built-in field while keeping the rest of its category, list it in exclude:

metadata:
  include: [pipeline]
  exclude: [git_commit_sha, git_repo_url, git_branch]

exclude accepts either field names or category names. Only include is checked against the closed set of categories at submission time, so unknown entries in exclude pass silently: a misspelled name is accepted and simply excludes nothing.
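
For instance, a single exclude list can mix both kinds of entry. This is a sketch based on the field descriptions above rather than a tested configuration.

```yaml
metadata:
  include: [core, orchestrator, node, pipeline]
  exclude:
    - orchestrator   # category name: omits every orchestrator field
    - hostname       # field name: omits one field from the node category
```

Because exclude entries are not validated, it is worth checking the emitted messages once to confirm the intended fields are actually gone.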

Reserved Field Names

The following names cannot be used as custom keys:

run_id, job_name, job_namespace, event_time, producer,
job_id, deployment_id, namespace, eval_id, rollout_wave,
node_id, hostname, region, environment, cluster_name, agent_version,
pipeline_name, pipeline_version, git_commit_sha, git_repo_url, git_branch,
start_time, records_in, records_out, bytes_in, bytes_out, error_count, duration_ms,
pipeline_id, execution_id, job_version

Submitting a pipeline with a colliding custom: key is rejected with a clear error message — see Validation Errors.

The internal _execution_id field is auto-populated by the runtime at pipeline build time. User-supplied values are rejected at submission.

Validation Errors

Pipeline submission validates the metadata processor's configuration. The errors you may see and how to fix each:

Error	Fix
`unknown category "X"; valid categories: [core orchestrator node pipeline runtime]`	Replace `X` in `include` with one of the listed categories.
`body_key cannot be set when target is "meta"`	Remove `body_key`, or change `target` to `body`.
`body_key is only meaningful with format: "nested"`	Remove `body_key` for `format: flat`, or change `format` to `nested`.
`body_key is required when target: "body" and format: "nested"`	Set `body_key` to the JSON object key the metadata should be placed under.
`unknown format "X"; must be "flat" or "nested"`	Use `flat` or `nested`.
`unknown target "X"; must be "meta" or "body"`	Use `meta` or `body`.
`custom key "X" collides with a reserved field name`	Rename the key. Reserved names can never be used as custom keys; `exclude` only removes the built-in field from the output and does not let a custom key override it.
`_execution_id is internal and auto-populated by the Expanso runtime; remove it from your config`	Remove the `_execution_id` entry from the processor's config.
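
To make a few of these rules concrete, here is a set of illustrative configurations, written as separate YAML documents. Each comment quotes the error from the table above. The data_region key in the last document is a hypothetical replacement name.

```yaml
# Rejected: body_key cannot be set when target is "meta"
metadata:
  target: meta
  body_key: lineage
---
# Rejected: body_key is required when target: "body" and format: "nested"
metadata:
  target: body
  format: nested
---
# Rejected: custom key "region" collides with a reserved field name
metadata:
  custom:
    region: eu-west-1
---
# Accepted: nested body target with a body_key and a renamed custom key
metadata:
  target: body
  format: nested
  body_key: lineage
  custom:
    data_region: eu-west-1   # hypothetical key name
```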

Examples

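The four examples that follow cover, in order: the default categories written to message metadata; a body target with nested format; a body target with flat format; and custom fields combined with selective categories.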

Attach the default categories to message metadata, then reference them in a downstream output:

pipeline:
  processors:
    - metadata: {}

output:
  kafka:
    addresses: [localhost:9092]
    topic: events.${! metadata("region") }
    metadata:
      exclude_prefixes: []

After the processor runs, every message carries metadata keys like @run_id, @job_name, @node_id, @region, etc. You can read them via metadata("...") in interpolation or @... in Bloblang.
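
The Bloblang form might look like the sketch here, which copies two of the attached keys into the body. The source field is a hypothetical name, and the example assumes the message body is a JSON object.

```yaml
pipeline:
  processors:
    - metadata: {}
    # Read the attached keys in Bloblang with the @ shorthand.
    # `source` is a hypothetical body field.
    - mapping: |
        root = this
        root.source.region = @region
        root.source.node_id = @node_id
```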

Splice metadata under a lineage key at the top level of the JSON body:

pipeline:
  processors:
    - metadata:
        include: [core, pipeline, node]
        target: body
        format: nested
        body_key: lineage

Given an incoming body of {"event_id": "abc123", "level": "info"}, the processor emits:

{
  "event_id": "abc123",
  "level": "info",
  "lineage": {
    "run_id": "01HF...",
    "job_name": "events-pipeline",
    "job_namespace": "default",
    "event_time": "2026-04-30T12:34:56.789Z",
    "producer": "https://expanso.io/edge/v1.4.2",
    "pipeline_name": "events-pipeline",
    "pipeline_version": "v3",
    "git_commit_sha": "",
    "git_repo_url": "",
    "git_branch": "",
    "node_id": "node-eu-west-1-a",
    "hostname": "edge-01",
    "region": "eu-west-1",
    "environment": "production",
    "cluster_name": "edge-eu",
    "agent_version": "1.4.2"
  }
}

Merge metadata at the body's root. Use this when downstream consumers expect a flat JSON object:

pipeline:
  processors:
    - metadata:
        include: [pipeline, node]
        exclude: [git_commit_sha, git_repo_url, git_branch, hostname, agent_version]
        target: body
        format: flat

If the incoming body has a key that collides with a metadata field, the metadata value wins. For example, an incoming body of {"region": "us-east-1", "user_id": 42} becomes {"region": "eu-west-1", "user_id": 42, "node_id": "...", "pipeline_name": "...", ...} — the body's region is overwritten with the node's region. Use exclude to drop fields that would collide if you need to preserve the body's values.
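
A variant of the configuration above that keeps the body's own region might look like this. It is a sketch that only adds region to the exclude list.

```yaml
pipeline:
  processors:
    - metadata:
        include: [pipeline, node]
        # `region` is excluded so the incoming body's value is not overwritten.
        exclude: [git_commit_sha, git_repo_url, git_branch, hostname, agent_version, region]
        target: body
        format: flat
```

With this change, the incoming {"region": "us-east-1", "user_id": 42} would keep region as us-east-1 while still gaining the remaining pipeline and node fields.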

Combine selective categories, exclusions, and a custom block to attach a focused set of fields:

pipeline:
  processors:
    - metadata:
        include: [pipeline]
        exclude: [git_commit_sha, git_repo_url, git_branch]
        custom:
          pipeline_owner: owner@example.com   # placeholder address
          compliance_tier: pii
        target: body
        format: nested
        body_key: tags

Result: each message body gains a tags object containing only pipeline_name, pipeline_version, pipeline_owner, and compliance_tier. The rest of the body is left untouched.

Related

Pipeline metadata guide: the implicit @pipeline_id, @node_id, @namespace, and @node_label_* keys available without this processor.
Bloblang guide: for conditional or computed metadata via mapping.
mapping processor: the imperative alternative for editing metadata.
