Lineage Configuration

The lineage block on an edge configuration enables native OpenLineage event emission. When enabled, the edge agent emits a signed OpenLineage RunEvent at every pipeline lifecycle transition (START, COMPLETE, FAIL, ABORT) to a configurable backend.

When lineage.enabled is false (or the block is omitted), the edge starts no goroutines, opens no connections, and incurs zero overhead.

For a walkthrough of enabling lineage end-to-end, see the lineage how-to guide. For the event format and signature contract, see Verify Lineage Events.

Minimal configuration

The smallest config that emits events to a Marquez instance on the same host:

lineage:
  enabled: true
  transport: http
  http:
    endpoint: http://localhost:5000/api/v1/lineage

Complete configuration

lineage:
  enabled: true
  transport: http        # "http" or "file"
  queue_size: 1024       # bounded buffer between emit and worker
  drain_timeout: 5s      # max wait at edge shutdown

  http:
    endpoint: https://lineage.example.com/api/v1/lineage
    timeout: 3s
    auth:
      type: bearer
      token_env: OPENLINEAGE_TOKEN

  file:
    path: /var/lib/expanso/lineage/events.jsonl
    rotation_size_mb: 64

Field reference

Top level

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | bool | false | Enable lineage emission. When false, no events are emitted and no transport is constructed. |
| transport | string | | Required when enabled. One of http or file. |
| queue_size | int | 1024 | Bounded buffer between Emit() and the delivery worker. Events drop on overflow; lineage_events_dropped_total increments. Must be ≥ 0. |
| drain_timeout | duration | 5s | Maximum time the edge waits for the worker to flush in-flight events at shutdown. Go duration string. |
| http | object | | HTTP transport configuration. Used when transport: http. |
| file | object | | File transport configuration. Used when transport: file. |

http

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| endpoint | string | | Required. The OpenLineage-compatible URL to POST events to, typically <marquez>/api/v1/lineage. |
| timeout | duration | 3s | Per-request HTTP timeout covering DNS, TCP, TLS, request body, and full response read. |
| auth | object | | Optional. Authentication configuration. When omitted, no Authorization header is sent. |

http.auth

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| type | string | | Required when auth is set. Only bearer is supported. |
| token_env | string | | Required when type is set. Name of the environment variable holding the bearer token. Read once at edge startup. |

file

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| path | string | | Required. Absolute path to the active events file. Each line is one JSON event. |
| rotation_size_mb | int | 64 | Rotate the file once it exceeds this size in MB. 0 disables rotation. Must be ≥ 0. |
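
For comparison with the minimal HTTP example, a file-transport configuration could be as small as the sketch below. The path is only illustrative (it reuses the one shown in the complete configuration), and rotation_size_mb is left out so that it falls back to its default of 64.

```yaml
# Minimal file-transport sketch; the path is an illustrative assumption.
lineage:
  enabled: true
  transport: file
  file:
    path: /var/lib/expanso/lineage/events.jsonl   # must be absolute
```

Because each line of the file is one JSON event, the output can be read with ordinary line-oriented tooling.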

Validation rules

The edge validates lineage config at startup. If enabled is false, all other fields are accepted as-is (validation is skipped entirely).

When enabled is true:

  • transport must be exactly http or file.
  • queue_size must be ≥ 0.
  • If transport: http:
    • http.endpoint must be non-empty.
    • If http.auth.type is set, it must be bearer.
    • If http.auth.type is set, http.endpoint must start with https://. The edge rejects plain http:// endpoints when auth is configured, so that the bearer token is never sent in cleartext.
    • If http.auth.type is set, http.auth.token_env must be non-empty.
  • If transport: file:
    • file.path must be non-empty and absolute.
    • file.rotation_size_mb must be ≥ 0.

Invalid configuration causes startup failure with a message identifying the offending field.
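
As an illustration of the HTTPS rule, the sketch below shows two alternative configurations that differ only in the endpoint scheme. The hostname and variable name are the placeholders used elsewhere on this page; the `---` separator is only there to show the two alternatives side by side and is not meant to suggest that an edge configuration holds both. The exact wording of the startup error is not shown here.

```yaml
# Expected to be rejected at startup: auth is configured,
# but the endpoint uses plain http://
lineage:
  enabled: true
  transport: http
  http:
    endpoint: http://lineage.example.com/api/v1/lineage
    auth:
      type: bearer
      token_env: OPENLINEAGE_TOKEN
---
# Expected to pass validation: same settings with an https:// endpoint
lineage:
  enabled: true
  transport: http
  http:
    endpoint: https://lineage.example.com/api/v1/lineage
    auth:
      type: bearer
      token_env: OPENLINEAGE_TOKEN
```

Dropping the auth block from the first variant would also satisfy the rules above, since the https:// requirement applies only when http.auth.type is set.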

Authentication

The HTTP transport reads the bearer token once at edge startup, from the environment variable named in http.auth.token_env. Rotating the token requires restarting the edge.

For tokens stored in Vault, AWS Secrets Manager, or GCP Secret Manager, resolve the secret out-of-band before edge startup and export it under the configured env var name. With systemd, the standard pattern is an EnvironmentFile= directive populated by a fetch script run before the edge service unit starts.

Defaults summary

The edge supplies a default for every field that lists a value in the Default column of the field reference above. The most important ones:

  • queue_size: 1024 — sized for typical lifecycle event throughput; raise it if pipelines transition faster than the worker can drain.
  • drain_timeout: 5s — covers most graceful shutdowns; raise it for slow backends.
  • http.timeout: 3s — short, because the pipeline does not wait on lineage emission; failed events drop and increment the counter rather than retrying.
  • file.rotation_size_mb: 64 — keeps single files small enough to ship through standard log-rotation tooling. Set to 0 to disable rotation entirely.
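
As a rough sketch of how these defaults might be overridden for a slow backend with frequent pipeline transitions, consider the fragment below. The specific values (4096 and 15s) are illustrative assumptions, not recommendations from this reference; suitable numbers depend on the observed drop rate in lineage_events_dropped_total and on how long the backend takes to respond.

```yaml
# Tuning sketch; the numeric values are illustrative assumptions.
lineage:
  enabled: true
  transport: http
  queue_size: 4096       # larger buffer if transitions outpace the worker
  drain_timeout: 15s     # more time to flush in-flight events at shutdown
  http:
    endpoint: https://lineage.example.com/api/v1/lineage
    # timeout is omitted, so the 3s default applies
```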