
Secret Providers

A pipeline JobSpec can declare two top-level blocks — secret_providers and secrets — that tell the edge agent where to fetch credentials and which values the pipeline needs. The agent authenticates to your existing secret store, resolves every reference at deployment, caches the values, refreshes them on a schedule, and supplies them to the pipeline runtime.

This is the recommended approach for production pipelines. You keep credentials in the secret store you already operate, the edge agent does the fetching, and rotation in your store flows through to the running pipeline without a redeploy.

When to use this

Use secret_providers when the edge agent has direct network access to your secret store (Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, etc.). For nodes that can't reach a secret store directly, see Local Secrets (environment variable interpolation) or External Secret Managers (sidecar/init-container injection patterns).


The two blocks

A pipeline JobSpec gains two top-level blocks alongside config, selector, and rollout:

  • secret_providers declares the backends — where to talk and how to authenticate. A provider's connection and auth are shared by every secret entry that references it.
  • secrets declares the values the pipeline needs. Each entry names a variable, picks a provider via from:, and specifies the per-reference lookup (path, field, version, refresh override).

The split means a pipeline with many secrets from one Vault declares the connection once and references it from each entry.

spec:
  type: pipeline
  name: kafka-to-s3

  secret_providers:
    corp_vault:
      provider: hashicorp_vault
      address: https://vault.internal
      auth:
        method: approle
        role_id_file: /etc/expanso/vault-role-id
        secret_id_file: /etc/expanso/vault-secret-id

    aws_prod:
      provider: aws_sts
      role_arn: arn:aws:iam::123456789:role/pipeline-writer
      region: us-east-1

  secrets:
    KAFKA_PASSWORD:
      from: corp_vault
      mount: kv
      secret_path: kafka/prod
      field: password
      refresh: 30m

    AWS:
      from: aws_prod

  config:
    input:
      kafka:
        addresses: [kafka.internal:9092]
        topics: [events]
        sasl:
          mechanism: PLAIN
          user: kafka-reader
          password: ${KAFKA_PASSWORD}

    output:
      aws_s3:
        bucket: pipeline-out
        credentials:
          id: ${AWS_ACCESS_KEY_ID}
          secret: ${AWS_SECRET_ACCESS_KEY}
          token: ${AWS_SESSION_TOKEN}

The KAFKA_PASSWORD entry resolves to a single value. The AWS entry uses a credential provider (aws_sts) that emits three values: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN, all available for interpolation under that prefix.


Quick start: AWS Secrets Manager

A minimal pipeline that reads an API token from AWS Secrets Manager and uses it on an HTTP output:

spec:
  type: pipeline
  name: events-to-api

  secret_providers:
    aws:
      provider: aws_secrets_manager
      region: us-east-1
      # auth omitted: uses the AWS SDK default credential chain
      # (instance profile, IRSA, env vars, shared credentials).

  secrets:
    API_TOKEN:
      from: aws
      secret_id: prod/events-api/token

  config:
    input:
      kafka:
        addresses: [kafka.internal:9092]
        topics: [events]
        consumer_group: events-forwarder

    pipeline:
      processors:
        - mapping: |
            root = this

    output:
      http_client:
        url: https://api.example.com/ingest
        verb: POST
        headers:
          Authorization: Bearer ${API_TOKEN}

When this JobSpec is deployed, each edge node resolves prod/events-api/token against AWS Secrets Manager using the SDK's default credential chain, caches the value, and substitutes it into the Authorization header before the pipeline starts. From then on, the agent re-fetches the secret every 30 minutes (the default) and updates the value when it changes.

For complete per-provider field tables and authentication options, see Provider Reference.


How references work in pipeline configs

Pipeline configurations support three reference syntaxes for secrets. All three resolve through the same provider chain, but they differ in when the value is read and how rotation flows through to the running component.

${VAR} — static interpolation

Resolved once at config parse time. Works in every field, including connection-level credentials such as Kafka SASL passwords, database DSNs, and AWS credential blocks. This is the universal path.

input:
  kafka:
    sasl:
      mechanism: PLAIN
      user: kafka-reader
      password: ${KAFKA_PASSWORD}

When KAFKA_PASSWORD is declared in the secrets block, this reference resolves from the cached value. When it isn't, the reference falls through to the process environment.

A declared secret takes precedence over a process environment variable of the same name, even one set by accident. Operator-set process environment variables remain visible whenever no matching secret is declared.

For the full interpolation syntax — including ${VAR:default} defaults and escaping — see the interpolation guide.
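As a sketch of the ${VAR:default} form described in the interpolation guide (KAFKA_USER is a hypothetical variable used only for illustration):

```yaml
input:
  kafka:
    sasl:
      mechanism: PLAIN
      # Hypothetical variable: falls back to kafka-reader when neither a
      # declared secret nor a process env var named KAFKA_USER is set.
      user: ${KAFKA_USER:kafka-reader}
      password: ${KAFKA_PASSWORD}   # no default: must resolve
```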

${! env("VAR") !} — Bloblang function

Re-evaluated each time a component evaluates the expression. Useful in fields that the component evaluates per message, such as HTTP headers and dynamic URLs.

output:
  http_client:
    url: https://api.example.com/ingest
    headers:
      Authorization: Bearer ${! env("API_TOKEN") !}

${! secret("VAR") !} — declared-secret function

Semantic alias for env() against a declared secret. Behaves identically at runtime but enables parse-time validation that VAR is actually declared in the secrets block.

output:
  http_client:
    url: https://api.example.com/ingest
    headers:
      Authorization: Bearer ${! secret("API_TOKEN") !}

Per-message vs static evaluation

Whether a reference is re-evaluated per message depends on the component, not the syntax. ${! secret("VAR") !} in a Kafka SASL password field is still captured at connection time — Kafka SASL credentials are bound when the session opens, regardless of how the reference is written. See the rotation behavior table below for which component classes pick up new values in place.
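To make the distinction concrete, a sketch placing the same syntax in both kinds of field (values illustrative):

```yaml
input:
  kafka:
    sasl:
      mechanism: PLAIN
      user: kafka-reader
      # Captured when the SASL session opens; a refreshed value only
      # takes effect on the next connection (restart class).
      password: ${! secret("KAFKA_PASSWORD") !}

output:
  http_client:
    url: https://api.example.com/ingest
    headers:
      # Evaluated per request; a refreshed value is picked up
      # in place on the next message (in-place class).
      Authorization: Bearer ${! secret("API_TOKEN") !}
```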


Authentication

Each provider entry takes an optional auth block describing how the edge agent authenticates to that backend. All secret entries that share a provider share its authentication.

Default credential chain (cloud providers)

For cloud-SDK-backed providers (aws_secrets_manager, aws_parameter_store, aws_sts, gcp_secret_manager, azure_key_vault), omitting auth delegates to the SDK's default credential chain. This covers the common case: nodes on cloud VMs with attached IAM roles, GKE/EKS workload identity, or operator-provisioned process environments.

secret_providers:
  aws:
    provider: aws_secrets_manager
    region: us-east-1
    # auth omitted — SDK picks credentials from instance profile,
    # IRSA, environment, or shared credentials file.

hashicorp_vault has no meaningful default chain and always requires explicit auth.

Explicit methods

The available methods depend on the provider. The most common are:

  • approle (Vault) — role_id plus secret_id, supplied as files, env vars, or inline.
  • kubernetes (Vault, AWS) — projected service account token. Used for IRSA on AWS.
  • jwt (Vault) — generic OIDC/JWT, used for GitHub Actions OIDC, SPIFFE, or any customer-run OIDC issuer.
  • static (all providers) — long-lived credentials. Discouraged for production.
  • file (all providers) — a customer-side process writes a usable credential to a known path; the provider re-reads it on each refresh. Integrates with delivery tools like Vault Agent, Teleport Machine ID, and SPIRE without a dedicated provider per tool.
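As a sketch of the file method wired to a Vault Agent sink (the token_file field name is an assumption made for illustration; see Provider Reference for the exact fields):

```yaml
secret_providers:
  corp_vault:
    provider: hashicorp_vault
    address: https://vault.internal
    auth:
      method: file
      # Assumed field name: a customer-run Vault Agent (or Teleport,
      # SPIRE) keeps a valid token at this path; re-read on each refresh.
      token_file: /run/expanso/vault-token
```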

For the full per-provider auth-method matrix and the exact fields each method accepts, see Provider Reference.


Refresh and rotation

The edge agent refreshes secrets on a schedule. Whether a refreshed value flows into the running pipeline in place or restarts the execution depends on how the consuming component uses the value.

Refresh cadence

The effective refresh interval is the first match of:

  1. The per-secret refresh override on the entry.
  2. The system default of 30 minutes.

When the backend returns a native TTL shorter than the selected interval (AWS STS Credentials.Expiration, Vault lease_duration, Azure Key Vault attributes.exp), the resolver refreshes at 70% of that TTL instead; the shorter of the two always wins. For example, with the 30-minute default, a 20-minute Vault lease refreshes every 14 minutes, while a one-hour STS session keeps the 30-minute interval.

Rotation behavior by component class

When a refresh returns a new value, whether the execution restarts or picks up the change in place depends on the component:

| Component class | Rotation behavior |
| --- | --- |
| HTTP inputs/outputs (headers, URLs) referenced via ${! secret() !} or ${! env() !} | In-place |
| Kafka with OAUTHBEARER | In-place |
| AWS components (aws_s3, aws_sqs, aws_kinesis, aws_dynamodb, etc.) | Restart |
| Kafka with PLAIN / SCRAM SASL | Restart |
| SQL components (sql_insert, sql_raw) | Restart |
| All other static-field usages via ${VAR} | Restart |

Default rule: any refresh that returns a new value restarts the affected execution unless the component is explicitly in the in-place class. Restarts are graceful — in-flight messages are flushed and replay relies on at-least-once delivery semantics.

The choice of reference syntax does not change this. ${! secret() !} in a Kafka SASL password field is still frozen at connection time because Kafka SASL is the constraint, not the syntax.

Rotation cadence vs restart cost

For credentials that rotate hourly (such as STS sessions or short-lived Vault dynamic secrets), the restart cadence on classes outside the in-place set may be too high. Either lengthen rotation upstream or set a longer per-secret refresh override to control the restart frequency. Ongoing work is expanding the in-place class — AWS components and Kafka SASL are both candidates.
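For instance, extending the main example's Vault entry with a longer override (6h is an arbitrary illustrative value):

```yaml
secrets:
  KAFKA_PASSWORD:
    from: corp_vault
    mount: kv
    secret_path: kafka/prod
    field: password
    # Restart cadence capped at roughly every six hours; a backend-returned
    # TTL shorter than this would still take precedence.
    refresh: 6h
```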


Errors

The edge agent surfaces three failure modes through the execution status:

Initial fetch failure at deployment

If any declared secret cannot be fetched when the job is deployed to a node, the execution fails to start. The error reported on the execution names the provider, the secret entry, and the root cause — for example, "Vault login rejected AppRole (secret_id expired)". The job does not enter Running until every declared secret resolves.

For fleet-wide deployments, partial failures are visible per node: 3 of 12 nodes failed: node-abc: KAFKA_PASSWORD, Vault login rejected AppRole (secret_id expired).

Transient refresh failure

When a refresh fails while a cached value is still available, the execution continues on the cached value with no immediate impact. The resolver retries with exponential backoff.

Sustained refresh failure

After 5 consecutive failures within 10 minutes, the execution transitions to Degraded, which propagates to job state via the standard rollout logic and fires alerts. In-flight work continues on the cached value. When refresh recovers, the execution returns to Running.


Offline behavior

Edge nodes face intermittent connectivity. By default, resolved secret values are cached on disk under the agent's data directory so jobs can resume after a node restart without a network round-trip:

  • Cache directory mode 0700, owned by the agent user.
  • File mode 0600 per cached value.
  • TTL metadata stored alongside each value; expired entries are discarded at startup.

This matches standard practice for credential-cache layouts. The protection profile matches the node's other operational state (bootstrap tokens, configuration).

Disabling the disk cache

Operators with strict no-secrets-on-disk requirements can disable the on-disk cache in the edge configuration:

edge.yaml

secrets:
  disable_disk_cache: true # default: false

The trade-off is that the node cannot resume jobs while offline — after a restart, jobs wait for connectivity before any cached value is available.

For a tmpfs-backed cache, mount {data_dir}/secrets_cache on tmpfs at the operating system level. There is no separate config option for that; it's an operator mount decision.

Connectivity scenarios

  • Offline from both Expanso and the secret backend. Jobs continue on cached values until TTLs expire. The offline budget is the shortest TTL among the job's declared secrets.
  • Offline from Expanso, online to the secret backend. Unaffected.
  • Offline from the secret backend, online to Expanso. Jobs continue on cached values; the execution enters Degraded once the cache expires and the backend remains unreachable.
  • Node restart while offline. With the disk cache enabled (default), cached values survive. Expired entries are discarded at startup; if all values are expired, the job waits for connectivity.

Limits and validation

The control plane validates the JobSpec on submission. Invalid configurations are rejected with structured errors before the job is queued.

| Limit | Value |
| --- | --- |
| Max providers per job | 16 |
| Max secrets per job | 64 |
| Max provider key length | 64 characters |
| Max secret key length | 64 characters |
| Default refresh interval | 30 minutes |
| Degraded threshold | 5 failures in 10 minutes |

Validation rules:

  1. provider must be a known type.
  2. Provider-specific required fields must be set (see Provider Reference).
  3. auth.method must be compatible with the provider type.
  4. Static auth must include the credential fields the provider requires.
  5. Every from: in secrets must reference a declared key in secret_providers.
  6. Per-provider lookup fields must be present on each secret entry.
  7. version_stage and version_id are mutually exclusive on AWS Secrets Manager entries.
  8. refresh must be a positive duration string when set (e.g., 15m, 1h).
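As a sketch of rule 7, pinning the quick-start entry to a version stage (AWSCURRENT is AWS Secrets Manager's standard staging label):

```yaml
secrets:
  API_TOKEN:
    from: aws
    secret_id: prod/events-api/token
    version_stage: AWSCURRENT   # mutually exclusive with version_id
```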

Provider summary

| Provider | Type | Required provider fields | Auth required | Value type |
| --- | --- | --- | --- | --- |
| HashiCorp Vault | hashicorp_vault | address | Yes | Single |
| AWS Secrets Manager | aws_secrets_manager | region | Optional | Single |
| AWS Parameter Store | aws_parameter_store | region | Optional | Single |
| AWS STS | aws_sts | role_arn, region | Optional | Multi (3 values) |
| GCP Secret Manager | gcp_secret_manager | project | Optional | Single |
| Azure Key Vault | azure_key_vault | vault_url | Optional | Single |
| File | file | (none) | n/a | Single |

Single-value providers produce one value at the entry name: KAFKA_PASSWORD from a hashicorp_vault entry makes ${KAFKA_PASSWORD} available.

Multi-value providers produce several values with provider-documented suffixes: AWS from an aws_sts entry makes ${AWS_ACCESS_KEY_ID}, ${AWS_SECRET_ACCESS_KEY}, and ${AWS_SESSION_TOKEN} available.

For each provider's full field list, auth methods, and a copy-paste example, see Provider Reference.


Next steps

  • Provider Reference — per-provider field tables and complete YAML examples for every supported backend.
  • Local Secrets — environment-variable interpolation for setups where the agent does not fetch from a secret store directly.
  • External Secret Managers — sidecar, init-container, and CSI-driver injection patterns.
  • Interpolation Guide — full syntax reference for ${VAR} and Bloblang interpolations.