# Secret Providers
A pipeline JobSpec can declare two top-level blocks — `secret_providers` and `secrets` — that tell the edge agent where to fetch credentials and which values the pipeline needs. The agent authenticates to your existing secret store, resolves every reference at deployment, caches the values, refreshes them on a schedule, and supplies them to the pipeline runtime.
This is the recommended approach for production pipelines. You keep credentials in the secret store you already operate, the edge agent does the fetching, and rotation in your store flows through to the running pipeline without a redeploy.
Use `secret_providers` when the edge agent has direct network access to your secret store (Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, etc.). For nodes that can't reach a secret store directly, see Local Secrets (environment-variable interpolation) or External Secret Managers (sidecar/init-container injection patterns).
## The two blocks

A pipeline JobSpec gains two top-level fields, siblings of `config`, `selector`, and `rollout`:

- `secret_providers` declares the backends — where to talk and how to authenticate. A provider's connection and auth are shared by every secret entry that references it.
- `secrets` declares the values the pipeline needs. Each entry names a variable, picks a provider via `from:`, and specifies the per-reference lookup (path, field, version, refresh override).
The split means a pipeline with many secrets from one Vault declares the connection once and references it from each entry.
```yaml
spec:
  type: pipeline
  name: kafka-to-s3

  secret_providers:
    corp_vault:
      provider: hashicorp_vault
      address: https://vault.internal
      auth:
        method: approle
        role_id_file: /etc/expanso/vault-role-id
        secret_id_file: /etc/expanso/vault-secret-id

    aws_prod:
      provider: aws_sts
      role_arn: arn:aws:iam::123456789:role/pipeline-writer
      region: us-east-1

  secrets:
    KAFKA_PASSWORD:
      from: corp_vault
      mount: kv
      secret_path: kafka/prod
      field: password
      refresh: 30m

    AWS:
      from: aws_prod

  config:
    input:
      kafka:
        addresses: [kafka.internal:9092]
        topics: [events]
        sasl:
          mechanism: PLAIN
          user: kafka-reader
          password: ${KAFKA_PASSWORD}

    output:
      aws_s3:
        bucket: pipeline-out
        credentials:
          id: ${AWS_ACCESS_KEY_ID}
          secret: ${AWS_SECRET_ACCESS_KEY}
          token: ${AWS_SESSION_TOKEN}
```
The `KAFKA_PASSWORD` entry resolves to a single value. The `AWS` entry uses a credential provider (`aws_sts`) that emits three values: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN`, all available for interpolation under that prefix.
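The declare-once, reference-many split becomes visible as soon as a second secret needs the same Vault. As a sketch, an additional entry reuses `corp_vault` without restating its address or auth (the `postgres/prod` path is illustrative):

```yaml
secrets:
  KAFKA_PASSWORD:
    from: corp_vault            # shares corp_vault's connection and auth
    mount: kv
    secret_path: kafka/prod
    field: password
  DB_PASSWORD:
    from: corp_vault            # same provider, nothing restated
    mount: kv
    secret_path: postgres/prod  # illustrative path
    field: password
```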
## Quick start: AWS Secrets Manager
A minimal pipeline that reads an API token from AWS Secrets Manager and uses it on an HTTP output:
```yaml
spec:
  type: pipeline
  name: events-to-api

  secret_providers:
    aws:
      provider: aws_secrets_manager
      region: us-east-1
      # auth omitted: uses the AWS SDK default credential chain
      # (instance profile, IRSA, env vars, shared credentials).

  secrets:
    API_TOKEN:
      from: aws
      secret_id: prod/events-api/token

  config:
    input:
      kafka:
        addresses: [kafka.internal:9092]
        topics: [events]
        consumer_group: events-forwarder

    pipeline:
      processors:
        - mapping: |
            root = this

    output:
      http_client:
        url: https://api.example.com/ingest
        verb: POST
        headers:
          Authorization: Bearer ${API_TOKEN}
```
When this JobSpec is deployed, each edge node resolves prod/events-api/token against AWS Secrets Manager using the SDK's default credential chain, caches the value, and substitutes it into the Authorization header before the pipeline starts. From then on, the agent re-fetches the secret every 30 minutes (the default) and updates the value when it changes.
For complete per-provider field tables and authentication options, see Provider Reference.
## How references work in pipeline configs
Pipeline configurations support three reference syntaxes for secrets. All three resolve through the same provider chain, but they differ in when the value is read and how rotation flows through to the running component.
### `${VAR}` — static interpolation
Resolved once at config parse time. Works in every field, including connection-level credentials such as Kafka SASL passwords, database DSNs, and AWS credential blocks. This is the universal path.
```yaml
input:
  kafka:
    sasl:
      mechanism: PLAIN
      user: kafka-reader
      password: ${KAFKA_PASSWORD}
```
When `KAFKA_PASSWORD` is declared in the `secrets` block, this reference resolves from the cached value. When it isn't, the reference falls through to the process environment.

A declared secret takes precedence over an accidentally set process env var of the same name. Operator-set process environment remains visible when no matching secret is declared.

For the full interpolation syntax — including `${VAR:default}` defaults and escaping — see the interpolation guide.
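As a quick sketch of the default form (the fallback URL is illustrative):

```yaml
output:
  http_client:
    # Resolves to the declared secret or env var API_URL when one exists,
    # and to the literal after the colon otherwise.
    url: ${API_URL:https://api.example.com/ingest}
```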
${! env("VAR") !} — Bloblang function
Re-evaluated each time a component evaluates the expression. Useful in fields that the component evaluates per message, such as HTTP headers and dynamic URLs.
```yaml
output:
  http_client:
    url: https://api.example.com/ingest
    headers:
      Authorization: Bearer ${! env("API_TOKEN") !}
```
${! secret("VAR") !} — declared-secret function
Semantic alias for `env()` against a declared secret. Behaves identically at runtime but enables parse-time validation that `VAR` is actually declared in the `secrets` block.
```yaml
output:
  http_client:
    url: https://api.example.com/ingest
    headers:
      Authorization: Bearer ${! secret("API_TOKEN") !}
```
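The practical payoff is typo protection: a reference to an undeclared name is rejected when the config is parsed, instead of silently falling through to the process environment the way `env()` does. A sketch of a config that fails validation (the name is deliberately undeclared):

```yaml
output:
  http_client:
    url: https://api.example.com/ingest
    headers:
      # Parse-time error: MISSING_TOKEN is not declared in the secrets block.
      Authorization: Bearer ${! secret("MISSING_TOKEN") !}
```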
### Per-message vs static evaluation
Whether a reference is re-evaluated per message depends on the component, not the syntax. `${! secret("VAR") !}` in a Kafka SASL password field is still captured at connection time — Kafka SASL credentials are bound when the session opens, regardless of how the reference is written. See the rotation behavior table below for which component classes pick up new values in place.
## Authentication
Each provider entry takes an optional `auth` block describing how the edge agent authenticates to that backend. All secret entries that share a provider share its authentication.
### Default credential chain (cloud providers)
For cloud-SDK-backed providers (`aws_secrets_manager`, `aws_parameter_store`, `aws_sts`, `gcp_secret_manager`, `azure_key_vault`), omitting `auth` delegates to the SDK's default credential chain. This covers the common case: nodes on cloud VMs with attached IAM roles, GKE/EKS workload identity, or operator-provisioned process environments.
```yaml
secret_providers:
  aws:
    provider: aws_secrets_manager
    region: us-east-1
    # auth omitted — SDK picks credentials from instance profile,
    # IRSA, environment, or shared credentials file.
```
`hashicorp_vault` has no meaningful default chain and always requires an explicit `auth` block.
### Explicit methods
The available methods depend on the provider. The most common are:
- `approle` (Vault) — `role_id` plus `secret_id`, supplied as files, env vars, or inline.
- `kubernetes` (Vault, AWS) — projected service account token. Used for IRSA on AWS.
- `jwt` (Vault) — generic OIDC/JWT, used for GitHub Actions OIDC, SPIFFE, or any customer-run OIDC issuer.
- `static` (all providers) — long-lived credentials. Discouraged for production.
- `file` (all providers) — a customer-side process writes a usable credential to a known path; the provider re-reads it on each refresh. Integrates with delivery tools like Vault Agent, Teleport Machine ID, and SPIRE without a dedicated provider per tool (see the sketch below).
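As an illustration of the `file` method, the sketch below points a Vault provider at a token that Vault Agent keeps fresh on disk. The `token_file` field name and path are assumptions for this example; see Provider Reference for the exact fields each method accepts.

```yaml
secret_providers:
  corp_vault:
    provider: hashicorp_vault
    address: https://vault.internal
    auth:
      method: file
      # Assumed field name. Vault Agent writes and renews this token;
      # the provider re-reads the file on each refresh.
      token_file: /run/vault-agent/token
```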
For the full per-provider auth-method matrix and the exact fields each method accepts, see Provider Reference.
## Refresh and rotation
The edge agent refreshes secrets on a schedule. Whether a refreshed value flows into the running pipeline in place or restarts the execution depends on how the consuming component uses the value.
### Refresh cadence
The effective refresh interval is the first match of:
1. The per-secret `refresh` override on the entry.
2. The system default of 30 minutes.
When the backend returns a native TTL shorter than the selected interval (AWS STS `Credentials.Expiration`, Vault `lease_duration`, Azure Key Vault `attributes.exp`), the resolver refreshes at 70% of that TTL instead. The shorter of the two wins.
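A worked example of how the two interact, assuming an STS session that expires 20 minutes out (the TTL is illustrative):

```yaml
secrets:
  AWS:
    from: aws_prod
    refresh: 30m   # selected interval
# STS returns Credentials.Expiration ~20m away. 70% of 20m is 14m,
# which is shorter than 30m, so the resolver refreshes this entry
# roughly every 14 minutes.
```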
### Rotation behavior by component class
When a refresh returns a new value, whether the execution restarts or picks up the change in place depends on the component:
| Component class | Rotation behavior |
|---|---|
| HTTP inputs/outputs (headers, URLs) referenced via `${! secret() !}` or `${! env() !}` | In-place |
| Kafka with OAUTHBEARER | In-place |
| AWS components (`aws_s3`, `aws_sqs`, `aws_kinesis`, `aws_dynamodb`, etc.) | Restart |
| Kafka with PLAIN / SCRAM SASL | Restart |
| SQL components (`sql_insert`, `sql_raw`) | Restart |
| All other static-field usages via `${VAR}` | Restart |
Default rule: any refresh that returns a new value restarts the affected execution unless the component is explicitly in the in-place class. Restarts are graceful — in-flight messages are flushed and replay relies on at-least-once delivery semantics.
The choice of reference syntax does not change this. `${! secret() !}` in a Kafka SASL password field is still frozen at connection time because Kafka SASL is the constraint, not the syntax.
For credentials that rotate hourly (such as STS sessions or short-lived Vault dynamic secrets), the restart cadence on classes outside the in-place set may be too high. Either lengthen rotation upstream or set a longer per-secret refresh override to control the restart frequency. Ongoing work is expanding the in-place class — AWS components and Kafka SASL are both candidates.
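For example, a slow-moving Vault KV password can carry a long override so that, even when it does change, restarts stay infrequent (the interval is illustrative):

```yaml
secrets:
  KAFKA_PASSWORD:
    from: corp_vault
    mount: kv
    secret_path: kafka/prod
    field: password
    refresh: 4h   # re-check every 4h; a changed value restarts the execution
```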
## Errors
The edge agent surfaces three failure modes through the execution status:
### Initial fetch failure at deployment
If any declared secret cannot be fetched when the job is deployed to a node, the execution fails to start. The error reported on the execution names the provider, the secret entry, and the root cause — for example, "Vault login rejected AppRole (secret_id expired)". The job does not enter `Running` until every declared secret resolves.

For fleet-wide deployments, partial failures are visible per node: `3 of 12 nodes failed: node-abc: KAFKA_PASSWORD, Vault login rejected AppRole (secret_id expired)`.
### Transient refresh failure
When a refresh fails while a cached value is still available, the execution continues on the cached value with no immediate impact. The resolver retries with exponential backoff.
### Sustained refresh failure
After 5 consecutive failures within 10 minutes, the execution transitions to `Degraded`, which propagates to job state via the standard rollout logic and fires alerts. In-flight work continues on the cached value. When refresh recovers, the execution returns to `Running`.
## Offline behavior
Edge nodes face intermittent connectivity. By default, resolved secret values are cached on disk under the agent's data directory so jobs can resume after a node restart without a network round-trip:
- Cache directory mode `0700`, owned by the agent user.
- File mode `0600` per cached value.
- TTL metadata stored alongside each value; expired entries are discarded at startup.
This matches standard practice for credential-cache layouts. The protection profile is the same as the node's other operational state (bootstrap tokens, configuration).
### Disabling the disk cache
Operators with strict no-secrets-on-disk requirements can disable the on-disk cache in the edge configuration:
```yaml
secrets:
  disable_disk_cache: true   # default: false
```
The trade-off is that the node cannot resume jobs while offline — after a restart, jobs wait for connectivity before any cached value is available.
For a tmpfs-backed cache, mount `{data_dir}/secrets_cache` on tmpfs at the operating system level. There is no separate config option for that; it's an operator mount decision.
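As a sketch, on a Linux host the mount might look like this (the `/var/lib/expanso` data directory is an assumption; substitute your agent's actual `data_dir`):

```sh
# Memory-backed cache: values survive agent restarts but not host reboots.
mount -t tmpfs -o size=16m,mode=0700 tmpfs /var/lib/expanso/secrets_cache
```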
### Connectivity scenarios
- **Offline from both Expanso and the secret backend.** Jobs continue on cached values until TTLs expire. The offline budget is the shortest TTL among the job's declared secrets.
- **Offline from Expanso, online to the secret backend.** Unaffected.
- **Offline from the secret backend, online to Expanso.** Jobs continue on cached values; the execution enters `Degraded` once the cache expires and the backend remains unreachable.
- **Node restart while offline.** With the disk cache enabled (default), cached values survive. Expired entries are discarded at startup; if all values are expired, the job waits for connectivity.
## Limits and validation
The control plane validates the JobSpec on submission. Invalid configurations are rejected with structured errors before the job is queued.
| Limit | Value |
|---|---|
| Max providers per job | 16 |
| Max secrets per job | 64 |
| Max provider key length | 64 characters |
| Max secret key length | 64 characters |
| Default refresh interval | 30 minutes |
| Degraded threshold | 5 failures in 10 minutes |
Validation rules:
- `provider` must be a known type.
- Provider-specific required fields must be set (see Provider Reference).
- `auth.method` must be compatible with the provider type.
- Static auth must include the credential fields the provider requires.
- Every `from:` in `secrets` must reference a declared key in `secret_providers`.
- Per-provider lookup fields must be present on each secret entry.
- `version_stage` and `version_id` are mutually exclusive on AWS Secrets Manager entries.
- `refresh` must be a positive duration string when set (e.g., `15m`, `1h`).
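For instance, a sketch of a spec that the control plane would reject, because `from:` names a provider key that was never declared:

```yaml
secret_providers:
  corp_vault:
    provider: hashicorp_vault
    address: https://vault.internal
    auth:
      method: approle
      role_id_file: /etc/expanso/vault-role-id
      secret_id_file: /etc/expanso/vault-secret-id

secrets:
  KAFKA_PASSWORD:
    from: vault    # rejected: no provider key named "vault" is declared
    mount: kv
    secret_path: kafka/prod
    field: password
```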
## Provider summary
| Provider | Type | Required provider fields | Auth required | Value type |
|---|---|---|---|---|
| HashiCorp Vault | hashicorp_vault | address | Yes | Single |
| AWS Secrets Manager | aws_secrets_manager | region | Optional | Single |
| AWS Parameter Store | aws_parameter_store | region | Optional | Single |
| AWS STS | aws_sts | role_arn, region | Optional | Multi (3 values) |
| GCP Secret Manager | gcp_secret_manager | project | Optional | Single |
| Azure Key Vault | azure_key_vault | vault_url | Optional | Single |
| File | file | (none) | n/a | Single |
Single-value providers produce one value at the entry name: `KAFKA_PASSWORD` from a `hashicorp_vault` entry makes `${KAFKA_PASSWORD}` available.

Multi-value providers produce several values with provider-documented suffixes: `AWS` from an `aws_sts` entry makes `${AWS_ACCESS_KEY_ID}`, `${AWS_SECRET_ACCESS_KEY}`, and `${AWS_SESSION_TOKEN}` available.
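Assuming the entry key acts as the prefix (as the `AWS` entry in the kafka-to-s3 example suggests), a differently named STS entry would expose correspondingly prefixed values; confirm the documented suffixes in Provider Reference:

```yaml
secrets:
  S3_WRITER:
    from: aws_prod
# Assumed naming under the entry's prefix:
#   ${S3_WRITER_ACCESS_KEY_ID}, ${S3_WRITER_SECRET_ACCESS_KEY},
#   ${S3_WRITER_SESSION_TOKEN}
```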
For each provider's full field list, auth methods, and a copy-paste example, see Provider Reference.
## Next steps
- Provider Reference — per-provider field tables and complete YAML examples for every supported backend.
- Local Secrets — environment-variable interpolation for setups where the agent does not fetch from a secret store directly.
- External Secret Managers — sidecar, init-container, and CSI-driver injection patterns.
- Interpolation Guide — full syntax reference for `${VAR}` and Bloblang interpolations.