types.OrchestratorConfig

api object

auth object

Auth configures authentication for the API

token string

listen_addr string

Listen address. Defaults to localhost:9010. An empty string disables the API server.

data_dir string

Core data directory - all subdirectories are managed automatically

evaluation_broker object

initial_retry_delay integer<int64>

InitialRetryDelay is the delay before re-enqueuing a Nacked evaluation for the first time. Defaults to 5 seconds if not set. Set a lower value (e.g., 100ms) for tests to avoid blocking subsequent evaluations for the same job.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

max_retry_count integer

MaxRetryCount specifies the maximum number of times an evaluation can be retried before being marked as failed.

visibility_timeout integer<int64>

VisibilityTimeout specifies how long an evaluation can be claimed before it's returned to the queue.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]
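The six positive entries in each "Possible values" list equal one nanosecond, microsecond, millisecond, second, minute, and hour expressed in nanoseconds, which suggests that the integer<int64> duration fields hold nanosecond counts in the style of Go's time.Duration. Under that assumption, a minimal sketch of converting readable durations such as the 5 seconds and 100ms mentioned for initial_retry_delay could look like this. The helper name to_nanoseconds is hypothetical and not part of any Expanso tooling.

```python
import re

# Nanosecond values of the unit constants listed under "Possible values".
UNITS_NS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60_000_000_000,
    "h": 3_600_000_000_000,
}

_TOKEN = re.compile(r"(\d+)(ns|us|ms|s|m|h)")


def to_nanoseconds(text: str) -> int:
    """Convert a duration such as '5s', '100ms' or '1h30m' to nanoseconds.

    Hypothetical helper: assumes the config expects nanosecond integers.
    """
    pos, total = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # unparsed characters between tokens
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * UNITS_NS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # empty input or trailing garbage
        raise ValueError(f"invalid duration: {text!r}")
    return total


default_retry_delay = to_nanoseconds("5s")   # documented default
test_retry_delay = to_nanoseconds("100ms")   # lower value suggested for tests
```

The first two entries in each list are likely the minimum and maximum int64 values, rounded by the documentation renderer.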

log object

format string

Log format: json, text

level string

Log level: trace, debug, info, warn, error - defaults to info

name string

Node name - defaults to hostname

name_provider string

Name provider for auto-generation: "cloud", "hostname", "uuid", "machine-id"

node_manager object

connected_after integer<int64>

ConnectedAfter is how long a node must be stable in Connecting state before being promoted to Connected. This provides flapping protection - a node that keeps crashing and restarting will reset this timer on each handshake. Default: 30s. Must be less than disconnect_timeout.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

disconnect_timeout integer<int64>

DisconnectTimeout is how long to wait without heartbeats before marking a node as disconnected. This value is sent to edge nodes during handshake so both sides use the same threshold. Default: 90s. Increase for unreliable networks.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

heartbeat_interval integer<int64>

HeartbeatInterval is how often edge nodes should send heartbeats. This value is sent to edge nodes during handshake. Default: 15s.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

lost_timeout integer<int64>

LostTimeout is how long a node must remain disconnected before marking it as lost. Default: 1h. Must be greater than disconnect_timeout. Lost nodes are removed from scheduling and become eligible for garbage collection.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]
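The node_manager descriptions state two ordering constraints: connected_after must be less than disconnect_timeout, and lost_timeout must be greater than disconnect_timeout. The sketch below checks those constraints against the documented defaults before a config is deployed. It assumes durations are nanosecond integers, and check_node_manager is a hypothetical helper, not an Expanso API.

```python
SECOND = 1_000_000_000  # assumed unit: nanoseconds

# Defaults as stated in the field descriptions above.
NODE_MANAGER_DEFAULTS = {
    "heartbeat_interval": 15 * SECOND,
    "connected_after": 30 * SECOND,
    "disconnect_timeout": 90 * SECOND,
    "lost_timeout": 3600 * SECOND,
}


def check_node_manager(overrides=None):
    """Return a list of violated constraints (empty when the config is consistent)."""
    cfg = {**NODE_MANAGER_DEFAULTS, **(overrides or {})}
    problems = []
    if not cfg["connected_after"] < cfg["disconnect_timeout"]:
        problems.append("connected_after must be less than disconnect_timeout")
    if not cfg["lost_timeout"] > cfg["disconnect_timeout"]:
        problems.append("lost_timeout must be greater than disconnect_timeout")
    return problems


def heartbeats_per_disconnect_window(overrides=None):
    """How many heartbeat intervals fit into disconnect_timeout."""
    cfg = {**NODE_MANAGER_DEFAULTS, **(overrides or {})}
    return cfg["disconnect_timeout"] // cfg["heartbeat_interval"]
```

With the defaults, six heartbeat intervals fit into the disconnect window, so a node has to miss several consecutive heartbeats before it is marked as disconnected.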

scheduler object

execution_limit_backoff integer<int64>

ExecutionLimitBackoff is the duration to wait before creating a new scheduling run when hitting execution limits.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

max_executions_per_run integer

MaxExecutionsPerRun limits the total number of scheduler operations per evaluation (including creating, stopping, replacing, and failing executions). Set to 0 for no limit.

queue_backoff integer<int64>

QueueBackoff specifies the time to wait before retrying a failed job.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

queue_timeout_never integer<int64>

DefaultQueueTimeoutNeverRestart is the default queue timeout for jobs with "never" restart policy. Batch jobs get fast feedback when no matching nodes exist. Set to 0 for no default (wait indefinitely).

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

queue_timeout_other integer<int64>

DefaultQueueTimeoutOtherPolicies is the default queue timeout for "always" or "on-failure" restart policies. Services wait for matching nodes (e.g., auto-scaling scenarios). Set to 0 for no default (wait indefinitely).

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

worker_count integer

WorkerCount specifies the number of concurrent workers for job scheduling.
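The two queue timeout fields split jobs by restart policy: queue_timeout_never applies to "never", queue_timeout_other applies to "always" and "on-failure", and 0 means wait indefinitely. One way to picture that selection is the sketch below. It is an illustration of the documented rule, not the scheduler's actual code; the function name, the sample values, and treating a missing key as 0 are all assumptions.

```python
SECOND = 1_000_000_000  # assumed unit: nanoseconds


def default_queue_timeout(restart_policy, scheduler):
    """Pick the default queue timeout (ns) for a job, or None to wait indefinitely."""
    if restart_policy not in ("never", "always", "on-failure"):
        raise ValueError(f"unknown restart policy: {restart_policy!r}")
    key = "queue_timeout_never" if restart_policy == "never" else "queue_timeout_other"
    timeout_ns = scheduler.get(key, 0)  # assumption: a missing key behaves like 0
    return timeout_ns if timeout_ns > 0 else None


# Hypothetical values: batch jobs fail fast, services wait for matching nodes.
scheduler = {"queue_timeout_never": 30 * SECOND, "queue_timeout_other": 0}

batch_timeout = default_queue_timeout("never", scheduler)
service_timeout = default_queue_timeout("always", scheduler)
```

Here batch_timeout is 30 seconds in nanoseconds, while service_timeout is None because queue_timeout_other is 0.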

shutdown_timeout integer<int64>

ShutdownTimeout is the maximum time to wait for graceful shutdown

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

store object

gc object

deleted_jobs_retention integer<int64>

DeletedJobsRetention is how long to keep soft-deleted jobs before permanent deletion. Default: 7 days. Increase for longer audit history.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

lost_nodes_retention integer<int64>

LostNodesRetention is how long to keep lost node records after they're marked as lost. Default: 7 days. Measured from when the node transitions to Lost state. Independent of node_manager.lost_timeout.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

terminal_evaluations_retention integer<int64>

TerminalEvaluationsRetention is how long to keep terminal evaluation records (complete/failed/cancelled). Default: 1 day. Evaluations are short-lived scheduling decisions.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

terminal_executions_retention integer<int64>

TerminalExecutionsRetention is how long to keep terminal execution records (complete/failed/stopped). Default: 7 days. Increase for longer execution history.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]
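Because lost_nodes_retention is measured from the transition to Lost and is independent of node_manager.lost_timeout, the time between a node's last heartbeat and the removal of its record is the sum of three settings. The sketch below works through that timeline with the documented defaults (90s, 1h, 7 days). The function name is hypothetical, and the result is only the earliest moment a record becomes eligible; when garbage collection actually runs is not specified here.

```python
from datetime import datetime, timedelta, timezone


def lost_node_timeline(
    last_heartbeat,
    disconnect_timeout=timedelta(seconds=90),  # node_manager.disconnect_timeout default
    lost_timeout=timedelta(hours=1),           # node_manager.lost_timeout default
    lost_nodes_retention=timedelta(days=7),    # store.gc.lost_nodes_retention default
):
    """Return when a silent node is marked disconnected, lost, and GC-eligible."""
    disconnected = last_heartbeat + disconnect_timeout
    lost = disconnected + lost_timeout        # must stay disconnected this long
    gc_eligible = lost + lost_nodes_retention  # retention counts from the Lost transition
    return {"disconnected": disconnected, "lost": lost, "gc_eligible": gc_eligible}


timeline = lost_node_timeline(datetime(2024, 1, 1, tzinfo=timezone.utc))
for state, when in timeline.items():
    print(f"{state:>12}: {when.isoformat()}")
```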

streaming_proxy object

Streaming proxy configuration

read_timeout_seconds integer

ReadTimeoutSeconds is the read timeout in seconds for connections to the remote log server before the connection is closed. A value of 0 means no timeout.

remote_endpoint string

RemoteEndpoint is the endpoint of the log server to proxy to, typically a Loki instance

remote_token string

RemoteToken is the authentication token used to access the log server

telemetry object

Simplified telemetry configuration

authentication object

Optional authentication configuration for telemetry exporters

namespace string

Namespace is used to group telemetry data for all nodes in a namespace

token string

Token is the authentication token or password

type string

Type is the authentication type. Currently only "Basic" is supported.

do_not_track boolean

DoNotTrack disables telemetry collection (default: false, meaning telemetry is enabled)

endpoint string

Endpoint is the telemetry collector endpoint. It should not include a path; use EndpointPath for that. Examples: "localhost:4317", "https://collector.example.com:4318"

endpoint_path string

Some endpoints have a path under which they serve /v1/metrics or similar, but this cannot be included in Endpoint directly.
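Since endpoint must not carry a path, a collector URL that includes a path prefix has to be split across endpoint and endpoint_path. The sketch below shows one way to do that split with the standard library. The helper name and the "/otlp" prefix are hypothetical, and how the exporter joins endpoint_path with /v1/metrics is not specified in this reference.

```python
from urllib.parse import urlsplit


def split_collector_url(url):
    """Split a collector URL into telemetry 'endpoint' and optional 'endpoint_path'."""
    if "://" in url:
        parts = urlsplit(url)
        endpoint = f"{parts.scheme}://{parts.netloc}"
        path = parts.path
    else:
        # Bare host:port form, e.g. "localhost:4317" (no scheme to parse).
        host, sep, rest = url.partition("/")
        endpoint = host
        path = sep + rest
    telemetry = {"endpoint": endpoint}
    if path and path != "/":
        telemetry["endpoint_path"] = path
    return telemetry


with_prefix = split_collector_url("https://collector.example.com:4318/otlp")
bare = split_collector_url("localhost:4317")
```

In this sketch with_prefix holds both keys, while bare contains only endpoint because there is no path to move.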

export_interval integer<int64>

ExportInterval is how often metrics are exported (default: 30s)

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

headers object

Headers are optional headers for authentication

property name* string

include_go_metrics boolean

IncludeGoMetrics enables collection of Go runtime metrics (GC, goroutines, etc.)

insecure boolean

Insecure disables TLS verification (for development/testing)

process_metrics_interval integer<int64>

ProcessMetricsInterval is how often process metrics are collected (default: 15s). Process metrics (CPU, memory, file descriptors) are always enabled.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

protocol string

Protocol specifies the export protocol: "grpc" or "http"

resource_attributes object

ResourceAttributes are additional attributes to include with all telemetry data

property name* string

transport object

address string

credentials_path string

insecure boolean

listen_addr string

ListenAddr, when set, runs an embedded server for nodes to connect to. If specified, this overrides any server address from credentials/bootstrapping.

network_id string

node_id string

Connection config settings

refresh_address string

require_tls boolean

reverse_proxy boolean

types.OrchestratorConfig
{
  "api": {
    "auth": {
      "token": "string"
    },
    "listen_addr": "string"
  },
  "data_dir": "string",
  "evaluation_broker": {
    "initial_retry_delay": -9223372036854776000,
    "max_retry_count": 0,
    "visibility_timeout": -9223372036854776000
  },
  "log": {
    "format": "string",
    "level": "string"
  },
  "name": "string",
  "name_provider": "string",
  "node_manager": {
    "connected_after": -9223372036854776000,
    "disconnect_timeout": -9223372036854776000,
    "heartbeat_interval": -9223372036854776000,
    "lost_timeout": -9223372036854776000
  },
  "scheduler": {
    "execution_limit_backoff": -9223372036854776000,
    "max_executions_per_run": 0,
    "queue_backoff": -9223372036854776000,
    "queue_timeout_never": -9223372036854776000,
    "queue_timeout_other": -9223372036854776000,
    "worker_count": 0
  },
  "shutdown_timeout": -9223372036854776000,
  "store": {
    "gc": {
      "deleted_jobs_retention": -9223372036854776000,
      "lost_nodes_retention": -9223372036854776000,
      "terminal_evaluations_retention": -9223372036854776000,
      "terminal_executions_retention": -9223372036854776000
    }
  },
  "streaming_proxy": {
    "read_timeout_seconds": 0,
    "remote_endpoint": "string",
    "remote_token": "string"
  },
  "telemetry": {
    "authentication": {
      "namespace": "string",
      "token": "string",
      "type": "string"
    },
    "do_not_track": true,
    "endpoint": "string",
    "endpoint_path": "string",
    "export_interval": -9223372036854776000,
    "headers": {},
    "include_go_metrics": true,
    "insecure": true,
    "process_metrics_interval": -9223372036854776000,
    "protocol": "string",
    "resource_attributes": {}
  },
  "transport": {
    "address": "string",
    "credentials_path": "string",
    "insecure": true,
    "listen_addr": "string",
    "network_id": "string",
    "node_id": "string",
    "refresh_address": "string",
    "require_tls": true,
    "reverse_proxy": true
  }
}
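The schema example above uses generated placeholder values ("string", true, and a rounded int64 minimum) rather than a working configuration. As a rough illustration, the sketch below builds a partial config from the defaults stated in the field descriptions and renders it as JSON. It assumes that duration fields accept raw nanosecond integers, which this reference implies through integer<int64> but does not confirm; fields without a documented default are left out.

```python
import json

# Assumed unit for duration fields: nanoseconds.
SECOND = 1_000_000_000
HOUR = 3600 * SECOND
DAY = 24 * HOUR

config = {
    "api": {"listen_addr": "localhost:9010"},
    "evaluation_broker": {"initial_retry_delay": 5 * SECOND},
    "log": {"level": "info"},
    "node_manager": {
        "connected_after": 30 * SECOND,
        "disconnect_timeout": 90 * SECOND,
        "heartbeat_interval": 15 * SECOND,
        "lost_timeout": 1 * HOUR,
    },
    "store": {
        "gc": {
            "deleted_jobs_retention": 7 * DAY,
            "lost_nodes_retention": 7 * DAY,
            "terminal_evaluations_retention": 1 * DAY,
            "terminal_executions_retention": 7 * DAY,
        }
    },
    "telemetry": {
        "do_not_track": False,
        "export_interval": 30 * SECOND,
        "process_metrics_interval": 15 * SECOND,
    },
}

rendered = json.dumps(config, indent=2)
print(rendered)
```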