Offline-Resilient Configuration

Handle intermittent network connectivity in edge environments with local buffering and automatic retry.

Pipeline

input:
  subprocess:
    name: oc
    args: [logs, --all-containers, --prefix, --follow, --all-namespaces]
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    - mapping: |
        root = this
        root.cluster = env("CLUSTER_NAME")
        root.timestamp = now()

# Buffer for offline periods
buffer:
  system_window:
    timestamp_mapping: 'root = this.timestamp'
    size: 1h

output:
  retry:
    max_retries: 10
    backoff:
      initial_interval: 30s
      max_interval: 10m
    output:
      aws_s3:
        bucket: sno-logs
        path: 'logs/${! env("CLUSTER_NAME") }/${! timestamp_unix() }.jsonl'
        batching:
          count: 5000
          period: 5m

What This Does

  • Local buffering: Queues up to 1 hour of logs in memory during network outages
  • Automatic retry: Retries failed S3 writes up to 10 times
  • Exponential backoff: Starts with 30s delay, increases to 10m maximum
  • Seamless recovery: Automatically catches up when connectivity returns
  • No data loss: Logs are not dropped during temporary outages

Buffer Configuration

Time-based window:

buffer:
  system_window:
    timestamp_mapping: 'root = this.timestamp'
    size: 1h # Buffer 1 hour of data

Size-based buffer (the system_window size field takes a duration, so a byte cap uses the memory buffer instead):

buffer:
  memory:
    limit: 104857600 # Buffer up to 100MB of data

Note that, unlike the windowed buffer, the memory buffer applies backpressure upstream when it fills rather than dropping older entries.

Adjust based on:

  • Expected outage duration
  • Log volume
  • Available memory
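A rough sizing formula: buffer memory ≈ log rate × average log size × window length. The Memory Considerations section below works through an example.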

Retry Strategy

Current settings:

  • Max retries: 10
  • Initial interval: 30s
  • Max interval: 10m

Retry schedule:

  1. 30s
  2. 1m
  3. 2m
  4. 4m
  5. 8m
  6. 10m (max)
  7. 10m
  8. 10m
  9. 10m
  10. 10m

Total retry time: ~65 minutes before giving up
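If you would rather bound the total retry window by elapsed time than by attempt count, the backoff block can also carry a max_elapsed_time cap; a minimal sketch, assuming your Redpanda Connect / Benthos version supports this field:

output:
  retry:
    max_retries: 10
    backoff:
      initial_interval: 30s
      max_interval: 10m
      max_elapsed_time: 1h # stop retrying after one hour in total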

Customization Examples

Shorter retry window (for frequent, short outages):

output:
  retry:
    max_retries: 5
    backoff:
      initial_interval: 10s
      max_interval: 2m
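With these settings the output gives up after roughly four and a half minutes of retrying (10s + 20s + 40s + 80s + 2m).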

Longer retry window (for extended outages):

output:
  retry:
    max_retries: 20
    backoff:
      initial_interval: 1m
      max_interval: 30m
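Here the retry window stretches to roughly eight hours (1m + 2m + 4m + 8m + 16m, then 30m for each of the remaining fifteen attempts).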

Memory Considerations

Buffer memory usage:

  • 1 hour of logs at 1,000 logs/min = ~60,000 logs
  • At an average log size of 500 bytes, that is ~30MB of memory

For resource-constrained SNO:

buffer:
  system_window:
    timestamp_mapping: 'root = this.timestamp'
    size: 30m # Smaller buffer for limited memory
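At the example rate above (1,000 logs/min, 500 bytes per log), a 30-minute window holds roughly 30,000 logs, or about 15MB of memory.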

Network Failure Scenarios

Scenario 1: Short outage (5 minutes)

  • Logs buffer locally
  • S3 write fails, retry after 30s
  • Connection restored, retry succeeds
  • All logs delivered

Scenario 2: Extended outage (2 hours)

  • Logs buffer for 1 hour (buffer limit)
  • Older logs are dropped to make room for new ones
  • Connection restored after 2 hours
  • Last hour of logs delivered

Scenario 3: Persistent failure

  • All retries exhausted after ~60 minutes
  • Logs are dropped
  • Error logged for monitoring
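If dropping data after exhausted retries is unacceptable, one option is to wrap the retry in a fallback output that spills to local disk when S3 stays unreachable. A minimal sketch, assuming a writable local path (/var/log/sno-buffer.jsonl is a hypothetical example):

output:
  fallback:
    # Primary: retry S3 writes as configured above
    - retry:
        max_retries: 10
        backoff:
          initial_interval: 30s
          max_interval: 10m
        output:
          aws_s3:
            bucket: sno-logs
            path: 'logs/${! env("CLUSTER_NAME") }/${! timestamp_unix() }.jsonl'
    # Fallback: write to local disk once the primary gives up
    - file:
        path: /var/log/sno-buffer.jsonl # hypothetical local spill path
        codec: lines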

Monitoring Buffer Health

Add metrics to track buffer usage:

pipeline:
  processors:
    - mapping: |
        root = this
        root.cluster = env("CLUSTER_NAME")
        root.timestamp = now()
        root.buffer_size = metadata("buffer_size").or(0)
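Beyond per-record fields, the pipeline's own metrics (including output send errors and retries) can be exposed for scraping. A minimal sketch, assuming a Prometheus scraper and the default HTTP listener address:

metrics:
  prometheus: {}

http:
  address: 0.0.0.0:4195 # metrics are served at /metrics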

Best Practices

Right-size buffer: Match to expected outage duration and available memory

Monitor retries: Track failed deliveries to identify persistent network issues

Combine with batching: Larger batches reduce network overhead when connectivity returns

Test offline behavior: Simulate network outages to verify recovery

Next Steps