Basic K3s Log Collection

Stream logs from all pods and namespaces to S3 with batching.

Pipeline

input:
  subprocess:
    name: kubectl
    args:
      - logs
      - --all-containers=true
      - --prefix=true
      - --follow
      - --tail=-1
      - --all-namespaces
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    - mapping: |
        root.raw_log = this
        root.timestamp = now()
        root.node_id = env("NODE_ID")
        root.cluster = env("CLUSTER_NAME").or("k3s-edge")

output:
  aws_s3:
    bucket: edge-k3s-logs
    path: 'logs/${! env("NODE_ID") }/${! timestamp_unix() }.jsonl'
    batching:
      count: 1000
      period: 1m
      processors:
        - archive:
            format: concatenate

What This Does

  • Follows logs from all containers in all namespaces using kubectl logs --follow
  • Adds metadata: node identifier, cluster name, and timestamp to each log entry (see the sample record after this list)
  • Batches logs: Collects 1000 logs or waits 1 minute, whichever comes first, before writing
  • Writes to S3: Organizes logs by node and timestamp for easy retrieval
  • Auto-restarts: If the kubectl process exits, the input starts it again automatically
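
With the mapping above, each line kubectl emits becomes one JSON object in the batch. A sketch of what a single record might look like (all values are illustrative):

{
  "raw_log": "[kube-system/coredns-abc12/coredns] [INFO] plugin/reload: Running configuration",
  "timestamp": "2025-01-15T10:30:00.123456789Z",
  "node_id": "edge-site-42",
  "cluster": "k3s-edge"
}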

Key Configuration

--all-containers=true: Includes logs from all containers in each pod

--prefix=true: Adds [namespace/pod/container] prefix to each log line
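
If the prefix is more useful as structured fields than as raw text, an extra mapping step can split it out. A minimal sketch; the field names, regex, and fallbacks are our own choices, not part of the base pipeline:

pipeline:
  processors:
    - mapping: |
        # Illustrative: split "[namespace/pod/container] message" into fields.
        # Falls back to the raw line if a line doesn't match the pattern.
        let m = this.re_find_object("^\\[(?P<namespace>[^/]+)/(?P<pod>[^/]+)/(?P<container>[^\\]]+)\\]\\s?(?P<message>.*)$").catch({})
        root.namespace = $m.namespace | "unknown"
        root.pod = $m.pod | "unknown"
        root.container = $m.container | "unknown"
        root.message = $m.message | this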

--follow: Continuously streams new logs (like tail -f)

restart_on_exit: true: Ensures log collection continues even if kubectl crashes

Batching: Reduces S3 API calls and object count by writing up to 1000 logs per object instead of one object per log line
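
If uplink bandwidth is constrained at the edge, batches can also be compressed before upload. A sketch of the output block with a gzip step appended (assumes a compress processor is available in your runtime; the .gz suffix is our own choice):

output:
  aws_s3:
    bucket: edge-k3s-logs
    path: 'logs/${! env("NODE_ID") }/${! timestamp_unix() }.jsonl.gz'
    batching:
      count: 1000
      period: 1m
      processors:
        - archive:
            format: concatenate
        - compress:
            algorithm: gzip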

Environment Variables

Set these environment variables where Expanso runs:

  • NODE_ID: Unique identifier for this edge node (e.g., edge-site-42)
  • CLUSTER_NAME: Name of the K3s cluster (optional, defaults to k3s-edge)
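
For example, in the shell or service unit that launches the agent (values illustrative):

export NODE_ID="edge-site-42"
export CLUSTER_NAME="prod-k3s"   # optional; the mapping falls back to "k3s-edge"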

Next Steps