Pipelines

A pipeline defines how data flows from inputs, through processors, to outputs.

Pipeline Structure

Every pipeline has three main sections:

```yaml
input:
  # Where data comes from
  kafka:
    addresses: ["localhost:9092"]
    topics: ["logs"]

pipeline:
  processors:
    # What to do with the data
    - mapping: |
        root.message = this.msg.uppercase()
        root.timestamp = now()

output:
  # Where data goes
  s3:
    bucket: processed-logs
    path: "logs/${!timestamp_unix()}.json"
```

Inputs

Sources of data:

  • Message queues: Kafka, RabbitMQ, NATS
  • Databases: PostgreSQL, MongoDB, MySQL
  • Files: Local, S3, SFTP
  • HTTP: Webhooks, APIs
  • Streams: TCP, UDP, WebSocket
  • Generate: Test data

Browse all inputs →
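
For quick local testing, the generate input listed above can produce synthetic messages without any external system. A minimal sketch (the interval and the message fields are illustrative, not defaults):

```yaml
input:
  generate:
    # Emit one synthetic message per second
    interval: 1s
    mapping: |
      root.msg = "test event"
      root.level = "INFO"
```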


Processors

Transform, filter, and enrich data:

  • Mapping: Transform with Bloblang
  • Filter: Drop unwanted messages
  • Parse: JSON, CSV, XML, Avro, Protobuf
  • Enrich: Lookup external data
  • Aggregate: Batch, window, group

Browse all processors →
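
Processors run in the order listed, so parsing and filtering are often combined in a single Bloblang mapping. A sketch that parses a raw JSON payload and then drops debug-level messages (the level field is an assumption about your data shape):

```yaml
pipeline:
  processors:
    - mapping: |
        # Parse the raw payload, then drop DEBUG messages
        root = content().parse_json()
        root = if this.level == "DEBUG" { deleted() }
```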


Outputs

Destinations for processed data:

  • Message queues: Kafka, RabbitMQ, NATS
  • Databases: PostgreSQL, Elasticsearch
  • Object storage: S3, GCS, Azure Blob
  • HTTP: Webhooks, APIs
  • Files: Local, S3, SFTP
  • Observability: Datadog, Prometheus

Browse all outputs →
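
Outputs mirror the input configuration style. For example, writing processed messages back to Kafka might look like this (the topic name is illustrative):

```yaml
output:
  kafka:
    addresses: ["localhost:9092"]
    # Destination topic (illustrative name)
    topic: processed-logs
```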


Simple Example

Read log files, keep only error lines, and print them to the terminal:

```yaml
input:
  file:
    paths: ["/var/log/*.log"]

pipeline:
  processors:
    - mapping: |
        # Only keep ERROR logs (content() reads the raw line,
        # since plain log lines are not JSON)
        root = if !content().string().contains("ERROR") { deleted() }

output:
  stdout:
    codec: lines
```

Multi-Output Example

Route data to different destinations:

```yaml
input:
  http_server:
    address: "0.0.0.0:8080"

pipeline:
  processors:
    - mapping: |
        # Parse the raw request body as JSON
        root = content().parse_json()

output:
  broker:
    pattern: fan_out
    outputs:
      # Errors to Slack
      - switch:
          cases:
            - check: this.level == "ERROR"
              output:
                http_client:
                  url: "https://slack.com/webhook"

      # Metrics to Prometheus
      - switch:
          cases:
            - check: this.type == "metric"
              output:
                prometheus_push_gateway:
                  url: "http://prometheus:9091"

      # Everything to S3
      - s3:
          bucket: all-events
```

Deployment

Pipelines deploy to nodes based on:

  • Direct selection: Choose specific nodes
  • Label selectors: Target nodes with matching labels
  • Network: All nodes in a network

Example with labels:

# Deploy to production log processors only
```yaml
selector:
  env: production
  role: log-processor
```

What's Next?

👉 Components - Learn about building blocks

👉 Build a Pipeline - Get hands-on

👉 Bloblang Guide - Master data transformations