
Core Concepts

Learn the fundamental concepts behind Expanso Edge and how it enables distributed data pipeline orchestration.

What is Expanso?

Expanso Edge is a managed platform for deploying and orchestrating data pipelines at the edge. It processes data where it is generated, reducing bandwidth costs and latency while keeping control centralized.

Key capabilities:

  • Edge-native processing at the data source
  • Managed SaaS with no infrastructure to maintain
  • Visual pipeline builder for rapid development
  • 200+ built-in components covering common inputs, processors, and outputs

Architecture Overview

Expanso Edge pairs a managed control plane (Expanso Cloud) with lightweight agents running on your own infrastructure. Agents are grouped into networks and execute the pipelines you deploy to them.

Core Components

1. Expanso Cloud (Control Plane)

The Control Plane is the central management interface hosted at cloud.expanso.io.

Responsibilities:

  • Pipeline creation and configuration
  • Agent deployment and monitoring
  • Metrics aggregation and visualization
  • User authentication and authorization
  • Network management

You don't run this component yourself; it's a managed service provided by Expanso.

2. Networks

A Network is a logical grouping of agents that work together.

Key Features:

  • Isolates agents and pipelines
  • Simplifies management and organization
  • Enables team collaboration
  • Supports label-based deployment

Example Use Cases:

  • One network per environment (dev, staging, production)
  • One network per region (us-east, eu-west, ap-south)
  • One network per team or project

Creating a Network:

# Via CLI
expanso network create my-network

# Or use Expanso Cloud UI

3. Agents

An Agent is the runtime that executes pipelines on your infrastructure.

Characteristics:

  • Lightweight Go binary (~50MB)
  • Runs on Linux, macOS, Windows
  • Connects to Expanso Cloud over TLS
  • Executes one or more pipelines
  • Reports metrics and health

Agent Lifecycle:

  1. Bootstrap: Register with Expanso Cloud using a token
  2. Connect: Establish secure connection to control plane
  3. Receive: Download pipeline configurations
  4. Execute: Run pipelines and process data
  5. Report: Send metrics and logs to control plane

Deployment Options:

  • Bare metal servers
  • Virtual machines
  • Docker containers
  • Kubernetes pods
  • Edge devices (Raspberry Pi, IoT gateways)

4. Pipelines

A Pipeline defines how data flows from inputs, through processors, to outputs.

Pipeline Structure:

input:
  # Where data comes from
  kafka:
    addresses: ["localhost:9092"]
    topics: ["logs"]

pipeline:
  processors:
    # What to do with the data
    - mapping: |
        root.message = this.msg.uppercase()
        root.timestamp = now()

output:
  # Where data goes
  s3:
    bucket: processed-logs
    path: "logs/${!timestamp_unix()}.json"

Pipeline Components:

Inputs

Receive data from external sources:

  • Message queues (Kafka, RabbitMQ, NATS)
  • Databases (PostgreSQL, MongoDB)
  • Files (local, S3, SFTP)
  • HTTP servers (webhooks, APIs)
  • Streams (TCP, UDP, WebSocket)
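
As an illustration, an input that receives webhook posts over HTTP might look like this. This is a sketch assuming a Benthos-style `http_server` component; the field names and path are assumptions, not taken from this page:

```yaml
input:
  http_server:
    # Accept POSTed payloads on this path
    path: /ingest
```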

Processors

Transform, filter, and enrich data:

  • Mapping: Transform with the Bloblang language
  • Filtering: Drop unwanted messages
  • Aggregation: Batch, window, group
  • Enrichment: Lookup external data
  • Parsing: JSON, CSV, XML, Avro, Protobuf
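
For example, a mapping processor can double as a filter by deleting messages that match a condition. A minimal sketch using Bloblang's deleted() function, which drops the message entirely:

```yaml
pipeline:
  processors:
    # Drop debug-level messages; everything else passes through unchanged
    - mapping: |
        root = if this.level == "DEBUG" { deleted() }
```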

Outputs

Send processed data to destinations:

  • Message queues (Kafka, RabbitMQ, NATS)
  • Databases (PostgreSQL, Elasticsearch)
  • Object storage (S3, GCS, Azure Blob)
  • HTTP endpoints (webhooks, APIs)
  • Observability platforms (Datadog, Prometheus)

Data Flow

Simple Pipeline Flow

Input → Processors → Output

Multi-Output Pipeline
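
A multi-output pipeline fans each message out to more than one destination. A minimal sketch, assuming a Benthos-style broker output (the `broker` and `fan_out` names are assumptions, not confirmed by this page):

```yaml
output:
  broker:
    # fan_out delivers every message to all child outputs
    pattern: fan_out
    outputs:
      - s3:
          bucket: processed-logs
          path: "logs/${!timestamp_unix()}.json"
      - kafka:
          addresses: ["localhost:9092"]
          topic: processed-logs
```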


Key Concepts in Depth

Edge-First Processing

Traditional approach:

Edge Device → Cloud → Process → Store

High cost, high latency

Expanso approach:

Edge Device → Process at Edge → Cloud (filtered/aggregated)

Low cost, low latency

Benefits:

  • Reduced bandwidth from filtering at the source
  • Lower latency through local processing
  • Lower data-transfer and cloud-processing costs
  • Sensitive data stays on-premises
  • Processing continues during network outages

Bloblang Mapping Language

Bloblang is Expanso's transformation language for manipulating data.

Example:

# Parse and transform a log entry
root.level = this.severity.uppercase()
root.message = this.msg
root.timestamp = this.time.parse_timestamp("2006-01-02")
root.user_id = this.user.id.string()

# Add metadata
root.processed_at = now()
root.source = "expanso-edge"

# Conditional logic
root.alert = if this.severity == "ERROR" {
  true
} else {
  false
}

See the Bloblang Guide for more details.

Labels and Selectors

Use labels to organize and target agents for deployment.

Label Examples:

env: production
region: us-east-1
datacenter: dc-1
role: log-processor

Deployment with Selectors:

# Deploy only to production agents in us-east-1
selector:
env: production
region: us-east-1

This enables:

  • Gradual rollouts
  • Environment-specific deployments
  • Geographic targeting
  • Role-based deployments

State and Persistence

Agents maintain state for:

  • Checkpoints: Resume from last processed position
  • Caches: Store lookup data locally
  • Deduplication: Track processed messages
  • Aggregations: Maintain windows and counters

Storage Options:

  • Local disk (default)
  • Memory (fast, not persistent)
  • External (Redis, etcd)
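
For example, deduplication combines a cache resource with a dedupe step. A sketch assuming Benthos-style `cache_resources` and `dedupe` components (the component and field names are assumptions):

```yaml
cache_resources:
  - label: dedupe_cache
    # In-memory cache; swap for redis to share state across agents
    memory:
      default_ttl: 5m

pipeline:
  processors:
    # Drop any message whose event_id was seen within the TTL
    - dedupe:
        cache: dedupe_cache
        key: ${! json("event_id") }
```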

Deployment Patterns

Pattern 1: Edge Filtering

Filter data at the edge, send only relevant data to cloud.

Edge Agent: Filter 95% of noise → Cloud: Store 5% of important data

Use Case: IoT sensors generating high-frequency data
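
In config terms, edge filtering can be as small as one mapping that deletes readings inside the normal range. A hypothetical sensor example; the field name and threshold are illustrative:

```yaml
pipeline:
  processors:
    # Forward only anomalous readings; normal readings never leave the device
    - mapping: |
        root = if this.temperature < 80.0 { deleted() }
```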

Pattern 2: Regional Aggregation

Aggregate data regionally, send summaries to central location.

Region 1 Agent: Aggregate hourly → Central Storage
Region 2 Agent: Aggregate hourly → Central Storage

Use Case: Multi-location retail analytics
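
One way to sketch hourly aggregation, assuming a Benthos-style `system_window` buffer and Bloblang's batch functions (the component names are assumptions, not taken from this page):

```yaml
buffer:
  # Group messages into one-hour windows
  system_window:
    timestamp_mapping: root = now()
    size: 1h

pipeline:
  processors:
    # Collapse each window into a single summary message
    - mapping: |
        root = if batch_index() == 0 {
          { "window_count": batch_size() }
        } else {
          deleted()
        }
```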

Pattern 3: Hub and Spoke

Edge agents send to regional hubs, hubs aggregate and forward.

Edge Agents → Regional Hub → Central Cloud

Use Case: Manufacturing facilities with central monitoring

Pattern 4: Mesh Processing

Agents communicate directly for distributed processing.

Agent 1 ↔ Agent 2 ↔ Agent 3 → Distributed Result

Use Case: Distributed machine learning inference


Security Model

Authentication

  • Agents authenticate using bootstrap tokens
  • Tokens are single-use and time-limited
  • mTLS for all agent-cloud communication

Authorization

  • Role-based access control (RBAC)
  • Network-level isolation
  • Pipeline deployment permissions

Data Security

  • End-to-end encryption in transit
  • At-rest encryption for agent state
  • PII redaction at the edge
  • Audit logging

Monitoring and Observability

Metrics

Agents export metrics:

  • Throughput: Messages/sec, bytes/sec
  • Latency: Processing time, queue depth
  • Errors: Error rates, failed messages
  • Resources: CPU, memory, disk usage
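
Metrics are aggregated in Expanso Cloud; if the agent also supports a Benthos-style metrics block, local Prometheus scraping could be enabled like this (an assumption, not documented on this page):

```yaml
metrics:
  # Expose metrics in Prometheus exposition format
  prometheus: {}
```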

Logs

Three levels of logs:

  • Pipeline logs: Data flowing through pipelines
  • Agent logs: Agent operations and health
  • Audit logs: User actions and deployments

Health Checks

Agents report health every 10 seconds:

  • Connection status
  • Pipeline status
  • Resource availability
  • Error conditions

What's Next?

Now that you understand the core concepts: