
Core Concepts

Learn the fundamental concepts behind Expanso Edge and how it enables distributed data pipeline orchestration.

What is Expanso?

Expanso Edge is a managed platform for deploying and orchestrating data pipelines at the edge. It processes data where it is generated, reducing bandwidth costs and latency while keeping control centralized.

Key capabilities:

  • Edge-native processing at the data source
  • Managed SaaS with no infrastructure to maintain
  • Visual pipeline builder for rapid development
  • 200+ built-in components covering common inputs, processors, and outputs

Architecture Overview

Expanso Edge pairs a managed control plane (Expanso Cloud) with lightweight agents running on your own infrastructure. Agents are grouped into networks and execute the pipelines you deploy to them.

Core Components

1. Expanso Cloud (Control Plane)

The Control Plane is the central management interface hosted at cloud.expanso.io.

Responsibilities:

  • Pipeline creation and configuration
  • Agent deployment and monitoring
  • Metrics aggregation and visualization
  • User authentication and authorization
  • Network management

You don't run this component yourself; it's a managed service provided by Expanso.

2. Networks

A Network is a logical grouping of agents that work together.

Key Features:

  • Isolates agents and pipelines
  • Simplifies management and organization
  • Enables team collaboration
  • Supports label-based deployment

Example Use Cases:

  • One network per environment (dev, staging, production)
  • One network per region (us-east, eu-west, ap-south)
  • One network per team or project

Creating a Network:

# Via CLI
expanso network create my-network

# Or use Expanso Cloud UI

3. Agents

An Agent is the runtime that executes pipelines on your infrastructure.

Characteristics:

  • Lightweight Go binary (~50MB)
  • Runs on Linux, macOS, Windows
  • Connects to Expanso Cloud over TLS
  • Executes one or more pipelines
  • Reports metrics and health

Agent Lifecycle:

  1. Bootstrap: Register with Expanso Cloud using a token
  2. Connect: Establish secure connection to control plane
  3. Receive: Download pipeline configurations
  4. Execute: Run pipelines and process data
  5. Report: Send metrics and logs to control plane

Deployment Options:

  • Bare metal servers
  • Virtual machines
  • Docker containers
  • Kubernetes pods
  • Edge devices (Raspberry Pi, IoT gateways)

4. Pipelines

A Pipeline defines how data flows from inputs, through processors, to outputs.

Pipeline Structure:

input:
  # Where data comes from
  kafka:
    addresses: ["localhost:9092"]
    topics: ["logs"]

pipeline:
  processors:
    # What to do with the data
    - mapping: |
        root.message = this.msg.uppercase()
        root.timestamp = now()

output:
  # Where data goes
  s3:
    bucket: processed-logs
    path: "logs/${!timestamp_unix()}.json"

Pipeline Components:

Inputs

Receive data from external sources:

  • Message queues (Kafka, RabbitMQ, NATS)
  • Databases (PostgreSQL, MongoDB)
  • Files (local, S3, SFTP)
  • HTTP servers (webhooks, APIs)
  • Streams (TCP, UDP, WebSocket)
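
As an illustration, an input that receives webhook posts over HTTP might look like this. This is a sketch assuming a Benthos-style `http_server` component; the field names and path are assumptions, not taken from this page:

```yaml
input:
  http_server:
    # Accept POSTed payloads on this path
    path: /ingest
```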

Processors

Transform, filter, and enrich data:

  • Mapping: Transform with the Bloblang language
  • Filtering: Drop unwanted messages
  • Aggregation: Batch, window, group
  • Enrichment: Lookup external data
  • Parsing: JSON, CSV, XML, Avro, Protobuf
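
For example, a mapping processor can double as a filter by deleting messages that match a condition. A minimal sketch using Bloblang's deleted() function, which drops the message entirely:

```yaml
pipeline:
  processors:
    # Drop debug-level messages; everything else passes through unchanged
    - mapping: |
        root = if this.level == "DEBUG" { deleted() }
```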

Outputs

Send processed data to destinations:

  • Message queues (Kafka, RabbitMQ, NATS)
  • Databases (PostgreSQL, Elasticsearch)
  • Object storage (S3, GCS, Azure Blob)
  • HTTP endpoints (webhooks, APIs)
  • Observability platforms (Datadog, Prometheus)

Data Flow

Simple Pipeline Flow

Input → Processors → Output

Multi-Output Pipeline
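
A multi-output pipeline fans each message out to more than one destination. A minimal sketch, assuming a Benthos-style broker output (the `broker` and `fan_out` names are assumptions, not confirmed by this page):

```yaml
output:
  broker:
    # fan_out delivers every message to all child outputs
    pattern: fan_out
    outputs:
      - s3:
          bucket: processed-logs
          path: "logs/${!timestamp_unix()}.json"
      - kafka:
          addresses: ["localhost:9092"]
          topic: processed-logs
```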


Key Concepts in Depth

Edge-First Processing

Traditional approach:

Edge Device → Cloud → Process → Store

High cost, high latency

Expanso approach:

Edge Device → Process at Edge → Cloud (filtered/aggregated)

Low cost, low latency

Benefits:

  • Reduced bandwidth from filtering at the source
  • Lower latency through local processing
  • Lower data-transfer and cloud-processing costs
  • Sensitive data stays on-premises
  • Processing continues during network outages

Bloblang Mapping Language

Bloblang is Expanso's transformation language for manipulating data.

Example:

# Parse and transform a log entry
root.level = this.severity.uppercase()
root.message = this.msg
root.timestamp = this.time.parse_timestamp("2006-01-02")
root.user_id = this.user.id.string()

# Add metadata
root.processed_at = now()
root.source = "expanso-edge"

# Conditional logic
root.alert = if this.severity == "ERROR" {
  true
} else {
  false
}

See the Bloblang Guide for more details.

Labels and Selectors

Use labels to organize and target agents for deployment.

Label Examples:

env: production
region: us-east-1
datacenter: dc-1
role: log-processor

Deployment with Selectors:

# Deploy only to production agents in us-east-1
selector:
env: production
region: us-east-1

This enables:

  • Gradual rollouts
  • Environment-specific deployments
  • Geographic targeting
  • Role-based deployments

State and Persistence

Agents maintain state for:

  • Checkpoints: Resume from last processed position
  • Caches: Store lookup data locally
  • Deduplication: Track processed messages
  • Aggregations: Maintain windows and counters

Storage Options:

  • Local disk (default)
  • Memory (fast, not persistent)
  • External (Redis, etcd)
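
For example, deduplication combines a cache resource with a dedupe step. A sketch assuming Benthos-style `cache_resources` and `dedupe` components (the component and field names are assumptions):

```yaml
cache_resources:
  - label: dedupe_cache
    # In-memory cache; swap for redis to share state across agents
    memory:
      default_ttl: 5m

pipeline:
  processors:
    # Drop any message whose event_id was seen within the TTL
    - dedupe:
        cache: dedupe_cache
        key: ${! json("event_id") }
```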

Deployment Patterns

Pattern 1: Edge Filtering

Filter data at the edge, send only relevant data to cloud.

Edge Agent: Filter 95% of noise → Cloud: Store 5% of important data

Use Case: IoT sensors generating high-frequency data
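
In config terms, edge filtering can be as small as one mapping that deletes readings inside the normal range. A hypothetical sensor example; the field name and threshold are illustrative:

```yaml
pipeline:
  processors:
    # Forward only anomalous readings; normal readings never leave the device
    - mapping: |
        root = if this.temperature < 80.0 { deleted() }
```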

Pattern 2: Regional Aggregation

Aggregate data regionally, send summaries to central location.

Region 1 Agent: Aggregate hourly → Central Storage
Region 2 Agent: Aggregate hourly → Central Storage

Use Case: Multi-location retail analytics
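
One way to sketch hourly aggregation, assuming a Benthos-style `system_window` buffer and Bloblang's batch functions (the component names are assumptions, not taken from this page):

```yaml
buffer:
  # Group messages into one-hour windows
  system_window:
    timestamp_mapping: root = now()
    size: 1h

pipeline:
  processors:
    # Collapse each window into a single summary message
    - mapping: |
        root = if batch_index() == 0 {
          { "window_count": batch_size() }
        } else {
          deleted()
        }
```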

Pattern 3: Hub and Spoke

Edge agents send to regional hubs, hubs aggregate and forward.

Edge Agents → Regional Hub → Central Cloud

Use Case: Manufacturing facilities with central monitoring

Pattern 4: Mesh Processing

Agents communicate directly for distributed processing.

Agent 1 ↔ Agent 2 ↔ Agent 3 → Distributed Result

Use Case: Distributed machine learning inference


Security Model

Authentication

  • Agents authenticate using bootstrap tokens
  • Tokens are single-use and time-limited
  • mTLS for all agent-cloud communication

Authorization

  • Role-based access control (RBAC)
  • Network-level isolation
  • Pipeline deployment permissions

Data Security

  • End-to-end encryption in transit
  • At-rest encryption for agent state
  • PII redaction at the edge
  • Audit logging

Monitoring and Observability

Metrics

Agents export metrics:

  • Throughput: Messages/sec, bytes/sec
  • Latency: Processing time, queue depth
  • Errors: Error rates, failed messages
  • Resources: CPU, memory, disk usage
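
Metrics are aggregated in Expanso Cloud; if the agent also supports a Benthos-style metrics block, local Prometheus scraping could be enabled like this (an assumption, not documented on this page):

```yaml
metrics:
  # Expose metrics in Prometheus exposition format
  prometheus: {}
```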

Logs

Three levels of logs:

  • Pipeline logs: Data flowing through pipelines
  • Agent logs: Agent operations and health
  • Audit logs: User actions and deployments

Health Checks

Agents report health every 10 seconds:

  • Connection status
  • Pipeline status
  • Resource availability
  • Error conditions

What's Next?

Now that you understand the core concepts: