Core Concepts
Learn the fundamental concepts behind Expanso Edge and how it enables distributed data pipeline orchestration.
What is Expanso Edge?
Expanso Edge is a managed platform for deploying and orchestrating data pipelines at the edge. Process data where it's generated, reducing bandwidth costs and latency while maintaining centralized control.
Key capabilities:
- Edge-native processing at the data source
- Managed SaaS with no infrastructure to maintain
- Visual pipeline builder for rapid development
- 200+ built-in components covering common inputs, processors, and outputs
Architecture Overview
Core Components
1. Expanso Cloud (Control Plane)
The Control Plane is the central management interface hosted at cloud.expanso.io.
Responsibilities:
- Pipeline creation and configuration
- Agent deployment and monitoring
- Metrics aggregation and visualization
- User authentication and authorization
- Network management
You don't run the Control Plane yourself; it's a managed service provided by Expanso.
2. Networks
A Network is a logical grouping of agents that work together.
Key Features:
- Isolates agents and pipelines
- Simplifies management and organization
- Enables team collaboration
- Supports label-based deployment
Example Use Cases:
- One network per environment (dev, staging, production)
- One network per region (us-east, eu-west, ap-south)
- One network per team or project
Creating a Network:
```bash
# Via CLI
expanso network create my-network

# Or use the Expanso Cloud UI
```
3. Agents
An Agent is the runtime that executes pipelines on your infrastructure.
Characteristics:
- Lightweight Go binary (~50MB)
- Runs on Linux, macOS, Windows
- Connects to Expanso Cloud over TLS
- Executes one or more pipelines
- Reports metrics and health
Agent Lifecycle:
1. Bootstrap: Register with Expanso Cloud using a token
2. Connect: Establish secure connection to control plane
3. Receive: Download pipeline configurations
4. Execute: Run pipelines and process data
5. Report: Send metrics and logs to control plane
Deployment Options:
- Bare metal servers
- Virtual machines
- Docker containers
- Kubernetes pods
- Edge devices (Raspberry Pi, IoT gateways)
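For example, running an agent as a Docker container might look like the sketch below. The image name and the EXPANSO_BOOTSTRAP_TOKEN variable are illustrative assumptions, not the documented interface; see the Installation guide for the exact commands.

```bash
# Hypothetical sketch: the image name and environment variable are
# illustrative assumptions, not the documented interface.
docker run -d \
  --name expanso-agent \
  -e EXPANSO_BOOTSTRAP_TOKEN="<your-bootstrap-token>" \
  expanso/edge-agent:latest
```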
4. Pipelines
A Pipeline defines how data flows from inputs, through processors, to outputs.
Pipeline Structure:
```yaml
input:
  # Where data comes from
  kafka:
    addresses: ["localhost:9092"]
    topics: ["logs"]

pipeline:
  processors:
    # What to do with the data
    - mapping: |
        root.message = this.msg.uppercase()
        root.timestamp = now()

output:
  # Where data goes
  s3:
    bucket: processed-logs
    path: "logs/${!timestamp_unix()}.json"
```
Pipeline Components:
Inputs
Receive data from external sources:
- Message queues (Kafka, RabbitMQ, NATS)
- Databases (PostgreSQL, MongoDB)
- Files (local, S3, SFTP)
- HTTP servers (webhooks, APIs)
- Streams (TCP, UDP, WebSocket)
Processors
Transform, filter, and enrich data:
- Mapping: Transform with Bloblang language
- Filtering: Drop unwanted messages
- Aggregation: Batch, window, group
- Enrichment: Lookup external data
- Parsing: JSON, CSV, XML, Avro, Protobuf
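As a sketch of how processors chain together, the following parses a raw payload as JSON and then enriches it. Both steps use the mapping processor shown elsewhere on this page; the field values are illustrative.

```yaml
pipeline:
  processors:
    # Parse the raw payload as JSON
    - mapping: |
        root = content().parse_json()
    # Enrich with metadata
    - mapping: |
        root.processed_at = now()
        root.source = "expanso-edge"
```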
Outputs
Send processed data to destinations:
- Message queues (Kafka, RabbitMQ, NATS)
- Databases (PostgreSQL, Elasticsearch)
- Object storage (S3, GCS, Azure Blob)
- HTTP endpoints (webhooks, APIs)
- Observability platforms (Datadog, Prometheus)
Data Flow
Simple Pipeline Flow

Input → Processors → Output

Multi-Output Pipeline

Input → Processors → Output A + Output B (fan-out)
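A minimal sketch of a multi-output pipeline, assuming a broker-style output with a fan_out pattern is available among the built-in components (the broker and http_client names are assumptions, not confirmed by this page):

```yaml
# Sketch only: broker and http_client are assumed component names,
# modeled on the pipeline structure shown above.
input:
  kafka:
    addresses: ["localhost:9092"]
    topics: ["logs"]

output:
  broker:
    pattern: fan_out
    outputs:
      - s3:
          bucket: processed-logs
          path: "logs/${!timestamp_unix()}.json"
      - http_client:
          url: https://example.com/webhook
          verb: POST
```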
Key Concepts in Depth
Edge-First Processing
Traditional approach (high cost, high latency):

Edge Device → Cloud → Process → Store

Expanso approach (low cost, low latency):

Edge Device → Process at Edge → Cloud (filtered/aggregated)
Benefits:
- Reduced bandwidth by filtering at the source
- Lower latency through local processing
- Lower data-transfer and cloud processing costs
- Sensitive data stays on-premises
- Processing continues during network outages
Bloblang Mapping Language
Bloblang is Expanso's transformation language for manipulating data.
Example:
```
# Parse and transform log entry
root.level = this.severity.uppercase()
root.message = this.msg
root.timestamp = this.time.parse_timestamp("2006-01-02")
root.user_id = this.user.id.string()

# Add metadata
root.processed_at = now()
root.source = "expanso-edge"

# Conditional logic
root.alert = if this.severity == "ERROR" {
  true
} else {
  false
}
```
See the Bloblang Guide for more details.
Labels and Selectors
Use labels to organize and target agents for deployment.
Label Examples:
```yaml
env: production
region: us-east-1
datacenter: dc-1
role: log-processor
```
Deployment with Selectors:
```yaml
# Deploy only to production agents in us-east-1
selector:
  env: production
  region: us-east-1
```
This enables:
- Gradual rollouts
- Environment-specific deployments
- Geographic targeting
- Role-based deployments
State and Persistence
Agents maintain state for:
- Checkpoints: Resume from last processed position
- Caches: Store lookup data locally
- Deduplication: Track processed messages
- Aggregations: Maintain windows and counters
Storage Options:
- Local disk (default)
- Memory (fast, not persistent)
- External (Redis, etcd)
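As an illustration of state in practice, here is a sketch of deduplication backed by a local cache. The cache_resources block and dedupe processor follow the config style used elsewhere on this page, but the names are assumptions, not confirmed component references.

```yaml
# Sketch only: cache_resources and dedupe are assumed names,
# modeled on the config style used elsewhere on this page.
cache_resources:
  - label: dedupe_cache
    memory:
      default_ttl: 5m

pipeline:
  processors:
    # Drop any message whose ID was seen in the last 5 minutes
    - dedupe:
        cache: dedupe_cache
        key: ${! json("id") }
```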
Deployment Patterns
Pattern 1: Edge Filtering
Filter data at the edge and send only the relevant data to the cloud.
Edge Agent: Filter 95% of noise → Cloud: Store 5% of important data
Use Case: IoT sensors generating high-frequency data
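A minimal sketch of the filtering step, using the Bloblang deleted() pattern with the mapping processor shown above (the level field and thresholds are illustrative):

```yaml
pipeline:
  processors:
    # Keep only warnings and errors; drop everything else at the edge
    - mapping: |
        root = if !["WARN", "ERROR"].contains(this.level) { deleted() }
```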
Pattern 2: Regional Aggregation
Aggregate data regionally and send summaries to a central location.
Region 1 Agent: Aggregate hourly → Central Storage
Region 2 Agent: Aggregate hourly → Central Storage
Use Case: Multi-location retail analytics
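One way to express hourly aggregation is batching at the output. This sketch assumes outputs accept a batching policy with nested processors and that an archive processor exists; both are assumptions modeled on the config style above.

```yaml
# Sketch only: the batching policy and archive processor are assumptions.
output:
  s3:
    bucket: central-storage
    path: "summaries/${!timestamp_unix()}.json"
    batching:
      period: 1h
      processors:
        # Combine each hour's messages into a single JSON array
        - archive:
            format: json_array
```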
Pattern 3: Hub and Spoke
Edge agents send to regional hubs; the hubs aggregate and forward.
Edge Agents → Regional Hub → Central Cloud
Use Case: Manufacturing facilities with central monitoring
Pattern 4: Mesh Processing
Agents communicate directly for distributed processing.
Agent 1 ↔ Agent 2 ↔ Agent 3 → Distributed Result
Use Case: Distributed machine learning inference
Security Model
Authentication
- Agents authenticate using bootstrap tokens
- Tokens are single-use and time-limited
- mTLS for all agent-cloud communication
Authorization
- Role-based access control (RBAC)
- Network-level isolation
- Pipeline deployment permissions
Data Security
- End-to-end encryption in transit
- At-rest encryption for agent state
- PII redaction at the edge
- Audit logging
Monitoring and Observability
Metrics
Agents export metrics:
- Throughput: Messages/sec, bytes/sec
- Latency: Processing time, queue depth
- Errors: Error rates, failed messages
- Resources: CPU, memory, disk usage
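If you want to scrape these metrics yourself, a config along the following lines may apply; the metrics block and prometheus exporter are assumptions, not confirmed by this page.

```yaml
# Sketch only: the metrics section and prometheus exporter are assumptions.
metrics:
  prometheus: {}
```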
Logs
Three levels of logs:
- Pipeline logs: Data flowing through pipelines
- Agent logs: Agent operations and health
- Audit logs: User actions and deployments
Health Checks
Agents report health every 10 seconds:
- Connection status
- Pipeline status
- Resource availability
- Error conditions
What's Next?
Now that you understand the core concepts:
- Quick Start: Build your first pipeline
- Installation: Deploy agents on your infrastructure
- Components: Explore available inputs, processors, and outputs
- Use Cases: See real-world examples
- Bloblang Guide: Learn the transformation language