What is Expanso and how does it work?

Expanso is a managed platform for deploying intelligent data pipelines at the edge. It processes data where it's generated, which reduces bandwidth, latency, and cost. You deploy lightweight agents on your infrastructure, build pipelines with our visual builder or in YAML, and control everything from a central SaaS platform.

Can I run AI/ML models directly in my data pipelines?

Yes! Expanso supports running ONNX, TensorFlow Lite, and other models as native pipeline steps. Execute low-latency inference on streaming data, enrich events with model outputs (like risk scores), and make decisions at the edge without cloud round-trips.

How many pre-built components are available?

Expanso provides 200+ pre-built components including inputs (Kafka, HTTP, files), processors (transformations, filtering, PII masking, aggregations), and outputs (S3, Snowflake, Datadog, Splunk). Browse the complete catalog in our Component Reference.

Do I need to write code to build pipelines?

No. Use our drag-and-drop visual pipeline builder to create sophisticated pipelines without code. For advanced use cases, you can also write pipelines in YAML or use the Bloblang transformation language for complex data mappings.

How does Expanso help with data governance and compliance?

Expanso includes built-in governance features: automatic PII detection and masking, policy enforcement at the edge, RBAC, SSO integration, and comprehensive audit trails. Mask sensitive data before it ever leaves your network.

Edge Processing

Process data where it's generated instead of sending everything to the cloud.

Traditional vs Edge-First

Traditional Approach

Edge Device → All Data to Cloud → Process → Store
             ↑
         High cost, High latency, Privacy concerns

Problems:

Send ALL data to cloud (expensive)
Round-trip latency for results
Network dependency (offline = broken)
Sensitive data leaves premises

Edge-First Approach

Edge Device → Process Locally → Filtered/Aggregated to Cloud
                    ↑
            Low cost, Low latency, Privacy preserved

Benefits:

Filter and aggregate at source (cheaper)
Immediate local results (faster)
Continue processing offline (resilient)
Keep sensitive data on-premises (compliant)

Key Benefits

1. Reduced Bandwidth Costs

Example: IoT sensors generate 1GB/day of raw data:

Traditional: Send 1GB/day = 30GB/month
Edge: Filter 95%, send 50MB/day = 1.5GB/month

Savings: 95% reduction in data sent (and in bandwidth costs, where billing is per GB)
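
The arithmetic in this example can be checked with a few lines of Python. This is only a worked calculation using the figures from the text (1GB/day, 95% filtered, a 30-day month); the function name is hypothetical and not part of Expanso.

```python
def monthly_volume_gb(daily_gb: float, filtered_fraction: float, days: int = 30) -> float:
    """Return GB sent per month after dropping `filtered_fraction` of the data at the edge."""
    return daily_gb * (1.0 - filtered_fraction) * days


# Figures taken from the example above.
traditional = monthly_volume_gb(1.0, 0.0)   # nothing filtered
edge = monthly_volume_gb(1.0, 0.95)         # 95% filtered at the source
reduction = 1.0 - edge / traditional

print(f"traditional={traditional:.1f} GB, edge={edge:.1f} GB, reduction={reduction:.0%}")
# prints: traditional=30.0 GB, edge=1.5 GB, reduction=95%
```

Swapping in your own daily volume and filter rate gives a rough estimate of what edge filtering would save.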

2. Lower Latency

Example: Edge analytics for retail:

Traditional: Edge → Cloud → Process → Results = 200ms+ round-trip
Edge: Edge → Process → Results = under 10ms local

Improvement: at least 20x faster response time (200ms+ vs under 10ms)

3. Offline Resilience

Example: Factory floor processing:

Traditional: Network down = No processing
Edge: Network down = Continue processing, sync when reconnected

Result: Critical processing keeps running through network outages
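
The "continue processing, sync when reconnected" behavior is a store-and-forward pattern. The sketch below illustrates the idea in plain Python; it is not how Expanso implements buffering, and the class name, the `send` callback, and the `online` flag are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, List


class StoreAndForward:
    """Buffer processed results locally and flush them when the uplink is available."""

    def __init__(self, send: Callable[[dict], None], max_buffered: int = 10_000) -> None:
        self._send = send
        # Bounded buffer: once full, the oldest results are dropped first.
        self._buffer: Deque[dict] = deque(maxlen=max_buffered)

    def publish(self, result: dict, online: bool) -> None:
        """Queue a result; try to drain the queue only when the network is up."""
        self._buffer.append(result)
        if online:
            self.flush()

    def flush(self) -> None:
        """Send buffered results in order; an item leaves the buffer only after send succeeds."""
        while self._buffer:
            self._send(self._buffer[0])
            self._buffer.popleft()

    @property
    def pending(self) -> int:
        return len(self._buffer)


# Usage: two results arrive while offline, a third arrives after reconnecting.
sent: List[dict] = []
saf = StoreAndForward(send=sent.append)
saf.publish({"reading": 1}, online=False)
saf.publish({"reading": 2}, online=False)
saf.publish({"reading": 3}, online=True)
print(sent, saf.pending)
```

Local processing never waits on the network here; only delivery is deferred, and results arrive upstream in their original order.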

4. Privacy & Compliance

Example: Healthcare data:

Traditional: PHI leaves premises (compliance risk)
Edge: Redact PHI at source, only send anonymized data

Result: Maintain compliance, reduce risk

Common Patterns

Pattern 1: Edge Filtering

Filter out noise, send only valuable data:

# Keep only ERROR-level logs
pipeline:
  processors:
    - mapping: |
        root = if this.level != "ERROR" { deleted() }

Use case: Log aggregation (reduce volume 90%+)
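
To make the effect of this filter concrete, here is a plain-Python equivalent applied to a few sample events. It only illustrates the logic; the function name and sample data are hypothetical, and the real pipeline runs the Bloblang mapping shown above.

```python
from typing import Iterable, Iterator


def keep_errors(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only events whose `level` is exactly "ERROR"; everything else is dropped."""
    for event in events:
        if event.get("level") == "ERROR":
            yield event


logs = [
    {"level": "INFO", "msg": "started"},
    {"level": "ERROR", "msg": "disk full"},
    {"level": "DEBUG", "msg": "tick"},
    {"msg": "no level field"},
]
kept = list(keep_errors(logs))
print(kept)
# prints: [{'level': 'ERROR', 'msg': 'disk full'}]
```

Events with no `level` field are dropped too, since a missing value is not equal to "ERROR".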

Pattern 2: Regional Aggregation

Aggregate locally, send summaries:

# Aggregate metrics every hour
pipeline:
  processors:
    - window:
        period: 1h
    - mapping: |
        root.avg = this.values.sum() / this.values.length()

Use case: Multi-location analytics
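
The aggregation step can be pictured as grouping readings into fixed one-hour windows and averaging each window. The Python sketch below shows that logic under stated assumptions: readings are `(epoch_seconds, value)` pairs, windows are aligned to the hour, and all names are hypothetical rather than Expanso APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 3600  # one hour, matching the period in the pipeline above


def hourly_averages(readings: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Map each window start (epoch seconds) to the mean of the values in that window."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in readings:
        window_start = ts - ts % WINDOW_SECONDS  # round down to the start of the hour
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


readings = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0), (7199, 40.0)]
summary = hourly_averages(readings)
print(summary)
# prints: {0: 15.0, 3600: 40.0}
```

Five raw readings become two summary values, which is the volume reduction this pattern is after.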

Pattern 3: Local Enrichment

Enrich with local data before sending:

pipeline:
  processors:
    - http:
        url: "http://local-service/enrich"
        verb: POST

Use case: Add local context (device info, location)
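
Conceptually, enrichment merges locally available context into each event before it leaves the site. The sketch below swaps the HTTP call for an in-memory lookup table to stay self-contained; the table contents, field names, and function name are hypothetical.

```python
from typing import Dict

# Hypothetical local metadata, keyed by device ID.
DEVICE_INFO: Dict[str, dict] = {
    "sensor-7": {"site": "warehouse-a", "model": "TH-200"},
}


def enrich(event: dict, device_info: Dict[str, dict] = DEVICE_INFO) -> dict:
    """Return a copy of `event` with a `device` field added when the device is known."""
    out = dict(event)
    info = device_info.get(event.get("device_id", ""))
    if info is not None:
        out["device"] = dict(info)
    return out


enriched = enrich({"device_id": "sensor-7", "temp_c": 21.5})
unknown = enrich({"device_id": "sensor-99", "temp_c": 19.0})
print(enriched)
print(unknown)
```

Events from unknown devices pass through unchanged, so a gap in the local data never blocks the pipeline.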

Pattern 4: PII Redaction

Remove sensitive data at the edge:

pipeline:
  processors:
    - mapping: |
        root = this
        # Replace the email with a hex-encoded SHA-256 digest
        root.email = this.email.hash("sha256").encode("hex")
        # Drop the SSN entirely
        root.ssn = deleted()

Use case: Compliance (GDPR, HIPAA)
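
For readers who want to see the result of this redaction on a sample record, here is a plain-Python equivalent: the email is replaced with its SHA-256 digest (hex-encoded here) and the SSN field is removed. The function name and sample record are hypothetical.

```python
import hashlib


def redact(event: dict) -> dict:
    """Return a copy of `event` with the email hashed and the SSN removed."""
    out = dict(event)
    if "email" in out:
        out["email"] = hashlib.sha256(out["email"].encode("utf-8")).hexdigest()
    out.pop("ssn", None)  # drop the field if present
    return out


record = {"name": "Pat", "email": "pat@example.com", "ssn": "000-00-0000"}
clean = redact(record)
print(clean)
```

A plain hash is deterministic, so the same email always maps to the same digest. That keeps records joinable downstream, but it also means the hash alone may not count as anonymization under every compliance regime.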

When to Use Edge Processing

Good fit:

High data volume from edge sources
Latency-sensitive applications
Privacy/compliance requirements
Unreliable network connectivity
High bandwidth costs

Not ideal:

Low data volume
Cloud-native sources (already in cloud)
Centralized processing required
Edge resources constrained

What's Next?

👉 Quickstart - Deploy edge pipelines in 10 minutes

👉 Use Cases - See edge processing in action

👉 Examples - Real pipeline configurations
