Step 4: Process Real Data

Now let's process real, live data from your system. We'll tail a log file that changes continuously.

Permission Note

Reading system logs may require elevated permissions. If you get a permission error:

  • Run with sudo: sudo expanso-edge run --config logs-pipeline.yaml
  • Or point the pipeline at a log file you can read, such as an application log (see the snippet below)
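
For instance, the input block below points at a user-owned application log instead (the path is a hypothetical placeholder; substitute a file that exists and that you can read):

input:
  file:
    paths: [/home/you/myapp/app.log]
    codec: lines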

Create the Pipeline

Create logs-pipeline.yaml:

input:
  file:
    paths: [/var/log/syslog]
    codec: lines

pipeline:
  processors:
    - mapping: |
        root.raw = this
        root.length = this.length()
        root.processed_at = now()

output:
  stdout:
    codec: lines

Run it:

# May need sudo for system logs
sudo expanso-edge run --config logs-pipeline.yaml

You'll see new log entries appear in real-time as your system generates them:

{"length":142,"processed_at":"2024-12-26T10:10:00Z","raw":"Dec 26 10:10:00 myhost kernel: ..."}
{"length":98,"processed_at":"2024-12-26T10:10:01Z","raw":"Dec 26 10:10:01 myhost systemd: ..."}

What's Happening?

  • file input - Tails the file, reading new lines as they're written
  • codec: lines - Treats each line as a separate message
  • Real-time processing - As logs are written, they flow through your pipeline
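
Since paths is a list, one pipeline can tail several files at once. A minimal sketch of the input block, assuming both files exist on your system (the rest of the pipeline stays the same):

input:
  file:
    paths: [/var/log/syslog, /var/log/auth.log]
    codec: lines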

Try Adding a Filter

Want to only see error messages? Add a filter processor:

input:
  file:
    paths: [/var/log/syslog]
    codec: lines

pipeline:
  processors:
    # Only keep lines containing "error" (case-insensitive)
    - mapping: |
        root = if !this.lowercase().contains("error") {
          deleted()
        }

    # Add metadata
    - mapping: |
        root.raw = this
        root.severity = "ERROR"
        root.processed_at = now()

output:
  stdout:
    codec: lines
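
With the filter in place, only lines mentioning "error" reach stdout. The output would look something like this (the log text itself is illustrative):

{"processed_at":"2024-12-26T10:12:03Z","raw":"Dec 26 10:12:03 myhost myapp[1234]: error: connection refused","severity":"ERROR"}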

Recap: What You've Learned

Step | Concept        | Key Takeaway
-----|----------------|------------------------------------------------
1    | Hello World    | Simplest pipeline: generate → stdout
2    | Make a Change  | Fast iteration, Bloblang functions
3    | Transformation | Processors add/modify fields, do calculations
4    | Real Data      | File input tails logs in real-time

You now understand the core pipeline pattern:

[Input] → [Processor(s)] → [Output]
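
Every pipeline you've built follows that shape. Here's a minimal end-to-end sketch combining the pieces from earlier steps (uppercase() is a standard Bloblang string method, not one shown in this guide):

input:
  generate:
    count: 1
    interval: ""
    mapping: |
      root = {"hello": "world"}

pipeline:
  processors:
    - mapping: |
        # Keep the original fields, then add one derived field
        root = this
        root.shouted = this.hello.uppercase()

output:
  stdout:
    codec: lines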

Next Steps

Now that you've built pipelines locally:


Tips

Start Simple: Test each part of your pipeline separately before combining them.
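
For example, you can verify the file input on its own by wiring it straight to stdout with no processors. A sketch (it assumes the pipeline section can be omitted when there are no processors, and the same sudo caveat applies):

input:
  file:
    paths: [/var/log/syslog]
    codec: lines

output:
  stdout:
    codec: lines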

Use stdout for Debugging: Always output to stdout first to see exactly what's happening.

Limit Test Data: Use count to produce a fixed number of messages:

input:
  generate:
    count: 5     # Generate exactly 5 messages then stop
    interval: "" # As fast as possible
    mapping: |
      root = {"test": "data"}
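
With a stdout output and codec: lines, that config should print exactly five identical lines and then exit:

{"test":"data"}
{"test":"data"}
{"test":"data"}
{"test":"data"}
{"test":"data"}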

Fast Iteration: Edit YAML → Stop (Ctrl+C) → Re-run. The feedback loop is instant.