
Testing & Debugging

Learn how to validate configurations, add debugging output, and troubleshoot common issues when building pipelines.

Quick Reference

# Validate pipeline syntax
expanso-cli job validate my-pipeline.yaml --offline

# Run pipeline locally with verbose logging
expanso-edge run --config my-pipeline.yaml --verbose

# Test with limited data
# (add count: 10 to your generate input)

# Output to terminal for inspection
# (use stdout output in your config)

Editor Setup

Get autocomplete and validation while writing pipeline configs - makes authoring faster and catches errors before you run anything.

Expanso provides a JSON Schema for pipeline YAML files. Point your editor at it and you'll get autocomplete for component names, validation for config fields, and inline docs without leaving your editor.

What you'll get:

  • Browse available components as you type (inputs, processors, outputs)
  • Catch typos and missing fields immediately
  • See component docs on hover
  • Write configs faster with fewer errors

Configure VS Code

Add this to .vscode/settings.json in your workspace (or global settings):

{
  "yaml.schemas": {
    "https://docs.expanso.io/schemas/pipeline.schema.json": [
      "**/*.pipeline.yaml",
      "**/*pipeline*.yaml"
    ]
  }
}

This maps the Expanso pipeline schema to any YAML files matching these patterns.

Per-workspace setup (recommended):

  1. Create .vscode/settings.json in your project directory
  2. Add the schema configuration above
  3. Commit it to version control so your team gets autocomplete automatically

Global setup (applies to all projects):

  1. Open VS Code settings (Cmd+, or Ctrl+,)
  2. Search for "yaml.schemas"
  3. Click "Edit in settings.json"
  4. Add the schema configuration

Configure IntelliJ IDEA / PyCharm

  1. Go to Preferences → Languages & Frameworks → Schemas and DTDs → JSON Schema Mappings
  2. Click + to add a new schema
  3. Set:
    • Name: Expanso Pipeline Schema
    • Schema file or URL: https://docs.expanso.io/schemas/pipeline.schema.json
    • Schema version: JSON Schema version 7
  4. Add file path patterns:
    • **/*.pipeline.yaml
    • **/*pipeline*.yaml
  5. Click OK

Configure Other Editors

Most editors with YAML support can use JSON Schema for validation.

Neovim (with yaml-language-server):

Add to your LSP config or .luarc.json:

{
  "yaml.schemas": {
    "https://docs.expanso.io/schemas/pipeline.schema.json": "**/*.pipeline.yaml"
  }
}

Sublime Text (with LSP-yaml):

Add to LSP-yaml settings:

{
  "settings": {
    "yaml.schemas": {
      "https://docs.expanso.io/schemas/pipeline.schema.json": "**/*.pipeline.yaml"
    }
  }
}

General approach: Look for YAML language server or JSON Schema support in your editor's docs. Configure it to associate https://docs.expanso.io/schemas/pipeline.schema.json with your pipeline YAML files.
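
If your editor runs yaml-language-server but you'd rather not edit settings files, recent versions of the language server also recognize an inline modeline comment at the top of a YAML file. A minimal sketch, using the same schema URL as above:

# yaml-language-server: $schema=https://docs.expanso.io/schemas/pipeline.schema.json
input:
  generate:
    count: 5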

What You'll See

Once configured:

Autocomplete for components:

Start typing under input:, processors:, or output: and you'll get suggestions for all available components (kafka, http_server, generate, mapping, etc.).

Parameter completion:

When configuring a component, autocomplete shows valid configuration fields. For example, typing under kafka: suggests addresses, topics, consumer_group, and other Kafka-specific options.

Inline validation:

Red squiggly lines appear immediately when you:

  • Misspell a component name
  • Forget required fields
  • Use invalid configuration keys
  • Have incorrect YAML indentation

Hover documentation:

Hover over any component or field to see its description and usage notes without switching to the docs.

File Naming

The schema works best with these naming patterns:

  • *.pipeline.yaml - Standard pipeline configs
  • log-processor.pipeline.yaml - Descriptive names
  • my-pipeline.yaml - Also works if you add the pattern to your schema config

You can customize the file patterns in your editor's schema configuration to match your team's naming conventions.
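
For example, if your team keeps pipeline configs under a dedicated directory with plain .yaml names, you could extend the VS Code mapping shown earlier (the pipelines/ directory here is just an illustration):

{
  "yaml.schemas": {
    "https://docs.expanso.io/schemas/pipeline.schema.json": [
      "**/*.pipeline.yaml",
      "pipelines/**/*.yaml"
    ]
  }
}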

Limitations

The JSON Schema provides syntax validation and autocomplete, but it doesn't validate:

  • Bloblang expressions: Your mapping logic syntax isn't checked
  • Runtime values: Environment variables like ${VAR} aren't validated
  • Component availability: All components show in autocomplete, but some may not be available in your Expanso Edge version

Always validate before deploying:

expanso-cli job validate my-pipeline.yaml --offline

This catches issues the schema can't detect, like invalid Bloblang syntax or missing environment variables.

Tips

Use descriptive file names: log-processor.pipeline.yaml is clearer than pipeline1.yaml.

Commit your workspace settings: Share .vscode/settings.json with your team so everyone gets autocomplete automatically.

Validate often: Run expanso-cli job validate --offline to catch issues the schema misses.


Validate Before Running

Always validate your pipeline configuration before running it. This catches syntax errors and configuration mistakes early.

Client-Side Validation

Check syntax and structure without connecting to a server:

expanso-cli job validate my-pipeline.yaml --offline

What it checks:

  • ✅ Valid YAML syntax
  • ✅ Required fields present
  • ✅ Component names exist
  • ✅ Configuration structure

Example output (success):

✓ Configuration is valid

Example output (error):

Error: validation failed
- input.file.paths: required field missing
- output.unknown_component: component does not exist

Server-Side Validation

For comprehensive validation against your Expanso Cloud environment:

expanso-cli job validate my-pipeline.yaml

This performs both client-side and server-side checks (requires connection to Expanso Cloud).


Add Debug Output

When developing pipelines, add stdout outputs at each stage to see what's happening.

Debug at Each Stage

input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # First transformation
    - mapping: |
        root = this.parse_json()

    # Debug output - see what we parsed
    - stdout:
        codec: lines

    # Second transformation
    - mapping: |
        root.processed = true
        root.timestamp = now()

output:
  stdout:
    codec: lines

Use Labels for Clarity

Add labels to track which stage produced output:

pipeline:
  processors:
    - label: parse_json
      mapping: |
        root = this.parse_json()

    - label: debug_parsed
      stdout:
        codec: lines

    - label: add_metadata
      mapping: |
        root.processed_at = now()

    - label: debug_final
      stdout:
        codec: lines

Enable Verbose Logging

Run with verbose logging to see detailed execution information:

expanso-edge run --config my-pipeline.yaml --verbose

Log levels available:

  • --log-level trace - Everything (very detailed)
  • --log-level debug - Debug information
  • --log-level info - Standard information (default)
  • --log-level warn - Warnings only
  • --log-level error - Errors only

Example with debug level:

expanso-edge run --config my-pipeline.yaml --log-level debug

Console vs File Logging

When running pipelines, your console only shows warnings and errors for pipeline execution logs. All detailed logs (including DEBUG and INFO messages) are written to log files. See Access Detailed Pipeline Logs below to learn how to view full debug logs.


Access Detailed Pipeline Logs

When you run pipelines, your terminal stays clean by showing only warnings and errors. But all the detailed debug logs (INFO and DEBUG messages) are still there—they're written to log files on disk so you can dig into them when you need to troubleshoot.

Where Logs Are Stored

Pipeline logs live in your edge agent's data directory at:

{data_dir}/executions/{job_id}/logs/pipeline.log

Default data directory locations:

  • Linux: ~/.local/share/expanso/edge (or /var/lib/expanso/edge when running as root/system service)
  • macOS: ~/Library/Application Support/expanso/edge
  • Windows: %LOCALAPPDATA%\expanso\edge

You can override the data directory using the --data-dir flag or EXPANSO_DATA_DIR environment variable.
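
For example, to keep an experiment's data (and its logs) in a throwaway directory, you could start the agent like this (assuming the flag and variable are accepted by the same expanso-edge run command shown earlier; the ./edge-data path is arbitrary):

# Override via flag
expanso-edge run --config my-pipeline.yaml --data-dir ./edge-data

# Or via environment variable
EXPANSO_DATA_DIR=./edge-data expanso-edge run --config my-pipeline.yaml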

View Pipeline Logs

You have two ways to access logs:

Option 1: Using the CLI (recommended)

The CLI gives you formatted log viewing with filtering options:

# View all pipeline logs
expanso-cli job logs <job-id>

# Filter by log level
expanso-cli job logs <job-id> --level debug
expanso-cli job logs <job-id> --level warn

# Follow logs in real-time
expanso-cli job logs <job-id> --follow

This is the quickest way to view logs with nice formatting and real-time tailing.

Option 2: Direct file access

If you need to access log files directly (for scripting, archiving, or using other log tools):

# List executions to find the ID
expanso-cli execution list

# View logs directly
cat ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log
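
Because these are plain text files, standard tools work well for filtering or tailing them. For example (Linux default path shown, assuming the level name appears in each log line; adjust for your platform or a custom data directory):

# Show only warnings and errors from a run
grep -E "WARN|ERROR" ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log

# Tail the log while the pipeline is running
tail -f ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log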

Console vs File Logging

Here's how Expanso splits logging between your terminal and disk:

Output           | Log Levels                            | Use Case
Console (stdout) | WARN, ERROR only                      | Quick monitoring, spotting problems
Log files        | All levels (DEBUG, INFO, WARN, ERROR) | Detailed debugging, troubleshooting

Why split logs this way?

Pipeline debug logs can get extremely verbose—showing every message processed, every transformation applied, and so on. By limiting console output to warnings and errors, your terminal stays readable while full details are preserved in files for when you need them.

Quick guide for which to use:

  • Console output - Great when you're running pipelines interactively during development and just want to see that things are working (or catch errors quickly)
  • Log files - Essential when troubleshooting specific pipeline behavior, investigating data transformation issues, debugging Bloblang mapping logic, or analyzing performance details

Test with Limited Data

When testing, use a fixed number of messages instead of continuous generation.

Limit Generated Messages

input:
  generate:
    count: 10     # Stop after 10 messages
    interval: ""  # Generate as fast as possible
    mapping: |
      root.test_id = uuid_v4()
      root.timestamp = now()

Process Specific Files

Test against a small sample file first:

# Create test file with 5 records
head -n 5 large-file.json > test-data.json

In my-pipeline.yaml, point the input at the test file:

input:
  file:
    paths: [./test-data.json]
    codec: lines

# Test against the small file first
expanso-edge run --config my-pipeline.yaml

Common Issues and Solutions

Issue: File Access Errors

Symptoms: expanso-cli job deploy or expanso-cli job validate fails with errors like "no such file or directory" or "permission denied".

Cause: The file path doesn't exist, is inaccessible, or points to a directory instead of a file.

Solution: Add the -v (verbose) flag to get detailed diagnostics:

# Default error (minimal details)
expanso-cli job deploy my-pipeline.yaml
# Output: failed to read job specification file: open /path/to/file: no such file or directory

# Verbose mode (detailed diagnostics)
expanso-cli job deploy my-pipeline.yaml -v

Example verbose output:

cannot read job specification '/var/log/app/pipeline.yaml': parent directory does not exist
path: /var/log/app/pipeline.yaml
path_exists: no
parent_exists: no
parent_path: /var/log/app

What the diagnostic fields mean:

Field             | Description
path              | The file path you requested
path_exists       | Whether the path exists (yes/no)
parent_exists     | Whether the parent directory exists (yes/no)
parent_path       | The parent directory path
path_type         | Type if the path exists (file/directory)
is_symlink        | Whether it's a symbolic link
symlink_target    | Where the symlink points
symlink_broken    | Whether the symlink target is missing
permission_denied | Whether access was blocked by permissions

Common scenarios:

  1. Parent directory missing (parent_exists: no): Create the directory first, then add your file.

  2. File missing (parent_exists: yes, path_exists: no): Check for typos in the filename.

  3. Permission denied (permission_denied: yes): Check file permissions or run as a different user.

  4. Broken symlink (symlink_broken: yes): Fix the symlink target or remove it.

  5. Path is a directory (path_type: directory): You provided a directory path instead of a file path.
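
A few standard shell commands (nothing Expanso-specific) cover most of these checks, using the paths from the example above:

# Does the parent directory exist, and is it readable?
ls -ld /var/log/app

# Create a missing parent directory
mkdir -p /var/log/app

# Where does a suspect symlink point?
readlink /var/log/app/pipeline.yaml

# Check permissions on the file itself
ls -l /var/log/app/pipeline.yaml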


Issue: Pipeline Runs But No Output

Symptoms: Pipeline starts but nothing appears in output.

Possible causes:

  1. Data is being filtered out

Check your mapping processors - are you using deleted()?

# This deletes every "active" message - probably the ones you wanted to keep!
- mapping: |
    root = if this.status == "active" { deleted() }

Solution: Add debug output before the filter to see what's being dropped:

- label: before_filter
  stdout:
    codec: lines

- mapping: |
    root = if this.status != "active" { deleted() }

  2. Input has no data

Solution: Check your input source has data available:

# For files
ls -lh ./data.json
cat ./data.json

# For generate input - check your mapping is valid

  3. Output is going somewhere else

Solution: Temporarily change output to stdout for debugging:

output:
  stdout:
    codec: lines

Issue: Parsing Errors

Symptoms: Errors like failed to parse JSON or invalid format.

Cause: Data format doesn't match what you're trying to parse.

Solution: Print the raw input first:

input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # Debug: see raw data
    - label: raw_input
      stdout:
        codec: lines

    # Then try parsing
    - mapping: |
        root = this.parse_json()

Look at the raw data and adjust your parsing logic accordingly.
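
A quick way to confirm the format before touching the pipeline is to inspect the file directly. For example (jq is optional, but convenient if you have it installed):

# Peek at the first few lines
head -n 3 ./data.json

# Check that each line parses as standalone JSON (requires jq)
head -n 3 ./data.json | jq .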


Issue: Mapping/Bloblang Errors

Symptoms: Errors in transformation logic like undefined method or type mismatch.

Cause: Incorrect Bloblang syntax or attempting operations on wrong data types.

Solution: Test transformations incrementally:

# Start simple
- mapping: |
    root = this

# Add one transformation at a time
- mapping: |
    root = this
    root.parsed = this.parse_json()

# Validate each step
- stdout:
    codec: lines

Common Bloblang mistakes:

# ❌ Wrong - trying to parse already-parsed JSON
root = this.parse_json().parse_json()

# ✅ Right - parse once
root = this.parse_json()

# ❌ Wrong - field might not exist
root.value = this.data.deeply.nested.field

# ✅ Right - check existence
root.value = this.data.deeply.nested.field | "default"

See the Bloblang Guide for more patterns.


Issue: Performance - Pipeline is Slow

Symptoms: Messages processing slowly, backlog building up.

Debug approach:

  1. Add timing measurements
- mapping: |
    root = this
    meta start_time = now()

# ... your processors ...

- mapping: |
    root.processing_time_ms = now().ts_unix_milli() - meta("start_time").ts_unix_milli()

  2. Check processor complexity

Complex transformations in mapping processors can slow things down.

Solution: Simplify logic or break into multiple simpler processors.

  3. Check output destination

The output might be the bottleneck (slow API, overloaded database).

Solution: Test with stdout output to isolate the issue.


Testing Workflows

1. Syntax Validation

Always start here:

expanso-cli job validate my-pipeline.yaml --offline

2. Smoke Test

Run with limited data to stdout:

input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

# ... your processors ...

output:
  stdout:
    codec: lines

expanso-edge run --config my-pipeline.yaml

3. Unit Test Each Component

Test input → processor → output independently:

Test input only:

input:
  file:
    paths: [./data.json]
    codec: lines

output:
  stdout:
    codec: lines

Test processor only:

input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

pipeline:
  processors:
    - mapping: |
        root.transformed = this.test.uppercase()

output:
  stdout:
    codec: lines

4. Integration Test

Test with real data sources but safe outputs (local files or stdout):

input:
  http_server:
    address: localhost:8080

pipeline:
  processors:
    - mapping: |
        root = this.parse_json()

output:
  file:
    path: ./test-output.json
    codec: lines

5. Deploy

Once validated and tested locally, deploy to production via Expanso Cloud.
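
The deploy command follows the same shape as validation; if the file can't be read, the -v flag described under File Access Errors gives detailed diagnostics:

expanso-cli job deploy my-pipeline.yaml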


Debugging Checklist

When something goes wrong, work through this checklist:

  • Validate syntax: expanso-cli job validate --offline
  • Check input has data: Print raw input to stdout
  • Verify transformations: Test each processor separately
  • Enable verbose logging: Run with --verbose or --log-level debug
  • Simplify: Remove processors one by one to isolate the issue
  • Test with minimal data: Use count: 5 in generate input
  • Check for filtering: Ensure you're not accidentally deleting all messages
  • Validate Bloblang: Test complex transformations in isolation


Pro Tips

Use generate for testing: The generate input is your best friend for testing transformations without external dependencies.

Keep test files small: Don't test with production-sized data initially. Use 5-10 records.

Version control your configs: Track changes to pipeline configurations in git.
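
A minimal sketch of that workflow (filenames are placeholders):

git add log-processor.pipeline.yaml .vscode/settings.json
git commit -m "Add log processor pipeline and editor schema settings"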

Test transformations in isolation: Before adding complex Bloblang logic to your pipeline, test it separately:

input:
  generate:
    count: 1
    interval: ""
    mapping: |
      root = {"test": "value"}

pipeline:
  processors:
    - mapping: |
        # Test your transformation here
        root.result = this.test.uppercase()

output:
  stdout:
    codec: lines

Use descriptive labels: Add label to all processors to make logs easier to read.

Start simple, add complexity: Begin with a minimal pipeline, verify it works, then add features incrementally.