
Testing & Debugging

Learn how to validate configurations, add debugging output, and troubleshoot common issues when building pipelines.

Quick Reference

# Validate pipeline syntax
expanso-cli job validate my-pipeline.yaml --offline

# Run pipeline locally with verbose logging
expanso-edge run --config my-pipeline.yaml --verbose

# Test with limited data
# (add count: 10 to your generate input)

# Output to terminal for inspection
# (use stdout output in your config)

Editor Setup

Get autocomplete and validation while writing pipeline configs - makes authoring faster and catches errors before you run anything.

Expanso provides a JSON Schema for pipeline YAML files. Point your editor at it and you'll get autocomplete for component names, validation for config fields, and inline docs without leaving your editor.

What you'll get:

  • Browse available components as you type (inputs, processors, outputs)
  • Catch typos and missing fields immediately
  • See component docs on hover
  • Write configs faster with fewer errors

Configure VS Code

Add this to .vscode/settings.json in your workspace (or global settings):

{
  "yaml.schemas": {
    "https://docs.expanso.io/schemas/pipeline.schema.json": [
      "**/*.pipeline.yaml",
      "**/*pipeline*.yaml"
    ]
  }
}

This maps the Expanso pipeline schema to any YAML files matching these patterns.

Per-workspace setup (recommended):

  1. Create .vscode/settings.json in your project directory
  2. Add the schema configuration above
  3. Commit it to version control so your team gets autocomplete automatically

Global setup (applies to all projects):

  1. Open VS Code settings (Cmd+, or Ctrl+,)
  2. Search for "yaml.schemas"
  3. Click "Edit in settings.json"
  4. Add the schema configuration

Configure IntelliJ IDEA / PyCharm

  1. Go to Preferences → Languages & Frameworks → Schemas and DTDs → JSON Schema Mappings
  2. Click + to add a new schema
  3. Set:
    • Name: Expanso Pipeline Schema
    • Schema file or URL: https://docs.expanso.io/schemas/pipeline.schema.json
    • Schema version: JSON Schema version 7
  4. Add file path patterns:
    • **/*.pipeline.yaml
    • **/*pipeline*.yaml
  5. Click OK

Configure Other Editors

Most editors with YAML support can use JSON Schema for validation.

Neovim (with yaml-language-server):

Add to your LSP config or .luarc.json:

{
  "yaml.schemas": {
    "https://docs.expanso.io/schemas/pipeline.schema.json": "**/*.pipeline.yaml"
  }
}

Sublime Text (with LSP-yaml):

Add to LSP-yaml settings:

{
  "settings": {
    "yaml.schemas": {
      "https://docs.expanso.io/schemas/pipeline.schema.json": "**/*.pipeline.yaml"
    }
  }
}

General approach: Look for YAML language server or JSON Schema support in your editor's docs. Configure it to associate https://docs.expanso.io/schemas/pipeline.schema.json with your pipeline YAML files.
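
If your editor runs yaml-language-server but you'd rather not edit settings files, recent versions of the language server also recognize an inline modeline comment at the top of a YAML file. A minimal sketch, using the same schema URL as above:

# yaml-language-server: $schema=https://docs.expanso.io/schemas/pipeline.schema.json
input:
  generate:
    count: 5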

What You'll See

Once configured:

Autocomplete for components:

Start typing under input:, processors:, or output: and you'll get suggestions for all available components (kafka, http_server, generate, mapping, etc.).

Parameter completion:

When configuring a component, autocomplete shows valid configuration fields. For example, typing under kafka: suggests addresses, topics, consumer_group, and other Kafka-specific options.

Inline validation:

Red squiggly lines appear immediately when you:

  • Misspell a component name
  • Forget required fields
  • Use invalid configuration keys
  • Have incorrect YAML indentation

Hover documentation:

Hover over any component or field to see its description and usage notes without switching to the docs.

File Naming

The schema works best with these naming patterns:

  • *.pipeline.yaml - Standard pipeline configs
  • log-processor.pipeline.yaml - Descriptive names
  • my-pipeline.yaml - Also works if you add the pattern to your schema config

You can customize the file patterns in your editor's schema configuration to match your team's naming conventions.
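
For example, if your team keeps pipeline configs under a dedicated directory with plain .yaml names, you could extend the VS Code mapping shown earlier (the pipelines/ directory here is just an illustration):

{
  "yaml.schemas": {
    "https://docs.expanso.io/schemas/pipeline.schema.json": [
      "**/*.pipeline.yaml",
      "pipelines/**/*.yaml"
    ]
  }
}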

Limitations

The JSON Schema provides syntax validation and autocomplete, but it doesn't validate:

  • Bloblang expressions: Your mapping logic syntax isn't checked
  • Runtime values: Environment variables like ${VAR} aren't validated
  • Component availability: All components show in autocomplete, but some may not be available in your Expanso Edge version

Always validate before deploying:

expanso-cli job validate my-pipeline.yaml --offline

This catches issues the schema can't detect, like invalid Bloblang syntax or missing environment variables.

Tips

Use descriptive file names: log-processor.pipeline.yaml is clearer than pipeline1.yaml.

Commit your workspace settings: Share .vscode/settings.json with your team so everyone gets autocomplete automatically.

Validate often: Run expanso-cli job validate --offline to catch issues the schema misses.


Validate Before Running

Always validate your pipeline configuration before running it. This catches syntax errors and configuration mistakes early.

Client-Side Validation

Check syntax and structure without connecting to a server:

expanso-cli job validate my-pipeline.yaml --offline

What it checks:

  • ✅ Valid YAML syntax
  • ✅ Required fields present
  • ✅ Component names exist
  • ✅ Configuration structure

Example output (success):

✓ Configuration is valid

Example output (error):

Error: validation failed
- input.file.paths: required field missing
- output.unknown_component: component does not exist

Server-Side Validation

For comprehensive validation against your Expanso Cloud environment:

expanso-cli job validate my-pipeline.yaml

This performs both client-side and server-side checks (requires connection to Expanso Cloud).


Add Debug Output

When developing pipelines, add stdout outputs at each stage to see what's happening.

Debug at Each Stage

input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # First transformation
    - mapping: |
        root = this.parse_json()

    # Debug output - see what we parsed
    - stdout:
        codec: lines

    # Second transformation
    - mapping: |
        root.processed = true
        root.timestamp = now()

output:
  stdout:
    codec: lines

Use Labels for Clarity

Add labels to track which stage produced output:

pipeline:
  processors:
    - label: parse_json
      mapping: |
        root = this.parse_json()

    - label: debug_parsed
      stdout:
        codec: lines

    - label: add_metadata
      mapping: |
        root.processed_at = now()

    - label: debug_final
      stdout:
        codec: lines

Enable Verbose Logging

Run with verbose logging to see detailed execution information:

expanso-edge run --config my-pipeline.yaml --verbose

Log levels available:

  • --log-level trace - Everything (very detailed)
  • --log-level debug - Debug information
  • --log-level info - Standard information (default)
  • --log-level warn - Warnings only
  • --log-level error - Errors only

Example with debug level:

expanso-edge run --config my-pipeline.yaml --log-level debug

Console vs File Logging

When running pipelines, your console only shows warnings and errors for pipeline execution logs. All detailed logs (including DEBUG and INFO messages) are written to log files. See Access Detailed Pipeline Logs below to learn how to view full debug logs.


Access Detailed Pipeline Logs

When you run pipelines, your terminal stays clean by showing only warnings and errors. But all the detailed debug logs (INFO and DEBUG messages) are still there—they're written to log files on disk so you can dig into them when you need to troubleshoot.

Where Logs Are Stored

Pipeline logs live in your edge agent's data directory at:

{data_dir}/executions/{job_id}/logs/pipeline.log

Default data directory locations:

  • Linux: ~/.local/share/expanso/edge (or /var/lib/expanso/edge when running as root/system service)
  • macOS: ~/Library/Application Support/expanso/edge
  • Windows: %LOCALAPPDATA%\expanso\edge

You can override the data directory using the --data-dir flag or EXPANSO_DATA_DIR environment variable.
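
For example, to keep an experiment's data (and its logs) in a throwaway directory, you could start the agent like this (assuming the flag and variable are accepted by the same expanso-edge run command shown earlier; the ./edge-data path is arbitrary):

# Override via flag
expanso-edge run --config my-pipeline.yaml --data-dir ./edge-data

# Or via environment variable
EXPANSO_DATA_DIR=./edge-data expanso-edge run --config my-pipeline.yaml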

View Pipeline Logs

You have two ways to access logs:

Option 1: Using the CLI (recommended)

The CLI gives you formatted log viewing with filtering options:

# View all pipeline logs
expanso-cli job logs <job-id>

# Filter by log level
expanso-cli job logs <job-id> --level debug
expanso-cli job logs <job-id> --level warn

# Follow logs in real-time
expanso-cli job logs <job-id> --follow

This is the quickest way to view logs with nice formatting and real-time tailing.

Option 2: Direct file access

If you need to access log files directly (for scripting, archiving, or using other log tools):

# List executions to find the ID
expanso-cli execution list

# View logs directly
cat ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log
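
Because these are plain text files, standard tools work well for filtering or tailing them. For example (Linux default path shown, assuming the level name appears in each log line; adjust for your platform or a custom data directory):

# Show only warnings and errors from a run
grep -E "WARN|ERROR" ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log

# Tail the log while the pipeline is running
tail -f ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log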

Console vs File Logging

Here's how Expanso splits logging between your terminal and disk:

Output           | Log Levels                            | Use Case
Console (stdout) | WARN, ERROR only                      | Quick monitoring, spotting problems
Log files        | All levels (DEBUG, INFO, WARN, ERROR) | Detailed debugging, troubleshooting

Why split logs this way?

Pipeline debug logs can get extremely verbose—showing every message processed, every transformation applied, and so on. By limiting console output to warnings and errors, your terminal stays readable while full details are preserved in files for when you need them.

Quick guide for which to use:

  • Console output - Great when you're running pipelines interactively during development and just want to see that things are working (or catch errors quickly)
  • Log files - Essential when troubleshooting specific pipeline behavior, investigating data transformation issues, debugging Bloblang mapping logic, or analyzing performance details

Test with Limited Data

When testing, use a fixed number of messages instead of continuous generation.

Limit Generated Messages

input:
  generate:
    count: 10     # Stop after 10 messages
    interval: ""  # Generate as fast as possible
    mapping: |
      root.test_id = uuid_v4()
      root.timestamp = now()

Process Specific Files

Test against a small sample file first:

# Create test file with 5 records
head -n 5 large-file.json > test-data.json

In my-pipeline.yaml, point the input at the test file:

input:
  file:
    paths: [./test-data.json]
    codec: lines

# Test against the small file first
expanso-edge run --config my-pipeline.yaml

Common Issues and Solutions

Issue: File Access Errors

Symptoms: expanso-cli job deploy or expanso-cli job validate fails with errors like "no such file or directory" or "permission denied".

Cause: The file path doesn't exist, is inaccessible, or points to a directory instead of a file.

Solution: Add the -v (verbose) flag to get detailed diagnostics:

# Default error (minimal details)
expanso-cli job deploy my-pipeline.yaml
# Output: failed to read job specification file: open /path/to/file: no such file or directory

# Verbose mode (detailed diagnostics)
expanso-cli job deploy my-pipeline.yaml -v

Example verbose output:

cannot read job specification '/var/log/app/pipeline.yaml': parent directory does not exist
path: /var/log/app/pipeline.yaml
path_exists: no
parent_exists: no
parent_path: /var/log/app

What the diagnostic fields mean:

Field             | Description
path              | The file path you requested
path_exists       | Whether the path exists (yes/no)
parent_exists     | Whether the parent directory exists (yes/no)
parent_path       | The parent directory path
path_type         | Type if the path exists (file/directory)
is_symlink        | Whether it's a symbolic link
symlink_target    | Where the symlink points
symlink_broken    | Whether the symlink target is missing
permission_denied | Whether access was blocked by permissions

Common scenarios:

  1. Parent directory missing (parent_exists: no): Create the directory first, then add your file.

  2. File missing (parent_exists: yes, path_exists: no): Check for typos in the filename.

  3. Permission denied (permission_denied: yes): Check file permissions or run as a different user.

  4. Broken symlink (symlink_broken: yes): Fix the symlink target or remove it.

  5. Path is a directory (path_type: directory): You provided a directory path instead of a file path.
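
A few standard shell commands (nothing Expanso-specific) cover most of these checks, using the paths from the example above:

# Does the parent directory exist, and is it readable?
ls -ld /var/log/app

# Create a missing parent directory
mkdir -p /var/log/app

# Where does a suspect symlink point?
readlink /var/log/app/pipeline.yaml

# Check permissions on the file itself
ls -l /var/log/app/pipeline.yaml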


Issue: Pipeline Runs But No Output

Symptoms: Pipeline starts but nothing appears in output.

Possible causes:

  1. Data is being filtered out

Check your mapping processors - are you using deleted()?

# This deletes every "active" message - probably the ones you wanted to keep!
- mapping: |
    root = if this.status == "active" { deleted() }

Solution: Add debug output before the filter to see what's being dropped:

- label: before_filter
  stdout:
    codec: lines

- mapping: |
    root = if this.status != "active" { deleted() }

  2. Input has no data

Solution: Check your input source has data available:

# For files
ls -lh ./data.json
cat ./data.json

# For generate input - check your mapping is valid

  3. Output is going somewhere else

Solution: Temporarily change output to stdout for debugging:

output:
  stdout:
    codec: lines

Issue: Parsing Errors

Symptoms: Errors like failed to parse JSON or invalid format.

Cause: Data format doesn't match what you're trying to parse.

Solution: Print the raw input first:

input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # Debug: see raw data
    - label: raw_input
      stdout:
        codec: lines

    # Then try parsing
    - mapping: |
        root = this.parse_json()

Look at the raw data and adjust your parsing logic accordingly.
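
A quick way to confirm the format before touching the pipeline is to inspect the file directly. For example (jq is optional, but convenient if you have it installed):

# Peek at the first few lines
head -n 3 ./data.json

# Check that each line parses as standalone JSON (requires jq)
head -n 3 ./data.json | jq .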


Issue: Mapping/Bloblang Errors

Symptoms: Errors in transformation logic like undefined method or type mismatch.

Cause: Incorrect Bloblang syntax or attempting operations on wrong data types.

Solution: Test transformations incrementally:

# Start simple
- mapping: |
    root = this

# Add one transformation at a time
- mapping: |
    root = this
    root.parsed = this.parse_json()

# Validate each step
- stdout:
    codec: lines

Common Bloblang mistakes:

# ❌ Wrong - trying to parse already-parsed JSON
root = this.parse_json().parse_json()

# ✅ Right - parse once
root = this.parse_json()

# ❌ Wrong - field might not exist
root.value = this.data.deeply.nested.field

# ✅ Right - check existence
root.value = this.data.deeply.nested.field | "default"

See the Bloblang Guide for more patterns.


Issue: Performance - Pipeline is Slow

Symptoms: Messages processing slowly, backlog building up.

Debug approach:

  1. Add timing measurements
- mapping: |
    root = this
    meta start_time = now()

# ... your processors ...

- mapping: |
    root.processing_time_ms = now().ts_unix_milli() - meta("start_time").ts_unix_milli()

  2. Check processor complexity

Complex transformations in mapping processors can slow things down.

Solution: Simplify logic or break into multiple simpler processors.

  3. Check output destination

The output might be the bottleneck (slow API, overloaded database).

Solution: Test with stdout output to isolate the issue.


Testing Workflows

1. Syntax Validation

Always start here:

expanso-cli job validate my-pipeline.yaml --offline

2. Smoke Test

Run with limited data to stdout:

input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

# ... your processors ...

output:
  stdout:
    codec: lines

expanso-edge run --config my-pipeline.yaml

3. Unit Test Each Component

Test input → processor → output independently:

Test input only:

input:
  file:
    paths: [./data.json]
    codec: lines

output:
  stdout:
    codec: lines

Test processor only:

input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

pipeline:
  processors:
    - mapping: |
        root.transformed = this.test.uppercase()

output:
  stdout:
    codec: lines

4. Integration Test

Test with real data sources but safe outputs (local files or stdout):

input:
  http_server:
    address: localhost:8080

pipeline:
  processors:
    - mapping: |
        root = this.parse_json()

output:
  file:
    path: ./test-output.json
    codec: lines

5. Deploy

Once validated and tested locally, deploy to production via Expanso Cloud.
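
The deploy command follows the same shape as validation; if the file can't be read, the -v flag described under File Access Errors gives detailed diagnostics:

expanso-cli job deploy my-pipeline.yaml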


Debugging Checklist

When something goes wrong, work through this checklist:

  • Validate syntax: expanso-cli job validate --offline
  • Check input has data: Print raw input to stdout
  • Verify transformations: Test each processor separately
  • Enable verbose logging: Run with --verbose or --log-level debug
  • Simplify: Remove processors one by one to isolate the issue
  • Test with minimal data: Use count: 5 in generate input
  • Check for filtering: Ensure you're not accidentally deleting all messages
  • Validate Bloblang: Test complex transformations in isolation


Pro Tips

Use generate for testing: The generate input is your best friend for testing transformations without external dependencies.

Keep test files small: Don't test with production-sized data initially. Use 5-10 records.

Version control your configs: Track changes to pipeline configurations in git.
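
A minimal sketch of that workflow (filenames are placeholders):

git add log-processor.pipeline.yaml .vscode/settings.json
git commit -m "Add log processor pipeline and editor schema settings"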

Test transformations in isolation: Before adding complex Bloblang logic to your pipeline, test it separately:

input:
  generate:
    count: 1
    interval: ""
    mapping: |
      root = {"test": "value"}

pipeline:
  processors:
    - mapping: |
        # Test your transformation here
        root.result = this.test.uppercase()

output:
  stdout:
    codec: lines

Use descriptive labels: Add label to all processors to make logs easier to read.

Start simple, add complexity: Begin with a minimal pipeline, verify it works, then add features incrementally.