Testing & Debugging
Learn how to validate configurations, add debugging output, and troubleshoot common issues when building pipelines.
Quick Reference
# Validate pipeline syntax
expanso-cli job validate my-pipeline.yaml --offline
# Run pipeline locally with verbose logging
expanso-edge run --config my-pipeline.yaml --verbose
# Test with limited data
# (add count: 10 to your generate input)
# Output to terminal for inspection
# (use stdout output in your config)
Editor Setup
Get autocomplete and validation while writing pipeline configs - makes authoring faster and catches errors before you run anything.
Expanso provides a JSON Schema for pipeline YAML files. Point your editor at it and you'll get autocomplete for component names, validation for config fields, and inline docs without leaving your editor.
What you'll get:
- Browse available components as you type (inputs, processors, outputs)
- Catch typos and missing fields immediately
- See component docs on hover
- Write configs faster with fewer errors
Configure VS Code
Add this to .vscode/settings.json in your workspace (or global settings):
{
"yaml.schemas": {
"https://docs.expanso.io/schemas/pipeline.schema.json": [
"**/*.pipeline.yaml",
"**/*pipeline*.yaml"
]
}
}
This maps the Expanso pipeline schema to any YAML files matching these patterns.
Per-workspace setup (recommended):
- Create .vscode/settings.json in your project directory
- Add the schema configuration above
- Commit it to version control so your team gets autocomplete automatically
Global setup (applies to all projects):
- Open VS Code settings (Cmd+, or Ctrl+,)
- Search for "yaml.schemas"
- Click "Edit in settings.json"
- Add the schema configuration
Configure IntelliJ IDEA / PyCharm
- Go to Preferences → Languages & Frameworks → Schemas and DTDs → JSON Schema Mappings
- Click + to add a new schema
- Set:
  - Name: Expanso Pipeline Schema
  - Schema file or URL: https://docs.expanso.io/schemas/pipeline.schema.json
  - Schema version: JSON Schema version 7
- Add file path patterns: **/*.pipeline.yaml and **/*pipeline*.yaml
- Click OK
Configure Other Editors
Most editors with YAML support can use JSON Schema for validation.
Neovim (with yaml-language-server):
Add to your LSP config or .luarc.json:
{
"yaml.schemas": {
"https://docs.expanso.io/schemas/pipeline.schema.json": "**/*.pipeline.yaml"
}
}
Sublime Text (with LSP-yaml):
Add to LSP-yaml settings:
{
"settings": {
"yaml.schemas": {
"https://docs.expanso.io/schemas/pipeline.schema.json": "**/*.pipeline.yaml"
}
}
}
General approach: Look for YAML language server or JSON Schema support in your editor's docs. Configure it to associate https://docs.expanso.io/schemas/pipeline.schema.json with your pipeline YAML files.
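If your editor uses yaml-language-server, you can also pin the schema for a single file with an inline modeline comment at the top of that file. A minimal sketch:
# yaml-language-server: $schema=https://docs.expanso.io/schemas/pipeline.schema.json
input:
  generate:
    count: 1
    mapping: |
      root = {"hello": "world"}
This is handy for one-off files that don't match your configured naming patterns.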
What You'll See
Once configured:
Autocomplete for components:
Start typing under input:, processors:, or output: and you'll get suggestions for all available components (kafka, http_server, generate, mapping, etc.).
Parameter completion:
When configuring a component, autocomplete shows valid configuration fields. For example, typing under kafka: suggests addresses, topics, consumer_group, and other Kafka-specific options.
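For instance, a minimal kafka input using those fields might look like this (the broker address, topic, and consumer group are placeholders):
input:
  kafka:
    addresses: ["localhost:9092"]   # placeholder broker address
    topics: ["logs"]                # placeholder topic
    consumer_group: my-pipeline     # placeholder consumer group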
Inline validation:
Red squiggly lines appear immediately when you:
- Misspell a component name
- Forget required fields
- Use invalid configuration keys
- Have incorrect YAML indentation
Hover documentation:
Hover over any component or field to see its description and usage notes without switching to the docs.
File Naming
The schema works best with these naming patterns:
- *.pipeline.yaml - Standard pipeline configs
- log-processor.pipeline.yaml - Descriptive names
- my-pipeline.yaml - Also works if you add the pattern to your schema config
You can customize the file patterns in your editor's schema configuration to match your team's naming conventions.
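For example, to also cover a hypothetical pipelines/ directory convention in VS Code, you could extend the pattern list:
{
  "yaml.schemas": {
    "https://docs.expanso.io/schemas/pipeline.schema.json": [
      "**/*.pipeline.yaml",
      "pipelines/**/*.yaml"
    ]
  }
}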
Limitations
The JSON Schema provides syntax validation and autocomplete, but it doesn't validate:
- Bloblang expressions: Your mapping logic syntax isn't checked
- Runtime values: Environment variables like ${VAR} aren't validated
- Component availability: All components show in autocomplete, but some may not be available in your Expanso Edge version
Always validate before deploying:
expanso-cli job validate my-pipeline.yaml --offline
This catches issues the schema can't detect, like invalid Bloblang syntax or missing environment variables.
Tips
Use descriptive file names: log-processor.pipeline.yaml is clearer than pipeline1.yaml.
Commit your workspace settings: Share .vscode/settings.json with your team so everyone gets autocomplete automatically.
Validate often: Run expanso-cli job validate --offline to catch issues the schema misses.
Validate Before Running
Always validate your pipeline configuration before running it. This catches syntax errors and configuration mistakes early.
Client-Side Validation
Check syntax and structure without connecting to a server:
expanso-cli job validate my-pipeline.yaml --offline
What it checks:
- ✅ Valid YAML syntax
- ✅ Required fields present
- ✅ Component names exist
- ✅ Configuration structure
Example output (success):
✓ Configuration is valid
Example output (error):
Error: validation failed
- input.file.paths: required field missing
- output.unknown_component: component does not exist
Server-Side Validation
For comprehensive validation against your Expanso Cloud environment:
expanso-cli job validate my-pipeline.yaml
This performs both client-side and server-side checks (requires connection to Expanso Cloud).
Add Debug Output
When developing pipelines, add stdout outputs at each stage to see what's happening.
Debug at Each Stage
input:
file:
paths: [./data.json]
codec: lines
pipeline:
processors:
# First transformation
- mapping: |
root = this.parse_json()
# Debug output - see what we parsed
- stdout:
codec: lines
# Second transformation
- mapping: |
root.processed = true
root.timestamp = now()
output:
stdout:
codec: lines
Use Labels for Clarity
Add labels to track which stage produced output:
pipeline:
processors:
- label: parse_json
mapping: |
root = this.parse_json()
- label: debug_parsed
stdout:
codec: lines
- label: add_metadata
mapping: |
root.processed_at = now()
- label: debug_final
stdout:
codec: lines
Enable Verbose Logging
Run with verbose logging to see detailed execution information:
expanso-edge run --config my-pipeline.yaml --verbose
Log levels available:
- --log-level trace - Everything (very detailed)
- --log-level debug - Debug information
- --log-level info - Standard information (default)
- --log-level warn - Warnings only
- --log-level error - Errors only
Example with debug level:
expanso-edge run --config my-pipeline.yaml --log-level debug
When running pipelines, your console only shows warnings and errors for pipeline execution logs. All detailed logs (including DEBUG and INFO messages) are written to log files. See Access Detailed Pipeline Logs below to learn how to view full debug logs.
Access Detailed Pipeline Logs
When you run pipelines, your terminal stays clean by showing only warnings and errors. But all the detailed debug logs (INFO and DEBUG messages) are still there—they're written to log files on disk so you can dig into them when you need to troubleshoot.
Where Logs Are Stored
Pipeline logs live in your edge agent's data directory at:
{data_dir}/executions/{job_id}/logs/pipeline.log
Default data directory locations:
- Linux: ~/.local/share/expanso/edge (or /var/lib/expanso/edge when running as root or as a system service)
- macOS: ~/Library/Application Support/expanso/edge
- Windows: %LOCALAPPDATA%\expanso\edge
You can override the data directory using the --data-dir flag or EXPANSO_DATA_DIR environment variable.
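For example, either form works with the run command shown earlier (the directory path here is just a placeholder):
# Override the data directory via flag or environment variable
expanso-edge run --config my-pipeline.yaml --data-dir /srv/expanso-data
EXPANSO_DATA_DIR=/srv/expanso-data expanso-edge run --config my-pipeline.yaml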
View Pipeline Logs
You have two ways to access logs:
Option 1: Using the CLI (recommended)
The CLI gives you formatted log viewing with filtering options:
# View all pipeline logs
expanso-cli job logs <job-id>
# Filter by log level
expanso-cli job logs <job-id> --level debug
expanso-cli job logs <job-id> --level warn
# Follow logs in real-time
expanso-cli job logs <job-id> --follow
This is the quickest way to view logs with nice formatting and real-time tailing.
Option 2: Direct file access
If you need to access log files directly (for scripting, archiving, or using other log tools):
# List executions to find the ID
expanso-cli execution list
# View logs directly
cat ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log
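Standard tools work on the same files too; for example, to tail a log in real time (Linux default path shown):
tail -f ~/.local/share/expanso/edge/executions/<job_id>/logs/pipeline.log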
Console vs File Logging
Here's how Expanso splits logging between your terminal and disk:
| Output | Log Levels | Use Case |
|---|---|---|
| Console (stdout) | WARN, ERROR only | Quick monitoring, spotting problems |
| Log files | All levels (DEBUG, INFO, WARN, ERROR) | Detailed debugging, troubleshooting |
Why split logs this way?
Pipeline debug logs can get extremely verbose—showing every message processed, every transformation applied, and so on. By limiting console output to warnings and errors, your terminal stays readable while full details are preserved in files for when you need them.
Quick guide for which to use:
- Console output - Great when you're running pipelines interactively during development and just want to see that things are working (or catch errors quickly)
- Log files - Essential when troubleshooting specific pipeline behavior, investigating data transformation issues, debugging Bloblang mapping logic, or analyzing performance details
Test with Limited Data
When testing, use a fixed number of messages instead of continuous generation.
Limit Generated Messages
input:
generate:
count: 10 # Stop after 10 messages
interval: "" # Generate as fast as possible
mapping: |
root.test_id = uuid_v4()
root.timestamp = now()
Process Specific Files
Test against a small sample file first:
# Create a test file with 5 records
head -n 5 large-file.json > test-data.json
Point your pipeline's input at the sample file:
input:
  file:
    paths: [./test-data.json]
    codec: lines
Then run the pipeline as usual:
expanso-edge run --config my-pipeline.yaml
Common Issues and Solutions
Issue: File Access Errors
Symptoms: expanso-cli job deploy or expanso-cli job validate fail with errors like "no such file or directory" or "permission denied".
Cause: The file path doesn't exist, is inaccessible, or points to a directory instead of a file.
Solution: Add the -v (verbose) flag to get detailed diagnostics:
# Default error (minimal details)
expanso-cli job deploy my-pipeline.yaml
# Output: failed to read job specification file: open /path/to/file: no such file or directory
# Verbose mode (detailed diagnostics)
expanso-cli job deploy my-pipeline.yaml -v
Example verbose output:
cannot read job specification '/var/log/app/pipeline.yaml': parent directory does not exist
path: /var/log/app/pipeline.yaml
path_exists: no
parent_exists: no
parent_path: /var/log/app
What the diagnostic fields mean:
| Field | Description |
|---|---|
| path | The file path you requested |
| path_exists | Whether the path exists (yes/no) |
| parent_exists | Whether the parent directory exists (yes/no) |
| parent_path | The parent directory path |
| path_type | Type if path exists (file/directory) |
| is_symlink | Whether it's a symbolic link |
| symlink_target | Where the symlink points |
| symlink_broken | Whether the symlink target is missing |
| permission_denied | Whether access was blocked by permissions |
Common scenarios:
- Parent directory missing (parent_exists: no): Create the directory first, then add your file.
- File missing (parent_exists: yes, path_exists: no): Check for typos in the filename.
- Permission denied (permission_denied: yes): Check file permissions or run as a different user.
- Broken symlink (symlink_broken: yes): Fix the symlink target or remove it.
- Path is a directory (path_type: directory): You provided a directory path instead of a file path.
Issue: Pipeline Runs But No Output
Symptoms: Pipeline starts but nothing appears in output.
Possible causes:
- Data is being filtered out
Check your mapping processors - are you using deleted()?
# Inverted condition - this deletes every "active" message, likely the ones you meant to keep
- mapping: |
root = if this.status == "active" { deleted() }
Solution: Add debug output before the filter to see what's being dropped:
- label: before_filter
stdout:
codec: lines
- mapping: |
root = if this.status != "active" { deleted() }
- Input has no data
Solution: Check your input source has data available:
# For files
ls -lh ./data.json
cat ./data.json
# For generate input - check your mapping is valid
- Output is going somewhere else
Solution: Temporarily change output to stdout for debugging:
output:
stdout:
codec: lines
Issue: Parsing Errors
Symptoms: Errors like failed to parse JSON or invalid format.
Cause: Data format doesn't match what you're trying to parse.
Solution: Print the raw input first:
input:
file:
paths: [./data.json]
codec: lines
pipeline:
processors:
# Debug: see raw data
- label: raw_input
stdout:
codec: lines
# Then try parsing
- mapping: |
root = this.parse_json()
Look at the raw data and adjust your parsing logic accordingly.
Issue: Mapping/Bloblang Errors
Symptoms: Errors in transformation logic like undefined method or type mismatch.
Cause: Incorrect Bloblang syntax or attempting operations on wrong data types.
Solution: Test transformations incrementally:
# Start simple
- mapping: |
root = this
# Add one transformation at a time
- mapping: |
root = this
root.parsed = this.parse_json()
# Validate each step
- stdout:
codec: lines
Common Bloblang mistakes:
# ❌ Wrong - trying to parse already-parsed JSON
root = this.parse_json().parse_json()
# ✅ Right - parse once
root = this.parse_json()
# ❌ Wrong - field might not exist
root.value = this.data.deeply.nested.field
# ✅ Right - check existence
root.value = this.data.deeply.nested.field | "default"
See the Bloblang Guide for more patterns.
Issue: Performance - Pipeline is Slow
Symptoms: Messages processing slowly, backlog building up.
Debug approach:
- Add timing measurements
- mapping: |
root = this
meta start_time = now()
# ... your processors ...
- mapping: |
root.processing_time_ms = now().ts_unix_milli() - meta("start_time").ts_unix_milli()
- Check processor complexity
Complex transformations in mapping processors can slow things down.
Solution: Simplify the logic or break it into multiple smaller processors (see the sketch after this list).
- Check output destination
The output might be the bottleneck (slow API, overloaded database).
Solution: Test with stdout output to isolate the issue.
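As an illustration of breaking up a dense mapping, the sketch below splits one transformation into smaller labeled steps (the field names are hypothetical):
pipeline:
  processors:
    - label: parse_json
      mapping: |
        root = this.parse_json()
    - label: select_fields
      mapping: |
        # keep only what later steps need (hypothetical fields)
        root = {"user": this.user, "status": this.status}
    - label: add_metadata
      mapping: |
        root.processed_at = now()
Each step can then be timed or debugged on its own by inserting a stdout processor between them.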
Testing Workflows
1. Syntax Validation
Always start here:
expanso-cli job validate my-pipeline.yaml --offline
2. Smoke Test
Run with limited data to stdout:
input:
generate:
count: 5
interval: ""
mapping: |
root = {"test": "data"}
# ... your processors ...
output:
stdout:
codec: lines
expanso-edge run --config my-pipeline.yaml
3. Unit Test Each Component
Test input → processor → output independently:
Test input only:
input:
file:
paths: [./data.json]
codec: lines
output:
stdout:
codec: lines
Test processor only:
input:
generate:
count: 5
interval: ""
mapping: |
root = {"test": "data"}
pipeline:
processors:
- mapping: |
root.transformed = this.test.uppercase()
output:
stdout:
codec: lines
4. Integration Test
Test with real data sources but safe outputs (local files or stdout):
input:
http_server:
address: localhost:8080
pipeline:
processors:
- mapping: |
root = this.parse_json()
output:
file:
path: ./test-output.json
codec: lines
5. Deploy
Once validated and tested locally, deploy to production via Expanso Cloud.
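For example, using the deploy command shown earlier in this guide:
expanso-cli job deploy my-pipeline.yaml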
Debugging Checklist
When something goes wrong, work through this checklist:
- Validate syntax: expanso-cli job validate --offline
- Check input has data: Print raw input to stdout
- Verify transformations: Test each processor separately
- Enable verbose logging: Run with --verbose or --log-level debug
- Simplify: Remove processors one by one to isolate the issue
- Test with minimal data: Use count: 5 in the generate input
- Check for filtering: Ensure you're not accidentally deleting all messages
- Validate Bloblang: Test complex transformations in isolation
Next Steps
- Bloblang Guide - Learn transformation patterns and functions
- Component Reference - Browse all available inputs, processors, and outputs
- Error Handling - Add retry logic and dead letter queues
- Deploy to Production - Move to managed deployment with Expanso Cloud
Pro Tips
Use generate for testing: The generate input is your best friend for testing transformations without external dependencies.
Keep test files small: Don't test with production-sized data initially. Use 5-10 records.
Version control your configs: Track changes to pipeline configurations in git.
Test transformations in isolation: Before adding complex Bloblang logic to your pipeline, test it separately:
input:
generate:
count: 1
interval: ""
mapping: |
root = {"test": "value"}
pipeline:
processors:
- mapping: |
# Test your transformation here
root.result = this.test.uppercase()
output:
stdout:
codec: lines
Use descriptive labels: Add label to all processors to make logs easier to read.
Start simple, add complexity: Begin with a minimal pipeline, verify it works, then add features incrementally.