Testing & Debugging
Learn how to validate configurations, add debugging output, and troubleshoot common issues when building pipelines.
Quick Reference
# Validate pipeline syntax
expanso-cli job validate my-pipeline.yaml --offline
# Run pipeline locally with verbose logging
expanso-edge run --config my-pipeline.yaml --verbose
# Test with limited data
# (add count: 10 to your generate input)
# Output to terminal for inspection
# (use stdout output in your config)
Validate Before Running
Always validate your pipeline configuration before running it. This catches syntax errors and configuration mistakes early.
Client-Side Validation
Check syntax and structure without connecting to a server:
expanso-cli job validate my-pipeline.yaml --offline
What it checks:
- ✅ Valid YAML syntax
- ✅ Required fields present
- ✅ Component names exist
- ✅ Configuration structure
Example output (success):
✓ Configuration is valid
Example output (error):
Error: validation failed
- input.file.paths: required field missing
- output.unknown_component: component does not exist
Server-Side Validation
For comprehensive validation against your Expanso Cloud environment:
expanso-cli job validate my-pipeline.yaml
This performs both client-side and server-side checks (requires connection to Expanso Cloud).
Add Debug Output
When developing pipelines, add stdout outputs at each stage to see what's happening.
Debug at Each Stage
input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # First transformation
    - mapping: |
        root = this.parse_json()

    # Debug output - see what we parsed
    - stdout:
        codec: lines

    # Second transformation
    - mapping: |
        root.processed = true
        root.timestamp = now()

output:
  stdout:
    codec: lines
Use Labels for Clarity
Add labels to track which stage produced output:
pipeline:
  processors:
    - label: parse_json
      mapping: |
        root = this.parse_json()

    - label: debug_parsed
      stdout:
        codec: lines

    - label: add_metadata
      mapping: |
        root.processed_at = now()

    - label: debug_final
      stdout:
        codec: lines
Enable Verbose Logging
Run with verbose logging to see detailed execution information:
expanso-edge run --config my-pipeline.yaml --verbose
Log levels available:
- --log-level trace - Everything (very detailed)
- --log-level debug - Debug information
- --log-level info - Standard information (default)
- --log-level warn - Warnings only
- --log-level error - Errors only
Example with debug level:
expanso-edge run --config my-pipeline.yaml --log-level debug
Test with Limited Data
When testing, use a fixed number of messages instead of continuous generation.
Limit Generated Messages
input:
  generate:
    count: 10      # Stop after 10 messages
    interval: ""   # Generate as fast as possible
    mapping: |
      root.test_id = uuid_v4()
      root.timestamp = now()
Process Specific Files
Test against a small sample file first:
# Create test file with 5 records
head -n 5 large-file.json > test-data.json
# Test against small file first
expanso-edge run --config my-pipeline.yaml
In my-pipeline.yaml, point the input at the sample file:

input:
  file:
    paths: [./test-data.json]
    codec: lines
Common Issues and Solutions
Issue: Pipeline Runs But No Output
Symptoms: Pipeline starts but nothing appears in output.
Possible causes:
- Data is being filtered out
Check your mapping processors - are you using deleted()?
# This deletes every message with status "active" - possibly all of them!
- mapping: |
    root = if this.status == "active" { deleted() }
Solution: Add debug output before the filter to see what's being dropped:
- label: before_filter
  stdout:
    codec: lines

- mapping: |
    root = if this.status != "active" { deleted() }
- Input has no data
Solution: Check your input source has data available:
# For files
ls -lh ./data.json
cat ./data.json
# For generate input - check your mapping is valid
- Output is going somewhere else
Solution: Temporarily change output to stdout for debugging:
output:
  stdout:
    codec: lines
Issue: Parsing Errors
Symptoms: Errors like "failed to parse JSON" or "invalid format".
Cause: Data format doesn't match what you're trying to parse.
Solution: Print the raw input first:
input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # Debug: see raw data
    - label: raw_input
      stdout:
        codec: lines

    # Then try parsing
    - mapping: |
        root = this.parse_json()
Look at the raw data and adjust your parsing logic accordingly.
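For example, if the raw lines turn out to be comma-separated values rather than JSON, a mapping along these lines splits the fields instead of calling parse_json(). This is a minimal sketch; the column names (id, name, status) are made up for illustration:

# Hypothetical fix: the raw lines were CSV, not JSON
- mapping: |
    let parts = content().string().split(",")
    root.id = $parts.index(0)
    root.name = $parts.index(1)
    root.status = $parts.index(2)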
Issue: Mapping/Bloblang Errors
Symptoms: Errors in transformation logic like "undefined method" or "type mismatch".
Cause: Incorrect Bloblang syntax or attempting operations on wrong data types.
Solution: Test transformations incrementally:
# Start simple
- mapping: |
    root = this

# Add one transformation at a time
- mapping: |
    root = this
    root.parsed = this.parse_json()

# Validate each step
- stdout:
    codec: lines
Common Bloblang mistakes:
# ❌ Wrong - trying to parse already-parsed JSON
root = this.parse_json().parse_json()
# ✅ Right - parse once
root = this.parse_json()
# ❌ Wrong - field might not exist
root.value = this.data.deeply.nested.field
# ✅ Right - check existence
root.value = this.data.deeply.nested.field | "default"
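Another pattern that produces "type mismatch" errors is doing arithmetic on a field that arrives as a string. This is a hedged example: whether a field like count comes in as a string depends on your input.

# ❌ Wrong - if this.count is a string, arithmetic fails with a type mismatch
root.total = this.count + 1

# ✅ Right - coerce to a number first
root.total = this.count.number() + 1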
See the Bloblang Guide for more patterns.
Issue: Performance - Pipeline is Slow
Symptoms: Messages processing slowly, backlog building up.
Debug approach:
- Add timing measurements
- mapping: |
    root = this
    meta start_time = now()

# ... your processors ...

- mapping: |
    root.processing_time_ms = now().ts_unix_milli() - meta("start_time").ts_unix_milli()
- Check processor complexity
Complex transformations in mapping processors can slow things down.
Solution: Simplify the logic or break it into multiple smaller processors (see the sketch after this list).
- Check output destination
The output might be the bottleneck (slow API, overloaded database).
Solution: Test with stdout output to isolate the issue.
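As a rough sketch of both ideas (the labels and field names below are invented for illustration): break a dense mapping into smaller labeled processors so the timing trick above can pinpoint the slow step, and temporarily point the output at stdout to rule out a slow destination.

pipeline:
  processors:
    # Hypothetical example: one dense mapping split into smaller labeled steps
    - label: parse
      mapping: |
        root = this.parse_json()

    - label: enrich
      mapping: |
        root.processed_at = now()

    - label: summarize
      mapping: |
        root.item_count = this.items.or([]).length()

output:
  # Swap your real output for stdout while measuring
  stdout:
    codec: lines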
Testing Workflows
1. Syntax Validation
Always start here:
expanso-cli job validate my-pipeline.yaml --offline
2. Smoke Test
Run with limited data to stdout:
input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

# ... your processors ...

output:
  stdout:
    codec: lines
expanso-edge run --config my-pipeline.yaml
3. Unit Test Each Component
Test input → processor → output independently:
Test input only:
input:
  file:
    paths: [./data.json]
    codec: lines

output:
  stdout:
    codec: lines
Test processor only:
input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

pipeline:
  processors:
    - mapping: |
        root.transformed = this.test.uppercase()

output:
  stdout:
    codec: lines
4. Integration Test
Test with real data sources but safe outputs (local files or stdout):
input:
  http_server:
    address: localhost:8080

pipeline:
  processors:
    - mapping: |
        root = this.parse_json()

output:
  file:
    path: ./test-output.json
    codec: lines
5. Deploy
Once validated and tested locally, deploy to production via Expanso Cloud.
Debugging Checklist
When something goes wrong, work through this checklist:
- Validate syntax: expanso-cli job validate --offline
- Check input has data: Print raw input to stdout
- Verify transformations: Test each processor separately
- Enable verbose logging: Run with --verbose or --log-level debug
- Simplify: Remove processors one by one to isolate the issue
- Test with minimal data: Use count: 5 in your generate input
- Check for filtering: Ensure you're not accidentally deleting all messages
- Validate Bloblang: Test complex transformations in isolation
Next Steps
- Bloblang Guide - Learn transformation patterns and functions
- Component Reference - Browse all available inputs, processors, and outputs
- Error Handling - Add retry logic and dead letter queues
- Deploy to Production - Move to managed deployment with Expanso Cloud
Pro Tips
Use generate for testing: The generate input is your best friend for testing transformations without external dependencies.
Keep test files small: Don't test with production-sized data initially. Use 5-10 records.
Version control your configs: Track changes to pipeline configurations in git.
Test transformations in isolation: Before adding complex Bloblang logic to your pipeline, test it separately:
input:
  generate:
    count: 1
    interval: ""
    mapping: |
      root = {"test": "value"}

pipeline:
  processors:
    - mapping: |
        # Test your transformation here
        root.result = this.test.uppercase()

output:
  stdout:
    codec: lines
Use descriptive labels: Add a label to every processor to make logs easier to read.
Start simple, add complexity: Begin with a minimal pipeline, verify it works, then add features incrementally.