Testing & Debugging

Learn how to validate configurations, add debugging output, and troubleshoot common issues when building pipelines.

Quick Reference

# Validate pipeline syntax
expanso-cli job validate my-pipeline.yaml --offline

# Run pipeline locally with verbose logging
expanso-edge run --config my-pipeline.yaml --verbose

# Test with limited data
# (add count: 10 to your generate input)

# Output to terminal for inspection
# (use stdout output in your config)

Validate Before Running

Always validate your pipeline configuration before running it. This catches syntax errors and configuration mistakes early.

Client-Side Validation

Check syntax and structure without connecting to a server:

expanso-cli job validate my-pipeline.yaml --offline

What it checks:

  • ✅ Valid YAML syntax
  • ✅ Required fields present
  • ✅ Component names exist
  • ✅ Configuration structure

Example output (success):

✓ Configuration is valid

Example output (error):

Error: validation failed
- input.file.paths: required field missing
- output.unknown_component: component does not exist

Server-Side Validation

For comprehensive validation against your Expanso Cloud environment:

expanso-cli job validate my-pipeline.yaml

This performs both client-side and server-side checks (requires connection to Expanso Cloud).


Add Debug Output

When developing pipelines, add stdout outputs at each stage to see what's happening.

Debug at Each Stage

input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # First transformation
    - mapping: |
        root = this.parse_json()

    # Debug output - see what we parsed
    - stdout:
        codec: lines

    # Second transformation
    - mapping: |
        root.processed = true
        root.timestamp = now()

output:
  stdout:
    codec: lines

Use Labels for Clarity

Add labels to track which stage produced output:

pipeline:
  processors:
    - label: parse_json
      mapping: |
        root = this.parse_json()

    - label: debug_parsed
      stdout:
        codec: lines

    - label: add_metadata
      mapping: |
        root.processed_at = now()

    - label: debug_final
      stdout:
        codec: lines

Enable Verbose Logging

Run with verbose logging to see detailed execution information:

expanso-edge run --config my-pipeline.yaml --verbose

Log levels available:

  • --log-level trace - Everything (very detailed)
  • --log-level debug - Debug information
  • --log-level info - Standard information (default)
  • --log-level warn - Warnings only
  • --log-level error - Errors only

Example with debug level:

expanso-edge run --config my-pipeline.yaml --log-level debug

Test with Limited Data

When testing, use a fixed number of messages instead of continuous generation.

Limit Generated Messages

input:
  generate:
    count: 10       # Stop after 10 messages
    interval: ""    # Generate as fast as possible
    mapping: |
      root.test_id = uuid_v4()
      root.timestamp = now()

Process Specific Files

Test against a small sample file first:

# Create test file with 5 records
head -n 5 large-file.json > test-data.json

Point your input at the sample file:

input:
  file:
    paths: [./test-data.json]
    codec: lines

Then run the pipeline against it:

expanso-edge run --config my-pipeline.yaml

Common Issues and Solutions

Issue: Pipeline Runs But No Output

Symptoms: Pipeline starts but nothing appears in output.

Possible causes:

  1. Data is being filtered out

Check your mapping processors - are you using deleted()?

# Inverted condition - this deletes the "active" messages you want to keep!
- mapping: |
    root = if this.status == "active" { deleted() }

Solution: Add debug output before the filter to see what's being dropped:

- label: before_filter
  stdout:
    codec: lines

- mapping: |
    root = if this.status != "active" { deleted() }

  2. Input has no data

Solution: Check your input source has data available:

# For files
ls -lh ./data.json
cat ./data.json

# For generate input - check your mapping is valid
  3. Output is going somewhere else

Solution: Temporarily change output to stdout for debugging:

output:
  stdout:
    codec: lines

Issue: Parsing Errors

Symptoms: Errors like failed to parse JSON or invalid format.

Cause: Data format doesn't match what you're trying to parse.

Solution: Print the raw input first:

input:
  file:
    paths: [./data.json]
    codec: lines

pipeline:
  processors:
    # Debug: see raw data
    - label: raw_input
      stdout:
        codec: lines

    # Then try parsing
    - mapping: |
        root = this.parse_json()

Look at the raw data and adjust your parsing logic accordingly.
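
If only some records are malformed, one option is to catch the parse failure instead of letting it fail the whole mapping. A minimal sketch, assuming the standard Bloblang catch() and content() functions are available in the mapping processor (the raw field name is illustrative):

# Keep bad records visible instead of failing the pipeline
- mapping: |
    root = this.parse_json().catch({"raw": content().string()})

Messages that parse cleanly pass through as JSON; anything else is wrapped under raw so you can inspect it downstream.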


Issue: Mapping/Bloblang Errors

Symptoms: Errors in transformation logic like undefined method or type mismatch.

Cause: Incorrect Bloblang syntax or attempting operations on wrong data types.

Solution: Test transformations incrementally:

# Start simple
- mapping: |
    root = this

# Add one transformation at a time
- mapping: |
    root = this
    root.parsed = this.parse_json()

# Validate each step
- stdout:
    codec: lines

Common Bloblang mistakes:

# ❌ Wrong - trying to parse already-parsed JSON
root = this.parse_json().parse_json()

# ✅ Right - parse once
root = this.parse_json()

# ❌ Wrong - field might not exist
root.value = this.data.deeply.nested.field

# ✅ Right - check existence
root.value = this.data.deeply.nested.field | "default"

See the Bloblang Guide for more patterns.


Issue: Performance - Pipeline is Slow

Symptoms: Messages processing slowly, backlog building up.

Debug approach:

  1. Add timing measurements

- mapping: |
    root = this
    meta start_time = now()

# ... your processors ...

- mapping: |
    root.processing_time_ms = now().ts_unix_milli() - meta("start_time").ts_unix_milli()

  2. Check processor complexity

Complex transformations in mapping processors can slow things down.

Solution: Simplify logic or break the work into multiple simpler processors (see the sketch after this list).

  3. Check output destination

The output might be the bottleneck (slow API, overloaded database).

Solution: Test with stdout output to isolate the issue.
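
For the processor-complexity point above, one approach is to split a large mapping into a few smaller, consecutive mapping processors; this also makes it easy to drop a stdout debug step between stages. A rough sketch (field names are illustrative, not from a real pipeline):

pipeline:
  processors:
    # Stage 1: parse only
    - label: parse
      mapping: |
        root = this.parse_json()

    # Stage 2: enrich only
    - label: enrich
      mapping: |
        root.processed_at = now()

    # Stage 3: final shaping
    - label: shape
      mapping: |
        root.summary = this.message.uppercase()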


Testing Workflows

1. Syntax Validation

Always start here:

expanso-cli job validate my-pipeline.yaml --offline

2. Smoke Test

Run with limited data to stdout:

input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

# ... your processors ...

output:
  stdout:
    codec: lines

expanso-edge run --config my-pipeline.yaml

3. Unit Test Each Component

Test input → processor → output independently:

Test input only:

input:
  file:
    paths: [./data.json]
    codec: lines

output:
  stdout:
    codec: lines

Test processor only:

input:
  generate:
    count: 5
    interval: ""
    mapping: |
      root = {"test": "data"}

pipeline:
  processors:
    - mapping: |
        root.transformed = this.test.uppercase()

output:
  stdout:
    codec: lines

4. Integration Test

Test with real data sources but safe outputs (local files or stdout):

input:
  http_server:
    address: localhost:8080

pipeline:
  processors:
    - mapping: |
        root = this.parse_json()

output:
  file:
    path: ./test-output.json
    codec: lines

5. Deploy

Once validated and tested locally, deploy to production via Expanso Cloud.


Debugging Checklist

When something goes wrong, work through this checklist:

  • Validate syntax: expanso-cli job validate --offline
  • Check input has data: Print raw input to stdout
  • Verify transformations: Test each processor separately
  • Enable verbose logging: Run with --verbose or --log-level debug
  • Simplify: Remove processors one by one to isolate the issue
  • Test with minimal data: Use count: 5 in generate input
  • Check for filtering: Ensure you're not accidentally deleting all messages
  • Validate Bloblang: Test complex transformations in isolation


Pro Tips

Use generate for testing: The generate input is your best friend for testing transformations without external dependencies.

Keep test files small: Don't test with production-sized data initially. Use 5-10 records.

Version control your configs: Track changes to pipeline configurations in git.
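
A minimal sketch of that workflow (the repository name is just an example):

git init pipeline-configs
cd pipeline-configs
# add your pipeline configs here, then:
git add my-pipeline.yaml
git commit -m "Add initial pipeline configuration"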

Test transformations in isolation: Before adding complex Bloblang logic to your pipeline, test it separately:

input:
  generate:
    count: 1
    interval: ""
    mapping: |
      root = {"test": "value"}

pipeline:
  processors:
    - mapping: |
        # Test your transformation here
        root.result = this.test.uppercase()

output:
  stdout:
    codec: lines

Use descriptive labels: Add label to all processors to make logs easier to read.

Start simple, add complexity: Begin with a minimal pipeline, verify it works, then add features incrementally.