Step 4: Process Real Data
Now let's process real, live data from your system. We'll tail a log file that changes continuously.
Permission Note
Reading system logs may require elevated permissions. If you get a permission error:
- Run with `sudo expanso-edge run --config logs-pipeline.yaml`
- Or choose a log file you have access to (like application logs)
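If you take the second route, point the `file` input (introduced below) at a log your user can read. A quick sketch of the input section, with a purely illustrative path:

```yaml
input:
  file:
    paths: [/home/you/myapp/app.log]  # hypothetical: any log file your user can read
    codec: lines
```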
Create the Pipeline
Create logs-pipeline.yaml:
Linux:

```yaml
input:
  file:
    paths: [/var/log/syslog]
    codec: lines

pipeline:
  processors:
    - mapping: |
        root.raw = this
        root.length = this.length()
        root.processed_at = now()

output:
  stdout:
    codec: lines
```
macOS:

```yaml
input:
  file:
    paths: [/var/log/system.log]
    codec: lines

pipeline:
  processors:
    - mapping: |
        root.raw = this
        root.length = this.length()
        root.processed_at = now()

output:
  stdout:
    codec: lines
```
Windows:

```yaml
input:
  file:
    paths: ['C:\Windows\System32\log.txt']
    codec: lines

pipeline:
  processors:
    - mapping: |
        root.raw = this
        root.length = this.length()
        root.processed_at = now()

output:
  stdout:
    codec: lines
```
Run it:
```bash
# May need sudo for system logs
sudo expanso-edge run --config logs-pipeline.yaml
```
You'll see new log entries appear in real time as your system generates them:

```json
{"length":142,"processed_at":"2024-12-26T10:10:00Z","raw":"Dec 26 10:10:00 myhost kernel: ..."}
{"length":98,"processed_at":"2024-12-26T10:10:01Z","raw":"Dec 26 10:10:01 myhost systemd: ..."}
```
What's Happening?
- `file` input - Tails the file, reading new lines as they're written
- `codec: lines` - Treats each line as a separate message
- Real-time processing - As logs are written, they flow through your pipeline
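As a follow-on experiment, you can enrich each entry further, for example pulling the host name out of the raw line. A hedged sketch; the `split`/`index` logic assumes the classic `Mon DD HH:MM:SS host ...` syslog layout and will misfire on other formats:

```yaml
pipeline:
  processors:
    - mapping: |
        root.raw = this
        # Fields 0-2 are the timestamp; field 3 is the host (assumed layout)
        root.host = this.split(" ").index(3)
        root.processed_at = now()
```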
Try Adding a Filter
Want to see only error messages? Add a filter processor:
```yaml
input:
  file:
    paths: [/var/log/syslog]
    codec: lines

pipeline:
  processors:
    # Only keep lines containing "error" (case-insensitive)
    - mapping: |
        root = if !this.lowercase().contains("error") {
          deleted()
        }
    # Add metadata
    - mapping: |
        root.raw = this
        root.severity = "ERROR"
        root.processed_at = now()

output:
  stdout:
    codec: lines
```
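If a single keyword is too narrow, the same pattern extends to regular expressions. A sketch assuming Bloblang's `re_match` string method, with an illustrative pattern:

```yaml
# Keep lines matching "error" or "warn" (case-insensitive)
- mapping: |
    root = if !this.lowercase().re_match("error|warn") {
      deleted()
    }
```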
Recap: What You've Learned
| Step | Concept | Key Takeaway |
|---|---|---|
| 1 | Hello World | Simplest pipeline: generate → stdout |
| 2 | Make a Change | Fast iteration, Bloblang functions |
| 3 | Transformation | Processors add/modify fields, do calculations |
| 4 | Real Data | File input tails logs in real time |
You now understand the core pipeline pattern:
```
[Input] → [Processor(s)] → [Output]
```
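To make the pattern concrete, here's a minimal, self-contained sketch that exercises all three stages (the field names are arbitrary):

```yaml
input:
  generate:
    count: 1
    interval: ""
    mapping: root = {"greeting": "hello"}

pipeline:
  processors:
    - mapping: root.shouted = this.greeting.uppercase()

output:
  stdout:
    codec: lines
```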
Next Steps
Now that you've built pipelines locally:
- Test and Debug - Validate configurations and troubleshoot issues
- Explore Components - Browse 200+ inputs, processors, and outputs
- Learn Bloblang - Master the transformation language
- Deploy to Production - Set up Expanso Cloud and deploy pipelines to your infrastructure
Tips
Start Simple: Test each part of your pipeline separately before combining them.
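For example, to check the input stage on its own, wire it straight to stdout with no processors. A minimal sketch reusing the file input from above:

```yaml
input:
  file:
    paths: [/var/log/syslog]
    codec: lines

output:
  stdout:
    codec: lines
```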
Use stdout for Debugging: Always output to stdout first to see exactly what's happening.
Limit Test Data: Use count to produce a fixed number of messages:
```yaml
input:
  generate:
    count: 5      # Generate exactly 5 messages then stop
    interval: ""  # As fast as possible
    mapping: |
      root = {"test": "data"}
```
Fast Iteration: Edit YAML → Stop (Ctrl+C) → Re-run. The feedback loop is instant.