Step 4: Process Real Data

Now let's process real, live data from your system. We'll tail a log file that changes continuously.

Permission Note

Reading system logs may require elevated permissions. If you get a permission error:

  • Run with sudo: sudo expanso-edge run --config logs-pipeline.yaml
  • Or point the pipeline at a log file you can read, such as an application log (see the snippet below)
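
For instance, the input block below points at a user-owned application log instead (the path is a hypothetical placeholder; substitute a file that exists and that you can read):

input:
  file:
    paths: [/home/you/myapp/app.log]
    codec: lines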

Create the Pipeline

Create logs-pipeline.yaml:

input:
  file:
    paths: [/var/log/syslog]
    codec: lines

pipeline:
  processors:
    - mapping: |
        root.raw = this
        root.length = this.length()
        root.processed_at = now()

output:
  stdout:
    codec: lines

Run it:

# May need sudo for system logs
sudo expanso-edge run --config logs-pipeline.yaml

You'll see new log entries appear in real-time as your system generates them:

{"length":142,"processed_at":"2024-12-26T10:10:00Z","raw":"Dec 26 10:10:00 myhost kernel: ..."}
{"length":98,"processed_at":"2024-12-26T10:10:01Z","raw":"Dec 26 10:10:01 myhost systemd: ..."}

What's Happening?

  • file input - Tails the file, reading new lines as they're written
  • codec: lines - Treats each line as a separate message
  • Real-time processing - As logs are written, they flow through your pipeline
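
Since paths is a list, one pipeline can tail several files at once. A minimal sketch of the input block, assuming both files exist on your system (the rest of the pipeline stays the same):

input:
  file:
    paths: [/var/log/syslog, /var/log/auth.log]
    codec: lines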

Try Adding a Filter

Want to only see error messages? Add a filter processor:

input:
  file:
    paths: [/var/log/syslog]
    codec: lines

pipeline:
  processors:
    # Only keep lines containing "error" (case-insensitive)
    - mapping: |
        root = if !this.lowercase().contains("error") {
          deleted()
        }

    # Add metadata
    - mapping: |
        root.raw = this
        root.severity = "ERROR"
        root.processed_at = now()

output:
  stdout:
    codec: lines
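
With the filter in place, only lines mentioning "error" reach stdout. The output would look something like this (the log text itself is illustrative):

{"processed_at":"2024-12-26T10:12:03Z","raw":"Dec 26 10:12:03 myhost myapp[1234]: error: connection refused","severity":"ERROR"}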

Recap: What You've Learned

Step | Concept        | Key Takeaway
-----|----------------|------------------------------------------------
1    | Hello World    | Simplest pipeline: generate → stdout
2    | Make a Change  | Fast iteration, Bloblang functions
3    | Transformation | Processors add/modify fields, do calculations
4    | Real Data      | File input tails logs in real-time

You now understand the core pipeline pattern:

[Input] → [Processor(s)] → [Output]
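
Every pipeline you've built follows that shape. Here's a minimal end-to-end sketch combining the pieces from earlier steps (uppercase() is a standard Bloblang string method, not one shown in this guide):

input:
  generate:
    count: 1
    interval: ""
    mapping: |
      root = {"hello": "world"}

pipeline:
  processors:
    - mapping: |
        # Keep the original fields, then add one derived field
        root = this
        root.shouted = this.hello.uppercase()

output:
  stdout:
    codec: lines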

Next Steps

Now that you've built pipelines locally:


Tips

Start Simple: Test each part of your pipeline separately before combining them.
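
For example, you can verify the file input on its own by wiring it straight to stdout with no processors. A sketch (it assumes the pipeline section can be omitted when there are no processors, and the same sudo caveat applies):

input:
  file:
    paths: [/var/log/syslog]
    codec: lines

output:
  stdout:
    codec: lines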

Use stdout for Debugging: Always output to stdout first to see exactly what's happening.

Limit Test Data: Use count to produce a fixed number of messages:

input:
  generate:
    count: 5     # Generate exactly 5 messages then stop
    interval: "" # As fast as possible
    mapping: |
      root = {"test": "data"}
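
With a stdout output and codec: lines, that config should print exactly five identical lines and then exit:

{"test":"data"}
{"test":"data"}
{"test":"data"}
{"test":"data"}
{"test":"data"}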

Fast Iteration: Edit YAML → Stop (Ctrl+C) → Re-run. The feedback loop is instant.