Collect OpenShift Logs
Stream logs from all pods and namespaces in your SNO cluster to S3 with structured metadata.
Pipeline
input:
  subprocess:
    name: oc
    args:
      - logs
      - --all-containers=true
      - --prefix=true
      - --follow
      - --all-namespaces
      - --since=10m
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    # Parse the oc log prefix: [namespace/pod/container] message
    - mapping: |
        root.raw_log = content().string()
        root.timestamp = now()

        # Extract metadata from the prefix
        let parts = content().string().re_find_all_submatch("^\\[([^/]+)/([^/]+)/([^\\]]+)\\] (.*)$")
        root.namespace = $parts.0.1
        root.pod = $parts.0.2
        root.container = $parts.0.3
        root.message = $parts.0.4

        # Add SNO cluster context
        root.node_name = env("NODE_NAME")
        root.cluster_name = env("CLUSTER_NAME")
        root.location = env("LOCATION")
        root.deployment_type = "single-node-openshift"

output:
  aws_s3:
    bucket: edge-openshift-logs
    path: 'sno/${! env("CLUSTER_NAME") }/${! now().ts_format("2006-01-02") }/${! json("namespace") }.jsonl'
    batching:
      count: 1000
      period: 5m
      processors:
        - archive:
            format: lines
What This Does
- Follows logs from all containers in all namespaces using oc logs --follow
- Parses metadata: Extracts namespace, pod, and container from the log prefix
- Adds SNO context: Includes node name, cluster identifier, and physical location
- Batches logs: Collects 1000 logs or waits 5 minutes before writing
- Organizes by namespace: The S3 path includes the namespace for easy filtering
- Auto-restarts: If the oc process exits, it restarts automatically; the sketch below shows one way to drop lines that get re-read after a restart
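Because --since=10m is applied again on every restart, a restart can re-emit up to ten minutes of already-shipped lines. A minimal sketch of one way to drop those duplicates, assuming an in-memory cache is acceptable on the node and the rest of the pipeline matches the config above:

# Sketch only, not part of the config above: remember recently shipped raw
# lines in memory and drop repeats.
cache_resources:
  - label: recent_lines
    memory:
      default_ttl: 15m        # a little longer than --since=10m

pipeline:
  processors:
    # Place before the parsing mapping. Identical lines within the TTL are
    # collapsed, so consider adding --timestamps to the oc args to keep
    # repeated messages distinct.
    - dedupe:
        cache: recent_lines
        key: '${! content() }'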
Example Output
Input (oc log line):
[production/web-app-7d8f9c/app] Request processed in 45ms
Output (structured JSON):
{
  "raw_log": "[production/web-app-7d8f9c/app] Request processed in 45ms",
  "namespace": "production",
  "pod": "web-app-7d8f9c",
  "container": "app",
  "message": "Request processed in 45ms",
  "node_name": "sno-retail-001",
  "cluster_name": "sno-retail-001",
  "location": "store-chicago-north",
  "deployment_type": "single-node-openshift",
  "timestamp": "2024-11-12T10:30:45Z"
}
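Given the path template above, and assuming this record's batch is flushed on 2024-11-12, the object lands under a key like:

sno/sno-retail-001/2024-11-12/production.jsonl

Each object is newline-delimited JSON, one record per line, produced by the lines archive format in the batching processors.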
Key Arguments
- --all-containers=true: Includes logs from all containers in each pod
- --prefix=true: Adds the [namespace/pod/container] prefix used for parsing
- --follow: Continuously streams new logs (like tail -f)
- --since=10m: Only collects logs from the last 10 minutes (reduces the initial load)
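To check the prefix format on your cluster before wiring the command into the pipeline, you can run the same command by hand (drop --follow for a one-shot dump); the line shape should match the example above:

$ oc logs --all-containers=true --prefix=true --all-namespaces --since=10m
[production/web-app-7d8f9c/app] Request processed in 45ms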
Environment Variables
Set these where Expanso runs:
- NODE_NAME: Automatically set by Kubernetes (from the pod spec)
- CLUSTER_NAME: Unique SNO cluster identifier
- LOCATION: Physical location identifier
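If Expanso runs as a pod on the SNO node, a sketch of the relevant container env section is below; the literal values are placeholders taken from the example above, not required names.

# Hypothetical pod spec fragment; adjust names and values for your cluster.
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName   # set automatically by Kubernetes (Downward API)
  - name: CLUSTER_NAME
    value: sno-retail-001          # unique SNO cluster identifier
  - name: LOCATION
    value: store-chicago-north     # physical location identifier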
Next Steps
- Application Logs: Focus on specific apps or namespaces
- Offline-Resilient: Add buffering for intermittent connectivity
- Best Practices: Optimize for SNO resource constraints