SNO Best Practices
Configuration recommendations and troubleshooting for reliable Expanso deployments on Single-Node OpenShift.
Minimize Resource Footprint
Set resource limits for Expanso pods:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
Why: An SNO node has limited resources (8 vCPU, 16 GB RAM) that the control plane and workloads must share.
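To put the limits above in perspective, the following sketch works out what share of the node they represent. It assumes the node figures from the text and treats 16 GB as 16Gi; the helper names `cpu_cores` and `mem_mib` are illustrative, not part of any Expanso or OpenShift tooling.

```python
def cpu_cores(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as '500m' or '8' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def mem_mib(quantity: str) -> float:
    """Convert a Kubernetes memory quantity ('512Mi', '16Gi') to MiB."""
    units = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}
    return float(quantity[:-2]) * units[quantity[-2:]]


# Node size from the text; reading 16 GB as 16Gi is an assumption.
NODE_CPU, NODE_MEM = "8", "16Gi"
# Limits from the resources block above.
LIMIT_CPU, LIMIT_MEM = "500m", "512Mi"

cpu_share = cpu_cores(LIMIT_CPU) / cpu_cores(NODE_CPU)
mem_share = mem_mib(LIMIT_MEM) / mem_mib(NODE_MEM)

print(f"CPU limit share of node: {cpu_share:.4f}")
print(f"Memory limit share of node: {mem_share:.5f}")
```

Under these assumptions the pod is capped at a small single-digit percentage of the node, which leaves headroom for the control plane.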
Use Large Batches
Configure aggressive batching to reduce network overhead:
output:
  aws_s3:
    batching:
      count: 5000 # Larger batches
      period: 10m # Longer periods
Why: Edge locations often have limited or metered bandwidth.
Impact: Reduces network overhead by 90% compared to individual writes.
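The effect of the `count` setting on the number of writes can be sketched with simple arithmetic. This is an illustration only: the record volume is a hypothetical figure, and the sketch ignores the `period` timer, which may flush smaller batches earlier at low-volume sites.

```python
import math


def write_requests(records: int, batch_count: int) -> int:
    """Number of output writes needed when up to batch_count records share one write."""
    return math.ceil(records / batch_count)


records = 100_000  # hypothetical number of log records in one collection window

unbatched = write_requests(records, 1)    # one write per record
batched = write_requests(records, 5000)   # count: 5000 from the config above

print(f"unbatched writes: {unbatched}")
print(f"batched writes:   {batched}")
```

Each write carries fixed per-request overhead (connection setup, headers, request billing), so fewer writes means less overhead for the same payload.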
Filter Logs Early
Apply filters in the pipeline before sending data:
pipeline:
  processors:
    # Keep only WARN, ERROR, and FATAL logs
    - switch:
        cases:
          - check: 'this.level.lowercase().contains_any(["warn", "error", "fatal"])'
Why: Avoids spending CPU, memory, and bandwidth on logs that would be discarded anyway.
Volume reduction: Typically 80-90% for production workloads.
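To check which levels pass the filter before deploying it, the `check` expression can be mirrored in plain Python. This sketch assumes the expression lowercases the level and then does substring matching against each listed token; the sample records are hypothetical.

```python
KEEP = ("warn", "error", "fatal")


def keep_record(record: dict) -> bool:
    """Return True when the lowercased level contains one of the KEEP tokens."""
    level = str(record.get("level", "")).lower()
    return any(token in level for token in KEEP)


# Hypothetical sample records, for illustration only.
sample = [
    {"level": "INFO", "msg": "started"},
    {"level": "WARNING", "msg": "disk 85% full"},
    {"level": "Error", "msg": "upload failed"},
    {"level": "debug", "msg": "tick"},
]

kept = [r for r in sample if keep_record(r)]
dropped = len(sample) - len(kept)
print(f"kept {len(kept)} of {len(sample)} records, dropped {dropped}")
```

Because the match is a substring test, a level such as `WARNING` passes as well as `WARN`. Running real log samples through a check like this gives a measured reduction figure for a specific workload.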
Add Location Context
Always include SNO-specific metadata:
processors:
  - mapping: |
      root.cluster_name = env("CLUSTER_NAME")
      root.location = env("LOCATION")
      root.deployment_type = "single-node-openshift"
Why: Essential for identifying the source of each log when managing 100+ edge locations.
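The intended result of the mapping can be sketched in Python as a record that gains three fields. This assumes the mapping keeps the record's existing fields, which should be confirmed against a test message; the environment values below are hypothetical.

```python
import os


def add_location_context(record: dict, env=os.environ) -> dict:
    """Return a copy of record with the three SNO metadata fields set."""
    enriched = dict(record)
    enriched["cluster_name"] = env.get("CLUSTER_NAME")  # None when unset
    enriched["location"] = env.get("LOCATION")          # None when unset
    enriched["deployment_type"] = "single-node-openshift"
    return enriched


# Hypothetical environment values, for illustration only.
fake_env = {"CLUSTER_NAME": "sno-store-042", "LOCATION": "store-042"}

enriched = add_location_context({"level": "error", "msg": "upload failed"}, fake_env)
print(enriched)
```

An unset variable shows up as an empty value in the output, which is an easy thing to alert on centrally.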
Use Offline-Resilient Configuration
Add buffering and retry for intermittent connectivity:
buffer:
  system_window:
    timestamp_mapping: 'root = this.timestamp'
    size: 1h
output:
  retry:
    max_retries: 10
    backoff:
      initial_interval: 30s
      max_interval: 10m
Why: Edge locations frequently have unreliable network connectivity.
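To estimate how long an outage the retry settings above can ride out, the sketch below computes a backoff schedule. It assumes the interval doubles after each attempt and that no jitter is applied; the actual growth factor used by Expanso is not stated in this guide, so treat the result as an approximation.

```python
def backoff_schedule(initial_s: float, max_s: float, retries: int, factor: float = 2.0):
    """Return the wait before each retry: grows by `factor`, capped at max_s."""
    waits = []
    wait = initial_s
    for _ in range(retries):
        waits.append(min(wait, max_s))
        wait *= factor
    return waits


# initial_interval: 30s, max_interval: 10m, max_retries: 10 from the config above.
waits = backoff_schedule(initial_s=30, max_s=600, retries=10)
total_minutes = sum(waits) / 60

print(waits)
print(f"retries span about {total_minutes:.1f} minutes")
```

Under the doubling assumption the waits reach the 10-minute cap on the sixth retry, and the ten retries together span roughly 65 minutes.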
Troubleshooting
oc Command Not Found
Symptom: Pipeline fails with "oc: command not found"
Solution: Install the OpenShift CLI in the Expanso container image:
RUN curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && \
    tar -xzf openshift-client-linux.tar.gz -C /usr/local/bin oc
Or reference the binary by its full path:
input:
  subprocess:
    name: /usr/local/bin/oc
Permission Denied
Symptom: Error accessing pods or logs
Solution: Verify service account permissions:
oc auth can-i get pods --all-namespaces \
  --as=system:serviceaccount:expanso-system:expanso-edge
If the command prints "no", review RBAC Setup.
High Resource Usage
Symptom: Expanso pod consuming > 500m CPU or > 512Mi memory
Solution 1 - Narrow the log collection window:
input:
  subprocess:
    args:
      - logs
      - --since=5m # Only last 5 minutes
Solution 2 - Increase batching:
output:
  batching:
    count: 10000 # Larger batches
    period: 15m # Less frequent writes
Solution 3 - Add filtering:
pipeline:
  processors:
    - switch:
        cases:
          - check: '!this.contains("DEBUG")'
Logs Not Appearing
Check 1 - Verify oc access:
oc get pods --all-namespaces
Check 2 - Check Expanso logs:
oc logs -n expanso-system -l app=expanso-edge --tail=100
Check 3 - Verify pipeline configuration:
oc get configmap expanso-pipeline -n expanso-system -o yaml
Network Connectivity Issues
Symptom: Logs not reaching S3/Elasticsearch, retry errors
Solution: Add an offline-resilient configuration (see Use Offline-Resilient Configuration).
Monitor: Search the Expanso logs for retry messages:
oc logs -n expanso-system -l app=expanso-edge | grep retry
Integration with OpenShift Logging
Expanso can complement OpenShift's built-in logging stack:
Use Expanso when you:
- Need to send logs to destinations that OpenShift logging doesn't support
- Want custom processing or filtering before centralization
- Need offline-resilient behavior for edge locations
- Require minimal resource overhead
Use OpenShift logging when you:
- Need cluster-wide logging with a full observability stack
- Have dedicated logging infrastructure capacity
- Want integration with the OpenShift console
Use both together:
# Collect from OpenShift logging stack
input:
  subprocess:
    name: oc
    args:
      - logs
      - --namespace=openshift-logging
      - deployment/cluster-logging-operator
      - --follow
Resource Monitoring
Track Expanso resource usage:
# CPU and memory
oc adm top pod -n expanso-system
# Detailed metrics
oc describe pod -n expanso-system -l app=expanso-edge
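The output of `oc adm top pod` can also be checked against the configured limits automatically. The following is a minimal sketch that assumes the usual three-column layout (name, CPU in millicores, memory with a `Ki`/`Mi`/`Gi` suffix); the pod names and figures in `SAMPLE` are hypothetical, and `parse_top` and `over_limit` are illustrative helper names.

```python
MEM_UNITS = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def parse_top(output: str):
    """Parse `oc adm top pod` text into (name, millicores, MiB) tuples."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        name, cpu, mem = line.split()[:3]
        millicores = float(cpu[:-1]) if cpu.endswith("m") else float(cpu) * 1000
        mib = float(mem[:-2]) * MEM_UNITS[mem[-2:]]
        rows.append((name, millicores, mib))
    return rows


def over_limit(rows, cpu_limit_m=500, mem_limit_mib=512):
    """Return names of pods above the CPU or memory limit from this guide."""
    return [name for name, cpu, mem in rows if cpu > cpu_limit_m or mem > mem_limit_mib]


# Hypothetical sample output, for illustration only.
SAMPLE = """\
NAME                 CPU(cores)   MEMORY(bytes)
expanso-edge-abc12   120m         210Mi
expanso-edge-def34   640m         498Mi
"""

flagged = over_limit(parse_top(SAMPLE))
print(flagged)
```

The default thresholds match the 500m CPU and 512Mi memory limits recommended earlier; adjust them if your deployment uses different limits.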
Configuration Checklist
Before deploying to production:
- Resource limits configured (CPU, memory)
- RBAC permissions configured and verified
- Location metadata included in all logs
- Batching configured for network efficiency
- Offline-resilient configuration added
- Log filtering applied to reduce volume
- Service account has minimal required permissions
- Pipeline validated with test data
- Monitoring and alerting configured
- Rollback plan documented
Additional Resources
Next Steps
- Deployment: Deploy Expanso on your SNO cluster
- RBAC Setup: Configure permissions
- Collect Logs: Start collecting logs