SNO Best Practices

Configuration recommendations and troubleshooting for reliable Expanso deployments on Single-Node OpenShift.

Minimize Resource Footprint

Set resource requests and limits for Expanso pods:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Why: SNO runs the control plane and all workloads on a single machine with limited resources. The documented SNO minimum is 8 vCPUs and 16 GB of RAM.
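For context, the following sketch shows where the `resources` block sits in a workload manifest. It makes several assumptions:

- Expanso runs as a Deployment named `expanso-edge` in the `expanso-system` namespace.
- The pods carry the label `app: expanso-edge` and use the `expanso-edge` service account. These are the names used in the troubleshooting commands on this page.
- The container name and image are hypothetical placeholders.

```yaml
# Sketch only: a minimal Deployment showing where requests/limits go.
# Assumed names (taken from this page): expanso-edge, expanso-system, app=expanso-edge.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: expanso-edge
  namespace: expanso-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: expanso-edge
  template:
    metadata:
      labels:
        app: expanso-edge
    spec:
      serviceAccountName: expanso-edge
      containers:
        - name: expanso-edge                              # hypothetical container name
          image: registry.example.com/expanso-edge:latest # hypothetical image
          resources:
            requests:
              cpu: 100m      # guaranteed share used for scheduling
              memory: 128Mi
            limits:
              cpu: 500m      # CPU above this is throttled
              memory: 512Mi  # exceeding this gets the container OOMKilled
```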

Use Large Batches

Configure aggressive batching to reduce network overhead:

output:
  aws_s3:
    batching:
      count: 5000  # Larger batches
      period: 10m  # Longer periods

Why: Edge locations often have limited or metered bandwidth.

Impact: Reduces network overhead by 90% compared to individual writes.
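One caveat applies if Expanso's `aws_s3` output follows the Benthos-style semantics its configuration resembles. In that case, each message in a batch is written as its own S3 object unless the batch is first combined. The sketch below joins each batch into a single gzipped, newline-delimited object before upload. The bucket name and path layout are hypothetical.

```yaml
output:
  aws_s3:
    bucket: example-edge-logs  # hypothetical bucket name
    # Evaluated once per archived batch, so each batch becomes one object
    path: 'sno/${! env("LOCATION") }/${! timestamp_unix_nano() }.jsonl.gz'
    batching:
      count: 5000
      period: 10m
      processors:
        # Join all messages in the batch into one newline-delimited payload
        - archive:
            format: lines
        # Compress the joined payload before it is uploaded
        - compress:
            algorithm: gzip
```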

Filter Logs Early

Apply filters in the pipeline before sending data:

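pipeline:
  processors:
    # Keep only WARN, ERROR, and FATAL logs; drop everything else.
    # Assumes each message is JSON with a "level" field.
    - mapping: |
        root = if ["warn", "error", "fatal"].contains(this.level.or("").lowercase()) { this } else { deleted() }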

Why: Reduces CPU, memory, and network usage for logs that will be discarded.

Volume reduction: Typically 80-90% for production workloads.
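The filter above assumes JSON logs with a `level` field. However, `oc logs` returns each container's raw output. For applications that log plain text, a sketch like the following matches on the raw line instead. The regular expression is an illustrative assumption and is coarse: it also keeps lines such as "0 errors found".

```yaml
pipeline:
  processors:
    # Keep raw lines mentioning warn/error/fatal (case-insensitive); drop the rest
    - mapping: |
        root = if content().string().re_match("(?i)(warn|error|fatal)") { content() } else { deleted() }
```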

Add Location Context

Always include SNO-specific metadata:

processors:
  # Copy the original log first, then add the location fields
  - mapping: |
      root = this
      root.cluster_name = env("CLUSTER_NAME")
      root.location = env("LOCATION")
      root.deployment_type = "single-node-openshift"

Why: Essential for identifying the source cluster when managing 100+ edge locations.
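The mapping reads `CLUSTER_NAME` and `LOCATION` from the environment, so they must be set on the Expanso container. The fragment below is a minimal sketch that sets them in the Deployment's pod template. The container name and values are hypothetical and should be set per site.

```yaml
# Fragment of the Expanso Deployment's pod template (hypothetical values)
spec:
  template:
    spec:
      containers:
        - name: expanso-edge            # hypothetical container name
          env:
            - name: CLUSTER_NAME
              value: sno-site-042       # hypothetical, unique per cluster
            - name: LOCATION
              value: warehouse-denver   # hypothetical, unique per site
```

Alternatively, `oc set env deployment/expanso-edge -n expanso-system CLUSTER_NAME=<value> LOCATION=<value>` sets the same variables, assuming the Deployment name used elsewhere on this page.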

Use Offline-Resilient Configuration

Add buffering and retry for intermittent connectivity:

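buffer:
  system_window:
    timestamp_mapping: 'root = this.timestamp'
    size: 1h

output:
  retry:
    max_retries: 10
    backoff:
      initial_interval: 30s
      max_interval: 10m
    # retry wraps the real destination output
    output:
      aws_s3: {}  # placeholder: your S3 settings (bucket, path, batching)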

Why: Edge locations frequently have unreliable network connectivity.
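A rough estimate shows how long an outage the retry settings above can ride out. It rests on three assumptions:

- The wait before retry k starts at initial_interval I and grows geometrically by a factor m, capped at max_interval M.
- Jitter and the time spent on each attempt are ignored.
- The multiplier is m = 2. Expanso's actual multiplier and jitter are not stated here, so this value is illustrative only.

```latex
W = \sum_{k=1}^{N} \min\left(I \, m^{k-1},\ M\right)

% I = 30\,\text{s},\ M = 600\,\text{s},\ N = 10,\ m = 2 (assumed):
W = 30 + 60 + 120 + 240 + 480 + 5 \times 600 = 3930\,\text{s} \approx 65.5\,\text{min}
```

Under these assumptions, a connectivity gap longer than about an hour exhausts the retries. Size max_retries and max_interval to the longest outage you expect at a site.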

Troubleshooting

oc Command Not Found

Symptom: Pipeline fails with "oc: command not found"

Solution: Install the OpenShift CLI in the Expanso container image (Dockerfile):

RUN curl -fsSLO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && \
    tar -xzf openshift-client-linux.tar.gz -C /usr/local/bin oc && \
    rm openshift-client-linux.tar.gz

Or, if oc is installed but not on the PATH, reference it by its full path:

input:
  subprocess:
    name: /usr/local/bin/oc

Permission Denied

Symptom: Error accessing pods or logs

Solution: Verify service account permissions:

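oc auth can-i get pods --all-namespaces \
  --as=system:serviceaccount:expanso-system:expanso-edge
oc auth can-i get pods --subresource=log --all-namespaces \
  --as=system:serviceaccount:expanso-system:expanso-edge

If either command returns "no", review the RBAC Setup.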
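If the checks fail, the following is a minimal sketch of a ClusterRole and ClusterRoleBinding that grant read access to pods and pod logs across namespaces. It assumes:

- The `expanso-edge` service account in `expanso-system`, as in the commands above.
- Expanso only runs `oc get pods` and `oc logs`, including `oc logs deployment/<name>`, which also reads the Deployment object.

The `expanso-log-reader` name is hypothetical. The RBAC Setup guide remains the reference.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: expanso-log-reader   # hypothetical name
rules:
  # Read pods and their logs in every namespace
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  # Needed to resolve "oc logs deployment/<name>" to a pod
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: expanso-log-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: expanso-log-reader
subjects:
  - kind: ServiceAccount
    name: expanso-edge
    namespace: expanso-system
```

Apply it with `oc apply -f <file>` and rerun the `oc auth can-i` checks.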

High Resource Usage

Symptom: Expanso pod running at or near its 500m CPU / 512Mi memory limits (CPU throttling, or restarts with reason OOMKilled)

Solution 1 - Limit how much log history is collected:

input:
  subprocess:
    args:
      - logs
      - --since=5m  # Only the last 5 minutes

Solution 2 - Increase batching:

output:
  aws_s3:
    batching:
      count: 10000  # Larger batches
      period: 15m   # Less frequent writes

Solution 3 - Add filtering:

pipeline:
  processors:
    # Drop any raw log line that contains "DEBUG"
    - mapping: |
        root = if content().string().contains("DEBUG") { deleted() } else { content() }

Logs Not Appearing

Check 1 - Verify oc access:

oc get pods --all-namespaces

Check 2 - Check Expanso logs:

oc logs -n expanso-system -l app=expanso-edge --tail=100

Check 3 - Verify pipeline configuration:

oc get configmap expanso-pipeline -n expanso-system -o yaml

Network Connectivity Issues

Symptom: Logs not reaching S3/Elasticsearch, retry errors

Solution: Add the offline-resilient configuration (see Use Offline-Resilient Configuration).

Monitor: Search the Expanso logs for retry messages:

oc logs -n expanso-system -l app=expanso-edge | grep -i retry

Integration with OpenShift Logging

Expanso can complement OpenShift's built-in logging stack:

Use Expanso when:

  • Need to send logs to destinations OpenShift logging doesn't support
  • Want custom processing or filtering before centralization
  • Need offline-resilient behavior for edge locations
  • Require minimal resource overhead

Use OpenShift logging when:

  • Need cluster-wide logging with a full observability stack
  • Have dedicated logging infrastructure capacity
  • Want integration with the OpenShift console

Use both together:

# Collect the cluster-logging-operator's own logs from the openshift-logging namespace
input:
  subprocess:
    name: oc
    args:
      - logs
      - --namespace=openshift-logging
      - deployment/cluster-logging-operator
      - --follow

Resource Monitoring

Track Expanso resource usage:

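# Current CPU and memory usage (requires cluster metrics to be available)
oc adm top pod -n expanso-system

# Requests/limits, restart count, and last termination reason (e.g., OOMKilled)
oc describe pod -n expanso-system -l app=expanso-edge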

Configuration Checklist

Before deploying to production:

  • Resource limits configured (CPU, memory)
  • RBAC permissions configured and verified
  • Location metadata included in all logs
  • Batching configured for network efficiency
  • Offline-resilient configuration added
  • Log filtering applied to reduce volume
  • Service account has minimal required permissions
  • Pipeline validated with test data
  • Monitoring and alerting configured
  • Rollback plan documented
