What is Expanso and how does it work?

Expanso is a managed platform for deploying intelligent data pipelines at the edge. It processes data where it is generated, which reduces bandwidth, latency, and cost. You deploy lightweight agents on your infrastructure, build pipelines using our visual builder or YAML, and control everything from a central SaaS platform.

Can I run AI/ML models directly in my data pipelines?

Yes! Expanso supports running ONNX, TensorFlow Lite, and other models as native pipeline steps. Execute low-latency inference on streaming data, enrich events with model outputs (like risk scores), and make decisions at the edge without cloud round-trips.

How many pre-built components are available?

Expanso provides 200+ pre-built components including inputs (Kafka, HTTP, files), processors (transformations, filtering, PII masking, aggregations), and outputs (S3, Snowflake, Datadog, Splunk). Browse the complete catalog in our Component Reference.

Do I need to write code to build pipelines?

No. Use our drag-and-drop visual pipeline builder to create sophisticated pipelines without code. For advanced use cases, you can also write pipelines in YAML or use the Bloblang transformation language for complex data mappings.

How does Expanso help with data governance and compliance?

Expanso includes built-in governance features: automatic PII detection and masking, policy enforcement at the edge, RBAC, SSO integration, and comprehensive audit trails. Mask sensitive data before it ever leaves your network.

SNO Best Practices

Configuration recommendations and troubleshooting for reliable Expanso deployments on Single-Node OpenShift (SNO).

Minimize Resource Footprint

Set resource limits for Expanso pods:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Why: SNO nodes have limited resources (the minimum footprint is 8 vCPU and 16 GB RAM), and the control plane and your workloads share them.

Use Large Batches

Configure aggressive batching to reduce network overhead:

output:
  aws_s3:
    batching:
      count: 5000       # Larger batches
      period: 10m       # Longer periods

Why: Edge locations often have limited or metered bandwidth.

Impact: Reduces network overhead by 90% compared to individual writes.
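How much batching helps depends on the event rate as well as the configured thresholds. The sketch below estimates writes per hour, assuming a batch is flushed as soon as either count or period is reached, whichever comes first. That flush rule, the function name, and the example rate of 2 events per second are assumptions for illustration, not documented Expanso behavior.

```python
import math


def writes_per_hour(events_per_sec: float, count: int, period_sec: float) -> int:
    """Estimate output writes per hour for a steady event rate.

    Assumption: a batch is flushed when `count` events are buffered or
    `period_sec` seconds have elapsed, whichever happens first.
    """
    if events_per_sec <= 0:
        return 0
    seconds_to_fill = count / events_per_sec          # time until the count threshold is hit
    flush_interval = min(seconds_to_fill, period_sec)  # the earlier threshold wins
    return math.ceil(3600 / flush_interval)


# Hypothetical rate of 2 events/s; count and period taken from the config above.
unbatched = writes_per_hour(2, count=1, period_sec=600)   # one write per event
batched = writes_per_hour(2, count=5000, period_sec=600)  # period is reached first
print(unbatched, batched)
```

At this rate the period threshold triggers before the count threshold, so 7,200 individual writes per hour become 6 batched writes. At higher rates the count threshold dominates instead.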

Filter Logs Early

Apply filters in the pipeline before sending data:

pipeline:
  processors:
    # Only keep WARN, ERROR, and FATAL logs
    - switch:
        cases:
          - check: 'this.level.lowercase().contains_any(["warn", "error", "fatal"])'

Why: Reduces CPU, memory, and network usage for logs that will be discarded.

Volume reduction: Typically 80-90% for production workloads.
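To see which events a check like the one above would keep, here is a rough Python equivalent. It assumes that contains_any performs substring matching on the lowercased level, so a level of "WARNING" also matches. The sample events are invented for illustration.

```python
KEEP_MARKERS = ("warn", "error", "fatal")


def keep(event: dict) -> bool:
    """Return True when the event's level contains any marker (case-insensitive)."""
    level = str(event.get("level", "")).lower()
    return any(marker in level for marker in KEEP_MARKERS)


# Made-up sample events, not real Expanso output.
sample = [
    {"level": "INFO", "msg": "health check ok"},
    {"level": "DEBUG", "msg": "cache hit"},
    {"level": "WARNING", "msg": "disk at 85%"},
    {"level": "Error", "msg": "upstream timeout"},
    {"level": "info", "msg": "request served"},
]

kept = [event for event in sample if keep(event)]
reduction = 1 - len(kept) / len(sample)
print(f"kept {len(kept)} of {len(sample)} events")
```

Running a predicate like this over a representative log sample is a quick way to estimate the volume reduction for your own workload before deploying the filter.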

Add Location Context

Always include SNO-specific metadata:

processors:
  - mapping: |
      root.cluster_name = env("CLUSTER_NAME")
      root.location = env("LOCATION")
      root.deployment_type = "single-node-openshift"

Why: Essential for identifying the source of each event when managing 100+ edge locations.

Use Offline-Resilient Configuration

Add buffering and retry for intermittent connectivity:

buffer:
  system_window:
    timestamp_mapping: 'root = this.timestamp'
    size: 1h

output:
  retry:
    max_retries: 10
    backoff:
      initial_interval: 30s
      max_interval: 10m

Why: Edge locations frequently have unreliable network connectivity.
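It helps to know how long an outage these retry settings can ride out. The sketch below computes the wait before each retry, assuming the interval doubles after every attempt and is capped at max_interval. The doubling multiplier is an assumption, since the configuration above does not state one.

```python
def backoff_schedule(initial_s: float, max_s: float, max_retries: int,
                     multiplier: float = 2.0) -> list:
    """Return the wait (in seconds) before each retry attempt.

    Assumption: exponential backoff with the given multiplier, capped at max_s.
    """
    delays = []
    delay = float(initial_s)
    for _ in range(max_retries):
        delays.append(min(delay, float(max_s)))  # never wait longer than the cap
        delay *= multiplier
    return delays


# Values from the config above: initial_interval 30s, max_interval 10m, max_retries 10.
schedule = backoff_schedule(30, 600, 10)
total_wait_s = sum(schedule)
print(schedule, total_wait_s)
```

Under these assumptions the ten retries span 3,930 seconds, about 65 minutes. If outages at your sites typically last longer, consider raising max_retries or max_interval.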

Troubleshooting

oc Command Not Found

Symptom: Pipeline fails with "oc: command not found"

Solution: Install the OpenShift CLI in the Expanso container image by adding this step to its Dockerfile:

RUN curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && \
    tar -xzf openshift-client-linux.tar.gz -C /usr/local/bin oc

Or reference the binary by its full path:

input:
  subprocess:
    name: /usr/local/bin/oc

Permission Denied

Symptom: Error accessing pods or logs

Solution: Verify service account permissions:

oc auth can-i get pods --all-namespaces \
  --as=system:serviceaccount:expanso-system:expanso-edge

If the command returns "no", review RBAC Setup.
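The command above only checks get on pods, while the symptom can also involve reading logs. A small helper can run several checks in one pass. This is a minimal sketch: the list of checks (including the pods log subresource) is an assumption about what the pipeline needs, and the function names are hypothetical.

```python
import subprocess

SERVICE_ACCOUNT = "system:serviceaccount:expanso-system:expanso-edge"

# (verb, resource, subresource) triples to verify. This list is an assumption;
# adjust it to match what your pipeline actually reads.
CHECKS = [
    ("get", "pods", None),
    ("list", "pods", None),
    ("get", "pods", "log"),
]


def can_i_command(verb, resource, subresource=None, service_account=SERVICE_ACCOUNT):
    """Build the `oc auth can-i` argument list for one permission check."""
    cmd = ["oc", "auth", "can-i", verb, resource,
           "--all-namespaces", f"--as={service_account}"]
    if subresource:
        cmd.append(f"--subresource={subresource}")
    return cmd


def check_permissions(run=subprocess.run):
    """Run every check and map a readable name to True (yes) or False (anything else)."""
    results = {}
    for verb, resource, sub in CHECKS:
        proc = run(can_i_command(verb, resource, sub), capture_output=True, text=True)
        name = f"{verb} {resource}" + (f"/{sub}" if sub else "")
        results[name] = proc.stdout.strip() == "yes"
    return results
```

Call check_permissions() from a machine where oc is installed and logged in with rights to impersonate the service account. Any entry that comes back False points to a missing rule in the role bound to the account.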

High Resource Usage

Symptom: Expanso pod running at or near its limits (500m CPU, 512Mi memory), for example CPU throttling or restarts after being OOM-killed

Solution 1 - Narrow the log collection window:

input:
  subprocess:
    args:
      - logs
      - --since=5m  # Only last 5 minutes

Solution 2 - Increase batching:

output:
  batching:
    count: 10000  # Larger batches
    period: 15m   # Less frequent writes

Solution 3 - Add filtering:

pipeline:
  processors:
    - switch:
        cases:
          - check: '!this.contains("DEBUG")'

Logs Not Appearing

Check 1 - Verify oc access:

oc get pods --all-namespaces

Check 2 - Check Expanso logs:

oc logs -n expanso-system -l app=expanso-edge --tail=100

Check 3 - Verify pipeline configuration:

oc get configmap expanso-pipeline -n expanso-system -o yaml

Network Connectivity Issues

Symptom: Logs not reaching S3/Elasticsearch, retry errors

Solution: Add buffering and retry settings (see Use Offline-Resilient Configuration).

Monitor: Search the Expanso logs for retry messages:

oc logs -n expanso-system -l app=expanso-edge | grep retry

Integration with OpenShift Logging

Expanso can complement OpenShift's built-in logging stack:

Use Expanso when:

You need to send logs to destinations that OpenShift logging doesn't support
You want custom processing or filtering before centralization
You need offline-resilient behavior for edge locations
You require minimal resource overhead

Use OpenShift logging when:

You need cluster-wide logging with a full observability stack
You have dedicated capacity for logging infrastructure
You want integration with the OpenShift console

Use both together:

# Collect from OpenShift logging stack
input:
  subprocess:
    name: oc
    args:
      - logs
      - --namespace=openshift-logging
      - deployment/cluster-logging-operator
      - --follow

Resource Monitoring

Track Expanso resource usage:

# CPU and memory
oc adm top pod -n expanso-system

# Pod details: resource requests and limits, restart counts, recent events
oc describe pod -n expanso-system -l app=expanso-edge
