OpenShift Single-Node at the Edge

Monitor and manage Single-Node OpenShift (SNO) deployments at edge locations with Expanso. Deploy Expanso directly on the OpenShift node to collect logs, monitor cluster health, and automate operations—all without requiring external infrastructure.

What is Single-Node OpenShift?

Single-Node OpenShift (SNO) is Red Hat's solution for running OpenShift in constrained edge environments where both control plane and worker capabilities run on a single physical or virtual machine.

Ideal for edge scenarios:

  • Confined physical spaces (retail stores, factories, remote sites)
  • Intermittent network connectivity to central data centers
  • Resource-constrained environments
  • Locations requiring zero-touch operations

OpenShift SNO minimum requirements:

  • vCPU: 8
  • RAM: 16 GB
  • Storage: 120 GB
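
To see what a node actually reports against these minimums, one quick check is:

# List the node and inspect its reported capacity (CPU, memory, storage)
oc get nodes
oc describe node | grep -A 8 Capacity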

Expanso Resource Usage

Expanso Edge runs as a lightweight container on your SNO node, requiring only 0.5 CPU, 64MB RAM, and 150MB disk—a tiny fraction of the node's total resources.


Why Use Expanso with Single-Node OpenShift?

Challenge: SNO deployments at edge locations need monitoring and log collection, but network connectivity may be intermittent.

Solution: Deploy Expanso on the SNO node itself to collect logs and metrics locally, then batch and send to central storage when connectivity is available.

Benefits:

  • Minimal footprint: Uses <1% of SNO node resources
  • Offline capable: Queues data when network is down
  • Automatic batching: Optimizes network usage
  • No external dependencies: Self-contained operation
  • Local deployment: Runs directly on the OpenShift node

Deploy Expanso on Single-Node OpenShift

Deploy the Expanso Edge agent as a DaemonSet on your SNO cluster:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: expanso-edge
  namespace: expanso-system
spec:
  selector:
    matchLabels:
      app: expanso-edge
  template:
    metadata:
      labels:
        app: expanso-edge
    spec:
      serviceAccountName: expanso-edge
      hostNetwork: true
      containers:
        - name: expanso-edge
          image: ghcr.io/expanso-io/expanso-edge:nightly
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CLUSTER_NAME
              value: "sno-retail-001"
            - name: LOCATION
              value: "store-chicago-north"
          volumeMounts:
            - name: config
              mountPath: /etc/expanso/pipeline.yaml
              subPath: pipeline.yaml
            - name: kubeconfig
              mountPath: /root/.kube/config
              subPath: config
      volumes:
        - name: config
          configMap:
            name: expanso-pipeline
        - name: kubeconfig
          secret:
            secretName: expanso-kubeconfig
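
The DaemonSet above expects the expanso-system namespace, a ConfigMap holding the pipeline configuration, and a Secret containing a kubeconfig. One way to create them before applying the manifest (the local file names below are placeholders for your own files):

# Create the namespace and the objects the DaemonSet mounts
oc new-project expanso-system

# Pipeline configuration, mounted at /etc/expanso/pipeline.yaml
oc create configmap expanso-pipeline \
  --from-file=pipeline.yaml=./pipeline.yaml -n expanso-system

# Kubeconfig the agent uses to run oc commands
oc create secret generic expanso-kubeconfig \
  --from-file=config=./expanso-kubeconfig -n expanso-system

# Deploy the agent
oc apply -f expanso-daemonset.yaml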

Collect OpenShift Logs

Stream logs from all pods in the SNO cluster to S3:

input:
  subprocess:
    name: oc
    args:
      - logs
      - --all-containers=true
      - --prefix=true
      - --follow
      - --all-namespaces
      - --since=10m
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    # Parse the oc log prefix: [namespace/pod/container] message
    - mapping: |
        let line = content().string()
        root.raw_log = $line
        root.timestamp = now()

        # Extract metadata from the prefix
        let parts = $line.re_find_all_submatch("^\\[([^/]+)/([^/]+)/([^\\]]+)\\] (.*)$")
        root.namespace = $parts.0.1
        root.pod = $parts.0.2
        root.container = $parts.0.3
        root.message = $parts.0.4

        # Add SNO cluster context
        root.node_name = env("NODE_NAME")
        root.cluster_name = env("CLUSTER_NAME")
        root.location = env("LOCATION")
        root.deployment_type = "single-node-openshift"

output:
  aws_s3:
    bucket: edge-openshift-logs
    path: 'sno/${! env("CLUSTER_NAME") }/${! now().ts_format("2006-01-02") }/${! json("namespace") }.jsonl'
    batching:
      count: 1000
      period: 5m
      processors:
        - archive:
            format: concatenate

What this does:

  • Follows logs from all pods and containers
  • Parses namespace, pod, container metadata
  • Adds SNO-specific context (node, cluster, location)
  • Batches logs to minimize network usage
  • Writes to S3 organized by cluster and date
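
As a concrete example (pod name and log message are illustrative), a prefixed line such as

[openshift-dns/dns-default-x7k2p/dns] shutting down idle connection

is mapped to a record along these lines:

{
  "raw_log": "[openshift-dns/dns-default-x7k2p/dns] shutting down idle connection",
  "timestamp": "2025-06-01T10:30:00Z",
  "namespace": "openshift-dns",
  "pod": "dns-default-x7k2p",
  "container": "dns",
  "message": "shutting down idle connection",
  "node_name": "sno-node-1",
  "cluster_name": "sno-retail-001",
  "location": "store-chicago-north",
  "deployment_type": "single-node-openshift"
}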

Monitor Cluster Health

Check SNO cluster health and send metrics to central monitoring:

input:
  generate:
    interval: 60s
    # Stamp run context as metadata so it survives the command processors below
    mapping: |
      meta check_time = now()
      meta cluster = env("CLUSTER_NAME")
      root = {}

pipeline:
  processors:
    # Check node status
    - command:
        name: oc
        args_mapping: '["get", "nodes", "-o", "json"]'

    # The command output replaces the message body, so keep results in metadata
    - mapping: |
        let nodes = content().parse_json().items
        meta node_name = $nodes.index(0).metadata.name
        meta node_ready = $nodes.all(n ->
          n.status.conditions.any(c -> c.type == "Ready" && c.status == "True")
        )

    # Check cluster operators
    - command:
        name: oc
        args_mapping: '["get", "clusteroperators", "-o", "json"]'

    - mapping: |
        let degraded = content().parse_json().items.filter(op ->
          op.status.conditions.any(c -> c.type == "Degraded" && c.status == "True")
        ).map_each(op -> op.metadata.name)
        meta degraded_operators = $degraded
        meta all_operators_healthy = $degraded.length() == 0

    # Check pod status across namespaces
    - command:
        name: oc
        args_mapping: '["get", "pods", "--all-namespaces", "-o", "json"]'

    - mapping: |
        let pods = content().parse_json().items
        # CrashLoopBackOff surfaces as a container waiting reason, not a pod phase
        let failed = $pods.filter(p ->
          p.status.phase == "Failed" ||
          p.status.containerStatuses.or([]).any(c -> c.state.waiting.reason.or("") == "CrashLoopBackOff")
        ).map_each(p -> {
          "namespace": p.metadata.namespace,
          "name": p.metadata.name,
          "phase": p.status.phase
        })
        meta total_pods = $pods.length()
        meta running_pods = $pods.filter(p -> p.status.phase == "Running").length()
        meta failed_pods = $failed

    # Aggregate health status
    - mapping: |
        root.health_report = {
          "cluster": @cluster,
          "location": env("LOCATION"),
          "timestamp": @check_time,
          "node_name": @node_name,
          "node_ready": @node_ready,
          "operators_healthy": @all_operators_healthy,
          "degraded_operators": @degraded_operators,
          "total_pods": @total_pods,
          "running_pods": @running_pods,
          "failed_pods": @failed_pods,
          "cluster_healthy": @node_ready && @all_operators_healthy && @failed_pods.length() == 0
        }

output:
  switch:
    cases:
      # Alert if unhealthy
      - check: '!this.health_report.cluster_healthy'
        output:
          broker:
            pattern: fan_out
            outputs:
              # Send alert
              - http_client:
                  url: https://alerts.company.com/sno-health
                  verb: POST
                  headers:
                    Content-Type: application/json
              # Log alert
              - aws_s3:
                  bucket: sno-health-alerts
                  path: 'alerts/${! env("CLUSTER_NAME") }/${! timestamp_unix() }.json'

      # Normal health metrics
      - output:
          http_client:
            url: https://metrics.company.com/sno-health
            verb: POST
            batching:
              count: 10
              period: 5m
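
The health_report document posted to the metrics or alert endpoint looks roughly like this (values are illustrative):

{
  "health_report": {
    "cluster": "sno-retail-001",
    "location": "store-chicago-north",
    "timestamp": "2025-06-01T10:30:00Z",
    "node_name": "sno-node-1",
    "node_ready": true,
    "operators_healthy": true,
    "degraded_operators": [],
    "total_pods": 87,
    "running_pods": 87,
    "failed_pods": [],
    "cluster_healthy": true
  }
}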

Monitor Resource Usage

Track CPU and memory usage on the SNO node:

input:
  generate:
    interval: 60s
    mapping: 'root = {}'

pipeline:
  processors:
    # Get node resource usage
    - command:
        name: oc
        args_mapping: '["adm", "top", "node", "--no-headers"]'

    # Parse: node-name CPU(cores) CPU% MEMORY(bytes) MEMORY%
    # Keep the results in metadata so they survive the next command processor
    - mapping: |
        let parts = content().string().trim().re_find_all("\\S+")
        meta node_name = $parts.0
        meta cpu_cores = $parts.1
        meta cpu_percent = $parts.2.trim("%").number()
        meta memory_bytes = $parts.3
        meta memory_percent = $parts.4.trim("%").number()

    # Get pod resource usage
    - command:
        name: oc
        args_mapping: '["adm", "top", "pods", "--all-namespaces", "--no-headers"]'

    # Parse per-pod rows: namespace pod CPU(cores) MEMORY(bytes)
    - mapping: |
        let rows = content().string().split("\n").filter(l -> l.trim() != "").map_each(
          l -> l.re_find_all("\\S+")
        )

        root.pod_metrics = $rows.map_each(parts -> {
          "namespace": parts.0,
          "pod": parts.1,
          "cpu": parts.2,
          "memory": parts.3
        })

        # Aggregate pod counts by namespace
        root.namespace_usage = $rows.map_each(parts -> parts.0).unique().map_each(ns -> {
          "namespace": ns,
          "pods": $rows.filter(parts -> parts.0 == ns).length()
        })

    # Check for resource pressure
    - mapping: |
        root = this
        root.node_name = @node_name
        root.cpu_cores = @cpu_cores
        root.cpu_percent = @cpu_percent
        root.memory_bytes = @memory_bytes
        root.memory_percent = @memory_percent
        root.cluster = env("CLUSTER_NAME")
        root.timestamp = now()
        root.resource_alert = {
          "high_cpu": @cpu_percent > 80,
          "high_memory": @memory_percent > 85,
          "cluster": env("CLUSTER_NAME"),
          "timestamp": now()
        }

output:
  broker:
    pattern: fan_out
    outputs:
      # Send metrics
      - http_client:
          url: https://metrics.company.com/sno-resources
          verb: POST
          batching:
            count: 20
            period: 5m

      # Alert on high resource usage
      - switch:
          cases:
            - check: 'this.resource_alert.high_cpu || this.resource_alert.high_memory'
              output:
                http_client:
                  url: https://alerts.company.com/sno-resources
                  verb: POST
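
For context, the two oc adm top commands return whitespace-separated rows like the following (values are illustrative), which is what the parsing mappings above split apart:

# oc adm top node --no-headers
sno-node-1   2250m   28%   9875Mi   61%

# oc adm top pods --all-namespaces --no-headers
openshift-dns   dns-default-x7k2p     12m   45Mi
production      point-of-sale-7d9fq   85m   210Mi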

Collect Specific Application Logs

Focus on logs from specific namespaces (e.g., production apps):

input:
  subprocess:
    name: oc
    args:
      - logs
      - --namespace=production
      - --all-containers=true
      - --prefix=true
      - --follow
      - --selector=app=point-of-sale
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    - mapping: |
        # Parse JSON logs where possible, otherwise wrap the raw line
        root = content().string().parse_json().catch({
          "message": content().string(),
          "level": "info"
        })
        root.cluster = env("CLUSTER_NAME")
        root.location = env("LOCATION")
        root.app = "point-of-sale"
        root.timestamp = now()

output:
  broker:
    pattern: fan_out
    outputs:
      # Real-time to Elasticsearch
      - elasticsearch_v2:
          urls: ['https://elasticsearch.company.com:9200']
          index: 'sno-pos-logs-${! now().ts_format("2006-01-02") }'
          batching:
            count: 100
            period: 10s

      # Archive to S3
      - aws_s3:
          bucket: sno-app-logs
          path: 'pos/${! env("CLUSTER_NAME") }/${! timestamp_unix() }.jsonl'
          batching:
            count: 5000
            period: 10m

Offline-Resilient Configuration

Handle intermittent connectivity with retry and buffering:

input:
  subprocess:
    name: oc
    args: [logs, --all-containers, --prefix, --follow, --all-namespaces]
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    - mapping: |
        root.message = content().string()
        root.cluster = env("CLUSTER_NAME")
        root.timestamp = now()

# Buffer for offline periods
buffer:
  system_window:
    timestamp_mapping: 'root = this.timestamp'
    size: 1h

output:
  retry:
    max_retries: 10
    backoff:
      initial_interval: 30s
      max_interval: 10m
    output:
      aws_s3:
        bucket: sno-logs
        path: 'logs/${! env("CLUSTER_NAME") }/${! timestamp_unix() }.jsonl'
        batching:
          count: 5000
          period: 5m

What this does:

  • Buffers up to 1 hour of logs locally
  • Retries S3 writes up to 10 times
  • Uses exponential backoff (30s → 10m)
  • Queues logs during network outages
  • Automatically catches up when connectivity returns
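
For outages longer than the buffer window, one option is to replace the retry output with a fallback that spills batches to a local file and replays them later. This is a minimal sketch assuming a Benthos-style fallback output is available in your Expanso Edge build and that the spill path is writable (both are assumptions, not part of the configuration above):

output:
  fallback:
    # Try S3 first
    - aws_s3:
        bucket: sno-logs
        path: 'logs/${! env("CLUSTER_NAME") }/${! timestamp_unix() }.jsonl'
        batching:
          count: 5000
          period: 5m
    # If S3 stays unreachable, append to a local spill file
    - file:
        path: /var/lib/expanso/spill/${! env("CLUSTER_NAME") }.jsonl
        codec: lines

Replaying the spill file after connectivity returns is a separate step (for example, a second pipeline reading it with a file input).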

Service Account Setup

Create RBAC for Expanso to read cluster data.

Service Account:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: expanso-edge
  namespace: expanso-system

ClusterRole:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: expanso-edge-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log", "nodes", "namespaces"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "daemonsets", "statefulsets"]
    verbs: ["get", "list"]
  - apiGroups: ["config.openshift.io"]
    resources: ["clusteroperators"]
    verbs: ["get", "list"]
  # Needed for the oc adm top examples above
  - apiGroups: ["metrics.k8s.io"]
    resources: ["nodes", "pods"]
    verbs: ["get", "list"]

ClusterRoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: expanso-edge-reader
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: expanso-edge-reader
subjects:
  - kind: ServiceAccount
    name: expanso-edge
    namespace: expanso-system

Apply all three with:

oc apply -f expanso-serviceaccount.yaml
oc apply -f expanso-clusterrole.yaml
oc apply -f expanso-clusterrolebinding.yaml

Best Practices for SNO

1. Resource Allocation

# Expanso uses minimal resources on your SNO node
resources:
  requests:
    cpu: 100m        # 0.1 CPU cores
    memory: 128Mi    # 128 MB RAM
  limits:
    cpu: 500m        # 0.5 CPU cores max
    memory: 512Mi    # 512 MB RAM max

These conservative limits ensure Expanso doesn't impact your application workloads.
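
Once the agent is running, you can confirm that its actual consumption stays well below these limits (assuming the cluster metrics API is available):

oc adm top pods -n expanso-system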

2. Use Batch Processing

output:
  aws_s3:
    batching:
      count: 5000   # Larger batches for SNO
      period: 10m   # Longer periods to reduce network

This reduces network overhead, which is critical for edge deployments with intermittent connectivity.

3. Filter Logs Early

pipeline:
  processors:
    # Only send WARN, ERROR, and FATAL logs; drop everything else
    - mapping: |
        root = if !["warn", "error", "fatal"].contains(this.level.or("info").lowercase()) { deleted() }

Saves bandwidth and storage costs.

4. Add Location Context

processors:
  - mapping: |
      root.cluster_name = env("CLUSTER_NAME")
      root.location = env("LOCATION")
      root.deployment_type = "single-node-openshift"

Essential for multi-site deployments.


Troubleshooting

oc Command Not Found

Solution: Use the full path to the oc binary, or install the OpenShift CLI in the Expanso container:

# Add to Dockerfile
RUN curl -LO https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz && \
    tar -xzf openshift-client-linux.tar.gz -C /usr/local/bin oc

Permission Denied

Solution: Verify service account permissions:

oc auth can-i get pods --all-namespaces --as=system:serviceaccount:expanso-system:expanso-edge

High Resource Usage

Solution: Reduce log collection frequency and increase batching:

input:
  subprocess:
    args:
      - logs
      - --since=5m    # Only last 5 minutes instead of all logs

output:
  batching:
    count: 10000   # Larger batches
    period: 15m    # Less frequent writes

Integration with OpenShift Logging

Expanso can complement OpenShift's built-in logging:

# Collect from OpenShift logging stack
input:
  subprocess:
    name: oc
    args:
      - logs
      - --namespace=openshift-logging
      - deployment/cluster-logging-operator
      - --follow
    codec: lines
    restart_on_exit: true

Or forward to external systems that OpenShift logging doesn't support.


Next Steps


Additional Resources