Collect OpenShift Logs
Stream logs from all pods and namespaces in your SNO cluster to S3 with structured metadata.
Pipeline
input:
  subprocess:
    name: oc
    args:
      - logs
      - --all-containers=true
      - --prefix=true
      - --follow
      - --all-namespaces
      - --since=10m
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    # Parse the oc log prefix: [namespace/pod/container] message
    - mapping: |
        root.raw_log = content().string()
        root.timestamp = now()

        # Extract metadata from the prefix
        let parts = content().string().re_find_all_submatch("^\\[([^/]+)/([^/]+)/([^\\]]+)\\] (.*)$")
        root.namespace = $parts.0.1
        root.pod = $parts.0.2
        root.container = $parts.0.3
        root.message = $parts.0.4

        # Add SNO cluster context
        root.node_name = env("NODE_NAME")
        root.cluster_name = env("CLUSTER_NAME")
        root.location = env("LOCATION")
        root.deployment_type = "single-node-openshift"

output:
  aws_s3:
    bucket: edge-openshift-logs
    path: 'sno/${! env("CLUSTER_NAME") }/${! now().ts_format("2006-01-02") }/${! json("namespace") }.jsonl'
    batching:
      count: 1000
      period: 5m
      processors:
        - archive:
            format: lines
What This Does
- Follows logs from all containers in all namespaces using oc logs --follow
- Parses metadata: Extracts namespace, pod, and container from the log prefix
- Adds SNO context: Includes node name, cluster identifier, and physical location
- Batches logs: Collects 1000 logs or waits 5 minutes before writing
- Organizes by namespace: The S3 path includes the namespace for easy filtering
- Auto-restarts: If the oc process exits, it restarts automatically; the sketch below shows one way to drop lines that get re-read after a restart
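Because --since=10m is applied again on every restart, a restart can re-emit up to ten minutes of already-shipped lines. A minimal sketch of one way to drop those duplicates, assuming an in-memory cache is acceptable on the node and the rest of the pipeline matches the config above:

# Sketch only, not part of the config above: remember recently shipped raw
# lines in memory and drop repeats.
cache_resources:
  - label: recent_lines
    memory:
      default_ttl: 15m        # a little longer than --since=10m

pipeline:
  processors:
    # Place before the parsing mapping. Identical lines within the TTL are
    # collapsed, so consider adding --timestamps to the oc args to keep
    # repeated messages distinct.
    - dedupe:
        cache: recent_lines
        key: '${! content() }'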
Example Output
Input (oc log line):
[production/web-app-7d8f9c/app] Request processed in 45ms
Output (structured JSON):
{
  "raw_log": "[production/web-app-7d8f9c/app] Request processed in 45ms",
  "namespace": "production",
  "pod": "web-app-7d8f9c",
  "container": "app",
  "message": "Request processed in 45ms",
  "node_name": "sno-retail-001",
  "cluster_name": "sno-retail-001",
  "location": "store-chicago-north",
  "deployment_type": "single-node-openshift",
  "timestamp": "2024-11-12T10:30:45Z"
}
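Given the path template above, and assuming this record's batch is flushed on 2024-11-12, the object lands under a key like:

sno/sno-retail-001/2024-11-12/production.jsonl

Each object is newline-delimited JSON, one record per line, produced by the lines archive format in the batching processors.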
Key Arguments
- --all-containers=true: Includes logs from all containers in each pod
- --prefix=true: Adds the [namespace/pod/container] prefix used for parsing
- --follow: Continuously streams new logs (like tail -f)
- --since=10m: Only collects logs from the last 10 minutes (reduces the initial load)
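To check the prefix format on your cluster before wiring the command into the pipeline, you can run the same command by hand (drop --follow for a one-shot dump); the line shape should match the example above:

$ oc logs --all-containers=true --prefix=true --all-namespaces --since=10m
[production/web-app-7d8f9c/app] Request processed in 45ms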
Environment Variables
Set these where Expanso runs:
- NODE_NAME: Automatically set by Kubernetes (from the pod spec)
- CLUSTER_NAME: Unique SNO cluster identifier
- LOCATION: Physical location identifier
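If Expanso runs as a pod on the SNO node, a sketch of the relevant container env section is below; the literal values are placeholders taken from the example above, not required names.

# Hypothetical pod spec fragment; adjust names and values for your cluster.
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName   # set automatically by Kubernetes (Downward API)
  - name: CLUSTER_NAME
    value: sno-retail-001          # unique SNO cluster identifier
  - name: LOCATION
    value: store-chicago-north     # physical location identifier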
Next Steps
- Application Logs: Focus on specific apps or namespaces
- Offline-Resilient: Add buffering for intermittent connectivity
- Best Practices: Optimize for SNO resource constraints