Monitor SNO Cluster Health

Automatically check the health of a single-node OpenShift (SNO) cluster and send alerts when issues are detected.

Pipeline

input:
  generate:
    interval: 60s
    mapping: |
      # The payload is replaced by the command processors below, so the
      # cluster identity and check time are carried as metadata instead
      meta check_time = now()
      meta cluster = env("CLUSTER_NAME")
      root = ""

pipeline:
  processors:
    # Check node status
    - command:
        name: oc
        args_mapping: '["get", "nodes", "-o", "json"]'

    - mapping: |
        let nodes = content().parse_json().items
        meta node_ready = $nodes.all(n ->
          n.status.conditions.any(c -> c.type == "Ready" && c.status == "True")
        )
        meta node_name = $nodes.index(0).metadata.name

    # Check cluster operators
    - command:
        name: oc
        args_mapping: '["get", "clusteroperators", "-o", "json"]'

    - mapping: |
        let operators = content().parse_json().items
        let degraded = $operators.filter(op ->
          op.status.conditions.any(c -> c.type == "Degraded" && c.status == "True")
        ).map_each(op -> op.metadata.name)

        meta degraded_operators = $degraded
        meta all_operators_healthy = $degraded.length() == 0

    # Check pod status across namespaces
    - command:
        name: oc
        args_mapping: '["get", "pods", "--all-namespaces", "-o", "json"]'

    - mapping: |
        let pods = content().parse_json().items
        meta total_pods = $pods.length()
        meta running_pods = $pods.filter(p -> p.status.phase == "Running").length()

        # CrashLoopBackOff is reported as a container waiting reason, not a pod phase
        meta failed_pods = $pods.map_each(p -> {
          "namespace": p.metadata.namespace,
          "name": p.metadata.name,
          "phase": if p.status.containerStatuses.or([]).any(cs ->
            cs.state.waiting.reason.or("") == "CrashLoopBackOff"
          ) { "CrashLoopBackOff" } else { p.status.phase }
        }).filter(p -> p.phase == "Failed" || p.phase == "CrashLoopBackOff")

    # Aggregate health status
    - mapping: |
        root.health_report = {
          "cluster": @cluster,
          "location": env("LOCATION"),
          "timestamp": @check_time,
          "node_ready": @node_ready,
          "operators_healthy": @all_operators_healthy,
          "degraded_operators": @degraded_operators,
          "total_pods": @total_pods,
          "running_pods": @running_pods,
          "failed_pods": @failed_pods,
          "cluster_healthy": @node_ready && @all_operators_healthy && @failed_pods.length() == 0
        }

output:
  switch:
    cases:
      # Alert if unhealthy
      - check: '!this.health_report.cluster_healthy'
        output:
          broker:
            pattern: fan_out
            outputs:
              # Send alert
              - http_client:
                  url: https://alerts.company.com/sno-health
                  verb: POST
                  headers:
                    Content-Type: application/json
              # Log alert
              - aws_s3:
                  bucket: sno-health-alerts
                  path: 'alerts/${! env("CLUSTER_NAME") }/${! timestamp_unix() }.json'

      # Normal health metrics
      - output:
          http_client:
            url: https://metrics.company.com/sno-health
            verb: POST
            batching:
              count: 10
              period: 5m

What This Does

  • Checks every 60 seconds: Generates a health-check trigger message every minute
  • Node status: Verifies that the single node is Ready
  • Operator health: Checks whether any cluster operators are degraded
  • Pod health: Counts running versus failed pods across all namespaces
  • Conditional alerting: Sends an immediate alert if the cluster is unhealthy, otherwise batches metrics
  • Detailed failure info: Includes the list of degraded operators and failed pods

Health Criteria

Healthy cluster:

  • Node status is Ready
  • No cluster operators are degraded
  • No failed or crash-looping pods

Unhealthy cluster (triggers alert):

  • Node is not Ready
  • Any operator is degraded
  • Any pods are in Failed or CrashLoopBackOff state

Example Health Report

{
  "cluster": "sno-retail-001",
  "location": "store-chicago-north",
  "timestamp": "2024-11-12T10:30:00Z",
  "node_ready": true,
  "operators_healthy": true,
  "degraded_operators": [],
  "total_pods": 145,
  "running_pods": 145,
  "failed_pods": [],
  "cluster_healthy": true
}

Alert Example

When the cluster is unhealthy, the alert includes the specific failures:

{
  "cluster": "sno-retail-001",
  "cluster_healthy": false,
  "degraded_operators": ["authentication", "console"],
  "failed_pods": [
    {"namespace": "production", "name": "web-app-7d8f9c", "phase": "CrashLoopBackOff"}
  ]
}

Customization

Change check frequency: Adjust interval in input (e.g., interval: 5m for every 5 minutes)
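
For example, a minimal sketch that keeps the same mapping and only changes the trigger interval:

input:
  generate:
    interval: 5m
    mapping: |
      meta check_time = now()
      meta cluster = env("CLUSTER_NAME")
      root = ""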

Alert thresholds: Modify health criteria in the aggregation mapping
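
For instance, a sketch of a looser criterion that tolerates a small number of failed pods before raising an alert (the threshold of 2 is an arbitrary illustration); only the last field of the aggregation mapping changes:

          "cluster_healthy": @node_ready && @all_operators_healthy && @failed_pods.length() <= 2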

Additional checks: Add more command processors to check storage, networking, etc.
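
As a sketch, a storage check for PersistentVolumeClaims that are not Bound could be appended to the processor list; the unbound_pvcs metadata key is illustrative and would still need to be added to the health report and health criteria:

    # Illustrative extra check: PVCs that are not Bound
    - command:
        name: oc
        args_mapping: '["get", "pvc", "--all-namespaces", "-o", "json"]'

    - mapping: |
        let pvcs = content().parse_json().items
        meta unbound_pvcs = $pvcs.filter(p ->
          p.status.phase != "Bound"
        ).map_each(p -> p.metadata.namespace + "/" + p.metadata.name)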

Next Steps