What is Expanso and how does it work?

Expanso is a managed platform for deploying intelligent data pipelines at the edge. It processes data where it is generated, which reduces bandwidth use, latency, and cost. You deploy lightweight agents on your infrastructure, build pipelines using our visual builder or YAML, and control everything from a central SaaS platform.

Can I run AI/ML models directly in my data pipelines?

Yes! Expanso supports running ONNX, TensorFlow Lite, and other models as native pipeline steps. Execute low-latency inference on streaming data, enrich events with model outputs (like risk scores), and make decisions at the edge without cloud round-trips.

How many pre-built components are available?

Expanso provides 200+ pre-built components including inputs (Kafka, HTTP, files), processors (transformations, filtering, PII masking, aggregations), and outputs (S3, Snowflake, Datadog, Splunk). Browse the complete catalog in our Component Reference.

Do I need to write code to build pipelines?

No. Use our drag-and-drop visual pipeline builder to create sophisticated pipelines without code. For advanced use cases, you can also write pipelines in YAML or use the Bloblang transformation language for complex data mappings.

How does Expanso help with data governance and compliance?

Expanso includes built-in governance features: automatic PII detection and masking, policy enforcement at the edge, RBAC, SSO integration, and comprehensive audit trails. Mask sensitive data before it ever leaves your network.

Parse K3s Log Metadata

Extract structured metadata from kubectl log prefixes to enable filtering and searching by namespace, pod, and container.

Pipeline

input:
  subprocess:
    name: kubectl
    args:
      - logs
      - --all-containers=true
      - --prefix=true
      - --follow
      - --all-namespaces
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    # Parse kubectl log prefix: [namespace/pod/container] message
    - mapping: |
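        # Input lines are plain text, so read them with content() rather than `this`
        let line = content().string()
        root.raw_log = $line
        root.timestamp = now()

        # Extract metadata from prefix (the submatch variant returns capture groups)
        let parts = $line.re_find_all_submatch("^\\[([^/]+)/([^/]+)/([^\\]]+)\\] (.*)$")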
        root.namespace = $parts.0.1
        root.pod = $parts.0.2
        root.container = $parts.0.3
        root.message = $parts.0.4

        # Add context
        root.node_id = env("NODE_ID")
        root.location = env("LOCATION")
        root.cluster = env("CLUSTER_NAME")

output:
  aws_s3:
    bucket: edge-k3s-logs
    path: 'logs/${! env("NODE_ID") }/${! now().ts_format("2006-01-02") }/${! json("namespace") }.jsonl'
    batching:
      count: 1000
      period: 1m

What This Does

Parses kubectl prefix: Extracts namespace, pod, and container from [namespace/pod/container] format
Separates message: Stores the actual log message separately from metadata
Adds location context: Includes node ID, location, and cluster name
Organizes by namespace: S3 path includes namespace for easy filtering

Example Output

Input (kubectl log line):

[production/web-app-7d8f9c/app] Request processed in 45ms

Output (structured JSON):

{
  "raw_log": "[production/web-app-7d8f9c/app] Request processed in 45ms",
  "namespace": "production",
  "pod": "web-app-7d8f9c",
  "container": "app",
  "message": "Request processed in 45ms",
  "node_id": "edge-site-42",
  "location": "chicago",
  "cluster": "k3s-chicago",
  "timestamp": "2024-11-09T10:30:45Z"
}

Regex Breakdown

^\\[([^/]+)/([^/]+)/([^\\]]+)\\] (.*)$

^\\[ - Match opening bracket at start (backslashes are doubled because the pattern sits inside a double-quoted Bloblang string)
([^/]+) - Capture namespace (everything before first /)
/ - Match separator
([^/]+) - Capture pod name (everything before second /)
/ - Match separator
([^\\]]+) - Capture container name (everything before ])
\\] - Match closing bracket
(.*)$ - Capture message (everything after bracket and space)
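
A line that does not begin with a [namespace/pod/container] prefix produces no match, so the capture-group lookups have nothing to read. The following is a minimal sketch of a guarded variant of the mapping, not the reference implementation. It assumes the input is plain text and that the Bloblang methods re_find_all_submatch, length, and index are available in your runtime. The parsed field is a hypothetical name introduced here for illustration.

```yaml
pipeline:
  processors:
    - mapping: |
        let line = content().string()
        let parts = $line.re_find_all_submatch("^\\[([^/]+)/([^/]+)/([^\\]]+)\\] (.*)$")
        let matched = $parts.length() > 0

        root.raw_log = $line
        # Hypothetical flag so unparsed lines are easy to find downstream
        root.parsed = $matched

        # Each metadata assignment is skipped when the prefix did not match
        root.namespace = if $matched { $parts.index(0).index(1) }
        root.pod = if $matched { $parts.index(0).index(2) }
        root.container = if $matched { $parts.index(0).index(3) }
        # Fall back to the whole line as the message when there is no prefix
        root.message = if $matched { $parts.index(0).index(4) } else { $line }
```

With this variant, unparsed lines carry no namespace field, so an output path that interpolates the namespace would likely need a fallback value for them.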

Use Cases

Search by namespace: Query S3 for all logs from production namespace

Filter by pod: Find all logs from a specific pod across time

Container-level debugging: Isolate logs from sidecar containers

Multi-cluster aggregation: Compare logs from same namespace across different edge locations
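
For the container-level debugging case, one option is to drop everything except the container of interest before the output. This is a minimal sketch, assuming it runs as an additional processor after the parsing mapping. The container name istio-proxy is only an illustrative value.

```yaml
pipeline:
  processors:
    # ... the parsing mapping from the Pipeline section goes first ...
    - mapping: |
        # Keep only logs from the sidecar container; drop all other messages.
        # When the condition is false the assignment is skipped and the
        # message passes through unchanged.
        root = if this.container != "istio-proxy" { deleted() }
```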

Next Steps

Multiple Destinations: Send parsed logs to Elasticsearch for real-time search
Filter by Log Level: Combine with log level filtering
Best Practices: Learn about efficient log handling
