What is Expanso and how does it work?

Expanso is a managed platform for deploying intelligent data pipelines at the edge. It processes data where it is generated, which reduces bandwidth, latency, and cost. You deploy lightweight agents on your infrastructure, build pipelines using our visual builder or YAML, and control everything from a central SaaS platform.

Can I run AI/ML models directly in my data pipelines?

Yes! Expanso supports running ONNX, TensorFlow Lite, and other models as native pipeline steps. Execute low-latency inference on streaming data, enrich events with model outputs (like risk scores), and make decisions at the edge without cloud round-trips.

How many pre-built components are available?

Expanso provides 200+ pre-built components including inputs (Kafka, HTTP, files), processors (transformations, filtering, PII masking, aggregations), and outputs (S3, Snowflake, Datadog, Splunk). Browse the complete catalog in our Component Reference.

Do I need to write code to build pipelines?

No. Use our drag-and-drop visual pipeline builder to create sophisticated pipelines without code. For advanced use cases, you can also write pipelines in YAML or use the Bloblang transformation language for complex data mappings.

How does Expanso help with data governance and compliance?

Expanso includes built-in governance features: automatic PII detection and masking, policy enforcement at the edge, RBAC, SSO integration, and comprehensive audit trails. Mask sensitive data before it ever leaves your network.

Collect Logs from Specific Namespace

Collect logs from a single namespace instead of all namespaces to reduce volume and focus on specific applications.

Pipeline

input:
  subprocess:
    name: kubectl
    args:
      - logs
      - --namespace=production
      - --all-containers=true
      - --prefix=true
      - --follow
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    - mapping: |
        # Log lines are raw text, not JSON, so read them with content()
        root.log = content().string()
        root.namespace = "production"
        root.node_id = env("NODE_ID")
        root.timestamp = now()

output:
  http_client:
    url: https://logs.company.com/ingest
    verb: POST
    batching:
      count: 500
      period: 30s

What This Does

Namespace filtering: Only collects logs from the production namespace
Reduced volume: Ignores logs from other namespaces (kube-system, monitoring, etc.)
HTTP output: Sends logs to a custom log ingestion endpoint
Smaller batches: Flushes at 500 logs or every 30 seconds, whichever comes first, for faster delivery

Use Cases

Production monitoring: Only collect logs from production workloads, ignore system pods

Multi-tenant clusters: Separate log collection per tenant namespace

High-volume namespaces: Isolate logs from specific high-traffic applications

Compliance: Collect logs only from namespaces with compliance requirements

Multiple Namespace Pipelines

Run multiple Expanso pipelines to collect from different namespaces:

production-logs.yaml:

input:
  subprocess:
    name: kubectl
    args: [logs, --namespace=production, --follow]
output:
  aws_s3:
    bucket: production-logs

staging-logs.yaml:

input:
  subprocess:
    name: kubectl
    args: [logs, --namespace=staging, --follow]
output:
  aws_s3:
    bucket: staging-logs

Run both:

expanso-edge run --config production-logs.yaml &
expanso-edge run --config staging-logs.yaml &

Namespace Patterns

Collect from multiple specific namespaces: Run separate pipelines for each

Exclude system namespaces: Use --all-namespaces and filter out kube-system, kube-public

Dynamic namespace selection: Use environment variables:

args:
  - logs
  - --namespace=${NAMESPACE}
  - --follow
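
The dynamic selection pattern above can be extended into one reusable pipeline file, so the same config serves every namespace. The following is a minimal sketch, not a confirmed Expanso configuration: the file name `namespace-logs.yaml` is hypothetical, and it assumes Expanso supports the Benthos-style `${VAR:default}` interpolation syntax and the Bloblang `env()` function with `.or()` as a fallback.

```yaml
# namespace-logs.yaml (hypothetical file name)
input:
  subprocess:
    name: kubectl
    args:
      - logs
      # Assumption: ${VAR:default} falls back to "production" when NAMESPACE is unset
      - --namespace=${NAMESPACE:production}
      - --all-containers=true
      - --prefix=true
      - --follow
    codec: lines
    restart_on_exit: true

pipeline:
  processors:
    - mapping: |
        # Each message is one raw log line
        root.log = content().string()
        # Tag events with the same namespace the input reads from
        root.namespace = env("NAMESPACE").or("production")
        root.timestamp = now()

output:
  http_client:
    url: https://logs.company.com/ingest
    verb: POST
```

Under these assumptions, starting the agent with `NAMESPACE=staging expanso-edge run --config namespace-logs.yaml` would collect staging logs, and the same file could be started once per namespace instead of maintaining a separate config for each.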

Next Steps

Basic Collection: Collect from all namespaces
Filter by Log Level: Combine namespace filtering with level filtering
Best Practices: Learn about efficient log handling
