What is Expanso and how does it work?

Expanso is a managed platform for deploying intelligent data pipelines at the edge. It processes data where it is generated, which reduces bandwidth, latency, and cost. You deploy lightweight agents on your infrastructure, build pipelines using our visual builder or YAML, and control everything from a central SaaS platform.

Can I run AI/ML models directly in my data pipelines?

Yes! Expanso supports running ONNX, TensorFlow Lite, and other models as native pipeline steps. Execute low-latency inference on streaming data, enrich events with model outputs (like risk scores), and make decisions at the edge without cloud round-trips.

How many pre-built components are available?

Expanso provides 200+ pre-built components including inputs (Kafka, HTTP, files), processors (transformations, filtering, PII masking, aggregations), and outputs (S3, Snowflake, Datadog, Splunk). Browse the complete catalog in our Component Reference.

Do I need to write code to build pipelines?

No. Use our drag-and-drop visual pipeline builder to create sophisticated pipelines without code. For advanced use cases, you can also write pipelines in YAML or use the Bloblang transformation language for complex data mappings.
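As a rough illustration of a YAML pipeline step with a Bloblang mapping, the sketch below reshapes an incoming JSON event. It assumes a `mapping` processor is available, as in other Bloblang-based tooling, and the field names (`user`, `events`) are hypothetical.

```yaml
pipeline:
  processors:
    - mapping: |
        # Build a new document from selected fields of the input (hypothetical field names)
        root.user_id = this.user.id
        root.email = this.user.email.lowercase()
        root.event_count = this.events.length()
```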

How does Expanso help with data governance and compliance?

Expanso includes built-in governance features: automatic PII detection and masking, policy enforcement at the edge, RBAC, SSO integration, and comprehensive audit trails. Mask sensitive data before it ever leaves your network.

file

Consumes data from files on disk, emitting messages according to a chosen codec.

When to Use

Use file input when you need to:

- Read log files from /var/log/ or application directories
- Process CSV/JSON files dropped into a directory
- Tail files, like tail -f, for continuous streaming

Don't use this if:

- Files are on S3/GCS: use aws_s3 or gcp_cloud_storage
- You're reading from SFTP: use sftp
- You need stdin: use stdin

Common Patterns

Tail Log Files

Watch for new lines continuously:

input:
  file:
    paths: ["/var/log/app/*.log"]
    scanner:
      lines: {}

Process All JSON Files

Read complete files as documents:

input:
  file:
    paths: ["/data/incoming/*.json"]
    scanner:
      json_documents: {}

CSV Processing

Parse CSV with headers:

input:
  file:
    paths: ["/data/*.csv"]
    scanner:
      csv: {}


# Common config fields, showing default values
input:
  label: ""
  file:
    paths: [] # No default (required)
    scanner:
      lines: {}
    auto_replay_nacks: true

# All config fields, showing default values
input:
  label: ""
  file:
    paths: [] # No default (required)
    scanner:
      lines: {}
    delete_on_finish: false
    auto_replay_nacks: true

Metadata

This input adds the following metadata fields to each message:

- path
- mod_time_unix
- mod_time (RFC3339)

You can access these metadata fields using function interpolation.
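A minimal sketch of reading these fields, assuming a Bloblang `mapping` processor and the `metadata()` function are available. It copies the source path and modification time into each message. The output field names (`line`, `source_file`, `modified_at`) are illustrative only. In an interpolated string field, the same lookup would typically be written as `${! metadata("path") }`.

```yaml
input:
  file:
    paths: ["/var/log/app/*.log"]
    scanner:
      lines: {}

pipeline:
  processors:
    - mapping: |
        # Keep the raw log line and attach file metadata to it
        root.line = content().string()
        root.source_file = metadata("path")
        root.modified_at = metadata("mod_time")
```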

Fields

`paths`

A list of paths to consume sequentially. Glob patterns are supported, including super globs (double star).

Type: array
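For illustration, the sketch below combines an ordinary glob with a super glob in one `paths` list. The directory names are hypothetical.

```yaml
input:
  file:
    paths:
      - "/var/log/app/*.log"          # ordinary glob: .log files directly in /var/log/app
      - "/var/log/services/**/*.log"  # super glob: .log files under /var/log/services, including nested subdirectories
    scanner:
      lines: {}
```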

`scanner`

The scanner by which the stream of bytes consumed will be broken out into individual messages. Scanners are useful for processing large sources of data without holding the entirety of it within memory. For example, the csv scanner allows you to process individual CSV rows without loading the entire CSV file in memory at once.

Type: scanner
Default: {"lines":{}}

`delete_on_finish`

Whether to delete input files from the disk once they are fully consumed.

Type: bool
Default: false
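As a sketch of a drop-directory workflow, the configuration below removes each file after it has been read, so that it is not picked up again on a later run. The path is hypothetical. Because deletion cannot be undone, it is worth trying this on copies of your data first.

```yaml
input:
  file:
    paths: ["/data/incoming/*.json"]
    scanner:
      json_documents: {}
    delete_on_finish: true  # remove each file once it has been fully consumed
```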

`auto_replay_nacks`

Whether messages that are rejected (nacked) at the output level should be automatically replayed indefinitely, eventually resulting in back pressure if the cause of the rejections is persistent. If set to false, these messages are deleted instead. Disabling auto replays can greatly improve the memory efficiency of high-throughput streams, because the original shape of the data can be discarded immediately upon consumption and mutation.

Type: bool
Default: true
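A minimal sketch of opting out of replays, for a stream where losing an occasional rejected message is acceptable in exchange for lower memory use. The path is hypothetical.

```yaml
input:
  file:
    paths: ["/var/log/app/*.log"]
    scanner:
      lines: {}
    auto_replay_nacks: false  # rejected messages are dropped rather than replayed
```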

Examples

Read a Bunch of CSVs

To consume a directory of CSV files as structured documents, use a glob pattern and the csv scanner:

input:
  file:
    paths: [ ./data/*.csv ]
    scanner:
      csv: {}
