
nlp_classify_tokens

BETA

This component is mostly stable but breaking changes could still be made outside of major version releases if a fundamental problem with the component is found.

Performs token classification using a Hugging Face 🤗 NLP pipeline with an ONNX Runtime model.

Introduced in version v1.11.0.

# Common config fields, showing default values
label: ""
nlp_classify_tokens:
  name: "" # No default (optional)
  path: /path/to/models/my_model.onnx # No default (required)
  aggregation_strategy: SIMPLE
  ignore_labels: []

Token Classification

Token classification assigns a label to individual tokens in a sentence. This processor runs token classification inference against batches of text data, returning a set of entity classifications corresponding to each input. This component uses Hugot, a library that provides an interface for running Open Neural Network Exchange (ONNX) models and transformer pipelines, with a focus on NLP tasks.

Currently, Expanso Edge implements only a subset of Hugot's pipeline types.

What is a pipeline?

From HuggingFace docs:

A pipeline in 🤗 Transformers is an abstraction referring to a series of steps that are executed in a specific order to preprocess and transform data and return a prediction from a model. Some example stages found in a pipeline might be data preprocessing, feature extraction, and normalization.

warning

While only models in ONNX format are supported, exporting models from other formats to ONNX is both possible and straightforward in most standard ML libraries. For more on this, check out the ONNX conversion docs, or see Hugging Face Optimum for easy model conversion.

Examples

Extract entities like persons, organizations, and locations from text.

pipeline:
  processors:
    - nlp_classify_tokens:
        path: "KnightsAnalytics/distilbert-NER"
        aggregation_strategy: "SIMPLE"
        ignore_labels: ["O"]
# In: "John works at Apple Inc. in New York."
# Out: [
#   {"Entity": "PER", "Score": 0.997136, "Index": 0, "Word": "John", "Start": 0, "End": 4, "IsSubword": false},
#   {"Entity": "ORG", "Score": 0.985432, "Index": 3, "Word": "Apple Inc.", "Start": 14, "End": 24, "IsSubword": false},
#   {"Entity": "LOC", "Score": 0.972841, "Index": 6, "Word": "New York", "Start": 28, "End": 36, "IsSubword": false}
# ]

Fields

name

Name of the Hugot pipeline. Defaults to a random UUID if not set.

Type: string

path

Path to the ONNX model file, or directory containing the model. When downloading (enable_download: true), this becomes the destination and must be a directory.

Type: string

# Examples

path: /path/to/models/my_model.onnx

path: /path/to/models/

enable_download

When enabled, the processor attempts to download an ONNX Runtime compatible model from Hugging Face, as specified in download_options.repository.

Type: bool
Default: false
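
As a sketch of how the download fields combine (the repository name is taken from the examples in this doc; the local directory path is an assumption for illustration):

```yaml
# Hypothetical sketch: fetch the model from Hugging Face on startup.
# With enable_download: true, path must be a directory, which becomes
# the download destination.
pipeline:
  processors:
    - nlp_classify_tokens:
        path: /var/lib/models/distilbert-NER/ # destination directory (assumed)
        enable_download: true
        download_options:
          repository: KnightsAnalytics/distilbert-NER
```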

download_options

Options used to download a model directly from Hugging Face. Before the model is downloaded, validation occurs to ensure the remote repository contains both an .onnx model file and a tokenizer.json file.

Type: object

download_options.repository

The name of the Hugging Face model repository.

Type: string

# Examples

repository: KnightsAnalytics/distilbert-NER

repository: KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english

repository: sentence-transformers/all-MiniLM-L6-v2

download_options.onnx_filepath

Filepath of the ONNX model within the repository. Only needed when multiple .onnx files exist.

Type: string
Default: "model.onnx"

# Examples

onnx_filepath: onnx/model.onnx

onnx_filepath: onnx/model_quantized.onnx

onnx_filepath: onnx/model_fp16.onnx

aggregation_strategy

The aggregation strategy to use for the token classification pipeline.

Type: string
Default: "SIMPLE"
Options: SIMPLE, NONE.
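
As a sketch, a config keeping raw per-token predictions rather than aggregated entities would set the strategy to NONE (assumption: with NONE, subword tokens are returned individually instead of being merged into entity spans):

```yaml
pipeline:
  processors:
    - nlp_classify_tokens:
        path: /path/to/models/my_model.onnx
        aggregation_strategy: "NONE"
```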

ignore_labels

Labels to ignore in the token classification pipeline.

Type: array
Default: []

# Examples

ignore_labels:
- O
- MISC