nlp_classify_text

BETA

This component is mostly stable but breaking changes could still be made outside of major version releases if a fundamental problem with the component is found.

Performs text classification using a Hugging Face 🤗 NLP pipeline with an ONNX Runtime model.

Introduced in version v1.11.0.

# Common config fields, showing default values
label: ""
nlp_classify_text:
  name: "" # No default (optional)
  path: /path/to/models/my_model.onnx # No default (required)
  aggregation_function: SOFTMAX
  multi_label: false

Text Classification

Text classification is the task of assigning a label or class to a given text. Use cases include sentiment analysis, natural language inference, and assessing grammatical correctness. This processor runs text-classification inference against batches of text data and returns a labelled classification for each input. It uses Hugot, a library that provides an interface for running Open Neural Network Exchange (ONNX) models and transformer pipelines, with a focus on NLP tasks.

Currently, Expanso Edge only implements:

What is a pipeline?

From the Hugging Face docs:

A pipeline in 🤗 Transformers is an abstraction referring to a series of steps that are executed in a specific order to preprocess and transform data and return a prediction from a model. Some example stages found in a pipeline might be data preprocessing, feature extraction, and normalization.

warning

Only models in ONNX format are supported, but exporting existing formats to ONNX is both possible and straightforward in most standard ML libraries. For more on this, check out the ONNX conversion docs. Alternatively, see Hugging Face Optimum for easy model conversion.

Examples

Here, we load the Cohee/distilbert-base-uncased-go-emotions-onnx model from the local directory at models/coheedistilbert_base_uncased_go_emotions_onnx. The processor returns a single-label output: the emotion with the highest score for the text.

pipeline:
  processors:
    - nlp_classify_text:
        name: classify-incoming-data
        path: "models/coheedistilbert_base_uncased_go_emotions_onnx"

# In: "I'm super excited for this example!"
# Out: [{"Label":"excitement","Score":0.34134513}]

Fields

name

Name of the Hugot pipeline. Defaults to a random UUID if not set.

Type: string

path

Path to the ONNX model file, or directory containing the model. When downloading (enable_download: true), this becomes the destination and must be a directory.

Type: string

# Examples

path: /path/to/models/my_model.onnx

path: /path/to/models/

enable_download

When enabled, attempts to download an ONNX Runtime compatible model from the Hugging Face repository specified in download_options.repository.

Type: bool
Default: false

download_options

Options used to download a model directly from Hugging Face. Before the model is downloaded, the remote repository is validated to ensure it contains both an .onnx file and a tokenizers.json file.

Type: object

download_options.repository

The name of the Hugging Face model repository.

Type: string

# Examples

repository: KnightsAnalytics/distilbert-NER

repository: KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english

repository: sentence-transformers/all-MiniLM-L6-v2

download_options.onnx_filepath

Filepath of the ONNX model within the repository. Only needed when multiple .onnx files exist.

Type: string
Default: "model.onnx"

# Examples

onnx_filepath: onnx/model.onnx

onnx_filepath: onnx/model_quantized.onnx

onnx_filepath: onnx/model_fp16.onnx
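
The download fields above combine as in the following minimal sketch. It assumes that download_options nests under the processor as the field names suggest, that /path/to/models/ is a writable directory (a placeholder path), and that the chosen repository keeps its model at the default model.onnx location. None of these is confirmed here, so check the repository contents before relying on it.

```yaml
pipeline:
  processors:
    - nlp_classify_text:
        # Destination directory for the downloaded model (placeholder path).
        path: /path/to/models/
        enable_download: true
        download_options:
          repository: KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english
          # Assumed location; only needed when the repository holds several .onnx files.
          onnx_filepath: model.onnx
```

With enable_download set, path is treated as the download destination, so it must point at a directory rather than a single .onnx file.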

aggregation_function

The aggregation function to use for the text classification pipeline.

Type: string
Default: "SOFTMAX"
Options: SOFTMAX, SIGMOID.
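
For reference, the two options are named after the standard softmax and sigmoid functions. Assuming the processor applies the textbook forms to the raw model outputs (logits) $z_1, \dots, z_K$ for $K$ labels, the score $s_i$ of label $i$ would be as follows. The symbols are introduced here for illustration only.

```latex
\[
\text{SOFTMAX:}\quad s_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}},
\qquad
\text{SIGMOID:}\quad s_i = \frac{1}{1 + e^{-z_i}}
\]
```

Softmax scores sum to 1 across all labels, which suits mutually exclusive classes. Sigmoid scores each label independently in the range (0, 1), so several labels can score highly at once.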

multi_label

Whether the text classification pipeline should return multiple labels. If false, only the label and score pair with the highest score is returned.

Type: bool
Default: false
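
As a minimal sketch, a multi-label setup might look like the following. It reuses the model directory from the example above. Pairing multi_label with SIGMOID is a common choice for multi-label classification and is an assumption here, not a requirement stated by this component.

```yaml
pipeline:
  processors:
    - nlp_classify_text:
        path: "models/coheedistilbert_base_uncased_go_emotions_onnx"
        # Score each label independently instead of normalizing across labels.
        aggregation_function: SIGMOID
        # Return multiple labels rather than only the top-scoring one.
        multi_label: true
```

With this configuration, the output for each input would be expected to contain several Label and Score entries rather than one.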