Database Best Practices

Performance tips, optimization strategies, and troubleshooting guidance for database integration with Expanso Edge.

Performance Optimization

1. Always Use Batching for Writes

Batching groups many records into a single INSERT statement, typically yielding 10-100x higher write throughput than inserting rows one at a time.

Do:

```yaml
output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:pass@localhost:5432/db?sslmode=disable
    table: events
    columns: ['data']
    args_mapping: 'root = [ this ]'
    batching:
      count: 500   # Flush after 500 records
      period: 10s  # Or every 10 seconds, whichever comes first
```

Don't:

```yaml
output:
  sql_insert:
    # No batching - every message becomes its own INSERT. Slow!
    driver: postgres
    table: events
    columns: ['data']
    args_mapping: 'root = [ this ]'
```

Performance impact:

| Batch Size      | Inserts/sec | Improvement |
| --------------- | ----------- | ----------- |
| 1 (no batching) | 100         | Baseline    |
| 100             | 5,000       | 50x faster  |
| 1,000           | 20,000      | 200x faster |

2. Use Local Databases for Edge Storage

Use SQLite for edge-local data instead of remote cloud databases.

Do:

```yaml
output:
  sql_insert:
    driver: sqlite
    dsn: 'file:///var/lib/expanso/cache.db?_journal_mode=WAL'
    table: events
    columns: ['data']
    args_mapping: 'root = [ this ]'
```

Benefits:

  • Sub-millisecond queries (vs 50-200ms for cloud)
  • Works offline
  • No network costs
  • Scales to millions of rows per database
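
Data written locally can also be queried back without leaving the node. A minimal sketch, assuming an events table with data and created_at columns (the cache.db path is illustrative):

```yaml
input:
  sql_select:
    driver: sqlite
    dsn: 'file:///var/lib/expanso/cache.db?_journal_mode=WAL'
    table: events
    columns: ['data']
    # SQLite date functions can run inside the where clause
    where: "created_at > datetime('now', '-1 hour')"
```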

3. Configure Connection Pooling

Properly configure connection pools to balance performance and resource usage.

Do:

```yaml
output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:pass@localhost:5432/db?sslmode=disable
    table: events
    columns: ['data']
    args_mapping: 'root = [ this ]'
    conn_max_open: 10        # Max 10 concurrent connections
    conn_max_idle: 2         # Keep 2 idle connections
    conn_max_idle_time: 5m   # Close idle connections after 5 min
    conn_max_life_time: 30m  # Recycle connections after 30 min
```

For SQLite:

```yaml
conn_max_open: 1  # SQLite works best with a single writer
```

For cloud databases:

```yaml
conn_max_open: 20  # Cloud databases can handle more concurrent connections
conn_max_idle: 5
```

4. Initialize Tables on Startup

Use init_statement to ensure tables exist before writing data.

Do:

```yaml
output:
  sql_insert:
    driver: sqlite
    dsn: 'file:///var/lib/expanso/events.db?_journal_mode=WAL'
    table: events
    columns: ['event_id', 'data']
    args_mapping: 'root = [ this.id, this ]'
    init_statement: |
      CREATE TABLE IF NOT EXISTS events (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        event_id TEXT NOT NULL UNIQUE,
        data TEXT NOT NULL,
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP
      );
      CREATE INDEX IF NOT EXISTS idx_created_at ON events (created_at);
```

5. Use SQLite WAL Mode

Always enable Write-Ahead Logging for SQLite.

Do:

```yaml
dsn: 'file:///var/lib/expanso/db.db?_journal_mode=WAL'
```

Don't:

```yaml
dsn: 'file:///var/lib/expanso/db.db'  # No WAL mode
```

Benefits:

  • Allows concurrent reads during writes
  • Better performance
  • More reliable crash recovery
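
Other connection-string options can be combined with WAL. The parameter below comes from the common mattn/go-sqlite3 driver and may not apply if Expanso Edge uses a different SQLite driver; `_busy_timeout` waits up to the given number of milliseconds for a lock instead of failing immediately:

```yaml
dsn: 'file:///var/lib/expanso/db.db?_journal_mode=WAL&_busy_timeout=5000'
```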

Database Selection

Use the Right Database for Each Use Case

| Use Case          | Best Database    | Why                                 |
| ----------------- | ---------------- | ----------------------------------- |
| Edge caching      | SQLite           | Fast, embedded, offline-capable     |
| Local analytics   | SQLite           | No network, sub-millisecond queries |
| Offline queueing  | SQLite           | Reliable, no dependencies           |
| Cloud analytics   | PostgreSQL       | Advanced features, JSON support     |
| Cloud operational | MySQL/PostgreSQL | Proven, widely supported            |
| Time-series       | ClickHouse       | Optimized for analytics             |
| Data warehouse    | Snowflake        | Cloud-scale analytics               |
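
These choices can also be combined in one pipeline. As a sketch (table names and DSNs are illustrative, and `broker` with the `fan_out` pattern is the Benthos-style multiplexing output), each event is written both to a local SQLite cache and to a cloud PostgreSQL database:

```yaml
output:
  broker:
    pattern: fan_out
    outputs:
      - sql_insert:       # Local copy for fast edge queries
          driver: sqlite
          dsn: 'file:///var/lib/expanso/cache.db?_journal_mode=WAL'
          table: events
          columns: ['data']
          args_mapping: 'root = [ this ]'
      - sql_insert:       # Cloud copy for central analytics
          driver: postgres
          dsn: postgres://user:pass@cloud-db:5432/analytics?sslmode=require
          table: events
          columns: ['data']
          args_mapping: 'root = [ this ]'
```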

Connection Management

Handle Connection Failures Gracefully

Use retry logic for transient failures:

```yaml
output:
  retry:
    max_retries: 5
    backoff:
      initial_interval: 1s
      max_interval: 30s
    output:
      sql_insert:
        driver: postgres
        dsn: postgres://user:pass@db:5432/mydb?sslmode=require
        table: events
        columns: ['data']
        args_mapping: 'root = [ this ]'
```
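
For outages that outlast the retry window, a `fallback` output (Benthos-style; outputs are tried in order) can divert events to local SQLite rather than dropping them. A sketch; the overflow path and table are illustrative:

```yaml
output:
  fallback:
    - sql_insert:       # Preferred: remote database
        driver: postgres
        dsn: postgres://user:pass@db:5432/mydb?sslmode=require
        table: events
        columns: ['data']
        args_mapping: 'root = [ this ]'
    - sql_insert:       # Fallback: local overflow store
        driver: sqlite
        dsn: 'file:///var/lib/expanso/overflow.db?_journal_mode=WAL'
        table: events
        columns: ['data']
        args_mapping: 'root = [ this ]'
```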

Test Connections on Startup

Verify database connectivity:

```yaml
input:
  sql_select:
    driver: postgres
    dsn: postgres://user:pass@db:5432/mydb?sslmode=require
    table: users
    columns: ['id']
    init_verify_conn: true  # Ping the database on startup
```

Schema Management

Use Versioned Migrations

Track schema changes with versioned migration files:

```yaml
output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:pass@localhost:5432/db?sslmode=disable
    table: events
    columns: ['data']
    args_mapping: 'root = [ this ]'
    init_files:
      - /etc/expanso/migrations/001_initial_schema.sql
      - /etc/expanso/migrations/002_add_indexes.sql
      - /etc/expanso/migrations/003_add_columns.sql
```

Handle Upserts Properly

Use database-specific upsert syntax for conflict handling:

PostgreSQL:

```yaml
suffix: |
  ON CONFLICT (id) DO UPDATE SET
    updated_at = EXCLUDED.updated_at,
    value = EXCLUDED.value
```

MySQL:

```yaml
suffix: |
  ON DUPLICATE KEY UPDATE
    updated_at = VALUES(updated_at),
    value = VALUES(value)
```

SQLite:

```yaml
suffix: |
  ON CONFLICT(id) DO UPDATE SET
    updated_at = excluded.updated_at,
    value = excluded.value
```
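
In context, `suffix` is appended to the generated INSERT statement. A PostgreSQL sketch; the kv_store table and its columns are illustrative, and the table needs a unique constraint on id for ON CONFLICT to fire:

```yaml
output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:pass@localhost:5432/db?sslmode=disable
    table: kv_store
    columns: ['id', 'value', 'updated_at']
    args_mapping: 'root = [ this.id, this.value, now() ]'
    suffix: |
      ON CONFLICT (id) DO UPDATE SET
        updated_at = EXCLUDED.updated_at,
        value = EXCLUDED.value
```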

Security

Never Put Credentials in Code

Don't:

```yaml
dsn: postgres://admin:SuperSecret123@db:5432/prod
```

Do - Use environment variables:

```yaml
dsn: postgres://${DB_USER}:${DB_PASS}@db:5432/prod
```

Do - Use AWS Secrets Manager:

```yaml
secret_name: prod/db/credentials
region: us-east-1
```

Use SSL for Cloud Databases

Do:

```yaml
dsn: postgres://user:pass@cloud-db:5432/prod?sslmode=require
```

Don't:

```yaml
dsn: postgres://user:pass@cloud-db:5432/prod?sslmode=disable
```

Grant Minimal Permissions

Don't use admin/root accounts. Create dedicated users with minimal permissions:

```sql
-- PostgreSQL example
CREATE USER expanso_writer WITH PASSWORD 'secure_password';
GRANT INSERT, SELECT ON TABLE events TO expanso_writer;
```

```sql
-- MySQL example
CREATE USER 'expanso_writer'@'%' IDENTIFIED BY 'secure_password';
GRANT INSERT, SELECT ON mydb.events TO 'expanso_writer'@'%';
```

Monitoring

Log Slow Queries

Enable query logging for troubleshooting:

PostgreSQL:

```sql
ALTER DATABASE mydb SET log_min_duration_statement = 1000;  -- Log queries taking > 1s
```

MySQL:

```sql
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;  -- Log queries taking > 1s
```

Monitor Connection Pool Usage

Watch for connection pool exhaustion:

  • Check conn_max_open vs actual usage
  • Monitor connection wait times
  • Adjust pool sizes based on load

Track Database Performance

Monitor key metrics:

  • Query latency (p50, p95, p99)
  • Connection pool utilization
  • Insert throughput
  • Error rates
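
Assuming the runtime supports a Benthos-style metrics section (worth verifying against the Expanso Edge reference), the pipeline-side half of these metrics can be scraped directly; a minimal sketch:

```yaml
metrics:
  prometheus: {}  # Exposes pipeline counters and latency histograms for scraping
```

Database-side latency and connection counts still need to be observed through the database's own monitoring tools.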

Troubleshooting

Connection Refused

Symptoms:

dial tcp: connection refused

Solutions:

  1. Verify database is running
  2. Check firewall rules
  3. Verify host and port are correct
  4. For PostgreSQL, check pg_hba.conf and postgresql.conf
  5. For MySQL, check bind-address in my.cnf

Too Many Connections

Symptoms:

too many connections
remaining connection slots are reserved

Solutions:

  1. Reduce conn_max_open in pipeline config
  2. Increase database max_connections setting
  3. Check for connection leaks in other applications
  4. Use connection pooling properly

Slow Inserts

Symptoms:

  • High insert latency
  • Pipeline backpressure
  • Growing buffer sizes

Solutions:

  1. Enable batching:

    batching:
      count: 500
      period: 10s
  2. For SQLite, use WAL mode:

    dsn: 'file:///path/to/db.db?_journal_mode=WAL'
  3. Add appropriate indexes (but not too many)

  4. Use connection pooling:

    conn_max_open: 10
    conn_max_idle: 2
  5. Consider partitioning for very large tables


Database Locked (SQLite)

Symptoms:

database is locked

Solutions:

  1. Enable WAL mode (most important):

    dsn: 'file:///path/to/db.db?_journal_mode=WAL'
  2. Limit concurrent writers:

    conn_max_open: 1
  3. Use smaller batches to reduce transaction duration

  4. Check for long-running queries in other processes


SSL/TLS Errors

Symptoms:

x509: certificate signed by unknown authority
SSL is not enabled on the server

Solutions:

For development (PostgreSQL):

```yaml
dsn: postgres://user:pass@localhost:5432/db?sslmode=disable
```

For development (MySQL):

```yaml
dsn: user:pass@tcp(localhost:3306)/db?tls=skip-verify
```

For production: Use proper SSL certificates and `sslmode=require`.
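
For stricter verification than `require`, the Go lib/pq Postgres driver also accepts `sslmode=verify-full` together with a CA bundle path (the certificate path below is a placeholder):

```yaml
dsn: postgres://user:pass@cloud-db:5432/prod?sslmode=verify-full&sslrootcert=/etc/ssl/certs/db-ca.pem
```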


Performance Comparison

SQLite vs Cloud Database (Latency)

| Operation       | SQLite (Local) | Cloud DB | Improvement |
| --------------- | -------------- | -------- | ----------- |
| Simple SELECT   | 0.1ms          | 50ms     | 500x faster |
| JOIN query      | 1ms            | 150ms    | 150x faster |
| INSERT (single) | 0.5ms          | 25ms     | 50x faster  |

Batching Impact

| Batch Size      | Throughput | Cloud Cost Reduction |
| --------------- | ---------- | -------------------- |
| 1 (no batching) | 100/sec    | Baseline             |
| 100             | 5,000/sec  | 80% reduction        |
| 1,000           | 20,000/sec | 95% reduction        |

Next Steps