
K3s Log Collection Best Practices

Configuration recommendations for reliable and efficient K3s log collection.

Always Add Node Identifiers

Include node context in every log to identify the source edge location:

pipeline:
  processors:
    - mapping: |
        root.node_id = env("NODE_ID")
        root.location = env("LOCATION")
        root.cluster = env("CLUSTER_NAME")

Why: Essential for filtering logs by location when managing 100+ edge sites.

Set environment variables:

export NODE_ID="edge-site-42"
export LOCATION="chicago"
export CLUSTER_NAME="k3s-chicago"
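With those variables exported, every record passing through the mapping above carries its site identity. A hypothetical enriched record (the message is illustrative; the identifier values match the exports above):

{"level": "error", "msg": "connection refused", "node_id": "edge-site-42", "location": "chicago", "cluster": "k3s-chicago"}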

Use Batching for Cloud Destinations

Batch logs before sending to S3, Elasticsearch, or HTTP endpoints:

output:
  aws_s3:
    bucket: logs
    batching:
      count: 1000   # Batch size
      period: 1m    # Max wait time

Why: Reduces API calls by up to 1000x (one request per 1,000 logs instead of one per log), lowering costs and improving throughput.

Recommended batch sizes:

  • S3: 1000-5000 logs or 1-5 minutes
  • Elasticsearch: 100-500 logs or 10-30 seconds
  • HTTP: 100-1000 logs or 30-60 seconds
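
For Elasticsearch, for example, a minimal sketch using the mid-range of the recommendation above (this assumes an elasticsearch output that accepts the same batching block as aws_s3; the urls and index values are placeholders):

output:
  elasticsearch:
    urls:
      - https://es.company.com:9200   # Placeholder endpoint
    index: k3s-logs                   # Placeholder index
    batching:
      count: 500
      period: 30s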

Set restart_on_exit: true

Always enable auto-restart for the kubectl subprocess:

input:
  subprocess:
    name: kubectl
    restart_on_exit: true   # Auto-restart if kubectl exits

Why: Ensures logs keep flowing if the kubectl process crashes or exits unexpectedly.
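
A fuller sketch of the input, combining auto-restart with an explicit kubectl command (the namespace flag and label selector are assumptions; point them at whatever you actually tail):

input:
  subprocess:
    name: kubectl
    args:
      - logs
      - --follow
      - --all-containers=true
      - --prefix                 # Prefix each line with pod/container name
      - --namespace=production   # Assumed namespace
      - -l
      - app=my-app               # Hypothetical label selector
    restart_on_exit: true        # Auto-restart if kubectl exits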

Handle Large Log Messages

Set maximum buffer size to prevent memory issues:

input:
  subprocess:
    name: kubectl
    max_buffer: 1048576   # 1MB max per log line

Why: Some applications generate very large log messages (stack traces, JSON payloads). Without a limit, these can cause memory issues.

Recommended sizes:

  • Standard logs: 524288 (512KB)
  • Large logs: 1048576 (1MB)
  • Very large: 2097152 (2MB)

Configure RBAC Permissions

Create a service account with minimal required permissions:

# Create service account
kubectl create serviceaccount expanso-logs

# Create role with log read permissions
kubectl create clusterrole log-reader \
  --verb=get,list,watch \
  --resource=pods,pods/log

# Bind role to service account
kubectl create clusterrolebinding expanso-logs \
  --clusterrole=log-reader \
  --serviceaccount=default:expanso-logs

Why: Follows the principle of least privilege. Expanso only needs read access to logs, not write access to cluster resources.

Use the service account:

kubectl --as=system:serviceaccount:default:expanso-logs logs <pod-name> --follow
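
If you manage RBAC declaratively instead, the same grants as manifests (a sketch mirroring the names and namespace used in the commands above):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: expanso-logs
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: log-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: expanso-logs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: log-reader
subjects:
  - kind: ServiceAccount
    name: expanso-logs
    namespace: default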

Filter Before Sending

Apply filters early in the pipeline to reduce downstream processing:

pipeline:
  processors:
    # Parse and filter FIRST
    - mapping: |
        root = this.parse_json().catch(deleted())

    # Only keep errors; drop everything else
    - mapping: |
        root = if this.level != "error" { deleted() }

    # Then add metadata (only for logs that pass the filter)
    - mapping: |
        root.node_id = env("NODE_ID")

Why: Filtering early reduces CPU, memory, and network usage for logs that will be discarded anyway.
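
For example, given these two hypothetical records, only the first survives the filter above and gets enriched with node metadata; the second is dropped before any further processing or network cost:

{"level": "error", "msg": "upstream timeout"}
{"level": "info", "msg": "health check passed"}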

Monitor Log Pipeline Health

Add a metrics output to track pipeline performance:

output:
  broker:
    pattern: fan_out
    outputs:
      - aws_s3:
          bucket: logs

      - http_client:
          url: https://metrics.company.com
          verb: POST
        processors:
          - metric:
              type: counter
              name: logs_processed
              labels:
                node_id: ${NODE_ID}

Why: Detect issues like log collection stopping, high error rates, or performance degradation.

Handle High-Volume Namespaces

For namespaces with very high log volume, use separate pipelines:

# High-volume namespace: aggressive filtering
expanso-edge run --config production-errors-only.yaml &

# Low-volume namespaces: collect everything
expanso-edge run --config staging-all-logs.yaml &

Why: Prevents high-volume namespaces from overwhelming the pipeline or hitting rate limits.
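
A sketch of what the high-volume config might contain (the file name matches the command above; the namespace flag, label selector, and error-only filter are assumptions about your setup):

# production-errors-only.yaml
input:
  subprocess:
    name: kubectl
    args:
      - logs
      - --follow
      - --all-containers=true
      - --namespace=production
      - -l
      - app=api              # Hypothetical selector for the noisy workload
    restart_on_exit: true

pipeline:
  processors:
    - mapping: |
        root = this.parse_json().catch(deleted())
        root = if this.level != "error" { deleted() }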

Use Connection Pooling for HTTP Outputs

Configure connection pooling for HTTP destinations:

output:
  http_client:
    url: https://logs.company.com/ingest
    max_in_flight: 64   # Parallel requests
    batching:
      count: 500
      period: 30s

Why: Improves throughput for HTTP-based log ingestion endpoints.

Troubleshooting Tips

Logs not appearing:

# Verify kubectl works
kubectl get pods --all-namespaces

# Check Expanso logs
expanso-edge run --config k3s-logs.yaml --log.level=debug

kubectl process exits:

  • Check restart_on_exit: true is set
  • Verify kubeconfig is valid
  • Check RBAC permissions (one quick check is shown below)
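
One way to confirm the service account can actually read logs (the --subresource flag requires a reasonably recent kubectl):

kubectl auth can-i get pods --subresource=log \
  --as=system:serviceaccount:default:expanso-logs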

High memory usage:

  • Reduce max_buffer size
  • Add filtering to reduce log volume
  • Increase batching period

Performance issues:

  • Increase batch sizes
  • Add filtering earlier in pipeline
  • Use multiple parallel pipelines for different namespaces

Next Steps