types.OrchestratorConfig

api object
auth object

Auth configures authentication for the API

token string
listen_addr string

Listen address. Defaults to localhost:9010. An empty string disables the API server.

data_dir string

Core data directory - all subdirectories are managed automatically

evaluation_broker object
initial_retry_delay integer<int64>

InitialRetryDelay is the delay before re-enqueuing a Nacked evaluation for the first time. Defaults to 5 seconds if not set. Set a lower value (e.g., 100ms) for tests to avoid blocking subsequent evaluations for the same job.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

max_retry_count integer

MaxRetryCount specifies the maximum number of times an evaluation can be retried before being marked as failed.

visibility_timeout integer<int64>

VisibilityTimeout specifies how long an evaluation can be claimed before it's returned to the queue.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]
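The values listed for the integer<int64> duration fields include 1000000000 and 60000000000, which match one second and one minute expressed in nanoseconds. Assuming every duration field in this schema is a nanosecond count (the schema does not state the unit explicitly), a small helper keeps such numbers readable. The helper name `to_nanos` and the `max_retry_count` and `visibility_timeout` values below are illustrative, not documented defaults.

```python
# Assumption: int64 duration fields are nanosecond counts.
NANOSECOND = 1
MICROSECOND = 1_000
MILLISECOND = 1_000_000
SECOND = 1_000_000_000
MINUTE = 60 * SECOND
HOUR = 60 * MINUTE


def to_nanos(value: float, unit: int) -> int:
    """Convert a human-readable amount (e.g. 100 ms) to integer nanoseconds."""
    return int(round(value * unit))


# A test-oriented evaluation_broker section: a short retry delay so a Nacked
# evaluation does not block later evaluations for the same job.
evaluation_broker = {
    "initial_retry_delay": to_nanos(100, MILLISECOND),
    "max_retry_count": 3,                        # hypothetical value
    "visibility_timeout": to_nanos(60, SECOND),  # hypothetical value
}
```

Under this assumption, the documented 5 second default for initial_retry_delay would be written as `to_nanos(5, SECOND)`.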

log object
format string

Log format: json, text

level string

Log level: trace, debug, info, warn, error - defaults to info

name string

Node name - defaults to hostname

name_provider string

Name provider for auto-generation: "cloud", "hostname", "uuid", "machine-id"

node_manager object
connected_after integer<int64>

ConnectedAfter is how long a node must be stable in Connecting state before being promoted to Connected. This provides flapping protection - a node that keeps crashing and restarting will reset this timer on each handshake. Default: 30s. Must be less than disconnect_timeout.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

disconnect_timeout integer<int64>

DisconnectTimeout is how long to wait without heartbeats before marking a node as disconnected. This value is sent to edge nodes during handshake so both sides use the same threshold. Default: 90s. Increase for unreliable networks.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

heartbeat_interval integer<int64>

HeartbeatInterval is how often edge nodes should send heartbeats. This value is sent to edge nodes during handshake. Default: 15s.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

lost_timeout integer<int64>

LostTimeout is how long a node must remain disconnected before marking it as lost. Default: 1h. Must be greater than disconnect_timeout. Lost nodes are removed from scheduling and become eligible for garbage collection.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]
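The node_manager descriptions state two ordering constraints: connected_after must be less than disconnect_timeout, and lost_timeout must be greater than disconnect_timeout. A minimal sketch of a pre-flight check follows. It assumes durations are nanosecond counts and that unset fields fall back to the documented defaults; `check_node_manager` is a hypothetical helper, not part of the orchestrator.

```python
SECOND = 1_000_000_000  # assumption: durations are nanosecond counts
HOUR = 3600 * SECOND

# Defaults as documented in the field descriptions.
NODE_MANAGER_DEFAULTS = {
    "connected_after": 30 * SECOND,
    "disconnect_timeout": 90 * SECOND,
    "heartbeat_interval": 15 * SECOND,
    "lost_timeout": 1 * HOUR,
}


def check_node_manager(cfg: dict) -> list:
    """Return a list of violated ordering constraints (empty if none)."""
    # Assumption of this sketch: missing keys take the documented defaults.
    merged = {**NODE_MANAGER_DEFAULTS, **cfg}
    problems = []
    if merged["connected_after"] >= merged["disconnect_timeout"]:
        problems.append("connected_after must be less than disconnect_timeout")
    if merged["lost_timeout"] <= merged["disconnect_timeout"]:
        problems.append("lost_timeout must be greater than disconnect_timeout")
    return problems
```

For example, raising connected_after to two minutes without also raising disconnect_timeout would be reported by this check.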

scheduler object
execution_limit_backoff integer<int64>

ExecutionLimitBackoff is the duration to wait before creating a new scheduling run when hitting execution limits.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

max_executions_per_run integer

MaxExecutionsPerRun limits the total number of scheduler operations per evaluation (including creating, stopping, replacing, and failing executions). Set to 0 for no limit.

queue_backoff integer<int64>

QueueBackoff specifies the time to wait before retrying a failed job.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

queue_timeout_never integer<int64>

DefaultQueueTimeoutNeverRestart is the default queue timeout for jobs with "never" restart policy. Batch jobs get fast feedback when no matching nodes exist. Set to 0 for no default (wait indefinitely).

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

queue_timeout_other integer<int64>

DefaultQueueTimeoutOtherPolicies is the default queue timeout for "always" or "on-failure" restart policies. Services wait for matching nodes (e.g., auto-scaling scenarios). Set to 0 for no default (wait indefinitely).

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]
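The two queue timeout fields are selected by a job's restart policy, and 0 means no default. The selection can be sketched as below. This is an illustration of the described behavior, not the scheduler's actual code; `default_queue_timeout` is a hypothetical name, and the 5 minute value is an example rather than a documented default.

```python
from typing import Optional

MINUTE = 60_000_000_000  # assumption: durations are nanosecond counts


def default_queue_timeout(restart_policy: str, scheduler: dict) -> Optional[int]:
    """Pick the default queue timeout for a job; None means wait indefinitely."""
    if restart_policy == "never":
        timeout = scheduler.get("queue_timeout_never", 0)
    elif restart_policy in ("always", "on-failure"):
        timeout = scheduler.get("queue_timeout_other", 0)
    else:
        raise ValueError(f"unknown restart policy: {restart_policy!r}")
    # A value of 0 is documented as "no default (wait indefinitely)".
    return timeout or None


scheduler = {
    "queue_timeout_never": 5 * MINUTE,  # batch jobs fail fast (example value)
    "queue_timeout_other": 0,           # services wait for matching nodes
}
```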

worker_count integer

WorkerCount specifies the number of concurrent workers for job scheduling.

shutdown_timeout integer<int64>

ShutdownTimeout is the maximum time to wait for graceful shutdown.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

store object
gc object
deleted_jobs_retention integer<int64>

DeletedJobsRetention is how long to keep soft-deleted jobs before permanent deletion. Default: 7 days. Increase for longer audit history.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

lost_nodes_retention integer<int64>

LostNodesRetention is how long to keep lost node records after they're marked as lost. Default: 7 days. Measured from when the node transitions to Lost state. Independent of node_manager.lost_timeout.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

terminal_evaluations_retention integer<int64>

TerminalEvaluationsRetention is how long to keep terminal evaluation records (complete/failed/cancelled). Default: 1 day. Evaluations are short-lived scheduling decisions.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

terminal_executions_retention integer<int64>

TerminalExecutionsRetention is how long to keep terminal execution records (complete/failed/stopped). Default: 7 days. Increase for longer execution history.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]
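To make the retention windows concrete, the sketch below checks whether a record has outlived its retention period, using the documented defaults. It assumes durations are nanosecond counts and that a record becomes eligible once its age reaches the retention value; the exact comparison used by the garbage collector is not specified here, and `past_retention` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

DAY_NS = 24 * 3600 * 1_000_000_000  # assumption: durations are nanoseconds

# Defaults as documented in the field descriptions.
GC_DEFAULTS = {
    "deleted_jobs_retention": 7 * DAY_NS,
    "lost_nodes_retention": 7 * DAY_NS,
    "terminal_evaluations_retention": 1 * DAY_NS,
    "terminal_executions_retention": 7 * DAY_NS,
}


def past_retention(since: datetime, retention_ns: int, now: datetime) -> bool:
    """True if the record entered its final state at least retention_ns ago."""
    # timedelta has microsecond resolution, so drop the sub-microsecond part.
    return now - since >= timedelta(microseconds=retention_ns // 1000)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
two_days_ago = now - timedelta(days=2)
```

With these defaults, an evaluation that became terminal two days ago is past its 1 day retention, while an execution that became terminal at the same time is still within its 7 day window.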

streaming_proxy object

Streaming proxy configuration

read_timeout_seconds integer

ReadTimeoutSeconds is the read timeout in seconds for connections to the remote log server before the connection is closed. A value of 0 means no timeout.

remote_endpoint string

RemoteEndpoint is the endpoint of the log server to proxy to, typically a Loki instance.

remote_token string

RemoteToken is the authentication token used to access the log server.

telemetry object

Simplified telemetry configuration

authentication object

Optional authentication configuration for telemetry exporters

namespace string

Namespace is used to group telemetry data for all nodes in a namespace.

token string

Token is the authentication token or password.

type string

Type is the authentication type. Currently only "Basic" is supported.

do_not_track boolean

DoNotTrack disables telemetry collection (default: false, meaning telemetry is enabled).

endpoint string

Endpoint is the telemetry collector endpoint. It should not include a path; use EndpointPath for that. Examples: "localhost:4317", "https://collector.example.com:4318"

endpoint_path string

EndpointPath is the path prefix for collectors that serve /v1/metrics or similar under a path. The path cannot be included in Endpoint directly, so set it here instead.
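Because the collector address and its path live in separate fields, a full collector URL has to be split before it goes into the configuration. One way to do that is sketched below. `split_endpoint` is a hypothetical helper, and the `/otel` path is an invented example.

```python
def split_endpoint(raw: str) -> tuple:
    """Split a collector address into (endpoint, endpoint_path)."""
    scheme, sep, rest = raw.partition("://")
    if not sep:
        # No scheme present, e.g. "localhost:4317".
        rest = raw
    host, slash, path = rest.partition("/")
    endpoint = f"{scheme}://{host}" if sep else host
    # Keep a leading slash on the path; treat a bare trailing slash as no path.
    endpoint_path = f"/{path}".rstrip("/") if slash else ""
    return endpoint, endpoint_path


endpoint, endpoint_path = split_endpoint("https://collector.example.com:4318/otel")
telemetry = {"endpoint": endpoint, "endpoint_path": endpoint_path}
```

Addresses without a path, such as "localhost:4317", pass through unchanged with an empty endpoint_path.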

export_interval integer<int64>

ExportInterval is how often metrics are exported (default: 30s).

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

headers object

Headers are optional headers for authentication

property name* string
include_go_metrics boolean

IncludeGoMetrics enables collection of Go runtime metrics (GC, goroutines, etc.).

insecure boolean

Insecure disables TLS verification (for development/testing).

process_metrics_interval integer<int64>

ProcessMetricsInterval is how often process metrics are collected (default: 15s). Process metrics (CPU, memory, file descriptors) are always enabled.

Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000]

protocol string

Protocol specifies the export protocol: "grpc" or "http"

resource_attributes object

ResourceAttributes are additional attributes to include with all telemetry data

property name* string
transport object
address string
credentials_path string
insecure boolean
listen_addr string

ListenAddr, when set, runs an embedded server for nodes to connect to. If specified, this overrides any server address from credentials/bootstrapping.

network_id string
node_id string

Connection config settings

refresh_address string
require_tls boolean
reverse_proxy boolean
types.OrchestratorConfig
{
  "api": {
    "auth": {
      "token": "string"
    },
    "listen_addr": "string"
  },
  "data_dir": "string",
  "evaluation_broker": {
    "initial_retry_delay": -9223372036854776000,
    "max_retry_count": 0,
    "visibility_timeout": -9223372036854776000
  },
  "log": {
    "format": "string",
    "level": "string"
  },
  "name": "string",
  "name_provider": "string",
  "node_manager": {
    "connected_after": -9223372036854776000,
    "disconnect_timeout": -9223372036854776000,
    "heartbeat_interval": -9223372036854776000,
    "lost_timeout": -9223372036854776000
  },
  "scheduler": {
    "execution_limit_backoff": -9223372036854776000,
    "max_executions_per_run": 0,
    "queue_backoff": -9223372036854776000,
    "queue_timeout_never": -9223372036854776000,
    "queue_timeout_other": -9223372036854776000,
    "worker_count": 0
  },
  "shutdown_timeout": -9223372036854776000,
  "store": {
    "gc": {
      "deleted_jobs_retention": -9223372036854776000,
      "lost_nodes_retention": -9223372036854776000,
      "terminal_evaluations_retention": -9223372036854776000,
      "terminal_executions_retention": -9223372036854776000
    }
  },
  "streaming_proxy": {
    "read_timeout_seconds": 0,
    "remote_endpoint": "string",
    "remote_token": "string"
  },
  "telemetry": {
    "authentication": {
      "namespace": "string",
      "token": "string",
      "type": "string"
    },
    "do_not_track": true,
    "endpoint": "string",
    "endpoint_path": "string",
    "export_interval": -9223372036854776000,
    "headers": {},
    "include_go_metrics": true,
    "insecure": true,
    "process_metrics_interval": -9223372036854776000,
    "protocol": "string",
    "resource_attributes": {}
  },
  "transport": {
    "address": "string",
    "credentials_path": "string",
    "insecure": true,
    "listen_addr": "string",
    "network_id": "string",
    "node_id": "string",
    "refresh_address": "string",
    "require_tls": true,
    "reverse_proxy": true
  }
}
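The schema sample above uses placeholder values. For comparison, the sketch below builds a partial configuration that spells out only the defaults stated in the field descriptions and serializes it in the same JSON shape. It assumes durations are nanosecond counts; fields without a documented default are omitted, and whether the orchestrator accepts a partial document like this is not confirmed here.

```python
import json

# Assumption: int64 duration fields are nanosecond counts.
SECOND = 1_000_000_000
HOUR = 3600 * SECOND
DAY = 24 * HOUR

config = {
    "api": {"listen_addr": "localhost:9010"},
    "evaluation_broker": {"initial_retry_delay": 5 * SECOND},
    "log": {"level": "info"},
    "node_manager": {
        "connected_after": 30 * SECOND,
        "disconnect_timeout": 90 * SECOND,
        "heartbeat_interval": 15 * SECOND,
        "lost_timeout": 1 * HOUR,
    },
    "store": {
        "gc": {
            "deleted_jobs_retention": 7 * DAY,
            "lost_nodes_retention": 7 * DAY,
            "terminal_evaluations_retention": 1 * DAY,
            "terminal_executions_retention": 7 * DAY,
        }
    },
    "telemetry": {
        "do_not_track": False,
        "export_interval": 30 * SECOND,
        "process_metrics_interval": 15 * SECOND,
    },
}

rendered = json.dumps(config, indent=2)
print(rendered)
```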