types.OrchestratorConfig
api object
auth object
Auth configures authentication for the API
Listen address. Defaults to localhost:9010. An empty string disables the API server.
Core data directory - all subdirectories are managed automatically
evaluation_broker object
InitialRetryDelay is the delay before re-enqueuing a Nacked evaluation for the first time. Defaults to 5 seconds if not set. Set a lower value (e.g., 100ms) for tests to avoid blocking subsequent evaluations for the same job.
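Possible values: integer duration. The generated list [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000] gives the 64-bit integer bounds (rounded) and the unit constants for 1 ns, 1 µs, 1 ms, 1 s, 1 min, and 1 h; it is not an exhaustive set of allowed values. The same applies to every duration field in this schema.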
MaxRetryCount specifies the maximum number of times an evaluation can be retried before being marked as failed.
VisibilityTimeout specifies how long an evaluation can be claimed before it's returned to the queue.
log object
Log format: json, text
Log level: trace, debug, info, warn, error - defaults to info
Node name - defaults to hostname
Name provider for auto-generation: "cloud", "hostname", "uuid", "machine-id"
node_manager object
ConnectedAfter is how long a node must be stable in Connecting state before being promoted to Connected. This provides flapping protection - a node that keeps crashing and restarting will reset this timer on each handshake. Default: 30s. Must be less than disconnect_timeout.
DisconnectTimeout is how long to wait without heartbeats before marking a node as disconnected. This value is sent to edge nodes during handshake so both sides use the same threshold. Default: 90s. Increase for unreliable networks.
HeartbeatInterval is how often edge nodes should send heartbeats. This value is sent to edge nodes during handshake. Default: 15s.
LostTimeout is how long a node must remain disconnected before marking it as lost. Default: 1h. Must be greater than disconnect_timeout. Lost nodes are removed from scheduling and become eligible for garbage collection.
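The ordering rules among the node_manager timeouts can be checked before a config is applied. The following is a minimal sketch, not the orchestrator's own validation: `validate_node_manager` is a hypothetical helper, durations are assumed to be integer nanoseconds, and unset fields are assumed to fall back to the documented defaults.

```python
SECOND = 1_000_000_000  # nanoseconds (assumed unit for duration fields)
HOUR = 3600 * SECOND

# Defaults taken from the field descriptions above.
DEFAULTS = {
    "connected_after": 30 * SECOND,
    "disconnect_timeout": 90 * SECOND,
    "heartbeat_interval": 15 * SECOND,
    "lost_timeout": HOUR,
}


def validate_node_manager(cfg):
    """Return a list of violations of the documented ordering rules.

    Hypothetical helper: fields missing from cfg use the documented defaults.
    """
    merged = {**DEFAULTS, **cfg}
    errors = []
    if not merged["connected_after"] < merged["disconnect_timeout"]:
        errors.append("connected_after must be less than disconnect_timeout")
    if not merged["lost_timeout"] > merged["disconnect_timeout"]:
        errors.append("lost_timeout must be greater than disconnect_timeout")
    return errors
```

With the defaults alone the list is empty; raising `connected_after` to 120 s without also raising `disconnect_timeout` produces one violation.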
scheduler object
ExecutionLimitBackoff is the duration to wait before creating a new scheduling run when hitting execution limits.
MaxExecutionsPerRun limits the total number of scheduler operations per evaluation (including creating, stopping, replacing, and failing executions). Set to 0 for no limit.
QueueBackoff specifies the time to wait before retrying a failed job.
DefaultQueueTimeoutNeverRestart is the default queue timeout for jobs with "never" restart policy. Batch jobs get fast feedback when no matching nodes exist. Set to 0 for no default (wait indefinitely).
DefaultQueueTimeoutOtherPolicies is the default queue timeout for "always" or "on-failure" restart policies. Services wait for matching nodes (e.g., auto-scaling scenarios). Set to 0 for no default (wait indefinitely).
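The two default queue timeouts are selected by restart policy, with 0 meaning "no default". A minimal sketch of that selection follows; `default_queue_timeout` is a hypothetical function, and the real scheduler may apply further rules (for example, a timeout set on the job itself) that this section does not describe.

```python
from typing import Optional


def default_queue_timeout(restart_policy: str, never_ns: int, other_ns: int) -> Optional[int]:
    """Pick the default queue timeout (nanoseconds, assumed unit) for a job.

    never_ns corresponds to queue_timeout_never and other_ns to
    queue_timeout_other. Returns None when the configured value is 0,
    meaning the job waits indefinitely.
    """
    if restart_policy == "never":
        value = never_ns
    elif restart_policy in ("always", "on-failure"):
        value = other_ns
    else:
        raise ValueError(f"unknown restart policy: {restart_policy!r}")
    return None if value == 0 else value
```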
WorkerCount specifies the number of concurrent workers for job scheduling.
ShutdownTimeout is the maximum time to wait for graceful shutdown
store object
gc object
DeletedJobsRetention is how long to keep soft-deleted jobs before permanent deletion. Default: 7 days. Increase for longer audit history.
LostNodesRetention is how long to keep lost node records after they're marked as lost. Default: 7 days. Measured from when the node transitions to Lost state. Independent of node_manager.lost_timeout.
TerminalEvaluationsRetention is how long to keep terminal evaluation records (complete/failed/cancelled). Default: 1 day. Evaluations are short-lived scheduling decisions.
TerminalExecutionsRetention is how long to keep terminal execution records (complete/failed/stopped). Default: 7 days. Increase for longer execution history.
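Because lost_nodes_retention is measured from the transition to Lost and is independent of node_manager.lost_timeout, the three durations add up. The sketch below works through the earliest possible timeline with the documented defaults; `lost_node_timeline` is a hypothetical helper, and actual transitions and deletion may happen later depending on how often the orchestrator checks, which this section does not specify.

```python
from datetime import datetime, timedelta, timezone

DISCONNECT_TIMEOUT = timedelta(seconds=90)  # node_manager.disconnect_timeout default
LOST_TIMEOUT = timedelta(hours=1)           # node_manager.lost_timeout default
LOST_NODES_RETENTION = timedelta(days=7)    # store.gc.lost_nodes_retention default


def lost_node_timeline(last_heartbeat):
    """Return the earliest (disconnected_at, lost_at, gc_eligible_at) times."""
    disconnected_at = last_heartbeat + DISCONNECT_TIMEOUT
    lost_at = disconnected_at + LOST_TIMEOUT
    gc_eligible_at = lost_at + LOST_NODES_RETENTION
    return disconnected_at, lost_at, gc_eligible_at


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for label, when in zip(("disconnected", "lost", "gc eligible"), lost_node_timeline(t0)):
        print(label, when.isoformat())
```

Under these assumptions a node whose last heartbeat arrived at midnight becomes eligible for garbage collection no earlier than 7 days, 1 hour, and 90 seconds later.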
streaming_proxy object
Streaming proxy configuration
ReadTimeoutSeconds is the read timeout in seconds for connections to the remote log server before the connection is closed. A value of 0 means no timeout.
RemoteEndpoint is the endpoint of the log server to proxy to, typically a Loki instance
RemoteToken is the authentication token used to access the log server
telemetry object
Simplified telemetry configuration
authentication object
Optional authentication configuration for telemetry exporters
Namespace is used to group telemetry data for all nodes in a namespace
Token is the authentication token or password
Type is the authentication type. Currently only "Basic" is supported.
DoNotTrack disables telemetry collection (default: false, meaning telemetry is enabled)
Endpoint is the telemetry collector endpoint. It should not include a path; use EndpointPath for that. Examples: "localhost:4317", "https://collector.example.com:4318"
Some endpoints have a path under which they serve /v1/metrics or similar, but this cannot be included in Endpoint directly.
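When a collector URL is handed over with its path attached, it has to be split before it goes into the config. The following is a minimal sketch of that split; `split_endpoint` is a hypothetical helper, and whether endpoint_path expects a leading slash is an assumption, not something this section states.

```python
def split_endpoint(url: str):
    """Split a collector URL into (endpoint, endpoint_path).

    The endpoint keeps the optional scheme, host, and port; everything from
    the first "/" after the host goes into endpoint_path (leading slash kept,
    which is an assumption about the expected format).
    """
    if "://" in url:
        scheme, _, rest = url.partition("://")
        prefix = scheme + "://"
    else:
        prefix, rest = "", url
    host, slash, path = rest.partition("/")
    return prefix + host, (slash + path) if path else ""
```

For example, `split_endpoint("https://collector.example.com:4318/v1/metrics")` yields `"https://collector.example.com:4318"` for endpoint and `"/v1/metrics"` for endpoint_path, while `"localhost:4317"` passes through unchanged with an empty path.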
ExportInterval is how often metrics are exported (default: 30s)
headers object
Headers are optional headers for authentication
IncludeGoMetrics enables collection of Go runtime metrics (GC, goroutines, etc.)
Insecure disables TLS verification (for development/testing)
ProcessMetricsInterval is how often process metrics are collected (default: 15s). Process metrics (CPU, memory, file descriptors) are always enabled.
Protocol specifies the export protocol: "grpc" or "http"
resource_attributes object
ResourceAttributes are additional attributes to include with all telemetry data
transport object
ListenAddr: when set, runs an embedded server for nodes to connect to. If specified, this overrides any server address from credentials/bootstrapping.
Connection config settings
{
  "api": {
    "auth": {
      "token": "string"
    },
    "listen_addr": "string"
  },
  "data_dir": "string",
  "evaluation_broker": {
    "initial_retry_delay": -9223372036854776000,
    "max_retry_count": 0,
    "visibility_timeout": -9223372036854776000
  },
  "log": {
    "format": "string",
    "level": "string"
  },
  "name": "string",
  "name_provider": "string",
  "node_manager": {
    "connected_after": -9223372036854776000,
    "disconnect_timeout": -9223372036854776000,
    "heartbeat_interval": -9223372036854776000,
    "lost_timeout": -9223372036854776000
  },
  "scheduler": {
    "execution_limit_backoff": -9223372036854776000,
    "max_executions_per_run": 0,
    "queue_backoff": -9223372036854776000,
    "queue_timeout_never": -9223372036854776000,
    "queue_timeout_other": -9223372036854776000,
    "worker_count": 0
  },
  "shutdown_timeout": -9223372036854776000,
  "store": {
    "gc": {
      "deleted_jobs_retention": -9223372036854776000,
      "lost_nodes_retention": -9223372036854776000,
      "terminal_evaluations_retention": -9223372036854776000,
      "terminal_executions_retention": -9223372036854776000
    }
  },
  "streaming_proxy": {
    "read_timeout_seconds": 0,
    "remote_endpoint": "string",
    "remote_token": "string"
  },
  "telemetry": {
    "authentication": {
      "namespace": "string",
      "token": "string",
      "type": "string"
    },
    "do_not_track": true,
    "endpoint": "string",
    "endpoint_path": "string",
    "export_interval": -9223372036854776000,
    "headers": {},
    "include_go_metrics": true,
    "insecure": true,
    "process_metrics_interval": -9223372036854776000,
    "protocol": "string",
    "resource_attributes": {}
  },
  "transport": {
    "address": "string",
    "credentials_path": "string",
    "insecure": true,
    "listen_addr": "string",
    "network_id": "string",
    "node_id": "string",
    "refresh_address": "string",
    "require_tls": true,
    "reverse_proxy": true
  }
}
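The generated example above uses schema placeholders ("string", the minimum 64-bit integer) rather than usable values. The sketch below builds a partial object from the defaults that the field descriptions state explicitly. It assumes that durations are serialized as integer nanoseconds and that omitted fields fall back to their defaults; neither is confirmed by this section, and fields without a documented default are left out.

```python
import json

# Duration units in nanoseconds (assumed wire format for duration fields).
SECOND = 1_000_000_000
HOUR = 3600 * SECOND
DAY = 24 * HOUR

# Only fields whose defaults are stated in the descriptions are included.
config = {
    "api": {"listen_addr": "localhost:9010"},
    "evaluation_broker": {"initial_retry_delay": 5 * SECOND},
    "log": {"level": "info"},
    "node_manager": {
        "connected_after": 30 * SECOND,
        "disconnect_timeout": 90 * SECOND,
        "heartbeat_interval": 15 * SECOND,
        "lost_timeout": HOUR,
    },
    "store": {
        "gc": {
            "deleted_jobs_retention": 7 * DAY,
            "lost_nodes_retention": 7 * DAY,
            "terminal_evaluations_retention": DAY,
            "terminal_executions_retention": 7 * DAY,
        }
    },
    "telemetry": {
        "do_not_track": False,
        "export_interval": 30 * SECOND,
        "process_metrics_interval": 15 * SECOND,
    },
}

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```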