types.Job
ID is a unique identifier assigned to this job. It helps to distinguish jobs with the same name after they have been deleted and re-created. The ID is generated by the server and should not be set directly by the client.
spec object
Spec contains all user-provided fields (desired state)
config object
Config contains type-specific configuration for the workload. The structure depends on the job type (e.g., pipeline config, query parameters).
Description is an optional human-readable description of the job.
labels object
Labels is used to associate arbitrary labels with this job. Labels can be used for filtering and selection.
meta object
Meta is used to associate arbitrary metadata with this job. Keys with the prefix "expanso.io/" are reserved for system use.
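As an illustration of the reserved prefix, a client could check its own metadata before submitting a job. This is a minimal sketch: the helper name `reserved_meta_keys` and the example key are hypothetical, and the reference does not say what the server does with such keys.

```python
from typing import Dict, List

RESERVED_PREFIX = "expanso.io/"  # prefix reserved for system use, per the Meta description


def reserved_meta_keys(meta: Dict[str, str]) -> List[str]:
    """Return the user-supplied meta keys that fall under the reserved prefix."""
    return sorted(key for key in meta if key.startswith(RESERVED_PREFIX))


# Hypothetical metadata: one ordinary key, one key in the reserved namespace.
meta = {"team": "data", "expanso.io/owner": "someone"}
clashes = reserved_meta_keys(meta)
```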
Name is the logical name of the job used to refer to it. Submitting a job with the same name as an existing job will result in an update to the existing job.
Namespace is the namespace this job is running in.
Priority defines the scheduling priority of this job. Higher values indicate higher priority.
RestartPolicy controls restart behavior when executions exit.
- "on-failure" (default): Restart on non-zero exit, complete on success
- "always": Restart on any exit (current daemon behavior)
- "never": No restart, one-shot execution
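The three policies above can be read as a single decision on the exit code. The sketch below is illustrative only; `should_restart` is a hypothetical helper, not part of the API.

```python
def should_restart(policy: str, exit_code: int) -> bool:
    """Decide whether an execution that just exited should be restarted."""
    if policy == "always":
        return True  # restart on any exit
    if policy == "never":
        return False  # one-shot execution
    if policy == "on-failure":
        return exit_code != 0  # restart on non-zero exit, complete on success
    raise ValueError(f"unknown restart policy: {policy!r}")
```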
rollout object
Rollout defines how to roll out the job
Auto-promote canary rollouts
Canary-specific settings
Percentage of canary nodes
health_check object
HealthCheck defines health check configuration (required for rolling/canary, ignored for immediate)
Deadline is the maximum time to wait for an execution to become healthy (required)
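Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000, 10000000000] (these appear to be nanosecond counts, e.g. 1000000000 = 1 s, 60000000000 = 1 min, 3600000000000 = 1 h)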
FailureThreshold is the number of consecutive unhealthy intervals before the execution is considered unhealthy (optional, default: 3)
Interval is the duration of each health evaluation window (optional, default: 10s). Error rate is calculated per interval, not over the execution's lifetime.
Possible values: [-9223372036854776000, 9223372036854776000, 1, 1000, 1000000, 1000000000, 60000000000, 3600000000000, 10000000000] (these appear to be nanosecond counts, e.g. 10000000000 = 10 s)
MaxErrorRate is the maximum error rate allowed during health checks (optional, default: 0.10). It is a pointer so that nil (use the default) can be distinguished from an explicit 0.0.
SuccessThreshold is the number of consecutive healthy intervals before the execution is considered healthy (optional, default: 2)
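To show how the thresholds interact, here is a minimal sketch that walks a sequence of per-interval error rates and reports the first verdict reached. It rests on several assumptions: an interval counts as healthy when its error rate is at or below MaxErrorRate, a streak resets when the opposite outcome occurs, and Deadline is not modeled. The function name `evaluate_health` is hypothetical.

```python
from typing import Iterable, Optional


def evaluate_health(
    error_rates: Iterable[float],
    max_error_rate: float = 0.10,  # documented default
    failure_threshold: int = 3,    # documented default
    success_threshold: int = 2,    # documented default
) -> Optional[str]:
    """Return "healthy", "unhealthy", or None if no verdict was reached."""
    healthy_streak = 0
    unhealthy_streak = 0
    for rate in error_rates:
        # Assumption: the boundary value (rate == max_error_rate) is healthy.
        if rate <= max_error_rate:
            healthy_streak += 1
            unhealthy_streak = 0
            if healthy_streak >= success_threshold:
                return "healthy"
        else:
            unhealthy_streak += 1
            healthy_streak = 0
            if unhealthy_streak >= failure_threshold:
                return "unhealthy"
    return None  # still undecided; a real system would also enforce Deadline
```

With the defaults, two consecutive clean intervals are enough to mark an execution healthy, while a single bad interval only resets the healthy streak.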
MaxFailedNodes is the maximum number of failed nodes before stopping (optional, default: 10)
MaxFailedNodesPercent is the maximum percentage of failed nodes before stopping (optional, default: 10.0)
MaxParallel is the maximum percentage of nodes to update in parallel (0-100).
- For the immediate strategy: this value is ignored (all nodes are updated simultaneously)
- For rolling/canary: controls wave size as a percentage of total nodes (default: 10 if not specified)
Examples: 10 = 10% of nodes per wave, 50 = 50% of nodes per wave, 100 = all nodes at once
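The percentage-to-wave-size relationship can be sketched as follows. The rounding rule (round up, at least one node per wave) and the use of `None` for "not specified" are assumptions; the reference does not state either. `wave_size` is a hypothetical helper.

```python
from typing import Optional


def wave_size(total_nodes: int, strategy: str, max_parallel: Optional[int] = None) -> int:
    """Number of nodes updated per wave for a given strategy and MaxParallel."""
    if strategy == "immediate":
        return total_nodes  # MaxParallel is ignored: all nodes at once
    percent = 10 if max_parallel is None else max_parallel  # documented default
    if not 0 <= percent <= 100:
        raise ValueError("max_parallel must be between 0 and 100")
    # Assumption: round up, and never use a wave smaller than one node.
    nodes = (total_nodes * percent + 99) // 100
    return min(total_nodes, max(1, nodes))
```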
NoAutoRollback disables automatic rollback on rollout failure (default: false = auto-rollback enabled)
Strategy is the rollout strategy: immediate, rolling, or canary.
Possible values: [immediate, rolling, canary]
selector object
Selector defines which nodes to run the job on
MatchExpressions selects nodes using label selector expression strings. Each expression is evaluated independently and all must match (AND logic). Supported syntax:
- Equality: "key=value" or "key==value"
- Inequality: "key!=value"
- Set inclusion: "key in (value1,value2,...)"
- Set exclusion: "key notin (value1,value2,...)"
- Existence: "key"
- Non-existence: "!key"
Examples:
- "region=us-east"
- "tier in (premium,standard)"
- "environment!=prod"
- "gpu"
- "!debug"
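The expression syntax above can be evaluated against a node's labels with a small matcher. This is a sketch, not the server's parser: it assumes that "!=" and "notin" also match when the key is absent (as in Kubernetes label selectors), and the names `matches_expression` and `matches_all` are hypothetical.

```python
import re
from typing import Dict, List

# key in (v1,v2,...) or key notin (v1,v2,...)
_SET_RE = re.compile(r"^(\S+)\s+(in|notin)\s+\(([^)]*)\)$")


def matches_expression(expr: str, labels: Dict[str, str]) -> bool:
    """Evaluate one selector expression against a node's labels."""
    expr = expr.strip()
    m = _SET_RE.match(expr)
    if m:
        key, op, raw = m.groups()
        values = {v.strip() for v in raw.split(",")}
        if op == "in":
            return labels.get(key) in values
        # Assumption: a missing key satisfies "notin".
        return labels.get(key) not in values
    if "!=" in expr:
        key, value = expr.split("!=", 1)
        # Assumption: a missing key satisfies "!=".
        return labels.get(key.strip()) != value.strip()
    if "==" in expr:
        key, value = expr.split("==", 1)
        return labels.get(key.strip()) == value.strip()
    if "=" in expr:
        key, value = expr.split("=", 1)
        return labels.get(key.strip()) == value.strip()
    if expr.startswith("!"):
        return expr[1:].strip() not in labels  # non-existence
    return expr in labels  # existence


def matches_all(expressions: List[str], labels: Dict[str, str]) -> bool:
    """All expressions must match (AND logic)."""
    return all(matches_expression(e, labels) for e in expressions)


# Hypothetical node labels used to exercise the documented examples.
node = {"region": "us-east", "tier": "premium", "environment": "staging", "gpu": "true"}
examples = ["region=us-east", "tier in (premium,standard)", "environment!=prod", "gpu", "!debug"]
selected = matches_all(examples, node)
```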
MatchIDs selects specific nodes by their IDs. If specified, the job will only run on nodes whose ID is in this list.
match_labels object
MatchLabels selects nodes with labels that exactly match all specified key-value pairs. All labels must match (AND logic). Example: {"region": "us-east", "tier": "compute"}
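Exact label matching amounts to checking that every requested key-value pair is present on the node; extra node labels do not matter. A minimal sketch, with a hypothetical helper name and made-up node data:

```python
from typing import Dict


def matches_labels(match_labels: Dict[str, str], node_labels: Dict[str, str]) -> bool:
    """True when every key-value pair in match_labels is present on the node."""
    return all(node_labels.get(k) == v for k, v in match_labels.items())


# Hypothetical nodes keyed by ID.
nodes = {
    "node-a": {"region": "us-east", "tier": "compute", "gpu": "true"},
    "node-b": {"region": "us-east", "tier": "storage"},
    "node-c": {"region": "eu-west", "tier": "compute"},
}
selector = {"region": "us-east", "tier": "compute"}
selected = [node_id for node_id, labels in nodes.items() if matches_labels(selector, labels)]
```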
timeouts object
Timeouts defines timeout configurations for the job
ExecutionTimeout is the maximum time, in seconds, that a task is allowed to run. Zero means no timeout, such as for a daemon task.
QueueTimeout is the maximum time, in seconds, that a task is allowed to wait in the orchestrator queue before being scheduled. Zero means no timeout.
TotalTimeout is the maximum time, in seconds, that a task is allowed to take to complete. This includes the time spent in the queue, the time spent executing, and the time spent retrying. Zero means no timeout.
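The relationship between the three timeouts can be sketched as a check over elapsed times. The dictionary keys `queue_timeout`, `execution_timeout`, and `total_timeout` are assumed names (the reference gives only the Go-style field names), and treating a limit as exceeded only when elapsed time is strictly greater is also an assumption.

```python
from typing import Dict, List


def expired_timeouts(
    timeouts: Dict[str, int],
    queued_s: float,
    executing_s: float,
    retrying_s: float = 0.0,
) -> List[str]:
    """Return the names of the timeouts that the elapsed times exceed."""

    def over(limit: int, elapsed: float) -> bool:
        return limit > 0 and elapsed > limit  # zero means no timeout

    expired = []
    if over(timeouts.get("queue_timeout", 0), queued_s):
        expired.append("queue_timeout")
    if over(timeouts.get("execution_timeout", 0), executing_s):
        expired.append("execution_timeout")
    # Total covers queueing, executing, and retrying together.
    if over(timeouts.get("total_timeout", 0), queued_s + executing_s + retrying_s):
        expired.append("total_timeout")
    return expired


limits = {"queue_timeout": 30, "execution_timeout": 0, "total_timeout": 100}
```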
Type specifies what kind of workload this job runs (e.g. "pipeline", "query", "update", "config"). The scheduling behavior is derived from this type.
status object
Status contains all system-managed fields (observed state)
CreatedAt is the time when the job was created
Revision is a per-job monotonically increasing revision number that is incremented on each update to the job's state or specification. This includes both user-initiated changes (which also increment Version) and system-initiated changes (status updates, state transitions, etc.). Revision is always greater than or equal to Version.
rollout object
Rollout contains the runtime state of the active rollout. Empty (zero State) when no rollout is in progress.
Canary tracking (runtime state)
When rollout finished (zero if in progress)
Progress tracking
Nodes that failed
RollbackToVersion is the stable version to roll back to when a rollout halts.
- For Create: 0 (no previous version, cannot roll back)
- For Update: last version with a completed rollout (the stable version to fall back to)
- For Rollback: the version being rolled back from
Timestamps
Target nodes for this rollout
Type indicates what triggered this rollout
Possible values: [create, update, restart, rollback]
Nodes successfully updated
state object
State represents the current state of the job
details object
Details is a map of additional details about the state.
Message is a human readable message describing the state.
StateType is the current state of the object.
Possible values: [``, pending, queued, deploying, running, rollout_paused, rollout_failed, degraded, completed, failed, stopped, deleted]
UpdatedAt is the time when the job was last updated
Version is a per-job monotonically increasing version number that is incremented on each job specification update. Version tracks changes to the job specification (user-defined fields like runtime, deployment settings, etc.). Compare with Revision which tracks ANY change including status updates.
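The difference between Version and Revision can be illustrated with two counters. This is a sketch under the assumption that both start at zero and that job creation counts as the first specification update; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobCounters:
    version: int = 0   # bumped only by specification changes
    revision: int = 0  # bumped by any change

    def apply_spec_update(self) -> None:
        """User-initiated change: increments both counters."""
        self.version += 1
        self.revision += 1

    def apply_status_update(self) -> None:
        """System-initiated change: increments Revision only."""
        self.revision += 1


counters = JobCounters()
counters.apply_spec_update()    # job submitted
counters.apply_status_update()  # e.g. a state transition
counters.apply_status_update()  # e.g. another state transition
```

Under these assumptions the counters end at version 1 and revision 3, and Revision can never fall below Version.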
{
  "id": "job-abc123xyz",
  "spec": {
    "config": {
      "input": {
        "file": {
          "paths": [
            "/var/log/app/*.log"
          ]
        }
      },
      "output": {
        "stdout": {}
      },
      "pipeline": {
        "processors": [
          {
            "mapping": "root = this\nroot.processed_at = now()\n"
          }
        ]
      }
    },
    "description": "Processes application logs from edge nodes",
    "labels": {
      "env": "production",
      "region": "us-west"
    },
    "name": "log-processor",
    "namespace": "production",
    "priority": 50,
    "selector": {
      "match_labels": {
        "env": "production"
      }
    },
    "type": "pipeline"
  },
  "status": {
    "created_at": "2025-01-15T10:30:00Z",
    "revision": 3,
    "state": {
      "message": "Job running on 5 nodes",
      "state_type": "running"
    },
    "updated_at": "2025-01-15T10:35:00Z",
    "version": 1
  }
}