AgenTopology

Scale

Configure auto-scaling for agents with mode, bounds, and batch size

The scale block is defined inside an agent block to configure auto-scaling behavior. When workload increases, scaled agents spawn additional instances to handle demand in parallel.

Basic Syntax

agent worker {
  model: sonnet
  tools: [Read, Write]

  scale {
    mode: auto
    by: pending-tasks
    min: 1
    max: 5
    batch-size: 3
  }
}

Field Reference

Field       Type        Required  Description
mode        identifier  Yes       Scaling mode (currently auto)
by          identifier  No        Metric to scale by (e.g. pending-tasks)
min         number      No        Minimum number of instances (default: 1)
max         number      No        Maximum number of instances
batch-size  number      No        Number of items each instance processes
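
The field rules in this table can be summarized as a small validation routine. The Python sketch below is illustrative only and is not part of AgenTopology: the `validate_scale` function and the dict representation of a scale block are hypothetical, and the checks that `min` must not exceed `max` and that `batch-size` is at least 1 are assumptions the page does not state.

```python
SUPPORTED_MODES = {"auto"}  # the page lists auto as the only supported mode


def validate_scale(block: dict) -> dict:
    """Return a copy of a scale block with defaults applied, or raise ValueError.

    `block` is a hypothetical dict form of a scale block, for example
    {"mode": "auto", "by": "pending-tasks", "max": 5}.
    """
    # mode is the only required field.
    if "mode" not in block:
        raise ValueError("mode is required")
    if block["mode"] not in SUPPORTED_MODES:
        raise ValueError(f"unsupported mode: {block['mode']!r}")

    result = dict(block)
    result.setdefault("min", 1)  # documented default for min

    # min and max are instance counts, so they must be non-negative integers.
    for field in ("min", "max"):
        if field in result:
            value = result[field]
            if not isinstance(value, int) or value < 0:
                raise ValueError(f"{field} must be a non-negative integer")

    # Assumption: a batch must hold at least one item.
    if "batch-size" in result:
        size = result["batch-size"]
        if not isinstance(size, int) or size < 1:
            raise ValueError("batch-size must be a positive integer")

    # Assumption: the bounds must be ordered.
    if "max" in result and result["min"] > result["max"]:
        raise ValueError("min must not exceed max")

    return result
```

Calling `validate_scale({"mode": "auto", "max": 5})` returns the block with `min` filled in as 1, while a block without `mode` raises `ValueError`.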

Scaling Mode

Currently, auto is the only supported mode. In auto mode, AgenTopology monitors the specified metric and adjusts instance count within the defined bounds.

scale {
  mode: auto
  by: pending-tasks
  min: 1
  max: 10
}
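
To make "within the defined bounds" concrete, here is a minimal Python sketch of clamping a desired instance count between min and max. It is not AgenTopology's implementation: the function name `target_instances` and the notion of a precomputed `demand` value (how many instances the metric would call for) are assumptions made for illustration.

```python
from typing import Optional


def target_instances(demand: int, min_instances: int = 1,
                     max_instances: Optional[int] = None) -> int:
    """Clamp a desired instance count into [min_instances, max_instances].

    `demand` is a hypothetical value: the number of instances the
    scaling metric would ask for if there were no bounds.
    """
    # Never drop below the configured minimum.
    target = max(demand, min_instances)
    # Apply the upper bound only when max is set.
    if max_instances is not None:
        target = min(target, max_instances)
    return target


# Bounds from the example above: min: 1, max: 10
print(target_instances(0, 1, 10))   # 1  (held at the minimum)
print(target_instances(4, 1, 10))   # 4  (inside the bounds)
print(target_instances(25, 1, 10))  # 10 (capped at the maximum)
```

Leaving `max_instances` unset in this sketch means demand is never capped, which is the situation a `max` value guards against.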

Scaling Metric

The by field determines what drives scaling decisions:

Metric         Description
pending-tasks  Number of queued tasks waiting for this agent

scale {
  mode: auto
  by: pending-tasks
  min: 2
  max: 8
}

Batch Size

Control how many items each agent instance processes at a time:

agent file-processor {
  model: sonnet
  tools: [Read, Write]

  scale {
    mode: auto
    by: pending-tasks
    min: 1
    max: 5
    batch-size: 10
  }
}

With batch-size: 10, each instance handles up to 10 items before a new instance is spawned, up to the max of 5 instances.
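
The arithmetic behind batching can be sketched in Python. This is one plausible reading of the behavior described here, not AgenTopology's actual scheduler: it assumes the instance count is the number of pending items divided by batch-size, rounded up, then clamped to the min and max bounds. The function names `instances_for` and `split_into_batches` are hypothetical.

```python
import math


def instances_for(pending: int, batch_size: int,
                  min_instances: int = 1, max_instances: int = 5) -> int:
    """Estimate how many instances a backlog of `pending` items needs.

    Assumption: one instance per full or partial batch, clamped to
    the configured bounds.
    """
    needed = math.ceil(pending / batch_size)
    return max(min_instances, min(needed, max_instances))


def split_into_batches(items: list, batch_size: int) -> list:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Settings from the file-processor example: batch-size: 10, min: 1, max: 5
print(instances_for(10, 10))  # 1 (a single full batch)
print(instances_for(11, 10))  # 2 (the 11th item needs a second instance)
print(instances_for(80, 10))  # 5 (8 batches, capped at max)

batches = split_into_batches(list(range(25)), 10)
print([len(b) for b in batches])  # [10, 10, 5]
```

Under these assumptions, a larger batch-size means fewer instances for the same backlog, at the cost of each instance working through more items.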

Full Example

topology processing-pipeline : [fan-out] {
  agent dispatcher {
    model: sonnet
    role: "Splits large workloads into individual tasks"
    tools: [Read]
  }

  agent processor {
    model: sonnet
    role: "Processes individual items"
    tools: [Read, Write]

    scale {
      mode: auto
      by: pending-tasks
      min: 1
      max: 10
      batch-size: 5
    }
  }

  agent aggregator {
    model: sonnet
    role: "Combines processed results"
    tools: [Read, Write]
  }

  flow {
    dispatcher -> processor -> aggregator
  }
}

Tips

  • Scale blocks go inside agent blocks, not at the topology level.
  • Set max to prevent runaway costs from unbounded scaling.
  • batch-size is useful for file processing, data transformation, and other item-oriented workloads.
  • Scaling works best with the fan-out and orchestrator-worker patterns.
  • min: 1 is the default. Set min: 0 if the agent should only run when there is work.
