AgenTopology

Scheduled Monitor

A monitoring topology with schedule blocks, notification interfaces, sandbox agents, and fallback chains

This example demonstrates a monitoring topology that runs on a schedule, checks system health, and sends alerts through Slack and PagerDuty. It uses sandbox agents for safe execution and fallback chains for resilience.

The Complete Topology

topology system-monitor : [pipeline, fan-out] {
  meta {
    version: "1.0.0"
    description: "Scheduled system health monitoring with alerts"
  }

  orchestrator {
    model: sonnet
    handles: [intake, report]
  }

  schedule {
    cron: "*/15 * * * *"
    timezone: "UTC"
    on-overlap: skip
  }

  interfaces {
    interface slack {
      type: webhook
      url: "$SLACK_WEBHOOK_URL"
      format: markdown
    }

    interface pagerduty {
      type: api
      url: "https://events.pagerduty.com/v2/enqueue"
      auth: "$PAGERDUTY_API_KEY"
    }
  }

  agent health-checker {
    model: sonnet
    phase: 1
    sandbox: true
    tools: [Bash, Read]
    outputs: { status: healthy | degraded | critical }
    prompt {
      "Run system health checks: CPU usage, memory, disk space, and
       service endpoints. Report overall status."
    }
  }

  agent log-analyzer {
    model: sonnet
    phase: 1
    sandbox: true
    tools: [Read, Grep, Glob]
    outputs: { anomalies-found: yes | no, severity: low | medium | high }
    prompt {
      "Analyze recent log files for error patterns, unusual spikes,
       and anomalies. Report findings with severity."
    }
  }

  agent reporter {
    model: haiku
    phase: 2
    tools: [Read, Write]
    outputs: { alert-level: none | warning | critical }
    prompt {
      "Synthesize health check and log analysis results into a concise
       status report. Determine if alerts should be sent."
    }
  }

  agent notifier {
    model: haiku
    phase: 3
    tools: [Read]
    prompt {
      "Send notifications based on alert level. Use Slack for warnings
       and PagerDuty for critical alerts."
    }
    fallback-chain: [slack, pagerduty]
  }

  flow {
    intake -> [health-checker, log-analyzer]
    health-checker -> reporter
    log-analyzer -> reporter
    reporter -> notifier  [when reporter.alert-level == warning]
    reporter -> notifier  [when reporter.alert-level == critical]
    reporter -> report    [when reporter.alert-level == none]
    notifier -> report
  }
}

Walkthrough

Schedule Block

schedule {
  cron: "*/15 * * * *"
  timezone: "UTC"
  on-overlap: skip
}

The schedule block runs the topology automatically on a cron schedule:

Property     Value           Purpose
cron         */15 * * * *    Run every 15 minutes
timezone     UTC             Interpret the cron expression in UTC
on-overlap   skip            If a previous run is still active, skip this execution

The on-overlap property prevents concurrent runs from piling up. Other options include queue (wait for the previous run to finish) and cancel (stop the previous run and start fresh).
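The decision each policy makes at a cron tick can be sketched in Python. This is a hypothetical runtime model, not part of AgenTopology; `on_tick` and its return values are illustrative:

```python
from enum import Enum

class OverlapPolicy(Enum):
    SKIP = "skip"      # drop the new run if one is still active
    QUEUE = "queue"    # start the new run after the active one finishes
    CANCEL = "cancel"  # stop the active run, then start fresh

def on_tick(policy: OverlapPolicy, active: bool) -> list[str]:
    """Return the actions a scheduler would take at a cron tick."""
    if not active:
        return ["start"]
    if policy is OverlapPolicy.SKIP:
        return []                      # this tick is skipped entirely
    if policy is OverlapPolicy.QUEUE:
        return ["wait", "start"]       # run after the current one ends
    return ["cancel-active", "start"]  # CANCEL: preempt and restart
```

With `skip`, a slow run simply swallows the next tick; with `queue`, slow runs can pile up behind each other, which is why `skip` is the safer default for idempotent monitoring.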

Interfaces

interfaces {
  interface slack {
    type: webhook
    url: "$SLACK_WEBHOOK_URL"
    format: markdown
  }

  interface pagerduty {
    type: api
    url: "https://events.pagerduty.com/v2/enqueue"
    auth: "$PAGERDUTY_API_KEY"
  }
}

Interfaces define external communication channels. Each interface declares a subset of these properties:

Property   Description
type       Communication method: webhook, api, or email
url        Endpoint URL (supports environment variables)
format     Output format for the interface
auth       Authentication credential (supports environment variables)

Agents reference interfaces by name. The notifier agent can send messages to slack or pagerduty based on the alert severity.
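The `$VAR` references in url and auth resolve against the environment at load time. A minimal Python sketch of that substitution (hypothetical; `resolve_env` and the example webhook URL are illustrative, not a real credential):

```python
def resolve_env(value: str, env: dict[str, str]) -> str:
    """Expand a leading $VAR reference, as in url: "$SLACK_WEBHOOK_URL"."""
    if value.startswith("$"):
        return env[value[1:]]
    return value

slack = {"type": "webhook", "url": "$SLACK_WEBHOOK_URL", "format": "markdown"}
env = {"SLACK_WEBHOOK_URL": "https://hooks.example.com/T000/B000"}

# Literal values pass through untouched; $-prefixed values are looked up.
resolved = {key: resolve_env(val, env) for key, val in slack.items()}
```

Keeping credentials in environment variables means the topology file itself can be committed without leaking secrets.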

Sandbox Agents

agent health-checker {
  sandbox: true
  tools: [Bash, Read]
}

agent log-analyzer {
  sandbox: true
  tools: [Read, Grep, Glob]
}

The sandbox: true property runs the agent in an isolated environment. This is critical for monitoring agents that execute system commands — the sandbox prevents accidental writes or destructive operations.

Sandboxed agents can read system state but cannot modify it. If the health checker's Bash commands attempt to write files or change configurations, the sandbox blocks those operations.
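One crude way to picture that enforcement is a command allowlist. This is a simplified Python sketch; real sandboxes typically rely on OS-level isolation, and the `READ_ONLY` set here is invented for illustration:

```python
# Programs that only inspect system state; anything else is rejected
# before execution (a toy model of read-only sandbox enforcement).
READ_ONLY = {"cat", "df", "free", "uptime", "grep", "ls"}

def sandbox_check(command: str) -> bool:
    """Allow a shell command only if its program is on the allowlist."""
    program = command.split()[0]
    return program in READ_ONLY
```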

Fallback Chain

agent notifier {
  fallback-chain: [slack, pagerduty]
}

The fallback-chain property defines a list of interfaces to try in order. If Slack is unreachable (the webhook fails), the notifier automatically falls back to PagerDuty, so a critical alert still reaches someone even when the primary channel is down.

Fallback chains try each interface in order until one succeeds. If all interfaces fail, the agent reports the failure to the orchestrator.
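The try-in-order semantics can be sketched in Python (hypothetical; `send_with_fallback` and the `senders` mapping are illustrative, not AgenTopology's runtime):

```python
def send_with_fallback(chain, senders, message):
    """Try each interface in order; return the name of the first success.

    `senders` maps interface name -> callable returning True on success.
    Raises RuntimeError if every interface fails, which the agent would
    then report to the orchestrator.
    """
    for name in chain:
        if senders[name](message):
            return name
    raise RuntimeError("all interfaces in the fallback chain failed")

# Simulate Slack being down: the alert falls through to PagerDuty.
senders = {"slack": lambda m: False, "pagerduty": lambda m: True}
used = send_with_fallback(["slack", "pagerduty"], senders, "disk 95% full")
```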

Phase 1: Parallel Health Checks (Fan-Out)

flow {
  intake -> [health-checker, log-analyzer]
}

Both monitoring agents run simultaneously at phase 1. The health checker runs system commands while the log analyzer scans log files. Running them in parallel reduces the total monitoring cycle time.
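The fan-out can be pictured with Python's standard thread pool. This is a sketch of the scheduling shape only, not AgenTopology's actual runtime, and the stub results are invented:

```python
from concurrent.futures import ThreadPoolExecutor

def health_check():
    # Stand-in for the health-checker agent's structured output.
    return {"status": "healthy"}

def log_analysis():
    # Stand-in for the log-analyzer agent's structured output.
    return {"anomalies-found": "no", "severity": "low"}

# Phase 1 fan-out: both checks start concurrently; calling .result()
# on each future is the phase boundary -- phase 2 waits for both.
with ThreadPoolExecutor() as pool:
    health = pool.submit(health_check)
    logs = pool.submit(log_analysis)
    results = {
        "health-checker": health.result(),
        "log-analyzer": logs.result(),
    }
```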

Phase 2: Report Synthesis

agent reporter {
  model: haiku
  phase: 2
  outputs: { alert-level: none | warning | critical }
}

The reporter merges results from both phase 1 agents and determines the overall alert level. Using haiku keeps costs low for a task that requires synthesis but not deep reasoning.
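One plausible merge rule might look like the following. This is an assumption, since the topology leaves the actual decision to the reporter's prompt:

```python
def alert_level(status: str, severity: str) -> str:
    """Merge health-checker status and log-analyzer severity.

    Invented rule: critical health or high-severity anomalies page;
    degraded health or medium anomalies warn; everything else is quiet.
    """
    if status == "critical" or severity == "high":
        return "critical"
    if status == "degraded" or severity == "medium":
        return "warning"
    return "none"
```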

Phase 3: Conditional Notification

flow {
  reporter -> notifier  [when reporter.alert-level == warning]
  reporter -> notifier  [when reporter.alert-level == critical]
  reporter -> report    [when reporter.alert-level == none]
}

Notifications are only sent when there is something to report. If the alert level is none, the flow goes directly to the final report without bothering the notifier. This prevents alert fatigue from routine "all clear" messages.
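The three conditional edges amount to a simple routing function. A hypothetical Python mirror of the flow block:

```python
def route(alert_level: str) -> list[str]:
    """Mirror the flow block: warnings and criticals pass through the
    notifier before the final report; 'none' skips the notifier."""
    if alert_level in ("warning", "critical"):
        return ["notifier", "report"]
    return ["report"]
```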

Flow Diagram

              +--> health-checker --+
              |                     |
[schedule] ---+                     +--> reporter --+--> notifier -> report
              |                     |               |
              +--> log-analyzer ----+               +--> report
                                                   (none)

Adapting This Example

  • Add more checkers — include a dependency-checker or certificate-checker at phase 1
  • Add escalation — if PagerDuty also fails, route to an email interface
  • Add metering — set a daily budget to control costs for frequent scheduled runs
  • Change the schedule — use cron: "0 * * * *" for hourly checks or cron: "0 9 * * 1-5" for weekday mornings
  • Add a human gate — require human confirmation before sending PagerDuty alerts to reduce false alarms