Features

Automation Watchdog provides comprehensive monitoring for automated workflows, ensuring they run as expected and alerting your team when something goes wrong.

Core Monitoring

Missed Action Watch

The primary monitoring mode. A Missed Action watch expects regular check-ins from your workflow. If a check-in is not received within the configured period, the watch enters an Error State and alerts your team.

Notify Only Watch

A lightweight notification relay. A Notify Only watch sends a notification whenever a check-in is received, without monitoring for missed check-ins.

Flexible Scheduling and Activation

Schedule-Based Activation

Define recurring schedules for when your watch should be active. Supports configurable frequency, interval, days of the week, hours, minutes, and duration. Watches automatically activate and deactivate according to the schedule.

Event-Based Activation

Activate and deactivate watches dynamically through API calls. Ideal for workflows triggered by external events, such as a dispatcher loading work into a queue and activating the performer's watch.

Inactive Check-ins

By default, check-ins on inactive watches are rejected. Watches must be activated before they accept check-ins. This ensures clear lifecycle control and prevents silent processing of check-ins on watches that should not be running.

Activate On Check-In

For non-scheduled Missed Action watches, Activate On Check-In allows the first valid check-in to activate the condition automatically and continue processing that same check-in. This is supported for Standard, Condition By Queue, and Condition By Machine watches.
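
The interaction between inactive check-ins and Activate On Check-In can be sketched as below. This is a minimal illustration, not the product's implementation: the `Watch` class, its field names, and the `handle_check_in` function are hypothetical, and the result strings are placeholders rather than real API responses.

```python
from dataclasses import dataclass


@dataclass
class Watch:
    # Hypothetical model of a watch; field names are assumptions.
    active: bool = False
    scheduled: bool = False
    activate_on_check_in: bool = False


def handle_check_in(watch: Watch) -> str:
    """Return 'accepted' or 'rejected' for a check-in on this watch."""
    if watch.active:
        return "accepted"
    # Inactive watch: only a non-scheduled watch with the option enabled
    # may activate itself and keep processing the same check-in.
    if watch.activate_on_check_in and not watch.scheduled:
        watch.active = True
        return "accepted"
    # Default behavior: check-ins on inactive watches are rejected.
    return "rejected"
```

With these assumptions, a default inactive watch rejects the check-in and stays inactive, while a non-scheduled watch with `activate_on_check_in=True` becomes active and accepts it.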

Activate On Deactivation

Activate On Deactivation links one watch to another so that when the source watch deactivates, the target watch is automatically activated. This is useful for workflow handoffs such as dispatcher-to-performer patterns.

The target watch must not have a schedule, and the linked activation fires when the source watch actually becomes inactive. For queue- and machine-based watches, that means all active child conditions for the source watch must be fully deactivated first.
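
The linked-activation rule can be illustrated with a small in-memory model. Everything here is a hypothetical sketch: the `Watch` class, `link_target`, and the idea of tracking child conditions as a set of keys are assumptions made for illustration, not the product's data model.

```python
class Watch:
    """Hypothetical watch whose child conditions are keyed by queue or machine."""

    def __init__(self, name, has_schedule=False):
        self.name = name
        self.has_schedule = has_schedule
        self.active_children = set()  # keys of currently active child conditions
        self.target = None            # watch to activate when this one goes inactive

    @property
    def active(self):
        return bool(self.active_children)

    def link_target(self, target):
        # The target of Activate On Deactivation must not have a schedule.
        if target.has_schedule:
            raise ValueError("target watch must not have a schedule")
        self.target = target

    def activate(self, child="default"):
        self.active_children.add(child)

    def deactivate(self, child="default"):
        was_active = self.active
        self.active_children.discard(child)
        # Fire only when the source actually becomes inactive, that is,
        # when the last active child condition has been deactivated.
        if was_active and not self.active and self.target is not None:
            self.target.activate()
```

In a dispatcher-to-performer handoff, a dispatcher watch with two active queues would activate the performer watch only after the second queue is deactivated.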

Deactivate-By Deadline

Set a deadline on activation after which the watch enters an error state if it has not been deactivated. Supports relative deadlines (e.g., +2h, +30m, +1d) calculated from the activation time, and absolute deadlines using ISO 8601 timestamps. Useful for workflows that must complete within a known time window.
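
As an illustration of how the two deadline forms might be resolved, the sketch below turns a deadline string into a concrete timestamp. The function name `resolve_deadline` and the exact set of accepted units (`m`, `h`, `d`, taken from the examples above) are assumptions; the real service may accept other formats.

```python
import re
from datetime import datetime, timedelta, timezone

# Relative form: a plus sign, a whole number, and a unit (assumed: m, h, d).
_RELATIVE = re.compile(r"^\+(\d+)([mhd])$")
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def resolve_deadline(spec: str, activated_at: datetime) -> datetime:
    """Resolve a deactivate-by deadline string to an absolute datetime."""
    match = _RELATIVE.match(spec)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # Relative deadlines are calculated from the activation time.
        return activated_at + timedelta(**{_UNITS[unit]: amount})
    # Anything else is treated as an absolute ISO 8601 timestamp.
    return datetime.fromisoformat(spec)


activated = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(resolve_deadline("+2h", activated))
print(resolve_deadline("2024-01-01T12:00:00+00:00", activated))
```

Note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward, so the example uses an explicit `+00:00` offset.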

Check-in Periods

Fixed Period

A check-in is required within each fixed-length interval. For example, with a 15-minute period, check-ins are expected by 9:15, 9:30, 9:45, and so on, regardless of when previous check-ins occurred.
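
The fixed-period arithmetic can be sketched as follows. The sketch assumes intervals are anchored at a known start time (here called `anchor`, for example the activation time); the documentation above does not state what the anchor is, so treat that as an assumption. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def next_fixed_deadline(anchor: datetime, period: timedelta, now: datetime) -> datetime:
    """Return the end of the fixed interval that contains `now`."""
    # Number of whole periods that have fully elapsed since the anchor.
    intervals_done = (now - anchor) // period
    # The deadline is the boundary that closes the current interval,
    # independent of when earlier check-ins arrived.
    return anchor + (intervals_done + 1) * period
```

With an anchor of 9:00 and a 15-minute period, a call at 9:07 yields 9:15 and a call at 9:20 yields 9:30, matching the example above.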

Cascading Period

The next check-in deadline is calculated relative to the last received check-in. For example, with a 15-minute period, if a check-in arrives at 9:07, the next is expected by 9:22.
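
A cascading period can be sketched by walking a list of check-in times and moving the deadline forward after each one. The function name is hypothetical, and the sketch assumes the first deadline is one period after activation, which the text above does not specify.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def first_missed_deadline(
    activated_at: datetime, period: timedelta, check_ins: Iterable[datetime]
) -> Optional[datetime]:
    """Return the first deadline a check-in arrived after, or None if all were on time."""
    deadline = activated_at + period  # assumed initial deadline
    for received in sorted(check_ins):
        if received > deadline:
            return deadline
        # Cascading: the next deadline is relative to this check-in.
        deadline = received + period
    return None
```

For a 15-minute period, a check-in at 9:07 moves the deadline to 9:22, and a following check-in at 9:20 moves it to 9:35. A check-in that then arrives at 9:40 has missed the 9:35 deadline.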

Outcome Tracking and Threshold Monitoring

Check-in Outcomes

Each check-in can optionally include an outcome indicating whether the workflow step succeeded or failed. Outcomes feed into threshold monitoring for quality-based alerting beyond simple presence checks.

Threshold Monitoring

When enabled, threshold monitoring evaluates the quality of your workflow's check-ins using a sliding window of recent outcomes. Configure:

Setting	Description
Window Size	Number of recent check-ins to evaluate
Minimum Observations	Check-ins required before evaluation begins
Threshold Percent	Minimum success rate required (e.g., 80%)
Consecutive Failure Limit	Maximum consecutive failures allowed before error

The watch enters an error state when the success rate drops below the configured percentage or when the consecutive failure limit is exceeded. Threshold counters persist across activation and deactivation. Use the Clear Errors action to reset them or, for scheduled watches, enable Reset on Good Standing so that a new occurrence begins with fresh counters when the condition is already healthy.
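
The evaluation described above can be sketched with a bounded window of recent outcomes. The class and method names are hypothetical. The sketch also assumes that Minimum Observations gates only the percentage check, not the consecutive failure check; the documentation does not say which, so verify this against the actual behavior.

```python
from collections import deque
from typing import Set


class ThresholdMonitor:
    """Hypothetical sliding-window evaluator for check-in outcomes."""

    def __init__(self, window_size, min_observations, threshold_percent,
                 consecutive_failure_limit):
        self.outcomes = deque(maxlen=window_size)  # oldest outcomes drop off
        self.min_observations = min_observations
        self.threshold_percent = threshold_percent
        self.consecutive_failure_limit = consecutive_failure_limit
        self.consecutive_failures = 0

    def record(self, success: bool) -> Set[str]:
        """Record one outcome and return the active threshold error reasons."""
        self.outcomes.append(success)
        # A success resets the streak; a failure extends it.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

        reasons = set()
        if len(self.outcomes) >= self.min_observations:
            rate = 100.0 * sum(self.outcomes) / len(self.outcomes)
            if rate < self.threshold_percent:
                reasons.add("Threshold Percent")
        if self.consecutive_failures > self.consecutive_failure_limit:
            reasons.add("Threshold Consecutive")
        return reasons
```

A monitor built as `ThresholdMonitor(5, 3, 80.0, 2)` reports no reasons for the first two check-ins, because fewer than three observations exist, and starts evaluating the success rate from the third.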

Multi-Machine and Multi-Queue Monitoring

Standard Multi-Machine

Multiple machines can check in to the same watch. The watch tracks which machines are active and applies check-in requirements to the watch as a whole. A single expected check-in time is maintained; any machine checking in satisfies it.

Optional Expected Machine Count support can require a minimum number of machines to report before the watch clears its overdue condition.
If the watch becomes overdue before enough machines have reported, it stays in error with a "Not all expected machines reported" reason until the configured count is met.
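
The Expected Machine Count rule for a standard multi-machine watch can be sketched as a single predicate. The function name and the use of machine name strings are assumptions made for illustration.

```python
from typing import Iterable, Optional


def overdue_can_clear(
    reported_machines: Iterable[str],
    expected_machine_count: Optional[int] = None,
) -> bool:
    """Decide whether the reported machines are enough to clear an overdue watch."""
    distinct = set(reported_machines)  # repeated check-ins from one machine count once
    # Without an expected count, any single machine checking in is enough.
    required = expected_machine_count if expected_machine_count is not None else 1
    return len(distinct) >= required
```

With an expected count of 2, two check-ins from the same machine are not enough under this sketch; a second distinct machine must report.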

Condition By Machine

When Condition By Machine is enabled, each machine is monitored independently with its own state, check-in deadline, and error tracking. The overall watch state is an aggregate: Error if any machine is in Error, Relax if all machines are in Relax.

Each machine has its own watch condition with independent state tracking.
Machines register automatically on their first check-in. On non-scheduled watches with Activate On Check-In enabled, that same request can also activate the placeholder condition.
Optional Expected Machine Count support keeps the placeholder condition in error until the configured number of machines have checked in.
Ideal for workflows where each machine must independently meet check-in requirements.
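
The aggregation rule for Condition By Machine can be written in a few lines. The `State` enum and function name are hypothetical, and treating a watch with no registered machines as Relax is an assumption.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    RELAX = "Relax"
    ERROR = "Error"


def aggregate_state(machine_states: Dict[str, State]) -> State:
    """Overall watch state: Error if any machine is in Error, otherwise Relax."""
    if any(state is State.ERROR for state in machine_states.values()):
        return State.ERROR
    return State.RELAX
```

One machine in Error is enough to put the whole watch in Error, even when every other machine is healthy.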

Condition By Queue

Allows a single watch to monitor a workflow that operates across multiple queues. Each queue is tracked independently with its own state and check-in deadlines. Useful when the same workflow is reused for different queues, avoiding the need for separate watches per queue. Also supports multi-machine workflows within each queue.

Error Reasons

Watches track multiple independent error reasons simultaneously. A watch condition is in an error state when one or more reasons are active:

Error Reason	Trigger	Cleared When
Overdue	Check-in not received before deadline	A check-in is received
Threshold Percent	Success rate below configured percentage	Success rate recovers to the configured percentage or above
Threshold Consecutive	Consecutive failures exceed limit	A successful check-in resets the streak
Expected Machines	Fewer than the configured number of machines have reported before the condition becomes overdue	The configured number of machines report, or Clear Errors is used
Deactivate-By Deadline	Deactivation deadline passed without deactivation	Clear Errors action is used
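
Because reasons are tracked independently, a condition's state can be modeled as derived from a set of active reasons. The sketch below is illustrative only; the class and method names are hypothetical, while the reason names come from the table above.

```python
from enum import Enum


class ErrorReason(Enum):
    OVERDUE = "Overdue"
    THRESHOLD_PERCENT = "Threshold Percent"
    THRESHOLD_CONSECUTIVE = "Threshold Consecutive"
    EXPECTED_MACHINES = "Expected Machines"
    DEACTIVATE_BY_DEADLINE = "Deactivate-By Deadline"


class WatchCondition:
    """Hypothetical condition whose state is derived from its active reasons."""

    def __init__(self):
        self.reasons = set()

    @property
    def state(self) -> str:
        # Error while one or more reasons are active, Relax otherwise.
        return "Error" if self.reasons else "Relax"

    def raise_reason(self, reason: ErrorReason) -> None:
        self.reasons.add(reason)

    def resolve_reason(self, reason: ErrorReason) -> None:
        self.reasons.discard(reason)
```

Resolving one reason leaves the condition in Error as long as another reason is still active.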

Error State Persistence

Error states persist through deactivation and activation cycles. If a watch is deactivated while in Error, it remains in Error when reactivated. This gives support teams visibility into whether issues were actually resolved.

Clear Errors

An explicit action to reset a watch condition to a clean state: Relax state, no error reasons, zeroed threshold counters. Available on both active and inactive conditions.
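
The difference between deactivating a condition and clearing its errors can be sketched as follows. The `Condition` class and its field names are hypothetical; only the behavior (errors survive deactivation, Clear Errors resets state and counters without changing activation) is taken from this page.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Condition:
    active: bool = True
    error_reasons: set = field(default_factory=set)
    overdue_at: Optional[datetime] = None
    consecutive_failures: int = 0
    outcomes: list = field(default_factory=list)

    @property
    def state(self) -> str:
        return "Error" if self.error_reasons else "Relax"

    def deactivate(self) -> None:
        self.active = False
        self.overdue_at = None  # timing fields are cleared
        # error_reasons are intentionally left untouched

    def activate(self) -> None:
        self.active = True      # any existing error reasons carry over

    def clear_errors(self) -> None:
        # Works on active and inactive conditions; does not change `active`.
        self.error_reasons.clear()
        self.consecutive_failures = 0
        self.outcomes.clear()
```

A condition that is deactivated while in Error is still in Error after reactivation, and only `clear_errors()` returns it to Relax in this sketch.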

Reset on Good Standing

For scheduled watches, when enabled, threshold counters reset at the start of a new occurrence only if the condition is already in good standing (Relax state, no error reasons). Normal successful check-ins still use the rolling threshold window and only reset the consecutive failure streak.

Alerts and Notifications

Automation Watchdog sends email notifications to configured mailing lists when a watch enters or exits an error state. Alert messages can be customized per watch to provide context about what the check-in represents.

Handling Errors and Alerts

Deactivating a watch does not clear its errors. The watch becomes inactive and timing fields such as Overdue At are cleared, but unresolved error reasons remain visible.
Clear Errors explicitly resets the condition to Relax, removes all error reasons, and resets threshold counters without changing whether the watch is active or inactive.
For active watches, errors normally clear through the expected recovery event, such as a check-in that arrives after an overdue deadline or a successful check-in that restores good standing. In those cases, the system automatically returns the condition to Relax.
If a watch is deactivated while in Error, manual Clear Errors is still required even if the underlying workflow later runs successfully. Once the watch is inactive, monitoring for that time period has been intentionally stopped, so a human acknowledgment is required before the condition is considered reset.
If a watch remains in Error but the set of active error reasons changes, Automation Watchdog sends an updated error email so teams can see what was added or resolved.
When all error reasons clear through normal check-ins, a relax email is sent.
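
The email rules above can be summarized as a function of the error reasons before and after a check-in. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and it models only check-in driven transitions, not what happens when Clear Errors is used.

```python
from typing import FrozenSet, Optional


def email_for_transition(
    before: FrozenSet[str], after: FrozenSet[str]
) -> Optional[str]:
    """Pick which email, if any, a change in active error reasons would trigger."""
    if not before and after:
        return "error"          # condition entered an error state
    if before and not after:
        return "relax"          # all error reasons cleared
    if before and after and before != after:
        return "updated error"  # still in Error, but the reasons changed
    return None                 # nothing changed, no email
```

For example, moving from `{"Overdue"}` to `{"Overdue", "Threshold Percent"}` yields an updated error email, so the team can see what was added.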

Security

API access is controlled through organization-scoped API Tokens
Role-based access control for team members within organizations
All communication secured with HTTPS
See Privacy and Security for details
