Features
Automation Watchdog provides comprehensive monitoring for automated workflows, ensuring they run as expected and alerting your team when something goes wrong.
Core Monitoring
Missed Action Watch
The primary monitoring mode. A Missed Action watch expects regular check-ins from your workflow. If a check-in is not received within the configured period, the watch enters an error state and alerts your team.
Notify Only Watch
A lightweight notification relay. A Notify Only watch sends a notification whenever a check-in is received, without monitoring for missed check-ins.
Flexible Scheduling and Activation
Schedule-Based Activation
Define recurring schedules for when your watch should be active. Supports configurable frequency, interval, days of the week, hours, minutes, and duration. Watches automatically activate and deactivate according to the schedule.
Event-Based Activation
Activate and deactivate watches dynamically through API calls. Ideal for workflows triggered by external events, such as a dispatcher loading work into a queue and activating the performer's watch.
Inactive Check-ins
By default, check-ins on inactive watches are rejected. Watches must be activated before they accept check-ins. This ensures clear lifecycle control and prevents silent processing of check-ins on watches that should not be running.
Activate On Check-In
For non-scheduled Missed Action watches, Activate On Check-In allows the first valid check-in to activate the condition automatically and continue processing that same check-in. This is supported for Standard, Condition By Queue, and Condition By Machine watches.
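The acceptance rules for check-ins on inactive watches can be sketched as follows. This is an illustrative model only: the `Watch` class, its field names, and `accept_check_in` are hypothetical and are not part of the Automation Watchdog API.

```python
from dataclasses import dataclass


@dataclass
class Watch:
    # Hypothetical in-memory model of a Missed Action watch.
    active: bool = False
    scheduled: bool = False
    activate_on_check_in: bool = False


def accept_check_in(watch: Watch) -> bool:
    """Return True if the check-in is processed, False if it is rejected."""
    if watch.active:
        return True
    # Activate On Check-In applies only to non-scheduled watches.
    if watch.activate_on_check_in and not watch.scheduled:
        watch.active = True  # the first valid check-in activates the watch
        return True          # and that same check-in is then processed
    return False             # default: inactive watches reject check-ins
```

In this sketch a default watch rejects the check-in, while a non-scheduled watch with `activate_on_check_in=True` becomes active and processes it.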
Activate On Deactivation
Activate On Deactivation links one watch to another so that when the source watch deactivates, the target watch is automatically activated. This is useful for workflow handoffs such as dispatcher-to-performer patterns.
The target watch must not have a schedule, and the linked activation fires when the source watch actually becomes inactive. For queue- and machine-based watches, that means all active child conditions for the source watch must be fully deactivated first.
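For queue- and machine-based source watches, the "fully deactivated" rule might be modeled like this. The function name and the use of a set of child identifiers are assumptions made for illustration; the product's internal representation is not documented here.

```python
def deactivate_child(active_children: set, child: str) -> bool:
    """Deactivate one child condition of the source watch.

    Returns True only when no active child conditions remain, which is
    the point at which the linked target watch would be activated.
    """
    active_children.discard(child)
    return not active_children


# Hypothetical source watch with two active queue conditions.
children = {"queue-a", "queue-b"}
fires_after_first = deactivate_child(children, "queue-a")   # one child still active
fires_after_second = deactivate_child(children, "queue-b")  # source is now inactive
```

Here the link does not fire after the first deactivation, because `queue-b` is still active, and fires after the second.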
Deactivate-By Deadline
Set a deadline on activation after which the watch enters an error state if it has not been deactivated. Supports relative deadlines (e.g., +2h, +30m, +1d) calculated from the activation time, and absolute deadlines using ISO 8601 timestamps. Useful for workflows that must complete within a known time window.
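As a rough illustration of how the two deadline forms resolve to a concrete time, the sketch below parses the relative examples shown above (`+2h`, `+30m`, `+1d`) and falls back to an ISO 8601 timestamp. The function name `resolve_deadline` is hypothetical, and the accepted grammar is assumed from the examples rather than taken from the API reference.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes, based only on the examples in the text.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def resolve_deadline(spec: str, activated_at: datetime) -> datetime:
    """Turn a relative or absolute deadline spec into a datetime."""
    match = re.fullmatch(r"\+(\d+)([mhd])", spec)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        # Relative deadlines are calculated from the activation time.
        return activated_at + timedelta(**{_UNITS[unit]: amount})
    # Otherwise treat the spec as an absolute ISO 8601 timestamp.
    return datetime.fromisoformat(spec)


activated = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
relative = resolve_deadline("+2h", activated)
absolute = resolve_deadline("2024-01-01T12:00:00+00:00", activated)
```

With an activation at 09:00 UTC, `+2h` resolves to 11:00 UTC, while the absolute form is used as given.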
Check-in Periods
Fixed Period
A check-in is required within each fixed-length interval. For example, with a 15-minute period, check-ins are expected by 9:15, 9:30, 9:45, and so on, regardless of when previous check-ins occurred.
Cascading Period
The next check-in deadline is calculated relative to the last received check-in. For example, with a 15-minute period, if a check-in arrives at 9:07, the next is expected by 9:22.
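The difference between the two period types comes down to what the next deadline is measured from. The sketch below reproduces the two examples above; the function names are hypothetical, and anchoring the fixed grid at a known start time (here 9:00) is an assumption.

```python
from datetime import datetime, timedelta


def next_fixed_deadline(anchor: datetime, period: timedelta, now: datetime) -> datetime:
    """Next boundary on the fixed grid after `now`, regardless of past check-ins."""
    elapsed_periods = (now - anchor) // period  # whole periods since the anchor
    return anchor + (elapsed_periods + 1) * period


def next_cascading_deadline(last_check_in: datetime, period: timedelta) -> datetime:
    """Next deadline measured from the most recent check-in."""
    return last_check_in + period


period = timedelta(minutes=15)
anchor = datetime(2024, 1, 1, 9, 0)
check_in = datetime(2024, 1, 1, 9, 7)

fixed = next_fixed_deadline(anchor, period, check_in)  # 9:15
cascading = next_cascading_deadline(check_in, period)  # 9:22
```

A check-in at 9:07 leaves the fixed deadline at 9:15 but moves the cascading deadline to 9:22.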
Outcome Tracking and Threshold Monitoring
Check-in Outcomes
Each check-in can optionally include an outcome indicating whether the workflow step succeeded or failed. Outcomes feed into threshold monitoring for quality-based alerting beyond simple presence checks.
Threshold Monitoring
When enabled, threshold monitoring evaluates the quality of your workflow's check-ins using a sliding window of recent outcomes. Configure:
| Setting | Description |
|---|---|
| Window Size | Number of recent check-ins to evaluate |
| Minimum Observations | Check-ins required before evaluation begins |
| Threshold Percent | Minimum success rate required (e.g., 80%) |
| Consecutive Failure Limit | Maximum consecutive failures allowed before error |
The watch enters an error state when the success rate drops below the configured percentage or when the consecutive failure limit is exceeded. Threshold counters persist across activation and deactivation. To reset them, use the Clear Errors action or, for scheduled watches, enable Reset on Good Standing so that a new occurrence begins with fresh counters when the condition is already healthy.
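To make the interaction of the four settings concrete, here is a minimal sliding-window sketch. The class `ThresholdMonitor` and its method names are hypothetical. It also assumes that Minimum Observations gates only the percentage check and that "exceeded" means strictly greater than the limit; the product may differ on both points.

```python
from collections import deque


class ThresholdMonitor:
    """Illustrative model of threshold evaluation over recent outcomes."""

    def __init__(self, window_size, min_observations,
                 threshold_percent, consecutive_failure_limit):
        self.min_observations = min_observations
        self.threshold_percent = threshold_percent
        self.consecutive_failure_limit = consecutive_failure_limit
        self.outcomes = deque(maxlen=window_size)  # oldest outcomes drop off
        self.consecutive_failures = 0

    def record(self, success: bool) -> set:
        """Record one check-in outcome and return the active threshold reasons."""
        self.outcomes.append(success)
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

        reasons = set()
        if len(self.outcomes) >= self.min_observations:
            rate = 100 * sum(self.outcomes) / len(self.outcomes)
            if rate < self.threshold_percent:
                reasons.add("Threshold Percent")
        if self.consecutive_failures > self.consecutive_failure_limit:
            reasons.add("Threshold Consecutive")
        return reasons


monitor = ThresholdMonitor(window_size=5, min_observations=3,
                           threshold_percent=80, consecutive_failure_limit=2)
history = [monitor.record(ok) for ok in (True, True, False, False, False, True)]
```

In this run the percentage reason appears at the third check-in, once enough observations exist, and the consecutive reason appears only at the fifth, after three failures in a row. The final success resets the streak, but the window is still below 80%, so the percentage reason remains.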
Multi-Machine and Multi-Queue Monitoring
Standard Multi-Machine
Multiple machines can check in to the same watch. The watch tracks which machines are active and applies check-in requirements to the watch as a whole. A single expected check-in time is maintained; any machine checking in satisfies it.
- Optional Expected Machine Count support can require a minimum number of machines to report before the watch clears its overdue condition
- If the watch becomes overdue before enough machines have reported, it stays in error with a "Not all expected machines reported" reason until the configured count is met
Condition By Machine
When Condition By Machine is enabled, each machine is monitored independently with its own state, check-in deadline, and error tracking. The overall watch state is an aggregation: Error if any machine is in Error, Relax only when every machine is in Relax.
- Each machine has its own watch condition with independent state tracking
- Machines register automatically on their first check-in; on non-scheduled watches with Activate On Check-In enabled, that same request can also activate the placeholder condition
- Optional Expected Machine Count support keeps the placeholder condition in error until the configured number of machines have checked in
- Ideal for workflows where each machine must independently meet check-in requirements
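The aggregation rule for Condition By Machine can be written in a few lines. The function name and the dictionary of per-machine states are hypothetical, and treating a watch with no registered machines as Relax is an assumption.

```python
def aggregate_state(machine_states: dict) -> str:
    """Overall watch state: Error if any machine is in Error, otherwise Relax."""
    if any(state == "Error" for state in machine_states.values()):
        return "Error"
    return "Relax"


# Hypothetical machine names for illustration.
overall = aggregate_state({"bot-01": "Relax", "bot-02": "Error"})
```

One machine in Error is enough to put the whole watch in Error, even when every other machine is healthy.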
Condition By Queue
Allows a single watch to monitor a workflow that operates across multiple queues. Each queue is tracked independently with its own state and check-in deadlines. Useful when the same workflow is reused for different queues, avoiding the need for separate watches per queue. Also supports multi-machine workflows within each queue.
Error Reasons
Watches track multiple independent error reasons simultaneously. A watch condition is in an error state when one or more reasons are active:
| Error Reason | Trigger | Cleared When |
|---|---|---|
| Overdue | Check-in not received before deadline | A check-in is received |
| Threshold Percent | Success rate below configured percentage | Success rate recovers above threshold |
| Threshold Consecutive | Consecutive failures exceed limit | A successful check-in resets the streak |
| Expected Machines | Fewer than the configured number of machines have reported before the condition becomes overdue | The configured number of machines report, or Clear Errors is used |
| Deactivate-By Deadline | Deactivation deadline passed without deactivation | Clear Errors action is used |
Error State Persistence
Error states persist through deactivation and activation cycles. If a watch is deactivated while in Error, it remains in Error when reactivated. This gives support teams visibility into whether issues were actually resolved.
Clear Errors
An explicit action to reset a watch condition to a clean state: Relax state, no error reasons, zeroed threshold counters. Available on both active and inactive conditions.
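The relationship between error reasons, persistence, and Clear Errors can be modeled as a set of active reasons that only specific events may empty. The `Condition` class below is a hypothetical sketch, not the product's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    active: bool = True
    error_reasons: set = field(default_factory=set)
    consecutive_failures: int = 0

    @property
    def state(self) -> str:
        # The condition is in Error while one or more reasons are active.
        return "Error" if self.error_reasons else "Relax"

    def deactivate(self) -> None:
        self.active = False  # error reasons are deliberately kept

    def activate(self) -> None:
        self.active = True   # reactivation does not clear errors either

    def clear_errors(self) -> None:
        # Works on active and inactive conditions; does not change `active`.
        self.error_reasons.clear()
        self.consecutive_failures = 0


condition = Condition()
condition.error_reasons.add("Overdue")
condition.deactivate()
condition.activate()
state_after_cycle = condition.state  # still "Error"
condition.clear_errors()
state_after_clear = condition.state  # "Relax"
```

The deactivate and activate cycle leaves the condition in Error; only the explicit `clear_errors()` call returns it to Relax.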
Reset on Good Standing
For scheduled watches, when enabled, threshold counters reset at the start of a new occurrence only if the condition is already in good standing (Relax state, no error reasons). Normal successful check-ins still use the rolling threshold window and only reset the consecutive failure streak.
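A sketch of the reset decision at the start of a scheduled occurrence follows. The function name, parameters, and counter keys are hypothetical and chosen for illustration.

```python
def counters_for_new_occurrence(state: str, error_reasons: set,
                                counters: dict,
                                reset_on_good_standing: bool) -> dict:
    """Return the threshold counters a new scheduled occurrence starts with."""
    in_good_standing = state == "Relax" and not error_reasons
    if reset_on_good_standing and in_good_standing:
        return {name: 0 for name in counters}  # fresh counters
    return dict(counters)                      # carry counters over unchanged


carried = {"observations": 12, "failures": 2}
healthy = counters_for_new_occurrence("Relax", set(), carried, True)
unhealthy = counters_for_new_occurrence("Error", {"Overdue"}, carried, True)
```

The healthy condition starts the occurrence with zeroed counters, while the condition that is still in Error keeps its existing counts.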
Alerts and Notifications
Automation Watchdog sends email notifications to configured mailing lists when a watch enters or exits an error state. Alert messages can be customized per watch to provide context about what the check-in represents.
Handling Errors and Alerts
- Deactivating a watch does not clear its errors. The watch becomes inactive and timing fields such as Overdue At are cleared, but unresolved error reasons remain visible.
- Clear Errors explicitly resets the condition to Relax, removes all error reasons, and resets threshold counters without changing whether the watch is active or inactive.
- For active watches, errors normally clear through the expected recovery event such as a check-in or a successful check-in that restores good standing. In those cases, the system automatically returns the condition to Relax.
- If a watch is deactivated while in Error, manual Clear Errors is still required even if the underlying workflow later runs successfully. Once the watch is inactive, monitoring for that time period has been intentionally stopped, so a human acknowledgment is required before the condition is considered reset.
- If a watch remains in Error but the set of active error reasons changes, Automation Watchdog sends an updated error email so teams can see what was added or resolved.
- When all error reasons clear through normal check-ins, a relax email is sent.
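The email rules in the list above amount to comparing the set of error reasons before and after an event. The sketch below is one way to express that comparison; `plan_email` and the shape of its return value are hypothetical.

```python
from typing import Optional


def plan_email(previous: set, current: set) -> Optional[dict]:
    """Decide which email, if any, a change in error reasons would trigger."""
    if previous == current:
        return None  # nothing changed, nothing to send
    if not current:
        # All reasons cleared: a relax email.
        return {"kind": "relax", "added": [], "resolved": sorted(previous)}
    # Entering Error, or staying in Error with a changed set of reasons.
    return {
        "kind": "error",
        "added": sorted(current - previous),
        "resolved": sorted(previous - current),
    }


update = plan_email({"Overdue"}, {"Overdue", "Threshold Percent"})
recovered = plan_email({"Overdue"}, set())
```

In this example `update` is an error email listing `Threshold Percent` as newly added, and `recovered` is a relax email. The sketch covers only the comparison itself; it does not model the rule that a watch deactivated while in Error needs a manual Clear Errors.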
Security
- API access is controlled through organization-scoped API Tokens
- Role-based access control for team members within organizations
- All communication secured with HTTPS
- See Privacy and Security for details