Incident Conditions
Define the conditions that determine when incidents are created. Incident conditions evaluate healthcheck results and create incidents when real problems are detected.
Overview
Incident conditions are the rules that tell upti.my when to create an incident. Every healthcheck can have one or more conditions attached to it. When a condition is met, upti.my automatically opens a new incident and records the affected healthcheck, timestamp, and severity.
Conditions focus on one thing: deciding whether a problem is real enough to warrant an incident. They do not handle notifications, enrichment, or routing. That part is handled by Workflows, where you configure destinations, message formatting, escalation chains, and everything else using a visual drag-and-drop builder.
ℹ️ Conditions Create. Workflows Notify.
Think of it this way: incident conditions decide when an incident is created. Workflows decide what happens next. This separation keeps your setup clean. You define detection logic in conditions and notification logic in workflows.
Common Settings
All condition types share the following configurable setting:
| Setting | Description |
|---|---|
| Severity | The severity assigned to incidents created by this condition: critical, high, medium, or low. Workflows can use severity to route notifications to the right channels. |
Condition Types
1. Consecutive Failures
The most straightforward condition. It creates an incident when a healthcheck fails a specified number of consecutive times. This is the default condition type and works well for clear-cut "is it up or down" monitoring.
In the dashboard, select Consecutive Failures as the condition type, then set the number of consecutive failures required before an incident is created. The default is 3 consecutive failures.
💡 Start Conservative
A threshold of 3 consecutive failures is a good starting point for most services. It filters out single transient errors while still detecting real outages quickly.
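The counting logic behind this condition can be sketched in a few lines. This is an illustration only, not upti.my's actual implementation; the class and parameter names are invented for the example:

```python
class ConsecutiveFailures:
    """Open an incident after N consecutive failed checks.
    Illustrative sketch; names are invented for this example."""

    def __init__(self, threshold: int = 3, severity: str = "high"):
        self.threshold = threshold  # 3 is the documented default
        self.severity = severity
        self.streak = 0

    def evaluate(self, check_passed: bool) -> bool:
        """Return True exactly when a new incident should be created."""
        if check_passed:
            self.streak = 0  # any success resets the count
            return False
        self.streak += 1
        # Fire only when the streak first reaches the threshold,
        # so one sustained outage opens one incident, not many.
        return self.streak == self.threshold

cond = ConsecutiveFailures(threshold=3)
fired = [cond.evaluate(ok) for ok in [True, False, False, False, False]]
# Only the third consecutive failure opens an incident; the fourth does not.
```

Note the reset on success: a pass-fail-pass-fail pattern never triggers this condition, which is exactly why it filters transient errors well.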
2. Sliding Window
Creates an incident based on the percentage of failures within a rolling time window. This is useful for services with occasional transient failures: you might tolerate a 10% failure rate but want an incident once it reaches 50%.
Configure the failure percentage threshold and the time window (in minutes) in the dashboard. For example, set 50% failures within a 5-minute window to catch sustained degradation while ignoring occasional blips.
ℹ️ Window Size Matters
Shorter windows (1 to 5 minutes) detect issues faster but may create incidents from transient failures. Longer windows (10 to 30 minutes) are more stable but slower to react. Match the window size to the criticality of the service.
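The evaluation can be sketched as follows, using the documented example of 50% failures within a 5-minute window. This is an illustration with invented names, not upti.my's actual implementation:

```python
from collections import deque

class SlidingWindowCondition:
    """Fire when the failure rate inside a rolling time window reaches a
    percentage threshold. Sketch only; names and signatures are invented."""

    def __init__(self, failure_pct: float = 50.0, window_minutes: float = 5.0):
        self.failure_pct = failure_pct
        self.window_secs = window_minutes * 60
        self.results = deque()  # (timestamp, passed) pairs, oldest first

    def evaluate(self, passed: bool, now: float) -> bool:
        self.results.append((now, passed))
        # Discard results that have aged out of the window.
        while now - self.results[0][0] > self.window_secs:
            self.results.popleft()
        failures = sum(1 for _, ok in self.results if not ok)
        return 100.0 * failures / len(self.results) >= self.failure_pct

# One check per minute, timestamps in seconds. Three straight failures push
# the 5-minute failure rate to 50%, which meets the threshold.
cond = SlidingWindowCondition(failure_pct=50.0, window_minutes=5.0)
checks = [(0, True), (60, True), (120, True), (180, False), (240, False), (300, False)]
fired = [cond.evaluate(ok, t) for t, ok in checks]
```

Because the rate is computed over everything still in the window, a single failure early in the window is diluted by surrounding successes rather than triggering immediately.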
3. Latency
Creates an incident when the response time of a healthcheck exceeds a specified threshold. This is useful for detecting performance degradation before a service goes fully down. You define a latency threshold in milliseconds, and an incident is created when the response time crosses that boundary.
In the dashboard, select Latency as the condition type and enter the maximum acceptable response time in milliseconds. For example, set a 2000ms threshold for an API that should respond within 2 seconds.
4. Average Latency
Similar to the Latency condition, but evaluates the average response time over a rolling window instead of individual check results. This smooths out occasional spikes and only creates an incident when the overall performance trend exceeds the threshold.
Configure the latency threshold in milliseconds and the time window over which to calculate the average. This is ideal for services where occasional slow responses are acceptable, but sustained high latency is not.
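The difference between the two latency conditions comes down to averaging. The plain Latency condition is the single-sample case (`response_ms > threshold_ms`); Average Latency smooths over a window first. A sketch with invented names, using a fixed sample count where upti.my configures a time window, to keep the example simple:

```python
from collections import deque

class AverageLatency:
    """Fire when the mean response time over the last `window` samples
    exceeds a millisecond threshold. Illustrative sketch only."""

    def __init__(self, threshold_ms: float = 2000.0, window: int = 4):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # oldest samples drop off automatically

    def evaluate(self, response_ms: float) -> bool:
        self.samples.append(response_ms)
        if len(self.samples) < self.samples.maxlen:
            return False  # wait for a full window before judging the trend
        return sum(self.samples) / len(self.samples) > self.threshold_ms

cond = AverageLatency(threshold_ms=2000, window=4)
latencies = [1000, 4000, 1000, 1000, 1000, 2500, 2500, 2500, 2500]
fired = [cond.evaluate(ms) for ms in latencies]
# The lone 4000 ms spike is averaged away; sustained 2500 ms responses fire.
```

A plain Latency condition with the same 2000 ms threshold would have fired on the 4000 ms spike; the averaged version waits for a sustained trend.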
💡 Pair with Severity-Based Routing
Use a high severity Consecutive Failures condition for total outages and a medium severity Latency condition for performance degradation. Then configure your workflows to route each severity to the appropriate channel.
SLA Tracking
Incident conditions also support SLA target percentages and SLA time windows. Set your uptime target (e.g., 99.9%) and the evaluation period in days. upti.my tracks your actual uptime against the target and surfaces it in your dashboard metrics.
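The SLA arithmetic itself is simple. A sketch under the assumption that uptime is the fraction of passing checks in the evaluation period (function names are invented; upti.my computes this for you from healthcheck history):

```python
def sla_status(passed_checks: int, total_checks: int, target_pct: float = 99.9):
    """Return (observed uptime %, whether the SLA target is still met)."""
    uptime = 100.0 * passed_checks / total_checks
    return uptime, uptime >= target_pct

# One check per minute over a 30-day evaluation period:
total = 30 * 24 * 60                       # 43,200 checks
_, ok_43 = sla_status(total - 43, total)   # ~99.9005% uptime, target met
_, ok_44 = sla_status(total - 44, total)   # ~99.8981% uptime, target breached
```

Put concretely: at one check per minute, a 99.9% target over 30 days tolerates about 43 failed checks, i.e. roughly 43 minutes of detected downtime.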
What Happens After an Incident Is Created?
Once a condition creates an incident, the incident enters the incident lifecycle (Ongoing, Investigating, Identified, Monitoring, Resolved). From there, Workflows take over.
In the workflow builder, you configure everything that happens after detection:
- Destinations - where notifications go (Slack, Discord, Email, Teams, PagerDuty, WhatsApp, SMS, webhooks, and status pages)
- Enrichment - automatically attach application context or AI-powered analysis to your alerts
- Routing - use Filter nodes to send critical incidents to PagerDuty and lower-severity alerts to Slack
- Escalation chains - add delays between notification stages so your team has time to respond
- Deduplication - prevent notification floods during major outages
This separation means you can change how you get notified without touching your detection logic, and vice versa.