Monitoring Agents
The uptimy agent is open source and runs directly on your infrastructure to execute checks, track incidents, and perform deterministic remediation.
Overview
The agent is open source and available at github.com/uptimy/uptimy-agent. It runs locally on your infrastructure (bare metal, VMs, Docker, or Kubernetes) and can operate fully standalone.
Once configured, it executes checks on a schedule, creates deduplicated incidents, and runs operator-defined repair recipes. Control-plane connectivity is optional.
Installation Options
The open-source repository supports these installation paths:
- One-line installer (Linux/macOS) via scripts/install.sh
- Docker using uptimy/agent:latest
- Kubernetes DaemonSet using deploy/kubernetes/daemonset.yaml
- Build from source with make build
curl -sSfL https://raw.githubusercontent.com/uptimy/uptimy-agent/master/scripts/install.sh | sudo bash
sudo vi /etc/uptimy/config.yaml
sudo systemctl enable --now uptimy-agent

kubectl apply -f deploy/kubernetes/daemonset.yaml
kubectl -n uptimy edit configmap uptimy-agent-config
kubectl -n uptimy rollout restart daemonset/uptimy-agent

How Agents Work
The agent runs directly where your services run and continuously:
- Executes health checks from the config
- Creates and tracks incidents with deduplication and lifecycle management
- Runs deterministic repair recipes with retries, branching, and verification
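To make the deduplication and lifecycle idea concrete, here is a minimal Python sketch. It is not the agent's actual implementation: the `Incident` and `IncidentTracker` names and their fields are hypothetical, and the only behavior assumed from the text above is that repeated failures of the same check fold into one open incident that closes on recovery.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Incident:
    """One open or resolved incident for a single check (hypothetical shape)."""
    check_name: str
    opened_at: float
    failures: int = 1
    resolved_at: Optional[float] = None


class IncidentTracker:
    """Keeps at most one open incident per check name."""

    def __init__(self) -> None:
        self.open: Dict[str, Incident] = {}
        self.closed: List[Incident] = []

    def record(self, check_name: str, healthy: bool,
               now: Optional[float] = None) -> Optional[Incident]:
        now = time.time() if now is None else now
        incident = self.open.get(check_name)
        if not healthy:
            if incident is None:
                # First failure for this check: open a new incident.
                incident = Incident(check_name, opened_at=now)
                self.open[check_name] = incident
            else:
                # Repeat failure: fold into the existing incident.
                incident.failures += 1
            return incident
        if incident is not None:
            # The check recovered: resolve and archive the incident.
            incident.resolved_at = now
            self.closed.append(self.open.pop(check_name))
        return incident
```

Calling `record("api", False)` twice returns the same `Incident` object with `failures == 2`, and a later `record("api", True)` moves it to `closed`.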
ℹ️ Standalone by Default
Checks, incident detection, and repairs do not require cloud connectivity. If a control plane is configured, telemetry can be streamed over gRPC for fleet visibility and offline-agent alerts.
Local System Checks
The open-source agent currently supports these check types:
| Check Type | Description |
|---|---|
| HTTP | HTTP/HTTPS endpoint checks |
| TCP | TCP port connectivity checks |
| Process | System process alive checks |
| Docker Container | Docker container status checks |
| Docker Swarm | Docker Swarm node/service health checks |
| Disk | Disk usage threshold checks |
| Memory | Memory usage threshold checks |
| CPU | CPU usage threshold checks |
| Certificate | TLS certificate expiry checks |
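As a rough illustration of what two of these check types evaluate, the following Python sketch implements a TCP connectivity check and a disk usage threshold check with the standard library only. The function names, parameters, and defaults are assumptions made for this example and do not reflect the agent's configuration schema or internals.

```python
import shutil
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


def disk_check(path: str, max_used_percent: float) -> bool:
    """Return True while the filesystem holding `path` is at or below the threshold."""
    usage = shutil.disk_usage(path)
    used_percent = usage.used / usage.total * 100
    return used_percent <= max_used_percent
```

For example, `disk_check("/", 90.0)` returns `False` once the root filesystem is more than 90% full, which is the point at which a threshold check of this kind would report a failure.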
Self-Healing Recovery Actions
Repairs are configured as recipes and linked to specific checks through repairs rules. The agent supports 12 recovery actions plus 3 recipe orchestration steps (wait, healthcheck, webhook). Safety guardrails block forbidden actions.
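The interplay of retries, verification, and guardrails can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the agent's recipe engine: `run_recipe`, `FORBIDDEN_ACTIONS`, and the `reboot_host` action name are all hypothetical.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical guardrail list; the agent's real forbidden actions are not shown here.
FORBIDDEN_ACTIONS = {"reboot_host"}


class ForbiddenActionError(Exception):
    """Raised when a recipe contains an action blocked by the guardrails."""


def run_recipe(
    steps: List[Tuple[str, Callable[[], None]]],
    verify: Callable[[], bool],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> bool:
    # Guardrail: reject the whole recipe before any step runs.
    for name, _ in steps:
        if name in FORBIDDEN_ACTIONS:
            raise ForbiddenActionError(name)

    for attempt in range(1, max_attempts + 1):
        for _, action in steps:
            action()
        # Verification: the repair only counts if the check passes afterwards.
        if verify():
            return True
        if attempt < max_attempts:
            # Linear backoff between attempts.
            time.sleep(backoff_seconds * attempt)
    return False
```

Rejecting a forbidden action up front, rather than when the step is reached, means a recipe never runs partially before being blocked. That ordering is a design choice of this sketch.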
💡 Combine Checks with Recovery
Start in observe mode (checks only), move to alert mode (webhook-only recipes), and then enable full remediation recipes as you gain confidence.
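One way to picture this staged rollout is as a filter over recipe steps. The Python sketch below assumes a hypothetical `Mode` enum and an `allowed_steps` helper. The agent may instead express these stages purely through which recipes you configure, and `restart_service` is a made-up action name.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    OBSERVE = "observe"      # checks only, no recipe steps run
    ALERT = "alert"          # only webhook steps run
    REMEDIATE = "remediate"  # all recipe steps run


def allowed_steps(mode: Mode, steps: List[str]) -> List[str]:
    """Return the subset of recipe steps permitted in the given rollout stage."""
    if mode is Mode.OBSERVE:
        return []
    if mode is Mode.ALERT:
        return [step for step in steps if step == "webhook"]
    return list(steps)
```

With a recipe of `["restart_service", "healthcheck", "webhook"]`, alert mode would keep only the `webhook` step, so operators are notified without anything being changed on the host.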
Learn More
Dive deeper into agent capabilities with the detailed guides below.