Monitoring Agents

The uptimy agent is open source and runs directly on your infrastructure for checks, incidents, and deterministic remediation.

Overview

The source code is available at github.com/uptimy/uptimy-agent. The agent runs locally on your infrastructure (bare metal, VMs, Docker, or Kubernetes) and can operate fully standalone.

When configured, it executes checks on a schedule, creates deduplicated incidents, and runs operator-defined repair recipes. Control-plane connectivity is optional.

Installation Options

The open-source repository supports these installation paths:

One-line installer (Linux/macOS) via scripts/install.sh
Docker using uptimy/agent:latest
Kubernetes DaemonSet using deploy/kubernetes/daemonset.yaml
Build from source with make build

Quick Install (Linux / macOS)

curl -sSfL https://raw.githubusercontent.com/uptimy/uptimy-agent/master/scripts/install.sh | sudo bash
sudo vi /etc/uptimy/config.yaml
sudo systemctl enable --now uptimy-agent

Kubernetes DaemonSet

kubectl apply -f deploy/kubernetes/daemonset.yaml
kubectl -n uptimy edit configmap uptimy-agent-config
kubectl -n uptimy rollout restart daemonset/uptimy-agent

How Agents Work

The agent runs directly where your services run and continuously:

Executes health checks from the config
Creates and tracks incidents with deduplication and lifecycle management
Runs deterministic repair recipes with retries, branching, and verification
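
To make the deduplication idea above concrete, here is a minimal sketch, not the agent's actual implementation. It assumes one open incident per check name and tracks open incidents as marker files in a temporary directory; the function name record_result and the event wording are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: one open incident per check, tracked as marker files.
# record_result and the state directory layout are hypothetical names.
state_dir=$(mktemp -d)

# record_result CHECK_NAME STATUS   (STATUS is "ok" or "fail")
record_result() {
  check=$1
  status=$2
  marker="$state_dir/$check.open"
  if [ "$status" = "fail" ]; then
    if [ -e "$marker" ]; then
      # An incident is already open for this check, so do not create another.
      echo "deduplicated $check"
    else
      : > "$marker"
      echo "opened $check"
    fi
  elif [ -e "$marker" ]; then
    # The check recovered, so close the open incident.
    rm -f "$marker"
    echo "resolved $check"
  else
    echo "healthy $check"
  fi
}
```

Calling record_result web fail repeatedly opens a single incident and reports every later failure as deduplicated until record_result web ok resolves it.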

ℹ️ Standalone by Default

Checks, incident detection, and repairs do not require cloud connectivity. If a control plane is configured, telemetry can be streamed over gRPC for fleet visibility and offline-agent alerts.

Local System Checks

The open-source agent currently supports these check types:

Check Type	Description
HTTP	HTTP/HTTPS endpoint checks
TCP	TCP port connectivity checks
Process	System process alive checks
Docker Container	Docker container status checks
Docker Swarm	Docker Swarm node/service health checks
Disk	Disk usage threshold checks
Memory	Memory usage threshold checks
CPU	CPU usage threshold checks
Certificate	TLS certificate expiry checks
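
As a rough illustration of what a threshold check such as Disk does, the sketch below reads the used-space percentage with the standard df -P command and compares it to a limit. It is an assumption about the general technique, not the agent's code; the function names and the 90 percent threshold are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only; the agent's own disk check may work differently.

# disk_usage_percent PATH
# Prints the used-space percentage (digits only) of the filesystem holding PATH.
disk_usage_percent() {
  # df -P prints a header line, then one line per filesystem; field 5 is "NN%".
  # Filesystem names that contain spaces would shift the fields.
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# at_or_above VALUE LIMIT
# Succeeds (exit status 0) when VALUE has reached LIMIT.
at_or_above() {
  [ "$1" -ge "$2" ]
}

# Hypothetical threshold of 90 percent on the root filesystem.
usage=$(disk_usage_percent /)
if at_or_above "$usage" 90; then
  echo "disk check failed: / is ${usage}% full"
else
  echo "disk check passed: / is ${usage}% full"
fi
```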

Self-Healing Recovery Actions

Repairs are configured as recipes and linked by repairs rules to specific checks. The agent supports 12 recovery actions plus 3 recipe orchestration steps (wait, healthcheck, webhook). Safety guardrails block forbidden actions.
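
The retry-and-verify pattern behind a recipe can be sketched as follows. This is a simplified illustration under stated assumptions, not the agent's recipe engine: run_recipe is a hypothetical helper, and the repair and verification steps are passed as arbitrary shell commands.

```shell
#!/bin/sh
# Illustrative sketch only: retry a repair step until a verification step passes.
# run_recipe is a hypothetical helper, not part of the agent.

# run_recipe MAX_ATTEMPTS DELAY_SECONDS REPAIR_CMD VERIFY_CMD
run_recipe() {
  max_attempts=$1
  delay=$2
  repair_cmd=$3
  verify_cmd=$4
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the repair step, then confirm recovery with the verification step.
    if sh -c "$repair_cmd" && sh -c "$verify_cmd"; then
      echo "recovered on attempt $attempt"
      return 0
    fi
    attempt=$((attempt + 1))
    # Wait before the next attempt, but not after the final one.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  echo "failed after $max_attempts attempts"
  return 1
}
```

For example, run_recipe 3 5 "systemctl restart myservice" "systemctl is-active --quiet myservice" would restart a placeholder unit named myservice up to three times, five seconds apart, and stop as soon as it reports active.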

💡 Combine Checks with Recovery

Start in observe mode (checks only), move to alert mode (webhook-only recipes), and enable full remediation recipes once you have confidence in the results.

Learn More

Dive deeper into agent capabilities with the detailed guides below.

Local Checks

Configure local health checks to monitor internal services, processes, disk usage, memory, CPU, and more.

Self-Healing

Set up deterministic repair recipes with allowed actions, retries, verification steps, and safety guardrails.