upti.my

Monitoring Agents

The uptimy agent is open source and runs directly on your infrastructure for checks, incidents, and deterministic remediation.

Overview

The agent is open source and available at github.com/uptimy/uptimy-agent. It runs locally on your infrastructure (bare metal, VMs, Docker, or Kubernetes) and can operate fully standalone.

Once configured, it executes checks on a schedule, creates deduplicated incidents, and runs operator-defined repair recipes. Control-plane connectivity is optional.

Installation Options

The open-source repository supports these installation paths:

  • One-line installer (Linux/macOS) via scripts/install.sh
  • Docker using uptimy/agent:latest
  • Kubernetes DaemonSet using deploy/kubernetes/daemonset.yaml
  • Build from source with make build

Quick Install (Linux / macOS)

curl -sSfL https://raw.githubusercontent.com/uptimy/uptimy-agent/master/scripts/install.sh | sudo bash
sudo vi /etc/uptimy/config.yaml
sudo systemctl enable --now uptimy-agent

Kubernetes DaemonSet

kubectl apply -f deploy/kubernetes/daemonset.yaml
kubectl -n uptimy edit configmap uptimy-agent-config
kubectl -n uptimy rollout restart daemonset/uptimy-agent

How Agents Work

The agent runs directly where your services run and continuously:

  1. Executes health checks from the config
  2. Creates and tracks incidents with deduplication and lifecycle management
  3. Runs deterministic repair recipes with retries, branching, and verification
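
To make the deduplication and lifecycle idea in step 2 concrete, here is a minimal Python sketch. It is not the agent's implementation: the `Incident` and `IncidentTracker` names and the "open"/"resolved" states are assumptions chosen for illustration. The point it shows is that repeated failures of the same check update one open incident instead of creating a new one each time, and that a healthy result closes it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Incident:
    """Hypothetical incident record, one per failing check."""
    check_name: str
    status: str = "open"      # assumed lifecycle: "open" -> "resolved"
    failure_count: int = 0


class IncidentTracker:
    """Keeps at most one open incident per check (deduplication)."""

    def __init__(self) -> None:
        self._open: Dict[str, Incident] = {}

    def record(self, check_name: str, healthy: bool) -> Optional[Incident]:
        """Record one check result and return the affected incident, if any."""
        incident = self._open.get(check_name)
        if healthy:
            # A healthy result resolves the open incident, if there is one.
            if incident is not None:
                incident.status = "resolved"
                del self._open[check_name]
            return incident
        if incident is None:
            # First failure for this check: open a new incident.
            incident = Incident(check_name)
            self._open[check_name] = incident
        # Repeated failures only bump the counter on the existing incident.
        incident.failure_count += 1
        return incident
```

Calling `record("api", healthy=False)` twice returns the same `Incident` object with `failure_count == 2`; a later `record("api", healthy=True)` marks it resolved, and the next failure opens a fresh incident.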

ℹ️ Standalone by Default

Checks, incident detection, and repairs do not require cloud connectivity. If a control plane is configured, telemetry can be streamed over gRPC for fleet visibility and offline-agent alerts.

Local System Checks

The open-source agent currently supports these check types:

Check Type         Description
HTTP               HTTP/HTTPS endpoint checks
TCP                TCP port connectivity checks
Process            System process alive checks
Docker Container   Docker container status checks
Docker Swarm       Docker Swarm node/service health checks
Disk               Disk usage threshold checks
Memory             Memory usage threshold checks
CPU                CPU usage threshold checks
Certificate        TLS certificate expiry checks
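
As a rough illustration of what two of these check types evaluate, the Python sketch below implements a TCP connectivity check and a disk usage threshold check using only the standard library. The function names, default timeout, and the "at or above the threshold fails" rule are assumptions made for this example; the agent's own checks are defined in its config file and may behave differently.

```python
import shutil
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def disk_usage_percent(path: str = "/") -> float:
    """Return used space on the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def exceeds_threshold(value_percent: float, threshold_percent: float) -> bool:
    """Assumed rule: a value at or above the threshold counts as a failure."""
    return value_percent >= threshold_percent
```

For example, `exceeds_threshold(disk_usage_percent("/"), 90.0)` would flag a root filesystem that is 90% full or more, and `tcp_port_open("127.0.0.1", 5432)` reports whether something is accepting connections on that port.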

Self-Healing Recovery Actions

Repairs are configured as recipes and linked to specific checks through repairs rules. The agent supports 12 recovery actions plus 3 recipe orchestration steps (wait, healthcheck, webhook). Safety guardrails block forbidden actions.
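
The Python sketch below shows one way a recipe runner could combine retries, verification, and a guardrail. It is an illustration under stated assumptions, not the agent's code: `run_recipe`, `FORBIDDEN_ACTIONS`, and the action names `restart_service` and `reboot_host` are hypothetical and do not come from the agent's action list.

```python
from typing import Callable, Dict, List

# Hypothetical deny-list; the agent's real guardrails are not specified here.
FORBIDDEN_ACTIONS = {"reboot_host"}


class ForbiddenActionError(Exception):
    """Raised when a recipe references an action blocked by the guardrail."""


def run_recipe(
    steps: List[str],
    actions: Dict[str, Callable[[], None]],
    verify: Callable[[], bool],
    max_attempts: int = 3,
) -> bool:
    """Run all steps in order, then verify; retry up to max_attempts times."""
    # Validate the whole recipe before executing anything, so a forbidden
    # or unknown step can never run partway through a repair.
    for name in steps:
        if name in FORBIDDEN_ACTIONS:
            raise ForbiddenActionError(name)
        if name not in actions:
            raise KeyError(name)

    for _ in range(max_attempts):
        for name in steps:
            actions[name]()
        # Verification plays the role of a post-repair health check.
        if verify():
            return True
    return False
```

A caller would pass a mapping such as `{"restart_service": restart_fn}` and a `verify` callable that re-runs the failing check. The function returns `True` as soon as verification passes and `False` once the attempts are exhausted.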

💡 Combine Checks with Recovery

Start in observe mode (checks only), move to alert mode (webhook-only recipes), and enable full remediation recipes once you have confidence in the results.

Learn More

Dive deeper into agent capabilities with the detailed guides below.