upti.my
Infrastructure · 6 min read

Your Cron Job Started. That Does Not Mean It Finished.

Most cron monitoring only checks whether a job ran. That misses the failures that actually matter: jobs that start but never finish. Here is how to fix that with Heartbeat with Job Chain.

A lot of cron monitoring is too shallow. It tells you that a job ran, or that your server was up when it was supposed to run, and that is where the thinking stops.

That might be enough for throwaway jobs. It is not enough for anything important.

Because for real scheduled work, "it started" and "it finished" are two very different things. That gap is where a lot of failures hide.

The Problem With Most Cron Monitoring

A lot of teams monitor cron jobs as if the only failure mode is "it never ran". But that is not how these jobs usually fail.

They fail like this:

  • The job starts and gets stuck halfway through
  • The script exits before the important part
  • The process times out after doing some of the work
  • The backup starts but never finishes
  • The sync starts on time and silently dies later

That is why "did it run?" is not a very useful question by itself.

The better questions are:

  • Did it start on time?
  • Did it finish successfully?
  • If it did not finish, where did it stop?

If you cannot answer those, your monitoring is giving you false confidence.

One Heartbeat Is Fine for Simple Jobs

There is nothing wrong with a single heartbeat if the job is short and simple. If a task runs quickly and one completion ping tells you the whole story, that is fine.

But a lot of jobs are not like that.

  • Backups are not
  • Imports are not
  • Sync jobs are not
  • Billing tasks are not
  • Scheduled reports are not

These jobs can start correctly and still fail in a way that matters. That is the class of failure basic cron monitoring often misses.

What You Actually Want to Know

For anything important, you usually care about two things:

  1. The job started
  2. The job finished

That sounds obvious, but it changes how you monitor the job.

  • If there is no start signal, the job never began.
  • If there is a start signal but no finish signal, the job probably got stuck, timed out, or failed during execution.
  • If both signals arrive, the run completed.

That is a much better operational signal than a single generic success ping.
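
The three cases above can be written down as a small decision function. This is a minimal sketch for illustration, not part of upti.my: the function name classify_run and the state labels are made up here, and the two arguments simply stand for "was the start signal seen" and "was the finish signal seen".

```shell
#!/bin/bash
# Hypothetical helper: classify one scheduled run from the signals that arrived.
# Arguments: start_seen finish_seen, each 1 (seen) or 0 (missing).
classify_run() {
  local start_seen="$1" finish_seen="$2"
  if [ "$start_seen" -ne 1 ]; then
    # No start signal: the job never began.
    echo "never-started"
  elif [ "$finish_seen" -ne 1 ]; then
    # Start but no finish: stuck, timed out, or failed during execution.
    echo "started-not-finished"
  else
    # Both signals arrived: the run completed.
    echo "completed"
  fi
}
```

For example, `classify_run 1 0` prints started-not-finished. With only a completion ping, that case looks exactly like a job that never ran.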

This Is Why upti.my Has Heartbeat With Job Chain

In upti.my, the feature for this is Heartbeat with Job Chain. The idea is simple: instead of treating a job as one event, you track it as a sequence.

For example:

curl "https://heartbeats.upti.my/v1/heartbeat/<heartbeat-id>?step=start"
curl "https://heartbeats.upti.my/v1/heartbeat/<heartbeat-id>?step=finish"

This lets you track the shape of the run, not just whether one request happened at some point. That matters because "job triggered" is not the same as "job completed successfully".
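
One way to apply the two pings to any command is a small wrapper. The sketch below assumes only the URL pattern shown above; the names HEARTBEAT_URL, ping_step, and run_with_chain are hypothetical and chosen for this example. It swallows ping failures so that a monitoring outage cannot stop the job itself, which is a design choice you may or may not want.

```shell
#!/bin/bash
# URL pattern from the article; replace <heartbeat-id> with your own ID.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://heartbeats.upti.my/v1/heartbeat/<heartbeat-id>}"

# Send one step ping. Errors are ignored so monitoring never blocks the job.
ping_step() {
  curl -fsS --max-time 10 "${HEARTBEAT_URL}?step=$1" > /dev/null || true
}

# Run a command between a start ping and a finish ping.
# The finish ping is sent only if the command exits with status 0;
# otherwise the command's exit status is returned unchanged.
run_with_chain() {
  ping_step start
  "$@" || return $?
  ping_step finish
}

# Example (not executed here):
#   run_with_chain /usr/local/bin/nightly-sync.sh
```

Because the finish ping sits behind the command's exit status, a crash, a timeout, or a non-zero exit all leave the chain at "start" with no "finish".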

A Practical Example

Say you have a nightly backup.

A lot of setups only send one ping after the backup command finishes. That works if everything goes well. But if the job starts and gets stuck halfway through, that single-ping model tells you very little.

A better version looks like this:

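backup.sh
#!/bin/bash
# Abort on the first failed command so a failed dump never reaches the finish ping.
set -euo pipefail

# Signal that the run began. "|| true" keeps a monitoring outage from blocking the backup.
curl -fsS --max-time 10 "https://heartbeats.upti.my/v1/heartbeat/<heartbeat-id>?step=start" > /dev/null || true

# The actual work: dump the database to disk.
pg_dump mydb > /backups/mydb.sql

# Reached only if pg_dump succeeded.
curl -fsS --max-time 10 "https://heartbeats.upti.my/v1/heartbeat/<heartbeat-id>?step=finish" > /dev/null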

Now the signal is more useful:

  • No start means the job never began.
  • Start without finish means it got stuck, timed out, or failed during execution.
  • Start and finish means it completed.

That is the kind of monitoring that actually helps when something breaks.

Why This Matters More Than People Think

Silent cron failures are annoying because they usually do not show up as immediate downtime. They show up later.

  • You find out your backup was broken when you need to restore it.
  • You find out a sync stopped when someone notices stale data.
  • You find out invoices were not generated after a customer asks about billing.
  • You find out reports were never sent because somebody complains.

By then, the failure is old news. You are already dealing with the fallout.

That is why "the server is healthy" is not enough, and "the job probably ran" is definitely not enough. For important scheduled work, you want direct visibility into whether the job started and whether it actually reached the end.

A Simple Rule

Use one heartbeat when the job is short, simple, and easy to validate with one completion event.

Use Heartbeat with Job Chain when the job takes time, can fail midway, has meaningful execution stages, or matters to the business if it only partially runs.

That usually covers backups, syncs, imports, exports, billing tasks, ETL pipelines, and reporting jobs.
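
For jobs with meaningful execution stages, the same idea extends to one ping per stage, which helps answer where a run stopped. The sketch below is an assumption-heavy illustration: the article only shows the step values start and finish, so the intermediate step names (dump, compress, upload) are hypothetical and would have to match whatever steps your chain is actually configured with. The helper names ping_step and run_stages are also made up for this example.

```shell
#!/bin/bash
# URL pattern from the article; replace <heartbeat-id> with your own ID.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://heartbeats.upti.my/v1/heartbeat/<heartbeat-id>}"

# Send one step ping. Errors are ignored so monitoring never blocks the job.
ping_step() {
  curl -fsS --max-time 10 "${HEARTBEAT_URL}?step=$1" > /dev/null || true
}

# Run stages in order. Each argument is "step_name:function_name".
# A stage's ping is sent only after that stage succeeds, so the last
# ping received shows how far the run got before it stopped.
run_stages() {
  local spec name fn
  ping_step start
  for spec in "$@"; do
    name="${spec%%:*}"   # text before the first colon (assumed step name)
    fn="${spec#*:}"      # text after the first colon (function to call)
    "$fn" || return $?
    ping_step "$name"
  done
  ping_step finish
}

# Example (hypothetical stage functions defined elsewhere in the script):
#   run_stages dump:dump_db compress:compress_dump upload:upload_dump
```

If the compress stage fails in that example, the pings sent are start and dump, and nothing after that.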

Key Takeaways

  1. "Did it run?" is a weak question for important cron jobs
  2. Jobs that start but never finish are a common and silent failure mode
  3. Track both start and finish signals to get real visibility
  4. Heartbeat with Job Chain gives you the shape of each run
  5. Use single heartbeats for simple jobs, job chains for anything that matters

For important cron jobs, a better question is: did it start on time, and did it finish successfully? That is the difference between basic monitoring and useful monitoring. If you only track one heartbeat, you can miss a whole class of failures where the job started but never completed. Heartbeat with Job Chain fixes that by giving you visibility into the run from start to finish.