upti.my

How to Monitor gRPC Services in Production

A practical guide to monitoring gRPC services, from the standard health check protocol to custom RPC validation.

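gRPC has become the backbone of modern microservices, but monitoring it takes a different approach than monitoring HTTP APIs. You can't just hit an endpoint and check for 200 OK: gRPC reports the outcome of a call in its own status code, carried in the grpc-status trailer, so a failed call can still arrive with an HTTP 200.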

The gRPC Health Checking Protocol

gRPC defines a standard health checking protocol (grpc.health.v1.Health). The specification doesn't force you to implement it, but every production gRPC service should: it's the foundation of proper gRPC monitoring.

The Standard Protocol

health.proto
syntax = "proto3";

package grpc.health.v1;

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}

message HealthCheckRequest {
  string service = 1;
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
    SERVICE_UNKNOWN = 3;
  }
  ServingStatus status = 1;
}

Why the Standard Matters

  • Load balancers understand it. Kubernetes, Envoy, and other infrastructure can use it natively.
  • Per-service granularity. Check individual services within a single gRPC server.
  • Watch support. Get streaming health updates instead of polling.
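
To make the per-service semantics concrete, here is a minimal in-memory sketch of how a server might track one status per service name. It uses only the Go standard library and is not the grpc-go implementation; `Registry`, `Set`, and `Check` are hypothetical names chosen for illustration. It follows two conventions of the protocol: an empty service name refers to the server as a whole, and `Check` on an unregistered name fails with NOT_FOUND.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ServingStatus mirrors the enum in health.proto.
type ServingStatus int

const (
	Unknown ServingStatus = iota
	Serving
	NotServing
	ServiceUnknown
)

// ErrNotFound stands in for the gRPC NOT_FOUND status in this sketch.
var ErrNotFound = errors.New("NOT_FOUND: service not registered")

// Registry is a hypothetical per-service status table, safe for concurrent use.
type Registry struct {
	mu       sync.RWMutex
	statuses map[string]ServingStatus
}

// NewRegistry registers the empty name, which represents the whole server.
func NewRegistry() *Registry {
	return &Registry{statuses: map[string]ServingStatus{"": Serving}}
}

// Set records the current status of one service.
func (r *Registry) Set(service string, s ServingStatus) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.statuses[service] = s
}

// Check returns the status of one service, or ErrNotFound if it was never registered.
func (r *Registry) Check(service string) (ServingStatus, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.statuses[service]
	if !ok {
		return Unknown, ErrNotFound
	}
	return s, nil
}

func main() {
	reg := NewRegistry()
	reg.Set("users.UserService", Serving)
	reg.Set("billing.BillingService", NotServing)

	names := []string{"", "users.UserService", "billing.BillingService", "missing.Service"}
	for _, name := range names {
		status, err := reg.Check(name)
		fmt.Printf("%q -> status=%d err=%v\n", name, status, err)
	}
}
```

A monitor that checks each registered name separately can report that billing is down while users is still serving, instead of collapsing the whole server into one state.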

Native Support

Most gRPC frameworks have built-in support for the health protocol. Check your framework's docs for implementation details.

Beyond Basic Health Checks

The standard protocol tells you the service is "SERVING", but that's just the beginning. Real production monitoring requires more.

1. Check Specific RPCs

Don't just check if the health service responds. Verify that your actual RPC endpoints work:

monitor.go
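// Monitor your actual business RPCs, not just the health service.
// GetUser and HealthStatus_* stand in for your own generated client and status type.
ctx, cancel := context.WithTimeout(ctx, 2*time.Second) // example deadline so a hung RPC fails fast
defer cancel()

response, err := client.GetUser(ctx, &GetUserRequest{Id: "test-user"})
if err != nil {
    return HealthStatus_NOT_SERVING // transport or application error
}
if response.User == nil {
    return HealthStatus_NOT_SERVING // call succeeded but returned no data
}
return HealthStatus_SERVING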

2. Validate Response Content

A successful gRPC response isn't enough. Validate the response data:

  • Are required fields present?
  • Do enum values match expected options?
  • Are repeated fields populated when expected?
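
The three checks above can be sketched as a small validator. This is an illustration only: `UserResponse`, `Role`, and `ValidateUser` are hypothetical stand-ins for your own generated message types, and the rules (which fields are required, which enum values are acceptable) are assumptions you would replace with your service's contract.

```go
package main

import "fmt"

// Role is a hypothetical enum; 0 is the proto3 default for an unset field.
type Role int

const (
	RoleUnspecified Role = iota
	RoleMember
	RoleAdmin
)

// UserResponse is a hypothetical stand-in for a generated message type.
type UserResponse struct {
	ID     string
	Email  string
	Role   Role
	Groups []string
}

// ValidateUser returns one message per problem found; an empty slice means valid.
func ValidateUser(resp *UserResponse) []string {
	if resp == nil {
		return []string{"response is nil"}
	}
	var problems []string

	// Required fields must be present.
	if resp.ID == "" {
		problems = append(problems, "required field ID is empty")
	}
	if resp.Email == "" {
		problems = append(problems, "required field Email is empty")
	}

	// Enum values must be one of the expected options.
	switch resp.Role {
	case RoleMember, RoleAdmin:
	default:
		problems = append(problems, fmt.Sprintf("unexpected Role value %d", resp.Role))
	}

	// Repeated fields must be populated when the test fixture guarantees it.
	if len(resp.Groups) == 0 {
		problems = append(problems, "repeated field Groups is empty")
	}
	return problems
}

func main() {
	good := &UserResponse{ID: "test-user", Email: "test@example.com", Role: RoleMember, Groups: []string{"qa"}}
	bad := &UserResponse{ID: "test-user"}

	fmt.Println("good:", ValidateUser(good))
	fmt.Println("bad: ", ValidateUser(bad))
}
```

Returning every problem rather than stopping at the first makes the alert message more useful when several fields break at once, for example after a bad schema migration.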

3. Monitor All Error Codes

gRPC defines 17 status codes (OK plus 16 error codes), and each tells a different story. The ones that matter most for monitoring:

grpc-status-codes.txt
OK = 0                  // Success
CANCELLED = 1           // Client cancelled
UNKNOWN = 2             // Unknown error
INVALID_ARGUMENT = 3    // Bad request
DEADLINE_EXCEEDED = 4   // Timeout - ALERT ON THIS
NOT_FOUND = 5           // Resource missing
PERMISSION_DENIED = 7   // Authorization failure - ALERT ON THIS
RESOURCE_EXHAUSTED = 8  // Quota or rate limit hit
INTERNAL = 13           // Server error - ALERT ON THIS
UNAVAILABLE = 14        // Service down - CRITICAL ALERT
UNAUTHENTICATED = 16    // Missing or invalid credentials

Don't Ignore DEADLINE_EXCEEDED

Timeouts are often the first sign of bigger problems. A service responding to health checks but timing out on real requests is effectively down.
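
One way to turn status codes into alerting rules is a plain lookup from code to severity. The sketch below uses the numeric values from the gRPC specification and the severities suggested in the list above; the `Severity` type, the `SeverityFor` function, and the choice to merely "track" every code without an explicit recommendation are assumptions for illustration, not a prescribed policy.

```go
package main

import "fmt"

// Code is a gRPC status code as numbered by the gRPC specification.
type Code uint32

const (
	OK                 Code = 0
	Cancelled          Code = 1
	Unknown            Code = 2
	InvalidArgument    Code = 3
	DeadlineExceeded   Code = 4
	NotFound           Code = 5
	AlreadyExists      Code = 6
	PermissionDenied   Code = 7
	ResourceExhausted  Code = 8
	FailedPrecondition Code = 9
	Aborted            Code = 10
	OutOfRange         Code = 11
	Unimplemented      Code = 12
	Internal           Code = 13
	Unavailable        Code = 14
	DataLoss           Code = 15
	Unauthenticated    Code = 16
)

// Severity is a hypothetical alert level used by this sketch.
type Severity string

const (
	None     Severity = "none"     // nothing to do
	Track    Severity = "track"    // record in metrics, no page
	Alert    Severity = "alert"    // notify the on-call channel
	Critical Severity = "critical" // page immediately
)

// SeverityFor maps a status code to an alert level.
// Codes without an explicit rule fall back to Track (an assumption).
func SeverityFor(c Code) Severity {
	switch c {
	case OK:
		return None
	case Unavailable:
		return Critical
	case DeadlineExceeded, PermissionDenied, Internal:
		return Alert
	default:
		return Track
	}
}

func main() {
	for _, c := range []Code{OK, NotFound, DeadlineExceeded, Internal, Unavailable} {
		fmt.Printf("code %d -> %s\n", c, SeverityFor(c))
	}
}
```

Keeping the mapping in one function makes the policy easy to review and adjust, for example promoting UNAUTHENTICATED to an alert for probes that use long-lived credentials.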

Setting Up gRPC Monitoring in upti.my

Configure gRPC monitors in the dashboard with native protocol support:

  1. Create an application for your gRPC service
  2. Add a gRPC healthcheck with your service address
  3. Enable the standard health protocol or specify custom RPCs
  4. Set latency thresholds and error code alerting

You get TLS/mTLS support, status code tracking, and per-RPC latency monitoring out of the box.


Common Mistakes

  • Only monitoring the health service. The health RPC can respond while other RPCs fail due to different dependencies.
  • Ignoring deadline exceeded. Slow services may respond to health checks but time out on real requests.
  • Not testing authentication. Expired tokens cause UNAUTHENTICATED errors that health checks miss.
  • Missing streaming RPC issues. Unary health checks don't catch problems with streaming endpoints.
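
Several of these mistakes come down to probing real calls under a deadline and treating slowness as a signal of its own. The sketch below shows one possible shape for such a probe using only the Go standard library. `Probe`, `RPC`, `simulate`, and `probeAfter` are hypothetical names, the 50 ms and 300 ms thresholds are arbitrary example values, and the simulated call stands in for a generated client method that honors context cancellation.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Result is the outcome of a single probe.
type Result string

const (
	Healthy  Result = "healthy"
	Degraded Result = "degraded" // succeeded, but slower than the latency threshold
	Down     Result = "down"     // failed or hit the deadline
)

// RPC is any call that honors context cancellation.
type RPC func(ctx context.Context) error

// Probe runs call under a deadline and classifies it by error and latency.
func Probe(ctx context.Context, call RPC, slow, deadline time.Duration) (Result, time.Duration) {
	ctx, cancel := context.WithTimeout(ctx, deadline)
	defer cancel()

	start := time.Now()
	err := call(ctx)
	elapsed := time.Since(start)

	switch {
	case err != nil:
		return Down, elapsed
	case elapsed > slow:
		return Degraded, elapsed
	default:
		return Healthy, elapsed
	}
}

// simulate returns a fake RPC that takes d to finish unless the context ends first.
func simulate(d time.Duration) RPC {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// probeAfter probes a simulated call that takes ms milliseconds,
// with a 50 ms latency threshold and a 300 ms deadline.
func probeAfter(ms int) Result {
	delay := time.Duration(ms) * time.Millisecond
	res, _ := Probe(context.Background(), simulate(delay), 50*time.Millisecond, 300*time.Millisecond)
	return res
}

func main() {
	for _, ms := range []int{1, 100, 1000} {
		fmt.Printf("call taking %4d ms -> %s\n", ms, probeAfter(ms))
	}
}
```

The same wrapper can be pointed at an authenticated call or at the first message of a streaming RPC, which covers the authentication and streaming gaps that a unary health check leaves open.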

Key Takeaways

  1. Implement the standard gRPC health checking protocol
  2. Monitor actual RPCs, not just the health service
  3. Track all 17 gRPC status codes with appropriate alerting
  4. Validate response content, not just success/failure
  5. Include latency thresholds in your health checks

gRPC monitoring requires protocol-native tooling. Generic HTTP monitors won't cut it. Set up proper gRPC health checks before your services hit production.