Node.js Health Checks: The Ultimate Guide to Liveness, Readiness, and Dependency Monitoring

Jeff Taakey
21+ Year CTO & Multi-Cloud Architect. Bridging the gap between theoretical CS and production-grade engineering for 300+ deep-dive guides.

It’s 3:00 AM. Your PagerDuty alert fires. The load balancer is throwing 502 Bad Gateway errors, but your logs show the Node.js process is technically “running.”

This is the nightmare scenario for any backend developer. The process exists, but it’s zombie-locked: disconnected from the database, out of memory, or spinning in an infinite loop that blocks the event loop. In the microservices landscape of 2025, simply knowing your application “started” isn’t enough. You need to know whether it’s alive, whether it’s ready to take traffic, and whether its dependencies are healthy.

In this guide, we are going deep into implementing robust Health Checks in Node.js. We won’t just write a simple res.send('OK') route; we will build a production-grade monitoring system that integrates with Kubernetes (or any orchestrator), handles database disconnects gracefully, and ensures your application fails safely.

Why Health Checks Are Non-Negotiable

Before we write code, we need to clarify the terminology. In modern orchestration (like Kubernetes or AWS ECS), a single “health” endpoint is rarely sufficient. We usually distinguish between three specific states:

| Probe Type | Purpose | Typical K8s Action on Failure |
| --- | --- | --- |
| Liveness Probe | Checks that the process is running and not deadlocked. | Restarts the container. |
| Readiness Probe | Checks that the app is ready to accept traffic (e.g., DB connected). | Removes the pod from the load balancer (stops sending traffic). |
| Startup Probe | Checks that the app has finished initialization (useful for slow starts). | Waits before running the liveness/readiness probes. |

If you mix these up, you risk restarting a container that is simply waiting for a database to wake up, causing a restart loop (CrashLoopBackOff) that makes the situation worse.

Prerequisites & Environment

To follow this tutorial, you should have the following:

  • Node.js: v20 or v22 (Active LTS versions).
  • Docker: To spin up dependencies (MongoDB/Redis) easily.
  • Knowledge: Basic understanding of Express.js and async/await.

Let’s set up our project structure.

mkdir node-health-monitor
cd node-health-monitor
npm init -y
npm install express mongoose redis @godaddy/terminus
  • express: Our web framework.
  • mongoose: To simulate a critical database dependency.
  • redis: To simulate a caching dependency.
  • @godaddy/terminus: A widely used library for wiring graceful shutdowns and health checks into Node.js servers.

Step 1: The Basic “Naive” Health Check

The simplest form of a health check is a route that returns a 200 OK status. This is useful for a basic Liveness probe. It tells the orchestrator, “Yes, the event loop is turning, and the HTTP server is bound to the port.”

Create a file named server-basic.js:

const express = require('express');
const app = express();
const PORT = process.env.PORT || 3000;

// Basic Liveness Probe
app.get('/health/liveness', (req, res) => {
    // If this request can be processed, the event loop is not blocked.
    res.status(200).json({
        status: 'up',
        timestamp: new Date().toISOString(),
        uptime: process.uptime()
    });
});

app.listen(PORT, () => {
    console.log(`Server running on port ${PORT}`);
});

Why this isn’t enough: If your application relies on MongoDB to fetch user data, and MongoDB goes down, this /health/liveness route will still return 200 OK. Your load balancer will keep sending traffic to a broken instance, resulting in errors for your users.
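A quick curl against the route makes the point concrete: the probe reports success regardless of what your dependencies are doing (output abridged; timestamp and uptime will differ):

curl -i http://localhost:3000/health/liveness
# HTTP/1.1 200 OK
# {"status":"up","timestamp":"2025-01-01T03:00:00.000Z","uptime":42.7}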

Step 2: Integrating Dependency Monitoring

To implement a Readiness probe, we need to check downstream services. This is where things get tricky. You don’t want to check every dependency on every request (that’s a DDoS attack on yourself), but you need to know if the critical path is clear.

Let’s construct a more robust architecture.

The Architecture of a Robust Health Check

We will design a system that queries the status of our database and cache.

sequenceDiagram
    participant LB as Load Balancer/K8s
    participant Node as Node.js App
    participant Mongo as MongoDB
    participant Redis as Redis Cache

    LB->>Node: GET /health/readiness
    activate Node
    par Check Dependencies
        Node->>Mongo: Ping / Admin Command
        Node->>Redis: PING
    end
    Mongo-->>Node: OK (or Error)
    Redis-->>Node: PONG (or Error)
    alt All Dependencies OK
        Node-->>LB: 200 OK {status: "ready"}
    else One or More Failed
        Node-->>LB: 503 Service Unavailable
    end
    deactivate Node

Implementing the Readiness Logic

Let’s create server-advanced.js. We will set up connection logic for Mongo and Redis, and then build a health checker that verifies them.

const express = require('express');
const mongoose = require('mongoose');
const { createClient } = require('redis');

const app = express();
const PORT = 3000;

// 1. Setup Dependencies
// Note: In a real app, use environment variables for connection strings
const MONGO_URI = process.env.MONGO_URI || 'mongodb://localhost:27017/health_demo';
const REDIS_URL = process.env.REDIS_URL || 'redis://localhost:6379';

const redisClient = createClient({ url: REDIS_URL });

redisClient.on('error', (err) => console.error('Redis Client Error', err));

async function connectDeps() {
    try {
        await mongoose.connect(MONGO_URI);
        console.log('MongoDB Connected');
        
        await redisClient.connect();
        console.log('Redis Connected');
    } catch (error) {
        console.error('Initial connection failed:', error);
    }
}

connectDeps();

// 2. Health Check Helpers
async function checkMongo() {
    // 0 = disconnected, 1 = connected, 2 = connecting, 3 = disconnecting
    if (mongoose.connection.readyState === 1) {
        // Optional: Perform a lightweight command to ensure it's responsive
        try {
            await mongoose.connection.db.admin().ping();
            return { status: 'up' };
        } catch (e) {
            return { status: 'down', error: e.message };
        }
    }
    return { status: 'down', reason: 'disconnected' };
}

async function checkRedis() {
    try {
        const reply = await redisClient.ping();
        if (reply === 'PONG') return { status: 'up' };
        return { status: 'down', reason: 'Unexpected response' };
    } catch (e) {
        return { status: 'down', error: e.message };
    }
}

// 3. The Readiness Route
app.get('/health/readiness', async (req, res) => {
    const mongoStatus = await checkMongo();
    const redisStatus = await checkRedis();

    const isHealthy = mongoStatus.status === 'up' && redisStatus.status === 'up';

    const responseBody = {
        status: isHealthy ? 'ready' : 'not ready',
        timestamp: new Date().toISOString(),
        services: {
            database: mongoStatus,
            cache: redisStatus
        }
    };

    // Return 503 if dependencies are down so LB stops sending traffic
    if (!isHealthy) {
        return res.status(503).json(responseBody);
    }

    return res.status(200).json(responseBody);
});

// Liveness is kept simple
app.get('/health/liveness', (req, res) => res.status(200).json({ status: 'alive' }));

app.listen(PORT, () => {
    console.log(`Advanced Health Check Server on ${PORT}`);
});
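To see the readiness probe flip, run the dependencies in Docker, start the server, then kill the database. A rough walkthrough (container names and image tags are illustrative):

docker run -d --name health-mongo -p 27017:27017 mongo:7
docker run -d --name health-redis -p 6379:6379 redis:7
node server-advanced.js

# Both dependencies up: 200 with "status": "ready"
curl -i http://localhost:3000/health/readiness

# Stop the database, wait a moment, and probe again:
# 503 with "database": {"status": "down", ...}
docker stop health-mongo
curl -i http://localhost:3000/health/readiness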

Step 3: Graceful Shutdown with Terminus

One of the most overlooked aspects of health checks is Graceful Shutdown. When Kubernetes scales down a pod or deploys a new version, it sends a SIGTERM signal.

If you don’t handle this, your Node process dies instantly, killing any in-flight requests (e.g., a user halfway through uploading a file).

We will use @godaddy/terminus. It intercepts the shutdown signals, marks the readiness probe as “down” (so no new traffic comes in), allows existing requests to finish, and then cleans up connections.

Create server-production.js:

const http = require('http');
const express = require('express');
const mongoose = require('mongoose');
const { createClient } = require('redis');
const { createTerminus } = require('@godaddy/terminus');

const app = express();
const MONGO_URI = process.env.MONGO_URI || 'mongodb://localhost:27017/health_demo';
const REDIS_URL = process.env.REDIS_URL || 'redis://localhost:6379';

const redisClient = createClient({ url: REDIS_URL });

// --- Application Logic ---
app.get('/', (req, res) => {
    // Simulate some work
    setTimeout(() => res.send('Hello World'), 100);
});

// --- Health Check Logic for Terminus ---

async function onHealthCheck() {
    // Runs on every request to /health/readiness (see the Terminus config below).
    // If this throws, Terminus responds with 503.
    
    // Check Mongo
    if (mongoose.connection.readyState !== 1) {
        throw new Error('MongoDB not ready');
    }
    
    // Check Redis
    if (!redisClient.isOpen) {
        throw new Error('Redis not ready');
    }
    
    // Optionally check DB ping
    // await mongoose.connection.db.admin().ping();
}

async function onSignal() {
    console.log('server is starting cleanup');
    // Close database connections, stop background jobs, etc.
    await Promise.all([
        mongoose.disconnect(),
        redisClient.quit()
    ]);
}

async function onShutdown() {
    console.log('cleanup finished, server is shutting down');
}

// --- Start Server ---

const server = http.createServer(app);

async function start() {
    try {
        await mongoose.connect(MONGO_URI);
        await redisClient.connect();
        console.log('Dependencies connected');

        // Attach Terminus
        createTerminus(server, {
            signal: 'SIGTERM',
            healthChecks: {
                // Readiness: fails whenever a dependency is down
                '/health/readiness': onHealthCheck,
                // Liveness: if we can answer at all, the event loop is alive
                '/health/liveness': async () => {},
                verbatim: true // include the error message in the response body
            },
            onSignal, // cleanup logic
            onShutdown, // final logging
            timeout: 5000 // force kill after 5000ms if cleanup hangs
        });

        server.listen(3000, () => {
            console.log('Production Server running on 3000');
        });
    } catch (err) {
        console.error('Startup failed', err);
        process.exit(1);
    }
}

start();

Why this is better:

  1. Kubernetes Friendly: When K8s sends SIGTERM, Terminus keeps the server running for existing requests but fails the health check immediately (you can verify this locally, as shown below).
  2. Resource Safety: It explicitly closes Redis and Mongo connections, preventing “Too many connections” errors on your DB server during rapid deployments.
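You don’t need Kubernetes to watch the drain behavior: send the process a SIGTERM yourself and probe it while cleanup runs (the pgrep pattern is illustrative):

node server-production.js &

kill -TERM $(pgrep -f server-production.js)

# During the drain window, Terminus answers health checks with 503
# while in-flight requests are allowed to finish
curl -i http://localhost:3000/health/readiness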

Step 4: Docker & Kubernetes Configuration

Code is only half the battle. You need to configure your environment to use these endpoints.

Dockerfile HEALTHCHECK

Docker has a built-in instruction to check container health.

FROM node:22-alpine

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .

# Native Docker health check, run inside the container
# (BusyBox wget ships with Alpine; a wget-free alternative follows below)
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health/readiness || exit 1

EXPOSE 3000
CMD ["node", "server-production.js"]

Kubernetes Probes (The Gold Standard)

In your deployment.yaml, you should configure liveness and readiness separately.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: node-app
spec:
  template:
    spec:
      containers:
      - name: node-app
        image: my-node-app:latest
        ports:
        - containerPort: 3000
        
        # 1. Liveness: Is the process dead?
        livenessProbe:
          httpGet:
            path: /health/liveness # simple probe (registered with Terminus above)
            port: 3000
          initialDelaySeconds: 15
          periodSeconds: 20
        
        # 2. Readiness: Can it serve traffic?
        readinessProbe:
          httpGet:
            path: /health/readiness # Checks DB/Redis
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 10
          failureThreshold: 3
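The table at the top of this guide also mentioned the Startup probe. If your app needs a long warm-up (migrations, cache priming), add one so the liveness probe doesn’t kill a container that is still booting. A sketch with illustrative thresholds:

        # 3. Startup: Has initialization finished? (protects slow boots)
        startupProbe:
          httpGet:
            path: /health/liveness
            port: 3000
          periodSeconds: 5
          failureThreshold: 30   # up to 30 * 5s = 150s before K8s gives up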

Common Pitfalls and Performance Tips

1. The “Cascading Failure” Trap

If your database is under heavy load, its response time increases. If your health check queries the DB and times out, the orchestrator concludes your Node app is dead and kills it. That reduces the number of available Node instances, pushes more load onto the survivors, and turns a slow database into a total system crash.

Solution: Set strict timeouts on your health check DB queries (e.g., 1000ms). If the DB is slow, report “Unhealthy” so traffic stops flowing to this node, but perhaps keep the Liveness probe passing so the container doesn’t restart unnecessarily.
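One way to enforce that budget is to race each dependency check against a timer. A minimal sketch (the withTimeout helper and the 1000ms budget are illustrative):

// Rejects if the wrapped promise doesn't settle within `ms` milliseconds
function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// In the readiness handler: a slow DB reports "down" instead of hanging the probe
// const mongoStatus = await withTimeout(checkMongo(), 1000)
//     .catch((e) => ({ status: 'down', error: e.message }));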

2. Caching Health Checks

If you have a high-frequency polling interval (e.g., every 2 seconds) and 50 pods, your database will get hit with 25 queries per second just for health checks.

Solution: Cache the health result in a local variable for a few seconds.

let cachedStatus = null;
let lastCheck = 0;
const CACHE_TTL = 5000; // 5 seconds

async function checkHealthCached() {
    const now = Date.now();
    if (cachedStatus && (now - lastCheck < CACHE_TTL)) {
        return cachedStatus;
    }
    
    // Perform the actual expensive checks
    // (e.g., the checkMongo/checkRedis helpers from earlier)
    const result = await doHeavyDbCheck();
    
    cachedStatus = result;
    lastCheck = now;
    return result;
}

3. Don’t Expose Sensitive Info

Never return stack traces or database connection strings in your health check JSON. Attackers often scan /health endpoints to fingerprint technology stacks.
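In practice that means logging the full error server-side and returning only a coarse status to callers. A sketch based on the earlier checkRedis (checkRedisSafe is an illustrative name):

async function checkRedisSafe() {
    try {
        const reply = await redisClient.ping();
        return { status: reply === 'PONG' ? 'up' : 'down' };
    } catch (e) {
        // Keep the stack trace in your logs, not in the HTTP response
        console.error('Redis health check failed:', e);
        return { status: 'down' };
    }
}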

Conclusion

In 2025, a Node.js application is only as reliable as its observability. By implementing distinct Liveness and Readiness probes, handling graceful shutdowns with tools like Terminus, and being mindful of dependency cascading failures, you ensure your application can heal itself and scale smoothly.

Key Takeaways:

  • Liveness != Readiness. A failing liveness probe restarts the app; a failing readiness probe stops traffic.
  • Monitor Dependencies. Check DB and Redis connectivity in your Readiness probe.
  • Graceful Shutdown. Handle SIGTERM to allow in-flight requests to complete.
  • Don’t DDoS yourself. Cache health check results if necessary.

Now, go update your deployment.yaml before the next PagerDuty alert wakes you up!

