
Using Kubernetes Health Checks

I’ve seen a lot of questions recently about Kubernetes health checks and how they should be used. I’ll do my best to explain the different types of health checks and how each one affects your application.

Liveness Probes

Kubernetes health checks are divided into liveness and readiness probes. The purpose of a liveness probe is to indicate that your application is running. Normally your app would just crash, Kubernetes would see that it has terminated, and it would restart it; the goal of liveness probes is to catch situations where an app has crashed or deadlocked without terminating. A simple HTTP response should suffice here.

As a simple example here is a health check I often use for my Go applications.

http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
  w.Write([]byte("OK"))
}
http.ListenAndServe(":8080", nil)

and in the deployment:

livenessProbe:
  # an http probe
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  timeoutSeconds: 1

This just tells Kubernetes that the application is up and running. The initialDelaySeconds tells Kubernetes to delay starting the health checks until this many seconds after it sees that the pod has started. If your application takes a while to start up, you can play with this setting to help it out. The timeoutSeconds tells Kubernetes how long to wait for a response to the health check. For liveness probes this shouldn’t be very long, but you do want to give your app enough time to respond even when it’s under load.
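Probes also accept a few other standard fields for tuning how often they run and how many failures are tolerated. Here’s a rough sketch; the values are just illustrative, so tune them for your own app:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  timeoutSeconds: 1
  # how often to run the probe (defaults to 10 seconds)
  periodSeconds: 10
  # how many consecutive failures before the probe is considered failed
  failureThreshold: 3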

If the app never starts up, or starts responding with an HTTP error code, then Kubernetes will restart the container. You will want to avoid doing anything too fancy in liveness probes, since it could cause disruptions in your app if your liveness probes start failing.

Readiness Probes

Readiness probes are very similar to liveness probes, except that the result of a failed probe is different. Readiness probes are meant to check whether your application is ready to serve traffic, which is subtly different from liveness. For instance, say your application depends on a database and memcached. If both of these need to be up and running for your app to serve traffic, then you could say that both are required for your app’s “readiness”.

If the readiness probe for your app fails, then that pod is removed from the endpoints that make up a service. This makes it so that pods that are not ready will not have traffic sent to them by Kubernetes’ service discovery mechanism. This is really helpful for situations where a new pod for a service gets started: scale-up events, rolling updates, etc. Readiness probes make sure that pods are not sent traffic in the time between when they start up and when they are ready to serve traffic.
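For context, here’s roughly what a minimal Service for such pods looks like (the name and labels are just illustrative); only pods whose readiness probe is passing show up in its endpoints:

apiVersion: v1
kind: Service
metadata:
  name: my-app            # illustrative name
spec:
  selector:
    app: my-app           # must match the labels on your Deployment's pods
  ports:
  - port: 80
    targetPort: 8080      # the port the app and its probes listen on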

The definition of a readiness probe is the same as that of a liveness probe. Readiness probes are defined as part of a Deployment like so:

readinessProbe:
  # an http probe
  httpGet:
    path: /readiness
    port: 8080
  initialDelaySeconds: 20
  timeoutSeconds: 5

You will want to check that you can connect to all of your application’s dependencies in your readiness probe. To use the example where we depend on a database and memcached, we will want to check that we are able to connect to both.

Here’s what that might look like. I check memcached and the database, and if either one is not available I return a 503 response status.

http.HandleFunc("/readiness", func(w http.ResponseWriter, r *http.Request) {
  ok := true
  errMsg = ""

  // Check memcache
  if mc != nil {
    err := mc.Set(&memcache.Item{Key: "healthz", Value: []byte("test")})
  }
  if mc == nil || err != nil {
    ok = false
    errMsg += "Memcached not ok.¥n"
  }

  // Check database
  if db != nil {
    _, err := db.Query("SELECT 1;")
  }
  if db == nil || err != nil {
    ok = false
    errMsg += "Database not ok.¥n"
  }

  if ok {
    w.Write([]byte("OK"))
  } else {
    // Send 503
    http.Error(w, errMsg, http.StatusServiceUnavailable)
  }
})
http.ListenAndServe(":8080", nil)

More Stable Applications

Liveness and readiness probes really help with the stability of applications. They help make sure that traffic only goes to instances that are ready for it, and they let apps self-heal when they become unresponsive. They are a better solution to the problem my colleague Kelsey Hightower called 12 Fractured Apps. With proper health checks in place you can deploy your applications in any order without having to worry about dependencies or complicated entrypoint scripts, and applications will start serving traffic when they are ready, so auto-scaling and rolling updates work smoothly.