Alex Ellis (VMware) authored
- The shutdown sequence meant that the kubelet was still passing work to the watchdog after the HTTP socket was closed. This change gives the kubelet a chance to run its health check before we finally stop accepting new connections. It requires some basic co-ordination between the kubelet's checking period and the "write_timeout" value in the container.

Tested with Kubernetes on GKE. Before the change some Pods were returning connection-refused errors because they had not yet been detected as unhealthy; now I see a 0% error rate even at 20 qps. The issue was reproduced by scaling to 20 replicas, starting a test with hey, and then scaling down to 1 replica while tailing the logs from the gateway. Before I saw some 502s, now I see only 200s.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
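The sequence described above can be sketched roughly as follows, assuming a plain Go net/http server. This is illustrative only, not the watchdog's actual code: the /_/health path, probePeriod, and writeTimeout values stand in for whatever the operator configures for the kubelet probe and the container's "write_timeout".

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	// Illustrative values: the kubelet's probe period and the container's
	// write_timeout must be co-ordinated by the operator.
	probePeriod := 10 * time.Second
	writeTimeout := 10 * time.Second

	var draining int32

	mux := http.NewServeMux()
	mux.HandleFunc("/_/health", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&draining) == 1 {
			// Report unhealthy so the kubelet stops routing work here.
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("OK"))
	})

	srv := &http.Server{
		Addr:         ":8080",
		Handler:      mux,
		WriteTimeout: writeTimeout,
	}

	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGTERM)
		<-sig

		// Step 1: start failing the health check, but keep serving traffic.
		atomic.StoreInt32(&draining, 1)

		// Step 2: wait at least one probe period so the kubelet observes the
		// failure and removes the Pod from the Service endpoints.
		time.Sleep(probePeriod)

		// Step 3: only now stop accepting new connections, letting in-flight
		// requests finish within the write_timeout window.
		ctx, cancel := context.WithTimeout(context.Background(), writeTimeout)
		defer cancel()
		if err := srv.Shutdown(ctx); err != nil {
			log.Printf("shutdown: %v", err)
		}
	}()

	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
```

The point of the ordering is that the health check starts failing a full probe period before Shutdown is called, so the kubelet takes the Pod out of rotation before the socket stops accepting connections, rather than after.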