Alex Ellis (VMware) authored
- The shutdown sequence meant that the kubelet was still passing work to the watchdog after the HTTP socket was closed. This change gives the kubelet a chance to run its health check before we finally stop accepting new connections. It requires some basic co-ordination between the kubelet's checking period and the "write_timeout" value in the container.

Tested with Kubernetes on GKE. Before the change some Pods were returning connection-refused errors because they had not yet been detected as unhealthy; now I see a 0% error rate even at 20 qps. The issue was reproduced by scaling to 20 replicas, starting a test with hey, and then scaling down to 1 replica while tailing the logs from the gateway. Before I saw some 502s, now I see only 200s.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
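The sequence described above can be sketched roughly as follows, assuming a plain Go net/http server. This is illustrative only, not the watchdog's actual code: the /_/health path, probePeriod, and writeTimeout values stand in for whatever the operator configures for the kubelet probe and the container's "write_timeout".

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

func main() {
	// Illustrative values: the kubelet's probe period and the container's
	// write_timeout must be co-ordinated by the operator.
	probePeriod := 10 * time.Second
	writeTimeout := 10 * time.Second

	var draining int32

	mux := http.NewServeMux()
	mux.HandleFunc("/_/health", func(w http.ResponseWriter, r *http.Request) {
		if atomic.LoadInt32(&draining) == 1 {
			// Report unhealthy so the kubelet stops routing work here.
			http.Error(w, "shutting down", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("OK"))
	})

	srv := &http.Server{
		Addr:         ":8080",
		Handler:      mux,
		WriteTimeout: writeTimeout,
	}

	go func() {
		sig := make(chan os.Signal, 1)
		signal.Notify(sig, syscall.SIGTERM)
		<-sig

		// Step 1: start failing the health check, but keep serving traffic.
		atomic.StoreInt32(&draining, 1)

		// Step 2: wait at least one probe period so the kubelet observes the
		// failure and removes the Pod from the Service endpoints.
		time.Sleep(probePeriod)

		// Step 3: only now stop accepting new connections, letting in-flight
		// requests finish within the write_timeout window.
		ctx, cancel := context.WithTimeout(context.Background(), writeTimeout)
		defer cancel()
		if err := srv.Shutdown(ctx); err != nil {
			log.Printf("shutdown: %v", err)
		}
	}()

	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
```

The point of the ordering is that the health check starts failing a full probe period before Shutdown is called, so the kubelet takes the Pod out of rotation before the socket stops accepting connections, rather than after.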