e67811c9
Alter graceful shutdown sequence
Alex Ellis (VMware) authored
    
    
- the shutdown sequence meant that the kubelet was still passing
work to the watchdog after the HTTP socket was closed. This change
gives the kubelet a chance to run its health check before we
finally stop accepting new connections. It requires some basic
coordination between the kubelet's check period and the
"write_timeout" value in the container.
    
Tested with Kubernetes on GKE - before the change some Pods were
returning connection refused errors because they had not yet been
detected as unhealthy. Now I see a 0% error rate even at 20 qps.
    
The issue was reproduced by scaling to 20 replicas, starting a test
with hey, and then scaling back to 1 replica while tailing the
gateway's logs. Before the change I saw some 502s; now I see only 200s.
    
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>