During the master node upgrade phase of a BareOS cluster upgrade in PMK, users can lose access to the K8s API endpoint for roughly 15 to 30 seconds. The outage happens because the virtual IP (VIP) has to fail over to another master node when the node being upgraded, or its network, goes down.
- Platform9 Managed Kubernetes - All Versions
- Keepalived is configured to run a health check against the K8s API server every 10 seconds. If the API server goes down right after a successful check, keepalived does not notice until the next check, 9 to 10 seconds later. The VRRP election and the upstream switch cache update then add further delay before traffic reaches the new VIP holder.
- An optimization feature request, PMK8-I-136, has been filed to reduce this switchover time during upgrades. The proposal is to stop keepalived first, forcing a VIP failover, and only then stop the K8s API server as part of the pf9-kube service stop process. This is expected to cut the switchover time during upgrades significantly as a percentage of the current switchover time.
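To make the timing above concrete, here is a rough Python sketch that models the API unavailability window under the current shutdown order and under the proposed keepalived-first order. Only the 10-second check interval comes from this article. The election and switch cache update times are hypothetical placeholder values, and `FailoverModel` and its methods are illustrative names, not part of PMK or keepalived. The `fall` field mirrors keepalived's `vrrp_script` option of the same name (consecutive failed checks required before the script is treated as failed); the behavior described above corresponds to `fall = 1`.

```python
from dataclasses import dataclass


@dataclass
class FailoverModel:
    """Hypothetical model of K8s API downtime during a VIP switchover (seconds).

    check_interval matches the 10 s health check described in this article.
    election_time and cache_update_time are ASSUMED placeholder values,
    not measured PMK or keepalived figures.
    """

    check_interval: float = 10.0    # keepalived health-check period
    fall: int = 1                   # failed checks needed before failover starts
    election_time: float = 3.0      # assumed VRRP election time
    cache_update_time: float = 2.0  # assumed upstream switch cache update time

    def _switchover(self) -> float:
        # Time from the moment failover starts until traffic reaches the new node.
        return self.election_time + self.cache_update_time

    def worst_case_apiserver_first(self) -> float:
        # API server stops just after a passing check, so all `fall` failing
        # checks each take a full interval to arrive.
        return self.fall * self.check_interval + self._switchover()

    def best_case_apiserver_first(self) -> float:
        # API server stops just before a check: the first failing check is
        # immediate, and only the remaining (fall - 1) checks add delay.
        return (self.fall - 1) * self.check_interval + self._switchover()

    def keepalived_first(self) -> float:
        # Proposed order: stopping keepalived moves the VIP while the API
        # server is still up, so no health-check detection delay applies.
        return self._switchover()


if __name__ == "__main__":
    m = FailoverModel()
    print("current order, worst case:", m.worst_case_apiserver_first())
    print("current order, best case: ", m.best_case_apiserver_first())
    print("keepalived first:         ", m.keepalived_first())
```

With the placeholder values, the worst case for the current order is 15.0 seconds and the keepalived-first order is 5.0 seconds. The point of the sketch is the structure of the sum: the proposed order removes the health-check detection term entirely, which is why the fix targets shutdown ordering rather than the election itself.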