Problem
- The pf9-kube service fails to start. The following error is observed in the hostagent.log (/var/log/pf9/hostagent.log) on the host:
[2019-01-01 09:48:17] Job for docker.service failed because the control process exited with error code. See "systemctl status docker.service" and "journalctl -xe" for details.
stderr=
- The Docker daemon log (viewed via journalctl) shows the underlying error:
# journalctl -fu docker.service
Jan 01 09:47:10 node dockerd[26492]: time="2019-01-01T09:47:10.024478136Z" level=fatal msg="can't create unix socket /var/run/docker.sock: is a directory"
Jan 01 09:47:10 node systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
Jan 01 09:47:10 node systemd[1]: Failed to start Docker Application Container Engine (Installed by Platform9).
Jan 01 09:47:10 node systemd[1]: Unit docker.service entered failed state.
Jan 01 09:47:10 node systemd[1]: docker.service failed.
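To confirm that the host is in the state described by the log above, it can help to check what currently sits at the socket path. The following is a minimal sketch, not part of the Platform9 tooling: the function name docker_sock_type is hypothetical, and the path defaults to /var/run/docker.sock as shown in the error message.

```shell
#!/bin/sh
# Report what currently exists at the Docker socket path.
# docker_sock_type is a hypothetical helper name used for illustration only.
docker_sock_type() {
    path="${1:-/var/run/docker.sock}"
    if [ -d "$path" ]; then
        # This is the failure state: dockerd cannot create a socket here.
        echo "directory"
    elif [ -S "$path" ]; then
        # A healthy, running Docker daemon leaves a unix socket here.
        echo "socket"
    elif [ -e "$path" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

Running docker_sock_type with no arguments on an affected host should print "directory"; any other result suggests that the Docker start failure has a different cause.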
Environment
- Platform9 Managed Kubernetes (PMK) v3.2 through v3.6
- Docker
Cause
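A race condition exists in which a container may attempt to mount /var/run/docker.sock while Docker is still starting up. When this happens, the path ends up as a directory rather than a socket, and the Docker daemon can no longer create its socket there. A related upstream bug has been identified.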
Resolution
This issue is addressed in PMK v3.7. As a workaround for older releases:
- Remove the directory /var/run/docker.sock.
# rm -rf /var/run/docker.sock
- Restart the Docker service.
# systemctl restart docker.service
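The two workaround steps can also be wrapped in a small guard so that the path is only removed when it really is a directory. This is a sketch under stated assumptions rather than a supported procedure: the function name fix_docker_sock and the RESTART_CMD override are hypothetical and exist only so the logic can be dry-run without touching the Docker service.

```shell
#!/bin/sh
# Remove the Docker socket path only if it is a directory, then restart Docker.
# fix_docker_sock and RESTART_CMD are hypothetical names used for illustration.
fix_docker_sock() {
    path="${1:-/var/run/docker.sock}"
    # Default restart command is the one given in the workaround steps.
    restart_cmd="${RESTART_CMD:-systemctl restart docker.service}"

    if [ ! -d "$path" ]; then
        # A socket, a regular file, or a missing path is left untouched.
        echo "$path is not a directory; nothing to remove" >&2
        return 1
    fi

    rm -rf -- "$path" || return 1
    # Intentionally unquoted so the command string splits into words.
    $restart_cmd
}
```

Run as root, fix_docker_sock with no arguments applies both steps to /var/run/docker.sock. Setting RESTART_CMD=true and passing a scratch path exercises the removal logic without restarting anything.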