Debugging Docker Swarm services

Prachi More
FAUN — Developer Community 🐾
3 min read, Sep 5, 2018


Quick debugging using a Docker experimental feature

Debugging Docker Swarm services is no easy task.

I was trying to set up Jenkins as a Swarm service using a Compose stack deployment. After I deployed the stack, the Jenkins service was running but its containers kept crashing. Because the stack was deployed in Swarm mode, Swarm kept starting replacement containers automatically. However, each container exited before I could view its logs.

In such scenarios it is difficult to debug the underlying cause of the crashing service. We can see the status of the service with ‘docker service ls’ or ‘docker service ps’, but these only give an overall picture of the service's state. To identify the exact cause of the failure, we need the detailed logs from the containers themselves.

Luckily, ‘docker service logs’ gives us multiplexed output of the logs from all the containers spun up by the service. The command was introduced as an experimental feature, and on the Docker version I was running it still required the experimental daemon; newer Docker releases ship it as a standard command.

In the snippet below, I try to check the logs of my Jenkins service, but the command is not supported in the default daemon configuration.

$ docker service logs t7w
only supported with experimental daemon

Enable Docker Daemon experimental features

Edit the daemon configuration file ‘/etc/docker/daemon.json’ to enable the experimental option, as below. If the file does not exist, create it.

root@ip-172-31-22-115:/etc/docker# cat daemon.json
{
"experimental": true
}
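If ‘/etc/docker/daemon.json’ already holds other settings, replacing its contents with the snippet above would drop them. Here is a minimal Python sketch of merging the flag in instead. It assumes the file, if present, contains a single JSON object and that the script runs with permission to write it (typically as root); the helper name enable_experimental is hypothetical, not part of Docker.

```python
import json
from pathlib import Path

DAEMON_JSON = Path("/etc/docker/daemon.json")


def enable_experimental(path: Path = DAEMON_JSON) -> dict:
    """Set "experimental": true in the daemon config, keeping any existing keys.

    Hypothetical helper: assumes the file, if present, holds one JSON object.
    Creates the file (and its directory) when it does not exist.
    Returns the resulting configuration.
    """
    config = {}
    if path.exists() and path.read_text().strip():
        config = json.loads(path.read_text())
    config["experimental"] = True
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling enable_experimental() as root updates the default path in place; the daemon still has to be restarted afterwards for the change to take effect.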

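Restart the Docker daemon so that it picks up the new configuration (on systemd-based hosts, ‘systemctl restart docker’ does the same).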

root@ip-172-31-22-115:/etc/docker# service docker restart

Check that experimental features are now enabled. The command should print true.

root@ip-172-31-22-115:/etc/docker# docker version -f '{{.Server.Experimental}}'
true

Debug the docker service logs

With experimental features enabled, you can now check the Docker service logs.

The snippet below lists the tasks of the service and shows that each one is spawned, fails and is shut down within seconds. The error shown, “task: non-zero exit (1)”, only says that the container's main process exited with status 1; it gives no details of the underlying cause, which could be any number of things. Clueless!

root@ip-172-31-22-115:~/jenkins# docker service ps t7w
ID            NAME                   IMAGE                   NODE              DESIRED STATE  CURRENT STATE          ERROR                      PORTS
tsv42txxhkn9  jenkins_jenkins.1      bitnami/jenkins:latest  ip-172-31-22-115  Ready          Ready 3 seconds ago
gmqgkfsndw50   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 3 seconds ago   "task: non-zero exit (1)"
oz2ctrxi74qk   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 11 seconds ago  "task: non-zero exit (1)"
pgtoo3eaeqj7   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 17 seconds ago  "task: non-zero exit (1)"
kheptpifm3v9   \_ jenkins_jenkins.1  bitnami/jenkins:latest  ip-172-31-22-115  Shutdown       Failed 23 seconds ago  "task: non-zero exit (1)"
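When a service keeps restarting, this task history grows quickly. Here is a rough Python sketch for pulling the failed tasks and their errors out of plain-text output like the listing above. It assumes columns are separated by at least two spaces, as they are here, and will break if a value itself contains two consecutive spaces; the failed_tasks name and this parsing approach are my own illustration, not a Docker feature.

```python
import re
from typing import List, Tuple

# Columns in the listing are padded with two or more spaces.
COLUMN_SPLIT = re.compile(r"\s{2,}")


def failed_tasks(ps_output: str) -> List[Tuple[str, str, str]]:
    """Return (task_id, current_state, error) for every task in a Failed state.

    Assumes the first non-empty line is the header row and that columns
    are separated by 2+ spaces, as in the 'docker service ps' output above.
    """
    lines = [line for line in ps_output.splitlines() if line.strip()]
    header = COLUMN_SPLIT.split(lines[0].strip())
    results = []
    for line in lines[1:]:
        row = dict(zip(header, COLUMN_SPLIT.split(line.strip())))
        state = row.get("CURRENT STATE", "")
        if state.startswith("Failed"):
            # Drop the surrounding double quotes from the ERROR column.
            results.append((row["ID"], state, row.get("ERROR", "").strip('"')))
    return results
```

The task IDs it returns are the same IDs that prefix each line of ‘docker service logs’, which makes the two outputs easy to match up.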

It’s time to check the logs and identify the root cause of the issue.

The snippet below shows the logs of all the failing tasks of my service. Observe the task IDs: each line is prefixed with the task name, task ID and node it came from.

root@ip-172-31-22-115:~/jenkins# docker service logs t7w
jenkins_jenkins.1.tsv42txxhkn9@ip-172-31-22-115    | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.oz2ctrxi74qk@ip-172-31-22-115    | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.gmqgkfsndw50@ip-172-31-22-115    | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.pgtoo3eaeqj7@ip-172-31-22-115    | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
jenkins_jenkins.1.c9lw4mx5igzs@ip-172-31-22-115    | /app-entrypoint.sh: line 3: /opt/bitnami/base/functions: No such file or directory
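When several tasks have failed, the multiplexed output interleaves their lines. Here is a small Python sketch that groups the lines by task ID. It assumes each line has the ‘<task name>.<task ID>@<node> | <message>’ shape seen in the output above; the regular expression and the logs_by_task name are my own illustration, not part of Docker.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches lines such as:
#   jenkins_jenkins.1.oz2ctrxi74qk@ip-172-31-22-115    | <message>
# The part before '@' is the task name (service.slot) plus the task ID.
LOG_LINE = re.compile(r"^(?P<task>\S+?)@(?P<node>\S+)\s+\|\s?(?P<msg>.*)$")


def logs_by_task(log_output: str) -> Dict[str, List[str]]:
    """Group 'docker service logs' lines by task ID (the last dotted part)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in log_output.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not carry the task prefix
        task_id = match.group("task").rsplit(".", 1)[-1]
        grouped[task_id].append(match.group("msg"))
    return dict(grouped)
```

In this case every task reports the same missing /opt/bitnami/base/functions error, which points to a problem in the image or its start-up script rather than a one-off crash.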

The logs show that the containers in my service cannot find /opt/bitnami/base/functions, the absolute path referenced at line 3 of the start-up script (/app-entrypoint.sh) defined in my Dockerfile. Once I fix this, the service should be up and running.

As simple as that! Viewing the service logs helps us debug and identify the exact root cause of Docker Swarm service failures.

Hope this helps with quick troubleshooting!
