Finally (successfully…) set up a Docker registry inside a kind Kubernetes cluster


Background

Following the previous articles, I was trying to set up a Docker registry inside my Kubernetes cluster so that I could push images into that registry and the cluster could pull images from it.

I could not find anything about this online back then. Otherwise I would have stopped, and I would never have learned so much along the way.

Here is the setup of the kind cluster.

My Environment Setup

As mentioned in the kind cluster article, I dropped WSL2 and switched to a VM because I wanted more control over the network (WSL2 runs on its own VM host, which I cannot, or am not willing to, tamper with).

So my workflow is to SSH into the VM and set up the kind cluster and tools like kubectl inside the VM.

The kind cluster setup is the same as in the article above, and I would like to highlight several things that might affect this setup:

  1. Use of the Calico network (which I believe created a network bridge in the VM that turns out to be needed later)
  2. extraPortMappings settings for ports 80 (http), 443 (https) and 5443 (my docker registry port), which allow the VM host to forward traffic on those ports into the kind cluster node
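For reference, a kind config with such extraPortMappings can be sketched in Python as below. This is not the author's actual config: only the 5443 => 35443 pair comes from this article, while the container ports for 80 and 443 (30080, 30443) and the disableDefaultCNI setting (commonly used when installing Calico yourself) are assumptions. JSON is a subset of YAML, so the printed output should be usable as a kind config file.

```python
import json

# Host port on the VM -> container (node) port inside the kind node.
# Only 5443 -> 35443 is taken from the article; the other two container
# ports are hypothetical placeholders.
PORT_MAPPINGS = {
    80: 30080,    # http (container port assumed)
    443: 30443,   # https (container port assumed)
    5443: 35443,  # docker registry
}


def kind_config(mappings):
    """Return a kind Cluster config (as a dict) with one control-plane node."""
    return {
        "kind": "Cluster",
        "apiVersion": "kind.x-k8s.io/v1alpha4",
        # Assumption: the default CNI is disabled so Calico can be installed.
        "networking": {"disableDefaultCNI": True},
        "nodes": [
            {
                "role": "control-plane",
                "extraPortMappings": [
                    {"containerPort": container, "hostPort": host, "protocol": "TCP"}
                    for host, container in sorted(mappings.items())
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Save the output to a file and pass it to: kind create cluster --config <file>
    print(json.dumps(kind_config(PORT_MAPPINGS), indent=2))
```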

Also, since I test from the host PC, accessing the services in the kind cluster through the Freenom domain name (with its associated Let's Encrypt TLS certificate), I added every service's domain name to my Windows hosts file (C:\Windows\System32\drivers\etc\hosts):

# hosts records like the following
# VM IP      all domains associated with kind cluster services
172.30.4.107 traefik.xxxxx.ml registry.xxxxx.ml
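To make the hosts record concrete, here is a small Python sketch of how a file in this format maps names to an IP: an address followed by one or more names, with "#" starting a comment. It only illustrates the file format. The real lookup is done by the operating system's resolver.

```python
# The hosts record from above, as text.
HOSTS = """\
# VM IP      all domains associated with kind cluster services
172.30.4.107 traefik.xxxxx.ml registry.xxxxx.ml
"""


def parse_hosts(text):
    """Return {hostname: ip}, keeping the first entry seen for each name."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # hostnames are case-insensitive
    return table


print(parse_hosts(HOSTS)["registry.xxxxx.ml"])  # 172.30.4.107
```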

Docker Registry Setup

I repeated what I had done in the first article, but this time with success. The approach I chose is "setup idea 1", which uses a Traefik IngressRouteTCP with TLS passthrough. I expected the traffic to flow as follows:

docker client [in VM] =(https)=> traefik =(https)=> docker registry
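As a rough sketch of what such a route looks like, the snippet below builds an IngressRouteTCP manifest with TLS passthrough as a Python dict. The domain pattern and the service port 9665 follow this article. The service name "docker-registry", the namespace "registry" and the entrypoint name "registry" are hypothetical, and the apiVersion depends on the Traefik version in use.

```python
import json


def ingress_route_tcp(host, service, port, entrypoint, namespace):
    """Build a Traefik IngressRouteTCP that passes TLS through to the service."""
    return {
        # Traefik v2 CRD group; newer releases use traefik.io/v1alpha1.
        "apiVersion": "traefik.containo.us/v1alpha1",
        "kind": "IngressRouteTCP",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "entryPoints": [entrypoint],
            "routes": [
                {
                    # HostSNI matches the SNI in the TLS ClientHello, so Traefik
                    # can route the connection without decrypting it.
                    "match": f"HostSNI(`{host}`)",
                    "services": [{"name": service, "port": port}],
                }
            ],
            # Forward the encrypted stream as-is; the registry terminates TLS.
            "tls": {"passthrough": True},
        },
    }


# Hypothetical names: service "docker-registry", namespace "registry",
# entrypoint "registry" (the one listening on 5443).
route = ingress_route_tcp("registry.xxxxx.ml", "docker-registry", 9665, "registry", "registry")
print(json.dumps(route, indent=2))
```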

This time, after setting it up, I could run docker login and docker pull/push successfully.

Image Pulling: some hiccups

So, after successfully pulling and pushing images, I went further and tested with a deployment.

The first error I got was ImagePullBackOff, and inspecting the events in the target namespace showed that the domain name of my docker registry was resolving to 127.0.1.1.

So I suspected it was related to DNS resolution.

What is not working: changing the coreDNS config

My understanding of DNS resolution in Kubernetes was that workloads look names up through the DNS service within the cluster, which in my kind cluster is coreDNS (note that the service is named "kube-dns" for backward-compatibility reasons, but it actually runs coreDNS).

I followed the article to check how my URL is resolved in the cluster, and I believed I could change the coreDNS config by applying a config map.

Note that the IP address 172.30.4.107 is my VM's IP address. I believed the DNS resolution chain from each pod looks like this:

pod's /etc/resolv.conf => coreDNS => kind cluster host (VM) => host machine => (home network DNS, if any) => Internet DNS
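The config map used in this attempt is not reproduced here. As a hedged sketch, one common way to pin a name in coreDNS is the hosts plugin in the Corefile (the Corefile key of the coredns ConfigMap in kube-system, in kubeadm-based clusters such as kind). The helper below only renders such a block; fallthrough lets names that are not listed continue to the next plugin.

```python
def coredns_hosts_block(entries):
    """Render a CoreDNS 'hosts' plugin block from {hostname: ip}."""
    lines = ["hosts {"]
    for name, ip in sorted(entries.items()):
        lines.append(f"    {ip} {name}")
    # Unlisted names are handed to the next plugin instead of returning NXDOMAIN.
    lines.append("    fallthrough")
    lines.append("}")
    return "\n".join(lines)


# The result would go inside the ".:53 { ... }" server block of the Corefile.
print(coredns_hosts_block({"registry.xxxxx.ml": "172.30.4.107"}))
```

As the rest of this post shows, an override like this only affects lookups that actually go through coreDNS.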

After changing the coreDNS config and seeing nslookup from a pod resolve my registry domain to 172.30.4.107 (the VM IP), I tried redeploying the pod (pulling the image from the in-cluster docker registry).

No luck, same error. I then tried changing the IP address in the coreDNS config to my Traefik service's cluster IP and to its external IP. No luck with either.

What is not working: using Trow (which is amazing, just not working in my setup)

This was before I figured out that the problem is mainly about networking.

Trow is designed to be a container registry and image management solution hosted inside a Kubernetes cluster. It provides a simple way to install it (I went with the Helm chart) and some additional features, such as restricting which images can be used in the cluster.

Without going into further detail, I successfully deployed Trow, set up a Traefik route (instead of using the Nginx config in the Helm chart), and performed image push and pull.

But again, I got the same error: the registry URL resolved to 127.0.1.1.

Reason: the image is NOT pulled from within the cluster, but at the cluster node level.

I figured this out by reading this amazing Stack Overflow discussion (which also introduced me to Trow; if I had read it 3 months ago, I would likely not have started or continued this journey).

What works: setting the VM's hosts table

According to the Stack Overflow discussion mentioned above, image pulling resolves the registry URL at the node level, which is outside coreDNS's control. The next DNS layer I could easily control is the VM, so I went ahead and updated its /etc/hosts file. This time it had an effect: the error message changed to a timeout.
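The difference between the two lookups can be illustrated with a toy model (not real Kubernetes code): each layer is a lookup table, and a query stops at the first layer that has an answer. The addresses are the ones from this post; where the node's earlier 127.0.1.1 answer actually came from was not investigated, so it is modelled as an opaque layer.

```python
from typing import Optional

REGISTRY = "registry.xxxxx.ml"

coredns = {REGISTRY: "172.30.4.107"}     # override added to coreDNS
vm_hosts = {REGISTRY: "172.18.255.200"}  # entry added to the VM's /etc/hosts
before_fix = {REGISTRY: "127.0.1.1"}     # what the node resolved before (source not investigated)


def resolve(name, chain) -> Optional[str]:
    """Walk the layers in order and return the first answer found."""
    for layer in chain:
        if name in layer:
            return layer[name]
    return None


pod_lookup = resolve(REGISTRY, [coredns, vm_hosts])     # pods ask coreDNS first
node_lookup = resolve(REGISTRY, [vm_hosts, before_fix])  # the image pull never sees coreDNS
node_before = resolve(REGISTRY, [before_fix])           # node lookup before editing /etc/hosts
print(pod_lookup, node_lookup, node_before)  # 172.30.4.107 172.18.255.200 127.0.1.1
```

The point of the model is that editing coreDNS only changes the first chain, while the image pull walks the second one.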

Reason: the IP address we point to matters

My first attempt was to update the VM's hosts table to point the URL to the cluster IP 10.96.43.248, which was dumb, as the cluster IP cannot be reached from the node (the VM).

I then changed the IP to the "External-IP" 172.18.255.200, and it worked (apart from some minor hiccups, like unauthorized access to the registry).

Some important points on networking and on the config of kind cluster extraPortMappings and Traefik

First of all, what might the external IP 172.18.255.200 actually be?

On the VM, the ip address command shows that there is a network bridge covering the external IP address (the CIDR block 172.18.0.1/16 covers the IP range 172.18.0.0–172.18.255.255).
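The CIDR arithmetic can be checked with Python's ipaddress module. The sketch below also compares the two addresses tried in the VM's hosts table. Being inside the bridge's subnet is only an indication that the VM can reach an address directly, not proof.

```python
import ipaddress

# Bridge address as shown on the VM; .network drops the host part.
bridge = ipaddress.ip_interface("172.18.0.1/16").network
print(bridge, bridge.network_address, bridge.broadcast_address)
# 172.18.0.0/16 172.18.0.0 172.18.255.255

for candidate in ("172.18.255.200", "10.96.43.248"):  # External-IP vs cluster IP
    print(candidate, ipaddress.ip_address(candidate) in bridge)
# 172.18.255.200 True
# 10.96.43.248 False
```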

Also, checking the iptables settings, we see that the destination 172.18.0.2 is in the DOCKER chain, and its ports are the kind cluster control plane's port 6443 plus the 3 extraPortMappings ports I configured in the kind cluster.

So I assume the network bridge was created either by the Calico CNI or by kind.

Regarding the extraPortMappings and Traefik settings, I believe the traffic flows as follows:

For docker client operations, the docker client goes through:

VM port 5443 => extraPortMapping 35443 as node port => traefik entrypoint port 5443 =(ingressRoute)=> service’s port 9665 => pod's 5000 port

I believe the image pulling agent is the kubelet, even though I have not found any documentation that explicitly explains the image pulling procedure.

When the cluster pulls the image, it goes through:

Agent => traefik's entrypoint port 5443 => service's port 9665 => pod's port 5000
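The two flows above can be written down as data, which makes it easy to check that they converge at the Traefik entrypoint and that the port dialled first (5443) is the same for both. The hop labels are only descriptions and the ports are the ones from this article. This is a bookkeeping sketch, not a network test.

```python
# The two flows described above, hop by hop: (description, port).
DOCKER_CLIENT_PATH = [
    ("VM port", 5443),
    ("extraPortMapping (node port)", 35443),
    ("traefik entrypoint", 5443),
    ("service port", 9665),
    ("pod port", 5000),
]
IMAGE_PULL_PATH = [
    ("traefik entrypoint", 5443),
    ("service port", 9665),
    ("pod port", 5000),
]


def describe(path):
    """Render a path in the same 'A => B => C' notation used in the text."""
    return " => ".join(f"{hop} {port}" for hop, port in path)


# Both flows share the same tail from the Traefik entrypoint onward.
shared_tail = DOCKER_CLIENT_PATH[-len(IMAGE_PULL_PATH):] == IMAGE_PULL_PATH
print(describe(DOCKER_CLIENT_PATH))
print(describe(IMAGE_PULL_PATH))
print("converge at traefik entrypoint:", shared_tail)
```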

Note: I keep the kind cluster's extraPortMappings host port and the Traefik entrypoint port the same (5443) so that either path points to the same port number. This is because the image is tagged in the pattern registry.xxxx.ml:5443/image:tag, so the port is part of the image name.
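To see why the port matters, here is a simplified image-reference parser in Python. It follows Docker's rule that the first path component is a registry only if it contains "." or ":" or is "localhost", and it does not handle digests (@sha256:...). Whoever performs the pull gets both the host and the port 5443 from the image name itself.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    registry: str
    port: Optional[int]
    repository: str
    tag: str


def parse_image_ref(ref):
    """Split an image reference into registry host, port, repository and tag."""
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry = first
    else:
        registry, rest = "docker.io", ref  # no registry part: Docker Hub is assumed
    host, _, port = registry.partition(":")
    name, sep, tag = rest.rpartition(":")
    if not sep:  # no explicit tag
        name, tag = rest, "latest"
    return ImageRef(host, int(port) if port else None, name, tag)


print(parse_image_ref("registry.xxxx.ml:5443/image:tag"))
# ImageRef(registry='registry.xxxx.ml', port=5443, repository='image', tag='tag')
```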

Resolving authentication to the docker registry

This one is straightforward: we need to create an image pull secret and assign it either to the service account or to the deployment.
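As a sketch of what such a secret contains, the Python below builds a kubernetes.io/dockerconfigjson Secret manifest, similar in shape to what kubectl create secret docker-registry produces. The secret name regcred, the namespace demo and the credentials are hypothetical. The registry key includes :5443 because the image names include it. A deployment would then reference it by name under spec.template.spec.imagePullSecrets.

```python
import base64
import json


def image_pull_secret(name, namespace, registry, username, password):
    """Build a kubernetes.io/dockerconfigjson Secret manifest as a dict."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    docker_config = {
        "auths": {registry: {"username": username, "password": password, "auth": auth}}
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "kubernetes.io/dockerconfigjson",
        "metadata": {"name": name, "namespace": namespace},
        # Secret data values must themselves be base64-encoded.
        "data": {
            ".dockerconfigjson": base64.b64encode(json.dumps(docker_config).encode()).decode()
        },
    }


# Hypothetical secret name, namespace and credentials.
secret = image_pull_secret("regcred", "demo", "registry.xxxx.ml:5443", "user", "changeme")
print(json.dumps(secret, indent=2))
```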

Details on assigning the secret to a service account are in the official documentation.

A minor hiccup: multiple namespaces

As I separate pods and services into different namespaces, the image pull secrets have to be separated too: a deployment cannot use an image pull secret from another namespace.

So I tried the service account approach, and found that even the "default" service account exists on a per-namespace basis.

Running "kubectl get serviceaccount -A | grep default" lists the default service accounts across all namespaces.

The first column is the namespace and the second is the item name, so there is a "default" service account in every namespace.
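Since both the secret and the "default" service account are per namespace, wiring up several namespaces means repeating the same patch in each one. The sketch below generates, for a list of namespaces, the kubectl patch command shown in the official documentation for adding imagePullSecrets to a service account. The namespace names and the secret name regcred are hypothetical, and the secret must already exist in each namespace.

```python
import json
import shlex


def patch_default_sa_commands(namespaces, secret_name="regcred"):
    """One 'kubectl patch' per namespace, adding the pull secret to that
    namespace's own "default" service account."""
    patch = json.dumps({"imagePullSecrets": [{"name": secret_name}]})
    return [
        f"kubectl -n {shlex.quote(ns)} patch serviceaccount default -p {shlex.quote(patch)}"
        for ns in namespaces
    ]


for cmd in patch_default_sa_commands(["app", "web"]):  # hypothetical namespaces
    print(cmd)
```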

Up to this point, I am happy, as the setup seems to work. For the cross-namespace secret issue, I plan to use reflector (which I already use to sync TLS certificate secrets from cert-manager) to manage it.
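Assuming the reflector in question is emberstack's kubernetes-reflector, mirroring is driven by annotations on the source secret. The sketch below adds the annotation keys as they appear in that project's README at the time of writing (verify them against the version you run). The source secret and the target namespaces are hypothetical.

```python
import copy

PREFIX = "reflector.v1.k8s.emberstack.com"


def with_reflection(secret, namespaces):
    """Return a copy of a Secret manifest annotated so Reflector mirrors it
    into the listed namespaces."""
    annotated = copy.deepcopy(secret)  # leave the caller's dict untouched
    annotations = annotated.setdefault("metadata", {}).setdefault("annotations", {})
    allowed = ",".join(namespaces)
    annotations.update({
        f"{PREFIX}/reflection-allowed": "true",
        f"{PREFIX}/reflection-allowed-namespaces": allowed,
        f"{PREFIX}/reflection-auto-enabled": "true",
        f"{PREFIX}/reflection-auto-namespaces": allowed,
    })
    return annotated


# Hypothetical source secret and target namespaces.
source = {"apiVersion": "v1", "kind": "Secret", "metadata": {"name": "regcred", "namespace": "registry"}}
mirrored = with_reflection(source, ["app", "web"])
print(mirrored["metadata"]["annotations"])
```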

Conclusion

Again, I hope this actually helps someone, as I did not find many resources on this topic. I always think that if I cannot find a solution online, the idea is either insane / impractical / stupid, or pretty tough / niche.

I hope I won't be banging my head against the wall again too soon.
