I’m running a Docker Swarm cluster with 13 nodes (some managers, some workers). I deployed a stack (uptrace) with several services, including a global service:
services:
  uptrace_cadvisor:
    image: container.my_Domain/ghcr/google/cadvisor:0.56.2
    deploy:
      mode: global
      restart_policy:
        condition: on-failure
    networks:
      - uptrace_back
networks:
  uptrace_back:
    driver: overlay
However, this global service is not deployed on all nodes: it runs on only 10 out of 13. The missing nodes are .37, .38, and .45 (for example).
Observations:
On the affected worker nodes, the containers stay in the Created state:
docker ps -a | grep uptrace
31043c3c9261 container.my_Domain/ghcr/google/cadvisor:0.56.2 "/usr/bin/entrypoint…" 2 hours ago Created uptrace_cadvisor.1cc9nxsdtw5dmgdpfwx5utbzb.vsdsq0k7ywu08s8s7cwlhfk1w
Logs from one of the affected nodes (mafalda) show:
level=error msg="fatal task error" error="invalid pool request: Pool overlaps with other one on this address space" module=node/agent/taskmanager node.id=1cc9nxsdtw5dmgdpfwx5utbzb service.id=shjrqkzjjld2xn451e4b0wt5q task.id=...
level=info msg="initialized VXLAN UDP port to 4789"
level=warning msg="failed to deactivate service binding for container uptrace_cadvisor ... No such container"
Overlay network inspection:
docker network inspect uptrace_back
The output shows peers on only 10 nodes; the affected nodes are missing. The subnet in use is 10.0.33.0/24. Other overlay networks also exist on the cluster.
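To narrow down which networks might collide, a minimal sketch like the one below could compare the subnets that `docker network inspect` reports on an affected node. It assumes the usual JSON shape of that command (a list of objects with `Name` and `IPAM.Config[].Subnet`). The helper names `collect_subnets` and `find_overlaps` are made up for illustration, and apart from `uptrace_back` with 10.0.33.0/24, the sample entries are hypothetical and not taken from my cluster.

```python
import ipaddress
from itertools import combinations


def collect_subnets(networks):
    """Return (network name, parsed subnet) pairs from `docker network inspect` style JSON."""
    found = []
    for net in networks:
        ipam = net.get("IPAM") or {}
        # "Config" can be missing, null, or an empty list for some networks.
        for cfg in ipam.get("Config") or []:
            subnet = cfg.get("Subnet")
            if subnet:
                parsed = ipaddress.ip_network(subnet, strict=False)
                found.append((net.get("Name", "?"), parsed))
    return found


def find_overlaps(subnets):
    """Return every pair of entries whose address ranges intersect."""
    return [
        (a, b)
        for a, b in combinations(subnets, 2)
        if a[1].version == b[1].version and a[1].overlaps(b[1])
    ]


# Hypothetical sample: only uptrace_back / 10.0.33.0/24 comes from the post.
sample = [
    {"Name": "uptrace_back", "IPAM": {"Config": [{"Subnet": "10.0.33.0/24"}]}},
    {"Name": "other_overlay", "IPAM": {"Config": [{"Subnet": "10.0.32.0/20"}]}},
    {"Name": "docker_gwbridge", "IPAM": {"Config": [{"Subnet": "172.18.0.0/16"}]}},
    {"Name": "host", "IPAM": {"Config": []}},
]

for (name_a, net_a), (name_b, net_b) in find_overlaps(collect_subnets(sample)):
    print(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
```

On a real node, the `sample` list could be replaced with the parsed output of `docker network inspect $(docker network ls -q)`, for example via `json.load`. Any pair reported there would be a candidate for the pool that the error message refers to, although this only checks the subnets Docker reports and not other address ranges in use on the host.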
Networking checks (nc) on UDP ports:
nc -zvu <node-ip> 7946 # succeeds
nc -zvu <node-ip> 4789 # succeeds
All nodes can reach each other on the Swarm gossip and VXLAN ports.
The logs clearly show:
invalid pool request: Pool overlaps with other one on this address space
Question:
How can I resolve this “invalid pool request / pool overlaps” error so that global services can run on all nodes?
Should I recreate overlay networks with different subnets?
Can I fix this without removing all other running services?
Any recommendations for properly configuring multiple overlay networks in Swarm with many nodes?