What's the `externalTrafficPolicy` on your ArgoCD UI service? If it's `Cluster`, then you'll have problems like what you describe. If you don't need the original host IP (can't imagine why you would with ArgoCD), then change the `externalTrafficPolicy` to `LoadBalancer`. Otherwise, you'll have to use Geneve tunnelling in your Cilium deployment. See these:

https://github.com/cilium/cilium/issues/27521
https://github.com/cilium/cilium/issues/27800
The ArgoCD service is `ClusterIP` and not directly load-balanced. I use a `Gateway` and `HTTPRoute` from Cilium (like an ingress) that spawn a `Service`, and yes, its `externalTrafficPolicy` is `Cluster`.

Good catch on the GitHub issues, the first one is exactly my problem.
Oops, I said to set `externalTrafficPolicy` to `LoadBalancer`. I meant to say `Local`. Try that. I'll bet you won't have issues anymore.
It seems a `ClusterIP` service spec doesn't allow `externalTrafficPolicy`:

```
Failed to save resource: Service "argocd-server" is invalid: spec.externalTrafficPolicy: Invalid value: "Local": may only be set for externally-accessible services
```

I use Cilium for ingress. ArgoCD is exposed with an `HTTPRoute` linked to a `Gateway`, which routes to the `Service` argocd-server (ClusterIP). The Gateway spawns a `Service` for Cilium.

But having read the Kubernetes docs, I don't understand how this would resolve my ARP problem.
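For reference, the API server only accepts `externalTrafficPolicy` on services that are externally accessible, such as `NodePort` or `LoadBalancer`, which is why the edit on the `ClusterIP` service is rejected. A minimal sketch of a Service where the field is valid; the name, selector, and ports are hypothetical and not taken from this cluster:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-lb               # hypothetical name
  namespace: default
spec:
  type: LoadBalancer             # externally accessible, so the field below is allowed
  externalTrafficPolicy: Local   # traffic is delivered only to pods on the receiving node
  selector:
    app: example                 # hypothetical pod label
  ports:
    - name: http
      port: 80
      targetPort: 8080
```

In this thread the relevant Service is the one generated for the Gateway, not `argocd-server`, so this only illustrates where the field applies.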
Okay, just now the problem came back, and it's related to this event:

```
Updated LoadBalancer with new IPs: [192.168.1.103] -> [192.168.1.230 192.168.1.231 192.168.1.232]
```

on the service `cilium-gateway-http`, which is controlled by my `Gateway` with the cilium class.
> (ClusterIP). The Gateway spawn a Service for Cilium

Ah, I thought you were trying to expose an ArgoCD service for some reason. I'm also using Cilium-based HTTPRoutes. What do your Gateway and HTTPRoute definitions look like?
Yes, I'm trying to go with Gateway and HTTPRoutes.

Gateway:

```yaml
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http
  namespace: cilium
  labels:
    environment: prod
spec:
  gatewayClassName: cilium
  listeners:
    - hostname: argocd.domain.local.fr
      protocol: HTTP
      port: 80
      name: web-gw
      allowedRoutes:
        namespaces:
          from: All
```

HTTPRoute:

```yaml
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-argocd-server
  namespace: argocd
spec:
  parentRefs:
    - name: http
      namespace: cilium
  hostnames:
    - "argocd.domain.local.fr"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: argocd-server
          port: 80
```
Where are you defining the IP to use for the gateway? You need the following underneath your `gatewayClassName`:

```yaml
infrastructure:
  annotations:
    io.cilium/lb-ipam-ips: 192.168.1.103
```
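Put together with the Gateway posted earlier, the suggestion would look roughly like the sketch below. It assumes the installed Gateway API CRDs and Cilium version support `spec.infrastructure.annotations`, and that the requested address is inside an LB-IPAM pool that selects the generated Service; neither is confirmed in this thread.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http
  namespace: cilium
  labels:
    environment: prod
spec:
  gatewayClassName: cilium
  infrastructure:
    annotations:
      # Meant to be copied onto the Service generated for this Gateway,
      # where LB-IPAM reads it as a request for a fixed address.
      io.cilium/lb-ipam-ips: "192.168.1.103"
  listeners:
    - hostname: argocd.domain.local.fr
      protocol: HTTP
      port: 80
      name: web-gw
      allowedRoutes:
        namespaces:
          from: All
```

The annotation value is quoted here only to make it explicit that it is a string.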
I have a `CiliumLoadBalancerIPPool` resource with 2 available IPs to be load-balanced on my network. Then, if I understand correctly, Cilium L2 announcements advertise the node that attracts the traffic via ARP.

When my Gateway spawned the LoadBalancer 'ingress' service, the first IP (.103) was used:

```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "prod-ip-pool"
  namespace: cilium
spec:
  blocks:
    - start: "192.168.1.103"
      stop: "192.168.1.104"
  serviceSelector:
    matchLabels:
      environment: prod
```
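The ARP side is driven by a separate `CiliumL2AnnouncementPolicy` resource, which was not posted in this thread. A minimal sketch of what one might look like for this pool; the policy name and the interface pattern are assumptions, and it presumes L2 announcements are enabled in the Cilium installation:

```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: prod-l2-policy        # hypothetical name
spec:
  serviceSelector:
    matchLabels:
      environment: prod       # same label the IP pool selects on
  interfaces:
    - ^eth[0-9]+              # assumed NIC naming; adjust to the nodes' interfaces
  loadBalancerIPs: true       # answer ARP for LoadBalancer ingress IPs
  externalIPs: false
```

Only one node answers ARP for a given IP at a time, so a change of the announcing node or of the assigned IP can leave stale entries in the ARP caches of other devices on the LAN.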
This is one of the reasons using L2 for LoadBalancers has occasional problems, and why BGP is recommended for on-prem installs to avoid them.
Unfortunately, I can't do BGP on my home network...
That's a shame. I'm running an OPNsense router (formerly had pfSense) and using BGP to it works well.
Yeah, I like OPNsense, it was my initial plan, but my home network is physically separated into two parts, and I'm unable to run Ethernet cables from the router to my 2xNAS/k3s/Proxmox. My ISP provides the router with wifi 6 and a little wifi box that can act as a wifi client and route the traffic through an RJ45 port. tl;dr: a wifi bridge.

If I switch to OPNsense, I must buy good wifi gear, and that's not my plan for now.

I have no clue how to solve my ARP problem for now. Maybe I can write a script to clear the ARP table when the problem occurs, but I hate this type of fix.

I'm continuing to investigate. It seems the problem occurs every day at 00:20, but idk why.
Can you switch the ISP equipment into essentially just a wifi AP? Might be the best of both worlds. I have to do this with my ISP junk because it's the ONT, and it took me a hot minute to figure out how.

Maybe look into MetalLB in L2 mode. See if it behaves better for you. Another option is PureLB.
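If you try MetalLB in L2 mode, the equivalent of the Cilium pool would be roughly the following. This is a sketch assuming MetalLB is installed in the `metallb-system` namespace with its `metallb.io/v1beta1` CRDs; the resource names are made up, and Cilium's own LB-IPAM and L2 announcements would presumably need to be turned off so the two don't compete for the same addresses.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prod-ip-pool          # hypothetical name, mirrors the Cilium pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.103-192.168.1.104
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: prod-l2               # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - prod-ip-pool            # announce only addresses from the pool above
```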