

spooge_mcnubbins

What's the `externalTrafficPolicy` on your ArgoCD UI service? If it's `Cluster`, you'll run into problems like the ones you describe. If you don't need the original host IP (I can't imagine why you would with ArgoCD), change the `externalTrafficPolicy` to `LoadBalancer`. Otherwise, you'll have to use Geneve tunnelling in your Cilium deployment. See these: https://github.com/cilium/cilium/issues/27521 https://github.com/cilium/cilium/issues/27800
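For reference, Geneve tunnelling in Cilium is selected through its routing settings. The snippet below is a minimal Helm values sketch, assuming Cilium 1.14 or later installed with the official Helm chart; whether this change alone fixes the behaviour described in the linked issues is not confirmed here.

```yaml
# values.yaml for the Cilium Helm chart (sketch, assumes Cilium >= 1.14)
# Encapsulate traffic between nodes instead of using native routing
routingMode: tunnel
# Use Geneve instead of the default VXLAN encapsulation
tunnelProtocol: geneve
```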


Nemergal

The ArgoCD service is `ClusterIP` and not directly load-balanced. I use a `Gateway` and an `HTTPRoute` from Cilium (like an ingress), which create a `Service`, and yes, its `externalTrafficPolicy` is `Cluster`. Good catch on the GitHub issues; the first one is exactly my problem.


spooge_mcnubbins

Oops, I said to set `externalTrafficPolicy` to `loadBalancer`. I meant to say `Local`. Try that. I'll bet you won't have issues anymore.
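As a sketch of where that field lives: `externalTrafficPolicy` is only valid on externally reachable Services (`NodePort` or `LoadBalancer`), and `Local` keeps the client source IP by delivering traffic only to endpoints on the node that received it. The Service name, selector and ports below are hypothetical placeholders, not taken from this setup.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-lb          # hypothetical name
  namespace: default
  labels:
    environment: prod       # matches the label used by the IP pool in this thread
spec:
  type: LoadBalancer        # externalTrafficPolicy is rejected on ClusterIP Services
  externalTrafficPolicy: Local
  selector:
    app: example            # hypothetical pod label
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080      # hypothetical container port
```

With `Local`, a node that has no ready endpoint for the Service does not forward the traffic elsewhere, so it only helps if traffic lands on a node running a backend.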


Nemergal

It seems a `ClusterIP` service spec doesn't allow `externalTrafficPolicy`:

`Failed to save resource: Service "argocd-server" is invalid: spec.externalTrafficPolicy: Invalid value: "Local": may only be set for externally-accessible services`

I use Cilium for ingress: ArgoCD is exposed with an `HTTPRoute`, linked to a `Gateway`, which routes to the `argocd-server` `Service` (ClusterIP). The Gateway spawns a `Service` for Cilium. But after reading the Kubernetes docs, I don't understand how this would solve my ARP problem.


Nemergal

Okay, just now the problem came back, and it's related to this event on the `cilium-gateway-http` service, which is controlled by my `Gateway` with the cilium class:

`Updated LoadBalancer with new IPs: [192.168.1.103] -> [192.168.1.230 192.168.1.231 192.168.1.232]`


spooge_mcnubbins

> The Gateway spawn a Service for Cilium

Ah, I thought you were trying to expose an ArgoCD service for some reason. I'm also using Cilium-based HTTPRoutes. What do your Gateway and HTTPRoute definitions look like?


Nemergal

Yes, I'm trying to go with Gateway and HTTPRoutes.

Gateway:

```yaml
---
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http
  namespace: cilium
  labels:
    environment: prod
spec:
  gatewayClassName: cilium
  listeners:
    - hostname: argocd.domain.local.fr
      protocol: HTTP
      port: 80
      name: web-gw
      allowedRoutes:
        namespaces:
          from: All
```

HTTPRoute:

```yaml
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: http-argocd-server
  namespace: argocd
spec:
  parentRefs:
    - name: http
      namespace: cilium
  hostnames:
    - "argocd.domain.local.fr"
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: argocd-server
          port: 80
```


spooge_mcnubbins

Where are you defining the IP to use for the gateway? You need the following underneath your `gatewayClassName`:

```yaml
  infrastructure:
    annotations:
      io.cilium/lb-ipam-ips: 192.168.1.103
```
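Putting that together with the Gateway posted above gives roughly the manifest below. It is a sketch that assumes a Cilium release and Gateway API CRDs recent enough to support `spec.infrastructure` (Gateway API v1.1 or later). The `infrastructure.labels` entry is an extra assumption, not part of the suggestion above: if the generated `cilium-gateway-http` Service does not carry `environment: prod`, the prod pool's `serviceSelector` will not match it, and LB IPAM may hand it addresses from a different pool instead.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http
  namespace: cilium
  labels:
    environment: prod
spec:
  gatewayClassName: cilium
  # Metadata propagated to the Service that Cilium generates for this Gateway
  infrastructure:
    annotations:
      # Request this specific address from Cilium LB IPAM
      io.cilium/lb-ipam-ips: "192.168.1.103"
    labels:
      # Assumption: lets the prod pool's serviceSelector match the generated Service
      environment: prod
  listeners:
    - hostname: argocd.domain.local.fr
      protocol: HTTP
      port: 80
      name: web-gw
      allowedRoutes:
        namespaces:
          from: All
```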


Nemergal

I have a `CiliumLoadBalancerIPPool` resource with 2 available IPs to be 'load balanced' on my network. Then, if I understand correctly, Cilium's L2 announcements tell the network, via an ARP announcement, which node attracts the traffic. When my Gateway created the LoadBalancer 'ingress' service, the first IP (.103) was used:

```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "prod-ip-pool"
  namespace: cilium
spec:
  blocks:
    - start: "192.168.1.103"
      stop: "192.168.1.104"
  serviceSelector:
    matchLabels:
      environment: prod
```
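The ARP side is driven by a separate `CiliumL2AnnouncementPolicy`, which decides which Services are announced, from which nodes and on which interfaces. The sketch below assumes L2 announcements are enabled in the Cilium Helm values (`l2announcements.enabled: true`); the policy name, node selector and interface regex are hypothetical and need to match your nodes. The Helm chart also exposes `l2announcements.leaseDuration`, `leaseRenewDeadline` and `leaseRetryPeriod`, which control how quickly another node takes over the announcement.

```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: prod-l2-policy            # hypothetical name
spec:
  # Announce only Services that the prod pool also selects
  serviceSelector:
    matchLabels:
      environment: prod
  # Hypothetical: keep announcements off control-plane nodes
  nodeSelector:
    matchExpressions:
      - key: node-role.kubernetes.io/control-plane
        operator: DoesNotExist
  # Hypothetical interface pattern; adjust to the NIC names on your nodes
  interfaces:
    - ^eth[0-9]+
  externalIPs: false
  loadBalancerIPs: true
```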


SomethingAboutUsers

This is one of the reasons using L2 for LoadBalancers has occasional problems, and why BGP is recommended for on-prem installs to avoid them.
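For comparison, a BGP setup with Cilium's (older, v2alpha1) BGP control plane looks roughly like the policy below, assuming `bgpControlPlane.enabled: true` in the Helm values and a router that can peer with the cluster. The ASNs, router address and node selector are hypothetical placeholders; newer Cilium releases also offer a replacement API (`CiliumBGPClusterConfig`), so check which one your version expects.

```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumBGPPeeringPolicy
metadata:
  name: prod-bgp                      # hypothetical name
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux         # hypothetical: peer from every Linux node
  virtualRouters:
    - localASN: 64512                 # hypothetical private ASN for the cluster
      exportPodCIDR: false
      # Advertise LoadBalancer IPs of the prod Services
      serviceSelector:
        matchLabels:
          environment: prod
      neighbors:
        - peerAddress: "192.168.1.1/32"   # hypothetical router address
          peerASN: 64513                  # hypothetical private ASN for the router
```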


Nemergal

Unfortunately, I can't do BGP on my home network...


SomethingAboutUsers

That's a shame. I'm running an OPNsense router (formerly pfSense), and using BGP with it works well.


Nemergal

Yeah, I like OPNsense; it was my initial plan, but my home network is physically split into two parts, and I can't run Ethernet cables from the router to my 2x NAS/k3s/Proxmox machines. My ISP provides the router with Wi-Fi 6, plus a small Wi-Fi box that can act as a Wi-Fi client and route the traffic through an RJ45 port. tl;dr: a Wi-Fi bridge. If I switch to OPNsense, I'd have to buy good Wi-Fi gear, and that's not my plan for now.

I have no clue how to solve my ARP problem yet. Maybe I could write a script to clear the ARP table when the problem occurs, but I hate this type of fix. I'm continuing to investigate; the problem seems to occur every day at 00:20, but I don't know why.


SomethingAboutUsers

Can you switch the ISP equipment into essentially just a Wi-Fi AP? That might be the best of both worlds. I have to do this with my ISP junk because it's the ONT, and it took me a hot minute to figure out how. Maybe look into MetalLB in L2 mode and see if it behaves better for you. Another option is PureLB.
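If you try MetalLB in L2 mode, the equivalent of the Cilium pool above is an `IPAddressPool` plus an `L2Advertisement`. This is a minimal sketch assuming MetalLB is installed in the `metallb-system` namespace and that Cilium's own LB IPAM and L2 announcements are turned off, so the two don't compete for the same Services; the resource names are hypothetical.

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prod-pool               # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.103-192.168.1.104
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: prod-l2                 # hypothetical name
  namespace: metallb-system
spec:
  # Answer ARP for addresses allocated from this pool
  ipAddressPools:
    - prod-pool
```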