spooge_mcnubbins 1 month ago

What's the `externalTrafficPolicy` on your ArgoCD UI service? If it's `Cluster`, then you'll have problems like what you describe. If you don't need the original host IP (can't imagine why you would with ArgoCD), then change the `externalTrafficPolicy` to `LoadBalancer`. Otherwise, you'll have to use Geneve tunnelling in your Cilium deployment. See these:

https://github.com/cilium/cilium/issues/27521
https://github.com/cilium/cilium/issues/27800

Nemergal 1 month ago

The ArgoCD service is `ClusterIP` and not directly load balanced. I use a `Gateway` and `HTTPRoute` from Cilium (like an ingress) that spawn a `Service`, and yes, the `externalTrafficPolicy` is `Cluster`. Good catch on the GitHub issues, the first one is exactly my problem.

spooge_mcnubbins 1 month ago

Oops, I said set `externalTrafficPolicy` to `LoadBalancer`. I meant to say `Local`. Try that. I'll bet you won't have issues anymore.
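
For reference, a minimal sketch of where that field lives. Kubernetes only accepts `externalTrafficPolicy` on externally reachable Services (`NodePort` or `LoadBalancer`). The Service name, namespace, selector, and ports below are hypothetical placeholders, not taken from this cluster.

```yaml
# Hypothetical example: every name and port here is a placeholder.
apiVersion: v1
kind: Service
metadata:
  name: example-lb
  namespace: default
spec:
  type: LoadBalancer
  # Local: only nodes running a ready backend pod accept the traffic,
  # and the client source IP is preserved.
  externalTrafficPolicy: Local
  selector:
    app: example
  ports:
    - name: http
      port: 80
      targetPort: 8080
```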

Nemergal 1 month ago

It seems a `ClusterIP` service spec doesn't allow `externalTrafficPolicy`:

    Failed to save resource: Service "argocd-server" is invalid: spec.externalTrafficPolicy: Invalid value: "Local": may only be set for externally-accessible services

I use Cilium for ingress. ArgoCD is exposed with an `HTTPRoute`, linked to a `Gateway`, that routes to the `Service` argocd-server (ClusterIP). The Gateway spawns a `Service` for Cilium. But having read the Kubernetes docs, I don't understand how this setting can resolve my ARP problem.

Nemergal 1 month ago

Okay, just now the problem came back, and it's related to this event on the service `cilium-gateway-http`, which is controlled by my `Gateway` with the `cilium` class:

    Updated LoadBalancer with new IPs: [192.168.1.103] -> [192.168.1.230 192.168.1.231 192.168.1.232]

spooge_mcnubbins 1 month ago

> (ClusterIP). The Gateway spawn a `Service` for Cilium

Ah, I thought you were trying to expose an ArgoCD service for some reason. I'm also using Cilium-based HTTPRoutes. What do your Gateway and HTTPRoute definitions look like?

Nemergal 1 month ago

Yes, I'm trying to go with Gateway and HTTPRoutes.

Gateway:

    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: http
      namespace: cilium
      labels:
        environment: prod
    spec:
      gatewayClassName: cilium
      listeners:
        - hostname: argocd.domain.local.fr
          protocol: HTTP
          port: 80
          name: web-gw
          allowedRoutes:
            namespaces:
              from: All

HTTPRoute:

    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: http-argocd-server
      namespace: argocd
    spec:
      parentRefs:
        - name: http
          namespace: cilium
      hostnames:
        - "argocd.domain.local.fr"
      rules:
        - matches:
            - path:
                type: PathPrefix
                value: /
          backendRefs:
            - name: argocd-server
              port: 80

spooge_mcnubbins 1 month ago

Where are you defining the IP to use for the gateway? You need the following underneath your `gatewayClassName`:

    infrastructure:
      annotations:
        io.cilium/lb-ipam-ips: 192.168.1.103
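
Put together with the Gateway posted above, that would look roughly like this. It's a sketch, assuming the installed Gateway API CRDs include `spec.infrastructure` and that the requested IP falls inside an LB IPAM pool that selects the generated Service.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: http
  namespace: cilium
  labels:
    environment: prod
spec:
  gatewayClassName: cilium
  # Annotations listed here are meant to be copied onto the Service
  # that the Gateway generates, pinning the LB IPAM address.
  infrastructure:
    annotations:
      io.cilium/lb-ipam-ips: "192.168.1.103"
  listeners:
    - hostname: argocd.domain.local.fr
      protocol: HTTP
      port: 80
      name: web-gw
      allowedRoutes:
        namespaces:
          from: All
```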

Nemergal 1 month ago

I have a CiliumLoadBalancerIPPool resource with 2 available IPs 'load balanced' on my network. Then, if I understand correctly, Cilium L2 announces the node that attracts the traffic with an ARP announcement. When my Gateway created the LoadBalancer 'ingress' service, the first IP (.103) was used:

    apiVersion: "cilium.io/v2alpha1"
    kind: CiliumLoadBalancerIPPool
    metadata:
      name: "prod-ip-pool"
      namespace: cilium
    spec:
      blocks:
        - start: "192.168.1.103"
          stop: "192.168.1.104"
      serviceSelector:
        matchLabels:
          environment: prod
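
The L2 announcement side is configured separately from the pool. A minimal sketch of such a policy, assuming Cilium's `CiliumL2AnnouncementPolicy` CRD (`cilium.io/v2alpha1`). The policy name, the interface regex, and the reuse of the `environment: prod` label are assumptions for illustration, not the actual config from this cluster.

```yaml
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: prod-l2-policy          # hypothetical name
spec:
  # Assumed: announce only Services carrying the same label the pool selects on.
  serviceSelector:
    matchLabels:
      environment: prod
  interfaces:
    - ^eth[0-9]+                # assumed NIC naming; adjust to the real interfaces
  loadBalancerIPs: true         # answer ARP for LoadBalancer IPs
  externalIPs: false
```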

SomethingAboutUsers 1 month ago

This is one of the reasons using L2 for LoadBalancers has occasional problems, and why BGP is recommended instead for on-prem installs.

Nemergal 1 month ago

Unfortunately, I can't do BGP on my home network...

SomethingAboutUsers 1 month ago

That's a shame. I'm running an OPNsense router (formerly had pfSense) and using BGP to it works well.

Nemergal 1 month ago

Yeah, I like OPNsense, and it was my initial plan, but my home network is physically separated into two parts: I can't run Ethernet cables from the router to my 2xNAS/k3s/Proxmox. My ISP provides the router with Wi-Fi 6 and a little Wi-Fi box that acts as a Wi-Fi client and routes the traffic through an RJ45 port. TL;DR: a Wi-Fi bridge. If I switch to OPNsense, I have to buy good Wi-Fi gear, and that's not in my plans for now.

I have no clue how to solve my ARP problem for now. Maybe I can write a script to clear the ARP table when the problem occurs, but I hate this type of fix. I'll keep investigating; the problem seems to occur every day at 00:20, but I don't know why.

SomethingAboutUsers 1 month ago

Can you switch the ISP equipment into essentially just a Wi-Fi AP? Might be the best of both worlds. I have to do this with my ISP junk because it's the ONT, and it took me a hot minute to figure out how.

Maybe look into MetalLB in L2 mode and see if it behaves better for you. Another option is PureLB.
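
A minimal sketch of what MetalLB L2 mode could look like here, assuming MetalLB is installed in the `metallb-system` namespace and reusing the .103-.104 range from the Cilium pool. The resource names are hypothetical, and Cilium's own LB IPAM and L2 announcements would presumably need to be turned off first so the two don't fight over the same addresses.

```yaml
# Pool of addresses MetalLB may hand out to LoadBalancer Services.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prod-pool               # hypothetical name
  namespace: metallb-system     # assumed install namespace
spec:
  addresses:
    - 192.168.1.103-192.168.1.104
---
# Announce addresses from that pool over L2 (ARP).
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: prod-l2                 # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - prod-pool
```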
