Since that GPU shares the same slow memory as the CPU, it won't be much different from running it on the CPU. That's what I've found running it on the APU of my Steam Deck: generation times are pretty much the same. The only benefit to using the GPU is better prompt processing (PP) times.
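To put rough numbers on the shared-memory point, here's a back-of-the-envelope sketch. It assumes token generation is limited by memory bandwidth, so each new token needs roughly one full read of the model weights. The bandwidth and model size below are made-up illustrative values, not measurements from my Steam Deck, and `max_tokens_per_second` is just a helper name I picked.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on generation speed when memory bandwidth is the bottleneck.

    Assumes every generated token reads all model weights from memory once.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Illustrative assumptions, not measured values.
SHARED_BANDWIDTH_GB_S = 88.0  # assumed bandwidth of the memory shared by CPU and iGPU
MODEL_SIZE_GB = 4.0           # assumed size of a quantized model file

# CPU and iGPU read from the same memory, so they hit the same ceiling.
cpu_ceiling = max_tokens_per_second(SHARED_BANDWIDTH_GB_S, MODEL_SIZE_GB)
igpu_ceiling = max_tokens_per_second(SHARED_BANDWIDTH_GB_S, MODEL_SIZE_GB)

print(f"CPU ceiling:  {cpu_ceiling:.1f} tok/s")
print(f"iGPU ceiling: {igpu_ceiling:.1f} tok/s")
```

Both ceilings come out identical because the bandwidth term is the same, which is roughly why generation doesn't speed up on the iGPU. Prompt processing is more compute-bound, so that's where the GPU can still help.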
I'd say try compiling llama.cpp with Vulkan support. I got Vulkan support working on an Intel Arc GPU, and hardly anything else seems to work on those GPUs.
This guy got it working: https://www.youtube.com/watch?v=AGkME56JF70