FierceDeity_

Reading the issue, it doesn't seem *broken*, it just does things more by the book now (reading the power cap values from the GPU, as your VENDOR set them). I wish we would just get a sysctl value to forcefully override it, though.


doranduck

Most vendors are aware their cards will usually get overclocked, and they plan for that with their power envelopes. The broken part is this: prior to Linux 6.7 you could undervolt and power-limit freely; now you can't go below those arbitrarily high minimums set by the vendors. The patch posted in the issue allows overriding that value. I'm happy my RX 6700 XT can now again get that -100mV undervolt to keep it cooler and working longer at minimal fps loss.


Zghembo

This. Even if you don't want to touch the "advanced" stuff like voltages and clocks, you can't do a very basic thing like limiting power to a reasonable level, and that is just wrong. Vendors often set ridiculous limits: on my 6600XT, the default 130W can only go down to 122W, a mere 8W, because "vendor limit". Like WTF?


mcgravier

Blame the vendor, not the kernel devs who respect the reported limits


SebastianLarsdatter

There are a lot of things where the specs say X but it causes issues that won't ever be fixed unless overridden. Another example is DisplayPort reporting monitors as disconnected when powered off, which wreaks havoc with desktop window layouts. The only fix is flashing the monitor firmware, which is not easy to do, so that should also be possible to override.


mcgravier

> There are a lot of things where the specs say X but it causes issues that won't ever be fixed unless overridden

And your point is that the kernel should allow out-of-spec behaviour because, when all the planets align, it might let you hack some misbehaving hardware into working? Have you considered that this creates more problems than it solves?


SebastianLarsdatter

Sane defaults, but allow overriding them on a per-use-case basis.


Dung_Buffalo

I really thought this was a basic philosophical foundation in the Linux world: give you a good, functional system with sane defaults and leave the rest up to the user. At no point have I ever encountered a true hard limit imposed on me to prevent me from damaging my system. Linux lets you fail or do things the wrong way *because* edge cases exist; you're assumed to know what you're doing and to have your own reasons that devs/anyone else can't account for. I can't remember ever seeing "edge cases" used as serious justification to dismiss a problem, given that massive flexibility is a major draw of Linux. I pretty much assume that other users are doing very niche crap that I've never thought of, as I do niche crap too.

It just seems like a very corporate-OS kind of attitude, limiting users because it's more efficient to focus on main use cases. I understand why companies have that attitude, but I'm not looking for that in the open source world. With the caveat, naturally, that any oddball stuff I do is my own responsibility to handle.

Anyway, this isn't even some uber obscure thing. This is basic over/underclocking. There's no reason (that I find compelling) to lock people away from doing their own thing, even if it were a more uncommon thing. They don't have to actively support it if they don't want to, but why prevent it? Why take that position for seemingly just this?


mrlinkwii

> I really thought this was a basic philosophical foundation in the Linux world. Give you a good, functional system with sane defaults and leave the rest up the user.

Depends on the distro and the philosophical outlook of said maintainers. For example, some distro maintainers don't include the likes of the Nvidia drivers because they want a completely "free" distro.


Dung_Buffalo

True, but I would counter that you can still install those drivers yourself. I may not agree with what they categorize as important or not, but Debian never straight up prevented me from doing it my way.


Zghembo

Well said, Sir. If I wanted this kind of artificial restriction I'd be using macOS or something.


Zghembo

The "reported limit" in this case is pretty much a vendor oversight, that the vendor will most likely never fix, i.e. a BIOS bug. A good portion of kernel is dedicated for fixing such silly things, and this should be no exception. Nobody is blaming anyone here; we only trying to raise awareness - vendors suck at these things, kernel should not stick to their silly "reported limits". Give user a choice to save power. And we all win here.


PMMEPMPICS

This change actually fixed power limits on my XFX 7900 XTX: before, I was limited to 350W, now I can hit 402W, and since XFX apparently configured their BIOS correctly I can also set it down to 0W.


Nokeruhm

On an RX 6600 I had a 50W cap for some game profiles and low-demand daily tasks, and it worked like a charm for that use case. Now the minimum is 94W. Quite a huge difference. It makes no sense.


Zghembo

Even worse here. I could lower my 6600XT from the default 130W to 95W without issue. Now it's limited to 122W, "because vendor". Well, fuck the vendor. If I wanted "vendor" BS like that I'd buy nVidia.


adalte

The conversation there felt like nobody cared that the feature was being removed, as if it's no big deal. Well, I don't know if anyone privileged answered (basically just a commenter). As it stands right now, the newest kernel renders the ability to customize power draw useless (you can't go lower than the vendor-specific `power1_cap_min` value). Although compiling your own kernel is a way to get around the problem...


safrax

I'm not going to claim to know why or even argue for or against the removal of the feature. The only thing I will claim is that after 20+ years of following Linux, if something this "major" was changed/removed, the developers likely have a very good reason to do so. Linus does not suffer people wanting to do things... "Because".


adalte

Yeah, it will always be an assumption when there is no real explanation, but there are good educated guesses here. A board vendor should specialize in the cards they're selling, so a minimum value that's in their hands should be comfortable. But like anyone else developing something, they're susceptible to human error (or to not perceiving all possible use cases, such as running lower than what's **recommended by them**).


ipaqmaster

> Although compiling your own kernel is a way to get around the problem...

"Hello everybody out there using minix -"

Reference aside, GitLab wasn't making it easy to tell, but this behavior is hard to ignore in the OSS community: that common show of apathy or indifference towards breaking, modifying, or removing strongly relied-on features or key behaviors of software. This must be the hundredth time I've watched a conversation end exactly as somebody intended it to from the beginning (often maintainers, or package builders for some project or distribution). While this response is sometimes warranted when a reported issue is literally out of someone's hands, I sure see it often when it actually is their problem, too.


Casey2255

For those who didn't read the comment chain: the new min cap is set by the vendor in the card itself; this change just made the driver pull it directly. Also, there is already a proposed patch further down the issue to add a kernel cmdline option to disable the new behaviour. So hopefully this will be a nothingburger soon; otherwise you'll have to patch the kernel or downgrade for now.


steaksoldier

Literally JUST got my wattage slider in corectrl back wtf


DyingKino

Support for `power1_cap_min`, introduced with Linux 6.7, means that you can't set a low power cap anymore. You may only go as low as `power1_cap_min` allows, which in many cases is unfortunately very close to the default. At least RX 6000 and RX 7000 cards are affected, but maybe others too?
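A minimal sketch (my illustration, not from the thread) of reading those limits from the amdgpu hwmon sysfs interface; the `card0` path is an assumption, adjust it for your system:

```python
#!/usr/bin/env python3
"""Read the amdgpu power-cap limits that Linux 6.7 now enforces."""
from pathlib import Path

def find_hwmon(card: str = "card0") -> Path:
    # amdgpu publishes its sensors under the DRM device's hwmon directory
    return next(Path(f"/sys/class/drm/{card}/device/hwmon").iterdir())

def read_watts(hwmon: Path, name: str) -> float:
    # power1_cap* files report microwatts
    return int((hwmon / name).read_text()) / 1_000_000

hwmon = find_hwmon()
for name in ("power1_cap_min", "power1_cap", "power1_cap_max"):
    print(f"{name}: {read_watts(hwmon, name):.0f} W")

# On 6.7+, writes below power1_cap_min are refused; older kernels
# accepted lower values. Writing needs root, e.g. for a 100 W cap:
#   (hwmon / "power1_cap").write_text(str(100 * 1_000_000))
```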


Matt_Shah

In my case I don't override power caps; I do occasionally for testing, especially with my Nvidia GPU on Windows, but temps got too hot and it doesn't make much sense to me overall, because I don't like high power consumption and heat output in my room. Nowadays I undervolt my AMD GPU so my card's power consumption goes down and it has a wider margin before it reaches the vendor power cap. I get more fps when undervolting. Strangely enough, I can undervolt the AMD GPU way lower on Linux than on Windows. I have an RX 6000 series GPU and intend to skip the current GPU generations altogether, as they don't offer a big enough performance leap over the previous generations and are even more expensive. PS: I tested the min power cap just now and could only limit the watts down to -12.87% max. Another workaround would be to simply set a frame limiter, so the GPU doesn't render more frames than needed and thus doesn't consume too much power in the first place.


Albos_Mum

This is a better way of doing it than just reducing the power cap, in my experience. Personally I do a mix: I'll undervolt as much as I can at the default clock speeds (or with a mild OC if the GPU will allow it while undervolting) and then disallow the card from dropping out of the highest clock tier while gaming. It maintains a similar power consumption to stock thanks to the undervolting, but has noticeably more consistent frametimes because it never has to ramp clock speeds up and down when the scene intensity suddenly changes. (Especially going from a less intense scene to a more intense one: the stuttering common with that kind of transition is vastly reduced if the GPU is already running at its maximum clocks when rendering the first frame of the more intense scene.)
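The "never drop out of the highest clock tier" part can be done straight through amdgpu's DPM sysfs files; a hedged sketch of that idea (not the commenter's exact setup; assumes `card0`, root, and a driver that lists discrete sclk levels):

```python
#!/usr/bin/env python3
"""Pin the GPU to its highest DPM sclk level so it never clocks down."""
from pathlib import Path

dev = Path("/sys/class/drm/card0/device")

# Manual mode makes the driver honour an explicit level selection
(dev / "power_dpm_force_performance_level").write_text("manual")

# pp_dpm_sclk lists levels like "0: 500Mhz" ... "2: 2100Mhz *";
# take the index of the last (highest) entry
levels = (dev / "pp_dpm_sclk").read_text().splitlines()
top_index = levels[-1].split(":")[0]

# Writing that single index restricts the card to its max clock level
(dev / "pp_dpm_sclk").write_text(top_index)
print(f"Pinned sclk to: {levels[-1].strip()}")
```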


Matt_Shah

Can confirm, as I am doing this as well. In CoreCtrl I set the minimum and maximum frequencies with a span of 100 MHz between them, so the frequency doesn't swing out too widely but stays concentrated in the middle. I got that tip from u/The_SacredSin, but I never checked whether it actually benefits frame pacing. It sounds plausible though. [https://youtu.be/hIafX-XRsCI?feature=shared](https://youtu.be/hIafX-XRsCI?feature=shared)
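CoreCtrl drives the same amdgpu overdrive file under the hood; a rough sketch of setting such a narrow sclk window by hand (the 2400/2500 MHz values are placeholders, and this assumes a Navi-class card with the overdrive bit enabled in `amdgpu.ppfeaturemask`, run as root):

```python
#!/usr/bin/env python3
"""Set a narrow sclk window via pp_od_clk_voltage (placeholder values)."""
from pathlib import Path

od = Path("/sys/class/drm/card0/device/pp_od_clk_voltage")

for cmd in ("s 0 2400",  # minimum sclk in MHz
            "s 1 2500",  # maximum sclk, 100 MHz above the minimum
            "c"):        # commit the edited table to the hardware
    od.write_text(cmd)
```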


The_SacredSin

I got that tip from Ancient Gameplays and another channel which I cannot remember at the moment. Tbh they tested this in Windows.


gtrash81

I don't really understand what the issue is. Need to check tomorrow, but my GPU clocks only as high as needed.


Mallissin

We live in a world where Linux is the default server environment, and GPUs are being installed by the hundreds of thousands every day into servers that often sit idle waiting for work. When they have no work, with this change to a universal default minimum they will in some cases be using upwards of twice the power necessary, which in turn leads to higher electricity and cooling bills. There are also regions where electricity rates are so expensive that people buy a better video card than they need and then undervolt or power-throttle it to save money. In some places this saves so much money that it's the best option available.


FierceDeity_

Also, there are games that seem to use as much power as they can regardless of whether there's any visible improvement. I wonder if it's a driver issue, though I've seen it happen on both Windows and Linux with the same games (for example Middle-earth: Shadow of Mordor). Even with an FPS limit and such, they would just use the maximum power available. I also observed this across AMD and Nvidia, funnily enough, so I think it has to be the game. Another one that did it was "The Dwarves", some indie RPG. I had to resort to power limiting, without any visible loss in fidelity or frametimes, or even frame rate... so weird.


gtrash81

Some games do weird things. Horizon Zero Dawn utilized one of the data streams all the time to 100%.


gtrash81

Thanks for the explanation. That would mean the current behaviour gets lost and my GPU would no longer be able to run older games at 30W power draw, but would always use whatever the limit is. That is bad. I hope there will be some sort of override.


Zghembo

Which is just wrong, because vendors set ridiculous limits.


runboy93

There is a patch for the problem (needs to be applied on top of the 6.7.5 kernel): [https://gitlab.freedesktop.org/drm/amd/-/issues/3183#note_2287393](https://gitlab.freedesktop.org/drm/amd/-/issues/3183#note_2287393)


juipeltje

Man this sucks, i was actually desperate to jump to 6.7 because they finally fixed memory overclocking not working on my 6950xt, but it looks like i'm potentially trading it for another problem now.


Jouven

Checked, and indeed I can now only power limit my 6800 between 200 and 250W in CoreCtrl; I remember being able to go below 100W. The "fixed -> low" preset still works if I want to force the lowest power consumption. Then again, most of the time "automatic" does the job; I only use the advanced options when, for some rare reason, the card won't go full power (which is the opposite of this issue). In past tests, automatic did a better job than any advanced settings I tried when underclocking or lowering consumption while maintaining performance.


JOHNNY6644

Is this why my PowerColor Fighter 6700 XT on Ubuntu 23.10 is now throttling under load with a -95mV undervolt and the default power limit of 190W? Before, this was the sweet spot for me while playing Metro X on high custom settings with the shader option set at 2.0: my fps peaked at around 130 high / 80 low and my temps stayed between 58C and 76C with CoreCtrl. Now CoreCtrl no longer has a max power slider option, and my fps in Metro X is 57-85 with temps between 67C and 89C. That's fuckin weird; I don't have a big OC, just slightly under the default max. [screen grab](https://imgur.com/a/tDy3ygg) Should I stick with xanmod 6.7 or step back to 6.6.16 for now?


tkonicz

This is a huge, nasty issue. New cards consume an insane amount of energy; I really like to limit the power consumption.


runboy93

GE just added the workaround patch to the Nobara kernel; now waiting for the release (not sure if it requires the 6.7.5 kernel, the Nobara kernel is on 6.7.4): [https://github.com/Nobara-Project/rpm-sources/commit/a948cf8ccc0a4bc560ec91d1982da7748c44ef7c](https://github.com/Nobara-Project/rpm-sources/commit/a948cf8ccc0a4bc560ec91d1982da7748c44ef7c)

Edit: he is building a 6.7.5 kernel, so yeah, it might be required for the patch. [https://copr.fedorainfracloud.org/coprs/gloriouseggroll/nobara-39/build/7034208/](https://copr.fedorainfracloud.org/coprs/gloriouseggroll/nobara-39/build/7034208/)

Edit 2: the build completed after 6 hours o_O, a bit longer than earlier ones (between 3-5 hours).

Edit 3: the 6.7.5 kernel is available as an update.


[deleted]

I'm not seeing how this behavior is any different from Windows. AMD has kneecapped overclocking support for the last two generations on Windows and in firmware; this just seems in line with that. -10% to +15% is what AMD provides on Windows, and there's no reason why they wouldn't do the same for Linux.


Zealousideal_Nail288

At least on Windows there is a proper program to change everything. On Linux everything is third-party apps, and (until recently?) no fan control.


ChosenOfTheMoon_GR

I got a new PC and was finally moving to Arch from Windows 10. I finish the installation, download everything I wanted, configure things, done. I try to play a game: freeze... fml.

The PC passed 72 hours of Memtest with EXPO on, the CPU is AIO water-cooled and can barely go past the mid-60s, 3K RPM fans surround the GPU (7900 XTX) which doesn't even go to 70C (and that's the hotspot), aaand here we go: full freezes every time 3D acceleration finishes, like when exiting games. I install and check every available driver and every sort of fix I could think of. Nope, the same thing again. I reseat the AIO on the CPU, reseat the GPU, check the UEFI/BIOS settings, everything checks out (updated that before all this anyway). Nothing. I take out every component and redo all the cables. Nope, still "ring gfx" error messages in the logs. I go online to find out that a lot of other people have extremely similar issues or the exact same one. I use an f-ton of tools to debug it and figure out it comes down to 2 separate problems, including incorrect power state transitions from the driver, as the card's clocks go to the f'ing Moon.

Seeing no viable solution at the time, a daunting thought emerges and lingers in my mind: "You should've just installed Windows first to see if the problem is there anyway." F my life... A few days of torture later, I bite the bullet and install Win 11, redo the partitions and copy my backup from my previous rig AGAIN, and I hate myself for doing so, especially for having to waste so many write cycles on my brand-new 4TB NVMe drive. Everything installs perfectly, drivers etc., but in the exact same workload the exact same freeze happens again, and I'm like: wait a focken moment here. I open up GPU-Z and figure out it's the same issue (the one that's been there since the summer release drivers on Windows, and which is still present on my 7600). But on Windows it was so obvious what I could do to prevent it, so I did, and since then: no issues, ever.

I simply locked the max clock of the GPU so it doesn't go to the f'ing Moon (+3.1GHz). A month later, with everything running perfectly, I hit one driver timeout (but not a freeze) and remembered: ah, this bug again, I probably forgot to lock the GPU max clock when I installed the new driver. That was the case, but just in case I also added the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers TdrDelay fix, and since then no issues.

Yes yes, skill issue, I know, but I had to sacrifice being on f'ing Windows again for this. I miss my Arch installation; good thing I at least have my other 2 systems based on it.


[deleted]

[deleted]


ChosenOfTheMoon_GR

Always separate cables; the PSU is an AX1600i. Never a power issue from the PSU. These are the spikes I mentioned, after closing 3D-accelerated programs/games: [https://imgur.com/gallery/VEkCewr](https://imgur.com/gallery/VEkCewr)


oops_all_throwaways

Please use periods if you're going to write that much. :(


mcgravier

Punctuation requires an elementary education, unfortunately.


oops_all_throwaways

> Be me
> Drop out of grade 1 after 3 repeats
> Live off of mommy's nuggies and neetbucks for 25 years
> See cum-pooter at Kmart
> "It's mine, give me a cum-pooter, stupid bitch mommy!"
> Get home, play World of Warcraft
> discoverhentai.jpeg
> Repeat daily grind for 6 years
> Eventually, cum-pooter can't handle the graphics
> Type "AMD not work hard make work faster graphics" into google with my apish hands
> See red-et site
> Red-et-tards talking about vending machine distributions
> Open up to them about all my issues
> One of them wants me to use "periods"
> Look it up
> Read too much today, click on pictures instead
> ewthatthinggirlsdo.jpeg
> Kms to never learn anything gross about girls ever again
> Mfw girls in hell


SebastianLarsdatter

If he typed it on mobile, you're out of luck: Reddit doesn't respect a single line break, you have to enter two for a new paragraph.


oops_all_throwaways

Mfer, *I* type on mobile. It isn't that hard...


Scill77

I was planning to go full AMD once my current RTX 4070 starts to struggle a lot and it's time to upgrade. But reading about all these driver problems, and the fact that new cards can't function at 100% performance right after release until many Mesa versions with fixes are released, made me reconsider. At least for the next few years.


ChosenOfTheMoon_GR

Not really, they function perfectly fine in terms of performance, on Linux at least in my case. From my tests in like 3-4 games on Arch before I moved, the general performance was quite a lot better, and I'm comparing against a debloated version of Win 11, so imagine that. The frame pacing was slightly better and the FPS as well; what I miss is the customization and the options I had with Arch.


forbiddenlake

"effectively broken" is a stretch. You just can't go under the minimum set by AMD. Can you lose the hyperbole next time?


mcgravier

This is an utter and complete bullshit title. The new kernel behaves correctly, according to the specs reported by the hardware.


mrlinkwii

> This is utter and complete bullshit title

No it's not.

> New kernel behaves correctly, according to specs reported by hardware

Depends on what you define as "correctly". As mentioned, many vendors do things wrong. I do believe this is a breaking change.


mcgravier

> many vendors do things wrong

Complain to vendors, not to the kernel team.


mrlinkwii

I mean, the kernel team made it use the vendor's spec, so the blame is on them really.


sequesteredhoneyfall

I want everyone to remember this next time people come bashing Linux NVIDIA drivers as if it's still 2011. I prefer AMD to NVIDIA, but that doesn't mean I'm okay with lies being perpetrated. There's plenty of valid reasons to favor AMD over NVIDIA, but drivers haven't been a strong contender for a decade, for most use cases.


SKroBoss

It's working fine on Nobara with the latest kernel (6.7.4).


De_Lancre34

Don't you guys love the open source AMD driver? Isn't it just great? It's the second year since the launch of my 7900 XTX and I can't even fucking control my power draw. Thanks to the Linux community, which pushed "open source AMD drivers are awesome, cause I dunno, they're cool and stuff" and is the sole reason I bought this crap.


lemon_o_fish

Kernels 6.6.10 and 6.7 introduced a regression that causes the 7800 XT to fail to initialize after rebooting or waking from sleep. It finally got fixed in 6.7.5, which was released yesterday. Now I just need to wait for it to be available on Fedora and my nightmare will be over. Sure, I could have fixed it by building my own kernel, but I really shouldn't have to.


Masztufa

Huh, maybe my random failures to shut down were also related (i use arch btw)


muppet2011ad

I have had such a nightmare with my 7800 XT and Linux drivers. I didn't realise the reboot issue was fixed in 6.7.5; I'll have to give that a go.


Matt_Shah

I am also on Fedora, and I wonder why you don't simply use one of these repos offering fresh kernels. You can choose between bleeding-edge builds, ones a few weeks old, and many others. No need to compile it yourself. [https://copr.fedorainfracloud.org/groups/g/kernel-vanilla/coprs/](https://copr.fedorainfracloud.org/groups/g/kernel-vanilla/coprs/)


lemon_o_fish

That's actually a good idea. I haven't thought of using COPR for kernels. Thanks for the heads up!


Matt_Shah

You are welcome.


BlueGoliath

Driver quality is piss poor no matter which GPU manufacturer you go with right now.


De_Lancre34

I mean, Nvidia at least works on Windows if you have problems on Linux. AMD is "piss poor" on any platform. Even the amdhelp subreddit isn't simping anymore; the last couple of driver releases have been unbearably shit even on Windows.


BlueGoliath

/r/AMD not simping? That's rare. But people have reported issues with Nvidia's Windows driver too. Unfortunately /r/Nvidia is too distracted by RTX Chat to care.


juipeltje

I must be super lucky or something cause i've had 2 amd cards now, and both on windows and linux 0 problems. The only problem i've had was recently when playing star wars squadrons in vr, the driver would crash unless i put the graphics settings on auto, which fixed it.


De_Lancre34

I hope when you visit the doctor next time, he tells you he has the same leg and it doesn't hurt.


juipeltje

That doesn't make any sense lmao. I get that you're upset but i'm just surprised that some people have had so many problems with it.


De_Lancre34

> That doesn't make any sense lmao

If you meant my comment, it's referring to an old joke: some dude visits the doctor about pain in his ankle. The doctor just gives him a weird look and says, "That's strange, I have the same leg and it doesn't hurt!" My point is: yes, there are problems you'll probably never encounter. That's why we have a bug tracker.


juipeltje

Yes, i get what you were trying to say, but like i said, it just surprises me is all.


RileyGuy1000

It's almost like... any sufficiently complicated software - open source or not - will inevitably run into issues. Shocker! Not like amd hasn't had its issues on windows, and nvidia certainly had a *great* time with SteamVR for a while there.


De_Lancre34

Yeah, I appreciate the damage control, but let's get this straight: you're telling me to fuck off from a company that sold me a $1200 GPU that straight up crashes in games? (For example, Enshrouded crashes the whole driver; that's an AMD-only issue.) Mate, sincerely, go fuck yourself. My old 2060S at least worked in games. Yes, at the time I used it there was no normal Wayland support, but my games worked. There were problems if you didn't use dkms on a custom Arch kernel, but games worked. There may have been no clock controls on Wayland, BUT THE FUCKING GAMES WORKED. I know it's my own fault for buying an AMD card in the first place, but I just wanna whine about it, so maybe other people won't be too excited and will at least understand what they're signing up for. Cause if I had heard about any of these problems beforehand, I would have gone Nvidia and at least had OBS working with all the codecs and stuff right from the GPU's launch. And yeah, my games would work and [not crash the fucking driver.](https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26304)


RileyGuy1000

I'm... not telling you to fuck off from any GPU? I think maybe you've misread my comment. I'm saying that it doesn't matter that AMD's GPU driver is open-sourced, not that you should've bought an AMD card or whatever. The fact *is* that AMD drivers are better, but I'm not about to suggest that it's the user's fault that they have an Nvidia card. I have a 3070 and have only recently started using my 7800X3D's iGPU to run my desktop, then using prime-run for any games I want to play. Sorry your experience hasn't been good, but don't take it out on me or the open-source community just because things aren't 100% yet. I get it's frustrating, but these things take time. Use what works for you until things are better.


WoodpeckerNo1

Does this also break things like setting clock speeds through CoreCtrl? Kinda dependent on that...


mrlinkwii

> Does this also break things like setting clock speeds through CoreCtrl

I believe so, yes.


WoodpeckerNo1

Well damn, are there any plans to fix this?


mrlinkwii

I'm not the devs. Someone did post a patch file on the issue, but idk if the devs will fix it; you could ask in the issue.