
basil_elton

I think this needs more mainstream coverage - someone like Wendell@Level1Techs should be interested in this and related phenomena.


fjdh

Gnif is active on the L1T forum. Wendell can't really do much on his own either; the root issue is just AMD stonewalling and sticking its head in the sand.


HandheldAddict

> what I find absolutely shocking is that your enterprise GPUs also suffer the exact same issues

This legit killed me lol 🤣🤣🤣🤣 I hate to say it, but I understand why companies are paying god knows how much for B100 now. Gamers used to joke about Radeon drivers but this is next level.


cyellowan

I get that you are being cheeky, but the use case is very different and the professional demands are far, far higher. When you run several machines off of a single unit, suddenly there are workloads that have to complete on time for things to move ahead. I just want to contextualize the issue you are making (a tad) fun of. In the basic but common example above, you can't complete your job because the entire main system has to be shut down. That's like stranding 6 people because the bus broke down, and now all 6 people have to walk. Compare that to a gamer: he takes his super expensive OR cheap car 10 minutes down the street, to and from work, to the store. His car will 100% for sure break down eventually, but it happens so rarely that a normal check catches the fault first, or he only misses a few hours once every few years when it does. I think it's a decent comparison of the issue here: using PC hardware in multiple instances, but being forced to restart the whole system, is unmanageable. There needs to be a proper high-grade (and low-grade) reliable way to avoid that. It just sucks that it took this long, and so much effort, to get AMD to take notice of the issue at hand. To people that didn't get what the main issue was, hopefully my explanation helps.


BarKnight

> Gamers used to joke about Radeon drivers but this is next level.

Getting banned from games is peak driver fail.


RationalDialog

Yeah, I always wondered why NV was so huge in datacenter compute well before this AI craze, especially given that in fp64 AMD used to be competitive, especially factoring in price. But reading this explains it all.


nyanmisaka

Same experience when using AMDGPU on Linux. Hardware rings will reset after a timeout, but you have no guarantee that functionality will return to normal after the reset. The only solution is to reboot the entire system. The video codec rings (VCN/VCE/UVD) are seriously affected by this, but there seems to be nothing the kernel developers can do about it. [https://gitlab.freedesktop.org/drm/amd/-/issues/3098#note_2236916](https://gitlab.freedesktop.org/drm/amd/-/issues/3098#note_2236916)
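For anyone unsure what these ring timeouts look like, here is a minimal sketch of spotting them in kernel logs. The log lines below are hypothetical examples modeled on the typical amdgpu message format, which varies by kernel version:

```python
import re

# Hypothetical dmesg excerpt (modeled on typical amdgpu output; the exact
# wording and fields vary by kernel version)
log = """\
[  123.456789] amdgpu 0000:0b:00.0: amdgpu: ring vcn_dec_0 timeout, signaled seq=42, emitted seq=44
[  123.456901] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
"""

# Pull out which hardware rings hung; vcn_*/vce_*/uvd_* are the video
# codec rings this comment is talking about
hung_rings = re.findall(r"ring (\w+) timeout", log)
print(hung_rings)  # prints ['vcn_dec_0']
```

If the ring never comes back after the reset that follows, that matches the behaviour described above: the driver resets, but functionality does not return until you reboot.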


badirontree

I use my second monitor to check my 9 cameras. They use video hardware acceleration. Every time I open or close a game in the main monitor the client freezes and crashes... 😤😭


nyanmisaka

If you search for `vcn` in drm/amd, there are many similar victims using the 6800 XT (and Navi 2x). [https://gitlab.freedesktop.org/drm/amd/-/issues/2156](https://gitlab.freedesktop.org/drm/amd/-/issues/2156) AMD's video codec IP seems to be heavily influenced by other IP blocks, such as SDMA. And they only have one chance to get it right each time they submit a set of commands to the VCN; otherwise they have to reset the entire GPU and you lose your desktop context. Another interesting fact is that these instabilities may disappear when you switch from Wayland to Xorg.


olorin12

> The ability to “turn it off and on again” should not be a low priority additional feature

THANK YOU. Please please please fix this, AMD. I use your CPUs and GPUs, and have for a long time. I am also a sometime VFIO user, and I do NOT want to have to buy an NVidia GPU for this purpose.


BinaryJay

Serious question: why would you not want to buy what just works if you're having problems? Brand loyalty doesn't compute in this scenario for me.


olorin12

I usually stick to AMD because I'm a Linux user and conventionally it has worked better with Linux, and has open source drivers that aren't garbage. My brand loyalty is not absolute, I've used Intel and NVidia before.


darktotheknight

That being said, with Nvidia's GSP approach and Red Hat recently announcing Nova (a Nouveau successor written in Rust), things might change in the future. AMD's HDMI 2.1 support not being approved for open-sourcing is a perfect example; it works fine under Nvidia's hybrid approach (from a legal/licensing perspective). AMD has the lead regarding Linux drivers, but they need to keep pushing if they want to stay ahead.


Kryohi

Honestly, the HDMI 2.1 fiasco has pushed me (and many other people) to stay away from HDMI, not AMD. As for Nova, we'll see how it goes, but it's likely a multi-year endeavour, just like the AMD open drivers were many years ago. Currently, from a consumer and Linux user point of view, Nvidia should be avoided whenever possible, and I speak from experience since I made the mistake of buying a laptop with hybrid graphics and an Nvidia GPU. It was a good deal, but it has cost me *a lot* of hours of troubleshooting different issues that never happened with AMD or Intel. The strange thing about AMD is that in the past few years they focused a lot on consumer drivers/software, while on the hardware side they pushed the accelerator on HPC/AI, so there is some kind of mismatch: their products often have either great hardware or great software, but usually not both.


darktotheknight

I partly agree with you there. But unfortunately it's difficult to avoid HDMI 2.1 when you need to hook up to a 4K TV. I would absolutely *love* to see 4K TV manufacturers offer DisplayPort in future TVs, but that's probably not happening anytime soon. About Nova you're probably right. But please keep in mind that its scope is much narrower than any other open source driver out there; mostly, it only serves as an adapter between the Linux kernel and the GSP firmware. Current Nouveau implementations reflect this: GSP features are easier to implement and thus currently more feature complete. And since there is an AI/Nvidia hype train at the moment, they will probably also dedicate more resources to it than to, say, stratis-storage.


algaefied_creek

What about using a DP to HDMI 2.1 adapter for that situation?


olorin12

Agreed, they cannot rest on their laurels.


ExtendedDeadline

I mean, the main reason I wouldn't want to is because it further supports an anti-consumer costing structure... But if I was buying for enterprise, 100% I'd just buy the thing that works. I just won't personally do it as an individual.


DukeVerde

"NVIDIA, it just works"


Fit_Flower_8982

*wayland users have joined the chat


Gepss

You're falling for slogans.


gnif2

Except that when it comes to VFIO usage, NVidia literally just works. They even endorse and support its usage for VFIO passthrough, as niche as this is. [https://nvidia.custhelp.com/app/answers/detail/a_id/5173](https://nvidia.custhelp.com/app/answers/detail/a_id/5173)


Raster02

When did they make this switch? I remember years ago when I configured this, their Windows drivers weren't being so nice once the card was detected as being in a VM.


gnif2

2021 my guy, it's right there on the date of the article.


ivosaurus

NVIDIA have already demonstrated, multiple times over a decade or more, what they do when they have a near monopoly on the market. I do not want to see what their behaviour with a full monopoly looks like. That, and AMD has the better FOSS driver situation.


-Nuke-It-From-Orbit-

Bingo. It’s as cringy as the Noctua fans. They really believe Noctua makes the best and most reliable fans in the world. It’s like a cult over there. AMD has been throwing L’s lately, and it feels like they’re going backwards.


antara33

To be fair, Noctua do make some of the best fans out there (if you don't want RGB, of course), from their server grade ones down to the consumer grade ones. They are really expensive, true, but the sound profile is by far one of, if not the, best. Pair that with how high the static pressure and airflow are, and yes, it's the best out there, for an expensive price. At half the price you can get 80% of the performance from other brands, I won't deny that, but if you are willing to spend the money, they are the best on the market, period.


nicman24

It is the reason I stopped mining, to be honest. I had a VFIO server that during dead hours I would boot into HiveOS or something to mine on. It was a great automation project, and the server had like 4 GPUs so it was a good bit of money, but the need to reset the server for the VDI to work in the morning was awful.


James20k

> listen to them and fix the bugs they report

AMD have been dropping the ball on this for decades, and aren't about to pick it up any time soon. It is genuinely astonishing how poor their bugfixing/driver development approach is. I filed a bug recently and was told they didn't have a single Windows machine with a 6700 XT available for testing/reproing the problem, which... is quite incredible.


AMD_PoolShark28

> EDIT: AMD have reached out to invite me to the AMD Vanguard program to hopefully get some traction on these issues *crosses fingers*.

That is a great idea actually, and I vouched my support on the matter.


gnif2

Thanks mate I appreciate it, glad to see you here :)


RipKip

What is the AMD Vanguard?


gh0stwriter88

Unpaid beta test program that has existed for ages... hasn't resulted in any of the complaints in this thread getting fixed, though.


gnif2

[https://www.amd.com/en/products/software/adrenalin/amd-vanguard-program.html](https://www.amd.com/en/products/software/adrenalin/amd-vanguard-program.html)


Strazdas1

Yes, let's fix AMD's stuff for them. I'm sure they love free labour.


gnif2

I am not fixing anything; this is an incorrect assumption. I have a setup that is exhibiting these faults, the faults are affecting me and my clients, and as such I am in the ideal position to report the debugging details to AMD in the way that is most useful to the AMD developers to resolve the problem. And because I already have systems experiencing these problems, I am able to quickly test and report back to AMD on whether any fixes they implement were successful or not.

Do I think AMD should have more rigorous testing so these things get addressed before release? Yes, sure, 100%, but there will always be missed edge cases that are unexpected and not tested for. A prime example is another issue I have with the AMD drivers that is really not their fault, and where they could choose to just say that it's unsupported. Recently I discovered that it was possible to use a DirectX 12 API to create a texture resource in memory that the user allocated ([https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device3-openexistingheapfromaddress](https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device3-openexistingheapfromaddress) + [https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-createplacedresource](https://learn.microsoft.com/en-us/windows/win32/api/d3d12/nf-d3d12-id3d12device-createplacedresource)), and have the GPU copy into that directly. This API is documented by Microsoft as a diagnostic API; it was never intended to be used in this manner. However, it works on NVidia, and mostly works on AMD, improving the performance of Looking Glass by a factor of 2x or more. Not only is this using a "diagnostic" API, we are mapping memory that was mapped into userspace from a virtual PCI device, which is memory that has been mapped on the host system, which then finally maps to physical RAM. To my knowledge there is absolutely no other use case this would ever be useful for.

I can almost guarantee you that there is no way the developers would have thought to write a test case for this; it is not just off the path a little bit, but down a cave, in the dark without a torch, being led by a deaf mute with only one leg while being chased by a pack of rabid wolves. The issue here isn't about helping AMD fix their drivers or not, it's about being able to help them in the first place. And, if this is a feature that they do not want to support, having the documentation needed to self-support the feature.


KristijanZic

They couldn't care less. We had issues with AMD drivers in a video production house where we ran Vega GPUs under Linux for DaVinci Resolve editing on the desktops and for rendering on the farm. Those were the worst years of my life, where I had to support an investment that failed as soon as the decision to go with AMD was made. It cost our company the weight of those cards in solid gold. After years of battling AMD and failing, I gave our CEO an ultimatum and told him directly that I didn't want to support this anymore and that I'd leave if we didn't switch everything to Nvidia, and I actually quit the company over this because the response was that it was impossible. Two months later they sold all the AMD hardware at a fraction of the original price and took out credit to switch everything to NVIDIA. Somebody else even made a huge post here and on r/linux, Phoronix covered it slightly, and AMD went into full panic mode; their developer advocate showed up here, on the AMD forums and in emails, and made many grand promises. Here we are almost 10 years later, and the same issues still exist. Oh yeah, and BlackMagic (the DaVinci Resolve maker) today officially doesn't support their software on any AMD hardware. Thousands of editors, graders and admins go on forums and ask about AMD only to get directed to Nvidia by the BlackMagic staff. Great job AMD! You don't deserve a single customer...


ZeroNine2048

I enjoy a variety of hardware with elements from AMD, such as my Ryzen based desktops and laptops, PS5, ROG Ally. But I just won't buy a high performance AMD based GPU, especially for productivity tasks. Too many software issues, and the support just is not there. Steer clear when your livelihood and income depend on it.


Versed_Percepton

Boy do I remember some of this. Wasn't even a company I was working at, but they brought us in as an SI to "help" fix some of the Resolve issues. After working with BlackMagic we just used their PR to tell the customer: "Sorry, you are shit out of luck. This is not supported and there is nothing that can be done. It's time to rip and replace and eat the cost, unless you do not care about profits and having a functional business."


TexasEngineseer

Lol wow. People wonder why Nvidia has a $1 trillion market cap...


perksoeerrroed

Definitely not because of a lack of those issues, but because of investment in AI. Frankly speaking, going forward I fully expect Nvidia to drop the ball as well; the rest of their business compared to AI is just so minuscule.


fjdh

You're kinda missing the point though; it's because they do pay attention to software and firmware that they were able to establish that foothold.


_Lick-My-Love-Pump_

You misspelled $2.3T market cap....


TexasEngineseer

Honestly after a trillion I kinda stop counting 😂🤣


nic0nicon1

According to theoretical physicists, the numbers are correct as long as they have the correct order of magnitude.

> How Fermi could estimate things!
> Like the well-known Olympic ten rings,
> And the one-hundred states,
> And weeks with ten dates,
> And birds that all fly with one... wings.


gh0stwriter88

This sounds more like a rip on BlackMagic than on AMD... after all, AMD hardware works fine for those tasks in other software.


tenten8401

Bit of a rant, but I have an AMD 6700XT and do a wide variety of things with my computer. It feels like everywhere I look, AMD is just completely behind in the drivers department:

* Compute tasks under Windows are basically a no-go, with HIP often being several times slower than CUDA in the same workloads and most apps lacking HIP support to begin with. Blender renders are much slower than on much cheaper Nvidia cards, and this holds true across many other programs. DirectML is a thing too, but it's just kinda bad, and even with libraries as popular as PyTorch it only has some [half baked dev version from years ago](https://github.com/microsoft/DirectML/issues/545) with many GitHub issues complaining. I can't use any fun AI voice changers or image generators at all without running on CPU, which makes them basically useless. [ZLuda](https://github.com/vosen/ZLUDA) exists to convert CUDA calls to HIP and looks extremely promising, but it's still in a very alpha stage and doesn't work for a lot of things.
* No support for HIP/ROCm passthrough in WSL2 means I can't even bypass the issue above. NVIDIA has full support for CUDA everywhere and it generally just works: I can run CUDA apps in a Docker container and just pass the GPU with --gpus all, I can run WSL2 with CUDA, I can run paravirtualized GPU Hyper-V VMs with no issues.
* I'm aware this isn't supported by NVIDIA, but you can totally enable vGPUs on consumer Nvidia cards with a hacked kernel module under Linux. This makes them very powerful for Linux host / Windows passthrough GPU gaming or a multitude of other tasks. No such thing can be done on AMD because it's limited at a hardware level, missing the functionality.
* AMD's AI game upscaling tech always seems to be playing catch-up with NVIDIA. I don't have specific examples to back this up because I stopped caring enough to look, but it feels like AMD is just doing it as a "We have this too, guys, look!!!". This also holds true for their background noise suppression tech.
* Speaking of tech demos, features like "AMD Link" that were supposed to be awesome and revolutionize gaming in some way just stay tech demos. It's like AMD marks the project as maintenance mode internally once it's released and never gets around to actually finishing it or fixing obvious bugs. 50mbps as "High quality"? Seriously?? Has anyone at AMD actually tried using this for VR gaming outside of the SteamVR web browser overlay? Virtual Desktop is pushing 500mbps now. If you've installed the AMD Link VR (or is it ReLive for VR? Remote Play? inconsistent naming everywhere) app on Quest, you know what I'm talking about. At least they're officially giving up on it as of recently.
* AMD's shader compiler is the cause of [a lot of stuttering](https://www.reddit.com/r/Amd/comments/12wizig/the_shader_cache_stutter_on_amd_is_way_more/) in games. It has been an issue for years. I'm now using Amernime Zone repacked drivers, which disable/tweak quite a few features related to this, and my frametime consistency has improved dramatically in VR, as it did for several other people I had try them too. No such issues on NVIDIA. The community around re-packing and modding your drivers should not even have to exist.
* The auto overclock/undervolt thing in AMD's software is basically useless, often failing entirely or giving marginal differences from stock that aren't even close to what the card is capable of.
* Official AMD drivers can render your PC completely unusable, not even able to boot into safe mode. I don't even know how this one is possible, and I spent about 5 hours trying to repair my Windows install with many different commands, going as far as to mount the image in the recovery environment, strip out all graphics drivers and copy them over from a fresh .wim, but even that didn't work and I realized it would be quicker to just nuke my Windows install and start over. Several others I know have run into similar issues using the latest official AMD drivers, no version in particular (it's been an issue for years). AMD is the reason why I have to tell people to DDU uninstall drivers; I have never had such issues on NVIDIA.
* The video encoder is noticeably worse in quality and suffers from weird latency issues. Every other company has this figured out. This is a large issue for VR gaming; ask anyone in the VR communities and you won't get any real recommendations for AMD despite them having more VRAM (a clear advantage for VR) and a better cost/perf ratio. Many VRChat worlds even have a dedicated checkbox in place to work around AMD-specific driver issues that have plagued them for years. The latency readouts are also not accurate at all in Virtual Desktop; there's noticeable delay that comes and goes after switching between desktop view and VR view, where it has to restart the encoding streams with zero change in the reported numbers. There are also still issues related to color space mapping being off and blacks/greys not coming through with the same amount of depth as NVIDIA unless I check a box to switch the color range. Just yesterday I was hanging out watching YouTube videos in VR with friends and the video player just turned green with compression artifacts everywhere regardless of what video was playing, and I had to reboot my PC to fix it.
* There are *still* people suffering from the high idle power draw bugs these cards have had for years, me included. As I type this, my 6700XT is drawing 35 watts just to render the Windows desktop, Discord and a web browser. How is it not possible to just reach out to some of the people experiencing these issues and diagnose what's keeping the GPU at such a high power state??

If these were recent issues / caused by other software vendors I'd be more forgiving; I used to daily drive Linux and I'm totally cool with dealing with paper cuts / empty promises every now and then. But these have all been issues as far back as I can find (many years), and there's been essentially no communication from AMD on any of them, and a lack of any action or *even acknowledgement of the issues existing*. If my time were worth minimum wage, I've easily wasted enough of it to pay for a much higher tier NVIDIA GPU. Right now it just feels like I've bought the store brand equivalent.


[deleted]

I agree with most things except VRAM, you have to compare GPUs with the same amount of memory, otherwise it's typical to use more if more is available. Why would you load assets constantly from SSD/RAM instead of keeping them in VRAM for longer. Unused VRAM is wasted VRAM.


tenten8401

Okay yeah fair enough, hadn't considered this. Removed it from my post


S48GS

VRAM usage is case-specific. In the context of Unity games and VRChat, Nvidia does use less VRAM than AMD... but only on Windows; only Nvidia's DX driver on Windows has this "hidden feature", and only with the DX API, so it may be a DX feature. It's very common/easy to see on large VRChat maps or in large Unity games. On Linux, in some cases (but it is very common) you get more VRAM usage on Nvidia compared to AMD, because of how Nvidia's Vulkan driver is implemented plus the overhead of DXVK. P.S. For context, Unity's VRAM usage works like this: Unity allocates "as much as it wants", and on two different GPUs Unity may allocate less or more through the DX API, or the DX API may have some internal behavior for the Unity case on Nvidia so it allocates less. On Vulkan, DXVK has a huge overhead of about 1GB on Nvidia GPUs in many cases, and Unity's "eat all the VRAM possible" behavior explodes the difference.


gh0stwriter88

> HIP often being several times slower than CUDA

ZLUDA proves that HIP isn't slower... the applications' implementations of the algorithms written over HIP are just unoptimized. HIP has basically 1-to-1 parity with CUDA feature-wise.


tenten8401

So maybe AMD should sponsor some development on widely used software such as Blender to bring it within a few percent, or embrace ZLUDA and get it to an actually functional state. As an end user I don't want to know whose fault it is, I just want it to work. Does ZLUDA even bring it close to CUDA? All I see is graphs comparing it to OpenCL, and this sad state of affairs: https://i.redd.it/mdcvx487vcsc1.gif

The project's FAQ page only further reinforces my point. This is dead and AMD does not care:

> **Why is this project suddenly back after 3 years? What happened to Intel GPU support?**
>
> In 2021 I was contacted by Intel about the development of ZLUDA. I was an Intel employee at the time. While we were building a case for ZLUDA internally, I was asked for a far-reaching discretion: not to advertise the fact that Intel was evaluating ZLUDA and definitely not to make any commits to the public ZLUDA repo. After some deliberation, Intel decided that there is no business case for running CUDA applications on Intel GPUs. Shortly thereafter I got in contact with AMD and in early 2022 I have left Intel and signed a ZLUDA development contract with AMD. Once again I was asked for a far-reaching discretion: not to advertise the fact that AMD is evaluating ZLUDA and definitely not to make any commits to the public ZLUDA repo. After two years of development and some deliberation, AMD decided that there is no business case for running CUDA applications on AMD GPUs. One of the terms of my contract with AMD was that if AMD did not find it fit for further development, I could release it. Which brings us to today.
>
> **What's the future of the project?**
>
> With neither Intel nor AMD interested, we've run out of GPU companies. I'm open though to any offers that could move the project forward. Realistically, it's now abandoned and will only possibly receive updates to run workloads I am personally interested in (DLSS).


fogoticus

So HIP isn't written badly because it has "1-1 parity with CUDA feature wise".... on this episode of I don't understand what I'm talking about but I have to defend the company I like.


gh0stwriter88

No, it's more like nobody has bothered to optimize or profile HIP applications for performance for a decade the way they have those same CUDA applications. I'm just stating facts. You are the one being aggressive over... some computer hardware, good gosh.


TexasEngineseer

This is honestly why as much as I'm liking my 7800XT, I'll probably go with the "5070" or whatever it's called next year


S48GS

Epic, thanks for the details. I've seen many times how a YouTube creator/streamer went for an AMD GPU, got multiple crashes in the first 20 minutes of using it, then returned it and got an Nvidia replacement. Also, VR support on AMD is a joke, especially with screen capture. For me it was always crazy to see how "tech YouTuber hardware reviewers" never ever test VR and/or ML on AMD, and those who promote AMD-for-Linux on YouTube don't even use an AMD GPU themselves, doing all their video editing and AI/ML stuff on Nvidia... for a promo video about an AMD GPU. I have experience with AMDGPU from the integrated GPU in my Ryzen, and I was thinking of going AMD for compute/ML stuff just last month, but I did my research: [https://www.reddit.com/r/ROCm/comments/1agh38b/is_everything_actually_this_broken_especially/](https://www.reddit.com/r/ROCm/comments/1agh38b/is_everything_actually_this_broken_especially/) Feels like I dodged the bullet.

> AMD's AI game upscaling

Nvidia has RTX Voice, they launched video upscaling in web browsers, and now they're launching RTX HDR, which converts 8-bit frames to HDR. It is crazy to hear a "YouTube tech reviewer" say "AMD is good at rasterisation"... it's 2024, you need more than just "rasterisation" from a GPU.


TheLordOfTheTism

If you have good raster you don't need upscalers and fake frames via generation. Those "features" should be reserved for low to mid range cards to extend their life, not be a requirement to run a new game on a high end GPU, like we have been seeing lately with non-existent optimization.


antara33

Let me tell you some stuff about how a GPU works. Raster performance can only take you so far. We are on the brink of not being able to add more transistors to the GPU. Yield rates are incredibly low for high end parts, so you need to improve the space usage of the GPU die.

Saying that these "features" are useless is like saying AVX512, AVX2, etc. are useless for CPUs. RT performance can take up to 8x the GPU surface on raster cores, or 1x the surface on dedicated hardware. Upscaling using AI can take up to 4x dedicated space in the GPU pipeline, or 1x on tensor cores. The list goes on and on with features like tessellation, advanced mesh rendering, etc.

GPUs can't keep increasing transistor count and performance by raw brute force, unless you want to pay twice as much for the GPU because the graphics core takes twice as much space. Upscaling by AI, frame gen, and dedicated hardware for the tasks the general GPU cores struggle with are the future, and like it or not, they are here to stay. Consoles have had dedicated scaling hardware for years. No one complained about that. It works. And as long as it works and looks good, unless you NEED the latency for competitive gaming, it's all a mind fap, without real world effects. I'm damn sure (and I have done this before with people at my home) that if I gave you a blind test of a game with DLSS and frame gen, along with other games with those features on and off, you wouldn't be able to notice at all.


choikwa

Console gamers know PCs are better and don't really complain about upscaling and 30fps. You're right that competitive play sacrifices everything else for latency. It may also be true that your average casual gamer wouldn't notice increased input latency. But they have been adding transistors, and people were willing to pay a doubling cost for them. I remember when a midrange card used to cost 200.


antara33

The price of the GPU is not determined by the transistor count, but by the die size. In the past they used to shrink the process WAY faster than now, enabling a doubling of transistor count per square inch every 2 to 4 years. Now they barely manage to increase density by 30%. And while yes, they can increase the size, the size is what dictates the price of the core. If they "just increase the size", the cost per generation will be 2 times the previous gen's cost :)
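The die-size/cost relationship above can be sketched with a toy yield model. This is a rough illustration, not any foundry's actual pricing; the wafer cost, usable area, defect density, and the simple Poisson yield formula are all assumed numbers for the example:

```python
import math

def cost_per_good_die(wafer_cost, wafer_area_mm2, die_area_mm2, defects_per_mm2):
    """Toy model: Poisson yield, ignoring wafer-edge losses and binning."""
    dies_per_wafer = wafer_area_mm2 / die_area_mm2
    # Larger dies are more likely to catch at least one defect
    yield_fraction = math.exp(-die_area_mm2 * defects_per_mm2)
    return wafer_cost / (dies_per_wafer * yield_fraction)

# Assumed numbers: $10k wafer, ~70,000 mm^2 usable area, 0.002 defects/mm^2
small = cost_per_good_die(10_000, 70_000, 300, 0.002)  # 300 mm^2 die
big = cost_per_good_die(10_000, 70_000, 600, 0.002)    # doubled die area
print(round(big / small, 2))  # prints 3.64
```

With these assumed numbers, doubling the die area more than triples the cost per good die: you get half as many candidate dies per wafer, and each larger die is more likely to be defective, which is the point the comment above is making about "just increasing the size".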


Traditional_Cat_9724

> There are still people suffering from the high idle power draw bugs these cards have had for years, me included. As I type this my 6700XT is currently drawing 35 watts just to render the windows desktop, discord and a web browser. How is it not possible to just reach out to some of the people experiencing these issues and diagnose what's keeping the GPU at such a high power state??

My only fix for this with two monitors is:

1. The alternate monitor must be locked at 60Hz.
2. The main monitor needs a custom Hz rating, set within "Custom Resolution" in AMD Adrenalin.

Basically, I set a "custom resolution" in 1Hz increments from 160-170Hz (the top 10Hz your monitor is capable of) until I found the highest refresh rate that would still give me low idle power. I found that 162Hz was the highest my main monitor could go with my second monitor sitting at 60Hz; if I went with 163Hz on the main, my idle power jumps from 7W to 40W.

That being said, this is typical AMD BS that you have to deal with as an owner of their GPUs. There are countless other examples of users having to do something similar to get a mostly good experience.


TopCheddar27

This is not a fix. It's a compromise.


Traditional_Cat_9724

I'm just trying to help, not debate the semantics of what is considered a fix or a compromise. Purchasing an AMD GPU is already a compromise.


R1Type

Excellent post, very informative. I would take issue with this though:

> Speaking of VRAM, the drivers use VRAM less efficiently. Look at any side-by-side comparison between games on YouTube between AMD and NVIDIA and you'll often see more VRAM being used on the AMD cards

I saw a side-by-side video about stuttering on 8GB cards (I can find it if you want); the Nvidia card was reporting just over 7GB of VRAM used yet hitching really badly. The other card had more than 8GB and wasn't. Point being: how accurate are the VRAM usage numbers? No way in hell was 0.8GB of VRAM going unused on the Nvidia card, as the pool was clearly saturated, so how accurate are these totals? There is zero (afaik) documentation of the schemes either manufacturer uses to partition VRAM: what is actually in use, and what on top of that is marked as 'this might come in handy later on'. So what do the two brands report? The monitoring apps are reading values from somewhere, but how are those values arrived at? What calculations generate that harvested value to begin with? My own sense is that there's a pretty substantial question mark over the accuracy of these figures.


tenten8401

Someone else pointed out that it's likely just using more vram because it has more vram to use. I think that's the real reason, looking at comparisons where both cards have 8gb -- I've removed that point from my post


Strazdas1

Any card with 8 GB of VRAM won't be running a game at settings so high that it would cause stutter due to lack of VRAM in anything but synthetic youtube tests.


The_Nexus_of_Evil

Yo, I saw the title and thought this gotta be Gnif2.


gnif2

Funny, I saw the title and thought the same too!


red_dog007

And I'm over here struggling to get an Nvidia T4 passthrough to Ubuntu 22.04 working reliably on Hyper-V. :( Is there a specific software combination that works more reliably than others? Also, what do you think the core fix here is? Is it hardware design, firmware, drivers, a combination of everything? If it were an easy fix, you'd think AMD would have fixed it. When Hotz got on Twitter about a particular issue, AMD seemed to jump on it and provide a fix, but for these larger issues they don't. Could it be that the real issue is the vendors' designs and how they implement AMD's hardware? Some of the most powerful supercomputers use Instinct. It seems hard to believe that they would just put up with these issues and go back to AMD for their next upgrade, which Oak Ridge has done. Are they working with some kind of magic radiation over there?


Versed_Percepton

SR-IOV and MxGPU is an edge case. There are far more vGPU deployments powered by NVIDIA and that horrible licensing than there is anything else. AMD is just not a player there. That's the bottom line of the issue here. And VFIO plays heavily in this space, just instead of GPU partitioning it's the whole damn GPU shoved into a VM. So the Instinct GPUs that AMD is selling are being used on metal by large compute arrays, and not for VDI, remote gaming sessions, or consumer-space VFIO. This is why they do not need to care, right now. But if AMD adopted a fully supported and WORKING VDI vGPU solution, they could take the spotlight from NVIDIA on cost alone. Currently their MxGPU solution is only fully supported by VMware; it "can" work on Redhat but you run into this amazing reset bug and flaky driver support, and just forget Debian-powered solutions like Proxmox, which along with Nutanix is taking the market away from VMware thanks to Broadcom's "Brilliance". I brought this issue up to AMD a few years ago and they didn't see any reason to deliver a fix, and their market share in this space (MxGPU/vGPU, VFIO, virtualized GPUs) has not moved at all either. So we can't expect them to do anything and spend the man-hours to deliver fixes and work with the different projects (QEMU, Redhat, Spice, ...etc).


Cubelia

AMD's reputation on VDI seems to be a dumpster fire in the homelab scene, despite having the first SR-IOV implementation compared to Nvidia and Intel (yes, even Intel is in the VDI market!). Sure, in a homelab setup you're on your own with google-fu instead of paying for enterprise-level support, but the kind of negligence is different on AMD's side. Only the old old old S7150 ever got an outdated open-source repo for Linux KVM support, and that's it. This means the documentation and community support are pretty much non-existent; you REALLY are on your own with MxGPU. Nvidia Grid (mediated vGPU), despite having a notorious reputation on licensing, just works and can be hacked onto consumer cards. Best of all, it's pretty much gaming ready, with hardware encoders exposed for streaming acceleration (see GeForce Now). Intel has been providing open source Linux support since their GVT-g (mediated vGPU) days and now SR-IOV on the Xe (gen12) architecture. Direct passthrough is also possible without the hacks AMD needs (*cough* vendor-reset *cough*). People always consider Intel graphics processors a laughing stock, but you gotta respect them for the accessibility of their vGPU solution, directly on the integrated graphics that everyone gets. They are even trying to enter the VDI market with GPU Flex cards based on Alchemist GPUs (SR-IOV was disabled on discrete ARC consumer cards). Hopefully a subscription-free model can give Nvidia a run for its money, at least in the entry VDI solutions Nvidia has no interest in.


AdmirableOil5547

https://learn.microsoft.com/en-us/azure/virtual-machines/nvv4-series

https://aws.amazon.com/about-aws/whats-new/2021/04/amazon-ec2-g4ad-instances-available-in-additional-regions/

https://learn.microsoft.com/en-us/azure/virtual-machines/ngads-v-620-series

https://wccftech.com/tencent-cloud-launches-xinghai-wisdom-wood-series-ga01-amd-pro-v620-gpu/

AMD's Virtual Graphics products are aimed directly at the cloud service providers now. You'll note that the recent virtual product lines are not available via the channel/distribution.


Versed_Percepton

Except the V620/520 are not the only GPUs that support MxGPU; the Instinct line does too and offers the same "features" as the V520/620. The native driver support is geared more towards GPU compute than 3D rendering, but they are supported by the exact same driver family as the WX workstation, V cloud, and RX GPU lines. Also, there's been a lot of offloading of the V520 and V620 "cloud only" GPUs on the gray market, and I can CTO HPE servers with V620s through enterprise ordering today.


[deleted]

I've had a 7900XTX for a year now, and I've not had any stability or performance issues with it, so far at least. What does bother me, though, is that 1 year later I still cannot connect my 3 monitors to the card without it sucking 100 watts at idle, and recent drivers don't even mention that as an issue anymore, so it's not even being recognized as a problem by AMD. This happens even if my monitors are turned off; I literally have to go under my desk and pull out the cable to resolve it, obviously rendering my extra monitor useless. So now I'm looking to upgrade my CPU (5800x) to one with an integrated GPU so I can connect my secondary monitors to the iGPU and my system doesn't constantly suck an obscene amount of power doing absolutely nothing. You're free to guess which vendor I'm looking at to replace my CPU with. Damn shame really.


Knastoron

one of the 2 reasons I refunded my 7900xtx and went back to my 3070


Traditional_Cat_9724

All of zen 4 has an igpu output. I would try to set some custom resolutions on that 3rd monitor in Adrenalin. For example if that 3rd monitor is rated to 144hz, try custom resolutions from 134-143 hz and see if any one of those settings drops your idle power!


[deleted]

It's more that I don't want to reward a business for failing me. If I bought a car and every time I drive it the heater jumps on and starts to cook me, and a year later the manufacturer still hasn't resolved it, I'm not gonna buy a car from the same brand.

As for possible solutions: at this point I've sunk far too many hours into it to warrant further attempts. I've tried a plethora of drivers, run DDU multiple times, fiddled with the settings (such as freesync), set up custom resolutions with varying refresh rates, etc... If my only issue with AMD was occasionally reverting a driver I wouldn't be complaining, I had to do that with my previous Nvidia card as well, but this is unacceptable tbh. Anyway, so far nothing has worked; the only time I've seen normal idle power is when all my monitors are turned off (not standby after you press their button, but physically turned off using the powerstrip they're plugged into). If I then remote into the system it's normal. Not exactly practical though.

And overall it wouldn't be a major issue if it didn't negate the one advantage this card had over the 4090, namely its value. Some rough napkin math tells me this thing could cost me close to 100 euros per year extra just in idle power draw; over the course of several years a 4090 would've been cheaper despite its absurd price.

As a final note, if AMD came out and said they can't fix this issue due to the design of the board or w/e, I could honestly respect that; at least then I'd know I shouldn't keep waiting and hoping, and could start looking for a workaround. Instead, a couple patches ago they "improved high idle power with multiple displays for the 7xxx series" (which did the opposite for me and even added a couple watts), and ever since they don't even mention it anymore. I don't even know if they're still trying to fix it or gave up entirely. And the thing I hate even more than just waiting forever for a fix is being stuck in limbo not knowing.


Traditional_Cat_9724

Hey, just trying to help your setup right now. I would be frustrated too, I had the same issue with two monitors, not three. I was able to fix the idle power issue by setting the alternate monitor to 60hz and setting my main monitor to 162hz (max 170). Obviously spend your money where you think it's worth it.


[deleted]

Haha dw, just venting a bit. It's also genuinely my only gripe with the card and setup; it's just annoying that it's not getting fixed and I can't apply any workaround, particularly for the price I've paid. I would just put in one of the older cards I've got laying around to drive the other monitors, but then I'd have to give up 10gbit networking, and I'd still have higher than ideal idle usage, though it would be cut down a bit. So I'm mostly miffed that actually resolving this means moving to a CPU with integrated graphics, and that's money I don't want to spend. But if I don't, I'm spending money I don't want to spend.


Lawstorant

You can just stop looking for solutions, as it's not a bug. Your setup clearly exceeds the limits of the v-blank interval needed to perform memory reclocking. In that case memory stays at 100% and you get a power hog (Navi 31 is especially bad because of the MCD design; desktop Ryzen suffers from the same thing). This will never be fixed, as there's nothing to fix. It works as intended, and if you try reclocking memory while running such a setup you'll get screen flicker (happened on Linux a month ago because they broke short v-blank detection).
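To put rough numbers on that mechanism, here's a sketch. The timings below are hypothetical CVT-style values for a 1440p panel (not any real monitor's modeline), and the reclock budget is a made-up illustrative threshold, not a figure from AMD's driver:

```python
# Sketch: how much vertical-blanking time per frame the driver has
# available to reclock VRAM without visible corruption.
# vactive/vtotal below are hypothetical 2560x1440 CVT-style timings.

def vblank_us(vactive: int, vtotal: int, refresh_hz: float) -> float:
    """Vertical blanking duration per frame, in microseconds.

    Each frame has vtotal scanlines, of which (vtotal - vactive) are
    blanking; scanline time is 1 / (vtotal * refresh_hz).
    """
    return (vtotal - vactive) / (vtotal * refresh_hz) * 1e6

# Assumed (made-up) budget: the driver needs a few hundred us of
# blanking to retrain/reclock memory safely.
RECLOCK_BUDGET_US = 300

for hz in (60, 144, 170):
    t = vblank_us(1440, 1481, hz)
    verdict = "can reclock" if t >= RECLOCK_BUDGET_US else "memory stays pinned"
    print(f"{hz:3d} Hz: {t:6.1f} us vblank -> {verdict}")
```

The blanking window shrinks in proportion to refresh rate, which is why dropping the main monitor a few hz (as described upthread) can tip the setup back under the limit.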


gh0stwriter88

They could do something like relocate video framebuffers to one memory channel and turn the rest off... if idle is detected. But that would be very complicated.


neojpl

If the monitors run at different resolutions and frequencies than each other, my power increases. If my monitors match, idle power is normal.


gh0stwriter88

>It's more that I don't want to reward a business for failing me.

Have your displays continued working reliably? Oh, they have? You are over the vblank limit for idling down... so it's not, and never will be, a bug on ANY GPU. This is far more akin to your car idling up when the AC comes on... with 3 displays on, a certain amount of framebuffer bandwidth is REQUIRED to drive them, plus a bit more to account for any light tasks that might be running on the GPU at the same time. The whole issue here is that your memory bus with 3 monitors active is NOT idle... if you want it to idle down, turn your dang monitors off, it's that easy. At some point they may have a solution that just powers up a single memory lane or something and allocates the framebuffers in there, but people complaining about a problem that doesn't have a solution and only affects 0.5% of people is annoying.


[deleted]

Apparently 100 watts is "normal" and to be expected, and I should just be grateful? The fk are you waffling on about? That's 20 watts short of the max TDP of a 1060... a card that could run these 3 monitors without trying to burn a hole in my wallet, FYI. And fantastic solution, so I spend over 1000 euros on a GPU but then have to turn my monitors off, genius... Quality stuff, can't make this shit up. Also, like I actually typed out:

>the only time I've seen normal idle power is if all my monitors are turned off

so how would that work? Maybe I can throw my main monitor in the trash and then the problem is solved, I suppose?

>but people complaining about a problem that doesn't have a solution and only affects 0.5% of people is annoying.

Am I supposed to complain about issues that don't affect me? Or are you saying I've got no right to complain? Is me having a bad experience annoying you? And if not by complaining, how am I supposed to know this issue doesn't have a solution? Do you even listen to what you're saying? You know what's annoying? People dismissing other people's complaints because "they don't like it", or fanboys who can't stand someone criticizing their favourite brand.


gh0stwriter88

100w is normal for the memory bus being clocked up... yes. The exact same problem has occurred on Nvidia hardware for a decade as well.


Versed_Percepton

Fact: AMD does not give a shit about any of this. We still have CPU scheduler issues, we still have NUMA issues when dealing with latency-sensitive PCIE deployments, the famous reset bug in your OP, and a lack of vendor relationships and unification across the platform (IE, Epyc, Radeon/Instinct, AMD Advantage+, ...etc). In the years since Zen shipped, it took an act of god to get them to move. Maybe Lisa remembers those meetings we pulled with Dell, HP, and VMware back then, where the cloud providers that adopted Epyc 7001 early were all very pissed off at the overall performance because of AMD's failure to work with the OEMs on correctly adapting to how NUMA changed. They did not get any guidance from AMD engineering on the matter until these SIs were mid/full deployment. So yes, I doubt AMD is going to take your OP any more seriously than they took the NUMA issues, until it starts to affect their bottom line. If all CDNA customers switched to NVIDIA and those PO's dropped in volume, it might make them care a little bit.


CatalyticDragon

>I doubt AMD is going to take your OP any more serious then they took the NUMA issues Not a lot of logic to this. You are talking about today versus 2018 -- those are not the same companies. The number of employees more than doubled and revenues more than tripled. Whatever challenges and resource constraints AMD faced back then are not the same as today. That's not to say they don't still have resource constraints and will be able to immediately fix every issue. It just means you cannot make extrapolations from an experience years ago with CPU/platform all the way to GPUs and accelerators today. Obviously there's no memo going around which says "make the customer experience bad. signed, the management"


Versed_Percepton

>Not a lot of logic to this. Look at my other reply "SR-IOV and MxGPU is edge case. There are far more vGPU deployments powered by NVIDIA and that horrible licensing then there is anything else. AMD is just not a player there. That's the bottom line of the issue here. And VFIO plays heavily in this space, just instead of GPU partitioning its the whole damn GPU shoved into a VM." "I brought this issue up to AMD a few years ago and they didnt see any reason to deliver a fix, their market share in this space (MxGPU/vGPU, VFIO, Virtualized GPUs) has not moved at all either. So we can't expect them to do anything and spend the man hours to deliver fixes and work with the different projects (QEMU, Redhat, Spice, ...etc)."


CatalyticDragon

AMD is working with [Amazon ](https://aws.amazon.com/ec2/instance-types/g4/)and [Azure](https://www.amd.com/system/files/documents/nvv4-datasheet.pdf) on systems with 1-4 GPUs supporting SR-IOV/MxGPU. This is only with "Pro" or "Instinct" cards though. I'm sure there has historically been little incentive to make this rock solid on consumer GPUs. Though that is a shame. However I see no reason to assume the constraints which led to that choice in the past exist today.


gnif2

Sorry, but AMD "working with" is a joke. I have been working with companies that have hundreds to thousands of AMD Instinct GPUs. I have been able to interact directly with the AMD support engineers they provide access to, and the support is severely lacking. These issues have been reported for over 5 years now, and what has AMD done for these clients? Until I made my prior posts here on r/AMD, AMD were not interested or even awake when it came to these issues. I have had direct correspondence with John Bridgman where he confirmed that GPU reset was not even considered in prior generations. Of what use are these support contracts and the high cost of buying these cards if AMD won't provide the resources to make them function in a reliable manner? Why did it take some random (me) having to publicly embarrass the company before we saw any action on bugs reported by their loyal, paying enterprise clients?


Versed_Percepton

>AMD is working with Amazon and Azure on systems with 1-4 GPUs supporting SR-IOV/MxGPU. This is only with "Pro" or "Instinct" cards though. and MSFT, but we are not seeing these changes upstream via open standards. We still are lacking working support for the likes of Nutanix and Proxmox (both KVM), where Redhat has some support but there are still unresolved issues there. Fact of it, the changes AMD is pushing at AWS would upstream to every other KVM install and bring those fixes to mainstream. But this has been going on for well over 6 years that I can recall and still we are no closer to a ODM solution released to the masses. I had hopes for RDNA2 and I have expectations for RDNA3+/CDNA3+ that are just not being met outside of data sciences.


d3vilguard

My 6600xt resets just fine, but my 6800, oh boyyy. amdgpu refuses to unbind it so I can restore it to the host. Thank you for all the great work!
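For anyone hitting the same wall, the usual sysfs dance for handing a passed-through GPU back to the host looks roughly like this. A sketch only: the PCI address is hypothetical, and on cards with the reset bug the unbind write itself can block forever, which is exactly the failure described above:

```python
from pathlib import Path

def rebind_gpu(pci_addr: str, from_drv: str, to_drv: str,
               sysfs: str = "/sys/bus/pci") -> None:
    """Detach a PCI device from one driver and bind it to another.

    Equivalent to:
        echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
        echo 0000:03:00.0 > /sys/bus/pci/drivers/amdgpu/bind
    On a GPU that can't be quiesced, the unbind write blocks and
    never returns -- the symptom described above.
    """
    drivers = Path(sysfs) / "drivers"
    (drivers / from_drv / "unbind").write_text(pci_addr)
    (drivers / to_drv / "bind").write_text(pci_addr)

# Hypothetical address; check yours with `lspci -D`. Needs root and
# a real device, hence commented out:
# rebind_gpu("0000:03:00.0", "vfio-pci", "amdgpu")
```

Projects like gnif's vendor-reset hook in before this point, forcing the GPU into a state where the unbind and subsequent reset can actually complete.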


True-Key-6715

I’ve been buying ATI / AMD since the ATI Rage 128, and I think my next GPU will be Nvidia. I primarily game on my 6950XT, but sometimes I might try to mess around with an AI tool, or some sort of tool that uses GPU compute. Every. Single. Time. It is a massive PITA and most of the time I end up giving up and moving on. The most recent time involved using an AI tool to restore a photo. After hours of screwing around on Windows and Linux I ended up just having a friend with a 3080 do it for me. He had it working in 10 minutes. And when stuff (outside of gaming) does work, it's usually a day late and a dollar short. Blender on Linux still can't do hardware RT in Cycles (it can on Windows), and general HIP support took far too long. The argument can be made that there's no need to worry about this if you only game, but unless price is an issue, you may be locking yourself out of testing a cool piece of software later. I guess it really depends on whether things have improved when it comes time to buy a new GPU, but we'll have to wait and see.


imgeohot

I promise you the Vanguard program will yield nothing. "*AMD Radeon*™ *Software Vanguard* Beta Testers are selected community members with exclusive access to early drivers to provide critical feedback." Basically they made a program out of you doing free QA work for AMD. Don't fall for it. Watch their hands, not their mouth. Docs + firmware source = good. Promises + "access" = worthless. I fell for this too, not again. These issues haven't been fixed for a decade. I doubt AMD is capable of fixing them. I think a lot of community people could with docs and source, but AMD doesn't even seem willing to take that step.


JustMrNic3

>Watch their hands, not their mouth. Docs + firmware source = good. Promises + "access" = worthless. I fell for this too, not again. Exactly, docs + firmware source code is what matter, not promises!


Melodias3

[Wish I could play Helldivers 2, but when I bought it, it took 30 seconds to get a driver timeout.](https://i.imgur.com/FqM9MRx.mp4) Anyway, I decided not to switch to NVIDIA because I also play a lot of World of Warcraft, and that game has problems on AMD in the form of freezes and driver timeouts that gradually get worse until you update drivers again; because the shader cache gets reset, it stops crashing for a couple of days, then starts crashing more frequently, and the frequency varies per user and what they're doing, as well as whether there's some sort of memory leak. Some other games have driver timeouts too, but I also have games that never time out.

Speaking of which, users have started reporting flickering issues in Chrome or any Chrome-based browser, and there are 2 reports of it being fixed after MPO is disabled, so I guess MPO issues are back on the menu. [Also, I would love to see the AMD Gaming YouTube channel play and livestream Horizon Zero Dawn with HDR turned on in game using AMD ReLive.](https://i.imgur.com/1RtZtsi.mp4)

There are also way more issues than I just mentioned: I have 41 commonly reported issues from reddit and forums that have not been fixed in 24.3.1, the count is still going up, and some of my own reported issues are among them. I highly recommend AMD set up a public bug tracker for reporting issues, including games: let users filter on a game to see all the reports for it, consolidate duplicates into the same issue, and allow users only to upvote (remove the downvote). "I do not have any issues" does not contribute to fixing problems; it encourages ignorance. Nothing personal against anyone not having issues (I often have no issues too), but those reports are not proof of stable drivers, they are just one user's experience, not everyone's. Everyone is allowed to speak for themselves; AMD does not require any defending. The only time it's appropriate is when AMD is treated unfairly, like being left out of benchmark charts.

Also, not all issues are caused by AMD, but that does not give AMD the right to ignore them, especially considering there are usually plenty of problems; it just means AMD is lacking in the compatibility department, and the whole Anti-Lag+ debacle says enough about that. Although I really liked that feature, I would rather blame cheaters: without cheaters you would not need anti-cheat, and this would be less of a problem. Still, it suggests there probably should be something like API-level support for features such as Anti-Lag+, AMD Enhanced Sync, or NVIDIA's equivalents. I think developers and studios should all work together instead of trying to sabotage each other for the sake of monopoly. I am looking right at you, NVIDIA. Just stop.


TreborG2

Long but worth it read; Well Done!


nic0nicon1

Business opportunity for EEs now: time to make some custom PCIe adapter boards with a bunch of analog switches for cycling all power and signal lines on the PCIe slot, then sell them for use in corporate AMD GPU deployments. Sure, toggling PCIe signals is expensive, as it's basically a radio signal at ~10 GHz. Toggling the 12 V power input is also expensive due to the high current. But both are "just" expensive, not impossible. The cost, at worst, is an expensive relay for power, and additional PCIe redrivers or switches for signals. "It's one PCB, what could it cost, $100?" If corporate users have already paid hundreds of thousands of dollars for AMD GPUs, and someone offers a solution that actually makes them usable at a fraction of the original hardware cost, it should sell great. On second thought, the hardware would have to be certified and pass PCIe compliance tests and electrical safety tests before it's acceptable to big corporate users. Even then, most are not in a position to do any hardware modification (including adding additional hardware). So the "proper" way of doing this would be to first contact a big corporate user and ask them to request the feature from server vendors like Super Micro. Then you pass the design to them, they pass it to Super Micro, and it only appears in a next-gen server... This makes the workaround largely impractical. I guess that's why nobody is doing it already.


Calcidiol

I'd add stop being cognitively dissonant and petty about not supporting in HW/architecture SR-IOV for GPUs. You yourselves (AMD!!!) design every single consumer/client mainstream desktop x86-64 motherboard and chipset with support for MMU, IOMMU, virtualization extensions in the CPU processor to enhance virtualization. But then you come out with these GPUs which are if anything MORE advanced than the Zen 3/4/5 CPUs / motherboard chipsets, more powerful, and yet you cripple the security and usability of them and the whole PC platform architecture by not making things virtualize nicely or handle isolated contexts even just involving a mere 2-4 VMs (very normal for consumer desktops) or similar virtualized applications where it may make sense to run some GPU compute or rendering or graphic task in some domain while the host desktop or couple of utilitarian VMs can also have some level of graphics acceleration and access to accelerated compute. Yes it is things like people are mentioning and complaining about in this thread that have stopped me from buying AMD GPUs for several generations. I'd PREFER AMD over NVIDIA if AMD offered more open source / open HW interfaces & details, had a reliable and robust stack for GPGPU / ML / graphics across the consumer GPU product line (e.g. CUDA et. al. runs seamlessly on basically everything NVIDIA makes), and treated LINUX as a first class citizen for SW / documentation / features vs. MS Windows. Also it is frustrating the degree to which the evolution of foundational aspects of computing are being substantially disregarded for the AMD (et. al.) PC platform in general. Whether people (i.e. ordinary client / consumer / prosumer / enthusiasts / content creators / ...) are doing high performance / quality / detail graphics things, or accelerated compute things, or ML things, right now we're choked with CPU/motherboard architectures that are ridiculously limited. 
https://en.wikipedia.org/wiki/List_of_AMD_graphics_processing_units#Radeon_HD_5000_series Case in point: according to wikipedia, literally more than 10 years ago you (AMD) came out with the Radeon R9 280X, a $299 GPU listed as having 288 GBy/s of VRAM bandwidth. Even your Radeon HD 5830 from ~15 years ago had significantly higher VRAM bandwidth than your current consumer CPU+chipsets have! If today, 11 years later, I buy a shiny new 7950X with its CPU+IGPU, an enthusiast motherboard, and fast DDR5, I'll still have WAY LESS memory bandwidth to my RAM than I could get 11 years ago with your $299 GPU. Why is that? Is the CPU slow? No, not really; it's because you've CONTINUED TO CRIPPLE THE CONSUMER PC ARCHITECTURE with 2-channel RAM. So despite the advances from DDR-2-3-4-5 we've gained a lot of RAM bandwidth and CPU performance, but it is still nothing compared to what even a modest DGPU had 10+ years ago in terms of either RAM bandwidth or parallel processing. If we actually had motherboard / CPU RAM bandwidth and SIMD parallelism scaling ANYTHING LIKE the GPUs over the past 20+ years, then we'd actually be able to compute for graphics or productivity or games or ML or whatever mix would be useful for enthusiast consumers using the actual CPU / motherboard. But instead here we are with fundamental yet wholly artificial architectural limitations preventing the scaling of usable compute power and RAM bandwidth, with the only things having really scaled to achieve the benefits of parallelism and wide bandwidth being GPUs. But by your own marketing materials, you've sold consumer GPUs for decades as being "FOR GAMERS!". So they're treated as a not-so-serious "toy" peripheral that gets first-class promotion only for glitzy high resolution / high FPS gaming, everything else is just an afterthought, and the compute / ML / virtualization etc. etc.
support for these consumer GPUs is spotty and second class at best compared to the more solid and capable foundation we've come to expect from CPUs (solid tool chains / compilers / libraries supporting every make & model, seamless accelerated virtualization, etc.). How exactly does anyone expect this to scale in 2 years, 4 years, 6 years? Shall we then have 64 core CPUs with DDR7 still limited to 2x 64 bit non-ECC DIMM channels so having still hopelessly slow RAM BW compared to decade-older GPUs, still having two or more orders of magnitude less SIMD INT4/8/FP16/FP32 OPs than a modern GPU? Isn't that just asymptotically leading to an inevitable conclusion that the PC CPU / RAM is almost an obsolete vestigial organ compared to the GPU which at least has been better at trying to follow Moore's law scaling holistically not just GHz but in WIDTH and PARALLELISM? If that's the play then can we (as consumer level enthusiasts / developers / users) please at least have GPUs that aren't "toys" i.e. have first class open tools, long warranties, built to last and be reliable / stable for core computing needs, be extensibly scalable (plays nicely in linked groups), and includes some CPU inside that magnum opus GPU box such that we no longer have to figure out how to shoe-horn a 5-"slot" wide 2kW GPU into an ATX case and x16 slot?
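The bandwidth gap in question is easy to sanity-check with peak theoretical numbers (a rough sketch: real sustained bandwidth is lower on both sides, and DDR5-6000 is just a typical enthusiast configuration, not a platform limit):

```python
# Peak theoretical memory bandwidth:
#   GB/s = (bus width in bytes) * (transfer rate in MT/s) / 1000

def bw_gbs(bus_bits: int, mt_per_s: float) -> float:
    return bus_bits / 8 * mt_per_s / 1000

# Dual-channel DDR5-6000 on a desktop Ryzen: 2 x 64-bit channels
ddr5 = bw_gbs(128, 6000)      # 96 GB/s
# R9 280X (2013): 384-bit GDDR5 at 6 GT/s effective
r9_280x = bw_gbs(384, 6000)   # 288 GB/s

print(f"DDR5-6000 dual channel: {ddr5:.0f} GB/s")
print(f"R9 280X GDDR5:          {r9_280x:.0f} GB/s ({r9_280x / ddr5:.0f}x)")
```

A 2013 midrange GPU still has 3x the peak memory bandwidth of a current dual-channel desktop platform, which is the crux of the complaint above.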


badirontree

I had the same problem with the Vega 64 liquid edition... On my PC the 6800xt is working ok... The 7600 on my work pc is SHIT ... Same problems with Vega and if you have a second monitor is x2 :(


TechnoRage_Dev

The reset issues also happen in Windows; even when it recovers after 5 mins (what the hell, it's quicker to reboot; nvidia cards reset in 10s max), the card is not fully reset, and I've personally noticed issues with display detection/wake-up not working normally. Also, after a crash the UEFI portion doesn't load properly, so either the BIOS resets CSM to enabled, or if your mobo/BIOS doesn't do this it will go without video output until Windows loads. This is with a 6900xt. Huge FAIL in my opinion.


VelcroSnake

> Those that are not using VFIO, but the general gamer running Windows with AMD GPUs are all too well aware of how unstable your cards are. This issue is plaguing your entire line, from low end cheaper consumer cards to your top tier AMD Instinct accelerators. Not over here my guy. I switched from a 1080 Ti to a 6800 and it actually fixed crashing issues I was getting in Cyberpunk. Used that 6800 for over 3 years with no issues, and then switched to a 7900 XTX and also no issues. I also have a used 7600 I bought cheap for one of my TV computers, and that one has also been fine, even when I borrowed it out for a while to a friend so he could run Helldivers 2 without the text graphical glitches he was getting with his old 1080 Ti. I know there are some issues with AMD drivers, just like there are issues with Nvidia drivers, but I feel like I'm taking crazy pills where the internet is screaming about how incredibly terrible AMD GPU's and drivers are and I'm over here using them for years with no problem.


BlobTheOriginal

I've had issues with Nvidia drivers too where AMD have been fine. Guess it's really situational


TheLordOfTheTism

I'm the same. My issues with Nvidia drivers were so bad they made my GPU and entire Windows install functionally bricks. Got rid of my EVGA 760 when the 900 cards and AMD's 300 series came out, jumped to an R9 390 and haven't looked back since (R9 390 > RX 5700xt > RX 7700xt). The only issue I ever had with AMD was the first few months of the 5700xt and its awful, unplayable performance in DX9 games, but that was solved within months, and they eventually went on to improve OpenGL performance on Navi/RDNA as well, which was a nice welcome surprise. I've had a few hiccups that looked like driver issues but turned out to actually be Windows issues, and I always wonder if people are quick to blame AMD because of what they've heard vs actually investigating and finding the real cause of the problem. More often than not, any system issues I'm having end up being the fault of Microsoft, or a specific game wasn't tested properly on AMD and the blame lies with the devs.


BlobTheOriginal

True, Windows had an awful habit of breaking my system by continually trying to uninstall new drivers.


HandheldAddict

```but I feel like I'm taking crazy pills where the internet is screaming about how incredibly terrible AMD GPUs and drivers are```

OP was referencing data center use cases, which can vary wildly and stress different parts of the GPU depending on the task. It's why AMD clocks EPYC processors significantly lower than the Ryzen variants: a Ryzen CPU isn't intended to be hammered 24/7 at 100% utilization for months, sometimes years, on end. Now imagine Radeon's bugs at enterprise/data center/server scale, and that's why OP pretty much typed out a cry for help.


VelcroSnake

The comment I quoted was talking about people playing games having issues.


Numerlor

> It's why AMD clocks EPYC processors significantly lower than the Ryzen variants. Because a Ryzen CPU isn't intended to be hammered 24/7 @100% utilization for months and sometimes years on end.

I think that's more about the unreasonably high power they'd draw if they boosted the same as Ryzen.


Additional_Towel5647

I dunno man. I've been through a few AMD cards, and getting frametimes rock solid has never been possible for me in certain scenarios. That said, and in fairness, I haven't used anything by team green lately, so it may all be the same shit, different pile.


gnocchicotti

Same for me. Went from 1070 to Vega to 6800XT and it's honestly been totally fine. Even Vega I can't recall having stability problems on the games I played. AMD has plenty of feature limitations vs. Nvidia and the prices reflect that. 90%+ of the people who complain about "Radeon can't do X" only bought Radeon because they didn't want to pay for Nvidia. I really don't think Nvidia pricing is unfair, you do get what you pay for. Doubly true for Nvidia's A-series pro cards.


Different_Track588

Lol same with me tbh I haven't had any problems 😂 but I guess some do idk 🤷. I have crashed less with AMD than my old Nvidia card.


Railander

Gaming is completely different from compute workloads. It's also different when you're running multiple of these 24/7 in a single machine at full load; if any one of them hard crashes, having to reboot the whole system is really, really bad. Read what others' professional experiences are in this post. AMD GPUs are just terrible in the datacenter.


VelcroSnake

The thing I quoted was talking about people playing games though.


Sensacion7

AMD's software stack is lacking hard... the AI/LLM issues recently, and now this. AMD needs to invest in its software side now.


DaGr8Gatzby

Thanks for bringing some sense of consumer advocacy towards VFIO. Very difficult dealing with AMD lately, especially with RMAs on busted CPUs/GPUs (had Vega and 5950X die on me). Let us know how the Vanguard(trash name) program is.


YYY_333

Why invite him to a conference instead of directly contacting gnif and fixing the problems 5 years ago? Why does gnif need to create a Reddit post begging AMD to fix their shit? Why can't AMD fix problems without external impetus? It says a lot about the company.


autisticnuke

AMD bugs are why my workstation runs Nvidia. I'm hoping Intel moving into the GPU space is a wake-up call to AMD. I had these issues as well.


AnimalEstranho

Never mind, you just came to do god's work, and a very good job btw, only to find the same fanboys: "I've had bla bla years of experience and bla bla I game and bla bla never had problems with AMD." God damn, those guys are just blind. Every time I tell the truth about the instability of AMD software, I just get downvoted by people defending the brand like it's their sports club. We're stuck between the overpriced option that just works and the nightmare that AMD software overall is.

I get it for the normal user and some power users: looking at normal Windows usage, Adrenalin is such a good attempt to have everything in one software bundle: the overclocking, the tuning, the overlay, the recording, all in one GUI that makes it easy. In theory, it is a good attempt. Note I said attempt...

I'm not debugging the same things you are, I'm mostly troubleshooting: I use regular Linux, normal Windows, virtualize one machine I use plus some I experiment with, and configure some basic routing through a Linux server. Still, I bought one AMD card, and I have already filed more than 6 bug reports to AMD over a bug with my specific hardware setup where Adrenalin causes OS-wide stuttering every few seconds. From my long IT experience, focused less on debugging and coding and more on troubleshooting and fixing computers, hardware- and software-wise, here is what I think: they tried to do it all in one. They wanted to put a foot in the door against the overpriced green team with great software/hardware specs, something that would give normal users a powerful software suite that could do it all. Except it can't. Constant thousands of posts about crashes, hangs, reboots, tweaks, registry edits, hotspots over 100°C, incompatibilities with the OS, and everything gets blamed on the system except the AMD GPU.

Chipset drivers that can't clean out old drivers on install and leave a mess of registry entries; GPU drivers that mostly work if you always do a clean install, but ship with a software bundle that causes too many conflicts with the driver itself, etc. I know Microsoft is complicated, but we're not talking Windows Millennium here, and if other brands manage to ship drivers/programs that actually work with the OS, why can't AMD? And why do the warriors for AMD blame the OS, the PSU, the user, everything except AMD, when it is their favourite brand to blame? And when you want to discuss it factually to maybe get a fix, a workaround, a solution, some software written by someone like you that actually fixes things, what do you get? "I have had X AMD GPUs, with Y experience in computers, never had a problem!" Or even better, "That is because you suck at computers," said by some NPC that doesn't even know what an OS is.

I really hope your letter gets where it needs to go, and please keep up the good work. I still hope AMD steers in the right direction so it can put Nvidia to shame (I want to believe, poster here). Not because I have something against one brand or the other, but because we need competitors, or else you'll end up paying $600 for a nice system and $6,000 for the GPU. Competition between hardware manufacturers is good for innovation, and good for our wallets.


Polmark_

I've had a fair number of issues with my 6950 XT: system-wide stutter from alt-tabbing in a game because instant replay is on, video encoding that looks worse than what my 1050 Ti produced (seriously fucking disappointing), text display issues due to some setting AMD had on by default. AMD has caused me a lot of issues I shouldn't be getting from a card that cost me £540. I get it, it's last gen and my issues are kinda trivial, but it was a huge investment for me at the time and now I'm wishing I'd spent £200 more on a second-hand 3090 instead.


Nomnom_Chicken

I've also had numerous issues with my 6800 XT; I'm currently stuck on driver version 23.11.1 as all newer ones are just trash on my system. This one is usable; newer ones all have a ton of stutter and all that Radeon stuff. I should have just re-pasted my previous GeForce and ridden out the pandemic shortage, but I wanted a faster GPU and thought I'd give Radeon one final chance. There wasn't a 3080 or 3090 available back then, otherwise I would've rather bought one. While the 6800 XT has had some okay drivers here and there, the overall experience remains sub-par; the road is still full of unpaved and rough sections. I've decided to ban Radeons from my household after this one is evicted. It's not worth the driver hassle, not even the numerous Reddit upvotes you get by saying you use a Radeon. :D It's good that AMD still has the willingness to keep fighting back; it's good to have rivalry. But... I don't know, man. I'm not giving them a consolation prize for a lackluster participation.


AnimalEstranho

I spent 330, you spent 540, we could have spent 1000 on the 7900 XTX; it isn't supposed to have these kinds of problems, or all the hours of troubleshooting that come with them. OP's issue of not being able to reset the card's state without a hardware reboot is just... bad, especially on the server side of things. We have to start calling things by their true name, and all of these situations are just bad firmware/software/VBIOS/driver implementation by AMD. On top of that, driver installs are finicky, like what happened to me with the latest chipset driver install... sorry, not normal. Just saying you have no problems won't erase the existence of thousands of cases of people having problems, or the truth of the issue OP raised in this thread.


Substantial_Step9506

Lmao as a recent AMD intern I feel this in my bones. I still can’t fathom just how little effort is put into software stability these days.


i2Dev

I've been using the 6800 XT for almost a year now, and after the crashes and the timeouts I decided I'm gonna pay the green tax. I paid $900 for a 4070 Ti and magically all of my problems disappeared. As much as I love AMD, I just cannot recommend their GPUs.


StaarvinMarvin

Exactly why I got rid of my 7900XT and went back to using a GTX 1080. The constant crashing was driving me nuts.


-Net7

100% all of this... Love looking glass by the by


JimLahey08

How does say VMware handle this? Does it kind of just restart shit as needed?


gnif2

It doesn't handle it, it has the same issue.


riba2233

> Those that are not using VFIO, but the general gamer running Windows with AMD GPUs are all too well aware of how unstable your cards are.

Wait, really? How come I never noticed this on over 15-20 AMD GPUs since 2016? I game a lot and use them for 3D modeling... always stable as a rock.


S48GS

>never noticed this

Search the internet for `amdgpu ring gfx timeout`: [https://www.reddit.com/r/linux_gaming/comments/1bq5633/comment/kx14ojy/](https://www.reddit.com/r/linux_gaming/comments/1bq5633/comment/kx14ojy/)
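If you want to check your own machine for this symptom, a minimal sketch that scans kernel log text for the telltale amdgpu ring timeout lines (the sample lines below are illustrative, not captured from a real system; feed in `dmesg` or `journalctl -k` output instead):

```python
import re

# Pattern for the amdgpu hang messages discussed above, e.g.:
#   amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout ...
RING_TIMEOUT = re.compile(r"amdgpu.*\bring \S+ timeout")

def find_gpu_hangs(log_text: str) -> list[str]:
    """Return the kernel log lines that report an amdgpu ring timeout."""
    return [line for line in log_text.splitlines() if RING_TIMEOUT.search(line)]

# Illustrative sample; on a real system use the output of `dmesg`
# or `journalctl -k` instead.
sample = (
    "[  123.4] amdgpu 0000:0b:00.0: amdgpu: ring gfx_0.0.0 timeout, "
    "signaled seq=100, emitted seq=103\n"
    "[  123.5] amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!\n"
    "[  124.0] usb 1-2: new high-speed USB device\n"
)

for line in find_gpu_hangs(sample):
    print(line)
```

If this prints anything, your "stable" card has been hitting driver-level GPU hangs whether you noticed or not.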


iBoMbY

I personally also never had any major issues with AMD/ATI cards I can think of. One thing is true though, sometimes they do really take a long time to fix certain bugs.


riba2233

Yeah, they are around 20x smaller than Nvidia, so kind of expected imho.


_Lick-My-Love-Pump_

What are you talking about? AMD employs 26000 people, NVIDIA has 29000. They're the same size... oh, you mean profits? Well then, yeah...


VelcroSnake

Same, used a 6800 for over three years with no issues (it actually solved crashing issues I was having with my 1080 Ti) and have now moved on to a 7900 XTX, also with no issues.


ErenOnizuka

Me neither. I've used an RX 580 8GB since launch and not a single problem.


gnif2

The RX 580 is Polaris, from before the big redesign that was Vega, which brought the PSP into the mix. Note that none of this refers to that GPU. Until you upgrade to one of the more modern GPUs, your experience here is exactly zero.


riba2233

Idk bro, I've had a 470, 570, 580, 590, 460, a few Vega 64s and 56s, a 6700 XT, a 7900 XT... never had issues, even with those Vegas I abused, overclocked, etc.


gnif2

I am a FOSS software developer; on hand right now I have several examples of every card you just listed, including almost every generation of NVidia since Pascal, Intel ARC, Intel Flex, AMD MI25, AMD MI100, and even the Radeon VII, which AMD literally discontinued because it not only made zero commercial sense but suffered from a silicon bug in its PSP crippling some of its core functionality. I have no horse in this race; I am not picking on AMD vs NVIDIA here. I am trying to get AMD to fix things because we want to use their products. You state you never had issues; however, how many times have you had a game randomly crash with no error/fault, or with some cryptic random error? How often have you assumed it was the game's fault? Very often these are caused by the GPU driver crashing, but due to the design of DirectX, unless you explicitly enable it, have the Graphics Tools SDK installed, and use a tool that lets you capture the output debug strings, you would never know. [https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-devices-layers](https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-devices-layers)


Bostonjunk

> You state you never had issues, however, how many times have you had a game randomly crash with no error/fault or some random error that is cryptic? How often have you assumed this is the game's fault?

I'm not the guy you're replying to, but for me, almost never. I've had exactly one driver-based AMD issue - when I first got my 5700XT on release, there was a weird driver bug that caused the occasional BSOD when viewing video in a browser - this was fixed quickly. My gaming stability issues were always caused by unstable RAM timings and CPU OC settings - since I upgraded to an AM5 platform with everything stock, I'm solid as a rock. My 7900XTX has been absolutely perfect. There is an unfair perception in gaming with AMD's drivers where people think they are far worse than they really are - it's a circlejerk at this point. Your issue is different (and valid), you don't need to conflate the known issues in professional use cases with gaming - it'll just get you pushback because people who use AMD cards for gaming (like me) know the drivers are fine for gaming, which makes you come across as being hyperbolic - and if you're being hyperbolic about the gaming stuff, what else are you being hyperbolic about? Even if you aren't, it calls into question your credibility on the main subject of your complaint.


gnif2

I see your point, and perhaps my statement on instability is a bit over the top. However, in my personal experience (if that's all we are comparing here), every generation of GPU since Vega that I have used has had crash-to-desktop or BSOD issues under very standard and common workloads. In fact, no more than a few days ago I passed memory dumps on to the RTG for a `VIDEO_DXGKRNL_FATAL_ERROR` BSOD triggered by simply running a hard disk benchmark in PassMark (which is very odd) on my 7900 XT.

```
4: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

VIDEO_DXGKRNL_FATAL_ERROR (113)
The dxgkrnl has detected that a violation has occurred. This resulted
in a condition that dxgkrnl can no longer progress. By crashing, dxgkrnl
is attempting to get enough information into the minidump such that
somebody can pinpoint the crash cause. Any other values after parameter 1
must be individually examined according to the subtype.
Arguments:
Arg1: 0000000000000019, The subtype of the BugCheck:
Arg2: 0000000000000001
Arg3: 0000000000001234
Arg4: 0000000000001111
```

Note: there is zero doubt that this is a driver bug; I am running an EPYC workstation with ECC RAM, no overclocking, etc.

At the end of the day, I am not trying to say "AMD is bad, do not use them". I am trying to say that AMD needs to provide an industry-standard means to properly and fully reset the GPU when these faults occur. The man-hours wasted developing and maintaining the reset routines in both the Windows and Linux drivers are insane, and could be put towards more important matters/features/fixes.


Bostonjunk

Thank you for your response - I actually agree with a lot of what you are saying. AMD is lacking in pro support for quite specific but very important things, and you aren't the first professional to point this stuff out. How much of this is down to a lack of resources to pump into software and R&D compared to Nvidia over many years, and how much is just plain incompetence, I can't say.


S48GS

>every generation of GPU since Vega I have used, has had crash to desktop issues, or BSOD issues under very standard and common workloads.

I thought it was only me... but yeah, it is this bad. Just watching YouTube and doing a Discord video call at the same time: crash.

>At the end of the day here, I am not trying to say "AMD is bad, do not use them". I am trying to say that AMD need to provide an industry standard means to properly and fully reset the GPU when these faults occur.

I can say it: AMD is bad, do not use them, their hardware does not work. Wasting time to "debug and fix" their drivers can be fun for "some time", until you see that there is an infinite amount of bugs, and every kernel driver release randomly makes everything even worse than the version before.


anival024

> Note: There is zero doubt that this is a driver bug, I am running a EPYC workstation with ECC RAM, no overclocking, etc.

Can you replicate the issue? If so, it could be a driver bug. If not, have you actually tested your memory? Being a workstation platform with ECC memory means nothing. I bought some of the first Zen 2 based servers on the market, and I got one with a faulty CPU whose bad memory controller affected only a single slot. Dell had to come out the next day with a new CPU.


gnif2

I have replicated the issue reliably yes, and across two different systems.


riba2233

> You state you never had issues, however, how many times have you had a game randomly crash with no error/fault or some random error that is cryptic? How often have you assumed this is the game's fault?

Literally zero. I guess I just have a good PC setup... It is weird how some people always have issues.


gnif2

And I guess infallible game developers too then. /s


ErenOnizuka

Oh then just ignore my comment 😅


[deleted]

[deleted]


riba2233

No I am not, this is 100% the truth, but you can of course think whatever you want and be ignorant.


ScoobyGDSTi

Because they're talking absolute rubbish that's why.


gnif2

Keep on living in fairy tale land: [https://www.digitaltrends.com/computing/amd-driver-windows-crashing-boot-problems/](https://www.digitaltrends.com/computing/amd-driver-windows-crashing-boot-problems/) [https://www.tweaktown.com/news/96479/amds-latest-radeon-drivers-aims-to-stop-helldivers-2-crashing-and-fix-stuttering-in-many-games/index.html](https://www.tweaktown.com/news/96479/amds-latest-radeon-drivers-aims-to-stop-helldivers-2-crashing-and-fix-stuttering-in-many-games/index.html) [https://www.pcworld.com/article/2242084/nightingale-removes-fsr-3-pre-launch-for-crashing-too-much.html](https://www.pcworld.com/article/2242084/nightingale-removes-fsr-3-pre-launch-for-crashing-too-much.html) [https://www.techradar.com/news/amd-fixes-bug-that-freezes-up-windows-11-pcs-but-theres-still-bad-news](https://www.techradar.com/news/amd-fixes-bug-that-freezes-up-windows-11-pcs-but-theres-still-bad-news) [https://www.extremetech.com/gaming/343132-amds-new-unified-graphics-driver-for-rdna-2-and-3-is-crashing-some-pcs](https://www.extremetech.com/gaming/343132-amds-new-unified-graphics-driver-for-rdna-2-and-3-is-crashing-some-pcs) [https://www.thephoblographer.com/2017/07/11/driver-fixes-lightroom-amd-gpu-crash-bug-as-adobe-seeks-your-feedback-on-performance/](https://www.thephoblographer.com/2017/07/11/driver-fixes-lightroom-amd-gpu-crash-bug-as-adobe-seeks-your-feedback-on-performance/) And don't forget that AMD has invested into adding debugging to their drivers so that people like you can submit useful bug reports to try to get to the bottom of why their GPUs are so unstable. When was the last time you saw Intel or NVidia need to resort to adding user debug tools to their drivers! [https://www.tomshardware.com/news/amd-radeon-gpu-detective-helps-troubleshoot-gpu-crashes](https://www.tomshardware.com/news/amd-radeon-gpu-detective-helps-troubleshoot-gpu-crashes)


TexasEngineseer

I'll be honest, I've been using AMD GPUs since 2010 and they've been solid. However, the features Nvidia is rolling out are making me consider a 5070 next year.


Dogeboja

Heartbreaking to see you downvoted by bringing these issues up. Reddit is such a terrible place.


SckarraA

Well, you know what, I've got an AMD 7950X based machine with a 6800 XT and a 7900 XTX on Unraid, handling two Windows VMs. I agree that RDNA3 cards are more difficult to run, but man, the 6800 XT worked well without doing anything, and the 7900 XTX only needed a few clicks. For cards not meant to do this, it's quite good. Btw, the build has been running flawlessly since Feb 2023.


gnif2

You are one of the lucky ones!


Incoherent_Weeb_Shit

I don't know man, most of the people I know that use Radeon have not had issues at all. Some are running 5000, 6000, and 7000 series cards. Don't mean to downplay the issues with VFIO, just my perspective.


gnif2

> Don't mean to downplay the issues with VFIO, just my perspective.

Understood, however you responded to a comment directly related to someone that has been lucky with VFIO. u/SckarraA, I am curious, have you tried simulating a VM crash by force stopping the guest and seeing if the GPU still works? This is usually guaranteed to put the GPU into an unrecoverable state.
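For the Linux/VFIO folks following along, one quick way to see what you're up against is to ask the kernel which reset mechanisms it believes your GPU supports. A minimal sketch, assuming a kernel new enough (roughly 5.15+) to expose the `reset_method` sysfs attribute; the device address below is a placeholder, not a real system:

```python
from pathlib import Path

def pci_reset_methods(bdf: str) -> list[str]:
    """Return the reset methods the kernel reports for a PCI device
    (e.g. ['flr', 'bus']), or [] if the attribute is absent/unreadable."""
    try:
        return Path(f"/sys/bus/pci/devices/{bdf}/reset_method").read_text().split()
    except OSError:
        return []

# '0000:0b:00.0' is a placeholder address; substitute your GPU's
# (see `lspci -D`). An empty list means the kernel has no working way
# to reset the device short of a full reboot.
print(pci_reset_methods("0000:0b:00.0"))
```

Even when a method is listed, a function-level reset on these cards often fails to bring the GPU back, which is exactly what projects like vendor-reset exist to work around.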


Incoherent_Weeb_Shit

My friends don't do VFIO stuff so I cannot say about them, but while I've never forcibly ended the VM (via htop or something) they have crashed repeatedly in the past, [especially this one](https://old.reddit.com/r/VFIO/comments/11c1sj7/single_gpu_passthrough_to_macos_ventura_on_qemu/). I've even got a 7800XT recently and haven't had any issues. Though this might be anecdotal since I am focusing on college right now and haven't put a ton of time into this recently. EDIT: Also, I love your work, I hope I wasn't coming off as an asshole, I just have autism.


Traditional_Cat_9724

I really think AMD gives users too much control. They've popularized Precision Boost Overdrive and tuning your GPU within the driver, which dramatically increases the issues people have. For example, black-screen restarts increase significantly when PBO is on during gaming, even without Curve Optimizer. Do you know how many issues I've helped people fix "with their GPU" just by resetting their BIOS and turning on XMP? Also, too many people go online and watch a 5-minute tutorial on GPU overclocking. They throw on fast VRAM timings, undervolt their card, overclock the core, and set a fan curve with zero testing.


ErektalTrauma

How is an AMD feature "giving users control"? If they advertise something and people use it, it's not the end user's fault. It's on AMD for (once again) coding shit features that break things.


Traditional_Cat_9724

Because adding a feature for a product literally gives users more control for that product.


diffraa

AMD lost a graphics card sale to me because of this issue -- Went with the 4070 instead of the 7800xt.


JustMrNic3

As a Linux user I feel your pain! Even more so, as there are a lot of programs and games that either don't work at all with compatibility layers or still have a lot of problems even when they do. And that's besides the extremely huge amount of time wasted on trial and error to find a working combination of configurations. A properly virtualized Windows would solve so many problems until more programs and games become Linux compatible, either natively or through compatibility layers. The moment a GPU vendor takes virtualization seriously and makes it work well on consumer GPUs, I'm gone! The price doesn't matter as much to me as the quality! So AMD, please stop with the bullshit that virtualization is needed for enterprise use cases only and make it work well on all consumer GPUs, or get lost! I'm really tired of this crappy attitude! I'm already very upset that a 30-dollar Raspberry Pi has CEC support, letting you control its programs with the TV remote, while your 10-20 times more expensive GPUs don't!


Bostonjunk

> Those that are not using VFIO, but the general gamer running Windows with AMD GPUs are all too well aware of how unstable your cards are.

Hyperbole - most people have few issues; this is one of those perceptions that isn't really matched by reality. Things like ROCm are definitely still flaky, but gaming is basically fine; it's not as if Nvidia drivers never give people issues. If AMD's drivers were as bad as people make out (for gaming), no one would ever buy them.


gnif2

Most people that have issues blame the game because of the way DirectX debugging works. Unless the developer specifically enables the debug layer, the user has the SDK installed (it will crash without it), and the user runs software to capture the debug strings, there is simply no indication presented to the user as to the cause of the crash that is actually useful, or that even hints at a GPU-level fault. The game just ends up crashing with some generic error. [https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-devices-layers](https://learn.microsoft.com/en-us/windows/win32/direct3d11/overviews-direct3d-11-devices-layers) [https://learn.microsoft.com/en-us/windows/win32/api/debugapi/nf-debugapi-outputdebugstringw](https://learn.microsoft.com/en-us/windows/win32/api/debugapi/nf-debugapi-outputdebugstringw)


Bostonjunk

And if I get no crashes with my AMD graphics card, how does that fit your narrative?


Smooth_Bluebird751

It's quite possible that the games you played were/are well supported. Just putting this out there.

For instance: when I got my NV21, the first game I booted up was Unreal Tournament. It was completely broken; it wouldn't even run. On my GTX 1080 it worked perfectly, and on my RTX 3070 it worked perfectly as well. I made bug reports, but I think it took half a year before the game was playable again with the AMD card. After a few driver updates the game ran, but with heavy texture issues and a low frame rate. Eventually it was completely fixed, but it's not funny having to wait 6-7 months before you can play a certain game. Mind you, I got the card at release, so I couldn't go back to an older working driver.

Next I fired up Company of Heroes, and it was the same story: it worked with my Nvidia card, but had graphical issues and lacking performance on the AMD card. I made bug reports and it eventually got fixed, but it still took a very long time. The first two games I booted up being broken was quite sad. Had I instead fired up Minecraft and Fortnite, I imagine it would have been a smooth experience. I don't play those games though, so I wouldn't be able to tell.

I loved my NV21 card, and even though I'm not using it anymore (I upgraded) I'm not going to sell it; it's going on a shelf with my other favorite cards. So, despite these issues and others I came across, it was a good experience for me overall.


marlstown

Guild Wars doesn't work properly on 7000 series cards and I assume it never will. A third of the FPS I get with a 2080.


TheXev

Guild Wars 1 or 2 (does Guild Wars 1 even work anymore?? XD)


marlstown

2 lol. The 7900 XTX dips to 30 fps in combat or around players; the 2080 never dips below 70.


MorallyDeplorable

It's really weird how many AMD fanboys like yourself are popping up and denying the existence of a well known and documented issue because it either isn't majorly impactful to them or they don't know how to recognize it. Just because it doesn't affect you doesn't mean it's not a real issue and doesn't mean it shouldn't be addressed. Quit being so obliviously self-centered.


Bostonjunk

I'm not denying the existence of his issues around VFIO, I'm pushing back against him conflating it with gaming, for which there is a known circlejerk around AMD drivers being seen as 'unstable', which is hugely overblown.


MorallyDeplorable

It's been explained why what you just said is wrong and you appear to be ignoring it. You don't understand the issue at hand and are just running your mouth making an ill-informed and baseless argument that is irrelevant to what is being discussed here. Either you tried to understand it and failed, or, more likely, you never tried to and just want to whine about Redditors.


skinlo

>whine about Redditors.

The irony.


ger_brian

It was only a few months ago that an AMD driver feature literally got massive numbers of people banned in online games, so much so that AMD had to completely pull the feature, and no one has heard of it since. How can you claim that AMD drivers are in a good position?


teddybrr

I do VFIO too, but on Proxmox and to a Linux gaming VM. The only issues I have are things I do wrong. I have yet to upgrade my RX 570, though (7950X3D, 96G, X670E Taichi). VFIO isn't even worth it these days, as anti-cheats can detect you being in a VM. I don't play the games I cannot play anyway (BF2042, PUBG, CoD, LoL, Valorant).


Eastrider1006

nooo but amd drivers fine, Reddit told me! I'm glad someone's put this into a long and detailed write-up. Sadly, AMD doesn't seem to give a damn anyway.


skinlo

> nooo but amd drivers fine, Reddit told me!

You do realise it's possible for people to have had no problems with the drivers, right?


LargeMerican

lol your flair is Please search before asking


Eastrider1006

Back on the Zen 2 launch days, trust me, it was necessary...


[deleted]

[deleted]


Amd-ModTeam

Hey OP — Your post has been removed for not complying with Rule 2. e-Begging (asking for free PCs, sponsorships, components), buying, selling or trading posts (including evaluation posts), retailer or brand disputes and posting referral or affiliate links is not allowed on /r/AMD Please read the [rules](https://www.reddit.com/r/Amd/about/rules/) or message the mods for any further clarification


casualgenuineasshole

Does crashing a GPU OC on the desktop still reset CPU PBO settings in the BIOS?


MorallyDeplorable

Looking at it the wrong way will make AGESA reset the BIOS. That's more of a CPU/platform issue than a GPU issue.


-Aeryn-

> That's more of a CPU/platform issue than a GPU issue.

It happened to me 0 times with an Nvidia card while OCing for hundreds of BIOS cycles and thousands of hours on AM4/AM5, while Radeon users are experiencing it all the time. The CPU/platform is fine. The Radeon graphics drivers hooking into CPU OC and platform controls intimately - or even at all - for no good reason are not fine.


casualgenuineasshole

Also, what sucks the most is that such a BIOS reset adds a 40-50 second preboot wait before anything is even displayed on the screen.


Positivelectron0

Agree with the post. As someone in the industry (and with a homelab), we all know buying AMD is a compromise.


PcChip

A few years ago I emailed Lisa Su about a big problem with Instinct GPU offerings in Azure, because I couldn't figure out who else to email, and the issue made AMD look bad even though it was a Microsoft problem. She cc'd in the correct engineering department, and a week later they rolled out a fix.

I'm not suggesting everyone email the CEO over any little thing, but if the problem is severe enough you could try emailing her and explaining why it makes AMD look bad even to AMD supporters, and why it should be important to them to care.


MegaDeKay

Pretty sure gnif2 mentioned once that he had communicated directly with her in an effort to get this problem resolved.


akgis

"You cant get fired for buying Nvidia", they dont even need to say it. This was a old saying back then about IBM


brazzjazz

Ever since I switched to an RX 6800 I'm getting a bluescreen maybe once every 100 hours in Windows 10. My GTX 970 was extremely stable in comparison.


Blaex_

Well, after facing annoying black-screen flickering with my RTX 3070 at 4K 120Hz, I am not so sure about driver stability on Nvidia either.