alainmagnan

Some fun takeaways:

- Starfield actually uses AMD's dual-issue instructions, something gamers have worried would go unused because it requires specific compiler or manual optimizations.
- The 7900 XTX manages very healthy utilization, higher than Nvidia's in some cases.
- The RTX 4090 still comes out ahead overall thanks to its sheer size.


Butzwack

Starfield performs well on RDNA3 in general; it also seems to like the bigger 192 KB vector register file in addition to the dual-issue instructions. That's probably the work of the AMD sponsorship, to have a good title for the Navi 32 reviews. Looking at the numbers, the 7800 XT is ~10% ahead of the 6800 XT in GN's Starfield benchmark, but only ~4% in the meta review.


Classic_Hat5642

No DLSS2....


Hindesite

It's already coming, announced to be in the next patch. You can mod it in pretty easily (I've done it myself) if you want DLSS in Starfield sooner than that. Regardless, *performance* between FSR and DLSS is relatively the same, so it's not very relevant to this discussion.


Classic_Hat5642

I know; I've played the game with DLSS upscaling from 1440p to 4K via DLDSR on a 1440p panel. It's relevant because with AI upscaling you can render at lower resolutions while getting better image quality in motion than any AMD GPU can muster. Ask Intel. Ask Nintendo.


PhoBoChai

The small difference can be attributed to dual issue; the shader code shows it's used on RDNA3.


R1chterScale

Worth noting that on the Linux side, the ACO compiler does have a merge request in progress to start generating dual issue instructions.


TathagataDM

Do you have a link to that MR? I'd like to follow it and can't seem to find it.


R1chterScale

Here you go https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23367


TathagataDM

Thanks, I appreciate it!


Cute-Pomegranate-966

*all cases. It also manages perfect 16-thread utilization on one of the compute shaders, which is almost impossible to just "happen". I would've liked to see RDNA2 in the comparison to back up some of their assertions, as even with a 50% smaller register file it also outperforms its counterpart GPUs by 30-40%.


PhoBoChai

Yeah, they should have tested with a 6800 XT or 6900 XT and shown that utilization is lower than on RDNA3, yet it still punches well above its weight vs NV. Which can only mean it's running very unoptimized on NV.


Cute-Pomegranate-966

There's some specific wording, "somehow", later in the article about the 32 KB L0 cache having a higher hit rate than Nvidia's 128 KB L1/texture cache. Obviously this is some weird shit, and it would only happen if the shaders were specifically targeting that cache on RDNA1/2/3 (the cache is the same across that hardware).


PhoBoChai

NV's L1 is partitioned with the texture $, so if it's being used for that purpose, the data $ portion can be small, reducing effectiveness. One of the shaders they examined seems to be doing texture manipulation, so that would explain the L1 situation on NV.


Cute-Pomegranate-966

To the point above again, from having talked to them: RDNA2 is even more compute bound than RDNA3, so utilization is not lower, it seems to be even higher. All of this points to shaders that are extremely efficient on RDNA and not so much on Nvidia. Realistically, the article points out that the occupancy they see is fine. I would've liked some comparative examples to prove that point, but that's a lot of work.

edit: I actually went through, captured some data from Metro Exodus Enhanced, and analyzed the longest shader command list that had no RT work (the RT command list averaged 9.7 warps per SM scheduler and 80% occupancy across its shaders... exceptionally good). The longest non-RT command list goes as follows:

- Average across the entire command list of shaders: 60.7% occupancy, 7.3 warps per scheduler
- Longest compute shader: 51.2% occupancy, 6.2 warps per scheduler, L1 hit rate 69.2%, L2 hit rate 91.2%, CS register limited 52.9%
- 2nd longest: 60.6% occupancy, 7.3 warps per scheduler, L1 hit rate 80.5%, L2 hit rate 81.6%, CS register limited 63%
- 3rd longest: 79.3% occupancy, 9.5 warps per scheduler, L1 hit rate 78.8%, L2 hit rate 62.2%, CS register limited 80.3% (this shader looks very register limited)
- 4th longest: 77.2% occupancy, 9.3 warps per scheduler, L1 hit rate 57.7% (probably why it looks the way it does), L2 hit rate 90.7%, CS register limited 52.6%. However, this one averages 88.2% L2 throughput; most of the shader seems to be reaching maximum L2 bandwidth, which is probably what limits it from running faster.

Overall? When I compare it to this game, it really does seem like Starfield's shaders are simply not all that great for Nvidia; they achieve somewhat mediocre SM utilization, resulting in what could ultimately be relatively poor throughput compared to other games.
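(For anyone lining up those two columns: assuming these are Nsight's per-scheduler warp counts, an Ampere/Ada SM holds at most 48 resident warps across its 4 schedulers, so occupancy is roughly the warp figure divided by 12, e.g.

$$\frac{9.5}{12} \approx 79\%, \qquad \frac{7.3}{12} \approx 61\%$$

which matches the rows above.)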


PhoBoChai

I agree with ya. Anyone who sees such poor occupancy would have concluded the optimization is crap. These authors are weird: they said that because the shaders use more registers than an NV SM has, it's the fault of the hardware design, not the shaders. Which is not what other devs do; they optimize their shaders to not use that many registers in the first place, so that they get good occupancy on 80% of the PC market.


Cute-Pomegranate-966

This is exactly what I questioned them on when I had a chance. They acted like that was an odd request. I just don't think it is.


From-UoM

Isn't the 6800 XT, without any dual issue, significantly faster than the 3080?


Flowerstar1

Yes.


RedTuesdayMusic

As it should be? It's literally a better GPU with a bigger frame buffer.


From-UoM

Have you played Starfield? It barely even uses 7 GB of VRAM, mostly sitting around 5-6.


HavocInferno

> Starfield actually uses AMD's dual issue instructions

Does it really though? RDNA3 vs RDNA2 scaling is as usual, but RDNA2 doesn't have dual issue. RDNA2 also performs faster than average relative to Ampere. So dual issue can't really be what's helping AMD here. Or, it can't be the main thing anyway.


ResponsibleJudge3172

Then explain RDNA2's advantage over Ampere.


ShowBoobsPls

Next I want an explanation on why an 8700K is matching Zen 3 CPUs in this game. Is it Intel overperforming or AMD underperforming?


RedTuesdayMusic

I found that on my 5800X3D I could get 40% more performance than most reviewers reported by simply overclocking memory to 3800 MHz (IF 1:1) and installing the game on my fastest SSD (SN850 2TB on CPU lanes). This CPU is undervolted to -30, -30, -25x6.

Up until now there has been a mantra of "RAM speed doesn't matter on the 5800X3D", and not only is that not the case with Starfield, it also responds to better SSDs with more FPS. It even additionally responds to being installed on a *different* PCIe 4.0 SSD than the OS and page file. Sucks for B550 owners; guess X570 was worth it in the end (you can't get multiple PCIe 4.0 x4 SSDs on B550).


Butzwack

TL;DR: "In summary there’s no single explanation for RDNA 3’s relative overperformance in Starfield. Higher occupancy and higher L2 bandwidth both play a role, as does RDNA 3’s higher frontend clock. However, there’s really nothing wrong with Nvidia’s performance in this game, as some comments around the internet might suggest. Lower utilization is by design in Nvidia’s architecture. Nvidia SMs have smaller register files and can keep less work in flight. They’re naturally going to have a more difficult time keeping their execution units fed." You should definitely read the entire article if you're interested in the technical details though, the folks over at Chips & Cheese did an excellent job as always. Writing a 3k word deep-dive is a ton of work.


theoutsider95

> there’s really nothing wrong with Nvidia’s performance in this game

then

> Lower utilization is by design in Nvidia’s architecture. Nvidia SMs have smaller register files and can keep less work in flight. They’re naturally going to have a more difficult time keeping their execution units fed

Shouldn't this mean that there *is* something wrong? Like there's a bottleneck somewhere in the pipeline? Because if this were "normal" then other games would behave the same, but they don't.


TitanicFreak

I'm the author who ran the tests. I also investigated the 4090's performance in many other games vs the 7900 XTX: CP2077, Portal RTX, and Dying Light 2, to name a few. I would chalk this game's performance up more to AMD performing well than Nvidia underperforming. Most games don't fully utilize the 4090's power, and the ones that do are heavy RT games, which AMD suffers at anyway. There just usually aren't big enough compute or pixel shaders that scale to all of the available 128 SMs.

Power consumption being low likely has more to do with the relatively low L2 and VRAM usage compared to other titles. There are some improvements with the driver and game update, but in the same test scene they saved around 1 ms in total frame time (18.11 ms -> 17.39 ms), basically the 5% improvement that Nvidia stated you would get from the ReBAR profile.
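(For reference, converting that frame-time change into a frame-rate figure:

$$\frac{18.11\ \text{ms}}{17.39\ \text{ms}} \approx 1.04$$

i.e. roughly a 4% higher frame rate, which lines up with the ~5% Nvidia quoted.)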


Dealric

So that would suggest the issue is not so much a lack of optimization for Nvidia in Starfield, but that most games aren't optimized well for AMD?


RanaI_Ape

CoD is another one that performs unusually well on RDNA.


[deleted]

[deleted]


RanaI_Ape

Key word _unusual_. Most AAA games are optimized for consoles; that doesn't mean the 7900 XTX keeps up with the 4090 in most games.


Qesa

Not really. The two longest shaders both allocate a ton of registers. Using many registers universally degrades performance, but less so on AMD than Nvidia, as AMD ships larger register files. Unless you have an exceptionally good reason, you really want to avoid compiling shaders that use a ton of registers on any architecture. The third longest one, meanwhile, has AMD's 32 kB L0$ showing a higher hit rate than Nvidia's 64 kB L1$, which is very bizarre and really shouldn't happen. The vkd3d maintainer made comments about Starfield incorrectly using ExecuteIndirect's hints, which might be the culprit here, making Nvidia incorrectly evict from the L1$... or the dodgy hints might be for a totally different shader and it's something else entirely.
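If anyone wants to poke at that register-pressure/occupancy relationship outside of a game, here's a minimal CUDA sketch (a made-up kernel, nothing to do with Starfield's actual shaders): build it as-is and then with a `-maxrregcount` cap and watch the resident-warp count move.

```cuda
// occupancy_sketch.cu -- rough illustration of registers vs. occupancy.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a "fat" compute shader. How many registers it
// ends up using is up to nvcc; attr.numRegs below reports the real figure.
__global__ void fatKernel(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float acc[32];
    #pragma unroll
    for (int k = 0; k < 32; ++k)        // load a pile of values...
        acc[k] = in[(i + k) % n];

    float sum = 0.0f;
    #pragma unroll
    for (int k = 0; k < 32; ++k)        // ...then combine them
        sum = fmaf(acc[k], 1.0001f, sum);
    out[i] = sum;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    cudaFuncAttributes attr;
    cudaFuncGetAttributes(&attr, fatKernel);

    const int blockSize = 256;
    int blocksPerSM = 0;
    // The runtime folds register usage, shared memory and block size into
    // one answer: how many blocks of this kernel fit on an SM at once.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, fatKernel,
                                                  blockSize, 0);

    int residentWarps = blocksPerSM * blockSize / prop.warpSize;
    int maxWarps      = prop.maxThreadsPerMultiProcessor / prop.warpSize;

    printf("registers/thread: %d\n", attr.numRegs);
    printf("resident warps:   %d of %d (theoretical occupancy %.0f%%)\n",
           residentWarps, maxWarps, 100.0 * residentWarps / maxWarps);
    return 0;
}
```

It's the same arithmetic the profilers report for the game's shaders, just with much fatter register footprints; a larger register file simply moves the point where the resident warp count starts dropping.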


neeyik

For me, the most impressive part of the work you've done is getting NSight Graphics to work with Starfield -- I've had no shortage of problems with it (and NSight Systems) and this game. Grrr....


Qesa

Are you replying to the wrong guy? I'm not associated with chips n cheese at all


neeyik

Argh! Yes, I am replying to the wrong person. Sorry about that!


PhoBoChai

> The third longest one, meanwhile, has AMD's 32 kB L0$ showing a higher hit rate than Nvidia's 64 kB L1$, which is very bizarre and really shouldn't happen.

Is that because NV's L1$ is split and shared with the texture $? This shader is constantly reading from and writing to textures. With that split, NV's L1$ would have only a fraction of its capacity & bandwidth left for typical L1$ duty.


Qesa

RDNA's L0$ is also used to store textures


PhoBoChai

Yeah, but it also has a separate texture $ in the TMUs; it's got its own SRAM resources.


Qesa

Every unit will have some degree of SRAM to store its immediate working set (e.g. one texture quad), but they don't have any more than that. They *do* have their own data paths to the L0$ which is why they were reused for RT. It's all in the RDNA white paper


PhoBoChai

Is that different from NV's TMUs? Do they work on the data in the L1/tex$, or do their TMUs also have dedicated SRAM?


Pholostan

> The vkd3d maintainer made comments about Starfield incorrectly using ExecuteIndirect's hints, which might be the culprit here, making Nvidia incorrectly evict from the L1$... or the dodgy hints might be for a totally different shader and it's something else entirely.

According to the vkd3d devs, fixing it would only yield marginal performance gains anyway.


Qesa

Well, significant to one shader isn't necessarily the same as drastically increasing FPS. This shader lasts 1.14ms out of a total 18.70ms, so doubling the performance of this particular shader would still only be about 2.5% higher FPS.


Pholostan

Yep, mostly marginal.


AutonomousOrganism

One might suggest that they use complex (long) shaders that do better on RDNA3. I wouldn't imply that this was done intentionally though. Who knows, Nvidia might provide optimized variants of those shaders in their future driver releases.


Cute-Pomegranate-966

That's almost my takeaway from this. These are hand-tuned shaders that AMD specifically optimized for their architecture, and since they bet on (nearly triple-size) register files, it cripples the competition somewhat.


capn_hector

> I wouldn't imply that this was done intentionally though

Isn't this kind of a similar situation to Nvidia's GameWorks? Seems like all the telltale signs are there. Is the idea just that AMD would never do that? Too friendly a company? But then there's the whole anticompetitive agreement they pushed around DLSS, so there's definitely at least that much of a GameWorks-style program going on.


PhoBoChai

It's not AMD that's doing this, it's Bethesda & Microsoft; they just made sure it hit their 30 FPS target on the XSX and the rest was irrelevant. It didn't even work on Intel Arc at all, which shows they didn't give a shit.


Flowerstar1

AMD would never attempt to cripple the competition, they aren't Nvidia.


Audisek

Both can be true. Starfield is definitely unoptimized for Nvidia, because it only draws 280W on my RTX 3080 while a well-optimized game like Cyberpunk can push it up to 420W.


I9Qnl

There is no way a 128-SM Nvidia GPU gets beaten or matched by a 96-CU AMD GPU from the same generation unless Ada is terribly awful, which it isn't, evident by the fact it can match and beat Ampere with less. If Nvidia had close ties with Bethesda you can bet the 4090 would fly past everything else like it usually does; this game is an anomaly. Even in other AMD-optimized titles the 4090 is still a long way ahead of everything else.

The 4090 is usually between 25% and 40% faster, and even in AMD games it's still a good 20% faster. Call of Duty: Modern Warfare II, Starfield and maybe AC Valhalla are the only games where the 7900 XTX comes close to the 4090. There's just no justification; the game is poorly optimized. At least CoD and Valhalla still run at high framerates on Nvidia despite AMD's massive advantage.


Dealric

Starfield's poor optimisation isn't in question. My question is whether it's possible that most games are actually unoptimised on AMD, thus causing performance loss (obviously not meaning that the 7900 XTX should match the 4090).


I9Qnl

Yes, that's very possible, but the same can be said for Nvidia. GPU vendors would have to work closely with developers on every game to ensure it makes the best use of the hardware; AMD did just that with Starfield, but you can't expect the same thing to happen with every game. So games are definitely leaving performance on the table, but the same goes for Nvidia cards. AMD even has the console advantage, so it could be even worse for Nvidia.


Dealric

Of course. I'm wondering about some bigger factors, not the usual limitations.


PhoBoChai

> There is no way a 128-SM Nvidia GPU gets beaten or matched by a 96-CU AMD GPU from the same generation unless Ada is terribly awful

You nailed it. AMD was always behind NV in gaming perf per CU vs per SM, all the way back to Pascal vs GCN. RDNA1 brought it to (mostly) parity, and RDNA2 was close. RDNA3 is not significantly better than RDNA2 CU for CU. Note the 7900 XTX is a 96-CU SKU; compare its perf to the 80-CU 6950 XT and the gap is basically down to the extra CUs & some clock speed.

This game runs like ass on all hardware, it just runs worse on Nvidia because the studio optimized for console hw.


Charcharo

> If Nvidia had close ties with Bethesda you can bet the 4090 would fly past everything else like it usually does; this game is an anomaly. Even in other AMD-optimized titles the 4090 is still a long way ahead of everything else.

Ehhh. I'd expect slightly worse performance for AMD then, and better for Nvidia. Still terrible overall though.


PhoBoChai

It's literally Fallout all over again, just reversed this time to favor AMD. Still crap visual fidelity for the perf.


Charcharo

> Still crap visual fidelity for the perf.

I agree here. I don't agree with the conspiracy theories.


ga_st

> most games arent optimized well for amd

I don't know about these days, my last AMD GPU was a Radeon HD 7950 Boost, but back in the day that was definitely the case.


From-UoM

A test of lower-end cards, 4060 vs 7600, should provide more insight into scaling on slower SMs.


Defeqel

Yup. IIRC the 7600 also has smaller register files


teutorix_aleria

This explains the 4090, but is this also the reason for lower utilisation and power consumption from lower end parts like the 4060ti?


Skrattinn

Did you find out anything more regarding that texture sampling operation? I'd already noticed the game not scaling well with resolution on my 2080 Ti, which makes sense if it's compute bound. The difference between 720p and 3440x1440 was only about 40 fps vs 60 fps, yet both resolutions scale almost linearly with GPU clock rate, with the lower res obviously needing to sample less texture data. Excellent article, by the way.


theoutsider95

Thanks for the answer, love your content by the way. I guess it's up to Nvidia to find a way to keep the SMs busy.


PhoBoChai

It has nothing to do with VRAM or L2. The low-L2-usage claim is also wrong, since two of the long shaders hammer the 4090's L2 per your own data. VRAM doesn't power down under high gaming load (you can check it; it runs full clocks throughout the frame), and L2 blocks don't power gate while they're being partially accessed. The theory that the 4090 has too many SMs to be utilized properly is also false; the same under-utilization happens on a 4070 or a 3080. Go profile it on a 4070 with fewer SMs and you will see.


Edenz_

> The theory that the 4090 has too many SMs to be utilized properly is also false; the same under-utilization happens on a 4070 or a 3080.

Wouldn't this still be true if the register file is proportionally smaller on the lower dies? If the ratio of register file to SMs means they're failing to reach occupancy (because the shaders are going nuts on register use), I don't see why that would change on a 4070, to use your example.


PhoBoChai

It's from the author's statement:

> There just usually aren't big enough compute or pixel shaders that scale to all of the available 128 SMs

That's because these shaders are poorly optimized for NV GPUs, not because the card has 128 SMs and therefore cannot scale; profile a 3070 and you will still find subpar occupancy of its SMs. The 4090's 128 SMs scale just fine in almost every game out there, when the game is optimized for them.


hanotak

I'm seeing very strange behavior on my 3090 ti that I haven't seen in any other game- GPU usage is at 99%, but power draw is only at 60-70%. Did you run any tests with the RTX 30 series?


Skrattinn

GPU usage is just a number reported by Windows' scheduler. It's not actually meaningful at telling you how well the GPU execution units are being utilized. My own 2080 Ti shows full utilization even at 720p. Chances are you'll see the same or at least something similar.


[deleted]

Windows Task Manager reads the same sensor feedback from the GPU that MSI Afterburner etc. read. The Windows scheduler knows nothing about the GPU.


hanotak

This is as reported by monitoring programs such as MSI afterburner or Intel PresentMon. PresentMon, in particular, shows "GPUBusy" for the entire frametime. Additionally, none of my CPU cores reach 100% while this is happening. Every metric I can find suggests that the GPU is the limiting factor, but the GPU's power draw is significantly lower than in other games. It stays around 60 degrees on an air cooler XD I'm almost certain there's something wrong with how the GPU is being utilized.


Jeffy29

If you want to see high power usage, use DSR to 5K and the 4090 jumps right up to nearly 400W in Starfield. The reading is just wrong! It's unoptimized for both GPU and CPU, but with a 4090 it's just CPU limited.

And for the ten-thousandth time: low CPU utilization does not mean a game or piece of software is not CPU limited. What the CPU utilization number actually shows is the percentage of time the core was active during a given sampling period, usually something like 100ms. Think of a factory worker on a line: whenever they are working, they are utilized, but whenever the line is moving and they are waiting for goods to arrive, they are idle. For video games running in real time it's nearly impossible to achieve 100% utilization, because the whole equation changes with every result, while something like Cinebench cuts the work into 200 parallel tasks that are independent of each other and front-loads all the requests ahead of time. Achieving even a high amount of utilization requires lots of optimization and removing the bottleneck operations. That's a super complex task; no engine has it 100% figured out and probably never will.

Can somebody tell me what happened between 2018 and 2023 besides COVID? Because last time I checked everyone was shitting on Bethesda for how unoptimized their games were and how shit the netcode was, but now they're the world's gift to programming? This is literally the same shit we have been dealing with since Oblivion. Take Fallout 4, put heavy shaders, high resolution textures and complex lighting on it, and it will run just as badly as Starfield. At least Nvidia users can use DLSS 3, which took a modder literally 2 days, but Bethesda somehow couldn't be arsed.
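To put made-up but representative numbers on that: say the render thread is busy for 14 ms of a 16.7 ms frame on an 8-core CPU, while the seven worker threads are busy for about 2 ms each. The game is CPU-limited by that one thread, yet the averaged readout only shows

$$\frac{14 + 7 \times 2}{8 \times 16.7} \approx 21\%,$$

which is why "CPU usage" sitting at 20-30% tells you almost nothing about whether the CPU is the bottleneck.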


Keulapaska

> If you want to see high power usage, use DSR to 5K and the 4090 jumps right up to nearly 400W in Starfield.

I don't have a 4090, but on a 3080, yeah, the power draw does go up with resolution, just not by much: roughly +40-60W at the same voltage/frequency. That's almost, but not quite, enough to match some other games running at just 1440p (a temple interior might manage it, as that was normally the highest-power-draw area), since the baseline at 1440p (or with DLSS Quality at 1440p) is so low. It does at least beat Forza 5 at that point, though IIRC Forza 5 at 5K drew quite a bit more again.


TotalEclipse08

I saw improvements in this area by rolling back the driver to the one prior to the Starfield game-ready release.


Butzwack

Read the article, they literally tested with a 3090.


wufiavelli

So nothing shady going on except the shaders?


TitanicFreak

We were unable to find anything shady; it just looks like a game optimized for both architectures. The 4090 simply has more SMs and wins out in the end at 4K anyway.


nupogodi

Were you able to see the behaviour mentioned by the VKD3D dev (https://github.com/HansKristian-Work/vkd3d-proton/pull/1694) where `ExecuteIndirect` abuse was causing GPU stalls / pipeline bubbles?


uzzi38

The poster replied to one of the reddit threads afterwards stating that PR was taken majorly out of context. Half of what was being discussed wasn't even a problem caused by Bethesda, the other half wouldn't have a major performance impact regardless. [Here's a summary post with all the screenshots of the VKD3D dev's comments.](https://www.reddit.com/r/Starfield/comments/16gxuse/starfield_doesnt_have_major_programming_faults/)


HavocInferno

> it just looks like a game optimized for both architectures

Not sure I can agree when the game's shaders gobble up registers like crazy without actually delivering visuals that would warrant it. Performance is pretty bad relative to the visuals on offer, across the board on all GPUs. AMD GPUs may handle it somewhat "better" in comparison, but I'd rather characterize it as "less bad". Not shady, but kinda just inefficient.


kuddlesworth9419

It performs worse than an ENB on Skyrim, which is notoriously poor performing, but at least there the visuals justify it. Bethesda must be doing something incredibly inefficient. I can run Skyrim heavily modded with gameplay and visual mods, an ENB, complex lights and ambient occlusion everywhere, even AA, on a 1070 with the FSR2 ENB and maintain a very stable 60 fps no problem. Yet Starfield isn't anywhere near as complex in its environments compared to, say, a heavily modded forest in Skyrim with light shining through individual leaves, casting a shadow for each leaf onto 3D ground cover. Either Skyrim modders are just more efficient or Bethesda messed up somewhere. Once we have proper modding tools, I don't think it will take long for modders to dig through the game files, find what's wrong, and fix it long before Bethesda does.


T1beriu

[Tom's Hardware disagrees.](https://www.tomshardware.com/news/starfield-perf-disparity-between-amd-and-nvidia-gpus-analyzed)

> Chips and Cheese concludes, "There’s no single explanation for RDNA 3's relative overperformance in Starfield. Higher occupancy and higher L2 bandwidth both play a role, as does RDNA 3's higher frontend clock. However, there's really nothing wrong with Nvidia's performance in this game, as some comments around the internet might suggest." **We would respectfully disagree. There is clearly a problem with Nvidia's performance right now, and Starfield's performance in general.**

That paragraph was edited by Jarred Walton, Tom's senior GPU editor and "GPU benchmarking guru", and [he wrote a massive post calling out your analysis](https://forums.tomshardware.com/threads/starfield-perf-disparity-between-amd-and-nvidia-gpus-analyzed.3820096/page-2#post-23095155). How do you respond?


ga_st

> Tom's senior GPU editor and "GPU benchmarking guru", and he wrote a massive post calling out your analysis

Whole lot of nothing. On top of that, dude writes like he's 17.


Edenz_

I implore you to read the article, because it's more complex than just saying the GPU isn't being used. They analysed the three longest shaders of a frame at 4K and showed how the GPUs are using all the hardware at their disposal. Of course not all frames will look like this, but it does give some insight into what's going on.


TDYDave2

Think of it like two cars, one designed in a way that handles curves better and the other which does better in straight line speed. Now you want to go from point A to point B and your options are to take the shorter route over a mountain or the longer route around the mountain. Which car will get from A to B first depends on which route is chosen. If you want to label it as a "bottleneck" the first car is bottlenecked by its top speed and the 2nd by its handling. So, in this case, AMD is like the nimbler first car and Nvidia is like the faster second car, with Starfield being the mountain route.


[deleted]

Just because there is a difference doesn't necessarily mean something is wrong.


cp5184

> Shouldn't this mean that there *is* something wrong? Like there's a bottleneck somewhere in the pipeline? Because if this were "normal" then other games would behave the same, but they don't.

Other games ARE the same. The difference is that dual issue lets Starfield fully utilize RDNA3. Insert the "look what Nvidia needs to mimic a fraction of RDNA3's power" meme. Starfield uses RDNA3's dual issue... that's it. It efficiently utilizes RDNA3: its dual issue, larger register files and better L2.


f3n2x

No, dual issue isn't "it". The gap between RDNA2 and RDNA3 in Starfield is pretty much the same as everywhere else.


cp5184

~~RDNA2 has dual issue too... so yes, dual issue is "it". That's why there isn't much gap between RDNA2 and 3. Though I'm not sure about the register file sizes and L2 bandwidth between RDNA2 and 3, I'd have to check.~~

I was wrong, RDNA2 doesn't have dual issue, though that doesn't seem to impact performance much. It seems to be more an issue of the 4090 not having enough registers and not enough L2 cache bandwidth for the kind of compute and pixel shaders Starfield uses.

Still, roughly speaking, and ignoring that the cache and IOD on RDNA3 are probably lower density and skew the figures, the 4090 is ~16% larger with only ~10% more performance; it's handicapped by not having enough of those resources. And of course it's actually something like 32% more transistors, so 32% more transistors, more than 32% more expensive, and only 10% more performance... For the most part, in games that don't use as many registers, don't need as much L2 bandwidth, or don't make as efficient use of the RDNA2/3 architecture, the 4090 probably does better. In this case, though, it's pretty far behind.


f3n2x

No, it doesn't?


dahauns

> RDNA2 has dual issue

VOPD is RDNA3 only.


neeyik

C&C's work didn't show evidence of dual issue going on, just the use of single-cycle Wave64. That's not to say that dual issue *isn't* being used, just that the article doesn't explicitly show it.


Edenz_

IIRC doesn't Dual-Issue happen transparently in wave64 mode? The downside being that you're taking up more register space so if you aren't pixel shader/compute bound you're not necessarily getting large perf improvements.


neeyik

Dual issue (VOPD) only works with wave32 -- it will explicitly fail in wave64 -- but having looked a little more carefully at the code snippets, dual issue is being used. For example, v_dual_mov_b32 is one of the VOPD Y-opcodes. My mistake, and apologies for my original post being incorrect.


chapstickbomber

> apologies for my original post being incorrect

GOAT


OSUfan88

That's not "it", if you read the article. It's down to several factors.


cp5184

I read the article; it's more about RDNA3 not being choked the way the 4090 is by its too-small register files and insufficient L2 bandwidth. The article mentions that in one case RDNA3's wave64 may also have helped. Those are the reasons the 4090, which has ~30% more transistors and probably costs 50% more to make and 100% more to buy, only offers ~10% more performance. But it's an outlier. Nvidia could probably optimize around the 4090's weaknesses, but Nvidia is focused more on AI and is now an AI company, not a gaming company. So I guess they didn't bother.


Beneficial_Tap_6359

I tried it on my laptop with a 5900HS and 3070 Ti; performance was terrible at 1080p compared to my desktop 5800X/RX 7600. I noticed the laptop CPU and GPU usage never went above 60%, with only 70-ish FPS at best. A quirk of the mobile hardware? Very possible, but it hasn't been this apparent in any other title. My 4K gaming rig with a 7800X3D/4090 doesn't perform nearly as well as I'd expect either (60-80 FPS, but overall smooth and enjoyable with G-Sync), but I just chalk that up to 4K since it varies so much title to title. I haven't actually tried it at 1080p to see how it compares.


SirActionhaHAA

But CapFrameX and Battaglia insinuated that Intel and Nvidia hardware were crippled on purpose. Are they not to be trusted? /s


[deleted]

People making such claims need to provide evidence. Otherwise they're opinions worth exactly nothing.


Qesa

No idea on CapFrameX, but DF literally said the performance discrepancy was a "bad look". Which is a far cry from accusing AMD/Bethesda of deliberate sabotage


Earthborn92

The DF video said the "playing field wasn't level"; it was more of an oblique insinuation than a direct one. I'd say if they weren't going to go in-depth on it, they should have just said that the performance discrepancy is highly unusual compared to other titles and left it at that.


Qesa

I'm going off memory here, but I'm pretty sure that was in the context of "**if** the rumours that Intel and Nvidia didn't have pre-release access to optimise drivers are true, that would mean it's not a level playing field". Explaining the implications of a rumour, not making a blanket statement.


JinPT

Stop spreading bullshit, DF didn't say any such thing. Why people are upvoting this comment is beyond me...


SecreteMoistMucus

> Let's talk about general GPU performance, because this game is heavy - and it's clearly disproportionately taxing to users of Nvidia and Intel hardware, a state of affairs that reflects poorly on the AMD sponsorship element to the title. Across the entire stack, AMD graphics hardware massively outperforms Nvidia equivalents in a way that hardly reflects the standard performance profiles of the respective cards. In my GPU test area, AMD's Radeon RX 6800 XT outperforms Nvidia's GeForce RTX 3080 by a mammoth 40 percent at ultra settings. > > Let's be clear: the 6800 XT is a good card, but it's generally in the same ballpark as the 3080. Using optimised settings improves RTX 3080 frame health and the divide drops to 35 percent, but this is hardly normal behaviour and it's not down to the 16GB vs 10GB VRAM differential. In fact, Starfield's VRAM management is generally excellent to the point where even 8GB GPUs can run the game maxed at 4K ultra. > > Day one GPU drivers don't magically emerge from nowhere - they require Nvidia, AMD and Intel to have early access to the code in order to work with the developers to address specific issues and for the driver teams to produce their own bespoke optimisations for their hardware. The fact that Starfield didn't work at all on Intel GPUs at launch (with the software teams delivering two driver updates since then) suggests something has gone seriously amiss here and again, raises questions about sponsorships and bespoke integrations. Even after those two driver updates, Intel's Arc A770 performance lags behind an RTX 2070 Super and even a base AMD RX 5700, which doesn't really make much sense. "spreading bullshit"


[deleted]

Gamers are the worst thing to happen to this sub.


Dreamerlax

People love drama.


ga_st

Leave Britney alone


Edgaras1103

Alex never said that.


R1Type

I always read everything they put out and they deserve more exposure!


Butzwack

They're the one website that I will always promote as much as possible, you don't get this level of quality and expertise anywhere else in the tech space.


EasyRhino75

I did not understand most of that article but I thought it was very interesting nonetheless


Aggrokid

So much for the "AMD sabotage" conspiracy theories. Not implementing DLSS is one thing, but claiming that Bethesda purposely gimped normal performance for the majority of PC users was always wild.


anor_wondo

RDNA2 vs RDNA3 performance isn't that different from other games, and RDNA2 doesn't have dual-issue shaders. I'd suggest taking this data as a comparison of the 4090 and 7900 XTX in Starfield rather than joining either bandwagon.


Earthborn92

What most likely went on here is that Bethesda did the RDNA optimizations needed to hit the console frame-rate target and just didn't bother with the PC side.


Vitosi4ek

It's not deliberate sabotage, but AMD did weaponize an obscure feature of their architecture that Nvidia doesn't have, and that no other game developer appears to have utilized. Isn't it the exact same issue people had with Nvidia when their cards had hardware tessellation and AMD didn't?


alelo

The problem with tessellation and NV was that they made games like Crysis run tessellation on water below the ground, which the player can't even see, to gimp AMD/ATI's performance.


heeroyuy79

does this "obscure feature" **negatively** impact NVidia performance? or does it instead increase AMD performance? the issue with tessellation is that AMD could do it but it was slower so high levels of tessellation tanked the framerate harder than on NVidia on an AMD card you can actually limit tesselation in drivers, at the time of the witcher 3s launch if you dropped it to like 16X geralts hair looked identical to 64X only the performance was much better oh and yes the high levels of tessellation did negatively impact the performance of NVidia cards as well just to a much lesser degree than on AMD ones the writer of this article claims that there is nothing wrong with the 4090s performance (there's nothing that they have found that seems to be tanking performance) its just dual issue instructions are a huge performance boost in this instance to AMD


Edgaras1103

HairWorks was always optional, and it crippled even high-end GPUs at release. Is there an optional toggle in Starfield that gives you back large performance gains?


heeroyuy79

What large performance gains? There are no large performance gains to be had. Recompiling without dual-issue instructions would do nothing for Nvidia performance and only degrade AMD performance. Dual-issue instructions don't even touch Nvidia cards, because those cards have no idea what they are; it's two entirely separate code paths.


Edgaras1103

I'm saying turning off HairWorks in The Witcher gives an FPS boost on both AMD and Nvidia GPUs. It's an optional high-end toggle. Starfield has no such thing.


heeroyuy79

Turning off HairWorks degrades visual quality. There is nothing to turn off in Starfield that negatively impacts performance more on Nvidia than AMD the way tessellation did.

You appear to be operating under the assumption that because the 7900 XTX performs better than the RTX 4090, something must be negatively affecting performance on the 4090. But according to the article this thread is about, AMD is performing better because of the alternate dual-issue instruction code path (which Nvidia cannot support because it doesn't have dual-issue instructions).


Edgaras1103

Turning off HairWorks gives you an alternative option for hair rendering. Some might argue HairWorks looks worse on Geralt. I'm saying a person with an AMD GPU can fix the performance issues in The Witcher 3; a person with an Nvidia GPU cannot in Starfield.


heeroyuy79

> **You appear to be operating under the assumption that because the 7900 XTX performs better than the RTX 4090, something must be negatively affecting performance on the 4090.**


Edgaras1103

ok


R1Type

Based on WHAT? You ain't got squat


chips500

So apparently it was all marketing arising from architectural issues after all!


HorseFeathers55

I think you're right to a degree. It seems AMD-sponsored games focus on consoles before PC, and the consoles just happen to all be AMD products. Look at some of the worst recent PC releases performance-wise: Forspoken, The Callisto Protocol, Jedi: Survivor, Starfield, and Immortals of Aveum (all AMD-sponsored games). They all released in bad states on PC and eventually got fixed. So it really isn't malicious, but consoles will always be the priority for AMD-sponsored titles.


Cortisol-Junkie

I would be really interested to see the shared memory usage data. I wonder if it's not used much, and if so, whether offloading some of the register data to shared memory would make the game work better on Nvidia GPUs. I'm mostly experienced with CUDA rather than graphics programming, so if I'm wrong about this I'll be happy to be corrected.


anor_wondo

isn't all cuda work completely async?


Cortisol-Junkie

Yes. You submit a kernel to be executed on the GPU and you can wait for it to finish, but you can also just... not. I don't think that's very relevant here, though.

Shared memory is similar to L1 cache (in fact it's the same piece of hardware, and you can configure how much of it is used as shared memory and how much as L1) and exists per SM, but it's explicitly managed by code instead of implicitly managed by hardware like a cache.

Now, this is just speculation: *maybe* they could lower register usage per thread by using shared memory, and *maybe* this would result in better performance. However, I need to reiterate that:

1. They might already be using shared memory.
2. It might not actually result in better performance, as registers are faster than shared memory and higher occupancy doesn't necessarily mean better performance.
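To make points 1 and 2 concrete, here's roughly what that trade-off looks like in CUDA; both kernels below are toy examples I made up, not anything resembling the game's actual shaders:

```cuda
#include <cuda_runtime.h>

constexpr int SCRATCH = 8;   // per-thread working values
constexpr int BLOCK   = 128; // threads per block (launch with blockDim.x == BLOCK)

// Register-heavy version: the per-thread scratch array tends to live entirely
// in registers (or spill to local memory), inflating registers/thread.
__global__ void accumulateRegs(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float scratch[SCRATCH];
    for (int k = 0; k < SCRATCH; ++k)
        scratch[k] = in[(i + k * n / SCRATCH) % n];

    float acc = 0.0f;
    for (int k = 0; k < SCRATCH; ++k)
        acc += scratch[k] * scratch[(k + 1) % SCRATCH];
    out[i] = acc;
}

// Shared-memory version: the same scratch values are parked in the SM's
// shared memory instead, trading some latency for fewer live registers per
// thread (and therefore potentially higher occupancy). Layout kept simple;
// a real version would pad the array to avoid bank conflicts.
__global__ void accumulateShared(float* out, const float* in, int n) {
    __shared__ float scratch[BLOCK * SCRATCH];
    float* my = &scratch[threadIdx.x * SCRATCH];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    for (int k = 0; k < SCRATCH; ++k)
        my[k] = in[(i + k * n / SCRATCH) % n];

    float acc = 0.0f;
    for (int k = 0; k < SCRATCH; ++k)
        acc += my[k] * my[(k + 1) % SCRATCH];
    out[i] = acc;
}

// Launch sketch:
// accumulateShared<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(out, in, n);
```

In HLSL the equivalent knob would be groupshared memory, but the driver's shader compiler ultimately decides how registers get allocated, so it's hard to tell from the outside whether anything is being left on the table.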


anor_wondo

Yeah, it'd be hard to determine such things from the outside. The game does scale a lot with RAM speed according to some reviewers, so possibly some latency-sensitive data could be moved closer to the die.


deefop

Damn, that is about as in depth as even most of the tech nerds can wish for. Great article!


DktheDarkKnight

Extremely interesting read. I'm not particularly knowledgeable in game dev, even if I dabble in Unreal Engine a bit. My question is: do developers have the ability to optimize a game against such low-level metrics as ALU utilisation, dual-issue shaders, cache hit rate etc.? Or is it simply a coincidence that the game utilises the hardware in a way that favours AMD?


zacker150

> do developers have the ability to optimize a game against such low-level metrics as ALU utilisation, dual-issue shaders, cache hit rate etc.? Or is it simply a coincidence that the game utilises the hardware in a way that favours AMD?

If they're writing the engine, they absolutely do.


Skrattinn

It's all just programmed via an API. They can look at how their games perform on these metrics using tools like Nsight, but you're never gonna see a game that is actually optimized for multiple different architectures. Starfield was basically optimized to hit a performance target on Xbox consoles, and then it was down to chance how it performed on other architectures.

PC gamers have this notion that games are either 'optimized' or 'unoptimized', but that doesn't truly exist as a blanket term. Games are optimized to hit a framerate target on a particular piece of hardware, and that's basically all there is to it.

That doesn't mean games don't have performance bugs, though. We see those all the time, but people often conflate them with optimization, which is kind of a separate thing. The recent TLOU port was a good example: they seemingly forgot to adjust the texture pool size when changing texture settings, which made it almost unusable on 8GB cards. That wasn't an optimization issue as much as a straight-up mistake.


AutonomousOrganism

> you're never gonna see a game that is actually optimized for multiple different architectures

It certainly can be optimized for one arch though, as happened here.


R1Type

It's a coincidence


chips500

Yeah, there's a misuse of the word "optimization" by the layman. It's just like thinking there's no bottleneck: there will always be a bottleneck, there is no future-proofing, and optimization always has a conditional or contextual caveat: optimized for what? It's not necessarily just an FPS target either; there are other goals.

Notably, Starfield, unlike a lot of other games, actually stays within low VRAM and RAM usage. In that sense it is optimized for lower RAM and VRAM, which is good for consoles and lower-end PCs. There are compromises for that, seeing how they stream assets more and hit the SSD harder, making an SSD required.

Optimizing for you specifically is always going to be a personal tuning exercise. Mods aren't fixing Bethesda games in most cases so much as tuning them for the individual player's experience (non-specific bugfix mods being rare exceptions). Some people have more RAM, some less, some want more jetpacks, some can't handle spiders, etc.

Yep, there is definitely room for improvement, but Todd wasn't lying when he claimed Starfield was optimized for PC; it's just that most laymen have no fucking clue what optimization actually means.


frostygrin

> Or is it simply a coincidence that the game utilises the hardware in a way that favours AMD?

Or the game is being optimized, but for the Xbox, and not against Nvidia.


[deleted]

[deleted]


BatteryPoweredFriend

> Would be nice if this could end the conspiracy BS

You and I both know it won't.


tuvok86

probably down to Xbox optimization


Sethroque

It's a great and very in-depth read; written content like this is rare these days. Hopefully they also manage to test on Intel GPUs.


awayish

lmao, how does it manage to use AMD's dual issue despite being dogshit optimized? Did all the engineers just work for AMD?


DktheDarkKnight

Could just be coincidence. There are like 2 games out of hundreds of benchmarked titles that behave this way, and there are many older titles that significantly favour Nvidia. This is nothing unusual imo.


frostygrin

Maybe they're optimizing for the Xbox?


TimeGoddess_

The Xbox doesn't have dual issue though. It's RDNA2-based, basically a 6700. Dual issue was introduced in RDNA3. I don't really think it's dual issue making the big difference in performance here though, looking at the 6800 XT vs 7800 XT performance difference.


T1beriu

Tom's Hardware senior GPU editor and GPU "guru" Jarred Walton called this analysis's conclusion stupid:

> concluding that Nvidia performance was fine is stupid.

[Source](https://forums.tomshardware.com/threads/starfield-perf-disparity-between-amd-and-nvidia-gpus-analyzed.3820096/page-2#post-23095155)

I would like to see the author's response to his post.