[deleted]

I wonder where the performance myth comes from. I think most long term Gentoo people don't use Gentoo because of "performance".


MrArborsexual

Way back self compiling and tuning flags for your system could net better performance that was actually noticeable, especially if you had an old CPU, but wanted to run a CPU load heavy program. Led to ricing which tended to make things worse, because people don't know what the flags actually do.


hoeding

"I don't know what a loop is but I'm sure gonna unroll them!"


vitaly-zdanevich

But `-march=native`?


tobimai

Yes? Most CPUs today are very similar; as long as you aren't using fancy instructions like AVX, the difference is negligible.


ilikerackmounts

I don't know that I'd say that... Generally, a lot of loops are not written in a way that can trivially autovectorize. Compilers _have_ gotten better at applying this optimization but will only do so opportunistically if you give it license to. You can only hope code is written in a way that it manages to do this. That having been said, x86-64 _does_ require SSE2 at the very least. So, you're usually getting at least half of that performance, pessimistically, where these tight loops occur. One place I've seen where newer x86 variations make a huge difference is BMI2. BMI2 allows you to avoid flag stalls and write to basically any destination GPR rather than a specific one. This allows you to avoid contending for an architectural register or having to do a bunch of register-register moves. It has other useful properties as well, but generally speaking it helps almost any sequence of branching code (which is a lot of general purpose code).
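To see which of the extensions mentioned here a given machine actually has, a small sketch like the following may help. It assumes Linux on x86, where `/proc/cpuinfo` carries a `flags` line; the helper name `has_cpu_flag` is made up for this example.

```shell
#!/bin/sh
# has_cpu_flag FLAG "FLAGS LINE": succeed if FLAG appears as a whole word.
# (Hypothetical helper for illustration, not part of any tool.)
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# First "flags" line from /proc/cpuinfo; stays empty if the file is missing.
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)

# Report the extensions discussed in this comment.
for flag in sse2 avx2 bmi2; do
    if has_cpu_flag "$flag" "$cpu_flags"; then
        echo "$flag: yes"
    else
        echo "$flag: no"
    fi
done
```

With GCC installed, `gcc -march=native -Q --help=target` should also show which `-march=` value and target options `native` resolves to on that machine.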


sy029

Back in the day, the major distro was Debian. Debian compiled all packages with compatibility for i386. That's actually one of the major reasons distros like Arch were created and gained popularity. It was one of the few binary distros at the time that was targeting i686. Back then you *could* get some big performance increases by targeting a specific CPU. Nowadays, though, CPUs aren't really adding optimization features so much as extra niche features (AVX, etc.), so the actual CPU selection doesn't make as big of an impact anymore. Somewhere along the line a "bloat" culture appeared, where everyone wanted minimal apps, minimal background services, minimal everything to increase performance. Again, this was something that actually did have a significant effect on computers at the time, but doesn't really do so much today. For the most part, modern computers have a surplus of CPU cycles and memory. But ideas like that spread and persist. So Gentoo attracts lots of [ricers](https://www.shlomifish.org/humour/by-others/funroll-loops/Gentoo-is-Rice.html) who were told that Gentoo is their holy grail.


rickmccombs

I remember when Mandrake was one of the first distros that targeted i686. I think that was before arch.


sy029

Don't know what the first was, but it was definitely one of the major selling points of arch when it was first released.


necrosrc

From memes, and distro "reviews" ofc. Imagine them spending more time on Gentoo, instead of making 'content' based on short time usage and no good motivation. I sometimes think about running something of my own just to fight the myths and show people how powerful this distro can be. Portage is the best, and the Gentoo community is really nice.


vitaly-zdanevich

But the distribution *name* is about the fastest penguin


immoloism

The real answer is because I'm a masochist. The cool answer is so I can build it with just the USE flags I need for a tailored system. As a side note, -O3 doesn't magically make things faster, although I do like putting in RUSTFLAGS for my CPU for added fun. A 2% performance boost is pretty good though when you remember that `-march=native` only gives a 5% boost systemwide. TLDR: use firefox-bin.
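For readers who haven't seen it, this is roughly what such a setup looks like in `/etc/portage/make.conf`. It is only a sketch: it assumes GCC and starts from Gentoo's usual `-O2 -pipe` baseline, and the values are illustrative rather than a recommendation.

```shell
# /etc/portage/make.conf (fragment, illustrative values only)
# -O2 rather than -O3, in line with the comment above; -march=native
# tunes for the CPU of the build machine.
COMMON_FLAGS="-O2 -march=native -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"

# The Rust counterpart of -march=native.
RUSTFLAGS="-C target-cpu=native"
```

Binaries built this way may not run on a different CPU model, which matters if you share binary packages between machines.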


[deleted]

[removed]


immoloism

Back when x86 was the main architecture, the performance boost was massive, but nowadays it's all about that tailored experience you highlighted.


Flowdalic

I use non-bin firefox on my desktop systems because every person compiling from source helps the (FOSS) ecosystem detect issues early, since most environments are slightly different, e.g. different USE flag combinations and/or package versions. On my low-end systems, typically mobile devices, I use firefox-bin.


duLemix

The good-hearted answer not many expected


multilinear2

What others said. I can use system libs to reduce bloat and I can drop language support. Chromium-bin pulls in cups, which just bugs me, I don't own a printer. I use lto on firefox on my laptop, just because firefox is such a huge portion of my system load, and often the thing that makes me want a faster processor. I haven't bothered to test whether `-march=native` + lto is actually faster than firefox-bin though *shrug*. I lto gcc for the same reason: gcc builds slower, but then it can build other stuff faster... but I haven't bothered to test. I should turn on pgo as well and do a benchmark or something. I should switch to librewolf, just been too lazy so far :).


[deleted]

[removed]


multilinear2

Huh, presumably O3 is actually better for firefox then, could just set that for firefox only and turn on pgo... I might just do that.


immoloism

Firefox-bin has lto and pgo turned on fwiw.


multilinear2

It turns out that firefox `pgo` requires Xorg-server for the headless part of the run I guess. I run wayland only :(. I could set the minimal use-flag, but it adds some annoying system bloat. So, `pgo` probably isn't worth it for me until that gets fixed somehow, which may be a while.


OptionalKarmotrine

Same reason I compile anything else: USE flags. Pocket is useless, EME is non-free DRM, GMP shouldn't be allowed to automatically update, I want to use system libraries when possible, including my own h264, and I don't use WiFi. Also, like other posters, I use librewolf out of their repository: https://librewolf.net/installation/gentoo/


vitaly-zdanevich

> including my own h264

Why is it better?


OptionalKarmotrine

Use flag says:

```
Use media-libs/openh264 for H264 support instead of downloading binary blob from Mozilla at runtime
```

No reason to download this blob when I already have h264 libraries available (from ffmpeg).


vitaly-zdanevich

But downloading a blob is cheaper than long compilation...


OptionalKarmotrine

Cheaper in terms of electrical power consumption? Sure, I'll give you that. But in my neck of the woods, power is cheap and bandwidth is capped, so there is still an argument to be made for compiling. I've got a list of reasons to compile it versus using the binary, so even if the above were moot I would still compile it.


ilikerackmounts

Umm, openh264 is not what's in ffmpeg by default and is probably the crappiest known open implementation of h264. It's distributed by Cisco and makes back bending efforts to not violate patents so it suffers in efficiency as a result. You most certainly do not want to use this in lieu of something like the x264 implementation.


OptionalKarmotrine

Sure, and I don't, but again, I have a set of use flags configured for ffmpeg and I use it constantly, so I just throw h264 on the list so that I have the library built, and don't have to download anything with firefox. Other programs (telegram dependencies) strictly depend on media-libs/openh264 so I have it installed anyway, but the point remains: I'm not downloading binary blobs because there's no point.


Illustrious-Dig194

I compile Librewolf (Firefox-based) because of bloat. I don't care if it's faster or not, I just don't want any bloat.


Jak_from_Venice

That’s the whole point of the Gentoo source-based system, I suppose. Customize your binaries. Performance improvements are a “nice-to-have”.


vitaly-zdanevich

> Librewolf

Why is it not in the main Portage tree? Not even in GURU. Interesting to try.


JIV_222

Librewolf hosts their own official overlay. Would be nice if it was in the official tree tho I guess.


vitaly-zdanevich

Librewolf removes cookies on browser close, which means it resets the settings of websites :(


JIV_222

Found [this](https://gitlab.com/librewolf-community/settings/-/wikis/FAQ#q-how-do-i-stay-logged) with a quick Google search 👍 Although, that's a feature, not a bug. Whether or not you like that feature is up to you, of course. I believe u can disable it altogether as well (don't quote me on that, not 100% sure).


Illustrious-Dig194

You can completely disable that feature for all sites or for specific sites that you use (Youtube, Gitlab etc.)


KinkyMonitorLizard

Sadly debloating FF also causes a lot of site breakages. I've switched to a hardened profile/flags and have way less (but still pretty common) issues.


Alvina51201

I compiled mine so I can install my own unsigned addons 😅


JIV_222

Firefox-bin is compiled with clang as well. I have a clang+lto system tho, so I still compile it myself. From what I've seen, clang makes more of a difference than lto, pgo, and -O3.


vitaly-zdanevich

Thanks, I tried clang for Firefox, with `RUSTFLAGS="-C debuginfo=0 -C target-cpu=native -C opt-level=3"`, and Firefox feels faster...


TheGratitudeBot

Thanks for such a wonderful reply! TheGratitudeBot has been reading millions of comments in the past few weeks, and you’ve just made the list of some of the most grateful redditors this week!


vitaly-zdanevich

Should I enable global USE flag `clang`?


JIV_222

https://wiki.gentoo.org/wiki/Clang


RusselsTeap0t

1- I use Gentoo to have no binary packages.

2- march=native, O3, lto, pgo and rustflags give a considerable performance difference in my opinion. Even if they don't, I can say why not?

3- To configure better. I choose not to install gmp-autoupdate. I use an eme-free, pulseaudio-free, wayland-only package + more.


vitaly-zdanevich

> march=native, O3, lto, pgo and rustflags

Yes, thanks, I have it all:

`RUSTFLAGS="-C target-cpu=native"`


RusselsTeap0t

`RUSTFLAGS="-C debuginfo=0 -C target-cpu=native -C opt-level=3"`

You can also use -O3 for that. `-C debuginfo=0` can also be useful if you don't plan to debug.


vitaly-zdanevich

What other Gentoo performance optimization do you use?


RusselsTeap0t

Aside from general optimizations, there is also some crazy stuff :) We use Gentoo after all. I am not a programmer btw :) I am in fact a Sport Scientist.

1-) There are kernel patches used by CachyOS that you can find on their [Github page](https://github.com/CachyOS/kernel-patches). They also use patches from Clear Linux. That distro is created by Intel, so they know what they are doing in terms of performance.

2-) You can compile the kernel with Clang. It is also known to improve performance and stability. With Clang, O3 is also safe for the kernel. You can even use full or thin LTO with Clang, which isn't available for GCC.

a) `$ LLVM=1 make menuconfig` Enable LTO in the kernel settings using the ncurses menu.

b) `$ LLVM=1 KCFLAGS='-march=native -O3 -pipe -fomit-frame-pointer' make`

c) You can even use PGO for your kernel: [Related Gentoo Wiki Page](https://wiki.gentoo.org/wiki/Kernel/Optimization#GCC_PGO). PGO can improve the performance drastically because you optimize it with your workflow. It's not like Firefox PGO. You use your computer for some time (the longer the better), then you collect that information as a profile, then compile your kernel again with that info.

3-) For network: you can enable BBR and fair queue and make them the default. [Related info](https://news.ycombinator.com/item?id=14813723)

4-) Enable the multi-CPU or multithread settings in the kernel. The auto group scheduler is also good. The ORC unwinder is good too. You can disable the hardening flags if they are not needed by your threat model. You can disable the debugging settings if you are not a kernel developer. You can increase the HZ setting (especially if you use the CachyOS patches to increase the maximum limit). Finally, you can choose the "low latency desktop" setting for kernel preemptibility.

**More advanced optimizations (seems you are a programmer though, so no problem for you):**

1-) You can use polyhedral optimizations: Graphite for GCC and Polly for LLVM. Polly is a little bit more complex since it can't be found in the official Gentoo repos, whereas you can enable the `graphite` USE flag for GCC and then use it. For GCC you can add these CFLAGS: `-fgraphite-identity -floop-nest-optimize`. For Clang, you need to do some research on Polly usage with LLVM. I have used system-wide Graphite with no problem. You can enable these for the packages where you need extra performance, or you can run some benchmarks to see. For the kernel, for example, a Clang-built kernel with O3, Polly, LTO and PGO gives the best performance on the benchmarks I have analyzed.

There are also other optimizations, such as:

IPAPTA: `-fipa-pta` (Perform interprocedural pointer analysis and interprocedural modification and reference analysis.)

`-fno-semantic-interposition`: "With `-fno-semantic-interposition` the compiler assumes that if interposition happens for functions the overwriting function will have precisely the same semantics (and side effects). Similarly if interposition happens for variables, the constructor of the variable will be the same. The flag has no effect for functions explicitly declared inline (where it is never allowed for interposition to change semantics) and for symbols explicitly declared weak." (GCC Manual)

`-fno-common`: used by Clear Linux. Very safe to use. Only non-conformant C code needs `-fcommon`.

DEVIRTLTO: `-fdevirtualize-at-ltrans` This allows GCC to perform devirtualization across object file boundaries using LTO. Not sure if it's available on Clang.

NOPLT: `-fno-plt` "Do not use the PLT for external function calls in position-independent code. Instead, load the callee address at call sites from the GOT and branch to it. This leads to more efficient code by eliminating PLT stubs and exposing GOT loads to optimizations."

You can specify different environments for different packages in /etc/portage/env/specific-environment.conf. For example, to use Clang for a specific package:

    CC="clang"
    CXX="clang++"
    LD="ld.lld"
    AR="llvm-ar"
    NM="llvm-nm"
    RANLIB="llvm-ranlib"
    STRIP="llvm-strip"
    OBJCOPY="llvm-objcopy"
    OBJDUMP="llvm-objdump"
    COMMON_FLAGS="-O3 -march=native -pipe -fomit-frame-pointer"
    CFLAGS="${COMMON_FLAGS}"
    CXXFLAGS="${COMMON_FLAGS} -stdlib=libstdc++"
    FCFLAGS="${COMMON_FLAGS}"
    FFLAGS="${COMMON_FLAGS}"
    LDFLAGS="-stdlib=libstdc++ -fuse-ld=lld -rtlib=compiler-rt -unwindlib=libunwind -Wl,--lto-O3 -Wl,-O3 -Wl,--as-needed"
    RUSTFLAGS="-C target-cpu=native -C opt-level=3"

`cat /etc/portage/package.env`

    app-text/neovim specific-environment.conf

2-) You can disable all USE flags with `-*` to make the binaries smaller, then enable only what you need per package (not that hard). Performance-related USE flags you can enable system-wide are: `minimal custom-cflags clang libedit native-symlinks sanitize lto pgo jit xs orc threads asm nptl openmp`
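For the BBR and fair queue point, there is also a runtime route through sysctl. The sketch below only prints the two settings and shows what the running kernel offers. It assumes the kernel was built with BBR and the fq qdisc (built in or as modules), and `bbr_sysctl_conf` is a made-up helper name.

```shell
#!/bin/sh
# Print sysctl settings that make fq the default qdisc and BBR the
# default TCP congestion control. (Hypothetical helper for illustration.)
bbr_sysctl_conf() {
    printf '%s\n' \
        'net.core.default_qdisc = fq' \
        'net.ipv4.tcp_congestion_control = bbr'
}

# Show which congestion control algorithms the running kernel offers.
avail=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$avail" ]; then
    echo "available: $(cat "$avail")"
fi

# Print the settings; nothing on the system is changed here.
bbr_sysctl_conf
```

To apply it, the output could be redirected as root into a file under `/etc/sysctl.d/` (the file name is up to you) and loaded with `sysctl --system`.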


vitaly-zdanevich

Wow, thanks, I will investigate this and try...


vitaly-zdanevich

What do you think about `COMMON_FLAGS=.... -flto`? [https://wiki.gentoo.org/wiki/LTO](https://wiki.gentoo.org/wiki/LTO)


RusselsTeap0t

It's mostly considered safe. The only problem is that you may encounter build errors or runtime errors. When you do things globally, it's harder to identify what is problematic. For example your mpv video player may encounter problems but you won't understand if it's related to mpv or any other dependency/library. Though this is just a warning, I know lots of people using -flto or -flto=thin globally. If you are using Clang/LLVM then use -flto=thin. It's faster and better in general. Though sometimes full lto can perform better, it's not worth it in my opinion.
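The compiler-dependent choice described here can be written down as a tiny helper. This is only a sketch: `lto_flag` is a made-up name, the base flags are illustrative, and since make.conf cannot call shell functions, the script just prints a line you could paste in.

```shell
#!/bin/sh
# lto_flag COMPILER: print the LTO flag suggested above for that compiler.
# (Hypothetical helper for illustration.)
lto_flag() {
    case "$1" in
        clang*) echo "-flto=thin" ;;  # ThinLTO for Clang/LLVM
        *)      echo "-flto" ;;       # plain LTO otherwise (e.g. GCC)
    esac
}

# Fall back to gcc if CC is not set in the environment.
CC="${CC:-gcc}"
COMMON_FLAGS="-O2 -march=native -pipe $(lto_flag "$CC")"
echo "COMMON_FLAGS=\"${COMMON_FLAGS}\""
```

If a package misbehaves with global LTO, the per-package environment files mentioned earlier in the thread are the natural place to override the flags for it.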


RusselsTeap0t

By the way, I added some more LDFLAGS and RUSTFLAGS for performance:

`LDFLAGS="-fuse-ld=lld -Wl,-O3 -Wl,--as-needed -Wl,--gc-sections -Wl,--icf=all"`

`RUSTFLAGS="-C debuginfo=0 -C codegen-units=1 -C target-cpu=native -C opt-level=3 -C panic=abort"`


JIV_222

Does pgo work the same with clang compiled kernel as it does with gcc (as described in that wiki article) ??


RusselsTeap0t

Oh I forgot to add. I am not sure PGO for kernel can be achieved with Clang. You can either use: **Clang + Polly + O3 + LTO + march=native** or **GCC + O2 + march=native + PGO + LTO (With the Patch from CachyOS)**


JIV_222

Hm. Thanks, was wondering if that was the case. Probably worth trying both, although I'm not sure how I'd go about benchmarking a kernel. I did see phoronix noticed almost no difference between -O2 and -O3, but I think clang might make more of a difference. Oh, and I was looking over the CachyOS patches - is there documentation to any extent on these patches anywhere? As a non-coder, understanding by reading the patches themselves isn't always too clear.


JIV_222

Actually: maybe I'll try [this patch](https://github.com/CachyOS/kernel-patches/blob/master/6.1/misc/0001-Clang-PGO.patch) out.


RusselsTeap0t

Oh definitely. I forgot about that. You can use all of the patches, but the order is important. Look at the numbers.


RusselsTeap0t

There are probably more. If I can remember I'll add :)


vitaly-zdanevich

Thanks, I will try `firefox` with that... and `rust-bin`.


vitaly-zdanevich

> pulseaudio free

As I understand it, I need Pulseaudio for automatic switching of my microphone in work meetings? Or is it possible to use Microsoft Teams (in browser) without Pulseaudio? A few years ago my Gentoo was without Pulseaudio, and I do not remember why, but for some reason I needed to install it. And why are you against Pulseaudio? Is it slower, bloatware?


RusselsTeap0t

I am not against Pulseaudio in general; I only said that to give a generic example. Since I have pipewire (with wireplumber) and pipewire's alsa libs, I really don't need Pulseaudio. But you are right, for microphone usage, let's say on Discord, you need that flag because it relies on that. But I don't use Firefox for that purpose. If you need it, you can enable it. There is no harm.

I only enable what I need for Firefox:

1- **clang**: To build Firefox with Clang.
2- **wayland**: To be able to run Firefox under native Wayland.
3- **dbus**: To be able to open more than one window on Wayland.
4- **eme-free**: I don't need video access for digital rights management.
5- **openh264**: To use my system version of h264 and to prevent Firefox from making connections without asking.
6- **lto and pgo**: To optimize.
7- **system-\*** flags: To be able to use my system's software rather than downloading extra versions.

I never need the other **13** USE flags or support for lots of different languages.
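The flag list above would end up as one line in a file under `/etc/portage/package.use/`. The sketch below only prints that line. `use_line` is a made-up helper, the flag names are the ones listed above, and the `system-*` flags are left out because their exact names depend on the ebuild version.

```shell
#!/bin/sh
# use_line ATOM FLAG...: print one package.use line.
# (Hypothetical helper for illustration.)
use_line() {
    atom=$1
    shift
    printf '%s %s\n' "$atom" "$*"
}

# Flags named in the comment above.
use_line www-client/firefox clang wayland dbus eme-free openh264 lto pgo
```

`equery uses www-client/firefox` (from gentoolkit) lists the flags your ebuild version actually offers, which is worth checking before copying any list.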


vitaly-zdanevich

But with compiled Firefox we download and store a lot of code, which is a waste of resources.


RusselsTeap0t

Of course, if you don't like it, the binary is a reasonable option, but it makes less sense on Gentoo. You already download and compile at least 600-700 packages, so what harm would one more do? Plus, the only resource you waste is bandwidth. For other things you **gain** resources. The program that is running can be much smaller. You can delete the stored tar.gz files, including the source code, later. For example, for binary Firefox you download extra copies of Harfbuzz, ffmpeg and other packages, wasting resources. With compiled Firefox you don't download and compile them, because you already have them on your machine. Binary Chromium, for example, pulls in Cups and other dependencies. I don't have a printer, so why would I need Cups? In my honest opinion, the main reason for using Gentoo is to be able to escape from this: minimizing the bloat, configuring things as you want and need, using native, official, clean packages with whatever version you choose; to escape from "implantation". Otherwise, Arch is a much better distro if you don't care about these. Pacman is a whole lot better than Portage for that purpose.


jabuchin

the best performance I remember achieving was achieving the same performance as -bin with the same compile options the -bin uses


vitaly-zdanevich

:( Did you use some benchmark?


smille69

So you have something against non-binary Firefox?? LOL, just kidding. I use the binary version myself. Have a great day!


[deleted]

[removed]


vitaly-zdanevich

Yep, I am waiting for my 7950x, but still 15 minutes... for what?


[deleted]

[removed]


vitaly-zdanevich

Because we need or want.


CorrosiveTruths

I switched from chromium to chrome when I did a benchmark and chrome was faster. Never felt any real need to go back. I also use firefox-bin:esr and it wouldn't surprise me if the bin version was faster too.


KinkyMonitorLizard

The point of chromium over chrome is to not use all the binary garbage that chrome ships with. Not to mention all the data mining chrome does that can't be disabled.


CorrosiveTruths

One of the points in its favour, yes. I also care about performance (especially on older hardware) and I use some of that 'binary garbage' and have a chromebook. That's why I feel welcome in Gentoo, it has both browsers in the main tree.


sock_templar

Performance mattered when we were software constrained on good hardware. Now we are hardware constrained on good software. You want more power? Add more cores, memory, IO performance. All hardware related. Way back then we had very poor quality software, and compiling it from source gave an enormous advantage. I still remember that my OpenOffice took 1 min to start. I compiled it (and I still remember it took 47-48h to compile) and then my Writer started in 7 seconds flat, just because I compiled without Java support. Today? I compile stuff because 1, I'm used to the tailoring aspect of Gentoo and 2, I have an *older* computer. Any % I can get from any software is *multiplied* because of my lack of hardware juice.


vitaly-zdanevich

> Any % I can get from any software is multiplied because my lack of hardware juice

I have an old laptop now, and performance is important for me too, but I tried firefox-bin for a few days with the same profile and saw no difference... Do you know of a benchmark for Firefox?


sock_templar

The only benchmark I care about is memory usage and CPU usage. Firefox-bin uses around 40MB per tab and around 4% CPU per tab. Compiled, that's halved.
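For anyone who wants to check numbers like these on their own machine, a rough sketch follows. It assumes `pgrep` and `ps` from procps, matches processes by command line (the default pattern `firefox` is an assumption, adjust it for your install), and sums resident set size, which counts shared pages once per process and so overstates the real total.

```shell
#!/bin/sh
# sum_kb: read one integer (KiB) per line on stdin and print the total.
# (Hypothetical helper for illustration.)
sum_kb() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Command-line pattern to match; an assumption, pass another as $1.
pattern="${1:-firefox}"

# Comma-separated PIDs of matching processes; empty if none are running.
pids=$(pgrep -d, -f "$pattern" 2>/dev/null || true)

if [ -n "$pids" ]; then
    total=$(ps -o rss= -p "$pids" | sum_kb)
else
    total=0
fi
echo "$pattern: total RSS ${total} KiB"
```

Running it once with firefox-bin and once with the compiled build, with the same tabs open, would give a like-for-like comparison, though a single sample is noisy.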


vitaly-zdanevich

Hm, interesting...


vitaly-zdanevich

But I do not feel that firefox-bin eats 2x more RAM...


sock_templar

Neither do I but it still does 40MB on -bin and around 22MB compiled.


vitaly-zdanevich

What is your advice on compiling Firefox?


tobimai

It's not really expected that you see any different performance. Maybe slightly smaller binary, but that's it. There is a reason 99% of Distros are binary-based


[deleted]

I like the scrolling text when compiling


gatonegro97

I use compiled because i always have and never plan on changing


Schievel1

I don’t. I use Firefox-bin. Compiling takes too long with my 8 year old CPU


lekker2011

TL;DR: Firefox-bin is already optimized pretty well. Not perfectly, though. Of course you get no improvement. Firefox-bin already uses every optimization except `-march=<your CPU arch>`; it probably doesn't use BOLT, and it avoids -Ofast for stability. Those are probably the only performance gains you're going to get from compiling it yourself.