Moving GPU drivers out of the initramfs
Apr. 29th, 2024 02:59 pm
The firmware which drm/kms drivers need keeps getting bigger, and there is a push to move to generating a generic initramfs on the distro's builders and signing it with the distro's keys for security reasons. When targeting desktops/laptops (as opposed to VMs) this means including firmware for all possible GPUs, which leads to a very big initramfs.
This has made me think about dropping the GPU drivers from the initramfs and instead making plymouth work well/better with simpledrm (on top of efifb). A while ago I discussed making this change for Fedora with the Red Hat graphics team. Spoiler: for now nothing is going to change.
Let me repeat that: For now there are no plans to implement this idea so if you believe you would be impacted by such a change: Nothing is going to change.
Still this is something worthwhile to explore further.
Advantages:
1. Smaller initramfs size:
* E.g. a host specific initramfs with amdgpu goes down from 40MB to 20MB
* No longer need to worry about Nvidia GSP firmware size in initrd
* This should also significantly shrink the initrd used in liveimages
2. Faster boot times:
* Loading + unpacking the initrd can take a surprising amount of time. E.g. on my old AMD64 embedded PC (with BobCat cores) the reduction of 40MB -> 20MB in initrd size shaves approx. 3 seconds off the initrd load time + 0.6 seconds off the time it takes to unpack the initrd
* Probing drm connectors can be slow and plymouth blocks the initrd -> rootfs transition while it is busy probing
3. Earlier showing of splash. By using simpledrm for the splash the splash can be shown earlier, avoiding the impression the machine is hanging during boot. An extreme example of this is my old AMD64 embedded PC, where the time to show the first frame of the splash goes down from 47 to 9 seconds.
4. One less thing to worry about when trying to create a uniform desktop pre-generated and signed initramfs (these would still need support for nvme + ahci and commonly used rootfs + lvm + luks).
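As a quick sanity check, the figures quoted in the list above for the old AMD64 embedded PC can be added up. This is only arithmetic on those numbers; the names used are made up for this sketch, and the post does not break down where the rest of the splash improvement comes from.

```python
# Figures quoted in the list above (old AMD64 embedded PC with BobCat cores).
INITRD_BEFORE_MB = 40
INITRD_AFTER_MB = 20
LOAD_SAVED_S = 3.0     # approx. initrd load time saved
UNPACK_SAVED_S = 0.6   # initrd unpack time saved
SPLASH_BEFORE_S = 47   # time to first splash frame, GPU driver in the initrd
SPLASH_AFTER_S = 9     # time to first splash frame, splash on simpledrm


def summarize() -> dict:
    """Derive the totals implied by the quoted figures."""
    size_delta_mb = INITRD_BEFORE_MB - INITRD_AFTER_MB
    return {
        "size_reduction_pct": 100 * size_delta_mb / INITRD_BEFORE_MB,
        "initrd_time_saved_s": LOAD_SAVED_S + UNPACK_SAVED_S,
        "splash_earlier_s": SPLASH_BEFORE_S - SPLASH_AFTER_S,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:g}")
```

On that machine the smaller initrd saves about 3.6 seconds of load + unpack time, while the first splash frame shows up 38 seconds earlier.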
Disadvantages:
Doing this will lead to user visible changes in the boot process:
1. Secondary monitors not lit up by the efifb will stay black during full-disk encryption password entry, since the GPU drivers will now only load after switching to the encrypted root. This includes any monitors connected to the non boot GPU in dual GPU setups.
Generally speaking this is not really an issue: the secondary monitors will light up pretty quickly after the switch to the real rootfs. However, when booting a docked laptop with the lid closed, where the only visible monitor(s) are connected to the non boot GPU, the full-disk encryption password dialog will simply not be visible at all.
This is the main deal-breaker and the reason for not implementing this change.
Note that because of the strict version lock between kernel driver and userspace with the nvidia binary drivers, those drivers are usually already not part of the initramfs. So for their users this problem already exists, and moving the GPU drivers out of the initramfs does not really make this worse.
2. With simpledrm plymouth does not get the physical size of the monitor, so plymouth will need to switch to using heuristics on the resolution instead of DPI info to decide whether or not to use hidpi (e.g. 2x size) rendering. Even after switching to the real GPU driver, plymouth needs to stay with its initial heuristics-based decision. Otherwise the scaling would change when switching to the real driver, which would lead to a big visual glitch / change halfway through the boot.
This may result in a different scaling factor for some setups, but I do not expect this really to be an issue.
3. On some (older) systems the efifb will not come up in native mode, but rather in 800x600 or 1024x768.
This will lead to a pretty significant discontinuity in the boot experience when switching from say 800x600 to 1920x1080 while plymouth was already showing the spinner at 800x600.
One possible workaround here is to add: 'video=efifb:auto' to the kernel commandline which will make the efistub switch to the highest available resolution before starting the kernel. But it seems that the native modes are simply not there on systems which come up at 800x600 / 1024x768 so this does not really help.
This does not actually break anything but it does look a bit ugly. So we will just need to document this as an unfortunate side-effect of the change and then we (and our users) will have to live with this (on affected hardware).
4. On systems where a full modeset is done, the monitor going briefly black from the modeset will move from being just before plymouth starts to the switch from simpledrm to the real driver. So that is slightly worse. IMHO the answer here is to try and get fast modesets working on more systems.
5. On systems where the efifb comes up in the panel's native mode and a fast modeset can be done, the spinner will freeze for a (noticeable) fraction of a second as the switch to the real driver happens.
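The resolution-based hidpi decision from item 2 above can be sketched roughly as follows. This is a minimal illustration and not plymouth's actual code: the thresholds, `guess_scale` and `SplashScale` are all hypothetical names and values picked for the example. It shows both halves of the idea: guess the scale from the resolution alone, and keep that first guess when the real driver takes over.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds, chosen for illustration only; these are not
# plymouth's actual values.
HIDPI_MIN_WIDTH = 2560
HIDPI_MIN_HEIGHT = 1440


def guess_scale(width_px: int, height_px: int) -> int:
    """Guess a scale factor from the resolution alone (no physical size)."""
    if width_px >= HIDPI_MIN_WIDTH and height_px >= HIDPI_MIN_HEIGHT:
        return 2
    return 1


@dataclass
class SplashScale:
    """Remembers the first scale decision so it survives the driver switch."""

    scale: Optional[int] = None

    def on_display(self, width_px: int, height_px: int) -> int:
        # Decide only once: the first display seen is the simpledrm one.
        # Later calls (after the real driver loads) reuse that decision.
        if self.scale is None:
            self.scale = guess_scale(width_px, height_px)
        return self.scale
```

With this sketch, a splash that starts on an 800x600 efifb keeps scale 1 even if the real driver later switches to 3840x2160, which avoids the scaling changing halfway through the boot.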
Preview:
To get an impression of what this will look / feel like on your own systems, you can implement this right now on Fedora 40 with some manual configuration changes:
1. Create /etc/dracut.conf.d/omit-gpu-drivers.conf with:
omit_drivers+=" amdgpu radeon nouveau i915 "
And then run "sudo dracut -f" to regenerate your current initrd.
2. Add to kernel commandline: "plymouth.use-simpledrm"
3. Edit /etc/selinux/config and set SELINUX=permissive. This is necessary because ATM plymouth has issues with accessing drm devices after the chroot from the initrd to the rootfs.
Note this all assumes EFI booting with efifb used to show the plymouth boot splash. For classic BIOS booting it is probably best to stick with having the GPU drivers inside the initramfs.
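To double check that the three preview steps above are in place, something like the following could be used. It is only a sketch: `omitted_drivers` and `preview_problems` are hypothetical helper names, and the parsing is deliberately naive (it only understands a quoted `omit_drivers` line and a plain `SELINUX=permissive` line).

```python
import re

GPU_DRIVERS = ("amdgpu", "radeon", "nouveau", "i915")
SIMPLEDRM_FLAG = "plymouth.use-simpledrm"


def omitted_drivers(conf_text: str) -> set:
    """Collect driver names from omit_drivers+="..." lines in a dracut config."""
    names = set()
    pattern = r'^\s*omit_drivers\+?="([^"]*)"'
    for match in re.finditer(pattern, conf_text, re.MULTILINE):
        names.update(match.group(1).split())
    return names


def preview_problems(conf_text: str, cmdline: str, selinux_conf: str) -> list:
    """Return a list of human readable problems; empty means all steps are done."""
    problems = []
    omitted = omitted_drivers(conf_text)
    missing = [drv for drv in GPU_DRIVERS if drv not in omitted]
    if missing:
        problems.append("drivers not omitted: " + " ".join(missing))
    if SIMPLEDRM_FLAG not in cmdline.split():
        problems.append("kernel commandline lacks " + SIMPLEDRM_FLAG)
    if not re.search(r"^\s*SELINUX=permissive\s*$", selinux_conf, re.MULTILINE):
        problems.append("SELINUX is not set to permissive")
    return problems
```

The three arguments would be the contents of /etc/dracut.conf.d/omit-gpu-drivers.conf, /proc/cmdline and /etc/selinux/config. Note that this does not check whether the initrd was actually regenerated with "sudo dracut -f" afterwards.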
no subject
Date: 2024-05-01 04:09 pm (UTC)
1. You have a core initramfs, which has no GPU drivers
2. You have an overlay initramfs for each GPU driver (amdgpu, radeon, nouveau, nova, i915, xe, etc)
3. You have an overlay initramfs for each major hardware group's firmware files (e.g. one for Vega, one for RDNA1, one for Turing, one for Ada, etc)
It would then likely be the bootloader's responsibility to either detect dynamically on boot which overlays are needed, or to read the list from a bootloader config which is written while the system is running (e.g. when running update-grub in the case of GRUB2).
I imagine there would be a fair bit of kernel plumbing necessary to allow having "overlay" initramfs filesystems loaded, but the end result would be that your system only loads what it needs, while still loading the stuff it does need in the initramfs phase.
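The selection step of this overlay idea could look something like the sketch below. Everything in it is hypothetical: the image file names, the two lookup tables and `images_to_load` are invented for illustration, and how the GPU and its hardware group get detected is left out entirely.

```python
CORE_IMAGE = "initramfs-core.img"  # hypothetical file name

# Hypothetical overlay file names, one per GPU driver.
DRIVER_OVERLAYS = {
    "amdgpu": "initramfs-drv-amdgpu.img",
    "nouveau": "initramfs-drv-nouveau.img",
    "i915": "initramfs-drv-i915.img",
}

# Hypothetical firmware overlays, one per major hardware group.
FIRMWARE_OVERLAYS = {
    "vega": "initramfs-fw-vega.img",
    "rdna1": "initramfs-fw-rdna1.img",
    "turing": "initramfs-fw-turing.img",
}


def images_to_load(gpus) -> list:
    """Return the core image plus the overlays needed for the given GPUs.

    gpus is an iterable of (driver, hardware_group) pairs; the group may be
    None for drivers which need no firmware overlay.
    """
    images = [CORE_IMAGE]
    for driver, group in gpus:
        candidates = (DRIVER_OVERLAYS.get(driver), FIRMWARE_OVERLAYS.get(group))
        for image in candidates:
            # Skip unknown entries and avoid listing an overlay twice.
            if image is not None and image not in images:
                images.append(image)
    return images
```

For a system with a single Vega GPU, `images_to_load([("amdgpu", "vega")])` would list the core image followed by the amdgpu driver overlay and the Vega firmware overlay; this is the list which would be either computed at boot or saved into the bootloader config.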
no subject
Date: 2024-05-02 10:57 pm (UTC)
no subject
Date: 2024-05-06 09:54 am (UTC)
no subject
Date: 2024-05-16 07:19 am (UTC)
Why not store the required firmware on that partition with some signature mechanism (a signature stored in the initramfs for example) to verify its validity?
This would reduce the initramfs size, but also allow loading the firmware before going to the disk decryption.
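A very rough sketch of the verification half of this suggestion is shown below. As a simplifying assumption it uses plain SHA-256 digests in a manifest, which is assumed to live inside the signed initramfs, as a stand-in for "some signature mechanism"; `firmware_is_trusted` and the manifest layout are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def firmware_is_trusted(name: str, data: bytes, manifest: dict) -> bool:
    """Check a firmware blob against the digest recorded for it.

    manifest maps firmware file names to expected hex digests. It is assumed
    to come from the signed initramfs, so it can be trusted even though the
    firmware file itself is stored outside of the initramfs.
    """
    expected = manifest.get(name)
    if expected is None:
        return False  # unknown firmware files are rejected
    return hmac.compare_digest(sha256_hex(data), expected)
```

Only firmware which passes this check would be handed to the driver; anything missing from the manifest or modified on disk is refused.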