For NVLink topology systems we need fabricmanager. Fabricmanager itself is
dependent on the datacenter driver set and not the regular x11 ones, it is also
tightly tied to the driver version. Furhtermore the current cudaPackages
defaults to version 11.8, which corresponds to the 520 datacenter drivers.
Future improvement should be to switch the main nvidia datacenter driver version
on the `config.cudaVersion` since these are well known from:
> https://docs.nvidia.com/deploy/cuda-compatibility/index.html#use-the-right-compat-package
This adds nixos configuration options `hardware.nvidia.datacenter.enable` and
`hardware.nvidia.datacenter.settings` (the settings configure fabricmanager)
Other interesting external links related to this commit are:
* Fabricmanager download site:
- https://developer.download.nvidia.com/compute/cuda/redist/fabricmanager/linux-x86_64/
* Data Center drivers:
- https://www.nvidia.com/Download/driverResults.aspx/193711/en-us/
Implementation specific details:
* Fabricmanager is added as a passthru package, similar to settings and
presistenced.
* Adds `use{Settings,Persistenced,Fabricmanager}` with defaults to preserve x11
expressions.
* Utilizes mkMerge to split the `hardware.nvidia` module into three comment
delimited sections:
1. Common
2. X11/xorg
3. Data Center
* Uses asserts to make the configurations mutualy exclusive.
Notes:
* Data Center Drivers are `x86_64` only.
* Reuses the `nvidia_x11` attribute in nixpkgs on enable, e.g. doesn't change it
to `nvidia_driver` and sets that to either `nvidia_x11` or `nvidia_dc`.
* Should have a helper function which is switched on `config.cudaVersion` like
`selectHighestVersion` but rather `selectCudaCompatibleVersion`.
nvidia_x11_open: unbreak 5.19
common kernel code is shared, if the closed build is broken, so is the open one.
nvidia_x11_production: add alias, sort names
nvidia_x11: reintroduce stable as a pure alias
nvidia_x11: don't use alias in override
- This is fetched from a different URL, so allow passing that explicitly.
- There also isn't an nvidia-persistenced or nvidia-settings release for
this version, so use 450.57 instead. Also implement passing
persistenced and settings version explicitly.
Co-authored-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>
In some contexts, we don’t want to have to build the whole i686 stdenv
just to use the x86_64 nvidia driver. It’s hard to know ahead of time
what we want, so it’s best to leave this as an overridable option.
I now no longer use an nvidia card commonly, so it would be harder for
me to test at least a bit. And I'm overcommited anyway.
Hopefully someone else can be found.
Fixes the issue: https://github.com/NixOS/nixpkgs/issues/39149
Problem was that the Nvidia driver did not find the libxcb-glx at runtime.
(cherry picked from commit bda072cafcef4cf5ff99828852ddc8e06ce1fdbf)