Passing `-l$NIX_BUILD_CORES` improperly limits the overall system load.
For a build machine which is configured to run `$B` builds where each
build gets `total cores / B` cores (`$C`), passing `-l $C` to make will
improperly limit the load to `$C` instead of `$B * $C`.
This effect becomes quite pronounced on machines with 80 cores, with
40 simultaneous builds and a cores limit of 2. On a machine with this
configuration, Nix will run 40 builds and make will limit the overall
system load to approximately 2. A build machine with this many cores
can happily run with a load approaching 80.
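The arithmetic above can be sketched as follows. This is a simplified
model, not make's actual implementation: it assumes each make process
compares the system-wide load average against its own `-l` value before
starting further jobs. The function and variable names are hypothetical.

```python
# Simplified model of the 80-core example; names are illustrative only.
total_cores = 80
builds = 40                                   # $B: simultaneous builds
cores_per_build = total_cores // builds       # $C: cores per build

per_make_limit = cores_per_build              # value each make receives via -l $C
intended_ceiling = builds * cores_per_build   # load the machine could sustain ($B * $C)


def may_start_another_job(system_load: float, load_limit: float) -> bool:
    """Assumed check: start more jobs only while the system-wide load
    average is below this make's limit."""
    return system_load < load_limit


# At a system load of 10, far below what 80 cores can handle, every make
# that was given -l 2 already refuses to start further jobs.
print(cores_per_build, intended_ceiling)
print(may_start_another_job(10.0, per_make_limit))
print(may_start_another_job(10.0, intended_ceiling))
```

Because the load average is a system-wide figure, every one of the `$B`
make processes measures the same number against its small per-build
limit, which is why the limit behaves like `$C` rather than `$B * $C`.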
A non-solution is to oversubscribe the machine, by picking a larger
`$C`. However, there is no way to divide the number of cores in a way
which fairly subdivides the available cores when `$B` is greater than
1.
There has been exploration of passing a jobserver into the sandbox,
or sharing a jobserver between all the builds. This is one option, but
it is relatively complicated and only supports make. Lots of other
software uses its own implementation of `-j` and doesn't support either
`-l` or the Make jobserver.
For the case of an interactive user machine, the user should limit
overall system load using `$B`, `$C`, and optionally systemd's
cpu/network/io limiting features.
Making this change should significantly improve the utilization of our
build farm, and improve the throughput of Hydra.
These two tests regularly create problems for my Hydra instance,
because its builders run on ZFS, which makes them fail consistently.
The issue has something to do with Unicode normalization. My pools have
formD normalization configured, which might be the culprit in this case.
Closes: #185882
Nested attrsets don't get built when running `nix-build -A git.tests`, so we use the update operator to add the attributes from `tests.fetchgit` to `passthru.tests`.
however *do* provide a `passthru.tests.withInstallCheck`.
doInstallCheck takes a ridiculous amount of time on darwin, making
staging builds ever more painful.
The following failure was visible on v2.33.1 on staging-next.
t5003-archive-zip.sh ............................... 1/?
not ok 1 - populate workdir
t5003-archive-zip.sh ............................... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/76 subtests
(less 58 skipped subtests: 17 okay)
Occasionally the test fails for unknown reasons but works as soon as we
change the derivation in any way. For now it is better to have no test
than a flaky one that occasionally breaks and continuously wastes time
on debugging.
This was found as part of a random build failure of gitMinimal in
response to the systemd v249 PR being merged [0].
[0] 64556974b6 (commitcomment-56385360)
Previously the test scripts used /bin/sh, which is a bit of an impurity.
It is mostly well-behaved, but it essentially leaks the host's state into
the build, as /bin/sh points to some minimal shell implementation
configured on the host OS.
By patching the shebangs of all the test scripts in the test folder
(t/*.sh) we can make sure that they run with the correct shell binary.
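As a rough illustration of what patching a shebang means, here is a
minimal sketch. It is not the mechanism used in this change (the message
does not say which one was used); the function name and the replacement
shell path are hypothetical placeholders.

```python
def rewrite_shebang(script_text: str, shell_path: str) -> str:
    """Replace a leading '#!/bin/sh' with '#!<shell_path>', keeping any
    interpreter arguments. Scripts with other interpreters are untouched."""
    prefix = "#!/bin/sh"
    first_line, sep, rest = script_text.partition("\n")
    # Match '#!/bin/sh' exactly or followed by arguments, but not e.g. '#!/bin/shfoo'.
    if first_line == prefix or first_line.startswith(prefix + " "):
        return "#!" + shell_path + first_line[len(prefix):] + sep + rest
    return script_text


# Hypothetical shell location, standing in for the build's own shell binary.
example = rewrite_shebang("#!/bin/sh\necho hi\n", "/hypothetical/store/bin/sh")
print(example)
```

Applied to every `t/*.sh` file, a rewrite like this makes the scripts
independent of whatever the host installs at /bin/sh.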
This was found as part of a random build failure of gitMinimal in
response to the systemd v249 PR being merged [0]. Since we have to
somehow touch the hash of the derivation to make the build failure go
away we might as well fix the hardcoded /bin/sh issue.
[0] 64556974b6 (commitcomment-56385360)