One of the module that already supports the systemd-confinement module
is public-inbox. However with the changes to support DynamicUser and
ProtectSystem, the module will now fail at runtime if confinement is
enabled (it's optional and you'll need to override it via another
module).
The reason is that the RootDirectory is set to /var/empty in the
public-inbox module, which doesn't work well with the InaccessiblePaths
directive we now use to support DynamicUser/ProtectSystem.
To make this issue more visible, I decided to just change the priority
of the RootDirectory option definiton the default override priority so
that whenever another different option is defined, we'll get a conflict
at evaluation time.
Signed-off-by: aszlig <aszlig@nix.build>
Our more thorough parametrised tests uncovered that with the changes for
supporting DynamicUser, we now have the situation that for static users
the root directory within the confined environment is now writable for
the user in question.
This is obviously not what we want and I'd consider that a regression.
However while discussing this with @ju1m and my suggestion being to
set TemporaryFileSystem to "/" (as we had previously), they had an even
better idea[1]:
> The goal is to deny write access to / to non-root users,
>
> * TemporaryFileSystem=/ gives us that through the ownership of / by
> root (instead of the service's user inherited from
> RuntimeDirectory=).
> * ProtectSystem=strict gives us that by mounting / read-only (while
> keeping its ownership to the service's user).
>
> To avoid the incompatibilities of TemporaryFileSystem=/ mentioned
> above, I suggest to mount / read-only in all cases with
> ReadOnlyPaths = [ "+/" ]:
>
> ...
>
> I guess this would require at least two changes to the current tests:
>
> 1. to no longer expect root to be able to write to some paths (like
> /bin) (at least not without first remounting / in read-write
> mode).
> 2. to no longer expect non-root users to fail to write to certain
> paths with a "permission denied" error code, but with a
> "read-only file system" error code.
I like the solution with ReadOnlyPaths even more because it further
reduces the attack surface if the user is root. In chroot-only mode this
is especially useful, since if there are no other bind-mounted paths
involved in the unit configuration, the whole file system within the
confined environment is read-only.
[1]: https://github.com/NixOS/nixpkgs/pull/289593#discussion_r1586794215
Signed-off-by: aszlig <aszlig@nix.build>
these changes were generated with nixq 0.0.2, by running
nixq ">> lib.mdDoc[remove] Argument[keep]" --batchmode nixos/**.nix
nixq ">> mdDoc[remove] Argument[keep]" --batchmode nixos/**.nix
nixq ">> Inherit >> mdDoc[remove]" --batchmode nixos/**.nix
two mentions of the mdDoc function remain in nixos/, both of which
are inside of comments.
Since lib.mdDoc is already defined as just id, this commit is a no-op as
far as Nix (and the built manual) is concerned.
now nix-doc-munge will not introduce whitespace changes when it replaces
manpage references with the MD equivalent.
no change to the manpage, changes to the HTML manual are whitespace only.
In issue #157787 @martined wrote:
Trying to use confinement on packages providing their systemd units
with systemd.packages, for example mpd, fails with the following
error:
system-units> ln: failed to create symbolic link
'/nix/store/...-system-units/mpd.service': File exists
This is because systemd-confinement and mpd both provide a mpd.service
file through systemd.packages. (mpd got updated that way recently to
use upstream's service file)
To address this, we now place the unit file containing the bind-mounted
paths of the Nix closure into a drop-in directory instead of using the
name of a unit file directly.
This does come with the implication that the options set in the drop-in
directory won't apply if the main unit file is missing. In practice
however this should not happen for two reasons:
* The systemd-confinement module already sets additional options via
systemd.services and thus we should get a main unit file
* In the unlikely event that we don't get a main unit file regardless
of the previous point, the unit would be a no-op even if the options
of the drop-in directory would apply
Another thing to consider is the order in which those options are
merged, since systemd loads the files from the drop-in directory in
alphabetical order. So given that we have confinement.conf and
overrides.conf, the confinement options are loaded before the NixOS
overrides.
Since we're only setting the BindReadOnlyPaths option, the order isn't
that important since all those paths are merged anyway and we still
don't lose the ability to reset the option since overrides.conf comes
afterwards.
Fixes: https://github.com/NixOS/nixpkgs/issues/157787
Signed-off-by: aszlig <aszlig@nix.build>
bind mounting directories into the nix-store breaks nix commands.
In particular it introduces character devices that are not supported
by nix-store as valid files in the nix store. Use `/var/empty` instead
which is designated for these kind of use cases. We won't create any
files beause of the tmpfs mounted.
serviceConfig.ProtectSystem is usually a string so if set, the assert
itself would error out leaving no useable trace:
# nixos-rebuild switch --show-trace
building Nix...
building the system configuration...
error: while evaluating the attribute 'config.system.build.toplevel' at /nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/system/activation/top-level.nix:293:5:
while evaluating 'foldr' at /nix/var/nix/profiles/per-user/root/channels/nixos/lib/lists.nix:52:20, called from /nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/system/activation/top-level.nix:128:12:
while evaluating 'fold'' at /nix/var/nix/profiles/per-user/root/channels/nixos/lib/lists.nix:55:15, called from /nix/var/nix/profiles/per-user/root/channels/nixos/lib/lists.nix:59:8:
while evaluating anonymous function at /nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/system/activation/top-level.nix:121:50, called from undefined position:
while evaluating the attribute 'assertion' at /nix/var/nix/profiles/per-user/root/channels/nixos/nixos/modules/security/systemd-confinement.nix:163:7:
value is a string while a Boolean was expected
Fix the check to give a sensible assert message instead; the attribute
should either be not set or false bool to pass.
Closes: #99000
systemd-confinement's automatic package extraction does not work correctly
if ExecStarts ExecReload etc are lists.
Add an extra flatten to make things smooth.
Fixes#96840.
Systemd ProtectSystem is incompatible with the chroot we make
for confinement. The options is redundant with what we do anyway
so warn if it had been set and advise to disable it.
Merges: https://github.com/NixOS/nixpkgs/pull/87420
So far we had MountFlags = "private", but as @Infinisil has correctly
noticed, there is a dedicated PrivateMounts option, which does exactly
that and is better integrated than providing raw mount flags.
When checking for the reason why I used MountFlags instead of
PrivateMounts, I found that at the time I wrote the initial version of
this module (Mar 12 06:15:58 2018 +0100) the PrivateMounts option didn't
exist yet and has been added to systemd in Jun 13 08:20:18 2018 +0200.
Signed-off-by: aszlig <aszlig@nix.build>
Noted by @Infinisil on IRC:
infinisil: Question regarding the confinement PR
infinisil: On line 136 you do different things depending on
RootDirectoryStartOnly
infinisil: But on line 157 you have an assertion that disallows that
option being true
infinisil: Is there a reason behind this or am I missing something
I originally left this in so that once systemd supports that, we can
just flip a switch and remove the assertion and thus support
RootDirectoryStartOnly for our confinement module.
However, this doesn't seem to be on the roadmap for systemd in the
foreseeable future, so I'll just remove this, especially because it's
very easy to add it again, once it is supported.
Signed-off-by: aszlig <aszlig@nix.build>
My implementation was relying on PrivateDevices, PrivateTmp,
PrivateUsers and others to be false by default if chroot-only mode is
used.
However there is an ongoing effort[1] to change these defaults, which
then will actually increase the attack surface in chroot-only mode,
because it is expected that there is no /dev, /sys or /proc.
If for example PrivateDevices is enabled by default, there suddenly will
be a mounted /dev in the chroot and we wouldn't detect it.
Fortunately, our tests cover that, but I'm preparing for this anyway so
that we have a smoother transition without the need to fix our
implementation again.
Thanks to @Infinisil for the heads-up.
[1]: https://github.com/NixOS/nixpkgs/issues/14645
Signed-off-by: aszlig <aszlig@nix.build>
From @edolstra at [1]:
BTW we probably should take the closure of the whole unit rather than
just the exec commands, to handle things like Environment variables.
With this commit, there is now a "fullUnit" option, which can be enabled
to include the full closure of the service unit into the chroot.
However, I did not enable this by default, because I do disagree here
and *especially* things like environment variables or environment files
shouldn't be in the closure of the chroot.
For example if you have something like:
{ pkgs, ... }:
{
systemd.services.foobar = {
serviceConfig.EnvironmentFile = ${pkgs.writeText "secrets" ''
user=admin
password=abcdefg
'';
};
}
We really do not want the *file* to end up in the chroot, but rather
just the environment variables to be exported.
Another thing is that this makes it less predictable what actually will
end up in the chroot, because we have a "globalEnvironment" option that
will get merged in as well, so users adding stuff to that option will
also make it available in confined units.
I also added a big fat warning about that in the description of the
fullUnit option.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Another thing requested by @edolstra in [1]:
We should not provide a different /bin/sh in the chroot, that's just
asking for confusion and random shell script breakage. It should be
the same shell (i.e. bash) as in a regular environment.
While I personally would even go as far to even have a very restricted
shell that is not even a shell and basically *only* allows "/bin/sh -c"
with only *very* minimal parsing of shell syntax, I do agree that people
expect /bin/sh to be bash (or the one configured by environment.binsh)
on NixOS.
So this should make both others and me happy in that I could just use
confinement.binSh = "${pkgs.dash}/bin/dash" for the services I confine.
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>
Quoting @edolstra from [1]:
I don't really like the name "chroot", something like "confine[ment]"
or "restrict" seems better. Conceptually we're not providing a
completely different filesystem tree but a restricted view of the same
tree.
I already used "confinement" as a sub-option and I do agree that
"chroot" sounds a bit too specific (especially because not *only* chroot
is involved).
So this changes the module name and its option to use "confinement"
instead of "chroot" and also renames the "chroot.confinement" to
"confinement.mode".
[1]: https://github.com/NixOS/nixpkgs/pull/57519#issuecomment-472855704
Signed-off-by: aszlig <aszlig@nix.build>