author     Ludovic Courtès <ludovic.courtes@inria.fr>  2023-01-05 12:39:06 +0100
committer  Ludovic Courtès <ludo@gnu.org>              2023-01-09 17:40:53 +0100
commit     47c1de22df30aa6c4a2f0d8249be93c3c7bc0022 (patch)
tree       d29aea108dc49edb7e02b07797210d22746c3398 /doc
parent     8b314efd50742a37fc157f623148d3e4e587c09d (diff)
download   guix-47c1de22df30aa6c4a2f0d8249be93c3c7bc0022.tar.gz
doc: cookbook: Add "Installing Guix on a Cluster" chapter.
This is derived from the article at
<https://hpc.guix.info/blog/2017/11/installing-guix-on-a-cluster/>, with
clarifications and updates.

* doc/guix-cookbook.texi (Installing Guix on a Cluster): New chapter.
Diffstat (limited to 'doc')
-rw-r--r--  doc/guix-cookbook.texi | 433
1 file changed, 414 insertions(+), 19 deletions(-)
diff --git a/doc/guix-cookbook.texi b/doc/guix-cookbook.texi
index bbbd1cde67..b9fb916f4a 100644
--- a/doc/guix-cookbook.texi
+++ b/doc/guix-cookbook.texi
@@ -21,7 +21,8 @@ Copyright @copyright{} 2020 Brice Waegeneire@*
 Copyright @copyright{} 2020 André Batista@*
 Copyright @copyright{} 2020 Christine Lemmer-Webber@*
 Copyright @copyright{} 2021 Joshua Branson@*
-Copyright @copyright{} 2022 Maxim Cournoyer*
+Copyright @copyright{} 2022 Maxim Cournoyer@*
+Copyright @copyright{} 2023 Ludovic Courtès
 
 Permission is granted to copy, distribute and/or modify this document
 under the terms of the GNU Free Documentation License, Version 1.3 or
@@ -73,8 +74,9 @@ Weblate} (@pxref{Translating Guix,,, guix, GNU Guix reference manual}).
 * Packaging::                   Packaging tutorials
 * System Configuration::        Customizing the GNU System
 * Containers::                  Isolated environments and nested systems
-* Advanced package management:: Power to the users!
+* Advanced package management::  Power to the users!
 * Environment management::      Control environment
+* Installing Guix on a Cluster:: High-performance computing.
 
 * Acknowledgments::             Thanks!
 * GNU Free Documentation License::  The license of this document.
@@ -83,28 +85,45 @@ Weblate} (@pxref{Translating Guix,,, guix, GNU Guix reference manual}).
 @detailmenu
  --- The Detailed Node Listing ---
 
-Scheme tutorials
-
-* A Scheme Crash Course::       Learn the basics of Scheme
-
 Packaging
 
-* Packaging Tutorial::          Let's add a package to Guix!
+* Packaging Tutorial::         A tutorial on how to add packages to Guix.
 
 System Configuration
 
-* Auto-Login to a Specific TTY::    Automatically Login a User to a Specific TTY
-* Customizing the Kernel::          Creating and using a custom Linux kernel on Guix System.
-* Guix System Image API::           Customizing images to target specific platforms.
-* Using security keys::             How to use security keys with Guix System.
-* Connecting to Wireguard VPN::     Connecting to a Wireguard VPN.
-* Customizing a Window Manager::    Handle customization of a Window manager on Guix System.
-* Running Guix on a Linode Server:: Running Guix on a Linode Server.  Running Guix on a Linode Server
-* Setting up a bind mount::         Setting up a bind mount in the file-systems definition.
-* Getting substitutes from Tor::    Configuring Guix daemon to get substitutes through Tor.
-* Setting up NGINX with Lua::       Configuring NGINX web-server to load Lua modules.
+* Auto-Login to a Specific TTY:: Automatically Login a User to a Specific TTY
+* Customizing the Kernel::       Creating and using a custom Linux kernel on Guix System.
+* Guix System Image API::        Customizing images to target specific platforms.
+* Using security keys::          How to use security keys with Guix System.
+* Connecting to Wireguard VPN::  Connecting to a Wireguard VPN.
+* Customizing a Window Manager:: Handle customization of a Window manager on Guix System.
+* Running Guix on a Linode Server:: Running Guix on a Linode Server
+* Setting up a bind mount:: Setting up a bind mount in the file-systems definition.
+* Getting substitutes from Tor:: Configuring Guix daemon to get substitutes through Tor.
+* Setting up NGINX with Lua:: Configuring NGINX web-server to load Lua modules.
 * Music Server with Bluetooth Audio:: Headless music player with Bluetooth output.
 
+Containers
+
+* Guix Containers::            Perfectly isolated environments
+* Guix System Containers::     A system inside your system
+
+Advanced package management
+
+* Guix Profiles in Practice::     Strategies for multiple profiles and manifests.
+
+Environment management
+
+* Guix environment via direnv:: Setup Guix environment with direnv
+
+Installing Guix on a Cluster
+
+* Setting Up a Head Node::      The node that runs the daemon.
+* Setting Up Compute Nodes::    Client nodes.
+* Cluster Network Access::      Dealing with network access restrictions.
+* Cluster Disk Usage::          Disk usage considerations.
+* Cluster Security Considerations::  Keeping the cluster secure.
+
 @end detailmenu
 @end menu
 
@@ -3635,6 +3654,380 @@ will have predefined environment variables and procedures.
 
 Run @command{direnv allow} to setup the environment for the first time.
 
+
+@c *********************************************************************
+@node Installing Guix on a Cluster
+@chapter Installing Guix on a Cluster
+
+@cindex cluster installation
+@cindex high-performance computing, HPC
+@cindex HPC, high-performance computing
+Guix is appealing to scientists and @acronym{HPC, high-performance
+computing} practitioners: it makes it easy to deploy potentially complex
+software stacks, and it lets you do so in a reproducible fashion---you
+can redeploy the exact same software on different machines and at
+different points in time.
+
+In this chapter we look at how a cluster sysadmin can install Guix for
+system-wide use, such that it can be used on all the cluster nodes, and
+discuss the various tradeoffs@footnote{This chapter is adapted from a
+@uref{https://hpc.guix.info/blog/2017/11/installing-guix-on-a-cluster/,
+blog post published on the Guix-HPC web site in 2017}.}.
+
+@quotation Note
+Here we assume that the cluster is running a GNU/Linux distro other than
+Guix System and that we are going to install Guix on top of it.
+@end quotation
+
+@menu
+* Setting Up a Head Node::      The node that runs the daemon.
+* Setting Up Compute Nodes::    Client nodes.
+* Cluster Network Access::      Dealing with network access restrictions.
+* Cluster Disk Usage::          Disk usage considerations.
+* Cluster Security Considerations::  Keeping the cluster secure.
+@end menu
+
+@node Setting Up a Head Node
+@section Setting Up a Head Node
+
+The recommended approach is to set up one @emph{head node} running
+@command{guix-daemon} and exporting @file{/gnu/store} over NFS to
+compute nodes.
+
+Remember that @command{guix-daemon} is responsible for spawning build
+processes and downloads on behalf of clients (@pxref{Invoking
+guix-daemon,,, guix, GNU Guix Reference Manual}), and more generally
+accessing @file{/gnu/store}, which contains all the package binaries
+built by all the users (@pxref{The Store,,, guix, GNU Guix Reference
+Manual}).  ``Client'' here refers to all the Guix commands that users
+see, such as @code{guix install}.  On a cluster, these commands may be
+running on the compute nodes and we'll want them to talk to the head
+node's @code{guix-daemon} instance.
+
+To begin with, the head node can be installed following the usual binary
+installation instructions (@pxref{Binary Installation,,, guix, GNU Guix
+Reference Manual}).  Thanks to the installation script, this should be
+quick.  Once installation is complete, we need to make some adjustments.
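+
+For reference, running the installation script on the head node looks
+roughly like the following (a sketch assuming @command{wget} is
+available; the Binary Installation section cited above has the
+authoritative steps):
+
+@example
+cd /tmp
+wget https://git.savannah.gnu.org/cgit/guix.git/plain/etc/guix-install.sh
+chmod +x guix-install.sh
+sudo ./guix-install.sh
+@end example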
+
+Since we want @code{guix-daemon} to be reachable not just from the head
+node but also from the compute nodes, we need to arrange so that it
+listens for connections over TCP/IP.  To do that, we'll edit the systemd
+startup file for @command{guix-daemon},
+@file{/etc/systemd/system/guix-daemon.service}, and add a
+@code{--listen} argument to the @code{ExecStart} line so that it looks
+something like this:
+
+@example
+ExecStart=/var/guix/profiles/per-user/root/current-guix/bin/guix-daemon --build-users-group=guixbuild --listen=/var/guix/daemon-socket/socket --listen=0.0.0.0
+@end example
+
+For these changes to take effect, the service needs to be restarted:
+
+@example
+systemctl daemon-reload
+systemctl restart guix-daemon
+@end example
+
+@quotation Note
+The @code{--listen=0.0.0.0} bit means that @code{guix-daemon} will
+accept incoming TCP connections on port 44146 from @emph{any} host
+(@pxref{Invoking guix-daemon,,, guix, GNU Guix Reference Manual}).  This
+is usually fine in a cluster setup where the head node is reachable
+exclusively from the cluster's local area network---you don't want that
+to be exposed to the Internet!
+@end quotation
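+
+To be on the safe side, you can additionally restrict that port to the
+cluster's network with a firewall rule on the head node.  The sketch
+below assumes @command{iptables} is in use and that the cluster network
+is @code{10.0.0.0/8}; adapt it to your site's firewall tooling:
+
+@example
+iptables -A INPUT -p tcp --dport 44146 ! -s 10.0.0.0/8 -j REJECT
+@end example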
+
+The next step is to define our NFS exports in
+@uref{https://linux.die.net/man/5/exports,@file{/etc/exports}} by adding
+something along these lines:
+
+@example
+/gnu/store    *(ro)
+/var/guix     *(rw, async)
+/var/log/guix *(ro)
+@end example
+
+The @file{/gnu/store} directory can be exported read-only since only
+@command{guix-daemon} on the master node will ever modify it.
+@file{/var/guix} contains @emph{user profiles} as managed by @code{guix
+package}; thus, to allow users to install packages with @code{guix
+package}, this must be read-write.
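+
+Assuming an NFS server is already installed and running on the head
+node, you can then apply the new exports with @command{exportfs}:
+
+@example
+exportfs -ra
+@end example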
+
+Users can create as many profiles as they like in addition to the
+default profile, @file{~/.guix-profile}.  For instance, @code{guix
+package -p ~/dev/python-dev -i python} installs Python in a profile
+reachable from the @code{~/dev/python-dev} symlink.  To make sure that
+this profile is protected from garbage collection---i.e., that Python
+will not be removed from @file{/gnu/store} while this profile
+exists---@emph{home directories should be mounted on the head node} as
+well so
+that @code{guix-daemon} knows about these non-standard profiles and
+avoids collecting software they refer to.
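+
+A quick way to double-check which profiles the daemon considers live is
+to list the garbage collector roots it knows about; when run as root on
+the head node, this lists the roots of all users:
+
+@example
+guix gc --list-roots
+@end example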
+
+It may be a good idea to periodically remove unused bits from
+@file{/gnu/store} by running @command{guix gc} (@pxref{Invoking guix
+gc,,, guix, GNU Guix Reference Manual}).  This can be done by adding a
+crontab entry on the head node:
+
+@example
+root@@master# crontab -e
+@end example
+
+@noindent
+... with something like this:
+
+@example
+# Every day at 5AM, run the garbage collector to make sure
+# at least 10 GB are free on /gnu/store.
+0 5 * * *  /usr/local/bin/guix gc -F10G
+@end example
+
+We're done with the head node! Let's look at compute nodes now.
+
+@node Setting Up Compute Nodes
+@section Setting Up Compute Nodes
+
+First of all, we need compute nodes to mount those NFS directories that
+the head node exports.  This can be done by adding the following lines
+to @uref{https://linux.die.net/man/5/fstab,@file{/etc/fstab}}:
+
+@example
+@var{head-node}:/gnu/store    /gnu/store    nfs  defaults,_netdev,vers=3 0 0
+@var{head-node}:/var/guix     /var/guix     nfs  defaults,_netdev,vers=3 0 0
+@var{head-node}:/var/log/guix /var/log/guix nfs  defaults,_netdev,vers=3 0 0
+@end example
+
+@noindent
+... where @var{head-node} is the name or IP address of your head node.
+From there on, assuming the mount points exist, you should be able to
+mount each of these on the compute nodes.
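+
+For instance, on each compute node this would look something like:
+
+@example
+mkdir -p /gnu/store /var/guix /var/log/guix
+mount /gnu/store
+mount /var/guix
+mount /var/log/guix
+@end example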
+
+Next, we need to provide a default @command{guix} command that users can
+run when they first connect to the cluster (eventually they will invoke
+@command{guix pull}, which will provide them with their ``own''
+@command{guix} command).  Similar to what the binary installation script
+did on the head node, we'll store that in @file{/usr/local/bin}:
+
+@example
+mkdir -p /usr/local/bin
+ln -s /var/guix/profiles/per-user/root/current-guix/bin/guix \
+      /usr/local/bin/guix
+@end example
+
+We then need to tell @code{guix} to talk to the daemon running on our
+master node, by adding these lines to @code{/etc/profile}:
+
+@example
+GUIX_DAEMON_SOCKET="guix://@var{head-node}"
+export GUIX_DAEMON_SOCKET
+@end example
+
+To avoid warnings and make sure @code{guix} uses the right locale, we
+need to tell it to use locale data provided by Guix (@pxref{Application
+Setup,,, guix, GNU Guix Reference Manual}):
+
+@example
+GUIX_LOCPATH=/var/guix/profiles/per-user/root/guix-profile/lib/locale
+export GUIX_LOCPATH
+
+# Here we must use a valid locale name.  Try "ls $GUIX_LOCPATH/*"
+# to see what names can be used.
+LC_ALL=fr_FR.utf8
+export LC_ALL
+@end example
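+
+Note that this assumes locale data has actually been installed at that
+location, which is the case once root, on the head node, has run
+something like:
+
+@example
+guix install glibc-locales
+@end example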
+
+For convenience, @code{guix package} automatically generates
+@file{~/.guix-profile/etc/profile}, which defines all the environment
+variables necessary to use the packages---@code{PATH},
+@code{C_INCLUDE_PATH}, @code{PYTHONPATH}, etc.  Thus it's a good idea to
+source it from @code{/etc/profile}:
+
+@example
+GUIX_PROFILE="$HOME/.guix-profile"
+if [ -f "$GUIX_PROFILE/etc/profile" ]; then
+  . "$GUIX_PROFILE/etc/profile"
+fi
+@end example
+
+Last but not least, Guix provides command-line completion, notably for
+Bash and zsh.  In @code{/etc/bashrc}, consider adding this line:
+
+@verbatim
+. /var/guix/profiles/per-user/root/current-guix/etc/bash_completion.d/guix
+@end verbatim
+
+Voilà!
+
+You can check that everything's in place by logging in on a compute node
+and running:
+
+@example
+guix install hello
+@end example
+
+The daemon on the head node should download pre-built binaries on your
+behalf and unpack them in @file{/gnu/store}, and @command{guix install}
+should create @file{~/.guix-profile} containing the
+@file{~/.guix-profile/bin/hello} command.
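+
+As a final sanity check, you can run the freshly installed program:
+
+@example
+$ ~/.guix-profile/bin/hello
+Hello, world!
+@end example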
+
+@node Cluster Network Access
+@section Network Access
+
+Guix requires network access to download source code and pre-built
+binaries.  The good news is that only the head node needs that since
+compute nodes simply delegate to it.
+
+It is customary for cluster nodes to have network access restricted,
+at best, to a @emph{white list} of hosts.  Our head node needs at least
+@code{ci.guix.gnu.org} in this white list since this is where it gets
+pre-built binaries from by default, for all the packages that are in
+Guix proper.
+
+Incidentally, @code{ci.guix.gnu.org} also serves as a
+@emph{content-addressed mirror} of the source code of those packages.
+Consequently, it is sufficient to have @emph{only}
+@code{ci.guix.gnu.org} in that white list.
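+
+A simple way to check that the head node can indeed reach
+@code{ci.guix.gnu.org} is to query substitute availability for a couple
+of packages with @command{guix weather} (@pxref{Invoking guix
+weather,,, guix, GNU Guix Reference Manual}):
+
+@example
+guix weather hello openmpi
+@end example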
+
+Software packages maintained in a separate repository such as one of the
+various @uref{https://hpc.guix.info/channels, HPC channels} are of
+course unavailable from @code{ci.guix.gnu.org}.  For these packages, you
+may want to extend the white list such that source and pre-built
+binaries (assuming third-party servers provide binaries for these
+packages) can be downloaded.  As a last resort, users can always
+download source on their workstation and add it to the cluster's
+@file{/gnu/store}, like this:
+
+@verbatim
+GUIX_DAEMON_SOCKET=ssh://compute-node.example.org \
+  guix download http://starpu.gforge.inria.fr/files/starpu-1.2.3/starpu-1.2.3.tar.gz
+@end verbatim
+
+The above command downloads @code{starpu-1.2.3.tar.gz} @emph{and} sends
+it to the cluster's @code{guix-daemon} instance over SSH.
+
+Air-gapped clusters require more work.  At the moment, our suggestion
+would be to download all the necessary source code on a workstation
+running Guix.  For instance, using the @option{--sources} option of
+@command{guix build} (@pxref{Invoking guix build,,, guix, GNU Guix
+Reference Manual}), the example below downloads all the source code the
+@code{openmpi} package depends on:
+
+@example
+$ guix build --sources=transitive openmpi
+
+@dots{}
+
+/gnu/store/xc17sm60fb8nxadc4qy0c7rqph499z8s-openmpi-1.10.7.tar.bz2
+/gnu/store/s67jx92lpipy2nfj5cz818xv430n4b7w-gcc-5.4.0.tar.xz
+/gnu/store/npw9qh8a46lrxiwh9xwk0wpi3jlzmjnh-gmp-6.0.0a.tar.xz
+/gnu/store/hcz0f4wkdbsvsdky3c0vdvcawhdkyldb-mpfr-3.1.5.tar.xz
+/gnu/store/y9akh452n3p4w2v631nj0injx7y0d68x-mpc-1.0.3.tar.gz
+/gnu/store/6g5c35q8avfnzs3v14dzl54cmrvddjm2-glibc-2.25.tar.xz
+/gnu/store/p9k48dk3dvvk7gads7fk30xc2pxsd66z-hwloc-1.11.8.tar.bz2
+/gnu/store/cry9lqidwfrfmgl0x389cs3syr15p13q-gcc-5.4.0.tar.xz
+/gnu/store/7ak0v3rzpqm2c5q1mp3v7cj0rxz0qakf-libfabric-1.4.1.tar.bz2
+/gnu/store/vh8syjrsilnbfcf582qhmvpg1v3rampf-rdma-core-14.tar.gz
+…
+@end example
+
+(In case you're wondering, that's more than 320@ MiB of
+@emph{compressed} source code.)
+
+We can then make a big archive containing all of this (@pxref{Invoking
+guix archive,,, guix, GNU Guix Reference Manual}):
+
+@verbatim
+$ guix archive --export \
+    `guix build --sources=transitive openmpi` \
+    > openmpi-source-code.nar
+@end verbatim
+
+@dots{} and we can eventually transfer that archive to the cluster on
+removable storage and unpack it there:
+
+@verbatim
+$ guix archive --import < openmpi-source-code.nar
+@end verbatim
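+
+Note that @command{guix archive --import} only accepts archives signed
+by a key the daemon trusts.  If the import is rejected, you will first
+have to authorize, on the head node, the signing key of the workstation
+that produced the archive (its public key lives in
+@file{/etc/guix/signing-key.pub} there), along these lines:
+
+@verbatim
+# As root on the head node:
+guix archive --authorize < signing-key.pub
+@end verbatim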
+
+This process has to be repeated every time new source code needs to be
+brought to the cluster.
+
+As we write this, though, the research institutes involved in Guix-HPC
+do not have air-gapped clusters.  If you have experience with such
+setups, we would like to hear your feedback and suggestions.
+
+@node Cluster Disk Usage
+@section Disk Usage
+
+@cindex disk usage, on a cluster
+A common concern of sysadmins is whether this is all going to eat a lot
+of disk space.  In our experience from almost ten years of Guix usage
+on HPC clusters, if anything is going to exhaust disk space, it is
+scientific data sets rather than compiled software.  Nevertheless, it's
+worth taking a look at how Guix
+contributes to disk usage.
+
+First, having several versions or variants of a given package in
+@file{/gnu/store} does not necessarily cost much, because
+@command{guix-daemon} implements deduplication of identical files, and
+package variants are likely to have a number of common files.
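+
+To get an idea of what a given package and its dependencies occupy in
+the store, before deduplication kicks in, you can use @command{guix
+size} (@pxref{Invoking guix size,,, guix, GNU Guix Reference Manual}):
+
+@example
+guix size openmpi
+@end example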
+
+As mentioned above, we recommend having a cron job to run @code{guix gc}
+periodically, which removes @emph{unused} software from
+@file{/gnu/store}. However, there's always a possibility that users will
+keep lots of software in their profiles, or lots of old generations of
+their profiles, which is ``live'' and cannot be deleted from the
+viewpoint of @command{guix gc}.
+
+The solution to this is for users to regularly remove old generations of
+their profile.  For instance, the following command removes generations
+that are more than two months old:
+
+@example
+guix package --delete-generations=2m
+@end example
+
+Likewise, it's a good idea to invite users to regularly upgrade their
+profile, which can reduce the number of variants of a given piece of
+software stored in @file{/gnu/store}:
+
+@example
+guix pull
+guix upgrade
+@end example
+
+As a last resort, it is always possible for sysadmins to do some of this
+on behalf of their users.  Nevertheless, one of the strengths of Guix is
+the freedom and control users get over their software environment, so we
+strongly recommend leaving users in control.
+
+@node Cluster Security Considerations
+@section Security Considerations
+
+@cindex security, on a cluster
+On an HPC cluster, Guix is typically used to manage scientific software.
+Security-critical software such as the operating system kernel and
+system services such as @code{sshd} and the batch scheduler remains
+under the control of sysadmins.
+
+The Guix project has a good track record delivering security updates in
+a timely fashion (@pxref{Security Updates,,, guix, GNU Guix Reference
+Manual}).  To get security updates, users have to run @code{guix pull &&
+guix upgrade}.
+
+Because Guix uniquely identifies software variants, it is easy to see if
+a vulnerable piece of software is in use.  For instance, to determine
+whether the glibc@ 2.25 variant without the mitigation patch against
+``@uref{https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt,Stack
+Clash}'' is in use, one can check whether user profiles refer to it at all:
+
+@example
+guix gc --referrers /gnu/store/…-glibc-2.25
+@end example
+
+This will report whether profiles exist that refer to this specific
+glibc variant.
+
+
 @c *********************************************************************
 @node Acknowledgments
 @chapter Acknowledgments
@@ -3656,8 +4049,10 @@ information on these fine people.  The @file{THANKS} file lists people
 who have helped by reporting bugs, taking care of the infrastructure,
 providing artwork and themes, making suggestions, and more---thank you!
 
-This document includes adapted sections from articles that have previously
-been published on the Guix blog at @uref{https://guix.gnu.org/blog}.
+This document includes adapted sections from articles that have
+previously been published on the Guix blog at
+@uref{https://guix.gnu.org/blog} and on the Guix-HPC blog at
+@uref{https://hpc.guix.info/blog}.
 
 
 @c *********************************************************************