# Inclusion in Python Environment Algebra !!! note "Note" If you do not want to hear me rambling about the shortcomings of Python packaging metadata standards, feel free to skip to section [1.c](#objective). \toc ## Background ### Motivation I sometimes [praise] Python wheels as the best binary packaging format for cross-platform delivery. It truly is, at least to my limited knowledge. Via compatibility tags, the constraints for the target environment can be exclusively specified. Even though {{pep 425 compressed-tag-sets}} was introduced > to allow for compact filenames of bdists[^bdists] > that work with more than one compatibility tag triple, the filenames are not always *compact* and it is not always possible to specify all environments a module can work on either. For instance, there's no direct answer for one of the FAQ: > **What tag do I use if my distribution uses > a feature exclusive to the newest version of Python?** I have seen workarounds like `cp35.cp36.cp37.cp38.cp39`, which accept CPython from version 3.5, but not 3.10, which is the topic of the next Q&A: > **Why isn't there a `.` in the Python version number?** > > CPython has lasted 20+ years without a 3-digit major release. > This should continue for some time. Other implementations may use `_` > as a delimiter, since both `-` and `.` delimit the surrounding filename. The platform tag was not carefully standardized either, which gives us inconsistent cyphers such as `win32`, `win_amd64` and `manylinux2010_x86_64`.[^glibc] ### Inspiration A few years later, {{pep 508 environment-markers}} came along after multiple iterations ({{pep 345}}, {{pep 426}}) with a much more human-readable format, which is essentially a Boolean algebra of comparisons. With this, the aforementioned CPython constraint can be correctly expressed as ```python platform_python_implementation == 'CPython' and python_version >= '3.5' ``` Unfortunately, they are only standardized for marking dependencies, usually to watch out for environments where a library is either unnecessary or unusable. But *who watches the watchers*? While it is possible to [specify Python and platform tags when building wheels][params], the feature is undocumented and almost unknown. Therefore, many, if not most, platform-specific pure-Python wheels are tagged as `none` platform and all their dependents needs to have the same redundant markers. Furthermore, due to the exclusive nature of compatibility tags, a GNU/Linux-only purelib wheel should be tagged with something like the following abomination: ``` manylinux_2_17_x86_64.manylinux_2_17_i686.manylinux_2_17_aarch64.manylinux_2_17_armv7l.manylinux_2_17_ppc64.manylinux_2_17_ppc64le.manylinux_2_17_s390x ``` Now imagine the filename of a wheel that only works on Unix-like systems. `uvloop` is one and it is among the 500 [most-downloaded on PyPI][3719394] at the time of writing. ### Objective This article is, though, not going to sell you the idea of porting environment markers to top-level `WHEEL` metadata. I should sit down with the [PyPA] people and [discuss] it one day, but the currently defined markers are lacking CPU and ABI information. The reason for this is that Python's standard library does not implement a way to retrieve such information, and for completeness, the operating system's version is not meaningfully available either. If we want to redo this right, I highly recommend looking into [Zig]'s [std.Target], which was designed from the start for cross-compilation. Whatever we will come up with, it shall not be done overnight. Instead, right now I need to solve a more immediate problem. For the last few months, I have been working on [IPWHL][ipwhl], a downstream repository for Python distributions. Since we enforce that all dependencies to be satisfied, packages with too strict or too lax environment declaration will break the automated tests, even though effectively the declared packages integrate well with each other. We came to a conclusion that adding information would improve the checks' accuracy and decided to use the [floating cheeses][ipwhl] as a testbed for [top-level environment specifiers][proposal]. In short, we are going to merge the `METADATA` field `Requires-Python` with other environmental information to a single field `environment`. Since other information may or may not be present in the filename, during metadata integrity check we can only check if what declared is [included][inclusion] by what found from the original wheel. ### Theory Ignoring `extra`, a Python environment can be represented as a 11-tuple of ```python (python_version, python_full_version, os_name, sys_platform, platform_release, platform_system, platform_version, platform_machine, platform_python_implementation, implementation_name, implementation_version) ``` Denote $S$ as the set of all strings, then the set of all environments $E \subset S^{11}$. Let \[O = \{\le, <, \ne, =, \ge, >, \cong, \equiv, \in, \notin, \ni, \notni\}\] the set of all marker expressions[^var] can be defined as \[X = \{e \mapsto e \bigcirc s \mid \bigcirc \in O \land s \in S\}\] and the set of all environment markers[^markers] can be recursively defined as \[\begin{aligned} M = \{c_\top\} &\cup \{(e_1, \ldots, e_{11}) \mapsto m(e_1, \ldots, e_{11}) \land x(e_{\imath}) \mid m \in M \land x \in X \land \imath \in I\}\\ &\cup \{(e_1, \ldots, e_{11}) \mapsto m(e_1, \ldots, e_{11}) \lor x(e_{\imath}) \mid m \in M \land x \in X \land \imath \in I\} \end{aligned}\] where $\top$ is the *top* element in Boolean algebra, $c_\top: E \to \{\top\}$ is a constant function and $I = [1, 11]$. With all Boolean expressions converted into one of their standard forms, we have the following alternative definitions of an environment marker \[\begin{aligned} (e_1, \ldots, e_{11}) &\mapsto \bigvee_{\imath = 1}^n\bigwedge_{\jmath = 1}^{f(\imath)} x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\\ (e_1, \ldots, e_{11}) &\mapsto \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)} x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right) \end{aligned}\] where $n \in \N$, $f: \N_1 \to \N_1$, $x: \N_1^2 \to X$, $k: \N_1^2 \to I$ and an empty Boolean expression is implicitly $c_\top$. Now it is possible to formalize inclusion for markers as \[\forall\,\{m, n\} \subset M: (m \le n) = (\forall\,e \in E: m(e) \le n(e))\] Thus, checking if a marker includes another is equivalent to evaluating the following expression, for all $(e_1, \ldots, e_{11}) \in E$ \[\begin{aligned} r &= \bigvee_{\imath' = 1}^{n'}\bigwedge_{\jmath' = 1}^{f'(\imath')} x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right) \le \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)} x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\\ &= \bigwedge_{\imath' = 1}^{n'}\left(\bigwedge_{\jmath' = 1}^{f'(\imath')} x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right) \le \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)} x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\right)\\ &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n \left(\bigwedge_{\jmath' = 1}^{f'(\imath')} x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right) \le \bigvee_{\jmath = 1}^{f(\imath)} x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\right) \end{aligned}\] where $r$ is whether the original marker (defined by $n$, $f$, $x$ and $k$) includes the declared one (defined by $n'$, $f'$, $x'$ and $k'$). Let \[\begin{aligned} g(\imath, \ell) &= \{\jmath \mid \jmath \in [1, f(\imath)] \land k(\imath, \jmath) = \ell\}\\ g'(\imath, \ell) &= \{\jmath \mid \jmath \in [1, f'(\imath)] \land k'(\imath, \jmath) = \ell\} \end{aligned}\] and assume $e_1, \ldots, e_{11}$ are orthogonal, we have \[\begin{aligned} r &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I} \left(\bigwedge_{\jmath' \in g'(\imath', \ell)} x'(\imath', \jmath')\left(e_{\ell}\right) \le \bigvee_{\jmath \in g(\imath, \ell)} x(\imath, \jmath)\left(e_{\ell}\right)\right)\\ &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I} \left(\bigwedge_{\jmath' \in g'(\imath', \ell)} x'(\imath', \jmath')\left(e_{\ell}\right) \land \lnot \bigvee_{\jmath \in g(\imath, \ell)} x(\imath, \jmath)\left(e_{\ell}\right) = \bot\right) \end{aligned}\] where $\bot$ is the Boolean algebra bottom element. By [De Morgan's laws], \[\begin{aligned} r &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I} \left(\bigwedge_{\jmath' \in g'(\imath', \ell)} x'(\imath', \jmath')\left(e_{\ell}\right) \land \bigwedge_{\jmath \in g(\imath, \ell)} \lnot x(\imath, \jmath)\left(e_{\ell}\right) = \bot\right) \end{aligned}\] Essentially, checking if \[\bigwedge_{\jmath' \in g'(\imath', \ell)} x'(\imath', \jmath')\left(e_{\ell}\right) \land \bigwedge_{\jmath \in g(\imath, \ell)} \lnot x(\imath, \jmath)\left(e_{\ell}\right)\] is unsatisfiable for all $e_{\ell}$ is a [SAT problem] which we can write a solver for. Indeed, the original expression could be formulated as a SAT, but it would be more complex due to the multiple dimensions of environments. ## Implementation ### Generation ### Conversion ### Verification [^bdists]: [Built distributions][bdist], as opposed to [sdist] [^glibc]: Bonus: try to sort `manylinux1`, `manylinux_2_24` and `manylinux2014` [^var]: Ignoring relations between two variables or two strings [^markers]: Or *environment specifiers* when [not marking anything][proposal] [praise]: https://nixnet.social/notice/A9GyFTnNnQDVZrmtUm [bdist]: https://packaging.python.org/glossary/#term-Built-Distribution [sdist]: https://packaging.python.org/glossary/#term-Source-Distribution-or-sdist [params]: https://github.com/pypa/wheel/blob/0.37.0/src/wheel/bdist_wheel.py#L136-L164 [3719394]: https://hugovk.github.io/top-pypi-packages [PyPA]: https://pypa.io [discuss]: https://discuss.python.org/t/7831 [Zig]: https://ziglang.org [std.Target]: https://ziglang.org/documentation/master/std/#std;Target [ipwhl]: https://man.sr.ht/~cnx/ipwhl [proposal]: https://lists.sr.ht/~cnx/ipwhl-discuss/%3CCEP1VQH719Y4.1EX12MJII0WVE%40nix%3E [inclusion]: https://en.wikipedia.org/wiki/Inclusion_(Boolean_algebra) [De Morgan's laws]: https://en.wikipedia.org/wiki/De_Morgan%27s_laws [SAT problem]: https://en.wikipedia.org/wiki/Boolean_satisfiability_problem