diff options
-rw-r--r-- | blog/boolin.md | 233 |
1 files changed, 0 insertions, 233 deletions
diff --git a/blog/boolin.md b/blog/boolin.md deleted file mode 100644 index 00f118f..0000000 --- a/blog/boolin.md +++ /dev/null @@ -1,233 +0,0 @@ -<!-- - -- title = "Inclusion in Python Environment Algebra" - -- rss = "Specifying a subset of environment constraints for Python packages" - -- date = Date(2021, 11, 26) - -- tags = ["fun", "math", "packaging", "python"] - --> -# Inclusion in Python Environment Algebra - -!!! note "Note" - - If you do not want to hear me rambling about the shortcomings - of Python packaging metadata standards, feel free to skip - to section [1.c](#objective). - -\toc - -## Background - -### Motivation - -I sometimes [praise] Python wheels as the best binary packaging format -for cross-platform delivery. It truly is, at least to my limited knowledge. -Via compatibility tags, the constraints for the target environment -can be exclusively specified. Even though {{pep 425 compressed-tag-sets}} -was introduced - -> to allow for compact filenames of bdists[^bdists] -> that work with more than one compatibility tag triple, - -the filenames are not always *compact* and it is not always possible -to specify all environments a module can work on either. For instance, -there's no direct answer for one of the FAQ: - -> **What tag do I use if my distribution uses -> a feature exclusive to the newest version of Python?** - -I have seen workarounds like `cp35.cp36.cp37.cp38.cp39`, which accept -CPython from version 3.5, but not 3.10, which is the topic of the next Q&A: - -> **Why isn't there a `.` in the Python version number?** -> -> CPython has lasted 20+ years without a 3-digit major release. -> This should continue for some time. Other implementations may use `_` -> as a delimiter, since both `-` and `.` delimit the surrounding filename. - -The platform tag was not carefully standardized either, -which gives us inconsistent cyphers such as `win32`, `win_amd64` -and `manylinux2010_x86_64`.[^glibc] - -### Inspiration - -A few years later, {{pep 508 environment-markers}} came along after multiple -iterations ({{pep 345}}, {{pep 426}}) with a much more human-readable format, -which is essentially a Boolean algebra of comparisons. With this, -the aforementioned CPython constraint can be correctly expressed as - -```python -platform_python_implementation == 'CPython' -and python_version >= '3.5' -``` - -Unfortunately, they are only standardized for marking dependencies, -usually to watch out for environments where a library is either unnecessary -or unusable. But *who watches the watchers*? While it is possible -to [specify Python and platform tags when building wheels][params], the feature -is undocumented and almost unknown. Therefore, many, if not most, -platform-specific pure-Python wheels are tagged as `none` platform and all -their dependents needs to have the same redundant markers. Furthermore, -due to the exclusive nature of compatibility tags, a GNU/Linux-only -purelib wheel should be tagged with something like the following abomination: - -``` -manylinux_2_17_x86_64.manylinux_2_17_i686.manylinux_2_17_aarch64.manylinux_2_17_armv7l.manylinux_2_17_ppc64.manylinux_2_17_ppc64le.manylinux_2_17_s390x -``` - -Now imagine the filename of a wheel that only works on Unix-like systems. -`uvloop` is one and it is among the 500 [most-downloaded on PyPI][3719394] -at the time of writing. - -### Objective - -This article is, though, not going to sell you the idea of porting -environment markers to top-level `WHEEL` metadata. I should sit down -with the [PyPA] people and [discuss] it one day, but the currently defined -markers are lacking CPU and ABI information. The reason for this is that -Python's standard library does not implement a way to retrieve such information, -and for completeness, the operating system's version is not meaningfully -available either. If we want to redo this right, I highly recommend -looking into [Zig]'s [std.Target], which was designed from the start -for cross-compilation. Whatever we will come up with, it shall not be -done overnight. - -Instead, right now I need to solve a more immediate problem. For the last -few months, I have been working on [IPWHL][ipwhl], a downstream repository for -Python distributions. Since we enforce that all dependencies to be satisfied, -packages with too strict or too lax environment declaration will break -the automated tests, even though effectively the declared packages integrate -well with each other. We came to a conclusion that adding information would -improve the checks' accuracy and decided to use the [floating cheeses][ipwhl] -as a testbed for [top-level environment specifiers][proposal]. - -In short, we are going to merge the `METADATA` field `Requires-Python` -with other environmental information to a single field `environment`. -Since other information may or may not be present in the filename, -during metadata integrity check we can only check if what declared -is [included][inclusion] by what found from the original wheel. - -### Theory - -Ignoring `extra`, a Python environment can be represented as a 11-tuple of - -```python -(python_version, python_full_version, os_name, sys_platform, - platform_release, platform_system, platform_version, - platform_machine, platform_python_implementation, - implementation_name, implementation_version) -``` - -Denote $S$ as the set of all strings, then the set of all environments -$E \subset S^{11}$. Let -\[O = \{\le, <, \ne, =, \ge, >, \cong, \equiv, \in, \notin, \ni, \notni\}\] -the set of all marker expressions[^var] can be defined as -\[X = \{e \mapsto e \bigcirc s \mid \bigcirc \in O \land s \in S\}\] -and the set of all environment markers[^markers] can be recursively defined as -\[\begin{aligned} - M = \{c_\top\} &\cup \{(e_1, \ldots, e_{11}) - \mapsto m(e_1, \ldots, e_{11}) \land x(e_{\imath}) - \mid m \in M \land x \in X \land \imath \in I\}\\ - &\cup \{(e_1, \ldots, e_{11}) - \mapsto m(e_1, \ldots, e_{11}) \lor x(e_{\imath}) - \mid m \in M \land x \in X \land \imath \in I\} -\end{aligned}\] -where $\top$ is the *top* element in Boolean algebra, $c_\top: E \to \{\top\}$ -is a constant function and $I = [1, 11]$. With all Boolean expressions -converted into one of their standard forms, we have the following alternative -definitions of an environment marker -\[\begin{aligned} - (e_1, \ldots, e_{11}) - &\mapsto \bigvee_{\imath = 1}^n\bigwedge_{\jmath = 1}^{f(\imath)} - x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\\ - (e_1, \ldots, e_{11}) - &\mapsto \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)} - x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right) -\end{aligned}\] -where $n \in \N$, $f: \N_1 \to \N_1$, $x: \N_1^2 \to X$, $k: \N_1^2 \to I$ -and an empty Boolean expression is implicitly $c_\top$. - -Now it is possible to formalize inclusion for markers as -\[\forall\,\{m, n\} \subset M: (m \le n) = (\forall\,e \in E: m(e) \le n(e))\] - -Thus, checking if a marker includes another is equivalent to evaluating -the following expression, for all $(e_1, \ldots, e_{11}) \in E$ -\[\begin{aligned} - r &= \bigvee_{\imath' = 1}^{n'}\bigwedge_{\jmath' = 1}^{f'(\imath')} - x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right) - \le \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)} - x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\\ - &= \bigwedge_{\imath' = 1}^{n'}\left(\bigwedge_{\jmath' = 1}^{f'(\imath')} - x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right) - \le \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)} - x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\right)\\ - &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n - \left(\bigwedge_{\jmath' = 1}^{f'(\imath')} - x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right) - \le \bigvee_{\jmath = 1}^{f(\imath)} - x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\right) -\end{aligned}\] -where $r$ is whether the original marker (defined by $n$, $f$, $x$ and $k$) -includes the declared one (defined by $n'$, $f'$, $x'$ and $k'$). Let -\[\begin{aligned} - g(\imath, \ell) &= \{\jmath \mid \jmath \in [1, f(\imath)] - \land k(\imath, \jmath) = \ell\}\\ - g'(\imath, \ell) &= \{\jmath \mid \jmath \in [1, f'(\imath)] - \land k'(\imath, \jmath) = \ell\} -\end{aligned}\] -and assume $e_1, \ldots, e_{11}$ are orthogonal, we have -\[\begin{aligned} - r &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I} - \left(\bigwedge_{\jmath' \in g'(\imath', \ell)} - x'(\imath', \jmath')\left(e_{\ell}\right) - \le \bigvee_{\jmath \in g(\imath, \ell)} - x(\imath, \jmath)\left(e_{\ell}\right)\right)\\ - &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I} - \left(\bigwedge_{\jmath' \in g'(\imath', \ell)} - x'(\imath', \jmath')\left(e_{\ell}\right) - \land \lnot \bigvee_{\jmath \in g(\imath, \ell)} - x(\imath, \jmath)\left(e_{\ell}\right) = \bot\right) -\end{aligned}\] -where $\bot$ is the Boolean algebra bottom element. By [De Morgan's laws], -\[\begin{aligned} - r &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I} - \left(\bigwedge_{\jmath' \in g'(\imath', \ell)} - x'(\imath', \jmath')\left(e_{\ell}\right) - \land \bigwedge_{\jmath \in g(\imath, \ell)} - \lnot x(\imath, \jmath)\left(e_{\ell}\right) = \bot\right) -\end{aligned}\] - -Essentially, checking if -\[\bigwedge_{\jmath' \in g'(\imath', \ell)} - x'(\imath', \jmath')\left(e_{\ell}\right) - \land \bigwedge_{\jmath \in g(\imath, \ell)} - \lnot x(\imath, \jmath)\left(e_{\ell}\right)\] -is unsatisfiable for all $e_{\ell}$ is a [SAT problem] which we can write -a solver for. Indeed, the original expression could be formulated as a SAT, -but it would be more complex due to the multiple dimensions of environments. - -## Implementation - -### Generation - -### Conversion - -### Verification - -[^bdists]: [Built distributions][bdist], as opposed to [sdist] -[^glibc]: Bonus: try to sort `manylinux1`, `manylinux_2_24` and `manylinux2014` -[^var]: Ignoring relations between two variables or two strings -[^markers]: Or *environment specifiers* when [not marking anything][proposal] - -[praise]: https://nixnet.social/notice/A9GyFTnNnQDVZrmtUm -[bdist]: https://packaging.python.org/glossary/#term-Built-Distribution -[sdist]: https://packaging.python.org/glossary/#term-Source-Distribution-or-sdist -[params]: https://github.com/pypa/wheel/blob/0.37.0/src/wheel/bdist_wheel.py#L136-L164 -[3719394]: https://hugovk.github.io/top-pypi-packages -[PyPA]: https://pypa.io -[discuss]: https://discuss.python.org/t/7831 -[Zig]: https://ziglang.org -[std.Target]: https://ziglang.org/documentation/master/std/#std;Target -[ipwhl]: https://man.sr.ht/~cnx/ipwhl -[proposal]: https://lists.sr.ht/~cnx/ipwhl-discuss/%3CCEP1VQH719Y4.1EX12MJII0WVE%40nix%3E -[inclusion]: https://en.wikipedia.org/wiki/Inclusion_(Boolean_algebra) -[De Morgan's laws]: https://en.wikipedia.org/wiki/De_Morgan%27s_laws -[SAT problem]: https://en.wikipedia.org/wiki/Boolean_satisfiability_problem |