about summary refs log tree commit diff homepage
path: root/blog
diff options
context:
space:
mode:
Diffstat (limited to 'blog')
-rw-r--r--blog/boolin.md233
1 files changed, 0 insertions, 233 deletions
diff --git a/blog/boolin.md b/blog/boolin.md
deleted file mode 100644
index 00f118f..0000000
--- a/blog/boolin.md
+++ /dev/null
@@ -1,233 +0,0 @@
-<!--
-  -- title = "Inclusion in Python Environment Algebra"
-  -- rss = "Specifying a subset of environment constraints for Python packages"
-  -- date = Date(2021, 11, 26)
-  -- tags = ["fun", "math", "packaging", "python"]
-  -->
-# Inclusion in Python Environment Algebra
-
-!!! note "Note"
-
-    If you do not want to hear me rambling about the shortcomings
-    of Python packaging metadata standards, feel free to skip
-    to section [1.c](#objective).
-
-\toc
-
-## Background
-
-### Motivation
-
-I sometimes [praise] Python wheels as the best binary packaging format
-for cross-platform delivery.  It truly is, at least to my limited knowledge.
-Via compatibility tags, the constraints for the target environment
-can be exclusively specified.  Even though {{pep 425 compressed-tag-sets}}
-was introduced
-
-> to allow for compact filenames of bdists[^bdists]
-> that work with more than one compatibility tag triple,
-
-the filenames are not always *compact* and it is not always possible
-to specify all environments a module can work on either.  For instance,
-there's no direct answer for one of the FAQ:
-
-> **What tag do I use if my distribution uses
-> a feature exclusive to the newest version of Python?**
-
-I have seen workarounds like `cp35.cp36.cp37.cp38.cp39`, which accept
-CPython from version 3.5, but not 3.10, which is the topic of the next Q&A:
-
-> **Why isn't there a `.` in the Python version number?**
->
-> CPython has lasted 20+ years without a 3-digit major release.
-> This should continue for some time.  Other implementations may use `_`
-> as a delimiter, since both `-` and `.` delimit the surrounding filename.
-
-The platform tag was not carefully standardized either,
-which gives us inconsistent cyphers such as `win32`, `win_amd64`
-and `manylinux2010_x86_64`.[^glibc]
-
-### Inspiration
-
-A few years later, {{pep 508 environment-markers}} came along after multiple
-iterations ({{pep 345}}, {{pep 426}}) with a much more human-readable format,
-which is essentially a Boolean algebra of comparisons.  With this,
-the aforementioned CPython constraint can be correctly expressed as
-
-```python
-platform_python_implementation == 'CPython'
-and python_version >= '3.5'
-```
-
-Unfortunately, they are only standardized for marking dependencies,
-usually to watch out for environments where a library is either unnecessary
-or unusable.  But *who watches the watchers*?  While it is possible
-to [specify Python and platform tags when building wheels][params], the feature
-is undocumented and almost unknown.  Therefore, many, if not most,
-platform-specific pure-Python wheels are tagged as `none` platform and all
-their dependents needs to have the same redundant markers.  Furthermore,
-due to the exclusive nature of compatibility tags, a GNU/Linux-only
-purelib wheel should be tagged with something like the following abomination:
-
-```
-manylinux_2_17_x86_64.manylinux_2_17_i686.manylinux_2_17_aarch64.manylinux_2_17_armv7l.manylinux_2_17_ppc64.manylinux_2_17_ppc64le.manylinux_2_17_s390x
-```
-
-Now imagine the filename of a wheel that only works on Unix-like systems.
-`uvloop` is one and it is among the 500 [most-downloaded on PyPI][3719394]
-at the time of writing.
-
-### Objective
-
-This article is, though, not going to sell you the idea of porting
-environment markers to top-level `WHEEL` metadata.  I should sit down
-with the [PyPA] people and [discuss] it one day, but the currently defined
-markers are lacking CPU and ABI information.  The reason for this is that
-Python's standard library does not implement a way to retrieve such information,
-and for completeness, the operating system's version is not meaningfully
-available either.  If we want to redo this right, I highly recommend
-looking into [Zig]'s [std.Target], which was designed from the start
-for cross-compilation.  Whatever we will come up with, it shall not be
-done overnight.
-
-Instead, right now I need to solve a more immediate problem.  For the last
-few months, I have been working on [IPWHL][ipwhl], a downstream repository for
-Python distributions.  Since we enforce that all dependencies to be satisfied,
-packages with too strict or too lax environment declaration will break
-the automated tests, even though effectively the declared packages integrate
-well with each other.  We came to a conclusion that adding information would
-improve the checks' accuracy and decided to use the [floating cheeses][ipwhl]
-as a testbed for [top-level environment specifiers][proposal].
-
-In short, we are going to merge the `METADATA` field `Requires-Python`
-with other environmental information to a single field `environment`.
-Since other information may or may not be present in the filename,
-during metadata integrity check we can only check if what declared
-is [included][inclusion] by what found from the original wheel.
-
-### Theory
-
-Ignoring `extra`, a Python environment can be represented as a 11-tuple of
-
-```python
-(python_version, python_full_version, os_name, sys_platform,
- platform_release, platform_system, platform_version,
- platform_machine, platform_python_implementation,
- implementation_name, implementation_version)
-```
-
-Denote $S$ as the set of all strings, then the set of all environments
-$E \subset S^{11}$.  Let
-\[O = \{\le, <, \ne, =, \ge, >, \cong, \equiv, \in, \notin, \ni, \notni\}\]
-the set of all marker expressions[^var] can be defined as
-\[X = \{e \mapsto e \bigcirc s \mid \bigcirc \in O \land s \in S\}\]
-and the set of all environment markers[^markers] can be recursively defined as
-\[\begin{aligned}
-  M = \{c_\top\} &\cup \{(e_1, \ldots, e_{11})
-                         \mapsto m(e_1, \ldots, e_{11}) \land x(e_{\imath})
-                         \mid m \in M \land x \in X \land \imath \in I\}\\
-                 &\cup \{(e_1, \ldots, e_{11})
-                         \mapsto m(e_1, \ldots, e_{11}) \lor x(e_{\imath})
-                         \mid m \in M \land x \in X \land \imath \in I\}
-\end{aligned}\]
-where $\top$ is the *top* element in Boolean algebra, $c_\top: E \to \{\top\}$
-is a constant function and $I = [1, 11]$.  With all Boolean expressions
-converted into one of their standard forms, we have the following alternative
-definitions of an environment marker
-\[\begin{aligned}
-  (e_1, \ldots, e_{11})
-  &\mapsto \bigvee_{\imath = 1}^n\bigwedge_{\jmath = 1}^{f(\imath)}
-  x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\\
-  (e_1, \ldots, e_{11})
-  &\mapsto \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)}
-  x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)
-\end{aligned}\]
-where $n \in \N$, $f: \N_1 \to \N_1$, $x: \N_1^2 \to X$, $k: \N_1^2 \to I$
-and an empty Boolean expression is implicitly $c_\top$.
-
-Now it is possible to formalize inclusion for markers as
-\[\forall\,\{m, n\} \subset M: (m \le n) = (\forall\,e \in E: m(e) \le n(e))\]
-
-Thus, checking if a marker includes another is equivalent to evaluating
-the following expression, for all $(e_1, \ldots, e_{11}) \in E$
-\[\begin{aligned}
-  r &= \bigvee_{\imath' = 1}^{n'}\bigwedge_{\jmath' = 1}^{f'(\imath')}
-       x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right)
-       \le \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)}
-       x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\\
-    &= \bigwedge_{\imath' = 1}^{n'}\left(\bigwedge_{\jmath' = 1}^{f'(\imath')}
-       x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right)
-       \le \bigwedge_{\imath = 1}^n\bigvee_{\jmath = 1}^{f(\imath)}
-       x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\right)\\
-    &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n
-       \left(\bigwedge_{\jmath' = 1}^{f'(\imath')}
-       x'(\imath', \jmath')\left(e_{k'(\imath', \jmath')}\right)
-       \le \bigvee_{\jmath = 1}^{f(\imath)}
-       x(\imath, \jmath)\left(e_{k(\imath, \jmath)}\right)\right)
-\end{aligned}\]
-where $r$ is whether the original marker (defined by $n$, $f$, $x$ and $k$)
-includes the declared one (defined by $n'$, $f'$, $x'$ and $k'$). Let
-\[\begin{aligned}
-  g(\imath, \ell) &= \{\jmath \mid \jmath \in [1, f(\imath)]
-                              \land k(\imath, \jmath) = \ell\}\\
-  g'(\imath, \ell) &= \{\jmath \mid \jmath \in [1, f'(\imath)]
-                               \land k'(\imath, \jmath) = \ell\}
-\end{aligned}\]
-and assume $e_1, \ldots, e_{11}$ are orthogonal, we have
-\[\begin{aligned}
-  r &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I}
-       \left(\bigwedge_{\jmath' \in g'(\imath', \ell)}
-       x'(\imath', \jmath')\left(e_{\ell}\right)
-       \le \bigvee_{\jmath \in g(\imath, \ell)}
-       x(\imath, \jmath)\left(e_{\ell}\right)\right)\\
-    &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I}
-       \left(\bigwedge_{\jmath' \in g'(\imath', \ell)}
-       x'(\imath', \jmath')\left(e_{\ell}\right)
-       \land \lnot \bigvee_{\jmath \in g(\imath, \ell)}
-       x(\imath, \jmath)\left(e_{\ell}\right) = \bot\right)
-\end{aligned}\]
-where $\bot$ is the Boolean algebra bottom element.  By [De Morgan's laws],
-\[\begin{aligned}
-  r &= \bigwedge_{\imath' = 1}^{n'}\bigwedge_{\imath = 1}^n\bigvee_{\ell \in I}
-       \left(\bigwedge_{\jmath' \in g'(\imath', \ell)}
-       x'(\imath', \jmath')\left(e_{\ell}\right)
-       \land \bigwedge_{\jmath \in g(\imath, \ell)}
-       \lnot x(\imath, \jmath)\left(e_{\ell}\right) = \bot\right)
-\end{aligned}\]
-
-Essentially, checking if
-\[\bigwedge_{\jmath' \in g'(\imath', \ell)}
-  x'(\imath', \jmath')\left(e_{\ell}\right)
-  \land \bigwedge_{\jmath \in g(\imath, \ell)}
-  \lnot x(\imath, \jmath)\left(e_{\ell}\right)\]
-is unsatisfiable for all $e_{\ell}$ is a [SAT problem] which we can write
-a solver for.  Indeed, the original expression could be formulated as a SAT,
-but it would be more complex due to the multiple dimensions of environments.
-
-## Implementation
-
-### Generation
-
-### Conversion
-
-### Verification
-
-[^bdists]: [Built distributions][bdist], as opposed to [sdist]
-[^glibc]: Bonus: try to sort `manylinux1`, `manylinux_2_24` and `manylinux2014`
-[^var]: Ignoring relations between two variables or two strings
-[^markers]: Or *environment specifiers* when [not marking anything][proposal]
-
-[praise]: https://nixnet.social/notice/A9GyFTnNnQDVZrmtUm
-[bdist]: https://packaging.python.org/glossary/#term-Built-Distribution
-[sdist]: https://packaging.python.org/glossary/#term-Source-Distribution-or-sdist
-[params]: https://github.com/pypa/wheel/blob/0.37.0/src/wheel/bdist_wheel.py#L136-L164
-[3719394]: https://hugovk.github.io/top-pypi-packages
-[PyPA]: https://pypa.io
-[discuss]: https://discuss.python.org/t/7831
-[Zig]: https://ziglang.org
-[std.Target]: https://ziglang.org/documentation/master/std/#std;Target
-[ipwhl]: https://man.sr.ht/~cnx/ipwhl
-[proposal]: https://lists.sr.ht/~cnx/ipwhl-discuss/%3CCEP1VQH719Y4.1EX12MJII0WVE%40nix%3E
-[inclusion]: https://en.wikipedia.org/wiki/Inclusion_(Boolean_algebra)
-[De Morgan's laws]: https://en.wikipedia.org/wiki/De_Morgan%27s_laws
-[SAT problem]: https://en.wikipedia.org/wiki/Boolean_satisfiability_problem