author Nguyễn Gia Phong <mcsinyx@disroot.org> 2021-09-21 17:02:17 +0700
committer Nguyễn Gia Phong <mcsinyx@disroot.org> 2021-09-21 17:02:17 +0700
commit 2c085d53133fd267a809d0a4e2cbf9421ea2a2a8 (patch)
tree a0ede5321105f8a92449d17bf0fcd999dac0a382 /blog/gsoc2020/blog20200622.md
parent 7d8ce2a7f598312e3501b53d34ff8146b4dba0a6 (diff)
download site-2c085d53133fd267a809d0a4e2cbf9421ea2a2a8.tar.gz
Reorganize GSoC 2020
Diffstat (limited to 'blog/gsoc2020/blog20200622.md')
-rw-r--r-- blog/gsoc2020/blog20200622.md 113
1 files changed, 0 insertions, 113 deletions
diff --git a/blog/gsoc2020/blog20200622.md b/blog/gsoc2020/blog20200622.md
deleted file mode 100644
index 3bb3a2c..0000000
--- a/blog/gsoc2020/blog20200622.md
+++ /dev/null
@@ -1,113 +0,0 @@
-+++
-rss = "GSoC 2020: The Wonderful Wizard of O'zip"
-date = Date(2020, 6, 22)
-+++
-@def tags = ["pip", "gsoc"]
-
-# The Wonderful Wizard of O'zip
-
-> Never give up... No one knows what's going to happen next.
-
-\toc
-
-## Preface
-
-Greetings and best wishes!  I had a lot of fun during the last week,
-although admittedly nothing was really finished.  In summary,
-this is the work I carried out in the last seven days:
-
-* Finalizing {{pip 8320 "utilities for parallelization"}}
-* {{pip 8467 "Continuing experimenting"}}
-  on {{pip 8442 "using lazy wheels for dependency resolution"}}
-* Polishing up {{pip 8411 "the patch"}} refactoring
-  `operations.prepare.prepare_linked_requirement`
-* Adding `flake8-logging-format`
-  {{pip 8423#issuecomment-645418725 "to the linter"}}
-* Splitting {{pip 8456 "the linting patch"}} from {{pip 8332 "the PR adding
-  the license requirement to vendor README"}}
-
-## The `multiprocessing[.dummy]` wrapper
-
-Yes, you read it right, this is the same section as last fortnight's blog.
-My mentor Pradyun Gedam gave me a green light to have {{pip 8411}} merged
-without support for Python 2 and the non-lazy map variant, which turns out
-to be troublesome for multithreading.
-
-The tests still need to pass, of course, and the flaky tests (see failing tests
-on Azure Pipelines in the past) really gave me a panic attack earlier today.
-We probably need to mark them as xfail or investigate why they are
-nondeterministic specifically on Azure, but the real reason I was *all caught up
-and confused* was that the unit tests I added mess with the cached imports,
-and as `pip`'s tests are run in parallel, who knows what they might affect.
-I was so relieved not to discover any new set of tests made flaky by the ones
-I'm trying to add!
-
-## The file-like object mapping ZIP over HTTP
-
-This is where the fun starts.  Before we dive in, let's recall some
-background information on this.  As discovered by Danny McClanahan
-in {{pip 7819}}, it is possible to download only a portion of a wheel
-and still have `pip` obtain the distribution's metadata from it.
-In the same thread, Daniel Holth suggested that one may use
-HTTP range requests to specifically ask for the tail of the wheel,
-where the ZIP's central directory record and, usually, `dist-info`
-(the directory containing `METADATA`) can be found.
-
-Well, *usually*.  While {{pep 427}} does indeed recommend
-
-> Archivers are encouraged to place the `.dist-info` files physically
-> at the end of the archive.  This enables some potentially interesting
-> ZIP tricks including the ability to amend the metadata without
-> rewriting the entire archive.
-
-one of the mentioned *tricks* is adding shared libraries to wheels
-of extension modules (using e.g. `auditwheel` or `delocate`).
-Thus for non-pure Python wheels, it is unlikely that the metadata
-lie in the last few megabytes.  Ignoring source distributions is bad enough;
-we can't afford to make an optimization that doesn't work for extension modules,
-which are still an integral part of the Python ecosystem )-:
-
-But hey, the ZIP's directory record is guaranteed to be at the end of the file!
-Couldn't we do something about that?  The short answer is yes.  The long answer
-is, well, yessssssss!  With that, plus some magic provided by most operating
-systems, this is what we figured out:
-
-1. We can download relatively small chunks from the end of the wheel
-   until what we have is recognizable as a valid ZIP file.
-2. In order for the end of the archive to actually appear as the end to
-   `zipfile`, we feed to it an object with `seek` and `read` defined.
-   As navigating to the rear of the file is performed by calling `seek`
-   with a relative offset and `whence=SEEK_END` (see `man 3 fseek`
-   for more details), we are completely able to make a wheel in the cloud
-   behave as if it were available locally.
-
-   ![Wheel in the cloud](/assets/cloud.gif)
-
-3. For large wheels, it is better to store them on disk instead of in memory.
-   For smaller ones, it is also preferable to store them as files to avoid
-   (error-prone and often not really efficient) manual tracking and joining
-   of downloaded segments.  We only use a small portion of the wheel; however,
-   just in case one is wondering, we have very little control over
-   when `tempfile.SpooledTemporaryFile` rolls over, so the memory-disk hybrid
-   is not exactly working as expected.
-4. With all these in mind, all we have to do is to define an intermediate object
-   that checks for local availability and downloads if needed on calls to `read`,
-   to lazily provide the data over HTTP and reduce execution time.
-
-The only theoretical challenge left is to keep track of downloaded intervals,
-which I finally figured out after some trial and error.  The code
-was submitted as a pull request to `pip` at {{pip 8467}}.  A more modern
-(read: Python 3-only) variant was packaged and uploaded to PyPI under
-the name of [lazip].  I am unaware of any use case for it outside of `pip`,
-but it's certainly fun to play with d-:
-
-## What's next?
-
-I have been falling short of getting the PRs mentioned above merged for
-quite a while.  With `pip`'s next beta coming really soon, I have to somehow
-make the patches reach a certain standard and attract enough attention to be
-part of the pre-release, as beta-testing would greatly help the success of the
-GSoC project.  To other GSoC students and mentors reading this, I also hope
-your projects turn out successful!
-
-[lazip]: https://pypi.org/project/lazip/
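
The four steps listed under "The file-like object mapping ZIP over HTTP" in the removed post can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request or from lazip: `FakeServer`, `LazyRemoteFile`, and all of their methods are hypothetical names, the server is simulated in memory rather than contacted with real HTTP range requests, and the total length is passed in directly where a real client would have to learn it from the server first. Downloaded intervals are tracked as two sorted lists of half-open bounds, which is only one of several possible ways to do it.

```python
import io
import tempfile
import zipfile
from bisect import bisect_left, bisect_right


class FakeServer:
    """Hypothetical stand-in for an HTTP server honouring Range requests."""

    def __init__(self, data):
        self.data, self.served = data, 0

    def get_range(self, start, stop):
        # A real client would send the header "Range: bytes=start-(stop-1)".
        self.served += stop - start
        return self.data[start:stop]


class LazyRemoteFile:
    """Seekable read-only file that downloads byte ranges on demand."""

    def __init__(self, server, length, chunk=4096):
        self._server, self._length, self._chunk = server, length, chunk
        self._pos = 0
        self._left, self._right = [], []  # cached intervals [left, right)
        self._cache = tempfile.TemporaryFile()
        self._cache.truncate(length)  # placeholder of the full size

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos,
                io.SEEK_END: self._length}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def read(self, size=-1):
        if self._pos >= self._length:
            return b""
        end = self._length if size < 0 else min(self._pos + size, self._length)
        # Fetch at least one chunk, extending backwards near the end of file.
        stop = min(max(end, self._pos + self._chunk), self._length)
        start = min(self._pos, max(0, stop - self._chunk))
        self._ensure(start, stop)
        self._cache.seek(self._pos)
        data = self._cache.read(end - self._pos)
        self._pos = end
        return data

    def _ensure(self, start, stop):
        """Download the parts of [start, stop) that are not cached yet."""
        i = bisect_left(self._right, start)  # first interval ending >= start
        j = bisect_right(self._left, stop)   # first interval starting > stop
        lefts, rights = self._left[i:j], self._right[i:j]
        cursor = start
        for left, right in zip(lefts, rights):
            if cursor < left:
                self._download(cursor, left)
            cursor = max(cursor, right)
        if cursor < stop:
            self._download(cursor, stop)
        # Replace the overlapped intervals with their union.
        self._left[i:j] = [min([start] + lefts[:1])]
        self._right[i:j] = [max([stop] + rights[-1:])]

    def _download(self, start, stop):
        self._cache.seek(start)
        self._cache.write(self._server.get_range(start, stop))


# Build a wheel-like archive: a large payload first, the metadata last.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("demo/blob.bin", bytes(1_000_000))
    archive.writestr("demo-1.0.dist-info/METADATA", "Name: demo\nVersion: 1.0\n")
wheel = buffer.getvalue()

server = FakeServer(wheel)
lazy = LazyRemoteFile(server, len(wheel))
with zipfile.ZipFile(lazy) as archive:
    metadata = archive.read("demo-1.0.dist-info/METADATA")
print(metadata.decode(), f"{server.served} of {len(wheel)} bytes fetched")
```

Because `zipfile` starts by seeking relative to the end of the file, only the tail of the simulated wheel has to be transferred here. If the metadata sat earlier in the archive, as the post warns can happen for extension modules, the same object would fetch the extra ranges on later `read` calls.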