From 1ff1746272a97d9c58d2e6a8936592f90fd5cd47 Mon Sep 17 00:00:00 2001
From: Nguyễn Gia Phong
Date: Tue, 9 Mar 2021 15:36:38 +0700
Subject: Migrate GSoC 2020 check-ins

---
 blog/gsoc2020/checkin20200601.md |  45 ++++++++++++
 blog/gsoc2020/checkin20200615.md |  45 ++++++++++++
 blog/gsoc2020/checkin20200629.md |  44 ++++++++++++
 blog/gsoc2020/checkin20200713.md |  35 +++++++++
 blog/gsoc2020/checkin20200727.md |  37 ++++++++++
 blog/gsoc2020/checkin20200810.md |  33 +++++++++
 blog/gsoc2020/checkin20200824.md |  26 +++++++
 blog/gsoc2020/index.md           | 151 +++++++++++++++++++++++++++++++++++++++
 8 files changed, 416 insertions(+)
 create mode 100644 blog/gsoc2020/checkin20200601.md
 create mode 100644 blog/gsoc2020/checkin20200615.md
 create mode 100644 blog/gsoc2020/checkin20200629.md
 create mode 100644 blog/gsoc2020/checkin20200713.md
 create mode 100644 blog/gsoc2020/checkin20200727.md
 create mode 100644 blog/gsoc2020/checkin20200810.md
 create mode 100644 blog/gsoc2020/checkin20200824.md
 create mode 100644 blog/gsoc2020/index.md

diff --git a/blog/gsoc2020/checkin20200601.md b/blog/gsoc2020/checkin20200601.md
new file mode 100644
index 0000000..a362f28
--- /dev/null
+++ b/blog/gsoc2020/checkin20200601.md
@@ -0,0 +1,45 @@
++++
+rss = "GSoC 2020: First Check-In"
+date = Date(2020, 6, 1)
++++
+@def tags = ["pip", "gsoc"]
+
+# First Check-In
+
+Hi everyone, I am McSinyx, a Vietnamese undergraduate student
+who loves [free software][]. This summer I am working with
+the maintainers and the contributors of `pip` to make
+the package manager {{pip 825 "download in parallel"}}.
+
+## What did I do during the community bonding period?
+
+Aside from bonding with `pip`'s maintainers and contributors as well as
+with my mentors, I was also running experiments on the theoretical and technical
+obstacles blocking this GSoC project. Pradyun Gedam (a mentor of mine)
+suggested making [a proof of concept][] to determine if parallel downloading
+can play nicely with [ResolveLib][]'s abstraction, and we are reviewing it
+together. On the technical side, `pip`'s committers and I are exploring
+{{pip 8169 "available options for parallelization"}} and I made an attempt to
+{{pip 8320 "make use of Python's standard worker pool in a portable way"}}.
+
+## Did I get stuck anywhere?
+
+Yes, of course! Neither of the experiments above is finished at
+the moment. However, I am optimistic that the issues will not be
+real blockers and that we will figure them out in the next few days.
+
+## What is coming up next?
+
+As planned, this week I am going to refactor the package downloading code
+in `pip`. The main purpose is to decouple the networking code from
+the package preparation operation and make sure that it is thread-safe.
+
+In addition, I am continuing the experiments mentioned above to gain
+more confidence in the future of this GSoC project.
+
+To other GSoC students, mentors and admins reading this: I wish
+you all good health and successful projects this summer!
+
+[free software]: https://www.gnu.org/philosophy/free-sw.html
+[a proof of concept]: https://gist.github.com/McSinyx/513dbff71174fcc79f1cb600e09881af
+[ResolveLib]: https://pypi.org/project/resolvelib
diff --git a/blog/gsoc2020/checkin20200615.md b/blog/gsoc2020/checkin20200615.md
new file mode 100644
index 0000000..e59cac2
--- /dev/null
+++ b/blog/gsoc2020/checkin20200615.md
@@ -0,0 +1,45 @@
++++
+rss = "GSoC 2020: Second Check-In"
+date = Date(2020, 6, 15)
++++
+@def tags = ["pip", "gsoc"]
+
+# Second Check-In
+
+Hi everyone, and may the odds be ever in your favor, especially during this
+tough time!
+
+## What did I do last week?
+
+Not as much as I wished, apparently (-:
+
+* Finalizing {{pip 8411 "the refactoring patch"}}
+  of `operations.prepare.prepare_linked_requirement`
+* {{pip 8423 "Nitpicking some logging calls"}}. This (as well as the next one)
+  was to fill up the time when my brain was not as productive as I wanted it to be XD
+* {{pip 8435 "Beginning to migrate"}} from `%`- to `{}`-style logging.
+  The number of tests failing due to this was way beyond my imagination,
+  but I got functional tests for `pip install` and unit tests passing now!
+* {{pip 8442 "Mocking up a working partial wheel download during
+  dependency resolution"}} for [the new resolver][].
+
+## Did I get stuck anywhere?
+
+Yes, of course! {{pip 8320 "Parallel maps"}} are still stalling,
+as are the other small PRs listed above. The failures related to
+`logging` are still making me pull my hair out, and the proof of
+concept for partial wheel downloading is too ugly even for a PoC.
+I imagine that I will have a lot of clean-up to do this week (yay!).
+
+## What is coming up next?
+
+I'm trying to get the multi-{threading,processing} facilities merged ASAP
+to start rolling them out in practice. The first thing that comes to my
+mind is to bring back {{pip 7962 "the multi-threaded"}} `pip list -o`.
+
+The other experimental improvement (this phrase does not sound right!)
+I would like to get done is the partial wheel download. It would be
+really nice if I could get both included as `unstable-feature`s
+in {{pip 7628#issuecomment-636319539 "the upcoming beta release of pip 20.2"}}.
+
+[the new resolver]: http://www.ei8fdb.org/thoughts/2020/05/test-pips-alpha-resolver-and-help-us-document-dependency-conflicts/
diff --git a/blog/gsoc2020/checkin20200629.md b/blog/gsoc2020/checkin20200629.md
new file mode 100644
index 0000000..93699d1
--- /dev/null
+++ b/blog/gsoc2020/checkin20200629.md
@@ -0,0 +1,44 @@
++++
+rss = "GSoC 2020: Third Check-In"
+date = Date(2020, 6, 29)
++++
+@def tags = ["pip", "gsoc"]
+
+# Third Check-In
+
+Holla, holla, holla! The last seven days have not been a really productive week
+for me, though I think there are still some nice things to share with
+you all here! The good news is that I've finished my last lesson as a sophomore;
+the bad news is that I have a bunch of upcoming tests, mainly in the form
+of group projects and/or presentations (phew!). Enough about me,
+let's get back to `pip`:
+
+## What did I do last week?
+
+Not much, actually )-:
+
+* Write some tests for {{pip 8467 "the HTTP range mapping for wheel"}}.
+* {{pip 8504 "Try to bring back"}} multithreaded `pip list --outdated`
+  and `--uptodate`, as {{pip 8320 "the parallel"}} `map` was merged
+  earlier today.
+* Nitpick {{pip 8332}}
+  (yep, it's a new low for me to include this in the list (-:).
+
+## Did I get stuck anywhere?
+
+Not exactly, since I didn't do much d-; [Many of my PRs][] are stalling, though.
+On one hand, the maintainers of `pip` are all volunteers working in
+their free time; on the other hand, I don't think I have tried hard enough
+to get their attention on my PRs.
+
+## What is coming up next?
+
+I'll try my best to get the following merged upstream before
+{{pip 8206 "the upcoming beta release"}}:
+
+* Parallel networking for `pip list`: {{pip 8504}}
+* Lazy wheel for dependency information: {{pip 8467}}, {{pip 8411}}
+  (to determine if hashing is required) and {{pip 8467#issuecomment-648717032
+  "a new patch introducing this as an unstable feature"}}
+
+[Many of my PRs]: https://github.com/pulls?q=is:open+is:pr+author:McSinyx+repo:pypa/pip+sort:updated-desc
diff --git a/blog/gsoc2020/checkin20200713.md b/blog/gsoc2020/checkin20200713.md
new file mode 100644
index 0000000..417db58
--- /dev/null
+++ b/blog/gsoc2020/checkin20200713.md
@@ -0,0 +1,35 @@
++++
+rss = "GSoC 2020: Fourth Check-In"
+date = Date(2020, 7, 13)
++++
+@def tags = ["pip", "gsoc"]
+
+# Fourth Check-In
+
+Hello there! I'm taking the last exam of my second year tomorrow,
+but it [feels like summer][] already! I've been finalizing quite a few things
+to get them ready for pip 20.2b2.
+
+## What did I do last week?
+
+I've spent most of the time getting {{pip 8532 "the opt-in"}} for obtaining
+dependency information via lazy wheels ready. It will be available as
+`--use-feature=fast-deps` and only takes effect when
+`--use-feature=2020-resolver` is also present.
+
+While waiting for reviews and suggestions, I made some patches for
+internal clean-up, namely {{pip 8568}}, {{pip 8571}} and {{pip 8578}}.
+Some similar patches I made earlier were also merged last week:
+{{pip 8456}} and {{pip 8538}}.
+
+## Did I get stuck anywhere?
+
+Not really; everything went as expected.
+
+## What is coming up next?
+
+After {{pip 8532}}, I'll work on the parallel download of the postponed wheels.
+My main concern at the moment is how the download progress will be reported
+to the users, but I think I'll figure it out soon.
+
+[feels like summer]: https://www.youtube.com/watch?v=F1B9Fk_SgI0
diff --git a/blog/gsoc2020/checkin20200727.md b/blog/gsoc2020/checkin20200727.md
new file mode 100644
index 0000000..5e50f67
--- /dev/null
+++ b/blog/gsoc2020/checkin20200727.md
@@ -0,0 +1,37 @@
++++
+rss = "GSoC 2020: Fifth Check-In"
+date = Date(2020, 7, 27)
++++
+@def tags = ["pip", "gsoc"]
+
+# Fifth Check-In
+
+Hello, and I hope y'all are still doing well!
+
+## What did I do last week?
+
+I was not very productive last week; most of the following tickets are fillers
+to make use of the spare cycles I had while I was still trying to figure out
+how to implement the main work.
+
+* Finalize the `--use-feature=fast-deps` flag ({{pip 8588}})
+* Improve mocking of environment variables in the test suite ({{pip 8614}})
+* Finalize the fix for verbose/quiet options specified via
+  configuration files and environment variables ({{pip 8578}})
+* Clean up the resolver's internal API a tiny bit ({{pip 8629}})
+* Start working on separating the download of wheels
+  from dependency resolution ({{pip 8638}})
+
+## Did I get stuck anywhere?
+
+I'm struggling with refactoring the code to support separate download.
+`pip`'s codebase was not designed for this, and thus there are
+many execution paths and other details entangled around the relevant area.
+
+## What is coming up next?
+
+`pip` 20.2 is going to be released within the next few days with
+`--use-feature=fast-deps` included, and I'm mentally prepared to fix
+any undiscovered problems. At the same time, I will continue working
+on {{pip 8638}} and hopefully get it done soon enough to begin drafting
+download parallelization strategies, mostly regarding the UI.
diff --git a/blog/gsoc2020/checkin20200810.md b/blog/gsoc2020/checkin20200810.md
new file mode 100644
index 0000000..aea9d5a
--- /dev/null
+++ b/blog/gsoc2020/checkin20200810.md
@@ -0,0 +1,33 @@
++++
+rss = "GSoC 2020: Sixth Check-In"
+date = Date(2020, 8, 10)
++++
+@def tags = ["pip", "gsoc"]
+
+# Sixth Check-In
+
+Hello there!
+
+## What did I do last week?
+
+It has been quite a fun week for me, given the current state of
+development and the bugs newly discovered thanks to the pip 20.2 release:
+
+* Initiate a discussion with the maintainers of pip on isolating
+  networking code for late download in parallel ({{pip 8697}})
+* Discuss the UI of parallel download ({{pip 8698}})
+* Log debug information relating to lazy wheel decisions ({{pip 8710}})
+* Disable caching for range requests ({{pip 8716}})
+* Dedent late download logs ({{pip 8722}})
+* Add a hook for batch downloading (third attempt, I think) ({{pip 8737}})
+* Test hash checking for fast-deps ({{pip 8743}})
+
+## Did I get stuck anywhere?
+
+Not exactly; everything is going smoothly and I'm feeling awesome!
+
+## What is coming up next?
+
+I'll try to solve {{pip 8697}} and {{pip 8698}} within the next few days.
+I am optimistic that the parallel download prototype will be done
+within this week.
diff --git a/blog/gsoc2020/checkin20200824.md b/blog/gsoc2020/checkin20200824.md
new file mode 100644
index 0000000..b87a7fd
--- /dev/null
+++ b/blog/gsoc2020/checkin20200824.md
@@ -0,0 +1,26 @@
++++
+rss = "GSoC 2020: Final Check-In"
+date = Date(2020, 8, 24)
++++
+@def tags = ["pip", "gsoc"]
+
+# Final Check-In
+
+Hello there!
+
+## What did I do last week?
+
+Not much, but implementation-wise I seem to have finished my GSoC project:
+
+* Finish the implementation of wheels' parallel download ({{pip 8771}})
+* Help make `pip`'s CI green again ({{pip 8790}})
+* Reformat a few spots in the user guide ({{pip 8795}})
+
+## Did I get stuck anywhere?
+
+I got sick, but I am recovering now!
+
+## What is coming up next?
+
+I will try to spend the time I have left within the scope of GSoC
+to {{pip 8720 "improve cache usage of the fast-deps feature"}}.
diff --git a/blog/gsoc2020/index.md b/blog/gsoc2020/index.md
new file mode 100644
index 0000000..09f208b
--- /dev/null
+++ b/blog/gsoc2020/index.md
@@ -0,0 +1,151 @@
++++
+rss = "GSoC 2020 final report"
+date = Date(2020, 8, 31)
++++
+@def tags = ["fun", "pip", "gsoc"]
+
+# Google Summer of Code 2020
+
+In the summer of 2020, I worked with the contributors of `pip`, trying
+to improve the networking performance of the package manager. Admittedly, at
+the end of the [internship][] period, [the benchmark said otherwise][benchmark].
+Still, I really hope that the clean-up and minor fixes I happened to make
+to the codebase over the summer, in addition to the implementation of parallel
+utils and lazy wheel, will actually help the project.
+
+Personally, I learned a lot: not just about Python packaging and
+networking stuff, but also about how to work with others. I am really
+grateful to {{github pradyunsg}} (my mentor), {{github chrahunt}},
+{{github uranusjr}}, {{github pfmoore}}, {{github brainwane}},
+{{github sbidoul}}, {{github xavfernandez}}, {{github webknjaz}},
+{{github jaraco}}, {{github deveshks}}, {{github gutsytechster}},
+{{github dholth}}, {{github dstufft}}, {{github cosmicexplorer}}
+and {{github ofek}}. While this feels like a long shout-out list,
+it really isn't. These people are the maintainers and contributors of `pip`
+and/or other Python packaging projects, and more importantly, they have been
+more than helpful, encouraging and patient with me throughout all my activities,
+showing me the way when I was lost, correcting me when I was wrong, putting up with
+my carelessness and showing me support across different social media.
+
+To best serve the community, below I have tried my best to document
+what I have done, how I've done it and why I've done it over
+the last three months. At the time of writing, some work is still in progress,
+so this report also serves as a reference point for me and others to reason
+about decisions in relevant topics.
+
+\toc
+
+## The Main Story
+
+The storyline can be divided into the following four main acts.
+
+### Act One: Parallelization Utilities
+
+In this first act, I ensured the portability of the parallelization
+utilities for later use in the final act. Multithreading and multiprocessing
+`map` were made to fall back properly on platforms without full support.
+
+* {{pip 8320}}: Add utilities for parallelization (close {{pip 8169}})
+* {{pip 8538}}: Make `utils.parallel` tests tear down properly
+* {{pip 8504}}: Parallelize `pip list --outdated` and `--uptodate`
+  (using {{pip 8320}})
+
+### Act Two: Lazy Wheels
+
+As proposed by {{github cosmicexplorer}} in {{pip 7819}}, it is possible to only
+download a portion of a wheel to obtain metadata during dependency resolution.
+Not only would this reduce the total amount of data to be transmitted over
+the network in case the resolver needs to perform heavy backtracking, but it
+would also create a synchronization point at the end of the resolution process
+where parallel downloading can be applied to the needed wheels (some wheels
+only serve their metadata during dependency backtracking and are not needed
+by the users).
+
+* {{pip 8467}}: Add utility to lazily acquire wheel metadata over HTTP
+* {{pip 8584}}: Revise lazy wheel and its tests
+* {{pip 8681}}: Make range requests closer to chunk size (help {{pip 8670}})
+* {{pip 8716}} and {{pip 8730}}: Disable caching for range requests
+
+### Act Three: Late Downloading
+
+During this act, the main work was refactoring to integrate the *lazy wheel*
+into `pip`'s codebase and clearing the way for download parallelization.
+
+* {{pip 8411}}: Refactor `operations.prepare.prepare_linked_requirement`
+* {{pip 8629}}: Abstract away `AbstractDistribution`
+  in higher-level resolver code
+* {{pip 8442}}, {{pip 8532}} and {{pip 8588}} (later reworked by
+  {{github chrahunt}} in {{pip 8685}}): Use lazy wheel to obtain
+  dependency information for the new resolver
+* {{pip 8743}}: Test hash checking for `fast-deps`
+* {{pip 8804}}: Check download directory before making range requests
+
+### Act Four: Batch Downloading in Parallel
+
+The final act is mostly about the UI of the parallel download.
+My work revolved around how the progress should be displayed
+and how other relevant information should be reported to the users.
+
+* {{pip 8710}}: Revise method fetching metadata using lazy wheels
+* {{pip 8722}}: Dedent late download logs (fix {{pip 8721}})
+* {{pip 8737}}: Add a hook for batch downloading
+* {{pip 8771}}: Parallelize wheel download
+
+## The Side Quests
+
+
+In order to keep the wheel turning (no pun intended) and avoid wasting time
+waiting for the pull requests above to be reviewed, I decided to create
+even more PRs (as I am typing this, many of the patches listed below
+are nowhere near being merged).
+
+* {{pip 7878}}: Fail early when install path is not writable
+* {{pip 7928}}: Fix rst syntax in Getting Started guide
+* {{pip 7988}}: Fix tabulate col size in case of empty cell
+* {{pip 8137}}: Add subcommand alias mechanism
+* {{pip 8143}}: Make mypy happy with beta release automation
+* {{pip 8248}}: Fix typo and simplify ireq call
+* {{pip 8332}}: Add license requirement to `_vendor/README.rst`
+* {{pip 8423}}: Nitpick logging calls
+* {{pip 8435}}: Use str.format style in logging calls
+* {{pip 8456}}: Lint `src/pip/_vendor/README.rst`
+* {{pip 8568}}: Declare constants in configuration.py as such
+* {{pip 8571}}: Clean up `Configuration.unset_value` and nit `__init__`
+* {{pip 8578}}: Allow verbose/quiet level to be specified
+  via config files and environment variables
+* {{pip 8599}}: Replace tabs by spaces for consistency
+* {{pip 8614}}: Use `monkeypatch.setenv` to mock environment variables
+* {{pip 8674}}: Fix `tests/functional/test_install_check.py`,
+  when run with new resolver
+* {{pip 8692}}: Make assertion failure give better message
+* {{pip 8709}}: List downloaded distributions before exiting (fix {{pip 8696}})
+* {{pip 8759}}: Allow py2 deprecation warning from setuptools
+* {{pip 8766}}: Use the new resolver for test requirements
+* {{pip 8790}}: Mark tests using remote svn and hg as xfail
+* {{pip 8795}}: Reformat a few spots in user guide
+
+## The Plot Summary
+
+Every Monday throughout the Summer of Code, I summarized what I had done
+in the week before in the form of either a short blog post or an (even shorter)
+check-in. These write-ups often contain handfuls of popular culture references
+and were originally hosted on [Python GSoC][].
+
+* [{{fill title blog/gsoc2020/checkin20200601}}](/blog/gsoc2020/checkin20200601)
+* [{{fill title blog/gsoc2020/blog20200609}}](/blog/gsoc2020/blog20200609)
+* [{{fill title blog/gsoc2020/checkin20200615}}](/blog/gsoc2020/checkin20200615)
+* [{{fill title blog/gsoc2020/blog20200622}}](/blog/gsoc2020/blog20200622)
+* [{{fill title blog/gsoc2020/checkin20200629}}](/blog/gsoc2020/checkin20200629)
+* [{{fill title blog/gsoc2020/blog20200706}}](/blog/gsoc2020/blog20200706)
+* [{{fill title blog/gsoc2020/checkin20200713}}](/blog/gsoc2020/checkin20200713)
+* [{{fill title blog/gsoc2020/blog20200720}}](/blog/gsoc2020/blog20200720)
+* [{{fill title blog/gsoc2020/checkin20200727}}](/blog/gsoc2020/checkin20200727)
+* [{{fill title blog/gsoc2020/blog20200803}}](/blog/gsoc2020/blog20200803)
+* [{{fill title blog/gsoc2020/checkin20200810}}](/blog/gsoc2020/checkin20200810)
+* [{{fill title blog/gsoc2020/blog20200817}}](/blog/gsoc2020/blog20200817)
+* [{{fill title blog/gsoc2020/checkin20200824}}](/blog/gsoc2020/checkin20200824)
+* [{{fill title blog/gsoc2020/blog20200831}}](/blog/gsoc2020/blog20200831)
+
+[internship]: https://summerofcode.withgoogle.com/archive/2020/projects/6238594655584256
+[benchmark]: /blog/gsoc2020/blog20200831/#the_benchmark
+[Python GSoC]: https://blogs.python-gsoc.org/en/mcsinyxs-blog/
-- 
cgit 1.4.1