From 2c085d53133fd267a809d0a4e2cbf9421ea2a2a8 Mon Sep 17 00:00:00 2001
From: Nguyễn Gia Phong
Date: Tue, 21 Sep 2021 17:02:17 +0700
Subject: Reorganize GSoC 2020

---
 blog/2020/gsoc/article/6.md | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 blog/2020/gsoc/article/6.md

(limited to 'blog/2020/gsoc/article/6.md')

diff --git a/blog/2020/gsoc/article/6.md b/blog/2020/gsoc/article/6.md
new file mode 100644
index 0000000..40caad5
--- /dev/null
+++ b/blog/2020/gsoc/article/6.md
@@ -0,0 +1,52 @@
++++
+rss = "GSoC 2020: Parallelizing Wheel Downloads"
+date = Date(2020, 8, 17)
++++
+@def tags = ["pip", "gsoc"]
+
+# Parallelizing Wheel Downloads
+
+> And now it's clear as this promise\
+> That we're making\
+> Two progress bars into one
+
+\toc
+
+Hello there! It has been raining a lot lately and some mosquito has given me
+dengue fever today. To whoever is reading this, I hope it never happens
+to you.
+
+Download Parallelization
+------------------------
+
+I've been working on `pip`'s download parallelization for quite a while now.
+Since a distribution download in `pip` is modeled as a lazily evaluated iterable
+of chunks, parallelizing the procedure is as simple as submitting routines
+that write files to disk to a worker pool.
+
+Or at least that is what I thought.
+
+Progress Reporting UI
+---------------------
+
+`pip` currently uses custom progress reporting classes, which were not
+designed to work with multithreaded code. At first, I wanted to try reusing
+these instead of defining a separate UI for multithreaded progress.
+As they use system signals for termination, the progress bars have to be
+running in the main thread. Or sort of.
+
+Since the progress bars are designed as iterators, I realized that we
+can call `next` on them. So I quickly threw in some queues and locks,
+and prototyped the first *working* {{pip 8771 "implementation of
+progress synchronization"}}.
+
+Performance Issues
+------------------
+
+Welp, I only said that it works; I didn't mention the performance,
+which is terrible. I am pretty sure that the slowdown comes from
+the synchronization, since the `map_multithread` call doesn't seem
+to trigger anything that may introduce any sort of blocking.
+
+This seems like a lot of fun, and I hope I'll get better tomorrow
+to continue playing with it!
--
cgit 1.4.1
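
As a rough illustration of the idea described in the post above, the sketch below has worker threads write chunks while the main thread alone advances a single iterator-style progress bar fed through a queue. It is not the code from the linked pip pull request: `fake_chunks`, `save`, `drain`, `progress` and `download_all` are hypothetical names, the "downloads" are in-memory byte strings, and the standard library's `ThreadPoolExecutor` stands in for whatever worker pool pip uses.

```python
import io
import queue
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # tiny chunk size, for illustration only


def fake_chunks(payload):
    """Lazily yield chunks, standing in for a streamed response body."""
    for i in range(0, len(payload), CHUNK):
        yield payload[i:i + CHUNK]


def save(chunks, sink, q):
    """Worker routine: write each chunk and report its size to the main thread."""
    try:
        for chunk in chunks:
            sink.write(chunk)
            q.put(len(chunk))
    finally:
        q.put(None)  # sentinel: this download is finished


def drain(q, n_workers):
    """Yield chunk sizes until every worker has sent its sentinel."""
    finished = 0
    while finished < n_workers:
        size = q.get()
        if size is None:
            finished += 1
        else:
            yield size


def progress(sizes, total):
    """Hypothetical stand-in for an iterator-style progress bar."""
    done = 0
    for size in sizes:
        done += size
        print(f"\r{done}/{total} bytes", end="", flush=True)
        yield done
    print()


def download_all(payloads):
    q = queue.Queue()
    # In-memory sinks stand in for files on disk.
    sinks = [io.BytesIO() for _ in payloads]
    total = sum(len(p) for p in payloads)
    with ThreadPoolExecutor(max_workers=4) as pool:
        for payload, sink in zip(payloads, sinks):
            pool.submit(save, fake_chunks(payload), sink, q)
        done = 0
        # Only the main thread ever calls next() on the bar.
        for done in progress(drain(q, len(payloads)), total):
            pass
    return done, [s.getvalue() for s in sinks]
```

Every chunk costs one queue round trip in this design, so with small chunks the main thread can easily become the bottleneck. That may be related to the slowdown the post mentions, though the sketch does not measure it.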