+++
title = "Outro"
rss = "GSoC 2020: Outro"
date = Date(2020, 8, 31)
tags = ["gsoc", "pip", "python"]
+++

> Steamed fish was amazing, matter of fact\
> Let me get some jerk chicken to go\
> Grabbed me one of them lemon pie theories\
> And let me get some of them benchmarks you theories too

\toc

## The Look

At the time of writing,
{{pip 8771 "implementation-wise parallel download is ready"}}:

[![asciicast](/assets/pip-8771.svg)](https://asciinema.org/a/356704)

Does this mean I've finished everything just in time?  This sounds too good
to be true!  And how does it perform?  Welp...

## The Benchmark

Here comes the bad news: with a decent connection to the package index,
using `fast-deps` does not make `pip` faster.  For a fair comparison,
I will time `pip download` in the following cases:

### Average Distribution

For convenience, let's refer to the commands used as follows:

```console
$ pip --no-cache-dir download {requirement}  # legacy-resolver
$ pip --use-feature=2020-resolver \
   --no-cache-dir download {requirement}  # 2020-resolver
$ pip --use-feature=2020-resolver --use-feature=fast-deps \
   --no-cache-dir download {requirement}  # fast-deps
```

In the first test, I used [axuy] and obtained the following results:

| legacy-resolver | 2020-resolver | fast-deps |
| --------------- | ------------- | --------- |
| 7.709s          | 7.888s        | 10.993s   |
| 7.068s          | 7.127s        | 11.103s   |
| 8.556s          | 6.972s        | 10.496s   |

Funnily enough, running `pip download` with `fast-deps` in a directory
where the files had already been downloaded still took around 7-8 seconds.
This is because to lazily download a wheel, `pip` has to
{{pip 8670 "make many requests"}}, which are apparently more expensive
than the actual data transmission on my network.
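To see why lazily fetching a wheel costs several round trips, here is a rough local simulation. It is *not* pip's implementation: `CountingRangeFile` and `make_wheel` are hypothetical helpers, an in-memory `io.BytesIO` stands in for the remote file, and every read that touches not-yet-fetched bytes is counted as one HTTP range request. Even reading just `METADATA` from a wheel (a zip archive) involves separate reads for the end-of-central-directory record, the central directory, and the member's local header and data.

```python
import io
import zipfile


class CountingRangeFile(io.BytesIO):
    """Hypothetical stand-in for a wheel on a remote server.

    Bytes are "downloaded" in fixed-size chunks; every read() that needs at
    least one chunk not fetched before is counted as one range request.
    """

    def __init__(self, data: bytes, chunk_size: int = 1024) -> None:
        super().__init__(data)
        self.chunk_size = chunk_size
        self.fetched: set[int] = set()  # indices of chunks already fetched
        self.requests = 0  # number of simulated range requests

    def read(self, size: int = -1) -> bytes:
        start = self.tell()
        data = super().read(size)
        if data:
            first = start // self.chunk_size
            last = (start + len(data) - 1) // self.chunk_size
            missing = set(range(first, last + 1)) - self.fetched
            if missing:
                # Simplification: one request per read, however many chunks miss.
                self.requests += 1
                self.fetched |= missing
        return data


def make_wheel() -> bytes:
    """Build a toy wheel: many modules first, the METADATA file last."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for i in range(50):
            zf.writestr(f"demo/module_{i}.py", "x = 1\n" * 500)
        zf.writestr(
            "demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\nName: demo\nVersion: 1.0\nRequires-Dist: six\n",
        )
    return buf.getvalue()


wheel = make_wheel()
remote = CountingRangeFile(wheel)
# zipfile seeks to the end for the central directory, then jumps to METADATA.
with zipfile.ZipFile(remote) as zf:
    metadata = zf.read("demo-1.0.dist-info/METADATA").decode()

fetched = min(len(remote.fetched) * remote.chunk_size, len(wheel))
print(f"{remote.requests} range requests, ~{fetched} of {len(wheel)} bytes fetched")
```

Only a small fraction of the bytes is transferred, but it takes multiple round trips to get them; on a fast, low-latency link that trade-off can easily lose to a single plain download, which is consistent with the timings above.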

!!! note "When is it useful then?"

    With an unstable connection to PyPI (for reasons I am not confident enough
    to state), this is what I got:
    
    | 2020-resolver | fast-deps |
    | ------------- | --------- |
    | 1m16.134s     | 0m54.894s |
    | 1m0.384s      | 0m40.753s |
    | 0m50.102s     | 0m41.988s |
    
    As the connection was *unstable*, and most `pip` networking happens in CI/CD
    with large and stable bandwidth, I am unsure what this result is supposed
    to tell (-;

### Large Distribution

In this test, I used [TensorFlow] as the requirement and obtained
the following figures:

| legacy-resolver | 2020-resolver | fast-deps |
| --------------- | ------------- | --------- |
| 0m52.135s       | 0m58.809s     | 1m5.649s  |
| 0m50.641s       | 1m14.896s     | 1m28.168s |
| 0m49.691s       | 1m5.633s      | 1m22.131s |

### Distribution with Conflicting Dependencies

For a requirement that triggers a decent amount of backtracking in
the current implementation of the new resolver, I used `oslo-utils==1.4.0`:

| 2020-resolver | fast-deps |
| ------------- | --------- |
| 14.497s       | 24.010s   |
| 17.680s       | 28.884s   |
| 16.541s       | 26.333s   |
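To put the three benchmark tables side by side, this short Python script averages each column and compares it against the 2020-resolver. The timings are copied from the tables above; `to_seconds` and `BENCHMARKS` are hypothetical names, and the parser only assumes the `XmY.ZZZs` / `Y.ZZZs` format the tables use.

```python
import re
from statistics import mean


def to_seconds(t: str) -> float:
    """Convert `time`-style durations such as '7.709s' or '1m5.649s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", t)
    if m is None:
        raise ValueError(f"unrecognised duration: {t!r}")
    minutes, seconds = m.groups()
    return int(minutes or 0) * 60 + float(seconds)


# Three runs per configuration, copied from the tables in this post.
BENCHMARKS = {
    "axuy": {
        "legacy-resolver": ["7.709s", "7.068s", "8.556s"],
        "2020-resolver": ["7.888s", "7.127s", "6.972s"],
        "fast-deps": ["10.993s", "11.103s", "10.496s"],
    },
    "TensorFlow": {
        "legacy-resolver": ["0m52.135s", "0m50.641s", "0m49.691s"],
        "2020-resolver": ["0m58.809s", "1m14.896s", "1m5.633s"],
        "fast-deps": ["1m5.649s", "1m28.168s", "1m22.131s"],
    },
    "oslo-utils==1.4.0": {
        "2020-resolver": ["14.497s", "17.680s", "16.541s"],
        "fast-deps": ["24.010s", "28.884s", "26.333s"],
    },
}

for requirement, runs in BENCHMARKS.items():
    means = {name: mean(map(to_seconds, ts)) for name, ts in runs.items()}
    baseline = means["2020-resolver"]
    for name, avg in means.items():
        print(f"{requirement:>18} {name:>16}: {avg:7.3f}s "
              f"({avg / baseline:.2f}x 2020-resolver)")
```

Averaged this way, `fast-deps` takes roughly 1.2x (TensorFlow), 1.5x (axuy) and 1.6x (`oslo-utils==1.4.0`) the time of the 2020-resolver alone.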

## What Now?

I don't know, to be honest.  At this point I feel I've failed my own
expectations (and those of other `pip` stakeholders) and wasted the time
and effort of `pip`'s maintainers in reviewing the dozens of PRs I've made
in the last three months.

On the bright side, this has been an opportunity for me to explore the codebase
of a package manager and discover various edge cases that the new resolver
has yet to cover (e.g. I've just noticed that `pip download` saves
to-be-discarded distributions; I'll file an issue on that soon).  Plus, I got
to know many new and cool people and ideas, which, I hope, will make me
a more helpful contributor to Python packaging in the future.

[TensorFlow]: https://www.tensorflow.org
[axuy]: https://sr.ht/~cnx/axuy