+++
rss = "Comments for Static Sites without JavaScript via Emails"
date = Date(2022, 1, 9)
tags = ["fun", "recipe", "net"]
+++

# Comments for Static Sites without JavaScript

> I'm open for criticism\
> But really, is it any room for criticism?

Recently, I've switched my [feed] reader from [Newsboat] to [Liferea].
The latter has a GUI and some extra features which make the experience
a lot more comfy.  For instance, custom enclosure handling lets me
finally migrate all of my YouTube subscriptions to [Atom] and *conveniently*
browse and watch videos using [mpv].  Image support also allows me
to directly view web comics.[^image]  One of them, [The Monster Under
the Bed][TMUTB],[^nsfw] does not embed the strips in its feed, but it
has comments.

Yes, [RSS] includes support for `<comments>`, and I was not aware of it
until [very recently][spark].  I suppose many others late to
the (web feed) party are not either.  Since the rise of static sites,
feeds have regained popularity, enough even for [Google to reconsider
its direction][android].  Compared to RSS or Atom, alternatives have
the following shortcomings:

* [Usenet] is generally obsolete to most people.
* [Mailing list] messages are immutable.
* Fora and social media are silos.[^silo]
* Social media are designed for ephemeral discussions.
* Instant messaging is awful for archival.

On the other hand, news feeds are commonly read-only: only a few readers
can render comments and even fewer are able to post one.  On the server side,
a dynamic server is needed to accept comments.  Traditionally, it's the same
as the system serving the website.  Although this works, it is significantly
more costly than a server dedicated to static sites, which scale a lot better.

[Hackers] have come up with multiple workarounds such as using [microblogging]
or [instant messaging][cactus] to add comments to their static sites,
but all require client-side code execution, which is an option for neither RSS
nor Atom.  Furthermore, [JavaScript hurts portability and performance][curlpit]
on the WWW, hence it should be avoided unless it is absolutely impossible
to implement a feature otherwise.  Commenting is not an exception.

Following is my adventure implementing a comment section for this very blog.
If you're also up to the task, I think you should view what I did
as an inspiration (rather than a reference) and not be afraid
to experiment until you are satisfied.

\toc

## Choosing Back-End

As mentioned earlier, static sites or not, there still needs to be
a dynamic component to accept incoming replies.  HTTP requests would be
the most portable since all netizens obviously have a web browser, but those
are what we're trying to replace here.  What else does everyone have nowadays?
Something so common that it can be used to identify people upon
service registrations?  Exactly, emails and phone numbers!

OK, Imma stop horsing around.  My back-end of choice would be emails.
It's global, it's cheap and federated.  Cellular services almost fit the bill,
except that it would cost an arm and a leg to comment around the web
every day via SMS, whose character limit does not facilitate thoughtful
discussions either.  As for fora, social media or instant messaging,
no platform has nearly as large a user base as email.

![HTML is often a trojan horse for JavaScript](/assets/html5-js.png)

It's not like any email would fit the comment section though.  Especially
not the HTML kind with a few hundred kilobytes of embedded CSS, JS
and non-content images.  From the security standpoint alone 'tis already
a no-go.  A light markup language like Markdown[^mime] would be much better.

One great thing about using a mature technology like email is that we have
all use cases covered.  Filtering, exporting and parsing emails work out of the box
regardless of one's provider, [MUA] and programming preferences.  I have
a SourceHut account with which I can create mailing lists on-demand
so I'm using it; however there's no reason exporting from your private inbox
is any more difficult, presuming you have set up [offline email].

!!! note "Tips and tricks"

    Speaking of SourceHut, exporting a mailing list archive is rather easy:
    one could either use the button on the web UI or download from the API.
    As the operation is not exactly cost-free, the former is protected
    by a [CSRF] token and the latter by [OAuth 2.0].  If you are a fellow
    [sr.ht] user, you can use [acurl] on the build service with the URL
    from the [GraphQL] `query { me { lists { results { name, archive } } } }`.

## Designing Data Flow

I promise, this sounds bigger than it really is, but first,
let's have a glance at how static generators work.  Typically,
templating happens at three stages:

1. Conversion of individual articles into HTML *content*
2. Inserting each article content in a page template
   to create a complete HTML document
3. Inserting multiple HTML contents into one RSS or Atom feed template

At completion, two kinds of output are generated: website and web feed.
Similarly, comments have to be rendered for both targets: an HTML
comment section for web browsing and a separate RSS feed for each article's
`<wfw:commentRss>`.[^wfw]  Therefore, injections should be done separately
at stages 2 and 3.  The overall process of static site generation
with email comments is illustrated as follows.

![Data transformation during generation process](/assets/formbox.svg)

For clarity, HTML and RSS input templates for comments and their parent page
and web feed are omitted.  The path to each *comment feed* output, which is
injected into the respective *web feed item*, is also not shown in the figure.

## Implementation

At the time of writing, this personal website of mine was generated
by [Julia] [Franklin], which was neither fast[^speed] nor [semantic],
but was the only one I knew supporting LaTeX prerendering out of the box.
Franklin is also rather [extendable] via Julia functions.

### Accepting Replies

Let's start with how each article can be programmatically and uniquely
identified.  By default in RSS, a [GUID][^guid] is the permanent URL
of the associated web page.  I am not exactly a creative person, so I mirrored
this idea, although I only used the difference between URLs, i.e. minus
the scheme, network location and trailing `index.html` (Franklin always
appends it to the target path of any source file that is neither `index.md`
nor `index.html`):

```julia
dir_url() = strip(dirname(locvar(:fd_url)), '/')
message_id() = "%3C$(dir_url())@cnx%3E"
```

For maximum portability, threading identification is used in emails'
`In-Reply-To` header, which expects a message ID, which must match
`<.+@.+>`.  Once again, to avoid having to think, I opted for
the path difference for the left hand side and my nickname `cnx`
for the right.  The `mailto` URI could then be constructed accordingly:

```julia
using Printf: @sprintf

function hfun_mailto_comment()
  @sprintf("mailto:%s?%s=%s&%s=Re: %s",
           "~cnx/site@lists.sr.ht",
           "In-Reply-To", message_id(),
           "Subject", locvar(:title))
end
```
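For comparison, here is a minimal Python sketch of the same link, with every header value percent-encoded rather than only the message ID.  The helper name `mailto_comment` and the sample path and title are made up for illustration; the site itself builds the link with the Julia function above.

```python
from urllib.parse import quote


def mailto_comment(list_address, path, title, domain='cnx'):
    """Build a mailto URI for replying to the article at the given path.

    Hypothetical helper: its name and parameters are assumptions
    made for this sketch.
    """
    headers = {'In-Reply-To': f'<{path}@{domain}>',
               'Subject': f'Re: {title}'}
    # Percent-encode each header value so that spaces, colons
    # and angle brackets survive inside the URI's query part.
    query = '&'.join(f'{key}={quote(value, safe="")}'
                     for key, value in headers.items())
    # The list address is inserted as-is, like in the Julia version.
    return f'mailto:{list_address}?{query}'


print(mailto_comment('~cnx/site@lists.sr.ht', 'blog/reply', 'Some Title'))
```

Encoding the subject too means a title with `&` or `?` in it cannot break the query string.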

The anchor was then added to the page foot:

```html
<a href="{{mailto_comment}}"
   title="Reply via email">{{author}}</a>
```

### Rendering Comments

This is when the fun begins.  Julia's standard library does not include
an email parser, and I doubt your favorite language does either,
unless it is named after a British comedy troupe.  Python is often described
as *batteries included*, or at least it used to be (seemingly the consensus among
current core devs has shifted towards [favoring third-party libraries][3rd]).

!!! note "Off-topic rambling"

    Standard library inclusion wasn't really the deal breaker here though.
    I still needed a Markdown engine and an HTML sanitizer (because Markdown
    can include HTML), and AFAICT no stdlib has them.  The real issue was
    with the lack of Julia packaging on most distributions (apart from Guix),
    and most certainly [not on NixOS], my current distro.  For the same reason
    the idea of rewriting Franklin in Python has been running in my head
    for a while now.  Python packaging is much more downstream-friendly
    and, unlike Julia, compilation overhead is almost non-existent.

On the other hand, it's trivial to pipe an external program's output to Julia,
e.g. ``readchomp(`echo foo bar`)`` would give you the string "foo bar".  Thus,
the to-be-written *comment generator* should take (the path to) a mail box,
the message ID of the article and a template, and write the result to stdout.
Argument parsing is, again, thankfully in Python's stdlib:

```python
from argparse import ArgumentParser
from pathlib import Path
from urllib.parse import unquote

parser = ArgumentParser()
parser.add_argument('mbox')
parser.add_argument('id', type=unquote)
parser.add_argument('template', type=Path)
args = parser.parse_args()
```

I then parsed the [mbox] into a mapping indexed by parent message IDs
as follows.  Message IDs in the archive are not percent-encoded, which is why
the one given on the command line had to be unquoted as well.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime
from mailbox import mbox

date = lambda m: parsedate_to_datetime(m['Date']).date()
archive = defaultdict(list)
for message in sorted(mbox(args.mbox), key=date):
    archive[message['In-Reply-To']].append(message)
```
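To see what the resulting mapping looks like, here is a small sketch that groups a few in-memory messages in the same way.  The helpers `make_reply` and `group_by_parent` and all the sample IDs and dates are assumptions made for illustration; any iterable of messages, such as an opened mbox, could be passed instead.

```python
from collections import defaultdict
from email.message import Message
from email.utils import parsedate_to_datetime


def make_reply(message_id, parent, date):
    """Create a sample reply; every value passed below is made up."""
    message = Message()
    message['Message-ID'] = message_id
    message['In-Reply-To'] = parent
    message['Date'] = date
    return message


def group_by_parent(messages):
    """Map each parent message ID to its replies, oldest day first."""
    day = lambda m: parsedate_to_datetime(m['Date']).date()
    archive = defaultdict(list)
    for message in sorted(messages, key=day):
        archive[message['In-Reply-To']].append(message)
    return archive


archive = group_by_parent([
    make_reply('<b@example.net>', '<blog/reply@cnx>',
               'Mon, 10 Jan 2022 08:00:00 +0000'),
    make_reply('<a@example.net>', '<blog/reply@cnx>',
               'Sun, 09 Jan 2022 20:00:00 +0000'),
    make_reply('<c@example.net>', '<a@example.net>',
               'Tue, 11 Jan 2022 09:30:00 +0000')])
for parent, replies in archive.items():
    print(parent, [reply['Message-ID'] for reply in replies])
```

Since the key is the calendar day and `sorted` is stable, messages sent on the same day keep their order in the mailbox.  Messages without an `In-Reply-To` header end up under the key `None`.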

As said earlier, arbitrary HTML content is not exactly suitable for comments.
However, it is undeniable that HTML emails have taken over the world
and compromises must be made: allowing `multipart/alternative` of both
`text/plain` and `text/html`.  It is not the only multipart type either:
attachments and cryptographic signatures are wrapped the same way.  Since we are only interested
in the plaintext part, it is actually easier done than said to extract it:

```python
from bleach import clean, linkify
from markdown import markdown

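def get_body(message):
    """Return the plaintext body rendered as sanitized HTML, if any."""
    if message.is_multipart():
        # Depth-first search for the first usable text/plain part
        for payload in map(get_body, message.get_payload()):
            if payload is not None: return payload
    elif message.get_content_type() == 'text/plain':
        # decode=True yields bytes, which must be decoded before rendering
        charset = message.get_content_charset('utf-8')
        body = message.get_payload(decode=True).decode(charset)
        return clean(linkify(markdown(body, output_format='html5')),
                     tags=..., protocols=...)
    return None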
```
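The recursion can be tried out without bleach or Markdown installed.  The following stdlib-only sketch builds a made-up `multipart/alternative` reply and pulls out its plaintext part; `get_plain_text` is a hypothetical stand-in for the function above, minus the rendering and sanitizing.

```python
from email.message import EmailMessage


def get_plain_text(message):
    """Return the first text/plain part as str, or None if there is none."""
    if message.is_multipart():
        for part in message.get_payload():
            text = get_plain_text(part)
            if text is not None:
                return text
    elif message.get_content_type() == 'text/plain':
        # Fall back to UTF-8 when the part declares no charset
        charset = message.get_content_charset('utf-8')
        return message.get_payload(decode=True).decode(charset)
    return None


# A made-up reply carrying both plaintext and HTML alternatives
reply = EmailMessage()
reply['Subject'] = 'Re: Some Title'
reply.set_content('Nice *article*!\n')
reply.add_alternative('<p>Nice <em>article</em>!</p>\n', subtype='html')

print(get_plain_text(reply))
```

An HTML-only email yields `None`, so such a comment can be skipped instead of rendered.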

Now all that's left is to render that body and relevant headers
as an HTML segment or an RSS item.  This is when we revisit the template.
Jinja is probably the most popular template engine in Python, thanks to Django and Flask,
but its complexity is rather unnecessary.  Instead, I went with the built-in
`str.format`.

![Double braces are brilliant, but I prefer single ones](/assets/format.jpg)

What are templates for, exactly?  Not the complete document, apparently,
because that would differ from article to article and increase the complexity
of injection.  Nor a single comment, as comments are threaded into trees
(or a forest) and their relationships can be useful.  We gotta [meet
in tha middle] and use recursive templates instead, e.g. for nested comments:

```html
<div class=comment>
  ...
  {children}
</div>
```

To render linear comments, such as for `<wfw:commentRss>`, simply move
the children out of the item as follows.

```xml
<item>
  ...
</item>
{children}
```
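Here is a minimal sketch of how one recursive function can serve both templates using `str.format`.  The `Comment` tuple, the two one-line templates and the sample thread are assumptions made for illustration, not what formbox actually uses.

```python
from collections import defaultdict
from html import escape
from typing import NamedTuple


class Comment(NamedTuple):
    """A stand-in for a parsed email (hypothetical, for illustration)."""
    message_id: str
    parent: str
    author: str
    body: str


# Children inside the element: a nested thread
NESTED = '<div class=comment><b>{author}</b> {body}{children}</div>'
# Children after the element: a flat list of items
LINEAR = ('<item><title>{author}</title>'
          '<description>{body}</description></item>{children}')


def render(archive, template, parent):
    """Render every reply to parent, recursing into the replies' replies."""
    return ''.join(
        template.format(author=escape(comment.author),
                        body=escape(comment.body),
                        children=render(archive, template,
                                        comment.message_id))
        for comment in archive.get(parent, ()))


comments = [Comment('<a@example.net>', '<blog/reply@cnx>', 'Alice', 'First!'),
            Comment('<b@example.net>', '<a@example.net>', 'Bob', 'Second.'),
            Comment('<c@example.net>', '<blog/reply@cnx>', 'Carol', 'Third.')]
archive = defaultdict(list)
for comment in comments:
    archive[comment.parent].append(comment)

print(render(archive, NESTED, '<blog/reply@cnx>'))
print(render(archive, LINEAR, '<blog/reply@cnx>'))
```

The only difference between the two outputs comes from where `{children}` sits in the template, so the rendering code does not need to know which kind of output it is producing.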

The remaining substitutions are mostly just extracted from the email's headers.
Another bit that needs some extra decisions, though, is the parameters
for the `mailto` URI to reply to each comment:

* `In-Reply-To` set to current `Message-Id`
* `Cc` set to current `Reply-To` (if exists) or `From`
* `Subject` is inherited, with `Re:` prepended if missing
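
The three rules above can be sketched as follows.  The function name `reply_to`, the choice to address the reply to the mailing list, and the sample message are assumptions made for illustration; see formbox for what is actually done.

```python
from email.message import Message
from urllib.parse import quote


def reply_to(message, list_address):
    """Build a mailto URI for replying to the given message."""
    subject = message['Subject'] or ''
    if not subject.lower().startswith('re:'):
        subject = f'Re: {subject}'
    headers = {'In-Reply-To': message['Message-ID'],
               # Prefer Reply-To and fall back to From
               'Cc': message['Reply-To'] or message['From'],
               'Subject': subject}
    query = '&'.join(f'{key}={quote(str(value), safe="")}'
                     for key, value in headers.items())
    return f'mailto:{list_address}?{query}'


# A made-up comment to reply to
comment = Message()
comment['Message-ID'] = '<a@example.net>'
comment['From'] = 'alice@example.net'
comment['Subject'] = 'Re: Some Title'
print(reply_to(comment, '~cnx/site@lists.sr.ht'))
```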

This is getting boring with a lot of trivial code, so I'll leave you
with a pointer to the completed script named [formbox] and move on
to more interesting stuff.

### Injecting Comments

Inserting HTML comment sections is pretty simple.  First I wrote a simple
Julia function `render_comments` calling `formbox` under the hood, then

```julia
hfun_comments_rendered() = render_comments("comment.html")
```

`comments_rendered` is then injected below the article.  For RSS,
it took two extra steps:

1. Insert `render_comments("comment.xml")` into the comment feed template
   `comments.xml` (notice they are two different templates) and write it
   next to the article's output `index.html`
2. Insert the path of the written comment feed to the `<wfw:commentRss>` tag
   in the article's feed item
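
In Python, these two steps could look roughly like the sketch below.  The site does this from Julia, so the function name `write_comment_feed`, the feed template and the example URL are all assumptions made for illustration.  The main feed would also need to declare `xmlns:wfw="http://wellformedweb.org/CommentAPI/"` for the tag to be valid.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from xml.sax.saxutils import escape

# Hypothetical comment feed template wrapping the rendered items
COMMENT_FEED = '''<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>{title}</title>
    <link>{link}</link>
    <description>{description}</description>
{items}
  </channel>
</rss>
'''


def write_comment_feed(directory, base_url, title, items):
    """Write comments.xml into directory and return the wfw:commentRss tag.

    The items are expected to be rendered (and escaped) RSS items already.
    """
    link = f'{base_url}/'
    feed = COMMENT_FEED.format(title=escape(f'Comments for {title}'),
                               link=escape(link),
                               description=escape(f'Replies to {title}'),
                               items=items)
    (Path(directory) / 'comments.xml').write_text(feed, encoding='utf-8')
    return f'<wfw:commentRss>{escape(link)}comments.xml</wfw:commentRss>'


with TemporaryDirectory() as directory:
    tag = write_comment_feed(directory, 'https://example.net/blog/reply',
                             'Some Title', '<item><title>Alice</title></item>')
    print((Path(directory) / 'comments.xml').read_text(encoding='utf-8'))
print(tag)
```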

That's it!

## Moderation

I don't want a *Terms of Service* page, it'd feel too corporate
for my *personal* website, so I will list the rules here:

1. Please be excellent to each other.  Disagreements are okay,
   personal insults are not.
2. Stay on topic.  If you want to discuss something else
   with me publicly, start a new thread on a [mailing list]
   or reach me via social media.
3. [Use plaintext emails] and do not top post.  Markdown inline markup,
   block quotes, lists and code blocks are supported.
4. Comments are implied to be under [CC BY-SA 4.0] unless declared otherwise.
5. I reserve the right to remove any comment I don't like.
   I generally don't delete comments, but if you want to exercise
   your freedom of speech, publish it yourself.
6. I do not warrant the availability of the comments either.
   I will try my best but one day all comments may just disappear,
   just like this website itself.  Archive what you deem important.
7. These rules are subject to change according to my personal liking
   without notice.

Replies will only be rendered on the website and feed after I see them,
so please expect a delay of at least 24 hours.  If you are eager to reply
to each other, subscribe to the [site's mailing list] instead.

[^image]: TBF there are image preview scripts in Newsboat's [contrib].
[^nsfw]: Content warning: occasionally NSFW
[^silo]: Federation is getting there for social media; not so much for fora.
[^mime]: But don't use [text/markdown] for your emails.
[^wfw]: Unfortunately there's no equivalent for Atom.
[^speed]: Over 30 seconds to generate a few hundred kB of web pages.
[^guid]: Not to be confused with the micro soft hijacked term for [UUID].

[feed]: https://en.wikipedia.org/wiki/Web_feed
[Newsboat]: https://newsboat.org
[Liferea]: https://lzone.de/liferea
[Atom]: https://en.wikipedia.org/wiki/Atom_(Web_standard)
[mpv]: https://mpv.io
[TMUTB]: https://themonsterunderthebed.net
[RSS]: https://www.rssboard.org/rss-specification
[spark]: https://nixnet.social/notice/AEO3fYbuzYCJl85eD2
[android]: https://www.theregister.com/2021/05/20/google_rss_chrome_android
[Mailing list]: https://en.wikipedia.org/wiki/Mailing_list
[Usenet]: https://en.wikipedia.org/wiki/Usenet
[Hackers]: https://en.wikipedia.org/wiki/Hacker
[microblogging]: https://carlschwan.eu/2020/12/29/adding-comments-to-your-static-blog-with-mastodon
[cactus]: https://cactus.chat
[curlpit]: https://unixsheikh.com/articles/so-called-modern-web-developers-are-the-culprits.html
[MUA]: https://en.wikipedia.org/wiki/Email_client
[offline email]: https://drewdevault.com/2021/05/17/aerc-with-mbsync-postfix.html
[CSRF]: https://en.wikipedia.org/wiki/Cross-site_request_forgery
[OAuth 2.0]: https://man.sr.ht/meta.sr.ht/oauth.md
[sr.ht]: https://sr.ht
[acurl]: https://man.sr.ht/builds.sr.ht/manifest.md#tasks
[GraphQL]: https://lists.sr.ht/graphql
[wfw]: https://web.archive.org/web/20050301040756/http://www.sellsbrothers.com/spout/#exposingRssComments
[Julia]: https://julialang.org
[Franklin]: https://franklinjl.org
[semantic]: https://github.com/tlienart/Franklin.jl/issues/936
[extendable]: https://franklinjl.org/syntax/utils
[GUID]: https://www.rssboard.org/rss-profile#element-channel-item-guid
[3rd]: https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068
[not on NixOS]: https://github.com/NixOS/nixpkgs/issues/20649
[mbox]: https://datatracker.ietf.org/doc/html/rfc4155
[meet in tha middle]: https://genius.com/Timbaland-meet-in-tha-middle-lyrics
[formbox]: https://sr.ht/~cnx/formbox
[Use plaintext emails]: https://useplaintext.email
[mailing list]: https://lists.sr.ht/~cnx/misc
[CC BY-SA 4.0]: https://creativecommons.org/licenses/by-sa/4.0
[site's mailing list]: https://lists.sr.ht/~cnx/site
[contrib]: https://drewdevault.com/2020/06/06/Add-a-contrib-directory.html
[text/markdown]: https://blog.brixit.nl/markdown-email
[UUID]: https://en.wikipedia.org/wiki/Universally_unique_identifier