author Nguyễn Gia Phong <mcsinyx@disroot.org> 2022-01-09 21:47:22 +0700
committer Nguyễn Gia Phong <mcsinyx@disroot.org> 2022-01-09 21:47:22 +0700
commit 9fd639eff5e47e8e15776f1974f0fcb9337b12f6
tree 908b0f776a2b6cccf6ff8ef3a6643d7a7d3cc39f /blog/reply.md
parent cb35d1b5811aac349fd4d09bc3c0d666bd7ebeae
Officially introduce commenting
Diffstat (limited to 'blog/reply.md')
-rw-r--r-- blog/reply.md 373
1 files changed, 373 insertions, 0 deletions
diff --git a/blog/reply.md b/blog/reply.md
new file mode 100644
index 0000000..609ab97
--- /dev/null
+++ b/blog/reply.md
@@ -0,0 +1,373 @@
++++
+rss = "Comments for Static Sites without JavaScript via Emails"
+date = Date(2022, 1, 9)
+tags = ["fun", "recipe"]
++++
+
+# Comments for Static Sites without JavaScript
+
+> I'm open to criticism\
+> But really, is there any room for criticism?
+
+Recently, I've switched my [feed] reader from [Newsboat] to [Liferea].
+The latter has a GUI and some extra features which make the experience
+a lot more comfy.  For instance, custom enclosure handling lets me finally
+migrate all of my YouTube subscriptions to [Atom] and *conveniently*
+browse and watch videos using [mpv].  Image support also allows me
+to directly view web comics.[^image]  One of them, [The Monster Under
+the Bed][TMUTB],[^nsfw] does not embed the strips in its feed, but it
+has comments.
+
+Yes, [RSS] includes support for `<comments>`, and I was not aware of it
+until [very recently][spark].  I suppose many other people late to
+the (web feed) party aren't either.  Since the rise of static sites,
+feeds have regained popularity, even prompting [Google to reconsider
+its direction][android].  Compared to RSS or Atom, alternatives have
+the following shortcomings:
+
+* [Usenet] is generally obsolete to most people.
+* [Mailing list] messages are immutable.
+* Fora and social media are silos.[^silo]
+* Social media are designed for ephemeral discussions.
+* Instant messaging is awful for archival.
+
+On the other hand, news feeds are commonly read-only: only a few readers
+can render comments and even fewer are able to post one.  On the server side,
+a dynamic server is needed to accept comments.  Traditionally, it is the same
+system that serves the website.  Although this works, it is significantly
+more costly than a server dedicated to static sites, which scale a lot better.
+
+[Hackers] have come up with multiple workarounds such as using [microblogging]
+or [instant messaging][cactus] to add comments to their static sites,
+but all require client-side code execution, which is an option for neither RSS
+nor Atom.  Furthermore, [JavaScript hurts portability and performance][curlpit]
+on the WWW, hence it should be avoided unless it is absolutely impossible
+to implement a feature otherwise.  Commenting is not an exception.
+
+Following is my adventure implementing a comment section for this very blog.
+If you're also up to the task, I think you should view what I did
+as an inspiration (rather than a reference) and not be afraid
+to experiment until you are satisfied.
+
+\toc
+
+## Choosing Back-End
+
+As mentioned earlier, static sites or not, there still needs to be
+a dynamic component to accept incoming replies.  HTTP requests would be
+the most portable since all netizens obviously have a web browser, but those
+are what we're trying to replace here.  What else does everyone have nowadays?
+Something so common that it can be used to identify people upon
+service registrations?  Exactly, emails and phone numbers!
+
+OK, Imma stop horsing around.  My back-end of choice would be emails.
+It's global, cheap and federated.  Cellular services almost fit the bill,
+except that they would cost an arm and a leg for one to comment around the web
+every day via SMS, whose character limit does not facilitate thoughtful
+discussions either.  As for fora, social media or instant messaging,
+no platform has nearly as large a user base as electronic mail.
+
+![HTML is often a trojan horse for JavaScript](/assets/html5-js.png)
+
+Not just any email would fit the comment section, though, especially
+not the HTML kind with a few hundred kilobytes of embedded CSS, JS
+and non-content images.  From the security standpoint alone 'tis already
+a no-go.  A lightweight markup language like Markdown[^mime] would be much better.
+
+One great thing about using a mature technology like email is that we have
+all use cases covered.  Filtering, exporting and parsing emails work out of the box
+regardless of one's provider, [MUA] and programming preferences.  I have
+a SourceHut account with which I can create mailing lists on demand,
+so I'm using it; however, there's no reason exporting from your private inbox
+should be any more difficult, presuming you have set up [offline email].
+
+!!! note "Tips and tricks"
+
+    Speaking of SourceHut, exporting a mailing list archive is rather easy:
+    one could either use the button on the web UI or download from the API.
+    As the operation is not exactly cost-free, the former is protected
+    by a [CSRF] token and the latter by [OAuth 2.0].  If you are a fellow
+    [sr.ht] user, you can use [acurl] on the build service with the URL
+    from the [GraphQL] `query { me { lists { results { name, archive } } } }`.
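+
+For the API route, here is a rough Python sketch that builds (but does not send) the request for that query using only the standard library.  It assumes the GraphQL endpoint is `https://lists.sr.ht/query` and that an OAuth 2.0 bearer token has already been obtained; neither detail comes from the text above, so treat both as assumptions to verify.
+
+```python
+import json
+from urllib.request import Request
+
+# Assumed endpoint; the GraphQL playground linked above lives at /graphql.
+ENDPOINT = 'https://lists.sr.ht/query'
+QUERY = 'query { me { lists { results { name, archive } } } }'
+
+
+def archive_request(token):
+    """Build a POST request asking for every list's name and archive URL."""
+    body = json.dumps({'query': QUERY}).encode()
+    return Request(ENDPOINT, data=body, method='POST',
+                   headers={'Authorization': f'Bearer {token}',
+                            'Content-Type': 'application/json'})
+```
+
+Passing the result to `urllib.request.urlopen` should then return JSON whose `data.me.lists.results` entries carry the archive URLs, which presumably need the same token to download.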
+
+## Designing Data Flow
+
+I promise this sounds bigger than it really is.  First,
+let's have a glance at how static site generators work.  Typically,
+templating happens at three stages:
+
+1. Conversion of individual articles into HTML *content*
+2. Inserting each article's content into a page template
+   to create a complete HTML document
+3. Inserting the contents of multiple articles into one RSS or Atom feed template
+
+At completion, two kinds of output are generated: the website and the web feed.
+Similarly, comments have to be rendered for both targets: an HTML
+comment section for web browsing and a separate RSS feed for each article's
+[`<wfw:commentRss>`][wfw].[^wfw]  Therefore, injections should be done separately
+at stages 2 and 3.  The overall process of static site generation
+with email comments is illustrated as follows.
+
+![Data transformation during generation process](/assets/formbox.svg)
+
+For clarity, HTML and RSS input templates for comments and their parent page
+and web feed are omitted.  The path to each *comment feed* output, which is
+injected into the respective *web feed item*, is not shown in the figure either.
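+
+To make the two injection points concrete, below is a toy model of stages 2 and 3 with Python's `str.format`, assuming stage 1 and comment rendering are already done.  The templates, field names and the `generate` helper are made up for illustration; Franklin's real templates and internals differ.
+
+```python
+# Stage 2 template: one complete HTML document per article.
+PAGE = '<html><body>{content}<section>{comments}</section></body></html>'
+# Stage 3 template: one item per article in the site-wide feed.
+ITEM = ('<item><title>{title}</title>'
+        '<wfw:commentRss>{comment_feed}</wfw:commentRss></item>')
+
+
+def generate(articles):
+    """Return per-article pages and the concatenated feed items.
+
+    Each article is a dict with its stage-1 HTML, its pre-rendered
+    comment section and the path to its comment feed.
+    """
+    pages = {a['path']: PAGE.format(content=a['html'],
+                                    comments=a['comments_html'])
+             for a in articles}
+    items = ''.join(ITEM.format(title=a['title'],
+                                comment_feed=a['comment_feed'])
+                    for a in articles)
+    return pages, items
+```
+
+The comments enter twice, once per output, which is exactly why the figure shows two separate injections.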
+
+## Implementation
+
+At the time of writing, this personal website of mine was generated
+by [Julia] [Franklin], which was neither fast[^speed] nor [semantic],
+but was the only generator I knew of that supported LaTeX prerendering
+out of the box.  Franklin is also rather [extensible][extendable] via Julia functions.
+
+### Accepting Replies
+
+Let's start with how each article can be programmatically and uniquely
+identified.  By default in RSS, a [GUID][GUID][^guid] is the permanent URL
+of the associated web page.  I am not exactly a creative person, so I mirrored
+this idea, although I only used the relative path, i.e. the URL minus
+the scheme, network location and trailing `index.html` (Franklin always
+appends it to the target path of any source file that is neither `index.md`
+nor `index.html`):
+
+```julia
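+# e.g. "/blog/reply/index.html" becomes "blog/reply"
+dir_url() = strip(dirname(locvar(:fd_url)), '/')
+# "<blog/reply@cnx>" with the angle brackets percent-encoded
+message_id() = "%3C$(dir_url())@cnx%3E"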
+```
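+
+For readers less fluent in Julia, the same derivation can be sketched in Python.  Instead of Franklin's `fd_url`, this hypothetical version takes a full page URL (the `example.org` address is made up), and it percent-encodes the angle brackets that the Julia one-liner spells out by hand as `%3C` and `%3E`.
+
+```python
+from posixpath import dirname
+from urllib.parse import quote, urlsplit
+
+
+def dir_url(url):
+    """Strip scheme, host, trailing index.html and surrounding slashes."""
+    return dirname(urlsplit(url).path).strip('/')
+
+
+def message_id(url, nick='cnx'):
+    """Percent-encode '<path@nick>', leaving '/' and '@' untouched."""
+    return quote(f'<{dir_url(url)}@{nick}>', safe='/@')
+
+
+print(message_id('https://example.org/blog/reply/index.html'))
+# %3Cblog/reply@cnx%3E
+```
+
+With `quote`'s default `safe='/'`, the `@` would also become `%40`, which decodes to the same message ID.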
+
+For maximum portability, threads are identified via the emails'
+`In-Reply-To` header, which expects a message ID matching `<.+@.+>`.
+Once again, to avoid having to think, I opted for the relative path
+for the left-hand side and my nickname `cnx` for the right.
+The `mailto` URI could then be constructed accordingly:
+
+```julia
+using Printf: @sprintf
+
+function hfun_mailto_comment()
+  @sprintf("mailto:%s?%s=%s&%s=Re: %s",
+           "~cnx/site@lists.sr.ht",
+           "In-Reply-To", message_id(),
+           "Subject", locvar(:title))
+end
+```
+
+The anchor was then added to the page footer:
+
+```html
+<a href="{{mailto_comment}}"
+   title="Reply via email">{{author}}</a>
+```
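+
+The Julia `hfun_mailto_comment` above pastes the title into the URI verbatim, so a title containing `&` or `#` would likely cut the subject short.  A sketch of a more defensive variant, in Python for brevity, percent-encodes every header value with the standard `urlencode` and `quote`; `quote` is chosen over the default `quote_plus` because `mailto` URIs want `%20` for spaces, not `+`.  This hypothetical `mailto_comment` takes the raw, not yet encoded, message ID.
+
+```python
+from urllib.parse import quote, urlencode
+
+
+def mailto_comment(address, message_id, title):
+    """Return a mailto URI replying to the article with the raw message_id."""
+    query = urlencode({'In-Reply-To': message_id, 'Subject': f'Re: {title}'},
+                      quote_via=quote)  # %20 for spaces instead of '+'
+    return f'mailto:{address}?{query}'
+
+
+print(mailto_comment('~cnx/site@lists.sr.ht', '<blog/reply@cnx>', 'Q&A'))
+# mailto:~cnx/site@lists.sr.ht?In-Reply-To=%3Cblog%2Freply%40cnx%3E&Subject=Re%3A%20Q%26A
+```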
+
+### Rendering Comments
+
+This is when the fun begins.  Julia's standard library does not include
+an email parser, and I doubt your favorite language does either,
+unless it is named after a British comedy troupe.  Python is often described
+as *batteries included*, or at least it used to be (seemingly the consensus among
+current core devs has shifted towards [favoring third-party libraries][3rd]).
+
+!!! note "Off-topic rambling"
+
+    Standard library inclusion wasn't really the deal breaker here, though.
+    I still needed a Markdown engine and an HTML sanitizer (because Markdown
+    can include HTML), and AFAICT no stdlib has them.  The real issue was
+    the lack of Julia packaging on most distributions (apart from Guix),
+    and most certainly [not on NixOS], my current distro.  For the same reason,
+    the idea of rewriting Franklin in Python has been on my mind
+    for a while now.  Python packaging is much more downstream-friendly
+    and, unlike Julia's, its compilation overhead is almost non-existent.
+
+On the other hand, it's trivial to pipe an external program's output to Julia,
+e.g. ``readchomp(`echo foo bar`)`` would give you the string "foo bar".  Thus,
+the to-be-written *comment generator* should take (the path to) a mailbox,
+the message ID of the article and a template, and write the result to stdout.
+Argument parsing is, again, thankfully in Python's stdlib:
+
+```python
+from argparse import ArgumentParser
+from pathlib import Path
+from urllib.parse import unquote
+
+parser = ArgumentParser()
+parser.add_argument('mbox')  # path to the mailbox archive
+parser.add_argument('id', type=unquote)  # article's percent-encoded message ID
+parser.add_argument('template', type=Path)  # HTML or RSS template file
+args = parser.parse_args()
+```
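+
+Since the article's message ID arrives percent-encoded (as in the `mailto` URI), `type=unquote` turns it back into the raw form found in email headers.  For instance, feeding the parser an explicit argument list with a made-up mailbox file name:
+
+```python
+from argparse import ArgumentParser
+from pathlib import Path
+from urllib.parse import unquote
+
+parser = ArgumentParser()
+parser.add_argument('mbox')
+parser.add_argument('id', type=unquote)
+parser.add_argument('template', type=Path)
+
+# An explicit list instead of sys.argv, for demonstration only.
+args = parser.parse_args(['site.mbox', '%3Cblog/reply@cnx%3E', 'comment.html'])
+print(args.id)        # <blog/reply@cnx>
+print(args.template)  # comment.html
+```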
+
+I then parsed the [mbox] into a mapping indexed by parent message IDs
+as follows.  The IDs in the headers are not percent-encoded, which is why
+the input message ID had to be unquoted first to match them.
+
+```python
+from collections import defaultdict
+from email.utils import parsedate_to_datetime
+from mailbox import mbox
+
+date = lambda m: parsedate_to_datetime(m['Date']).date()
+archive = defaultdict(list)
+for message in sorted(mbox(args.mbox), key=date):
+    archive[message['In-Reply-To']].append(message)
+```
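+
+Because every reply points to its parent through `In-Reply-To`, top-level comments end up under the article's own message ID, and each comment's replies under that comment's `Message-Id`.  A self-contained sketch of the same grouping, with a few made-up messages parsed from strings in place of a real mbox, plus a small recursive walk over the result:
+
+```python
+from collections import defaultdict
+from email import message_from_string
+from email.utils import parsedate_to_datetime
+
+# Made-up replies, deliberately out of chronological order.
+RAW = ['Message-Id: <b@example.org>\nIn-Reply-To: <a@example.org>\n'
+       'Date: Mon, 10 Jan 2022 09:00:00 +0000\n\nMe too!',
+       'Message-Id: <a@example.org>\nIn-Reply-To: <blog/reply@cnx>\n'
+       'Date: Sun, 09 Jan 2022 22:00:00 +0700\n\nNice post!']
+
+
+def date(message):
+    return parsedate_to_datetime(message['Date']).date()
+
+
+archive = defaultdict(list)
+for message in sorted(map(message_from_string, RAW), key=date):
+    archive[message['In-Reply-To']].append(message)
+
+
+def walk(parent_id, depth=0):
+    """Print the thread below parent_id, indenting nested replies."""
+    for message in archive[parent_id]:
+        print('  ' * depth + message.get_payload())
+        walk(message['Message-Id'], depth + 1)
+
+
+walk('<blog/reply@cnx>')
+# Nice post!
+#   Me too!
+```
+
+Since `sorted` is stable, replies sent on the same day keep their mbox order.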
+
+As said earlier, arbitrary HTML content is not exactly suitable for comments.
+However, it is undeniable that HTML emails have taken over the world
+and compromises must be made: allowing `multipart/alternative` of both
+`text/plain` and `text/html`.  It is not the only kind of multipart message
+either: attachments and cryptographic signatures come as multiparts too.
+Since we are only interested in the plaintext part, it is actually easier
+done than said to extract it:
+
+```python
+from bleach import clean, linkify
+from markdown import markdown
+
+def get_body(message):
+    if message.is_multipart():
+        for payload in map(get_body, message.get_payload()):
+            if payload is not None: return payload
+    elif message.get_content_type() == 'text/plain':
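+        charset = message.get_content_charset('utf-8')
+        body = message.get_payload(decode=True).decode(charset)
+        # tags and protocols: allow-lists of HTML tags and URL schemes (elided)
+        return clean(linkify(markdown(body, output_format='html5')),
+                     tags=..., protocols=...)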
+    return None
+```
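+
+To watch the traversal pick the right part without installing Python-Markdown or Bleach, here is a stdlib-only variant that swaps both for `html.escape`.  It therefore only shows *which* part gets chosen, not the real rendering.  The test message is built with `EmailMessage` and carries an HTML alternative that should be skipped:
+
+```python
+from email.message import EmailMessage
+from html import escape
+
+
+def get_body(message):
+    """Return the first text/plain part, HTML-escaped, or None."""
+    if message.is_multipart():
+        for payload in map(get_body, message.get_payload()):
+            if payload is not None:
+                return payload
+    elif message.get_content_type() == 'text/plain':
+        charset = message.get_content_charset('utf-8')
+        return escape(message.get_payload(decode=True).decode(charset))
+    return None
+
+
+message = EmailMessage()
+message.set_content('I <3 plain text')
+message.add_alternative('<p>I &lt;3 <b>HTML</b></p>', subtype='html')
+print(message.get_content_type())   # multipart/alternative
+print(get_body(message).strip())    # I &lt;3 plain text
+```
+
+An HTML-only email yields `None`, so it would simply not be rendered.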
+
+Now all that's left is to render that body and relevant headers
+as an HTML segment or an RSS item.  This is when we revisit the template.
+Jinja is probably the most popular in Python, thanks to Flask and its Django-inspired
+syntax, but its complexity is rather unnecessary here.  Instead, I went with the built-in
+`str.format`.
+
+![Double braces are brilliant, but I prefer single ones](/assets/format.jpg)
+
+What are templates for, exactly?  Not the complete document, apparently,
+because that would differ from article to article and complicate
+the injection.  Nor a single comment, as comments are threaded into trees
+(or a forest) and their relationship can be useful.  We gotta [meet
+in tha middle] and use recursive templates instead, e.g. for nested comments:
+
+```html
+<div class=comment>
+  ...
+  {children}
+</div>
+```
+
+To render linear comments, such as for `<wfw:commentRss>`, simply move
+the children out of the item as follows.
+
+```xml
+<item>
+  ...
+</item>
+{children}
+```
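+
+A minimal sketch of how such recursive templates could be filled with `str.format`.  Only the `{children}` slot comes from the templates above; the `{author}` and `{body}` fields, the dict-based thread and the `render` name are hypothetical stand-ins.  The same function yields nested HTML or a flat feed depending solely on where the template puts `{children}`:
+
+```python
+# Templates in the spirit of the two above; {author} and {body} are made up.
+HTML = '<div class=comment>{author}: {body}{children}</div>'
+XML = '<item>{author}: {body}</item>{children}'
+
+
+def render(archive, parent_id, template):
+    """Render replies to parent_id, filling {children} with their replies."""
+    return ''.join(template.format(author=c['author'], body=c['body'],
+                                   children=render(archive, c['id'], template))
+                   for c in archive.get(parent_id, []))
+
+
+# Hypothetical thread: one top-level comment with one reply.
+archive = {'<blog/reply@cnx>': [{'id': '<a@x>', 'author': 'Alice', 'body': 'Hi'}],
+           '<a@x>': [{'id': '<b@x>', 'author': 'Bob', 'body': 'Hey'}]}
+print(render(archive, '<blog/reply@cnx>', HTML))
+# <div class=comment>Alice: Hi<div class=comment>Bob: Hey</div></div>
+print(render(archive, '<blog/reply@cnx>', XML))
+# <item>Alice: Hi</item><item>Bob: Hey</item>
+```
+
+Since `str.format` does not re-parse substituted values, braces inside a comment body cannot break the template.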
+
+The remaining substitutions are mostly extracted from the email's headers.
+Another bit that needs some extra decisions, though, is the set of parameters
+for the `mailto` URI to reply to each comment:
+
+* `In-Reply-To` set to the current `Message-Id`
+* `Cc` set to the current `Reply-To` (if it exists) or `From`
+* `Subject` is inherited, with `Re:` prepended if missing
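+
+The three rules above could look roughly like this for a message parsed with the standard `email` package.  The `reply_params` name and the case-insensitive `Re:` check are my own assumptions, not necessarily what formbox does:
+
+```python
+from email import message_from_string
+
+
+def reply_params(message):
+    """Header fields for a mailto URI replying to the given comment."""
+    subject = message['Subject'] or ''
+    if not subject.lower().startswith('re:'):
+        subject = f'Re: {subject}'
+    return {'In-Reply-To': message['Message-Id'],
+            # Missing headers are None, so this falls back to From.
+            'Cc': message['Reply-To'] or message['From'],
+            'Subject': subject}
+
+
+comment = message_from_string('From: Alice <alice@example.org>\n'
+                              'Message-Id: <1@example.org>\n'
+                              'Subject: Comments for Static Sites\n\n'
+                              'Nice post!\n')
+print(reply_params(comment))
+```
+
+These values would then still need percent-encoding before going into the comment's `mailto` URI.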
+
+This is getting boring with a lot of trivial code, so I'll leave you
+with a pointer to the completed script named [formbox] and move on
+to more interesting stuff.
+
+### Injecting Comments
+
+Inserting HTML comment sections is pretty simple.  First I wrote a small
+Julia function `render_comments` calling `formbox` under the hood, then defined:
+
+```julia
+hfun_comments_rendered() = render_comments("comment.html")
+```
+
+`comments_rendered` is then injected below the article.  For RSS,
+it took a couple of extra steps:
+
+1. Insert `render_comments("comment.xml")` into the comment feed template
+   `comments.xml` (note that these are two different templates) and write
+   the result next to the article's output `index.html`
+2. Insert the path of the written comment feed into the `<wfw:commentRss>` tag
+   in the article's feed item
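+
+Those two steps can be sketched in Python as follows.  The layout follows the description (the comment feed sits next to `index.html`), while `write_comment_feed`, `feed_item`, the `{items}` placeholder and the absolute `base_url` are illustrative assumptions; the site itself does this from Julia.
+
+```python
+from pathlib import Path
+from tempfile import TemporaryDirectory
+
+ITEM = ('<item><title>{title}</title>'
+        '<wfw:commentRss>{comment_rss}</wfw:commentRss></item>')
+
+
+def write_comment_feed(site_dir, article_dir, feed_template, items, base_url):
+    """Step 1: fill the comment feed template, write it as comments.xml
+    next to the article's index.html and return the feed's URL."""
+    output = Path(site_dir, article_dir, 'comments.xml')
+    output.parent.mkdir(parents=True, exist_ok=True)
+    output.write_text(feed_template.format(items=items))
+    return f'{base_url}/{article_dir}/comments.xml'
+
+
+def feed_item(title, comment_rss):
+    """Step 2: point the article's feed item to its comment feed."""
+    return ITEM.format(title=title, comment_rss=comment_rss)
+
+
+# Hypothetical usage with a throwaway output directory.
+with TemporaryDirectory() as site:
+    url = write_comment_feed(site, 'blog/reply',
+                             '<rss><channel>{items}</channel></rss>',
+                             '<item>Nice post!</item>', 'https://example.org')
+    written = Path(site, 'blog/reply', 'comments.xml').read_text()
+print(feed_item('Comments for Static Sites', url))
+```
+
+The site feed's root element also has to declare the namespace, i.e. `xmlns:wfw="http://wellformedweb.org/CommentAPI/"`, for the `wfw:` prefix to be valid.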
+
+That's it!
+
+## Moderation
+
+I don't want a *Terms of Service* page, it'd feel too corporate
+for my *personal* website, so I will list the rules here:
+
+1. Please be excellent to each other.  Disagreements are okay,
+   personal insults are not.
+2. Stay on topic.  If you want to publicly discuss something else with me,
+   start a new thread on a [mailing list]
+   or reach me via social media.
+3. [Use plaintext emails] and do not top post.  Markdown inline markup,
+   block quotes, lists and code blocks are supported.
+4. Comments are assumed to be licensed under [CC BY-SA 4.0] unless declared otherwise.
+5. I reserve the right to remove any comment I don't like.
+   I generally don't delete comments, but if you want to exercise
+   your freedom of speech, publish it yourself.
+6. I do not guarantee the availability of the comments either.
+   I will try my best but one day all comments may just disappear,
+   just like this website itself.  Archive what you deem important.
+7. These rules are subject to change according to my personal liking
+   without notice.
+
+Replies will only be rendered on the website and feed after I see them,
+so please expect a delay of at least 24 hours.  If you are eager to reply
+to each other, subscribe to the [site's mailing list] instead.
+
+[^image]: TBF there are image preview scripts in Newsboat's [contrib].
+[^nsfw]: Content warning: occasionally NSFW
+[^silo]: Federation is getting there for social media; not so much for fora.
+[^mime]: But don't use [text/markdown] for your emails.
+[^wfw]: Unfortunately, there's no equivalent for Atom.
+[^speed]: Over 30 seconds to generate a few hundred kB of web pages.
+[^guid]: Not to be confused with micro soft's hijacked term for [UUID].
+
+[feed]: https://en.wikipedia.org/wiki/Web_feed
+[Newsboat]: https://newsboat.org
+[Liferea]: https://lzone.de/liferea
+[Atom]: https://en.wikipedia.org/wiki/Atom_(Web_standard)
+[mpv]: https://mpv.io
+[TMUTB]: https://themonsterunderthebed.net
+[RSS]: https://www.rssboard.org/rss-specification
+[spark]: https://nixnet.social/notice/AEO3fYbuzYCJl85eD2
+[android]: https://www.theregister.com/2021/05/20/google_rss_chrome_android
+[Mailing list]: https://en.wikipedia.org/wiki/Mailing_list
+[Usenet]: https://en.wikipedia.org/wiki/Usenet
+[Hackers]: https://en.wikipedia.org/wiki/Hacker
+[microblogging]: https://carlschwan.eu/2020/12/29/adding-comments-to-your-static-blog-with-mastodon
+[cactus]: https://cactus.chat
+[curlpit]: https://unixsheikh.com/articles/so-called-modern-web-developers-are-the-culprits.html
+[MUA]: https://en.wikipedia.org/wiki/Email_client
+[offline email]: https://drewdevault.com/2021/05/17/aerc-with-mbsync-postfix.html
+[CSRF]: https://en.wikipedia.org/wiki/Cross-site_request_forgery
+[OAuth 2.0]: https://man.sr.ht/meta.sr.ht/oauth.md
+[sr.ht]: https://sr.ht
+[acurl]: https://man.sr.ht/builds.sr.ht/manifest.md#tasks
+[GraphQL]: https://lists.sr.ht/graphql
+[wfw]: https://web.archive.org/web/20050301040756/http://www.sellsbrothers.com/spout/#exposingRssComments
+[Julia]: https://julialang.org
+[Franklin]: https://franklinjl.org
+[semantic]: https://github.com/tlienart/Franklin.jl/issues/936
+[extendable]: https://franklinjl.org/syntax/utils
+[GUID]: https://www.rssboard.org/rss-profile#element-channel-item-guid
+[3rd]: https://discuss.python.org/t/adopting-recommending-a-toml-parser/4068
+[not on NixOS]: https://github.com/NixOS/nixpkgs/issues/20649
+[mbox]: https://datatracker.ietf.org/doc/html/rfc4155
+[meet in tha middle]: https://genius.com/Timbaland-meet-in-tha-middle-lyrics
+[formbox]: https://sr.ht/~cnx/formbox
+[Use plaintext emails]: https://useplaintext.email
+[mailing list]: https://lists.sr.ht/~cnx/misc
+[CC BY-SA 4.0]: https://creativecommons.org/licenses/by-sa/4.0
+[site's mailing list]: https://lists.sr.ht/~cnx/site
+[contrib]: https://drewdevault.com/2020/06/06/Add-a-contrib-directory.html
+[text/markdown]: https://blog.brixit.nl/markdown-email
+[UUID]: https://en.wikipedia.org/wiki/Universally_unique_identifier