+++
rss = "Comments for Static Sites without JavaScript via Emails"
date = Date(2023, 3, 17)
tags = ["fun", "recipe"]
+++

# XML and Photo Gallery Generation: A Love Story

> I'm just a language, whose style sheets are good\
> Oh, Lord, please, don't let me be misunderstood

!!! note "Tips"

    As usual, the article starts with a text wall of random rambling.
    If you are only interested in the technical aspects, feel free to skip
    the first two sections.

\toc

## Introduction

Neural-optic live streaming probably, no, definitely offers
the most photorealistic graphics one can set eyes on.  [CGI] is just
a pathetic mimic, and photography or videography is no more
than a poor plagiarism attempt when compared to quantum ray-tracing
and other advanced physics simulations^W happenings.

On the other hand, we humen are rather shite at replaying visual memories,
whilst ([bit rot] aside) media can be archived [for forever].  Besides,
many of us are too busy to *touch grass* or go see cool things
as regularly as we wish to.  This is how an industry based on showing us
[mundane stuff] or [obvious bullcrap] can still manage to make tens
of thousands of [craploads] each year and why the interwebs are flooded
with pictures of cats, kitties and pussies.

Finding new shits means dopamine dispensation and that's why
[they are dope][new is always better].  As a model netizen, I adhere
to the web's social contract of mutual [shitposting] so that everyone
can have a piece.  Every blue moon, I also enjoy posting more quality
stuff like what you are reading right now, should you ignore the number
of [Mozart] references in the last three paragraphs.

## Motivation

Some other times, I also want to share the living things and sceneries
I encounter in the [new][move] place.  My camera was gifted by my father
before I moved and yet I shared more photos [with strangers][pixelfed]
than with my family.  The PixelFed instance I landed on irreversibly
shrank and lossily compressed them, while dumping 5 MB images to the family
chat room just feels weird, hence I decided to gather the decency
to build a photo gallery to show my loved ones (and admittedly,
flex with online strangers).

There are not many [CMS] in the wild for photo hosting, and those that exist
often act as a walled garden and/or a social network.
Building and hosting a new one is quite overkill, thus the obvious
solution left would be generating a static site.  Out of the gazillion [SSG],
I couldn't find any that meets my requirements:

1. Generate a [web feed]
2. Automate filling [image] title and alt text
3. Offer fine-grain control for permanent [pagination]
4. Generate thumbnails with custom size and name

I mean, they perhaps exist, but the number I had to try and fight through
would cost more time than writing the web pages and feed by hand.
So I wrote them from scratch.  Y'all can stand up and clap now!

## Preliminary

Yes, I really started with writing [XHTML] and [Atom] by hand.
A web page has the following structure with namespaces omitted
and denoted in WXML ([Wisp]$\times$[SXML]) so I don't have
to close the tags (have I given up on XML too early?-).

!!! note "Syntax hints"

    For the uninitiated, any indentation or colon in Wisp represents
    an additional nesting level, while a dot escapes the nesting.  The at signs
    are used by SXML to denote attributes, which may remind you of [XPath].
    For example, the anchor to the previous page is `<a href="41">PREV</a>`.

```
html
  head
    link
      @ : rel "alternate"
          type "application/atom+xml"
          href "/atom.xml"
    ...
  body
    nav
      a : @ : href "41"
        . "PREV"
      h1 "PAGE 42"
      a : @ : href "43"
        . "NEXT"
    article
      @ : id "foobar"
      h2
        a : @ : href "#foobar"
          . "foobar"
      a : @ : href "/42/foo.jpg"
          img
            @ : src "/42/foo.small.jpg"
                alt "pic of foo"
                title "pic of foo"
      a : @ : href "/42/bar.jpg"
          ...
    article ...
    ...
    footer ...
```
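
If the notation still looks alien, here is a rough Python sketch
of how such a tree maps back to angle brackets.  The tuple layout
`(tag, attributes, *children)` and the `to_xml` helper are made up
for illustration and are neither Wisp nor SXML proper:

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(node):
    """Serialize a nested (tag, attributes, *children) tuple as XML."""
    if isinstance(node, str):
        return escape(node)  # text node
    tag, attrs, *children = node
    attributes = "".join(f" {key}={quoteattr(value)}"
                         for key, value in attrs.items())
    if not children:
        return f"<{tag}{attributes}/>"
    body = "".join(to_xml(child) for child in children)
    return f"<{tag}{attributes}>{body}</{tag}>"


# The nav element from the page above
nav = ("nav", {},
       ("a", {"href": "41"}, "PREV"),
       ("h1", {}, "PAGE 42"),
       ("a", {"href": "43"}, "NEXT"))
print(to_xml(nav))
# <nav><a href="41">PREV</a><h1>PAGE 42</h1><a href="43">NEXT</a></nav>
```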

So far, adding an `article` is not yet too cumbersome: there's only a bit
of redundancy for permanent links and the nesting level is acceptable
with the deepest being `/html/body/article/a/img`.  It gets more repetitive
once we publish it to the linked Atom feed:

```
feed
  entry
    link
      @ : rel "alternate"
          type "application/xhtml+xml"
          href "https://gallery.example/42/#foobar"
    id "https://gallery.example/42/#foobar"
    title "foobar"
    content
      @ : type "xhtml"
      div
        img
          @ : src "https://gallery.example/42/foo.jpg"
              alt "pic of foo"
              title "pic of foo"
        img ...
    updated ...
  entry ...
  ...
```

Since web feeds are standalone documents, they must always use absolute URLs.
(Welp that's not entirely true, [XML Base] does exist, but not all readers
support it, and more importantly, certain elements such as `atom:id` disallow
relative references.)  In addition, whilst the web page links a thumbnail
to the original image to save bandwidth, the feed can be consumed one post
at a time, so it points straight to the full size version.  Therefore,
copying the markup to embed it inside the Atom is error-prone and doesn't
exactly spark joy.
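
To get a feel for the rewriting involved, the sketch below resolves
the references from the page against its address using Python's standard
`urllib.parse.urljoin`.  `PAGE_URL` and `absolutize` are hypothetical names,
and this only illustrates what has to happen to each URL,
it is not part of the gallery:

```python
from urllib.parse import urljoin

# Hypothetical address of the page the references come from
PAGE_URL = "https://gallery.example/42/"


def absolutize(reference, base=PAGE_URL):
    """Resolve a relative reference into the absolute URL a feed needs."""
    return urljoin(base, reference)


print(absolutize("#foobar"))      # https://gallery.example/42/#foobar
print(absolutize("/42/foo.jpg"))  # https://gallery.example/42/foo.jpg
```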

!!! note "Fun fact"

    What does spark joy is that we can embed XHTML directly into the web feed,
    which means the content is still XML and we don't need to quote it in CDATA.
    For other sites where contents don't accumulate up to hundreds of megabytes,
    this will allow us to slap some (SPOILER ALERT!) stylesheet on the Atom feed
    and let the user agent render it in a [human-readable form][XSL].

## Approach

I actually already spoiled it in the epigraph,[^spoiler] but for the sake
of completeness let us [discuss a few possible solutions][efficiency].
What I wanted was to reduce the redundancy of manual input, in other words,
a system transforming a custom information-dense format to standard
yet sparser ones, which in this case are XHTML and Atom.  Given some new photos
and their relevant data, the purpose was to minimize the publishing friction.

It's worth mentioning that the goal was not to minimize only the input format,
the transformation time, or the feedback latency, but all of the above,
plus the cost of constructing the tool, incrementally as our requirements
change slightly over time.  Our choice for the base [programming system]
shall affect each and every one of these aspects and more.

Some technical dimensions are more equal than others, though.
For this use case, IMHO an immediate feedback loop should be given
the number one priority, not only because it'd be frustrating
to have to complete multiple rituals just to preview the changes,
but also as watching and reflecting file system changes is (sadly still)
a difficult problem.

For Linux[^interjection] there's [inotify] which doesn't suck,
except when it does and misses events, and the standard POSIX build tool
[make] relies on [mtime which is also flaky][mtime].  Some SSG
work around this by spinning up a server with a more sophisticated
caching mechanism and even include an HTTP server sending out refresh events.
Implementing such a system is easily [more expensive][automation] than doing
the original task manually.
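
For the curious, the comparison `make` performs boils down to roughly
the following Python sketch, where `is_stale` is a hypothetical helper.
Note how a source whose timestamp merely equals that of the target
is considered up to date, which is one way changes can slip through
on coarse clocks:

```python
import os
import tempfile


def is_stale(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    return any(os.path.getmtime(source) > built for source in sources)


with tempfile.TemporaryDirectory() as directory:
    source = os.path.join(directory, "index.xml")
    target = os.path.join(directory, "index.xhtml")
    for path in (source, target):
        open(path, "w").close()
    os.utime(target, (200, 200))  # pretend the page was built at t=200
    os.utime(source, (200, 200))  # an edit landing in the same tick
    print(is_stale(target, [source]))  # False
    os.utime(source, (201, 201))  # an edit one tick later
    print(is_stale(target, [source]))  # True
```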

Luckily, there is another way.  *After* the birth of imperative
DOM manipulation programs running on a VM inside browsers (Ecma scripts),
there came a (now forgotten) art of purely functional DOM transformation.
More specifically, [XSLT] can declaratively transform any XML document
to another, and its best part is that modern browsers natively support it,
i.e. there's no difference between editing the input document
and the hypothetical output XHTML.  For better portability
and rendering performance, we can still generate the latter
ahead-of-time (AoT) during deployment.

## Implementation

Going back to the example, the input format could boil down
to a more concise XML file, e.g. `42/index.xml`:

```
page
  @ : prev "41"
      curr "42"
      next "43"
  post : @ : title "foobar"
         picture
           @ : filename "foo"
               desc "pic of foo"
         picture ...
         ...
         time ...
  post ...
  ...
```

### Page Generation

The stylesheet should then be declared at the beginning of the file,
so that the user agent can automatically fetch and apply it
to render the output XHTML:

```
<?xml-stylesheet href="/page.xslt" type="text/xsl"?>
```

XSLT is essentially a templating language, similar to PHP (which is also older)
and template libraries in your favorite languages.  For the ease of reading,
I will let the target document's namespace be the default, while aliasing
the transformation one as `xsl`.  The stylesheet for the web pages would
look something like the following, which should be self-explanatory.

```
xsl:stylesheet
  xsl:template : @ : match "/page"
    xsl:variable : @ : name "base"
      xsl:text "/"
      xsl:value-of : @ : select "@curr"
      xsl:text "/"
    html
      head ...
      body
        nav
          xsl:if : @ : test "@prev != ''"
            a : @ : href "/{@prev}/"
              . "PREV"
          h1 : xsl:text "PAGE "
               xsl:value-of : @ : select "@curr"
          xsl:if : @ : test "@next != ''"
            ...
        xsl:for-each : @ : select "post"
          xsl:variable : @ : name "id"
            xsl:value-of
              @ : select "translate(@title, ' ', '-')"
          article
            @ : id "{$id}"
            h2
              a : @ : href "#{$id}"
                  xsl:value-of : @ : select "@title"
            xsl:for-each : @ : select "picture"
              a : @ : href "{$base}{@filename}.jpg"
                  img
                    @ : src "{$base}{@filename}.small.jpg"
                        alt "{@desc}"
                        title "{@desc}"
        footer ...
```

### Feed Generation

Similarly, for Atom entries on a single page,

```
xsl:stylesheet
  xsl:variable : @ : name "root"
    . "https://gallery.example/"
  xsl:template : @ : match "/page"
    xsl:variable : @ : name "base"
      xsl:value-of : @ : select "$root"
      xsl:value-of : @ : select "@curr"
      xsl:text "/"
    xsl:for-each : @ : select "post"
      xsl:variable : @ : name "url"
        xsl:value-of : @ : select "$base"
        xsl:text "#"
        xsl:value-of
          @ : select "translate(@title, ' ', '-')"
      entry
        link
          @ : rel "alternate"
              type "application/xhtml+xml"
              href "{$url}"
        id : xsl:value-of : @ : select "$url"
        title : xsl:value-of : @ : select "@title"
        content
          @ : type "xhtml"
          div
            xsl:for-each : @ : select "picture"
              img
                @ : src "{$base}{@filename}.jpg"
                    alt "{@desc}"
                    title "{@desc}"
        updated : xsl:value-of : @ : select "time"
```

The trickier part here is concatenating the entries together.
Simple enough, instead of linking to the stylesheet in the data,
we can read XML files directly from XSLT.

```
xsl:template
  @ : match "/"
  ...
  xsl:apply-templates
    @ : select "document('42/index.xml')/page"
  xsl:apply-templates ...
  ...
```
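
What `document()` buys us can be approximated in Python as below:
parse each page and append its entries to a single feed.
`build_feed` and `entries` are hypothetical helpers, the page sources
are passed in as strings rather than read from files, and only
`id` and `title` are emitted to keep it short:

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ROOT = "https://gallery.example/"  # same placeholder root as the stylesheet


def entries(page):
    """Yield an Atom entry for each post of a parsed page."""
    base = f"{ROOT}{page.get('curr')}/"
    for post in page.findall("post"):
        url = base + "#" + post.get("title").replace(" ", "-")
        entry = ET.Element(f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = url
        ET.SubElement(entry, f"{{{ATOM}}}title").text = post.get("title")
        yield entry


def build_feed(sources):
    """Concatenate the entries of several page documents into one feed."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    for source in sources:
        for entry in entries(ET.fromstring(source)):
            feed.append(entry)
    return feed


feed = build_feed(['<page curr="41"><post title="old"/></page>',
                   '<page curr="42"><post title="foo bar"/></page>'])
for entry in feed:
    print(entry.find(f"{{{ATOM}}}id").text)
# https://gallery.example/41/#old
# https://gallery.example/42/#foo-bar
```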

This allows us to do other cool things, such as embedding SVG in XHTML
to make use of the parent element's [currentcolor], while keeping
the source files separate.  It is especially useful for monochromatic icons,
e.g.

```
xsl:copy-of : @ : select "document('cc.svg')/*"
xsl:copy-of : @ : select "document('by.svg')/*"
xsl:copy-of : @ : select "document('sa.svg')/*"
```

### Thumbnail Generation

So far, we have met three out of the [four requirements](#motivation),
and the only thing left is creating the thumbnails.  Inspired by Ethan Dalool,
I am going for [fairly large ones of 1024 px in width][big thumbs],

> large enough to comfortably browse the photos without clicking through
> to the big version of each, and the thumbnails are decently light
> and not too jpeggy at about 125-150 kilobytes on average.

At such size, I can aim for around ten photoes[^toes] per page
while maintaining a somewhat decent load time.  Plus, since the width
of the images is hardcoded, the page [margin] can be automatically inferred
to never stretch them.

```css
html {
    box-sizing: border-box;
    margin: auto;
    max-width: calc(1024px + 2ch);
}
body { margin: 0 1ch }
```

To generate the thumbnails, I use [epeg] together with `make` for wildcarding:

```
PICTURES := $(filter-out %.small.jpg $(PREFIX)/%.jpg, $(wildcard */*.jpg))
THUMBNAILS := $(patsubst %.jpg,%.small.jpg,$(PICTURES))

%.small.jpg: %.jpg
	epeg -w 1024 -p -q 80 $< $@
```
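
In case the `make` functions look cryptic, a rough Python equivalent
of the two variables follows.  The helper names are made up,
and the `$(PREFIX)` exclusion is left out for brevity:

```python
def originals(paths):
    """Keep the JPEG files that are not thumbnails themselves."""
    return [path for path in paths
            if path.endswith(".jpg") and not path.endswith(".small.jpg")]


def thumbnail_for(picture):
    """Mirror $(patsubst %.jpg,%.small.jpg,...) for a single file name."""
    return picture[:-len(".jpg")] + ".small.jpg"


found = ["42/foo.jpg", "42/foo.small.jpg", "42/bar.jpg"]
print([thumbnail_for(picture) for picture in originals(found)])
# ['42/foo.small.jpg', '42/bar.small.jpg']
```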

The Makefile also defines rules for AoT compilation using [xsltproc]
for the web pages and feed.  Apparently no feed reader supports XSLT,
and for pages runtime processing negatively affects the performance
due to the multiple round trips for the stylesheet and the vector icons.

```
DATA := $(wildcard */index.xml) index.xml
PAGES := $(patsubst %.xml,%.xhtml,$(DATA))
OUTPUTS := $(THUMBNAILS) $(PAGES) atom.xml

all: $(OUTPUTS)

index.xml: $(LATEST)/index.xml
	ln -fs $< $@

%.xhtml: %.xml page.xslt
	xsltproc page.xslt $< > $@

atom.xml: atom.xslt $(DATA) $(wildcard *.svg)
	xsltproc atom.xslt > atom.xml
```

The [full implementation][src] is deployed to [px.cnx.gdn],
mirrored to the [OpenNIC] domain [pix.sinyx.indy] reusing
the former's TLS certificate, because the CA/Browser Forum
disallows support for domains not recognized by ICANN and no
[CA for OpenNIC] is mature enough.

## Discussion

> *Okay you built your site using XML macros, so what?
> The syntax is clunky and you hate it so much yourself
> that not even a single line of code example here is in actual XML.
> Doesn't seem like a love story to me!*

Like all relationships, it's not that simple.  I've learned to not judge
a book by its cover and come to the understanding that XML is the (ugly)
equivalent of [sexp].[^sex]  Unlike afterthoughts such as C preprocessors,
[Django]-like templates, or even the Wisp-lookalike syntax of [Slim],
XML stylesheets are in the same data structure as their input.  To put it another way,
one can use XSLT to generate XSLT from XSLT.  Do I need it in this case
or ever at all?  Probably not, but that certainly makes XSL a lot more
attractive in my eyes.

Furthermore, the tooling for XML is highly mature, from editors to linters
and processors to rendering engines.  It'd be lying to say you ain't
fascinated that tis possible to directly feed browsers pure data
instead of markup representations.  More than that, one can have
entirely static API endpoints that are both human- and machine-readable.

> *XSL is just declarative JS!  You are so blinded
> by your lust for functional programming that you have
> become [the very thing you swore to destroy](/blog/reply)!*

My distaste for Ecma scripts is not due to DOM manipulation.
Sure, I do find in-place modification inelegant for documents,
but if only that were the only issue.  I block them on most sites
because they can interact with many things other than just the DOM,
posing [privacy] and [security] risks while [fucking up the UX].

Architecturally, Ecma scripts enable the absolute bloody worst possible
kind of web pages with zero data at all, fetching tiny pieces of content
in JSON and turning performance [to shit].  The user agents then try to salvage
efficiency by turning themselves into a distributed system component
and adding optimizations that shall never be (ab)used for the sake of users.
O ye [cycle of doom]!

Note that one can make a similar mistake with XSL regarding the number
of round trips, and XML stylesheets can provide the same front-end/back-end
separation.  Both can be used to provide hot loading during development
and AoT rendering in production (if not all, then many JS libraries support
pre-rendering, ignoring the monstrous [dependency graph](/blog/dedep)).
At the end of the day, it's not a matter of technology but of principle:
to act in the [users' best interest].

> *There is nothing complex about the photo gallery,
> any existing SSG can do the same with minor tweaks!
> You never needed to write a new one to begin with!*

I am wondering the same myself, but keep in mind there are details
I've been hiding from the examples.  I went all-in for the semantic web
with the hope for best portability and accessibility.  One thing
I haven't mentioned is the `lang` attribute, e.g. `en`, `vi` or `fr`
depending on the post.  Adding this to the web pages requires the SSG
to be somewhat modular, and it is even harder for the web feed.

Moreover, generic SSG are not designed to handle the difference
in content between a page's `article` and the feed's corresponding `entry`,
nor to have multiple posts in a single page.  Pagination is
also commonly implemented backwards, i.e. page 2 being the second latest one,
making it impossible to avoid [link rot].
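
To see why the direction matters, consider a simplified model
with a fixed number of posts per page (an assumption for the sake
of arithmetic; the gallery sets `prev`, `curr` and `next` by hand).
Both functions below are hypothetical:

```python
def permanent_page(index, per_page=10):
    """Page of the post at `index`, counting posts from the oldest (0)."""
    return index // per_page + 1


def backwards_page(index, total, per_page=10):
    """Page of the same post when page 1 always holds the newest ones."""
    return (total - 1 - index) // per_page + 1


# The sixth post ever published, before and after ten more are added
print(permanent_page(5), backwards_page(5, total=6))   # 1 1
print(permanent_page(5), backwards_page(5, total=16))  # 1 2
```

The permanent number depends only on the post itself, whereas the backwards
one drifts with every batch of new posts, taking all existing links with it.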

Not to suggest that the majority of SSG are poorly designed, just that
from a certain amount of [context] difference, tis cheaper to just redesign
from scratch.  This is not about XSL vs Go/Python/JS for SSG or web dev
in general, but this specific and happen-to-be-far-from-complex case.

## Conclusion

At the time of writing, XML has pretty much been superseded by JSON or YAML,
for better or worse.  I have no love for YAML for obvious reasons,
but it also saddens me to sometimes see JSON being solely used as a container
for HTML.  I hope that this essay can [awaken something in you] about XML
and remind you about the semantic web in your next project.  It worked out
for me, maybe it'll work out for you too!

The story between XML and my photo gallery is a fond love story.
They were born for each other, there was no drama, everything just werkt.
Their romance inspires me to better appreciate stability and maturity,
and to value those right in front of my eyes that I had been *too blind to see*.
Anyway, this is getting too long, so Imma end it with another [song].

> Lookin' for perfect\
> Surrounded by artificial\
> You're the closest thing to real I've seen\
> Sure, everyone has their problems\
> That's a given\
> Yours are the easiest to tolerate

[^spoiler]: If you know, you know.
[^interjection]: Yup, just the kernel.
[^toes]: *Thumb*nails, pho*toes*, get it?-)
[^sex]: Or conventionally in most Lisp 1's, `sex?`.

[CGI]: https://en.wikipedia.org/wiki/Computer-generated_imagery
[bit rot]: https://en.wikipedia.org/wiki/Data_degradation
[for forever]: https://xkcd.com/1683
[mundane stuff]: https://en.wikipedia.org/wiki/Drama
[obvious bullcrap]: https://en.wikipedia.org/wiki/Fiction
[craploads]: https://antifandom.com/how-i-met-your-mother/wiki/Crapload
[new is always better]: https://www.youtube.com/watch?v=1SNRULEnTVQ
[shitposting]: https://fe.disroot.org/@mcsinyx
[Mozart]: https://peervideo.club/w/uByA7Czy7PWYMqnu8FgXvW

[move]: https://github.com/zig-community/user-map/pull/120
[pixelfed]: https://fotofed.nl/cnx
[CMS]: https://en.wikipedia.org/wiki/Content_management_system
[SSG]: https://en.wikipedia.org/wiki/Static_site_generator
[web feed]: https://en.wikipedia.org/wiki/Web_feed
[image]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Img
[pagination]: https://en.wikipedia.org/wiki/Pagination

[XHTML]: https://en.wikipedia.org/wiki/XHTML
[Atom]: https://www.rfc-editor.org/rfc/rfc4287
[Wisp]: https://www.draketo.de/software/wisp
[SXML]: https://okmij.org/ftp/Scheme/SXML.html
[XPath]: https://www.w3.org/TR/xpath
[XML Base]: https://www.w3.org/TR/xmlbase
[XSL]: https://simonesilvestroni.com/blog/build-a-human-readable-rss-with-jekyll

[efficiency]: https://xkcd.com/1445
[programming system]: https://programming-journal.org/2023/7/13
[inotify]: https://man7.org/linux/man-pages/man7/inotify.7.html
[make]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/make.html
[mtime]: https://apenwarr.ca/log/20181113
[automation]: https://xkcd.com/1319
[XSLT]: https://www.w3.org/standards/xml/transformation

[currentcolor]: https://developer.mozilla.org/en-US/docs/Web/CSS/color_value#currentcolor_keyword
[big thumbs]: https://voussoir.net/writing/sharing_photos
[epeg]: https://github.com/mattes/epeg
[margin]: https://en.wikipedia.org/wiki/Margin_(typography)
[xsltproc]: https://gnome.pages.gitlab.gnome.org/libxslt/xsltproc.html
[src]: https://trong.loang.net/~cnx/px
[px.cnx.gdn]: https://px.cnx.gdn
[OpenNIC]: https://www.opennic.org
[pix.sinyx.indy]: https://pix.sinyx.indy
[CA for OpenNIC]: https://wiki.opennic.org/opennic/tls

[sexp]: https://en.wikipedia.org/wiki/S-expression
[Django]: https://docs.djangoproject.com/en/dev/topics/templates
[Slim]: https://github.com/slim-template/slim
[privacy]: https://en.wikipedia.org/wiki/Mouse_tracking
[security]: https://react-etc.net/entry/exploiting-speculative-execution-meltdown-spectre-via-javascript
[fucking up the UX]: https://meta.stackexchange.com/q/2980/698165
[to shit]: https://unixsheikh.com/articles/so-called-modern-web-developers-are-the-culprits.html
[cycle of doom]: https://en.wikipedia.org/wiki/Wirth%27s_law
[users' best interest]: https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys
[link rot]: https://en.wikipedia.org/wiki/Link_rot
[context]: https://guide.handmade-seattle.com/c/2021/context-is-everything

[awaken something in you]: https://www.youtube.com/watch?v=F3QPWrLFsOA
[song]: https://www.youtube.com/watch?v=5LvOdWi3Qno