Diffstat (limited to 'content')
-rw-r--r-- content/posts/2022-01-16-dict-1.md 278
1 files changed, 278 insertions, 0 deletions
diff --git a/content/posts/2022-01-16-dict-1.md b/content/posts/2022-01-16-dict-1.md
new file mode 100644
index 0000000..37d5660
--- /dev/null
+++ b/content/posts/2022-01-16-dict-1.md
@@ -0,0 +1,278 @@
+---
+title: "Implementing DICT protocol: Part 1"
+date: 2022-01-16
+lang: en
+categories: [ blog ]
+tags: [dict, dictionary, go, golang, rfc2229, tcp ]
+translationKey: "2022-01-16-Dict-1"
+---
+
+## DICT Protocol
+
+What is the DICT protocol?
+
+<figure>
+  <blockquote>
+    <p>
+    The Dictionary Server Protocol (DICT) is a TCP transaction based
+    query/response protocol that allows a client to access dictionary
+    definitions from a set of natural language dictionary databases.
+    </p>
+    <figcaption>
+      <cite>
+        <a href="https://datatracker.ietf.org/doc/html/rfc2229">
+          DICT Protocol - RFC 2229
+        </a>
+      </cite>
+    </figcaption>
+  </blockquote>
+</figure>
+
+Notable implementations for this include [dict(d)][dict]
+and [GNU dico(d)][dico]; the former is the reference implementation that
+supports multiple database formats, as listed in [dictfmt (1)][man-dictfmt].
+
+[dict]: https://github.com/cheusov/dictd
+[dico]: https://www.gnu.org.ua/software/dico/
+[man-dictfmt]: https://linux.die.net/man/1/dictfmt
+
+I intend to implement a server and multiple clients (CLI, GUI, ~~web~~) for this
+protocol, as well as some tools to easily create a dictd-readable database.
+
+## Why?
+
+No practical reason, but [dict] is one of the first command-line tools
+introduced to me and easily one of my favorites, along with curl and [jq][jq].
+It's basically just a dictionary app, but it's cool:
+
+- works perfectly in terminal
+- easily self-hostable
+- fast
+- has cool dictionaries (though only Debian, Arch and derivatives distribute
+  those)
+
+[jq]: /posts/2021-06-13-jq/
+
+Also, I'm writing dictionaries for my [conlangs][conlang] and I want to
+distribute them via this protocol.  Clearly, implementing a server that is
+already implemented doesn't help, but I tend to go down rabbit holes.
+
+[conlang]: /misc/#conlangs
+
+I also like to explore non-web protocols, and starting with something simple
+like DICT might be a good idea.
+
+## Reading the spec
+
+The spec (linked at the top of this post) is shorter and easier to read than I
+thought.  Ignoring the introduction, examples and citations, it's less than 20
+pages.  There are four classes of commands:
+
+- Querying the database: `DEFINE`, `MATCH`
+- `SHOW` metadata about the servers and the databases
+- Utilities: announce the `CLIENT` name, check `STATUS`, show `HELP`, set an
+    `OPTION` and `QUIT`
+- Authentication: `AUTH` and `SASLAUTH`
+
+The authentication commands are optional and I don't find them useful, so I
+won't implement them.  That leaves the first three categories.
+
+## Handling TCP
+
+DICT is based on <abbr title="Transmission Control Protocol">TCP</abbr>,
+and there is a neat interactive <abbr>TCP</abbr> tool called [`telnet`][telnet],
+which I used for testing the commands.
+
+[telnet]: https://en.wikipedia.org/wiki/Telnet
+
+### telnet
+
+DICT runs on port 2628:
+
+```sh
+$ telnet dict.org 2628
+Trying 199.48.130.6...
+Connected to dict.org.
+Escape character is '^]'.
+220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89168346.27665.1642303045@dict.dict.org>
+```
+
+Let's try out some commands to understand how this works.  Note that I prefix
+each command with `~> ` here so that it stands out from the response, and truncate
+long results with `[...]`.
+
+Let's first show what databases there are:
+
+```
+~> SHOW DB
+110 166 databases present
+[...]
+.
+250 ok
+```
+
+There are a lot of dictionaries here, including [GCIDE][gcide], [WordNet][wn],
+[The Jargon File][jargon], [V.E.R.A.][vera], [FOLDOC][foldoc], but most of them
+are [FreeDict][fd] dictionaries.
+
+To match a word, the syntax is:
+
+```
+~> MATCH database strategy word
+```
+
+Strategy is how the server will match the word you're looking up.  To list all
+strategies available, send the command:
+
+```
+~> SHOW STRATEGIES
+```
+
+There are various strategies supported by dictd, for example, `substring`,
+which matches if the entry has the queried word as a substring:
+
+```
+~> MATCH jargon substring program
+152 13 matches found
+jargon "c programmer's disease"
+jargon "cargo cult programming"
+jargon "mickey mouse program"
+jargon "perfect programmer syndrome"
+jargon "program"
+[...]
+.
+250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]
+```
+
+This command only shows which words in the database, if any, satisfy the match,
+without showing the definitions.  To actually view a definition, one has to
+supply the dictionary name to the `DEFINE` command.  Note that you can also
+use `*` as the database name for both the `DEFINE` and `MATCH` commands, which
+defines/matches across all dictionaries.
+
+```
+~> DEFINE * programming
+150 3 definitions retrieved
+151 "programming" wn "WordNet (r) 3.0 (2006)"
+programming
+    [...]
+.
+151 "programming" jargon "The Jargon File (version 4.4.7, 29 Dec 2003)"
+programming
+ n.
+
+    [...]
+
+.
+151 "programming" foldoc "The Free On-line Dictionary of Computing (30 December 2018)"
+programming
+
+.
+250 ok [d/m/c = 3/0/145; 0.000r 0.000u 0.000s]
+```
+
+That's the gist of how to look up words with the DICT protocol.  You can find
+more commands with:
+
+```
+~> HELP
+[...]
+.
+250 ok
+```
+
+Finally, to end the session, the command is:
+
+```
+~> QUIT
+221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]
+```
+
+Note that every response except the one to `QUIT` ends with a line containing a
+single period, followed by a `250 ok` status line (the equivalent of HTTP's
+200 OK).  These response codes are defined in [the protocol specification][rfc2229].
+
+Commands other than `HELP` have some additional statistics, though this is
+optional.  I figured out that `d` means definitions, `m` means matches, and `s`
+is probably the time it took to query (why are they always zero, though?), but
+I have no clue what `c`, `r`, and `u` mean.  I might check the [source code][dict]
+to figure that out, but let's leave it for another time.
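+
+Whatever the letters mean, the three counters are easy to pull out of a status
+line with a regular expression.  This is only a sketch: `parseCounters` and the
+`counters` type are made-up names, and the pattern assumes the trailer always
+looks like the dictd output shown above.
+
+```go
+package main
+
+import (
+	"fmt"
+	"regexp"
+	"strconv"
+)
+
+// statsPattern matches the start of the optional trailer that dictd appends
+// to status lines, e.g. "[d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]".
+var statsPattern = regexp.MustCompile(`\[d/m/c = (\d+)/(\d+)/(\d+);`)
+
+// counters holds the three integers before the semicolon.  The field names
+// simply mirror the letters in the trailer.
+type counters struct {
+	D, M, C int
+}
+
+// parseCounters extracts the d/m/c numbers from a status line.  The boolean
+// is false when the line carries no statistics trailer.
+func parseCounters(line string) (counters, bool) {
+	m := statsPattern.FindStringSubmatch(line)
+	if m == nil {
+		return counters{}, false
+	}
+	// The pattern only admits digits, so Atoi can only fail on overflow.
+	d, err1 := strconv.Atoi(m[1])
+	mt, err2 := strconv.Atoi(m[2])
+	c, err3 := strconv.Atoi(m[3])
+	if err1 != nil || err2 != nil || err3 != nil {
+		return counters{}, false
+	}
+	return counters{D: d, M: mt, C: c}, true
+}
+
+func main() {
+	c, ok := parseCounters("250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]")
+	fmt.Println(c.D, c.M, c.C, ok)
+}
+```
+
+Run as-is, this prints `0 13 5775 true`.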
+
+[gcide]: https://gcide.gnu.org.ua/
+[wn]: https://wordnet.princeton.edu/
+[jargon]: http://www.catb.org/~esr/jargon/
+[foldoc]: https://foldoc.org/
+[vera]: https://savannah.gnu.org/projects/vera
+[fd]: https://freedict.org/
+[rfc2229]: https://datatracker.ietf.org/doc/html/rfc2229#page-23
+
+### Go
+
+Of course we are not going to make the users type these commands (though it's
+not too unintuitive and can be easily remembered).  I chose Go to build the CLI
+client, though without any conscious consideration of fitness.  I'm trying out
+new things[^0] after all.
+
+From the [doc][go-net], we can figure out how to make a TCP connection.
+
+```go
+conn, err := net.Dial("tcp", "golang.org:80")
+if err != nil {
+	// handle error
+}
+fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
+status, err := bufio.NewReader(conn).ReadString('\n')
+// ...
+```
+
+Let's copy that and send DICT commands instead of an HTTP request:
+
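+```go
+package main
+
+import (
+	"bufio"
+	"fmt"
+	"net"
+)
+
+func main() {
+	conn, err := net.Dial("tcp", "dict.org:2628")
+	if err != nil {
+		panic(err)
+	}
+	defer conn.Close()
+	buf := bufio.NewReader(conn)
+	// RFC 2229 terminates each command line with CRLF.
+	fmt.Fprintf(conn, "MATCH jargon word programming\r\n")
+	fmt.Fprintf(conn, "QUIT\r\n")
+
+	// Print every line until the server closes the connection.
+	for {
+		response, err := buf.ReadString('\n')
+		if err != nil {
+			// oftentimes this is EOF error
+			fmt.Println(err)
+			break
+		}
+		// Print, not Printf: the response is data, not a format string.
+		fmt.Print(response)
+	}
+}
+```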
+
+Running this code, we get the response:
+
+```
+220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89266600.1914.1642341395@dict.dict.org>
+152 4 matches found
+jargon "cargo cult programming"
+jargon "programming"
+jargon "programming fluid"
+jargon "voodoo programming"
+.
+250 ok [d/m/c = 0/4/3814; 0.000r 0.000u 0.000s]
+221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
+EOF
+```
+
+which is a good start.
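+
+The command strings are hard-coded in the code above.  A small helper could
+build them from the `MATCH database strategy word` syntax instead.  Here is a
+minimal sketch; `quoteParam`, `matchCommand` and `defineCommand` are
+hypothetical names, and the quoting assumes the RFC 2229 rules: parameters
+containing whitespace go in quotes, and backslash is the escape character.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strings"
+)
+
+// quoteParam wraps a parameter in double quotes when it is empty or contains
+// whitespace, quotes or backslashes, escaping embedded backslashes and
+// double quotes.  Hypothetical helper, not part of any library.
+func quoteParam(p string) string {
+	if p != "" && !strings.ContainsAny(p, " \t\"'\\") {
+		return p
+	}
+	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`)
+	return `"` + r.Replace(p) + `"`
+}
+
+// matchCommand builds "MATCH database strategy word" terminated by CRLF.
+func matchCommand(database, strategy, word string) string {
+	return fmt.Sprintf("MATCH %s %s %s\r\n",
+		quoteParam(database), quoteParam(strategy), quoteParam(word))
+}
+
+// defineCommand builds "DEFINE database word" terminated by CRLF.
+func defineCommand(database, word string) string {
+	return fmt.Sprintf("DEFINE %s %s\r\n", quoteParam(database), quoteParam(word))
+}
+
+func main() {
+	// %q makes the quoting and the trailing CRLF visible.
+	fmt.Printf("%q\n", matchCommand("jargon", "substring", "cargo cult"))
+	fmt.Printf("%q\n", defineCommand("*", "programming"))
+}
+```
+
+The result can be passed to `fmt.Fprint(conn, ...)` in place of the literal
+strings.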
+
+There is a problem with this code: currently we are reading line by line,
+rather than reading the whole response for each command.  This way, we can't
+tell whether line 3 is the response to the first command or the second.  A
+solution is to check whether the line is prefixed with a status code, but do we
+have a better solution?
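+
+For reference, the status-code check could look something like this sketch.
+It assumes a response is complete once a line starts with a three-digit code
+whose first digit is 2, 4 or 5; `isFinalStatus` and `readResponse` are made-up
+names, and the canned transcript is abridged from the output above so that the
+example runs without a network connection.
+
+```go
+package main
+
+import (
+	"bufio"
+	"fmt"
+	"io"
+	"strings"
+)
+
+// isFinalStatus reports whether line looks like "NNN text" with a first
+// digit of 2, 4 or 5, which this sketch treats as the end of a response.
+func isFinalStatus(line string) bool {
+	if len(line) < 3 || (len(line) > 3 && line[3] != ' ') {
+		return false
+	}
+	for i := 0; i < 3; i++ {
+		if line[i] < '0' || line[i] > '9' {
+			return false
+		}
+	}
+	return line[0] == '2' || line[0] == '4' || line[0] == '5'
+}
+
+// readResponse collects lines up to and including the next final status line.
+func readResponse(r *bufio.Reader) ([]string, error) {
+	var lines []string
+	for {
+		line, err := r.ReadString('\n')
+		if err != nil {
+			return lines, err
+		}
+		line = strings.TrimRight(line, "\r\n")
+		lines = append(lines, line)
+		if isFinalStatus(line) {
+			return lines, nil
+		}
+	}
+}
+
+func main() {
+	// Abridged transcript: banner, MATCH response, QUIT response.
+	session := "220 dict.dict.org dictd 1.12.1/rf\r\n" +
+		"152 2 matches found\r\n" +
+		"jargon \"programming\"\r\n" +
+		"jargon \"voodoo programming\"\r\n" +
+		".\r\n" +
+		"250 ok [d/m/c = 0/4/3814; 0.000r 0.000u 0.000s]\r\n" +
+		"221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]\r\n"
+	r := bufio.NewReader(strings.NewReader(session))
+	for i := 1; ; i++ {
+		lines, err := readResponse(r)
+		if err == io.EOF {
+			break
+		}
+		if err != nil {
+			panic(err)
+		}
+		fmt.Printf("response %d: %d line(s), ends with %q\n", i, len(lines), lines[len(lines)-1])
+	}
+}
+```
+
+A definition line that happens to start with something like `250 ` would fool
+this check, which is one reason to keep looking.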
+
+Let's wait till next week!
+
+[go-net]: https://pkg.go.dev/net
+
+[^0]: Not really, I've written a CLI client for the Wiktionary API in Go before.