---
title: "Implementing DICT protocol: Part 1"
date: 2022-01-16
lang: en
categories: [ blog, dev, guide ]
tags: [dict, dictionary, go, golang, rfc2229, tcp ]
translationKey: "2022-01-16-Dict-1"
---

## DICT Protocol

What is DICT protocol?

> The Dictionary Server Protocol (DICT) is a TCP transaction based query/response protocol that allows a client to access dictionary definitions from a set of natural language dictionary databases.
>
> [DICT Protocol - RFC 2229](https://datatracker.ietf.org/doc/html/rfc2229)

Notable implementations include [dict(d)][dict] and [GNU dico(d)][dico]; the former is the reference implementation and supports multiple database formats, as listed in [dictfmt (1)][man-dictfmt].

[dict]: https://github.com/cheusov/dictd
[dico]: https://www.gnu.org.ua/software/dico/
[man-dictfmt]: https://linux.die.net/man/1/dictfmt

I intend to implement a server and multiple clients (CLI, GUI, ~~web~~) for this protocol, as well as some tools to easily create a dictd-readable database.

## Why?

No practical reason, but [dict] is one of the first command-line tools introduced to me and easily one of my favorites, along with curl and [jq][jq]. It's basically just a dictionary app, but it's cool:

- works perfectly in terminal
- easily self-hostable
- fast
- has cool dictionaries (though only Debian, Arch and derivatives distribute those)

[jq]: /posts/2021-06-13-jq/

Also, I'm writing dictionaries for my [conlangs][conlang] and I want to distribute them via this protocol. Clearly, implementing a server that is already implemented doesn't help, but I tend to go down rabbit holes.

[conlang]: /misc/#conlangs

I also like to explore non-web protocols, and starting with something simple like DICT might be a good idea.

## Reading the spec

The spec (linked at the top of this post) is shorter and easier to read than I thought. Ignoring the introduction, examples and citations, it's less than 20 pages. There are four classes of commands:

- Querying the database: `DEFINE`, `MATCH`
- `SHOW` metadata about the servers and the databases
- Utilities: announcing the `CLIENT` name, checking `STATUS`, showing `HELP`, setting an `OPTION`, and `QUIT`
- Authentication: `AUTH` and `SASLAUTH`

The authentication commands are optional, and I don't find them useful, so I won't implement them. That leaves the first three categories.

## Handling TCP

DICT is based on TCP, and there is a neat interactive TCP tool called [`telnet`][telnet], which I used for testing the commands.

[telnet]: https://en.wikipedia.org/wiki/Telnet

### telnet

DICT runs on port 2628:

```sh
$ telnet dict.org 2628
Trying 199.48.130.6...
Connected to dict.org.
Escape character is '^]'.
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <89168346.27665.1642303045@dict.dict.org>
```

Let's try out some commands to understand how this works. Note that I prefix each command with `~> ` here so that it stands out from the response, and truncate long results with `[...]`.

Let's first show what databases there are:

```
~> SHOW DB
110 166 databases present
[...]
.
250 ok
```

There are a lot of dictionaries here, including [GCIDE][gcide], [WordNet][wn], [The Jargon File][jargon], [V.E.R.A.][vera], [FOLDOC][foldoc], but most of them are [FreeDict][fd] dictionaries.

To match a word, the syntax is:

```
~> MATCH database strategy word
```

The strategy is how the server will match the word you're looking up. To list all available strategies, send the command:

```
~> SHOW STRATEGIES
```

dictd supports various strategies, for example `substring`, which matches if the entry contains the queried word as a substring:

```
~> MATCH jargon substring program
152 13 matches found
jargon "c programmer's disease"
jargon "cargo cult programming"
jargon "mickey mouse program"
jargon "perfect programmer syndrome"
jargon "program"
[...]
.
250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]
```

This command only shows which words in the database, if any, satisfy the match, without showing the definition. To actually view a definition, pass the database name and the word to the `DEFINE` command. Note that you can also use `*` as the database name in both `DEFINE` and `MATCH`, which defines/matches across all dictionaries.

```
~> DEFINE * programming
150 3 definitions retrieved
151 "programming" wn "WordNet (r) 3.0 (2006)"
programming
[...]
.
151 "programming" jargon "The Jargon File (version 4.4.7, 29 Dec 2003)"
programming n.
[...]
.
151 "programming" foldoc "The Free On-line Dictionary of Computing (30 December 2018)"
programming
.
250 ok [d/m/c = 3/0/145; 0.000r 0.000u 0.000s]
```

That's the gist of how to look up words with the DICT protocol. You can find more commands with:

```
~> HELP
[...]
.
250 ok
```

Finally, to end the session, the command is:

```
~> QUIT
221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]
```

Note that every response except the one to `QUIT` ends with a period and a `250 ok` line, which is the equivalent of HTTP's 200 OK. These response codes are defined in [the protocol specification][rfc2229]. Responses to commands other than `HELP` carry some additional statistics, though this is optional. I figured out that `d` means definitions, `m` means matches, and `s` is probably the time it took to query (why are they always zero, though?), but I have no clue what `c`, `r`, and `u` mean. I might check the [source code][dict] to figure that out, but let's leave it for another time.

[gcide]: https://gcide.gnu.org.ua/
[wn]: https://wordnet.princeton.edu/
[jargon]: http://www.catb.org/~esr/jargon/
[foldoc]: https://foldoc.org/
[vera]: https://savannah.gnu.org/projects/vera
[fd]: https://freedict.org/
[rfc2229]: https://datatracker.ietf.org/doc/html/rfc2229#page-23

### Go

Of course we are not going to make the users type these commands (though they're not too unintuitive and can be easily remembered). I chose Go to build the CLI client, though without any conscious consideration of fitness. I'm trying out new things[^0] after all.

From the [doc][go-net], we can figure out how to make a TCP connection.

```go
conn, err := net.Dial("tcp", "golang.org:80")
if err != nil {
	// handle error
}
fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
status, err := bufio.NewReader(conn).ReadString('\n')
// ...
```

Let's copy that and replace the HTTP request with DICT commands:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "dict.org:2628")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := bufio.NewReader(conn)
	// RFC 2229 terminates command lines with CRLF.
	fmt.Fprintf(conn, "MATCH jargon word programming\r\n")
	fmt.Fprintf(conn, "QUIT\r\n")

	for {
		response, err := buf.ReadString('\n')
		if err != nil {
			// oftentimes this is EOF, once the server closes the connection
			fmt.Println(err)
			break
		}
		// Print, not Printf: the response is data, not a format string.
		fmt.Print(response)
	}
}
```

Running this code, we get the response:

```
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <89266600.1914.1642341395@dict.dict.org>
152 4 matches found
jargon "cargo cult programming"
jargon "programming"
jargon "programming fluid"
jargon "voodoo programming"
.
250 ok [d/m/c = 0/4/3814; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
EOF
```

which is a good start.

There is a problem with this code: currently we are reading line by line, rather than reading the whole response for each command. This way, we can't know whether line 3 belongs to the response for the first command or the second. One solution is to check whether the line is prefixed with a status code, but do we have a better solution? Let's wait till next week!

[go-net]: https://pkg.go.dev/net

[^0]: Not really, I've written a CLI client for the Wiktionary API in Go before.