author | Huy Ngo <huyngo@disroot.org> | 2022-01-16 21:04:12 +0700 |
---|---|---|
committer | Huy Ngo <huyngo@disroot.org> | 2022-01-16 21:04:12 +0700 |
commit | 69aa7b9de1827abd1e0b4a2541c12e13bcd0f796 | |
tree | 4e020eb8d3b2184e7265b03ed986d9a37a6a2b18 /content/posts | |
parent | 45605c48d95ff78541add9d16082c2caac308d9d | |
Add new blog post
Diffstat (limited to 'content/posts')
-rw-r--r-- | content/posts/2022-01-16-dict-1.md | 278 |
1 files changed, 278 insertions, 0 deletions
diff --git a/content/posts/2022-01-16-dict-1.md b/content/posts/2022-01-16-dict-1.md
new file mode 100644
index 0000000..37d5660
--- /dev/null
+++ b/content/posts/2022-01-16-dict-1.md
@@ -0,0 +1,278 @@

---
title: "Implementing DICT protocol: Part 1"
date: 2022-01-16
lang: en
categories: [ blog ]
tags: [dict, dictionary, go, golang, rfc2229, tcp ]
translationKey: "2022-01-16-Dict-1"
---

## DICT Protocol

What is the DICT protocol?

<figure>
  <blockquote>
    <p>
      The Dictionary Server Protocol (DICT) is a TCP transaction based
      query/response protocol that allows a client to access dictionary
      definitions from a set of natural language dictionary databases.
    </p>
    <figcaption>
      <cite>
        <a href="https://datatracker.ietf.org/doc/html/rfc2229">
          DICT Protocol - RFC 2229
        </a>
      </cite>
    </figcaption>
  </blockquote>
</figure>

Notable implementations include [dict(d)][dict] and [GNU dico(d)][dico]; the
former is the reference implementation and supports multiple database formats,
as listed in [dictfmt(1)][man-dictfmt].

[dict]: https://github.com/cheusov/dictd
[dico]: https://www.gnu.org.ua/software/dico/
[man-dictfmt]: https://linux.die.net/man/1/dictfmt

I intend to implement a server and multiple clients (CLI, GUI, ~~web~~) for this
protocol, as well as some tools to easily create dictd-readable databases.

## Why?

No practical reason, but [dict] is one of the first command-line tools I was
introduced to, and easily one of my favorites, along with curl and [jq][jq].
It's basically just a dictionary app, but it's cool:

- works perfectly in the terminal
- easily self-hostable
- fast
- has cool dictionaries (though only Debian, Arch, and their derivatives
  distribute those)

[jq]: /posts/2021-06-13-jq/

Also, I'm writing dictionaries for my [conlangs][conlang] and I want to
distribute them via this protocol. Clearly, reimplementing a server that
already exists doesn't help with that, but I tend to go down rabbit holes.


[conlang]: /misc/#conlangs

I also like to explore non-web protocols, and starting with something simple
like DICT seems like a good idea.

## Reading the spec

The spec (linked at the top of this post) is shorter and easier to read than I
expected. Ignoring the introduction, examples, and citations, it's less than 20
pages. There are four classes of commands:

- Querying the databases: `DEFINE`, `MATCH`
- `SHOW` metadata about the server and the databases
- Utilities: sending the `CLIENT` name, checking `STATUS`, showing `HELP`,
  setting an `OPTION`, and `QUIT`
- Authentication: `AUTH` and `SASLAUTH`

The authentication commands are optional and I don't find them useful, so I
won't implement them. That leaves the first three categories.

## Handling TCP

DICT is based on <abbr title="Transmission Control Protocol">TCP</abbr>,
and there is a neat interactive <abbr>TCP</abbr> tool called [`telnet`][telnet],
which I used to test the commands.

[telnet]: https://en.wikipedia.org/wiki/Telnet

### telnet

DICT runs on port 2628:

```sh
$ telnet dict.org 2628
Trying 199.48.130.6...
Connected to dict.org.
Escape character is '^]'.
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89168346.27665.1642303045@dict.dict.org>
```

Let's try out some commands to understand how this works. Note that I prefix
each command with `~> ` here so that it stands out from the response, and
truncate long results with `[...]`.

First, let's see which databases there are:

```
~> SHOW DB
110 166 databases present
[...]
.
250 ok
```

There are a lot of dictionaries here, including [GCIDE][gcide], [WordNet][wn],
[The Jargon File][jargon], [V.E.R.A.][vera], and [FOLDOC][foldoc], but most of
them are [FreeDict][fd] dictionaries.

To match a word, the syntax is:

```
~> MATCH database strategy word
```

The strategy determines how the server matches the word you're looking up.

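
As a small illustration of the `MATCH database strategy word` syntax, here is a minimal Go sketch that builds such a command line. It assumes, based on my reading of RFC 2229, that command lines end in CRLF and that a parameter containing spaces or quotes may be wrapped in double quotes with backslash escapes. The helper names `quoteParam` and `matchCommand` are hypothetical, not part of any library.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteParam (hypothetical helper) returns p unchanged when it is a plain
// word, and otherwise wraps it in double quotes, escaping backslashes and
// double quotes. The quoting rules are an assumption based on RFC 2229.
func quoteParam(p string) string {
	if p != "" && !strings.ContainsAny(p, " \t\"'\\") {
		return p
	}
	escaper := strings.NewReplacer(`\`, `\\`, `"`, `\"`)
	return `"` + escaper.Replace(p) + `"`
}

// matchCommand (hypothetical helper) builds a
// "MATCH database strategy word" command line, terminated by CRLF.
func matchCommand(database, strategy, word string) string {
	return fmt.Sprintf("MATCH %s %s %s\r\n",
		quoteParam(database), quoteParam(strategy), quoteParam(word))
}

func main() {
	// Print with %q so the quoting and the trailing CRLF are visible.
	fmt.Printf("%q\n", matchCommand("jargon", "substring", "program"))
	fmt.Printf("%q\n", matchCommand("*", "exact", "cargo cult"))
}
```

Quoting only when needed keeps ordinary commands identical to what you would type into telnet, such as `MATCH jargon substring program`, while multi-word queries still arrive at the server as a single parameter.
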

To list all available strategies, send:

```
~> SHOW STRATEGIES
```

dictd supports various strategies, for example `substring`, which matches
entries that contain the queried word as a substring:

```
~> MATCH jargon substring program
152 13 matches found
jargon "c programmer's disease"
jargon "cargo cult programming"
jargon "mickey mouse program"
jargon "perfect programmer syndrome"
jargon "program"
[...]
.
250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]
```

This command only shows which words in the database, if any, satisfy the match,
without their definitions. To actually view a definition, pass the dictionary
name and the word to the `DEFINE` command. You can also use `*` as the database
for both `DEFINE` and `MATCH`, which defines or matches across all dictionaries.

```
~> DEFINE * programming
150 3 definitions retrieved
151 "programming" wn "WordNet (r) 3.0 (2006)"
programming
    [...]
.
151 "programming" jargon "The Jargon File (version 4.4.7, 29 Dec 2003)"
programming
   n.

   [...]

.
151 "programming" foldoc "The Free On-line Dictionary of Computing (30 December 2018)"
programming

.
250 ok [d/m/c = 3/0/145; 0.000r 0.000u 0.000s]
```

That's the gist of how to look up words with the DICT protocol. You can find
more commands with:

```
~> HELP
[...]
.
250 ok
```

Finally, to end the session, the command is:

```
~> QUIT
221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]
```

Note that each response above, except the one to `QUIT`, ends with a line
containing a single period followed by a `250 ok` status line, the equivalent
of HTTP's 200 OK. These response codes are defined in
[the protocol specification][rfc2229].

The `MATCH`, `DEFINE`, and `QUIT` responses above also carry some additional
statistics, though these are optional. I figured out that `d` means
definitions, `m` means matches, and `s` is probably the time the query took
(why is it always zero, though?), but I have no clue what `c`, `r`, and `u`
mean.

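
The status lines above, statistics included, are regular enough to pick apart in code. Below is a minimal Go sketch, assuming the `CODE text [d/m/c = D/M/C; ...]` shape seen in the responses above; because the statistics are optional, the parser treats the bracketed suffix as optional too and does not try to interpret `c`. The names `statusLine` and `parseStatusLine` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// statusRe matches a status line: a three-digit code, optionally followed
// by a space and free-form text.
var statusRe = regexp.MustCompile(`^(\d{3})(?: (.*))?$`)

// countsRe matches the optional statistics suffix seen in the replies
// above, e.g. "[d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]".
var countsRe = regexp.MustCompile(`\[d/m/c = (\d+)/(\d+)/(\d+);`)

// statusLine (hypothetical type) holds the pieces this sketch extracts.
type statusLine struct {
	Code      int    // e.g. 250
	Text      string // everything after the code, e.g. "ok [d/m/c = ...]"
	D, M, C   int    // the d/m/c counts; only set when HasCounts is true
	HasCounts bool
}

// parseStatusLine parses a line such as
// "250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]".
// Lines that don't start with a three-digit code (match results, the
// terminating ".") return an error. Definition text that happens to begin
// with three digits and a space would also match, so only call this on
// lines already known to be status lines.
func parseStatusLine(line string) (statusLine, error) {
	line = strings.TrimRight(line, "\r\n")
	m := statusRe.FindStringSubmatch(line)
	if m == nil {
		return statusLine{}, fmt.Errorf("not a status line: %q", line)
	}
	code, _ := strconv.Atoi(m[1]) // exactly three ASCII digits, cannot fail
	s := statusLine{Code: code, Text: m[2]}
	if counts := countsRe.FindStringSubmatch(s.Text); counts != nil {
		// Each group is \d+; Atoi only fails on overflow, ignored here.
		s.D, _ = strconv.Atoi(counts[1])
		s.M, _ = strconv.Atoi(counts[2])
		s.C, _ = strconv.Atoi(counts[3])
		s.HasCounts = true
	}
	return s, nil
}

func main() {
	for _, line := range []string{
		"250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]\r\n",
		"221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]\r\n",
		`jargon "program"`,
	} {
		s, err := parseStatusLine(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(s.Code, s.D, s.M, s.C)
	}
}
```

`HasCounts` exists so that a genuine `0/0/0`, as in the `QUIT` reply, can be told apart from a reply that carries no statistics at all.
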

I might check the dictd [source code][dict] to find out what `c`, `r`, and `u`
stand for, but let's leave that for another time.

[gcide]: https://gcide.gnu.org.ua/
[wn]: https://wordnet.princeton.edu/
[jargon]: http://www.catb.org/~esr/jargon/
[foldoc]: https://foldoc.org/
[vera]: https://savannah.gnu.org/projects/vera
[fd]: https://freedict.org/
[rfc2229]: https://datatracker.ietf.org/doc/html/rfc2229#page-23

### Go

Of course, we are not going to make users type these commands by hand (though
they're not too unintuitive and are easy to remember). I chose Go for the CLI
client, though without any conscious consideration of whether it's a good fit.
I'm trying out new things[^0], after all.

The [`net` package documentation][go-net] shows how to make a TCP connection:

```go
conn, err := net.Dial("tcp", "golang.org:80")
if err != nil {
	// handle error
}
fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
status, err := bufio.NewReader(conn).ReadString('\n')
// ...
```

Let's copy that, swap the HTTP request for DICT commands, and wrap it in a
complete program:

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	// Connect to the public dict.org server on the standard DICT port.
	conn, err := net.Dial("tcp", "dict.org:2628")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	buf := bufio.NewReader(conn)

	// RFC 2229 command lines are terminated by CRLF.
	fmt.Fprint(conn, "MATCH jargon word programming\r\n")
	fmt.Fprint(conn, "QUIT\r\n")

	// Print every line the server sends until it closes the connection.
	for {
		response, err := buf.ReadString('\n')
		if err != nil {
			// Usually io.EOF, once the server hangs up after QUIT.
			fmt.Println(err)
			break
		}
		// Print, not Printf: the response is data, not a format string.
		fmt.Print(response)
	}
}
```

Running this program, we get the following response:

```
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89266600.1914.1642341395@dict.dict.org>
152 4 matches found
jargon "cargo cult programming"
jargon "programming"
jargon "programming fluid"
jargon "voodoo programming"
.
250 ok [d/m/c = 0/4/3814; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
EOF
```

which is a good start.

There is a problem with this code: we are reading line by line rather than
reading the whole response to each command.
This way, we can't tell whether line 3 belongs to the response to the first
command or to the second. One solution is to check whether the line is prefixed
with a status code, but is there a better one?

Let's wait till next week!

[go-net]: https://pkg.go.dev/net

[^0]: Not really; I've written a CLI client for the Wiktionary API in Go before.