---
title: "Implementing DICT protocol: Part 1"
date: 2022-01-16
lang: en
categories: [ blog ]
tags: [dict, dictionary, go, golang, rfc2229, tcp ]
translationKey: "2022-01-16-Dict-1"
---

## DICT Protocol

What is the DICT protocol?

<figure>
  <blockquote>
    <p>
    The Dictionary Server Protocol (DICT) is a TCP transaction based
    query/response protocol that allows a client to access dictionary
    definitions from a set of natural language dictionary databases.
    </p>
    <figcaption>
      <cite>
        <a href="https://datatracker.ietf.org/doc/html/rfc2229">
          DICT Protocol - RFC 2229
        </a>
      </cite>
    </figcaption>
  </blockquote>
</figure>

Notable implementations for this include [dict(d)][dict]
and [GNU dico(d)][dico]; the former is the reference implementation that
supports multiple database formats, as listed in [dictfmt (1)][man-dictfmt].

[dict]: https://github.com/cheusov/dictd
[dico]: https://www.gnu.org.ua/software/dico/
[man-dictfmt]: https://linux.die.net/man/1/dictfmt

I intend to implement a server and multiple clients (CLI, GUI, ~~web~~) for this
protocol, as well as some tools to easily create a dictd-readable database.

## Why?

No practical reason, but [dict] is one of the first command line tools
introduced to me and easily one of my favorites, along with curl and [jq][jq].
It's basically just a dictionary app, but it's cool:

- works perfectly in terminal
- easily self-hostable
- fast
- has cool dictionaries (though only Debian, Arch and derivatives distribute
  those)

[jq]: /posts/2021-06-13-jq/

Also, I'm writing dictionaries for my [conlangs][conlang] and I want to
distribute them via this protocol.  Clearly, implementing a server that is
already implemented doesn't help, but I tend to go down rabbit holes.

[conlang]: /misc/#conlangs

I also like to explore non-web protocols, and starting with something simple
like DICT might be a good idea.

## Reading the spec

The spec (linked at the top of this post) is shorter and easier to read than I
thought.  Ignoring the introduction, examples and citations, it's less than 20
pages.  There are four classes of commands:

- Querying the database: `DEFINE`, `MATCH`
- Metadata about the server and the databases: `SHOW`
- Utilities: announce the `CLIENT` name, check `STATUS`, show `HELP`, set an
    `OPTION`, and `QUIT`
- Authentication: `AUTH` and `SASLAUTH`

The authentication commands are optional, and I don't find them useful, so I
won't implement them.  That leaves the first three categories.

## Handling TCP

DICT is based on <abbr title="Transmission Control Protocol">TCP</abbr>,
and there is a neat interactive <abbr>TCP</abbr> tool called [`telnet`][telnet],
which I used for testing the commands.

[telnet]: https://en.wikipedia.org/wiki/Telnet

### telnet

DICT runs on port 2628:

```sh
$ telnet dict.org 2628
Trying 199.48.130.6...
Connected to dict.org.
Escape character is '^]'.
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89168346.27665.1642303045@dict.dict.org>
```

Let's try out some commands to understand how this works.  Note that I prefix
each command with `~> ` here so that it stands out from the response, and
truncate long results with `[...]`.

Let's first show what databases there are:

```
~> SHOW DB
110 166 databases present
[...]
.
250 ok
```

There are a lot of dictionaries here, including [GCIDE][gcide], [WordNet][wn],
[The Jargon File][jargon], [V.E.R.A.][vera], [FOLDOC][foldoc], but most of them
are [FreeDict][fd] dictionaries.

To match a word, the syntax is:

```
~> MATCH database strategy word
```

The strategy is how the server matches the word you're looking up.  To list all
available strategies, send the command:

```
~> SHOW STRATEGIES
```

dictd supports various strategies, for example `substring`, which matches if
the entry contains the queried word as a substring:

```
~> MATCH jargon substring program
152 13 matches found
jargon "c programmer's disease"
jargon "cargo cult programming"
jargon "mickey mouse program"
jargon "perfect programmer syndrome"
jargon "program"
[...]
.
250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]
```
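
Each line between the `152` status and the closing `.` pairs a database name with a quoted headword.  Here is a minimal sketch of splitting such a line in Go.  `parseMatchLine` is a hypothetical helper name, and it assumes the headword is always wrapped in double quotes, as in the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseMatchLine splits a MATCH result line such as
//
//	jargon "cargo cult programming"
//
// into the database name and the matched headword.
// Hypothetical helper; assumes the headword is always double-quoted.
func parseMatchLine(line string) (string, string, bool) {
	line = strings.TrimRight(line, "\r\n")
	db, rest, found := strings.Cut(line, " ")
	if !found || db == "" {
		return "", "", false
	}
	rest = strings.TrimSpace(rest)
	if len(rest) < 2 || rest[0] != '"' || rest[len(rest)-1] != '"' {
		return "", "", false
	}
	return db, rest[1 : len(rest)-1], true
}

func main() {
	db, word, ok := parseMatchLine(`jargon "cargo cult programming"`)
	fmt.Println(db, word, ok) // prints: jargon cargo cult programming true
}
```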

This command only shows which words in the database, if any, satisfy the match,
without showing their definitions.  To actually view a definition, pass the
database name and the word to the `DEFINE` command.  You can also use `*` as
the database name in both `DEFINE` and `MATCH`, which searches all
databases.

```
~> DEFINE * programming
150 3 definitions retrieved
151 "programming" wn "WordNet (r) 3.0 (2006)"
programming
    [...]
.
151 "programming" jargon "The Jargon File (version 4.4.7, 29 Dec 2003)"
programming
 n.

    [...]

.
151 "programming" foldoc "The Free On-line Dictionary of Computing (30 December 2018)"
programming

.
250 ok [d/m/c = 3/0/145; 0.000r 0.000u 0.000s]
```
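
Each definition in the response starts with a `151` line carrying the headword, the database name, and the database description.  Here is a rough sketch of pulling those three fields apart with a regular expression.  The names `definitionHeader` and `parseDefinitionHeader` are made up for illustration, and the pattern assumes the quoted fields contain no escaped quotes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headerPattern matches lines like
//
//	151 "programming" wn "WordNet (r) 3.0 (2006)"
//
// Assumption: neither quoted field contains an escaped double quote.
var headerPattern = regexp.MustCompile(`^151 "([^"]*)" (\S+) "(.*)"$`)

// definitionHeader holds the fields of a 151 line (hypothetical type).
type definitionHeader struct {
	Word        string
	Database    string
	Description string
}

// parseDefinitionHeader returns false for any line that is not a 151 header.
func parseDefinitionHeader(line string) (definitionHeader, bool) {
	m := headerPattern.FindStringSubmatch(strings.TrimRight(line, "\r\n"))
	if m == nil {
		return definitionHeader{}, false
	}
	return definitionHeader{Word: m[1], Database: m[2], Description: m[3]}, true
}

func main() {
	h, ok := parseDefinitionHeader(`151 "programming" wn "WordNet (r) 3.0 (2006)"`)
	fmt.Printf("%+v %v\n", h, ok)
}
```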

That's the gist of how to look up words with the DICT protocol.  You can find
more commands with:

```
~> HELP
[...]
.
250 ok
```

Finally, to end the session, the command is:

```
~> QUIT
221 bye [d/m/c = 0/0/0; 123.000r 0.000u 0.000s]
```

Note that each response shown above ends with a line containing a single
period, followed by a `250 ok` status line, the equivalent of HTTP's 200 OK.
`QUIT` is the exception.  These response codes are defined in
[the protocol specification][rfc2229].
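
Every status line starts with a three-digit code, and in RFC 2229 the first digit gives the class of the reply (1 to 3 are positive, 4 and 5 are negative).  A small sketch of reading a status line that way; `parseStatus` and `statusClass` are hypothetical names, not part of any library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStatus splits a status line such as "250 ok" into its numeric
// code and the free-form text after it. Hypothetical helper.
func parseStatus(line string) (int, string, error) {
	line = strings.TrimRight(line, "\r\n")
	if len(line) < 3 {
		return 0, "", fmt.Errorf("status line too short: %q", line)
	}
	code, err := strconv.Atoi(line[:3])
	if err != nil || code < 100 || code > 599 {
		return 0, "", fmt.Errorf("no status code in %q", line)
	}
	return code, strings.TrimSpace(line[3:]), nil
}

// statusClass names the reply class encoded in the first digit.
func statusClass(code int) string {
	switch code / 100 {
	case 1:
		return "positive preliminary"
	case 2:
		return "positive completion"
	case 3:
		return "positive intermediate"
	case 4:
		return "transient negative"
	case 5:
		return "permanent negative"
	}
	return "unknown"
}

func main() {
	// Status lines copied from the telnet session above.
	for _, line := range []string{"110 166 databases present", "250 ok", "221 bye"} {
		code, text, err := parseStatus(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%d (%s): %s\n", code, statusClass(code), text)
	}
}
```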

Commands other than `HELP` have some additional statistics, though this is
optional.  I figured out that `d` means definitions, `m` means matches, and `s`
is probably the time it took to query (why are they always zero, though?), but
I have no clue what `c`, `r`, and `u` mean.  I might check the
[source code][dict] to figure that out, but let's leave it for another time.
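
Even without knowing what every field means, the bracketed suffix is regular enough to pull apart.  The sketch below keeps the raw values under the same single-letter labels the server prints and does not try to interpret them.  The pattern is inferred only from the dictd output shown in this post, so treat it as an assumption about that one server, and `parseStats` as a made-up name.

```go
package main

import (
	"fmt"
	"regexp"
)

// statsPattern matches the bracketed suffix seen in the dictd output,
// e.g. "[d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]".
var statsPattern = regexp.MustCompile(
	`\[d/m/c = (\d+)/(\d+)/(\d+); (\d+\.\d+)r (\d+\.\d+)u (\d+\.\d+)s\]`)

// stats stores each field as the raw text, named after its label.
type stats struct {
	D, M, C string
	R, U, S string
}

// parseStats returns false when the line carries no statistics.
func parseStats(line string) (stats, bool) {
	m := statsPattern.FindStringSubmatch(line)
	if m == nil {
		return stats{}, false
	}
	return stats{D: m[1], M: m[2], C: m[3], R: m[4], U: m[5], S: m[6]}, true
}

func main() {
	s, ok := parseStats("250 ok [d/m/c = 0/13/5775; 0.000r 0.000u 0.000s]")
	fmt.Printf("%+v %v\n", s, ok)
}
```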

[gcide]: https://gcide.gnu.org.ua/
[wn]: https://wordnet.princeton.edu/
[jargon]: http://www.catb.org/~esr/jargon/
[foldoc]: https://foldoc.org/
[vera]: https://savannah.gnu.org/projects/vera
[fd]: https://freedict.org/
[rfc2229]: https://datatracker.ietf.org/doc/html/rfc2229#page-23

### Go

Of course we are not going to make the users type these commands (though it's
not too unintuitive and can be easily remembered).  I chose Go to build the CLI
client, though without any conscious consideration of fitness.  I'm trying out
new things[^0] after all.

From the [doc][go-net], we can figure out how to make a TCP connection.

```go
conn, err := net.Dial("tcp", "golang.org:80")
if err != nil {
	// handle error
}
fmt.Fprintf(conn, "GET / HTTP/1.0\r\n\r\n")
status, err := bufio.NewReader(conn).ReadString('\n')
// ...
```

Let's copy that and send DICT commands instead of an HTTP request:

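```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "dict.org:2628")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	buf := bufio.NewReader(conn)
	// Send both commands up front; RFC 2229 command lines end in CRLF.
	fmt.Fprintf(conn, "MATCH jargon word programming\r\n")
	fmt.Fprintf(conn, "QUIT\r\n")

	// Print every line until the server closes the connection.
	for {
		response, err := buf.ReadString('\n')
		if err != nil {
			// oftentimes this is EOF error
			fmt.Println(err)
			break
		}
		// Print, not Printf: the response is data, not a format string.
		fmt.Print(response)
	}
}
```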

Running this code, we get the response:

```
220 dict.dict.org dictd 1.12.1/rf on Linux 4.19.0-10-amd64 <auth.mime> <89266600.1914.1642341395@dict.dict.org>
152 4 matches found
jargon "cargo cult programming"
jargon "programming"
jargon "programming fluid"
jargon "voodoo programming"
.
250 ok [d/m/c = 0/4/3814; 0.000r 0.000u 0.000s]
221 bye [d/m/c = 0/0/0; 0.000r 0.000u 0.000s]
EOF
```

which is a good start.

There is a problem with this code: we are reading line by line rather than
reading the whole response to each command.  This way we can't tell whether
line 3 is part of the response to the first command or to the second.  One
solution is to check whether a line is prefixed with a status code, but is
there a better one?

Let's wait till next week!

[go-net]: https://pkg.go.dev/net

[^0]: Not really, I've written a CLI client for Wiktionary API with Go before.