about summary refs log tree commit diff
path: root/content/posts/2024-01-05-mayim-sqlite.md
blob: 3ce666a0bbec7deda34f26317737ebe157ede685 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
---
title: "Querying data from SQLite with Mayim"
date: 2024-01-05
lang: en
categories: [ blog ]
tags: [mayim, sqlite, python]
translationKey: "2024-01-05-mayim-sqlite"
---

(No resolution post this year huh? Maybe I'll save for lunar new year.)

So I'm dabbling with [Sanic][sanic] as an asynchronous web framework, where I
learned about [Mayim][mayim], which is like a wrapper for SQL connectors to
"hydrate" the data into models to be validated.  Additionally, it also helps
with separating the SQL queries from the code to call those more elegantly.

[sanic]: https://sanic.dev
[mayim]: https://ahopkins.github.io/mayim/

Using Mayim is really simple:

1. Put your SQL queries inside `queries/` folder
2. Declare models for outputs
3. Write an executor class with methods named the same as the SQL files

These methods will execute the SQL queries and return the result as the model
object.

(See their documentation for more info and code samples)

So far so good.  I'm trying out with a small project as a
playground for this, so I don't want a DBMS, so I'm sticking to SQLite.
That turns out to be some (solvable) troubles because there are stuff that's
specific to each SQL implementation.

Most of its example include only one table, to keep the query simple.  For an
actual app, there are usually several tables, with some one-to-many or
many-to-many relations.  The most complicated query I see in the doc is [an
one-to-one](https://ahopkins.github.io/mayim/guide/pydantic.html#nested-models)
relation.  It was also PostgresQL though, so they're not immediately applicable
for my experiment.  Argh it requires pydantic, another library for this to work
also---I'm expecting to keep the dependencies minimal, but okay.

For this experiment, I have these tables with products, categories, and prices
with both one-to-many and many-to-many relations.  The schema is defined by the
queries below:

```sql
CREATE TABLE product (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
);

CREATE TABLE price (
  id INTEGER PRIMARY KEY,
  product_id INTEGER NOT NULL,
  unit TEXT NOT NULL,
  price INTEGER NOT NULL,
  FOREIGN KEY (product_id) REFERENCES product (id),
);

CREATE TABLE category (
  id INTEGER PRIMARY KEY,
  name INTEGER TEXT UNIQUE NOT NULL
);

CREATE TABLE product_category (
  id INTEGER PRIMARY KEY,
  product_id INTEGER NOT NULL,
  category_id INTEGER NOT NULL,
  FOREIGN KEY (product_id) REFERENCES product (id),
  FOREIGN KEY (category_id) REFERENCES category (id)
);
```

```python
class Price(BaseModel):
    id: int
    unit: str
    price: int


class Product(BaseModel):
    id: int
    name: str
    prices: list[Price]
```

Let's try with the one-to-many one first---the product-price relation.

(As for why there are multiple prices for a single product: when one buys in
bulk, the price is usually cheaper.)

Normally, one would query it as:

```sql
SELECT product.id, product.name, price, unit
FROM product
JOIN price ON product.id = price.product_id
```

That doesn't work here: we need price as a list of price objects for each
product.  From the example linked above, we learned that nested object requires
turning to JSON, so let's look at [SQLite's document for JSON
functions][sqlite-json].  `json_group_array()` seems to do what we need here,
and we can use `json_object()` to construct the JSON.

[sqlite-json]: https://sqlite.org/json1.html

```sql
SELECT product.id, name, json_group_array(
  json_object('id', price.id, 'price', price, 'unit', unit)
) as prices
FROM product JOIN price
ON product.id = price.product_id
GROUP BY product.id;
```

This turns out to be simpler than the PostgresQL example!

However, when I ran that:

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Product
prices
  Input should be a valid list [type=list_type, input_value='[{"id":1,"price":8000,...}]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/list_type
```

It turns out, the SQLite library can only returns string and not JSON.  Maybe
they should support that?  I don't know.  But we can solve that ourselves by
extending pydantic's `BaseModel`.  To save you from inspecting pydantic's
source code, upon instantiation BaseModel takes `**data` as initialization
parameters, and has `model_fields` attributes to store type constraints that we
defined above.  This maps from the field name to an object of type `FieldInfo`,
which contains `annotation`, which is the data type that we need.

Let's customize that to parse the JSON data:

```python
class SQLiteModel(BaseModel):
    """Custom models to parse JSON from SQLite which is returned as string."""
    def __init__(self, /, **data):
        for field, info in self.model_fields.items():
            # To be careful, only try to parse data expected to be list|dict
            if (type(info.annotation) is types.GenericAlias
                    and info.annotation.__origin__ in [list, dict]
                    and type(data[field]) is str):
                data[field] = json.loads(data[field])
            else:
                if type(info.annotation) is types.GenericAlias:
                    print(data[field])
        super().__init__(**data)
```

That should work now.

Now, let's add categories to the products:

```python
class Product(SQLiteModel):
    id: int
    name: str
    prices: list[Price]
    categories: list[str]
```

I'm skipping a bit here because categories has only one data field, though I
probably should write full objects as for prices.

```sql
SELECT product.id, product.name, json_group_array(category.name) as categories, json_group_array(
  json_object('id', price.id, 'price', price, 'unit', unit)
) as prices
FROM product
JOIN price ON product.id = price.product_id
JOIN product_category ON product.id = product_category.product_id
JOIN category ON category.id = product_category.category_id
GROUP BY product.id;
```

Running that would result in rows with duplicated categories like:

```python
Product(id=1, name='energy drink', prices=[Price(id=1, unit='bottle', price=12000), Price(id=2, unit='packets', price=70000)], categories=['drink', 'drink']),
```

This is because I joined three tables at once, and the categories are repeated
the same number of the prices of a product.  If I wrote a full model for
Category, it wouldn't run at all due to duplicated primary key.  To solve this,
simply add `distinct` keyword would work.

So, after this, I found writing bare queries for this small project not too
complicated and the encountered problems quite trivial to solve, but I expect
it to be less so for larger projects, and so appreciate how much
<abbr>ORM</abbr> simplifies the process.  I'll try out some more with this
approach, though.

P.S. Contact me for the test data, which is public but I don't want to publish
on this blog.