From e68e93e3fcb8053ea3f8d839af22b02495036fdc Mon Sep 17 00:00:00 2001
From: Ngô Ngọc Đức Huy
Date: Sat, 6 Jan 2024 00:14:25 +0700
Subject: Add new post about mayim

---
 content/posts/2024-01-05-mayim-sqlite.md | 197 +++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 content/posts/2024-01-05-mayim-sqlite.md

diff --git a/content/posts/2024-01-05-mayim-sqlite.md b/content/posts/2024-01-05-mayim-sqlite.md
new file mode 100644
index 0000000..b1d38c1
--- /dev/null
+++ b/content/posts/2024-01-05-mayim-sqlite.md
@@ -0,0 +1,197 @@

---
title: "Querying data from SQLite with Mayim"
date: 2024-01-05
lang: en
categories: [ blog ]
tags: [mayim, sqlite, python]
translationKey: "2024-01-05-mayim-sqlite"
---

(No resolution post this year, huh? Maybe I'll save it for Lunar New Year.)

So I'm dabbling with [Sanic][sanic] as an asynchronous web framework, which is
where I learned about [Mayim][mayim], a wrapper around SQL connectors that
"hydrates" query results into models to be validated.  It also helps separate
the SQL queries from the code, so calling them is more elegant.

[sanic]: https://sanic.dev
[mayim]: https://ahopkins.github.io/mayim/

Using Mayim is really simple:

1. Put your SQL queries inside the `queries/` folder
2. Declare models for the outputs
3. Write an executor class with methods named after the SQL files

These methods will execute the SQL queries and return the results as model
objects.

(See their documentation for more info and code samples.)

So far so good.  I'm trying this out on a small project as a playground,
and since I don't want a full DBMS, I'm sticking to SQLite.  That turns out
to cause some (solvable) trouble, because some things are specific to each
SQL implementation.

Most of the examples in its docs include only one table, to keep the queries
simple.  In an actual app, there are usually several tables, with some
one-to-many or many-to-many relations.
The most complicated query I see in the docs is for
[a one-to-one](https://ahopkins.github.io/mayim/guide/pydantic.html#nested-models)
relation.  It was also written for PostgreSQL, though, so it's not immediately
applicable to my experiment.  Argh, it also requires pydantic, another
library, for this to work---I was expecting to keep the dependencies minimal,
but okay.

For this experiment, I have tables for products, categories, and prices,
with both one-to-many and many-to-many relations.  The schema is defined by
the queries below:

```sql
CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE price (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    unit TEXT NOT NULL,
    price INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (id)
);

CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);

CREATE TABLE product_category (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    FOREIGN KEY (product_id) REFERENCES product (id),
    FOREIGN KEY (category_id) REFERENCES category (id)
);
```

And the corresponding pydantic models:

```python
from pydantic import BaseModel


class Price(BaseModel):
    id: int
    unit: str
    price: int


class Product(BaseModel):
    id: int
    name: str
    prices: list[Price]
```

Let's try the one-to-many relation first---the product-price relation.

(As for why there are multiple prices for a single product: when one buys in
bulk, the price is usually cheaper.)

Normally, one would query it as:

```sql
SELECT product.id, product.name, price, unit
FROM product
JOIN price ON product.id = price.product_id
```

That doesn't work here: we need the prices as a list of price objects for
each product.  From the example linked above, we learned that nested objects
require turning to JSON, so let's look at [SQLite's documentation on JSON
functions][sqlite-json].  `json_group_array()` seems to do what we need here,
and we can use `json_object()` to construct the JSON.
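As a quick sanity check of what `json_object()` produces, we can poke at it directly from Python's bundled `sqlite3` module (a minimal sketch; the values here are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# json_object() builds a JSON object from alternating key/value arguments;
# note that the result comes back to Python as a plain TEXT string.
row = con.execute(
    "SELECT json_object('id', 1, 'price', 8000, 'unit', 'bottle')"
).fetchone()
print(row[0])  # {"id":1,"price":8000,"unit":"bottle"}
```

That "plain string" detail will matter later.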
[sqlite-json]: https://sqlite.org/json1.html

```sql
SELECT product.id, name, json_group_array(
    json_object('id', price.id, 'price', price.price, 'unit', unit)
) AS prices
FROM product JOIN price
ON product.id = price.product_id
GROUP BY product.id;
```

This turns out to be simpler than the PostgreSQL example!

However, when I ran that:

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Product
prices
  Input should be a valid list [type=list_type, input_value='[{"id":1,"price":8000,...}]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/list_type
```

It turns out the SQLite driver can only return the JSON as a string, not as a
parsed object.  Maybe it should support that?  I don't know.  But we can solve
it ourselves by extending pydantic's `BaseModel`.  To save you from inspecting
pydantic's source code: upon instantiation, `BaseModel` takes `**data` as
initialization parameters, and it has a `model_fields` attribute that stores
the type constraints we defined above.  It maps each field name to a
`FieldInfo` object, whose `annotation` attribute holds the data type we need.

Let's customize that to parse the JSON data:

```python
import json
import types

from pydantic import BaseModel


class SQLiteModel(BaseModel):
    """Custom model to parse JSON from SQLite, which is returned as a string."""

    def __init__(self, /, **data):
        for field, info in self.model_fields.items():
            # To be careful, only try to parse data expected to be list|dict
            if (type(info.annotation) is types.GenericAlias
                    and info.annotation.__origin__ in (list, dict)
                    and type(data[field]) is str):
                data[field] = json.loads(data[field])
        super().__init__(**data)
```

That should work now.
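To double-check the whole round trip without involving Mayim or pydantic at all, here's a standard-library-only sketch: it builds the two tables in an in-memory database, runs the `json_group_array()` query from above, and parses the `prices` column manually (the sample data is made up):

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE price (
        id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        unit TEXT NOT NULL,
        price INTEGER NOT NULL,
        FOREIGN KEY (product_id) REFERENCES product (id)
    );
    INSERT INTO product VALUES (1, 'energy drink');
    INSERT INTO price VALUES (1, 1, 'bottle', 8000), (2, 1, 'pack', 70000);
""")
row = con.execute("""
    SELECT product.id, name, json_group_array(
        json_object('id', price.id, 'price', price.price, 'unit', unit)
    ) AS prices
    FROM product JOIN price ON product.id = price.product_id
    GROUP BY product.id;
""").fetchone()
# The driver hands the aggregate back as a string; parse it ourselves.
prices = sorted(json.loads(row[2]), key=lambda p: p["id"])
print(prices)
# [{'id': 1, 'price': 8000, 'unit': 'bottle'}, {'id': 2, 'price': 70000, 'unit': 'pack'}]
```

This is essentially what `SQLiteModel` automates for every `list`- or `dict`-typed field.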
Now, let's add categories to the products:

```python
class Product(SQLiteModel):
    id: int
    name: str
    prices: list[Price]
    categories: list[str]
```

I'm skipping a bit here, because category has only one data field, though I
probably should write a full model as I did for prices.

```sql
SELECT product.id, product.name,
    json_group_array(category.name) AS categories,
    json_group_array(
        json_object('id', price.id, 'price', price.price, 'unit', unit)
    ) AS prices
FROM product
JOIN price ON product.id = price.product_id
JOIN product_category ON product.id = product_category.product_id
JOIN category ON category.id = product_category.category_id
GROUP BY product.id;
```

Running that results in rows with duplicated categories, like:

```python
Product(id=1, name='energy drink', prices=[Price(id=1, unit='bottle', price=12000), Price(id=2, unit='packets', price=70000)], categories=['drink', 'drink']),
```

This happens because I joined three tables at once, so each category is
repeated once for every price of the product.  If I wrote a full model for
Category, it wouldn't validate at all, due to the duplicated primary keys.
Simply adding the `DISTINCT` keyword, as in
`json_group_array(DISTINCT category.name)`, solves this.

So, after all this, I found that writing bare queries for this small project
wasn't too complicated, and the problems I encountered were quite trivial to
solve.  I expect that to be less true for larger projects, though, so I now
appreciate how much an ORM simplifies the process.  I'll keep experimenting
with this approach for a while longer.

-- cgit 1.4.1