--- title: "Querying data from SQLite with Mayim" date: 2024-01-05 lang: en categories: [ blog, dev ] tags: [mayim, sqlite, async, python] translationKey: "2024-01-05-mayim-sqlite" --- (No resolution post this year huh? Maybe I'll save for lunar new year.) So I'm dabbling with [Sanic][sanic] as an asynchronous web framework, where I learned about [Mayim][mayim], which is like a wrapper for SQL connectors to "hydrate" the data into models to be validated. Additionally, it also helps with separating the SQL queries from the code to call those more elegantly. [sanic]: https://sanic.dev [mayim]: https://ahopkins.github.io/mayim/ Using Mayim is really simple: 1. Put your SQL queries inside `queries/` folder 2. Declare models for outputs 3. Write an executor class with methods named the same as the SQL files These methods will execute the SQL queries and return the result as the model object. (See their documentation for more info and code samples) So far so good. I'm trying out with a small project as a playground for this, so I don't want a DBMS, so I'm sticking to SQLite. That turns out to be some (solvable) troubles because there are stuff that's specific to each SQL implementation. Most of its example include only one table, to keep the query simple. For an actual app, there are usually several tables, with some one-to-many or many-to-many relations. The most complicated query I see in the doc is [an one-to-one](https://ahopkins.github.io/mayim/guide/pydantic.html#nested-models) relation. It was also PostgresQL though, so they're not immediately applicable for my experiment. Argh it requires pydantic, another library for this to work also---I'm expecting to keep the dependencies minimal, but okay. For this experiment, I have these tables with products, categories, and prices with both one-to-many and many-to-many relations. The schema is defined by the queries below: ```sql CREATE TABLE product ( id INTEGER PRIMARY KEY, name TEXT NOT NULL, ); CREATE TABLE price ( id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, unit TEXT NOT NULL, price INTEGER NOT NULL, FOREIGN KEY (product_id) REFERENCES product (id), ); CREATE TABLE category ( id INTEGER PRIMARY KEY, name INTEGER TEXT UNIQUE NOT NULL ); CREATE TABLE product_category ( id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL, category_id INTEGER NOT NULL, FOREIGN KEY (product_id) REFERENCES product (id), FOREIGN KEY (category_id) REFERENCES category (id) ); ``` ```python class Price(BaseModel): id: int unit: str price: int class Product(BaseModel): id: int name: str prices: list[Price] ``` Let's try with the one-to-many one first---the product-price relation. (As for why there are multiple prices for a single product: when one buys in bulk, the price is usually cheaper.) Normally, one would query it as: ```sql SELECT product.id, product.name, price, unit FROM product JOIN price ON product.id = price.product_id ``` That doesn't work here: we need price as a list of price objects for each product. From the example linked above, we learned that nested object requires turning to JSON, so let's look at [SQLite's document for JSON functions][sqlite-json]. `json_group_array()` seems to do what we need here, and we can use `json_object()` to construct the JSON. 
[sqlite-json]: https://sqlite.org/json1.html

```sql
SELECT product.id, name,
    json_group_array(
        json_object('id', price.id, 'price', price, 'unit', unit)
    ) as prices
FROM product
JOIN price ON product.id = price.product_id
GROUP BY product.id;
```

This turns out to be simpler than the PostgreSQL example! However, when I ran it:

```
pydantic_core._pydantic_core.ValidationError: 1 validation error for Product
prices
  Input should be a valid list [type=list_type, input_value='[{"id":1,"price":8000,...}]', input_type=str]
    For further information visit https://errors.pydantic.dev/2.5/v/list_type
```

It turns out the SQLite library can only return strings, not JSON. Maybe it should support that? I don't know. But we can solve that ourselves by extending pydantic's `BaseModel`.

To save you from inspecting pydantic's source code: upon instantiation, `BaseModel` takes `**data` as initialization parameters, and it has a `model_fields` attribute that stores the type constraints we defined above. It maps each field name to a `FieldInfo` object, whose `annotation` is the data type that we need. Let's customize that to parse the JSON data:

```python
import json
import types

from pydantic import BaseModel


class SQLiteModel(BaseModel):
    """Custom model to parse JSON from SQLite, which is returned as a string."""

    def __init__(self, /, **data):
        for field, info in self.model_fields.items():
            # To be careful, only try to parse data expected to be list|dict
            if (type(info.annotation) is types.GenericAlias
                    and info.annotation.__origin__ in [list, dict]
                    and type(data[field]) is str):
                data[field] = json.loads(data[field])
        super().__init__(**data)
```

That works now. Next, let's add categories to the products:

```python
class Product(SQLiteModel):
    id: int
    name: str
    prices: list[Price]
    categories: list[str]
```

I'm cutting a corner here because a category has only one data field, though I probably should have written full objects, as for prices.

```sql
SELECT product.id, product.name,
    json_group_array(category.name) as categories,
    json_group_array(
        json_object('id', price.id, 'price', price, 'unit', unit)
    ) as prices
FROM product
JOIN price ON product.id = price.product_id
JOIN product_category ON product.id = product_category.product_id
JOIN category ON category.id = product_category.category_id
GROUP BY product.id;
```

Running that results in rows with duplicated categories, like:

```python
Product(id=1, name='energy drink', prices=[Price(id=1, unit='bottle', price=12000), Price(id=2, unit='packets', price=70000)], categories=['drink', 'drink']),
```

This is because I joined three tables onto `product` at once, so each category is repeated as many times as the product has prices. If I had written a full model for Category, it wouldn't run at all, due to the duplicated primary keys. To solve this, simply adding the `DISTINCT` keyword works.

So, after all this, I found writing bare queries for this small project not too complicated, and the problems I ran into were quite trivial to solve. I expect that to be less true for larger projects, though, so I appreciate how much an ORM simplifies the process. I'll try out some more with this approach anyway.

P.S. Contact me for the test data; it's public, but I don't want to publish it on this blog.
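P.P.S. For reference, the deduplicated query would look something like this: the only change from the query above is the `DISTINCT` inside `json_group_array()` for the categories (SQLite accepts `DISTINCT` in single-argument aggregates).

```sql
SELECT product.id, product.name,
    json_group_array(DISTINCT category.name) as categories,
    json_group_array(
        json_object('id', price.id, 'price', price, 'unit', unit)
    ) as prices
FROM product
JOIN price ON product.id = price.product_id
JOIN product_category ON product.id = product_category.product_id
JOIN category ON category.id = product_category.category_id
GROUP BY product.id;
```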