about summary refs log tree commit diff
diff options
context:
space:
mode:
-rw-r--r--content/posts/2024-01-05-mayim-sqlite.md197
1 files changed, 197 insertions, 0 deletions
diff --git a/content/posts/2024-01-05-mayim-sqlite.md b/content/posts/2024-01-05-mayim-sqlite.md
new file mode 100644
index 0000000..b1d38c1
--- /dev/null
+++ b/content/posts/2024-01-05-mayim-sqlite.md
@@ -0,0 +1,197 @@
+---
+title: "Querying data from SQLite with Mayim"
+date: 2024-01-05
+lang: en
+categories: [ blog ]
+tags: [mayim, sqlite, python]
+translationKey: "2024-01-05-mayim-sqlite"
+---
+
+(No resolution post this year huh? Maybe I'll save for lunar new year.)
+
+So I'm dabbling with [Sanic][sanic] as an asynchronous web framework, where I
+learned about [Mayim][mayim], which is like a wrapper for SQL connectors to
+"hydrate" the data into models to be validated.  Additionally, it also helps
+with separating the SQL queries from the code to call those more elegantly.
+
+[sanic]: https://sanic.dev
+[mayim]: https://ahopkins.github.io/mayim/
+
+Using Mayim is really simple:
+
+1. Put your SQL queries inside `queries/` folder
+2. Declare models for outputs
+3. Write an executor class with methods named the same as the SQL files
+
+These methods will execute the SQL queries and return the result as the model
+object.
+
+(See their documentation for more info and code samples)
+
+So far so good.  I'm trying out with a small project as a
+playground for this, so I don't want a DBMS, so I'm sticking to SQLite.
+That turns out to be some (solvable) troubles because there are stuff that's
+specific to each SQL implementation.
+
+Most of its example include only one table, to keep the query simple.  For an
+actual app, there are usually several tables, with some one-to-many or
+many-to-many relations.  The most complicated query I see in the doc is [an
+one-to-one](https://ahopkins.github.io/mayim/guide/pydantic.html#nested-models)
+relation.  It was also PostgresQL though, so they're not immediately applicable
+for my experiment.  Argh it requires pydantic, another library for this to work
+also---I'm expecting to keep the dependencies minimal, but okay.
+
+For this experiment, I have these tables with products, categories, and prices
+with both one-to-many and many-to-many relations.  The schema is defined by the
+queries below:
+
+```sql
+CREATE TABLE product (
+  id INTEGER PRIMARY KEY,
+  name TEXT NOT NULL,
+);
+
+CREATE TABLE price (
+  id INTEGER PRIMARY KEY,
+  product_id INTEGER NOT NULL,
+  unit TEXT NOT NULL,
+  price INTEGER NOT NULL,
+  FOREIGN KEY (product_id) REFERENCES product (id),
+);
+
+CREATE TABLE category (
+  id INTEGER PRIMARY KEY,
+  name INTEGER TEXT UNIQUE NOT NULL
+);
+
+CREATE TABLE product_category (
+  id INTEGER PRIMARY KEY,
+  product_id INTEGER NOT NULL,
+  category_id INTEGER NOT NULL,
+  FOREIGN KEY (product_id) REFERENCES product (id),
+  FOREIGN KEY (category_id) REFERENCES category (id)
+);
+```
+
+```python
+class Price(BaseModel):
+    id: int
+    unit: str
+    price: int
+
+
+class Product(BaseModel):
+    id: int
+    name: str
+    prices: list[Price]
+```
+
+Let's try with the one-to-many one first---the product-price relation.
+
+(As for why there are multiple prices for a single product: when one buys in
+bulk, the price is usually cheaper.)
+
+Normally, one would query it as:
+
+```sql
+SELECT product.id, product.name, price, unit
+FROM product
+JOIN price ON product.id = price.product_id
+```
+
+That doesn't work here: we need price as a list of price objects for each
+product.  From the example linked above, we learned that nested object requires
+turning to JSON, so let's look at [SQLite's document for JSON
+functions][sqlite-json].  `json_group_array()` seems to do what we need here,
+and we can use `json_object()` to construct the JSON.
+
+[sqlite-json]: https://sqlite.org/json1.html
+
+```sql
+SELECT product.id, name, json_group_array(
+  json_object('id', price.id, 'price', price, 'unit', unit)
+) as prices
+FROM product JOIN price
+ON product.id = price.product_id
+GROUP BY product.id;
+```
+
+This turns out to be simpler than the PostgresQL example!
+
+However, when I ran that:
+
+```
+pydantic_core._pydantic_core.ValidationError: 1 validation error for Product
+prices
+  Input should be a valid list [type=list_type, input_value='[{"id":1,"price":8000,...}]', input_type=str]
+    For further information visit https://errors.pydantic.dev/2.5/v/list_type
+```
+
+It turns out, the SQLite library can only returns string and not JSON.  Maybe
+they should support that?  I don't know.  But we can solve that ourselves by
+extending pydantic's `BaseModel`.  To save you from inspecting pydantic's
+source code, upon instantiation BaseModel takes `**data` as initialization
+parameters, and has `model_fields` attributes to store type constraints that we
+defined above.  This maps from the field name to an object of type `FieldInfo`,
+which contains `annotation`, which is the data type that we need.
+
+Let's customize that to parse the JSON data:
+
+```python
+class SQLiteModel(BaseModel):
+    """Custom models to parse JSON from SQLite which is returned as string."""
+    def __init__(self, /, **data):
+        for field, info in self.model_fields.items():
+            # To be careful, only try to parse data expected to be list|dict
+            if (type(info.annotation) is types.GenericAlias
+                    and info.annotation.__origin__ in [list, dict]
+                    and type(data[field]) is str):
+                data[field] = json.loads(data[field])
+            else:
+                if type(info.annotation) is types.GenericAlias:
+                    print(data[field])
+        super().__init__(**data)
+```
+
+That should work now.
+
+Now, let's add categories to the products:
+
+```python
+class Product(SQLiteModel):
+    id: int
+    name: str
+    prices: list[Price]
+    categories: list[str]
+```
+
+I'm skipping a bit here because categories has only one data field, though I
+probably should write full objects as for prices.
+
+```sql
+SELECT product.id, product.name, json_group_array(category.name) as categories, json_group_array(
+  json_object('id', price.id, 'price', price, 'unit', unit)
+) as prices
+FROM product
+JOIN price ON product.id = price.product_id
+JOIN product_category ON product.id = product_category.product_id
+JOIN category ON category.id = product_category.category_id
+GROUP BY product.id;
+```
+
+Running that would result in rows with duplicated categories like:
+
+```python
+Product(id=1, name='energy drink', prices=[Price(id=1, unit='bottle', price=12000), Price(id=2, unit='packets', price=70000)], categories=['drink', 'drink']),
+```
+
+This is because I joined three tables at once, and the categories are repeated
+the same number of the prices of a product.  If I wrote a full model for
+Category, it wouldn't run at all due to duplicated primary key.  To solve this,
+simply add `distinct` keyword would work.
+
+So, after this, I found writing bare queries for this small project not too
+complicated and the encountered problems quite trivial to solve, but I expect
+it to be less so for larger projects, and so appreciate how much
+<abbr>ORM</abbr> simplifies the process.  I'll try out some more with this
+approach, though.