Case: model inference on images
Imagine we have two input tables:
models
indexed bymodel_id
images
indexed byimage_id
We need to run transform model_infence
which result in table
model_inference_for_image
indexed by model_id,image_id
.
Transform (individual tasks to run) is indexed by model_id,image_id
.
Query is built by the following strategy:
- aggregate each input table by the intersection of it's keys and transform keys
- for each input aggregate:
- outer join transform table with input aggregate by intersection of keys
- select rows where update_ts > process_ts
- union all results
- select distinct rows
SQL query to find which tasks should be run looks like:
WITH models__update_ts AS (
SELECT model_id, update_ts
FROM models
),
images__update_ts AS (
SELECT image_id, update_ts
FROM images
)
SELECT
COALESCE(i.image_id, t.image_id) image_id,
COALESCE(i.model_id, t.model_id) model_id
FROM input__update_ts i
OUTER JOIN transform_meta t ON i.image_id = t.image_id AND i.model_id = t.model_id
WHERE i.update_ts > t.process_ts