I have a table partitioned by the update column, and I want to use id as the upsert key. When the existing row and the incoming row share the same id but have different update values (landing in different partitions), the upsert inserts a duplicate rather than updating.
Reproduction:
db = database("dfs://test_upsert", VALUE, 2024.01.01..2024.01.03)
t = db.createPartitionedTable(
table(1:0, `id`update`val, [INT, DATE, INT]),
`t, `update
)
t.append!(table(1 as id, 2024.01.01 as update, 100 as val))
select * from t
// id=1, update=2024-01-01, val=100
upsert!(t, table(1 as id, 2024.01.02 as update, 200 as val), keyColNames=`id)
select * from t
// id=1, update=2024-01-01, val=100
// id=1, update=2024-01-02, val=200 ← inserted, not updated
I tried including both id and update in keyColNames, but that changes the semantics — I want every row with the same id to be updated regardless of update. When the incoming row lands in the same partition as the existing one, upsert works correctly.
Does upsert! only look for matching rows within the partition the new row belongs to? Is there a way to do a cross-partition upsert by a non-partition-key column?