In PolarDB for PostgreSQL with pgvector, I have a table containing 10 million rows of vector embeddings. I need to perform hybrid queries that combine vector similarity search with attribute filtering (e.g.,
SELECT *
FROM myTable
WHERE user_name = 'nameabc' AND status = 'active'
ORDER by embedding <=> '[2.0, 3.0 ...]'::vector
LIMIT 100;
Index status
user_name => btree index
embedding => hnsw index
In some case, the attribute filter narrows the dataset to about 300 thousand records. However, there will be some performace issue:
If force using vector index, the query is slow(>10 seconds). The nereast neighor search may get the candidate not qualified with attribuite filter.
If force using B-tree index, (to get the 300K candidates) and then computing exact vector distances for all matches will cost about 1 second, but it maxes out CPU usage.
So I would be happy to hear some suggestions on this hybrid search situation. Or is there a kind of hybrid index(similar as multi column btree but combined with vector and attribute) exists and can be used for query?