Best way to generate embeddings for structured product attributes in B2B ecommerce search?

01:55 04 Feb 2026

I am building a B2B product search system using vector embeddings and would like advice specifically on how to generate embeddings for structured product attributes.

Context

Domain: B2B ecommerce
Queries: Short keyword-style searches (4 to 5 tokens), often containing numbers, units, and alphanumeric attributes
Examples:
- “12 kva diesel generator”
- “5 hp air compressor”
- “cnc milling machine 3 axis”

Search architecture

Initial candidate retrieval using product title embeddings
Reranking using product attribute embeddings
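
To make the two stages concrete, here is a minimal sketch of the pipeline. The `embed` function is a toy bag-of-words stand-in, not the real embedding model, and the product records and the `attr_text` field are made up for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, products, top_k=2):
    q = embed(query)
    # Stage 1: candidate retrieval using title embeddings only.
    candidates = sorted(
        products, key=lambda p: cosine(q, embed(p["title"])), reverse=True
    )[:top_k]
    # Stage 2: rerank the candidates using attribute-text embeddings.
    return sorted(
        candidates, key=lambda p: cosine(q, embed(p["attr_text"])), reverse=True
    )

# Hypothetical catalogue entries.
products = [
    {"id": "gen-5", "title": "diesel generator",
     "attr_text": "power rating 5 kva fuel type diesel"},
    {"id": "gen-12", "title": "diesel generator",
     "attr_text": "power rating 12 kva fuel type diesel"},
    {"id": "comp-5", "title": "air compressor",
     "attr_text": "power rating 5 hp"},
]

results = search("12 kva diesel generator", products)
print([p["id"] for p in results])
```

In this toy setup both generators tie on title similarity, so the attribute stage is what separates the 12 kva unit from the 5 kva one. That is the behaviour we want the real attribute embeddings to reproduce.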

Product data

Each product has a title and a set of structured attributes stored as key-value pairs.

Example:

Product: Diesel Generator

Attributes:

“power_rating: 12 kva”
“fuel_type: diesel”
“phase: 3”
“cooling_type: air cooled”
“application: industrial backup”

Main question

What is the best way to preprocess and embed these attributes for semantic reranking?

Attribute embedding strategies we are considering

Flat concatenation

power rating 12 kva fuel type diesel phase 3 cooling type air cooled application industrial backup

Key-value with separators

power_rating: 12 kva | fuel_type: diesel | phase: 3 | cooling_type: air cooled | application: industrial backup

Line-separated attributes

power_rating: 12 kva
fuel_type: diesel
phase: 3
cooling_type: air cooled
application: industrial backup

Natural language passage

This diesel generator has a power rating of 12 kva, uses diesel fuel, supports 3 phase operation, and is air cooled for industrial backup usage.

Per-attribute embeddings
- Generate one embedding per attribute and aggregate scores during reranking

Are there any other methods you would recommend?
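
For concreteness, the structured text formats above can all be produced from the same attribute dictionary. This is only a sketch: the function names are hypothetical, and the natural language passage is left out because it needs per-category templates or a generation step.

```python
def flat(attrs):
    """Flat concatenation: underscores become spaces, no punctuation."""
    return " ".join(f"{k.replace('_', ' ')} {v}" for k, v in attrs.items())

def key_value(attrs, sep=" | "):
    """Key-value pairs joined with an explicit separator."""
    return sep.join(f"{k}: {v}" for k, v in attrs.items())

def line_separated(attrs):
    """One 'key: value' pair per line."""
    return "\n".join(f"{k}: {v}" for k, v in attrs.items())

def per_attribute(attrs):
    """One short text per attribute, each to be embedded separately."""
    return [f"{k}: {v}" for k, v in attrs.items()]

attrs = {
    "power_rating": "12 kva",
    "fuel_type": "diesel",
    "phase": "3",
    "cooling_type": "air cooled",
    "application": "industrial backup",
}

print(flat(attrs))
print(key_value(attrs))
print(line_separated(attrs))
print(per_attribute(attrs))
```

Each string (or each element of the per-attribute list) would then be passed to the embedding model. How to aggregate the per-attribute scores during reranking is part of what I am asking.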

Specific questions

Should attributes be embedded as a single combined text or as individual attribute embeddings?
Does explicitly preserving attribute keys help embedding quality?
Are separator tokens or structured formatting important for short, attribute-heavy queries?
Are there best practices for handling numeric values, units, and alphanumeric attributes?
Does passage-style text perform better than structured key-value text for dense retrieval?
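
As an illustration of the kind of numeric and unit handling I mean, here is a minimal normalization sketch applied to both queries and attribute values before embedding. The alias table and function names are hypothetical, and I have not validated that this helps retrieval quality.

```python
import re

# Hypothetical alias table; a real catalogue would need a much larger one.
UNIT_ALIASES = {
    "kva": "kva",
    "hp": "hp",
    "horsepower": "hp",
    "kw": "kw",
    "kilowatt": "kw",
}

# A number (optionally decimal) followed by optional whitespace and a word.
_NUM_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")

def normalize(text):
    """Lowercase, split glued number+unit pairs, and canonicalize known units."""
    def repl(match):
        value, unit = match.group(1), match.group(2)
        if unit not in UNIT_ALIASES:
            return match.group(0)  # leave unknown tokens such as "3 axis" alone
        return f"{value} {UNIT_ALIASES[unit]}"
    return _NUM_UNIT.sub(repl, text.lower())

def extract_quantities(text):
    """Return (value, canonical_unit) pairs, e.g. for exact-match filtering."""
    return [
        (float(value), UNIT_ALIASES[unit])
        for value, unit in _NUM_UNIT.findall(text.lower())
        if unit in UNIT_ALIASES
    ]

print(normalize("12KVA Diesel Generator"))
print(normalize("5 horsepower air compressor"))
print(extract_quantities("12 kva diesel generator"))
```

The extracted pairs could also be stored as structured fields and compared exactly, alongside or instead of relying on the embedding to capture the number.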

Model considerations

I am currently considering the Marqo ecommerce embedding model (large).
I am open to recommendations for other models that work well for:
- Short B2B queries
- Numeric and unit-heavy matching
- Attribute-based reranking

nlp embedding information-retrieval vector-search semantic-search
