Best way to generate embeddings for structured product attributes in B2B ecommerce search?
01:55 04 Feb 2026

I am building a B2B product search system using vector embeddings and would like advice specifically on how to generate embeddings for structured product attributes.

Context

  • Domain: B2B ecommerce

  • Queries: Short keyword-style searches (4 to 5 tokens), often containing numbers, units, and alphanumeric attributes
    Examples:

    • “12 kva diesel generator”

    • “5 hp air compressor”

    • “cnc milling machine 3 axis”

Search architecture

  • Initial candidate retrieval using product title embeddings

  • Reranking using product attribute embeddings
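
For concreteness, here is a minimal sketch of that two-stage flow. It is illustrative only: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the `id`, `title`, and `attribute_text` fields are assumed names, not our actual schema.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, products, top_k=2):
    q = toy_embed(query)
    # Stage 1: candidate retrieval using title embeddings.
    candidates = sorted(
        products,
        key=lambda p: cosine(q, toy_embed(p["title"])),
        reverse=True,
    )[:top_k]
    # Stage 2: rerank the candidates using attribute-text embeddings.
    return sorted(
        candidates,
        key=lambda p: cosine(q, toy_embed(p["attribute_text"])),
        reverse=True,
    )

# Assumed example records; field names are illustrative.
products = [
    {"id": "gen-12", "title": "Diesel Generator",
     "attribute_text": "power rating 12 kva fuel type diesel phase 3"},
    {"id": "gen-25", "title": "Diesel Generator",
     "attribute_text": "power rating 25 kva fuel type diesel phase 1"},
    {"id": "comp-5", "title": "Air Compressor",
     "attribute_text": "power rating 5 hp tank 100 l"},
]

results = search("12 kva diesel generator", products)
```

Both generators have the same title, so stage 1 cannot separate them; only the attribute text in stage 2 distinguishes the 12 kva unit from the 25 kva one, which is why the attribute representation matters so much here.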

Product data

Each product has a title and a set of structured attributes stored as key-value pairs.

Example:

Product: Diesel Generator

Attributes:

  • “power_rating: 12 kva”

  • “fuel_type: diesel”

  • “phase: 3”

  • “cooling_type: air cooled”

  • “application: industrial backup”

Main question

What is the best way to preprocess and embed these attributes for semantic reranking?

Attribute embedding strategies we are considering

  1. Flat concatenation

    power rating 12 kva fuel type diesel phase 3 cooling type air cooled application industrial backup

  2. Key-value with separators

    power_rating: 12 kva | fuel_type: diesel | phase: 3 | cooling_type: air cooled | application: industrial backup

  3. Line-separated attributes

    power_rating: 12 kva
    fuel_type: diesel
    phase: 3
    cooling_type: air cooled
    application: industrial backup

  4. Natural language passage

    This diesel generator has a power rating of 12 kva, uses diesel fuel, supports 3 phase operation, and is air cooled for industrial backup usage.

  5. Per-attribute embeddings

    • Generate one embedding per attribute and aggregate scores during reranking

  6. Any other recommended method?
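
The serializations above can be produced from the same key-value data with a few small helpers. This is a sketch under assumptions: the function names are hypothetical, strategy 4 is omitted because it needs a per-category template or a text generator, and the mean/max aggregation for strategy 5 is just one possible choice, not an established recommendation.

```python
# Example attributes from the product above (insertion order is preserved).
attributes = {
    "power_rating": "12 kva",
    "fuel_type": "diesel",
    "phase": "3",
    "cooling_type": "air cooled",
    "application": "industrial backup",
}

def flat(attrs):
    """Strategy 1: underscores become spaces, everything is space-joined."""
    return " ".join(f"{k.replace('_', ' ')} {v}" for k, v in attrs.items())

def key_value(attrs, sep=" | "):
    """Strategy 2 with the default separator, strategy 3 with a newline."""
    return sep.join(f"{k}: {v}" for k, v in attrs.items())

def per_attribute_texts(attrs):
    """Strategy 5: one text per attribute, each embedded separately."""
    return [f"{k}: {v}" for k, v in attrs.items()]

def aggregate(scores, how="mean"):
    """Combine per-attribute similarity scores into one rerank score.

    The choice between mean and max is an assumption for illustration.
    """
    if not scores:
        return 0.0
    if how == "max":
        return max(scores)
    return sum(scores) / len(scores)

flat_text = flat(attributes)
piped_text = key_value(attributes)
lined_text = key_value(attributes, sep="\n")
attribute_texts = per_attribute_texts(attributes)
```

Here `flat(attributes)` reproduces the string shown under strategy 1 and `key_value(attributes)` the one under strategy 2, so the variants can be compared on the same products with only the serializer swapped.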

Specific questions

  • Should attributes be embedded as a single combined text or as individual per-attribute embeddings?

  • Does explicitly preserving attribute keys help embedding quality?

  • Are separator tokens or structured formatting important for short, attribute-heavy queries?

  • Are there best practices for handling numeric values, units, and alphanumeric attributes?

  • Does passage-style text perform better than structured key-value text for dense retrieval?

Model considerations

  • Currently considering Marqo ecommerce embedding (large)

  • Open to recommendations for other models that work well for:

    • Short B2B queries

    • Numeric and unit-heavy matching

    • Attribute-based reranking

nlp embedding information-retrieval vector-search semantic-search