Extracting structured deal data from noisy WhatsApp chats using LLMs (context handling + overlapping negotiations)
02:37 20 Apr 2026

Problem Description

I am building a system to analyze WhatsApp chats between two parties (typically a seller/distributor and a buyer/dealer) who negotiate bulk deals for smartphones.

The goal is to automatically extract structured data for each deal, such as:

  • brand

  • model

  • variant (RAM/ROM)

  • price

  • quantity

  • color distribution (if available)

  • deal status (offered / negotiating / finalized / abandoned)
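One way to pin down the target record is a typed schema. The sketch below is an assumption rather than a fixed format: the field names (`ramGb`, `romGb`, `colors`) and the choice of `null` for "not yet known" are illustrative only.

```typescript
// Hypothetical target schema; field names are illustrative assumptions.
type DealStatus = "offered" | "negotiating" | "finalized" | "abandoned";

interface Deal {
  brand: string | null;
  model: string | null;
  variant: { ramGb: number; romGb: number } | null;
  price: number | null; // price as quoted in the chat
  quantity: number | null; // pieces
  colors: Record<string, number>; // color -> pieces; empty if not given
  status: DealStatus;
}

// The well-formatted INFINIX offer from this post, expressed in that schema.
const exampleDeal: Deal = {
  brand: "INFINIX",
  model: "GT30 Pro",
  variant: { ramGb: 12, romGb: 256 },
  price: 27750,
  quantity: 32,
  colors: { White: 24, Black: 8 },
  status: "offered",
};
```

Keeping unknown fields as `null` (instead of omitting them) makes it easy to see which parts of a deal are still missing after each message.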


Example of a well-formatted message

INFINIX

GT30 Pro 12/256 @ 32 Pcs @ 27750

With Kit

White - 24
Black - 8

Fresh With GST

Today/Tomorrow Dispatch

From this, extraction is straightforward.
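For messages in exactly this layout, a regular expression may be enough and no LLM call is needed. The following is a minimal sketch that assumes the brand is on the first non-empty line and the second line follows `<model> <RAM>/<ROM> @ <qty> Pcs @ <price>`; `parseFormattedOffer` and `ParsedOffer` are hypothetical names, and any other layout simply returns `null`.

```typescript
interface ParsedOffer {
  brand: string;
  model: string;
  ramGb: number;
  romGb: number;
  quantity: number;
  price: number;
  colors: Record<string, number>;
}

function parseFormattedOffer(text: string): ParsedOffer | null {
  const lines = text
    .split("\n")
    .map((l) => l.trim())
    .filter((l) => l.length > 0);
  if (lines.length < 2) return null;

  // Assumed layout: "<model> <ram>/<rom> @ <qty> Pcs @ <price>"
  const m = /^(.+?)\s+(\d+)\/(\d+)\s*@\s*(\d+)\s*pcs\s*@\s*(\d+)$/i.exec(lines[1]);
  if (!m) return null;

  // Lines such as "White - 24" are treated as the color split.
  const colors: Record<string, number> = {};
  for (const line of lines.slice(2)) {
    const c = /^([A-Za-z ]+?)\s*-\s*(\d+)$/.exec(line);
    if (c) colors[c[1]] = Number(c[2]);
  }

  return {
    brand: lines[0],
    model: m[1],
    ramGb: Number(m[2]),
    romGb: Number(m[3]),
    quantity: Number(m[4]),
    price: Number(m[5]),
    colors,
  };
}

const sample = [
  "INFINIX",
  "GT30 Pro 12/256 @ 32 Pcs @ 27750",
  "With Kit",
  "White - 24",
  "Black - 8",
  "Fresh With GST",
  "Today/Tomorrow Dispatch",
].join("\n");

const offer = parseFormattedOffer(sample);
```

A parser like this could act as a first pass, with only the messages it rejects being forwarded to the model.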


Real-world challenges

In actual chats, messages are:

  • unstructured and informal

  • full of spelling mistakes and shorthand

  • split across multiple messages

  • dependent on context

Example:

A: edge 60 available
B: price?
A: 21500
B: 21400 final?
A: qty?
B: 100 pcs

Here, a single deal is spread across multiple messages.
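To illustrate what "spread across multiple messages" means for the data model, the sketch below folds per-message partial updates into one deal state. The updates are written by hand to stand in for what an extractor might return; the split into `askPrice` and `counterPrice` assumes A is the seller and B the buyer, which the chat itself does not label.

```typescript
interface DealState {
  model: string | null;
  askPrice: number | null; // assumed: price quoted by the seller (A)
  counterPrice: number | null; // assumed: price proposed by the buyer (B)
  quantity: number | null;
}

type DealUpdate = Partial<DealState>;

// Later non-null values overwrite earlier ones; missing fields keep the old value.
function applyUpdate(state: DealState, update: DealUpdate): DealState {
  return {
    model: update.model ?? state.model,
    askPrice: update.askPrice ?? state.askPrice,
    counterPrice: update.counterPrice ?? state.counterPrice,
    quantity: update.quantity ?? state.quantity,
  };
}

const emptyDeal: DealState = {
  model: null,
  askPrice: null,
  counterPrice: null,
  quantity: null,
};

// Hand-written stand-ins for per-message extraction results.
const updates: DealUpdate[] = [
  { model: "edge 60" }, // A: edge 60 available
  {}, // B: price?
  { askPrice: 21500 }, // A: 21500
  { counterPrice: 21400 }, // B: 21400 final?
  {}, // A: qty?
  { quantity: 100 }, // B: 100 pcs
];

const finalState = updates.reduce(applyUpdate, emptyDeal);
```

Only after the sixth message does the state contain model, both prices, and quantity, which is why extracting from each message in isolation loses information.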


Key challenges

  1. Context across multiple messages

    • Important information (price, quantity, variant) is often split across messages

    • A single message does not contain enough data

  2. Overlapping deals

    • Multiple deals may be discussed in parallel

    • Conversations can switch between products mid-thread

  3. Implicit references

    • Messages like “same as before”, “last price”, “final?” require context tracking

  4. Incomplete or abandoned deals

    • Some deals never conclude explicitly

    • Need to infer status (negotiating vs abandoned)

  5. Role identification

    • No explicit labels for buyer/seller

    • Must infer based on conversation style (who is offering vs negotiating)
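Challenges 2 to 4 can be approached with a small per-chat tracker that keeps one record per product. The sketch below is a naive illustration under several assumptions: an upstream extractor yields an `Update` per message, implicit messages (no product named) attach to the most recently discussed product, and a deal counts as abandoned after a fixed idle time. `DealTracker`, `Update`, and `normalize` are hypothetical names.

```typescript
type Status = "offered" | "negotiating" | "finalized" | "abandoned";

interface OpenDeal {
  key: string; // normalized product name
  price: number | null;
  quantity: number | null;
  status: Status;
  lastTouchedMs: number;
}

// Hypothetical extractor output; productKey is absent for implicit
// messages such as "final?" or "100 pcs".
interface Update {
  productKey?: string;
  price?: number;
  quantity?: number;
  status?: Status;
}

function normalize(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

class DealTracker {
  private deals = new Map<string, OpenDeal>();
  private lastKey: string | null = null;

  apply(update: Update, nowMs: number): OpenDeal | null {
    // Implicit references fall back to the most recently discussed product.
    const explicitKey = update.productKey ? normalize(update.productKey) : null;
    const key = explicitKey ?? this.lastKey;
    if (key === null) return null; // nothing to attach this message to

    const deal: OpenDeal = this.deals.get(key) ?? {
      key,
      price: null,
      quantity: null,
      status: "offered",
      lastTouchedMs: nowMs,
    };
    deal.price = update.price ?? deal.price;
    deal.quantity = update.quantity ?? deal.quantity;
    deal.status = update.status ?? deal.status;
    deal.lastTouchedMs = nowMs;
    this.deals.set(key, deal);
    this.lastKey = key;
    return deal;
  }

  // Marks deals idle for longer than maxIdleMs as abandoned, unless finalized.
  sweep(nowMs: number, maxIdleMs: number): void {
    this.deals.forEach((deal) => {
      if (deal.status !== "finalized" && nowMs - deal.lastTouchedMs > maxIdleMs) {
        deal.status = "abandoned";
      }
    });
  }

  get(productKey: string): OpenDeal | undefined {
    return this.deals.get(normalize(productKey));
  }
}

const tracker = new DealTracker();
tracker.apply({ productKey: "Edge 60", price: 21500 }, 0);
tracker.apply({ productKey: "GT30 Pro", price: 27750 }, 1000);
tracker.apply({ productKey: "edge 60", price: 21400, status: "negotiating" }, 2000);
tracker.apply({ quantity: 100 }, 3000); // implicit: lands on "edge 60"
tracker.sweep(62000, 60000); // only "gt30 pro" has been idle for more than 60 s
```

The "most recent product" fallback is the weakest part: when two deals are interleaved tightly, a message like "final?" is ambiguous, and that case probably still needs the model (or a human) to resolve.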


Current approach

  • Using Baileys (an unofficial WhatsApp Web API library for Node.js) to capture messages

  • Sending each message individually to an LLM (GPT-4o-mini) for extraction


Problems with current approach

  • Each LLM API call is stateless, so the model has no context from earlier messages

  • Accuracy drops significantly for multi-message negotiations

  • Cannot link messages belonging to the same deal

  • Missing data when information is distributed across messages
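Since each call is stateless, a common workaround is to send a compact deal state plus a short sliding window of recent messages instead of one bare message. The sketch below only builds the request payload and makes no network call; `buildExtractionPrompt`, the prompt wording, and the default window of 6 messages are assumptions. The `{ role, content }` shape follows the message format used by chat-completion style APIs.

```typescript
interface ChatMessage {
  sender: string;
  text: string;
}

interface PromptMessage {
  role: "system" | "user";
  content: string;
}

// Illustrative wording only; the real prompt would need tuning.
const SYSTEM_PROMPT =
  "You extract smartphone bulk-deal data from chat messages. " +
  "Given the current deal state (JSON) and the recent messages, " +
  "return only the fields that changed, as JSON.";

function buildExtractionPrompt(
  dealState: unknown,
  history: ChatMessage[],
  incoming: ChatMessage,
  windowSize = 6,
): PromptMessage[] {
  // Keep only the last `windowSize` messages; slice(-0) would keep everything.
  const recent = windowSize > 0 ? history.slice(-windowSize) : [];
  const transcript = recent
    .concat([incoming])
    .map((m) => `${m.sender}: ${m.text}`)
    .join("\n");

  return [
    { role: "system", content: SYSTEM_PROMPT },
    {
      role: "user",
      content:
        `Current deal state:\n${JSON.stringify(dealState)}\n\n` +
        `Recent messages (the last one is new):\n${transcript}`,
    },
  ];
}
```

With this shape, the token cost per call stays roughly constant regardless of chat length, because older messages survive only through the summarized state.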


What I am trying to solve

  1. How to maintain context efficiently across messages without sending the full chat every time?

  2. How to handle overlapping deals within the same conversation?

  3. What is the best way to structure prompts for incremental extraction (state updates instead of full reprocessing)?

  4. Strategies to reduce API cost while preserving accuracy?

  5. Any existing architectures or patterns for this type of conversational information extraction?
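On the cost question (point 4), two inexpensive ideas are to batch messages that arrive close together into one call and to skip batches that contain no deal-related signal. The sketch below illustrates both under stated assumptions: `groupIntoBursts`, `worthSending`, and the keyword list in `DEAL_HINT` are hypothetical, and the gap threshold would have to be tuned against real chats.

```typescript
interface TimedMessage {
  sender: string;
  text: string;
  timestampMs: number;
}

// Groups chronologically ordered messages into bursts: a new burst starts
// whenever the gap to the previous message exceeds maxGapMs.
function groupIntoBursts(messages: TimedMessage[], maxGapMs: number): TimedMessage[][] {
  const bursts: TimedMessage[][] = [];
  for (const msg of messages) {
    const current = bursts[bursts.length - 1];
    const last = current ? current[current.length - 1] : undefined;
    if (current && last && msg.timestampMs - last.timestampMs <= maxGapMs) {
      current.push(msg);
    } else {
      bursts.push([msg]);
    }
  }
  return bursts;
}

// Cheap pre-filter: only bursts containing a digit or a deal keyword are
// forwarded to the LLM. The keyword list is an illustrative guess.
const DEAL_HINT = /\d|price|qty|pcs|final|available|rate/i;

function worthSending(burst: TimedMessage[]): boolean {
  return burst.some((m) => DEAL_HINT.test(m.text));
}
```

Batching trades latency for cost: a burst can only be sent once the gap has elapsed, so the threshold also bounds how "near real-time" the extraction can be.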


Constraints

  • Chats can be long and noisy

  • High volume of messages (cost matters)

  • Need near real-time processing


Looking for

  • Architecture suggestions

  • Prompt design strategies

  • Context management techniques (sliding window, memory, etc.)

  • Any similar implementations or research references

node.js nlp architecture openai-api chatgpt-api