Problem Description
I am building a system to analyze WhatsApp chats between two parties (typically a seller/distributor and a buyer/dealer) who negotiate bulk deals for smartphones.
The goal is to automatically extract structured data for each deal, such as:
brand
model
variant (RAM/ROM)
price
quantity
color distribution (if available)
deal status (offered / negotiating / finalized / abandoned)
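A minimal sketch of this target record in Python, assuming one object per deal. The class and field names (`Deal`, `DealStatus`, `ram_gb`, `rom_gb`, `missing_fields`) are hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class DealStatus(str, Enum):
    OFFERED = "offered"
    NEGOTIATING = "negotiating"
    FINALIZED = "finalized"
    ABANDONED = "abandoned"


@dataclass
class Deal:
    # Every field is optional because chats reveal them gradually.
    brand: Optional[str] = None
    model: Optional[str] = None
    ram_gb: Optional[int] = None
    rom_gb: Optional[int] = None
    price: Optional[int] = None
    quantity: Optional[int] = None
    colors: Dict[str, int] = field(default_factory=dict)  # color -> units
    status: DealStatus = DealStatus.OFFERED

    def missing_fields(self) -> List[str]:
        """Names of the core fields that have not been seen yet."""
        required = ("brand", "model", "price", "quantity")
        return [name for name in required if getattr(self, name) is None]
```

Keeping every field optional lets a partially known deal exist while the negotiation is still in progress.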
Example of a well-formatted message
INFINIX
GT30 Pro 12/256 @ 32 Pcs @ 27750
With Kit
White - 24
Black - 8
Fresh With GST
Today/Tomorrow Dispatch
From this, extraction is straightforward.
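For a message in exactly this layout, a rule-based parser may be enough and would avoid an LLM call. The sketch below assumes the first line is the brand, the deal line follows the pattern `model RAM/ROM @ qty Pcs @ price`, and color lines look like `Color - count`. The function name `parse_structured_offer` and the regex patterns are illustrative, not a tested production parser:

```python
import re

# "GT30 Pro 12/256 @ 32 Pcs @ 27750"
DEAL_LINE = re.compile(
    r"^(?P<model>.+?)\s+(?P<ram>\d+)/(?P<rom>\d+)"
    r"\s*@\s*(?P<qty>\d+)\s*pcs\s*@\s*(?P<price>\d+)$",
    re.IGNORECASE,
)
# "White - 24"
COLOR_LINE = re.compile(r"^(?P<color>[A-Za-z ]+?)\s*-\s*(?P<count>\d+)$")

SAMPLE = """INFINIX
GT30 Pro 12/256 @ 32 Pcs @ 27750
With Kit
White - 24
Black - 8
Fresh With GST
Today/Tomorrow Dispatch"""


def parse_structured_offer(text: str) -> dict:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line is the brand.
    offer = {"brand": lines[0] if lines else None, "colors": {}, "notes": []}
    for line in lines[1:]:
        deal = DEAL_LINE.match(line)
        color = COLOR_LINE.match(line)
        if deal:
            offer.update(
                model=deal["model"],
                ram_gb=int(deal["ram"]),
                rom_gb=int(deal["rom"]),
                quantity=int(deal["qty"]),
                price=int(deal["price"]),
            )
        elif color:
            offer["colors"][color["color"]] = int(color["count"])
        else:
            # Free-text terms such as "With Kit" are kept as notes.
            offer["notes"].append(line)
    return offer


offer = parse_structured_offer(SAMPLE)
```

A cheap sanity check is that the color counts add up to the stated quantity (24 + 8 = 32 here); messages that fail the pattern or the check could fall back to the LLM.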
Real-world challenges
In actual chats, messages are:
unstructured and informal
full of spelling mistakes and shorthand
split across multiple messages
dependent on context
Example:
A: edge 60 available
B: price?
A: 21500
B: 21400 final?
A: qty?
B: 100 pcs
Here, a single deal is spread across multiple messages.
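To make the target behavior concrete, the sketch below folds per-message partial extractions into one running deal state. The `updates` list is hand-written for the six messages above and stands in for whatever an extractor would return; `merge_update` and the `price_history` key are hypothetical names:

```python
from typing import Any, Dict, List


def merge_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new state with the non-null fields of `update` applied."""
    merged = dict(state)
    for key, value in update.items():
        if value is None:
            continue
        if key == "price":
            # Keep every quoted price so the negotiation trail is not lost.
            merged["price_history"] = list(merged.get("price_history", [])) + [value]
        merged[key] = value
    return merged


# Hand-written partial extractions, one per message in the example chat.
updates: List[Dict[str, Any]] = [
    {"model": "edge 60", "status": "offered"},    # A: edge 60 available
    {},                                           # B: price?
    {"price": 21500},                             # A: 21500
    {"price": 21400, "status": "negotiating"},    # B: 21400 final?
    {},                                           # A: qty?
    {"quantity": 100},                            # B: 100 pcs
]

state: Dict[str, Any] = {}
for update in updates:
    state = merge_update(state, update)
```

After the loop, `state` holds the model, the latest price (21400), both quoted prices in order, the quantity, and a status of "negotiating", which is the single-deal view that per-message extraction cannot produce on its own.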
Key challenges
Context across multiple messages
Important information (price, quantity, variant) is often split across messages
A single message does not contain enough data
Overlapping deals
Multiple deals may be discussed in parallel
Conversations can switch between products mid-thread
Implicit references
Messages like “same as before”, “last price”, “final?” require context tracking
Incomplete or abandoned deals
Some deals never conclude explicitly
Need to infer status (negotiating vs abandoned)
Role identification
No explicit labels for buyer/seller
Must infer based on conversation style (who is offering vs negotiating)
Current approach
Capturing messages with Baileys (an unofficial WhatsApp Web library)
Sending each message individually to an LLM (GPT-4o-mini) for extraction
Problems with current approach
Each LLM API call is stateless, so the model sees no context from earlier messages
Accuracy drops significantly for multi-message negotiations
Cannot link messages belonging to the same deal
Missing data when information is distributed across messages
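One possible way around the stateless calls, sketched under assumptions: keep a compact deal state and a short sliding window of recent messages per chat, and send only those with each new message instead of the full history. The `ChatContext` class, the window size, and the prompt wording are all hypothetical, and the actual LLM call and state update are left out:

```python
import json
from collections import deque
from typing import Deque, Dict, List, Tuple


class ChatContext:
    """Per-chat memory: compact deal state plus a window of recent messages."""

    def __init__(self, window_size: int = 6) -> None:
        # deque(maxlen=...) silently drops the oldest message when full.
        self.recent: Deque[Tuple[str, str]] = deque(maxlen=window_size)
        # Would be updated from the model's replies (not shown here).
        self.deals: List[Dict] = []

    def build_prompt(self, sender: str, text: str) -> str:
        history = "\n".join(f"{s}: {t}" for s, t in self.recent)
        prompt = (
            "Known deals so far (JSON):\n"
            f"{json.dumps(self.deals)}\n\n"
            "Recent messages:\n"
            f"{history}\n\n"
            "New message:\n"
            f"{sender}: {text}\n\n"
            "Return only the fields of the deal state that change."
        )
        # Record the message after building the prompt so it is not duplicated.
        self.recent.append((sender, text))
        return prompt


ctx = ChatContext(window_size=2)
for sender, text in [("A", "edge 60 available"), ("B", "price?"), ("A", "21500")]:
    ctx.build_prompt(sender, text)
prompt = ctx.build_prompt("B", "21400 final?")
```

With a window of two, the final prompt contains only "B: price?" and "A: 21500" as history, so the prompt size stays bounded; anything older has to survive through the deal state rather than the raw messages.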
What I am trying to solve
How can I maintain context efficiently across messages without sending the full chat every time?
How should I handle overlapping deals within the same conversation?
What is the best way to structure prompts for incremental extraction (state updates instead of full reprocessing)?
What strategies reduce API cost while preserving accuracy?
Are there existing architectures or patterns for this type of conversational information extraction?
Constraints
Chats can be long and noisy
High volume of messages (cost matters)
Need near real-time processing
Looking for
Architecture suggestions
Prompt design strategies
Context management techniques (sliding window, memory, etc.)
Any similar implementations or research references