A UK-based marketplace experienced major inefficiencies in manual product categorization. Over 20 % of listings were misclassified, degrading search relevance and recommendation quality. They needed a scalable solution to interpret vendor product titles, descriptions, and attributes regardless of language or formatting.
E-Commerce Product Categorization Agent (Under NDA)
How we started
Partners since 2023
Services Delivered
LLM agent for taxonomy classification, metadata normalization, active-learning feedback loop
Team Composition
1 Tech Lead, 2 ML Engineers, 1 DevOps Engineer, 1 Clinical Advisor
Technology Stack
Why They Chose Us
The client evaluated several off-the-shelf classifiers but found that they failed on multilingual listings and long-tail categories. We proposed a hybrid architecture: rule-based pre-filter + LLM semantic reasoning + active retraining, all within their AWS stack.
Insights
Common Issues We Identified
01 Heterogeneous Input:
Vendor CSV uploads with inconsistent field names and noisy HTML.
02 Low Recall:
Rare categories underrepresented in training data.
03 Cold-Start Problem:
New sellers introduced unseen product types daily.
04 Latency Constraints:
Real-time publishing required classification in under 1 s per item.
Solutions
What We Did
1. Pre-processing and Schema Alignment
- Built normalization scripts to clean and standardize vendor data.
- Extracted features (title, attributes, materials) into a unified schema.
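A minimal sketch of this step, assuming vendor rows arrive as dictionaries read from CSV. The alias table, the field names beyond title/attributes/materials, and the helper names `clean_text` and `normalize_row` are hypothetical and only illustrate the idea; the production scripts are not public.

```python
import html
import re

# Hypothetical alias table: vendor column names -> unified schema fields.
FIELD_ALIASES = {
    "title": "title",
    "name": "title",
    "product_name": "title",
    "attrs": "attributes",
    "attributes": "attributes",
    "material": "materials",
    "materials": "materials",
}

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")


def clean_text(value: str) -> str:
    """Remove HTML tags, decode entities, and collapse whitespace."""
    without_tags = TAG_RE.sub(" ", value)
    decoded = html.unescape(without_tags)
    return SPACE_RE.sub(" ", decoded).strip()


def normalize_row(row: dict) -> dict:
    """Map one vendor CSV row onto the unified schema."""
    unified = {"title": "", "attributes": "", "materials": ""}
    for raw_key, raw_value in row.items():
        # Normalize the column name before looking it up in the alias table.
        key = raw_key.strip().lower().replace(" ", "_")
        field = FIELD_ALIASES.get(key)
        if field and raw_value:
            unified[field] = clean_text(str(raw_value))
    return unified


row = {"Product Name": "<b>Oak &amp; Steel Table</b>", "Material": " solid  oak "}
print(normalize_row(row))
```

Columns that have no alias are dropped silently here; a real pipeline would more likely log them so that new vendor formats can be added to the table.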
2. Semantic Categorization Engine
- Designed a hybrid pipeline: BM25 keyword filtering → embedding similarity search → LLM reasoning for disambiguation.
- Used vectorized category descriptions to anchor model predictions to the known taxonomy.
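The three stages can be sketched as follows, under stated assumptions. The BM25 scoring is a small hand-written Okapi variant, `toy_embed` is a character-trigram stand-in for a real embedding model, and `llm_disambiguate` is a hypothetical callable supplied by the caller. The `top_k` and `margin` values are illustrative, not the project's settings.

```python
import math
from collections import Counter
from typing import Callable


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_scores(query: str, docs: dict[str, str],
                k1: float = 1.5, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of every category description against the query."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for t in tokenized.values() if term in t)
            if df == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores[name] = score
    return scores


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: character-trigram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(title: str, taxonomy: dict[str, str],
             llm_disambiguate: Callable[[str, list[str]], str],
             top_k: int = 3, margin: float = 0.05) -> str:
    # Stage 1: keyword filter keeps the top_k categories by BM25 score.
    lexical = bm25_scores(title, taxonomy)
    candidates = sorted(lexical, key=lexical.get, reverse=True)[:top_k]
    # Stage 2: rank the surviving candidates by embedding similarity.
    query_vec = toy_embed(title)
    sims = {name: cosine(query_vec, toy_embed(taxonomy[name])) for name in candidates}
    ranked = sorted(candidates, key=sims.get, reverse=True)
    best = ranked[0]
    if len(ranked) == 1 or sims[best] - sims[ranked[1]] >= margin:
        return best
    # Stage 3: ask the LLM only when the top candidates are too close to call,
    # and accept its answer only if it names one of the known candidates.
    choice = llm_disambiguate(title, ranked)
    return choice if choice in ranked else best
```

Restricting the LLM to a short candidate list and rejecting answers outside it is one way to keep predictions anchored to the existing taxonomy; it also means the slowest stage runs only for ambiguous items.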
3. Active-Learning Loop
- Developed an annotation interface for category managers.
- Corrected misclassifications were used to retrain lightweight adapters automatically each week.
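One possible shape for this loop, as a sketch only. The `Review` record, the `FeedbackStore` class, and the `retrain_adapter` hook are hypothetical names, and the seven-day window simply mirrors the weekly cadence described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class Review:
    """One decision recorded through the annotation interface."""
    listing_id: str
    text: str
    predicted: str
    corrected: str
    reviewed_at: datetime


@dataclass
class FeedbackStore:
    reviews: list[Review] = field(default_factory=list)

    def add(self, review: Review) -> None:
        self.reviews.append(review)

    def weekly_batch(self, now: datetime) -> list[tuple[str, str]]:
        """Return (text, corrected_label) pairs for misclassifications from the last 7 days."""
        cutoff = now - timedelta(days=7)
        return [
            (r.text, r.corrected)
            for r in self.reviews
            if r.reviewed_at >= cutoff and r.predicted != r.corrected
        ]


def run_weekly_retrain(store: FeedbackStore,
                       retrain_adapter: Callable[[list[tuple[str, str]]], None],
                       now: datetime,
                       min_examples: int = 1) -> int:
    """Retrain on this week's corrections; return how many examples were used."""
    batch = store.weekly_batch(now)
    if len(batch) < min_examples:
        return 0  # Too few corrections to justify a retraining run.
    retrain_adapter(batch)
    return len(batch)
```

In practice `retrain_adapter` would wrap whatever fine-tuning job updates the adapters, and `run_weekly_retrain` would be triggered by a scheduler.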
4. Scalability & Deployment
- Deployed inference functions on AWS Lambda with asynchronous batch processing.
- Caching common embeddings reduced per-item latency from 1.8 s to 0.6 s.
Impact and Results
01/
Classification accuracy ↑ from 71 % → 92 %.
02/
Manual review workload ↓ by 75 %.
03/
CTR on search results ↑ by 18 %.
04/
System now handles > 50 000 new listings daily with minimal human oversight.