E-Commerce
Product Categorization Agent
(Under NDA)

An AI agent that combines keyword filtering, embedding search, and LLM reasoning to classify vendor product listings into a marketplace taxonomy, regardless of language or formatting.
Industry
E-Commerce
Partnership
Since 2024
Team size
6 Engineers
How We Started

A UK-based marketplace experienced major inefficiencies in manual product categorization. Over 20 % of listings were misclassified, degrading search relevance and recommendation quality. They needed a scalable solution to interpret vendor product titles, descriptions, and attributes regardless of language or formatting.

Partners since 2023

Why They Chose Us

The client evaluated several off-the-shelf classifiers but found that they failed on multilingual, long-tail categories. We proposed a hybrid architecture that combines a rule-based pre-filter, LLM semantic reasoning, and active retraining, all within their AWS stack.

Insights

Common Issues We Identified

01 Heterogeneous Input:
Vendor CSV uploads with inconsistent field names and noisy HTML.

02 Low Recall:
Rare categories underrepresented in training data.

03 Cold-Start Problem:
New sellers introduced unseen product types daily.

04 Latency Constraints:
Each item had to be classified in under 1 s to support real-time publishing.

Solutions

What We Did

1. Pre-processing and Schema Alignment

  • Built normalization scripts to clean
    and standardize vendor data.
  • Extracted features (title, attributes, materials)
    into a unified schema.
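
Below is a minimal sketch of the normalization step. It assumes vendor rows arrive as dictionaries, for example from Python's `csv.DictReader`. The alias table `FIELD_ALIASES`, the `Listing` schema, and `normalize_row` are hypothetical illustrations, not the project's actual code, because the real schema and field mappings are not described here.

```python
import html
import re
from dataclasses import dataclass, field

# Hypothetical mapping from vendor-specific column names to the unified schema.
FIELD_ALIASES = {
    "title": {"title", "product_title", "name", "item_name"},
    "description": {"description", "desc", "long_description"},
    "materials": {"material", "materials", "fabric"},
}

TAG_RE = re.compile(r"<[^>]+>")
SPACE_RE = re.compile(r"\s+")  # also matches Unicode spaces such as U+00A0


def clean_text(value: str) -> str:
    """Strip HTML tags, decode entities, and collapse whitespace."""
    no_tags = TAG_RE.sub(" ", value)
    return SPACE_RE.sub(" ", html.unescape(no_tags)).strip()


@dataclass
class Listing:
    title: str = ""
    description: str = ""
    materials: str = ""
    attributes: dict = field(default_factory=dict)


def normalize_row(row: dict) -> Listing:
    """Map one vendor CSV row onto the unified schema."""
    listing = Listing()
    for raw_key, raw_value in row.items():
        key = raw_key.strip().lower().replace(" ", "_")
        value = clean_text(raw_value or "")
        for canonical, aliases in FIELD_ALIASES.items():
            if key in aliases:
                setattr(listing, canonical, value)
                break
        else:
            # Columns with no known alias are kept as free-form attributes.
            if value:
                listing.attributes[key] = value
    return listing
```

Unmapped columns, such as a vendor's `Colour` field, end up in `attributes`. They stay available as features without forcing every vendor into the same column set.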

2. Semantic Categorization Engine

  • Designed a hybrid pipeline: BM25 keyword filtering → embedding
    similarity search → LLM reasoning for disambiguation.
  • Used vectorized category descriptions to anchor model
    predictions to known taxonomy.
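
The three-stage pipeline can be sketched as follows. The sketch assumes that category descriptions are short text strings and that the embedding model and the LLM are supplied as plain callables. `Categorizer`, `embed`, and `disambiguate` are hypothetical names. The BM25 scorer uses the standard Okapi formula with a Lucene-style IDF, which may differ from the variant used in production.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BM25:
    """Okapi BM25 over category descriptions (Lucene-style IDF keeps scores >= 0)."""

    def __init__(self, docs: Sequence[str], k1: float = 1.5, b: float = 0.75):
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        self.k1, self.b = k1, b
        n = len(self.docs)
        df = Counter(term for c in self.docs for term in c)  # document frequency
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl, s = self.docs[i], self.lengths[i], 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[term] * f * (self.k1 + 1) / norm
        return s


class Categorizer:
    def __init__(self, categories: dict[str, str],
                 embed: Callable[[str], Vector],
                 disambiguate: Callable[[str, list[str]], str]):
        self.names = list(categories)
        self.bm25 = BM25([categories[n] for n in self.names])
        self.embed, self.disambiguate = embed, disambiguate
        # Vectorized category descriptions anchor predictions to the taxonomy.
        self.anchors = {n: embed(categories[n]) for n in self.names}

    def predict(self, text: str, top_k: int = 10, top_n: int = 3) -> str:
        # Stage 1: cheap BM25 keyword filter over the whole taxonomy.
        order = sorted(range(len(self.names)),
                       key=lambda i: self.bm25.score(text, i), reverse=True)
        candidates = [self.names[i] for i in order[:top_k]]
        # Stage 2: rerank the survivors by embedding similarity.
        q = self.embed(text)
        shortlist = sorted(candidates, key=lambda n: cosine(q, self.anchors[n]),
                           reverse=True)[:top_n]
        # Stage 3: the LLM picks from the shortlist; out-of-taxonomy answers are rejected.
        choice = self.disambiguate(text, shortlist)
        return choice if choice in shortlist else shortlist[0]
```

The LLM is restricted to the shortlisted candidates, and the code falls back to the top embedding match whenever it answers outside that list. This is one way to keep its output anchored to the known taxonomy.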

3. Active-Learning Loop

  • Developed an annotation interface
    for category managers.
  • Corrections of misclassified items were automatically used
    to retrain lightweight adapters weekly.
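
Here is a minimal sketch of the feedback loop, assuming each prediction carries a confidence score. Low-confidence items are routed to reviewers using least-confidence sampling, a common active-learning strategy assumed here for illustration. `ActiveLearningLoop`, `review_threshold`, and the seven-day window are hypothetical. The adapter fine-tuning job that would consume `due_batch()` is not shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Prediction:
    item_id: str
    category: str
    confidence: float  # model score in [0, 1]


@dataclass
class ActiveLearningLoop:
    review_threshold: float = 0.7  # hypothetical cut-off for human review
    retrain_every: timedelta = timedelta(days=7)
    last_retrain: datetime = field(default_factory=datetime.now)
    review_queue: list = field(default_factory=list)
    corrections: list = field(default_factory=list)

    def route(self, pred: Prediction) -> bool:
        """Queue low-confidence predictions for category managers."""
        if pred.confidence < self.review_threshold:
            self.review_queue.append(pred)
            return True
        return False

    def record_review(self, pred: Prediction, corrected: str) -> None:
        """Store a reviewer's label only when it differs from the prediction."""
        if corrected != pred.category:
            self.corrections.append((pred.item_id, corrected))

    def due_batch(self, now: datetime) -> list:
        """Hand over accumulated corrections once the weekly window has elapsed."""
        if now - self.last_retrain < self.retrain_every or not self.corrections:
            return []
        batch, self.corrections = self.corrections, []
        self.last_retrain = now
        return batch
```

Collecting corrections into a weekly job, rather than retraining on every fix, keeps adapter training cost predictable.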

4. Scalability & Deployment

  • Deployed inference functions on AWS Lambda
    with asynchronous batch processing.
  • Caching common embeddings reduced per-item
    latency from 1.8 s → 0.6 s.
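
Embedding caching can be sketched with a module-level `functools.lru_cache`. The cache persists across invocations whenever Lambda reuses a warm execution environment. Several parts of this sketch are assumptions, since the source does not say which event source drives the batch processing:

- the SQS-style `event["Records"][i]["body"]` payload;
- the `id` and `title` fields;
- `_embed_uncached`, a deterministic placeholder for the real model call.

```python
import hashlib
import json
from functools import lru_cache


def _embed_uncached(text: str) -> tuple[float, ...]:
    """Hypothetical placeholder for the real (slow) embedding model call."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return tuple(b / 255 for b in digest[:8])


@lru_cache(maxsize=50_000)
def embed(text: str) -> tuple[float, ...]:
    # Module-level cache: survives across invocations in a reused Lambda environment.
    return _embed_uncached(text)


def handler(event, context):
    """Lambda entry point that processes a batch of queued listings."""
    ids = []
    for record in event.get("Records", []):
        item = json.loads(record["body"])
        # Normalize the cache key so trivially different titles share one entry.
        vector = embed(item["title"].strip().lower())
        # Downstream classification using `vector` is not shown here.
        ids.append(item["id"])
    return {"ids": ids, "cache_hits": embed.cache_info().hits}
```

Returning tuples rather than lists keeps cached vectors immutable, so one caller cannot corrupt the cache for the next. Normalizing the key (here, only trimming and lowercasing) raises the hit rate for near-duplicate titles.
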
Impact and Results

01/
Classification accuracy ↑ from 71 % → 92 %.
02/
Manual review workload ↓ by 75 %.
03/
CTR on search results ↑ by 18 %.
04/
System now handles > 50 000 new listings daily with minimal human oversight.