One of the most persistent problems in construction supply distribution is that different companies name the same product differently. A client sends a PDF with their product list, and our team has to manually cross-reference every item against our internal catalog.

The Problem

A typical client request: a PDF with 50–200 line items, each described in the client's own naming convention. Our catalog might call the same product something completely different.

Manual matching was slow (hours per request), error-prone (wrong matches, missed items), and dependent on experienced employees who knew both naming systems.

The Solution

1. Document Parsing

The PDF is parsed to extract structured product data. The parser handles several layouts: tables, free-form lists, and documents that mix the two.
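As a rough illustration of the normalization step, here is a minimal sketch that turns lines of already-extracted PDF text into structured items. It assumes the text has been pulled out of the PDF by some parsing library beforehand, and that table rows put the quantity in the last column. The names LineItem and parse_line are hypothetical, and real client documents will need more rules than this.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineItem:
    """One product line from a client document (hypothetical structure)."""
    description: str
    quantity: Optional[float] = None


def parse_line(line: str) -> Optional[LineItem]:
    """Turn one line of extracted text into a LineItem, or None if blank."""
    # Drop surrounding whitespace and a leading bullet character, if any.
    text = line.strip().lstrip("•-*").strip()
    if not text:
        return None

    # Table-like rows: columns separated by tabs or runs of 2+ spaces.
    cells = [c.strip() for c in re.split(r"\t+|\s{2,}", text) if c.strip()]

    # Assumption: when the last column is a bare number, it is the quantity.
    if len(cells) >= 2 and re.fullmatch(r"\d+(?:\.\d+)?", cells[-1]):
        return LineItem(" ".join(cells[:-1]), float(cells[-1]))

    # Free-form line: keep the whole text as the description.
    return LineItem(" ".join(cells))
```

Free-form lines keep their full text as the description, so a size such as "12 AWG" inside the name is never mistaken for a quantity.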

2. AI Matching

Each extracted product description is compared against our internal database using an LLM. The model recognizes that "Cable THHN 12 AWG" and "Alambre #12 THHN" are the same product, even though the names share almost no words.
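The matching step can be sketched roughly as follows. This is an illustration under assumptions, not our production code: the prompt wording, the JSON reply format, the SKU identifiers, and the ask_llm callable (a stand-in for whatever LLM client is used) are all hypothetical.

```python
import json
from typing import Callable, Dict, Optional, TypedDict


class MatchResult(TypedDict):
    sku: Optional[str]
    confidence: float


NO_MATCH: MatchResult = {"sku": None, "confidence": 0.0}


def build_prompt(client_description: str, candidates: Dict[str, str]) -> str:
    """Build a prompt listing catalog candidates (hypothetical wording)."""
    lines = [f"- {sku}: {name}" for sku, name in candidates.items()]
    return (
        "Match the client's product description to one catalog entry.\n"
        f"Client description: {client_description}\n"
        "Catalog candidates:\n" + "\n".join(lines) + "\n"
        'Reply with JSON only: {"sku": <catalog id or null>, "confidence": <0 to 1>}'
    )


def match_product(
    client_description: str,
    candidates: Dict[str, str],
    ask_llm: Callable[[str], str],
) -> MatchResult:
    """Ask the model for the best candidate and validate its reply."""
    reply = ask_llm(build_prompt(client_description, candidates))
    try:
        data = json.loads(reply)
        sku = data.get("sku")
        confidence = float(data.get("confidence", 0.0))
    except (ValueError, TypeError, AttributeError):
        # Malformed or non-JSON reply: treat it as no match.
        return dict(NO_MATCH)

    # Reject identifiers the model made up instead of picking a candidate.
    if not isinstance(sku, str) or sku not in candidates:
        return dict(NO_MATCH)

    # Clamp the score into the 0..1 range.
    return {"sku": sku, "confidence": max(0.0, min(1.0, confidence))}
```

Validating the reply against the candidate list matters because a model can return an identifier that does not exist in the catalog; in this sketch such replies fall back to "no match" and would end up in manual review.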

3. Confidence Scoring

Each match gets a confidence score:

  • High confidence (>90%): Auto-matched, ready for review
  • Medium confidence (60–90%): Suggested match, requires human confirmation
  • Low confidence (<60%): Flagged for manual matching
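The three tiers above map naturally onto a small routing function. The sketch below assumes scores are expressed as fractions between 0 and 1 and that a score of exactly 90% falls in the medium tier; the names Triage and triage are illustrative.

```python
from enum import Enum


class Triage(Enum):
    AUTO_MATCHED = "auto-matched, ready for review"
    SUGGESTED = "suggested match, requires human confirmation"
    MANUAL = "flagged for manual matching"


def triage(confidence: float) -> Triage:
    """Route a match by its confidence score (a fraction from 0 to 1)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be between 0 and 1, got {confidence}")
    if confidence > 0.90:       # high: above 90%
        return Triage.AUTO_MATCHED
    if confidence >= 0.60:      # medium: 60% to 90% inclusive
        return Triage.SUGGESTED
    return Triage.MANUAL        # low: below 60%
```

For example, triage(0.95) returns Triage.AUTO_MATCHED, while triage(0.42) returns Triage.MANUAL.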

Technical Stack

  • Python with FastAPI for the backend
  • LLM integration for semantic product matching
  • PostgreSQL for the product database
  • PDF parsing libraries for document extraction

Results

What previously took hours of manual work now takes minutes. The system handles the straightforward matches automatically, and our team only needs to review the edge cases.