How I Built a Smart Ticket Search System Using PyTorch and GloVe

· Source: NLP on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, short

Summary

A developer built a smart ticket search system with PyTorch and GloVe that finds similar customer support tickets in a dataset of 8,469 complaints. Two categorical fields are encoded from scratch: ticket priority with Label Encoding and ticket channel with One-Hot Encoding. A custom TF-IDF component uses a regex tokenizer, a 5,000-word vocabulary, and bigram/trigram generators, and it stores its scores as sparse tensors. Semantic matching relies on 300-dimensional GloVe embeddings. Out-of-vocabulary words receive random normal vectors, and word vectors are combined by TF-IDF-weighted averaging. A hybrid search formula weights TF-IDF scores at 40% and GloVe scores at 60%. Optimized to run on two Kaggle T4 GPUs, the system processes 100 queries in 0.141 seconds (1.41 ms per query on average) and reaches a Precision@5 of 21.10%. An interactive Gradio web app lets users run queries and adjust search parameters.
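Both categorical encodings are described as written from scratch. The following is a minimal plain-Python sketch of what that could look like. The priority order and all category names below are assumptions for illustration, not taken from the article's code.

```python
from typing import Dict, List, Sequence, Tuple


def fit_label_encoder(values: Sequence[str], order: Sequence[str]) -> Dict[str, int]:
    """Map each priority level to an integer rank, following an explicit order."""
    mapping = {label: rank for rank, label in enumerate(order)}
    unknown = set(values) - set(mapping)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)}")
    return mapping


def one_hot(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Return the sorted category names and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    index = {name: i for i, name in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical sample values; the real dataset's labels may differ.
priorities = ["Low", "Critical", "Medium", "High", "Low"]
priority_map = fit_label_encoder(priorities, order=["Low", "Medium", "High", "Critical"])
encoded_priority = [priority_map[p] for p in priorities]  # [0, 3, 1, 2, 0]

channels = ["Email", "Chat", "Phone", "Email"]
channel_names, channel_rows = one_hot(channels)  # ['Chat', 'Email', 'Phone']
```

Label Encoding fits priority because the levels have a natural order. One-Hot Encoding fits channel because no channel ranks above another, so a single integer code would impose a false ordering.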

Key takeaway

For NLP engineers building custom search or recommendation systems, implementing core components such as TF-IDF and the GloVe embedding pipeline from scratch provides deeper algorithmic understanding and fine-grained control. Consider a hybrid approach (e.g., 40% TF-IDF, 60% GloVe) to balance exact keyword matching with semantic understanding, especially when query speed and retrieval accuracy on large datasets are critical.
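The suggested split can be written as score = 0.4 · s_TF-IDF + 0.6 · s_GloVe. The following is a minimal plain-Python sketch of that weighting, followed by a Precision@5 check. The similarity values and relevance labels are invented. The function names are hypothetical, and the summary does not state how the article defines a "relevant" ticket.

```python
from typing import List, Sequence, Set


def hybrid_scores(tfidf_sims: Sequence[float], glove_sims: Sequence[float],
                  alpha: float = 0.4) -> List[float]:
    """alpha * TF-IDF similarity + (1 - alpha) * GloVe similarity, per candidate ticket."""
    if len(tfidf_sims) != len(glove_sims):
        raise ValueError("score lists must cover the same tickets")
    return [alpha * t + (1.0 - alpha) * g for t, g in zip(tfidf_sims, glove_sims)]


def top_k(scores: Sequence[float], k: int = 5) -> List[int]:
    """Indices of the k highest-scoring tickets, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def precision_at_k(retrieved: Sequence[int], relevant: Set[int], k: int = 5) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for i in retrieved[:k] if i in relevant) / k


# Hypothetical similarities of one query against six tickets.
tfidf_sims = [0.10, 0.80, 0.00, 0.30, 0.05, 0.60]
glove_sims = [0.20, 0.70, 0.90, 0.10, 0.40, 0.55]
scores = hybrid_scores(tfidf_sims, glove_sims, alpha=0.4)
ranking = top_k(scores, k=5)  # [1, 5, 2, 4, 3]
p_at_5 = precision_at_k(ranking, relevant={0, 1, 2}, k=5)  # 0.4
```

This sketch assumes both similarity lists are already on comparable scales. Cosine similarity between non-negative TF-IDF vectors lies in [0, 1], whereas GloVe cosine similarity can be negative, so rescaling before mixing may be worth testing. The summary does not say whether the original normalizes the two scores.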

Key insights

Combining TF-IDF with GloVe embeddings creates a robust hybrid search for semantic and keyword matching.
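One place the two signals meet is the ticket vector itself: GloVe word vectors averaged with TF-IDF weights, plus random normal vectors for words GloVe does not cover. The following is a minimal plain-Python sketch under stated assumptions. The OOV standard deviation (0.1), the choice to skip tokens without a TF-IDF weight, and the toy 3-dimensional vectors are all illustrative, not the article's values.

```python
import math
import random
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str], dim: int = 300) -> Dict[str, List[float]]:
    """Parse GloVe's text format: one word followed by `dim` floats per line."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines (e.g. tokens containing spaces)
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def word_vector(word: str, glove: Dict[str, List[float]], oov_cache: Dict[str, List[float]],
                dim: int, rng: random.Random, std: float = 0.1) -> List[float]:
    """GloVe vector if known, otherwise a cached random-normal vector."""
    if word in glove:
        return glove[word]
    if word not in oov_cache:
        # Random normal vector for an OOV word; std=0.1 is an assumption.
        oov_cache[word] = [rng.gauss(0.0, std) for _ in range(dim)]
    return oov_cache[word]


def doc_vector(tokens: List[str], idf: Dict[str, float], glove: Dict[str, List[float]],
               oov_cache: Dict[str, List[float]], dim: int, rng: random.Random) -> List[float]:
    """TF-IDF-weighted average: each occurrence adds idf * vector, so repeats add TF."""
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        w = idf.get(tok)
        if w is None:
            continue  # assumption: tokens outside the TF-IDF vocabulary are ignored
        vec = word_vector(tok, glove, oov_cache, dim, rng)
        for i in range(dim):
            total[i] += w * vec[i]
        weight_sum += w
    return [x / weight_sum for x in total] if weight_sum else total


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy 3-dimensional lines stand in for a real file such as glove.6B.300d.txt.
toy_lines = ["printer 0.9 0.1 0.0", "error 0.7 0.3 0.0", "refund 0.0 0.2 0.9"]
glove = load_glove(toy_lines, dim=3)
idf = {"printer": 1.3, "error": 1.7, "refund": 1.7, "xj200": 1.7}  # hypothetical weights
rng = random.Random(0)
oov_cache: Dict[str, List[float]] = {}

ticket = doc_vector(["printer", "error"], idf, glove, oov_cache, 3, rng)
refund = doc_vector(["refund"], idf, glove, oov_cache, 3, rng)
with_oov = doc_vector(["xj200", "printer"], idf, glove, oov_cache, 3, rng)
```

With a real file, `load_glove` can take an open file handle directly (e.g. `open(path, encoding="utf-8")`) with `dim=300`. Caching each OOV vector keeps a rare word mapped to the same random vector across tickets and queries.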

Principles

Method

Implement custom regex tokenization, build a vocabulary of the top 5,000 words, compute TF-IDF scores, load 300-dimensional GloVe vectors, and combine TF-IDF and GloVe scores with a 0.4:0.6 weighting for hybrid search.
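The TF-IDF steps of the method can be sketched in plain Python as follows. This is illustrative only and rests on several assumptions: a lowercase alphanumeric regex, a vocabulary ranked by corpus frequency whose 5,000-term cap also covers the bigrams and trigrams, and a smoothed IDF. The article stores scores as PyTorch sparse tensors. Here each row is a `{column: weight}` dict, which could later be packed into `torch.sparse_coo_tensor`.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Assumed tokenizer pattern: lowercase runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text.lower())


def with_ngrams(tokens: List[str], n_max: int = 3) -> List[str]:
    """Unigrams plus space-joined bigrams and trigrams."""
    grams = list(tokens)
    for n in range(2, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def build_vocab(docs: List[List[str]], max_size: int = 5000) -> Dict[str, int]:
    """Keep the max_size most frequent terms (ties broken alphabetically)."""
    counts = Counter(term for doc in docs for term in doc)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:max_size]
    return {term: i for i, (term, _) in enumerate(ranked)}


def tfidf_rows(docs: List[List[str]],
               vocab: Dict[str, int]) -> Tuple[List[Dict[int, float]], Dict[str, float]]:
    """Sparse, L2-normalised TF-IDF rows as {column: weight} dicts, plus the IDF table."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc) if term in vocab)
    # Smoothed IDF; the article's exact IDF variant is not stated.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1.0 for term in vocab}
    rows = []
    for doc in docs:
        tf = Counter(term for term in doc if term in vocab)
        row = {vocab[t]: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in row.values())) or 1.0
        rows.append({j: w / norm for j, w in row.items()})
    return rows, idf


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two L2-normalised sparse rows."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(j, 0.0) for j, w in a.items())


corpus = [
    "Printer not working after update",
    "Refund not received for my order",
    "Printer shows error after the latest update",
]
docs = [with_ngrams(tokenize(text)) for text in corpus]
vocab = build_vocab(docs, max_size=5000)
rows, idf = tfidf_rows(docs, vocab)
```

In this toy corpus, the first ticket scores closer to the third (shared "printer", "after", "update") than to the second (shared "not" only). A query is vectorized the same way, with the IDF table kept fixed from the corpus.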

Best for: AI Student, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NLP on Medium.