Building a 7× Faster Vector Search Engine in Go
Summary
An HNSW (Hierarchical Navigable Small World) vector search engine implemented in Go achieved a 7x speedup, reducing build time from 39.5 seconds to 5.7 seconds, and cutting memory allocations by 84%. The optimization process involved six stages, starting with a naive Go translation of the HNSW algorithm. Key improvements included simplifying cosine similarity to a dot product by pre-normalizing vectors, switching from float64 to float32 for vector operations to leverage SIMD, and optimizing memory management through object pooling and a generation-based visited tracking system. While the Go implementation, using OpenBLAS via cgo, was 38% slower than the C++/Python HNSWLib, it demonstrated Go's capability for compute-heavy tasks when rigorously profiled and optimized. The project used the coco-i2i-512d dataset with 10k vectors on an Intel Core Ultra 7-155h.
Key takeaway
For Machine Learning Engineers building or optimizing vector search systems, prioritize profiling to pinpoint actual bottlenecks rather than guessing. Focus first on algorithmic improvements, such as simplifying distance calculations, before moving to micro-optimizations. Additionally, pay close attention to memory allocation patterns in Go, as reducing GC pressure through techniques like object pooling and generation counters can significantly improve performance, even if Go's cgo overhead still presents a challenge against highly optimized C++ libraries.
Key insights
Systematic profiling and algorithmic optimization are crucial for high-performance Go applications.
Principles
- Profile before optimizing.
- Algorithmic changes yield largest gains.
- Memory management impacts Go performance.
Method
Build a naive implementation, profile to identify bottlenecks, apply algorithmic and data type optimizations, then address memory allocation and garbage collection, finally refining with data structure choices like max-heaps.
In practice
- Pre-normalize vectors for dot product similarity.
- Use float32 for vector operations where precision allows.
- Implement object pooling for frequent allocations.
Topics
- HNSW Algorithm
- Go Performance Optimization
- Vector Search Engines
- BLAS Libraries
- Memory Management
Code references
Best for: Software Engineer, Machine Learning Engineer, AI Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Advances - Medium.