Lightweight Code Retrieval Models
MiniLM-based sentence-transformer models (22M/33M parameters) fine-tuned for domain-specific code retrieval, achieving 97% Recall@10 on internal benchmarks
Trained and published lightweight sentence-transformer embedding models (512-dimensional) optimized for code search, enabling fast semantic retrieval over codebases and technical text.
- Fine-tuned a MiniLM-based embedding model for code-to-code and text-to-code retrieval.
- Designed for low-latency similarity search and practical deployment in RAG and developer tooling.
- Packaged and released on Hugging Face for easy integration into embedding pipelines and vector databases (a usage sketch follows this list).
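A minimal sketch of how such a model plugs into a retrieval pipeline via the sentence-transformers library. The model ID and code snippets below are placeholders for illustration, not the released checkpoints:

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder model ID -- substitute the actual Hugging Face checkpoint.
model = SentenceTransformer("your-org/code-retrieval-minilm")

# Corpus of code snippets to search over (illustrative examples).
corpus = [
    "def read_json(path):\n    import json\n    with open(path) as f:\n        return json.load(f)",
    "def fetch_url(url):\n    import requests\n    return requests.get(url).text",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)

# Natural-language query (text-to-code retrieval).
query = "load a JSON file from disk"
query_embedding = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)

# Cosine-similarity search over the corpus; returns the top-k hits per query.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']][:50]}")
```

In production the corpus embeddings would typically live in a vector database rather than in memory, but the encode-then-search flow is the same.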
Results: 97% Recall@10 and 95% MRR@10 on internal benchmarks.
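For reference, a hedged sketch of how Recall@10 and MRR@10 are conventionally computed over ranked retrieval results; the data here is illustrative, not the internal benchmark:

```python
# `ranked_ids` maps each query to retrieved corpus IDs in rank order;
# `relevant_ids` maps each query to its set of ground-truth relevant IDs.

def recall_at_k(ranked_ids, relevant_ids, k=10):
    # Fraction of queries with at least one relevant document in the top k.
    hits = sum(
        1 for q, ranking in ranked_ids.items()
        if relevant_ids[q] & set(ranking[:k])
    )
    return hits / len(ranked_ids)

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Mean reciprocal rank of the first relevant document within the top k.
    total = 0.0
    for q, ranking in ranked_ids.items():
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant_ids[q]:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)

ranked = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
relevant = {"q1": {"b"}, "q2": {"x"}}
print(recall_at_k(ranked, relevant))  # 0.5: q1 hit at rank 2, q2 missed
print(mrr_at_k(ranked, relevant))     # 0.25: (1/2 + 0) / 2
```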
Models (public on Hugging Face):