Nomic Embed - Open Source Embedding

Open Source Embedding Model · AI Processing & RAG

Basic Information

Product Description

Nomic Embed is a fully open-source embedding model series developed by Nomic AI. Nomic Embed Text v1 was the first fully reproducible open-source embedding model and the first open-source model to surpass OpenAI's Ada-002 and text-embedding-3-small on the MTEB benchmark, while offering an 8192-token context length. The v2 version is the first general-purpose text embedding model to adopt a Mixture of Experts (MoE) architecture, trained on 1.6 billion contrastive learning pairs across roughly 100 languages.

Core Features/Characteristics

  • Nomic Embed Text v2 (latest):
      • First text embedding model with a Mixture of Experts (MoE) architecture
      • 8 experts with top-2 routing; activates only 305M of its 475M parameters during inference
      • Trained on 1.6 billion contrastive learning pairs spanning ~100 languages
      • Strong results on the BEIR and MIRACL benchmarks, competitive with models twice its size
      • Matryoshka representation learning: 768-dimensional embeddings can be truncated to 256 dimensions while maintaining embedding quality
      • GGUF format support
  • Nomic Embed Text v1:
      • First fully reproducible open-source embedding model
      • 8192-token context length
      • Surpasses OpenAI's Ada-002 and text-embedding-3-small on MTEB
  • Fully open source: pre-training data, fine-tuning data, training code, and model weights are all released
  • Ollama support: easy local deployment via Ollama
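The Matryoshka truncation described above amounts to keeping the first k components of an embedding and re-normalizing. A minimal NumPy sketch of that idea (the 768-dimensional vector here is random placeholder data standing in for real Nomic Embed output, not something the model produced):

```python
import numpy as np

def truncate_embedding(embedding: np.ndarray, dims: int = 256) -> np.ndarray:
    """Matryoshka-style truncation: keep the first `dims` components,
    then L2-normalize so cosine similarity remains meaningful."""
    truncated = embedding[:dims]
    norm = np.linalg.norm(truncated)
    return truncated / norm if norm > 0 else truncated

# Placeholder 768-dim unit vector in place of a real Nomic Embed output.
rng = np.random.default_rng(0)
full = rng.standard_normal(768)
full /= np.linalg.norm(full)

small = truncate_embedding(full, 256)
print(small.shape)  # (256,)
```

Because the model is trained so that leading dimensions carry the most information, the truncated vector can be stored and compared in place of the full one, trading a small amount of quality for a 3x reduction in storage.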

Business Model

  • Fully Open Source and Free: Apache-2.0 license
  • All Resources Open: Training data, code, and weights are publicly available
  • Nomic Atlas: Nomic AI's data visualization and exploration platform (commercial product)

Target Users

  • Researchers seeking full transparency and reproducibility
  • Developers needing locally deployable embedding models
  • Enterprises with strict open-source license requirements
  • Application developers requiring efficient multilingual embeddings
  • Resource-constrained deployment scenarios (MoE architecture efficiency)

Competitive Advantages

  • Industry's most transparent open-source embedding model (data + code + weights fully open)
  • First embedding model with an MoE architecture, yielding high inference efficiency
  • Matryoshka representation learning supports flexible embedding dimensions
  • Surpasses OpenAI's commercial models on the short-context MTEB benchmark
  • Easy local operation via Ollama
  • Apache-2.0 license, business-friendly
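Local operation via Ollama can be sketched as follows, assuming a default Ollama server on port 11434 with the model pulled via `ollama pull nomic-embed-text`. The endpoint and field names follow Ollama's embeddings API; the `embed` call only works when that server is actually running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default Ollama endpoint

def build_request(text: str, model: str = "nomic-embed-text") -> dict:
    """Payload for Ollama's embeddings API."""
    return {"model": model, "prompt": text}

def embed(text: str) -> list:
    """POST to a locally running Ollama server and return the embedding vector.
    Requires `ollama pull nomic-embed-text` and a running server."""
    data = json.dumps(build_request(text)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]

if __name__ == "__main__":
    # Prints the embedding dimensionality when a local server is available.
    print(len(embed("Nomic Embed runs locally via Ollama")))
```

No API key or network egress is needed; everything runs on the local machine, which is what makes the Apache-2.0 licensing and local deployment combination attractive for enterprises.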

Relationship with OpenClaw Ecosystem

Nomic Embed's fully open-source nature aligns closely with OpenClaw's open-source philosophy. The efficient inference of the MoE architecture makes it suitable for running on personal devices, supporting OpenClaw's local deployment scenarios. Matryoshka representation learning allows OpenClaw to flexibly adjust embedding dimensions based on storage and computational resources. The ease of deployment via Ollama also lowers the barrier to entry for OpenClaw users.

External References

Learn more from these authoritative sources: