Nomic Embed - Open Source Embeddings
Basic Information
- Company/Brand: Nomic AI
- Country/Region: USA (New York)
- Official Website: https://www.nomic.ai
- Hugging Face: https://huggingface.co/nomic-ai
- Type: Fully Open Source Embedding Model
- Latest Version: nomic-embed-text-v2-moe
- Open Source License: Apache License 2.0
Product Description
Nomic Embed is a series of fully open-source embedding models from Nomic AI, built around the principle of full reproducibility: model weights, training data, and training code are all released so that research results can be independently verified. The latest model, Nomic Embed Text V2, is the first general-purpose text embedding model to use a Mixture of Experts (MoE) architecture, trained on 1.6 billion contrastive pairs spanning approximately 100 languages.
Core Features/Characteristics
Nomic Embed Text V2 (MoE)
- MoE Architecture: First MoE general-purpose text embedding model
- Efficient Inference: Alternating MoE layers with top-2 routing over 8 experts, activating only 305M of its 475M parameters at inference time
- Multilingual: Supports approximately 100 languages
- Matryoshka Representation: Embeddings can be truncated from 768 down to 256 dimensions
- Strong Performance: Excels on BEIR and MIRACL benchmarks, comparable to models with twice the parameters
- Fully Open Source: Weights, training data, and training code are all publicly available
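The features above can be sketched in code. This is a minimal usage sketch, not an official snippet: it assumes the Hugging Face model id `nomic-ai/nomic-embed-text-v2-moe`, that the model loads via `sentence-transformers` with `trust_remote_code=True`, and that the `search_document:` / `search_query:` task prefixes documented for Nomic's v1.x models also apply to V2 (verify against the model card before use).

```python
import numpy as np

def with_prefix(texts, task="search_document"):
    """Prepend the task prefix Nomic Embed inputs are assumed to expect."""
    return [f"{task}: {t}" for t in texts]

def embed(texts, task="search_document", dim=768):
    # Heavy import kept local so with_prefix() is importable without the model.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer(
        "nomic-ai/nomic-embed-text-v2-moe",  # assumed HF model id
        trust_remote_code=True,
    )
    emb = model.encode(with_prefix(texts, task))
    emb = emb[:, :dim]  # Matryoshka truncation: 768 -> e.g. 256
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)
```

Queries and documents get different prefixes (e.g. `embed(docs)` vs. `embed(queries, task="search_query")`), which is the usual pattern for retrieval-tuned embedding models.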
Nomic Embed Text V1
- First Fully Reproducible: The first fully reproducible open-source English embedding model with an 8192 context length
- Surpasses OpenAI: Outperformed OpenAI's text-embedding-ada-002 and text-embedding-3-small at the time of release
- 8192 Context Length: Supports long documents
Technical Specifications
| Feature | V2 (MoE) | V1 |
|---|---|---|
| Parameters | 475M (305M activated) | 137M |
| Max Dimensions | 768 | 768 |
| Matryoshka Dimensions | 256 | 256 |
| Context Length | 8192 | 8192 |
| Language Support | ~100 languages | English |
| Architecture | MoE | Dense |
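The Matryoshka dimensions in the table mean the leading coordinates of an embedding form a usable lower-dimensional embedding on their own. A minimal sketch of the truncate-and-renormalize step, using random unit vectors in place of real model output:

```python
import numpy as np

# Stand-in for real model output: two unit-norm 768-d embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

def truncate(emb, dim=256):
    """Keep the leading `dim` coordinates and re-normalize to unit length."""
    cut = emb[:, :dim]
    return cut / np.linalg.norm(cut, axis=1, keepdims=True)

small = truncate(full)      # shape (2, 256), rows are unit vectors
cos = small[0] @ small[1]   # cosine similarity in the reduced space
```

Re-normalizing after the cut matters: cosine similarity assumes unit vectors, and the truncated slice of a unit vector is generally shorter than 1.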
Business Model
- Fully Free and Open Source: Apache License 2.0
- Nomic Atlas: Visualization and data exploration platform (commercial product)
- Ollama Integration: Can be run locally directly via Ollama
- Free Deployment: No commercial usage restrictions
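The Ollama integration mentioned above can be sketched against Ollama's local embeddings endpoint. Assumptions: a local Ollama server on its default port 11434, and the model tag `nomic-embed-text` (the V1.5 model in the Ollama library; a V2 tag may differ) already pulled via `ollama pull nomic-embed-text`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # default local port

def build_request(text, model="nomic-embed-text"):
    """JSON body for Ollama's embeddings endpoint."""
    return json.dumps({"model": model, "prompt": text}).encode()

def embed(text):
    # Requires a running Ollama server; no external API or network needed.
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]
```

Because everything runs against localhost, no text ever leaves the machine, which is the privacy argument for this deployment style.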
Target Users
- Open-source community developers
- Researchers requiring full transparency and reproducibility
- Privacy-sensitive scenarios requiring local deployment
- Small teams with limited resources but in need of high-quality embeddings
- Developers of multilingual retrieval systems
Competitive Advantages
- One of the very few high-quality embedding models that is fully open source (weights, training data, and training code)
- Innovative MoE architecture with high inference efficiency (only ~64% of parameters activated)
- Apache 2.0 license with no commercial restrictions
- Performance comparable to models with twice the parameters
- Supports one-click local running via Ollama
- Multilingual capabilities across approximately 100 languages
- Reproducible training process ensures transparency
Limitations
- Still lags behind commercial models (Cohere, OpenAI) on some benchmarks
- V1 only supports English
- Lacks commercial-grade API and technical support
- Smaller community compared to models like BGE
Relationship with the OpenClaw Ecosystem
Nomic Embed is an ideal choice for OpenClaw's local embeddings. Its fully open-source philosophy aligns with OpenClaw's open-source positioning. Through Ollama it can be run locally with a single command, so no API calls or network connection are needed and user data never leaves the device. The MoE architecture's efficient inference makes it practical to run on personal hardware, and the Apache 2.0 license ensures OpenClaw can freely integrate and distribute it.