BGE (BAAI) - Chinese Embedding Model

Open-source embedding model series by BAAI | AI Processing & RAG

Basic Information

Product Description

BGE (BAAI General Embedding) is an open-source embedding model series developed by the Beijing Academy of Artificial Intelligence (BAAI) and a leader among Chinese embedding models. Its flagship model, BGE-M3, is the first embedding model to simultaneously support three retrieval modes (dense, multi-vector, and sparse retrieval); it handles 100+ languages and accepts inputs of up to 8192 tokens. It has achieved state-of-the-art results on the multilingual MIRACL and cross-lingual MKQA benchmarks.

Core Features

  • BGE-M3 (Flagship Model):
      • Three-in-one retrieval: dense retrieval + multi-vector retrieval + sparse retrieval
      • Support for 100+ languages
      • Maximum input length of 8192 tokens
      • Based on the XLM-RoBERTa architecture
      • SOTA on the MIRACL and MKQA benchmarks
      • Trained via self-knowledge distillation
  • BGE-VL (March 2025): multimodal embedding model supporting visual search
  • BGE Series: bge-large-zh, bge-base-zh, bge-small-zh, and other Chinese-specific models
  • FlagEmbedding Toolkit: comprehensive tools for retrieval and retrieval-augmented LLMs
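To make the "three-in-one retrieval" idea concrete, here is a minimal toy sketch of how a dense (semantic) score and a sparse (lexical-weight) score might be fused into one ranking signal. The vectors, token weights, and fusion weights below are all illustrative assumptions, not output from BGE-M3 itself; in practice the model produces these representations and the fusion weights are tuned per application.

```python
import numpy as np

def dense_score(q: np.ndarray, d: np.ndarray) -> float:
    # Cosine similarity between dense embedding vectors.
    return float(np.dot(q, d) / (np.linalg.norm(q) * np.linalg.norm(d)))

def sparse_score(q_weights: dict, d_weights: dict) -> float:
    # Lexical match: sum of products of weights for tokens shared
    # between the query and the document.
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

def hybrid_score(q_dense, d_dense, q_sparse, d_sparse,
                 w_dense: float = 0.6, w_sparse: float = 0.4) -> float:
    # Weighted fusion of the two signals; the 0.6/0.4 split is an
    # illustrative assumption, not a recommended setting.
    return (w_dense * dense_score(q_dense, d_dense)
            + w_sparse * sparse_score(q_sparse, d_sparse))

# Toy query/document representations (made up for illustration).
q_dense = np.array([0.1, 0.9, 0.2])
d_dense = np.array([0.1, 0.8, 0.3])
q_sparse = {"vector": 0.7, "search": 0.5}
d_sparse = {"vector": 0.6, "index": 0.4}

score = hybrid_score(q_dense, d_dense, q_sparse, d_sparse)
```

In a real pipeline, both representations would come from a single BGE-M3 forward pass, which is what makes hybrid retrieval cheap relative to running two separate models.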

Business Model

  • Completely Open Source and Free: MIT license
  • Multi-platform Availability: Hugging Face, Ollama, NVIDIA NIM, DeepInfra, etc.
  • No API Fees: Can be run locally or on your own infrastructure for free

Target Users

  • Developers of Chinese NLP and RAG applications
  • Builders of multilingual search systems
  • Teams requiring locally deployed embedding models
  • Academic researchers and open-source communities
  • Technical teams pursuing hybrid retrieval (dense + sparse)

Competitive Advantages

  • Industry-leading Chinese embedding quality
  • Unique three-in-one retrieval capability (dense + multi-vector + sparse)
  • Support for 100+ languages, strong multilingual and cross-lingual capabilities
  • Completely open source and free, with an active community
  • Supports ultra-long inputs (8192 tokens)
  • Easy local deployment via Ollama
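As a deployment sketch, local use via Ollama might look like the following. This assumes Ollama is installed and running, and that `bge-m3` is the model name in Ollama's registry; verify the exact name and endpoint against your Ollama version before relying on it.

```shell
# Pull the BGE-M3 embedding model from the Ollama registry
ollama pull bge-m3

# Request an embedding from the local Ollama server (default port 11434)
curl http://localhost:11434/api/embed -d '{
  "model": "bge-m3",
  "input": "What is an embedding model?"
}'
```

The response contains the embedding vector(s), which can then be stored in any vector database for retrieval.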

Model Matrix

| Model | Parameters | Dimensions | Language | Features |
|---|---|---|---|---|
| BGE-M3 | ~568M | Variable | 100+ | Three-in-one retrieval flagship |
| BGE-VL | - | - | Multilingual | Multimodal visual search |
| bge-large-zh | 326M | 1024 | Chinese | Large Chinese-specific model |
| bge-base-zh | 102M | 768 | Chinese | Balanced Chinese model |
| bge-small-zh | 24M | 512 | Chinese | Lightweight Chinese model |
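The dimension column above directly drives index size, which is often the deciding factor between the large, base, and small Chinese models. A rough back-of-the-envelope estimate, assuming uncompressed float32 vectors in a flat index (no quantization), can be sketched as:

```python
def index_size_mb(num_docs: int, dims: int, bytes_per_float: int = 4) -> float:
    # Raw storage for a flat float32 vector index:
    # one vector of `dims` floats per document, no compression.
    return num_docs * dims * bytes_per_float / (1024 ** 2)

# Estimated index size for 1M chunks at each bge-*-zh dimension
sizes = {dims: index_size_mb(1_000_000, dims) for dims in (1024, 768, 512)}
# e.g. 512 dims -> ~1953 MB; 1024 dims -> twice that
```

Real deployments usually shrink this with quantization or product-quantized indexes, but the linear relationship between dimensions and storage still holds.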

Relationship with the OpenClaw Ecosystem

BGE is the preferred embedding model for Chinese RAG within the OpenClaw ecosystem. For Chinese users, BGE-M3's embedding quality and 100+ language coverage let OpenClaw agents handle Chinese document retrieval with high accuracy. Because BGE is fully open source and deployable locally, it also fits OpenClaw's privacy-focused design philosophy, and its three-in-one retrieval gives OpenClaw more flexible retrieval-strategy options.

External References

Learn more from these authoritative sources: