InternVL - Multimodal Model

Open-source Multimodal Large Language Model | Voice & Memory

Basic Information

Product Description

InternVL is an open-source series of multimodal large language models developed by the OpenGVLab team at Shanghai AI Lab. The latest release, InternVL3.5 (August 2025), improves reasoning, deployment efficiency, and general-purpose capability through three techniques: Cascade Reinforcement Learning (Cascade RL), Dynamic Visual Resolution Routing (ViR), and a decoupled deployment framework. The flagship InternVL3.5-241B scores 77.7 on the MMMU benchmark, a state-of-the-art result among open-source models that narrows the gap with leading commercial models such as GPT-5.

Core Features/Characteristics

  • InternVL3.5-241B Flagship: MMMU 77.7, MMStar 77.9, OCRBench 90.7
  • Cascade Reinforcement Learning (Cascade RL): Innovative multi-stage reinforcement learning training method
  • Dynamic Visual Resolution Routing (ViR): Reduces inference latency from 369 ms to 91 ms on the 38B model
  • Decoupled Deployment Framework: Supports flexible distributed deployment
  • Full Series Models: Complete coverage from 1B to 241B parameters
  • Tool Use: Supports function calling and tool integration
  • GUI Agent: User interface understanding and operation
  • Industrial Image Analysis: Supports industrial visual inspection and analysis
  • 3D Visual Perception: Three-dimensional spatial understanding capabilities
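
The ViR latency figures in the list above can be turned into a speedup factor with simple arithmetic. The following is a minimal sketch that uses only the two numbers quoted for the 38B model; the function names are illustrative and not part of any InternVL tooling.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' latency is than 'before'."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return before_ms / after_ms


def latency_reduction_pct(before_ms: float, after_ms: float) -> float:
    """Return the share of the original latency that was removed, in percent."""
    if before_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (before_ms - after_ms) / before_ms * 100.0


# Latency figures quoted above for the 38B model.
BEFORE_MS = 369.0
AFTER_MS = 91.0

ratio = speedup(BEFORE_MS, AFTER_MS)
saved = latency_reduction_pct(BEFORE_MS, AFTER_MS)
print(f"speedup: {ratio:.2f}x, latency reduced by {saved:.1f}%")
# prints "speedup: 4.05x, latency reduced by 75.3%"
```

A drop from 369 ms to 91 ms is therefore roughly a 4x speedup, or about three quarters of the original latency removed.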

Business Model

  • Fully Open Source and Free: MIT/Apache 2.0 license
  • Academic-Driven: Research project by Shanghai AI Lab
  • HuggingFace: All model weights available for free download
  • InternLM Platform: Provided through Shanghai AI Lab's InternLM platform

Target Users

  • Multimodal AI researchers
  • Enterprises that need a top-tier open-source vision-language model (VLM)
  • Industrial visual inspection and quality control teams
  • Developers of GUI automation and RPA applications
  • Developers of 3D understanding and spatial analysis applications

Competitive Advantages

  • MMMU score of 77.7, state of the art among open-source models and close to leading commercial models such as GPT-5
  • ViR dynamic resolution routing cuts inference latency by roughly 4x (369 ms to 91 ms on the 38B model)
  • Complete coverage from 1B to 241B for various scenarios
  • Unique capabilities in industrial vision and 3D perception
  • Continuous R&D investment by Shanghai AI Lab
  • Fully open-source, MIT/Apache 2.0 license

Relationship with OpenClaw Ecosystem

InternVL3.5 gives OpenClaw open-source visual understanding that approaches leading commercial models such as GPT-5. ViR dynamic resolution routing reduces inference latency for OpenClaw workloads, and the industrial visual analysis capabilities extend OpenClaw to manufacturing and quality inspection scenarios. The 1B model is small enough to run on edge devices, while the 241B flagship delivers top-tier performance through a cloud API, so the series covers OpenClaw deployments from edge to cloud.
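
To make the edge-versus-cloud split concrete, the sketch below estimates the memory needed just to hold model weights, using the common rule of thumb of 2 bytes per parameter for bf16. It is a rough illustration, not a sizing guide: the device memory figures (8 GB and 80 GB) and the 80% headroom factor are assumptions chosen for the example, and the estimate ignores activations, KV cache, and runtime overhead.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str = "bf16") -> float:
    """Estimate weight storage in decimal GB (weights only, no runtime overhead)."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[dtype]
    return total_bytes / 1e9


def fits(params_billion: float, device_memory_gb: float,
         dtype: str = "bf16", headroom: float = 0.8) -> bool:
    """Check whether weights fit in a fraction (headroom) of device memory.

    The headroom factor is an assumed safety margin, not a measured value.
    """
    return weight_memory_gb(params_billion, dtype) <= device_memory_gb * headroom


# Model sizes mentioned in this profile, in billions of parameters.
for size in (1, 38, 241):
    print(f"{size}B in bf16: ~{weight_memory_gb(size):.0f} GB of weights")

# Hypothetical devices: an 8 GB edge board and a single 80 GB accelerator.
print("1B on 8 GB edge device:", fits(1, 8))
print("241B on one 80 GB accelerator:", fits(241, 80))
```

Under these assumptions the 1B model fits comfortably on a small device, while the 241B flagship needs several hundred GB for weights alone, which is why it is better suited to multi-accelerator cloud serving.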
