InternVL - Multimodal Model
Basic Information
- Product Number: 707
- Company/Brand: Shanghai AI Lab / OpenGVLab
- Country/Region: China
- Official Website: https://internvl.github.io / https://github.com/opengvlab
- Type: Open-source Multimodal Large Language Model
- License: MIT / Apache 2.0
- Release Date: January 2024
Product Description
InternVL is an open-source multimodal large language model series developed by the OpenGVLab team at Shanghai AI Lab. The latest InternVL3.5 (released in August 2025) achieves comprehensive upgrades in reasoning capabilities, deployment efficiency, and general-purpose abilities through Cascade Reinforcement Learning (Cascade RL), Dynamic Visual Resolution Routing (ViR), and a decoupled deployment framework. The flagship InternVL3.5-241B scored 77.7 on the MMMU benchmark, surpassing GPT-5.
Core Features/Characteristics
- InternVL3.5-241B Flagship: MMMU 77.7, MMStar 77.9, OCRBench 90.7
- Cascade Reinforcement Learning (Cascade RL): Innovative multi-stage reinforcement learning training method
- Dynamic Visual Resolution Routing (ViR): Reduced inference latency from 369ms to 91ms for the 38B model
- Decoupled Deployment Framework: Supports flexible distributed deployment
- Full Series Models: Complete coverage from 1B to 241B parameters
- Tool Usage: Supports function calls and tool integration
- GUI Agent: User interface understanding and operation
- Industrial Image Analysis: Supports industrial visual inspection and analysis
- 3D Visual Perception: Three-dimensional spatial understanding capabilities
Business Model
- Fully Open Source and Free: MIT/Apache 2.0 license
- Academic-Driven: Research project by Shanghai AI Lab
- HuggingFace: All model weights available for free download
- InternLM Platform: Provided through Shanghai AI Lab's InternLM platform
Target Users
- Multimodal AI researchers
- Enterprises requiring top-tier open-source VLM
- Industrial visual inspection and quality control
- GUI automation and RPA applications
- 3D understanding and spatial analysis applications
Competitive Advantages
- MMMU 77.7 surpasses GPT-5, strongest open-source model
- ViR dynamic resolution routing, improves inference efficiency by 4x
- Complete coverage from 1B to 241B for various scenarios
- Unique capabilities in industrial vision and 3D perception
- Continuous R&D investment by Shanghai AI Lab
- Fully open-source, MIT/Apache 2.0 license
Relationship with OpenClaw Ecosystem
InternVL3.5 provides OpenClaw with open-source visual understanding capabilities that surpass GPT-5. Its ViR dynamic resolution routing technology significantly optimizes OpenClaw's performance in terms of inference latency. Industrial visual analysis capabilities expand OpenClaw's applications in manufacturing and quality inspection scenarios. The 1B small model can run on edge devices, while the 241B flagship offers top-tier performance via cloud API, covering all deployment scenarios for OpenClaw.
External References
Learn more from these authoritative sources: