InternVL - Multimodal Model

Open-source Multimodal Large Language Model | Voice & Memory

Basic Information

Product Number: 707
Company/Brand: Shanghai AI Lab / OpenGVLab
Country/Region: China
Official Website: https://internvl.github.io / https://github.com/opengvlab
Type: Open-source Multimodal Large Language Model
License: MIT / Apache 2.0
Release Date: January 2024

Product Description

InternVL is an open-source series of multimodal large language models developed by the OpenGVLab team at Shanghai AI Lab. The latest release, InternVL3.5 (August 2025), improves reasoning, deployment efficiency, and general-purpose ability through three techniques: Cascade Reinforcement Learning (Cascade RL), Dynamic Visual Resolution Routing (ViR), and a decoupled deployment framework. The flagship InternVL3.5-241B scores 77.7 on the MMMU benchmark, among the strongest results reported for an open-source model and narrowing the gap with commercial models such as GPT-5.

Core Features/Characteristics

InternVL3.5-241B Flagship: MMMU 77.7, MMStar 77.9, OCRBench 90.7
Cascade Reinforcement Learning (Cascade RL): Innovative multi-stage reinforcement learning training method
Dynamic Visual Resolution Routing (ViR): Reduces inference latency from 369 ms to 91 ms for the 38B model
Decoupled Deployment Framework: Supports flexible distributed deployment
Full Series Models: Complete coverage from 1B to 241B parameters
Tool Usage: Supports function calls and tool integration
GUI Agent: User interface understanding and operation
Industrial Image Analysis: Supports industrial visual inspection and analysis
3D Visual Perception: Three-dimensional spatial understanding capabilities
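
The ViR latency figures quoted above (369 ms down to 91 ms on the 38B model) can be turned into a speedup factor and a percentage reduction with simple arithmetic. The sketch below is only a worked calculation on those two numbers; the function names are illustrative and not part of any InternVL tooling.

```python
# Worked arithmetic for the ViR latency figures quoted in this listing.
# The helper names are hypothetical; only the two latency values come from the text.

def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' latency is."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return before_ms / after_ms


def latency_reduction_pct(before_ms: float, after_ms: float) -> float:
    """Return the latency reduction as a percentage of the baseline."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("latencies must be positive")
    return (before_ms - after_ms) / before_ms * 100.0


if __name__ == "__main__":
    baseline_ms, routed_ms = 369.0, 91.0  # values stated for the 38B model
    print(f"speedup: {speedup(baseline_ms, routed_ms):.2f}x")
    print(f"latency reduction: {latency_reduction_pct(baseline_ms, routed_ms):.1f}%")
```

The ratio comes out slightly above 4, which is where the "4x" efficiency figure elsewhere in this listing comes from.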

Business Model

Fully Open Source and Free: MIT/Apache 2.0 license
Academic-Driven: Research project by Shanghai AI Lab
HuggingFace: All model weights available for free download
InternLM Platform: Provided through Shanghai AI Lab's InternLM platform
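
As a rough illustration of the HuggingFace download path, the sketch below fetches one checkpoint with the huggingface_hub library. The repository naming pattern (OpenGVLab/InternVL3_5-<size>) and the helper names are assumptions made for this example and should be checked against the OpenGVLab organization page on the Hub before use.

```python
# Minimal sketch: download InternVL weights from the HuggingFace Hub.
# ASSUMPTION: repositories follow the pattern "OpenGVLab/InternVL3_5-<size>";
# verify the exact repository name on the Hub before relying on it.

def internvl_repo_id(size: str = "1B") -> str:
    """Build the assumed Hub repository id for a given model size."""
    return f"OpenGVLab/InternVL3_5-{size}"


def download_weights(size: str = "1B") -> str:
    """Download a model snapshot and return the local directory path."""
    # Imported lazily so the helper above works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=internvl_repo_id(size))


if __name__ == "__main__":
    # The smallest model is the most practical one to try first.
    print(internvl_repo_id("1B"))
```

Calling download_weights("1B") needs network access and enough free disk space; the larger checkpoints are correspondingly bigger.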

Target Users

Multimodal AI researchers
Enterprises that need a top-tier open-source vision-language model (VLM)
Industrial visual inspection and quality control teams
Developers of GUI automation and RPA applications
Developers of 3D understanding and spatial analysis applications

Competitive Advantages

MMMU 77.7, among the strongest reported scores for an open-source model, narrowing the gap with GPT-5
ViR dynamic resolution routing cuts inference latency roughly 4x (369 ms to 91 ms on the 38B model)
Complete coverage from 1B to 241B for various scenarios
Unique capabilities in industrial vision and 3D perception
Continuous R&D investment by Shanghai AI Lab
Fully open-source, MIT/Apache 2.0 license

Relationship with OpenClaw Ecosystem

InternVL3.5 gives OpenClaw open-source visual understanding that narrows the gap with commercial models such as GPT-5. Its ViR dynamic resolution routing reduces inference latency for OpenClaw workloads, and its industrial visual analysis capabilities extend OpenClaw to manufacturing and quality inspection scenarios. The 1B model can run on edge devices, while the 241B flagship offers top-tier performance via cloud API, so the series covers OpenClaw deployments from edge to cloud.
