InternVL Family

Posts

2025/05/26
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
2025/04/11
InternVL3: Advancing Open-Source Multimodal Models with Native Multimodal Pretraining
2025/03/13
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
2024/12/20
InternVL2.5-MPO: Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
2024/12/05
InternVL2.5: Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
2024/10/25
Mini-InternVL 2.0: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
2024/10/10
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
2024/07/31
InternOmni: Extending InternVL with Audio Modality
2024/07/04
InternVL2: Better than the Best—Expanding Performance Boundaries of Open-Source Multimodal Models with the Progressive Scaling Strategy
2024/05/31
ShareGPT-4o: Comprehensive Multimodal Annotations With GPT-4o
2024/05/25
Mini-InternVL 1.5: A Powerful Pocket Multimodal Model with 8% Parameters for 80% Performance
2024/04/30
InternVL 1.5: How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
2024/02/21
InternVL 1.2: Scaling up LLM to 34B
2024/01/24
InternVL 1.1: Enhance Chinese and OCR Capabilities
2023/12/12
InternVL 1.0: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks