XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
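#Benchmarking #MultimodalModels #AudioQA #CrossModal #ModelEvaluation
✅ 7.5/10 | Top 25% | #Benchmarking | #MultimodalModels | #AudioQA #CrossModal
Academic quality 6.5/7 | Topic value 1.8/2 | Reproducibility bonus 0.7 | Confidence: High

👥 Authors and Affiliations
First author: Xingrui Wang (1. Advanced Micro Devices, 2. Johns Hopkins University)
Corresponding author: Jiang Liu (Advanced Micro Devices)
Author list: Xingrui Wang (Advanced Micro Devices, Johns Hopkins University), Jiang Liu (Advanced Micro Devices), Chao Huang (Advanced Micro Devices, University of Rochester), Xiaodong Yu (Advanced Micro Devices), Ze Wang (Advanced Micro Devices), Ximeng Sun (Advanced Micro Devices), Jialian Wu (Advanced Micro Devices), Alan Yuille (Johns Hopkins University), Emad Barsoum (Advanced Micro Devices), Zicheng Liu (Advanced Micro Devices)

💡 Sharp-Tongued Review
Strengths: The benchmark design is highly systematic and diagnostic. It uses six permutations of "modality balance" to measure, like a precision instrument, how lopsided a model is across modalities. This goes deeper than a single averaged score.
Weaknesses: The paper treats the strongest closed-source model (Gemini) as the reference point but proposes no new model or algorithm of its own. It therefore reads more like a thorough "health check-up report" than a "treatment plan". In addition, although the authors promise an open-source release, the evaluation relies entirely on existing models and gives little direct guidance for training new ones. ...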
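The six-permutation "modality balance" design praised in the review can be illustrated with a small scoring sketch. The sketch rests on two assumptions that this review does not confirm:

- The three modalities are audio, vision, and text.
- Each permutation is an ordered (context, candidate) pair, which gives 3 × 2 = 6 configurations.

The function names (`modality_configs`, `per_config_accuracy`, `balance_report`, `directional_gaps`) are hypothetical choices for illustration. So are the summary statistics (spread, standard deviation, directional gap). None of this is XModBench's released evaluation code.

```python
from itertools import permutations
from statistics import mean, pstdev

# Assumed modality set (hypothetical; not confirmed by the review).
MODALITIES = ("audio", "vision", "text")


def modality_configs(modalities=MODALITIES):
    """Ordered (context, candidate) pairs of distinct modalities: 3 * 2 = 6."""
    return list(permutations(modalities, 2))


def per_config_accuracy(records):
    """records: iterable of (context_modality, candidate_modality, is_correct).

    Returns a dict mapping (context, candidate) -> accuracy in [0, 1].
    """
    totals = {}
    for ctx, cand, correct in records:
        n, k = totals.get((ctx, cand), (0, 0))
        totals[(ctx, cand)] = (n + 1, k + int(correct))
    return {cfg: k / n for cfg, (n, k) in totals.items()}


def balance_report(acc):
    """Summarize per-configuration accuracy beyond a single average."""
    values = list(acc.values())
    return {
        "mean": mean(values),                # the usual headline number
        "spread": max(values) - min(values), # best minus worst configuration
        "std": pstdev(values),               # population std across configurations
        "weakest": min(acc, key=acc.get),    # configuration with lowest accuracy
    }


def directional_gaps(acc):
    """Accuracy of (a -> b) minus (b -> a), reported once per unordered pair."""
    gaps = {}
    for (a, b) in acc:
        # Only record the pair if its reverse exists and was not recorded already.
        if (b, a) in acc and (b, a) not in gaps:
            gaps[(a, b)] = acc[(a, b)] - acc[(b, a)]
    return gaps
```

The sketch reports the spread and the weakest configuration alongside the mean. That lets it expose a model that averages well but collapses in one modality direction, which is the kind of lopsidedness the review credits the benchmark with detecting.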