MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue
#MultimodalLearning #VisionLanguageModels #ReferringExpressionGrounding #EmbodiedDialogue
✅ 6.5/10 | Top 50% | #CrossModal | arXiv
Academic quality 6.5/7 | Impact 5.5/2 | Reproducibility 0.3/2 | Confidence: high
👥 Authors and Affiliations
Anna Deichler, Jim O’Regan, Fethiye Irmak Dogan, Lubos Marcinek, Anna Klezovich, Iolanda Leite, and Jonas Beskow
KTH Royal Institute of Technology, Stockholm, Sweden
{deichler, joregan, fidogan, lubosm, annkle, iolanda, beskow}@kth.se
...