📄 JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments ✅ 7.0/10 | Top 50% | arxiv | Speech/Music/Audio Paper Digest, 2026-05-23