📄 OmniDenseCap: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions 🔥 8.0/10 | Top 25% | arXiv | 2026-05-23 Speech/Music/Audio Paper Digest