Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation
📄 Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation #音乐生成 #自回归模型 #多模态模型 #基准测试 #音视频 🔥 评分:8.0/10 | arxiv 👥 作者与机构 第一作者:Vaibhavi Lokegaonkar(University of Maryland College Park, USA) 通讯作者:Aryan Vijay Bhosale, Vishnu Raj(根据“Corresponding authors”及邮箱 {vlokegao,aryanvib}@umd.edu 推断,均来自 University of Maryland College Park, USA) 其他作者: Gouthaman KV(University of Maryland College Park, USA) Ramani Duraiswami(University of Maryland College Park, USA) Lie Lu(Dolby Laboratories, USA) Sreyan Ghosh(University of Maryland College Park, USA) Dinesh Manocha(University of Maryland College Park, USA) 💡 毒舌点评 亮点在于巧妙地将自回归模型的“宏观规划”能力和扩散模型的“细节雕刻”能力缝合在一起,解决了视频配乐中“既要懂视频又要听指挥”的痛点,还顺手做了个挺专业的评测基准ReelBench。槽点是缝合的“线”(如FSQ, RITE)都是现成的,而且目前只能给10秒短片配乐,离给一部电影完整配乐的“终极梦想”还有不小的距离,更像是个精致的概念验证版。 ...