VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation
📄 VowelPrompt: Hearing Speech Emotions from Text via Vowel-level Prosodic Augmentation #语音情感识别 #强化学习 #多语言 #大语言模型 🔥 8.5/10 | 前25% | #语音情感识别 | #强化学习 | #多语言 #大语言模型 学术质量 6.2/7 | 选题价值 1.8/2 | 复现加成 0.3 | 置信度 高 👥 作者与机构 第一作者:Yancheng Wang(Arizona State University; Meta Superintelligence Labs) 通讯作者:Osama Hanna(Meta Superintelligence Labs,基于邮箱推测) 作者列表: Yancheng Wang (Arizona State University, Meta Superintelligence Labs) Osama Hanna (Meta Superintelligence Labs) Ruiming Xie (Meta Superintelligence Labs) Xianfeng Rui (Meta Superintelligence Labs) Maohao Shen (Massachusetts Institute of Technology; Meta Superintelligence Labs) Xuedong Zhang (Meta Superintelligence Labs) Christian Fuegen (Meta Superintelligence Labs) Jilong Wu (Meta Superintelligence Labs) Debjyoti Paul (Meta Superintelligence Labs) Arthur Guo (Meta Superintelligence Labs) Zhihong Lei (Meta Superintelligence Labs) Ozlem Kalinli (Meta Superintelligence Labs) Qing He (Meta Superintelligence Labs) Yingzhen Yang (Arizona State University) 💡 毒舌点评 亮点在于从语音学常识(元音承载韵律)出发,设计了一套精巧且可解释的“翻译”流程,将隐晦的语音信号转化为LLM能读的文本,比直接灌入黑盒音频嵌入“高级”不少。短板则是其效果高度依赖强制对齐的准确性,对于口音重、背景噪或语速极快的语音,这套“元音显微镜”可能会失灵,且忽略辅音区域可能存在的互补情感线索(如送气、鼻化)。 ...