Clinical Applications | Speech/Music/Audio Paper Digest

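📄 Psychologically-Grounded Graph Modeling for Interpretable Depression Detection
#SpeechEmotionRecognition #GraphNeuralNetworks #DataAugmentation #ExplainableAI #ClinicalApplications
🔥 8.0/10 | Top 25% | #SpeechEmotionRecognition | #GraphNeuralNetworks | #DataAugmentation #ExplainableAI | arxiv
Academic quality 6.0/7 | Topic value 1.5/2 | Reproducibility bonus 0.5 | Confidence: high

👥 Authors and affiliations
First author: Rishitej Reddy Vyalla (equal contribution with Kritarth Prasad)
Corresponding author: not stated
Author list: Rishitej Reddy Vyalla (IIIT Delhi), Kritarth Prasad (IIIT Delhi), Avinash Anand (Singapore Institute of Technology), Erik Cambria (Singapore Institute of Technology; Nanyang Technological University; ELLIS Institute Finland; University of Turku), Shaoxiong Ji (not stated), Faten S. Alamri (Princess Nourah bint Abdulrahman University), Zhengkui Wang (not stated)

💡 Blunt take
The paper's strength is its solid combination of clinical psychology theory with graph neural network modeling: the proposed "psychological expression units" and personality-aware context give depression detection a clinically meaningful interpretability framework. Its weaknesses are just as clear. The "effectiveness" and "safety" of the data augmentation depend heavily on manual validation (no quantitative results are provided) and on the quality of LLM generation, and the claim of "surpassing GPT-5" is not very convincing without stricter and more diverse benchmarks. ...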

Speech/Music/Audio Paper Digest 2026-04-28
24 papers analyzed in total

⚡ Today's overview
📥 24 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular directions
Direction | Count | Distribution
#SpeechSynthesis | 2 | ██
#SpeechDeepfakeDetection | 2 | ██
#AudioVisual | 1 | █
#LargeAudioModels | 1 | █
#SpeechBiomarkers | 1 | █
#SpeechGeneration | 1 | █
#SpeechEmotionRecognition | 1 | █
#GraphNeuralNetworks | 1 | █

📊 Paper score ranking (24 papers, in descending order of score)
Rank | Paper | Score | Tier | Main task
🥇 | Hallo-Live: Real-Time Streaming Joint Audio-Video Avata | 8.5 | Top 25% | #AudioVisual
🥈 | HeadRouter: Dynamic Head-Weight Routing for Task-Adapti | 8.0 | Top 25% | #LargeAudioModels
🥉 | Comparison of sEMG Encoding Accuracy Across Speech Mode | 8.0 | Top 25% | #SpeechBiomarkers
4 | Scaling Properties of Continuous Diffusion Spoken Langu | 8.0 | Top 25% | #SpeechGeneration
5 | Psychologically-Grounded Graph Modeling for Interpretab | 8.0 | Top 25% | #SpeechEmotionRecognition
6 | Latent-Hysteresis Graph ODEs: Modeling Coupled Topology | 8.0 | Top 25% | #GraphNeuralNetworks
7 | Meta-Ensemble Learning with Diverse Data Splits for Imp | 8.0 | Top 25% | #AudioClassification
8 | CineAGI: Character-Consistent Movie Creation through LL | 8.0 | Top 25% | #CrossModal
9 | Listening with Time: Precise Temporal Awareness for Lon | 8.0 | Top 25% | #AudioSceneUnderstanding
10 | An event-based sequence modeling approach to recognizin | 7.5 | Top 25% | #MusicUnderstanding
11 | Speech Enhancement Based on Drifting Models | 7.5 | Top 25% | #SpeechEnhancement
12 | Talker-T2AV: Joint Talking Audio-Video Generation with | 7.5 | Top 25% | #SpeechSynthesis
13 | Explainable AI in Speaker Recognition – Making Latent | 7.5 | Top 25% | #SpeakerRecognition
14 | Predictive Directional Selective Fixed-Filter Active No | 7.5 | Top 25% | #SoundSourceLocalization
15 | RAS: a Reliability Oriented Metric for Automatic Speech | 7.5 | Top 25% | #SpeechRecognition
16 | Robust Audio-Text Retrieval via Cross-Modal Attention a | 7.5 | Top 25% | #AudioRetrieval
17 | RTCFake: Speech Deepfake Detection in Real-Time Communi | 7.0 | Top 25% | #SpeechDeepfakeDetection
18 | MAGIC-TTS: Fine-Grained Controllable Speech Synthesis w | 7.0 | Top 25% | #SpeechSynthesis
19 | TTS-PRISM: A Perceptual Reasoning and Interpretable Spe | 7.0 | Top 25% | #SpeechSynthesisEvaluation
20 | All That Glitters Is Not Audio: Rethinking Text Priors | 6.5 | Top 50% | #AudioQuestionAnswering
21 | Opening the Design Space: Two Years of Performance with | 6.5 | Top 50% | #MusicGeneration
22 | Spectro-Temporal Modulation Representation Framework fo | 6.5 | Top 50% | #SpeechDeepfakeDetection
23 | Come Together: Analyzing Popular Songs Through Statisti | 6.5 | Top 50% | #MusicInformationRetrieval
24 | A Functorial Formulation of Neighborhood Aggregating De | 6.5 | Top 25% | #TheoreticalAnalysis

📋 Paper list

🥇 Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation
🔥 8.5/10 | Top 25% | #AudioVisual | #DiffusionModels | #KnowledgeDistillation #Streaming | arxiv ...