Medical Audio | Speech/Music/Audio Paper Digest

Toward Generalizable Cognitive Impairment Detection with Speech-Based Multimodal Large Language Models

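📄 Tags: #multimodal-models #speech-emotion-recognition #medical-audio #speech-LLM #audio-understanding
7.0/10 | Novelty 1/2 | Rigor 1/1.5 | Experiments 1.2/1.5 | Clarity 0.8/1 | Impact 1/1.5 | Open source 0.5/1.5 | Reproducibility 0.3/0.5 | Engineering 1.2/1.5
✅ 7.0/10 | Top 50% | Document type: Methods research | Scoring confidence: High | #speech-emotion-recognition | #multimodal-models | #medical-audio #speech-LLM | arXiv
👥 Authors and affiliations
First author: Yingchao Huang (Saskatchewan Polytechnic, Faculty of Digital Innovation, Arts & Sciences)
Corresponding author: Yingchao Huang (Saskatchewan Polytechnic, Faculty of Digital Innovation, Arts & Sciences)
Author list: Yingchao Huang (Saskatchewan Polytechnic, Faculty of Digital Innovation, Arts & Sciences), Xin Wang (Saskatchewan Polytechnic, Faculty of Digital Innovation, Arts & Sciences), Yuhan Su (Hebei University, School of Basic Medical Sciences), Shanshan Yao (University of Alberta, Department of Civil & Environmental Engineering and School of Mining & Petroleum Engineering)
💡 Blunt review
The paper proposes a multimodal framework for cognitive impairment (CI) detection built on open-source audio and text large models, and it shows good cross-dataset generalization, which does address a key requirement for clinical deployment. However, the core method lacks novelty: in essence it uses the off-the-shelf "black-box" Qwen-Audio and Qwen models as feature extractors, then simply concatenates the vectors and classifies them, without any deeper exploration of the models' internal mechanisms, fusion strategies, or training paradigms. The paper reads more like an excellent engineering application report or benchmark than a proposal for an inspiring new research paradigm. Its claimed "new SOTA" rests mainly on the strong representational power of large pretrained models rather than on clever method design. ...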
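The headline score in the entry above appears to be the plain sum of its eight sub-scores. A minimal sketch that checks this arithmetic, assuming simple addition (the digest does not state its aggregation rule; the dictionary keys are English renderings of the rubric labels and the function names are illustrative):

```python
# Sub-scores for the entry above as (awarded, maximum), copied from its score line.
# The English key names are renderings of the rubric labels, not official identifiers.
sub_scores = {
    "novelty": (1.0, 2.0),
    "rigor": (1.0, 1.5),
    "experiments": (1.2, 1.5),
    "clarity": (0.8, 1.0),
    "impact": (1.0, 1.5),
    "open_source": (0.5, 1.5),
    "reproducibility": (0.3, 0.5),
    "engineering": (1.2, 1.5),
}


def total_score(scores):
    """Sum the awarded points; assumes the total is a plain sum (an assumption)."""
    return round(sum(awarded for awarded, _ in scores.values()), 1)


def max_score(scores):
    """Sum the per-criterion maxima."""
    return round(sum(maximum for _, maximum in scores.values()), 1)


if __name__ == "__main__":
    print(total_score(sub_scores))  # awarded points
    print(max_score(sub_scores))    # sum of the per-criterion maxima
```

The awarded points add up to the listed 7.0. The per-criterion maxima add up to 11.0 while the total is reported out of 10; the page does not explain how the two are reconciled.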

Speech/Music/Audio Paper Digest 2026-07-24

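18 papers analyzed
⚡ Today's overview: 📥 18 papers fetched → 🔬 in-depth analysis complete
🏷️ Trending topics
Topic | Count | Distribution
#speech-recognition | 4 | ████
#voice-interaction | 2 | ██
#speech-emotion-recognition | 2 | ██
#multimodal-models | 1 | █
#datasets | 1 | █
#speech-spoofing-detection | 1 | █
#speech-separation | 1 | █
#speech-synthesis | 1 | █
📊 Paper score ranking (18 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Document type | Primary task
🥇 | DONDO: Open w2v-BERT Speech-Recognition Base Models for | 8.1 | Top 25% | System technical report | #speech-recognition
🥈 | Designed Vocalizations Dataset: Sound-Designed Human an | 7.9 | Top 25% | Dataset and benchmark | #voice-conversion
🥉 | VibeVoice-ASR-BitNet Technical Report | 7.8 | Top 25% | System technical report | #speech-recognition
4 | Faster IndexTTS-2: Accelerating and Streaming Autoregre | 7.6 | Top 25% | System technical report | #speech-synthesis
5 | From Read Speech to Spoken Digits: A Task-Specific Eval | 7.5 | Top 25% | Applied research | #speech-recognition
6 | Instruct-FD: Can Your Full-Duplex Speech System Follow | 7.2 | Top 50% | Dataset and benchmark | #voice-interaction
7 | OPOD: On-Policy Omni Distillation | 7.1 | Top 50% | Methods research | #multimodal-models
8 | X\(^3\)-OPD: Distilling Reasoning into Large Audio-Langua | 7.1 | Top 50% | Methods research | #audio-understanding
9 | Toward Interpretable Speech Deepfake Detection using Ar | 7.0 | Top 50% | Methods research | #speech-spoofing-detection
10 | Toward Generalizable Cognitive Impairment Detection wit | 7.0 | Top 50% | Methods research | #speech-emotion-recognition
11 | Safeguards for Speech2Speech LLM-Assistants: A Case Stu | 6.5 | Top 50% | System technical report | #voice-interaction
12 | Investigating Codec-Internal Latent Audio Watermarking | 6.4 | Top 50% | System technical report | #audio-watermarking
13 | TF-MossFormer: Integrating Convolution Gated Local-Glob | 6.3 | Top 50% | Model report | #speech-separation
14 | Phonetic forced alignment for low-resource language var | 6.2 | Top 50% | Methods research | #speech-recognition
15 | SCoPE: Shift-Aware Speaker-Conditioned Priors for Emoti | 6.0 | Top 50% | Methods research | #speech-emotion-recognition
16 | Word meaning co-determines vowel-inherent spectral chan | 5.9 | Top 50% | Methods research | #speech-attribute-recognition
17 | An Evaluation Framework for Structured Audio Captions V | 5.3 | Bottom 50% | Dataset and benchmark | #datasets
18 | Improving the performance of an ASV system using hybrid | 5.0 | Bottom 50% | Methods research | #speaker-verification
📋 Paper list
🥇 DONDO: Open w2v-BERT Speech-Recognition Base Models for African Languages
8.1/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 1.3/1.5 | Open source 1.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.3/1.5 ...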

Comparing Spectrogram Front-Ends for Abnormal Heart-Sound Detection with a Convolutional Neural Network

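📄 Tags: #audio-classification #CNN #medical-audio #interpretability #audio-understanding
5.7/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 0.8/1.5 | Clarity 0.6/1 | Impact 1/1.5 | Open source 0/1.5 | Reproducibility 0.3/0.5 | Engineering 0.8/1.5
📝 5.7/10 | Top 50% | Document type: Methods research | Scoring confidence: High | #audio-classification | #CNN | #medical-audio #interpretability | arXiv
👥 Authors and affiliations
First author: Abhinav Pala (Santa Clara University)
Corresponding author: not stated
Author list: Abhinav Pala (Santa Clara University), Dhanush Pala (independent researcher)
💡 Blunt review
The experimental design is rigorous in controlling variables (fixed CNN, optimizer, and seeds), and the Grad-CAM analysis makes the conclusions more interpretable. But the paper has serious problems. The writing is riddled with spelling and grammatical errors (such as "abonral", "teh", and "arceitecture"), which is unacceptable in a formal submission. The core conclusion that "multi-resolution is the most reliable front-end" is drawn from tests on only two simple CNN architectures with a tiny performance difference (~0.006 MAcc) and no statistical significance testing, so it looks over-interpreted. The comparison with the PhysioNet 2016 Challenge winner gives no evidence that the test-set split is comparable. No code, models, or data are released, which severely hinders reproducibility. ...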
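The review's point about a ~0.006 MAcc gap lacking significance testing can be made concrete. Below is a minimal sketch of one possible check, an exact paired sign-flip permutation test over per-seed scores. It is not the paper's method; the function name and all numbers are hypothetical, since the digest gives no per-seed results.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided paired sign-flip permutation test on the mean difference.

    scores_a and scores_b are matched per-seed (or per-fold) scores for two
    front-ends. Under the null hypothesis the sign of each paired difference
    is arbitrary, so every sign assignment is enumerated.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        flipped = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        # Small tolerance so floating-point ties count as "as extreme".
        if flipped >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


if __name__ == "__main__":
    # Hypothetical per-seed MAcc values, invented for illustration only.
    multi_resolution = [0.861, 0.858, 0.866, 0.859, 0.863]
    single_resolution = [0.856, 0.860, 0.857, 0.852, 0.859]
    print(paired_sign_flip_pvalue(multi_resolution, single_resolution))
```

With only five paired runs the smallest attainable two-sided p-value is 2/32 = 0.0625, so a handful of seeds cannot establish significance at the 0.05 level however consistent the gap looks.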

Speech/Music/Audio Paper Digest 2026-07-22

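20 papers analyzed
⚡ Today's overview: 📥 20 papers fetched → 🔬 in-depth analysis complete
🏷️ Trending topics
Topic | Count | Distribution
#speech-recognition | 5 | █████
#speech-synthesis | 3 | ███
#audio-classification | 2 | ██
#benchmarking | 1 | █
#voice-interaction | 1 | █
#speech-separation | 1 | █
#speech-enhancement | 1 | █
#speech-emotion-recognition | 1 | █
📊 Paper score ranking (20 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Document type | Primary task
🥇 | Content is What Remains: Invariant Speech Tokenization | 9.2 | Top 10% | Methods research | #speech-coding
🥈 | Fusion Embedding: A Unified Embedding Space for Text, I | 8.6 | Top 25% | System technical report | #audio-retrieval
🥉 | End-to-End Markov State Sequence Learning for Auditory | 8.3 | Top 25% | Methods research | #voice-interaction
4 | Staged Depth-Pruning Distillation of a Flow-Matching Te | 7.9 | Top 25% | System technical report | #speech-synthesis
5 | Constrained CTC Decoding for Efficient Diacritic Restor | 7.7 | Top 25% | Methods research | #speech-recognition
6 | Fretiq: Browser-Native Electric Guitar String Classific | 7.5 | Top 25% | System technical report | #audio-classification
7 | MeetingToM: Evaluating Multimodal LLMs on Theory-of-Min | 7.2 | Top 50% | Dataset and benchmark | #benchmarking
8 | Transcription Policy as a Latent Variable: Activating C | 7.1 | Top 50% | Methods research | #speech-recognition
9 | Benchmarking Human and Automatic Speech Recognition of | 7.0 | Top 50% | System technical report | #speech-recognition
10 | A Situational Speech Synthesizer for Yoruba: System Des | 6.7 | Top 50% | System technical report | #speech-synthesis
11 | From a Multilingual Streaming ASR Backbone to Kenyan-La | 6.5 | Top 50% | System technical report | #speech-recognition
12 | Towards Array-Invariant Speech Enhancement via Geometry | 6.3 | Top 50% | Methods research | #speech-enhancement
13 | Comparing Spectrogram Front-Ends for Abnormal Heart-Sou | 5.7 | Top 50% | Methods research | #audio-classification
14 | EmoEUS: Uncertainty Supervision for Multimodal Emotion | 5.6 | Top 50% | Methods research | #speech-emotion-recognition
15 | Summary of DCASE 2026 Task 5: Audio-Dependent Question | 5.4 | Bottom 50% | Dataset and benchmark | #audio-understanding
16 | Towards a reproducible cross-venue method for quantifyi | 5.4 | Bottom 50% | Methods research | #audio-quality-assessment
17 | CS-ETS: Chaos-Inspired Samba-Based EMG-To-Speech Synthe | 5.3 | Bottom 50% | Methods research | #speech-synthesis
18 | Addressing Limited Data in Auditory Attention Decoding | 5.1 | Bottom 50% | Applied research | #speech-separation
19 | What the Waveform Knows: Transparent-first Speech and A | 4.8 | Bottom 50% | System technical report | #speech-recognition
20 | Teleportation Game: Quantum Teleportation in Multi-Agen | 4.4 | Bottom 50% | System technical report | #music-generation
📋 Paper list
🥇 Content is What Remains: Invariant Speech Tokenization from Parallel Utterances
9.2/10 | Novelty 1.5/2 | Rigor 1.5/1.5 | Experiments 1.3/1.5 | Clarity 0.9/1 | Impact 1.2/1.5 | Open source 1.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.3/1.5 ...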

Multi-Level Privacy-Preserving Dementia Detection from Speech via Targeted Adversarial Obfuscation and Representation Learning

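📄 Tags: #speech-attribute-recognition #adversarial-training #medical-audio #audio-understanding #Transformer
5.5/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 0.6/1.5 | Clarity 0.8/1 | Impact 0.8/1.5 | Open source 0/1.5 | Reproducibility 0.3/0.5 | Engineering 0.8/1.5
📝 5.5/10 | Top 50% | Document type: Methods research | Scoring confidence: Medium | #speech-attribute-recognition | #adversarial-training | #medical-audio #audio-understanding | arXiv
👥 Authors and affiliations
First author: Henriette Flore Kenne (Richard A Miner School of Computer and Information Sciences, University of Massachusetts Lowell, Lowell, USA)
Corresponding author: not stated
Author list: Henriette Flore Kenne (Richard A Miner School of Computer and Information Sciences, University of Massachusetts Lowell, Lowell, USA), Raphael Anaadumba (Richard A Miner School of Computer and Information Sciences, University of Massachusetts Lowell, Lowell, USA), Mohammad Arif Ul Alam (Richard A Miner School of Computer and Information Sciences, University of Massachusetts Lowell, Lowell, USA)
💡 Blunt review
Highlight: the multi-level (signal + feature) privacy-preserving framework is a fairly novel perspective, and repurposing adversarial attacks as a privacy-protection tool is a thought-provoking idea. Weakness: the experimental validation is extremely thin. All results come from a single (and classic) dataset, DementiaBank, with no cross-dataset generalization test, and there is no discussion of the method's failure cases, boundary conditions, or real-world deployment complexity. As a result the paper reads more like a preliminary experimental report than a mature conference paper. ...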

Robust Summarization of Doctor-Patient Conversations: TalTech Systems for the Beyond Transcription Challenge

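📄 Tags: #voice-interaction #reinforcement-learning #medical-audio #speech-LLM #parameter-efficient-fine-tuning
6.3/10 | Novelty 1.2/2 | Rigor 0.8/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 1/1.5 | Open source 0/1.5 | Reproducibility 0.3/0.5 | Engineering 1.2/1.5
✅ 6.3/10 | Top 50% | Document type: System technical report | Scoring confidence: High | #voice-interaction | #reinforcement-learning | #medical-audio #speech-LLM | arXiv
👥 Authors and affiliations
First author: Aivo Olev (TalTech, Estonia)
Corresponding author: not stated
Author list: Aivo Olev (TalTech, Estonia), Tanel Alumäe (TalTech, Estonia)
💡 Blunt review
Highlights: the paper presents a complete end-to-end engineering pipeline that won both competition tracks, from WER-based zero-shot model selection, through an SFT + DAPO RL fine-tuning strategy, to independent LLM-as-judge evaluation. It offers a clear, replicable roadmap for building reliable clinical documentation systems for long audio. The empirical finding that RL optimization of Concept F1 neither raised the hallucination rate nor made the notes overly verbose is a valuable reference point, and the finding that text-only SFT transfers across modalities to speech input is also a noteworthy engineering insight.
Weaknesses:
1) The research depth suffers from an obvious "pragmatism" shortfall: there is no empirical comparison or case analysis showing in which concrete cases DAPO beats standard PPO on long-sequence generation, or how token-level loss aggregation mitigates reward dilution.
2) The core components (fine-tuned model weights, training code, data-processing pipeline) are not open-sourced, which severely limits verifiability and community uptake.
3) On the official test-set ranking metric, first place leads second by only 0.003 (0.543 vs 0.540), so the win is not robust.
4) The out-of-domain robustness conclusion rests on only 3 real recordings and is essentially anecdotal. ...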

Speech/Music/Audio Paper Digest 2026-07-21

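34 papers analyzed
⚡ Today's overview: 📥 34 papers fetched → 🔬 in-depth analysis complete
🏷️ Trending topics
Topic | Count | Distribution
#speech-emotion-recognition | 3 | ███
#audio-understanding | 3 | ███
#speech-spoofing-detection | 2 | ██
#speech-translation | 2 | ██
#speaker-verification | 2 | ██
#audio-event-detection | 2 | ██
#benchmarking | 1 | █
#multimodal-models | 1 | █
📊 Paper score ranking (34 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Document type | Primary task
🥇 | HARP: Harmonic-Aware Residual Partitioning for Neural A | 9.6 | Top 10% | Methods research | #audio-coding
🥈 | SALMONN-2: Advancing General-Purpose Hearing Abilities | 9.4 | Top 10% | Model report | #audio-understanding
🥉 | Pseudo-label distillation for discriminative anomalous | 9.0 | Top 10% | Methods research | #audio-event-detection
4 | ESCUCHA: A Spanish Speech Benchmark for Heterogeneous A | 8.8 | Top 25% | Dataset and benchmark | #benchmarking
5 | RealDESED: A Real-World Domestic Sound Event Detection | 7.9 | Top 25% | Dataset and benchmark | #audio-event-detection
6 | FlowSonic: Stable Zero-Shot Music Editing via High-Orde | 7.9 | Top 25% | Methods research | #music-generation
7 | Time-Frequency Consistency Learning for Robust Speech D | 7.9 | Top 25% | Methods research | #speech-spoofing-detection
8 | AMECxSV: Adaptive Metadata-Driven Embedding-Fusion Cali | 7.8 | Top 25% | Methods research | #speaker-verification
9 | X-Translator: A Real-Time Multilingual Speaker-Aware Sp | 7.8 | Top 25% | System technical report | #speech-translation
10 | Dense-Sparse Dynamic Time Warping for Customizing Piano | 7.8 | Top 25% | System technical report | #music-source-separation
11 | Do Speech Tokens Leak Voiceprints? Speaker Inversion At | 7.7 | Top 25% | Methods research | #speaker-verification
12 | Is One Score Enough? Assessing Singing Quality of Songs | 7.6 | Top 25% | Methods research | #music-understanding
13 | FlashRT: Agent Harness for Guiding Agents to Deploy Rea | 7.5 | Top 25% | System technical report | #audio-visual-generation
14 | AI_LectureNote: A Retrospective Pilot Study of a Post-A | 7.2 | Top 50% | System technical report | #speech-recognition
15 | Should Missing Modalities Always Be Necessary to Repair | 7.0 | Top 50% | Methods research | #multimodal-models
16 | Re-Sonance: A Dysarthric Asynchronous Real-Time Speech | 6.9 | Top 50% | System technical report | #voice-conversion
17 | NABEATs: Noise-Aware Audio Representation Learning | 6.7 | Top 50% | Methods research | #audio-understanding
18 | When to Use Extra Context: Evidence-Grounded Terminolog | 6.7 | Top 50% | System technical report | #speech-translation
19 | How Reliable Are Multimodal Signals of Conversational S | 6.6 | Top 50% | Methods research | #robustness
20 | SSTMark: Robust Training-Free Semantic-Level Speech Wat | 6.5 | Top 50% | System technical report | #audio-watermarking
21 | The tttAI System for the TSA-ASR Task of the SmartGlass | 6.5 | Top 50% | System technical report | #speaker-diarization
22 | Audio Cross Verification Using Dual Alignment Likelihoo | 6.5 | Top 50% | Methods research | #audio-spoofing-detection
23 | Component-Level Ensemble Fusion for Speech and Environm | 6.4 | Top 50% | System technical report | #speech-spoofing-detection
24 | Adaptive Momentum Enhanced Distributed Multichannel Act | 6.3 | Top 50% | Applied research | #audio-understanding
25 | Robust Summarization of Doctor-Patient Conversations: T | 6.3 | Top 50% | System technical report | #voice-interaction
26 | An Audio Language Model-Based Voice Concept Bottleneck | 6.2 | Top 50% | Applied research | #speech-quality-assessment
27 | FillGauss: Fine-Grained Filling-Aware Impact Sound Gene | 6.2 | Top 50% | Methods research | #audio-generation
28 | Harness TTS: Towards Context-Aware Expressive Speech Sy | 6.2 | Top 50% | Methods research | #speech-synthesis
29 | Modeling turn-taking with distant viewing: investigatin | 6.2 | Top 50% | System technical report | #audio-visual
30 | Efficient Audio-Visual Event Recognition via Knowledge | 5.8 | Top 50% | Methods research | #audio-visual-understanding
31 | Multi-Level Privacy-Preserving Dementia Detection from | 5.5 | Top 50% | Methods research | #speech-attribute-recognition
32 | Explainable Lightweight Compact Deep Models for Speech | 5.4 | Bottom 50% | Methods research | #speech-emotion-recognition
33 | Team RAS in 11th ABAW Competition: Multimodal Ambivalen | 5.3 | Bottom 50% | System technical report | #speech-emotion-recognition
34 | EII-SCL: Harnessing Emotional Inertia for Multimodal Em | 5.2 | Bottom 50% | Methods research | #speech-emotion-recognition
📋 Paper list
🥇 HARP: Harmonic-Aware Residual Partitioning for Neural Audio Codecs
9.6/10 | Novelty 1.4/2 | Rigor 1.3/1.5 | Experiments 1.4/1.5 | Clarity 1/1 | Impact 1.2/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.3/1.5 ...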

Transcript-Free Lightweight Detection of Alzheimer's Disease from Spontaneous Speech Using Handcrafted MFCC-Dominant Acoustic Biomarkers

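📄 Tags: #speech-attribute-recognition #medical-audio #interpretability #audio-understanding #Transformer
4.9/10 | Novelty 0.8/2 | Rigor 1/1.5 | Experiments 0.7/1.5 | Clarity 0.8/1 | Impact 0.6/1.5 | Open source 0/1.5 | Reproducibility 0.5/0.5 | Engineering 0.5/1.5
📝 4.9/10 | Bottom 50% | Document type: Methods research | Scoring confidence: High | #speech-attribute-recognition | #medical-audio | #interpretability #audio-understanding | arXiv
👥 Authors and affiliations
First author: Rashin Gholijani Farahani (Department of Computer Engineering, Karaj Branch, Islamic Azad University)
Corresponding author: Azam Bastanfard (Department of Computer Engineering, Karaj Branch, Islamic Azad University)
Author list: Rashin Gholijani Farahani (Department of Computer Engineering, Karaj Branch, Islamic Azad University), Azam Bastanfard (Department of Computer Engineering, Karaj Branch, Islamic Azad University)
💡 Blunt review
The starting point deserves credit: the paper tries to establish a reproducible audio-only baseline for speech-based AD detection under a strict evaluation protocol. Its core flaw is mediocre performance (AUC ~0.67), only a limited margin over random guessing, which greatly weakens its claim to be a "practically useful baseline". With deep learning now mainstream, the paper stays entirely within the traditional handcrafted features + SVM paradigm; its novelty stops at pipeline design and empirical analysis, with no methodological breakthrough. Although the authors candidly acknowledge data leakage in the exploratory experiments, they do not resolve the statistical power problem of the main experiment on such a small dataset, so the reliability of the conclusions is doubtful. ...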

Speech/Music/Audio Paper Digest 2026-07-14

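53 papers analyzed
⚡ Today's overview: 📥 53 papers fetched → 🔬 in-depth analysis complete
🏷️ Trending topics
Topic | Count | Distribution
#speech-recognition | 5 | █████
#music-generation | 5 | █████
#audio-understanding | 5 | █████
#audio-generation | 4 | ████
#multimodal-models | 3 | ███
#speech-spoofing-detection | 3 | ███
#speech-separation | 3 | ███
#speech-quality-assessment | 3 | ███
📊 Paper score ranking (53 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Document type | Primary task
🥇 | Simple Features and Honest Calibration for Ambivalence | 9.0 | Top 10% | System technical report | #model-ensembling
🥈 | PC-Mix: Partial-Component Audio Spoofing Detection unde | 8.9 | Top 25% | Dataset and benchmark | #audio-spoofing-detection
🥉 | BeatEdit: Symbolic Music Generation as Explicit Editing | 8.9 | Top 25% | Methods research | #music-generation
4 | CHARM: Charge Calibration and Acoustic Rescue for LLM-b | 8.8 | Top 25% | Methods research | #prompt-learning
5 | FdAudio: MeanFlow-Anchored Fréchet-Distance Post-Traini | 8.6 | Top 25% | Methods research | #audio-generation
6 | Evaluating SSL and ViViT Architectures for Cross-Corpus | 8.3 | Top 25% | System technical report | #speech-quality-assessment
7 | ECHOv2: Two-Level Band-Splitting Representation Learnin | 8.2 | Top 25% | Methods research | #audio-event-detection
8 | GigaAM Multilingual: Foundation Model for Underrepresen | 8.1 | Top 25% | System technical report | #speech-recognition
9 | Evidence Subspace Projection: Measuring How Much Eviden | 8.1 | Top 25% | Methods research | #speech-spoofing-detection
10 | VoxENES 2026: Benchmarking Generalization of Speech Spo | 8.1 | Top 25% | Dataset and benchmark | #speech-spoofing-detection
11 | WaveNet-Style Guitar Amplifier Model Pruning for Real-T | 8.0 | Top 25% | System technical report | #audio-generation
12 | TabPFN beyond Tabular Data: Calibration and Accuracy on | 7.9 | Top 25% | Applied research | #audio-classification
13 | ARIMA: Reconstruction-Grounded Predictive Representatio | 7.7 | Top 25% | Methods research | #self-supervised-learning
14 | Qwen-Audio-VAE Technical Report | 7.7 | Top 25% | System technical report | #audio-coding
15 | Local Multimodal Music Alignment from Global Supervisio | 7.6 | Top 25% | Methods research | #contrastive-learning
16 | MeloBottleneck: Self-Supervised Melody Skeleton Extract | 7.5 | Top 25% | Methods research | #music-understanding
17 | Dance to Music Generation leveraging Pre-training with | 7.5 | Top 25% | Methods research | #music-generation
18 | GigaChat Audio: Time-aware Large Audio Language Model | 7.4 | Top 50% | System technical report | #audio-understanding
19 | Difference-Driven Gating: Adaptive Feature Fusion for U | 7.4 | Top 50% | Methods research | #speech-separation
20 | BackgroundMellow: A Multi-Modal Cohesive Framework for | 7.4 | Top 50% | System technical report | #audio-generation
21 | Qwen-Music Technical Report | 7.4 | Top 50% | System technical report | #music-generation
22 | CoFi-Lite: Pushing the Limits of Ultra-Lightweight Spee | 7.3 | Top 50% | Methods research | #speech-enhancement
23 | MusicMark: A Robust Generative Watermarking Framework f | 7.3 | Top 50% | Methods research | #audio-watermarking
24 | Unified Gradient Projection: Language-Balanced Continua | 7.2 | Top 50% | Methods research | #speech-recognition
25 | Data Augmentation for L2 English Speaking Assessment us | 7.0 | Top 50% | Methods research | #speech-quality-assessment
26 | A Production-Oriented Framework for Evaluation of SFX G | 6.9 | Top 50% | System technical report | #audio-generation
27 | Learn2Chat: Rethinking Dyadic Talking Heads via Interac | 6.8 | Top 50% | Methods research | #audio-visual-generation
28 | Tight-Frame Reconstruction for Acoustic Intensity Estim | 6.8 | Top 50% | Theoretical research | #sound-source-localization
29 | The SonicAGI System for the REAL-TSE Challenge | 6.8 | Top 50% | System technical report | #speech-separation
30 | Anysynth: Zero-Shot Instrument Cloning via In-Context Le | 6.8 | Top 50% | Methods research | #music-generation
31 | Where Speech Enhancement Hurts Recognition: An Inferenc | 6.7 | Top 50% | Methods research | #speech-recognition
32 | Teaching Speech Enhancement Models to Sing: Domain Adap | 6.7 | Top 50% | Methods research | #music-source-separation
33 | What You Train Is What You Get: Gender Bias, Training C | 6.6 | Top 50% | Applied research | #speech-spoofing-detection
34 | Listen to the Features: Voice Anonymization Driven by C | 6.5 | Top 50% | Methods research | #voice-cloning
35 | Efficiently Adapting Spoken Language Models for the Sin | 6.5 | Top 50% | System technical report | #voice-interaction
36 | Which Languages Transfer Best to Warlpiri? A Similarity | 6.5 | Top 50% | Applied research | #speech-recognition
37 | Encoder-Side Neuron Identification and Amplification fo | 6.4 | Top 50% | Methods research | #audio-understanding
38 | Breaking the Quality–Intelligibility Trade-off in Stre | 6.3 | Top 50% | Methods research | #speech-separation
39 | An Objective Intelligibility Metric Evaluation on Spani | 6.2 | Top 50% | Dataset and benchmark | #speech-quality-assessment
40 | Hearing Like Humans? Sound Symbolism and Perceptual Ali | 6.1 | Top 50% | Methods research | #multimodal-models
41 | Anamnesis: An Open-Source Platform for Large-Scale Back | 6.1 | Top 50% | System technical report | #prompt-learning
42 | LOGOS: A Living Logic for AI Agent Teams That Evolve Wi | 6.1 | Top 50% | System technical report | #multimodal-models
43 | Verifier-Guided Twelve-Tone Composition: A Generate-Ver | 6.0 | Top 50% | System technical report | #music-generation
44 | MRUF: Multi-granularity Routing with Uncertainty-Aware | 5.9 | Top 50% | Methods research | #multimodal-models
45 | Omni-Decision: A Progressive Evidence-State Agent Syste | 5.9 | Top 50% | System technical report | #audio-understanding
46 | Graph Representation of RaagBase: A Unique Dataset for | 5.7 | Top 50% | Dataset and benchmark | #music-understanding
47 | Synchronized Three-Dimensional Vocal-Tract Motion for S | 5.7 | Top 50% | System technical report | #speech-synthesis
48 | LightMem-Ego: Your AI Memory for Everyday Life | 5.6 | Top 50% | System technical report | #streaming
49 | Casting Everything to Online API Services? A Survey of | 5.4 | Bottom 50% | Survey | #speech-recognition
50 | A Closed-Form Noise-Sensitivity Asymmetry for Causal Br | 5.3 | Bottom 50% | Theoretical research | #audio-understanding
51 | Semantic Sampling via Learnable Observation Front Ends | 5.1 | Bottom 50% | Methods research | #audio-understanding
52 | Transcript-Free Lightweight Detection of Alzheimer’s Di | 4.9 | Bottom 50% | Methods research | #speech-attribute-recognition
53 | Perceived Annoyance in Multi-source Electric Vehicle AV | 3.5 | Bottom 50% | Applied research | #audio-quality-assessment
📋 Paper list
🥇 Simple Features and Honest Calibration for Ambivalence and Hesitancy Recognition in Video
9.0/10 | Novelty 1.2/2 | Rigor 1.4/1.5 | Experiments 1.5/1.5 | Clarity 0.9/1 | Impact 0.5/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...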

Multimodal Digital Biomarker for Asthma: Complementary Roles of Vocal, Clinical and Demographic Factors

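📄 Tags: #Transformer #multimodal-models #medical-audio #interpretability #self-supervised-learning
5.3/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 0.8/1.5 | Clarity 0.8/1 | Impact 0.6/1.5 | Open source 0/1.5 | Reproducibility 0.3/0.5 | Engineering 0.6/1.5
📝 5.3/10 | Bottom 50% | Document type: Applied research | Scoring confidence: High | #multimodal-models | #Transformer | #medical-audio #interpretability | arXiv
👥 Authors and affiliations
First author: Vladimir Despotovic (Bioinformatics & AI, Department of Medical Informatics, Luxembourg Institute of Health)
Corresponding author: Guy Fagherazzi (Deep Digital Phenotyping, Department of Precision Health, Luxembourg Institute of Health)
Author list: Vladimir Despotovic (Bioinformatics & AI, Department of Medical Informatics, Luxembourg Institute of Health), Milena Despotovic (Translational Medicine Operations Hub, Luxembourg Institute of Health), Abir Elbeji (Multi-Omics Data Science, Department of Cancer Research, Luxembourg Institute of Health), Petr V. Nazarov (Multi-Omics Data Science, Department of Cancer Research, Luxembourg Institute of Health), Guy Fagherazzi (Deep Digital Phenotyping, Department of Precision Health, Luxembourg Institute of Health)
💡 Blunt review
The paper's highlights are its clinically oriented problem definition and its exploration of interpretability, in particular the analysis of how gating weights correlate with symptom severity, which gives the model's decision logic a layer of clinical meaning. Its core weakness is limited overall novelty: it is more an effective engineering application to a specific clinical problem than a methodological breakthrough. The authors claim that one of their contributions is introducing an MoE architecture to clinical multimodal data, but there is precedent for this in general clinical prediction, and the paper does not sufficiently differentiate itself from that work. Most critically, while stressing "scalable screening", the paper releases none of its core code, models, or data, which seriously undermines the reusability and practical impact of its contribution and leaves the whole work at the proof-of-concept stage. ...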