Optimal Transport on Speech/Audio Paper Digest (语音/音频论文速递)

https://nanless.github.io/audio-paper-digest-blog/tags/%E6%9C%80%E4%BC%98%E4%BC%A0%E8%BE%93/
Recent content in Optimal Transport on Speech/Audio Paper Digest
Last updated: Wed, 29 Apr 2026 00:00:00 +0000

A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-distribution-matching-approach-to-neural-piano/
Wed, 29 Apr 2026 00:00:00 +0000
Music transcription | 7.0/10

BEST-STD 2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/
Wed, 29 Apr 2026 00:00:00 +0000
Audio retrieval | 7.5/10

Beyond Mapping: Domain-Invariant Representations via Spectral Embedding of Optimal Transport Plans
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-mapping-domain-invariant-representations/
Wed, 29 Apr 2026 00:00:00 +0000
Domain adaptation | 7.5/10

MCI-OTFusion: A Multimodal Model for MCI Detection and Cognitive Score Prediction
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mci-otfusion-a-multimodal-model-for-mci-detection/
Wed, 29 Apr 2026 00:00:00 +0000
Mild cognitive impairment detection | 6.5/10

Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/
Tue, 21 Apr 2026 00:00:00 +0000
This paper addresses the core bottleneck of scarce labeled data in low-resource multilingual speech emotion recognition (SER). The authors propose a disruptive paradigm: **recasting SER as an unsupervised "non-verbal to verbal" transfer problem**. The core hypothesis is that the prosodic emotional cues carried by non-verbal vocalizations (such as laughing and crying) are purer and more language-independent than those in speech, and can therefore serve as a better source of supervision. To this end, the authors design **NOVA-