少样本 on 语音/音频论文速递

少样本 on 语音/音频论文速递 https://nanless.github.io/audio-paper-digest-blog/tags/%E5%B0%91%E6%A0%B7%E6%9C%AC/ Recent content in 少样本 on 语音/音频论文速递 Hugo zh-cn Wed, 29 Apr 2026 00:00:00 +0000 A Bayesian Approach to Singing Skill Evaluation Using Semitone Pitch Histogram and MCMC-Based Generated Quantities https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bayesian-approach-to-singing-skill-evaluation/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bayesian-approach-to-singing-skill-evaluation/ 音乐理解 | 7.0/10 Denoising Of Stochastic Ray Tracing Room Impulse Responses https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-denoising-of-stochastic-ray-tracing-room-impulse/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-denoising-of-stochastic-ray-tracing-room-impulse/ 空间音频 | 7.5/10 EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/ 语音活动检测 | 7.5/10 EMG-to-Speech with Fewer Channels https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/ 语音合成 | 7.5/10 Improving Active Learning for Melody Estimation by Disentangling Uncertainties https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-active-learning-for-melody-estimation/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-active-learning-for-melody-estimation/ 音乐信息检索 | 7.5/10 LLM-Based Post-ASR Error Correction for Disordered Speech https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/ 语音识别 | 7.5/10 Low-Resource Speech-Based Early Alzheimers Detection via Cross-Lingual and Few-Shot Transfer Learning https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/ 语音生物标志物 | 7.5/10 Monitoring exposure-length variations in submarine power cables using distributed fiber-optic sensing https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-monitoring-exposure-length-variations-in/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-monitoring-exposure-length-variations-in/ 音频事件检测 | 6.5/10 Multimodal Fusion-Based IPCLIP Network for Mixed Reality Surgical Assistance https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-fusion-based-ipclip-network-for-mixed/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-fusion-based-ipclip-network-for-mixed/ 多模态模型 | 6.5/10 QFOCUS: Controllable Synthesis for Automated Speech Stress Editing to Deliver Human-Like Emphatic Intent https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/ 语音合成 | 7.5/10 Separate this, and all of these Things Around It: Music Source Separation Via Hyperellipsoidal Queries https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-separate-this-and-all-of-these-things-around-it/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-separate-this-and-all-of-these-things-around-it/ 音乐分离 | 7.0/10 Stress Prediction from Temporal Emotion Trajectories in Clinical Patient-Physician Conversations https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/ 语音情感识别 | 7.0/10 Synthetic Data Domain Adaptation for ASR via LLM-Based Text and Phonetic Respelling Augmentation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/ 语音识别 | 8.0/10 Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/ 语音识别 | 7.5/10 Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/ Tue, 28 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/ 音乐生成 | 6.5/10 ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-iclad-in-context-learning-with-comparison/ Tue, 21 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-iclad-in-context-learning-with-comparison/ 本文针对音频深度伪造检测模型在真实场景（in-the-wild）中泛化能力差的核心问题，提出了一种名为ICLAD的全新范式。该框架利用音频语言模型（ALM）的上下文学习能力，实现了无需训练的快速适应。其核心是创新的**成对比较推理**策略：在离线阶段，引导ALM为每个样本同时生成“真实”和“伪造”的 MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-move-translating-laughter-and-tears-via-mixture/ Tue, 21 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-move-translating-laughter-and-tears-via-mixture/ 这篇论文旨在解决语音到语音翻译（S2ST）系统普遍缺失非语言声音（如笑声、哭泣）和情感韵律的问题，这严重限制了跨语言交流的自然度和语用准确性。作者提出了三大贡献：1) 一个**可扩展的表达性数据合成管道**，能自动生成高质量、带情感标注的S2ST训练对，克服了数据稀缺瓶颈；2) **MoVE（混合声 Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/ Sun, 19 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/ 本文旨在解决非侵入式语音质量评估在标注数据有限场景下的性能瓶颈。作者提出了GatherMOS框架，其核心是将大语言模型（如GPT-5）作为一个元评估器，通过精心设计的文本提示，融合多类异构信号：包括手 SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/ Sun, 19 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/ 本文旨在解决开放集说话人识别中的鲁棒性问题，即系统在仅有少量目标说话人注册样本的情况下，需同时准确识别已知说话人并可靠拒识未知说话人。作者在先前SpeakerRPL V1框架基础上提出了三项关键改进：